Differences
This shows you the differences between two versions of the page.
technical:recipes:gnnunlock [2024-07-01 14:22] – frey | technical:recipes:gnnunlock [2024-07-05 11:15] (current) – [Build the C++ training program] frey | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Installing GNNUnlock on Caviness ====== | ||
+ | The GNNUnlock Python code (available on [[https:// | ||
+ | |||
+ | - Create a Python virtual environment with the required TensorFlow dependencies | ||
+ | - Compile the GraphSAINT binary components in the virtual environment | ||
+ | - Optionally compile the GraphSAINT C++ training program | ||
+ | - Create a VALET package definition to manage the GNNUnlock virtual environment(s) | ||
+ | |||
+ | In the resulting virtual environment the following tasks can be performed: | ||
+ | |||
+ | * Download training data (e.g. the Reddit dataset) from Google Drive and process it with the C++ training program | ||
+ | * Convert the example GNNUnlock circuits data to graph format for GNNUnlock | ||
+ | * Perform GNNUnlock training with converted circuits data | ||
+ | |||
+ | |||
+ | ===== Create the virtual environment ===== | ||
+ | |||
+ | The GraphSAINT C++ training program is best compiled using Intel compilers and the MKL library. | ||
+ | |||
+ | In this recipe a versioned software directory hierarchy will be created for GNNUnlock in the user's home directory. | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness ~]$ GNNUNLOCK_PREFIX=~/ | ||
+ | [frey@login01.caviness ~]$ GNNUNLOCK_VERSION=2024.07.01 | ||
+ | [frey@login01.caviness ~]$ vpkg_require intel-oneapi/ | ||
+ | [frey@login01.caviness ~]$ rm -rf ~/ | ||
+ | [frey@login01.caviness ~]$ conda create --prefix " | ||
+ | --override-channels --channel intel --channel anaconda \ | ||
+ | python'> | ||
+ | tensorflow' | ||
+ | cython'> | ||
+ | numpy'> | ||
+ | scipy'> | ||
+ | scikit-learn'> | ||
+ | pyyaml'> | ||
+ | [frey@login01.caviness ~]$ conda activate " | ||
+ | </ | ||
+ | |||
+ | The conda cache is removed so that previously downloaded packages do not take precedence over what is currently available online, and to keep the user's home directory from growing too large. | ||
+ | |||
+ | ===== Clone the source code ===== | ||
+ | |||
+ | The source repositories for both GNNUnlock and GraphSAINT will be cloned into the virtualenv directory itself, starting with GNNUnlock: | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness ~]$ cd " | ||
+ | [frey@login01.caviness 2024.07.01]$ git clone https:// | ||
+ | [frey@login01.caviness 2024.07.01]$ cd GNNUnlock | ||
+ | </ | ||
+ | |||
+ | The examples presented in the GNNUnlock documentation assume that GraphSAINT has been cloned as a sub-directory of the GNNUnlock directory: | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness GNNUnlock]$ git clone https:// | ||
+ | [frey@login01.caviness GNNUnlock]$ pushd GraphSAINT | ||
+ | [frey@login01.caviness GraphSAINT]$ | ||
+ | </ | ||
+ | |||
+ | At this point the GraphSAINT repository is the current working directory. | ||
+ | |||
+ | ===== Build GraphSAINT binary components ===== | ||
+ | |||
+ | GraphSAINT includes several binary (Cython) components that must be compiled in the current virtualenv. | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness GraphSAINT]$ python graphsaint/ | ||
+ | </ | ||
+ | |||
+ | The binary components are installed in the '' | ||
+ | |||
+ | ===== Build the C++ training program ===== | ||
+ | |||
+ | The '' | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | Since we wish to use Intel oneAPI compilers and MKL, the '' | ||
+ | |||
+ | <code diff> | ||
+ | --- A/ | ||
+ | +++ B/ | ||
+ | @@ -1,8 +1,8 @@ | ||
+ | -CC=icc | ||
+ | +CC=icpx | ||
+ | | ||
+ | | ||
+ | -LIBS=-L${MKLROOT}/ | ||
+ | -CFLAGS=-I${IDIR} -I${MKLROOT}/ | ||
+ | +LIBS= | ||
+ | +CFLAGS=-I${IDIR} -qmkl=parallel -qopenmp -pthread -Wall -O3 --std=c++11 | ||
+ | |||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | The patch gets applied in the source directory: | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness GraphSAINT]$ pushd ipdps19_cpp | ||
+ | [frey@login01.caviness ipdps19_cpp]$ patch -p1 < ../ | ||
+ | [frey@login01.caviness ipdps19_cpp]$ make | ||
+ | [frey@login01.caviness ipdps19_cpp]$ install train " | ||
+ | </ | ||
+ | |||
+ | The compiled program is installed in the '' | ||
+ | |||
+ | Note the values of the two environment variables used in this recipe before exiting and proceeding to the next section: | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness ipdps19_cpp]$ echo $GNNUNLOCK_PREFIX | ||
+ | / | ||
+ | |||
+ | [frey@login01.caviness ipdps19_cpp]$ echo $GNNUNLOCK_VERSION | ||
+ | 2024.07.01 | ||
+ | </ | ||
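Since both variables are needed again when writing the VALET package definition, it can help to confirm they are actually set before leaving the shell. The following is a minimal sketch; the helper name ''require_vars'' is hypothetical and not part of VALET or GNNUnlock.

```shell
# Hypothetical helper: succeed only if every named variable is set and non-empty.
require_vars() {
    for name in "$@"; do
        # Indirectly read the variable whose name is stored in $name (POSIX sh).
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            return 1
        fi
    done
}

# Example use (commented out so that sourcing this block has no side effects):
# require_vars GNNUNLOCK_PREFIX GNNUNLOCK_VERSION && echo "both variables are set"
```

The function only inspects the current shell's variables; it does not check that the directory named by ''GNNUNLOCK_PREFIX'' exists.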
+ | |||
+ | ===== VALET package definition ===== | ||
+ | |||
+ | Before going any further, a VALET package definition file should be created to facilitate the use of GNNUnlock in the future. | ||
+ | |||
+ | Recall that '' | ||
+ | |||
+ | <file yaml gnnunlock.vpkg_yaml> | ||
+ | gnnunlock: | ||
+ | prefix: «GNNUNLOCK_PREFIX» | ||
+ | description: | ||
+ | url: " | ||
+ | | ||
+ | versions: | ||
+ | " | ||
+ | description: | ||
+ | dependencies: | ||
+ | - intel-oneapi/ | ||
+ | actions: | ||
+ | - action: source | ||
+ | script: | ||
+ | sh: intel-python.sh | ||
+ | - variable: GNNUNLOCK_DIR | ||
+ | value: ${VALET_PATH_PREFIX}/ | ||
+ | - variable: GRAPHSAINT_DIR | ||
+ | value: ${VALET_PATH_PREFIX}/ | ||
+ | |||
+ | </ | ||
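The «GNNUNLOCK_PREFIX» token in the definition above is a placeholder for the value noted earlier. As a rough sketch, the substitution could be scripted with ''sed''; the helper name ''fill_prefix'' is hypothetical, and the sketch assumes the prefix path contains no ''|'', ''&'', or backslash characters.

```shell
# Hypothetical helper: print TEMPLATE with every «GNNUNLOCK_PREFIX» token
# replaced by PREFIX. The original file is left untouched.
# Assumption: PREFIX contains no '|', '&', or backslash characters,
# since those are special in the sed replacement text used here.
fill_prefix() {
    template="$1"
    prefix="$2"
    sed "s|«GNNUNLOCK_PREFIX»|${prefix}|g" "$template"
}

# Example use (commented out; file names are illustrative):
# fill_prefix gnnunlock.vpkg_yaml "$GNNUNLOCK_PREFIX" > gnnunlock.filled.vpkg_yaml
```

Writing to a new file rather than editing in place keeps the template reusable for later versions.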
+ | |||
+ | The package can be added to the environment of a new login shell: | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login00.caviness ~]$ vpkg_require gnnunlock/ | ||
+ | Adding dependency `binutils/ | ||
+ | Adding dependency `gcc/ | ||
+ | Adding dependency `intel-oneapi/ | ||
+ | Adding package `gnnunlock/ | ||
+ | </ | ||
+ | |||
+ | The C++ training program is available as expected where it was installed: | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login00.caviness ~]$ which ipdps19-train | ||
+ | ~/ | ||
+ | </ | ||
+ | |||
+ | The GNNUnlock and GraphSAINT repositories are easily referenced using the '' | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login00.caviness ~]$ cd $GRAPHSAINT_DIR | ||
+ | [frey@login00.caviness GraphSAINT]$ pwd | ||
+ | / | ||
+ | |||
+ | [frey@login00.caviness ~]$ cd $GNNUNLOCK_DIR | ||
+ | [frey@login00.caviness GNNUnlock]$ pwd | ||
+ | / | ||
+ | </ | ||
+ | |||
+ | At this point the shell is in the appropriate working directory for the GNNUnlock example. | ||
+ | |||
+ | ===== Examples ===== | ||
+ | |||
+ | <WRAP center round important 60%> | ||
+ | The use of a login node in this recipe is purely for illustrative purposes. | ||
+ | </ | ||
+ | |||
+ | ==== TensorFlow and Python ==== | ||
+ | |||
+ | The GNNUnlock repository includes example circuit data that must be transformed to a graph format before GNNUnlock can be executed. | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness GNNUnlock]$ mkdir -p Netlist_to_graph/ | ||
+ | [frey@login01.caviness GNNUnlock]$ pushd Netlist_to_graph/ | ||
+ | [frey@login01.caviness anti_sat_iscas_c7552]$ cp ../ | ||
+ | [frey@login01.caviness anti_sat_iscas_c7552]$ perl ../ | ||
+ | Can't locate / | ||
+ | </ | ||
+ | |||
+ | The documentation //did state// that line 6 of that Perl script must be modified, but rather than changing it to the absolute path at which '' | ||
+ | |||
+ | <code perl> | ||
+ | require " | ||
+ | </ | ||
+ | |||
+ | This instructs Perl to read the module file '' | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login00.caviness anti_sat_iscas_c7552]$ ln -s ../ | ||
+ | [frey@login01.caviness anti_sat_iscas_c7552]$ perl ../ | ||
+ | AntiSAT_bench_to_graph.pl | ||
+ | | ||
+ | Lilas Alrahis < | ||
+ | NYUAD, Abu Dhabi, UAE | ||
+ | |||
+ | ' | ||
+ | |||
+ | |||
+ | Program completed in 443 sec without error. | ||
+ | </ | ||
+ | |||
+ | The same " | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login01.caviness anti_sat_iscas_c7552]$ python graph_parser.py | ||
+ | </ | ||
+ | |||
+ | At long last, the GraphSAINT program can be used to train with the graph data. | ||
+ | |||
+ | <WRAP center round important 60%> | ||
+ | All execution of GraphSAINT code (in both the GraphSAINT and GNNUnlock documentation) must be performed from the GraphSAINT repository directory. | ||
+ | </ | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login00.caviness anti_sat_iscas_c7552]$ cd $GRAPHSAINT_DIR | ||
+ | [frey@login00.caviness GraphSAINT]$ python -m graphsaint.tensorflow_version.train \ | ||
+ | --data_prefix ../ | ||
+ | --train_config ../ | ||
+ | </ | ||
+ | |||
+ | Approximately 40 iterations into the training, the program was occupying around 3.5 GiB of memory and utilizing all 36 cores in the node: | ||
+ | |||
+ | < | ||
+ | PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND | ||
+ | 2893 frey 20 0 30.941g 3.479g | ||
+ | </ | ||
+ | |||
+ | Memory usage appears to increase continually as training proceeds, so users are encouraged to benchmark their runs and budget memory requests for GNNUnlock jobs accordingly. | ||
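One simple way to benchmark memory growth is to sample the resident set size of the training process at intervals. The sketch below is Linux-specific (it reads ''/proc''), and the helper names ''rss_kib'' and ''sample_rss'' are hypothetical; the PID would be that of the running Python or C++ training process.

```shell
# Print the resident set size (KiB) of process PID, read from /proc (Linux only).
# Prints nothing if the process no longer exists.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: sample the RSS of PID every INTERVAL seconds, COUNT times,
# printing "elapsed_seconds rss_kib" pairs. Stops early if the process exits.
sample_rss() {
    pid="$1"; interval="$2"; count="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        rss=$(rss_kib "$pid")
        if [ -z "$rss" ]; then
            break
        fi
        echo "$((i * interval)) $rss"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example use (commented out): sample PID 2893 once a minute for an hour.
# sample_rss 2893 60 60 > rss.log
```

The peak value in the resulting log, plus some headroom, gives a starting point for a job's memory request.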
+ | |||
+ | ==== C++ train ==== | ||
+ | |||
+ | The C++ training program was tested with Reddit data available in the [[https:// | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login00.caviness ~]$ mkdir ~/ | ||
+ | [frey@login00.caviness ~]$ mv ~/ | ||
+ | [frey@login00.caviness ~]$ cd ~/ | ||
+ | [frey@login00.caviness data_cpp]$ unzip reddit-20240701T143527Z-001.zip | ||
+ | [frey@login00.caviness data_cpp]$ ls -l reddit | ||
+ | total 1236252 | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone 1121959440 Jan 20 2020 feats_norm_col.bin | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | -rw-r--r-- 1 frey everyone | ||
+ | </ | ||
+ | |||
+ | Training must be performed from the '' | ||
+ | |||
+ | <code bash> | ||
+ | [frey@login00.caviness ~]$ vpkg_require gnnunlock/ | ||
+ | Adding dependency `binutils/ | ||
+ | Adding dependency `gcc/ | ||
+ | Adding dependency `intel-oneapi/ | ||
+ | Adding package `gnnunlock/ | ||
+ | |||
+ | [frey@login00.caviness data_cpp]$ ipdps19-train reddit 5 4 softmax | ||
+ | OMP: Info #277: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead. | ||
+ | ============ | ||
+ | ITERATION 0 | ||
+ | ============ | ||
+ | Sampling 4 subgraphs. | ||
+ | Thread 0 doubling from 207000 to 414000. | ||
+ | Thread 3 doubling from 207000 to 414000. | ||
+ | Thread 1 doubling from 207000 to 414000. | ||
+ | Thread 2 doubling from 207000 to 414000. | ||
+ | thread 0 finish in 113ms while pre use 4ms and post use 91ms. | ||
+ | thread 2 finish in 155ms while pre use 6ms and post use 118ms. | ||
+ | thread 1 finish in 159ms while pre use 7ms and post use 122ms. | ||
+ | thread 3 finish in 159ms while pre use 6ms and post use 123ms. | ||
+ | Sampling: total time 0.16187406s. | ||
+ | Training itr 0 f1_mic: 0.034096, f1_mac: 0.019856 | ||
+ | ============ | ||
+ | ITERATION 1 | ||
+ | ============ | ||
+ | Training itr 1 f1_mic: 0.206164, f1_mac: 0.050644 | ||
+ | ============ | ||
+ | ITERATION 2 | ||
+ | ============ | ||
+ | Training itr 2 f1_mic: 0.233685, f1_mac: 0.061633 | ||
+ | ============ | ||
+ | ITERATION 3 | ||
+ | ============ | ||
+ | Training itr 3 f1_mic: 0.253775, f1_mac: 0.060568 | ||
+ | ============ | ||
+ | ITERATION 4 | ||
+ | ============ | ||
+ | Sampling 4 subgraphs. | ||
+ | Thread 3 doubling from 207000 to 414000. | ||
+ | Thread 1 doubling from 207000 to 414000. | ||
+ | Thread 0 doubling from 207000 to 414000. | ||
+ | Thread 2 doubling from 207000 to 414000. | ||
+ | thread 2 finish in 109ms while pre use 1ms and post use 89ms. | ||
+ | thread 3 finish in 110ms while pre use 2ms and post use 92ms. | ||
+ | thread 1 finish in 111ms while pre use 2ms and post use 92ms. | ||
+ | thread 0 finish in 111ms while pre use 3ms and post use 92ms. | ||
+ | Sampling: total time 0.11241198s. | ||
+ | Training itr 4 f1_mic: 0.297525, f1_mac: 0.080492 | ||
+ | -------------------- | ||
+ | DENSE time: 0.451507 | ||
+ | SPARSE time: 0.226233 | ||
+ | RELU time: 0.037294 | ||
+ | NORM time: 0.069778 | ||
+ | LOOKUP time: 0.096633 | ||
+ | BIAS time: 0.006502 | ||
+ | MASK time: 0.002519 | ||
+ | REDUCE time: 0.004366 | ||
+ | SIGMOID time: 0.000000 | ||
+ | SOFTMAX time: 0.000000 | ||
+ | -------------------- | ||
+ | Testing f1_mic: 0.365237, f1_mac: 0.107992 | ||
+ | </ | ||
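When comparing runs, it can be convenient to reduce output like the above to one line per iteration. The following sketch assumes the ''Training itr N f1_mic: X, f1_mac: Y'' line format shown in the transcript; the helper name ''extract_f1'' is hypothetical.

```shell
# Hypothetical helper: print "iteration f1_mic f1_mac" for every
# "Training itr ..." line in the given files (or standard input if none given).
extract_f1() {
    awk '/Training itr/ {
        gsub(",", "", $5)   # strip the trailing comma from the f1_mic value
        print $3, $5, $7
    }' "$@"
}

# Example use (commented out; train.log is an illustrative file name):
# ipdps19-train reddit 5 4 softmax | tee train.log | extract_f1
```

For the first iteration in the transcript above, this prints ''0 0.034096 0.019856''.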
+ | |||
+ | The OMP message indicates that the C++ code calls ''omp_set_nested'', which was deprecated in OpenMP 5.0 in favor of ''omp_set_max_active_levels''; the function still works as expected, but is likely to be removed in a future release of the OpenMP standard. |
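To see where the C++ source triggers this message, the call named in the OMP output (''omp_set_nested'') can be located with a recursive search. This is only a sketch; the helper name ''find_deprecated_omp'' is hypothetical, and the directory argument would be the C++ source directory of the GraphSAINT clone.

```shell
# Hypothetical helper: list file, line number, and text of every
# omp_set_nested call beneath the given directory.
find_deprecated_omp() {
    grep -rn 'omp_set_nested' "$1"
}

# Example use (commented out), run from the C++ source directory:
# find_deprecated_omp .
```

The function returns a non-zero status when no matches are found, which is ordinary ''grep'' behavior rather than an error.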