mkdir WORKSPACE
cd WORKSPACE  # your workspace
WORKSPACE=$(pwd)
git clone [email protected]:HSF-reco-and-software-triggers/Tracking-ML-Exa.TrkX.git
git clone [email protected]:ZhengGang85129/GNNforLRT.git  # metric plots
git clone [email protected]:gnnparticletracking/largeradiustracking/analysis.git
2. Install Miniconda (if you already have a virtual environment, you can skip this step).
#choose one
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh # for Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh # for Mac (Apple silicon)
#choose one
bash Miniconda3-latest-Linux-x86_64.sh # for Linux
bash Miniconda3-latest-MacOSX-arm64.sh # for Mac
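After the installer finishes, you can confirm that conda is available (assuming the default install prefix ${HOME}/miniconda3):

```shell
# Load conda into the current shell and check the version
source ${HOME}/miniconda3/etc/profile.d/conda.sh
conda --version
```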
3. Set up the environment. Move into the Tracking-ML-Exa.TrkX folder and install the necessary libraries (traintrack, torch, etc.) for the project:
source ${HOME}/miniconda3/etc/profile.d/conda.sh;
conda create --name exatrkx-gpu python=3.9;
conda activate exatrkx-gpu;
pip install pyg-lib torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
cd ${WORKSPACE}/Tracking-ML-Exa.TrkX;
pip3 install -e .;
pip3 install traintrack;
pip3 install wandb;
pip3 install tensorboard;
pip3 install faiss-gpu
pip3 install pytorch_lightning==1.9.2
pip install "ruamel.yaml<0.18.0"
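As a quick sanity check (run inside the activated exatrkx-gpu environment), the key packages and the CUDA build can be verified with:

```shell
# Each command should print a version; the CUDA flag should be True on a GPU node
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import torch_geometric; print(torch_geometric.__version__)"
python -c "import pytorch_lightning; print(pytorch_lightning.__version__)"
```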
Project/
├── configs/
│ ├── pipeline-PU200_best.yaml #used for PU200 sample, training pipelines are written in this yaml file.
│ ├── pipeline-noPU_best.yaml #used for PU0 sample, training pipelines are written in this yaml file.
│ ├── batch_gpu_default.yaml #GPU implementation setting
│ ├── batch_cpu_default.yaml #CPU implementation setting
│ └── project_config.yaml # default project settings
│
│
├── LightningModules/
│ ├── Embedding
│ │ ├── Models
│ │ │ ├── inference.py
│ │ │ └── layerless_embedding.py
│ │ │
│ │ ├── embedding_base.py
│ │ ├── utils.py
│ │ ├── train-noPU-best.yaml # associated with the stage specified in configs/pipeline-noPU_best.yaml
│ │ └── train-PU200-best.yaml # associated with the stage specified in configs/pipeline-PU200_best.yaml
│ │
│ │
│ ├── Filter
│ │ ├── Models
│ │ │ ├── pyramid_filter.py
│ │ │ ├── vanilla_filter.py
│ │ │ └── inference.py
│ │ │
│ │ ├── filter_base.py
│ │ ├── utils.py
│ │ ├── train-noPU-best.yaml # associated with the stage specified in configs/pipeline-noPU_best.yaml
│ │ └── train-PU200-best.yaml # associated with the stage specified in configs/pipeline-PU200_best.yaml
│ │
│ │
│ └── GNN
│ ├── Models
│ │ ├── agnn.py
│ │ ├── agnn_regression.py
│ │ ├── checkpoint_agnn.py
│ │ ├── gcn.py
│ │ ├── inference.py
│ │ ├── interaction_gnn.py
│ │ ├── interaction_multistep_gnn.py
│ │ └── split_checkpoint_agnn.py
│ │
│ ├── gnn_base.py
│ ├── regression_base.py
│ ├── utils.py
│ ├── train-noPU-best.yaml # associated with the stage specified in configs/pipeline-noPU_best.yaml
│ └── train-PU200-best.yaml # associated with the stage specified in configs/pipeline-PU200_best.yaml
│
│
│
├── evaluation # This is a directory for evaluation
│ ├── metrics # empty directory; the PDF plots produced by plt_performance.py are written here
│ ├── tracks
│ │ ├── track_reconstruction_DBSCAN.py # DBSCAN track reconstruction plot
│ │ ├── track_reconstruction_DBSCAN_optimize_search.py # optimized DBSCAN parameter search
│ │ ├── DBSCAN_config # configuration files used for DBSCAN
│ │ └── ... (other files omitted)
│ │
│ │
│ └── plt_performance.py # plots the output score, ROC, cut, and purity distributions
│
│
│
│
└── HyperOptim # Used for Hyperparameter optimization
├── Embedding # This is a directory
├── Filter # This is a directory
├── GNN # This is a directory
├── prepare_train_config.py
├── train.py
├── run_model.py
├── random_search.py
├── to_table.py
└── config_hyperparam.py
Before launching the pipeline, edit the following configuration files.

LightningModules/Embedding/train-noPU-best.yaml (Embedding stage):
input_dir: the path to your sample directory. Note: the input files should be npz files.
output_dir: the folder that will contain the embedding results.
train_split: the three numbers are the numbers of events in the (train, validation, test) datasets; the total number of available events must be ≥ train + validation + test.

LightningModules/Filter/train-noPU-best.yaml (Filter stage):
input_dir: the path to your embedding results directory.
output_dir: the folder that will contain the filter results.
datatype_split: same convention as train_split above.

LightningModules/GNN/train-noPU-best.yaml (GNN stage):
input_dir: the path to your filter results directory.
output_dir: the folder that will contain the GNN results.
datatype_split: same convention as train_split above.

Once the configs are set, launch the pipeline from WORKSPACE/Project:
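As a sketch, the relevant keys in LightningModules/Embedding/train-noPU-best.yaml might look like this (paths and numbers are placeholders; all other keys are omitted):

```yaml
input_dir: /path/to/your/npz/samples   # directory of .npz event files
output_dir: /path/to/embedding/output  # embedding results are written here
train_split: [80, 10, 10]              # events in (train, validation, test)
```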
cd ${WORKSPACE}/Project;
traintrack ./configs/pipeline-noPU_best.yaml
That’s all!
Note that the workflow chains Embedding → Filter → GNN, following the stage_list in ./configs/pipeline-noPU_best.yaml.
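The PU200 pipeline is launched the same way, pointing traintrack at the other config file listed in configs/:

```shell
cd ${WORKSPACE}/Project
traintrack ./configs/pipeline-PU200_best.yaml  # same workflow, PU200 sample
```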