Installation

  1. Clone the Tracking-ML-Exa.TrkX framework repository, our dedicated GNNforLRT repository (used for training), and the analysis repository (used only for evaluation):
mkdir WORKSPACE
cd WORKSPACE          # your workspace
WORKSPACE=$(pwd)
git clone [email protected]:HSF-reco-and-software-triggers/Tracking-ML-Exa.TrkX.git
git clone [email protected]:ZhengGang85129/GNNforLRT.git   # metric plots
git clone [email protected]:gnnparticletracking/largeradiustracking/analysis.git

  2. Install Miniconda (if you already have a virtual environment manager, you can skip this step):

# choose one, matching your OS
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh    # for Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh    # for Mac
# choose one, matching your OS
sh Miniconda3-latest-Linux-x86_64.sh    # for Linux
sh Miniconda3-latest-MacOSX-arm64.sh    # for Mac
  3. Now cd into the Tracking-ML-Exa.TrkX folder and install the libraries the project needs, such as traintrack, torch, etc.:
source ${HOME}/miniconda3/etc/profile.d/conda.sh
conda create --name exatrkx-gpu python=3.9
conda activate exatrkx-gpu
pip install pyg-lib torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
cd ${WORKSPACE}/Tracking-ML-Exa.TrkX
pip install -e .
pip install traintrack
pip install wandb
pip install tensorboard
pip install faiss-gpu
pip install pytorch_lightning==1.9.2
pip install "ruamel.yaml<0.18.0"

Introduction to Our Repository: GNNforLRT

Repository Structure

  1. Our repository is structured as follows:
Project/
├── configs/
│   ├── pipeline-PU200_best.yaml   # training pipeline for the PU = 200 sample
│   ├── pipeline-noPU_best.yaml    # training pipeline for the PU = 0 sample
│   ├── batch_gpu_default.yaml     # settings for running on GPU
│   ├── batch_cpu_default.yaml     # settings for running on CPU
│   └── project_config.yaml        # default project settings
├── LightningModules/
│   ├── Embedding/
│   │   ├── Models/
│   │   │   ├── inference.py
│   │   │   └── layerless_embedding.py
│   │   ├── embedding_base.py
│   │   ├── utils.py
│   │   ├── train-noPU-best.yaml   # stage config used by configs/pipeline-noPU_best.yaml
│   │   └── train-PU200-best.yaml  # stage config used by configs/pipeline-PU200_best.yaml
│   ├── Filter/
│   │   ├── Models/
│   │   │   ├── pyramid_filter.py
│   │   │   ├── vanilla_filter.py
│   │   │   └── inference.py
│   │   ├── filter_base.py
│   │   ├── utils.py
│   │   ├── train-noPU-best.yaml   # stage config used by configs/pipeline-noPU_best.yaml
│   │   └── train-PU200-best.yaml  # stage config used by configs/pipeline-PU200_best.yaml
│   └── GNN/
│       ├── Models/
│       │   ├── agnn.py
│       │   ├── agnn_regression.py
│       │   ├── checkpoint_agnn.py
│       │   ├── gcn.py
│       │   ├── inference.py
│       │   ├── interaction_gnn.py
│       │   ├── interaction_multistep_gnn.py
│       │   └── split_checkpoint_agnn.py
│       ├── gnn_base.py
│       ├── regression_base.py
│       ├── utils.py
│       ├── train-noPU-best.yaml   # stage config used by configs/pipeline-noPU_best.yaml
│       └── train-PU200-best.yaml  # stage config used by configs/pipeline-PU200_best.yaml
├── evaluation/                    # evaluation scripts and outputs
│   ├── metrics/                   # empty at first; PDFs produced by plt_performance.py are written here
│   ├── tracks/
│   │   ├── track_reconstruction_DBSCAN.py                 # DBSCAN track reconstruction plots
│   │   ├── track_reconstruction_DBSCAN_optimize_search.py # optimizes the DBSCAN search parameters
│   │   ├── DBSCAN_config/                                 # configuration files used for DBSCAN
│   │   └── ...                                            # (other files omitted)
│   └── plt_performance.py         # plots the output score, ROC curve, cut, and purity distributions
└── HyperOptim/                    # hyperparameter optimization
    ├── Embedding/
    ├── Filter/
    ├── GNN/
    ├── prepare_train_config.py
    ├── train.py
    ├── run_model.py
    ├── random_search.py
    ├── to_table.py
    └── config_hyperparam.py

Model Training

  1. Training is straightforward, but you still need to check a few things first. Here we use PU = 0 samples as an example; the same procedure applies to PU = 200 samples. The following three YAML files contain the hyperparameters of our model, basic information about the sample, and the details needed to import the modules for training and inference; a minimal sketch of how they fit together appears at the end of this section.
    1. LightningModules/Embedding/train-noPU-best.yaml (Embedding stage)
      1. Set input_dir to the path of your sample directory. Note that the input files must be .npz files.
      2. Set output_dir to the folder that will hold the embedding results.
      3. train_split: the three numbers give the number of events in the (train, validation, test) datasets; the total number of available events must be at least the sum of the three.
    2. LightningModules/Filter/train-noPU-best.yaml (Filter stage)
      1. Set input_dir to the path of your embedding results directory.
      2. Set output_dir to the folder that will hold the filter results.
      3. datatype_split: the three numbers give the number of events in the (train, validation, test) datasets; the total number of available events must be at least the sum of the three.
    3. LightningModules/GNN/train-noPU-best.yaml (GNN stage)
      1. Set input_dir to the path of your filter results directory.
      2. Set output_dir to the folder that will hold the GNN results.
      3. datatype_split: the three numbers give the number of events in the (train, validation, test) datasets; the total number of available events must be at least the sum of the three.
    4. The hyperparameters of our model in these YAML files have already been optimized stage by stage with a random-search approach.
  2. Once everything is ready, run the following commands to launch training and inference for the models in ${WORKSPACE}/Project:

cd ${WORKSPACE}/Project;

traintrack ./configs/pipeline-noPU_best.yaml

That’s all!

  3. The stages are chained as Embedding → Filter → GNN, following the stage_list in ./configs/pipeline-noPU_best.yaml.
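
For orientation, here is a minimal sketch of how the pipeline config and the stage configs fit together. The stage_list entries follow the traintrack {set, name, config} convention, and the model names below are only guesses based on the files under LightningModules/*/Models; the shipped configs/pipeline-noPU_best.yaml and train-noPU-best.yaml files are the authoritative reference.

# Sketch of the stage_list in configs/pipeline-noPU_best.yaml (illustrative, not verbatim)
stage_list:
  - {set: Embedding, name: LayerlessEmbedding, config: train-noPU-best.yaml}
  - {set: Filter, name: VanillaFilter, config: train-noPU-best.yaml}
  - {set: GNN, name: InteractionGNN, config: train-noPU-best.yaml}

# Fields to edit in each stage's train-noPU-best.yaml (see the steps above; values are placeholders)
input_dir: /path/to/input    # .npz sample directory for Embedding; the previous stage's output_dir otherwise
output_dir: /path/to/output  # read back as input_dir by the next stage
train_split: [80, 10, 10]    # (train, validation, test) event counts; named datatype_split in Filter/GNN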

Model Evaluation