
Quick start: if you don't want to install the software yourself, you can use my installation directly, as shown below:

# start an interactive job with 2 CPUs and 2000 MB of memory
srun --pty -p interactive -t 0-12:0:0 --mem 2000MB -c 2 /bin/bash

# load the required modules
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9

# make a test folder and change into it
mkdir /n/scratch3/users/${USER:0:1}/$USER/HiC-Pro-test && cd /n/scratch3/users/${USER:0:1}/$USER/HiC-Pro-test
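The `${USER:0:1}` in the path above is bash substring expansion: it takes one character of `$USER` starting at offset 0, which is how the scratch directories are grouped by first letter. A minimal sketch, using a made-up username `jdoe` in place of `$USER`:

```shell
# Hypothetical username, for illustration only; on the cluster this is $USER.
user=jdoe

# ${var:offset:length} is a bash feature (not plain POSIX sh).
first_letter=${user:0:1}

# Build the same path pattern used in the mkdir command above.
scratch_dir="/n/scratch3/users/${first_letter}/${user}/HiC-Pro-test"
echo "$scratch_dir"
```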

# download and untar the data
wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
tar -xvzf HiCPro_testdata.tar.gz

# modify the config file for the test run:
nano config_test_latest.txt

# modify row 22 and set default partition to short
JOB_QUEUE = short
# modify row 23 and set email to your email address
JOB_MAIL = xxx@hms.harvard.edu
# modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes
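If you prefer not to count rows in nano, the same three settings can be changed by key name with `sed`. This is a sketch, not part of the original procedure: it assumes each key appears once, at the start of a line, in `KEY = value` form. It works on a small stand-in file (`config_demo.txt`, a hypothetical name, with placeholder values) so it can be tried safely before pointing `set_key` at `config_test_latest.txt`.

```shell
# Stand-in config file in a temporary directory; the initial values are placeholders.
workdir=$(mktemp -d)
cfg="$workdir/config_demo.txt"
printf '%s\n' 'JOB_QUEUE = batch' 'JOB_MAIL = nobody@example.org' 'BOWTIE2_IDX_PATH = ' > "$cfg"

# set_key FILE KEY VALUE: rewrite the line that starts with KEY.
# '|' is the sed delimiter so that paths containing '/' need no escaping.
# A backup copy of the file is kept as FILE.bak.
set_key() {
    sed -i.bak "s|^$2[[:space:]]*=.*|$2 = $3|" "$1"
}

set_key "$cfg" JOB_QUEUE short
set_key "$cfg" JOB_MAIL xxx@hms.harvard.edu
set_key "$cfg" BOWTIE2_IDX_PATH /n/groups/shared_databases/bowtie2_indexes

cat "$cfg"
```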

# load modules and set up paths to the installed software (you can add these lines to your ~/.bashrc so that you don't have to run them manually)
module purge
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
export PATH=$PATH:/home/ld32/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
export PYTHONPATH=/home/ld32/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/ld32/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1

# run the software in parallel mode using Slurm 
HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test_para -p

# You should see: 
Run HiC-Pro 2.10.0 parallel mode
Submitted batch job 17841812
Submitted batch job 17841813
Two jobs are all submitted. Please use sacct and your email to monitor them.

# Or if you want to run the workflow in serial mode:  
HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test

### Stop here if you are not installing the software yourself.

-----------------------------------------------------------------------------------------------------------------------------------------------------------

If you want to install the software on your own, please read on:

Start an interactive job with a walltime of 12 hours, 16 GB of memory, and 2 CPUs, then load the required modules:

srun --pty -p interactive -t 0-12:0:0 --mem 16G -c 2 /bin/bash
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
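After the `module load` line, it can save time to confirm that the required tools really are on `PATH` before starting the install. A small sketch; the helper name `missing_tools` is made up for this example:

```shell
# missing_tools NAME...: print each command name that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools expected from the modules loaded above.
missing=$(missing_tools gcc python R samtools bowtie2)
if [ -n "$missing" ]; then
    echo "Not on PATH:" $missing
else
    echo "All required tools found."
fi
```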

Create a work directory in your home directory and change into it:

mkdir -p /home/$USER/HiC-Pro
cd  /home/$USER/HiC-Pro

The software needs a few R and Python packages that the R and Python modules do not provide. Here are the commands I used to install them. You can install your own copy or use my installation.

If you do want to install them yourself, replace ld32 with your username in the commands below:

mkdir -p /home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1  /home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib
R -e 'install.packages(c("ggplot2", "RColorBrewer"),  repos="http://cran.us.r-project.org",  lib="/home/ld32/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1")'
pip install --install-option="--prefix=/home/ld32/pub/HiC-Pro-2.10.0/dependencies/pythonlib" bx-python==0.8.1 iced==0.5.1 pysam==0.14.1 pandas==0.23.1

Set up the paths so that the downstream commands can find the packages:

export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1

Download the software and unpack it:

wget https://github.com/nservant/HiC-Pro/archive/v2.10.0.tar.gz
tar -xvzf v2.10.0.tar.gz
cd HiC-Pro-2.10.0

Modify the config file: 

# open the config file using text editor 'nano'
nano config-install.txt

# copy the following into the file, replacing my username (ld32) with your username
PREFIX = /home/ld32/pub/HiC-Pro-2.10.0
BOWTIE2_PATH =
SAMTOOLS_PATH =
R_PATH = /home/ld32/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
PYTHON_PATH = /home/ld32/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7
CLUSTER_SYS = SLURM

Finally, install the software: 

mkdir -p /home/$USER/pub/HiC-Pro-2.10.0
make configure
# disable a Python package installation command; otherwise it gives a permission error
sed -i "s/\${PYTHON_PATH}\/python setup.py install;//g" Makefile  
make CONFIG_SYS=config-install.txt install
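To see what the `sed` substitution above does before it edits the Makefile in place, the same expression can be run without `-i` on a sample line. This is only a sketch: the sample line is hypothetical and is not copied from the HiC-Pro Makefile.

```shell
# Stand-in file; the line below is a made-up example of a Makefile recipe line.
demo=$(mktemp)
printf '%s\n' '(cd src; ${PYTHON_PATH}/python setup.py install;)' > "$demo"

# Same substitution as above, printing the result instead of editing in place.
# Single quotes keep the shell from expanding ${PYTHON_PATH}.
result=$(sed 's/${PYTHON_PATH}\/python setup.py install;//g' "$demo")
echo "$result"
```

Running the real command with `sed -n '/setup.py install/p' Makefile` first shows which lines would be affected.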

Download some test data and run a test in serial mode (without submitting additional sbatch jobs):

# download and untar the data
wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
tar -xvzf HiCPro_testdata.tar.gz

# modify the config file for the test run:
vi config_test_latest.txt

# modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes

# load modules and set up paths to the newly installed software (you can add these lines to your ~/.bashrc so that you don't have to run them manually)
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
export PATH=$PATH:/home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1

# run the software in single machine mode
HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test
# you should see the software run from step 1 to the last step, as described at https://github.com/nservant/HiC-Pro

# for your reference, here is the test data downloading page: https://zerkalo.curie.fr/partage/HiC-Pro/

Download some test data and run a test in parallel mode (submitting additional sbatch jobs to run the software):

# download and untar the data
wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
tar -xvzf HiCPro_testdata.tar.gz

# modify the config file for the test run:
nano config_test_latest.txt

# modify row 22 and set default partition to short
JOB_QUEUE = short
# modify row 23 and set email to your email address
JOB_MAIL = xxx@hms.harvard.edu
# modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes

# load modules and set up paths to the newly installed software (you can add these lines to your ~/.bashrc so that you don't have to run them manually)
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
export PATH=$PATH:/home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
# run the software in parallel mode using Slurm 
HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test_para -p
# You should see: 
--------------
Run HiC-Pro 2.10.0 parallel mode
The following command will launch the parallel workflow through 2 torque jobs:
sbatch HiCPro_step1_IMR90_split.sh
The following command will merge the processed data and run the remaining steps per sample:
sbatch HiCPro_step2_IMR90_split.sh
--------------

# Now you can submit the step1 jobs:
cd hicpro_latest_test_para 
sbatch HiCPro_step1_IMR90_split.sh

# After the step1 jobs finish (you should receive an email), submit the next step:
sbatch HiCPro_step2_IMR90_split.sh 

# Or, if you don't want to wait, you can submit both steps at once.
# Note the -d option: it sets a job dependency so that step2 starts only after step1 finishes successfully:
cd hicpro_latest_test_para
jobid=`sbatch --parsable HiCPro_step1_IMR90_split.sh`
sbatch -d afterok:$jobid  HiCPro_step2_IMR90_split.sh
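The two-step submission can also be wrapped in a small function that stops if the first `sbatch` fails, so step2 is never submitted with an empty dependency. A sketch only: `submit_chain` is a name invented here, and the `${jobid%%;*}` line handles the `jobid;cluster` form that `--parsable` prints when a cluster name is present.

```shell
# submit_chain STEP1_SCRIPT STEP2_SCRIPT
# Submit step 1, then submit step 2 so that it starts only if step 1 succeeds.
submit_chain() {
    jobid=$(sbatch --parsable "$1") || return 1
    jobid=${jobid%%;*}          # keep only the job ID if a cluster name follows
    [ -n "$jobid" ] || return 1
    echo "Submitted batch job $jobid"
    sbatch -d "afterok:$jobid" "$2"
}

# Usage, from inside hicpro_latest_test_para:
# submit_chain HiCPro_step1_IMR90_split.sh HiCPro_step2_IMR90_split.sh
```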


# Or, even better, modify the script that generates the Slurm job scripts so that it submits the jobs for you:
vi /home/ld32/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/scripts/make_slurm_script.sh

# Comment out the user reminder messages at rows 90, 91, 130 and 131, like this:
# echo "The following command will launch the parallel workflow through $count torque jobs:"
# echo sbatch ${torque_script}
# echo "The following command will merge the processed data and run the remaining steps per sample:"
# echo sbatch ${torque_script_s2}


# And at the bottom of the script, add these commands to submit the jobs:
jobid=`sbatch --parsable $torque_script`
echo Submitted batch job $jobid
sbatch -d afterok:$jobid $torque_script_s2
echo Two jobs are all submitted. Please use sacct and your email to monitor them.

# for your reference, here is the test data downloading page: https://zerkalo.curie.fr/partage/HiC-Pro/