
Due to Aspera license limitations, each user has to install the software under their own home directory in order to use it.

Log on to O2

If you need help connecting to O2, please review the Using Slurm Basic wiki page.

From Windows, use the graphical PuTTY program to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your eCommons ID instead of user123:

ssh user123@o2.hms.harvard.edu

Start an interactive job and create a working folder

For example, for user abc123, the working directory will be /n/scratch3/users/a/abc123/testAspera:

srun --pty -p interactive -t 0-12:0:0 --mem 2000MB -n 1 /bin/bash

mkdir /n/scratch3/users/a/abc123/testAspera 

cd /n/scratch3/users/a/abc123/testAspera
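
The working directory above can also be built from the user name instead of being typed by hand. This is a minimal sketch, assuming your scratch folder follows the same pattern as the abc123 example (first letter of the user name, then the user name). The helper name scratch_dir_for is hypothetical and not an O2 command.

```shell
# Hypothetical helper: build the scratch path for a user name, following the
# pattern in the example above (/n/scratch3/users/<first letter>/<user name>).
scratch_dir_for() {
    local user="$1" first
    first=$(printf '%s' "$user" | cut -c1)
    printf '/n/scratch3/users/%s/%s\n' "$first" "$user"
}

# Usage (assumption: your scratch folder follows the same pattern):
#   workdir="$(scratch_dir_for "$USER")/testAspera"
#   mkdir -p "$workdir" && cd "$workdir"
```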

Download Aspera and install it. You only need to do this once.

# Download the aspera software
wget https://download.asperasoft.com/download/sw/connect/3.9.1/ibm-aspera-connect-3.9.1.171801-linux-g2.12-64.tar.gz 

# Decompress 
tar -xvzf ibm-aspera-connect-3.9.1.171801-linux-g2.12-64.tar.gz

# Install it 
sh ibm-aspera-connect-3.9.1.171801-linux-g2.12-64.sh

# Set up the software license
mkdir -p ~/.ssh; ln -s ~/.aspera/connect/etc/asperaweb_id_dsa.openssh ~/.ssh/

# Set up the path
export PATH=~/.aspera/connect/bin:$PATH

# To make sure the path is automatically available when you log in later, add the command to ~/.bashrc
echo 'export PATH=~/.aspera/connect/bin:$PATH' >> $HOME/.bashrc


# Set sratoolkit to use /n/scratch3/users/a/abc123/ncbi as cache space
# By default, sratoolkit uses /home/$USER/ncbi as cache. If you download multiple data sets, your 100G home space will fill up quickly.
mkdir -p ~/.ncbi
echo /repository/user/main/public/root = \"/n/scratch3/users/a/abc123/ncbi\" >> ~/.ncbi/user-settings.mkfg

# Make sure the ascp command is available now
which ascp
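
Note that an echo ... >> ~/.bashrc command appends a new line every time it is run, so repeating the setup can leave duplicate entries. Below is a minimal sketch of an idempotent alternative. The helper name add_line_once is hypothetical and is not part of Aspera or O2.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present in it.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   add_line_once 'export PATH=~/.aspera/connect/bin:$PATH' "$HOME/.bashrc"
```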


Use sratoolkit prefetch, which tries ascp first and then http, to download SRA data, then convert the data from .sra to .fastq format

# Load the sratoolkit module
module load sratoolkit/2.9.0

# Use prefetch to download SRA file. 
prefetch -v SRR5138775

# Convert SRA file to FASTQ with fastq-dump.
fastq-dump --split-files SRR5138775


# Note: The default maximum file size is 20G. Downloading a file larger than 20G gives an error:
prefetch -v SRR7890863
...
2020-05-27T17:58:24 prefetch.2.9.0 warn: Maximum file size download limit is 20GB
2020-05-27T17:58:24 prefetch.2.9.0: 1) 'SRR7890863' (29GB) is larger than maximum allowed: skipped
2020-05-27T17:58:25 prefetch.2.9.0: 'SRR7890863' has no remote vdbcache
...
Download of some files was skipped because they are too large
You can change size download limit by setting
--min-size and --max-size command line arguments


# To raise the limit, add --max-size, for example 35G:
prefetch -v --max-size 35G SRR7890863

Additional tips: 

  1. If you need to download a lot of data, run the screen command before starting the interactive job to keep the session alive:
    screen: Keep Linux Sessions Alive (so you can go back to the same terminal window from anywhere, anytime)
  2. If you have a lot of samples to download, running the prefetch command for each one by hand is a lot of work. To automate the process, find the accession IDs on the website and put them in a loop to download them one by one. For example, to download SRR6519510 through SRR6519519:
    for i in {6519510..6519519}; do
         prefetch SRR$i;
    done
  3. If you have more than a dozen samples to download, running them one by one takes a lot of time. You can run them in parallel. For example, you can submit 5 jobs and let each job work on 100 accession IDs. Because these 5 jobs share the same network from O2 to the NCBI cloud, each parallel prefetch command will run slower than it would in serial mode. We are not sure how much slower. Please share your experience.
  4. Let us know if you have any questions.
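
Tips 2 and 3 can be sketched together as follows. This is only an illustration under stated assumptions: the file names accessions.txt and chunk_* and all three helper names are hypothetical, and the download itself is the same one-by-one prefetch call used in tip 2.

```shell
# Hypothetical helper: print accession IDs SRR<first> .. SRR<last>, one per line.
make_accession_list() {
    local i="$1" last="$2"
    while [ "$i" -le "$last" ]; do
        echo "SRR$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: split a list file into pieces of at most N lines each.
# The pieces are named <prefix>aa, <prefix>ab, and so on.
split_accession_list() {
    local list="$1" per_chunk="$2" prefix="$3"
    split -l "$per_chunk" "$list" "$prefix"
}

# Hypothetical helper: download every accession listed in a file, one by one.
prefetch_from_file() {
    local list="$1" acc
    while read -r acc; do
        if [ -n "$acc" ]; then
            # Redirect stdin so the command cannot consume the rest of the list.
            prefetch "$acc" < /dev/null
        fi
    done < "$list"
}

# Usage:
#   make_accession_list 6519510 6519519 > accessions.txt
#   split_accession_list accessions.txt 5 chunk_
#   prefetch_from_file chunk_aa    # each job would work on its own chunk
```

Keeping the accession IDs in a file also makes it easy to download IDs that are not consecutive: put one ID per line in the file and skip the make_accession_list step.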