
Due to Aspera license limitations, users must install the software in their own home directory to use it.

Log on to O2

If you need help connecting to O2, please review the Using Slurm Basic wiki page.

From Windows, use the graphical PuTTY program to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your eCommons ID instead of user123:

ssh user123@o2.hms.harvard.edu

Start an interactive job and create a working folder

Start an interactive job:

srun --pty -p interactive -t 0-12:0:0 --mem 2000MB -n 1 /bin/bash

Create a working folder and change into it. For example, for user abc123:

mkdir -p /n/scratch3/users/a/abc123/testAspera

cd /n/scratch3/users/a/abc123/testAspera
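Judging from the abc123 example, the folder path follows the pattern /n/scratch3/users/<first letter of your ID>/<your ID>. If you prefer not to type it by hand, a small shell function like the sketch below can build it from an eCommons ID. The function name scratch_dir is hypothetical, and the path layout is an assumption taken from the example above.

```shell
# Hypothetical helper: print the scratch3 path for an eCommons ID.
# Assumes the layout /n/scratch3/users/<first letter>/<ID>,
# inferred from the abc123 example.
scratch_dir() {
  local id="$1"
  if [ -z "$id" ]; then
    echo "usage: scratch_dir ECOMMONS_ID" >&2
    return 1
  fi
  # %.1s prints only the first character of the ID
  printf '/n/scratch3/users/%.1s/%s\n' "$id" "$id"
}
```

For example, mkdir -p "$(scratch_dir "$USER")/testAspera" creates the same folder for the logged-in user, assuming $USER holds your eCommons ID.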

Download Aspera and install it. You only need to do this once.

# Download the aspera software
wget https://download.asperasoft.com/download/sw/connect/3.9.1/ibm-aspera-connect-3.9.1.171801-linux-g2.12-64.tar.gz 

# Decompress 
tar -xvzf ibm-aspera-connect-3.9.1.171801-linux-g2.12-64.tar.gz

# Install it 
sh ibm-aspera-connect-3.9.1.171801-linux-g2.12-64.sh

# Set up the software license: link the Aspera key file into ~/.ssh
mkdir -p ~/.ssh; ln -s ~/.aspera/connect/etc/asperaweb_id_dsa.openssh ~/.ssh/

# Add the Aspera bin directory to PATH
export PATH=~/.aspera/connect/bin:$PATH

# To make the path available automatically whenever you log in later, add the command to ~/.bashrc
echo export PATH=~/.aspera/connect/bin:\$PATH >> $HOME/.bashrc


# Set sratoolkit to use /n/scratch3/users/a/abc123/ncbi as its cache space
# By default, sratoolkit uses /home/$USER/ncbi as its cache. If you download multiple data sets, your 100G home space will fill up quickly.
mkdir -p ~/.ncbi
echo /repository/user/main/public/root = \"/n/scratch3/users/a/abc123/ncbi\" >> ~/.ncbi/user-settings.mkfg

# Make sure the ascp command is available now
which ascp
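Running the echo commands above a second time appends a second copy of the same line to ~/.bashrc or ~/.ncbi/user-settings.mkfg. The sketch below is a helper that appends a line only when that exact line is not already in the file. The name append_once is hypothetical; it relies only on standard grep, mkdir and touch.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the setup does not duplicate it.
append_once() {
  local line="$1" file="$2"
  # Create the parent directory and the file if they do not exist yet
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, with single quotes so that $HOME and $PATH are written literally:
append_once 'export PATH=$HOME/.aspera/connect/bin:$PATH' "$HOME/.bashrc"
append_once '/repository/user/main/public/root = "/n/scratch3/users/a/abc123/ncbi"' "$HOME/.ncbi/user-settings.mkfg"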

Use ascp directly to download SRA data to the current working directory, then convert it to FASTQ (an alternative download method is described below)

# Connect to the transfer cluster (replace user123 with your eCommons ID)
# Note: jobs cannot be launched from the transfer cluster. For longer Aspera transfers, run ascp inside a Screen session so the transfer survives a dropped connection.
ssh user123@transfer.rc.hms.harvard.edu

# Find the FTP URL of the data you want to download, for example:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR513/SRR5138775/SRR5138775.sra

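# Remove the protocol prefix (ftp://), prepend the anonymous user name anonftp@, put ':' between the host name and the path, and run ascp.
# -T disables encryption, -k1 lets an interrupted transfer resume, and -l 450m sets the target rate to 450 Mbps.
ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh -pQTk1 -l 450m anonftp@ftp-trace.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR513/SRR5138775/SRR5138775.sra .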

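# Load the sratoolkit module
module load sratoolkit/2.9.0

# Convert to FASTQ; for paired-end runs, --split-files writes each read to its own file (_1.fastq, _2.fastq)
fastq-dump --split-files SRR5138775.sra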
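The URL-to-ascp conversion described above can be scripted. The sketch below builds the FTP URL from a run accession and then rewrites it into the anonftp@host:/path form that ascp takes. The directory layout (first three characters, then first six characters, then the accession) is inferred from the single SRR5138775 example, so check it against the actual FTP listing before relying on it. Both function names are hypothetical.

```shell
# Hypothetical helper: build the sra-instant FTP URL for a run accession.
# Layout .../ByRun/sra/<first 3 chars>/<first 6 chars>/<acc>/<acc>.sra
# is inferred from the SRR5138775 example only.
sra_ftp_url() {
  local acc="$1"
  printf 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/%.3s/%.6s/%s/%s.sra\n' \
    "$acc" "$acc" "$acc" "$acc"
}

# Hypothetical helper: turn ftp://host/path into anonftp@host:/path.
ftp_to_ascp_src() {
  local url="$1" rest host remote_path
  case "$url" in
    ftp://*/*) rest=${url#ftp://} ;;
    *) echo "expected ftp://host/path, got: $url" >&2; return 1 ;;
  esac
  host=${rest%%/*}          # text before the first "/"
  remote_path=/${rest#*/}   # text after it, with the leading "/" restored
  printf 'anonftp@%s:%s\n' "$host" "$remote_path"
}
```

For example:
ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh -pQTk1 -l 450m "$(ftp_to_ascp_src "$(sra_ftp_url SRR5138775)")" .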

Alternatively, use the sratoolkit prefetch command, which tries ascp first and then falls back to HTTP, to download SRA data, then convert the data from .sra to FASTQ format

# Load the sratoolkit module
module load sratoolkit/2.9.0

# Use prefetch to download the SRA file
prefetch -v SRR5138775

# Convert SRA file to FASTQ with fastq-dump.
fastq-dump --split-files SRR5138775


# Note: by default, prefetch only downloads files up to 20GB. Larger files are skipped with a message like this:
prefetch -v SRR7890863
...
2020-05-27T17:58:24 prefetch.2.9.0 warn: Maximum file size download limit is 20GB
2020-05-27T17:58:24 prefetch.2.9.0: 1) 'SRR7890863' (29GB) is larger than maximum allowed: skipped
2020-05-27T17:58:25 prefetch.2.9.0: 'SRR7890863' has no remote vdbcache
...
Download of some files was skipped because they are too large
You can change size download limit by setting
--min-size and --max-size command line arguments


# To download such a file anyway, raise the limit with --max-size, for example 35G:
prefetch -v --max-size 35G SRR7890863
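To fetch and convert several runs in one go, the prefetch and fastq-dump steps can be wrapped in a loop. The sketch below is a hypothetical wrapper: it reads accessions from a text file (one per line; blank lines and lines starting with # are skipped), passes --max-size to prefetch (35G, the example value above, unless MAX_SIZE is set), and prints the commands instead of running them when DRY_RUN=1. The names fetch_and_convert, run_or_echo, MAX_SIZE and DRY_RUN are not part of sratoolkit.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical batch wrapper around prefetch + fastq-dump --split-files.
fetch_and_convert() {
  local list="$1" acc
  # read trims surrounding whitespace; "|| [ -n ... ]" keeps a last line without newline
  while read -r acc || [ -n "$acc" ]; do
    case "$acc" in ''|'#'*) continue ;; esac
    run_or_echo prefetch -v --max-size "${MAX_SIZE:-35G}" "$acc" \
      || { echo "prefetch failed: $acc" >&2; continue; }
    run_or_echo fastq-dump --split-files "$acc" \
      || echo "fastq-dump failed: $acc" >&2
  done < "$list"
}
```

After module load sratoolkit/2.9.0, DRY_RUN=1 fetch_and_convert accessions.txt previews the commands and fetch_and_convert accessions.txt runs them, where accessions.txt is a file you create yourself.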


