- How can I change my password?
Use the passwd command once connected to genotoul.toulouse.inra.fr
- Which operating system is required to connect via SSH to the platform?
Any operating system with an SSH client (Windows, Linux, macOS) can be used to connect to genotoul.toulouse.inra.fr.
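On Linux or macOS, an OpenSSH client is usually already installed, so the connection can be opened from a terminal. The following is a minimal sketch, assuming a hypothetical login `jdoe`. The helper `ssh_command` is not provided by the platform; it only prints the command so that it can be checked before being run.

```shell
# Hypothetical login; replace it with your own account name.
LOGIN="jdoe"
HOST="genotoul.toulouse.inra.fr"

# Print the ssh command line for a given login and host.
# -X requests X11 forwarding, which graphical tools need.
ssh_command() {
    printf 'ssh -X %s@%s\n' "$1" "$2"
}

ssh_command "$LOGIN" "$HOST"
# To actually connect, run the printed command in your terminal.
```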
- How to transfer your files from/to the platform?
From a Windows desktop, transfer your files to or from the Unix server (genotoul.toulouse.inra.fr) using a secure program such as WinSCP or FileZilla. With FileZilla, you have to set the transfer port to 22.
For WinSCP:
Enter the host address, login and password, and select SCP as the connection type.
Frame 1 corresponds to the directory tree on your local desktop.
Frame 2 corresponds to your directory tree on genotoul.toulouse.inra.fr.
To transfer files drag them from one frame to the other.
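From a Linux or macOS terminal, the same transfers can be done with `scp`, which also uses port 22. The sketch below assumes a hypothetical login `jdoe` and hypothetical file names. The two helpers only print the commands, so nothing is copied until you run the printed line yourself.

```shell
LOGIN="jdoe"   # hypothetical login
HOST="genotoul.toulouse.inra.fr"

# Print the scp command that copies a local file ($1) to a server path ($2).
upload_command() {
    printf 'scp %s %s@%s:%s\n' "$1" "$LOGIN" "$HOST" "$2"
}

# Print the scp command that copies a server file ($1) to a local path ($2).
download_command() {
    printf 'scp %s@%s:%s %s\n' "$LOGIN" "$HOST" "$1" "$2"
}

upload_command reads.fastq.gz /work/jdoe/
download_command /work/jdoe/result.bam .
```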
- How to add auto-completion on your account?
Connect to genotoul.toulouse.inra.fr. Edit the .cshrc file and add the following commands:
Then save, log out and then re-connect.
- How to access the platform?
You access the platform either through the website (public access) or via a command line SSH connection. To access via SSH, you must have a user account. To request a user account please complete the following form: Create Account
- How to connect to the platform under MS-Windows?
Download Xming and PuTTY. Only the Xming installation requires administrator rights.
Launch PuTTY by double-clicking its icon. Once it has started, enter genotoul.toulouse.inra.fr in the Host Name field.
Then enable X11 forwarding under Connection > SSH > X11.
You can save the configuration using the Save button.
Once the configuration is complete, click Open to start a session.
Enter your Unix login and password (use this page to ask for an account).
You are now in your Unix environment.
For more information about the configuration of PuTTY, refer to the online documentation.
- Where can I find the Open Grid Scheduler documentation?
- How can I know my quota usage on /work directory ?
Use the following command line (on genotoul server):
# mmlsquota -u user_name
- How can I launch Java on the nodes?
$ java -Xmx4g
- How can I book more than 1 CPU ?
With default parameters, each job is limited to 1 CPU. To book more, use the following options:
# Book n slots on the same node (up to 46)
$ qsub -pe parallel_smp n myscript.sh
# Book n slots on any nodes (could be the same)
$ qsub -pe parallel_fill n myscript.sh
# Book n slots on strictly different nodes
$ qsub -pe parallel_rr n myscript.sh
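The three options above differ only in the parallel environment name and the slot count. As a hedged sketch, the hypothetical helper `book_slots` below (not a platform command) builds the matching `qsub` line and rejects a `parallel_smp` request above the 46-slot limit stated above. It prints the command instead of submitting anything.

```shell
# Print a qsub command that books N slots in a parallel environment.
# $1: parallel environment, $2: number of slots, $3: job script.
book_slots() {
    pe="$1"; n="$2"; script="$3"
    case "$pe" in
        parallel_smp|parallel_fill|parallel_rr) ;;
        *) echo "unknown parallel environment: $pe" >&2; return 1 ;;
    esac
    # parallel_smp keeps all slots on one node, hence the cap.
    if [ "$pe" = "parallel_smp" ] && [ "$n" -gt 46 ]; then
        echo "parallel_smp accepts at most 46 slots" >&2
        return 1
    fi
    printf 'qsub -pe %s %s %s\n' "$pe" "$n" "$script"
}

book_slots parallel_smp 8 myscript.sh
```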
- Which scheduler is used ?
- Which commands can I use to submit my job ?
qsub: submit a batch job to Sun Grid Engine (default queue: "workq").
qarray: submit a batch job array to Sun Grid Engine.
qlogin: submit an interactive X Window session to Sun Grid Engine (automatically sent to "interq").
qrsh: submit an interactive login session to Sun Grid Engine.
- What are the available queues ?
Each job is submitted to a specific queue (the default one is workq). Each queue has a different priority, depending on its maximum allowed execution time.
max computing time allowed
- Which disk spaces are available to users?
/home/user: 100 MB available to store your configuration files.
/work/user: 1 TB available as a working directory. You have read/write access from any cluster node. Files are automatically deleted if they have not been accessed within the last 120 days (to list them: find directory/ -atime +120).
/save/user: 200 GB available for data you want to keep, with 30 days of retention. You have read-only access to this directory from any cluster node.
If you need more space in /work or /save, please fill in the resources request form.
/usr/local/bioinfo/bin: directory gathering all bioinformatics binaries
/bank: biological banks in different formats
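To see which files in a working directory are close to the automatic clean-up, the `find` test mentioned above can be wrapped in a small function. This is a minimal sketch; the name `stale_files` is hypothetical, and the example runs on the current directory rather than on a real /work path.

```shell
# List regular files under directory $1 whose last access
# is more than $2 days old.
stale_files() {
    find "$1" -type f -atime +"$2"
}

# Example: files under the current directory not read for over 120 days.
stale_files . 120
```

Reading a listed file again (for instance copying it) updates its access time on most configurations, but whether that resets the deletion delay on the platform is an assumption to check with the support team.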
- What resources are available on the cluster ?
34 cluster nodes (1632 cores): ceri001...ceri034, each with 4*12 cores and 384 GB of memory.
1 node with 1 TB of memory (32 cores): cerimem01, with 4*8 cores.
- With default parameters, what are my job limitations ?
Without any parameters, on any queue, every job is limited to mem=1 GB, h_vmem=8 GB of memory and 1 CPU.
- How can I submit a simple job on the cluster ?
1 - First write a script (e.g. myscript.sh) containing the scheduler options and your command line, as follows:
#$ -o /work/.../output.t
#$ -e /work/.../error.txt
#$ -q longq
#$ -M firstname.lastname@example.org
#$ -m bea
# My command lines I want to run on the cluster
blastall -d swissprot -p blastx -i /save/.../z72882.fa
2 - To submit the job, use the qsub command as follows:
$ qsub myscript.sh
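The script from step 1 can also be generated by a small helper, which makes the role of each `#$` directive explicit. This is a sketch under assumptions: the function `write_job_script`, the login `jdoe` and every path below are hypothetical and only stand in for your own values.

```shell
# Write an SGE job script to standard output.
# $1: directory for the log files, $2: queue name,
# $3: e-mail address, $4: command line to run on the cluster.
write_job_script() {
    cat <<EOF
#\$ -o $1/output.txt
#\$ -e $1/error.txt
#\$ -q $2
#\$ -M $3
#\$ -m bea
# -m bea: send a mail when the job begins, ends or is aborted
$4
EOF
}

# Hypothetical values, for illustration only.
write_job_script /work/jdoe/logs longq firstname.lastname@example.org \
    "blastall -d swissprot -p blastx -i /save/jdoe/query.fa" > myscript.sh

cat myscript.sh
# Submit it on the cluster with: qsub myscript.sh
```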
- How can I submit a MPI job ?
1. First, the parallel environment has to be booked on the cluster using the -pe option. For example, in a qsub script:
#$ -pe parallel_rr 100
2. Then the environment and the desired MPI compiler have to be loaded as follows:
module load compiler/intel-2013.1.117
module load mpi/openmpi-1.5.4
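Put together, the two steps could give a job script like the one created below. This is only a sketch: `./my_mpi_program` is a hypothetical executable, and the final `mpirun` line is an assumption about how the program is started, since only the booking and the module loading are described above. `NSLOTS` is the variable in which Grid Engine passes the number of booked slots to the job.

```shell
# Create the job script. The quoted 'EOF' keeps $NSLOTS unexpanded
# until the job actually runs on the cluster.
cat > mpi_job.sh <<'EOF'
#$ -pe parallel_rr 100
module load compiler/intel-2013.1.117
module load mpi/openmpi-1.5.4
# Start one MPI process per booked slot (hypothetical program name).
mpirun -np $NSLOTS ./my_mpi_program
EOF

cat mpi_job.sh
# Submit it on the cluster with: qsub mpi_job.sh
```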
- How can I monitor a running job ?
To do so, use the qstat command. Some useful options:
$ qstat -u user : list only the specified user's jobs.
$ qstat -j job_id : provide several information on the specified job.
$ qstat -s r : list only the running jobs.
You can also use a graphical user interface that provides the same information. It is started with the qmon command.
- How can I book more than 1Gb of memory ?
With default parameters, each job is limited to 4 GB of memory. To request more, use the -l h_vmem=XG -l mem=YG options. For example:
$ qsub -l h_vmem=10G -l mem=8G myscript.sh
- How can I retrieve information on a finished job ?
To do so, use the qacct command as follows:
$ qacct -j job_id
This command can also be used to produce SGE usage statistics.
- How can I kill my job ?
To do so, use the qdel command. Some useful options:
# Kill the specified job
$ qdel job_id
# Kill all jobs launched by the specified user
$ qdel -u user
- How can I access the list of available banks?
The banks and their update dates are listed here.
You can access information on the available banks (version, updates, location, and so on) on this page.
You can obtain the same information using the following command lines:
1. biomaj.sh --status: provide the status of each available bank
2. biomaj.sh --status bank_name: return the status of the specified bank
Specific information on NCBI Blast
You can access the bank list available to be used within the NCBI Blast command line or the web site with the following command line:
Command example: blastall -p blastn -d nt -i <path_to_fasta_file> -e 0.001 -o <path_to_output_file>
You can access the bwa index list available with the following command line:
Command example: /bin/bash -c "bwa sampe /bank/bwadb/ensembl_homo_sapiens_genome <(bwa aln /bank/bwadb/ensembl_homo_sapiens_genome <path_to_read1_fastq.gz_file>) <(bwa aln /bank/bwadb/ensembl_homo_sapiens_genome <path_to_read2_fastq.gz_file>) <path_to_read1_fastq.gz_file> <path_to_read2_fastq.gz_file> | samtools view -bS - > <path_to_output_bam>";
You can access the bowtie index list available with the following command line:
Command example: bowtie /bank/bowtiedb/ensembl_homo_sapiens_genome -1 <path_to_read1_fastq_file> -2 <path_to_read2_fastq_file> | samtools view -bS - > <path_to_output_bam>
You can access the bowtie2 index list available with the following command line:
Command example: bowtie2 -x /bank/bowtie2db/ensembl_homo_sapiens_genome -1 <path_to_read1_fastq_file> -2 <path_to_read2_fastq_file> | samtools view -bS - > <path_to_output_bam>
- Do you have the Newbler user guide?
Yes, here is the pdf.
- How can I publish my short reads data ?
If you want to publish results based on short reads, you have to make your data public.
Genomic data can be submitted to the Short Read Archive (SRA).
The European mirror of SRA is the ENA; have a look at the video in the paragraph "Submitting public access data using SRA Webin".
RNA-seq data can be submitted via MAGE-TAB: http://www.ebi.ac.uk/cgi-bin/microarray/magetab.cgi.
Follow these steps:
- create your account
- fill in all the required information
- compute the MD5 checksum of your raw files, or ask the bioinformatics platform if your data are in NG6. An MD5 checksum is a short string computed from the content of a file, used to check that the file has not been altered. To compute it on Windows, use WinMD5; on Linux, run md5sum /path/to/file in a terminal.
- transfer your data when required, or ask the bioinformatics platform if your data are in NG6.
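The checksum step can be tried on a small test file before applying it to real data. In this minimal sketch, `sample_reads.fastq` is a hypothetical stand-in for a raw data file. `md5sum -c` re-reads the file and compares it with the stored checksum, which is a convenient check after a transfer.

```shell
# Create a small stand-in for a raw data file (hypothetical name).
printf '@read1\nACGT\n+\nIIII\n' > sample_reads.fastq

# Compute the checksum and store it next to the file.
md5sum sample_reads.fastq > sample_reads.fastq.md5
cat sample_reads.fastq.md5

# Later, after a transfer, verify that the file is unchanged.
# The command exits with status 0 when the checksum matches.
md5sum -c sample_reads.fastq.md5
```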
- Where to train yourself in biostatistics ?
Here is an online course web site : https://www.coursera.org/course/biostats
- Where to train yourself in bio-informatics?
- How to cite the Genotoul Bio-informatic platform in your publications?
Research teams can thank the Toulouse Midi-Pyrenees bioinformatics platform by using the following sentence in their publications: "We are grateful to the genotoul bioinformatics platform Toulouse Midi-Pyrenees for providing help and/or computing and/or storage resources".
In cases of collaboration, you can directly cite the person who participated in the project: Name, bioinformatics platform Toulouse Midi-Pyrenees, MIAT, INRA Auzeville CS 52627 31326 Castanet Tolosan cedex.
- Where can I find the platform's publications?
You can find them following this link.