Creating a samplesheet
A samplesheet is required to start the pipeline: it tells the pipeline which samples to run. The samplesheet must be a comma-separated (CSV) file. An example samplesheet is shown below. All headers are required, but not every field has to be filled in for each row.
project,run,sequencingStartDate,sequencer,flowcell,externalSampleID,seqType,capturingKit,barcode,lane,barcodeType,Gender
testproject,run1,010101,sequencer2,flowcell3,S1,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},ATCGAA,1,,Male
testproject,run1,010101,sequencer2,flowcell3,S2,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},TTAACC,1,,Female
testproject,run1,010101,sequencer2,flowcell3,S3,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},GACAAA,1,,Male
testproject,run1,010101,sequencer2,flowcell3,S4,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},ACGTTA,1,,Unknown
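If you generate samplesheets from a script, Python's standard csv module is enough. The sketch below assumes the twelve headers from the example above and writes any column missing from a row as an empty field, so every header is always present. The function name write_samplesheet and the sample values (including the capturingKit value Agilent\/ONCO_v3) are illustrative only and are not part of the pipeline.

```python
import csv
import io

# Header order taken from the example samplesheet.
HEADERS = [
    "project", "run", "sequencingStartDate", "sequencer", "flowcell",
    "externalSampleID", "seqType", "capturingKit", "barcode", "lane",
    "barcodeType", "Gender",
]


def write_samplesheet(rows, handle):
    """Write a list of dicts as a comma-separated samplesheet (hypothetical helper).

    Every header is always written; columns missing from a row become
    empty fields, so each row has exactly one field per header.
    """
    writer = csv.DictWriter(handle, fieldnames=HEADERS, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Illustrative values mirroring sample S1 of the example (barcodeType left blank).
buffer = io.StringIO()
write_samplesheet(
    [{"project": "testproject", "run": "run1", "sequencingStartDate": "010101",
      "sequencer": "sequencer2", "flowcell": "flowcell3", "externalSampleID": "S1",
      "seqType": "PE", "capturingKit": "Agilent\\/ONCO_v3", "barcode": "ATCGAA",
      "lane": "1", "Gender": "Male"}],
    buffer,
)
print(buffer.getvalue())
```

Writing to a real file works the same way: pass a handle opened with open("samplesheet.csv", "w", newline="") instead of the StringIO buffer (the file name is only an example).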
Columns that cannot be left blank are the following:
- project (project name)
- run (run number)
- sequencingStartDate (yymmdd)
- sequencer (name of sequencer)
- flowcell (flowcell name)
- externalSampleID (name of the sample)
- seqType (SR (single read) or PE (paired end))
- capturingKit (see below for more info)
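Before starting the pipeline, it can help to check the required columns yourself. The following is a minimal sketch, not the pipeline's own validation: the function check_required is hypothetical, and it only checks that the required headers exist, that their values are not blank, that seqType is SR or PE, and that sequencingStartDate consists of six digits (yymmdd). It assumes each record sits on a single line.

```python
import csv
import io

# Columns that must not be blank, as listed above.
REQUIRED = [
    "project", "run", "sequencingStartDate", "sequencer",
    "flowcell", "externalSampleID", "seqType", "capturingKit",
]


def check_required(text):
    """Return a list of problems found in the required columns of a samplesheet."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing header(s): " + ", ".join(missing)]
    problems = []
    # start=2 because line 1 is the header; assumes one record per line
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED:
            # Short rows give None for missing fields, hence the `or ""`
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: {column} is blank")
        seq_type = (row["seqType"] or "").strip()
        if seq_type and seq_type not in ("SR", "PE"):
            problems.append(f"line {line_no}: seqType must be SR or PE")
        date = (row["sequencingStartDate"] or "").strip()
        if date and not (len(date) == 6 and date.isdigit()):
            problems.append(f"line {line_no}: sequencingStartDate is not yymmdd")
    return problems


# Usage: a row with a blank flowcell and an invalid seqType
sheet = (
    "project,run,sequencingStartDate,sequencer,flowcell,"
    "externalSampleID,seqType,capturingKit\n"
    "testproject,run1,010101,sequencer2,,S1,XX,Agilent\\/ONCO_v3\n"
)
for problem in check_required(sheet):
    print(problem)
```

For that row the sketch reports two problems: line 2: flowcell is blank, and line 2: seqType must be SR or PE.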
Together, the columns sequencingStartDate, sequencer, run and flowcell describe the raw data folder: their values are joined with underscores in that order, e.g. 161214_NB501093_0100_ABCDEF3XX. This is the same naming format the sequencer uses for its raw data.
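To illustrate how the four columns map to the folder name, here is a small Python sketch; the helper rawdata_folder is hypothetical and simply joins the values in the order given above.

```python
# Order of the columns in the raw data folder name.
FOLDER_COLUMNS = ("sequencingStartDate", "sequencer", "run", "flowcell")


def rawdata_folder(row):
    """Join the four folder-describing columns of a samplesheet row with underscores."""
    return "_".join(row[column] for column in FOLDER_COLUMNS)


example = {"sequencingStartDate": "161214", "sequencer": "NB501093",
           "run": "0100", "flowcell": "ABCDEF3XX"}
print(rawdata_folder(example))  # 161214_NB501093_0100_ABCDEF3XX
```

With the values from the example samplesheet (010101, sequencer2, run1, flowcell3) the same helper gives 010101_sequencer2_run1_flowcell3.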
The capturingKit column should contain the path relative to /apps/data/, followed by a backslash (to escape the forward slash), a forward slash and then the name of the capturing kit, e.g. Agilent\/ONCO_v3 or UMCG\/All_Exon_v1.
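A sketch of the escaping rule for capturingKit, assuming the kit directories live under /apps/data/ as described. Both helper names are hypothetical; capturing_kit_dir only reverses the escaping to show which directory a value refers to.

```python
import posixpath

APPS_DATA = "/apps/data"  # base directory named in the text


def escape_capturing_kit(relative_path):
    # Prefix every forward slash with a backslash: Agilent/ONCO_v3 -> Agilent\/ONCO_v3
    return relative_path.strip("/").replace("/", "\\/")


def capturing_kit_dir(escaped):
    # Undo the escaping and resolve the kit directory under /apps/data
    return posixpath.join(APPS_DATA, escaped.replace("\\/", "/"))


print(escape_capturing_kit("Agilent/ONCO_v3"))   # Agilent\/ONCO_v3
print(capturing_kit_dir("UMCG\\/All_Exon_v1"))  # /apps/data/UMCG/All_Exon_v1
```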
Columns that can be left blank are the following:
- barcode (fill in the barcode if one was used; NOTE: must be filled in for external samples, see below)
- lane (fill in the lane number when samples were run on different lanes)
- barcodeType (optionally the barcode type, e.g. AGI, rPI, etc.)
- Gender (Male, Female or Unknown)
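The optional columns can be checked in the same spirit. This sketch (check_optional is a hypothetical helper) accepts blank values, only allows Male, Female or Unknown for Gender, and assumes that lane, when filled in, is a whole number. barcode and barcodeType are not checked, because the list above places no restrictions on their values for in-house samples.

```python
ALLOWED_GENDERS = {"Male", "Female", "Unknown"}


def check_optional(row):
    """Return problems in the optional columns of one samplesheet row (a dict)."""
    problems = []
    gender = (row.get("Gender") or "").strip()
    if gender and gender not in ALLOWED_GENDERS:
        problems.append(f"Gender must be Male, Female or Unknown, not {gender!r}")
    # Assumption: a filled-in lane is a plain integer such as 1 or 2
    lane = (row.get("lane") or "").strip()
    if lane and not lane.isdigit():
        problems.append(f"lane should be a lane number, not {lane!r}")
    return problems


print(check_optional({"Gender": "Male", "lane": "1", "barcode": ""}))  # []
print(check_optional({"Gender": "M", "lane": "L1"}))
```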
External samples
When samples come from an external source (not in-house), add the columns externalFastQ_1 and externalFastQ_2 and fill them with the names of the fastq.gz (or fq.gz) files. Place these files in the raw data folder, which follows the naming convention described above (sequencingStartDate_sequencer_run_flowcell). The values in that name are not strict and can be anything, e.g. 010101_sequencer2_run1_flowcell3.
Note 1: The name of the folder must match the values in the samplesheet.
Note 2: The barcode column must now be filled with a unique name per sample (e.g. the same name as externalSampleID).
project,run,sequencingStartDate,sequencer,flowcell,externalSampleID,seqType,capturingKit,internalSampleID,barcode,lane,barcodeType,contact,Gender,externalFastQ_1,externalFastQ_2
testproject,run1,010101,sequencer2,flowcell3,Sample1,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},S1,Sample1,,,,Male,1_S1_L001_R1_001.fastq.gz,1_S1_L001_R2_001.fastq.gz
testproject,run1,010101,sequencer2,flowcell3,Sample2,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},S2,Sample2,,,,Female,2_S2_L001_R1_001.fastq.gz,2_S2_L001_R2_001.fastq.gz
testproject,run1,010101,sequencer2,flowcell3,Sample3,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},S3,Sample3,,,,Male,3_S3_L001_R1_001.fastq.gz,3_S3_L001_R2_001.fastq.gz
testproject,run1,010101,sequencer2,flowcell3,Sample4,PE,PATH_RELATIVE_TO_apps/data/${NAMEOFCAPTURINGKIT},S4,Sample4,,,,Unknown,4_S4_L001_R1_001.fastq.gz,4_S4_L001_R2_001.fastq.gz
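The rules for external samples can be sketched as a Python check. This is an illustration under assumptions, not the pipeline's own validation: check_external and its rawdata_root parameter (the directory that contains the raw data folders) are hypothetical, and requiring externalFastQ_2 only for paired-end (PE) samples is an assumption the text does not state. The sketch checks that barcode is filled in and unique, that each FastQ name ends in .fastq.gz or .fq.gz, and that the file exists in the folder built from sequencingStartDate, sequencer, run and flowcell.

```python
import csv
import io
import os

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")
FOLDER_COLUMNS = ("sequencingStartDate", "sequencer", "run", "flowcell")


def check_external(text, rawdata_root):
    """Check external-sample rows; rawdata_root is a hypothetical parent of the raw data folders."""
    problems = []
    seen_barcodes = set()
    for row in csv.DictReader(io.StringIO(text)):
        sample = row.get("externalSampleID") or "?"
        # Note 2: barcode must be filled in and unique per sample
        barcode = (row.get("barcode") or "").strip()
        if not barcode:
            problems.append(f"{sample}: barcode must be filled for external samples")
        elif barcode in seen_barcodes:
            problems.append(f"{sample}: barcode {barcode!r} is not unique")
        seen_barcodes.add(barcode)
        # Note 1: the folder name is built from the samplesheet values
        folder = "_".join((row.get(c) or "").strip() for c in FOLDER_COLUMNS)
        # Assumption: only paired-end samples need externalFastQ_2
        columns = ["externalFastQ_1"]
        if row.get("seqType") == "PE":
            columns.append("externalFastQ_2")
        for column in columns:
            name = (row.get(column) or "").strip()
            if not name.endswith(FASTQ_SUFFIXES):
                problems.append(f"{sample}: {column} should name a .fastq.gz or .fq.gz file")
            elif not os.path.isfile(os.path.join(rawdata_root, folder, name)):
                problems.append(f"{sample}: {name} not found in {folder}")
    return problems
```

An empty result means every external sample has a unique barcode and all referenced FastQ files were found.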