Small snippet that generates a genome index and aligns the RNAseq reads. #2

Merged
J.v.N. merged 2 commits from step3/star-alignment-snippet into master 2021-02-15 17:09:32 +01:00
Owner

Minimal example of alignment using STAR:

  • Generates genome index.
  • Performs the alignment.
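The two steps listed above can be sketched as a dry run that only prints the STAR invocations. This is an illustration, not the script from this pull request: every path and the read file names are placeholders, and the thread count and `--sjdbOverhang` value are simply copied from the diff context further down.

```shell
# Dry-run sketch: build and print the two STAR commands without executing them.
# All paths below are placeholders (assumptions), not this project's locations.
GENOME_INDEX="/path/to/genome_index"
ALIGNMENT_OUTPUT="/path/to/alignment"
FASTA_FILE="/path/to/genome.fa"          # plain-text FASTA
GTF_FILE="/path/to/annotation.gtf"       # plain-text GTF
READS_1="/path/to/sample_R1.fastq.gz"    # hypothetical paired-end reads
READS_2="/path/to/sample_R2.fastq.gz"

# Step 1: generate the genome index.
INDEX_COMMAND="STAR --runThreadN 8 --runMode genomeGenerate \
--genomeDir ${GENOME_INDEX} --genomeFastaFiles ${FASTA_FILE} \
--sjdbGTFfile ${GTF_FILE} --sjdbOverhang 100"

# Step 2: align the reads; a coordinate-sorted BAM ends up in
# ${ALIGNMENT_OUTPUT}/Aligned.sortedByCoord.out.bam.
ALIGN_COMMAND="STAR --runThreadN 8 --genomeDir ${GENOME_INDEX} \
--readFilesIn ${READS_1} ${READS_2} --readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate --outFileNamePrefix ${ALIGNMENT_OUTPUT}/"

echo "${INDEX_COMMAND}"
echo "${ALIGN_COMMAND}"
```

Removing the `echo` wrappers and running the commands directly requires STAR on the `PATH` and real input files.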
H.C. Donker added 1 commit 2021-02-09 17:36:22 +01:00
H.C. Donker added 1 commit 2021-02-11 10:47:32 +01:00
H.C. Donker requested review from A.K. Saikumar Jayalatha 2021-02-12 11:59:24 +01:00
H.C. Donker requested review from J.v.N. 2021-02-12 11:59:40 +01:00
Owner

Looks quite good.
H.C. Donker force-pushed step3/star-alignment-snippet from d71460a7dd to 1037d154e3 2021-02-12 14:46:31 +01:00 Compare
A.K. Saikumar Jayalatha approved these changes 2021-02-12 15:29:05 +01:00
@ -0,0 +21,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
Alternative reference file used: gencode.v19.annotation.gtf/gff3
Owner

Should I upload these to the server as well? Also, what is the reason to use another reference file than the ones we already have?

@ -0,0 +22,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
FASTA_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
Alternative assembly file used: Homo_sapiens_assembly19.fasta
@ -0,0 +28,4 @@
--runThreadN 8 \
--runMode genomeGenerate \
--readFilesCommand zcat \
--sjdbOverhang 100 \
sjdbOverhang 150
This was chosen based on the read length (in base pairs) of the reads in my analysis.

Alternative cut-off used: 150

This is also due to how my samples were sequenced (low-input method).

Alternative sjdbOverhang cut-off used: 150
@ -0,0 +32,4 @@
--genomeFastaFiles ${FASTA_FILE} \
--sjdbGTFfile ${GTF_FILE} \
--genomeDir ${GENOME_INDEX}
Alternative sjdbOverhang used: 150
This was also chosen because low-input RNA was used for sequencing.
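For picking the value: the STAR manual describes the ideal `--sjdbOverhang` as the maximum read length minus one, and notes that the default of 100 works about as well in most cases. Under that rule, 150 bp reads would suggest 149. The sketch below derives the value from a FASTQ file; the demo reads and the variable names are made up for illustration.

```shell
# Sketch: derive --sjdbOverhang as (maximum read length - 1).
# The tiny FASTQ file created here is a stand-in for a real sample.
SCRATCH_READS="$(mktemp -d)"
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n' \
    | gzip > "${SCRATCH_READS}/demo.fastq.gz"

# In FASTQ, every 4th line starting at line 2 is a sequence line.
MAX_READ_LENGTH="$(gunzip -c "${SCRATCH_READS}/demo.fastq.gz" \
    | awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }')"

SJDB_OVERHANG=$((MAX_READ_LENGTH - 1))
echo "--sjdbOverhang ${SJDB_OVERHANG}"
```

Scanning only the first few thousand reads (for example with `head`) is usually enough when all reads have the same length.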
J.v.N. approved these changes 2021-02-12 16:43:12 +01:00
@ -0,0 +2,4 @@
#
# Align reads against reference genome.
STORAGE="/groups/umcg-griac/tmp04/rawdata/$(whoami)/step3"
Owner

I think `/groups/umcg-griac/tmp04/rawdata/$(whoami)` should be a separate variable named `project_directory` (or similar). As written, the hard-coded path implies that the project has to live here.
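One way this suggestion could look, as a sketch only: `PROJECT_DIRECTORY` is a hypothetical name, and letting the environment override it is an assumption about how users might want to relocate the project.

```shell
# Hypothetical variable: defaults to the current location, but can be
# overridden from the environment, e.g.
#   PROJECT_DIRECTORY=/some/other/place ./script.sh
PROJECT_DIRECTORY="${PROJECT_DIRECTORY:-/groups/umcg-griac/tmp04/rawdata/$(whoami)}"

# Everything else is derived from the project directory.
STORAGE="${PROJECT_DIRECTORY}/step3"
ALIGNMENT_OUTPUT="${STORAGE}/alignment"

echo "Project directory: ${PROJECT_DIRECTORY}"
echo "Alignment output:  ${ALIGNMENT_OUTPUT}"
```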
@ -0,0 +8,4 @@
mkdir -p "${GENOME_INDEX}"
# Store the generated `Aligned.sortedByCoord.out.bam` in this dir.
ALIGNMENT_OUTPUT="${STORAGE}/alignment"
mkdir -p "${OUTPUT}"
Owner

`${OUTPUT}` is undefined; it should probably be `${ALIGNMENT_OUTPUT}`.
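A sketch of the corrected lines with a guard against this kind of slip. The `${VAR:?message}` expansion makes the shell stop with an error when the variable is unset or empty, instead of handing `mkdir` an empty string. The temporary directory is a stand-in for the real storage path.

```shell
# Stand-in for the real storage location (assumption, for illustration only).
STORAGE="$(mktemp -d)"

# Store the generated `Aligned.sortedByCoord.out.bam` in this dir.
ALIGNMENT_OUTPUT="${STORAGE}/alignment"

# Fails loudly if the variable name is misspelled or never assigned.
mkdir -p "${ALIGNMENT_OUTPUT:?ALIGNMENT_OUTPUT is not set}"
```

Putting `set -u` at the top of the script gives similar protection for every variable at once.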
@ -0,0 +21,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
Owner

I am unsure if STAR will decompress these files. Should I decompress them? Also, make sure the user notes the genome version.

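On the decompression question: the STAR manual states that the genome FASTA files must be plain text and cannot be zipped, and `--readFilesCommand` applies to the read files rather than to the reference. Decompressing into a scratch location before index generation is one option, sketched below. The demo file and the `SCRATCH` and `COMPRESSED_FASTA` names are assumptions for illustration; the same pattern would apply to the GTF file.

```shell
# Sketch: decompress a gzipped reference into a scratch directory and
# point FASTA_FILE at the plain-text copy. The demo file stands in for
# the real Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.
SCRATCH="$(mktemp -d)"
printf '>chr_demo\nACGT\n' | gzip > "${SCRATCH}/demo.fa.gz"
COMPRESSED_FASTA="${SCRATCH}/demo.fa.gz"

# Strip the .gz suffix for the target name; the original stays untouched.
FASTA_FILE="${SCRATCH}/$(basename "${COMPRESSED_FASTA}" .gz)"
gunzip -c "${COMPRESSED_FASTA}" > "${FASTA_FILE}"

echo "Uncompressed reference: ${FASTA_FILE}"
```

Keeping the release in the file name (here, Ensembl GRCh38 release 100 for the GTF) also helps users note the genome version.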
J.v.N. added 1 commit 2021-02-15 17:08:37 +01:00
J.v.N. merged commit 59f1b425b1 into master 2021-02-15 17:09:32 +01:00
Reference: GRIAC/system_genetics#2