Small snippet that generates a genome index and aligns the RNAseq reads. #2
Reference: GRIAC/system_genetics#2
Minimal example of alignment using STAR: looks quite good.
d71460a7dd to 1037d154e3
@ -0,0 +21,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
Alternative reference file used: gencode.v19.annotation.gtf/gff3
Should I upload these to the server as well? Also, what is the reason for using a different reference file from the ones we already have?
@ -0,0 +22,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
FASTA_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
Alternative assembly file used: Homo_sapiens_assembly19.fasta
@ -0,0 +28,4 @@
--runThreadN 8 \
--runMode genomeGenerate \
--readFilesCommand zcat \
--sjdbOverhang 100 \
sjdbOverhang 150
This was chosen based on the read length (in base pairs) in my analysis.
Alternative cut-off used: 150
This is also due to how my samples were sequenced (low-input method).
Alternative sjdbOverhang cut-off used: 150
@ -0,0 +32,4 @@
--genomeFastaFiles ${FASTA_FILE} \
--sjdbGTFfile ${GTF_FILE} \
--genomeDir ${GENOME_INDEX}
Alternative sjdbOverhang used: 150
This was also chosen because of the low-input RNA used for sequencing.
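For reference, the STAR manual describes the ideal `--sjdbOverhang` as the maximum read length minus one, so 150 bp reads would correspond to 149 rather than 150. A minimal sketch of deriving the value instead of hard-coding it; the `sjdb_overhang` helper and the `READ_LENGTH` variable are hypothetical names, not part of this PR:

```shell
# Hypothetical helper (not in the PR): derive --sjdbOverhang from read length.
# The STAR manual gives the ideal value as max(ReadLength) - 1.
sjdb_overhang() {
    read_length="$1"
    echo $(( read_length - 1 ))
}

READ_LENGTH=150   # assumption: longest read in the run, in base pairs
SJDB_OVERHANG="$(sjdb_overhang "${READ_LENGTH}")"
echo "--sjdbOverhang ${SJDB_OVERHANG}"
```

The resulting value could then replace the literal in the `genomeGenerate` call, so that datasets with different read lengths each get a matching index.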
@ -0,0 +2,4 @@
#
# Align reads against reference genome.
STORAGE="/groups/umcg-griac/tmp04/rawdata/$(whoami)/step3"
I think /groups/umcg-griac/tmp04/rawdata/$(whoami) should be a separate variable named 'project_directory' (or similar). This implies that the project has to be here.
@ -0,0 +8,4 @@
mkdir -p "${GENOME_INDEX}"
# Store the generated `Aligned.sortedByCoord.out.bam` in this dir.
ALIGNMENT_OUTPUT="${STORAGE}/alignment"
mkdir -p "${OUTPUT}"
${OUTPUT} is undefined; it should probably be ${ALIGNMENT_OUTPUT}.
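The two suggestions on this file (a separate project directory variable, and creating the directory that was actually defined) could be combined roughly as follows. This is only a sketch: `prepare_alignment_dir` is a hypothetical helper, and a temporary directory stands in for the real project path so the snippet can be tried anywhere.

```shell
# Hypothetical helper (not in the PR): create the step3 alignment directory
# under a given project directory and print its path.
prepare_alignment_dir() {
    project_directory="$1"
    storage="${project_directory}/step3"
    alignment_output="${storage}/alignment"
    # Create the variable that was just defined, not an undefined ${OUTPUT}.
    mkdir -p "${alignment_output}"
    echo "${alignment_output}"
}

# Demonstration with a throwaway directory; on calculon the argument would be
# /groups/umcg-griac/tmp04/rawdata/$(whoami), as in the diff.
PROJECT_DIRECTORY="$(mktemp -d)"
ALIGNMENT_OUTPUT="$(prepare_alignment_dir "${PROJECT_DIRECTORY}")"
echo "${ALIGNMENT_OUTPUT}"
```

Keeping the project directory as its own variable means the `step3` layout no longer assumes that every project lives under the user's `rawdata` folder.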
@ -0,0 +21,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
I am unsure if STAR will decompress these files. Should I decompress them? Also, make sure the user notes the genome version.
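On the decompression question: as far as I know, `--readFilesCommand` only applies to the read files, and the STAR manual states that the genome FASTA files passed to `genomeGenerate` must be plain text (I believe the same holds for the GTF). A minimal sketch of making uncompressed copies first; `decompress_to` is a hypothetical helper and the demo file is a stand-in, not real reference data:

```shell
# Hypothetical helper (not in the PR): write an uncompressed copy of a .gz
# reference file into a target directory and print the path of the copy.
decompress_to() {
    source_file="$1"
    target_dir="$2"
    target_file="${target_dir}/$(basename "${source_file}" .gz)"
    mkdir -p "${target_dir}"
    # gunzip -c leaves the original compressed file untouched.
    gunzip -c "${source_file}" > "${target_file}"
    echo "${target_file}"
}

# Self-contained demonstration with a tiny stand-in FASTA file.
DEMO_DIR="$(mktemp -d)"
printf '>chr_demo\nACGT\n' | gzip > "${DEMO_DIR}/demo.fa.gz"
FASTA_PLAIN="$(decompress_to "${DEMO_DIR}/demo.fa.gz" "${DEMO_DIR}/plain")"
echo "${FASTA_PLAIN}"
```

In the script itself, the same call could be applied to `${FASTA_FILE}` and `${GTF_FILE}`, writing to a location under `${STORAGE}` (an assumption; the shared reference directory may be read-only), with the returned paths passed to `--genomeFastaFiles` and `--sjdbGTFfile`.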