Small snippet that generates a genome index and aligns the RNAseq reads. #2
Minimal example of alignment using STAR.
Looks quite good.
d71460a7dd to 1037d154e3
@@ -0,0 +21,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
Alternative reference file used: gencode.v19.annotation.gtf/gff3
Should I upload these to the server as well? Also, why use a different reference file from the ones we already have?
@@ -0,0 +22,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
FASTA_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
Alternative assembly file used: Homo_sapiens_assembly19.fasta
@@ -0,0 +28,4 @@
--runThreadN 8 \
--runMode genomeGenerate \
--readFilesCommand zcat \
--sjdbOverhang 100 \
sjdbOverhang 150
This was chosen based on the read length (in base pairs) in my analysis.
Alternative cut-off used: 150
This is also due to how my samples were sequenced (low-input method).
Alternative sjdbOverhang cut-off used: 150
@@ -0,0 +32,4 @@
--genomeFastaFiles ${FASTA_FILE} \
--sjdbGTFfile ${GTF_FILE} \
--genomeDir ${GENOME_INDEX}
Alternative sjdbOverhang used: 150
This was also chosen because of the low-input RNA used for sequencing.
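Since the overhang value keeps coming up, it could be derived from the reads instead of hard-coded. The STAR manual describes the ideal `--sjdbOverhang` as the read length minus 1 (so 149 for 150 bp reads) and notes that the default of 100 works about as well in most cases. A minimal sketch, assuming standard four-line FASTQ records; the helper name `sjdb_overhang_from_fastq` and the sample file name are hypothetical:

```shell
# Hypothetical helper: print (longest read length - 1) for a FASTQ file.
# Accepts plain or gzip-compressed input; assumes four lines per record.
sjdb_overhang_from_fastq() {
    local fastq="$1"
    local reader="cat"
    case "${fastq}" in
        *.gz) reader="gzip -dc" ;;
    esac
    # Sequence lines are line 2 of every four-line record.
    ${reader} "${fastq}" | awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
                                END { print max - 1 }'
}

# Possible use in the genomeGenerate call (sample name is a placeholder):
#   --sjdbOverhang "$(sjdb_overhang_from_fastq sample_R1.fastq.gz)"
```

This reads the whole file, so for large samples it may be enough to pipe only the first few thousand records through the same `awk` expression.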
@@ -0,0 +2,4 @@
## Align reads against reference genome.
STORAGE="/groups/umcg-griac/tmp04/rawdata/$(whoami)/step3"
I think /groups/umcg-griac/tmp04/rawdata/$(whoami) should be a separate variable named 'project_directory' (or similar). This implies that the project has to be here.
@@ -0,0 +8,4 @@
mkdir -p "${GENOME_INDEX}"
# Store the generated `Aligned.sortedByCoord.out.bam` in this dir.
ALIGNMENT_OUTPUT="${STORAGE}/alignment"
mkdir -p "${OUTPUT}"
${OUTPUT} is unknown - should probably be ${ALIGNMENT_OUTPUT}.
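Both suggestions could be combined roughly as follows. This is only a sketch: the function name `create_step3_dirs` and the `genome_index` subdirectory are assumptions, since the hunk does not show how `GENOME_INDEX` is defined.

```shell
# Sketch: derive all step3 paths from one project directory and create them.
# create_step3_dirs and the genome_index subdirectory name are hypothetical.
create_step3_dirs() {
    local project_directory="$1"
    STORAGE="${project_directory}/step3"
    GENOME_INDEX="${STORAGE}/genome_index"
    # Store the generated `Aligned.sortedByCoord.out.bam` in this dir.
    ALIGNMENT_OUTPUT="${STORAGE}/alignment"
    # Uses ALIGNMENT_OUTPUT rather than the undefined OUTPUT.
    mkdir -p "${GENOME_INDEX}" "${ALIGNMENT_OUTPUT}"
}

# Usage on the cluster:
#   PROJECT_DIRECTORY="/groups/umcg-griac/tmp04/rawdata/$(whoami)"
#   create_step3_dirs "${PROJECT_DIRECTORY}"
```

Keeping the project directory as a single argument means the script no longer assumes the project lives under `tmp04`.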
@@ -0,0 +21,4 @@
# Storage location reference data (in this case on calculon).
REFERENCE_DATA="/groups/umcg-griac/prm02/rawdata/reference/genome"
GTF_FILE="${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz"
I am unsure if STAR will decompress these files. Should I decompress them? Also, make sure the user notes the genome version.
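On the decompression question: as far as I know, the STAR manual states that the genome FASTA files must be plain text and cannot be zipped, and `--readFilesCommand zcat` only applies to the read files, not to `--genomeFastaFiles` or `--sjdbGTFfile`. One option is to decompress into a working directory before running `genomeGenerate`. A minimal sketch; the helper name `decompress_reference` and the `${STORAGE}/reference` target directory are hypothetical:

```shell
# Hypothetical helper: decompress a .gz reference file into a target directory,
# leave the original untouched, and print the path of the plain-text copy.
decompress_reference() {
    local gz_file="$1"
    local out_dir="$2"
    local plain_file
    plain_file="${out_dir}/$(basename "${gz_file}" .gz)"
    mkdir -p "${out_dir}"
    # Skip the work if a non-empty decompressed copy already exists.
    if [ ! -s "${plain_file}" ]; then
        gzip -dc "${gz_file}" > "${plain_file}"
    fi
    printf '%s\n' "${plain_file}"
}

# Possible use before genomeGenerate (target directory is an assumption):
#   GTF_FILE="$(decompress_reference "${REFERENCE_DATA}/Homo_sapiens.GRCh38.100.gtf.gz" "${STORAGE}/reference")"
#   FASTA_FILE="$(decompress_reference "${REFERENCE_DATA}/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz" "${STORAGE}/reference")"
```

The decompressed file names still carry the assembly and release (GRCh38, Ensembl 100), which helps with keeping track of the genome version.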