FastQC, Trimming, Overall QC (initial) #1

Merged
J.v.N. merged 6 commits from P300299/system_genetics:master into master 2021-02-15 16:20:22 +01:00
1 changed files with 36 additions and 0 deletions
Showing only changes of commit 4c52527340 - Show all commits

@@ -0,0 +1,36 @@
## This script generates one trimming job script per sample, to be run on the cluster
P300299 marked this conversation as resolved Outdated

Perhaps write some instructions how to use. E.g., you need `samples.csv`.
## Run this script first, then submit the generated job script for each sample
P300299 marked this conversation as resolved Outdated

Maybe a general question: do we want to

  • provide an entire script that processes a whole batch of samples, or
  • focus on the core of trimming (only the `trim_galore`/`Trimmomatic` command with its options) and remove all the for loops and if-else statements?
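To make the second option concrete, here is a minimal sketch of the core step on its own. It only prints the Trimmomatic command for one sample, using the options from the script under review. The function name `trim_cmd`, its argument order, and the `TRIMMOMATIC_JAR` variable are assumptions made for illustration, not part of the submitted script.

```shell
#!/bin/bash
# Hypothetical helper: print the paired-end Trimmomatic command for one sample.
# trim_cmd, its argument order and TRIMMOMATIC_JAR are assumptions of this sketch.
trim_cmd() {
    local sample="$1" input="$2" out="$3"
    local jar="${TRIMMOMATIC_JAR:-trimmomatic-0.36.jar}"
    # printf repeats the '%s ' format for every argument, giving one command line
    printf '%s ' \
        java -jar "$jar" PE -phred33 \
        "$input/${sample}_1.fq.gz" "$input/${sample}_2.fq.gz" \
        "$out/${sample}_1_paired.fq" "$out/${sample}_1_unpaired.fq" \
        "$out/${sample}_2_paired.fq" "$out/${sample}_2_unpaired.fq" \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 HEADCROP:8 MINLEN:50
    printf '\n'
}
```

Calling `trim_cmd sampleA /data/raw /data/trimmed` (hypothetical paths) prints the command, which a batch wrapper could then write into a job script. This keeps the trimming options in one place regardless of how the batch loop is organised.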
## reference: http://www.usadellab.org/cms/?page=trimmomatic
#!/bin/bash
# $1 indicates the path of raw samples.
# In the input folder, each sample has its own folder containing two paired-end FASTQ files.
# The folder name should be the sample name.
# The FASTQ files should be named <sample>_1.fq.gz and <sample>_2.fq.gz, as the command below expects.
# Prepare a sample.list file containing one sample name per line.
out="/ * your output folder * /"
input="/ * your input folder * /"
# Read one sample name per line; -r keeps backslashes in names literal
cat sample.list | while read -r line
do
sample="$line"
echo '#!/bin/bash' > rnaseq.${sample}.sh
echo "#SBATCH --job-name=RNAseq.${sample}" >> rnaseq.${sample}.sh
echo "#SBATCH --error=RNAseq.${sample}.err" >> rnaseq.${sample}.sh
echo "#SBATCH --output=RNAseq.${sample}.out" >> rnaseq.${sample}.sh
echo "#SBATCH --mem=15gb" >> rnaseq.${sample}.sh
P300299 marked this conversation as resolved Outdated

Does this depend on sequencing machine/library type? If so, specify.
echo "#SBATCH --time=6:00:00" >> rnaseq.${sample}.sh
echo "#SBATCH --cpus-per-task=6" >> rnaseq.${sample}.sh
echo "ml Java" >> rnaseq.${sample}.sh
# Write the Trimmomatic paired-end command; the backslash-newlines inside the
# quotes join it into a single line in the generated job script.
echo "java -jar /* your folder of software */trimmomatic-0.36.jar PE \
-phred33 $input/${sample}_1.fq.gz $input/${sample}_2.fq.gz \
$out/trimmomatic/${sample}_1_paired.fq $out/trimmomatic/${sample}_1_unpaired.fq \
$out/trimmomatic/${sample}_2_paired.fq $out/trimmomatic/${sample}_2_unpaired.fq \
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 HEADCROP:8 MINLEN:50" >> rnaseq.${sample}.sh
done
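
The header comments ask the user to submit the generated jobs afterwards, but the diff does not show that step. A minimal sketch of it could look like the following, assuming the job scripts are named `rnaseq.<sample>.sh` as above and the inputs are `<sample>_1.fq.gz` / `<sample>_2.fq.gz`. The function name `submit_jobs` and the `SUBMIT` override (used for dry runs) are hypothetical and not part of the reviewed script.

```shell
#!/bin/bash
# Hypothetical helper: submit rnaseq.<sample>.sh for every sample in a list,
# skipping samples whose FASTQ pair or job script is missing.
# SUBMIT defaults to sbatch; SUBMIT=echo gives a dry run (assumption of this sketch).
submit_jobs() {
    local list="$1" input="$2" submit="${SUBMIT:-sbatch}" sample
    while read -r sample; do
        [ -z "$sample" ] && continue    # ignore blank lines
        if [ ! -f "$input/${sample}_1.fq.gz" ] || [ ! -f "$input/${sample}_2.fq.gz" ]; then
            echo "skip $sample: missing FASTQ pair" >&2
            continue
        fi
        if [ ! -f "rnaseq.${sample}.sh" ]; then
            echo "skip $sample: rnaseq.${sample}.sh not generated" >&2
            continue
        fi
        # Redirect stdin so the submit command cannot consume the sample list
        "$submit" "rnaseq.${sample}.sh" < /dev/null
    done < "$list"
}
```

Running `SUBMIT=echo submit_jobs sample.list /path/to/raw` (hypothetical path) lists the scripts that would be submitted without contacting the scheduler.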