# FastCAR
FastCAR is an R package to remove ambient RNA from cells in droplet-based single-cell RNA sequencing data.
### Installation
FastCAR can be installed from git with the following command.
```
devtools::install_git("https://git.web.rug.nl/P278949/FastCAR")
```
Running FastCAR is quite simple.
First load the library and dependencies.
```
# Dependencies
library(Matrix)
library(Seurat)
library(qlcMatrix)
library(pheatmap)
library(ggplot2)
library(gridExtra)
# FastCAR itself
library(FastCAR)
```
Specify the locations of the expression matrices
```
cellExpressionFolder = c("Cellranger_output/sample1/filtered_feature_bc_matrix/")
fullMatrixFolder = c("Cellranger_output/sample1/raw_feature_bc_matrix/")
```
Load both the cell matrix and the full matrix
```
cellMatrix = read.cell.matrix(cellExpressionFolder)
fullMatrix = read.full.matrix(fullMatrixFolder)
```
The following functions give an idea of the effect that different settings have on the ambient RNA profile.
These steps are optional: they take a few minutes, and the default settings work fine.
Plot the number of empty droplets, the number of genes identified in the ambient RNA, and the number of genes that will be corrected for at different UMI cutoffs:
```
ambProfile = describe.ambient.RNA.sequence(fullCellMatrix = fullMatrix,
                                           start = 10,
                                           stop = 500,
                                           by = 10,
                                           contaminationChanceCutoff = 0.05)
plot.ambient.profile(ambProfile)
```
![picture](Images/Example_profile.png)
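To make the effect of the UMI cutoff concrete, the sketch below counts how many droplets would be called empty at each cutoff in the same 10 to 500 range. It uses a simulated toy matrix and base R only. The names `toyFull`, `umisPerDroplet` and `emptyCounts` are illustrative, and the `<=` comparison is an assumption based on how this README describes the cutoff, not a copy of FastCAR's internal code.

```r
# Toy count matrix: genes in rows, droplets in columns (simulated, hypothetical data).
set.seed(1)
toyFull = matrix(rpois(5 * 200, lambda = 2), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), NULL))
# Make the last 20 droplets look like real cells with many more UMIs.
toyFull[, 181:200] = toyFull[, 181:200] + 200

# Total UMI count per droplet.
umisPerDroplet = colSums(toyFull)

# Count how many droplets would be called empty at each candidate cutoff.
# Assumption: a droplet is empty if it holds at most `cutoff` UMIs.
cutoffs = seq(from = 10, to = 500, by = 10)
emptyCounts = sapply(cutoffs, function(cutoff) sum(umisPerDroplet <= cutoff))

head(data.frame(cutoff = cutoffs, emptyDroplets = emptyCounts))
```

In this toy data the count levels off once the cutoff exceeds the UMI content of the low-count droplets, because the 20 simulated cells sit far above the tested range.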
The actual effect on the chance of genes affecting your differential expression (DE) analyses can be determined and visualized with the following functions:
```
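# Positional arguments: full matrix, cell matrix, then the start, stop and step
# of the UMI cutoff range, and the contamination chance cutoff
correctionEffectProfile = describe.correction.effect(fullMatrix, cellMatrix, 50, 500, 10, 0.05)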
plot.correction.effect.chance(correctionEffectProfile)
```
![picture](Images/DE_affect_chance.png)
The number of reads that will be removed for these genes can be visualized from the same profile:
```
plot.correction.effect.removal(correctionEffectProfile)
```
![picture](Images/Counts_removed.png)
Set the empty droplet cutoff and the contamination chance cutoff.
The empty droplet cutoff is the maximum number of UMIs a droplet can contain and still be considered empty.
A value of 100 works fine, but we tested this method in only one tissue; for other tissues these settings may need to be changed.
Increasing this number also increases the highest expression value that a given gene can have in an empty droplet.
Because the correction removes this value from every cell, we advise against setting the cutoff too high, as this would overcorrect expression in cells that express the gene at low levels.
The contamination chance cutoff is the allowed probability of a gene contaminating a cell.
As we developed FastCAR specifically for differential expression analyses between groups, we suggest setting this low enough that too few cells can be contaminated to affect the analysis.
With a cutoff of 0.005, a cluster of a thousand cells divided into two groups would be expected to have about 2-3 cells per group with ambient RNA contamination of any given gene.
Such low cell numbers are disregarded for differential expression analyses.
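
The arithmetic behind the 2-3 cells per group can be checked with a few lines of base R. This is a minimal sketch, assuming the contamination chance cutoff of 0.005 used in the settings of this README and an even split of the cluster into two groups. The variable names are illustrative only.

```r
# Values from the example in the text; the 0.005 cutoff and the even split are assumptions.
clusterSize = 1000
nGroups = 2
contaminationChanceCutoff = 0.005

# Expected number of contaminated cells for a gene sitting exactly at the cutoff.
expectedContaminated = clusterSize * contaminationChanceCutoff
expectedPerGroup = expectedContaminated / nGroups

expectedContaminated  # 5 cells in the whole cluster
expectedPerGroup      # 2.5 cells per group
```
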
There is an experimental function that gives a recommendation based on the ambient profiling results.
It selects the first cutoff at which the number of genes being corrected for reaches its maximum.
I have not yet verified whether this is actually a good strategy.
```
emptyDropletCutoff = recommend.empty.cutoff(ambProfile)
```
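The selection rule described above, taking the first cutoff at which the number of corrected genes peaks, can be sketched in base R. The data frame `toyProfile` and its column names are hypothetical and do not reflect the actual structure of `ambProfile`. The sketch only illustrates the rule, not the implementation of `recommend.empty.cutoff`.

```r
# Hypothetical profiling result; the column names are illustrative, not FastCAR's.
toyProfile = data.frame(
  cutoff = seq(50, 300, by = 50),
  genesCorrected = c(3, 7, 12, 12, 12, 10)
)

# which.max() returns the index of the first maximum,
# so ties resolve to the lowest cutoff.
recommendedCutoff = toyProfile$cutoff[which.max(toyProfile$genesCorrected)]
recommendedCutoff  # 150
```
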
The cutoffs can also be set manually:
```
emptyDropletCutoff = 150
contaminationChanceCutoff = 0.005
```
Determine the ambient RNA profile and remove the ambient RNA from each cell
```
ambientProfile = determine.background.to.remove(fullMatrix, cellMatrix, emptyDropletCutoff, contaminationChanceCutoff)
cellMatrix = remove.background(cellMatrix, ambientProfile)
```
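As a rough illustration of what this correction does, the sketch below applies the idea described in this README to toy data: for genes found in more empty droplets than the contamination chance cutoff allows, the highest count seen in any empty droplet is subtracted from every cell. It is written in base R under those stated assumptions and is not FastCAR's actual implementation. The matrices, variable names and the high toy cutoff are all hypothetical.

```r
# Toy matrices: genes in rows, droplets in columns (hypothetical data).
toyEmpty = matrix(c(0, 1, 2, 1,    # geneA: present in 3 of 4 empty droplets
                    0, 0, 0, 0,    # geneB: never seen in empty droplets
                    0, 0, 0, 1),   # geneC: present in 1 of 4 empty droplets
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("geneA", "geneB", "geneC"), NULL))
toyCells = matrix(c(10, 1, 5,
                     4, 0, 7,
                     3, 2, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("geneA", "geneB", "geneC"), NULL))

toyChanceCutoff = 0.5  # deliberately high so the toy example is easy to follow

# Fraction of empty droplets in which each gene was observed.
fractionWithGene = rowMeans(toyEmpty > 0)
# Highest count of each gene in any empty droplet.
maxInEmpty = apply(toyEmpty, 1, max)

# Only correct genes that occur in more empty droplets than the cutoff allows.
toRemove = ifelse(fractionWithGene > toyChanceCutoff, maxInEmpty, 0)

# Subtract the per-gene value from every cell and clamp negatives to zero.
corrected = pmax(toyCells - toRemove, 0)
corrected
```

Only `geneA` is corrected here. The second cell, which had a single `geneA` count, ends up at zero, which is the overcorrection of lowly expressing cells that a high empty droplet cutoff makes more likely.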
This corrected matrix can be used to make a Seurat object:
```
seuratObject = CreateSeuratObject(cellMatrix)
```
## Authors
* **Marijn Berg** - m.berg@umcg.nl
## License
This project is licensed under the GPL-3 License; see the [LICENSE.md](LICENSE.md) file for details.
## Changelog
### v0.1
First fully working version of the R package.
### v0.2
Fixed function to write the corrected matrix to file.
Added readout of which genes will be corrected for and how many reads will be removed per cell.
Added some input checks to functions.
### v0.3
Fixed a bug that caused FastCAR to be incompatible with biobase libraries.
Added better profiling to determine the effect of different settings on the corrections.
Swapped base R plots for ggplot2 plots.