DEXSeq
1)Introduction
DEXSeq是一种在多个比较RNA-seq实验中,检验差异外显子使用情况的方法。 通过差异外显子使用(DEU),我们指的是由实验条件引起的外显子相对使用的变化。 外显子的相对使用定义为:
number of transcripts from the gene that contain this exon / number of all transcripts from the gene
大致思想:. For each exon (or part of an exon) and each sample, we count how many reads map to this exon and how many reads map to any of the other exons of the same gene. We consider the ratio of these two counts, and how it changes across conditions, to infer changes in the relative exon usage
2)安装
if("DEXSeq" %in% rownames(installed.packages()) == FALSE) {source("http://bioconductor.org/biocLite.R");biocLite("DEXSeq")}
suppressMessages(library(DEXSeq))
ls('package:DEXSeq')
pythonScriptsDir = system.file( "python_scripts", package="DEXSeq" )
list.files(pythonScriptsDir)
## [1] "dexseq_count.py" "dexseq_prepare_annotation.py" #查看是否含有这两个脚本
python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.72.gtf Dmel.BDGP5.25.62.DEXSeq.chr.gff #GTF转化为GFF with collapsed exon counting bins.
python dexseq_count.py Dmel.BDGP5.25.62.DEXSeq.chr.gff untreated1.sam untreated1fb.txt #count
3) 用自带实验数据集(数据预处理)
suppressMessages(library(pasilla))
inDir = system.file("extdata", package="pasilla")
countFiles = list.files(inDir, pattern="fb.txt$", full.names=TRUE) #countfile(如果不是自带数据集,可以由dexseq_count.py脚本生成)
basename(countFiles)
flattenedFile = list.files(inDir, pattern="gff$", full.names=TRUE)
basename(flattenedFile) #gff文件(如果不是自带数据集,可以由dexseq_prepare_annotation.py脚本生成)
########构造数据框sampleTable,包含sample名字,实验,文库类型等信息#######################
sampleTable = data.frame(
row.names = c( "treated1", "treated2", "treated3",
"untreated1", "untreated2", "untreated3", "untreated4" ),
condition = c("knockdown", "knockdown", "knockdown",
"control", "control", "control", "control" ),
libType = c( "single-end", "paired-end", "paired-end",
"single-end", "single-end", "paired-end", "paired-end" ) )
sampleTable ##############构建 DEXSeqDataSet object#############################
dxd = DEXSeqDataSetFromHTSeq(
countFiles,
sampleData=sampleTable,
design= ~ sample + exon + condition:exon,
flattenedfile=flattenedFile ) #四个参数
4)Standard analysis work-flow
########以下是简单的实验设计#####
genesForSubset = read.table(file.path(inDir, "geneIDsinsubset.txt"),stringsAsFactors=FALSE)[[1]] #基因子集ID
dxd = dxd[geneIDs( dxd ) %in% genesForSubset,] #取子集,减少运行量
head(colData(dxd))
head( counts(dxd), 5 )
split( seq_len(ncol(dxd)), colData(dxd)$exon )
sampleAnnotation( dxd )
############# dispersion estimates and the size factors#############
dxd = estimateSizeFactors( dxd ) ##Normalisation
dxd = estimateDispersions( dxd )
plotDispEsts( dxd ) #图1 #################Testing for differential exon usage############
dxd = testForDEU( dxd )
dxd = estimateExonFoldChanges( dxd, fitExpToVar="condition")
dxr1 = DEXSeqResults( dxd )
dxr1
mcols(dxr1)$description
table ( dxr1$padj < 0.1 )
table ( tapply( dxr1$padj < 0.1, dxr1$groupID, any ) )
plotMA( dxr1, cex=0.8 ) #图2
To see how the power to detect differential exon usage depends on the number of reads that map to an exon, a so-called MA plot is useful, which plots the logarithm of fold change versus average normalized count per exon and marks by red colour the exons which are considered significant; here, the exons with an adjusted p values of less than 0.1
############以下是更复杂的实验设计##################
formulaFullModel = ~ sample + exon + libType:exon + condition:exon
formulaReducedModel = ~ sample + exon + libType:exon
dxd = estimateDispersions( dxd, formula = formulaFullModel )
dxd = testForDEU( dxd,
reducedModel = formulaReducedModel,
fullModel = formulaFullModel )
dxr2 = DEXSeqResults( dxd )
table( dxr2$padj < 0.1 )
table( before = dxr1$padj < 0.1, now = dxr2$padj < 0.1 )##和简单的实验设计比较
5)Visualization
plotDEXSeq( dxr2, "FBgn0010909", legend=TRUE, cex.axis=1.2, cex=1.3,
lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", displayTranscripts=TRUE, legend=TRUE,
cex.axis=1.2, cex=1.3, lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", expression=FALSE, norCounts=TRUE,
legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", expression=FALSE, splicing=TRUE,
legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
DEXSeqHTML( dxr2, FDR=0.1, color=c("#FF000080", "#0000FF80") )
最新文章
- PHP 原创视频教程-网站开发新手视频教程
- 广度优先搜索 cdoevs 1226 倒水问题
- android05
- 嵌入式中的 *(volatile unsigned int *)0x500 解释
- iOS Safari 中点击事件失效的解决办法
- SQL server 定时自动执行SQL存储过程
- Kafka 0.10 Producer网络流程简述
- js将字符串转化成函数:eval(logOutCallbackFun+";()";);
- 【Android Developers Training】 42. 从另一台设备接收文件
- mysql varchar vs oracle varchar2
- JS前端调用后台方法
- Asp.net Core 使用Jenkins + Dockor 实现持续集成、自动化部署(四):发布与回滚
- Struts vs spring mvc
- MyBatis大杂烩
- 购物商城学习--第二讲(maven工程介绍)
- windows用户态程序的Dump
- 每天一个linux命令:ifconfig命令 临时修改重启后恢复原样
- scala 学习笔记十 元组
- Java 枚举(enum) 的常见用法和开发规范
- [IOS]Xcode各版本官方下载及百度云盘下载, Mac和IOS及Xcode版本历史
热门文章
- 如何检测NFC芯片型号?NFC手机即可!
- Web(click and script) 与 Web(HTTP/HTML)协议区别
- yum安装软件时报错:Loaded plugins:fastestnirror,security Existing lock /var/run/yum.pid
- [转]Java 运算符的优先级
- Linux下XAMPP装完之后,Navicat无法连上数据库的问题的解决 注意&#39;mypassword&#39;是当前的mysql登录密码
- oracle 收集的一些图
- SparkStreaming 运行原理与核心概念
- Spring AOP的总结
- 0_Simple__simpleSurfaceWrite
- VBA 定义能返回数组公式的自定义函数