#r #model.matrix
Вопрос:
Я новичок в DESeq2. В настоящее время я пытаюсь использовать различные формулы проектирования для анализа данных из пакета Bioconductor airway
.
Я следую шагу в DESeq2
виньетке: Рабочий процесс RNA-seq для вычисления статистических результатов. Однако приведенное ниже сообщение об ошибке появилось, когда я указываю термин взаимодействия в формуле проектирования.
Error in checkForExperimentalReplicates(object, modelMatrix) :
The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.
Мой вопрос в том, что ошибка возникает, когда я следую инструкциям по example("results")
указанию формулы проектирования. Почему возникает эта ошибка и как я могу сгенерировать результат с эффектом взаимодействия? ?
Я буду очень признателен, если кто-нибудь сможет помочь мне с этой проблемой.
- загрузка данных из
package(airway)
> # Loading data
> library("airway")
> library("DESeq2")
> data(gse)
> gse
class: RangedSummarizedExperiment
dim: 58294 8
metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
assays(3): counts abundance length
rownames(58294): ENSG00000000003.14 ENSG00000000005.5 ... ENSG00000285993.1
ENSG00000285994.1
rowData names(1): gene_id
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(3): names donor condition
- переименуйте и удалите переменную
> # rename variable
> (gse$cell <- gse$donor)
[1] N61311 N61311 N052611 N052611 N080611 N080611 N061011 N061011
Levels: N052611 N061011 N080611 N61311
> (gse$dex <- gse$condition)
[1] Untreated Dexamethasone Untreated Dexamethasone Untreated Dexamethasone Untreated
[8] Dexamethasone
Levels: Untreated Dexamethasone
> levels(gse$dex) = c("untrt", "trt")
> levels(gse$dex)
[1] "untrt" "trt"
- создайте DESeqDataSet с формулой проектирования
~ cell dex
и выполните анализ.
> # building DESeqDataSet
> dds <- DESeqDataSet(gse, design = ~ cell dex)
using counts and average transcript lengths from tximeta
> dds
class: DESeqDataSet
dim: 58294 8
metadata(7): tximetaInfo quantInfo ... txdbInfo version
assays(3): counts abundance avgTxLength
rownames(58294): ENSG00000000003.14 ENSG00000000005.5 ... ENSG00000285993.1
ENSG00000285994.1
rowData names(1): gene_id
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(5): names donor condition cell dex
> # Filtering
> keep = rowSums(counts(dds)) > 1
> dds = dds[keep,]
> dim(dds)
[1] 31604 8
> # Defferential analysis
> design(dds)
~cell dex
> dds = DESeq(dds)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> resultsNames(dds)
[1] "Intercept" "cell_N061011_vs_N052611" "cell_N080611_vs_N052611"
[4] "cell_N61311_vs_N052611" "dex_trt_vs_untrt"
> results(dds, contrast = c("dex", "untrt", "trt"))
log2 fold change (MLE): dex untrt vs trt
Wald test p-value: dex untrt vs trt
DataFrame with 31604 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003.14 739.940717 0.3611537 0.106869 3.379419 0.000726392 0.00531137
ENSG00000000419.12 511.735722 -0.2063147 0.128665 -1.603509 0.108822318 0.29318870
ENSG00000000457.13 314.194855 -0.0378308 0.158633 -0.238479 0.811509461 0.92255697
ENSG00000000460.16 79.793622 0.1152590 0.314991 0.365912 0.714430444 0.87298038
ENSG00000000938.12 0.307267 1.3691185 3.503764 0.390757 0.695977205 NA
... ... ... ... ... ... ...
ENSG00000285979.1 38.353886 -0.3423657 0.359511 -0.952310 0.340940 0.600750
ENSG00000285987.1 1.562508 -0.7064145 1.547295 -0.456548 0.647996 NA
ENSG00000285990.1 0.642315 -0.3647333 3.433276 -0.106235 0.915396 NA
ENSG00000285991.1 11.276284 0.1165515 0.748601 0.155692 0.876275 0.952921
ENSG00000285994.1 3.651041 0.0960094 1.068660 0.089841 0.928414 NA
- Analyze the data with interaction term
~ cell dex cell:dex
.
In this step, after I specify the design formula with the interaction term ~ cell dex cell:dex
. The error occur when I tried to run DESeq()
function on the data set.
The design formula I used ~ cell dex cell:dex
is the same as the interaction design formula they demonstrate in example("results")
.
> # Defferential analysis using interaction term
> dds_int = dds
> design(dds_int) = formula(~ cell dex cell:dex)
> dds_int = DESeq(dds_int)
using pre-existing normalization factors
estimating dispersions
found already estimated dispersions, replacing these
Error in checkForExperimentalReplicates(object, modelMatrix) :
The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.
- I tried to sepcify the design formula while building the
DESeqDataSet
. However, the same error occurs when I tried the runDESeq()
on the dataset
> dds_int = DESeqDataSet(gse, design = ~ cell dex cell:dex)
using counts and average transcript lengths from tximeta
> dim(dds_int)
[1] 58294 8
>
> keep = rowSums(counts(dds_int)) > 1
> dds_int = dds_int[keep,]
> dim(dds_int)
[1] 31604 8
>
> design(dds_int)
~cell dex cell:dex
>
> dds_int = DESeq(dds_int)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.
- I attempted to create
model.matrix
and use it to run theDESeq
Analysis. However, the same error still occurs.
> # model formula
> dds_int = dds
> attach(as.data.frame(colData(dds_int)))
The following objects are masked from as_data_frame(colData(dds_int)):
cell, condition, dex, donor, names
>
> mm = model.matrix( ~ cell dex cell:dex)
> design(dds_int) = mm
> design(dds_int)
(Intercept) cellN061011 cellN080611 cellN61311 dextrt cellN061011:dextrt cellN080611:dextrt
1 1 0 0 1 0 0 0
2 1 0 0 1 1 0 0
3 1 0 0 0 0 0 0
4 1 0 0 0 1 0 0
5 1 0 1 0 0 0 0
6 1 0 1 0 1 0 1
7 1 1 0 0 0 0 0
8 1 1 0 0 1 1 0
cellN61311:dextrt
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 0
attr(,"assign")
[1] 0 1 1 1 2 3 3 3
attr(,"contrasts")
attr(,"contrasts")$cell
[1] "contr.treatment"
attr(,"contrasts")$dex
[1] "contr.treatment"
>
> dds_int = DESeq(dds_int, test="Wald", modelMatrixType = "standard")
using supplied model matrix
using pre-existing normalization factors
estimating dispersions
found already estimated dispersions, replacing these
Error in checkForExperimentalReplicates(object, modelMatrix) :
The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.
Две формулы проектирования термина взаимодействия, которые я создаю в приведенном выше фрагменте кода, совпадают с той, что показана в example(results)
. Интересно, почему ошибка продолжает возникать. Как я могу сгенерировать результат с эффектом взаимодействия?
Спасибо всем вам за потраченное время.