Skip to Content

why is reproducible research so so so hard?

这是跟着 Gao 老师远程实习,做 gene loss 的事了。我得到了两个表格,想比较一下是不是完全一样的。

其中部分内容分别为:

# A tibble: 7 x 10
     X1             X2    X3       X4       X5    X6    X7    X8              X9   X10
  <int>          <chr> <int>    <int>    <int> <chr> <int> <int>           <chr> <chr>
1   165 LOC_Os11g48060     4 57749194 57750836     +     2     0 relic-retaining   sbi
2  4397 LOC_Os02g08140     3 47955604 47956683     +     1     0 relic-retaining   sbi
3  5146 LOC_Os03g10940     1  6133765  6136732     -     1     0 relic-retaining   sbi
4  8968 LOC_Os02g52510     4 63921086 63928674     -     2     0 relic-retaining   sbi
5  9435 LOC_Os12g18650     6 49944588 49946702     -     2     0 relic-retaining   sbi
6 10318 LOC_Os01g74500     3 74386656 74396272     +    NA    NA   relic-lacking   sbi
7 16364 LOC_Os01g55549     3 63381108 63384547     -     4     0 relic-retaining   sbi

# A tibble: 7 x 10
          ID   chr    start      end strand frameshift `preliminary stop-codon` `orthologous group` `orthologous counterpart`
       <chr> <dbl>    <dbl>    <dbl>  <chr>      <chr>                    <chr>               <dbl>                     <chr>
1 Sor_chr4_2     4 57749194 57750836      +          2                        0                 165            LOC_Os11g48060
2 Sor_chr3_1     3 47955604 47956683      +          1                        0                4397            LOC_Os02g08140
3 Sor_chr1_3     1  6133764  6136732      -          1                        0                5146            LOC_Os03g10940
4 Sor_chr4_3     4 63921085 63928674      -          2                        0                8968            LOC_Os02g52510
5 Sor_chr6_1     6 49944587 49946702      -          2                        0                9435            LOC_Os12g18650
6 Sor_chr3_4     3 74386656 74396272      +         NA                       NA               10318            LOC_Os01g74500
7 Sor_chr3_2     3 63381107 63384547      -          4                        0               16364            LOC_Os01g55549

I once thought start or paste(start, end) would be a good ID, but it turned not. So I had to use orthologous group as ID to process the data.

为什么有些行的 start 一模一样,另一些却正好差 1?

这两个表格具体是什么已经不重要了,应该一个是师兄的结果,一个是我重复的结果。重要的是差异为何如此奇葩,可重复性科研为什么就如此如此如此之难?