Please install the package hJAM
(Hierarchical Joint
Analysis of Marginal Summary Statistics) from github as
mJAM
is developed as part of the hJAM
package.
The hJAM
package also contains other
JAM
-family approaches: hJAM
(for MR and TWAS)
and SHAJAM
(for MR and TWAS with high-dimensional
intermediates).
devtools::install_github("USCbiostats/hJAM")
For macOS users with apple silicon (M1 Macs), please install R
version 4.2-arm64 from https://mac.r-project.org/ . This is a specific build
for compatibility with the M1 arm-based Macs. One of crucial
dependencies in mJAM
, Rmpfr
, is not yet
compatible with Intel-based R versions running on M1 Macs and it will
cause R/Rstudio to abort. But using version 4.2-arm64 will solve this
issue.
library(hJAM)
library(dplyr) ## for the use of piping operator %>%
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
In the first example, we’ll illustrate the usage of
mJAM-SuSiE
and mJAM-Forward
with a simulated
dataset, consisting of 3 ancestry groups and 50 SNPs. These 50 SNPs
consist of 5 LD blocks, each with 10 SNPs and uniform pairwise LD
between them. The first SNP, named “rs1”, is the only true causal
SNP.
First, let’s load the marginal summary statistics of these 50 SNPs.
# Load marginal betas data
MargBeta <- read.table("./Example-Data/Marg_1.txt")
names(MargBeta) <- c("SNP", "MargBeta_P1", "MargSEBeta_P1",
"MargBeta_P2", "MargSEBeta_P2",
"MargBeta_P3", "MargSEBeta_P3")
head(MargBeta)
## SNP MargBeta_P1 MargSEBeta_P1 MargBeta_P2 MargSEBeta_P2 MargBeta_P3
## 1 rs1 0.06054038 0.02489064 0.077964524 0.02068987 0.026311626
## 2 rs2 -0.01068437 0.02126704 0.040433268 0.01813179 0.024127159
## 3 rs3 0.05264446 0.02153163 0.037045359 0.01838805 0.019806117
## 4 rs4 -0.01167104 0.02121476 0.009254768 0.01826285 0.006243936
## 5 rs5 0.03411272 0.02145365 0.032849181 0.01819438 0.024697970
## 6 rs6 0.02948659 0.02152562 0.049799131 0.01836573 0.016761096
## MargSEBeta_P3
## 1 0.02020833
## 2 0.01781806
## 3 0.01786128
## 4 0.01792291
## 5 0.01774054
## 6 0.01782569
Then load the effect allele frequencies (not necessarily the minor alleles).
# Load effective allele frequency data
EAF <- read.table("./Example-Data/EAF_1.txt")
names(EAF) <- c("SNP", "EAF_P1", "EAF_P2", "EAF_P3")
head(EAF)
## SNP EAF_P1 EAF_P2 EAF_P3
## 1 rs1 0.2010 0.4072 0.6075
## 2 rs2 0.2374 0.4205 0.5825
## 3 rs3 0.2329 0.4102 0.5935
## 4 rs4 0.2394 0.4151 0.5850
## 5 rs5 0.2386 0.4158 0.5851
## 6 rs6 0.2344 0.4191 0.5895
Finally one reference dosage matrix for each ancestry group.
# Load individual-level SNP data from reference panel
RefDosage_P1 <- read.table("./Example-Data/Dosage_S1_1.txt", header = TRUE)
RefDosage_P2 <- read.table("./Example-Data/Dosage_S2_1.txt", header = TRUE)
RefDosage_P3 <- read.table("./Example-Data/Dosage_S3_1.txt", header = TRUE)
RefDosage_P1[1:5, 1:5]
## rs1 rs2 rs3 rs4 rs5
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 1 0 0 1
## 4 1 1 1 0 1
## 5 1 1 1 0 1
RefDosage_P2[1:5, 1:5]
## rs1 rs2 rs3 rs4 rs5
## 1 1 2 1 2 2
## 2 0 0 0 0 0
## 3 0 0 0 0 2
## 4 0 0 0 0 0
## 5 0 0 0 0 0
RefDosage_P3[1:5, 1:5]
## rs1 rs2 rs3 rs4 rs5
## 1 0 1 0 1 0
## 2 1 0 0 1 1
## 3 1 1 1 1 0
## 4 2 2 1 1 2
## 5 1 1 2 0 1
Now we’ll use the marginal summary statistics (beta
,
se
, EAF
and reference dosage
) to
run mJAM-SuSiE
and mJAM-SuSiE
.
# --- Fit mJAM_Forward
Ex1_Forward_fit <- mJAM_Forward(N_GWAS = c(5000, 5000, 5000),
X_ref = list(RefDosage_P1,RefDosage_P2,RefDosage_P3),
Marg_Result = MargBeta,
EAF_Result = EAF,
condp_cut = 0.05/50, # Bonferroni-corrected p-val
within_pop_threshold = 0.50,
across_pop_threshold = 0.20,
coverage = 0.95)
## No. 1 selected index SNP: rs1
## 10 SNPs got pruned. 41 SNPs left.
## Continue to next round of index SNP selection.
## Analysis ended.
mJAM_Forward
outputs a list with three tables:
index
, cs
and marginal_est
.
# Here're the selected index SNP(s)
Ex1_Forward_fit$index
## # A tibble: 1 × 6
## SNP b_joint b_joint_var cond_log10p final_log10p pcut
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 rs1 0.0544 0.000156 -4.86 -4.86 0.001
index
table lists all the index SNP(s) being selected
(SNP
), along with their log10(p-value) conditional on all
SNP(s) above (cond_log10p
), the log10(p-value) conditional
on all other index SNP(s) (final_log10p
), and the p-value
threshold used in this analysis (pcut
).
b_joint
is the conditional effect estimates in the final
model, and b_joint_var
is the squared standard error of
b_joint
.
We choose to report log10(p-value) instead of p-value on the original scale because R does not support p-value precision smaller than around 1e-300. To avoid ambiguous results under large-effect regions, we calculate p-values on the log/log10 scales and report these.
# Here're the credible sets SNPs
Ex1_Forward_fit$cs %>% filter(CS_in == T)
## # A tibble: 3 × 7
## index_SNP CS_SNP Post_Model_Prob Post_Med_Prob SD_Post_CS_Prob CumSum_…¹ CS_in
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 rs1 rs1 1 1 0.768 0.768 TRUE
## 2 rs1 rs8 0.207 0.826 0.131 0.899 TRUE
## 3 rs1 rs10 0.0859 0.850 0.0560 0.955 TRUE
## # … with abbreviated variable name ¹CumSum_Prob
cs
table records the posterior probabilities of all SNPs
being considered for credible set SNPs. Post_Model_Prob
stands for posterior model probability which quantifies the marginal
association between CS_SNP
and its index_SNP
.
Post_Med_Prob
stands for posterior mediation probability
which quantifies the mediation effect of the CS_SNP
through
index_SNP
. Stronger LD between CS_SNP
and
index_SNP
often leads to larger Post_Med_Prob
.
By taking the product of Post_Model_Prob
and
Post_Med_Prob
and then standardizing, we have
SD_Post_CS_Prob
which we use to determine the credible
set.
Here we only print out those that are selected in the 95% mJAM
credible set (rs1, rs8 and rs10) by filtering this table using
CS_in == TRUE
.
Ex1_Forward_fit$mJAM_marg_est %>% head()
## # A tibble: 6 × 4
## SNP mJAM_effect mJAM_se mJAM_log10p
## <chr> <dbl> <dbl> <dbl>
## 1 rs1 0.0544 0.0125 -4.87
## 2 rs2 0.0206 0.0112 -1.18
## 3 rs3 0.035 0.0113 -2.71
## 4 rs4 0.00245 0.0112 -0.0826
## 5 rs5 0.0302 0.0112 -2.15
## 6 rs6 0.0323 0.0112 -2.41
If you’d like to check out the multi-population marginal estimates
from mJAM, mJAM_marg_est
provides the marginal estimates of
all SNPs under the mJAM model. This includes mJAM_effect
(the point estimate), mJAM_se
(the standard error of the
point estimate) and mJAM_log10p
(log10 of the p-value based
on point estimate and standard error).
# --- Fit mJAM_SuSiE
Ex1_SuSiE_fit <- mJAM_SuSiE(marginal.betas = list(MargBeta$MargBeta_P1, MargBeta$MargBeta_P2, MargBeta$MargBeta_P3),
marginal.se = list(MargBeta$MargSEBeta_P1, MargBeta$MargSEBeta_P2, MargBeta$MargSEBeta_P3),
EAFs = list(EAF$EAF_P1, EAF$EAF_P2, EAF$EAF_P3),
N_GWAS = c(5000, 5000, 5000),
X_ref = list(RefDosage_P1,RefDosage_P2,RefDosage_P3),
SNP_names = MargBeta$SNP,
SuSiE_num_comp = 10,
SuSiE_coverage = 0.95)
## --- mJAM_SuSiE results
mJAM_SuSiE_get_cs(susie_fit = Ex1_SuSiE_fit$fit, SNP_names = MargBeta$SNP)
## index coverage CS_size index_SNP CS_SNP
## 1 L1 0.9558736 2 rs1 rs1
## 2 L1 0.9558736 2 rs1 rs8
mJAM_SuSiE
outputs a table with 5 columns:
index
, coverage
,
CS_size
,index_SNP
and CS_SNP
.
index
is a label for a distinct credible set.coverage
is the empirical coverage of this credible
set.CS_size
shows how many SNPs in total are there in this
credible set.index_SNP
is the name of the index SNP (SNP with
highest posterior probability) in this credible set.CS_SNP
is names of individual SNPs selected in this
credible set.In this example, mJAM-SuSiE identifies one credible set with
rs1
and rs8
in it. Both of them are included
in the mJAM-Forward credible set as well.
In the second example, we will explore the scenario when we have missing data in marginal summary statistics and in the reference dosage. The dataset used in example 2 is the same as in example 1, except that there are 2 SNPs made missing in either summary statistics or reference dosage: the dosage of rs15 in population 1 is missing; the summary statistic of rs20 in population 3 is missing.
Now let’s load the data:
# Load marginal betas data
MargBeta <- read.table("./Example-Data/Marg_2.txt",
colClasses = c("character", rep("numeric", 6)))
names(MargBeta) <- c("SNP", "MargBeta_P1", "MargSEBeta_P1",
"MargBeta_P2", "MargSEBeta_P2",
"MargBeta_P3", "MargSEBeta_P3")
# Take a look at SNP No.20-25, rs20 in P3 missing
MargBeta[20:25,]
## SNP MargBeta_P1 MargSEBeta_P1 MargBeta_P2 MargSEBeta_P2 MargBeta_P3
## 20 rs20 0.001697950 0.02514703 0.01435850 0.02056006 NA
## 21 rs21 -0.010286390 0.02488344 -0.06636717 0.02066592 -0.009864962
## 22 rs22 0.035685926 0.02539317 0.03215446 0.02055493 -0.034835993
## 23 rs23 0.031534739 0.02484755 0.04407369 0.02058289 -0.026641422
## 24 rs24 -0.002939056 0.02526160 0.01915812 0.02059609 -0.008948019
## 25 rs25 -0.005859474 0.02549730 -0.01123426 0.02081805 -0.027068635
## MargSEBeta_P3
## 20 NA
## 21 0.02024422
## 22 0.02054327
## 23 0.02026291
## 24 0.02053724
## 25 0.02019436
# Load effective allele frequency data
EAF <- read.table("./Example-Data/EAF_2.txt",
colClasses = c("character", rep("numeric", 3)))
names(EAF) <- c("SNP", "EAF_P1", "EAF_P2", "EAF_P3")
# Take a look at SNP No.20-25, rs20 in P3 missing
EAF[20:25, ]
## SNP EAF_P1 EAF_P2 EAF_P3
## 20 rs20 0.1957 0.3927 NA
## 21 rs21 0.2046 0.3954 0.5990
## 22 rs22 0.1999 0.3968 0.5998
## 23 rs23 0.1995 0.4047 0.5998
## 24 rs24 0.1974 0.3973 0.6099
## 25 rs25 0.1979 0.4015 0.5992
Note that rs20’s beta
,se
, and
EAF
are missing in population 3.
# Load individual-level SNP data from reference panel
RefDosage_P1 <- read.table("./Example-Data/Dosage_S1_2.txt", header = TRUE)
RefDosage_P2 <- read.table("./Example-Data/Dosage_S2_2.txt", header = TRUE)
RefDosage_P3 <- read.table("./Example-Data/Dosage_S3_2.txt", header = TRUE)
# Notice that rs15 in P1 is missing
length(names(RefDosage_P1)); "rs15" %in% names(RefDosage_P1)
## [1] 49
## [1] FALSE
length(names(RefDosage_P2)); "rs15" %in% names(RefDosage_P2)
## [1] 50
## [1] TRUE
length(names(RefDosage_P3)); "rs15" %in% names(RefDosage_P3)
## [1] 50
## [1] TRUE
Note that rs15 is missing in the reference panel of population 1.
Note on input formatting requirement:
When combining summary data from multiple populations, remember to
put NA
into where the SNP information is missing in some
populations but present in at least one other population. For SNPs that
are missing in ALL populations, just exclude them.
# Fill in rs15 as NAs
RefDosage_P1 <- cbind(RefDosage_P1[,1:14], NA, RefDosage_P1[,15:49])
names(RefDosage_P1) <- names(RefDosage_P2)
Now we’ll use the marginal summary statistics (beta
,
se
, EAF
and reference dosage
) to
run mJAM-SuSiE
and mJAM-SuSiE
.
Ex2_Forward_fit <- mJAM_Forward(N_GWAS = c(5000, 5000, 5000),
X_ref = list(RefDosage_P1,RefDosage_P2,RefDosage_P3),
Marg_Result = MargBeta,
EAF_Result = EAF,
condp_cut = 0.05/50,
within_pop_threshold = 0.50,
across_pop_threshold = 0.20)
## No. 1 selected index SNP: rs1
## 10 SNPs got pruned. 41 SNPs left.
## Continue to next round of index SNP selection.
## Analysis ended.
## --- mJAM Forward results: index SNPs
Ex2_Forward_fit$index
## # A tibble: 1 × 6
## SNP b_joint b_joint_var cond_log10p final_log10p pcut
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 rs1 0.0544 0.000156 -4.86 -4.86 0.001
## --- mJAM Forward results: credible set SNPs
Ex2_Forward_fit$cs %>% filter(CS_in == T)
## # A tibble: 3 × 7
## index_SNP CS_SNP Post_Model_Prob Post_Med_Prob SD_Post_CS_Prob CumSum_…¹ CS_in
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 rs1 rs1 1 1 0.768 0.768 TRUE
## 2 rs1 rs8 0.207 0.826 0.131 0.899 TRUE
## 3 rs1 rs10 0.0859 0.850 0.0561 0.955 TRUE
## # … with abbreviated variable name ¹CumSum_Prob
In this toy example, missing values do not affect
mJAM_Forward
’s results.
# --- Fit mJAM_SuSiE, with missing SNPs.
Ex2_SuSiE_fit <- mJAM_SuSiE(marginal.betas = list(MargBeta$MargBeta_P1, MargBeta$MargBeta_P2, MargBeta$MargBeta_P3),
marginal.se = list(MargBeta$MargSEBeta_P1, MargBeta$MargSEBeta_P2, MargBeta$MargSEBeta_P3),
EAFs = list(EAF$EAF_P1, EAF$EAF_P2, EAF$EAF_P3),
N_GWAS = c(5000, 5000, 5000),
X_ref = list(RefDosage_P1,RefDosage_P2,RefDosage_P3),
p_cases = c(0.5, 0.5, 0.5),
SNP_names = MargBeta$SNP,
SuSiE_num_comp = 10,
SuSiE_coverage = 0.95)
## --- mJAM_SuSiE results
mJAM_SuSiE_get_cs(susie_fit = Ex2_SuSiE_fit$fit, SNP_names = MargBeta$SNP)
## index coverage CS_size index_SNP CS_SNP
## 1 L1 0.9560648 2 rs1 rs1
## 2 L1 0.9560648 2 rs1 rs8
In this toy example, missing values do not affect
mJAM_SuSiE
’s results.
One practical advantage of mJAM-Forward is that it provides the
flexibility to pre-specify your own index SNP by using the option
index_snps
in mJAM_Forward()
. This will bypass
the step of index SNP selection but still construct credible sets of the
specified index SNPs. In the next example, we will specify two imaginary
index SNPs: rs2
and rs11
by adding
index_snps = c("rs2", "rs11")
to
mJAM_Forward()
.
Note that the order of index_snps
matters: if we put
index_snps = c("rs2", "rs11")
, mJAM_Forward
will first construct the credible set for rs2
and then
prune out SNPs highly correlated with rs2
. Then it will
construct the credible set for rs11
and form credible set
of rs11
by conditioning on the effect of rs2
.
If you’d like to get unconditional credible sets for rs2
and rs11
separately, we recommend running
mJAM_Forward
twice where specifying one
index_snp
at a time.
Ex3_Forward_fit <- mJAM_Forward(N_GWAS = c(5000, 5000, 5000),
X_ref = list(RefDosage_P1,RefDosage_P2,RefDosage_P3),
Marg_Result = MargBeta,
EAF_Result = EAF,
condp_cut = 0.05/50,
index_snps = c("rs2", "rs11"),
within_pop_threshold = 0.50,
across_pop_threshold = 0.20)
## No. 1 selected index SNP: rs2
## 2 SNPs got pruned. 49 SNPs left.
## Continue to next round of index SNP selection.
## No. 2 selected index SNP: rs11
## 3 SNPs got pruned. 49 SNPs left.
## Continue to next round of index SNP selection.
## Analysis ended.
## --- mJAM Forward results: index SNPs
Ex3_Forward_fit$index
## # A tibble: 2 × 6
## SNP b_joint[,1] b_joint_var cond_log10p final_log10p pcut
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 rs2 0.0210 0.000125 -1.19 -1.22 0.001
## 2 rs11 -0.0286 0.000125 -1.63 -1.63 0.001
## --- mJAM Forward results: credible set SNPs
Ex3_Forward_fit$cs %>% filter(CS_in == T)
## # A tibble: 15 × 7
## index_SNP CS_SNP Post_Model_Prob Post_Med_Prob SD_Post_CS_Prob CumSum…¹ CS_in
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 rs2 rs2 1 1 0.394 0.394 TRUE
## 2 rs2 rs1 1 0.525 0.207 0.601 TRUE
## 3 rs2 rs5 1 0.177 0.0697 0.671 TRUE
## 4 rs2 rs6 1 0.159 0.0625 0.733 TRUE
## 5 rs2 rs7 0.534 0.297 0.0624 0.795 TRUE
## 6 rs2 rs3 1 0.137 0.0540 0.849 TRUE
## 7 rs2 rs9 1 0.0854 0.0337 0.883 TRUE
## 8 rs2 rs4 0.186 0.417 0.0305 0.914 TRUE
## 9 rs2 rs10 1 0.0715 0.0282 0.942 TRUE
## 10 rs2 rs8 1 0.0514 0.0203 0.962 TRUE
## 11 rs11 rs11 1 1 0.913 0.913 TRUE
## 12 rs11 rs26 1 0.0147 0.0134 0.927 TRUE
## 13 rs11 rs9 1 0.0101 0.00925 0.936 TRUE
## 14 rs11 rs6 0.708 0.0132 0.00853 0.944 TRUE
## 15 rs11 rs35 1 0.00822 0.00751 0.952 TRUE
## # … with abbreviated variable name ¹CumSum_Prob
In the above example, mJAM credible set for rs2
consists
of all 10 SNPs in the same LD block as rs2
. Conditional on
rs2
, the credible set for rs11
consists of 5
SNPs (all within the same LD block as rs11
).
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS 13.0.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.0 hJAM_1.0.1 fontawesome_0.3.0
##
## loaded via a namespace (and not attached):
## [1] shape_1.4.6 tidyselect_1.2.0 xfun_0.32 bslib_0.4.0
## [5] reshape2_1.4.4 splines_4.2.0 lattice_0.20-45 colorspace_2.1-0
## [9] vctrs_0.5.2 generics_0.1.3 htmltools_0.5.3 yaml_2.3.5
## [13] gmp_0.7-1 utf8_1.2.3 survival_3.3-1 rlang_1.0.6
## [17] jquerylib_0.1.4 pillar_1.8.1 glue_1.6.2 Rmpfr_0.9-1
## [21] withr_2.5.0 foreach_1.5.2 lifecycle_1.0.3 plyr_1.8.8
## [25] stringr_1.5.0 munsell_0.5.0 gtable_0.3.1 codetools_0.2-18
## [29] evaluate_0.16 knitr_1.40 fastmap_1.1.0 fansi_1.0.4
## [33] Rcpp_1.0.10 scales_1.2.1 susieR_0.10.0 cachem_1.0.6
## [37] jsonlite_1.8.0 ggplot2_3.4.1 digest_0.6.29 stringi_1.7.12
## [41] grid_4.2.0 cli_3.6.0 tools_4.2.0 magrittr_2.0.3
## [45] sass_0.4.2 glmnet_4.1-6 tibble_3.1.8 pkgconfig_2.0.3
## [49] Matrix_1.4-1 matrixcalc_1.0-6 rmarkdown_2.14 reshape_0.8.9
## [53] rstudioapi_0.14 iterators_1.0.14 R6_2.5.1 compiler_4.2.0