comp565_fall2023_A1
pdf
keyboard_arrow_up
School
McGill University *
*We aren’t endorsed by this school
Course
565
Subject
Computer Science
Date
Dec 6, 2023
Type
Pages
4
Uploaded by stephenlu2002
Assignment 1
COMP 565 ML in Genomics and Healthcare
This assignment is worth 8% of your total grade and due at
midnight on September 25, 2023
Question 1 [2%] Implementing LD score regression
For a phenotype of interest, we have collected the marginal statistics
˜
β
for
M
= 4268
SNPs and
the
M
×
M
LD matrix
R
(i.e., pairwise SNP-SNP Pearson correlation). The marginal statistics
are based on
N
= 1000
individuals. Download the marginal statistics and LD matrix from here:
https://drive.google.com/drive/folders/1tq4bTdbsv1iwO4wHxq1smzoN9D5luapp?usp=sharing
For this question, you may also assume there is no population stratification in this dataset. Both
phenotype and genotype were standardized.
Implement the very basic LD score regression algorithm with a programming language of your
choice (preferably Python or R) to estimate the heritability of the phenotype.
What’s your estimate of the heritability?
Submit your answer to this question in iPython notebook with name
COMP565
A1
ldsr.ipynb
or R Markdown
COMP565
A1
ldsr.Rmd
on MyCourses. This way the TA can run your code to
validate its output. Do not submit the data provided to you as long as you have the clear path to
the data you run.
Question 2 [6%] Bayesian fine-mapping
For a phenotype of interest, we have identified a GWAS locus based on N=498 individuals,
which harbour 100 SNPs. As shown in Figure 1, because of the extensive LD, identifying the
1
Figure 1: Manhattan plot for the GWAS locus to finemap. The causal SNPs are in fact coloured
in red although in practice we will know which SNPs are causal.
causal SNPs based on the p-values of the z-scores alone is error prone. Because this is an as-
signment, I have highlighted the causal SNPs namely rs10104559, rs1365732, rs12676370 but
of course in real world applications, we will not know them.
Download the marginal z-score and LD matrix from here:
https://drive.google.com/drive/folders/1tr7BCceyIcKxiO_i6iCNjvk44HHpImgG?usp=sharing
Your task is to implement a simplified version of the FINEMAP algorithm discussed in Lecture
5. To make the task easier, you may assume there are maximum 3 causal SNPs in the locus.
You can divide the tasks into four small tasks:
2
1. (1%) Implement the efficient Bayes factor for each causal configurations:
y
=
Xλ
+
ϵ,
ϵ
∼ N
(0
, σ
2
I
n
)
,
λ
∼ N
(0
, s
2
λ
σ
2
∆
γ
)
where
s
2
λ
is user-defined prior variance in the unit of
σ
2
,
∆
γ
is the diagonal matrix with di-
agonal equal to
γ
(causal configuration). You may assume that
s
2
λ
= 0
.
005
. Therefore,
assuming there are
k
causal SNPs, then
Σ
CC
=
Ns
2
I
k
= 2
.
49
I
k
For the multivariate Gaussian density function, we may find many existing libraries. In R,
mvtnorm
. In Python, it can be found in scipy
https://docs.scipy.org/doc/scipy-0.14.
0/reference/generated/scipy.stats.multivariate_normal.html
2. (1%) Implement the prior calculation for each configurations
3. (2%) Implement posterior inference over all possible configurations assuming at maxi-
mum 3 causal SNPs (i.e.,
(
100
1
)
+
(
100
2
)
+
(
100
3
)
= 100 + 4950 + 161700 = 166
,
750
possible
configurations). Therefore,
no
stochastic sampling is required here (as opposed to the
original FINEMAP).
To obtain all possible configurations, we may also use existing libraries. In R, this is done
by
gtools:combinations
. In Python, check out
itertools.combinations https://docs.
python.org/2/library/itertools.html
Some configurations may result in non-finite multivariate Gaussian density. Discard those
configurations.
Visualize the configuration posteriors by ranking them in increasing order as shown in
Figure 2. As we can see, the vast majority of the configurations have very small posterior
probabilities.
4. (2%) Implement posterior inclusion probabilities (PIP) to calculate SNP-level posterior
probabilities.
Visualize the normalized inferred PIP aligned with GWAS marginal -log10 p-values in Fig-
ure 3. It looks like we missed one of the 3 causal SNPs due to its nearly perfect LD with
the other causal SNPs. But in general, we are able to pull down quite a few non-causal
ones. That is, if we were going to experimentally validate the top SNPs, 2 out of 6 top
SNPs based on PIP are true causal ones, whereas we would have got a lot more false
positives if we were to follow the -log10 P-values instead.
Similar to Question 1, submit your code in
COMP565
A1
finemap.ipynb
or
COMP565
A1
finemap.Rmd
via myCourses. Your code should generate the plots illustrated in Figure 2 and 3. There may
be some difference due to the numerical implementations of various MVN libraries but it should
not differ too much from the provided PIP values in
SNP
pip.csv.gz
. This way the TA can run
your notebook to validate its output. Do not submit the data provided to you as long as you
3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
have the clear path to the data you run in your notebook. You will be also evaluated based on
the correctness of your code. Therefore, making your code readable is also very important.
Figure 2: Posteriors of all of the valid configurations in increasing order.
−log10 p
pip
0
10
20
30
40
0.0
0.2
0.4
0.6
causal_SNP
FALSE
TRUE
Figure 3: Inference results. Top panel. -log10 P-values of the marginal z-scores of the 100
SNPs. Bottom panel. The inferred posterior inclusion probabilities (PIP) of the 100 same SNPs.
4
Related Documents
Recommended textbooks for you

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole
Recommended textbooks for you
- Operations Research : Applications and AlgorithmsComputer ScienceISBN:9780534380588Author:Wayne L. WinstonPublisher:Brooks Cole

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole