Likelihood Ratio

Last updated: 2025-02-20

Checks: 7 passed, 0 failed

Knit directory: PODFRIDGE/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20230302)

The command set.seed(20230302) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 99925b4

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 99925b4. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/lr.Rmd) and HTML (docs/lr.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	99925b4	sammuller917	2025-02-20	Update lr.Rmd
html	42687fd	sammuller917	2025-02-20	Update lr.html
Rmd	c2cacc8	sammuller917	2025-02-20	updated text and formating
html	c2cacc8	sammuller917	2025-02-20	updated text and formating
html	f143ee1	tinalasisi	2024-09-16	Revised website
html	6176bd3	tinalasisi	2024-09-16	Cleaning up repo and adding license.
html	9a85666	sammuller917	2024-07-11	Update lr.html
Rmd	9ce5626	sammuller917	2024-07-11	Update lr.Rmd
html	49f9b23	sammuller917	2024-07-03	created lr page
Rmd	8f1347c	sammuller917	2024-07-03	Update lr.Rmd
html	b71e11f	linmatch	2024-07-03	new com
Rmd	609a240	sammuller917	2024-07-03	Update lr.Rmd
Rmd	009643c	Tina Lasisi	2024-07-01	Create lr.Rmd

Introduction

This supplementary resource explains the statistical calculations behind the per-locus likelihood ratios computed for two known allele profiles. It uses the Weight of Evidence text as its starting and ending point, showing how the standard match probability equation is developed into the likelihood ratio (LR) equations used in our simulation. Along the way, it explains why certain variables are or are not used in the calculations and what the final numbers represent.

Equation 5.6 in Weight of Evidence

Equation 5.6 gives the probability that the next allele sampled from a population is allele A, given the alleles already observed. We refer to this as the match probability, \(M\): \[ M=\frac{m \theta+(1-\theta) p}{1+(n-1) \theta} \] where \(p\) is the population frequency of allele A, \(n\) is the number of alleles already observed, and \(m\) is the number of those \(n\) alleles that are A.

\(\theta\) is a correction for population stratification and genetic drift. The larger \(\theta\) is, the more likely an allele is to be seen again once it has already been observed. A commonly used value is \(\theta = 0.01\), but in many cases \(\theta\) can be set to 0, in which case \(M\) reduces to \(p\).
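Equation 5.6 can be written as a one-line function. The following is a minimal Python sketch, assuming allele frequencies are supplied as plain floats; the name `match_prob` is a hypothetical helper introduced here for illustration, not something defined in the source or in any library. It shows that \(\theta = 0\) returns \(p\), and that a positive \(\theta\) raises the probability of seeing an allele that has already been observed.

```python
def match_prob(m: int, n: int, p: float, theta: float = 0.0) -> float:
    """Equation 5.6: probability that the next sampled allele is A,
    given that m of the n alleles already observed are A and that
    allele A has population frequency p."""
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


p_A = 0.1  # hypothetical population frequency of allele A
for theta in (0.0, 0.01):
    # two A alleles already observed: m = 2, n = 2
    print(f"theta={theta}: M={match_prob(2, 2, p_A, theta):.4f}")
```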

Developing the Likelihood Ratio

A likelihood ratio calculation is used to compare a forensic unknown, profile c, with a convicted offender profile, profile O, and to assess whether those individuals are related. To obtain this relatedness score, we compare the probability that profile c and profile O share alleles at specific loci because they are related with the probability that the shared alleles are due to random chance and the pair is unrelated. This gives our base likelihood ratio, which we derive using the equation above. \[ R = \frac{P(profile\:c \:|\:profile\:O,\:related)}{P(profile\:c\:|\:profile\:O,\:unrelated)} \] Related individuals are likely to share some alleles inherited from recent common ancestors. Such alleles are identical by descent (IBD), and a pair can share 0, 1, or 2 alleles IBD at a locus. The probability of sharing \(i\) alleles IBD is written \(\kappa_i\), with \(\kappa_0+\kappa_1+\kappa_2=1\), and is calculated for each relationship.

To obtain the probability of profile c for a related pair, we condition on the number of alleles shared IBD: for each \(i\) = 0, 1, 2, we multiply \(\kappa_i\) by the probability of profile c given profile O and \(i\) IBD alleles, then sum over all values of \(i\).

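This results in the following equation: \[ P(profile\:c\:|\:profile\:O,\:related)=\kappa_2(M_0)+\kappa_1(M_1)+\kappa_0(M_2) \] where \(M_j\) denotes the match probability when \(j\) alleles of profile c must be matched by chance rather than by descent.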

In the \(\kappa_2\) case, both alleles are IBD, so profile c is certain to match profile O at both alleles and \(M_0 = 1\). If profile O does not share both alleles with profile c, then \(M_0 = 0\) and the \(\kappa_2\) term drops out.
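The weighted sum above can be sketched in a few lines of Python. This is an illustrative sketch only: `related_prob` is a hypothetical helper, and the IBD probabilities used in the example (full siblings \(\kappa = (1/4, 1/2, 1/4)\), parent and child \(\kappa = (0, 1, 0)\)) are standard textbook values rather than values taken from this document. The match probabilities are the \(\theta = 0\) values for profile c = AA and profile O = AA.

```python
def related_prob(kappa, M):
    """P(profile c | profile O, related) = k2*M0 + k1*M1 + k0*M2.

    kappa = (k0, k1, k2): probabilities of sharing 0, 1, 2 alleles IBD.
    M = (M0, M1, M2): match probabilities when 0, 1, or 2 alleles of
    profile c must be matched by chance rather than by descent.
    """
    k0, k1, k2 = kappa
    M0, M1, M2 = M
    if abs(k0 + k1 + k2 - 1) > 1e-9:
        raise ValueError("kappa values must sum to 1")
    return k2 * M0 + k1 * M1 + k0 * M2


full_sibs = (0.25, 0.5, 0.25)  # standard full-sibling IBD probabilities
parent_child = (0.0, 1.0, 0.0)  # standard parent-child IBD probabilities
p_A = 0.1                       # hypothetical frequency of allele A

# profile c = AA, profile O = AA at theta = 0: M0 = 1, M1 = p_A, M2 = p_A**2
M_aa_aa = (1.0, p_A, p_A**2)
print(related_prob(full_sibs, M_aa_aa))
print(related_prob(parent_child, M_aa_aa))
```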

To determine which equations to use for \(M_1\) and \(M_2\), we look at the alleles already observed, that is, the alleles of profile c and profile O at the locus. The table below lists the possible allele combinations at a locus, where A, B, and C denote distinct alleles. \[ \begin{array}{ll} \hline c & O \\ \hline \mathrm{AA} & \mathrm{AA} \\ \mathrm{AA} & \mathrm{AB} \\ \mathrm{AB} & \mathrm{AA}\\ \mathrm{AB} & \mathrm{AB}\\ \mathrm{AB} & \mathrm{AC} \\ \hline \end{array} \]

\(M_1\) will tell us the match probability of profile O to one allele in profile c. To calculate \(M_1\) we must evaluate the scenarios in which profile c is homozygous or heterozygous.

In the case of homozygous profile c (AA), we calculate the probability that profile O has a matching allele A. Two A alleles have already been observed, so \(m = 2\) and \(n = 2\). Applying equation 5.6 gives the following \(M_1\) for homozygous profile c: \[ M_{1\:c|homo}=\frac {2\theta+(1-\theta)p_A}{1+(2-1)\theta} \] which reduces to \(p_A\) when \(\theta = 0\).
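A minimal Python sketch of the homozygous \(M_1\), assuming equation 5.6 is implemented as the hypothetical helper `match_prob` (redefined here so the block runs on its own); `m1_homozygous` is likewise a name introduced only for illustration. It confirms the reduction to \(p_A\) at \(\theta = 0\).

```python
def match_prob(m: int, n: int, p: float, theta: float = 0.0) -> float:
    """Equation 5.6: P(next allele is A | m of n observed alleles are A)."""
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


def m1_homozygous(p_A: float, theta: float = 0.0) -> float:
    """M_1 when profile c is AA: two A alleles already observed (m = n = 2)."""
    return match_prob(2, 2, p_A, theta)


for theta in (0.0, 0.01):
    print(theta, m1_homozygous(0.1, theta))
```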

In the case of heterozygous profile c (AB), we calculate the probability that profile O has one matching allele, either A or B. This averages two applications of equation 5.6: observing an A allele after one A has already been observed (\(m = 1\), \(n = 2\), frequency \(p_A\)), and observing a B allele after one B has already been observed (\(m = 1\), \(n = 2\), frequency \(p_B\)). The two options are equally likely, so their sum is multiplied by \(\frac {1}{2}\), giving the following \(M_1\) for heterozygous profile c:
\[ \begin{aligned} M_{1\:c|hetero}&=\frac{1}{2}\left(\frac {\theta+(1-\theta)p_A}{1+(2-1)\theta}+\frac {\theta+(1-\theta)p_B}{1+(2-1)\theta}\right)=\frac {2\theta+(1-\theta)(p_A+p_B)}{2+2\theta}\\ &=\frac{\theta+(1-\theta)(p_A+p_B)/2}{1+\theta} \end{aligned} \] which reduces to \(\frac{p_A+p_B}{2}\) when \(\theta = 0\).
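The simplification above can be checked numerically. This Python sketch assumes the hypothetical helper `match_prob` for equation 5.6 (redefined so the block is self-contained) and compares the averaged form of the heterozygous \(M_1\) with its simplified closed form; both function names are illustrative only.

```python
def match_prob(m: int, n: int, p: float, theta: float = 0.0) -> float:
    """Equation 5.6: P(next allele is A | m of n observed alleles are A)."""
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


def m1_heterozygous(p_A: float, p_B: float, theta: float = 0.0) -> float:
    """M_1 when profile c is AB: average of matching A or matching B,
    each after one copy has already been observed (m = 1, n = 2)."""
    return 0.5 * (match_prob(1, 2, p_A, theta) + match_prob(1, 2, p_B, theta))


def m1_heterozygous_simplified(p_A: float, p_B: float, theta: float = 0.0) -> float:
    """Simplified form: (theta + (1 - theta)(p_A + p_B)/2) / (1 + theta)."""
    return (theta + (1 - theta) * (p_A + p_B) / 2) / (1 + theta)


for theta in (0.0, 0.01):
    print(theta, m1_heterozygous(0.1, 0.2, theta),
          m1_heterozygous_simplified(0.1, 0.2, theta))
```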

In the \(\kappa_0\) case, no alleles are IBD, so we use \(M_2\), the probability that the alleles in profile O match those in profile c by chance. This is also known as the unrelated match probability.

To determine \(M_2\), we again start with profile c, and ask whether it is homozygous or heterozygous.

If profile c is homozygous, we want the probability that profile O has two matching A alleles. This is the product of the probabilities for each allele in profile O, given that all previously observed alleles were A: we apply equation 5.6 first with \(m = n = 2\) and then with \(m = n = 3\), as shown below and as formula 6.3 in Weight of Evidence: \[ M_{2\: homo}=\frac{2\theta+(1-\theta)p_A}{1+(2-1)\theta}*\frac{3\theta+(1-\theta)p_A}{1+(3-1)\theta} \] which reduces to \({p_A}^2\) when \(\theta = 0\).
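Formula 6.3 chains two applications of equation 5.6. A minimal Python sketch, again assuming the hypothetical helper `match_prob` for equation 5.6 and an illustrative `m2_homozygous` function, neither of which comes from the source:

```python
def match_prob(m: int, n: int, p: float, theta: float = 0.0) -> float:
    """Equation 5.6: P(next allele is A | m of n observed alleles are A)."""
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


def m2_homozygous(p_A: float, theta: float = 0.0) -> float:
    """Formula 6.3: two A alleles in profile O, given the alleles already
    observed were A (first m = n = 2, then m = n = 3)."""
    return match_prob(2, 2, p_A, theta) * match_prob(3, 3, p_A, theta)


for theta in (0.0, 0.01):
    print(theta, m2_homozygous(0.1, theta))
```

With \(\theta > 0\) the result is larger than \({p_A}^2\), reflecting the increased chance of seeing an already observed allele again.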

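When profile c is homozygous (AA) but only one allele matches between profile O and profile c (for example, profile O = AC), \(M_2\) is doubled to account for either A allele in profile c being the one that is IBD rather than only identical by state (IBS), giving \(M_2 = 2{p_A}^2\). Equivalently, \(M_1\) could be halved to \(\frac{p_A}{2}\) (profile O passes on its A allele with probability \(\frac{1}{2}\)) while keeping \(M_2 = {p_A}^2\); both choices give the same likelihood ratio.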
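A quick numerical check that doubling \(M_2\) for profile c = AA and profile O = AC gives the same ratio as the alternative bookkeeping of halving \(M_1\) to \(\frac{p_A}{2}\) and keeping \(M_2 = {p_A}^2\). This is a Python sketch at \(\theta = 0\); `lr_one_shared` is a hypothetical helper, and the full-sibling \(\kappa\) values are standard textbook values, not taken from this document.

```python
def lr_one_shared(kappa, M1: float, M2: float) -> float:
    """(k1*M1 + k0*M2) / M2: the ratio when M0 = 0 (one allele shared)."""
    k0, k1, _k2 = kappa  # k2 is unused because M0 = 0
    return (k1 * M1 + k0 * M2) / M2


full_sibs = (0.25, 0.5, 0.25)  # standard full-sibling IBD probabilities
p_A = 0.1                       # hypothetical frequency of allele A

doubled = lr_one_shared(full_sibs, M1=p_A, M2=2 * p_A**2)  # doubled M_2
halved = lr_one_shared(full_sibs, M1=p_A / 2, M2=p_A**2)   # halved M_1
closed_form = full_sibs[1] / (2 * p_A) + full_sibs[0]      # k1/(2 p_A) + k0
print(doubled, halved, closed_form)
```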

If profile c is heterozygous, we want the probability that profile O has two matching alleles, A and B. We evaluate equation 5.6 with \(m = 1\), \(n = 2\) for allele A and again with \(m = 1\), \(n = 3\) for allele B, then multiply by two for the two possible orderings of the A and B alleles, as shown below and as formula 6.4 in Weight of Evidence: \[ M_{2\: hetero}=2\frac{\theta+(1-\theta)p_A}{1+(2-1)\theta}*\frac{\theta+(1-\theta)p_B}{1+(3-1)\theta} \] which reduces to \(2p_Ap_B\) when \(\theta = 0\).
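Formula 6.4 can be sketched the same way. This Python block assumes the hypothetical helper `match_prob` for equation 5.6 and an illustrative `m2_heterozygous` function; the factor of 2 covers the two orderings of A and B.

```python
def match_prob(m: int, n: int, p: float, theta: float = 0.0) -> float:
    """Equation 5.6: P(next allele is A | m of n observed alleles are A)."""
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


def m2_heterozygous(p_A: float, p_B: float, theta: float = 0.0) -> float:
    """Formula 6.4: profile O is AB given A and B already observed.
    Allele A with m = 1, n = 2, then allele B with m = 1, n = 3,
    times 2 for the two orderings."""
    return 2 * match_prob(1, 2, p_A, theta) * match_prob(1, 3, p_B, theta)


for theta in (0.0, 0.01):
    print(theta, m2_heterozygous(0.1, 0.2, theta))
```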

The match probabilities for each scenario at \(\theta = 0\) are shown in the table below: \[ \begin{array}{llcccc} \hline c & O & \kappa_i\:used & M_0|\theta=0 & M_1|\theta=0 & M_2|\theta=0\\ \hline \mathrm{AA} & \mathrm{AA} & \kappa_2,\kappa_1,\kappa_0 & 1 & p_A & {p_A}^2 \\ \mathrm{AA} & \mathrm{AC} & \kappa_1,\kappa_0 & 0 & p_A & 2{p_A}^2\\ \mathrm{AB} & \mathrm{AC} & \kappa_1,\kappa_0 & 0 & \frac{p_B}{2} & 2p_Ap_B \\ \mathrm{AB} & \mathrm{CB} & \kappa_1,\kappa_0 & 0 & \frac{p_A}{2} & 2p_Ap_B \\ \mathrm{AB} & \mathrm{AB} & \kappa_2,\kappa_1,\kappa_0 & 1 & \frac{p_A}{2}+\frac{p_B}{2} & 2p_Ap_B \\ \hline \end{array} \]

The Final Likelihood Ratio

To calculate the final likelihood ratio, we divide the related probability of the observed profiles by the unrelated probability. We have already calculated this denominator as \(M_2\), the probability of the shared alleles arising by chance in a pair of unrelated individuals.

For heterozygous profile c with \(\theta=0\), the denominator of the likelihood ratio is \(2p_Ap_B\).

For homozygous profile c with \(\theta=0\), the denominator of the likelihood ratio is \({p_A}^2\), doubled to \(2{p_A}^2\) when profile O carries only one A allele, as described above.

The following derivations show how the likelihood ratio simplifies in each scenario in the table.

\[ \begin{aligned} &profile\;c=AA,\;profile\;O=AA\\ &R=\frac{\kappa_2+\kappa_1(p_A)+\kappa_0({p_A}^2)}{{p_A}^2}=\frac{\kappa_2}{{p_A}^2}+\frac{\kappa_1}{p_A}+\kappa_0 \end{aligned} \] \[ \begin{aligned} &profile\;c=AA,\;profile\;O=AC\\ &R=\frac{\kappa_1(p_A)+\kappa_0(2{p_A}^2)}{2{p_A}^2}=\frac{\kappa_1}{2p_A}+\kappa_0 \end{aligned} \] \[ \begin{aligned} &profile\;c=AB,\;profile\;O=AC\\ &R=\frac{\kappa_1(\frac{p_B}{2})+\kappa_0(2p_Ap_B)}{2p_Ap_B}=\frac{\kappa_1}{4p_A}+\kappa_0 \end{aligned} \] The same equation is used for profile c = AB and profile O = CB, with \(p_B\) substituted for \(p_A\), giving \(R=\frac{\kappa_1}{4p_B}+\kappa_0\).

In the AB/AC case both profiles are heterozygous and only one allele is shared, so two factors of \(\frac{1}{2}\) appear relative to the \(\frac{\kappa_1}{p_A}\) term of the AA/AA case: profile O passes on the shared allele A with probability \(\frac{1}{2}\), giving \(M_1=\frac{p_B}{2}\), and the unrelated probability of heterozygous profile c is \(2p_Ap_B\). This turns the \(2p_A\) of the homozygous AA/AC case into \(4p_A\), and \(p_B\) cancels.

\[ \begin{aligned} &profile\;c=AB,\;profile\;O=AB\\ &R=\frac{\kappa_2+\kappa_1(\frac{p_A}{2}+\frac{p_B}{2})+\kappa_0(2p_Ap_B)}{2p_Ap_B}=\frac{\kappa_2}{2p_Ap_B}+\frac{\kappa_1(p_A+p_B)}{4p_Ap_B}+\kappa_0 \end{aligned} \]
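The simplified per-locus ratios at \(\theta = 0\) can be collected into one lookup function. This is a minimal Python sketch, not the simulation's actual code: `locus_lr` is a hypothetical name, genotypes are passed as string labels matching the table (A, B, C distinct alleles), the AB/AC case uses \(\frac{\kappa_1}{4p_A}+\kappa_0\), and the \(\kappa\) values in the example are standard textbook values. For an unrelated pair, \(\kappa = (1, 0, 0)\), every case gives \(R = 1\).

```python
def locus_lr(c, o, kappa, p_A, p_B=None):
    """Per-locus likelihood ratio at theta = 0 for the five cases in the
    table. c and o are genotype labels such as "AB"; p_B is needed only
    for the AB/CB and AB/AB cases."""
    k0, k1, k2 = kappa
    case = (c, o)
    if case == ("AA", "AA"):
        return k2 / p_A**2 + k1 / p_A + k0
    if case == ("AA", "AC"):
        return k1 / (2 * p_A) + k0
    if case == ("AB", "AC"):
        return k1 / (4 * p_A) + k0
    if case == ("AB", "CB"):
        return k1 / (4 * p_B) + k0
    if case == ("AB", "AB"):
        return k2 / (2 * p_A * p_B) + k1 * (p_A + p_B) / (4 * p_A * p_B) + k0
    raise ValueError(f"case {case} is not covered by the table")


unrelated = (1.0, 0.0, 0.0)
parent_child = (0.0, 1.0, 0.0)
full_sibs = (0.25, 0.5, 0.25)

print(locus_lr("AB", "AB", full_sibs, p_A=0.1, p_B=0.2))
print(locus_lr("AA", "AA", parent_child, p_A=0.1))
print(locus_lr("AB", "AC", unrelated, p_A=0.1))
```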

These expressions are the per-locus likelihood ratios used in the final simulation, and they agree with the corresponding results in Weight of Evidence.

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Detroit
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.2         knitr_1.46        rlang_1.1.3      
 [5] xfun_0.43         stringi_1.8.4     promises_1.3.0    jsonlite_1.8.8   
 [9] workflowr_1.7.1   glue_1.7.0        rprojroot_2.0.4   git2r_0.33.0     
[13] htmltools_0.5.8.1 httpuv_1.6.15     sass_0.4.9        fansi_1.0.6      
[17] rmarkdown_2.26    jquerylib_0.1.4   evaluate_0.23     tibble_3.2.1     
[21] fastmap_1.1.1     yaml_2.3.8        lifecycle_1.0.4   whisker_0.4.1    
[25] stringr_1.5.1     compiler_4.4.0    fs_1.6.4          Rcpp_1.0.12      
[29] pkgconfig_2.0.3   rstudioapi_0.16.0 later_1.3.2       digest_0.6.35    
[33] R6_2.5.1          utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3   
[37] bslib_0.7.0       tools_4.4.0       cachem_1.0.8

Sam Muller

2025-02-20 13:56:33
