Last updated: 2025-02-20
Knit directory: PODFRIDGE/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20230302) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 99925b4. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/lr.Rmd) and HTML (docs/lr.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 99925b4 | sammuller917 | 2025-02-20 | Update lr.Rmd |
html | 42687fd | sammuller917 | 2025-02-20 | Update lr.html |
Rmd | c2cacc8 | sammuller917 | 2025-02-20 | updated text and formating |
html | c2cacc8 | sammuller917 | 2025-02-20 | updated text and formating |
html | f143ee1 | tinalasisi | 2024-09-16 | Revised website |
html | 6176bd3 | tinalasisi | 2024-09-16 | Cleaning up repo and adding license. |
html | 9a85666 | sammuller917 | 2024-07-11 | Update lr.html |
Rmd | 9ce5626 | sammuller917 | 2024-07-11 | Update lr.Rmd |
html | 49f9b23 | sammuller917 | 2024-07-03 | created lr page |
Rmd | 8f1347c | sammuller917 | 2024-07-03 | Update lr.Rmd |
html | b71e11f | linmatch | 2024-07-03 | new com |
Rmd | 609a240 | sammuller917 | 2024-07-03 | Update lr.Rmd |
Rmd | 009643c | Tina Lasisi | 2024-07-01 | Create lr.Rmd |
The following supplementary resource attempts to explain the statistical calculations used for the individual likelihood ratios at each locus given two known allele profiles. It uses the Weight of Evidence text as a starting and ending point to show the derivation of the standard Match Probability equation into the likelihood ratio (LR) equation we will use in our simulation. This provides a detailed explanation of why certain variables are used or not used in the calculations, and what the final numbers represent.
5.6 in Weight of Evidence
The following equation gives the probability that the next allele sampled from a population will be allele A, given the alleles already observed. This is also known as the Match Probability, \(M\). \[ M=\frac{m \theta+(1-\theta) p}{1+(n-1) \theta} \] Here \(p\) is the frequency of allele A in the population, \(n\) is the number of alleles already sampled, and \(m\) is the number of those \(n\) alleles that are allele A.
\(\theta\) represents the correction for population stratification and genetic drift. The higher the value of \(\theta\), the higher the likelihood of seeing any given allele in a population once it has already been observed. The most common \(\theta\) value is 0.01, but in many cases \(\theta\) can be set to 0.
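As a minimal sketch, equation 5.6 can be written as a small Python function. The function name `match_probability` and the allele frequency used below are illustrative assumptions and are not part of the original analysis.

```python
def match_probability(m, n, p, theta=0.0):
    """Equation 5.6: probability that the next sampled allele is A.

    m: number of A alleles among the alleles already sampled
    n: number of alleles already sampled
    p: population frequency of allele A
    theta: correction for population stratification and drift
    """
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


# Hypothetical allele frequency, chosen only for illustration.
p_A = 0.1

# Two A alleles already observed (m = 2, n = 2), without and with correction.
no_correction = match_probability(m=2, n=2, p=p_A, theta=0.0)
with_correction = match_probability(m=2, n=2, p=p_A, theta=0.01)

print(no_correction, with_correction)
```

With \(\theta = 0\) the function returns \(p\) unchanged, and a positive \(\theta\) raises the probability of seeing an allele that has already been observed.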
A likelihood ratio calculation will be used to compare a forensic unknown, profile c, to a convicted offender profile, profile O, and determine the likelihood that those individuals are related. To determine the likelihood of relatedness, or the relatedness score, we compare the probability that profile c and profile O share alleles at specific loci because they are related with the probability that the shared alleles are due to random chance and the pair is unrelated. This gives us our base likelihood ratio, which we will derive using the equation above. \[ R = \frac{P(profile\:c \:|\:profile\:O,\:related)}{P(profile\:c\:|\:profile\:O,\:unrelated)} \] We know related individuals will likely share a certain number of alleles due to recent common ancestors. Such alleles are said to be identical by descent, or IBD, and a pair can share 0, 1, or 2 alleles IBD at a locus. The probabilities of sharing 0, 1, or 2 alleles IBD are written \(\kappa_0\), \(\kappa_1\), and \(\kappa_2\), and will be calculated for each relationship.
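To determine the probability for a related pair, we must multiply the probability of sharing \(i\) alleles IBD, \(\kappa_i\), by the probability of profile O having the observed shared alleles given profile c, and sum over all values of \(i\) (0, 1, and 2). The subscript on \(M\) counts the alleles that are not IBD and must therefore match by chance, so \(\kappa_2\) pairs with \(M_0\) and \(\kappa_0\) pairs with \(M_2\).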
This results in the following equation: \[ P(profile\:c\:|\:profile\:O,\:related)=\kappa_2(M_0)+\kappa_1(M_1)+\kappa_0(M_2) \]
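The weighted sum can be sketched in Python as follows. The \(\kappa\) values listed are the standard IBD coefficients for a few common relationships in non-inbred pedigrees. The dictionary `KAPPA`, the function name `related_probability`, and the numeric match probabilities are assumptions made for illustration and do not come from the original analysis.

```python
# Standard IBD coefficients (kappa_0, kappa_1, kappa_2) for common relationships.
KAPPA = {
    "parent_child": (0.0, 1.0, 0.0),
    "full_siblings": (0.25, 0.5, 0.25),
    "half_siblings": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def related_probability(kappa, m0, m1, m2):
    """Return kappa_2 * M_0 + kappa_1 * M_1 + kappa_0 * M_2."""
    k0, k1, k2 = kappa
    return k2 * m0 + k1 * m1 + k0 * m2


# Placeholder match probabilities, for illustration only.
M0, M1, M2 = 1.0, 0.1, 0.01

siblings = related_probability(KAPPA["full_siblings"], M0, M1, M2)
unrelated = related_probability(KAPPA["unrelated"], M0, M1, M2)

print(siblings, unrelated)
```

For an unrelated pair only the \(\kappa_0\) term survives, so the sum collapses to \(M_2\).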
In the instance of \(\kappa_2\), both alleles are IBD, so the match is certain and \(M_0 = 1\). If the two alleles in profile O do not both match profile c, the pair cannot share two alleles IBD at that locus, so \(M_0 = 0\) and the \(\kappa_2\) term is not used.
To determine what equations should be used for \(M_1\) and \(M_2\), we must look at the alleles of the population being sampled, that is, the alleles of profile c and profile O at the designated locus. The table below describes the potential allele combinations at a designated locus where C is any other allele. \[ \begin{aligned} &\begin{array}{llc} \hline c & O \\ \hline \mathrm{AA} & \mathrm{AA} \\ \mathrm{AA} & \mathrm{AB} \\ \mathrm{AB} & \mathrm{AA}\\ \mathrm{AB} & \mathrm{AB}\\ \mathrm{AB} & \mathrm{AC} \\ \hline \end{array} \end{aligned} \]
\(M_1\) will tell us the match probability of profile O to one allele in profile c. To calculate \(M_1\) we must evaluate the scenarios in which profile c is homozygous or heterozygous.
In the case of homozygous profile c (AA) we calculate the probability that profile O has a matching allele A. This is based on the condition that two A alleles have already been observed in the sampled population, giving us an \(m\) value of 2 and an \(n\) value of 2. Using equation 5.6 we come up with the following \(M_1\) for use when profile c is homozygous: \[ M_{1\:c|homo}=\frac {2\theta+(1-\theta)p_A}{1+(2-1)\theta} \] which when \(\theta = 0\) reduces to \(p_A\).
In the case of heterozygous profile c (AB) we calculate the probability that profile O has one matching allele, either allele A or allele B. This sums the equation for observing an A allele after one has already been observed (\(m = 1\), \(n = 2\), \(p_A\)) with the equation for observing a B allele after one has already been observed (\(m = 1\), \(n = 2\), \(p_B\)). The options are equally likely, so we multiply by \(\frac {1}{2}\). Using equation 5.6 we come up with the following \(M_1\) for use when profile c is heterozygous.
\[
\begin{equation}
M_{1\:c|hetero}=\left(\frac {\theta+(1-\theta)p_A}{1+(2-1)\theta}+\frac {\theta+(1-\theta)p_B}{1+(2-1)\theta}\right)*\frac{1}{2}=\frac {2\theta+(1-\theta)(p_A+p_B)}{1+\theta}*\frac{1}{2}\\
\: \\
M_{1\:c|hetero}=\frac{\theta+(1-\theta)(p_A+p_B)/2}{1+\theta}
\end{equation}
\] which when \(\theta = 0\) reduces to \(\frac{p_A+p_B}{2}\).
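The two \(M_1\) cases can be checked numerically with a short Python sketch. The function names and allele frequencies here are hypothetical and chosen only for illustration. The sketch confirms that averaging the A and B cases gives the same value as the closed form \(\frac{\theta+(1-\theta)(p_A+p_B)/2}{1+\theta}\).

```python
def match_probability(m, n, p, theta=0.0):
    """Equation 5.6 for m copies of the allele among n sampled alleles."""
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


def m1_homozygous(p_a, theta=0.0):
    """M_1 when profile c is AA: m = 2, n = 2."""
    return match_probability(2, 2, p_a, theta)


def m1_heterozygous(p_a, p_b, theta=0.0):
    """M_1 when profile c is AB: average of the A and B cases (m = 1, n = 2)."""
    return 0.5 * (match_probability(1, 2, p_a, theta)
                  + match_probability(1, 2, p_b, theta))


def m1_heterozygous_closed_form(p_a, p_b, theta=0.0):
    """Closed form: (theta + (1 - theta)(p_A + p_B) / 2) / (1 + theta)."""
    return (theta + (1 - theta) * (p_a + p_b) / 2) / (1 + theta)


# Hypothetical allele frequencies and theta, for illustration only.
p_A, p_B, theta = 0.1, 0.2, 0.01

print(m1_homozygous(p_A, theta))
print(m1_heterozygous(p_A, p_B, theta))
print(m1_heterozygous_closed_form(p_A, p_B, theta))
```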
In the scenario of \(\kappa_0\), no alleles are presumed IBD, so we must use \(M_2\) to determine the probability that the alleles in profile O match to any of the alleles found in profile c randomly. This is also known as the unrelated match probability.
To determine \(M_2\), we again start with profile c, and ask whether it is homozygous or heterozygous.
If profile c is homozygous, we are trying to determine the probability of profile O having two matching A alleles. This probability is the product of the calculation for each allele in profile O given that all previous alleles have been A. We evaluate equation 5.6 with \(m\) and \(n\) both equal to 2, and then with \(m\) and \(n\) both equal to 3, as seen below and in Weight of Evidence as formula 6.3: \[ M_{2\: homo}=\frac{2\theta+(1-\theta)p_A}{1+(2-1)\theta}*\frac{3\theta+(1-\theta)p_A}{1+(3-1)\theta} \] which when \(\theta = 0\) reduces to \({p_A}^2\).
In the instance where only one allele matches between profile O and profile c, \(M_2\) would be doubled to account for either allele being the one that is IBD rather than IBS (identical by state, that is, matching by chance), making \(M_2 = 2{p_A}^2\).
If profile c is heterozygous, we are trying to determine the probability of profile O having two matching alleles, allele A and allele B. This probability evaluates equation 5.6 with \(m = 1\), \(n = 2\) and again with \(m = 1\), \(n = 3\), and multiplies by two for the two possible orderings of the A and B alleles, as seen below and in Weight of Evidence as formula 6.4: \[ M_{2\: hetero}=2\frac{\theta+(1-\theta)p_A}{1+(2-1)\theta}*\frac{\theta+(1-\theta)p_B}{1+(3-1)\theta} \] which when \(\theta = 0\) reduces to \(2p_Ap_B\).
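A minimal Python sketch of the two \(M_2\) expressions, built from repeated use of equation 5.6. The function names and allele frequencies are assumptions for illustration. The values of \(m\) and \(n\) passed to each call follow the text above.

```python
def match_probability(m, n, p, theta=0.0):
    """Equation 5.6 for m copies of the allele among n sampled alleles."""
    return (m * theta + (1 - theta) * p) / (1 + (n - 1) * theta)


def m2_homozygous(p_a, theta=0.0):
    """Profile c is AA: two further A alleles, (m, n) = (2, 2) then (3, 3)."""
    return (match_probability(2, 2, p_a, theta)
            * match_probability(3, 3, p_a, theta))


def m2_heterozygous(p_a, p_b, theta=0.0):
    """Profile c is AB: an A (m = 1, n = 2) and a B (m = 1, n = 3), times 2
    for the two possible orderings."""
    return (2 * match_probability(1, 2, p_a, theta)
            * match_probability(1, 3, p_b, theta))


# Hypothetical allele frequencies, for illustration only.
p_A, p_B = 0.1, 0.2

print(m2_homozygous(p_A), m2_heterozygous(p_A, p_B))
print(m2_homozygous(p_A, 0.01), m2_heterozygous(p_A, p_B, 0.01))
```

At \(\theta = 0\) these reduce to \({p_A}^2\) and \(2p_Ap_B\), and a positive \(\theta\) makes both values larger.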
The various match probability equations for each scenario are shown in the table below: \[ \begin{aligned} &\begin{array}{llcccc} \hline c & O & \kappa_i\:used & M_0|\theta=0&M_1|\theta=0&M_2|\theta=0\\ \hline \mathrm{AA} & \mathrm{AA} & \kappa_2,\kappa_1,\kappa_0 & 1&p_A & {p_A}^2 \\ \mathrm{AA}& \mathrm{AC} & \kappa_1,\kappa_0 & 0 & p_A & 2{p_A}^2\\ \mathrm{AB} & \mathrm{AC} & \kappa_1,\kappa_0 & 0 & \frac{p_A}{2} & {p_A}^2 \\ \mathrm{AB} & \mathrm{CB} & \kappa_1,\kappa_0 & 0 & \frac{p_B}{2} & {p_B}^2 \\ \mathrm{AB} & \mathrm{AB} & \kappa_2,\kappa_1,\kappa_0 & 1 & \frac{p_A}{2}+\frac{p_B}{2} & 2p_Ap_B \\ \hline \end{array} \end{aligned} \]
To calculate the final likelihood ratio, we must place our related probability estimate for matched alleles over the unrelated estimate for matched alleles. We have already done the calculation for this denominator in calculating the \(M_2\) equation, which gives the probability of a profile appearing twice in a pair of unrelated individuals.
For heterozygous profile c using \(\theta=0\), the denominator of the likelihood ratio will be \(2p_Ap_B\).
For homozygous profile c using \(\theta=0\), the denominator of the likelihood ratio will be \({p_A}^2\).
The following proofs demonstrate the simplification of the likelihood ratio equations for each scenario present in the table.
\[ \begin{equation} profile\;c=AA,\;profile\;O=AA\\ R=\frac{\kappa_2+\kappa_1(p_A)+\kappa_0({p_A}^2)}{{p_A}^2}=\frac{\kappa_2}{{p_A}^2}+\frac{\kappa_1}{p_A}+\kappa_0 \end{equation} \] \[ \begin{equation} profile\;c=AA,\;profile\;O=AC\\ R=\frac{\kappa_1(p_A)+\kappa_0(2{p_A}^2)}{2{p_A}^2}=\frac{\kappa_1}{2p_A}+\kappa_0 \end{equation} \] \[ \begin{equation} profile\;c=AB,\;profile\;O=AC\\ R=\frac{\kappa_1(\frac{p_A}{2})+\kappa_0({p_A}^2)}{{p_A}^2}=\frac{\kappa_1}{2p_A}+\kappa_0 \end{equation} \] This above equation would also be utilized in the instance of profile c = AB and profile O = CB, although \(p_B\) would be substituted for \(p_A\).
In this instance, the matching allele is unknown between profiles, so we must combine the calculations for the first or the second allele being the matching allele across two heterozygous profiles, transforming \(2p_A\) into \(4p_A\).
\[ \begin{equation} profile\;c=AB,\;profile\;O=AB\\ R=\frac{\kappa_2+\kappa_1(\frac{p_A}{2}+\frac{p_B}{2})+\kappa_0(2p_Ap_B)}{2p_Ap_B}=\frac{\kappa_2}{2p_Ap_B}+\frac{\kappa_1(p_A+p_B)}{4p_Ap_B}+\kappa_0 \end{equation} \]
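The simplified likelihood ratios at \(\theta = 0\) can be sketched in Python for three of the scenarios: AA/AA, AA/AC, and AB/AB. The function names, the \(\kappa\) tuple for full siblings (the standard 0.25, 0.5, 0.25), and the allele frequencies are assumptions for illustration and are not the simulation code itself. The unsimplified AB/AB ratio is included so the algebra can be checked numerically, with the \(\kappa_1\) term written as \(\kappa_1(p_A+p_B)/(4p_Ap_B)\).

```python
def lr_aa_aa(kappa, p_a):
    """c = AA, O = AA: kappa_2 / p_A^2 + kappa_1 / p_A + kappa_0."""
    k0, k1, k2 = kappa
    return k2 / p_a ** 2 + k1 / p_a + k0


def lr_aa_ac(kappa, p_a):
    """c = AA, O = AC: kappa_1 / (2 p_A) + kappa_0."""
    k0, k1, _k2 = kappa
    return k1 / (2 * p_a) + k0


def lr_ab_ab(kappa, p_a, p_b):
    """c = AB, O = AB, simplified form."""
    k0, k1, k2 = kappa
    return k2 / (2 * p_a * p_b) + k1 * (p_a + p_b) / (4 * p_a * p_b) + k0


def lr_ab_ab_unsimplified(kappa, p_a, p_b):
    """c = AB, O = AB, written as related probability over 2 p_A p_B."""
    k0, k1, k2 = kappa
    numerator = k2 + k1 * (p_a / 2 + p_b / 2) + k0 * (2 * p_a * p_b)
    return numerator / (2 * p_a * p_b)


# (kappa_0, kappa_1, kappa_2) for full siblings; frequencies are hypothetical.
full_siblings = (0.25, 0.5, 0.25)
p_A, p_B = 0.1, 0.2

print(lr_aa_aa(full_siblings, p_A))
print(lr_aa_ac(full_siblings, p_A))
print(lr_ab_ab(full_siblings, p_A, p_B))
print(lr_ab_ab_unsimplified(full_siblings, p_A, p_B))
```

For an unrelated pair, \((\kappa_0, \kappa_1, \kappa_2) = (1, 0, 0)\), each function returns 1, as expected for a ratio of identical probabilities.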
This is where we draw our values for the final simulation, and where we see our equation derivations match up with the text once again.
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Detroit
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 cli_3.6.2 knitr_1.46 rlang_1.1.3
[5] xfun_0.43 stringi_1.8.4 promises_1.3.0 jsonlite_1.8.8
[9] workflowr_1.7.1 glue_1.7.0 rprojroot_2.0.4 git2r_0.33.0
[13] htmltools_0.5.8.1 httpuv_1.6.15 sass_0.4.9 fansi_1.0.6
[17] rmarkdown_2.26 jquerylib_0.1.4 evaluate_0.23 tibble_3.2.1
[21] fastmap_1.1.1 yaml_2.3.8 lifecycle_1.0.4 whisker_0.4.1
[25] stringr_1.5.1 compiler_4.4.0 fs_1.6.4 Rcpp_1.0.12
[29] pkgconfig_2.0.3 rstudioapi_0.16.0 later_1.3.2 digest_0.6.35
[33] R6_2.5.1 utf8_1.2.4 pillar_1.9.0 magrittr_2.0.3
[37] bslib_0.7.0 tools_4.4.0 cachem_1.0.8