
How does PARRIS infer selection?

Complete method details can be found in this Bioinformatics paper
Phase 1: Nucleotide model maximum likelihood (ML) fit
A nucleotide model (any model from the time-reversible class can be chosen) is fitted to the data and tree (either neighbor-joining (NJ) or user-supplied) using maximum likelihood to obtain branch lengths and substitution rates. If the input alignment contains multiple segments, base frequencies and substitution rates are inferred jointly from the entire alignment, while branch lengths are fitted to each segment separately. The "best-fitting" model can be determined automatically by a model selection procedure or chosen by the user.
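As an illustration of what "time-reversible" means here, the sketch below builds a general time-reversible (GTR) rate matrix from symmetric exchangeabilities and base frequencies, then checks detailed balance (pi_i * q_ij = pi_j * q_ji). The function names and all numeric values are hypothetical and chosen only for illustration; this is not the fitting code used by PARRIS or HyPhy.

```python
from itertools import combinations

BASES = "ACGT"

def gtr_rate_matrix(exchangeabilities, freqs):
    """Build Q with q_ij = r_ij * pi_j for i != j; each row sums to zero."""
    q = {a: {b: 0.0 for b in BASES} for a in BASES}
    for a, b in combinations(BASES, 2):
        r = exchangeabilities[a + b]   # symmetric exchangeability r_ab = r_ba
        q[a][b] = r * freqs[b]
        q[b][a] = r * freqs[a]
    for a in BASES:
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    return q

def is_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: pi_a * q_ab == pi_b * q_ba for every pair."""
    return all(
        abs(freqs[a] * q[a][b] - freqs[b] * q[b][a]) <= tol
        for a, b in combinations(BASES, 2)
    )

# Hypothetical parameter values, for illustration only.
rates = {"AC": 0.5, "AG": 2.0, "AT": 0.4, "CG": 0.6, "CT": 2.2, "GT": 1.0}
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

Q = gtr_rate_matrix(rates, freqs)
print(is_reversible(Q, freqs))
```

Simpler reversible models such as HKY85 correspond to constraining some of the six exchangeabilities to be equal.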
Phase 2: Null model M1 (no selection) fit
Holding branch lengths proportional to, and substitution rate parameters fixed at, the values estimated in Phase 1, a codon model obtained by crossing MG94 with the nucleotide model of Phase 1 is fitted to the data to obtain independent rate distributions for ω (dN/dS) and dS. This method allows for rate heterogeneity in both synonymous and non-synonymous rates by fitting a 3-bin general discrete distribution to synonymous rates and a 2-bin discrete distribution to ω, yielding 6 possible values for dN. The ω distribution has the form: ω_1<1 (weight P) and 1 (weight 1-P).
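The count of 6 follows from pairing every synonymous rate class with every ω class, since dN = ω × dS. A minimal sketch, assuming hypothetical values for the three dS rates, for ω_1, and for P (none of these numbers come from PARRIS itself):

```python
from itertools import product

def dn_grid(ds_rates, omegas):
    """Return (dS, omega, dN) for every pairing, with dN = omega * dS."""
    return [(ds, w, w * ds) for ds, w in product(ds_rates, omegas)]

# Hypothetical values, for illustration only.
ds_rates = [0.5, 1.0, 2.0]     # 3-bin synonymous rate distribution
omega_1, p = 0.2, 0.7          # omega_1 < 1 with weight P
m1_omegas = [omega_1, 1.0]     # null model M1: omega_1 < 1 and omega = 1
m1_weights = [p, 1.0 - p]

grid = dn_grid(ds_rates, m1_omegas)
for ds, w, dn in grid:
    print(f"dS={ds:.2f}  omega={w:.2f}  dN={dn:.2f}")
print(len(grid), "rate combinations")
```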
Phase 3: Alternative model M2 (selection) fit
Holding branch lengths proportional to, and substitution rate parameters fixed at, the values estimated in Phase 1, a codon model obtained by crossing MG94 with the nucleotide model of Phase 1 is fitted to the data to obtain independent rate distributions for ω (dN/dS) and dS. This method allows for rate heterogeneity in both synonymous and non-synonymous rates by fitting a 3-bin general discrete distribution to synonymous rates and a 3-bin discrete distribution to ω, yielding 9 possible values for dN. The ω distribution has the form: ω_1<1 (weight P_1), 1 (weight (1-P_1)P_2) and ω_2>1 (weight (1-P_1)(1-P_2)).
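The parameterization of the ω weights through P_1 and P_2 guarantees that the three weights are non-negative and sum to one. The sketch below illustrates this; the function name and the example values are hypothetical.

```python
def m2_omega_weights(p1, p2):
    """Weights of the three omega classes under M2.

    Returns (weight of omega_1 < 1, weight of omega = 1, weight of omega_2 > 1).
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("p1 and p2 must lie in [0, 1]")
    return (p1, (1.0 - p1) * p2, (1.0 - p1) * (1.0 - p2))

# Hypothetical values, for illustration only.
weights = m2_omega_weights(0.7, 0.8)
print(weights, sum(weights))

# With P_2 = 1 the positively selected class receives zero weight,
# so the distribution collapses to the two-class form of the null model.
print(m2_omega_weights(0.7, 1.0))
```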
Phase 4: Likelihood ratio test (LRT)
Because model M1 is nested in M2 (set P_2 = 1 or ω_2 = 1), they can be tested against each other using a likelihood ratio test with 2 degrees of freedom. This is actually a conservative test, because P_2 and ω_2 are not separately identifiable on the boundary (i.e. when P_2 = 0 or 1, or when ω_2 = 1).
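For reference, the test statistic is 2 × (lnL_M2 - lnL_M1), and for a chi-squared distribution with 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2). A minimal sketch, assuming the two maximized log-likelihoods are already available; the function name and the log-likelihood values are hypothetical:

```python
import math

def lrt_p_value(lnl_null, lnl_alt):
    """Likelihood ratio test of M1 (null) against M2 (alternative), 2 d.f.

    For chi-squared with 2 degrees of freedom, P(X > x) = exp(-x / 2).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimizer noise
    return stat, math.exp(-stat / 2.0)

# Hypothetical log-likelihoods, for illustration only.
stat, p_value = lrt_p_value(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

Because the test is conservative, a p-value computed this way tends to overstate the true type I error rate rather than understate it.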
UCSD Viral Evolution Group 2004-2017  