Statistical Genetics, April 5-6, 2002

Workshop Abstracts


Friday, April 5, 2002

8:30 Economics 200
Refreshments
9:00 Economics 200
Spencer Muse, North Carolina State University
Heterogeneity in the Process of Nucleotide Substitution
DNA sequences evolve in response to numerous factors, including natural selection, demographic parameters, and random "neutral" changes. Particularly when we focus on the evolution of protein-coding genes, modeling the process of sequence evolution becomes challenging, as heterogeneity in the nucleotide substitution process exists at many levels: rates vary from site to site, from gene to gene, and from species to species. I will discuss statistical approaches for identifying the sources of such heterogeneity, and demonstrate their application to studies of DNA sequence evolution in plant genomes.
9:40 Economics 200
Sergei Kosakovsky Pond, University of Arizona
Modeling Synonymous and Non-Synonymous Substitution Rate Patterns
When analyzing protein-coding sequence data, it is highly desirable to be able to discriminate between patterns of synonymous and non-synonymous mutation rates, because synonymous and non-synonymous mutations differ radically in their biological consequences. Current evolutionary models that account for rate variability make some undesirable a priori assumptions about the structure of rate patterns. These assumptions may lead, for instance, to erroneous classification of loci as being under selective pressure. I will present a new, computationally feasible evolutionary model that allows us to study synonymous and non-synonymous rates independently and avoids most of the shortcomings of existing models.
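The distinction between synonymous and non-synonymous changes can be made concrete with a small sketch. The Python below is only an illustration and not the model presented in the talk: it assumes the standard genetic code, considers only codon pairs that differ at exactly one position, and uses the hypothetical helper names `classify_substitution` and `count_differences`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon_a, codon_b):
    """Classify a pair of codons that differ at exactly one position.

    Returns 'synonymous' if both codons encode the same amino acid,
    'non-synonymous' if they do not, and None if the codons are identical
    or differ at more than one position (not handled in this sketch).
    """
    n_diffs = sum(1 for x, y in zip(codon_a, codon_b) if x != y)
    if n_diffs != 1:
        return None
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "non-synonymous"


def count_differences(seq_a, seq_b):
    """Count single-nucleotide codon differences between two aligned coding sequences."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        kind = classify_substitution(seq_a[i:i + 3], seq_b[i:i + 3])
        if kind is not None:
            counts[kind] += 1
    return counts


# ATG -> ATA changes Met to Ile; CTT -> CTC leaves Leu unchanged; GAA is identical.
print(count_differences("ATGCTTGAA", "ATACTCGAA"))
```

Raw counts like these are only a starting point; rate models of the kind described in the abstract additionally account for how many synonymous and non-synonymous changes were possible and for multiple substitutions at the same site.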
10:40 Economics 200
Bruce Walsh, University of Arizona
What? I'm related to YOU?? -- Estimating the time to the most recent common ancestor for a pair of individuals
11:20 Economics 200
David Brian Walton, University of Arizona
Hidden Markov Model Filtering of Single Molecule Kinesin Assay Data
The motor protein kinesin, powered by ATP hydrolysis, moves processively along microtubules. This behavior has made it suitable for extensive single molecule experiments. Recent studies measuring the position of a bead tethered to a moving kinesin while trapped in an optical tweezer have generated extensive data records. Past analyses have evaluated the global properties of the data, particularly determining velocity and trajectory variances. However, the details of the data potentially contain much more information about the protein. We introduce the use of hidden Markov models to estimate model parameters from the kinesin assays using the EM algorithm, as well as a method for model selection. We look at the results of applying the algorithms to simulated data, with an eye toward applying them to the experimental data.
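As a rough illustration of what hidden Markov model filtering involves, the sketch below implements the normalized forward recursion for a discrete HMM. It is a generic textbook version, not the model used in the talk: the two-state example, the binary observations, and every probability value are made-up assumptions. The forward pass shown here is one of the building blocks of the E-step when HMM parameters are estimated by EM.

```python
import math


def forward_filter(observations, initial, transition, emission):
    """Normalized forward recursion for a discrete hidden Markov model.

    initial[s]        : P(state_1 = s)
    transition[r][s]  : P(state_{t+1} = s | state_t = r)
    emission[s][o]    : P(obs_t = o | state_t = s)

    Returns (filtered, log_likelihood), where filtered[t][s] is
    P(state_t = s | obs_1..obs_t).
    """
    n_states = len(initial)
    predicted = list(initial)  # P(state_t | obs_1..obs_{t-1})
    filtered = []
    log_likelihood = 0.0
    for obs in observations:
        # Update step: weight the prediction by the likelihood of the observation.
        unnormalized = [predicted[s] * emission[s][obs] for s in range(n_states)]
        norm = sum(unnormalized)
        log_likelihood += math.log(norm)
        posterior = [u / norm for u in unnormalized]
        filtered.append(posterior)
        # Prediction step: propagate the posterior through the transition matrix.
        predicted = [
            sum(posterior[r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return filtered, log_likelihood


# Hypothetical two-state example: state 0 = "waiting", state 1 = "stepping";
# observation 0 = "no displacement detected", 1 = "displacement detected".
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.3, 0.7]]
emission = [[0.9, 0.1], [0.2, 0.8]]
filtered, log_likelihood = forward_filter([0, 0, 1, 1, 0], initial, transition, emission)
for t, probs in enumerate(filtered):
    print(t, [round(p, 3) for p in probs])
```

Normalizing at every step keeps the recursion numerically stable on long records, and the accumulated log of the normalizers gives the log-likelihood that a model selection criterion would need.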
2:00 Mathematics 402
Jay Taylor, University of Arizona
Tradeoffs between Transmission and Within-Host Adaptation
Population genetic studies of the human immunodeficiency viruses have demonstrated substantial evolution of HIV during transmission and chronic infection. Viral genotypes involved in initiating new infections are typically minority forms relative to the viral population present in the donor host. Changes in genotype frequencies are due, in part, to sampling effects during transmission, when the viral population is forced through a severe bottleneck. However, studies of cell tropism and coreceptor usage by HIV-1 also suggest that transmission may exert selective pressures that conflict with those experienced once the host has been successfully colonized.
Here we consider the population genetic consequences of tradeoffs between transmission and within-host adaptation using a single-locus model incorporating genetic drift, transmission bottlenecks, mutation, and both forms of selection. We obtain deterministic and stochastic limits (at the level of a measure-valued process) as the size of the infected host population is scaled to infinity and present the invariant measures and transmission probabilities for the former. With a detailed study of the genetics of transmission of HIV-1, these considerations could prove useful to ongoing efforts to develop an HIV vaccine.
2:40 Mathematics 402
Kevin Greer, University of Arizona
Hierarchical Clustering of cDNA Microarray Data
Microarray analysis measures the expression levels of thousands of genes concurrently, creating data sets that are too large for humans to assimilate. Numerous techniques have been employed to reduce and display these data sets, including pattern recognition techniques, statistical approaches, and learning algorithms. However, evaluating the performance of each technique has proven difficult without knowledge of the true classifications within the data. This seminar will focus on the application of various hierarchical clustering techniques to microarray data, including the data normalization and centering processes that are usually performed in advance of clustering. As part of the seminar, the results of a study that evaluated the performance of ten hierarchical clustering methods using simulated microarray data will be presented. To evaluate these techniques, computer-generated microarray data sets with known classifications were created. Performance was evaluated based on the percentage of data points that were clustered correctly, as defined by the groups assigned during data set generation. While all of the algorithms performed well at low levels of variance, performance varied at higher levels of variance. In particular, the Flexible-Beta and Ward's Minimum-Variance methods proved most robust, performing well at all levels of variance, while the Single Linkage algorithm performed poorly at higher levels of variance due to its tendency to incorrectly chain observations together. Lastly, a hierarchical clustering algorithm that utilizes replicate measurements will be presented.
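The evaluation idea described above (cluster data with known group labels, then score the percentage clustered correctly) can be sketched in a few lines. The Python below is a naive illustration under stated assumptions, not the code or scoring rule from the study: it supports only single and complete linkage, the toy points stand in for simulated expression profiles, and `percent_correct` assumes a majority-label scoring rule, which the abstract does not specify.

```python
import math


def agglomerative(points, n_clusters, linkage="single"):
    """Naive agglomerative clustering; returns clusters as lists of point indices.

    linkage='single' measures cluster distance by the closest pair of points,
    linkage='complete' by the farthest pair.
    """
    clusters = [[i] for i in range(len(points))]
    reduce_fn = min if linkage == "single" else max

    def cluster_distance(a, b):
        return reduce_fn(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters


def percent_correct(clusters, true_labels):
    """Assumed scoring rule: each cluster is credited with its majority true label."""
    correct = 0
    for members in clusters:
        labels = [true_labels[i] for i in members]
        correct += max(labels.count(label) for label in set(labels))
    return 100.0 * correct / len(true_labels)


# Toy data with known classifications: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
true_labels = [0, 0, 0, 1, 1, 1]
for linkage in ("single", "complete"):
    result = agglomerative(points, 2, linkage)
    print(linkage, result, percent_correct(result, true_labels))
```

Because single linkage merges on the closest pair of points, a few noisy observations lying between two groups can bridge them, which is the chaining behavior the abstract refers to.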
3:30 Mathematics 401H
Refreshments
4:00 Mathematics 501
Applied Mathematics Colloquium
Zhao-Bang Zeng, North Carolina State University
Estimating genetic architecture of quantitative traits
Understanding and estimating the structure and parameters associated with the genetic architecture of quantitative traits is a major research focus in quantitative genetics. With the availability of a well-saturated genetic map of molecular markers, it is possible to identify a major part of the structure of the genetic architecture of quantitative traits and to estimate the associated parameters. Multiple interval mapping, which was recently proposed for simultaneously mapping multiple quantitative trait loci (QTL), is well suited to the identification and estimation of the genetic architecture parameters, including the number, genomic positions, effects and interactions of significant QTL and their contribution to the genetic variance. With multiple traits and multiple environments involved in a QTL mapping experiment, pleiotropic effects and QTL by environment interactions can also be estimated. I will briefly review the method and discuss some issues associated with multiple interval mapping, such as likelihood analysis and model selection. The potential power and advantages of the method for mapping multiple QTL and estimating the genetic architecture will be illustrated through two Drosophila experiments. I will also point out potential problems and difficulties in resolving the details of the genetic architecture as well as other areas that require further investigation.

Saturday, April 6, 2002

8:30 Economics 200
Refreshments
9:00 Economics 200
Zhao-Bang Zeng, North Carolina State University
Minicourse: QTL Mapping
Quantitative Trait Loci (QTL) mapping analysis has been used to study the genetic basis of quantitative traits in a variety of medical, agricultural and evolutionary applications. The primary purpose of QTL mapping is to localize chromosomal regions that contain genes affecting quantitative trait variation in a population. Currently, more emphasis is placed on studying the whole-genome genetic architecture of QTL, which includes the number, positions, effects and interactions of QTL, and QTL by environment interaction. This mini-course will give an overview of the statistical framework, key statistical steps and issues in linkage map construction and in QTL mapping, and current methods to infer the genetic architecture of QTL.
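For readers new to QTL mapping, the simplest version of the idea is a single-marker test: compare trait values between marker genotype classes and summarize the evidence as a LOD score. The sketch below is an assumption-laden illustration and not the interval mapping methods covered in the mini-course: it assumes a normal trait with common variance, genotypes coded as class labels (for example 0/1 in a backcross), and a hypothetical function name `single_marker_lod`.

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """LOD score for one marker under a normal model with common variance.

    Compares a model with one trait mean per genotype class against a model
    with a single overall mean: LOD = (n / 2) * log10(RSS_null / RSS_alt).
    Assumes the within-class residual sum of squares is not zero.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    rss_alt = 0.0
    for g in set(genotypes):
        group = [y for gi, y in zip(genotypes, phenotypes) if gi == g]
        group_mean = sum(group) / len(group)
        rss_alt += sum((y - group_mean) ** 2 for y in group)

    return (n / 2) * math.log10(rss_null / rss_alt)


# Made-up data: one marker that separates the trait values and one that does not.
phenotypes = [1.0, 3.0, 5.0, 7.0]
print(single_marker_lod([0, 0, 1, 1], phenotypes))
print(single_marker_lod([0, 1, 0, 1], [1.0, 1.0, 3.0, 3.0]))
```

Scanning this statistic across all markers on a linkage map gives a crude picture of where trait-associated regions lie.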
1:30 Economics 200
Spencer Muse, North Carolina State University
Minicourse: Molecular Evolution
The course will be divided roughly into two pieces. The first half will focus on phylogeny reconstruction, including distance, likelihood, and parsimony methods. Methods for assessing confidence in estimated phylogenies will be discussed. The second half will deal with molecular evolutionary studies, including methods for comparing rates between loci and lineages. Models of sequence evolution will be surveyed, and model selection issues will be covered briefly. Tests of selection will be covered.
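Distance methods start from a matrix of pairwise evolutionary distances between sequences. As one hedged example, the sketch below computes the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The abstract does not say which substitution models the course uses; Jukes-Cantor is chosen here only because it is among the simplest, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, ignoring gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor estimate of substitutions per site between two aligned sequences.

    Corrects the observed proportion of differences p for multiple hits:
    d = -(3/4) * ln(1 - 4p/3). The estimate is undefined (returned as infinity)
    when p >= 3/4.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Made-up aligned sequences differing at 1 of 10 sites.
print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT"))
```

The corrected distance is always at least as large as the raw proportion of differences, since some substitutions are hidden by later changes at the same site.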