The University of Arizona / Department of Mathematics

MATH 577-1, Fall 2002


This is a new bioinformatics course, taught for the second time in Fall 2002. It focuses on software and custom programming, with some discussion of algorithms and other theoretical/mathematical topics.

(Since this is an experimental course taught by a Mathematics Department faculty member, it's listed as Math 577, "Topics in Applied Mathematics", in the Schedule of Classes.)

Math 577-1 complements two of UA's other bioinformatics classes: the biology class MCB 416/516, which focuses on sequence and genome analysis and includes an introduction to programming, and the computer science class C SC 650, which focuses on algorithms. (Those two classes are taught by Prof. David Mount and Prof. John Kececioglu, respectively.)

Target Audience: A computer-literate graduate student, either in a computing field or in biology, who would like to learn more about bioinformatics and bioinformatics software (some programming required), and who is not afraid of a bit of statistical or theoretical analysis.

Recommended Texts:


Many free software packages that can be downloaded from the Internet and installed on Linux or Unix systems will be discussed. These include software tools for DNA and protein sequence analysis, and for protein structure visualization and prediction. Such tools are available from NCBI and elsewhere.

Writing software for data retrieval and manipulation, in Perl and other scripting languages, will also be a focus. (The rudiments of Perl and network programming will be covered.) This includes setting up a local sequence database with BLAST and other search facilities, and interfacing it to the Web. (MySQL and CGI issues will be covered.) Standard data models and file formats will be covered too, such as the ASN.1 sequence format used by GenBank, the protein sequence format used by SWISS-PROT, and protein structure formats such as the PDB format used by the Protein Data Bank. Parsing of file formats and conversion between file formats will be discussed, as will BioPerl, a powerful tool for these tasks. Other cutting-edge software technologies to be covered include XML, which is being applied in the DAS project to exchange annotations on genomic sequence data.
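To give a flavor of what parsing and format conversion involve, here is a minimal sketch. It is written in Python rather than Perl purely for illustration, and it is not course material. It assumes a simplified SWISS-PROT-style flat file in which each line starts with a two-letter line code (ID, AC, SQ), the sequence lines follow the SQ line, and "//" ends the record. The sample record, the function names parse_flat_records and to_fasta, and the choice of FASTA as the output format are all assumptions made for this example; real entries have many more line types.

```python
# Hypothetical sample record, invented for illustration only.
SAMPLE = """\
ID   TEST_HUMAN     STANDARD;      PRT;    12 AA.
AC   P00000;
DE   Hypothetical example protein.
SQ   SEQUENCE   12 AA;  1000 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
"""


def parse_flat_records(text):
    """Yield one dict per record from simplified SWISS-PROT-style text."""
    record = {}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            # End-of-record marker: emit what we have and start over.
            if record:
                yield record
            record = {}
            in_sequence = False
        elif in_sequence:
            # Sequence lines are space-separated blocks of residues.
            record["seq"] = record.get("seq", "") + "".join(line.split())
        else:
            code, data = line[:2], line[5:].strip()
            if code == "ID":
                record["id"] = data.split()[0]
            elif code == "AC":
                accessions = [a.strip() for a in data.split(";") if a.strip()]
                record.setdefault("acc", []).extend(accessions)
            elif code == "SQ":
                in_sequence = True
            # Other line codes (DE, etc.) are ignored in this sketch.


def to_fasta(record, width=60):
    """Convert a parsed record to FASTA text, wrapping the sequence."""
    seq = record.get("seq", "")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + record["id"]] + lines)


if __name__ == "__main__":
    for rec in parse_flat_records(SAMPLE):
        print(to_fasta(rec))
```

A library such as BioPerl handles the many details this sketch ignores, which is one reason to prefer it over hand-written parsers for real data.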

If there's time, the GCG software suite for sequence analysis, which is available locally at the UA's BCF (Biotechnology Computing Facility), will be covered, along with other software available at the BCF for sequence alignment and fragment assembly.

Quite a few theoretical topics will be covered, to explain what goes on inside software packages. These include algorithms for sequence analysis; e.g., sequence alignment and multiple sequence alignment via hidden Markov models. (There will be some mathematics and statistics here!) Also, there will be some discussion of parsing data and data records. The basic theory of relational and other database management systems may be covered too, in connection with MySQL.
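As a small illustration of what goes on inside an alignment program, the following sketch computes the score of an optimal global alignment of two sequences by dynamic programming (the classical Needleman-Wunsch recurrence, not the hidden Markov model approach mentioned above). It is written in Python for illustration only. The function name global_alignment_score and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for this example, not values used by any particular package.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, so memory use is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Recovering the alignment itself, rather than just its score, requires keeping the full table (or traceback pointers) and walking back from the last cell.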

Note: If you're interested in taking the course, please register early, so that it can run. (Registering for audit is fine.)

Last updated May 6, 2002
Robert S. Maier
