Office: Social Science 420
Phone: (406) 243-5605
- Associate Professor
- Chair of the Computer Science Department
Spring 2015: 10:00 to 11:00 Monday, Wednesday, and Friday, or by appointment
Field Of Study:
Bioinformatics resides at the intersection of the computer and biological sciences, and is a good example of the gains that can be realized when traditionally disparate scientific communities collaborate. Problems in the biological domain often involve large-scale empirical analysis. Computer science tools from computational pattern recognition and algorithm analysis can be very useful for such problems. They provide a means for exploratory analysis of the problem domains, and often lead to innovative solutions.
Identification of Ribonucleic Acid Functional Structure
In collaboration with partners in the Biochemistry Department, several of the graduate students studying with me will begin investigating methods of identifying common secondary (and possibly tertiary) structures of functionally similar RNA sequences. This will involve the analysis of experimentally derived data on randomly generated RNA sequences and their affinity for a particular protein: the nucleocapsid protein (N) from Rift Valley Fever Virus (RVFV). All of the resulting RNA sequences will have a known affinity for this protein (aptamer status), and it will be our job to identify their secondary structures as well as common structural features. The implication is that these secondary structures (and their resultant tertiary structures) are responsible for the sequences' high affinity for the target protein.
Use of data mining techniques on sparse metagenomic data
In collaboration with partners in the Division of Biological Sciences, one of the graduate students studying with me will begin investigating how geochemistry, environmental nutrient composition, and physical conditions relate to the functional diversity of the localized organismal population. The investigation will involve the identification of metabolic capabilities common to the inhabitants of each of a diverse set of geochemically distinct geothermal environments. This will be accomplished through genomic analysis aimed at metabolic pathway detection and characterization. The analysis will be performed from a systems biology perspective and will be made challenging by the fact that the metagenomic data are incomplete. The data are available as a result of the Yellowstone Metagenome Project, and they represent between 35 and 50 megabases of sequence data per site. These data, while extensive, do not provide full coverage of all the genomes they sample; therefore, the analysis must be predictive and must include some measure of statistical assurance in the findings. The general approach will be to identify, and be able to predict, functional diversity by building a library of genes indicative of each lifestyle, and to develop tools that can recognize the presence of these functional, lifestyle-predictive biomarkers. Since we will be dealing with incomplete data, this will require the development of inferential tools in which the presence of certain genes can be inferred from the presence of others. The approach will be to mine existing complete genome data for associative rules that can be used to make such inferences, and to make them in a statistically sound way.
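The rule-mining idea described above can be illustrated with a minimal sketch. It assumes each complete genome is reduced to a set of gene names, and it mines only pairwise rules of the form "gene A present implies gene B present", scored by the standard support and confidence measures. The function names, thresholds, and toy gene labels are all hypothetical and are not taken from the project itself.

```python
from itertools import permutations


def mine_rules(genomes, min_support=0.5, min_confidence=0.8):
    """Mine pairwise rules (a -> b) from complete genomes.

    genomes: list of sets of gene names (hypothetical toy representation).
    Returns a list of (antecedent, consequent, support, confidence).
    """
    n = len(genomes)
    genes = sorted(set().union(*genomes))
    rules = []
    for a, b in permutations(genes, 2):
        count_a = sum(1 for g in genomes if a in g)
        count_ab = sum(1 for g in genomes if a in g and b in g)
        if count_a == 0:
            continue
        support = count_ab / n            # fraction of genomes with both genes
        confidence = count_ab / count_a   # estimate of P(b present | a present)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def infer_missing(observed, rules):
    """Infer genes absent from an incomplete sample, with the best rule confidence."""
    inferred = {}
    for a, b, _, conf in rules:
        if a in observed and b not in observed:
            inferred[b] = max(conf, inferred.get(b, 0.0))
    return inferred


# Toy "complete genomes" (illustrative labels only).
complete_genomes = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneB", "geneD"},
    {"geneC", "geneD"},
]
rules = mine_rules(complete_genomes)

# An incomplete metagenomic sample in which only geneA was observed.
print(infer_missing({"geneA"}, rules))
```

In this toy data, geneB accompanies geneA in every genome that carries geneA, so geneB is inferred for the partial sample. A real analysis would need larger itemsets and a proper statistical test in place of fixed thresholds.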
Global Evolutionary Forces (translational and metabolic efficiency)
Functional selection is typically considered to be the dominant force shaping proteome evolution. Mutations that give rise to changes in protein structure can lead to alterations of function that affect fitness. There are, however, evolutionary forces that are less gene-centric and more global in nature: many nearly neutral mutations accumulate within genes and across many genes to impart a selective advantage. Two examples are translational and metabolic efficiency. My research has shown that metabolic efficiency is an evolutionary force that influences prokaryotic and low-order eukaryotic proteomes. In the case of translational efficiency, my work has made it possible to isolate the predictive codon usage bias in prokaryotic genomes where this was not possible before. In these genomes, other biases obscure the presence of translational efficiency bias. My techniques disambiguate the translational efficiency bias.
To make the two tools I have created available to the research community, I have developed a Codon Usage Bias Database (CUB-DB) and have made it available online. In addition to my two techniques, this database includes bias adherence data for several of the more traditional approaches. The database includes all organisms currently sequenced and resident on NCBI’s microbial genome database. It is synchronized with NCBI roughly weekly, so new genomes should be available relatively quickly after publication.
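For readers unfamiliar with codon usage bias, the sketch below shows the raw quantity that such analyses start from: how often each synonymous codon is used for a given amino acid. It is an illustration only and is not one of the techniques or the database described above. The function names and the toy sequence are hypothetical, and only three amino acids from the standard genetic code are included to keep the example short.

```python
from collections import Counter

# Synonymous codon families from the standard genetic code (small subset).
SYNONYMOUS = {
    "Lys": ("AAA", "AAG"),
    "Glu": ("GAA", "GAG"),
    "Phe": ("TTT", "TTC"),
}


def codon_counts(cds):
    """Count in-frame codons in a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_usage(counts):
    """For each amino acid, return the fraction of its occurrences encoded by each codon."""
    usage = {}
    for amino_acid, codons in SYNONYMOUS.items():
        total = sum(counts[c] for c in codons)
        if total:
            usage[amino_acid] = {c: counts[c] / total for c in codons}
    return usage


# Toy sequence: AAA AAG AAA GAA GAG TTT
toy_cds = "AAAAAGAAAGAAGAGTTT"
usage = relative_usage(codon_counts(toy_cds))
for amino_acid, fractions in usage.items():
    print(amino_acid, fractions)
```

A genome in which these fractions depart from uniform shows some codon usage bias. Determining which part of that bias reflects translational efficiency rather than, say, GC content is the harder problem that the research above addresses.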
While I have employed techniques such as discrete event and stochastic simulation, I intend to investigate the use of continuous simulation techniques and to find methods for getting back to first principles in solving biological problems. I also intend to apply graph theory in some of the metabolic pathway analysis, and hidden Markov models in expression prediction.
- CSCI 466 Networks (SS 362, M-W-F 9:10AM-10:00AM)
Prereq., CSCI 232. Concepts and practice of computer networking, network protocol layers, switching, routing, flow, and congestion control. Network programming.
- CSCI 460 Operating Systems
Prereq., CSCI 232 Data Structures, CSCI 205 Programming Languages, CSCI 361 Architecture, or consent of instr. Operating system design principles. Processes, threads, synchronization, deadlock, memory management, file management and file systems, protection, and security. Comparison of commonly used existing operating systems. Writing programs that make use of operating system services.
- CSCI 451 Computational Biology (also listed at graduate level as CSCI 558 — Introduction to Bioinformatics)
Designed for attendance by both computer scientists and biologists. The course will explore the importance of interdisciplinary partnerships between these two fields. Students will learn to use various existing computational tools for investigating genomic and other biological data. This will include tools for performing sequence alignments and searches, building phylogenetic trees, predicting RNA secondary structure, and predicting protein tertiary structure. The underlying algorithmic approaches taken by these tools will be discussed, and in some cases, actually implemented by the class participants. The course will examine the data repositories where genomic and other biological data are stored. There will be some light programming required using Perl as the language of choice. It is assumed that the class participants have no experience programming in Perl and will learn this skill as part of the course.
- CSCI 205 Programming Languages
Concepts and principles of programming languages with an emphasis on C, C++, and object-oriented programming. Syntax and semantics of object-oriented languages. Principles and implementation of late binding, memory allocation and de-allocation, type-checking, scope, polymorphism, inheritance.
- CSCI 447/557 Introduction to Machine Learning
Prereq., CSCI 232 Data Structures. Introduction to the framework of learning from examples, various learning algorithms such as neural networks, and generic learning principles such as inductive bias, Occam's Razor, and data mining.
- CSCI 448/548 Pattern Recognition
Prereq., Upper division status (Junior or Senior or Graduate Student) or consent of instr. Introduction to the framework of unsupervised learning techniques such as clustering (agglomerative, fuzzy, graph theory based, etc.), multivariate analysis approaches (PCA, MDS, LDA, etc.), image analysis (edge detection, etc.), as well as feature selection and generation. Emphasis will be on the underlying algorithms and their implementation.
Deole, R., Challacombe, J., Raiford, D. W., & Hoff, W. D. (2013). An extremely halophilic proteobacterium combines a highly acidic proteome with a low cytoplasmic potassium content. J Biol Chem, 288(1), 581-588.
Raiford, Douglas W., Heizer Jr., Esley M., Miller, Robert V., Doom, Travis E., Raymer, Michael L., & Krane, Dan E. (2012). Metabolic and Translational Efficiency in Microbial Organisms. Journal of Molecular Evolution, 74(3), 206-216.
Raiford, D. W., Doom, T. E., Krane, D. E., and Raymer, M. L. (2011). A genetic optimization approach for isolating translational efficiency bias. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(2), 342-352.
Raiford, Douglas W., Krane, Dan E., Doom, Travis E., Raymer, Michael L. (2010) "Automated Isolation of Translational Efficiency Bias that Resists the Confounding Effect of GC(AT)-Content," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(2), 238-250
Kotamarti, R. M., Hahsler, M., Raiford, D., McGee, M., & Dunham, M. H. (2010). Analyzing taxonomic classification using extensible Markov models. Bioinformatics, 26(18), 2235-2241.
Raiford, D. W., Heizer Jr., E. M., Miller, R. V., Akashi, H., Raymer, M. L., and Krane, D. E. (2008). Do amino acid biosynthetic costs constrain protein evolution in Saccharomyces cerevisiae? J. Mol. Evol., 67(6)(Dec), 621-30.
Heizer Jr., E., Raiford, D. W., Raymer, M., Doom, T., Miller, R., & Krane, Dan. 2006. Amino acid cost and codon usage biases in six prokaryotic genomes: A whole-genome analysis. Molecular Biology and Evolution, 23(9), 1670–1680.
Kotamarti, Rao M., Hahsler, Michael, Raiford, Douglas W., & Dunham, Margaret H. 2010 (May 2-5). Sequence transformation to a complex signature form for consistent phylogeny using extensible Markov model. Accepted for publication in: Proceedings for 2010 IEEE symposium on computational intelligence in bioinformatics and computational biology (CIBCB).
Heizer Jr, Esley M., Raiford, Douglas W., Raymer, Michael L., & Krane, Dan E. 2010 (June 15-17). Perceived cost of auxotrophic amino acids in two bacterial species. Pages 119-122 of: Proceedings for the 4th annual Ohio Collaborative Conference on Bioinformatics (OCCBIO 2010). Case Western Reserve University, Cleveland, OH, USA.
Kotamarti, Rao M., Raiford, Douglas W., Raymer, Michael L., & Dunham, Margaret H. 2009. A data mining approach to predicting phylum for microbial organisms using genome-wide sequence data. In: Proceedings of the 9th IEEE international conference on bioinformatics and bioengineering (BIBE 2009). Taichung, Taiwan: IEEE Computer Society.
Raiford, D. W., Krane, D. E., Doom, T. E., and Raymer, M. L. (2007). A multi-objective genetic algorithm that employs a hybrid approach for isolating codon usage bias indicative of translational efficiency. In Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2007), volume I, pages 278–285, Boston, Massachusetts (Conference Center at Harvard Medical School). Awarded Honorary Mention for the Best Paper Award (acceptance rate <12%)
Raiford, D. W., Krane, D. E., Doom, T. E., and Raymer, M. L. (2006). Isolation and visualization of codon usage biases. In Proceedings of the Sixth IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2006), pages 179-182, Washington D.C. (acceptance rate 28%)
Raiford, D. W., Doom, T. E., Krane, D. E., & Raymer, M. L. 2006 (June 26-28). An investigation of codon usage bias including visualization and quantification in organisms exhibiting multiple biases. In: Proceedings for the Ohio Collaborative Conference on Bioinformatics (OCCBIO).
Anderson, P. E., Raiford, D. W., Sweeney, D. J., Doom, T. E., and Raymer, M. L. (2005). Stochastic model of protease-ligand reactions. In Proceedings of the 5th IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2005), pages 306–310, Minneapolis, Minnesota (acceptance rate 29%)