LIST OF CONTRIBUTORS xxi PREFACE xxvii I PATTERN RECOGNITION IN SEQUENCES 1 1 COMBINATORIAL HAPLOTYPING PROBLEMS 3 Giuseppe Lancia 1.1 Introduction / 3 1.2 Single Individual Haplotyping / 5 1.2.1 The Minimum Error Correction Model / 8 1.2.2 Probabilistic Approaches and Alternative Models / 10 1.3 Population Haplotyping / 12 1.
3.1 Clark''s Rule / 14 1.3.2 Pure Parsimony / 15 1.3.3 Perfect Phylogeny / 19 1.3.4 Disease Association / 21 1.
3.5 Other Models / 22 References / 23 2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28 Sima Behpour and Bhaskar DasGupta 2.1 Introduction / 28 2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32 2.2.1 Average Length of Optimal Barcodes / 33 2.3 Entropy-Based Information Content Technique for Designing Approximation Algorithms for String Barcoding Problems / 34 2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36 2.
4.1 Reductions from Set Covering Problem / 36 2.4.2 Reduction from Graph-Coloring Problem / 38 2.5 Heuristic Algorithms for String Barcoding Problems / 39 2.5.1 Entropy-Based Method with a Different Measure for Information Content / 39 2.5.
2 Balanced Partitioning Approach / 40 2.6 Conclusion / 40 Acknowledgments / 41 References / 41 3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43 Matteo Comin and Davide Verzotto 3.1 Introduction / 43 3.2 Whole-Genome Sequence Analysis / 44 3.2.1 Background on Whole-Genome Comparison / 44 3.2.2 Alignment-Free Methods / 45 3.
2.3 Average Common Subword / 46 3.2.4 Kullback-Leibler Information Divergence / 47 3.3 Underlying Approach / 47 3.3.1 Irredundant Common Subwords / 48 3.3.
2 Underlying Subwords / 49 3.3.3 Efficient Computation of Underlying Subwords / 50 3.3.4 Extension to Inversions and Complements / 53 3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53 3.4 Experimental Results / 54 3.
4.1 Genome Data sets and Reference Taxonomies / 54 3.4.2 Whole-Genome Phylogeny Reconstruction / 56 3.5 Conclusion / 61 Author''s Contributions / 62 Acknowledgments / 62 References / 62 4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65 Chengpeng Bi 4.1 Introduction / 65 4.2 Multiple Sequence Local Alignment / 67 4.2.
1 Overall Objective Function / 67 4.2.2 Maximum Likelihood Model / 68 4.3 Motif Finding Algorithms / 70 4.3.1 DEM Motif Algorithm / 70 4.3.2 WEM Motif Finding Algorithm / 70 4.
3.3 Metropolis Motif Finding Algorithm / 72 4.3.4 Gibbs Motif Finding Algorithm / 73 4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74 4.4 Time Complexity / 75 4.5 Case Studies / 75 4.
5.1 Performance Evaluation / 76 4.5.2 CRP Binding Sites / 76 4.5.3 Multiple Motifs in Helix-Turn-Helix Protein Structure / 78 4.6 Conclusion / 80 References / 81 5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83 Carl Barton, Tomás Flouri, Costas S. Iliopoulos, and Solon P.
Pissis 5.1 Introduction / 83 5.2 Definitions and Notation / 85 5.3 Problem Definition / 87 5.4 Algorithms / 88 5.5 Conclusion / 94 References / 95 II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97 6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99 Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang 6.1 Introduction / 99 6.2 Representative Protein Secondary Structure Prediction Methods / 102 6.
2.1 Chou-Fasman / 103 6.2.2 GOR / 104 6.2.3 PHD / 104 6.2.4 PSIPRED / 104 6.
2.5 SPINE-X / 105 6.2.6 PSSpred / 105 6.2.7 Meta Methods / 105 6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106 6.3.
1 Measures / 106 6.3.2 Benchmark / 106 6.3.3 Performances / 107 6.4 Conclusion / 110 Acknowledgments / 110 References / 111 7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS: APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114 Yann Guermeur and Fabien Lauer 7.1 Introduction / 114 7.2 Biological Sequence Segmentation / 115 7.
3 MSVMpred / 117 7.3.1 Base Classifiers / 117 7.3.2 Ensemble Methods / 118 7.3.3 Convex Combination / 119 7.4 Postprocessing with A Generative Model / 119 7.
5 Dedication to Protein Secondary Structure Prediction / 120 7.5.1 Biological Problem / 121 7.5.2 MSVMpred2 / 121 7.5.3 Hidden Semi-Markov Model / 122 7.5.
4 Experimental Results / 122 7.6 Conclusions and Ongoing Research / 125 Acknowledgments / 126 References / 126 8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129 Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih 8.1 Introduction / 129 8.2 A Few Basic Concepts / 130 8.2.1 Hierarchy of Protein Structures / 130 8.2.2 Secondary Structure Elements / 131 8.
2.3 Structural Motifs / 132 8.2.4 Available Sources for Protein Data / 134 8.3 State of the Art / 135 8.3.1 Protein Structure Motif Search / 135 8.3.
2 Promotif / 136 8.3.3 Secondary-Structure Matching / 137 8.3.4 Multiple Structural Alignment by Secondary Structures / 138 8.4 A Novel Geometrical Approach to Motif Retrieval / 138 8.4.1 Secondary Structures Cooccurrences / 138 8.
4.2 Cross Motif Search / 143 8.4.3 Complete Cross Motif Search / 146 8.5 Implementation Notes / 149 8.5.1 Optimizations / 149 8.5.
2 Parallel Approaches / 150 8.6 Conclusions and Future Work / 151 Acknowledgment / 152 References / 152 9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY 155 Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang 9.1 Introduction / 155 9.2 Background / 156 9.2.1 Noncoding RNAs and Their Secondary Structures / 156 9.
2.2 Pseudoknotted ncRNA Search Tools / 157 9.3 Methodology / 157 9.4 Results and Interpretation / 161 9.5 Conclusion / 162 References / 163 III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165 10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES 167 Wajdi Dhifli and Engelbert Mephu Nguifo 10.1 Introduction / 167 10.2 From Protein 3D-Structures to Protein Graphs / 169 10.2.
1 Parsing Protein 3D-Structures into Graphs / 169 10.3 Graph Mining / 172 10.4 Subgraph Mining / 173 10.5 Frequent Subgraph Discovery / 173 10.5.1 Problem Definition / 174 10.5.2 Candidates Generation / 176 10.
5.3 Frequent Subgraph Discovery Approaches / 177 10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs / 178 10.6 Feature Selection / 179 10.6.1 Relevance of a Feature / 179 10.7 Feature Selection for Subgraphs / 180 10.
7.1 Problem Statement / 180 10.7.2 Mining Top-k Subgraphs / 180 10.7.3 Clustering-Based Subgraph Selection / 181 10.7.4 Sampling-Based Approaches / 181 10.
7.5 Approximate Subgraph Mining / 181 10.7.6 Discriminative Subgraph Selection / 182 10.7.7 Other Significant Subgraph Selection Approaches / 182 10.8 Discussion / 183 10.9 Conclusion / 185 Acknowledgments / 185 References / 186 11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION OF PROTEIN TERTIARY STRUCTURES 190 Chinua Umoja, Xiaxia Yu, and Robert Harrison 11.
1 Introduction / 190 11.2 Genetic Algorithms / 192 11.2.1 GA Model Selection in Protein Structure Prediction / 196 11.2.2 Common Methodology / 198 11.3 Supervised Machine Learning Algorithm / 201 11.3.
1 Artificial Neural Networks / 201 11.3.2 ANNs in Protein Structure Prediction / 202 11.3.3 Support Vector Machines / 203 11.4 Fuzzy Application / 204 11.4.1 Fuzzy Logic / 204 11.
4.2 Fuzzy SVMs / 204 11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205 11.4.4 Fuzzy Decision Trees / 206 11.5 Conclusion / 207 References / 208 12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212 Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat 12.
1 Introduction / 212 12.2 Protein Structure Overview / 213 12.3 Technical Challenges and Open Issues / 214 12.4 Prediction Assessment / 215 12.5 Current Approaches / 216 12.5.1 DomCut / 216 12.5.
2 Scooby-Domain / 217 12.5.3 FIEFDom / 218 12.5.4 Chatterjee et al. (2009) / 219 12.5.5 Drop / 219 12.
6 Domain Boundary Prediction Using Enhanced General Regression Network / 220 12.6.1 Multi-Domain Benchmark Data Set / 220 12.6.2 Compact Domain Profile / 221 12.6.3 The Enhanced Semi-Parametric Model / 222 12.6.
4 Training, Testing, and Validation / 225 12.6.5 Experimental Results / 226 12.7 Inter-Domain Linkers Prediction Using Compositional Index and Simulated Annealing / 227 12.7.1 Compositional Index / 228 12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated Annealing / 229 12.
7.3 Experimental Results / 230 12.8 Conclusion / 232 References / 23.