PART I FUNDAMENTAL ALGORITHMS AND METHODS OF MEDICAL INFORMATICS

Chapter 1 Parsing and Transforming Text Files
  Peeking into Large Files
  Paging through Large Text Files
  Extracting Lines that Match a Regular Expression
  Changing Every File in a Subdirectory
  Counting the Words in a File
  Making a Word List with Occurrence Tally
  Using Printf Formatting Style

Chapter 2 Utility Scripts
  Random Numbers
  Converting Non-ASCII to Base64 ASCII
  Creating a Universally Unique Identifier
  Splitting Text into Sentences
  One-Way Hash on a Name
  One-Way Hash on a File
  A Prime Number Generator

Chapter 3 Viewing and Modifying Images
  Viewing a JPEG Image
  Converting between Image Formats
  Batch Conversions
  Drawing a Graph from List Data
  Drawing an Image Mashup

Chapter 4 Indexing Text
  Zipf Distribution of a Text File
  Preparing a Concordance
  Extracting Phrases
  Preparing an Index
  Comparing Texts Using Similarity Scores

PART II MEDICAL DATA RESOURCES

Chapter 5 The National Library of Medicine's Medical Subject Headings (MeSH)
  Determining the Hierarchical Lineage for MeSH Terms
  Creating a MeSH Database
  Reading the MeSH Database
  Creating an SQLite Database for MeSH
  Reading the SQLite MeSH Database

Chapter 6 The International Classification of Diseases
  Creating the ICD Dictionary
  Building the ICD-O (Oncology) Dictionary

Chapter 7 SEER: The Cancer Surveillance, Epidemiology, and End Results Program
  Parsing the SEER Data Files
  Finding the Occurrences of All Cancers in the SEER Data Files
  Finding the Age Distributions of the Cancers in the SEER Data Files

Chapter 8 OMIM: The Online Mendelian Inheritance in Man
  Collecting the OMIM Entry Terms
  Finding Inherited Cancer Conditions

Chapter 9 PubMed
  Building a Large Text Corpus of Biomedical Information
  Creating a List of Doublets from a PubMed Corpus
  Downloading Gene Synonyms from PubMed
  Downloading Protein Synonyms from PubMed

Chapter 10 Taxonomy
  Finding a Taxonomic Hierarchy
  Finding the Restricted Classes of Human Infectious Pathogens

Chapter 11 Developmental Lineage Classification and Taxonomy of Neoplasms
  Building the Doublet Hash
  Scanning the Literature for Candidate Terms
  Adding Terms to the Neoplasm Classification
  Determining the Lineage of Every Neoplasm Concept

Chapter 12 U.S. Census Files
  Total Population of the United States
  Stratified Distribution for the U.S. Census
  Adjusting for Age

Chapter 13 Centers for Disease Control and Prevention Mortality Files
  Death Certificate Data
  Obtaining the CDC Data Files
  How Death Certificates Are Represented in Data Records
  Ranking, by Number of Occurrences, Every Condition in the CDC Mortality Files

PART III PRIMARY TASKS OF MEDICAL INFORMATICS

Chapter 14 Autocoding
  A Neoplasm Autocoder
  Recoding

Chapter 15 Text Scrubber for Deidentifying Confidential Text

Chapter 16 Web Pages and CGI Scripts
  Grabbing Web Pages
  CGI Script for Searching the Neoplasm Classification

Chapter 17 Image Annotation
  Inserting a Header Comment
  Extracting the Header Comment in a JPEG Image File
  Inserting IPTC Annotations
  Extracting Comment, EXIF, and IPTC Annotations
  Dealing with DICOM
  Finding DICOM Images
  DICOM-to-JPEG Conversion

Chapter 18 Describing Data with Data, Using XML
  Parsing XML
  Resource Description Framework (RDF)
  Dublin Core Metadata
  Insert an RDF Document into an Image File
  Insert an Image File into an RDF Document
  RDF Schema
  Visualizing an RDF Schema with GraphViz
  Obtaining GraphViz
  Converting a Data Structure to GraphViz

PART IV MEDICAL DISCOVERY

Chapter 19 Case Study: Emphysema Rates

Chapter 20 Case Study: Cancer Occurrence Rates

Chapter 21 Case Study: Germ Cell Tumor Rates across Ethnicities

Chapter 22 Case Study: Ranking the Death-Certifying Process, by State

Chapter 23 Case Study: Data Mashups for Epidemics
  Tally of Coccidioidomycosis Cases by State
  Creating the Map Mashup

Chapter 24 Case Study: Sickle Cell Rates

Chapter 25 Case Study: Site-Specific Tumor Biology
  Anatomic Origins of Mesotheliomas
  Mesothelioma Records in the SEER Data Sets
  Graphic Representation

Chapter 26 Case Study: Bimodal Tumors

Chapter 27 Case Study: The Age of Occurrence of Precancers

Epilogue for Healthcare Professionals and Medical Scientists
  Learn One or More Open Source Programming Languages
  Don't Agonize Over Which Language You Should Choose
  Learn Algorithms
  Unless You Are a Professional Programmer, Relax and Enjoy Being a Newbie
  Do Not Delegate Simple Programming Tasks to Others
  Break Complex Tasks into Simple Methods and Algorithms
  Write Fast Scripts
  Concentrate on the Questions, Not the Answers

Appendix
  How to Acquire Ruby
  How to Acquire Perl
  How to Acquire Python
  How to Acquire RMagick
  How to Acquire SQLite
  How to Acquire the Public Data Files Used in This Book
  Other Publicly Available Files, Data Sets, and Utilities