Acknowledgments
Reader Services

PART I: THE FUNDAMENTALS OF BIG DATA

Chapter 1: Understanding Big Data
  Concepts and Terminology
  Datasets
  Data Analysis
  Data Analytics
  Descriptive Analytics
  Diagnostic Analytics
  Predictive Analytics
  Prescriptive Analytics
  Business Intelligence (BI)
  Key Performance Indicators (KPI)
  Big Data Characteristics
  Volume
  Velocity
  Variety
  Veracity
  Value
  Different Types of Data
  Structured Data
  Unstructured Data
  Semi-structured Data
  Metadata
  Case Study Background
  History
  Technical Infrastructure and Automation Environment
  Business Goals and Obstacles
  Case Study Example
  Identifying Data Characteristics
  Volume
  Velocity
  Variety
  Veracity
  Value
  Identifying Types of Data

Chapter 2: Business Motivations and Drivers for Big Data Adoption
  Marketplace Dynamics
  Business Architecture
  Business Process Management
  Information and Communications Technology
  Data Analytics and Data Science
  Digitization
  Affordable Technology and Commodity Hardware
  Social Media
  Hyper-Connected Communities and Devices
  Cloud Computing
  Internet of Everything (IoE)
  Case Study Example

Chapter 3: Big Data Adoption and Planning Considerations
  Organization Prerequisites
  Data Procurement
  Privacy
  Security
  Provenance
  Limited Realtime Support
  Distinct Performance Challenges
  Distinct Governance Requirements
  Distinct Methodology
  Clouds
  Big Data Analytics Lifecycle
  Business Case Evaluation
  Data Identification
  Data Acquisition and Filtering
  Data Extraction
  Data Validation and Cleansing
  Data Aggregation and Representation
  Data Analysis
  Data Visualization
  Utilization of Analysis Results
  Case Study Example
  Big Data Analytics Lifecycle
  Business Case Evaluation
  Data Identification
  Data Acquisition and Filtering
  Data Extraction
  Data Validation and Cleansing
  Data Aggregation and Representation
  Data Analysis
  Data Visualization
  Utilization of Analysis Results

Chapter 4: Enterprise Technologies and Big Data Business Intelligence
  Online Transaction Processing (OLTP)
  Online Analytical Processing (OLAP)
  Extract Transform Load (ETL)
  Data Warehouses
  Data Marts
  Traditional BI
  Ad-hoc Reports
  Dashboards
  Big Data BI
  Traditional Data Visualization
  Data Visualization for Big Data
  Case Study Example
  Enterprise Technology
  Big Data Business Intelligence

PART II: STORING AND ANALYZING BIG DATA

Chapter 5: Big Data Storage Concepts
  Clusters
  File Systems and Distributed File Systems
  NoSQL
  Sharding
  Replication
  Master-Slave
  Peer-to-Peer
  Sharding and Replication
  Combining Sharding and Master-Slave Replication
  Combining Sharding and Peer-to-Peer Replication
  CAP Theorem
  ACID
  BASE
  Case Study Example

Chapter 6: Big Data Processing Concepts
  Parallel Data Processing
  Distributed Data Processing
  Hadoop
  Processing Workloads
  Batch
  Transactional
  Cluster
  Processing in Batch Mode
  Batch Processing with MapReduce
  Map and Reduce Tasks
  Map
  Combine
  Partition
  Shuffle and Sort
  Reduce
  A Simple MapReduce Example
  Understanding MapReduce Algorithms
  Processing in Realtime Mode
  Speed Consistency Volume (SCV)
  Event Stream Processing
  Complex Event Processing
  Realtime Big Data Processing and SCV
  Realtime Big Data Processing and MapReduce
  Case Study Example
  Processing Workloads
  Processing in Batch Mode
  Processing in Realtime

Chapter 7: Big Data Storage Technology
  On-Disk Storage Devices
  Distributed File Systems
  RDBMS Databases
  NoSQL Databases
  Characteristics
  Rationale
  Types
  Key-Value
  Document
  Column-Family
  Graph
  NewSQL Databases
  In-Memory Storage Devices
  In-Memory Data Grids
  Read-through
  Write-through
  Write-behind
  Refresh-ahead
  In-Memory Databases
  Case Study Example

Chapter 8: Big Data Analysis Techniques
  Quantitative Analysis
  Qualitative Analysis
  Data Mining
  Statistical Analysis
  A/B Testing
  Correlation
  Regression
  Machine Learning
  Classification (Supervised Machine Learning)
  Clustering (Unsupervised Machine Learning)
  Outlier Detection
  Filtering
  Semantic Analysis
  Natural Language Processing
  Text Analytics
  Sentiment Analysis
  Visual Analysis
  Heat Maps
  Time Series Plots
  Network Graphs
  Spatial Data Mapping
  Case Study Example
  Correlation
  Regression
  Time Series Plot
  Clustering
  Classification

Appendix A: Case Study Conclusion

About the Authors
  Thomas Erl
  Wajid Khattak
  Paul Buhler

Index
Big Data Fundamentals: Concepts, Drivers and Techniques