Foreword
Introduction
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here
Part 1: Getting Started with Data Science
  Chapter 1: Wrapping Your Head around Data Science
    Seeing Who Can Make Use of Data Science
    Analyzing the Pieces of the Data Science Puzzle
      Collecting, querying, and consuming data
      Applying mathematical modeling to data science tasks
      Deriving insights from statistical methods
      Coding, coding, coding -- it's just part of the game
      Applying data science to a subject area
      Communicating data insights
    Exploring the Data Science Solution Alternatives
      Assembling your own in-house team
      Outsourcing requirements to private data science consultants
      Leveraging cloud-based platform solutions
    Letting Data Science Make You More Marketable
  Chapter 2: Exploring Data Engineering Pipelines and Infrastructure
    Defining Big Data by the Three Vs
      Grappling with data volume
      Handling data velocity
      Dealing with data variety
    Identifying Big Data Sources
    Grasping the Difference between Data Science and Data Engineering
      Defining data science
      Defining data engineering
      Comparing data scientists and data engineers
    Making Sense of Data in Hadoop
      Digging into MapReduce
      Stepping into real-time processing
      Storing data on the Hadoop distributed file system (HDFS)
      Putting it all together on the Hadoop platform
    Identifying Alternative Big Data Solutions
      Introducing massively parallel processing (MPP) platforms
      Introducing NoSQL databases
    Data Engineering in Action: A Case Study
      Identifying the business challenge
      Solving business problems with data engineering
      Boasting about benefits
  Chapter 3: Applying Data-Driven Insights to Business and Industry
    Benefiting from Business-Centric Data Science
    Converting Raw Data into Actionable Insights with Data Analytics
      Types of analytics
      Common challenges in analytics
      Data wrangling
    Taking Action on Business Insights
    Distinguishing between Business Intelligence and Data Science
      Business intelligence, defined
      The kinds of data used in business intelligence
      Technologies and skillsets that are useful in business intelligence
    Defining Business-Centric Data Science
      Kinds of data that are useful in business-centric data science
      Technologies and skillsets that are useful in business-centric data science
      Making business value from machine learning methods
    Differentiating between Business Intelligence and Business-Centric Data Science
    Knowing Whom to Call to Get the Job Done Right
    Exploring Data Science in Business: A Data-Driven Business Success Story
Part 2: Using Data Science to Extract Meaning from Your Data
  Chapter 4: Machine Learning: Learning from Data with Your Machine
    Defining Machine Learning and Its Processes
      Walking through the steps of the machine learning process
      Getting familiar with machine learning terms
    Considering Learning Styles
      Learning with supervised algorithms
      Learning with unsupervised algorithms
      Learning with reinforcement
    Seeing What You Can Do
      Selecting algorithms based on function
      Using Spark to generate real-time big data analytics
  Chapter 5: Math, Probability, and Statistical Modeling
    Exploring Probability and Inferential Statistics
      Probability distributions
      Conditional probability with Naïve Bayes
    Quantifying Correlation
      Calculating correlation with Pearson's r
      Ranking variable-pairs using Spearman's rank correlation
    Reducing Data Dimensionality with Linear Algebra
      Decomposing data to reduce dimensionality
      Reducing dimensionality with factor analysis
      Decreasing dimensionality and removing outliers with PCA
    Modeling Decisions with Multi-Criteria Decision Making
      Turning to traditional MCDM
      Focusing on fuzzy MCDM
    Introducing Regression Methods
      Linear regression
      Logistic regression
      Ordinary least squares (OLS) regression methods
    Detecting Outliers
      Analyzing extreme values
      Detecting outliers with univariate analysis
      Detecting outliers with multivariate analysis
    Introducing Time Series Analysis
      Identifying patterns in time series
      Modeling univariate time series data
  Chapter 6: Using Clustering to Subdivide Data
    Introducing Clustering Basics
      Getting to know clustering algorithms
      Looking at clustering similarity metrics
    Identifying Clusters in Your Data
      Clustering with the k-means algorithm
      Estimating clusters with kernel density estimation (KDE)
      Clustering with hierarchical algorithms
      Dabbling in the DBScan neighborhood
    Categorizing Data with Decision Tree and Random Forest Algorithms
  Chapter 7: Modeling with Instances
    Recognizing the Difference between Clustering and Classification
      Reintroducing clustering concepts
      Getting to know classification algorithms
    Making Sense of Data with Nearest Neighbor Analysis
    Classifying Data with Average Nearest Neighbor Algorithms
    Classifying with K-Nearest Neighbor Algorithms
      Understanding how the k-nearest neighbor algorithm works
      Knowing when to use the k-nearest neighbor algorithm
      Exploring common applications of k-nearest neighbor algorithms
    Solving Real-World Problems with Nearest Neighbor Algorithms
      Seeing k-nearest neighbor algorithms in action
      Seeing average nearest neighbor algorithms in action
  Chapter 8: Building Models That Operate Internet-of-Things Devices
    Overviewing the Vocabulary and Technologies
      Learning the lingo
      Procuring IoT platforms
      Spark streaming for the IoT
      Getting context-aware with sensor fusion
    Digging into the Data Science Approaches
      Taking on time series
      Geospatial analysis
      Dabbling in deep learning
    Advancing Artificial Intelligence Innovation
Part 3: Creating Data Visualizations That Clearly Communicate Meaning
  Chapter 9: Following the Principles of Data Visualization Design
    Data Visualizations: The Big Three
      Data storytelling for organizational decision makers
      Data showcasing for analysts
      Designing data art for activists
    Designing to Meet the Needs of Your Target Audience
      Step 1: Brainstorm (about Brenda)
      Step 2: Define the purpose
      Step 3: Choose the most functional visualization type for your purpose
    Picking the Most Appropriate Design Style
      Inducing a calculating, exacting response
      Eliciting a strong emotional response
    Choosing How to Add Context
      Creating context with data
      Creating context with annotations
      Creating context with graphical elements
    Selecting the Appropriate Data Graphic Type
      Standard chart graphics
      Comparative graphics
      Statistical plots
      Topology structures
      Spatial plots and maps
    Choosing a Data Graphic
  Chapter 10: Using D3.js for Data Visualization
    Introducing the D3.js Library
    Knowing When to Use D3.js (and When Not To)
    Getting Started in D3.js
      Bringing in the HTML and DOM
      Bringing in the JavaScript and SVG
      Bringing in the Cascading Style Sheets (CSS)
      Bringing in the web servers and PHP
    Implementing More Advanced Concepts and Practices in D3.js
      Getting to know chain syntax
      Getting to know scales
      Getting to know transitions and interactions
  Chapter 11: Web-Based Applications for Visualization Design
    Designing Data Visualizations for Collaboration
      Visualizing and collaborating with Plotly
      Talking about Tableau Public
    Visualizing Spatial Data with Online Geographic Tools
      Making pretty maps with OpenHeatMap
      Mapmaking and spatial data analytics with CartoDB
    Visualizing with Open Source: Web-Based Data Visualization Platforms
      Making pretty data graphics with Google Fusion Tables
      Using iCharts for web-based data visualization
      Using RAW for web-based data visualization
    Knowing When to Stick with Infographics
      Making cool infographics with Infogr.am
      Making cool infographics with Piktochart
  Chapter 12: Exploring Best Practices in Dashboard Design
    Focusing on the Audience
    Starting with the Big Picture
    Getting the Details Right
    Testing Your Design
  Chapter 13: Making Maps from Spatial Data
    Getting into the Basics of GIS
      Spatial databases
      File formats in GIS
      Map projections and coo.
Data Science for Dummies