Introduction
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here

Part 1: Getting Started with Data Science
  Chapter 1: Wrapping Your Head Around Data Science
    Seeing Who Can Make Use of Data Science
    Inspecting the Pieces of the Data Science Puzzle
      Collecting, querying, and consuming data
      Applying mathematical modeling to data science tasks
      Deriving insights from statistical methods
      Coding, coding, coding -- it's just part of the game
      Applying data science to a subject area
      Communicating data insights
    Exploring Career Alternatives That Involve Data Science
      The data implementer
      The data leader
      The data entrepreneur
  Chapter 2: Tapping into Critical Aspects of Data Engineering
    Defining Big Data and the Three Vs
      Grappling with data volume
      Handling data velocity
      Dealing with data variety
    Identifying Important Data Sources
    Grasping the Differences among Data Approaches
      Defining data science
      Defining machine learning engineering
      Defining data engineering
      Comparing machine learning engineers, data scientists, and data engineers
    Storing and Processing Data for Data Science
      Storing data and doing data science directly in the cloud
      Storing big data on-premise
      Processing big data in real-time

Part 2: Using Data Science to Extract Meaning from Your Data
  Chapter 3: Machine Learning Means Using a Machine to Learn from Data
    Defining Machine Learning and Its Processes
      Walking through the steps of the machine learning process
      Becoming familiar with machine learning terms
    Considering Learning Styles
      Learning with supervised algorithms
      Learning with unsupervised algorithms
      Learning with reinforcement
    Seeing What You Can Do
      Selecting algorithms based on function
      Using Spark to generate real-time big data analytics
  Chapter 4: Math, Probability, and Statistical Modeling
    Exploring Probability and Inferential Statistics
      Probability distributions
      Conditional probability with Naïve Bayes
    Quantifying Correlation
      Calculating correlation with Pearson's r
      Ranking variable-pairs using Spearman's rank correlation
    Reducing Data Dimensionality with Linear Algebra
      Decomposing data to reduce dimensionality
      Reducing dimensionality with factor analysis
      Decreasing dimensionality and removing outliers with PCA
    Modeling Decisions with Multiple Criteria Decision-Making
      Turning to traditional MCDM
      Focusing on fuzzy MCDM
    Introducing Regression Methods
      Linear regression
      Logistic regression
      Ordinary least squares (OLS) regression methods
    Detecting Outliers
      Analyzing extreme values
      Detecting outliers with univariate analysis
      Detecting outliers with multivariate analysis
    Introducing Time Series Analysis
      Identifying patterns in time series
      Modeling univariate time series data
  Chapter 5: Grouping Your Way into Accurate Predictions
    Starting with Clustering Basics
      Getting to know clustering algorithms
      Examining clustering similarity metrics
    Identifying Clusters in Your Data
      Clustering with the k-means algorithm
      Estimating clusters with kernel density estimation (KDE)
      Clustering with hierarchical algorithms
      Dabbling in the DBScan neighborhood
    Categorizing Data with Decision Tree and Random Forest Algorithms
    Drawing a Line between Clustering and Classification
      Introducing instance-based learning classifiers
      Getting to know classification algorithms
    Making Sense of Data with Nearest Neighbor Analysis
    Classifying Data with Average Nearest Neighbor Algorithms
    Classifying with K-Nearest Neighbor Algorithms
      Understanding how the k-nearest neighbor algorithm works
      Knowing when to use the k-nearest neighbor algorithm
      Exploring common applications of k-nearest neighbor algorithms
    Solving Real-World Problems with Nearest Neighbor Algorithms
      Seeing k-nearest neighbor algorithms in action
      Seeing average nearest neighbor algorithms in action
  Chapter 6: Coding Up Data Insights and Decision Engines
    Seeing Where Python and R Fit into Your Data Science Strategy
    Using Python for Data Science
      Sorting out the various Python data types
      Putting loops to good use in Python
      Having fun with functions
      Keeping cool with classes
      Checking out some useful Python libraries
    Using Open Source R for Data Science
      Comprehending R's basic vocabulary
      Delving into functions and operators
      Iterating in R
      Observing how objects work
      Sorting out R's popular statistical analysis packages
      Examining packages for visualizing, mapping, and graphing in R
  Chapter 7: Generating Insights with Software Applications
    Choosing the Best Tools for Your Data Science Strategy
    Getting a Handle on SQL and Relational Databases
    Investing Some Effort into Database Design
      Defining data types
      Designing constraints properly
      Normalizing your database
    Narrowing the Focus with SQL Functions
    Making Life Easier with Excel
      Using Excel to quickly get to know your data
      Reformatting and summarizing with PivotTables
      Automating Excel tasks with macros
  Chapter 8: Telling Powerful Stories with Data
    Data Visualizations: The Big Three
      Data storytelling for decision makers
      Data showcasing for analysts
      Designing data art for activists
    Designing to Meet the Needs of Your Target Audience
      Step 1: Brainstorm (All about Eve)
      Step 2: Define the purpose
      Step 3: Choose the most functional visualization type for your purpose
    Picking the Most Appropriate Design Style
      Inducing a calculating, exacting response
      Eliciting a strong emotional response
    Selecting the Appropriate Data Graphic Type
      Standard chart graphics
      Comparative graphics
      Statistical plots
      Topology structures
      Spatial plots and maps
    Testing Data Graphics
    Adding Context
      Creating context with data
      Creating context with annotations
      Creating context with graphical elements

Part 3: Taking Stock of Your Data Science Capabilities
  Chapter 9: Developing Your Business Acumen
    Bridging the Business Gap
      Contrasting business acumen with subject matter expertise
      Defining business acumen
    Traversing the Business Landscape
      Seeing how data roles support the business in making money
      Leveling up your business acumen
      Fortifying your leadership skills
    Surveying Use Cases and Case Studies
      Documentation for data leaders
      Documentation for data implementers
  Chapter 10: Improving Operations
    Establishing Essential Context for Operational Improvements Use Cases
    Exploring Ways That Data Science Is Used to Improve Operations
      Making major improvements to traditional manufacturing operations
      Optimizing business operations with data science
      An AI case study: Automated, personalized, and effective debt collection processes
      Gaining logistical efficiencies with better use of real-time data
      Another AI case study: Real-time optimized logistics routing
      Modernizing media and the press with data science and AI
      Generating content with the click of a button
      Yet another case study: Increasing content generation rates
  Chapter 11: Making Marketing Improvements
    Exploring Popular Use Cases for Data Science in Marketing
    Turning Web Analytics into Dollars and Sense
      Getting acquainted with omnichannel analytics
      Mapping your channels
      Building analytics around channel performance
      Scoring your company's channels
    Building Data Products That Increase Sales-and-Marketing ROI
    Increasing Profit Margins with Marketing Mix Modeling
      Collecting data on the four Ps
      Implementing marketing mix modeling
      Increasing profitability with MMM
  Chapter 12: Enabling Improved Decision-Making
    Improving Decision-Making
    Barking Up the Business Intelligence Tree
    Using Data Analytics to Support Decision-Making
      Types of analytics
      Common challenges in analytics
      Data wrangling
    Increasing Profit Margins with Data Science
      Seeing which kinds of data are useful when using data science for decision support
      Directing improved decision-making for call center agents
      Discovering the tipping point where the old way stops working
  Chapter 13: Decreasing Lending Risk

Data Science for Dummies