Data Science: The Executive Summary - A Technical Book for Non-Technical Professionals
Author(s): Cady, Field
ISBN No.: 9781119544180
Pages: 208
Year: 202012
Format: E-Book
Price: $ 108.95
Dispatch delay: Dispatched in 7 to 15 days
Status: Available

1 Introduction
1.1 Why Managers Need to Know About Data Science
1.2 The New Age of Data Literacy
1.3 Data-Driven Development
1.4 How to Use this Book
2 The Business Side of Data Science
2.1 What Is Data Science?
2.1.1 What Data Scientists Do
2.1.2 History of Data Science
2.1.3 Data Science Roadmap
2.1.4 Demystifying the Terms: Data Science, Machine Learning, Statistics, and Business Intelligence
2.1.4.1 Machine Learning
2.1.4.2 Statistics
2.1.4.3 Business Intelligence
2.1.5 What Data Scientists Don't (Necessarily) Do
2.1.5.1 Working Without Data
2.1.5.2 Working with Data that Can't Be Interpreted
2.1.5.3 Replacing Subject Matter Experts
2.1.5.4 Designing Mathematical Algorithms
2.2 Data Science in an Organization
2.2.1 Types of Value Added
2.2.1.1 Business Insights
2.2.1.2 Intelligent Products
2.2.1.3 Building Analytics Frameworks
2.2.1.4 Offline Batch Analytics
2.2.2 One-Person Shops and Data Science Teams
2.2.3 Related Job Roles
2.2.3.1 Data Engineer
2.2.3.2 Data Analyst
2.2.3.3 Software Engineer
2.3 Hiring Data Scientists
2.3.1 Do I Even Need Data Science?
2.3.2 The Simplest Option: Citizen Data Scientists
2.3.3 The Harder Option: Dedicated Data Scientists
2.3.4 Programming, Algorithmic Thinking, and Code Quality
2.3.5 Hiring Checklist
2.3.6 Data Science Salaries
2.3.7 Bad Hires and Red Flags
2.3.8 Advice with Data Science Consultants
2.4 Management Failure Cases
2.4.1 Using Them as Devs
2.4.2 Inadequate Data
2.4.3 Using Them as Graph Monkeys
2.4.4 Nebulous Questions
2.4.5 Laundry Lists of Questions Without Prioritization
3 Working with Modern Data
3.1 Unstructured Data and Passive Collection
3.2 Data Types and Sources
3.3 Data Formats
3.3.1 CSV Files
3.3.2 JSON Files
3.3.3 XML and HTML
3.4 Databases
3.4.1 Relational Databases and Document Stores
3.4.2 Database Operations
3.5 Data Analytics Software Architectures
3.5.1 Shared Storage
3.5.2 Shared Relational Database
3.5.3 Document Store+Analytics RDB
3.5.4 Storage+Parallel Processing
4 Telling the Story, Summarizing Data
4.1 Choosing What to Measure
4.2 Outliers, Visualizations, and the Limits of Summary Statistics: A Picture Is Worth a Thousand Numbers
4.3 Experiments, Correlation, and Causality
4.4 Summarizing One Number
4.5 Key Properties to Assess: Central Tendency, Spread, and Heavy Tails
4.5.1 Measuring Central Tendency
4.5.1.1 Mean
4.5.1.2 Median
4.5.1.3 Mode
4.5.2 Measuring Spread
4.5.2.1 Standard Deviation
4.5.2.2 Percentiles
4.5.3 Advanced Material: Managing Heavy Tails
4.6 Summarizing Two Numbers: Correlations and Scatterplots
4.6.1 Correlations
4.6.1.1 Pearson Correlation
4.6.1.2 Ordinal Correlations
4.6.2 Mutual Information
4.7 Advanced Material: Fitting a Line or Curve
4.7.1 Effects of Outliers
4.7.2 Optimization and Choosing Cost Functions
4.8 Statistics: How to Not Fool Yourself
4.8.1 The Central Concept: The p-Value
4.8.2 Reality Check: Picking a Null Hypothesis and Modeling Assumptions
4.8.3 Advanced Material: Parameter Estimation and Confidence Intervals
4.8.4 Advanced Material: Statistical Tests Worth Knowing
4.8.4.1 χ²-Test
4.8.4.2 T-test
4.8.4.3 Fisher's Exact Test
4.8.4.4 Multiple Hypothesis Testing
4.8.5 Bayesian Statistics
4.9 Advanced Material: Probability Distributions Worth Knowing
4.9.1 Probability Distributions: Discrete and Continuous
4.9.2 Flipping Coins: Bernoulli Distribution
4.9.3 Adding Coin Flips: Binomial Distribution
4.9.4 Throwing Darts: Uniform Distribution
4.9.5 Bell-Shaped Curves: Normal Distribution
4.9.6 Heavy Tails 101: Log-Normal Distribution
4.9.7 Waiting Around: Exponential Distribution and the Geometric Distribution
4.9.8 Time to Failure: Weibull Distribution
4.9.9 Counting Events: Poisson Distribution
5 Machine Learning
5.1 Supervised Learning, Unsupervised Learning, and Binary Classifiers
5.1.1 Reality Check: Getting Labeled Data and Assuming Independence
5.1.2 Feature Extraction and the Limitations of Machine Learning
5.1.3 Overfitting
5.1.4 Cross-Validation Strategies
5.2 Measuring Performance
5.2.1 Confusion Matrices
5.2.2 ROC Curves
5.2.3 Area Under the ROC Curve
5.2.4 Selecting Classification Cutoffs
5.2.5 Other Performance Metrics
5.2.6 Lift Curves
5.3 Advanced Material: Important Classifiers
5.3.1 Decision Trees
5.3.2 Random Forests
5.3.3 Ensemble Classifiers
5.3.4 Support Vector Machines
5.3.5 Logistic Regression
5.3.6 Lasso Regression
5.3.7 Naive Bayes
5.3.8 Neural Nets
5.4 Structure of the Data: Unsupervised Learning
5.4.1 The Curse of Dimensionality
5.4.2 Principal Component Analysis and Factor Analysis
5.4.2.1 Scree Plots and Understanding Dimensionality
5.4.2.2 Factor Analysis
5.4.2.3 Limitations of PCA
5.4.3 Clustering
5.4.3.1 Real-World Assessment of Clusters
5.4.3.2 k-means Clustering
5.4.3.3 Advanced Material: Other Clustering Algorithms
5.4.3.4 Advanced Material: Evaluating Cluster Quality
5.5 Learning as You Go: Reinforcement Learning
5.5.1 Multi-Armed Bandits and ε-Greedy Algorithms
5.5.2 Markov Decision Processes and Q-Learning
6 Knowing the Tools
6.1 A Note on Learning to Code
6.2 Cheat Sheet
6.3 Parts of the Data Science Ecosystem
6.3.1 Scripting Languages
6.3.2 Technical Computing Languages
6.3.2.1 Python's Technical Computing Stack
6.3.2.2 R
6.3.2.3 Matlab and Octave
6.3.2.4 Mathematica
6.3.2.5 SAS
6.3.2.6 Julia
6.3.3 Visualization
6.3.3.1 Tableau
6.3.3.2 Excel
6.3.3.3 D3.js
6.3.4 Databases
6.3.5 Big Data
6.3.5.1 Types of Big Data Technologies
6.3.5.2 Spark
6.3.6 Advanced Material: The Map-Reduce Paradigm
6.4 Advanced Material: Database Query Crash Course
6.4.1 Basic Queries
6.4.2 Groups and Aggregations
6.4.3 Joins
6.4.4 Nesting Queries
7 Deep Learning and Artificial Intelligence
7.1 Overview of AI
7.1.1 Don't Fear the Skynet: Strong and Weak AI
7.1.2 System 1 and System 2
7.2 Neural Networks
7.2.1 What Neural Nets Can and Can't Do
7.2.2 Enough Boilerplate: What's a Neural Net?
7.2.3 Convolutional Neural Nets
7.2.4 Advanced Material: Training Neural Networks
7.2.4.1 Manual Versus Automatic Feature Extraction
7.2.4.2 Dataset Sizes and Data Augmentation
7.2.4.3 Batches and Epochs
7.2.4.4 Transfer Learning
7.2.4.5 Feature Extraction
7.2.4.6 Word Embeddings
7.3 Natural Language Processing
7.3.1 The Great Divide: Language Versus Statistics
7.3.2 Save Yourself Some Trouble: Consider Regular Expressions
7.3.3 Software and Datasets
7.3.4 Key Issue: Vectorization
7.3.5 Bag-of-Words
7.4 Knowledge Bases and Graphs
Postscript
Index

