Introduction

Chapter 1 What Is Machine Learning?
History of Machine Learning
Alan Turing
Arthur Samuel
Tom M. Mitchell
Summary Definition
Algorithm Types for Machine Learning
Supervised Learning
Unsupervised Learning
The Human Touch
Uses for Machine Learning
Software
Stock Trading
Robotics
Medicine and Healthcare
Advertising
Retail and E-Commerce
Gaming Analytics
The Internet of Things
Languages for Machine Learning
Python
R
Matlab
Scala
Clojure
Ruby
Software Used in This Book
Checking the Java Version
Weka Toolkit
Mahout
SpringXD
Hadoop
Using an IDE
Data Repositories
UC Irvine Machine Learning Repository
Infochimps
Kaggle
Summary

Chapter 2 Planning for Machine Learning
The Machine Learning Cycle
It All Starts with a Question
I Don't Have Data!
Starting Local
Competitions
One Solution Fits All?
Defining the Process
Planning
Developing
Testing
Reporting
Refining
Production
Building a Data Team
Mathematics and Statistics
Programming
Graphic Design
Domain Knowledge
Data Processing
Using Your Computer
A Cluster of Machines
Cloud-Based Services
Data Storage
Physical Discs
Cloud-Based Storage
Data Privacy
Cultural Norms
Generational Expectations
The Anonymity of User Data
Don't Cross "The Creepy Line"
Data Quality and Cleaning
Presence Checks
Type Checks
Length Checks
Range Checks
Format Checks
The Britney Dilemma
What's in a Country Name?
Dates and Times
Final Thoughts on Data Cleaning
Thinking about Input Data
Raw Text
Comma Separated Variables
JSON
YAML
XML
Spreadsheets
Databases
Thinking about Output Data
Don't Be Afraid to Experiment
Summary

Chapter 3 Working with Decision Trees
The Basics of Decision Trees
Uses for Decision Trees
Advantages of Decision Trees
Limitations of Decision Trees
Different Algorithm Types
How Decision Trees Work
Decision Trees in Weka
The Requirement
Training Data
Using Weka to Create a Decision Tree
Creating Java Code from the Classification
Testing the Classifier Code
Thinking about Future Iterations
Summary

Chapter 4 Bayesian Networks
Pilots to Paperclips
A Little Graph Theory
A Little Probability Theory
Coin Flips
Conditional Probability
Winning the Lottery
Bayes' Theorem
How Bayesian Networks Work
Assigning Probabilities
Calculating Results
Node Counts
Using Domain Experts
A Bayesian Network Walkthrough
Java APIs for Bayesian Networks
Planning the Network
Coding Up the Network
Summary

Chapter 5 Artificial Neural Networks
What Is a Neural Network?
Artificial Neural Network Uses
High-Frequency Trading
Credit Applications
Data Center Management
Robotics
Medical Monitoring
Breaking Down the Artificial Neural Network
Perceptrons
Activation Functions
Multilayer Perceptrons
Back Propagation
Data Preparation for Artificial Neural Networks
Artificial Neural Networks with Weka
Generating a Dataset
Loading the Data into Weka
Configuring the Multilayer Perceptron
Training the Network
Altering the Network
Increasing the Test Data Size
Implementing a Neural Network in Java
Create the Project
The Code
Converting from CSV to Arff
Running the Neural Network
Summary

Chapter 6 Association Rules Learning
Where Is Association Rules Learning Used?
Web Usage Mining
Beer and Diapers
How Association Rules Learning Works
Support
Confidence
Lift
Conviction
Defining the Process
Algorithms
Apriori
FP-Growth
Mining the Baskets--A Walkthrough
Downloading the Raw Data
Setting Up the Project in Eclipse
Setting Up the Items Data File
Setting Up the Data
Running Mahout
Inspecting the Results
Putting It All Together
Further Development
Summary

Chapter 7 Support Vector Machines
What Is a Support Vector Machine?
Where Are Support Vector Machines Used?
The Basic Classification Principles
Binary and Multiclass Classification
Linear Classifiers
Confidence
Maximizing and Minimizing to Find the Line
How Support Vector Machines Approach Classification
Using Linear Classification
Using Non-Linear Classification
Using Support Vector Machines in Weka
Installing LibSVM
A Classification Walkthrough
Implementing LibSVM with Java
Summary

Chapter 8 Clustering
What Is Clustering?
Where Is Clustering Used?
The Internet
Business and Retail
Law Enforcement
Computing
Clustering Models
How the K-Means Works
Calculating the Number of Clusters in a Dataset
K-Means Clustering with Weka
Preparing the Data
The Workbench Method
The Command-Line Method
The Coded Method
Summary

Chapter 9 Machine Learning in Real Time with Spring XD
Capturing the Firehose of Data
Considerations of Using Data in Real Time
Potential Uses for a Real-Time System
Using Spring XD
Spring XD Streams
Input Sources, Sinks, and Processors
Learning from Twitter Data
The Development Plan
Configuring the Twitter API Developer Application
Configuring Spring XD
Starting the Spring XD Server
Creating Sample Data
The Spring XD Shell
Streams 101
Spring XD and Twitter
Setting the Twitter Credentials
Creating Your First Twitter Stream
Where to Go from Here
Introducing Processors
How Processors Work within a Stream
Creating Your Own Processor
Real-Time Sentiment Analysis
How the Basic Analysis Works
Creating a Sentiment Processor
Spring XD Taps
Summary

Chapter 10 Machine Learning as a Batch Process
Is It Big Data?
Considerations for Batch Processing Data
Volume and Frequency
How Much Data?
Which Process Method?
Practical Examples of Batch Processes
Hadoop
Sqoop
Pig
Mahout
Cloud-Based Elastic Map Reduce
A Note about the Walkthroughs
Using the Hadoop Framework
The Hadoop Architecture
Setting Up a Single-Node Cluster
How MapReduce Works
Mining the Hashtags
Hadoop Support in Spring XD
Objectives for This Walkthrough
What's a Hashtag?
Creating the MapReduce Classes
Performing ETL on Existing Data
Product Recommendation with Mahout
Mining Sales Data
Welcome to My Coffee Shop!
Going Small Scale
Writing the Core Methods
Using Hadoop and MapReduce
Using Pig to Mine Sales Data
Scheduling Batch Jobs
Summary

Chapter 11 Apache Spark
Spark: A Hadoop Replacement?
Java, Scala, or Python?
Scala Crash Course
Installing Scala
Packages
Data Types
Classes
Calling Functions
Operators
Control Structures
Downloading and Installing Spark
A Quick Intro to Spark
Starting the Shell
Data Sources
Testing Spark
Spark Monitor
Comparing Hadoop MapReduce to Spark
Writing Standalone Programs with Spark
Spark Programs in Scala
Installing SBT
Spark Programs in Java
Spark Program Summary
Spark SQL
Basic Concepts
Using SparkSQL with RDDs
Spark Streaming
Basic Concepts
Creating Your First Stream with Scala
Creating Your First Stream with Java
MLib: The Machine Learning Library
Dependencies
Decision Trees
Clustering
Summary

Chapter 12 Machine Learning with R
Installing R
Mac OSX
Windows
Linux
Your First Run
Installing R-Studio
The R Basics
Variables and Vector.

Machine Learning: Hands-On for Developers and Technical Professionals