Table of Contents

Introduction 1
  About This Book 1
  Foolish Assumptions 3
  Icons Used in This Book 4
  Beyond the Book 4
  Where to Go from Here 5

Part 1: Getting Started with Data Science and Python 7

Chapter 1: Discovering the Match between Data Science and Python 9
  Defining the Sexiest Job of the 21st Century 11
    Considering the emergence of data science 12
    Outlining the core competencies of a data scientist 12
    Linking data science, big data, and AI 13
    Understanding the role of programming 14
  Creating the Data Science Pipeline 14
    Preparing the data 15
    Performing exploratory data analysis 15
    Learning from data 15
    Visualizing 15
    Obtaining insights and data products 16
  Understanding Python's Role in Data Science 16
    Considering the shifting profile of data scientists 16
    Working with a multipurpose, simple, and efficient language 17
  Learning to Use Python Fast 18
    Loading data 19
    Training a model 19
    Viewing a result 19

Chapter 2: Introducing Python's Capabilities and Wonders 21
  Why Python? 22
  Grasping Python's Core Philosophy 23
    Contributing to data science 23
    Discovering present and future development goals 24
  Working with Python 25
    Getting a taste of the language 25
    Understanding the need for indentation 26
    Working at the command line or in the IDE 27
  Performing Rapid Prototyping and Experimentation 31
  Considering Speed of Execution 32
  Visualizing Power 33
  Using the Python Ecosystem for Data Science 35
    Accessing scientific tools using SciPy 35
    Performing fundamental scientific computing using NumPy 36
    Performing data analysis using pandas 36
    Implementing machine learning using Scikit-learn 36
    Going for deep learning with Keras and TensorFlow 37
    Plotting the data using matplotlib 38
    Creating graphs with NetworkX 38
    Parsing HTML documents using Beautiful Soup 38

Chapter 3: Setting Up Python for Data Science 39
  Considering the Off-the-Shelf Cross-Platform Scientific Distributions 40
    Getting Continuum Analytics Anaconda 40
    Getting Enthought Deployment Manager 41
    Getting WinPython 42
  Installing Anaconda on Windows 42
  Installing Anaconda on Linux 46
  Installing Anaconda on Mac OS X 47
  Downloading the Datasets and Example Code 48
    Using Jupyter Notebook 49
    Defining the code repository 50
    Understanding the datasets used in this book 57

Chapter 4: Working with Google Colab 59
  Defining Google Colab 60
    Understanding what Google Colab does 60
    Considering the online coding difference 61
    Using local runtime support 63
  Getting a Google Account 63
    Creating the account 64
    Signing in 64
  Working with Notebooks 65
    Creating a new notebook 65
    Opening existing notebooks 66
    Saving notebooks 68
    Downloading notebooks 71
  Performing Common Tasks 71
    Creating code cells 71
    Creating text cells 72
    Creating special cells 73
    Editing cells 74
    Moving cells 75
  Using Hardware Acceleration 75
  Executing the Code 76
  Viewing Your Notebook 76
    Displaying the table of contents 77
    Getting notebook information 77
    Checking code execution 78
  Sharing Your Notebook 79
  Getting Help 80

Part 2: Getting Your Hands Dirty with Data 81

Chapter 5: Understanding the Tools 83
  Using the Jupyter Console 84
    Interacting with screen text 84
    Changing the window appearance 86
    Getting Python help 87
    Getting IPython help 89
    Using magic functions 90
    Discovering objects 91
  Using Jupyter Notebook 93
    Working with styles 93
    Restarting the kernel 94
    Restoring a checkpoint 95
  Performing Multimedia and Graphic Integration 96
    Embedding plots and other images 96
    Loading examples from online sites 96
    Obtaining online graphics and multimedia 96

Chapter 6: Working with Real Data 99
  Uploading, Streaming, and Sampling Data 100
    Uploading small amounts of data into memory 101
    Streaming large amounts of data into memory 102
    Generating variations on image data 103
    Sampling data in different ways 104
  Accessing Data in Structured Flat-File Form 105
    Reading from a text file 106
    Reading CSV delimited format 107
    Reading Excel and other Microsoft Office files 109
  Sending Data in Unstructured File Form 111
  Managing Data from Relational Databases 113
  Interacting with Data from NoSQL Databases 115
  Accessing Data from the Web 116

Chapter 7: Conditioning Your Data 121
  Juggling between NumPy and pandas 122
    Knowing when to use NumPy 122
    Knowing when to use pandas 122
  Validating Your Data 124
    Figuring out what's in your data 124
    Removing duplicates 126
    Creating a data map and data plan 126
  Manipulating Categorical Variables 129
    Creating categorical variables 130
    Renaming levels 131
    Combining levels 132
  Dealing with Dates in Your Data 133
    Formatting date and time values 134
    Using the right time transformation 135
  Dealing with Missing Data 136
    Finding the missing data 136
    Encoding missingness 137
    Imputing missing data 138
  Slicing and Dicing: Filtering and Selecting Data 139
    Slicing rows 140
    Slicing columns 140
    Dicing 141
  Concatenating and Transforming 142
    Adding new cases and variables 142
    Removing data 144
    Sorting and shuffling 145
  Aggregating Data at Any Level 146

Chapter 8: Shaping Data 149
  Working with HTML Pages 150
    Parsing XML and HTML 150
    Using XPath for data extraction 151
  Working with Raw Text 153
    Dealing with Unicode 153
    Stemming and removing stop words 153
    Introducing regular expressions 155
  Using the Bag of Words Model and Beyond 158
    Understanding the bag of words model 159
    Working with n-grams 161
    Implementing TF-IDF transformations 162
  Working with Graph Data 165
    Understanding the adjacency matrix 165
    Using NetworkX basics 166

Chapter 9: Putting What You Know in Action 169
  Contextualizing Problems and Data 170
    Evaluating a data science problem 171
    Researching solutions 173
    Formulating a hypothesis 174
    Preparing your data 175
  Considering the Art of Feature Creation 175
    Defining feature creation 175
    Combining variables 176
    Understanding binning and discretization 177
    Using indicator variables 177
    Transforming distributions 178
  Performing Operations on Arrays 178
    Using vectorization 179
    Performing simple arithmetic on vectors and matrices 179
    Performing matrix vector multiplication 180
    Performing matrix multiplication 181

Part 3: Visualizing Information 183

Chapter 10: Getting a Crash Course in MatPlotLib 185
  Starting with a Graph 186
    Defining the plot 186
    Drawing multiple lines and plots 187
    Saving your work to disk 188
  Setting the Axis, Ticks, Grids 189
    Getting the axes 189
    Formatting the axes 190
    Adding grids 191
  Defining the Line Appearance 192
    Working with line styles 193
    Using colors 194
    Adding markers 195
  Using Labels, Annotations, and Legends 197
    Adding labels 198
    Annotating the chart 198
    Creating a legend 199

Chapter 11: Visualizing the Data 201
  Choosing the Right Graph 202
    Showing parts of a whole with pie charts 202
    Creating comparisons with bar charts 203
    Showing distributions using histograms 205
    Depicting groups using boxplots 206
    Seeing data patterns using scatterplots 208
  Creating Advanced Scatterplots 209
    Depicting groups 209
    Showing correlations 211
  Plotting Time Series 212
    Representing time on axes 212
    Plotting trends over time 214
  Plotting Geographical Data 216
    Using an environment in Notebook 217
    Getting the Basemap toolkit 218
    Dealing with deprecated library issues 218
    Using Basemap to plot geographic data 220
  Visualizing Graphs 221
    Developing undirected graphs 222
    Developing directed graphs 224

Part 4: Wrangling Data 227

Chapter 12: Stretching Python's Capabilities 229
  Playing with Scikit-learn 230
    Understanding classes in Scikit-learn 230
    Defining applications for data science 231
  Performing the Hashing Trick 234
    Using hash functions 235
    Demonstrating the hashing trick 235
    Working with deterministic selection 239
  Considering Timing and Performance 240
    Benchmarking with timeit 241
    Working with the memory profiler 244
  Running in Parallel on Multiple Cores 247
    Performing multicore parallelism 248
    Demonstrating multiprocessing 248

Chapter 13: Exploring Data Analysis 251
  The EDA Approach 252
  Defining Descriptive Statistics for Numeric Data 253
    Measuring central tendency 254
    Measuring variance and range 255
    Working with percentiles 256
    Defining measures of normality 257
  Counting for Categorical Data 259
    Understanding frequencies 259
    Creating contingency tables 261
  Creating Applied Visualization for EDA 261
Python for Data Science for Dummies