it's just part of the game 13
Applying data science to a subject area 13
Communicating data insights 14
Exploring Career Alternatives That Involve Data Science 15
The data implementer 16
The data leader 16
The data entrepreneur 17
Chapter 2: Tapping into Critical Aspects of Data Engineering 19
Defining Big Data and the Three Vs 19
Grappling with data volume 21
Handling data velocity 21
Dealing with data variety 22
Identifying Important Data Sources 23
Grasping the Differences among Data Approaches 24
Defining data science 25
Defining machine learning engineering 26
Defining data engineering 26
Comparing machine learning engineers, data scientists, and data engineers 27
Storing and Processing Data for Data Science 28
Storing data and doing data science directly in the cloud 28
Storing big data on-premises 32
Processing big data in real-time 35
Part 2: Using Data Science to Extract Meaning from Your Data 37
Chapter 3: Machine Learning Means Using a Machine to Learn from Data 39
Defining Machine Learning and Its Processes 40
Walking through the steps of the machine learning process 40
Becoming familiar with machine learning terms 41
Considering Learning Styles 42
Learning with supervised algorithms 42
Learning with unsupervised algorithms 43
Learning with reinforcement 43
Seeing What You Can Do 43
Selecting algorithms based on function 44
Using Spark to generate real-time big data analytics 48
Chapter 4: Math, Probability, and Statistical Modeling 51
Exploring Probability and Inferential Statistics 52
Probability distributions 53
Conditional probability with Naïve Bayes 55
Quantifying Correlation 56
Calculating correlation with Pearson's r 56
Ranking variable-pairs using Spearman's rank correlation 58
Reducing Data Dimensionality with Linear Algebra 59
Decomposing data to reduce dimensionality 59
Reducing dimensionality with factor analysis 63
Decreasing dimensionality and removing outliers with PCA 64
Modeling Decisions with Multiple Criteria Decision-Making 65
Turning to traditional MCDM 65
Focusing on fuzzy MCDM 67
Introducing Regression Methods 67
Linear regression 67
Logistic regression 69
Ordinary least squares (OLS) regression methods 70
Detecting Outliers 70
Analyzing extreme values 70
Detecting outliers with univariate analysis 71
Detecting outliers with multivariate analysis 73
Introducing Time Series Analysis 73
Identifying patterns in time series 74
Modeling univariate time series data 75
Chapter 5: Grouping Your Way into Accurate Predictions 77
Starting with Clustering Basics 78
Getting to know clustering algorithms 79
Examining clustering similarity metrics 81
Identifying Clusters in Your Data 82
Clustering with the k-means algorithm 82
Estimating clusters with kernel density estimation (KDE) 84
Clustering with hierarchical algorithms 84
Dabbling in the DBSCAN neighborhood 87
Categorizing Data with Decision Tree and Random Forest Algorithms 88
Drawing a Line between Clustering and Classification 89
Introducing instance-based learning classifiers 90
Getting to know classification algorithms 90
Making Sense of Data with Nearest Neighbor Analysis 93
Classifying Data with Average Nearest Neighbor Algorithms 94
Classifying with K-Nearest Neighbor Algorithms 97
Understanding how the k-nearest neighbor algorithm works 98
Knowing when to use the k-nearest neighbor algorithm 99
Exploring common applications of k-nearest neighbor algorithms 100
Solving Real-World Problems with Nearest Neighbor Algorithms 100
Seeing k-nearest neighbor algorithms in action 101
Seeing average nearest neighbor algorithms in action 101
Chapter 6: Coding Up Data Insights and Decision Engines 103
Seeing Where Python and R Fit into Your Data Science Strategy 104
Using Python for Data Science 104
Sorting out the various Python data types 106
Putting loops to good use in Python 109
Having fun with functions 110
Keeping cool with classes 112
Checking out some useful Python libraries 114
Using Open Source R for Data Science 120
Comprehending R's basic vocabulary 121
Delving into functions and operators 124
Iterating in R 127
Observing how objects work 129
Sorting out R's popular statistical analysis packages 131
Examining packages for visualizing, mapping, and graphing in R 133
Chapter 7: Generating Insights with Software Applications 137
Choosing the Best Tools for Your Data Science Strategy 138
Getting a Handle on SQL and Relational Databases 139
Investing Some Effort into Database Design 144
Defining data types 144
Designing constraints properly 145
Normalizing your database 145
Narrowing the Focus with SQL Functions 147
Making Life Easier with Excel 151
Using Excel to quickly get to know your data 152
Reformatting and summarizing with PivotTables 157
Automating Excel tasks with macros 158
Chapter 8: Telling Powerful Stories with Data 161
Data Visualizations: The Big Three 162
Data storytelling for decision makers 162
Data showcasing for analysts 163
Designing data art for activists 164
Designing to Meet the Needs of Your Target Audience 164
Step 1: Brainstorm (All about Eve) 165
Step 2: Define the purpose 166
Step 3: Choose the most functional visualization type for your purpose 166
Picking the Most Appropriate Design Style 167
Inducing a calculating, exacting response 167
Eliciting a strong emotional response 168
Selecting the Appropriate Data Graphic Type 170
Standard chart graphics 171
Comparative graphics 173
Statistical plots 176
Topology structures 179
Spatial plots and maps 180
Testing Data Graphics 183
Adding Context 184
Creating context with data 184
Creating context with annotations 185
Creating context with graphical elements 186
Part 3: Taking Stock of Your Data Science Capabilities 187
Chapter 9: Developing Your Business Acumen 189
Bridging the Business Gap 189
Contrasting business acumen with subject matter expertise 190
Defining business acumen 191
Traversing the Business Landscape 192
Seeing how data roles support the business in making money 192
Leveling up your business acumen 195
Fortifying your leadership skills 196
Surveying Use Cases and Case Studies 197
Documen