Gain essential skills in today’s digital age to store, process and analyse data to inform business decisions. In this course, part of the Big Data MicroMasters program, you will develop your knowledge of big data analytics and enhance your programming and mathematical skills. You will learn to use essential analytic tools such as Apache Spark and R.
By the end of this course, you will be able to approach large-scale data science problems with creativity and initiative.
Section 1: Simple linear regression
Fit a simple linear regression between two variables in R; Interpret output from R; Use models to predict a response variable; Validate the assumptions of the model.
Section 2: Modelling data
Adapt the simple linear regression model in R to deal with multiple variables; Incorporate continuous and categorical variables in your models; Select the best-fitting model by inspecting the R output.
Section 3: Many models
Manipulate nested data frames in R; Use R to apply simultaneous linear models to large data frames by stratifying the data; Interpret the output of learner models.
Section 4: Classification
Adapt linear models to handle a categorical response variable; Implement logistic regression (LR) in R; Implement generalised linear models (GLMs) in R; Implement linear discriminant analysis (LDA) in R.
Section 5: Prediction using models
Implement the principles of building a model to do prediction using classification; Split data into training and test sets, perform cross-validation and compute model evaluation metrics; Use model selection for explaining data with models; Analyse overfitting and the bias-variance trade-off in prediction problems.
Section 6: Getting bigger
Set up and apply sparklyr; Use logical verbs in R by applying native sparklyr versions of the verbs.
Section 7: Supervised machine learning with sparklyr
Apply sparklyr to machine learning regression and classification models; Use machine learning models for prediction; Illustrate how distributed computing techniques can be used for “bigger” problems.
Section 8: Deep learning
Use massive amounts of data to train multi-layer networks for classification; Understand some of the guiding principles behind training deep networks, including the use of autoencoders, dropout, regularization, and early termination; Use sparklyr and H2O to train deep networks.
Section 9: Deep learning applications and scaling up
Understand some of the ways in which massive amounts of unlabelled and partially labelled data are used to train neural network models; Leverage existing trained networks for targeting new applications; Implement architectures for object classification and object detection and assess their effectiveness.
Section 10: Bringing it all together
Consolidate your understanding of the relationships between the methodologies presented in this course, including their relative strengths, weaknesses and range of applicability.