A Quick Introduction to Machine Learning
The Steps of a data journey – situating machine learning
This is a visual representation of the data journey, from collecting the data, to exploring, cleaning, describing and understanding the data, to analyzing the data, and lastly, to communicating to others the story the data tell.
Step 1: Define, find, gather
The first step is to get data, either by using a pre-established database or by establishing which variables are needed and then creating and implementing a collection method. Security measures should be established and implemented to protect the integrity of the data once it has been collected.
Step 2: Explore, clean, describe
Data should be explored to understand the format and variables, and checked for errors and missing values. It may be necessary to clean the data before using it for analysis. Cleaning includes correcting formatting, removing or correcting erroneous data, or something as simple as taking out extra spaces. It is important to document what you found and what you did to clean the data.
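As a rough illustration of this step, the Python sketch below checks a handful of records for missing and erroneous values, removes extra spaces, and keeps a log of every change. The field names (`name`, `age`), the 0 to 120 age range and the sample records are all assumptions invented for this example; they are not part of the course material.

```python
def as_text(value):
    """Turn a raw field into a stripped string; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def clean_records(records):
    """Clean a list of dicts with hypothetical 'name' and 'age' fields.

    Returns (cleaned_rows, log), where log documents what was found and changed.
    """
    cleaned, log = [], []
    for i, row in enumerate(records):
        # Take out extra spaces: trim the ends and collapse repeated spaces.
        name = " ".join(as_text(row.get("name")).split())
        raw_age = as_text(row.get("age"))

        if not name:
            # A record without a name cannot be used, so remove it.
            log.append(f"row {i}: dropped, missing name")
            continue
        if name != row.get("name"):
            log.append(f"row {i}: removed extra spaces from name")

        if not raw_age:
            age = None
            log.append(f"row {i}: age missing, kept as None")
        elif raw_age.isdigit() and int(raw_age) <= 120:
            age = int(raw_age)  # correct the format: text to integer
        else:
            age = None  # assumed rule: ages outside 0 to 120 are errors
            log.append(f"row {i}: invalid age {raw_age!r} set to None")

        cleaned.append({"name": name, "age": age})
    return cleaned, log


# Made-up sample data, for illustration only.
sample = [
    {"name": "  ada   lovelace ", "age": "36"},
    {"name": "Alan Turing", "age": "-5"},
    {"name": "", "age": "40"},
    {"name": "Grace Hopper", "age": None},
]
cleaned, log = clean_records(sample)
for entry in log:
    print(entry)
```

The log is the point of the design: it records what was found and what was done, so the cleaning can be described alongside the data.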
Step 3: Analyze, model
The purpose of analysis and modeling is to use statistical techniques to turn the data into information that provides meaningful insights. Analysis and modeling are used to describe a phenomenon, draw conclusions about a population, or make predictions about future events.
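To make "make predictions" concrete, here is a minimal Python sketch that fits a straight line to a few points by ordinary least squares and uses it to predict a new value. It is only one simple example of a statistical technique, chosen for illustration. The function names and the numbers are made up for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept


# Made-up observations, for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(predict(5, slope, intercept))
```

The fitted slope and intercept describe the relationship in the observed data, and `predict` applies that description to a value that was not observed.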
Step 4: Tell the story
The statistical information that comes from analysis and modeling is easier to digest if it is presented in some sort of story. It could be a research paper, an infographic, an article for the media, or some combination of these and other data presentation methods.
Foundation: stewardship, metadata, standards and quality
In order to successfully follow the steps of the data journey, it is essential to build your work on a solid foundation of stewardship, metadata, standards and quality.
Stewardship encompasses all activities to govern, safeguard and protect data.
Metadata should describe all the processing and manipulation that the data has undergone.
Standard methods, practices and classifications should be applied throughout.
Quality should be proactively managed throughout the process and relevant quality indicators should accompany all deliverables.
This video course will focus primarily on how machine learning can be used at the find, gather and protect step in the data journey to search through data and find only the parts that are needed. It can also be used at the explore, clean and describe step to reveal what is in the data. Finally, machine learning can be used at the analyze and model step to find relationships between variables and predict outcomes or future events.