Supervised/Unsupervised Learning Defined

AI has two main approaches to programming intelligent machines: supervised learning and unsupervised learning. Supervised learning requires data with defined input/output relationships (labeled data). The process is comparable to being taught by a supervisor or teacher. The resulting model uses what it has learned to predict outcomes from new input data. Over time, the model must be maintained to ensure that the labeled data remains both current and complete.

Unsupervised machine learning requires no supervision. Using this approach, the model works on its own to infer information from unlabeled data. No information about the outputs is provided; the model identifies patterns from the data itself. This approach can support more complex processing tasks than supervised learning, but its results can be more unpredictable than those of other learning methods.

Supervised Learning Advantages

Supervised learning takes advantage of data collected or developed from existing experience. This provides a basis for optimizing a model’s performance. Supervised learning is valuable in addressing real-world computational problems, such as predicting a numeric outcome or assigning an item to a category.

Unsupervised Learning Advantages

Unsupervised learning is adept at discovering previously unknown patterns in data. Pattern identification can occur in real time, as the data is being analyzed. Because it works with unlabeled data, unsupervised learning does not require the data-labeling effort.

Supervised Learning in Action

Supervised learning trains the machine to complete a task. Suppose you wanted to predict how many games a pitcher will win in an upcoming season using prior-year performance. The process requires collecting a data set of pitching performance by pitcher. Example data could be:

  • Games won
  • Games lost
  • Strikeouts
  • Walks
  • Ground balls
  • Fly balls
  • Home runs
  • Runs allowed

These inputs would be collected for each pitcher, and the model would determine the output: the number of games won in the upcoming season.

The labeled data defines a training data set used as the input for training the model. The model may conclude that more strikeouts and fewer walks are desirable, and similarly that more ground balls and fewer fly balls are desirable. The learning process takes this training data, isolates attributes, and develops one or more algorithms, which become the model.
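
As a rough illustration of the training step described above, the following Python sketch fits a one-variable least-squares line to a handful of labeled records and then predicts an output for a new input. Everything specific in it is an assumption made for illustration: the numbers are invented (they are not real pitching statistics), the field names strikeouts and walks are hypothetical, and reducing each pitcher to a single strikeouts-per-walk feature is only one of many possible modeling choices.

```python
# Toy labeled data: every number here is invented for illustration only.
# Each record pairs prior-season inputs with the known output (the label):
# games won in the following season.
training_data = [
    ({"strikeouts": 90, "walks": 60}, 7),
    ({"strikeouts": 120, "walks": 60}, 11),
    ({"strikeouts": 150, "walks": 60}, 12),
    ({"strikeouts": 180, "walks": 60}, 13),
    ({"strikeouts": 200, "walks": 50}, 18),
]


def feature(inputs):
    """Reduce a pitcher's inputs to one number: strikeouts per walk."""
    return inputs["strikeouts"] / inputs["walks"]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training: learn the relationship between inputs and labeled outputs.
xs = [feature(inputs) for inputs, _ in training_data]
ys = [wins for _, wins in training_data]
slope, intercept = fit_line(xs, ys)


def predict_wins(inputs):
    """Apply the trained model to inputs it has not seen before."""
    return slope * feature(inputs) + intercept


# Prediction: a new (hypothetical) pitcher with no label.
new_pitcher = {"strikeouts": 160, "walks": 50}
print(f"Predicted wins: {predict_wins(new_pitcher):.1f}")
```

With this toy data the fitted slope is positive, which mirrors the idea that the model may conclude that more strikeouts and fewer walks are desirable. A realistic model would use more of the listed inputs and far more records.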

Unsupervised Learning in Action

Unsupervised learning uses data with no labels. As an example, suppose you went to a baseball game with no idea how the game is played. You would watch and make observations to develop an understanding of the game. You would notice:

  • There are 9 players on the field
  • Each team takes a turn putting 9 players on the field while the other team’s players take turns hitting the ball
  • If the batter swings and misses three times, the next batter comes up
  • When three players on the batting team have each swung and missed three times, the team in the field gets to bat
  • And so forth

You would be learning baseball without any assistance. The learning would have occurred by identifying patterns that were not previously known.
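
The same idea of finding structure without labels can be sketched in code. The example below uses k-means clustering, which is one common unsupervised technique and is not named in the text above; it is chosen here only as an illustration. The data points are invented (strikeouts, walks) pairs, the number of groups (2) is an assumption, and the starting centroids are simply the first two points so that the run is repeatable.

```python
from math import dist

# Unlabeled toy data: (strikeouts, walks) for six hypothetical pitchers.
# All numbers are invented for illustration; note there is no output column.
points = [(210, 40), (95, 70), (200, 45), (100, 65), (220, 38), (90, 75)]


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, iterations=10):
    """Basic k-means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # simple deterministic start (an assumption)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            # New centroid is the mean of its members; keep the old one if empty.
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, assign(points, centroids)


centroids, clusters = k_means(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The algorithm is never told what the groups mean. It only discovers that the points fall into two clusters; deciding that one cluster looks like "high strikeouts, few walks" is an interpretation a person adds afterward.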

Summary – Supervised vs. Unsupervised Learning

The two learning methods differ in how they use data. Input data is labeled for supervised learning and unlabeled for unsupervised learning. Supervised learning uses the output data to learn, and then predicts outputs for new inputs. Unsupervised learning does not use output data. Supervised learning is the simpler method, with learning typically performed offline, whereas unsupervised learning is computationally more complex and can occur in real time. The major drawback of unsupervised learning is that, without labels, complete information on data grouping and output data is not available. Supervised learning, in turn, requires that the data be classified (labeled) in advance. Supervised learning is considered a trusted process with accurate results, whereas unsupervised learning is more unpredictable.