Introduction to Machine Learning Models
Machine learning models (algorithms) can be classified in multiple ways. Though organizing these models is complex, several useful classification approaches help us understand them better.
Each classification method emphasizes different model characteristics and serves important purposes:
- Choose the appropriate model for specific problems (e.g., regression, classification, clustering)
- Understand the data requirements and underlying assumptions of each model
- Evaluate computational complexity and efficiency during training and inference
- Compare performance between similar models within the same category
- Facilitate interpretability and explainability of the model in its application context
However, too many classifications can create confusion rather than clarity. For this reason, we’ll focus on two main classifications while briefly summarizing the others.
These two main classification approaches for machine learning algorithms—the “paradigm-based” approach and the “problem-based” approach—are complementary. They examine two distinct yet equally crucial aspects of understanding these tools.
Machine Learning Models: A Paradigm-Based Approach
Supervised Learning
Supervised learning is one of the most common approaches in machine learning. This method trains a model on a dataset containing input features (independent variables) and corresponding target outcomes (dependent variables). The model’s task is to learn the relationship between these features and targets. Once trained, it can make predictions or classifications for new data where the target is unknown. This approach is powerful because it uses known outcomes to guide learning.
The following algorithms employ this approach:
- Linear and logistic regression
- Decision trees
- Support Vector Machines (SVMs)
- Neural networks
This type of algorithm is commonly used for visual recognition, medical diagnostics, and asset valuation.
Unsupervised Learning
In contrast to supervised learning, unsupervised learning uses datasets without predefined outcome information for model training.
The model’s task is to identify relationships and patterns within the provided data, grouping them into homogeneous sets.
This category includes:
- Clustering algorithms (e.g., K-Means and DBSCAN)
- Dimensionality reduction algorithms (e.g., PCA, t-SNE)
These techniques are applied in dimensionality reduction, customer segmentation, and exploratory data analysis.
Semi-supervised Learning
This type of learning occurs when only a portion of the available data contains target information, and obtaining the missing information would be too costly or complicated.
In such cases, classification is performed based on the limited available information.
Typically, the algorithms used are variations of unsupervised learning techniques.
Reinforcement Learning
This paradigm differs significantly from the previously discussed approaches. It involves an algorithm interacting with an environment, receiving rewards or punishments based on its actions. Learning is thus guided by the environment’s feedback to the algorithm’s decisions.
Examples of reinforcement learning algorithms include:
- Q-learning
- Deep Q Network
These algorithms find applications in robotic control, game strategy development, and management of dynamic environments such as portfolio optimization.
Self-supervised Learning
This approach is similar to unsupervised learning, but the algorithm creates its own pseudo-labels (also known as pseudo-targets or pseudo-outcomes).
Self-supervised learning is widely used in deep neural networks and includes models such as:
- BERT
- GPT
These models are primarily applied in natural language processing and image recognition tasks.
Generative Models
These sophisticated models are designed to learn the underlying patterns and distributions of training data, enabling them to create entirely new data samples that closely resemble the initial dataset. By capturing the essential characteristics and statistical properties of the input data, these models can generate synthetic but realistic outputs.
Examples of generative algorithms include:
- GANs (Generative Adversarial Networks), which use a competitive training process between a generator and discriminator to create highly realistic synthetic data
- VAEs (Variational Autoencoders), which learn compressed representations of data while maintaining the ability to generate new samples through probabilistic modeling
Generative models have revolutionized creative applications, finding extensive use in generating synthetic text content, creating photorealistic images, producing high-quality video sequences, and even composing music. Their ability to understand and replicate complex patterns has made them invaluable tools in various creative and practical applications.
Machine Learning Model: A Problem-based Approach
Regression
Regression algorithms predict continuous variables (such as prices, temperatures, and other values expressed on a continuous scale). These supervised algorithms are used for estimating numerical outcomes.
This group includes:
- Simple regression
- Polynomial regression
In the medical field, regression algorithms help predict continuous values—such as hospital stay duration and blood glucose levels.
Classification
Unlike regression, classification algorithms predict discrete, categorical variables. These supervised algorithms sort data into distinct categories.
This group includes:
- Support Vector Machines (SVM)
- Decision trees
Classification algorithms are widely used in clinical settings, particularly in automated diagnosis systems that analyze radiological images to distinguish between benign and malignant tumors.
Ensemble
Ensemble algorithms form a unique category, applicable to regression and classification problems. They improve predictions by combining multiple models.
Examples include:
- Bagging algorithms (e.g., Random Forest)
- Boosting algorithms (e.g., Gradient Boosting Machine, XGBoost)
Ensemble models find extensive applications in clinical settings, particularly in enhancing clinical decision support systems and risk assessment.
Unsupervised
These algorithms are identical to the unsupervised category discussed in the previous classification. They work with unlabeled data to discover patterns and structures.
Unsupervised algorithms help identify patient subgroups with similar characteristics—for example, grouping patients with metabolic syndrome or stratifying cancer patients based on tumor characteristics.
In the “problem-based” classification, regression, classification, and ensemble categories fall under the Supervised Learning category of the “paradigm-based” classification. The “unsupervised” category, however, remains consistent across both classification methods.
Summary of “paradigm-based” – “problem-based” classifications
By Paradigm | By Problem |
---|---|
Supervised Learning | Regression |
Unsupervised Learning | Classification |
Semi-supervised Learning | Ensemble |
Reinforcement Learning | Unsupervised |
Self-supervised Learning | |
Generative Models |
Alternative Classifications of Machine Learning Models
As mentioned earlier, there are numerous other classifications of machine learning algorithms. However, these additional classifications don’t significantly enhance our understanding beyond what we’ve already explored in depth.
Nevertheless, these alternative classifications can be valuable in specific contexts. They often highlight particular requirements or considerations, such as the type of data being processed or the computational resources needed.
Let’s briefly recap additional classification approaches:
- Model type: probabilistic, linear, non-linear, distance-based, and generative
- Computational complexity: lightweight or intensive
- Data type: tabular, sequential, or unstructured
- Interpretability: interpretable or black-box
- Optimization method: gradient-based, heuristic, or exact
- Training mode: batch, online, or incremental learning
- Application domain: computer vision, natural language processing, recommendation systems, or anomaly detection
Further Reading
Scikit-learn: choosing the right estimator
Types of Machine Learning – GeeksforGeeks
Machine Learning Algorithms Cheat Sheet: Types, Applications…
Conclusion
The machine learning model classifications presented here provide complementary perspectives for understanding and organizing these powerful tools. The paradigm-based classification illuminates how models learn—whether through supervised, unsupervised, or reinforcement learning—while the problem-based classification helps us select the most appropriate model for specific tasks like regression or classification. This dual framework enables us to identify the ideal model from a broad range for our specific analytical needs.
This course will be structured according to the “problem-based” classification when describing various machine learning systems.