Although linear regression models, both simple and multiple, are widely used, they have significant limitations and must meet specific assumptions. When these assumptions aren’t met, nonlinear regression models become necessary.
Logistic Regression
The dependent variable in logistic regression models is categorical, taking either binary values (such as yes/no, success/failure) or multiple classes (like low/medium/high risk categories).
Unlike traditional linear regression, which predicts continuous outcomes, logistic regression estimates the probability that an observation belongs to a particular category.
This fundamental difference in the target variable transforms what might initially appear to be a regression problem into a classification task, where the goal is to assign observations to discrete categories rather than to predict a continuous value.
The mathematical framework underlying logistic regression employs a logistic function to keep predictions bounded between 0 and 1, making it particularly suitable for probability estimation.
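In standard notation, the logistic (sigmoid) function maps a linear combination of the predictors onto a probability between 0 and 1:

$$
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
$$

The linear combination in the exponent is the log-odds (logit) of the outcome, which is why the linearity assumption below refers to the logit scale rather than to the outcome itself.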
Logistic regression has several key requirements:
Independence of observations: Each observation must be independent of others.
Linear relationship between independent variables and the logit of the dependent variable: While logistic regression predicts binary outcomes, the independent variables must have a linear relationship with the log-odds of the dependent variable.
Absence of multicollinearity: Independent variables should not be highly correlated with each other (a quick check is sketched after this list).
Sufficient sample size: The dataset must be large enough to produce reliable estimates.
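The multicollinearity assumption can be screened with variance inflation factors (VIF). A minimal sketch using statsmodels, assuming X is a pandas DataFrame holding only the predictor columns:

# Variance inflation factors (VIF); X is assumed to be a DataFrame of predictors
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_const = sm.add_constant(X)  # add an intercept column so the VIFs are computed correctly
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X.columns, name='VIF'
)
print(vif)  # values above roughly 5-10 are usually taken as a warning sign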
Polynomial Regression
This specialized model type is employed in scenarios where the relationships between variables exhibit non-linear patterns and cannot be adequately captured by simple linear equations. By incorporating polynomial terms of different degrees, this regression approach can model curved relationships and complex interactions between variables.
The flexibility of polynomial regression allows it to fit data that follows quadratic, cubic, or higher-order patterns, making it particularly valuable when analyzing relationships that show systematic curvature or multiple turning points in their trajectory.
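For a single predictor x, a polynomial model of degree d takes the form:

$$
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d + \varepsilon
$$

The model remains linear in its coefficients, so it can still be fitted with ordinary least squares once the polynomial terms have been generated (as we do later with PolynomialFeatures).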
Key statistical assumptions for polynomial regression:
Correct functional relationship: The polynomial model must accurately reflect the relationship between independent and dependent variables.
Independence of observations: Each observation must be independent of all others.
Homogeneity of variance (homoscedasticity): Error variance must remain constant across all levels of independent variables.
Normality of errors: The errors must follow a normal distribution.
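The last two assumptions are usually checked by inspecting the residuals of the fitted model. A minimal sketch, where model, X_train, and y_train are placeholder names for a fitted regression model and its training data:

# Residual diagnostics (sketch); model, X_train and y_train are placeholder names
import matplotlib.pyplot as plt
from scipy.stats import shapiro
fitted = model.predict(X_train)
residuals = y_train - fitted
# Homoscedasticity: residuals plotted against fitted values should show no funnel shape
plt.scatter(fitted, residuals, alpha=0.5)
plt.axhline(0, color='red')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()
# Normality: Shapiro-Wilk test (a large p-value is consistent with normally distributed errors)
stat, p_value = shapiro(residuals)
print('Shapiro-Wilk p-value:', p_value)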
Ridge Regression
This advanced variant of linear regression incorporates L2 regularization techniques to address common challenges in statistical modeling.
By implementing a specialized penalty term in the optimization process, Ridge regression effectively manages and reduces the impact of correlated predictor variables within the dataset. This mathematical adjustment is particularly valuable in scenarios where multiple features might exhibit strong relationships with each other, as it helps prevent the model from becoming overly complex and susceptible to overfitting.
The regularization approach shrinks the coefficients of less important features toward zero, without ever setting them exactly to zero, resulting in a more stable and interpretable model that maintains good predictive performance while avoiding the pitfalls of excessive model complexity.
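In its usual formulation, Ridge adds an L2 penalty on the coefficients to the least-squares objective, with the parameter α controlling how strongly they are shrunk:

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 + \alpha \sum_{j=1}^{p} \beta_j^2
$$

Larger values of α produce stronger shrinkage, while α = 0 recovers ordinary least squares.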
Lasso Regression
This regression technique implements L1 regularization to perform both parameter estimation and variable selection simultaneously.
Through the introduction of a specific penalization term in the objective function, Lasso (Least Absolute Shrinkage and Selection Operator) regression effectively identifies and eliminates less influential predictors by forcing their coefficients exactly to zero.
This characteristic makes it particularly valuable in high-dimensional datasets where feature selection is crucial for model interpretability and computational efficiency.
Unlike other regression methods that merely reduce the impact of less important variables, Lasso’s unique mathematical properties enable it to completely remove irrelevant features from the final model, resulting in a more streamlined and focused analytical approach.
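In the same notation, the Lasso replaces the squared L2 penalty with an absolute-value (L1) penalty, which is what allows coefficients to become exactly zero:

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 + \alpha \sum_{j=1}^{p} |\beta_j|
$$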
ElasticNet Regression
This regression technique is a hybrid approach that combines the strengths of both L1 (Lasso) and L2 (Ridge) regularization. By applying the two penalties simultaneously, ElasticNet addresses several modeling challenges within a single framework.
The method carefully balances the feature selection capabilities inherent in L1 regularization with the coefficient shrinkage properties of L2 regularization, resulting in a more robust and flexible modeling approach.
This balanced combination proves particularly valuable when dealing with datasets characterized by complex predictor variable relationships, as it helps mitigate potential overfitting issues while maintaining the ability to identify and retain the most relevant features for prediction.
The methodology allows for fine-tuning of the regularization mix through adjustable parameters, enabling analysts to optimize the trade-off between model simplicity and predictive accuracy according to specific analytical requirements.
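The mix is controlled by a parameter ρ (called l1_ratio in scikit-learn), which sets the balance between the L1 and L2 terms of the penalty:

$$
\text{penalty}(\beta) = \alpha \left( \rho \sum_{j=1}^{p} |\beta_j| + \frac{1-\rho}{2} \sum_{j=1}^{p} \beta_j^2 \right)
$$

With ρ = 1 the penalty reduces to the Lasso, and with ρ = 0 to a pure L2 (Ridge-type) penalty.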
Lasso, Ridge, and ElasticNet regressions share these key assumptions:
Linearity: A linear relationship between independent variables and the dependent variable is assumed.
Independence of observations: The observations must be independent.
Normal distribution of errors: The errors should be normally distributed.
Homogeneity of error variance: The variance of errors should be constant.
Several machine learning algorithms typically used for classification can also be adapted for regression problems, including Support Vector Machines (SVM), Random Forests, and Neural Networks.
These models will be explored in detail in the following sections.
Cross-Validation in Regression Models
Cross-validation is a fundamental technique for evaluating regression model performance and for tuning hyperparameters, such as the regularization coefficients in Ridge, Lasso, and ElasticNet models.
The process involves dividing the dataset into multiple subsets (folds). During each iteration, one subset serves as the test set while the others are used for training. This rotation continues until each subset has acted as the test set exactly once. By averaging the evaluation metrics from all iterations, cross-validation provides a more robust estimate of model performance.
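As a minimal scikit-learn sketch (X and y here stand for any feature matrix and continuous target), k-fold cross-validation and CV-based selection of the Ridge regularization strength could look like this:

# 5-fold cross-validation of a Ridge model and CV-based choice of alpha (sketch)
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import KFold, cross_val_score
cv = KFold(n_splits=5, shuffle=True, random_state=0)
# Averaging R2 across the five folds gives a more stable performance estimate
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring='r2')
print('Mean CV R2:', np.mean(scores))
# RidgeCV refits the model for each candidate alpha and keeps the best-scoring one
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=cv).fit(X, y)
print('Selected alpha:', ridge_cv.alpha_)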
Nonlinear Regression Models: Implementation in Python
To demonstrate the nonlinear regression models discussed above, we create a synthetic dataset of patient features and analyze it with each model.
Let’s begin by importing the necessary libraries:
# import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression, Ridge, Lasso, ElasticNet, LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, r2_score
Next, we create a synthetic dataset containing two outcomes: a categorical one (mortality) and a continuous one (length of stay). We then separate the features from the outcomes (X and y), normalize the features, and split the dataset into training and test groups.
# Generate synthetic data for nonlinear regression
# Clinical dataset simulation
np.random.seed(42)
n = 500
age = np.random.normal(65, 10, n)
bmi = np.random.normal(27, 4, n)
creatinine = np.random.normal(1.1, 0.3, n)
comorbidity_index = np.random.randint(0, 5, n)
# Binary outcome: mortality (with distribution check)
logit = -5 + 0.05*age + 0.2*bmi + 1.5*creatinine + 0.8*comorbidity_index
prob_mortality = 1 / (1 + np.exp(-logit))
mortality = np.random.binomial(1, prob_mortality)
# Verify that there are at least two classes
while len(np.unique(mortality)) < 2:
    mortality = np.random.binomial(1, prob_mortality)
# Continuous outcome: length of stay
length_of_stay = 5 + 0.1*age + 0.3*(bmi**2) + np.random.normal(0, 2, n)
# DataFrame
df = pd.DataFrame({
    'Age': age,
    'BMI': bmi,
    'Creatinine': creatinine,
    'ComorbidityIndex': comorbidity_index,
    'Mortality': mortality,
    'LengthOfStay': length_of_stay
})
X = df[['Age', 'BMI', 'Creatinine', 'ComorbidityIndex']]
y_class = df['Mortality']
y_reg = df['LengthOfStay']
# Standardization
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Stratified train/test split for classification
X_train, X_test, y_train_class, y_test_class = train_test_split(
    X_scaled, y_class, test_size=0.3, random_state=0, stratify=y_class
)
# Parallel split for regression
_, _, y_train_reg, y_test_reg = train_test_split(X_scaled, y_reg, test_size=0.3, random_state=0)
With our dataset prepared, we can now implement our nonlinear regression models.
### LOGISTIC REGRESSION
log_model = LogisticRegression()
log_model.fit(X_train, y_train_class)
y_pred_log = log_model.predict(X_test)
print("Logistic Regression Accuracy:", accuracy_score(y_test_class, y_pred_log))
print("ROC AUC:", roc_auc_score(y_test_class, log_model.predict_proba(X_test)[:, 1]))
# Logistic Regression Accuracy: 0.9933333333333333
# ROC AUC: 0.8053691275167785
For logistic regression, we use these key metrics:
Accuracy: The percentage of correct predictions.
ROC AUC: The area under the ROC curve, ranging from 0.5 (random guessing) to 1.0 (perfect prediction).
Because the simulated mortality outcome is heavily imbalanced, accuracy alone can look deceptively high, so ROC AUC is the more informative of the two metrics here.
For polynomial regression, we implement a quadratic (degree 2) model to capture the nonlinear relationship between BMI and Length of Stay.
Using PolynomialFeatures with degree 2, we generate both BMI and BMI² terms before applying the regression model.
### POLYNOMIAL REGRESSION
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(df[['BMI']])
X_poly_train, X_poly_test, y_poly_train, y_poly_test = train_test_split(X_poly, y_reg, test_size=0.3, random_state=0)
poly_model = LinearRegression()
poly_model.fit(X_poly_train, y_poly_train)
y_poly_pred = poly_model.predict(X_poly_test)
print("Polynomial Regression R2:", r2_score(y_poly_test, y_poly_pred))
plt.scatter(df['BMI'], y_reg, alpha=0.3, label='True')
plt.scatter(df['BMI'], poly_model.predict(X_poly), color='red', alpha=0.5, label='Predicted')
plt.xlabel('BMI')
plt.ylabel('Length of Stay')
plt.title('Polynomial Regression (Degree 2)')
plt.legend()
plt.show()
# Polynomial Regression R2: 0.9988122026503744
For polynomial regression, we use R² (R-squared) as a metric that measures how much variance in length of stay is explained by BMI. This value ranges from 0 (indicating the model explains none of the variance) to 1 (indicating the model explains all of the variance).
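Formally, R² compares the residual sum of squares with the total sum of squares:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

This is the definition implemented by scikit-learn's r2_score.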

Ridge Regression models the four features linearly to predict the Length of Stay outcome.
It uses L2 penalization to reduce overfitting by shrinking the coefficients, which is particularly helpful when predictor variables are correlated.
The model uses a regularization factor (alpha = 1) to control the strength of this penalization.
In our example, as in the following models, the R² is extremely low, suggesting the features contribute very little to explaining the variance in the target variable.
### RIDGE REGRESSION
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train_reg)
y_ridge_pred = ridge.predict(X_test)
print("Ridge Regression R2:", r2_score(y_test_reg, y_ridge_pred))
# Ridge Regression R2: 0.00926429710537402
Lasso Regression employs L1 penalization to eliminate less significant variables.
### LASSO REGRESSION
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train_reg)
y_lasso_pred = lasso.predict(X_test)
print("Lasso Regression R2:", r2_score(y_test_reg, y_lasso_pred))
# Lasso Regression R2: 0.009750680411254486
ElasticNet Regression applies both L1 and L2 penalties. Setting l1_ratio to 0.5 gives the L1 and L2 components equal weight in the mix.
### ELASTICNET REGRESSION
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic.fit(X_train, y_train_reg)
y_elastic_pred = elastic.predict(X_test)
print("ElasticNet Regression R2:", r2_score(y_test_reg, y_elastic_pred))
# ElasticNet Regression R2: 0.009368200844887875
Overview of Nonlinear Regression Models
| Model | Type | Main Metrics | Interpretation |
|---|---|---|---|
| Logistic Regression | Classification | Accuracy, ROC AUC | Binary classification performance |
| Polynomial Regression | Nonlinear regression | R² + plot | Captures the quadratic curve |
| Ridge | Regression | R² | Controls overfitting (L2 penalty) |
| Lasso | Regression | R² | Performs variable selection (L1 penalty) |
| ElasticNet | Regression | R² | Combines L1 and L2 penalties |
Conclusion
Nonlinear regression models provide powerful tools for uncovering complex relationships that traditional linear methods cannot capture. In the medical field, where outcomes depend on intricate interactions and thresholds, using only linear assumptions can produce misleading conclusions.
The key takeaway: there is no one-size-fits-all model.
Each regression type has its own strengths and limitations that depend on the data structure, clinical question, and required level of interpretability.
In clinical data science, it’s crucial not only to find the best predictive model but also to select one that aligns with medical reasoning, withstands noise, and leads to actionable decisions.
When domain knowledge meets statistical techniques, nonlinear regressions transform from mere mathematical tools into instruments of insight.