Basic Workflow Pattern
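A minimal sketch of the fit/predict/score cycle; the dataset and model here are illustrative choices, not prescribed by this guide:

```python
# Minimal fit/predict/score workflow on the iris toy dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)           # train
preds = model.predict(X_test)         # predictions
proba = model.predict_proba(X_test)   # class probabilities
acc = model.score(X_test, y_test)     # default metric (accuracy for classifiers)
```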
Core Methods
fit(X, y) → Train model
predict(X) → Generate predictions
predict_proba(X) → Class probabilities (for supported classifiers)
score(X, y) → Default metric
Accuracy for classifiers
R² for regressors
Classification Algorithms
Linear Models
Logistic Regression
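A sketch with the two key parameters set explicitly (the values and toy data are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
# Smaller C = stronger regularization (C is the inverse of regularization strength)
clf = LogisticRegression(C=0.5, penalty="l2", max_iter=1000).fit(X, y)
```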
Use for: Baseline classification, linearly separable data
Key Parameters:
C → Inverse of regularization (smaller = stronger regularization)
penalty → "l1" or "l2"
Linear SVM (LinearSVC)
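A minimal sketch on synthetic numeric data; for real text classification you would pair LinearSVC with a vectorizer such as TfidfVectorizer:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
```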
Use for: Large sparse datasets, text classification
Support Vector Machines (Kernel SVM)
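An RBF-kernel SVM sketched on a non-linearly separable toy dataset (parameter values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
```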
Use for: Non-linear decision boundaries
Key Parameters:
kernel → "linear", "rbf", "poly", "sigmoid"
C → Inverse of regularization strength (smaller = stronger regularization)
gamma → Controls curve complexity (RBF, poly, sigmoid)
Tree-Based Methods
Decision Tree Classifier
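A sketch showing the depth and leaf-size limits that control overfitting (values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Limit depth and leaf sizes to keep the tree interpretable and less overfit
clf = DecisionTreeClassifier(
    max_depth=3, min_samples_split=4, min_samples_leaf=2, random_state=0
).fit(X, y)
```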
Use for: Interpretable models, mixed feature types
Key Parameters:
max_depth
min_samples_split
min_samples_leaf
Random Forest Classifier
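A minimal sketch (tree count and depth are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0).fit(X, y)
```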
Use for: Strong baseline for tabular data
Key Parameters:
n_estimators → Number of trees
max_depth → Tree depth
Gradient Boosting Classifier
Use for: High performance on tabular data (slower than Random Forest)
ExtraTreesClassifier (Extremely Randomized Trees)
Note: Similar to Random Forest but uses more random splits → can reduce variance and improve speed.
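Both ensembles share the standard fit/predict interface; a side-by-side sketch on toy data (parameters illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)
# Boosting builds trees sequentially; learning_rate trades off per-tree contribution
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0).fit(X, y)
# Extra-Trees picks split thresholds at random, which adds variance reduction
et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
```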
KNN & Naive Bayes
K-Nearest Neighbors (KNN)
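Because KNN is distance-based, scale features first; one way is a scaler-plus-KNN pipeline (a sketch, with k chosen for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Scaling first keeps all features on comparable distance scales
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X, y)
```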
Use for: Small datasets, non-linear boundaries
Important: Sensitive to feature scaling; prediction becomes slow on large datasets.
Naive Bayes
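A sketch of both common variants: GaussianNB for numeric features, MultinomialNB for token counts (the tiny corpus below is invented for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Numeric features: Gaussian variant
X, y = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X, y)

# Text classification: vectorize to counts, then MultinomialNB
docs = ["free prize now", "meeting at noon", "win a free prize", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]
text_clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(docs, labels)
```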
Use for:
Very fast baseline
Text classification (MultinomialNB)
Regression Algorithms
Linear Regression Family
Linear Regression
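A minimal sketch on an exactly linear toy relationship:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # exactly y = 2x
reg = LinearRegression().fit(X, y)
```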
Use for: Baseline regression
Ridge Regression (L2)
Lasso Regression (L1)
ElasticNet (L1 + L2)
Use when: Many features may be irrelevant (the L1 term produces a sparse solution).
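The three regularized variants differ only in the penalty; a side-by-side sketch (alpha values illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5, random_state=0)
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)                    # L1: zeroes some coefficients out
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # blend of L1 and L2
```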
Support Vector Regression (SVR)
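An RBF-kernel SVR sketched on a smooth non-linear toy curve (C and epsilon are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel()
# epsilon sets the width of the no-penalty tube around predictions
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
```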
Use for: Non-linear regression (may be slow on large datasets)
Tree-Based Regression
DecisionTreeRegressor
RandomForestRegressor
GradientBoostingRegressor
Choose based on: Accuracy vs speed trade-off.
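All three share the same interface, so swapping them to compare the trade-off is cheap; a sketch on synthetic data (parameters illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, n_informative=5, random_state=0)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)      # fastest
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
boost = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                  random_state=0).fit(X, y)              # usually most accurate
```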
Clustering Algorithms
K-Means
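A sketch on well-separated blobs, including a silhouette score as one input to choosing n_clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sil = silhouette_score(X, km.labels_)  # higher is better; compare across n_clusters
```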
Use for: Well-separated spherical clusters
Select n_clusters using:
Elbow method
Silhouette score
Hierarchical & Density-Based
Agglomerative Clustering
DBSCAN
Use DBSCAN for:
Arbitrary-shaped clusters
Outlier detection
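A side-by-side sketch on the two-moons toy dataset, where density-based clustering shines (eps and min_samples are illustrative and dataset-dependent):

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
agg = AgglomerativeClustering(n_clusters=2).fit(X)
db = DBSCAN(eps=0.25, min_samples=5).fit(X)  # label -1 marks outliers
clusters = set(db.labels_) - {-1}            # cluster ids, excluding noise
```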
Dimensionality Reduction
PCA (Principal Component Analysis)
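A minimal sketch projecting a 4-feature dataset to two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()  # variance kept by the 2 components
```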
Use for:
Visualization
Removing redundancy
Speeding up models
t-SNE
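A minimal sketch producing a 2D embedding for plotting (perplexity is illustrative and must be smaller than the number of samples):

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```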
Use for: 2D/3D visualization of high-dimensional data
Model Selection & Evaluation
Train/Test Split
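A minimal sketch with a stratified 80/20 split (split ratio and seed are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```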
Tip: Use stratify=y for classification to preserve class balance.
Cross-Validation
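A minimal 5-fold sketch (estimator, cv, and scoring are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy"
)  # one score per fold
```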
Common Scoring Options:
"accuracy"
"f1"
"roc_auc"
"r2"
"neg_mean_squared_error"
Hyperparameter Tuning
Grid Search
Random Search
Use when: Parameter space is large (faster than full grid search).
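A side-by-side sketch of both searches (the grids, distributions, and n_iter are illustrative):

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Grid search: exhaustive over every combination in param_grid
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    cv=3,
).fit(X, y)

# Random search: samples n_iter candidates from a distribution; cheaper for large spaces
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 200)},
    n_iter=5, cv=3, random_state=0,
).fit(X, y)
```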
Preprocessing Cheats
Scaling & Normalization
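A sketch of the two most common scalers on a tiny made-up matrix:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per column
X_mm = MinMaxScaler().fit_transform(X)     # rescale each column to [0, 1]
```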
Important for:
SVM
KNN
Logistic Regression
Neural networks
Encoding Categorical Variables
Pipelines (Best Practice)
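A minimal scaler-plus-classifier pipeline (step names and the model are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scaler", StandardScaler()),                # fit on training data only inside CV
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
```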
Advantages:
Prevents data leakage
Combines preprocessing + model
Easy hyperparameter tuning
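Pipeline steps are tuned with the step__parameter naming convention; a sketch (step names and grid values illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
# "clf__C" targets the C parameter of the "clf" step
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3).fit(X, y)
```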
Quick “Which Algorithm Should I Try?” Guide
Classification
Start with: LogisticRegression or RandomForestClassifier
Complex boundaries / small-medium data: SVC
Text / sparse data: LinearSVC, MultinomialNB
Regression
Start with: LinearRegression or Ridge
Non-linear + tabular: RandomForestRegressor, GradientBoostingRegressor
Small, complex data: SVR
Clustering
Spherical clusters: KMeans
Arbitrary shapes & outliers: DBSCAN
Hierarchical view: AgglomerativeClustering
Additional Readings
To deepen your understanding of Scikit-Learn algorithms, preprocessing techniques, and model workflows, explore:
“Using Scikit-learn in Python for Machine Learning Tasks” — Library overview and workflows (AlmaBetter)
“Data Preprocessing with Scikit-Learn: A Tutorial” — Encoding, scaling, and feature engineering (AlmaBetter)
