# Simple-ML

A simple, robust JavaScript machine learning library built from scratch with no external dependencies. Simple-ML provides easy-to-use implementations of popular machine learning algorithms for regression, classification, clustering, and data preprocessing.

## 🎯 Philosophy

- **Simplicity**: Intuitive and consistent API
- **Robustness**: Rigorous input validation and edge case handling
- **Performance**: Optimized pure JavaScript implementations
- **Modularity**: Clear organizational structure

## 📦 Installation

### Node.js / NPM

```bash
npm install simple-ml
```

### Browser (via CDN)

```html
```

### ES Modules (Modern Browsers)

```html
```

## 🚀 Quick Start

```javascript
import { LinearRegression, trainTestSplit } from 'simple-ml';

// Prepare your data (use at least 10-20 samples for reliable results)
const X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];

// Split into training and test sets
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.2 });

// Create and train model
const model = new LinearRegression();
model.fit(XTrain, yTrain);

// Make predictions
const predictions = model.predict(XTest);

// Evaluate model
const score = model.score(XTest, yTest);
console.log('R² Score:', score); // Close to 1.0 for perfect fit
```

---

## 📚 Complete API Reference with Examples

### 1. Regression Algorithms

#### 1.1 Linear Regression

Ordinary Least Squares Linear Regression.

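
**How the fit works (illustrative sketch):**

For intuition, the single-feature case of ordinary least squares has a closed form: the slope is the covariance of `x` and `y` divided by the variance of `x`, and the intercept follows from the two means. The sketch below is a standalone illustration and not Simple-ML's internal code; `fitSimpleOls` is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of Simple-ML): closed-form OLS for one feature.
function fitSimpleOls(x, y) {
  const n = x.length;
  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;

  let sxy = 0; // sum of (x - meanX) * (y - meanY)
  let sxx = 0; // sum of (x - meanX)^2
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  return { slope, intercept };
}

const olsFit = fitSimpleOls([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
console.log(olsFit); // { slope: 2, intercept: 0 }
```

With several features the same idea is expressed with matrices (the normal equations), which is presumably what a full implementation solves.
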
```javascript
import { LinearRegression } from 'simple-ml';

// Create model with options
const model = new LinearRegression({
  fitIntercept: true, // Whether to calculate intercept (default: true)
  normalize: false    // Whether to normalize features (default: false)
});

// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2, 4, 6, 8, 10];

// Fit the model
model.fit(X, y);

// Access model parameters
console.log('Coefficients:', model.coefficients); // [2.0]
console.log('Intercept:', model.intercept); // 0.0

// Make predictions
const predictions = model.predict([[6], [7]]);
console.log('Predictions:', predictions); // [12, 14]

// Evaluate model (R² score)
const score = model.score(X, y);
console.log('R² Score:', score); // 1.0 (perfect fit)
```

**Multiple Features Example:**

```javascript
// Multiple features
const X = [
  [1, 2],
  [2, 3],
  [3, 4],
  [4, 5]
];
const y = [5, 8, 11, 14];

const model = new LinearRegression();
model.fit(X, y);

// [1.0, 2.0] is one exact solution. The second feature is always the first
// plus 1, so the features are collinear and the coefficients are not unique.
console.log('Coefficients:', model.coefficients);
console.log('Intercept:', model.intercept);

const pred = model.predict([[5, 6]]);
console.log('Prediction:', pred); // [17]
```

#### 1.2 Ridge Regression

Linear Regression with L2 regularization.

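
**What the L2 penalty does (illustrative sketch):**

To see how the penalty shrinks coefficients, consider one feature with an unpenalized intercept. Minimizing the squared error plus `alpha * slope²` gives `slope = Sxy / (Sxx + alpha)`, so a larger `alpha` pulls the slope toward zero. This is a sketch under that assumed objective; Simple-ML may scale the penalty differently, and `fitSimpleRidge` is a hypothetical name.

```javascript
// Hypothetical helper (not part of Simple-ML): one-feature ridge regression.
// Assumed objective: sum((y - intercept - slope * x)^2) + alpha * slope^2
function fitSimpleRidge(x, y, alpha) {
  const n = x.length;
  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;

  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }

  // alpha enters only the denominator, which shrinks the slope.
  const slope = sxy / (sxx + alpha);
  return { slope, intercept: meanY - slope * meanX };
}

const xs = [1, 2, 3, 4, 5];
const ys = [2, 4, 6, 8, 10];
const unpenalized = fitSimpleRidge(xs, ys, 0);
const penalized = fitSimpleRidge(xs, ys, 10);
console.log(unpenalized.slope); // 2
console.log(penalized.slope);   // 1
```

With `alpha = 0` the result matches ordinary least squares; with `alpha = 10` the slope is halved on this data because `Sxx` is also 10.
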
```javascript
import { RidgeRegression } from 'simple-ml';

// Create Ridge model
const ridge = new RidgeRegression({
  alpha: 1.0, // Regularization strength (default: 1.0)
  fitIntercept: true,
  normalize: false
});

// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2.1, 3.9, 6.2, 7.8, 10.1];

// Fit model
ridge.fit(X, y);

console.log('Coefficients:', ridge.coefficients);
console.log('Intercept:', ridge.intercept);

// Make predictions
const predictions = ridge.predict([[6], [7]]);
console.log('Predictions:', predictions);

// Evaluate
const score = ridge.score(X, y);
console.log('R² Score:', score);
```

**Tuning Alpha Example:**

```javascript
// Compare different alpha values (reuses X and y from the example above)
const alphas = [0.1, 1.0, 10.0, 100.0];

alphas.forEach(alpha => {
  const model = new RidgeRegression({ alpha });
  model.fit(X, y);
  const score = model.score(X, y);
  console.log(`Alpha ${alpha}: R² = ${score.toFixed(4)}`);
});
```

#### 1.3 Lasso Regression

Linear Regression with L1 regularization, which can shrink some coefficients to exactly zero (feature selection).

```javascript
import { LassoRegression } from 'simple-ml';

// Create Lasso model
const lasso = new LassoRegression({
  alpha: 0.1,          // Regularization strength (default: 1.0)
  maxIterations: 1000, // Max iterations for coordinate descent
  tolerance: 1e-4,     // Convergence tolerance
  fitIntercept: true
});

// Training data with correlated features
const X = [
  [1, 1],
  [2, 2],
  [3, 3],
  [4, 4],
  [5, 5]
];
const y = [2, 4, 6, 8, 10];

// Fit model
lasso.fit(X, y);

console.log('Coefficients:', lasso.coefficients);
console.log('Intercept:', lasso.intercept);

// Lasso may zero out some coefficients
console.log(
  'Non-zero features:',
  lasso.coefficients.filter(c => Math.abs(c) > 1e-10).length
);

// Predictions
const predictions = lasso.predict([[6, 6]]);
console.log('Predictions:', predictions);
```

#### 1.4 Logistic Regression

Binary and multiclass classification using the logistic function.

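
**The functions behind the probabilities (illustrative sketch):**

A logistic model turns a linear score into a probability. In the binary case this is the sigmoid; for several classes trained jointly it is the softmax. The helpers below are a generic sketch of those two functions, written for illustration and not taken from Simple-ML's source.

```javascript
// Sigmoid: maps any real score to a probability in (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Softmax: maps a vector of scores to probabilities that sum to 1.
// Subtracting the maximum score first avoids overflow in Math.exp.
function softmax(scores) {
  const maxScore = Math.max(...scores);
  const exps = scores.map(s => Math.exp(s - maxScore));
  const total = exps.reduce((sum, v) => sum + v, 0);
  return exps.map(v => v / total);
}

console.log(sigmoid(0)); // 0.5

const classProbs = softmax([2, 1, 0]);
console.log(classProbs); // three probabilities, largest first

const stableProbs = softmax([1000, 1000]);
console.log(stableProbs); // [0.5, 0.5]
```

The highest score always receives the highest probability, which is why the predicted class is simply the index of the largest softmax output.
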
```javascript
import { LogisticRegression } from 'simple-ml';

// Binary classification
const logReg = new LogisticRegression({
  learningRate: 0.1,   // Learning rate for gradient descent
  maxIterations: 1000, // Maximum iterations
  tolerance: 1e-4,     // Convergence tolerance
  penalty: 'l2',       // Regularization: 'l2', 'l1', or 'none'
  C: 1.0,              // Inverse regularization strength
  multiClass: 'ovr'    // 'ovr' (one-vs-rest) or 'multinomial' (softmax)
});

// Binary classification data
const X = [
  [1, 2], [2, 3], [3, 1], // Class 0
  [6, 5], [7, 7], [8, 6]  // Class 1
];
const y = [0, 0, 0, 1, 1, 1];

// Fit model
logReg.fit(X, y);

console.log('Coefficients:', logReg.coefficients);
console.log('Intercept:', logReg.intercept);

// Predict classes
const predictions = logReg.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions); // [0, 1]

// Predict probabilities
const probabilities = logReg.predictProba([[2, 2], [7, 6]]);
console.log('Probabilities:', probabilities);
// e.g. [[0.95, 0.05], [0.02, 0.98]]

// Evaluate
const score = logReg.score(X, y);
console.log('Accuracy:', score);
```

**Multiclass Example:**

```javascript
// Multiclass classification
const X = [
  [1, 1], [1, 2], [2, 1],  // Class 0
  [5, 5], [5, 6], [6, 5],  // Class 1
  [9, 9], [9, 10], [10, 9] // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Use 'multinomial' for ordered/continuous classes
const multiLogReg = new LogisticRegression({
  multiClass: 'multinomial', // Better for this type of data
  learningRate: 0.1,
  maxIterations: 1000
});

multiLogReg.fit(X, y);

const pred = multiLogReg.predict([[2, 2], [6, 6], [10, 10]]);
console.log('Multiclass Predictions:', pred); // [0, 1, 2]

const proba = multiLogReg.predictProba([[2, 2], [6, 6], [10, 10]]);
console.log('Class Probabilities:', proba);
// [[0.803, 0.195, 0.002],  // → class 0
//  [0.007, 0.708, 0.285],  // → class 1
//  [0.000, 0.057, 0.943]]  // → class 2

console.log('Accuracy:', multiLogReg.score(X, y)); // 1.0
```

**Choosing `multiClass` Mode:**

- **`'ovr'` (One-vs-Rest, default)**: Fast and works well for independent categories
  - Use for: Animals (cat, dog, bird), Topics (sports, politics, tech)
  - Each class vs all others is trained separately
- **`'multinomial'` (Softmax)**: More robust, handles ordered/continuous classes better
  - Use for: Ratings (low, medium, high), Sizes (S, M, L, XL)
  - Trains all classes simultaneously with softmax function
  - **Recommended when classes have natural ordering**, because a middle class is often not linearly separable from all the others combined, which is exactly what one-vs-rest needs

```javascript
// Example: 'ovr' for independent categories
const categories = new LogisticRegression({ multiClass: 'ovr' });
const X_cat = [[1, 0], [0, 1], [1, 1]];
const y_cat = ['cat', 'dog', 'bird'];
categories.fit(X_cat, y_cat);

// Example: 'multinomial' for ordered classes
const ratings = new LogisticRegression({ multiClass: 'multinomial' });
const X_rating = [[1, 2], [5, 6], [9, 10]];
const y_rating = ['low', 'medium', 'high'];
ratings.fit(X_rating, y_rating);
```

#### 1.5 Polynomial Regression

Regression with polynomial features.

```javascript
import { PolynomialRegression } from 'simple-ml';

// Create polynomial model
const poly = new PolynomialRegression({
  degree: 2, // Polynomial degree (default: 2)
  fitIntercept: true,
  normalize: false
});

// Non-linear data
const X = [[1], [2], [3], [4], [5]];
const y = [1, 4, 9, 16, 25]; // y = x²

// Fit model
poly.fit(X, y);

console.log('Coefficients:', poly.coefficients);
console.log('Intercept:', poly.intercept);

// Predictions
const predictions = poly.predict([[6], [7]]);
console.log('Predictions:', predictions); // [36, 49]

// Evaluate
const score = poly.score(X, y);
console.log('R² Score:', score); // Close to 1.0
```

**Higher Degree Example:**

```javascript
// Cubic polynomial
const cubicPoly = new PolynomialRegression({ degree: 3 });

const X = [[1], [2], [3], [4]];
const y = [1, 8, 27, 64]; // y = x³

cubicPoly.fit(X, y);

const pred = cubicPoly.predict([[5]]);
console.log('Prediction for x=5:', pred); // [125]
```

---

### 2. Classification Algorithms

#### 2.1 K-Nearest Neighbors (KNN)

Non-parametric classification based on nearest neighbors.

```javascript
import { KNeighborsClassifier } from 'simple-ml';

// Create KNN classifier
const knn = new KNeighborsClassifier({
  k: 3,              // Number of neighbors (default: 5)
  weights: 'uniform' // 'uniform' or 'distance'
});

// Training data
const X = [
  [1, 2], [2, 3], [3, 1], // Class 'A'
  [6, 5], [7, 7], [8, 6]  // Class 'B'
];
const y = ['A', 'A', 'A', 'B', 'B', 'B'];

// Fit model (stores training data)
knn.fit(X, y);

// Predict
const predictions = knn.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions); // ['A', 'B']

// Predict with probabilities
const probabilities = knn.predictProba([[2, 2]]);
console.log('Probabilities:', probabilities);

// Evaluate
const score = knn.score(X, y);
console.log('Accuracy:', score);
```

**Distance-Weighted KNN:**

```javascript
// Use distance weighting
const weightedKnn = new KNeighborsClassifier({
  k: 5,
  weights: 'distance' // Closer neighbors have more influence
});

weightedKnn.fit(X, y);
const pred = weightedKnn.predict([[4, 4]]);
console.log('Distance-weighted prediction:', pred);
```

**Finding Optimal K:**

```javascript
import { trainTestSplit, accuracy } from 'simple-ml';

const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.3 });

// Test different k values; k cannot exceed the number of training samples
const maxK = Math.min(10, XTrain.length);
for (let k = 1; k <= maxK; k++) {
  const model = new KNeighborsClassifier({ k });
  model.fit(XTrain, yTrain);
  const pred = model.predict(XTest);
  const acc = accuracy(yTest, pred);
  console.log(`k=${k}: Accuracy = ${acc.toFixed(3)}`);
}
```

#### 2.2 Gaussian Naive Bayes

Probabilistic classifier that assumes each feature is Gaussian-distributed within each class.

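
**How a prediction is scored (illustrative sketch):**

Conceptually, Gaussian Naive Bayes scores each class by adding the log prior to the per-feature Gaussian log-densities, then picks the highest score. The sketch below uses hand-picked, hypothetical class statistics (`classStats`) rather than values learned by Simple-ML, and the function names are assumptions for illustration only.

```javascript
// Log of the Gaussian density N(x | mean, variance).
function gaussianLogPdf(x, mean, variance) {
  return -0.5 * Math.log(2 * Math.PI * variance) - ((x - mean) ** 2) / (2 * variance);
}

// Hypothetical per-class statistics (one mean and variance per feature).
const classStats = {
  0: { prior: 0.5, means: [2, 3], variances: [1, 1] },
  1: { prior: 0.5, means: [7, 8], variances: [1, 1] }
};

// Pick the class with the largest log prior + summed log-likelihood.
function predictGaussianNb(sample, stats) {
  let bestLabel = null;
  let bestScore = -Infinity;
  for (const [label, s] of Object.entries(stats)) {
    let score = Math.log(s.prior);
    sample.forEach((value, j) => {
      score += gaussianLogPdf(value, s.means[j], s.variances[j]);
    });
    if (score > bestScore) {
      bestScore = score;
      bestLabel = Number(label); // object keys are strings
    }
  }
  return bestLabel;
}

console.log(predictGaussianNb([2.5, 3.5], classStats)); // 0
console.log(predictGaussianNb([6, 7], classStats));     // 1
```

Working in log space avoids multiplying many small densities together, which would otherwise underflow for high-dimensional inputs.
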
```javascript
import { GaussianNaiveBayes } from 'simple-ml';

// Create model
const gnb = new GaussianNaiveBayes({
  priors: null // Class priors (default: null = uniform)
});

// Training data
const X = [
  [1, 2], [2, 3], [3, 4], // Class 0
  [6, 7], [7, 8], [8, 9]  // Class 1
];
const y = [0, 0, 0, 1, 1, 1];

// Fit model
gnb.fit(X, y);

// Access learned parameters
console.log('Class Priors:', gnb.classPrior);
console.log('Means:', gnb.theta);
console.log('Variances:', gnb.sigma);

// Predict
const predictions = gnb.predict([[2, 3], [7, 8]]);
console.log('Predictions:', predictions); // [0, 1]

// Predict probabilities
const probabilities = gnb.predictProba([[4, 5]]);
console.log('Probabilities:', probabilities);

// Evaluate
const score = gnb.score(X, y);
console.log('Accuracy:', score);
```

#### 2.3 Multinomial Naive Bayes

Naive Bayes for discrete/count features (e.g., text classification).

```javascript
import { MultinomialNaiveBayes } from 'simple-ml';

// Create model
const mnb = new MultinomialNaiveBayes({
  alpha: 1.0 // Laplace smoothing parameter
});

// Training data (word counts)
const X = [
  [2, 1, 0], // Document 1: "spam" words
  [1, 1, 0], // Document 2: "spam" words
  [0, 0, 2], // Document 3: "ham" words
  [0, 1, 2]  // Document 4: "ham" words
];
const y = ['spam', 'spam', 'ham', 'ham'];

// Fit model
mnb.fit(X, y);

// Predict
const predictions = mnb.predict([[2, 0, 1], [0, 0, 3]]);
console.log('Predictions:', predictions);

// Predict probabilities
const probabilities = mnb.predictProba([[1, 1, 1]]);
console.log('Probabilities:', probabilities);
```

#### 2.4 Decision Tree Classifier

Tree-based classifier with interpretable rules.

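
**What the `'gini'` criterion measures (illustrative sketch):**

A decision tree chooses splits that make the child nodes as pure as possible. With the Gini criterion, impurity is `1 - Σ p_k²` over the class proportions `p_k`, and a candidate split is scored by the size-weighted impurity of its two sides. The helpers below are a generic sketch with hypothetical names, not Simple-ML's internals.

```javascript
// Gini impurity of a set of labels: 1 - sum of squared class proportions.
function giniImpurity(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let sumSquares = 0;
  for (const count of counts.values()) {
    sumSquares += (count / labels.length) ** 2;
  }
  return 1 - sumSquares;
}

// Impurity of a split: child impurities weighted by child size.
function splitImpurity(leftLabels, rightLabels) {
  const n = leftLabels.length + rightLabels.length;
  return (
    (leftLabels.length / n) * giniImpurity(leftLabels) +
    (rightLabels.length / n) * giniImpurity(rightLabels)
  );
}

console.log(giniImpurity([0, 0, 1, 1]));     // 0.5
console.log(splitImpurity([0, 0], [1, 1]));  // 0 (both sides pure)
console.log(splitImpurity([0, 1], [0, 1]));  // 0.5 (split did not help)
```

A tree builder would evaluate `splitImpurity` for many thresholds and keep the one with the lowest value.
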
```javascript
import { DecisionTreeClassifier } from 'simple-ml';

// Create decision tree
const dt = new DecisionTreeClassifier({
  criterion: 'gini',  // 'gini' or 'entropy'
  maxDepth: 5,        // Maximum tree depth (default: Infinity)
  minSamplesSplit: 2, // Min samples to split a node
  minSamplesLeaf: 1,  // Min samples in leaf node
  maxFeatures: null   // Max features to consider
});

// Training data
const X = [
  [2.5, 2.5], [3, 3], [2, 3], // Class 0
  [7, 7], [8, 6], [7, 8],     // Class 1
  [3, 8], [4, 7], [3, 7]      // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Fit model
dt.fit(X, y);

// Predict
const predictions = dt.predict([[2.5, 2.5], [7.5, 7], [3.5, 7.5]]);
console.log('Predictions:', predictions); // [0, 1, 2]

// Evaluate
const score = dt.score(X, y);
console.log('Accuracy:', score);

// Get feature importances (if available)
if (dt.featureImportances) {
  console.log('Feature Importances:', dt.featureImportances);
}
```

**Using Entropy:**

```javascript
const entropyDT = new DecisionTreeClassifier({
  criterion: 'entropy',
  maxDepth: 3
});

entropyDT.fit(X, y);
const pred = entropyDT.predict([[5, 5]]);
console.log('Entropy-based prediction:', pred);
```

---

### 3. Clustering

#### 3.1 K-Means Clustering

Centroid-based clustering algorithm.

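
**One K-Means iteration (illustrative sketch):**

K-Means alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below shows a single iteration with hypothetical helper names; it omits initialization, convergence checks, and empty-cluster handling that a full implementation needs.

```javascript
function squaredDistance(a, b) {
  return a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0);
}

// Assignment step: index of the nearest centroid for every point.
function assignClusters(points, centroids) {
  return points.map(point => {
    let best = 0;
    for (let c = 1; c < centroids.length; c++) {
      if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
        best = c;
      }
    }
    return best;
  });
}

// Update step: each centroid becomes the mean of its assigned points.
// Assumes every cluster received at least one point.
function updateCentroids(points, labels, k) {
  const dims = points[0].length;
  const sums = Array.from({ length: k }, () => new Array(dims).fill(0));
  const counts = new Array(k).fill(0);
  points.forEach((point, i) => {
    counts[labels[i]] += 1;
    point.forEach((v, d) => {
      sums[labels[i]][d] += v;
    });
  });
  return sums.map((sum, c) => sum.map(v => v / counts[c]));
}

const samplePoints = [[0, 0], [0, 2], [10, 0], [10, 2]];
const sampleLabels = assignClusters(samplePoints, [[1, 1], [9, 1]]);
console.log(sampleLabels); // [0, 0, 1, 1]
console.log(updateCentroids(samplePoints, sampleLabels, 2)); // [[0, 1], [10, 1]]
```

Inertia is then the sum of `squaredDistance` between each point and its assigned centroid.
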
```javascript
import { KMeans } from 'simple-ml';

// Create K-Means model
const kmeans = new KMeans({
  nClusters: 3,           // Number of clusters (required)
  maxIterations: 300,     // Max iterations (default: 300)
  tolerance: 1e-4,        // Convergence tolerance
  initMethod: 'kmeans++', // 'kmeans++' or 'random'
  nInit: 10,              // Number of initializations
  randomState: 42         // Random seed for reproducibility
});

// Data to cluster
const X = [
  [1, 2], [1.5, 1.8], [1, 0.6], // Cluster 1
  [5, 8], [6, 9], [5, 7],       // Cluster 2
  [10, 2], [9, 3], [10, 3]      // Cluster 3
];

// Fit model
kmeans.fit(X);

// Get cluster labels (the numbering of clusters may differ between runs)
console.log('Labels:', kmeans.labels); // e.g. [0, 0, 0, 1, 1, 1, 2, 2, 2]

// Get cluster centroids
console.log('Centroids:', kmeans.centroids);
// e.g. [[1.17, 1.47], [5.33, 8.0], [9.67, 2.67]]

// Get inertia (sum of squared distances)
console.log('Inertia:', kmeans.inertia);

// Predict cluster for new data
const newData = [[1.2, 1.9], [5.5, 8.2], [9.5, 2.8]];
const predictions = kmeans.predict(newData);
console.log('Predictions:', predictions); // e.g. [0, 1, 2]
```

**Finding Optimal K (Elbow Method):**

```javascript
// Test different numbers of clusters; k cannot exceed the number of samples
const inertias = [];
const maxK = Math.min(10, X.length);
for (let k = 2; k <= maxK; k++) {
  const model = new KMeans({ nClusters: k, nInit: 10 });
  model.fit(X);
  inertias.push(model.inertia);
  console.log(`K=${k}: Inertia = ${model.inertia.toFixed(2)}`);
}
// Plot inertias to find "elbow"
```

---

### 4. Preprocessing

#### 4.1 StandardScaler

Z-score normalization (mean=0, std=1).

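
**The z-score formula (illustrative sketch):**

Standardization replaces each value `v` in a column with `(v - mean) / std`. The sketch below applies that to one column using the population standard deviation (dividing by `n`), which reproduces the `2.236` shown for this section's sample data. `zScoreColumn` is a hypothetical helper, and it assumes the column is not constant (otherwise `std` would be 0).

```javascript
// Hypothetical helper: standardize a single numeric column.
function zScoreColumn(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Population variance (divide by n, not n - 1).
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return { mean, std, scaled: values.map(v => (v - mean) / std) };
}

const zResult = zScoreColumn([1, 3, 5, 7]);
console.log(zResult.mean);           // 4
console.log(zResult.std.toFixed(3)); // 2.236
console.log(zResult.scaled);         // symmetric around 0
```

Inverting the transform is the reverse operation: `v * std + mean`.
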
```javascript
import { StandardScaler } from 'simple-ml';

// Create scaler
const scaler = new StandardScaler({
  withMean: true, // Center data (default: true)
  withStd: true   // Scale to unit variance (default: true)
});

// Data to scale
const X = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8]
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled data:', XScaled);

// Access learned parameters
console.log('Mean:', scaler.mean); // [4, 5]
console.log('Std:', scaler.std);   // [2.236, 2.236]

// Transform new data
const newData = [[9, 10]];
const newScaled = scaler.transform(newData);
console.log('New data scaled:', newScaled);

// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original data:', original);
```

#### 4.2 MinMaxScaler

Scale features to a specified range.

```javascript
import { MinMaxScaler } from 'simple-ml';

// Create scaler
const scaler = new MinMaxScaler({
  featureRange: [0, 1] // Target range (default: [0, 1])
});

const X = [
  [1, 2],
  [3, 4],
  [5, 6]
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled to [0,1]:', XScaled);
// [[0, 0], [0.5, 0.5], [1, 1]]

// Access min and max
console.log('Data min:', scaler.dataMin);
console.log('Data max:', scaler.dataMax);

// Transform new data
const newScaled = scaler.transform([[7, 8]]);
console.log('New data scaled:', newScaled); // [[1.5, 1.5]]

// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original:', original);
```

**Custom Range Example:**

```javascript
// Scale to [-1, 1]
const customScaler = new MinMaxScaler({
  featureRange: [-1, 1]
});

const scaled = customScaler.fitTransform(X);
console.log('Scaled to [-1,1]:', scaled);
```

#### 4.3 RobustScaler

Robust scaling using median and IQR (resistant to outliers).

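
**Why median and IQR resist outliers (illustrative sketch):**

Robust scaling computes `(v - median) / IQR`, where the IQR is the distance between the 75th and 25th percentiles. A single extreme value barely moves either statistic. The sketch below uses linear interpolation between sorted values for the percentiles, which is one common convention and an assumption here; Simple-ML may compute quantiles differently, and the helper names are hypothetical.

```javascript
// Quantile by linear interpolation between neighboring sorted values.
function quantile(sortedValues, q) {
  const pos = (sortedValues.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sortedValues[lower] + (sortedValues[upper] - sortedValues[lower]) * (pos - lower);
}

// Hypothetical helper: robust-scale a single numeric column.
function robustScaleColumn(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const median = quantile(sorted, 0.5);
  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  return values.map(v => (v - median) / iqr);
}

// The outlier (100) does not distort the scaling of the other values.
console.log(robustScaleColumn([1, 2, 3, 4, 100])); // [-1, -0.5, 0, 0.5, 48.5]
```

Here the median is 3 and the IQR is 2, so the regular values land in `[-1, 0.5]` while the outlier stays clearly separated.
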
```javascript
import { RobustScaler } from 'simple-ml';

// Create scaler
const scaler = new RobustScaler({
  withCentering: true,    // Center using median
  withScaling: true,      // Scale using IQR
  quantileRange: [25, 75] // IQR percentiles
});

// Data with outliers
const X = [
  [1, 2],
  [2, 3],
  [3, 4],
  [100, 200] // Outlier
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Robust scaled:', XScaled);

// Access median and IQR
console.log('Median:', scaler.center);
console.log('IQR:', scaler.scale);

// Transform and inverse
const newData = [[50, 60]];
const scaled = scaler.transform(newData);
const original = scaler.inverseTransform(scaled);
```

#### 4.4 LabelEncoder

Encode categorical labels to integers.

```javascript
import { LabelEncoder } from 'simple-ml';

// Create encoder
const le = new LabelEncoder();

// Categorical labels
const labels = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat'];

// Fit and transform
const encoded = le.fitTransform(labels);
console.log('Encoded:', encoded); // [0, 1, 0, 2, 1, 0]

// Access classes
console.log('Classes:', le.classes); // ['cat', 'dog', 'bird']

// Transform new labels
const newEncoded = le.transform(['dog', 'cat']);
console.log('New encoded:', newEncoded); // [1, 0]

// Inverse transform
const original = le.inverseTransform(encoded);
console.log('Original labels:', original);
```

**Numeric Labels Example:**

```javascript
// Works with numbers too (a fresh encoder avoids reusing the fitted one above)
const numEncoder = new LabelEncoder();
const numLabels = [10, 20, 10, 30, 20];
const numEncoded = numEncoder.fitTransform(numLabels);
console.log('Encoded numbers:', numEncoded); // [0, 1, 0, 2, 1]
console.log('Classes:', numEncoder.classes); // [10, 20, 30]
```

#### 4.5 OneHotEncoder

Convert categorical features to binary columns.

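
**The encoding itself (illustrative sketch):**

One-hot encoding creates one column per distinct category and marks the matching column with 1. The sketch below handles a single categorical column, orders categories by first appearance, and accepts a `dropFirst` flag that removes the first column. It is a hypothetical helper written for illustration; only the option name is borrowed from this section.

```javascript
// Hypothetical helper: one-hot encode a single categorical column.
function oneHotEncodeColumn(values, dropFirst = false) {
  // A Set keeps insertion order, so categories follow first appearance.
  const categories = [...new Set(values)];
  const encoded = values.map(value => {
    const row = categories.map(category => (category === value ? 1 : 0));
    return dropFirst ? row.slice(1) : row;
  });
  return { categories, encoded };
}

const oneHot = oneHotEncodeColumn(['red', 'blue', 'green', 'red']);
console.log(oneHot.categories); // ['red', 'blue', 'green']
console.log(oneHot.encoded);    // [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

const reduced = oneHotEncodeColumn(['red', 'blue', 'green', 'red'], true);
console.log(reduced.encoded);   // [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Dropping the first column loses no information (an all-zero row means the first category) and removes the exact linear dependence between the columns.
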
```javascript
import { OneHotEncoder } from 'simple-ml';

// Create encoder
const ohe = new OneHotEncoder({
  dropFirst: false, // Drop first column to avoid multicollinearity
  sparse: false     // Return dense array
});

// Categorical data
const X = [
  ['red'],
  ['blue'],
  ['green'],
  ['red'],
  ['blue']
];

// Fit and transform
const encoded = ohe.fitTransform(X);
console.log('One-hot encoded:', encoded);
// [[1, 0, 0],
//  [0, 1, 0],
//  [0, 0, 1],
//  [1, 0, 0],
//  [0, 1, 0]]

// Access categories
console.log('Categories:', ohe.categories); // [['red', 'blue', 'green']]

// Transform new data
const newEncoded = ohe.transform([['green'], ['red']]);
console.log('New encoded:', newEncoded);

// Inverse transform
const original = ohe.inverseTransform(encoded);
console.log('Original:', original);
```

**Multiple Features Example:**

```javascript
// Multiple categorical features
const X = [
  ['red', 'small'],
  ['blue', 'large'],
  ['red', 'large']
];

const encoder = new OneHotEncoder();
const encoded = encoder.fitTransform(X);
console.log('Encoded (multiple features):', encoded);
console.log('Categories:', encoder.categories);
```

#### 4.6 SimpleImputer

Fill missing values in a dataset.

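
**Mean imputation by hand (illustrative sketch):**

The `'mean'` strategy amounts to computing each column's mean over the values that are present and substituting it wherever a value is missing. The sketch below does this for `null` entries; `imputeWithColumnMeans` is a hypothetical helper, and it assumes every column has at least one non-null value.

```javascript
// Hypothetical helper: replace null entries with their column mean.
function imputeWithColumnMeans(rows) {
  const nCols = rows[0].length;
  const means = [];
  for (let j = 0; j < nCols; j++) {
    // Only values that are present contribute to the mean.
    const present = rows.map(row => row[j]).filter(v => v !== null);
    means.push(present.reduce((sum, v) => sum + v, 0) / present.length);
  }
  return rows.map(row => row.map((v, j) => (v === null ? means[j] : v)));
}

const filledRows = imputeWithColumnMeans([
  [1, 2],
  [null, 4],
  [5, null]
]);
console.log(filledRows); // [[1, 2], [3, 4], [5, 3]]
```

To impute new data consistently, the means learned from the training rows should be stored and reused rather than recomputed.
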
```javascript
import { SimpleImputer } from 'simple-ml';

// Create imputer
const imputer = new SimpleImputer({
  strategy: 'mean', // 'mean', 'median', 'most_frequent', 'constant'
  fillValue: null   // Value for 'constant' strategy
});

// Data with missing values (null)
const X = [
  [1, 2],
  [null, 3],
  [7, null],
  [4, 5]
];

// Fit and transform
const XFilled = imputer.fitTransform(X);
console.log('Filled data:', XFilled);
// [[1, 2],
//  [4, 3],    // null filled with mean (4)
//  [7, 3.33], // null filled with mean (3.33)
//  [4, 5]]

// Access learned statistics
console.log('Statistics:', imputer.statistics);

// Transform new data
const newData = [[null, 6]];
const filled = imputer.transform(newData);
console.log('New data filled:', filled);
```

**Different Strategies:**

```javascript
// Median strategy
const medianImputer = new SimpleImputer({ strategy: 'median' });
const filled1 = medianImputer.fitTransform(X);

// Most frequent strategy
const modeImputer = new SimpleImputer({ strategy: 'most_frequent' });
const filled2 = modeImputer.fitTransform(X);

// Constant strategy
const constantImputer = new SimpleImputer({
  strategy: 'constant',
  fillValue: 0
});
const filled3 = constantImputer.fitTransform(X);
console.log('Filled with zeros:', filled3);
```

---

### 5. Metrics

#### 5.1 Regression Metrics

```javascript
import {
  meanAbsoluteError,
  meanSquaredError,
  rootMeanSquaredError,
  r2Score,
  meanAbsolutePercentageError,
  maxError
} from 'simple-ml';

const yTrue = [3, -0.5, 2, 7];
const yPred = [2.5, 0.0, 2, 8];

// Mean Absolute Error
const mae = meanAbsoluteError(yTrue, yPred);
console.log('MAE:', mae); // 0.5

// Mean Squared Error
const mse = meanSquaredError(yTrue, yPred);
console.log('MSE:', mse); // 0.375

// Root Mean Squared Error
const rmse = rootMeanSquaredError(yTrue, yPred);
console.log('RMSE:', rmse); // 0.612

// R² Score (coefficient of determination)
const r2 = r2Score(yTrue, yPred);
console.log('R² Score:', r2); // 0.948

// Mean Absolute Percentage Error
const mape = meanAbsolutePercentageError(yTrue, yPred);
console.log('MAPE:', mape);

// Maximum Error
const maxErr = maxError(yTrue, yPred);
console.log('Max Error:', maxErr); // 1.0
```

#### 5.2 Classification Metrics

```javascript
import {
  accuracy,
  precision,
  recall,
  f1Score,
  confusionMatrix,
  classificationReport
} from 'simple-ml';

const yTrue = [0, 1, 2, 0, 1, 2, 0, 1, 2];
const yPred = [0, 2, 1, 0, 1, 2, 0, 2, 2];

// Accuracy
const acc = accuracy(yTrue, yPred);
console.log('Accuracy:', acc); // 0.667

// Precision (per class or average)
const prec = precision(yTrue, yPred, { average: 'macro' });
console.log('Precision:', prec);

// Recall
const rec = recall(yTrue, yPred, { average: 'macro' });
console.log('Recall:', rec);

// F1 Score
const f1 = f1Score(yTrue, yPred, { average: 'macro' });
console.log('F1 Score:', f1);

// Confusion Matrix (rows: true class, columns: predicted class)
const cm = confusionMatrix(yTrue, yPred);
console.log('Confusion Matrix:', cm);
// [[3, 0, 0],
//  [0, 1, 2],
//  [0, 1, 2]]

// Classification Report (comprehensive)
const report = classificationReport(yTrue, yPred);
console.log('Classification Report:', report);
```

**Binary Classification Metrics:**

```javascript
const yTrue = [0, 0, 1, 1, 0, 1, 1, 0];
const yPred = [0, 1, 1, 1, 0, 0, 1, 0];

console.log('Binary Accuracy:', accuracy(yTrue, yPred));
console.log('Binary Precision:', precision(yTrue, yPred));
console.log('Binary Recall:', recall(yTrue, yPred));
console.log('Binary F1:', f1Score(yTrue, yPred));
```

#### 5.3 Clustering Metrics

```javascript
import {
  silhouetteScore,
  daviesBouldinScore,
  calinskiHarabaszScore
} from 'simple-ml';

const X = [
  [1, 2], [1.5, 1.8], [1, 0.6],
  [5, 8], [6, 9], [5, 7],
  [10, 2], [9, 3], [10, 3]
];
const labels = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Silhouette Score (higher is better, range: [-1, 1])
const silhouette = silhouetteScore(X, labels);
console.log('Silhouette Score:', silhouette);

// Davies-Bouldin Score (lower is better)
const db = daviesBouldinScore(X, labels);
console.log('Davies-Bouldin Score:', db);

// Calinski-Harabasz Score (higher is better)
const ch = calinskiHarabaszScore(X, labels);
console.log('Calinski-Harabasz Score:', ch);
```

---

### 6. Model Selection

#### 6.1 Train-Test Split

```javascript
import { trainTestSplit } from 'simple-ml';

const X = [[1], [2], [3], [4], [5], [6], [7], [8]];
const y = [2, 4, 6, 8, 10, 12, 14, 16];

// Basic split
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
  testSize: 0.25, // 25% for testing (default: 0.25)
  shuffle: true,  // Shuffle before splitting (default: true)
  randomState: 42 // Random seed for reproducibility
});

console.log('Training samples:', XTrain.length); // 6
console.log('Test samples:', XTest.length); // 2
console.log('X Train:', XTrain);
console.log('y Train:', yTrain);
console.log('X Test:', XTest);
console.log('y Test:', yTest);
```

**Stratified Split (for classification):**

```javascript
const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]];
const y = [0, 0, 0, 1, 1, 1];

const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
  testSize: 0.33,
  stratify: y, // Maintain class proportions
  randomState: 42
});

console.log('Train labels:', yTrain);
console.log('Test labels:', yTest);
```

#### 6.2 Cross-Validation

```javascript
import { crossValidate, LinearRegression } from 'simple-ml';

const X = [[1], [2], [3], [4], [5], [6], [7], [8]];
const y = [2, 4, 6, 8, 10, 12, 14, 16];

// 4-fold cross-validation: with 8 samples every fold holds 2 test samples,
// so R² is defined on each fold (it is undefined for a single sample)
const model = new LinearRegression();
const cvResults = crossValidate(model, X, y, {
  cv: 4,         // Number of folds (default: 5)
  scoring: 'r2', // Scoring method
  shuffle: true,
  randomState: 42
});

console.log('Fold Scores:', cvResults.scores);
console.log('Mean Score:', cvResults.meanScore);
console.log('Std Score:', cvResults.stdScore);
console.log('Fit Times:', cvResults.fitTimes);
console.log('Score Times:', cvResults.scoreTimes);
```

**Cross-Validation for Classification:**

```javascript
import { KNeighborsClassifier, crossValidate } from 'simple-ml';

const X = [
  [1, 2], [2, 3], [3, 1],
  [6, 5], [7, 7], [8, 6]
];
const y = [0, 0, 0, 1, 1, 1];

const knn = new KNeighborsClassifier({ k: 3 });
const results = crossValidate(knn, X, y, {
  cv: 3,
  scoring: 'accuracy'
});

console.log('CV Accuracy:', results.meanScore);
```

---

### 7. Complete Pipeline Example

Combining multiple components in a machine learning pipeline:

```javascript
import {
  LinearRegression,
  KNeighborsClassifier,
  StandardScaler,
  LabelEncoder,
  trainTestSplit,
  crossValidate,
  r2Score,
  accuracy,
  meanSquaredError
} from 'simple-ml';

// ========== REGRESSION PIPELINE ==========

// 1. Load and prepare data (10 samples, so the 20% test set and each of
//    the 5 CV folds contain 2 samples)
const XRaw = [
  [1, 100], [2, 200], [3, 300], [4, 400], [5, 500],
  [6, 600], [7, 700], [8, 800], [9, 900], [10, 1000]
];
const yReg = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];

// 2. Scale features
const scaler = new StandardScaler();
const XScaled = scaler.fitTransform(XRaw);

// 3. Split data
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(
  XScaled, yReg, { testSize: 0.2 }
);

// 4. Train model
const regModel = new LinearRegression();
regModel.fit(XTrain, yTrain);

// 5. Evaluate
const yPred = regModel.predict(XTest);
console.log('Test R²:', r2Score(yTest, yPred));
console.log('Test MSE:', meanSquaredError(yTest, yPred));

// 6. Cross-validation
const cvResults = crossValidate(regModel, XScaled, yReg, { cv: 5 });
console.log('CV R²:', cvResults.meanScore);

// ========== CLASSIFICATION PIPELINE ==========

// 1. Prepare classification data
const XClass = [
  [5.1, 3.5], [4.9, 3.0], [7.0, 3.2],
  [6.4, 3.2], [5.9, 3.0], [6.3, 2.5]
];
const yClass = ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'];

// 2. Encode labels
const labelEncoder = new LabelEncoder();
const yEncoded = labelEncoder.fitTransform(yClass);

// 3. Scale features
const classScaler = new StandardScaler();
const XClassScaled = classScaler.fitTransform(XClass);

// 4. Split data
const split = trainTestSplit(XClassScaled, yEncoded, { testSize: 0.33 });

// 5. Train classifier
const classifier = new KNeighborsClassifier({ k: 3 });
classifier.fit(split.XTrain, split.yTrain);

// 6. Evaluate
const predictions = classifier.predict(split.XTest);
console.log('Test Accuracy:', accuracy(split.yTest, predictions));

// 7. Predict new sample
const newSample = [[6.0, 3.0]];
const newScaled = classScaler.transform(newSample);
const pred = classifier.predict(newScaled);
const predLabel = labelEncoder.inverseTransform(pred);
console.log('Prediction for new sample:', predLabel);
```

---

## ⚠️ Best Practices & Important Notes

### Dataset Size Recommendations

For reliable model evaluation:

- **Minimum**: 10-20 samples total
- **Recommended**: 50+ samples for simple models, 100+ for complex models
- **Test set**: Use at least 5-10 samples in the test set

**Why?** With very small test sets (1-2 samples), metrics like R² are not meaningful, and R² is undefined for a single test sample because the true values have zero variance:

```javascript
// ❌ Too small - test set has only 1 sample
const X = [[1], [2], [3], [4], [5]];
const y = [2, 4, 6, 8, 10];
const split = trainTestSplit(X, y, { testSize: 0.2 }); // Only 1 test sample!

// ✅ Better - reasonable test set size
const X10 = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const y10 = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];
const split10 = trainTestSplit(X10, y10, { testSize: 0.2 }); // 2 test samples

// ✅ Recommended - adequate test set
const X50 = Array.from({ length: 50 }, (_, i) => [i + 1]);
const y50 = X50.map(x => 2 * x[0] + Math.random());
const split50 = trainTestSplit(X50, y50, { testSize: 0.2 }); // 10 test samples
```

### Feature Scaling

Always scale features when using distance-based algorithms (KNN) or gradient descent (Logistic Regression):

```javascript
import { StandardScaler, KNeighborsClassifier } from 'simple-ml';

// Features with different scales
const X = [[1, 1000], [2, 2000], [3, 3000]];
const y = [0, 0, 1]; // One label per sample

// Scale before training
const scaler = new StandardScaler();
const XScaled = scaler.fitTransform(X);

const model = new KNeighborsClassifier({ k: 3 });
model.fit(XScaled, y);
```

### Handling Missing Values

Use `SimpleImputer` to fill missing values before training any model:

```javascript
import { SimpleImputer } from 'simple-ml';

const X = [[1, 2], [null, 3], [7, null], [4, 5]];

const imputer = new SimpleImputer({ strategy: 'mean' });
const XFilled = imputer.fitTransform(X);

// Now safe to use with any model
// (`model` and `y` are assumed to be defined as in the earlier examples)
model.fit(XFilled, y);
```

### Cross-Validation vs Train-Test Split

Use cross-validation for small datasets:

```javascript
// `model`, `X`, and `y` are assumed to be defined as in the earlier examples

// For small datasets (<100 samples), use cross-validation
const cvResults = crossValidate(model, X, y, { cv: 5 });
console.log('Mean CV Score:', cvResults.meanScore);

// For large datasets, train-test split is faster
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y);
model.fit(XTrain, yTrain);
console.log('Test Score:', model.score(XTest, yTest));
```

---

## 🌐 Browser Usage

### Interactive Demos

```bash
# Start local server
npm run dev

# Access demos:
# - http://localhost:5000/examples/browser-example.html
# - http://localhost:5000/examples/simple-demo.html
```

### Browser Example

```html
Simple-ML Demo

Machine Learning in the Browser

```

---

## 🛠️ Build Formats

After running `npm run build`:

- **`dist/simple-ml.umd.js`** - Browser global `SimpleML`
- **`dist/simple-ml.modern.js`** - ES2017+ for modern browsers
- **`dist/simple-ml.module.js`** - ES Modules for bundlers
- **`dist/simple-ml.cjs`** - CommonJS for Node.js

---

## 🧪 Testing

```bash
npm test        # Run all tests
npm run build   # Build for browser
npm run dev     # Watch mode
```

---

## 📝 License

MIT

---

## 🤝 Contributing

Contributions welcome! Please:

- Maintain consistent API patterns
- Add comprehensive input validation
- Include tests for new features
- Follow existing code style

---

Built with ❤️ in pure JavaScript