$$ y = f\left(\sum_{i=1}^{n}w_ix_i + b\right) $$
Linear Regression
- \(y\): Predicted value or output of the model.
- \(f\): Activation function (e.g., the identity function for regression).
- \(w_i\): Weights assigned to each input feature.
- \(x_i\): Input features or variables.
- \(b\): Bias term, allowing flexibility in the prediction.
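As a concrete illustration, here is a minimal NumPy sketch of this formula, with the activation defaulting to the identity so it reduces to a plain linear regression prediction; the weights, inputs, and bias are made-up example values.

```python
import numpy as np

def predict(x, w, b, f=lambda z: z):
    """Compute y = f(sum_i w_i * x_i + b); f defaults to the identity."""
    return f(np.dot(w, x) + b)

# Made-up example values: three input features.
x = np.array([1.0, 2.0, 3.0])    # input features x_i
w = np.array([0.4, -0.1, 0.25])  # weights w_i
b = 0.5                          # bias term

print(predict(x, w, b))  # identity activation -> linear regression output
```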
$$ P(y \mid x) = \frac{e^{w_y^T x}}{\sum_{k=1}^{K}e^{w_k^T x}} $$
Softmax Function
- \(P(y \mid x)\): Probability of class \(y\) given input \(x\).
- \(e\): Exponential function, ensuring positive outputs.
- \(w_y^T x\): Dot product of the weights for class \(y\) and the inputs.
- Denominator: Sum of exponentials over all \(K\) classes, ensuring the probabilities sum to 1.
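A minimal NumPy sketch of this computation follows; the per-class weight matrix `W` and input `x` are made-up, and the maximum logit is subtracted before exponentiating for numerical stability (a standard trick not shown in the formula).

```python
import numpy as np

def softmax_probs(W, x):
    """P(y=k | x) = exp(w_k.x) / sum_j exp(w_j.x), computed stably."""
    logits = W @ x                  # one dot product w_k^T x per class
    logits -= logits.max()          # shift logits for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

# Made-up example: 3 classes, 2 input features.
W = np.array([[ 1.0, -0.5],
              [ 0.2,  0.8],
              [-1.0,  0.3]])
x = np.array([0.6, 1.2])

p = softmax_probs(W, x)
print(p, p.sum())  # probabilities over the 3 classes; they sum to 1
```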
$$ L = -\frac{1}{N}\sum_{i=1}^{N} \left[ y_i\log(p_i) + (1-y_i)\log(1-p_i) \right] $$
Binary Cross-Entropy Loss
- \(L\): Loss value (lower is better).
- \(N\): Total number of samples.
- \(y_i\): True label (0 or 1).
- \(p_i\): Predicted probability for the positive class.
- \(\log\): Natural logarithm, which imposes steep penalties on confident but incorrect predictions.
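The loss can be computed directly from the formula; the sketch below uses NumPy with made-up labels and predicted probabilities, clipping \(p_i\) to avoid taking \(\log(0)\).

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """L = -(1/N) * sum[ y*log(p) + (1-y)*log(1-p) ]."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Made-up example: 4 samples with true labels and predicted probabilities.
y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.9, 0.2, 0.6, 0.95])

print(binary_cross_entropy(y_true, p_pred))
```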
$$ \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t $$
Adam Optimizer Algorithm
- \(\theta_t\): Updated parameter values at time step \(t\).
- \(\theta_{t-1}\): Previous parameter values.
- \(\eta\): Learning rate, which controls the step size for updates.
- \(\hat{m}_t\): Bias-corrected first moment estimate (mean of gradients).
- \(\hat{v}_t\): Bias-corrected second moment estimate (uncentered variance of the gradients).
- \(\epsilon\): A small constant to prevent division by zero (e.g., \(10^{-8}\)).
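To tie the symbols together, here is a minimal sketch of a single Adam parameter update in NumPy, including the moment accumulators and bias correction; the hyperparameter defaults follow common practice, and the toy gradient function is an assumption for illustration only.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at time step t: returns (theta_t, m_t, v_t)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - eta / (np.sqrt(v_hat) + eps) * m_hat
    return theta, m, v

# Made-up example: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # moves toward the minimum at 0
```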