Machine Learning (ML)

A collection of online tools

Open Source Code: GitHub

*(There could be errors in the code, so please double-check your results.)

Related Topics



Linear Regression

$\circ$ Gradient Descent

Feature Scaling
\begin{equation} X_i := \frac{X_i - \mu_i}{S_i}, \quad \text{where $\mu_i$ is the average and $S_i$ is the range (max$-$min) or the standard deviation} \end{equation}
Hypothesis
\begin{equation} h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n \end{equation}
Cost Function
\begin{equation} J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left(h_\theta (x^{(i)})-y^{(i)}\right)^2 \end{equation}
Gradient Descent
\begin{equation} \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)}) -y^{(i)}\right)x^{(i)}_j \end{equation}
In the multivariate case, the cost function can also be written in the following vectorized form:
\begin{equation} J(\theta) = \frac{1}{2m} (X \theta - \vec{y})^T (X \theta -\vec{y}) \end{equation}
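A minimal NumPy sketch of these steps (mean-normalized feature scaling followed by batch gradient descent on the squared-error cost); the function names, toy data, and learning-rate choices are illustrative assumptions, not part of the linked repository:

```python
import numpy as np

def feature_scale(X):
    """Mean-normalize each feature: (x - mu) / s, with s = standard deviation."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent on J(theta) = (1/2m) * sum (h_theta(x) - y)^2."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])        # prepend x_0 = 1
    theta = np.zeros(n + 1)
    for _ in range(iters):
        grad = Xb.T @ (Xb @ theta - y) / m      # (1/m) * X^T (X theta - y)
        theta -= alpha * grad
    return theta

# Toy usage (synthetic data, purely illustrative)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
Xs, mu, s = feature_scale(X)
theta = gradient_descent(Xs, y, alpha=0.1, iters=2000)
print(theta)   # approx [5, 2.236] on the scaled feature for y = 2x
```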

$\circ$ Normal Equations

\begin{equation} \theta = (X^T X)^{-1} X^T \vec{y} \end{equation} Recommended when $n \lesssim 10^4$ and $X^T X$ is not singular (so its inverse exists). No feature scaling is needed for $X$.
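A short NumPy sketch of this closed-form solution; using the pseudo-inverse guards against a (near-)singular $X^T X$, and the toy data are illustrative only:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form theta = (X^T X)^{-1} X^T y; pinv handles a (near-)singular X^T X."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend x_0 = 1 (no feature scaling required)
    return np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(normal_equation(X, y))   # approx [0, 2] for y = 2x
```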

Classification

$\circ$ Logistic Regression

Sigmoid/Logistic function
\begin{equation} g(z) = \frac{1}{1+e^{-z}} \end{equation}
Hypothesis
\begin{equation} h_\theta(x) = \frac{1}{1+e^{-\theta^T x}} \end{equation}
Probability that $y=1$, given $x$, parameterized by $\theta$
\begin{equation} h_\theta(x) = P(y=1 \mid x;\theta) \end{equation}
The decision boundary is defined by $h_\theta(x) = 0.5$; predict $y=1$ when $\theta^T x \ge 0$ (equivalently, when $h_\theta(x) \ge 0.5$).
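A small NumPy sketch of the sigmoid hypothesis and the resulting 0.5-threshold decision rule; the parameter values below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """h_theta(x) = g(theta^T x); predict y = 1 when h_theta(x) >= 0.5, i.e. theta^T x >= 0."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend x_0 = 1
    prob = sigmoid(Xb @ theta)
    return (prob >= 0.5).astype(int), prob

theta = np.array([-2.5, 1.0])              # illustrative parameters: boundary at x = 2.5
labels, probs = predict(theta, np.array([[1.0], [2.0], [3.0], [4.0]]))
print(labels)   # [0 0 1 1]
```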
Cost Function (used to get a convex $J(\theta)$)
\begin{equation} \mathrm{Cost}(h_\theta(x),y) = \begin{cases} -\log(h_\theta(x)), & \text{if } y=1\\ -\log(1-h_\theta(x)), & \text{if } y=0 \end{cases} \end{equation}
or, equivalently,
\begin{equation} \mathrm{Cost}(h_\theta(x),y) = -y \log(h_\theta(x)) -(1-y)\log(1-h_\theta(x)) \end{equation}
Therefore,
\begin{equation} J(\theta)= -\frac{1}{m} \left[ \sum_{i=1}^m y^{(i)} \log(h_\theta(x^{(i)})) +(1-y^{(i)})\log(1-h_\theta(x^{(i)})) \right] \end{equation}
Gradient Descent
\begin{equation} \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)}) -y^{(i)}\right)x^{(i)}_j \end{equation}
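A NumPy sketch of this cost and its gradient-descent update, with the $\frac{1}{m}$ factor included for consistency with the update rule above; the function names, hyperparameters, and toy data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, Xb, y):
    """J(theta) = -(1/m) sum[ y log h + (1 - y) log(1 - h) ] with h = g(Xb @ theta)."""
    h = sigmoid(Xb @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / y.size

def logistic_gradient_descent(X, y, alpha=0.1, iters=5000):
    """theta_j := theta_j - alpha * (1/m) * sum (h_theta(x^(i)) - y^(i)) x_j^(i)."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend x_0 = 1
    theta = np.zeros(n + 1)
    for _ in range(iters):
        grad = Xb.T @ (sigmoid(Xb @ theta) - y) / m
        theta -= alpha * grad
    return theta, logistic_cost(theta, Xb, y)

# Toy usage: 1-D data separable at x = 2.5 (illustrative only)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
theta, J = logistic_gradient_descent(X, y)
print(theta, J)
```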