Much of the computation behind deep learning models is linear algebra. Linear algebra gives us a language to manipulate many numbers at once, allowing simple notation to express complex computations.

**List of numbers.** Let’s consider a simple example. Given a list of numbers $a_1, a_2, \ldots, a_n$, compute

- the sum of the numbers $S_1$,
- the sum of squares of the numbers $S_2$,
- the list of numbers minus their average $b_1, \ldots, b_n$.

In mathematical notation, these operations are

$$
\begin{aligned}
S_1 &= \sum_{i=1}^n a_i\\
S_2 &= \sum_{i=1}^n a_i^2\\
b_i &= a_i - \mu_a \quad \text{where} \quad \mu_a = \frac{1}{n} \sum_{i=1}^n a_i
\end{aligned}
$$

This is clearly not a fun way to write down these operations, especially considering that we might want to combine a few of them one after the other. In terms of code, the above operations are not very compact either. Let’s just consider the last one:

```python
# Assume a = [a_1, a_2, ...] is given
S = 0
for v in a:
    S += v
mean = S / len(a)
b = []
for v in a:
    b.append(v - mean)
```

In PyTorch this can be written in a single line using just linear algebra

```python
b = a - a.dot(torch.ones_like(a)) / len(a)
```

or even shorter, relying on additional torch functions:

```python
b = a - a.mean()
```

Let’s start with some basic definitions.

**Vector.** A vector is an array of numbers, arranged in order. We typically write it as a column vector:

$$
\mathbf{v}=\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{n} \end{bmatrix}
$$

Typically, we denote a vector in bold lower-case letters, for example $\mathbf{v}$. The elements of the vector are denoted in italics, with a subscript indicating their position (index) in the array, for example $v_i$.

What do vectors allow us to do? We can add, subtract, multiply, or divide the elements of two vectors, which we call element-wise addition, subtraction, multiplication, and division.

**Element-wise operation.** An element-wise operation on vectors independently applies the operation to every element. For two vectors $\mathbf{v}$ and $\mathbf{w}$, an element-wise operation is

$$
\mathbf{v} \odot \mathbf{w}=\begin{bmatrix} v_1 \odot w_1 \\ v_2 \odot w_2 \\ \vdots \\ v_{n} \odot w_{n} \end{bmatrix}
$$

where $\odot$ can be $+, -, \cdot, /, \ldots$. Generally, the sizes of the two vectors need to match for element-wise operations to work.
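In code, element-wise operations map directly onto the ordinary arithmetic operators. Here is a minimal PyTorch sketch (the tensor values are made up for illustration):

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([4.0, 5.0, 6.0])

# Each operator applies independently at every position.
print(v + w)  # tensor([5., 7., 9.])
print(v - w)  # tensor([-3., -3., -3.])
print(v * w)  # tensor([ 4., 10., 18.])
print(v / w)  # tensor([0.2500, 0.4000, 0.5000])
```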
**Matrix.** A matrix is a 2-D array of numbers. Below is an $(n\times m)$-matrix, arranged in a rectangle of height $n$ and width $m$:

$$
\mathbf{M}= \begin{bmatrix}
M_{1,1} & M_{1,2} & \cdots & M_{1,m}\\
M_{2,1} & M_{2,2} & \cdots & M_{2,m}\\
\vdots & \vdots & \ddots & \vdots \\
M_{n,1} & M_{n,2} & \cdots & M_{n,m}\\
\end{bmatrix}
$$

Typically, we denote a matrix in bold upper-case letters, for example $\mathbf{M}$. The elements of the matrix are denoted as $M_{i,j}$, where $i$ and $j$ denote the row and column of the element.

Element-wise operations, as defined above, can be applied to matrices as well. Matrices also have a few unique operations.

**Matrix transpose.** The transpose of a matrix is the mirror image of the matrix across its main diagonal: $(\mathbf{M}^\top)_{i,j} = M_{j,i}$. Given a matrix $\mathbf{M}$, the transpose of $\mathbf{M}$ is denoted as $\mathbf{M}^\top$. We show an example $2\times 4$ matrix below:

$$
\small
\mathbf{M}= \begin{bmatrix} M_{1,1} & M_{1,2} & M_{1,3} & M_{1,4} \\ M_{2,1} & M_{2,2} & M_{2,3} & M_{2,4} \end{bmatrix}
\rightarrow
\mathbf{M}^\top= \begin{bmatrix} M_{1,1} & M_{2,1}\\ M_{1,2} & M_{2,2}\\ M_{1,3} & M_{2,3}\\ M_{1,4} & M_{2,4} \end{bmatrix}
$$

Vectors are special matrices of shape $n \times 1$. The transpose of a vector is a row vector.

**Row vector.** A row vector is the transpose of a vector:

$$
\mathbf{v}=\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
\rightarrow
\mathbf{v}^\top=\begin{bmatrix}v_1, v_2, \cdots, v_n\end{bmatrix}
$$

To disambiguate, regular vectors are sometimes called column vectors.

Likely the most common matrix operation is matrix multiplication.

**Matrix multiplication.** Consider a matrix $\mathbf{A}$ of shape $n \times p$ and a matrix $\mathbf{B}$ of shape $p \times m$. Matrix multiplication produces a matrix $\mathbf{C}=\mathbf{A}\mathbf{B}$ of shape $n \times m$:

$$
C_{i,j}=\sum_{k=1}^p A_{i,k} \cdot B_{k,j}.
$$

Matrix multiplication requires the number of columns of $\mathbf{A}$ to equal the number of rows of $\mathbf{B}$.

Let’s look at an example: multiplying a $2 \times 4$ and a $4 \times 3$ matrix. The result is a $2 \times 3$ matrix.

$$
\begin{smallmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ \color{gray} a_{21} & \color{gray} a_{22} & \color{gray} a_{23} & \color{gray} a_{24} \end{bmatrix}&
\begin{bmatrix} \color{gray} b_{11} & b_{12} & \color{gray} b_{13} \\ \color{gray} b_{21} & b_{22} & \color{gray} b_{23} \\ \color{gray} b_{31} & b_{32} & \color{gray} b_{33} \\ \color{gray} b_{41} & b_{42} & \color{gray} b_{43} \end{bmatrix}&=&
\begin{bmatrix} \color{gray} c_{11} & c_{12} & \color{gray} c_{13} \\ \color{gray} c_{21} & \color{gray} c_{22} & \color{gray} c_{23} \end{bmatrix}\\ \\ \\
2 \times 4 & 4 \times 3 & & 2 \times 3
\end{smallmatrix}
$$

Each element is the inner product of a row and a column of the input matrices.

Matrix multiplication applies to vectors as well. Any (column) vector $\mathbf{v}$ can be multiplied on the right:

$$
\mathbf{M} \mathbf{v}.
$$

Any row vector $\mathbf{v}^\top$ can be multiplied on the left:

$$
\mathbf{v}^\top \mathbf{M}.
$$

There are also two specific “matrix multiplications” between vectors.

**Inner product.** The inner product, or dot product, between two size-$n$ vectors $\mathbf{v}, \mathbf{w}$ is a scalar:

$$
\mathbf{v}^\top \mathbf{w}=\sum_{i=1}^n v_i w_i
$$

**Outer product.** The outer product between a size-$n$ vector $\mathbf{v}$ and a size-$m$ vector $\mathbf{w}$ is an $n \times m$ matrix:

$$
\mathbf{M} = \mathbf{v} \mathbf{w}^\top,
$$

where $M_{i,j} = v_i \cdot w_j$.

A vector can also be thought of as a point in a high-dimensional space, or, similarly, as the offset between two points in a high-dimensional space. The length of the vector is its Euclidean norm.

**Euclidean norm.** The Euclidean norm of a vector $\mathbf{v}$ is

$$
\|\mathbf{v}\| = \sqrt{\mathbf{v}^\top \mathbf{v}} = \sqrt{\sum_{i=1}^n v_i^2}
$$

The norm above is also sometimes referred to as the 2-norm.

Finally, a similar definition of the norm exists for matrices. It’s called the Frobenius norm.

**Frobenius norm.** The Frobenius norm of an $n \times m$ matrix $\mathbf{M}$ is

$$
\|\mathbf{M}\| = \sqrt{\sum_{i=1}^n \sum_{j=1}^m M_{i,j}^2}
$$

**Euclidean norm in 2D.** As a two-dimensional example, the length of $\mathbf{v}$ is $\|\mathbf{v}\|=\sqrt{v_1^2 + v_2^2}$, the length of the hypotenuse of a right triangle with sides $v_1$ and $v_2$ (by the Pythagorean theorem).
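Before moving on, let’s make the operations above concrete in code. The PyTorch snippets below are illustrative sketches (the specific tensors are made up for the example), not the only way to write these operations. First, transposition and row vectors:

```python
import torch

M = torch.arange(8.0).reshape(2, 4)  # a 2x4 matrix
print(M.T.shape)  # torch.Size([4, 2]) -- transposing swaps rows and columns

v = torch.tensor([[1.0], [2.0], [3.0]])  # a column vector, shape 3x1
print(v.T)  # its transpose is a row vector, shape 1x3
```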
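Matrix multiplication is the `@` operator, and the shapes must be compatible as described above:

```python
import torch

A = torch.randn(2, 4)  # shape n x p
B = torch.randn(4, 3)  # shape p x m
C = A @ B              # shape n x m
print(C.shape)         # torch.Size([2, 3])

# Each entry is the inner product of a row of A and a column of B.
print(torch.allclose(C[0, 1], A[0, :] @ B[:, 1]))  # True
```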
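The inner and outer products have dedicated helpers:

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([4.0, 5.0, 6.0])

print(torch.dot(v, w))    # tensor(32.) -- the scalar 1*4 + 2*5 + 3*6
print(torch.outer(v, w))  # a 3x3 matrix with entries v_i * w_j
```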
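Finally, the Euclidean and Frobenius norms:

```python
import torch

v = torch.tensor([3.0, 4.0])
print(torch.linalg.norm(v))         # tensor(5.) -- the Euclidean (2-)norm
print(torch.sqrt(torch.dot(v, v)))  # the same value, from the definition

M = torch.ones(2, 3)
print(torch.linalg.norm(M))  # Frobenius norm: sqrt(6), about 2.4495
```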