Short answer: Deep learning is machine learning that performs remarkably well across a large range of applications.

Longer answer: Deep learning is an emerging and fast-evolving sub-field of machine learning. Machine learning originates from statistics. Both machine learning and statistics use models to understand and analyze data. Generally speaking, statistics uses simpler models whose properties are mathematically understood. Machine learning uses more complex models with few mathematical guarantees. Deep learning uses a very specific and powerful kind of model for which mathematical guarantees are very hard to come by. Deep learning models often perform remarkably well when trained on sufficient data. Figure 1 summarizes the relationship between deep learning, machine learning, and statistics.

Figure 1: The relationship between deep learning, machine learning, and statistics.

Complete answer: We understand this was a bit dense; we will go through all the details below.

Deep learning in everyday life

You might have also seen applications of deep learning in everyday life. Below is a partial list of applications:

- Autonomous vehicles
- AI chatbots, e.g. ChatGPT
- Parts of search, e.g. Bing Chat
- Audio recognition, e.g. Siri
- Recommendation
- Advertisement
- Online photo libraries, e.g. Google Photos
- Facial recognition, e.g. FaceID

Statistics

Statistics is the study of data. The goal is to understand the properties of data. Statistics defines the fundamental mathematical language for the study of data. To go deeper in our analysis, we need to introduce statistical models.

Statistical model

A statistical model, or model for short, is a parametric function $f_\theta$ that maps some input $x$ to outputs $y$:

$$ f_\theta: \mathbb{R}^n \to \mathbb{R}^m $$

The input $x \in \mathbb{R}^n$ is a vector of dimension $n \ge 1$, the output $y \in \mathbb{R}^m$ is a vector of dimension $m \ge 1$, and the parameters $\theta$ form a vector of numbers. The function $f$ determines the structure of the model. The parameters $\theta$ determine the inner workings of the model. Let's look at a few concrete examples.

Predicting your class grade

How would one estimate one's grade in this class? The simplest estimate uses no model at all: take the mean of past grades, with the standard deviation as a measure of uncertainty. A slightly better estimate conditions on early signals, for example the mean grade of students who scored in the top 50% on the first homework versus those in the bottom 50%. To use all homework scores together, we introduce a model that maps the first $N$ homework scores to a predicted grade.

Predicting the weather

Let's look at a more complex example: predicting the weather in Austin, specifically whether it will rain in Austin and what temperature it will be. One of the simplest models takes as input the observed temperature from the previous day in Austin and predicts the current weather. A more clever model uses the temperatures from previous years. An even more complex model takes as input the satellite imagery of Texas.

A model is generally defined by its structure (the form of $f$) and parameters $\theta$. The structure defines how inputs are manipulated to produce certain outputs. The parameters, also called weights, determine the exact mathematical operations of the model.

The use of models in statistics

Statistics uses models to analyze and explain the data. Statisticians prefer to use models whose mathematical properties are well understood. Linear models are a very popular class here.
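To make this concrete, here is a minimal sketch of a linear model $f_\theta(x) = \theta_0 + \theta_1 x$ fit by ordinary least squares, using the class-grade example from above. The homework scores and grades below are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: first homework score (input x) and final grade (output y),
# both on a 0-100 scale. These numbers are made up for illustration.
hw1 = np.array([55.0, 62.0, 70.0, 78.0, 85.0, 91.0])
grade = np.array([60.0, 68.0, 72.0, 80.0, 88.0, 93.0])

# A baseline estimate that uses no model at all: the mean grade and its spread.
print("mean grade:", grade.mean(), "std:", grade.std())

# A linear model f_theta(x) = theta_0 + theta_1 * x.
# Fit the parameters theta by ordinary least squares.
X = np.stack([np.ones_like(hw1), hw1], axis=1)  # constant column for the intercept
theta, *_ = np.linalg.lstsq(X, grade, rcond=None)

# Predict the grade of a student who scored 75 on the first homework.
x_new = np.array([1.0, 75.0])
print("predicted grade:", x_new @ theta)
```

Here the structure of the model is fixed (a line), and fitting only chooses the two parameters $\theta_0$ and $\theta_1$; this simplicity is exactly what makes the model easy to analyze.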
Model complexity often hinders the understanding and analysis of the underlying data.

The use of models in machine learning

Machine learning focuses on the model itself and the predictions that the model makes. Model complexity is only a concern if the model does not fit the data well, produces wrong predictions, or is too hard to fit to the data.

The use of models in deep learning

Deep learning uses a special kind of statistical model called a deep network, also sometimes referred to as a neural network. Deep networks are larger and capable of ingesting more data than other machine learning models.

Machine learning

Machine learning is the systematic study of the structure of models $f$, the optimization to obtain good parameters $\theta$, and the application of statistical models to new tasks and data. Machine learning models are generally more complex, and more often used for prediction rather than analysis. Now what is deep learning really?

Deep learning

Deep learning is a subfield of machine learning that studies a very specific kind of model: deep networks.

Deep networks

A deep network is a statistical model with many layers of computation. Each layer has its own structure, weights, and outputs. A deep network stacks layers on top of one another such that the output of one layer becomes the input of the next. This specific structure allows models to scale to a very large number of parameters and to ingest significantly more data. Among all machine learning models, deep networks scale the best as the amount of data increases.

Deep learning has become extremely popular because deep networks leverage their powerful capacity to solve real problems. Their large complexity and parameter count have made them challenging to analyze. While deep learning models work very well, the empirical research is far ahead of the theoretical research: it is an open research topic to provide a grounded theory of why deep learning works.

Tl;dr

Deep learning is a subfield of machine learning that studies deep networks, a family of powerful and scalable models.
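To make the idea of stacked layers concrete, here is a minimal sketch of a tiny deep network written in plain numpy. The layer sizes, random weights, and input are made up for illustration; real deep networks are built in frameworks such as PyTorch and their parameters are learned from data rather than sampled at random.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # One layer: a linear map followed by a simple non-linearity (ReLU).
    return np.maximum(0.0, x @ W + b)

# Parameters theta of a 3-layer network with made-up sizes: 4 -> 8 -> 8 -> 2.
sizes = [4, 8, 8, 2]
params = [(rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

# Stacking: the output of each layer becomes the input of the next.
x = rng.normal(size=4)  # a made-up input vector
for W, b in params:
    x = layer(x, W, b)
print("network output:", x)
```

Each entry in `params` is the weight matrix and bias of one layer; adding more layers, or making each layer wider, is how a deep network grows to a very large number of parameters.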