Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using **model architectures composed of multiple non-linear transformations**.

Deep learning is part of a broader family of machine learning methods based on learning representations of data. An observation (e.g., an image) can be represented in many ways (e.g., a vector of intensity values, one per pixel), but some representations make it easier to learn tasks of interest (e.g., is this the image of a human face?) from examples, and research in this area attempts to define what makes better representations and how to create models to learn these representations.

Various deep learning architectures such as deep neural networks, convolutional deep neural networks, and deep belief networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, and music/audio signal recognition, where they have been shown to produce state-of-the-art results on various tasks. Alternatively, "deep learning" has been characterized as "just a buzzword for", or "largely a rebranding of", neural networks.

**Definitions**

There are a number of ways that the field of deep learning has been characterized. Deep learning is a class of machine learning training algorithms that:

- use many layers of nonlinear processing units for feature extraction and transformation. The algorithms may be supervised or unsupervised and applications include pattern recognition and statistical classification.
- are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher-level features are derived from lower-level features to form a hierarchical representation.
- are part of the broader machine learning field of learning representations of data.
- learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts.
- form a new field with the goal of moving toward artificial intelligence. The different levels of representation help make sense of data such as images, sounds and texts.

Deep learning algorithms are contrasted with shallow learning algorithms by the number of parameterized transformations a signal encounters as it propagates from the input layer to the output layer, where a parameterized transformation is a processing unit that has trainable parameters, such as weights and thresholds.

A chain of transformations from input to output is a *credit assignment path* (CAP). CAPs describe potentially causal connections between input and output and may vary in length. For a feedforward neural network, the depth of the CAPs, and thus the depth of the network, is the **number of hidden layers plus one (the output layer is also parameterized)**. For recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP is potentially unlimited in length. There is no universally agreed upon threshold of depth dividing shallow learning from deep learning, but most researchers in the field agree that **deep learning has multiple nonlinear layers (CAP > 2)**, and Schmidhuber considers CAP > 10 to be very deep learning.

**Fundamental concepts**
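The depth rule above (hidden layers plus one, since the output layer is also parameterized) can be illustrated with a minimal sketch; the layer-size list and helper below are hypothetical, not any library's API:

```python
# Depth of a feedforward network, counted as the number of parameterized
# transformations a signal passes through (hidden layers + output layer).

def network_depth(layer_sizes):
    """layer_sizes: unit counts [input, hidden..., output].
    Each consecutive pair of sizes is one weighted (parameterized) layer."""
    return len(layer_sizes) - 1

# A net with 3 hidden layers: input -> h1 -> h2 -> h3 -> output
sizes = [784, 256, 128, 64, 10]
depth = network_depth(sizes)
print(depth)       # 4  (3 hidden layers + 1 output layer)
print(depth > 2)   # True: "deep" by the CAP > 2 criterion
```

By this count, a classic one-hidden-layer perceptron has depth 2 and sits on the shallow side of the CAP > 2 threshold.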

Deep learning algorithms are based on distributed representations, a concept used in machine learning. The underlying assumption behind distributed representations is that **observed data is generated by the interactions of many different factors on different levels**. Deep learning adds the **assumption that these factors are organized into multiple levels, corresponding to different levels of abstraction or composition**. Varying numbers of layers and layer sizes can be used to provide different amounts of abstraction.

Deep learning algorithms in particular exploit this idea of hierarchical explanatory factors. Different concepts are learned from other concepts, with the more abstract, **higher-level concepts being learned from the lower-level ones**. These architectures are often constructed with a greedy layer-by-layer method that models this idea. Deep learning helps to disentangle these abstractions and pick out which features are useful for learning.

Many deep learning algorithms are framed as **unsupervised learning problems**. Because of this, these algorithms can make use of the unlabeled data that other algorithms cannot. Unlabeled data is usually more abundant than labeled data, making this an important benefit of these algorithms. The deep belief network is an example of a deep structure that can be trained in an unsupervised manner.
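The greedy layer-by-layer idea can be sketched in a few lines: each layer is trained without labels to reconstruct the previous layer's representation, and its output becomes the next layer's input. This is a minimal linear-autoencoder sketch in the spirit of stacked autoencoders / deep belief networks, not a faithful DBN implementation; all sizes, the learning rate, and the epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder_layer(X, n_hidden, lr=0.01, epochs=200):
    """Train one linear autoencoder to reconstruct X; return encoder weights.
    No labels are used anywhere: this step is purely unsupervised."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))   # encoder weights
    V = rng.normal(0, 0.1, (n_hidden, n_in))   # decoder weights
    for _ in range(epochs):
        H = X @ W                  # encode
        E = H @ V - X              # reconstruction error
        V -= lr * H.T @ E / len(X)           # gradient step on decoder
        W -= lr * X.T @ (E @ V.T) / len(X)   # gradient step on encoder
    return W

def greedy_pretrain(X, layer_sizes):
    """Stack layers greedily: each layer trains on the previous layer's
    output, so higher-level features derive from lower-level ones."""
    weights, H = [], X
    for n_hidden in layer_sizes:
        W = train_autoencoder_layer(H, n_hidden)
        weights.append(W)
        H = H @ W
    return weights, H

X = rng.normal(size=(100, 8))                  # unlabeled toy data
weights, features = greedy_pretrain(X, [6, 4, 2])
print([w.shape for w in weights])  # [(8, 6), (6, 4), (4, 2)]
print(features.shape)              # (100, 2)
```

In practice each layer would use a nonlinearity and a restricted Boltzmann machine or denoising autoencoder objective, and the stacked weights would then initialize a network for supervised fine-tuning.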

**Deep learning architectures**

There are a huge number of variants of deep architectures, but most of them branch from a few original parent architectures. It is not always possible to compare the performance of multiple architectures directly, since they are not all evaluated on the same data sets. Deep learning is a fast-growing field, and new architectures, variants, and algorithms appear every few weeks.

**Deep neural networks**

A deep neural network (DNN) is defined[2][4] to be an artificial neural network with multiple hidden layers of units between the input and output layers. Similar to shallow ANNs, DNNs can model complex non-linear relationships. The extra layers enable composition of features from lower layers, giving the potential of modeling complex data with fewer units than a similarly performing shallow network.[2] DNNs are typically designed as feedforward networks, but recent research has successfully applied the deep learning architecture to recurrent neural networks for applications such as language modeling.[40] Convolutional deep neural networks (CNNs) are used in computer vision, where their success is well-documented.[41] More recently, CNNs have been applied to acoustic modeling for automatic speech recognition (ASR), where they have shown success over previous models.[42] For simplicity, a look at training DNNs is given here.

A DNN can be discriminatively trained with the standard backpropagation algorithm. The weight updates can be done via stochastic gradient descent using the following equation:

w_ij(t+1) = w_ij(t) − η ∂C/∂w_ij

Here, η is the learning rate and C is the cost function; each weight w_ij moves a small step down the gradient of the cost.
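As an illustration of backpropagation with that gradient-descent update, here is a minimal sketch on a one-hidden-layer network and toy data (full-batch rather than truly stochastic for simplicity; the sizes, learning rate, and step count are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary-target data
X = rng.normal(size=(32, 3))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

eta = 0.1                          # learning rate
W1 = rng.normal(0, 0.5, (3, 4))    # input -> hidden weights
W2 = rng.normal(0, 0.5, (4, 1))    # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(W1, W2):
    """Squared-error cost C of the current weights."""
    return float(np.mean((sigmoid(np.tanh(X @ W1) @ W2) - y) ** 2))

init_loss = mse(W1, W2)
for step in range(500):
    # Forward pass
    H = np.tanh(X @ W1)
    out = sigmoid(H @ W2)
    # Backward pass: gradients of C with respect to each weight matrix
    d_out = (out - y) * out * (1 - out)        # error at the output units
    grad_W2 = H.T @ d_out / len(X)
    d_hidden = (d_out @ W2.T) * (1 - H**2)     # error backpropagated to hidden
    grad_W1 = X.T @ d_hidden / len(X)
    # Gradient-descent update: w(t+1) = w(t) - eta * dC/dw
    W2 -= eta * grad_W2
    W1 -= eta * grad_W1

loss = mse(W1, W2)
print(round(init_loss, 3), "->", round(loss, 3))  # cost decreases with training
```

A real training run would draw random mini-batches per step (the "stochastic" part) and typically use a cross-entropy cost for classification rather than squared error.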
