Multilayer Perceptrons (MLPs)

A Multilayer Perceptron (MLP) is a feed-forward neural network. It can be written as a function \(y : \mathbb{R}^d \rightarrow \mathbb{R}^k\) of the form \[ y(\mathbf{x}) = \phi( \mathbf{f}_l \circ \mathbf{f}_{l-1} \circ \cdots \circ \mathbf{f}_1(\mathbf{x})), \] where \(l\) denotes the number of hidden layers. Letting \(n_i\) denote the number of neurons in layer \(i\) (with \(n_0 = d\)), each layer is a map \(\mathbf{f}_i : \mathbb{R}^{n_{i-1}} \rightarrow \mathbb{R}^{n_i}\). The hidden layers almost always take the form \(\mathbf{f}_i(\mathbf{z}) = \sigma(\mathbf{W}_i\mathbf{z} + \mathbf{b}_i)\), where \(\mathbf{W}_i \in \mathbb{R}^{n_i \times n_{i-1}}\) is a weight matrix, \(\mathbf{b}_i \in \mathbb{R}^{n_i}\) is a bias vector, and \(\sigma\) is a non-linearity applied to each element of its input vector. At the very end, the output layer applies the function \(\phi\), which depends on the task at hand: in regression it could be the identity function (assuming the last hidden layer's dimension equals the output dimension \(k\)), while in multiclass classification it is typically the softmax function, followed by an argmax at prediction time.
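
To make the composition concrete, here is a minimal NumPy sketch of the forward pass. The choices of ReLU for \(\sigma\) and softmax for \(\phi\), and all sizes and weights, are illustrative assumptions, not prescribed by the definition above:

    import numpy as np

    def relu(z):
        # Element-wise non-linearity sigma
        return np.maximum(0.0, z)

    def softmax(z):
        # Output function phi for multiclass classification
        e = np.exp(z - z.max())  # subtract max for numerical stability
        return e / e.sum()

    def mlp_forward(x, weights, biases, phi=softmax):
        # Computes phi(f_l o ... o f_1(x)) with f_i(z) = sigma(W_i z + b_i)
        h = x
        for W, b in zip(weights, biases):
            h = relu(W @ h + b)
        return phi(h)

    # Example: d = 3 inputs, hidden sizes n_1 = 4, n_2 = 5, so k = 5 outputs
    rng = np.random.default_rng(0)
    sizes = [3, 4, 5]
    weights = [rng.normal(size=(n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    y = mlp_forward(rng.normal(size=3), weights, biases)
    print(y, y.sum())  # a probability vector over k = 5 classes; sums to 1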

In contrast to Rosenblatt's perceptron, the MLP is a non-linear function and can therefore represent far more complex mappings; in fact, by the universal approximation theorem, MLPs can in theory approximate any continuous function on a compact subset of \(\mathbb{R}^d\) to arbitrary accuracy.
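
As a classic illustration of this extra expressive power, a single hidden layer with ReLU can compute XOR, a function no single perceptron can represent. The weights below are hand-picked for the example (a standard textbook construction, not learned parameters):

    import numpy as np

    # h = ReLU(W1 x + b1), y = w2 . h
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([0.0, -1.0])
    w2 = np.array([1.0, -2.0])

    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        h = np.maximum(0.0, W1 @ np.array(x, dtype=float) + b1)
        print(x, "->", w2 @ h)  # prints 0, 1, 1, 0: exactly XOR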
