Convolutional Neural Network (CNN)

A CNN is a type of deep neural network designed for processing structured grid data, such as images. It is particularly effective in image recognition and computer vision tasks. The key feature of CNNs is the use of convolutional layers, which apply filters (kernels) to input data, enabling the network to automatically and adaptively learn spatial hierarchies of features. CNNs typically include convolutional, pooling, and fully connected layers. Convolutional layers capture local patterns and spatial relationships, while pooling layers reduce dimensionality and retain important information. Fully connected layers at the end of the network use learned features to make predictions. CNNs have demonstrated exceptional performance in tasks like image classification, object detection, and segmentation.

See a small demo here.

Convolutional Layers

Various types of filters are used in the convolutional layers to capture different features and patterns in the input data. The choice of filters depends on the specific task and the nature of the data. Commonly used filters include:

Edge Detectors: filters like the Sobel, Prewitt, or Scharr filters are often used to detect edges in images. These filters emphasize changes in intensity, helping the network identify boundaries between objects.
Blur Filters: Gaussian filters are used to introduce blur and smoothness. They help in reducing noise and capturing more general features.
Sharpening Filters: filters that enhance details and emphasize high-frequency components, such as the Laplacian filter, are used for sharpening.
Identity Filters: simple filters, like the identity filter, do not modify the input and are useful for preserving certain features during the convolution operation.
Learned Filters: in modern CNNs, especially in deeper layers, filters are learned during training. These filters adapt to the specific features present in the training data and become specialized for the task.
Gabor Filters: Gabor filters are commonly used for texture analysis. They are particularly effective at capturing textures and repetitive patterns.
Color Filters: in the case of color images, filters may operate on different color channels. For example, separate filters may be applied to the red, green, and blue channels.

Pooling

Pooling is a downsampling operation commonly used in Convolutional Neural Networks (CNNs) to reduce the spatial dimensions of the input data and extract the most relevant information. The two main types of pooling are Max Pooling and Average Pooling.

Max Pooling: In max pooling, a small window (typically 2x2 or 3x3) moves over the input data, and for each window, the maximum value is retained. This helps capture the most prominent features in that local region. Max pooling is particularly effective in retaining important features and reducing spatial dimensions.

Average Pooling: In average pooling, instead of taking the maximum value, the average value of the elements in the window is computed. Average pooling smoothens the representation, and it is useful in scenarios where the exact spatial location of features is less critical.

Pooling Mechanism Steps:

Sliding Window: A small window (pooling window) slides over the input data in both horizontal and vertical directions.
Pooling Operation: For each window, the pooling operation (max or average) is applied, and a single value is retained.
Reduction of Spatial Dimensions: The output of pooling is a downsampled version of the input, with reduced spatial dimensions.

![[pooling.gif | center | 550]] source: https://pub.towardsai.net/introduction-to-pooling-layers-in-cnn-dafe61eabe34

Purpose and Benefits:

Spatial Hierarchical Features: Pooling helps create a spatial hierarchy of features. As the window slides, the network captures the most relevant information in each local region.
Translation Invariance: Pooling provides a degree of translation invariance, making the network less sensitive to small variations in the spatial location of features.
Parameter Reduction: By reducing the spatial dimensions, pooling reduces the number of parameters in the network, helping manage computational complexity and avoiding overfitting.

Overall architecture

CNN.png | center | 650