Thursday, January 4, 2024

How a Computer Sees an Image









Have you ever wondered how a computer sees an image? From the computer's perspective, an image is nothing but a matrix of numbers. Each cell of the matrix is a pixel, and the value of the pixel defines a color combination or an intensity of light.

Image as Function

An image can be expressed as a mathematical function of two variables, x and y, which specify a position in a two-dimensional region. A digital image is a matrix of pixels, and the pixel is its fundamental unit: each pixel has a value that represents the intensity of light at a specific location in the image. Now, let's examine an example image after overlaying a pixel grid on it.
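To make the "image as a function" idea concrete, here is a minimal sketch using NumPy. The tiny array and the helper name `intensity` are made up for illustration; the only thing to take from it is that F(x, y) is a lookup into the pixel matrix, with x as the column and y as the row.

```python
import numpy as np

# A hypothetical 3 x 4 grayscale image (3 rows, 4 columns), values 0-255.
image = np.array(
    [
        [0, 50, 100, 150],
        [25, 75, 125, 175],
        [255, 200, 128, 64],
    ],
    dtype=np.uint8,
)


def intensity(img, x, y):
    """Return F(x, y): the intensity at column x and row y."""
    # NumPy indexes as [row, column], so y comes first.
    return int(img[y, x])


print(intensity(image, 2, 1))  # pixel in column 2, row 1
```

Note that the index order is swapped relative to the usual (x, y) notation, which is a common source of bugs when working with image arrays.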


The image above has dimensions of 28 × 28: it is 28 pixels wide and 28 pixels high. The total number of pixels is therefore 28 × 28 = 784. Similarly, an image that is 224 pixels high and 250 pixels wide is represented by a matrix of shape (224, 250), that is, rows by columns. Each element in the matrix corresponds to a pixel and gives the brightness intensity of that pixel. In an 8-bit grayscale image, a value of 0 corresponds to black, a value of 255 corresponds to white, and the values in between are shades of gray.
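The arithmetic above can be checked with a short NumPy sketch. The image here is synthetic (a black square canvas with a white block in the middle), not the one from the post, and the variable names are my own.

```python
import numpy as np

# Synthetic 28 x 28 grayscale image: all black (0) to start with.
gray = np.zeros((28, 28), dtype=np.uint8)

# Paint a white (255) square in the middle.
gray[10:18, 10:18] = 255

height, width = gray.shape   # NumPy reports shape as (rows, columns)
total_pixels = gray.size     # rows * columns

print(height, width, total_pixels)
print(gray.min(), gray.max())
```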

Color Images

Grayscale images assign a single intensity value to each pixel, whereas color images in the RGB system have three channels (red, green, and blue). In other words, a color image is encoded as three matrices: one holds the red intensity of each pixel, another the green intensity, and the third the blue intensity.
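As a rough sketch of the three-matrix idea, the snippet below builds a hypothetical 2 × 2 color image as a NumPy array of shape (height, width, 3) and splits it into its channels. The channels-last layout is an assumption; it is the layout most Python image libraries use, but not the only one.

```python
import numpy as np

# Hypothetical 2 x 2 RGB image, stored as (height, width, channels).
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]      # pure red pixel
rgb[0, 1] = [0, 255, 0]      # pure green pixel
rgb[1, 0] = [0, 0, 255]      # pure blue pixel
rgb[1, 1] = [255, 255, 255]  # white pixel: all channels at full intensity

# Each channel is its own 2 x 2 matrix of intensities.
red = rgb[..., 0]
green = rgb[..., 1]
blue = rgb[..., 2]

print(red)
print(green)
print(blue)
```

Stacking the three matrices back along the last axis with `np.stack([red, green, blue], axis=-1)` gives the original image again.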

RGB Channels of a Color Image

Image Processing

In machine learning (ML) projects, it is customary to include a data preprocessing or cleaning phase. As a machine learning engineer, you will spend a significant portion of your time on preprocessing and preparing data before building your learning model. The objective of this stage is to get your data into a form that the model can analyze and process easily. The same applies to images: to use the available dataset effectively, you need to preprocess the images before feeding them into the machine learning model.
Image processing includes basic operations such as image resizing. Preprocessing tasks also include geometric and color transformations, converting color images to grayscale, and more. The collected data is often messy and comes from several sources.
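Here is one way the color-to-grayscale step could look in plain NumPy. It is a sketch, not necessarily what the notebook does: it assumes a channels-last RGB array and uses the common luminance weights 0.299, 0.587, and 0.114 (from ITU-R BT.601). The function name `to_grayscale` is my own.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an (H, W, 3) uint8 RGB array to an (H, W) uint8 grayscale array."""
    # Weighted sum of the R, G, and B channels (BT.601 luminance weights).
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights
    # Round to the nearest integer and keep values in the 0-255 range.
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# Hypothetical 1 x 3 image: a black, a white, and a pure red pixel.
sample = np.array([[[0, 0, 0], [255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
gray_sample = to_grayscale(sample)
print(gray_sample)
```

Green gets the largest weight because the human eye is most sensitive to it. A plain average of the three channels also works as a simpler alternative.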

Converting Color Image to Grayscale Image


Data Augmentation

Another common preprocessing technique is augmenting the existing dataset with altered copies of the original images. Scaling, rotations, and other affine transformations are often used to increase the size of your dataset and expose the neural network to a wide range of image variations. This increases the chance that your model recognizes objects regardless of their appearance or position.
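A minimal sketch of the idea, limited to flips and right-angle rotations because those need nothing beyond NumPy. The helper name `augment` is hypothetical, and real projects usually rely on a dedicated augmentation library that also handles scaling and arbitrary angles.

```python
import numpy as np


def augment(image):
    """Return a list of simple altered copies of an image array."""
    return [
        np.fliplr(image),      # mirror left to right
        np.flipud(image),      # mirror top to bottom
        np.rot90(image, k=1),  # rotate 90 degrees counterclockwise
        np.rot90(image, k=2),  # rotate 180 degrees
    ]


# Hypothetical 2 x 3 image so the effect of each transform is easy to read.
original = np.arange(6, dtype=np.uint8).reshape(2, 3)
copies = augment(original)

for copy in copies:
    print(copy.shape)
```

One original image becomes five training examples here (the original plus four copies). Note that the 90 degree rotation swaps height and width, so non-square images may need resizing afterwards.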

Feature Extraction

Feature extraction is an essential part of the computer vision pipeline. Deep learning (DL) models revolve around the idea of extracting useful features that clearly describe the objects in the image.
In the context of machine learning, a feature is a measurable attribute or characteristic of an observed phenomenon. Features are the inputs given to a machine learning model so that it can produce a prediction or classification. Suppose you want to predict the price of a house. The model will use input features such as the area, the number of rooms, and the number of bathrooms to produce a predicted price. Choosing good features that clearly distinguish your objects improves the predictive power of machine learning algorithms.
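To show what "a measurable attribute" can mean for an image, here is a small hand-crafted example: it turns a grayscale image into a short feature vector made of the mean brightness and a normalized intensity histogram. This is only an illustration with a made-up helper (`simple_features`); deep learning models learn their features from data rather than using fixed ones like these.

```python
import numpy as np


def simple_features(gray, bins=4):
    """Return [mean brightness, histogram bin fractions...] for a grayscale image."""
    # Count how many pixels fall into each intensity range.
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    # Convert counts to fractions so the result does not depend on image size.
    hist = hist / gray.size
    mean_brightness = gray.mean() / 255.0
    return np.concatenate(([mean_brightness], hist))


# Hypothetical 4 x 4 image: top half white, bottom half black.
half = np.zeros((4, 4), dtype=np.uint8)
half[:2] = 255

features = simple_features(half)
print(features)
```

A vector like this plays the same role as the area and room count in the house example: it is what the model actually receives as input.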

I will write in more detail about Data Augmentation and Feature Extraction for Image Classification using Deep Learning in Python, so stay tuned!
Here is the link to my notebook with the results shown above: Computer Vision - Practical Approach | Kaggle

