What is a neural network and how does it learn?

Category: AI & Machine Learning

Short answer

A neural network is a stack of layers of interconnected units (neurons) that learns hierarchical representations by adjusting its connection weights and biases to minimize prediction error.

How it works

Input data flows through the network layer by layer in a forward pass. Each neuron computes a weighted sum of its inputs, adds a bias, and applies a non-linear activation function. The network's output is compared to the target using a loss function, which produces a single number measuring the error. Backpropagation then computes the gradient of the loss with respect to each weight and bias using the chain rule. An optimizer such as stochastic gradient descent updates each parameter by a small step against its gradient, scaled by a learning rate, which is the direction that reduces the loss.
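
The steps above can be sketched for a single neuron in plain Python. This is a minimal illustration, not a full network: the sigmoid activation, the squared-error loss, the learning rate of 0.5, and the starting values are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, inputs):
    """Forward pass: weighted sum plus bias, then a non-linear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def train_step(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent update on a squared-error loss for one example."""
    y = forward(weights, bias, inputs)
    loss = 0.5 * (y - target) ** 2
    # Chain rule: dL/dz = dL/dy * dy/dz
    dL_dy = y - target
    dy_dz = y * (1.0 - y)          # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz
    # dz/dw_i = x_i and dz/db = 1, so step each parameter against its gradient.
    new_weights = [w - lr * dL_dz * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * dL_dz
    return new_weights, new_bias, loss

# Illustrative starting point (assumed values).
weights, bias = [0.1, -0.2], 0.0
inputs, target = [1.0, 2.0], 1.0

losses = []
for _ in range(50):
    weights, bias, loss = train_step(weights, bias, inputs, target)
    losses.append(loss)
```

Each call to train_step performs one forward pass, one backward pass, and one parameter update, so the recorded loss shrinks as the loop runs. A real network repeats the same chain-rule bookkeeping across every layer, which is what backpropagation automates.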

Example

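A simple feedforward network trained on handwritten digits might have an input layer of 784 neurons (one per pixel of a 28×28 image), one hidden layer of 128 neurons with ReLU activation, and an output layer of 10 neurons with softmax, one per digit class. During training, the hidden units typically come to respond to stroke-like patterns in the image, and the output layer combines those responses into a score for each digit. In deeper networks, especially convolutional ones, early layers tend to learn edge detectors and later layers learn larger shapes.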
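
To make the layer sizes concrete, the sketch below counts the trainable parameters of that 784-128-10 network and shows what ReLU and softmax do to a vector of values. It is plain Python for illustration only; the raw scores are made-up numbers, not the output of a trained model.

```python
import math

# Layer sizes from the example: 784 inputs, 128 hidden units, 10 outputs.
layer_sizes = [784, 128, 10]

# A fully connected layer has (inputs * outputs) weights plus one bias per output.
param_counts = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(param_counts)

def relu(values):
    """Keep positive values, replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw output scores for the 10 digit classes (assumed values).
scores = [0.5, -1.2, 3.0, 0.0, 0.1, -0.3, 1.1, 0.2, -2.0, 0.4]
probs = softmax(scores)
predicted_digit = probs.index(max(probs))
```

With these sizes the hidden layer holds 100,480 parameters and the output layer 1,290, for 101,770 in total. The predicted digit is simply the index of the largest softmax probability.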

Why it matters

Neural networks form the backbone of modern artificial intelligence, enabling breakthroughs in computer vision, natural language processing, and generative modeling. Their ability to learn features directly from raw data removes much of the manual feature engineering that traditional methods require.

Code example

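import torch
import torch.nn as nn

# 784 inputs (one per pixel) -> 256 hidden units -> 10 class scores
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(0.2),      # zeroes each activation with probability 0.2 during training
    nn.Linear(256, 10),   # outputs raw logits; no explicit softmax layer
)
# CrossEntropyLoss applies log-softmax internally, so it expects raw logits
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)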

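This snippet defines a 784-256-10 feedforward network in PyTorch with dropout for regularization, a cross-entropy loss, and the Adam optimizer. The model has no explicit softmax layer because nn.CrossEntropyLoss applies it internally and expects raw logits.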
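
A single training step with these objects might look like the sketch below. The batch is random stand-in data (an assumption for illustration, not a real dataset), and the batch size of 32 is arbitrary; in practice the inputs and labels would come from a data loader.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random data for the illustration

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical stand-in batch: 32 flattened images and integer labels 0-9.
images = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

model.train()                      # enable dropout for training
optimizer.zero_grad()              # clear gradients left over from a previous step
logits = model(images)             # forward pass, shape (32, 10)
loss = criterion(logits, labels)   # cross-entropy computed on raw logits
loss.backward()                    # backpropagation fills .grad on each parameter
optimizer.step()                   # Adam updates the weights
```

Repeating these five calls over many batches is the whole training loop. For evaluation, calling model.eval() first turns dropout off so predictions are deterministic.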