What are word embeddings and why do they matter

· Category: AI & Machine Learning

Short answer

Word embeddings map words to dense vectors where geometric relationships encode semantic and syntactic meaning.

How it works

Algorithms like Word2Vec and GloVe learn embeddings by predicting word co-occurrence statistics across large corpora. Words that appear in similar contexts develop similar vector representations. Contextual embeddings from transformers generate dynamic representations that change based on surrounding words, disambiguating homonyms. Cosine similarity between vectors measures semantic relatedness, and vector arithmetic can capture analogies such as king minus man plus woman approximating queen.

Example

In a trained embedding space, the words "doctor" and "physician" cluster closely together, while "doctor" and "banana" are far apart. Querying nearest neighbors of "france" might return "spain," "germany," and "italy," reflecting geographic relationships.

Why it matters

Embeddings transform discrete text into continuous representations that machine learning models can process. They enable semantic search, recommendation systems, and transfer learning in NLP. High-quality embeddings reduce the need for massive labeled datasets by capturing general language knowledge from raw text.

Example

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 10)
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

This snippet defines a simple neural network with dropout for regularization, a cross-entropy loss, and the Adam optimizer in PyTorch.