What are word embeddings and why do they matter
· Category: AI & Machine Learning
Short answer
Word embeddings map words to dense vectors where geometric relationships encode semantic and syntactic meaning.
How it works
Algorithms like Word2Vec and GloVe learn embeddings by predicting word co-occurrence statistics across large corpora. Words that appear in similar contexts develop similar vector representations. Contextual embeddings from transformers generate dynamic representations that change based on surrounding words, disambiguating homonyms. Cosine similarity between vectors measures semantic relatedness, and vector arithmetic can capture analogies such as king minus man plus woman approximating queen.
Example
In a trained embedding space, the words "doctor" and "physician" cluster closely together, while "doctor" and "banana" are far apart. Querying nearest neighbors of "france" might return "spain," "germany," and "italy," reflecting geographic relationships.
Why it matters
Embeddings transform discrete text into continuous representations that machine learning models can process. They enable semantic search, recommendation systems, and transfer learning in NLP. High-quality embeddings reduce the need for massive labeled datasets by capturing general language knowledge from raw text.
Example
import torch
import torch.nn as nn
model = nn.Sequential(
nn.Linear(784, 256),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(256, 10)
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
This snippet defines a simple neural network with dropout for regularization, a cross-entropy loss, and the Adam optimizer in PyTorch.