1. Introduction to Feature Representation
Why Representations Are Foundational in AI
At the core of every intelligent system lies the question: how is knowledge represented? Whether we're predicting stock prices, recognizing faces, or translating text, we must transform raw data into structured formats that algorithms can operate on.
Feature Representation: A Philosophical and Technical Definition
What Is Feature Representation?
Feature representation is both the act and artifact of translating raw empirical phenomena into structured, mathematical entities that preserve information essential for reasoning, learning, and decision-making.
It is the bridge between the ontological world (what is) and the computational world (what can be learned or acted upon), encoding data in a way that both machines and theorists can manipulate meaningfully.
Philosophical Perspective
Feature representation functions as a synthetic abstraction: it neither captures the full scope of reality nor merely compresses it. Instead, it selectively projects aspects of reality into structured spaces (vectorial, relational, topological) where structure, similarity, and transformation become legible.
Technical Definition
A feature representation is a mapping: \( \Phi : \mathcal{X} \rightarrow \mathcal{F} \)
where \( \mathcal{X} \) is the raw input space (e.g., pixels, waveforms, text) and \( \mathcal{F} \) is the feature space, often structured by algebraic or geometric principles (e.g., \( \mathbb{R}^n \), graphs, manifolds).
Desired Properties of Representations
- Discriminative Power: separates signal from noise
- Invariance: ignores irrelevant transformations (e.g., rotation, scaling)
- Learnability: facilitates efficient statistical or neural modeling
- Interpretability: provides insight into data and decisions
Epistemic Role
Feature representations are the epistemic vessels by which intelligent systems apprehend the world, shaping not only how machines "see" but also what they are capable of "knowing."
These structured formats, feature representations, are the very language machines use to perceive, reason, and act.
Without meaningful representation, even the most powerful learning algorithms cannot extract insight from data. Representation is not just preprocessing; it is the epistemological foundation of machine intelligence.
Representation as Abstraction
A representation is an abstraction that reduces complexity while preserving meaning. It captures the essence of input data in a form amenable to learning:
- A photo becomes a matrix of pixel intensities
- A sentence becomes a sequence of vectors
- A molecule becomes a graph of atoms and bonds
Through abstraction, we discard noise and emphasize structure, making data more task-relevant and model-compatible.
Key Properties of Representations
- Dimensionality: degrees of freedom in the representation. Too low → loss of nuance; too high → curse of dimensionality.
- Invariance: ignores irrelevant transformations (e.g., translation in images).
- Learnability: how easily a model can extract meaningful structure.
- Interpretability: can humans understand or diagnose the representation?
Real-World Examples
| Domain | Input | Representation | Learning Paradigm |
|---|---|---|---|
| Vision | Image | Pixel tensor (3D) | CNN, ResNet |
| Chemistry | Molecule | Graph (nodes/edges) | GNN, molecular fingerprints |
| NLP | Sentence | Token embeddings | Transformers, RNNs |
| Knowledge | Facts, rules | Logic/symbolic graph | Neuro-symbolic systems |
Visual: Data Transformation Map
graph TD;
A[Raw Data] --> B[Preprocessing]
B --> C[Vector/Tensor Representation]
C --> D[Latent Embedding]
D --> E[Prediction / Output]
style D fill:#cce5ff,stroke:#333,stroke-width:2px;
style C fill:#d4edda,stroke:#333,stroke-width:2px;
2. Linear & Algebraic Spaces
Vector Spaces
At the heart of machine learning lies the vector space: a structured domain where data points are represented as vectors in \( \mathbb{R}^n \). Each feature corresponds to a dimension, and relationships between data points are captured via operations like dot products, norms, and projections.
- Expressiveness: Most real-world data can be encoded as vectors after suitable preprocessing.
- Computational Simplicity: Enables fast, gradient-based optimization.
- Ubiquity: Foundational for models like SVMs, linear regression, neural nets.
Let \( \vec{x} \in \mathbb{R}^n \), meaning:
\[ \vec{x} = [x_1, x_2, \dots, x_n] \]
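A minimal sketch of these operations in NumPy; the example vectors are illustrative, not taken from the text:

```python
import numpy as np

# Illustrative (assumed) 3-dimensional vectors.
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

dot = x @ y                                   # inner product <x, y>
norm_x = np.linalg.norm(x)                    # L2 norm ||x||
cos_sim = dot / (norm_x * np.linalg.norm(y))  # cosine similarity
proj_y_on_x = (dot / norm_x**2) * x           # projection of y onto x

print(dot, norm_x, cos_sim, proj_y_on_x)
```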
Matrix & Tensor Representations
Many data types require structured collections of values:
- Matrix: 2D representation in \( \mathbb{R}^{m \times n} \), e.g., grayscale images
- Tensor: generalization to n dimensions, e.g., RGB images, multi-head attention weights
import torch
tensor = torch.randn(3, 3, 224, 224)  # Batch of 3 RGB images: (B, C, H, W)
These representations preserve geometry and locality, which is crucial for convolutional and transformer-based models.
Group Representations
A group is a set of transformations (e.g., rotations, reflections) equipped with composition, identity, and inverses; acting on data, these transformations leave its essential structure unchanged. In ML, group theory supports symmetry-aware learning.
- Equivariance: CNNs maintain structure under translation
- Physics-Informed ML: Models invariant to SO(3) rotations in 3D space
Example: exploiting rotational symmetry (SO(3), or more generally E(3)) for molecular modeling with E(n)-Equivariant Graph Neural Networks.
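The snippet below is a small, self-contained check (with arbitrary layer sizes chosen for illustration) that a convolution with circular padding is equivariant to translations, the property referenced above:

```python
import torch
import torch.nn as nn

# A minimal sketch of translation equivariance: shifting the input of a
# convolution shifts its output in the same way. Layer sizes are illustrative.
torch.manual_seed(0)
conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular", bias=False)

x = torch.randn(1, 1, 16, 16)
shift = 3
x_shifted = torch.roll(x, shifts=shift, dims=-1)  # circular shift along width

y = conv(x)
y_shifted = conv(x_shifted)

# Shifting the output of the original input reproduces the output of the shifted input.
print(torch.allclose(torch.roll(y, shifts=shift, dims=-1), y_shifted, atol=1e-5))
```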
Intuition Recap
| Structure Type | Shape / Domain | Real-World Use | ML Application |
|---|---|---|---|
| Vector | \( \mathbb{R}^n \) | Tabular data, embeddings | Linear models, MLPs |
| Matrix | \( \mathbb{R}^{m \times n} \) | Images, text, audio | CNN, RNN |
| Tensor | \( \mathbb{R}^{n_1 \times \dots \times n_k} \) | Video, multi-modal input | Attention, 3D models |
| Group Rep. | Algebraic group actions | Symmetries, invariants | Equivariant networks |
3. Probabilistic Representations
Distributions as Representations
Probabilistic feature representations model uncertainty, variability, and generative structure within data. Rather than assigning fixed values, they encode likelihoods, acknowledging the stochastic nature of both the world and our observations.
Core Distributions
- Gaussian (\( \mathcal{N}(\mu, \sigma^2) \)): foundational for generative modeling and common noise assumptions
- Bernoulli: binary outcomes (e.g., click vs no-click)
- Multinomial / Categorical: discrete events (e.g., token distributions, class labels)
Applications
- Bayesian inference: priors/posteriors over model parameters
- Generative models: VAE, GAN, diffusion processes
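As a minimal sketch of distributions-as-representations, the example below (with illustrative parameters) uses torch.distributions to sample and score data under Gaussian and Bernoulli models:

```python
import torch
from torch.distributions import Normal, Bernoulli

# Instead of a fixed value, each quantity is described by a distribution.
gauss = Normal(loc=0.0, scale=1.0)   # N(0, 1); parameters are assumptions
click = Bernoulli(probs=0.3)         # P(click) = 0.3; assumed for illustration

samples = gauss.sample((5,))         # draw 5 samples
log_p = gauss.log_prob(samples)      # log-density of each sample
print(samples, log_p)

print(click.sample((5,)), click.log_prob(torch.tensor(1.0)))
```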
Statistical Moments
Statistical moments describe summary properties of distributions. They're often used to compactly encode variation in data.
| Moment | Mathematical Form | Intuition |
|---|---|---|
| Mean (\( \mu \)) | \( \mathbb{E}[X] \) | Central tendency |
| Variance (\( \sigma^2 \)) | \( \mathbb{E}[(X - \mu)^2] \) | Dispersion / spread |
| Skewness | \( \mathbb{E}[(X - \mu)^3] / \sigma^3 \) | Asymmetry of distribution |
| Kurtosis | \( \mathbb{E}[(X - \mu)^4] / \sigma^4 \) | Tail heaviness |
Use cases include:
- Feature engineering (finance, time-series)
- Anomaly detection
- Texture analysis in computer vision
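A small sketch of moment-based feature engineering on a synthetic, assumed signal:

```python
import numpy as np
from scipy.stats import skew, kurtosis

# The four moments from the table become a compact 4-D feature vector.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)   # an assumed right-skewed sample

features = np.array([x.mean(), x.var(), skew(x), kurtosis(x)])
print(features)  # [mean, variance, skewness, excess kurtosis]
```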
Density Functions
The probability density function (PDF) defines a distribution over a continuous variable. Learning the PDF enables models to perform density estimation and capture generative structure.
- Normalizing Flows: learn bijective mappings from noise to data
- Diffusion Models: reverse Gaussian noise to form coherent samples
Score Functions
The score function is the gradient of the log-density: \( \nabla_x \log p(x) \)
- Score Matching: estimate gradients of log-probability
- Generative SDEs: leverage the score for controlled sampling
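A minimal sketch of the score function, computed via autograd for a standard Gaussian, where the analytic answer \( -x \) is known:

```python
import torch

# Score of N(0, I): grad_x log p(x) = -x. Autograd recovers it from log_prob.
x = torch.randn(4, 2, requires_grad=True)
log_p = torch.distributions.Normal(0.0, 1.0).log_prob(x).sum()
score = torch.autograd.grad(log_p, x)[0]

print(torch.allclose(score, -x))  # True: matches the analytic score
```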
Visuals
- Normal distributions with shifting \( \mu \), \( \sigma \), and skew
- Score field: show \( \nabla_x \log p(x) \) vectors pulling toward density peaks
- Flow transformation: noise → data via invertible functions
4. Geometric & Topological Spaces
Feature representations grounded in geometry and topology enable machines to reason not just about data values, but about shape, structure, and curvature, which are core to perception, movement, and relational understanding.
Euclidean Space
The most familiar setting, Euclidean space assumes flat geometry with standard distance measures.
- L2 norm: \( \|\vec{x} - \vec{y}\|_2 \)
- Assumes orthogonality and linearity
Applications:
- Clustering algorithms: k-means, DBSCAN
- Convolutional Neural Networks: filters on pixel grids
Most deep learning layers (dense, conv) implicitly operate in Euclidean space.
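A minimal Euclidean-space sketch on synthetic, assumed data: k-means assigns points to clusters by L2 distance to centroids.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two assumed Gaussian blobs in the plane; k-means separates them by L2 distance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])
```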
Riemannian Manifolds
When data lies on a non-linear surface, Euclidean assumptions fail. A Riemannian manifold is a smooth space that's locally Euclidean but has global curvature.
Examples:
- SPD Matrices (e.g., covariance, diffusion tensors)
- Spheres, tori (e.g., SO(3) pose spaces)
Key Properties:
- Local inner products vary with location
- Geodesics replace straight lines
- Requires tools from differential geometry
Applications:
- Diffusion models on manifolds
- Kernel learning in curved spaces
- Pose estimation
Hyperbolic Space
In contrast to Euclidean flatness and spherical curvature, hyperbolic spaces have negative curvature, which is ideal for hierarchical and tree-like data.
Characteristics:
- Distance grows exponentially with radius
- Can embed hierarchical structures with low distortion
- Often modeled with the Poincarรฉ ball
Applications:
- Knowledge graph embeddings
- Hierarchical classification
- GNNs for taxonomy-like graphs
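A minimal sketch of distance in the Poincaré ball model, assuming points lie strictly inside the unit ball:

```python
import numpy as np

# d(x, y) = arccosh(1 + 2 ||x - y||^2 / ((1 - ||x||^2)(1 - ||y||^2)))
def poincare_distance(x, y):
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / denom)

origin = np.zeros(2)
print(poincare_distance(origin, np.array([0.5, 0.0])))   # moderate distance
print(poincare_distance(origin, np.array([0.95, 0.0])))  # grows rapidly near the boundary
```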
Topological Features
Topology studies shape and connectivity rather than distances and angles, offering invariance to continuous deformations.
Key Tools:
- Persistent Homology: finds multi-scale connected components, loops, voids
- Betti Numbers: counts of n-dimensional topological features
- Simplicial Complexes: combinatorial generalizations of graphs used to represent shapes
Why It Matters:
- Invariant to stretching, bending
- Captures "holes," "loops," and flares in the data
Applications:
- Sensor networks
- Brain region analysis
- Anomaly detection via topological signatures
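A hedged sketch of persistent homology using Giotto-TDA (listed under libraries below); the point-cloud construction is synthetic and the call pattern follows that library's documented interface:

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence  # Giotto-TDA

# A noisy circle (assumed data) has one prominent loop: Betti-1 = 1.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
circle = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))

vr = VietorisRipsPersistence(homology_dimensions=[0, 1])
diagrams = vr.fit_transform(circle[None, :, :])  # (n_samples, n_features, 3): birth, death, dim
print(diagrams.shape)
```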
Summary Table
| Space Type | Curvature | Key Idea | Use Case |
|---|---|---|---|
| Euclidean | 0 | Flat distances | Clustering, CNN |
| Riemannian | Variable | Curved geometry | Pose, kernel methods |
| Hyperbolic | < 0 | Hierarchical structure | Graphs, taxonomies |
| Topological | N/A | Connectivity/shape | Anomaly detection, sensors |
5. Latent & Embedding Spaces
Latent and embedding spaces represent the hidden, internal geometries where machine learning models distill complex data into tractable, often interpretable, forms. These spaces are learned, structured, and often non-Euclidean, revealing how data "wants" to be organized.
Latent Embeddings
Latent variables are abstract internal coordinates inferred by models. Though unobserved, they are essential for generating or decoding observable data.
Sources:
- Autoencoders (AEs): compressed representations via reconstruction loss
- Variational Autoencoders (VAEs): probabilistic latent variables with smoothness constraints
- Transformers: encode token sequences into semantic embedding vectors
Role:
- Capture conceptual abstraction
- Enable dimensionality reduction, generation, and transfer learning
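A minimal autoencoder sketch (layer sizes are illustrative assumptions) showing how a low-dimensional latent code arises from a reconstruction objective:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)          # the latent representation
        return self.decoder(z), z

model = AutoEncoder()
x = torch.randn(16, 784)             # assumed flattened inputs
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction loss shapes the encoding
print(z.shape, loss.item())
```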
Feature Manifolds
Real-world data often lives on low-dimensional, non-linear manifolds embedded in high-dimensional spaces.
Key Ideas:
- Intrinsic Dimensionality: actual degrees of freedom ≪ raw dimensions
- Manifold Learning: extract structure via nonlinear embeddings
- t-SNE: preserves local proximity
- Isomap: preserves global geodesic distances
- LLE: preserves local linear neighborhoods
Applications:
- Visualization of high-dimensional data
- Noise filtering and denoising
- Model introspection and analysis
from sklearn.manifold import TSNE
embedded = TSNE(n_components=2).fit_transform(features)  # features: array of shape (n_samples, n_dims)
Metric Learning
Metric learning aims to create feature spaces where semantic similarity is aligned with geometric closeness.
Objectives:
- Pull similar data points together
- Push dissimilar points apart
Techniques:
- Contrastive Loss
- Triplet Loss
- Siamese Networks
Applications:
- Face verification
- Image retrieval
- Recommender systems
The geometry of the learned space encodes task-specific knowledge: not just what, but how similar.
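A minimal sketch of triplet-based metric learning on assumed random embeddings:

```python
import torch
import torch.nn.functional as F

# Assumed anchor/positive/negative embeddings; in practice they come from an encoder.
anchor = torch.randn(8, 64, requires_grad=True)
positive = anchor.detach() + 0.1 * torch.randn(8, 64)   # near the anchor
negative = torch.randn(8, 64)                           # unrelated points

loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
loss.backward()   # gradients pull positives closer and push negatives apart
print(loss.item())
```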
Key Insights
| Space Type | Nature | Learning Role | Use Case |
|---|---|---|---|
| Latent Space | Abstract, model-specific | Compress, generalize | Generation, transfer learning |
| Manifold Space | Non-linear low-D surface | Discover structure | Visualization, denoising |
| Metric Space | Distance-preserving | Encode similarity | Retrieval, verification |
6. Graph & Relational Representations
Graph and relational structures represent data not just as items, but as interconnected systems. This enables learning from topology, hierarchy, and multi-agent interactions, which is foundational in domains where relationships matter as much as features.
Graphs
A graph is formally defined as: \( G = (V, E) \)
- Nodes (V): Entities (e.g., atoms, documents, users)
- Edges (E): Relationships or interactions
- Optional Features: Attributes on nodes or edges (e.g., weights, types)
Why Graphs?
- Model complex, non-grid systems: molecules, social networks, citations
- Operate in non-Euclidean domains
- Use local connectivity and parameter sharing
Learning Frameworks:
- Graph Neural Networks (GNNs): iterative neighborhood aggregation
- Message Passing: sum/mean/max over neighbor messages
Graphs encode not just what something is, but how it connects.
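A minimal sketch of one message-passing step in plain PyTorch; the 4-node ring graph and feature sizes are illustrative assumptions:

```python
import torch

# Each node's new feature is the mean of its (self-looped) neighbours,
# followed by a shared linear transform and a nonlinearity.
num_nodes, in_dim, out_dim = 4, 8, 16
x = torch.randn(num_nodes, in_dim)                      # node features
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])  # undirected ring

adj = torch.zeros(num_nodes, num_nodes)
adj[edges[:, 0], edges[:, 1]] = 1.0
adj = adj + adj.t() + torch.eye(num_nodes)              # symmetric + self-loops
adj = adj / adj.sum(dim=1, keepdim=True)                # row-normalized (mean aggregation)

weight = torch.nn.Linear(in_dim, out_dim)
h = torch.relu(weight(adj @ x))                          # aggregate, then transform
print(h.shape)                                           # (4, 16)
```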
Multi-Relational Sets
Beyond simple graphs, multi-relational structures represent typed and directed relationships, a key element in symbolic and structured AI.
Examples:
- Knowledge Graphs: triples of (head, relation, tail)
- Relational Databases: tables joined by keys
- Tensor Representations: 3D relation tensors for models like RESCAL, TuckER, DistMult
Applications:
- Symbolic reasoning and rule learning
- Relational world models in RL
- Program synthesis and high-level planning
Multi-relational learning bridges neural and symbolic paradigms, blending statistical learning with logic.
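A minimal sketch of DistMult-style triple scoring; embedding sizes and example triples are assumptions:

```python
import torch

# DistMult: score(h, r, t) = sum_i h_i * r_i * t_i. Higher score = more plausible triple.
num_entities, num_relations, dim = 100, 10, 32
entity_emb = torch.nn.Embedding(num_entities, dim)
relation_emb = torch.nn.Embedding(num_relations, dim)

def score(head, relation, tail):
    h = entity_emb(head)
    r = relation_emb(relation)
    t = entity_emb(tail)
    return (h * r * t).sum(dim=-1)

triples = torch.tensor([[0, 1, 2], [5, 3, 7]])   # assumed (head, relation, tail) indices
print(score(triples[:, 0], triples[:, 1], triples[:, 2]))
```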
Visualization Idea
- Hover to inspect node features and connectivity
- Click to view GNN embeddings over layers
- Highlight communities and structural motifs
Summary Table
| Structure Type | Key Element | Learning Focus | Use Case |
|---|---|---|---|
| Graph (mono-rel) | Nodes + edges | Local/global patterns | Molecules, social, citation |
| Multi-relational | Triples, tensors | Entity-relation modeling | Knowledge bases, planning |
| Symbolic-graph mix | Logic + links | Reasoning over structure | Neuro-symbolic AI |
7. Domain-Specific Spaces
Some feature representations arise not from universal structures like vectors or graphs, but from the specific physics or logic of the data domain. These structures capture domain-relevant properties such as periodicity, relevance, and spatial orientation, enabling models to reason in more natural coordinates.
Frequency Domain
The frequency domain reveals structure hidden in the time or spatial domain, analyzing how data behaves over cycles rather than at individual points.
Core Tools:
- Fourier Transform: decomposes signal into sine/cosine bases
- Wavelet Transform: captures local and multiscale frequency components
Applications:
- Audio signal processing (speech, music)
- Electrophysiology (EEG, MEG)
- Seismic data analysis
- Spectral CNNs on graphs
Frequency representations encode harmonic structure, essential for modeling periodic and oscillatory phenomena.
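A minimal frequency-domain sketch: recovering the dominant frequency of an assumed noisy 5 Hz sinusoid from its FFT magnitude.

```python
import numpy as np

fs = 100.0                                  # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
signal = np.sin(2 * np.pi * 5.0 * t) + 0.1 * np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
print(freqs[np.argmax(spectrum[1:]) + 1])   # ≈ 5.0 Hz (DC bin skipped)
```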
Attention Weights
Attention-based representations apply contextual weighting to features based on relevance to a task or query, yielding highly adaptive, dynamic encodings.
The attention mechanism is typically formalized as:
\[
\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^T}{\sqrt{d}} \right) V
\]
- \( Q \): Query
- \( K \): Key
- \( V \): Value
Applications:
- Transformers (BERT, GPT)
- Speech-to-text alignment
- Document relevance modeling
Attention enables context-aware learning by encoding dynamic importance and relational salience.
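The formula above translates almost line-for-line into code; the tensor shapes below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d ** 0.5   # (batch, n_queries, n_keys)
    weights = F.softmax(scores, dim=-1)           # each row sums to 1
    return weights @ V, weights

Q = torch.randn(2, 4, 8)   # assumed batch of 2, 4 tokens, d = 8
K = torch.randn(2, 4, 8)
V = torch.randn(2, 4, 8)
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)   # (2, 4, 8), (2, 4, 4)
```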
Complex & Quaternion Representations
These advanced number systems extend real-valued vectors, encoding additional dimensions like rotation, phase, and symmetry.
Complex Numbers:
- Encode amplitude and phase
- Used in signal processing and quantum computing
Quaternions:
- Extend complex numbers to four dimensions
- Represent rotations without gimbal lock
- Enable smooth transformations in 3D space
Applications:
- 3D motion (robotics, animation)
- Quantum machine learning
- Geometric deep learning
These representations matter where direction, symmetry, or interference are more informative than raw magnitudes.
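A minimal sketch of rotating a 3D vector with a unit quaternion \( q = (w, x, y, z) \) via \( v' = q v q^{*} \); the axis, angle, and vector are illustrative choices:

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of two quaternions given as (w, x, y, z) arrays.
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

angle, axis = np.pi / 2, np.array([0.0, 0.0, 1.0])   # 90 degrees about z (assumed)
q = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
q_conj = q * np.array([1, -1, -1, -1])

v = np.array([1.0, 0.0, 0.0])
rotated = quat_mul(quat_mul(q, np.concatenate([[0.0], v])), q_conj)[1:]
print(np.round(rotated, 3))   # ≈ [0, 1, 0]
```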
Summary Table
| Domain Feature | Representation Type | Use Case |
|---|---|---|
| Periodicity | Frequency (Fourier, Wavelet) | Audio, EEG, seismic |
| Contextual salience | Attention weights | NLP, multimodal transformers |
| Rotation / Phase | Complex / Quaternion numbers | Signal processing, quantum ML |
8. Logic & Symbolic Representations
Unlike statistical or geometric models, symbolic representations operate over discrete structures and formal logic, enabling machines to reason with precision, constraints, and explicit rules.
These systems align closely with human cognition: they are interpretable, compositional, and rule-based, which makes them crucial for domains requiring explainability, inference, and correctness.
Symbolic Logic
Symbolic logic represents knowledge through facts and rules in a formal language, such as propositional or first-order logic.
Structure:
- Atoms: atomic facts (e.g., IsRed(x), Parent(Alice, Bob))
- Formulas: logical expressions (e.g., \( \forall x\, (Bird(x) \rightarrow CanFly(x)) \))
- Inference Rules: mechanisms for reasoning (e.g., Modus Ponens)
Applications:
- Theorem proving
- Knowledge bases (e.g., Cyc, Wikidata)
- Rule-based expert systems
Symbolic features are inherently interpretable and support exact reasoning, but struggle with noise and ambiguity.
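A minimal sketch of rule-based inference: forward chaining with Modus Ponens over a toy, assumed knowledge base.

```python
# Toy facts and rules (assumed): if the premise holds, conclude the consequent.
facts = {"Bird(tweety)"}
rules = [("Bird(tweety)", "CanFly(tweety)")]

changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)          # apply Modus Ponens
            changed = True

print(facts)   # {'Bird(tweety)', 'CanFly(tweety)'}
```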
Hybrid Representations
Modern AI seeks to fuse the pattern recognition of neural networks with the rigor of symbolic logic, forming neuro-symbolic systems that can learn, reason, and explain.
Approaches:
- Differentiable logic: neural approximations of symbolic rules
- Logic-guided training: embed logical constraints in the loss function
- Probabilistic logic: model uncertainty in structured domains
Key Tools:
- DeepProbLog: combines Prolog with neural predicates
- Logic Tensor Networks: blend fuzzy logic with tensor algebra
- Neural Theorem Provers: learn to perform symbolic reasoning chains
Benefits:
- Combines learnability with logical generalization
- Enables explainability and rule extraction
- Powers robust planning in structured environments
Summary Table
| Representation Type | Strength | Application Domain |
|---|---|---|
| Symbolic Logic | Precise, explainable | Reasoning, rule-based systems |
| Neuro-symbolic Hybrid | Learnable + inferable | Robotics, scientific reasoning |
| Probabilistic Logic | Uncertainty-aware logic | AI planning, knowledge graphs |
9. Energy & Metric-Based Representations
These representations define how data is situated within a space of forces and distances, where learning is seen not just as prediction, but as shaping a landscape or relational field that guides inference, classification, and generation.
Energy Functions
In energy-based models (EBMs), a data point is represented by its energy level: a scalar that encodes the compatibility of the input with the model.
Key Idea:
- Lower energy = higher plausibility
- The model learns an energy landscape with valleys (data) and hills (noise)
Applications:
- Hopfield Networks: associative memory via energy minimization
- Diffusion Models: denoising paths toward data modes
- Contrastive EBMs: push down the energy of data, up for distractors
Formalization:
The energy function maps inputs to scalar values:
\( E_\theta : \mathcal{X} \rightarrow \mathbb{R} \)
There is no explicit output; inference is searching for low-energy states.
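A minimal sketch of a Hopfield-style energy function on an assumed stored pattern, showing that the stored state has lower energy than a corrupted one:

```python
import numpy as np

# Hopfield energy E(x) = -1/2 x^T W x with Hebbian weights for one stored pattern.
pattern = np.array([1, -1, 1, -1, 1, -1])
W = np.outer(pattern, pattern) - np.eye(pattern.size)

def energy(x):
    return -0.5 * x @ W @ x

corrupted = pattern.copy()
corrupted[:2] *= -1                          # flip two bits
print(energy(pattern), energy(corrupted))    # stored pattern sits in a deeper valley
```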
Metric Spaces
Metric-based representations encode **similarity** through geometric structure: the closer two points are, the more semantically related they are.
Formal Definition:
A metric \( d(x, y) \) satisfies:
- Non-negativity: \( d(x, y) \geq 0 \)
- Identity: \( d(x, y) = 0 \Leftrightarrow x = y \)
- Symmetry: \( d(x, y) = d(y, x) \)
- Triangle Inequality: \( d(x, z) \leq d(x, y) + d(y, z) \)
Common Metrics:
- L2 Norm: Euclidean distance
- Cosine Similarity: angle between vectors
- Mahalanobis Distance: covariance-aware scaling (all three computed in the sketch below)
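A quick sketch of the three measures just listed, on assumed synthetic data:

```python
import numpy as np
from scipy.spatial.distance import euclidean, cosine, mahalanobis

# Assumed correlated 2-D data; the inverse covariance drives the Mahalanobis distance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])
x, y = X[0], X[1]

VI = np.linalg.inv(np.cov(X, rowvar=False))
print(euclidean(x, y))          # L2 distance
print(1 - cosine(x, y))         # cosine similarity (SciPy's cosine() returns a distance)
print(mahalanobis(x, y, VI))    # covariance-aware distance
```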
Techniques:
- Siamese Networks: pairwise similarity learning
- Triplet Loss: anchor-positive-negative discrimination
- Prototypical Networks: class means in embedding space
Applications:
- Face verification
- Image search and retrieval
- Few-shot learning
Visual Concepts
- Energy Contours: heatmap showing energy basins vs noise peaks
- Embedding Geometry: visualize pairwise push-pull via t-SNE or PCA
Comparison Table
| Representation Type | Learning View | Model Examples | Use Cases |
|---|---|---|---|
| Energy Function | Minimize cost | EBM, Hopfield, Diffusion | Generation, memory, anomaly detection |
| Metric Embedding | Preserve similarity | Siamese, Triplet, ProtoNet | Verification, retrieval |
Emerging Trends and Extensions
As the field of representation learning matures, several cutting-edge trends are redefining how we conceptualize, design, and analyze feature spaces, blending deep learning with geometry, physics, information theory, and ethics.
1. Neural Fields (Implicit Representations)
Represent continuous functions like images, 3D shapes, or fields using neural networks instead of discrete grids or tensors.
- Examples: NeRF (Neural Radiance Fields), SIREN
- Why: Enables high-resolution, continuous feature modeling with compact MLPs
2. Geometric Deep Learning (GDL)
Unifies learning across manifolds, graphs, and group-symmetric structures with tools from modern geometry and physics.
- Topics: Group-equivariant networks, gauge equivariance, directional GNNs
- Why: Exploits geometric priors; a highly active area in research and application
3. Tensor Programs & Feature Dynamics
Applies asymptotic analysis to neural networks, viewing them as dynamical systems that propagate signals and representations.
- Frameworks: Neural Tangent Kernels (NTK), Mean Field Theory
- Why: Provides a theoretical lens on how representations evolve in deep networks
4. Representation Compression & Sparsity
Seeks minimal representations that retain predictive power while reducing redundancy.
- Topics: Lottery Ticket Hypothesis, compressed sensing, structured sparsity
- Why: Enables efficient, interpretable, and deployable models
5. Representation Ethics & Robustness
Addresses critical concerns in fairness, bias, and adversarial vulnerability in learned representations.
- Techniques: Adversarial training, disentangled representations, causal modeling
- Why: Ensures trustworthy and socially responsible AI systems
Advanced Topics to Deepen Academic Rigor
| Topic | Why It Matters |
|---|---|
| Category Theory in Representation | Abstracts relationships across representation types via morphisms and functors |
| Information Bottleneck Principle | Balances compression and relevance in latent spaces |
| Lie Groups & Symmetry Representation | Core to equivariant networks in physics-aware AI |
| Topos Theory | Unifies logic, set theory, and topology in AI foundations |
| Geodesic Embeddings | Use shortest paths on manifolds for better inductive biases |
Comparative Summary Table
This summary acts as a conceptual compass, helping you choose appropriate representation spaces based on data structure, modeling objectives, and theoretical assumptions.
| Space Type | Key Property | Typical Use Case | Example Model |
|---|---|---|---|
| Vector Space | Linear algebraic basis | Tabular data, regression | SVM, Logistic Regression |
| Latent Space | Compressed, abstract | Autoencoding, generative models | VAE, Autoencoder |
| Graph Space | Relational links | Molecules, social networks | Graph Neural Networks (GNN) |
| Topological Space | Connectivity invariance | Sensor topology, persistence | TDA (Persistent Homology) |
| Attention Space | Weighted contextual relevance | Language, vision, alignment | Transformer, BERT |
| Density Space | Probability & uncertainty | Generative flows, modeling noise | Normalizing Flows, VAEs |
| Metric Space | Distance preservation | Similarity tasks, verification | Siamese Network, Triplet Net |
| Energy Space | Energy minimization | Generation, memory recall | EBMs, Hopfield Networks |
| Manifold Space | Curved, structured surface | Dimensionality reduction, clustering | Isomap, LLE, Diffusion Models |
| Symbolic Space | Logical/formal reasoning | Planning, theorem proving | Logic Tensor Networks, DeepProbLog |
Resources & Articles
Foundational Papers
| Title | Author(s) | Focus |
|---|---|---|
| A Few Useful Things to Know about Machine Learning | Pedro Domingos | Broad ML insights, including importance of representation |
| Representation Learning: A Review and New Perspectives | Yoshua Bengio et al. | Seminal work on representation learning |
| Auto-Encoding Variational Bayes | Kingma & Welling | Introduced VAEs for probabilistic representations |
| Inductive Biases, Deep Learning, and Graph Networks | Battaglia et al. (DeepMind) | Geometric and graph-based representations |
| Attention is All You Need | Vaswani et al. | Introduced transformer-based attention representations |
Books
- Deep Learning (Goodfellow, Bengio, Courville): a classic with deep treatment of representation learning, manifolds, and structured data.
- Mathematics for Machine Learning (Deisenroth, Faisal, Ong): rigorous treatment of vector spaces, matrices, and probability.
- Geometric Deep Learning (Bronstein et al.): modern foundation for grids, groups, graphs, and geodesics.
Frameworks & Libraries
| Library | Purpose |
|---|---|
| PyTorch Geometric | Graph and relational learning |
| HuggingFace Transformers | Attention-based language modeling |
| scikit-learn | Manifold learning, classical models |
| Giotto-TDA | Topological Data Analysis in Python |
| TensorFlow Probability | Probabilistic modeling and distributions |
| JAX | Neural fields, differentiable programming |
Online Courses & Lectures
- Stanford CS231n: Deep Learning for Computer Vision
- MIT 6.S191: Introduction to Deep Learning
- Oxford Geometric Deep Learning: Bronstein's landmark lectures
- Yann LeCun on Representation Learning: a conceptual overview from a pioneer