📘 1. Introduction to Feature Representation


🔍 Why Representations Are Foundational in AI

At the core of every intelligent system lies the question: how is knowledge represented? Whether we're predicting stock prices, recognizing faces, or translating text, we must transform raw data into structured formats that algorithms can operate on.

📘 Feature Representation: A Philosophical and Technical Definition


🧠 What Is Feature Representation?

Feature representation is both the act and artifact of translating raw empirical phenomena into structured, mathematical entities that preserve information essential for reasoning, learning, and decision-making.

It is the bridge between the ontological world (what is) and the computational world (what can be learned or acted upon), encoding data in a way that both machines and theorists can manipulate meaningfully.


🎭 Philosophical Perspective

Feature representation functions as a synthetic abstraction: it neither captures the full scope of reality nor merely compresses it. Instead, it selectively projects aspects of reality into structured spaces (vectorial, relational, topological) where structure, similarity, and transformation become legible.


🔢 Technical Definition

A feature representation is a mapping:

\( \Phi : \mathcal{X} \rightarrow \mathcal{F} \)

where \( \mathcal{X} \) is the raw input space (e.g., pixels, waveforms, text) and \( \mathcal{F} \) is the feature space, often structured by algebraic or geometric principles (e.g., \( \mathbb{R}^n \), graphs, manifolds).
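
To make this concrete, here is a minimal sketch of a hand-crafted feature map in Python (NumPy assumed; the function name phi and the choice of summary statistics are illustrative, not canonical). It sends a raw grayscale image to a small vector in the feature space.

import numpy as np

def phi(image):
    """A toy feature map: raw pixel grid -> R^4 of summary statistics."""
    x = image.astype(np.float64).ravel()        # flatten the raw input
    return np.array([x.mean(), x.std(), x.min(), x.max()])

raw = np.random.randint(0, 256, size=(28, 28))  # a stand-in 28x28 grayscale image
features = phi(raw)                             # a point in the feature space
print(features.shape)                           # (4,)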


โš™๏ธ Desired Properties of Representations

  • Discriminative Power: separates signal from noise
  • Invariance: ignores irrelevant transformations (e.g., rotation, scaling)
  • Learnability: facilitates efficient statistical or neural modeling
  • Interpretability: provides insight into data and decisions

🧭 Epistemic Role

Feature representations are the epistemic vessels by which intelligent systems apprehend the world, shaping not only how machines "see" but also what they are capable of "knowing."

These structured formats, feature representations, are the very language machines use to perceive, reason, and act.

Without meaningful representation, even the most powerful learning algorithms cannot extract insight from data. Representation is not just preprocessing; it is the epistemological foundation of machine intelligence.

🎭 Representation as Abstraction

A representation is an abstraction that reduces complexity while preserving meaning. It captures the essence of input data in a form amenable to learning:

  • ๐Ÿ–ผ๏ธ A photo becomes a matrix of pixel intensities
  • ๐Ÿ“ A sentence becomes a sequence of vectors
  • ๐Ÿงช A molecule becomes a graph of atoms and bonds

Through abstraction, we discard noise and emphasize structure, making data more task-relevant and model-compatible.


โš™๏ธ Key Properties of Representations

  • Dimensionality: degrees of freedom in the representation. Too low โ†’ loss of nuance; too high โ†’ curse of dimensionality.
  • Invariance: ignores irrelevant transformations (e.g., translation in images).
  • Learnability: how easily a model can extract meaningful structure.
  • Interpretability: can humans understand or diagnose the representation?

๐ŸŒ Real-World Examples

Domain Input Representation Learning Paradigm
Vision Image Pixel tensor (3D) CNN, ResNet
Chemistry Molecule Graph (nodes/edges) GNN, molecular fingerprints
NLP Sentence Token embeddings Transformers, RNNs
Knowledge Facts, rules Logic/symbolic graph Neuro-symbolic systems

๐Ÿ“˜ Visual: Data Transformation Map


graph TD;
    A[Raw Data] --> B[Preprocessing]
    B --> C[Vector/Tensor Representation]
    C --> D[Latent Embedding]
    D --> E[Prediction / Output]
    style D fill:#cce5ff,stroke:#333,stroke-width:2px;
    style C fill:#d4edda,stroke:#333,stroke-width:2px;
  

🧱 2. Linear & Algebraic Spaces


✅ Vector Spaces

At the heart of machine learning lies the vector space: a structured domain where data points are represented as vectors in ℝⁿ. Each feature corresponds to a dimension, and relationships between data points are captured via operations like dot products, norms, and projections.

  • Expressiveness: Nearly any kind of data can be encoded as vectors after suitable preprocessing.
  • Computational Simplicity: Enables fast, gradient-based optimization.
  • Ubiquity: Foundational for models like SVMs, linear regression, neural nets.

Let \( \vec{x} \in \mathbb{R}^n \), meaning:

\[ \vec{x} = [x_1, x_2, \dots, x_n] \]
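
As a quick illustration of the operations mentioned above (dot products, norms, projections), here is a small NumPy sketch; the vectors are arbitrary examples.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

dot = x @ y                                   # inner product <x, y>
norm_x = np.linalg.norm(x)                    # Euclidean (L2) norm of x
proj_y_on_x = (dot / norm_x**2) * x           # projection of y onto x
cos_sim = dot / (norm_x * np.linalg.norm(y))  # cosine of the angle between x and y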


🧱 Matrix & Tensor Representations

Many data types require structured collections of values:

  • Matrix: 2D representation ∈ ℝ^(m×n), e.g., grayscale images
  • Tensor: Generalization to n dimensions, e.g., RGB images, multi-head attention

import torch
tensor = torch.randn(3, 3, 224, 224)  # Batch of 3 RGB images, shape (B, C, H, W)

These representations preserve geometry and locality, which is crucial for convolutional and transformer-based models.


🔄 Group Representations

A group is a set of transformations (e.g., rotations, reflections) that is closed under composition and has an identity and inverses; applying them to data leaves its underlying structure unchanged. In ML, group theory supports symmetry-aware learning.

  • Equivariance: CNNs maintain structure under translation
  • Physics-Informed ML: Models invariant to SO(3) rotations in 3D space

Example: using the rotation group SO(3) for molecular modeling with E(n)-Equivariant Graph Neural Networks.
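
Here is a minimal sketch of that idea (PyTorch assumed; the convolution and image are toy placeholders): with circular padding, convolving a translated image gives the translated output, i.e., the convolution is translation-equivariant.

import torch

conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode='circular', bias=False)
x = torch.randn(1, 1, 16, 16)                        # a random single-channel image

shift = lambda t: torch.roll(t, shifts=2, dims=-1)   # translate 2 pixels (circularly)

print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-5))  # True: conv commutes with translation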


🧠 Intuition Recap

| Structure Type | Shape / Domain          | Real-World Use           | ML Application       |
|----------------|-------------------------|--------------------------|----------------------|
| Vector         | ℝⁿ                      | Tabular data, embeddings | Linear models, MLPs  |
| Matrix         | ℝ^(m×n)                 | Images, text, audio      | CNN, RNN             |
| Tensor         | ℝ^(n₁×...×nₖ)           | Video, multi-modal input | Attention, 3D models |
| Group Rep.     | Algebraic group actions | Symmetries, invariants   | Equivariant networks |

📊 3. Probabilistic Representations


📈 Distributions as Representations

Probabilistic feature representations model uncertainty, variability, and generative structure within data. Rather than assigning fixed values, they encode likelihoods, acknowledging the stochastic nature of both the world and our observations.

Core Distributions

  • Gaussian (\( \mathcal{N}(\mu, \sigma^2) \)): foundational for generative modeling and for standard noise assumptions
  • Bernoulli: binary outcomes (e.g., click vs no-click)
  • Multinomial / Categorical: discrete events (e.g., token distributions, class labels)

Applications

  • 🧠 Bayesian inference: priors/posteriors over model parameters
  • 🎲 Generative models: VAE, GAN, diffusion processes
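
As a short sketch (using PyTorch's torch.distributions; the parameter values are arbitrary), each of the core distributions above can act as a representation that assigns log-likelihoods to observations:

import torch
from torch.distributions import Normal, Bernoulli, Categorical

gaussian = Normal(loc=0.0, scale=1.0)                       # N(0, 1)
click = Bernoulli(probs=0.3)                                # click vs no-click
tokens = Categorical(probs=torch.tensor([0.7, 0.2, 0.1]))   # 3-way token distribution

print(gaussian.log_prob(torch.tensor(0.5)))   # log-density of a continuous observation
print(click.log_prob(torch.tensor(1.0)))      # log-probability of a click
print(tokens.sample())                        # draw a discrete token index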

📉 Statistical Moments

Statistical moments describe summary properties of distributions. They're often used to compactly encode variation in data.

| Moment                    | Mathematical Form                      | Intuition                 |
|---------------------------|----------------------------------------|---------------------------|
| Mean (\( \mu \))          | \( \mathbb{E}[X] \)                    | Central tendency          |
| Variance (\( \sigma^2 \)) | \( \mathbb{E}[(X - \mu)^2] \)          | Dispersion / spread       |
| Skewness                  | \( \mathbb{E}[((X - \mu)/\sigma)^3] \) | Asymmetry of distribution |
| Kurtosis                  | \( \mathbb{E}[((X - \mu)/\sigma)^4] \) | Tail heaviness            |

Use cases include:

  • 📊 Feature engineering (finance, time-series)
  • ⚠️ Anomaly detection
  • 🖼️ Texture analysis in computer vision

📘 Density Functions

The probability density function (PDF) defines a distribution over a continuous variable. Learning the PDF enables models to perform density estimation and capture generative structure.

  • Normalizing Flows: learn bijective mappings from noise to data
  • Diffusion Models: reverse Gaussian noise to form coherent samples

Score Functions

The score function is the gradient of the log-density: \( \nabla_x \log p(x) \)

  • 🧮 Score Matching: estimate gradients of log-probability
  • 🌀 Generative SDEs: leverage score for controlled sampling
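
As a minimal sketch (PyTorch autograd assumed), the score of a standard Gaussian can be computed automatically and checked against its closed form, which is simply \( -x \):

import torch
from torch.distributions import Normal

p = Normal(0.0, 1.0)
x = torch.tensor([0.5, -1.2, 2.0], requires_grad=True)

log_p = p.log_prob(x).sum()              # scalar so we can differentiate
score, = torch.autograd.grad(log_p, x)   # gradient of log p(x) with respect to x

print(score)                             # matches the analytic score -x
print(-x.detach())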

🎨 Visuals

  • 📈 Normal distributions with shifting \( \mu \), \( \sigma \), and skew
  • 🧭 Score field: show \( \nabla_x \log p(x) \) vectors pulling toward density peaks
  • 🔄 Flow transformation: noise → data via invertible functions

📐 4. Geometric & Topological Spaces


Feature representations grounded in geometry and topology enable machines to reason not just about data values, but about shape, structure, and curvature, which are core to perception, movement, and relational understanding.


📏 Euclidean Space

The most familiar setting, Euclidean space assumes flat geometry with standard distance measures.

  • L2 norm: \( \|\vec{x} - \vec{y}\|_2 \)
  • Assumes orthogonality and linearity

Applications:

  • 📊 Clustering algorithms: k-means, DBSCAN
  • 🖼️ Convolutional Neural Networks: filters on pixel grids
Most deep learning layers (dense, conv) implicitly operate in Euclidean space.

๐ŸŒ Riemannian Manifolds

When data lies on a non-linear surface, Euclidean assumptions fail. A Riemannian manifold is a smooth space that's locally Euclidean but has global curvature.

Examples:

  • SPD Matrices (e.g., covariance, diffusion tensors)
  • Spheres, tori (e.g., SO(3) pose spaces)

Key Properties:

  • Local inner products vary with location
  • Geodesics replace straight lines
  • Requires tools from differential geometry

Applications:

  • 🌀 Diffusion models on manifolds
  • 🧠 Kernel learning in curved spaces
  • 📐 Pose estimation

🔻 Hyperbolic Space

In contrast to Euclidean flatness and spherical curvature, hyperbolic spaces have negative curvature, which is ideal for hierarchical and tree-like data.

Characteristics:

  • Distance grows exponentially with radius
  • Can embed hierarchical structures with low distortion
  • Often modeled with the Poincaré ball

Applications:

  • 📚 Knowledge graph embeddings
  • 🏷️ Hierarchical classification
  • 🔗 GNNs for taxonomy-like graphs
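
As a minimal sketch (NumPy assumed; the points are arbitrary and must lie strictly inside the unit ball), the Poincaré-ball distance has a simple closed form:

import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincaré ball (requires ||u|| < 1 and ||v|| < 1)."""
    diff = np.linalg.norm(u - v) ** 2
    denom = (1 - np.linalg.norm(u) ** 2) * (1 - np.linalg.norm(v) ** 2)
    return np.arccosh(1 + 2 * diff / denom)

u = np.array([0.1, 0.2])
v = np.array([0.7, -0.5])
print(poincare_distance(u, v))   # distances blow up as points approach the boundary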

🌀 Topological Features

Topology studies shape and connectivity rather than distances and angles, offering invariance to continuous deformations.

Key Tools:

  • Persistent Homology: finds multi-scale connected components, loops, voids
  • Betti Numbers: counts of n-dimensional topological features
  • Simplicial Complexes: combinatorial structures that generalize graphs to represent higher-dimensional shapes

Why It Matters:

  • Invariant to stretching, bending
  • Captures "holes," "loops," flares in the data

Applications:

  • 📡 Sensor networks
  • 🧠 Brain region analysis
  • ⚠️ Anomaly detection via topological signatures

🧭 Summary Table

| Space Type  | Curvature | Key Idea               | Use Case                   |
|-------------|-----------|------------------------|----------------------------|
| Euclidean   | 0         | Flat distances         | Clustering, CNN            |
| Riemannian  | Variable  | Curved geometry        | Pose, kernel methods       |
| Hyperbolic  | < 0       | Hierarchical structure | Graphs, taxonomies         |
| Topological | N/A       | Connectivity/shape     | Anomaly detection, sensors |

🧠 5. Latent & Embedding Spaces


Latent and embedding spaces represent the hidden, internal geometries where machine learning models distill complex data into tractable, often interpretable, forms. These spaces are learned, structured, and often non-Euclidean, revealing how data "wants" to be organized.


🧠 Latent Embeddings

Latent variables are abstract internal coordinates inferred by models. Though unobserved, they are essential for generating or decoding observable data.

Sources:

  • Autoencoders (AEs): compressed representations via reconstruction loss
  • Variational Autoencoders (VAEs): probabilistic latent variables with smoothness constraints
  • Transformers: encode token sequences into semantic embedding vectors

Role:

  • Capture conceptual abstraction
  • Enable dimensionality reduction, generation, and transfer learning
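
As a minimal sketch of how such a latent code arises (PyTorch assumed; layer sizes are arbitrary), a small autoencoder compresses each input into a low-dimensional vector via a reconstruction loss:

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)                   # latent representation
        return self.decoder(z), z

model = AutoEncoder()
x = torch.rand(32, 784)                       # a batch of flattened images
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)       # reconstruction loss shapes the latent space
print(z.shape)                                # torch.Size([32, 16])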

🔄 Feature Manifolds

Real-world data often lives on low-dimensional, non-linear manifolds embedded in high-dimensional spaces.

Key Ideas:

  • Intrinsic Dimensionality: actual degrees of freedom ≪ raw dimensions
  • Manifold Learning: extract structure via nonlinear embeddings
  • t-SNE: preserves local proximity
  • Isomap: preserves global geodesic distances
  • LLE: preserves local linear neighborhoods

Applications:

  • 📊 Visualization of high-dimensional data
  • 🧹 Noise filtering and denoising
  • 🧠 Model introspection and analysis

import numpy as np
from sklearn.manifold import TSNE
features = np.random.rand(100, 50)                        # placeholder: 100 samples, 50-D features
embedded = TSNE(n_components=2).fit_transform(features)   # 2-D embedding preserving local proximity

🔗 Metric Learning

Metric learning aims to create feature spaces where semantic similarity is aligned with geometric closeness.

Objectives:

  • Pull similar data points together
  • Push dissimilar points apart

Techniques:

  • ๐Ÿ” Contrastive Loss
  • ๐Ÿ”บ Triplet Loss
  • ๐Ÿ‘ฏ Siamese Networks

Applications:

  • 📷 Face verification
  • 🖼️ Image retrieval
  • 🎯 Recommender systems
The geometry of the learned space encodes task-specific knowledge: not just what, but how similar.
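
As a brief sketch (PyTorch assumed; the embedding network and inputs are placeholders, not a trained model), a triplet loss pulls an anchor toward a positive example and pushes it away from a negative one:

import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))   # toy embedding network
triplet = nn.TripletMarginLoss(margin=1.0)

anchor   = embed(torch.randn(8, 64))   # e.g., features of a reference face
positive = embed(torch.randn(8, 64))   # same identity
negative = embed(torch.randn(8, 64))   # different identity

loss = triplet(anchor, positive, negative)   # small when d(a, p) + margin < d(a, n)
loss.backward()                              # gradients reshape the embedding geometry
print(loss.item())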

🔬 Key Insights

| Space Type     | Nature                   | Learning Role        | Use Case                      |
|----------------|--------------------------|----------------------|-------------------------------|
| Latent Space   | Abstract, model-specific | Compress, generalize | Generation, transfer learning |
| Manifold Space | Non-linear low-D surface | Discover structure   | Visualization, denoising      |
| Metric Space   | Distance-preserving      | Encode similarity    | Retrieval, verification       |

🔗 6. Graph & Relational Representations


Graph and relational structures represent data not just as items, but as interconnected systems. This enables learning from topology, hierarchy, and multi-agent interactions, which is foundational in domains where relationships matter as much as features.


🔗 Graphs

A graph is formally defined as: \( G = (V, E) \)

  • Nodes (V): Entities (e.g., atoms, documents, users)
  • Edges (E): Relationships or interactions
  • Optional Features: Attributes on nodes or edges (e.g., weights, types)

Why Graphs?

  • 🧬 Model complex, non-grid systems: molecules, social networks, citations
  • 🌀 Operate in non-Euclidean domains
  • ⚙️ Use local connectivity and parameter sharing

Learning Frameworks:

  • Graph Neural Networks (GNNs): iterative neighborhood aggregation
  • Message Passing: sum/mean/max over neighbor messages
Graphs encode not just what something is, but how it connects.
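
As a minimal sketch of one round of message passing (PyTorch assumed; the graph and feature sizes are toy examples), each node averages its neighbors' features through the adjacency matrix and then applies a shared linear map:

import torch
import torch.nn as nn

# Toy undirected graph on 4 nodes with edges 0-1, 1-2, 2-3.
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
X = torch.randn(4, 8)                    # node feature matrix (4 nodes, 8 features each)

A_hat = A + torch.eye(4)                 # add self-loops so a node keeps its own message
deg = A_hat.sum(dim=1, keepdim=True)
W = nn.Linear(8, 16, bias=False)         # shared weights across all nodes

H = torch.relu(W((A_hat / deg) @ X))     # mean aggregation, then a learned transform
print(H.shape)                           # torch.Size([4, 16])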

🧰 Multi-Relational Sets

Beyond simple graphs, multi-relational structures represent typed and directed relationships, a key element in symbolic and structured AI.

Examples:

  • 📚 Knowledge Graphs: Triplets of (head, relation, tail)
  • 🗃️ Relational Databases: tables joined by keys
  • 🧮 Tensor Representations: 3D relations for models like RESCAL, TuckER, DistMult

Applications:

  • 🧠 Symbolic reasoning and rule learning
  • 🎮 Relational world models in RL
  • 🛠️ Program synthesis and high-level planning
Multi-relational learning bridges neural and symbolic paradigms, blending statistical learning with logic.
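
As a tiny sketch of tensor-style relational scoring in the spirit of DistMult (PyTorch assumed; the embeddings are random placeholders, not a trained model), a triple (head, relation, tail) is scored by a three-way product of embeddings:

import torch

num_entities, num_relations, dim = 100, 10, 32
entity_emb = torch.randn(num_entities, dim)      # one vector per entity
relation_emb = torch.randn(num_relations, dim)   # one diagonal relation "matrix" per relation

def distmult_score(head, relation, tail):
    # DistMult-style score: sum_i e_h[i] * w_r[i] * e_t[i]
    return (entity_emb[head] * relation_emb[relation] * entity_emb[tail]).sum()

print(distmult_score(3, 1, 42))   # higher scores indicate more plausible triples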

📘 Visualization Idea

  • 👆 Hover to inspect node features and connectivity
  • 🧠 Click to view GNN embeddings over layers
  • 🖇️ Highlight communities and structural motifs

🔬 Summary Table

| Structure Type     | Key Element      | Learning Focus           | Use Case                    |
|--------------------|------------------|--------------------------|-----------------------------|
| Graph (mono-rel)   | Nodes + edges    | Local/global patterns    | Molecules, social, citation |
| Multi-relational   | Triples, tensors | Entity-relation modeling | Knowledge bases, planning   |
| Symbolic-graph mix | Logic + links    | Reasoning over structure | Neuro-symbolic AI           |

🎯 7. Domain-Specific Spaces


Some feature representations arise not from universal structures like vectors or graphs, but from the specific physics or logic of the data domain. These structures capture domain-relevant properties such as periodicity, relevance, and spatial orientation, enabling models to reason in more natural coordinates.


🎵 Frequency Domain

The frequency domain reveals structure hidden in the time or spatial domain, analyzing how data behaves over cycles rather than individual points.

Core Tools:

  • Fourier Transform: decomposes signal into sine/cosine bases
  • Wavelet Transform: captures local and multiscale frequency components

Applications:

  • 🔊 Audio signal processing (speech, music)
  • 🧠 Electrophysiology (EEG, MEG)
  • 🌍 Seismic data analysis
  • 🧮 Spectral CNNs on graphs
Frequency representations encode harmonic structure, essential for modeling periodic and oscillatory phenomena.
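
As a short sketch (NumPy assumed; the signal is synthetic), the discrete Fourier transform exposes the dominant frequency of a noisy sinusoid:

import numpy as np

fs = 1000                                        # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t) + 0.3 * np.random.standard_normal(t.size)  # 50 Hz tone + noise

spectrum = np.abs(np.fft.rfft(signal))           # magnitude spectrum: frequency-domain features
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
print(freqs[np.argmax(spectrum)])                # approximately 50.0 Hz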

🎯 Attention Weights

Attention-based representations apply contextual weighting to features based on relevance to a task or query, yielding highly adaptive, dynamic encodings.

The attention mechanism is typically formalized as:
\[ \text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^T}{\sqrt{d}} \right) V \]

  • \( Q \): Query
  • \( K \): Key
  • \( V \): Value
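
A minimal sketch of this formula (PyTorch assumed; the shapes are arbitrary single-head examples):

import math
import torch

def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)   # QK^T / sqrt(d)
    weights = torch.softmax(scores, dim=-1)           # relevance of each key to each query
    return weights @ V                                # contextually weighted values

Q = torch.randn(1, 5, 64)    # (batch, query positions, d)
K = torch.randn(1, 7, 64)    # (batch, key positions, d)
V = torch.randn(1, 7, 64)
print(scaled_dot_product_attention(Q, K, V).shape)    # torch.Size([1, 5, 64])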

Applications:

  • 🧠 Transformers (BERT, GPT)
  • 🗣️ Speech-to-text alignment
  • 📚 Document relevance modeling
Attention enables context-aware learning by encoding dynamic importance and relational salience.

🌀 Complex & Quaternion Representations

These advanced number systems extend real-valued vectors, encoding additional dimensions like rotation, phase, and symmetry.

Complex Numbers:

  • Encode amplitude and phase
  • Used in signal processing and quantum computing

Quaternions:

  • Extend complex numbers to four dimensions, used to represent 3D orientation
  • Represent rotations without gimbal lock
  • Enable smooth transformations in 3D space

Applications:

  • 🤖 3D motion (robotics, animation)
  • 🔬 Quantum machine learning
  • 🧩 Geometric deep learning
These representations matter where direction, symmetry, or interference are more informative than raw magnitudes.
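
As a brief sketch (NumPy and SciPy assumed; the axis and angle are arbitrary), a unit quaternion rotates a 3D vector smoothly and without gimbal lock:

import numpy as np
from scipy.spatial.transform import Rotation

# Unit quaternion for a 90-degree rotation about the z-axis (SciPy uses [x, y, z, w] order).
angle = np.pi / 2
q = np.array([0.0, 0.0, np.sin(angle / 2), np.cos(angle / 2)])

r = Rotation.from_quat(q)
v = np.array([1.0, 0.0, 0.0])
print(r.apply(v))   # approximately [0, 1, 0]: the x-axis rotated onto the y-axis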

🧠 Summary Table

| Domain              | Feature Representation Type  | Use Case                      |
|---------------------|------------------------------|-------------------------------|
| Periodicity         | Frequency (Fourier, Wavelet) | Audio, EEG, seismic           |
| Contextual salience | Attention weights            | NLP, multimodal transformers  |
| Rotation / Phase    | Complex / Quaternion numbers | Signal processing, quantum ML |

🔣 8. Logic & Symbolic Representations


Unlike statistical or geometric models, symbolic representations operate over discrete structures and formal logic, enabling machines to reason with precision, constraints, and explicit rules.

These systems align closely with human cognition: they are interpretable, compositional, and rule-based, which makes them crucial for domains requiring explainability, inference, and correctness.


🔣 Symbolic Logic

Symbolic logic represents knowledge through facts and rules in a formal language, such as propositional or first-order logic.

Structure:

  • Atoms: atomic facts (e.g., IsRed(x), Parent(Alice, Bob))
  • Formulas: logical expressions (e.g., \( \forall x (Bird(x) \rightarrow CanFly(x)) \))
  • Inference Rules: mechanisms for reasoning (e.g., Modus Ponens)

Applications:

  • 📜 Theorem proving
  • 🧠 Knowledge bases (e.g., Cyc, Wikidata)
  • 📋 Rule-based expert systems
Symbolic features are inherently interpretable and support exact reasoning, but struggle with noise and ambiguity.
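
As a tiny sketch of rule-based inference (plain Python; the facts and rules are illustrative), forward chaining applies Modus Ponens until no new facts can be derived:

facts = {"Bird(Tweety)"}
rules = [
    ({"Bird(Tweety)"}, "CanFly(Tweety)"),           # Bird(x) -> CanFly(x), instantiated for Tweety
    ({"CanFly(Tweety)"}, "CanMigrate(Tweety)"),
]

# Forward chaining: repeatedly apply Modus Ponens until a fixed point is reached.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)   # now includes CanFly(Tweety) and CanMigrate(Tweety)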

🧠 Hybrid Representations

Modern AI seeks to fuse the pattern recognition of neural networks with the rigor of symbolic logic, forming neuro-symbolic systems that can learn, reason, and explain.

Approaches:

  • 🧮 Differentiable logic: neural approximations of symbolic rules
  • 🧭 Logic-guided training: embed logical constraints in the loss function
  • 🎲 Probabilistic logic: model uncertainty in structured domains

Key Tools:

  • 📌 DeepProbLog: combines Prolog with neural predicates
  • 📏 Logic Tensor Networks: blends fuzzy logic with tensor algebra
  • 🧩 Neural Theorem Provers: learn to perform symbolic reasoning chains

Benefits:

  • ⚡ Combines learnability with logical generalization
  • 🔍 Enables explainability and rule extraction
  • 🧠 Powers robust planning in structured environments

📊 Summary Table

| Representation Type   | Strength                | Application Domain             |
|-----------------------|-------------------------|--------------------------------|
| Symbolic Logic        | Precise, explainable    | Reasoning, rule-based systems  |
| Neuro-symbolic Hybrid | Learnable + inferable   | Robotics, scientific reasoning |
| Probabilistic Logic   | Uncertainty-aware logic | AI planning, knowledge graphs  |

🧮 9. Energy & Metric-Based Representations


These representations define how data is situated within a space of forces and distances, where learning is seen not just as prediction, but as shaping a landscape or relational field that guides inference, classification, and generation.


🧮 Energy Functions

In energy-based models (EBMs), a data point is represented by its energy level, a scalar that encodes the compatibility of the input with the model.

Key Idea:

  • 🔻 Lower energy = higher plausibility
  • 📉 Model learns an energy landscape with valleys (data) and hills (noise)

Applications:

  • 🧠 Hopfield Networks: associative memory via energy minimization
  • 🌫️ Diffusion Models: denoising paths toward data modes
  • ⚖️ Contrastive EBMs: push down energy of data, up for distractors

Formalization:

The energy function maps inputs to scalar values:
\( E_\theta : \mathcal{X} \rightarrow \mathbb{R} \)

There is no explicit output; inference is searching for low-energy states.
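
As a minimal sketch (PyTorch assumed; the network is an untrained placeholder), an energy function is just a scalar-valued network, and inference can be gradient descent on the input toward a low-energy state:

import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))   # E_theta: R^2 -> R

x = torch.randn(1, 2, requires_grad=True)   # start inference from a random point
optimizer = torch.optim.SGD([x], lr=0.1)

for _ in range(50):                          # descend the energy landscape
    optimizer.zero_grad()
    energy(x).sum().backward()
    optimizer.step()

print(energy(x).item())                      # a (locally) low-energy configuration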


๐Ÿ“ Metric Spaces

Metric-based representations encode **similarity** through geometric structure โ€” the closer two points are, the more semantically related they are.

Formal Definition:

A metric \( d(x, y) \) satisfies:

  • Non-negativity: \( d(x, y) \geq 0 \)
  • Identity: \( d(x, y) = 0 \Leftrightarrow x = y \)
  • Symmetry: \( d(x, y) = d(y, x) \)
  • Triangle Inequality: \( d(x, z) \leq d(x, y) + d(y, z) \)

Common Metrics:

  • ๐Ÿ“ L2 Norm: Euclidean distance
  • ๐Ÿงญ Cosine Similarity: angle between vectors
  • ๐Ÿ“ Mahalanobis Distance: covariance-aware scaling

Techniques:

  • 👯 Siamese Networks: pairwise similarity learning
  • 🔺 Triplet Loss: anchor-positive-negative discrimination
  • 📦 Prototypical Networks: class means in embedding space

Applications:

  • 👤 Face verification
  • 🔍 Image search and retrieval
  • 📚 Few-shot learning

📘 Visual Concepts

  • 🔥 Energy Contours: heatmap showing energy basins vs noise peaks
  • 🧭 Embedding Geometry: visualize pairwise push-pull via t-SNE or PCA

🔬 Comparison Table

| Representation Type | Learning View       | Model Examples             | Use Cases                             |
|---------------------|---------------------|----------------------------|---------------------------------------|
| Energy Function     | Minimize cost       | EBM, Hopfield, Diffusion   | Generation, memory, anomaly detection |
| Metric Embedding    | Preserve similarity | Siamese, Triplet, ProtoNet | Verification, retrieval               |

🔟 Comparative Summary Table


This summary acts as a conceptual compass, helping you choose appropriate representation spaces based on data structure, modeling objectives, and theoretical assumptions.

| Space Type        | Key Property                  | Typical Use Case                     | Example Model                      |
|-------------------|-------------------------------|--------------------------------------|------------------------------------|
| Vector Space      | Linear algebraic basis        | Tabular data, regression             | SVM, Logistic Regression           |
| Latent Space      | Compressed, abstract          | Autoencoding, generative models      | VAE, Autoencoder                   |
| Graph Space       | Relational links              | Molecules, social networks           | Graph Neural Networks (GNN)        |
| Topological Space | Connectivity invariance       | Sensor topology, persistence         | TDA (Persistent Homology)          |
| Attention Space   | Weighted contextual relevance | Language, vision, alignment          | Transformer, BERT                  |
| Density Space     | Probability & uncertainty     | Generative flows, modeling noise     | Normalizing Flows, VAEs            |
| Metric Space      | Distance preservation         | Similarity tasks, verification       | Siamese Network, Triplet Net       |
| Energy Space      | Energy minimization           | Generation, memory recall            | EBMs, Hopfield Networks            |
| Manifold Space    | Curved, structured surface    | Dimensionality reduction, clustering | Isomap, LLE, Diffusion Models      |
| Symbolic Space    | Logical/formal reasoning      | Planning, theorem proving            | Logic Tensor Networks, DeepProbLog |

📚 Resources & Articles


🔬 Foundational Papers

| Title | Author(s) | Focus |
|-------|-----------|-------|
| A Few Useful Things to Know about Machine Learning | Pedro Domingos | Broad ML insights, including importance of representation |
| Representation Learning: A Review and New Perspectives | Yoshua Bengio et al. | Seminal work on representation learning |
| Auto-Encoding Variational Bayes | Kingma & Welling | Introduced VAEs for probabilistic representations |
| Relational Inductive Biases, Deep Learning, and Graph Networks | Battaglia et al. (DeepMind) | Geometric and graph-based representations |
| Attention Is All You Need | Vaswani et al. | Introduced transformer-based attention representations |

📘 Books

  • Deep Learning – Goodfellow, Bengio, Courville
    A classic with deep treatment of representation learning, manifolds, and structured data.
  • Mathematics for Machine Learning – Deisenroth, Faisal, Ong
    Rigorous treatment of vector spaces, matrices, probability.
  • Geometric Deep Learning – Bronstein et al.
    Modern foundation for grids, groups, graphs, and geodesics.

🧰 Frameworks & Libraries

| Library | Purpose |
|---------|---------|
| PyTorch Geometric | Graph and relational learning |
| HuggingFace Transformers | Attention-based language modeling |
| scikit-learn | Manifold learning, classical models |
| Giotto-TDA | Topological Data Analysis in Python |
| TensorFlow Probability | Probabilistic modeling and distributions |
| JAX | Neural fields, differentiable programming |

๐ŸŒ Online Courses & Lectures

  • Stanford CS231n โ€“ Deep Learning for Computer Vision
  • MIT 6.S191 โ€“ Introduction to Deep Learning
  • Oxford Geometric Deep Learning โ€“ Bronsteinโ€™s landmark lectures
  • Yann LeCun: Representation Learning โ€“ Conceptual overview from a pioneer