334 Geometric Embedding of Multidimensional Random Variables: Geometric Operations of Joint Distributions, Marginals, and Conditionals
2026/05/25
Paper 3: Geometric Embedding of Multidimensional Random Variables: Geometric Operations of Joint Distributions, Marginals, and Conditionals
Author: Zhang Suhang
Affiliation: Luoyang, Henan
---
Abstract
This paper extends the probabilistic-geometric isomorphism framework of Paper 1 and the one-dimensional geometric realizations of Paper 2 to multidimensional random variables. We show that any n‑dimensional joint distribution with density p(x_1,\dots,x_n) can be embedded as a hypersurface in \mathbb{R}^{n+1}, namely the graph of a height function over \mathbb{R}^n. The height is either the density p itself (density‑surface realization) or the potential h=-\log p (potential realization), in which case the density equals the exponential of the negative height. The core results are as follows:
1. Marginalization: Integrating the joint distribution hypersurface along a coordinate direction is equivalent to orthogonal projection onto the lower-dimensional coordinate plane; the weighted volume of the projection yields the marginal distribution.
2. Conditioning: Fixing some coordinates gives the conditional distribution, which corresponds to a lower-dimensional profile obtained by slicing the hypersurface with a parallel plane; the normalized measure on the slice is the conditional distribution.
3. Independence: Random variables are independent iff the joint hypersurface decomposes as a direct product of coordinate subspaces; then the potential function is additive and the volume element factorizes.
4. Bayes’ theorem: The posterior distribution corresponds to renormalization on a slice; the prior corresponds to the geometric weight of the slice.
We provide explicit geometric constructions for the two‑dimensional case and show that, for a multivariate normal distribution, the potential hypersurface h=-\log p is an elliptic paraboloid whose coordinate slices are parabolas, while the density surface z=p(x) is a bell‑shaped mound with elliptical level sets. Slicing and projecting preserve this quadratic form, which is the geometric content of marginal and conditional normality. This paper supplies a purely geometric language for high‑dimensional probabilistic inference and lays the foundation for Paper 4 (axiomatic reconstruction) and Paper 5 (stochastic processes).
---
Keywords
Multivariate distributions; geometric embedding; marginalization as projection; conditioning as slicing; independence as direct product; geometrization of Bayes’ theorem
---
§1 Introduction
Paper 1 established a probabilistic-geometric isomorphism: every probability distribution corresponds to a geometric potential function h(x)=-\log p(x), and probability becomes the weighted volume \int e^{-h}d\nu. Paper 2 gave intuitive density‑curve realizations for all common one‑dimensional distributions, where the normal distribution corresponds to a parabola. However, real‑world applications often involve high‑dimensional random vectors – for example, multivariate normal distributions, conditional distributions in regression analysis, marginalization in latent variable models, etc. In high dimensions, geometric intuition becomes even more powerful because we can imagine the joint distribution as a surface or hypersurface.
The goal of this paper is to realize the joint density function p(x_1,\dots,x_n) as a graph in \mathbb{R}^{n+1}:
\Gamma = \{(x_1,\dots,x_n, z) : z = h(x_1,\dots,x_n)\},
where h = -\log p (the potential realization), or more intuitively z = p(x) (the density‑surface realization). We mainly adopt the density‑surface realization because it makes probability directly correspond to a volume element:
P(X \in A) = \int_A p(x)\,dx = \int_{x\in A}\int_{z=0}^{p(x)} dz\,dx = \text{volume under the surface } z=p(x).
For n=2, the joint density is a surface in three dimensions and probability is the volume under that surface. The marginal density at x is the area of the cross‑section of this solid at x, and the conditional density is the cross‑sectional slice curve rescaled to unit area.
The geometric operations described here will fundamentally change the way we think about high‑dimensional probabilistic inference: instead of performing complicated multiple integrals, we can directly read off results through geometric actions such as projection, slicing, and decomposition.
Organization: §2 establishes the general construction of multidimensional geometric embedding; §3 uses the two‑dimensional case to illustrate geometric graphs and operations; §4 proves marginalization = projection; §5 proves conditioning = slicing; §6 proves independence = direct product decomposition; §7 gives the geometric version of Bayes’ theorem; §8 verifies all operations with the multivariate normal distribution; §9 discusses higher dimensions (n>2) and computational geometry implications; §10 concludes.
---
§2 Construction of Multidimensional Geometric Embedding
2.1 Density‑surface embedding
Let X = (X_1,\dots,X_n) be a continuous random vector with joint density function p:\mathbb{R}^n\to[0,\infty). Define the embedding map
\iota: \mathbb{R}^n \to \mathbb{R}^{n+1},\qquad \iota(x) = (x,\, p(x)).
Its image \Sigma = \iota(\mathbb{R}^n) is an n-dimensional hypersurface (if p is smooth). We call \Sigma the density surface. For any Borel set A\subseteq\mathbb{R}^n,
P(X\in A) = \int_A p(x)\,dx = \operatorname{Vol}_{n+1}\bigl(\{(x,z): x\in A,\; 0\le z\le p(x)\}\bigr),
i.e., the (n+1)-dimensional volume of the cylindrical region over A between the hypersurface \Sigma and the hyperplane z=0. Here “volume” refers to Lebesgue measure in \mathbb{R}^{n+1}.
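The identity between probability and volume under the density surface can be checked numerically. The following is a minimal sketch, assuming n=2 and a density of two independent standard normals; the density, the rectangle A, and the helper names are illustrative choices, not part of the paper.

```python
import math

def std_bvn_pdf(x, y):
    """Density of two independent standard normals (illustrative assumption)."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

def volume_under_surface(pdf, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint Riemann sum of the volume between z = pdf(x, y) and z = 0."""
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            total += pdf(x, y)
    return total * dx * dy

def std_normal_cdf(t):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# A = [-1, 1] x [0, 2]: the volume over A should match P(X in A).
volume = volume_under_surface(std_bvn_pdf, -1.0, 1.0, 0.0, 2.0)
exact = (std_normal_cdf(1.0) - std_normal_cdf(-1.0)) * (
    std_normal_cdf(2.0) - std_normal_cdf(0.0)
)
print(volume, exact)
```

The closed-form comparison is available here only because the chosen density factorizes; for a general density the Riemann sum is the geometric quantity itself.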
2.2 Potential‑function embedding (alternative)
To be consistent with Paper 1, we could also use the potential embedding \iota_h(x) = (x, h(x)) with h=-\log p. Then P(A)=\int_A e^{-h(x)}dx, which is not directly the volume under a surface but an integral of a weight factor over the surface. The potential form is more convenient in theoretical derivations (especially connections to exponential families and information geometry), while the density surface is more intuitive for visualizing geometric operations. This paper mainly uses the density surface, but we will point out the connection to potential functions when necessary.
2.3 Discrete and mixed cases
For discrete or mixed distributions, a similar embedding can be constructed: if X takes values in a discrete set, the surface degenerates into vertical line segments over points; if partly continuous and partly discrete, a mixed measure can be used. For simplicity, this paper assumes absolutely continuous distributions; the conclusions can be naturally extended.
---
§3 Two‑Dimensional Case: Joint Density Surface
Let n=2 and let p(x,y) be the joint density. Embedding into three dimensions yields the surface z = p(x,y). This surface has the following geometric features:
· The total volume under the surface equals 1 (because \iint p(x,y)\,dxdy = 1).
· The surface is non‑negative and, for typical densities, approaches the plane z=0 at infinity (or ends at the boundary if the support is bounded).
We now associate three geometric operations with core concepts of probability theory.
Figure 1 (described in words): A typical bivariate normal density surface looks like a mountain. Projection onto the x-direction (integrating out y) yields a bell‑shaped curve (the marginal density p_X(x)). For a fixed x=x_0, a vertical slice yields a curve z = p(x_0,y); after normalization this becomes the conditional density p(y|x_0).
---
§4 Marginalization = Projection
Theorem 4.1 (Marginalization as projection).
Let (X,Y) have joint density p(x,y) and let \Sigma: z=p(x,y) be its density surface. Then the marginal density
p_X(x) = \int_{-\infty}^{\infty} p(x,y)\,dy
equals the projection area density of \Sigma in the y-direction. More precisely, for each fixed x, consider the area under the curve y\mapsto p(x,y); that area is p_X(x). Geometrically, this amounts to intersecting \Sigma with a plane x=\text{constant} parallel to the y-axis, and then measuring the area between the resulting curve and the plane z=0.
Proof: Fix x and define f_x(y)=p(x,y). The area under the curve z=f_x(y) in the yz-plane is \int f_x(y)\,dy = \int p(x,y)\,dy = p_X(x). Hence “integrating along y” is exactly computing the area under that cross‑sectional curve. This is the density of the orthogonal projection onto the xz-plane (with area weighting). ∎
Corollary 4.2. Orthogonal projection of the solid under the joint surface onto the x-axis (accumulating volume along the y-direction) directly yields the marginal density function.
Geometric operation (to compute marginal p_X):
1. Slice the surface \Sigma with a family of planes x=\text{constant} perpendicular to the x-axis.
2. For each cross‑section, compute the area between the curve and z=0.
3. Take these areas as a function of x; that function is p_X(x).
This operation uses only geometric measurements, no probabilistic language.
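The three steps above can be sketched numerically. This is a minimal illustration, assuming a bivariate normal surface with made-up parameters (MU_X, MU_Y, S_X, S_Y, RHO are not from the paper); it compares the measured cross-sectional areas with the known marginal N(\mu_X,\sigma_X^2).

```python
import math

# Illustrative parameters (assumptions, not taken from the paper).
MU_X, MU_Y, S_X, S_Y, RHO = 1.0, -0.5, 2.0, 1.0, 0.6

def bvn_pdf(x, y):
    """Bivariate normal density, i.e. the height of the surface at (x, y)."""
    u = (x - MU_X) / S_X
    v = (y - MU_Y) / S_Y
    q = (u * u + v * v - 2.0 * RHO * u * v) / (1.0 - RHO * RHO)
    norm = 2.0 * math.pi * S_X * S_Y * math.sqrt(1.0 - RHO * RHO)
    return math.exp(-0.5 * q) / norm

def cross_section_area(x, y_lo=-10.5, y_hi=9.5, n=2000):
    """Steps 1-2: cut at fixed x and measure the area under the slice curve."""
    dy = (y_hi - y_lo) / n
    return sum(bvn_pdf(x, y_lo + (j + 0.5) * dy) for j in range(n)) * dy

def normal_pdf(t, mu, sigma):
    """Univariate normal density, used only as the reference answer."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Step 3: the areas, read as a function of x, trace out the marginal density.
xs = [-3.0, 0.0, 1.0, 2.5]
areas = [cross_section_area(x) for x in xs]
exact = [normal_pdf(x, MU_X, S_X) for x in xs]
for x, a, e in zip(xs, areas, exact):
    print(x, a, e)
```

The integration window in y is truncated to a finite interval, which is harmless here because the slice heights are negligible outside it; heavier-tailed densities would need a wider window or a change of variables.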
---
§5 Conditioning = Slicing
Theorem 5.1 (Conditional distribution as a slice).
The conditional density
p(y|x) = \frac{p(x,y)}{p_X(x)}
is obtained geometrically as follows: at a fixed x=x_0, take the vertical plane x=x_0 to cut the surface \Sigma, producing the cross‑sectional curve z = p(x_0,y); then normalize this curve so that the area under it becomes 1, i.e.,
p(y|x_0) = \frac{p(x_0,y)}{\int p(x_0,y)\,dy}.
The normalizing factor is precisely the marginal density p_X(x_0) (from Theorem 4.1).
Proof: Direct from definition. Geometrically, the area under the slice curve z=p(x_0,y) is p_X(x_0). Dividing the height of the curve by this area gives a new curve \tilde{z}=p(y|x_0) that satisfies \int \tilde{z}\,dy = 1 and has the same shape up to vertical scaling. ∎
Geometric operation (given x_0, find conditional distribution):
1. Cut the surface with the plane x=x_0 to obtain the slice curve.
2. Measure the total area A = p_X(x_0) under this curve.
3. Scale the curve heights by 1/A to obtain the normalized conditional density curve.
4. This curve completely determines the conditional distribution (one can further compute conditional expectations, etc.).
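A minimal numerical sketch of the four steps, assuming a standard bivariate normal with correlation RHO = 0.6 (an illustrative choice, as are the function names): the slice at x_0 is measured, rescaled to unit area, and then used to compute a conditional expectation.

```python
import math

RHO = 0.6  # illustrative correlation of a standard bivariate normal

def bvn_pdf(x, y):
    """Standard bivariate normal density with correlation RHO."""
    q = (x * x + y * y - 2.0 * RHO * x * y) / (1.0 - RHO * RHO)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - RHO * RHO))

def conditional_slice(pdf, x0, y_lo=-8.0, y_hi=8.0, n=1600):
    """Slice the surface at x = x0 and rescale the curve to unit area."""
    dy = (y_hi - y_lo) / n
    ys = [y_lo + (j + 0.5) * dy for j in range(n)]
    heights = [pdf(x0, y) for y in ys]    # step 1: slice curve z = p(x0, y)
    area = sum(heights) * dy              # step 2: area A = p_X(x0)
    cond = [h / area for h in heights]    # step 3: scale heights by 1/A
    return ys, cond, area, dy

ys, cond, area, dy = conditional_slice(bvn_pdf, x0=1.0)
total = sum(cond) * dy                                   # should be 1
cond_mean = sum(y * c for y, c in zip(ys, cond)) * dy    # step 4: E[Y | X = x0]
print(area, total, cond_mean)
```

For this density the conditional mean is \rho x_0, so the printed value can be compared with 0.6; the area should agree with the standard normal density at 1.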
Remark: For discrete conditioning, similar operations apply (point‑set slicing + normalization). For mixed cases, slicing may produce singular distributions, but the normalization principle remains the same.
---
§6 Independence = Direct Product Decomposition
Theorem 6.1 (Geometric criterion for independence).
Random variables X and Y are independent if and only if the joint density surface \Sigma can be expressed as a direct product of two lower‑dimensional surfaces; i.e., there exist functions f(x) and g(y) such that
p(x,y) = f(x)\,g(y),
with \int f(x)\,dx = \int g(y)\,dy = 1. Then \Sigma is a product surface: the height at (x,y) equals the product f(x)g(y). Geometrically this means:
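· All cross‑sections at fixed x (cut by planes perpendicular to the x-axis) have the same shape up to a constant factor and are proportional to g(y);
· All cross‑sections at fixed y (cut by planes perpendicular to the y-axis) have the same shape up to a constant factor and are proportional to f(x);
· The potential is additive, h(x,y) = -\log f(x) - \log g(y), so the mixed derivative \partial^2 h/\partial x\,\partial y vanishes wherever p>0 and h is twice differentiable.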
Proof: Independence \Leftrightarrow p(x,y)=p_X(x)p_Y(y). Set f=p_X, g=p_Y. The geometric properties follow directly from the product form: for fixed x, the slice curve z = p_X(x)p_Y(y) is proportional to p_Y(y) with factor p_X(x). ∎
Corollary 6.2. If independence holds, then the conditional distribution p(y|x)=p_Y(y) does not depend on x; geometrically, all slice curves coincide after normalization.
Geometric test for independence:
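1. Take cross‑sections at every x with p_X(x)>0 (in practice, on a grid of x values); Y is independent of X exactly when all the normalized curves coincide. Agreement at only two points x_1 and x_2 is necessary but not sufficient.
2. Alternatively, when p is smooth and positive on a rectangular support, check whether the mixed derivative \partial^2 \log p/\partial x\,\partial y vanishes identically, which is equivalent to the potential h being additive in x and y.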
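The slice-comparison test can be sketched as follows. This is a minimal illustration, assuming standard bivariate normal surfaces with correlation 0 and 0.6; the grid of x values and all names are hypothetical. A finite grid can only refute independence, so a small discrepancy is evidence rather than proof.

```python
import math

def make_bvn(rho):
    """Return a standard bivariate normal density with correlation rho."""
    def pdf(x, y):
        q = (x * x + y * y - 2.0 * rho * x * y) / (1.0 - rho * rho)
        return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    return pdf

def normalized_slice(pdf, x0, ys, dy):
    """Slice at x = x0 and rescale the curve to unit area."""
    heights = [pdf(x0, y) for y in ys]
    area = sum(heights) * dy
    return [h / area for h in heights]

def max_slice_discrepancy(pdf, xs, y_lo=-8.0, y_hi=8.0, n=800):
    """Largest pointwise gap between the first normalized slice and the others."""
    dy = (y_hi - y_lo) / n
    ys = [y_lo + (j + 0.5) * dy for j in range(n)]
    reference = normalized_slice(pdf, xs[0], ys, dy)
    worst = 0.0
    for x0 in xs[1:]:
        current = normalized_slice(pdf, x0, ys, dy)
        worst = max(worst, max(abs(a - b) for a, b in zip(reference, current)))
    return worst

xs = [-1.5, -0.5, 0.0, 0.7, 1.5]
gap_independent = max_slice_discrepancy(make_bvn(0.0), xs)  # slices coincide
gap_dependent = max_slice_discrepancy(make_bvn(0.6), xs)    # slices shift with x
print(gap_independent, gap_dependent)
```

With zero correlation the normalized slices agree up to floating-point rounding, while with correlation 0.6 the slice centers move with x and the gap is large.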
---
§7 Geometric Version of Bayes’ Theorem
Consider a prior distribution \pi(\theta) and a likelihood function L(x|\theta). The joint density is p(x,\theta) = L(x|\theta)\pi(\theta). The posterior density is \pi(\theta|x) = p(x,\theta)/p_X(x).
Geometric operation (Bayesian updating):
· The joint surface z = p(x,\theta) contains all the information.
· After observing X=x_0, slice the surface with the plane x=x_0 to obtain the curve z = p(x_0,\theta).
· The area under this curve is the marginal likelihood p_X(x_0).
· Normalizing this curve yields the posterior \pi(\theta|x_0) = p(x_0,\theta)/p_X(x_0).
Geometric interpretation: Bayesian inference is nothing but slicing along the coordinate of the observed variable and normalizing. The prior is recovered from the surface as the marginal in the \theta direction, \pi(\theta)=\int p(x,\theta)\,dx (the cross‑sectional area at fixed \theta, since \int L(x|\theta)\,dx=1), while the likelihood governs how the surface varies in the x-direction for each fixed \theta.
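Bayesian updating by slicing can be illustrated on a grid. The sketch below assumes a normal prior on \theta and a normal likelihood with known noise level (PRIOR_MU, PRIOR_SD, NOISE_SD and the observation 1.2 are illustrative assumptions), so the result can be compared with the standard conjugate-normal posterior.

```python
import math

def normal_pdf(t, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative model (assumption): theta ~ N(0, 1), X | theta ~ N(theta, 0.5^2).
PRIOR_MU, PRIOR_SD, NOISE_SD = 0.0, 1.0, 0.5

def joint(x, theta):
    """Height of the joint surface: likelihood times prior."""
    return normal_pdf(x, theta, NOISE_SD) * normal_pdf(theta, PRIOR_MU, PRIOR_SD)

def posterior_on_grid(x0, t_lo=-8.0, t_hi=8.0, n=3200):
    """Slice the joint surface at x = x0 and normalize along theta."""
    dt = (t_hi - t_lo) / n
    thetas = [t_lo + (j + 0.5) * dt for j in range(n)]
    slice_heights = [joint(x0, t) for t in thetas]
    evidence = sum(slice_heights) * dt            # area under the slice = p_X(x0)
    post = [h / evidence for h in slice_heights]  # normalized slice = posterior
    return thetas, post, evidence, dt

thetas, post, evidence, dt = posterior_on_grid(1.2)
post_mean = sum(t * p for t, p in zip(thetas, post)) * dt
post_var = sum((t - post_mean) ** 2 * p for t, p in zip(thetas, post)) * dt
print(evidence, post_mean, post_var)
```

For this model the posterior precision is the sum of the prior and likelihood precisions (1 + 4 = 5), giving posterior mean 0.96 and variance 0.2 for the observation 1.2, which the grid values should reproduce closely.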
---
§8 Example: Geometry of the Multivariate Normal Distribution
8.1 Bivariate normal
Let (X,Y) follow a bivariate normal distribution with mean vector \mu=(\mu_X,\mu_Y) and covariance matrix (in this section \Sigma denotes the covariance matrix, not the density surface of §2)
\Sigma = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}.
The joint density is
p(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} \right] \right).
Geometric surface: This is an ellipsoidal mound; its level curves are ellipses. The potential function h=-\log p is a quadratic form:
h(x,y) = \frac{1}{2}(x-\mu_X,\; y-\mu_Y)\,\Sigma^{-1}\,(x-\mu_X,\; y-\mu_Y)^{\mathsf{T}} + \text{constant},
i.e., an elliptic paraboloid.
Marginal distribution: Projection onto the x-axis gives the univariate normal N(\mu_X,\sigma_X^2). Geometrically, for any fixed x, the slice y\mapsto p(x,y) is a curve proportional to N(\mu_{Y|x},\sigma_{Y|x}^2); the area under this curve is precisely the marginal density p_X(x) – a one‑dimensional normal curve.
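Conditional distribution: Fix x=x_0. The slice p(x_0,y) is proportional to N(\mu_{Y|x_0},\sigma_{Y|x}^2), with \mu_{Y|x_0} = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x_0-\mu_X) and \sigma_{Y|x}^2 = \sigma_Y^2(1-\rho^2); after normalization it becomes the conditional normal. Geometrically, these slice curves all have the same Gaussian bell shape and the same width; only their centers vary, linearly with x_0, and their heights scale with p_X(x_0).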
Independence: When \rho=0, p(x,y)=p_X(x)p_Y(y), the surface decomposes as a product of two one‑dimensional normal surfaces, and the slice shapes no longer change with x (except for the scaling factor).
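The behaviour of the slices in the bivariate normal case can be checked numerically. The sketch below assumes illustrative parameter values (not from the paper) and measures the mean and variance of the normalized slice at several x_0, for comparison with \mu_Y + \rho(\sigma_Y/\sigma_X)(x_0-\mu_X) and \sigma_Y^2(1-\rho^2).

```python
import math

# Illustrative parameters (assumptions, not taken from the paper).
MU_X, MU_Y, S_X, S_Y, RHO = 1.0, -0.5, 2.0, 1.0, 0.6

def bvn_pdf(x, y):
    """Bivariate normal density in the form given in Section 8.1."""
    u = (x - MU_X) / S_X
    v = (y - MU_Y) / S_Y
    q = (u * u + v * v - 2.0 * RHO * u * v) / (1.0 - RHO * RHO)
    return math.exp(-0.5 * q) / (2.0 * math.pi * S_X * S_Y * math.sqrt(1.0 - RHO * RHO))

def slice_moments(x0, y_lo=-10.0, y_hi=10.0, n=2000):
    """Mean and variance of the normalized slice y -> p(x0, y)."""
    dy = (y_hi - y_lo) / n
    ys = [y_lo + (j + 0.5) * dy for j in range(n)]
    heights = [bvn_pdf(x0, y) for y in ys]
    area = sum(heights) * dy
    mean = sum(y * h for y, h in zip(ys, heights)) * dy / area
    var = sum((y - mean) ** 2 * h for y, h in zip(ys, heights)) * dy / area
    return mean, var

x_values = [-3.0, 1.0, 5.0]
moments = [slice_moments(x0) for x0 in x_values]
for x0, (mean, var) in zip(x_values, moments):
    print(x0, mean, var)
```

In this sketch the slice mean moves linearly with x_0 while the slice variance stays fixed, which is the numerical counterpart of the slices sharing one width and differing only in center and height.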
8.2 High‑dimensional normal
For an n-dimensional normal, the density hypersurface z=p(x) lives in \mathbb{R}^{n+1}. Marginal distributions correspond to orthogonal projections onto lower‑dimensional coordinate subspaces, conditional distributions correspond to slices by affine subspaces on which the conditioning coordinates are held fixed, and independence of two groups of coordinates corresponds to a block‑diagonal covariance matrix. All operations remain geometrically clear.
---
§9 Higher Dimensions and Computational Geometry Implications
For n>2, the joint density hypersurface cannot be visualized directly, but the geometric operations are still well‑defined:
· Marginalization: Integrating over some coordinates is equivalent to projecting onto the complementary coordinate hyperplane (accumulating volume under the hypersurface).
· Conditioning: Fixing some coordinates and taking a slice parallel to the remaining coordinates yields a lower‑dimensional hypersurface; normalization gives the conditional distribution.
· Independence: The joint hypersurface decomposes as a direct product of two lower‑dimensional hypersurfaces.
Computational geometry potential: Traditional multivariate probability calculations (e.g., high‑dimensional integrals) are often difficult. The geometric perspective suggests using Monte Carlo volume estimation, adaptive slice sampling, and other geometric algorithms to approximate marginals and conditionals. For example, the marginal density p_X(x) equals the volume under the slice of the hypersurface at fixed x, taken over the other coordinates; this can be estimated by sampling on the low‑dimensional slice. This provides new ideas for high‑dimensional Bayesian computation.
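As one possible reading of "sampling on the slice", the following minimal sketch estimates p_X(x_0) by importance sampling along y. It assumes a standard bivariate normal with correlation 0.6 and a wide normal proposal; the proposal, sample size, and seed are illustrative assumptions, and this is not presented as the paper's algorithm.

```python
import math
import random

RHO = 0.6  # illustrative correlation of a standard bivariate normal

def bvn_pdf(x, y):
    """Standard bivariate normal density with correlation RHO."""
    q = (x * x + y * y - 2.0 * RHO * x * y) / (1.0 - RHO * RHO)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - RHO * RHO))

def normal_pdf(t, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mc_marginal(x0, n_samples=20000, proposal_sd=2.0, seed=0):
    """Importance-sampling estimate of the area under the slice at x = x0.

    Draws y from a N(0, proposal_sd^2) proposal q and averages p(x0, y) / q(y),
    which is an unbiased estimate of the integral of p(x0, y) over y.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(0.0, proposal_sd)
        total += bvn_pdf(x0, y) / normal_pdf(y, 0.0, proposal_sd)
    return total / n_samples

estimate = mc_marginal(0.5)
exact = normal_pdf(0.5, 0.0, 1.0)  # known marginal for this illustrative case
print(estimate, exact)
```

The proposal is chosen wider than the slice so that the importance weights stay bounded; in higher dimensions the choice of proposal dominates the accuracy of such an estimate.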
---
§10 Conclusion and Outlook
This paper completes the third step of the probabilistic‑geometric isomorphism framework: multidimensional embedding. We have shown:
· Joint distribution \leftrightarrow hypersurface;
· Marginalization \leftrightarrow orthogonal projection (integration);
· Conditioning \leftrightarrow slicing and normalization;
· Independence \leftrightarrow direct product decomposition;
· Bayes’ theorem \leftrightarrow slicing + normalization.
These results allow high‑dimensional probabilistic inference to be carried out entirely in geometric space, without explicitly writing multiple integrals. Together with the isomorphism framework of Paper 1 and the one‑dimensional realizations of Paper 2, we now possess a complete geometric toolkit ranging from low to high dimensions and from continuous to discrete distributions.
Next steps: Paper 4 will prove the equivalence between probability axioms and geometric axioms, thereby completing the unified framework. Paper 5 will extend static hypersurfaces to dynamic geometric flows, covering stochastic processes.
---
Appendix A: Text Description of Two‑Dimensional Geometry
Figure A.1: Joint density surface z=p(x,y), shaped like a hill. Labels:
· The base is the xy-plane.
· At a fixed x=x_0, take a vertical plane parallel to the yz-plane; its intersection with the surface gives a curve (slice).
· The area between this curve and z=0 equals p_X(x_0).
· Normalizing this curve gives the conditional density p(y|x_0).
Figure A.2: Superposition of slice curves for two different x_0. If independent, the normalized curves coincide exactly; if dependent, the curves change (their positions and shapes vary with x_0).
---
References
[1] Zhang Suhang. Foundational Paradigm of Probabilistic-Geometric Isomorphism: From Gaussian Distributions to General Measure Correspondences, 2026. (Paper 1)
[2] Zhang Suhang. Geometric Realizations of One-Dimensional Probability Distributions: Bell Curve, Staircase, Lattice, and Fractal, 2026. (Paper 2)
[3] Anderson, T. W. An Introduction to Multivariate Statistical Analysis. Wiley, 2003.
[4] Billingsley, P. Probability and Measure. Wiley, 1995.
[5] Cover, T. M., & Thomas, J. A. Elements of Information Theory. Wiley, 2006.
[6] Chen Xiru. Advanced Mathematical Statistics. University of Science and Technology of China Press, 1999. (In Chinese)
---
(End of paper)