Chapter 6 High-dimensional Data
6.1 High-Dimensional Objects
Consider the \(n\times d\) data matrix
where each point \(\x\in\R^d\) and each attribute \(X_j\in\R^n\).
Hypercube
let the minimum and amximum values for each attribute \(X_j\) be given as
The data hyperspace can be considered as a \(d\)-dimensional hyper-rectangle, defined as
Assume the data is centered to have mean \(\mmu=\0\). Let \(m\) denote the largest absolute value in \(\D\), given as
The data hyperspace can be represented as a hypercube, centered at \(\0\), with all sides of length \(l=2m\), given as
The unit hypercube has all sides of length \(l=1\), and is denoted as \(H_d(1)\).
Hypersphere
Assume that the data has been centered, so that \(\mmu=\0\). Let \(r\) denote the largest magnitude among all points:
The data hyperspace can also be represented as a \(d\)-dimensional hyperball centered at \(\0\) with radius \(r\), defined as
Note
\(B_d(r)=\{\x|\lv\x\rv\leq r\}\)
The surface of the hyperball is called a hypersphere, and it consists of all the points exactly at distance \(r\) from the center of the hyperball, defined, as
Note
\(S_d(r)=\{\x|\lv\x\rv=r\}\)
Because the hyperball consists of all the surface and interior points, it is also called a closed hypersphere.
Hyperplanes
A hyperplane in \(d\) dimensions is given as the set of all points \(\x\in\R^d\) that satisfy the equation \(h(\x)=0\), where \(h(\x)\) is the hyperplane function, defined as follows:
Here, \(\w\) is a \(d\) dimensional weight vector and \(b\) is a scalar, called the bias. For points that comprise the hyperplane, we have
Note
\(h(\x)=\w^T\x+b=0\)
The hyperplane is thus defined as the set of all points such that \(\w^T\x=-b\).
Note that a hyperplane in \(d\)-dimensions has dimension \(d-1\). A hyperplane splits the original \(d\)-dimensional space into two half-space. Points on one side satisfy the equation \(h(\x)>0\), points on the other side satisfy the equation \(h(\x)<0\), and points on the hyperplane satisfy the condition \(h(\x)=0\).
6.2 High-Dimensional Volumes
Hypercube
Note
\(\rm{vol}(H_d(l))=l^d\)
Hypersphere
Note
\(\dp\rm{vol}(S_d(r))=K_d\cd r^d=\bigg(\frac{\pi^{\frac{d}{2}}}{\Gamma(\frac{d}{2}+1)}\bigg)r^d\)
where
is a scalar that depends on the dimenionality \(d\), and \(\Gamma\) is the gamma function defined as
The gamma function has the following property for any \(\alpha>1\)
For any integer \(n\geq 1\), we immediately have
When \(d\) is even, then \(\frac{d}{2}+1\) is an integer, and we have
and when \(d\) is odd, we have
where \(d!!\) denotes the double factorial (or multifactorial), given as
Putting it all together we have
Note
\(\dp\Gamma\bigg(\frac{d}{2}+1\bigg)=\) \(\left\{\begin{array}{lr}(\frac{d}{2})!\quad\rm{if\ }d\rm{\ is\ even}\\\sqrt\pi(\frac{d!!}{2^{(d+1)/2}})\quad\rm{if\ }d\rm{\ is\ odd}\end{array}\right.\)
Surface Area
The surface area of the hypersphere can be obtained by differentiating its volume with respect to \(r\), given as
Asymptotic Volume
For the unit hypersphere with \(r=1\)
Note
\(\dp\lim_{d\ra\infty}\rm{vol}(S_d(1))=\lim_{d\ra\infty}\frac{\pi^{\frac{d}{2}}}{\Gamma(\frac{d}{2}+1)}\ra 0\)
6.3 Hypersphere Inscribed Within Hypercube
In two dimensions, we have
In three dimensions, we have
As the dimensionality \(d\) increases asymptotically, we get
Note
\(\dp\lim_{d\ra\infty}\frac{\rm{vol}(S_d(r))}{\rm{vol}(H_d(2r))}=\) \(\dp\lim_{d\ra\infty}\frac{\pi^{\frac{d}{2}}}{2^d\Gamma(\frac{d}{2}+1)}\ra 0\)
This means that as the dimensionality increases, most of the volume of the hypercube is in the “corners”, whereas the center is essentially empty.
6.4 Volume of Thin Hypersphere Shell
Let \(S_d(r,\epsilon)\) denote the thin hypershell of width \(\epsilon\). Its volume is given as
Let us consider the ratio of the volume of the thin shell to the volume of the outer sphere:
Asymptotic Volume
As \(d\) increases, in the limit we obtain
Note
\(\dp\lim_{d\ra\infty}\frac{\rm{vol}(S_d(r,\epsilon))}{\rm{vol}(S_d(r))}=\) \(\dp\lim_{d\ra\infty}1-\bigg(1-\frac{\epsilon}{r}\bigg)^d\ra 1\)
That is, almoast all of the volume of the hypersphere is contained in the thin shell as \(d\ra\infty\). This means that in high-dimensional spaces, unlike in lower dimensions, most of the volume is concentrated around the surface (within \(\epsilon\)) of the hypersphere, and the center is essentially void.
6.5 Diagonals in Hyperspace
Consider the angle \(\th_d\) betwen the ones vector \(\1\) and the first standard basis vector \(\e_1\), in \(d\) dimensions:
Asymptotic Angle
As \(d\) increases, the angle between the \(d\)-dimensional ones vector \(\1\) and the first axis vector \(\e_1\) is given as
which implies that
Note
\(\dp\lim_{d\ra\infty}\th_d\ra\frac{\pi}{2}=90^\circ\)
This implies that in high dimensions all of the diagonal vectors are perpendicular (or orthogonal) to all the coordinates axes. Because there are \(2^d\) corners in a \(d\)-dimensional hyperspace, there are \(2^d\) diagonal vectors from the origin to each of the corners. Because the diagonal vectors in opposite directions define a new axis, we obtain \(2^{d-1}\) new axes, each of which is essentially orthogonal to all of the \(d\) principal coordinate axes.
6.6 Density of The Multivariate Normal
Consider the probability of a point being with a fraction \(\alpha>0\), of the peack density at the mean.
For a multivariate normal distribution, with \(\mmu=\0_d\), and \(\Sg=\I_d\), we have
At the mean \(\mmu=\0_d\), the peak density is \(f(\0_d)=\frac{1}{(\sqrt{2\pi})^d}\). Thus, the set of points \(\x\) with density at least \(\alpha\) fraction of the density at the mean, with \(0<\alpha<1\), is given as
which implies that
The probability that a point \(\x\) is within \(\alpha\) times the density at the mean can be computed from the \(\chi_d^2\) density function
Note
\(\dp P\bigg(\frac{f(\x)}{f(\0)}\geq\alpha\bigg)=P(\x^T\x\leq-2\ln(\alpha))=\) \(\dp\int_0^{-2\ln(\alpha)}f_{\chi_d^2}(\x^T\x)=F_{\chi_d^2}(-2\ln(\alpha))\)
where \(f_{\chi_d^2}\) is the chi-squared probability density function with \(q\) degrees of freedom:
and \(F_{\chi_d^2}\) is its cumulative distribution function.
As dimensionality increases, this probability decreases sharply, and eventually tends to zero, that is,
Note
\(\dp\lim_{d\ra\infty}P(\x^T\x\leq-2\ln(\alpha))\ra 0\)
Thus, in higher dimensions the probability density around the mean decreases very rapidly as one moves away from the mean. In essence the entire probability mass migrates to the tail regions.
Distance of Points from the Mean
Let \(r^2\) denote the square of the distance of a point \(\x\) to the center \(\mmu=\0\), given as
\(\x^T\x\) follows a \(\chi^2\) distribution with \(d\) degress of freedom, which has mean \(d\) and variance \(2d\). It follows that the mean and variance of the random variable \(r^2\) is
We conclude that for large \(d\), the radius \(r\) follows a normal distribution with mean \(\sqrt{d}\) and standard deviation \(1/\sqrt{2}\). The density at the mean distance \(r=\sqrt{d}\) is exponentially smaller than that at the peak density because
Whereas the density of the standard multivariate normal is maximized at the center \(\0\), most of the probability mass is concentrated in a small band around the mean distance of \(\sqrt{d}\) from the center.