\[ \begin{align}\begin{aligned}\newcommand{\bs}{\boldsymbol} \newcommand{\dp}{\displaystyle} \newcommand{\rm}{\mathrm} \newcommand{\cl}{\mathcal} \newcommand{\pd}{\partial}\\\newcommand{\cd}{\cdot} \newcommand{\cds}{\cdots} \newcommand{\dds}{\ddots} \newcommand{\lag}{\langle} \newcommand{\lv}{\lVert} \newcommand{\ol}{\overline} \newcommand{\od}{\odot} \newcommand{\ra}{\rightarrow} \newcommand{\rag}{\rangle} \newcommand{\rv}{\rVert} \newcommand{\seq}{\subseteq} \newcommand{\td}{\tilde} \newcommand{\vds}{\vdots} \newcommand{\wh}{\widehat}\\\newcommand{\0}{\boldsymbol{0}} \newcommand{\1}{\boldsymbol{1}} \newcommand{\a}{\boldsymbol{\mathrm{a}}} \newcommand{\b}{\boldsymbol{\mathrm{b}}} \newcommand{\c}{\boldsymbol{\mathrm{c}}} \newcommand{\d}{\boldsymbol{\mathrm{d}}} \newcommand{\e}{\boldsymbol{\mathrm{e}}} \newcommand{\f}{\boldsymbol{\mathrm{f}}} \newcommand{\g}{\boldsymbol{\mathrm{g}}} \newcommand{\h}{\boldsymbol{\mathrm{h}}} \newcommand{\i}{\boldsymbol{\mathrm{i}}} \newcommand{\j}{\boldsymbol{j}} \newcommand{\m}{\boldsymbol{\mathrm{m}}} \newcommand{\n}{\boldsymbol{\mathrm{n}}} \newcommand{\o}{\boldsymbol{\mathrm{o}}} \newcommand{\p}{\boldsymbol{\mathrm{p}}} \newcommand{\q}{\boldsymbol{\mathrm{q}}} \newcommand{\r}{\boldsymbol{\mathrm{r}}} \newcommand{\u}{\boldsymbol{\mathrm{u}}} \newcommand{\v}{\boldsymbol{\mathrm{v}}} \newcommand{\w}{\boldsymbol{\mathrm{w}}} \newcommand{\x}{\boldsymbol{\mathrm{x}}} \newcommand{\y}{\boldsymbol{\mathrm{y}}} \newcommand{\z}{\boldsymbol{\mathrm{z}}}\\\newcommand{\A}{\boldsymbol{\mathrm{A}}} \newcommand{\B}{\boldsymbol{\mathrm{B}}} \newcommand{\C}{\boldsymbol{\mathrm{C}}} \newcommand{\D}{\boldsymbol{\mathrm{D}}} \newcommand{\H}{\boldsymbol{\mathrm{H}}} \newcommand{\I}{\boldsymbol{\mathrm{I}}} \newcommand{\K}{\boldsymbol{\mathrm{K}}} \newcommand{\M}{\boldsymbol{\mathrm{M}}} \newcommand{\N}{\boldsymbol{\mathrm{N}}} \newcommand{\P}{\boldsymbol{\mathrm{P}}} \newcommand{\Q}{\boldsymbol{\mathrm{Q}}} \newcommand{\S}{\boldsymbol{\mathrm{S}}} \newcommand{\U}{\boldsymbol{\mathrm{U}}} \newcommand{\W}{\boldsymbol{\mathrm{W}}} \newcommand{\X}{\boldsymbol{\mathrm{X}}} \newcommand{\Y}{\boldsymbol{\mathrm{Y}}} \newcommand{\Z}{\boldsymbol{\mathrm{Z}}}\\\newcommand{\R}{\mathbb{R}}\\\newcommand{\cE}{\mathcal{E}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}}\\\newcommand{\ld}{\lambda} \newcommand{\Ld}{\boldsymbol{\mathrm{\Lambda}}} \newcommand{\sg}{\sigma} \newcommand{\Sg}{\boldsymbol{\mathrm{\Sigma}}} \newcommand{\th}{\theta} \newcommand{\ve}{\varepsilon}\\\newcommand{\mmu}{\boldsymbol{\mu}} \newcommand{\ppi}{\boldsymbol{\pi}} \newcommand{\CC}{\mathcal{C}} \newcommand{\TT}{\mathcal{T}}\\ \newcommand{\bb}{\begin{bmatrix}} \newcommand{\eb}{\end{bmatrix}} \newcommand{\bp}{\begin{pmatrix}} \newcommand{\ep}{\end{pmatrix}} \newcommand{\bv}{\begin{vmatrix}} \newcommand{\ev}{\end{vmatrix}}\\\newcommand{\im}{^{-1}} \newcommand{\pr}{^{\prime}} \newcommand{\ppr}{^{\prime\prime}}\end{aligned}\end{align} \]

Chapter 6 High-dimensional Data

6.1 High-Dimensional Objects

Consider the \(n\times d\) data matrix

\[\begin{split}\left(\begin{array}{c|cccc}&X_1&X_2&\cds&X_d\\ \hline \x_1&x_{11}&x_{12}&\cds&x_{1d}\\\x_2&x_{21}&x_{22}&\cds&x_{2d}\\ \vds&\vds&\vds&\dds&\vds\\\x_n&x_{n1}&x_{n2}&\cds&x_{nd}\end{array}\right)\end{split}\]

where each point \(\x\in\R^d\) and each attribute \(X_j\in\R^n\).

Hypercube

let the minimum and amximum values for each attribute \(X_j\) be given as

\[\min(X_j)=\min_i\{x_{ij}\}\quad\max(X_j)=\max_i\{x_{ij}\}\]

The data hyperspace can be considered as a \(d\)-dimensional hyper-rectangle, defined as

\[ \begin{align}\begin{aligned}R_d&=\prod_{j=1}^d[\min(X_j),\max(X_j)]\\&=\{\x=(x_1,x_2,\cds,x_d)^T|x_j\in[\min(X_j),\max(X_j)],\rm{\ for\ }j=1,\cds,d\}\end{aligned}\end{align} \]

Assume the data is centered to have mean \(\mmu=\0\). Let \(m\) denote the largest absolute value in \(\D\), given as

\[m=\max_{j=1}^d\max_{i=1}^n\{|x_{ij}|\}\]

The data hyperspace can be represented as a hypercube, centered at \(\0\), with all sides of length \(l=2m\), given as

\[H_d(l)=\{\x=(x_1,x_2,\cds,x_d)^T|\forall i,x_i\in[-l/2,l/2]\}\]

The unit hypercube has all sides of length \(l=1\), and is denoted as \(H_d(1)\).

Hypersphere

Assume that the data has been centered, so that \(\mmu=\0\). Let \(r\) denote the largest magnitude among all points:

\[r=\max_i\{\lv\x_i\rv\}\]

The data hyperspace can also be represented as a \(d\)-dimensional hyperball centered at \(\0\) with radius \(r\), defined as

Note

\(B_d(r)=\{\x|\lv\x\rv\leq r\}\)

\[\rm{or\ }B_d(r)=\{\x=(x_1,x_2,\cds,x_d)^T|\sum_{j=1}^dx_j^2\leq r^2\}\]

The surface of the hyperball is called a hypersphere, and it consists of all the points exactly at distance \(r\) from the center of the hyperball, defined, as

Note

\(S_d(r)=\{\x|\lv\x\rv=r\}\)

\[\rm{or\ }S_d(r)=\{\x=(x_1,x_2,\cds,x_d)^T|\sum_{j=1}^d(x_j)^2=r^2\}\]

Because the hyperball consists of all the surface and interior points, it is also called a closed hypersphere.

Hyperplanes

A hyperplane in \(d\) dimensions is given as the set of all points \(\x\in\R^d\) that satisfy the equation \(h(\x)=0\), where \(h(\x)\) is the hyperplane function, defined as follows:

\[h(\x)=\w^T\x+b=w_1x_1+w_2x_2+\cds+w_dx_d+b\]

Here, \(\w\) is a \(d\) dimensional weight vector and \(b\) is a scalar, called the bias. For points that comprise the hyperplane, we have

Note

\(h(\x)=\w^T\x+b=0\)

The hyperplane is thus defined as the set of all points such that \(\w^T\x=-b\).

Note that a hyperplane in \(d\)-dimensions has dimension \(d-1\). A hyperplane splits the original \(d\)-dimensional space into two half-space. Points on one side satisfy the equation \(h(\x)>0\), points on the other side satisfy the equation \(h(\x)<0\), and points on the hyperplane satisfy the condition \(h(\x)=0\).

6.2 High-Dimensional Volumes

Hypercube

Note

\(\rm{vol}(H_d(l))=l^d\)

Hypersphere

Note

\(\dp\rm{vol}(S_d(r))=K_d\cd r^d=\bigg(\frac{\pi^{\frac{d}{2}}}{\Gamma(\frac{d}{2}+1)}\bigg)r^d\)

where

\[K_d=\frac{\pi^{\frac{d}{2}}}{\Gamma(\frac{d}{2}+1)}\]

is a scalar that depends on the dimenionality \(d\), and \(\Gamma\) is the gamma function defined as

\[\Gamma(\alpha)=\int_0^\infty x^{\alpha-1}e^{-x}dx\]

The gamma function has the following property for any \(\alpha>1\)

\[\Gamma(\alpha)=(\alpha-1)\Gamma(\alpha-1)\]

For any integer \(n\geq 1\), we immediately have

\[\Gamma(n)=(n-1)!\]

When \(d\) is even, then \(\frac{d}{2}+1\) is an integer, and we have

\[\Gamma\bigg(\frac{d}{2}+1\bigg)=\bigg(\frac{d}{2}\bigg)!\]

and when \(d\) is odd, we have

\[\Gamma\bigg(\frac{d}{2}+1\bigg)=\bigg(\frac{d}{2}\bigg)\bigg(\frac{d-2}{2} \cds\bigg(\frac{d-(d-1)}{2}\bigg)\Gamma\bigg(\frac{1}{2}\bigg)= \bigg(\frac{d!!}{2^{(d+1)/2}}\bigg)\sqrt\pi\]

where \(d!!\) denotes the double factorial (or multifactorial), given as

\[\begin{split}d!!=\left\{\begin{array}{lr}1\quad\rm{if\ }d=0\rm{\ or\ }d=1\\d\cd(d-2)!!\quad\rm{if\ }d\geq 2\end{array}\right.\end{split}\]

Putting it all together we have

Note

\(\dp\Gamma\bigg(\frac{d}{2}+1\bigg)=\) \(\left\{\begin{array}{lr}(\frac{d}{2})!\quad\rm{if\ }d\rm{\ is\ even}\\\sqrt\pi(\frac{d!!}{2^{(d+1)/2}})\quad\rm{if\ }d\rm{\ is\ odd}\end{array}\right.\)

Surface Area

The surface area of the hypersphere can be obtained by differentiating its volume with respect to \(r\), given as

\[\rm{area}(S_d(r))=\frac{d}{dr}\rm{vol}(S_d(r))= \bigg(\frac{\pi^{\frac{d}{2}}}{\Gamma(\frac{d}{2}+1)}\bigg)dr^{d-1}= \bigg(\frac{2\pi^{\frac{d}{2}}}{\Gamma(\frac{d}{2})}\bigg)r^{d-1}\]

Asymptotic Volume

For the unit hypersphere with \(r=1\)

Note

\(\dp\lim_{d\ra\infty}\rm{vol}(S_d(1))=\lim_{d\ra\infty}\frac{\pi^{\frac{d}{2}}}{\Gamma(\frac{d}{2}+1)}\ra 0\)

6.3 Hypersphere Inscribed Within Hypercube

In two dimensions, we have

\[\frac{\rm{vol}(S_2(r))}{\rm{vol}(H_2(2r))}=\frac{\pi r^2}{4r^2}=\frac{\pi}{4}=78.5\%\]

In three dimensions, we have

\[\frac{\rm{vol}(S_3(r))}{\rm{vol}(H_3(2r))}=\frac{\frac{4}{3}\pi r^3}{8r^3}=\frac{\pi}{6}=52.4\%\]

As the dimensionality \(d\) increases asymptotically, we get

Note

\(\dp\lim_{d\ra\infty}\frac{\rm{vol}(S_d(r))}{\rm{vol}(H_d(2r))}=\) \(\dp\lim_{d\ra\infty}\frac{\pi^{\frac{d}{2}}}{2^d\Gamma(\frac{d}{2}+1)}\ra 0\)

This means that as the dimensionality increases, most of the volume of the hypercube is in the “corners”, whereas the center is essentially empty.

6.4 Volume of Thin Hypersphere Shell

Let \(S_d(r,\epsilon)\) denote the thin hypershell of width \(\epsilon\). Its volume is given as

\[\rm{vol}(S_d(r,\epsilon))=\rm{vol}(S_d(r))-\rm{vol}(S_d(r-\epsilon))=K_dr^d-K_d(r-\epsilon)^d.\]

Let us consider the ratio of the volume of the thin shell to the volume of the outer sphere:

\[\frac{\rm{vol}(S_d(r,\epsilon))}{\rm{vol}(S_d(r))}= \frac{K_dr^d-K_d(r-\epsilon)^d}{K_dr^d}=1-\bigg(1-\frac{\epsilon}{r}\bigg)^d\]

Asymptotic Volume

As \(d\) increases, in the limit we obtain

Note

\(\dp\lim_{d\ra\infty}\frac{\rm{vol}(S_d(r,\epsilon))}{\rm{vol}(S_d(r))}=\) \(\dp\lim_{d\ra\infty}1-\bigg(1-\frac{\epsilon}{r}\bigg)^d\ra 1\)

That is, almoast all of the volume of the hypersphere is contained in the thin shell as \(d\ra\infty\). This means that in high-dimensional spaces, unlike in lower dimensions, most of the volume is concentrated around the surface (within \(\epsilon\)) of the hypersphere, and the center is essentially void.

6.5 Diagonals in Hyperspace

Consider the angle \(\th_d\) betwen the ones vector \(\1\) and the first standard basis vector \(\e_1\), in \(d\) dimensions:

\[\cos\th_d=\frac{\e_1^T\1}{\lv\e_1\rv\lv\1\rv}= \frac{\e_1^T\1}{\sqrt{\e_1^T\e_1}\sqrt{\1^T\1}}=\frac{1}{\sqrt{1}\sqrt{d}}= \frac{1}{\sqrt{d}}\]

Asymptotic Angle

As \(d\) increases, the angle between the \(d\)-dimensional ones vector \(\1\) and the first axis vector \(\e_1\) is given as

\[\lim_{d\ra\infty}\cos\th_d=\lim_{d\ra\infty}\frac{1}{\sqrt{d}}\ra 0\]

which implies that

Note

\(\dp\lim_{d\ra\infty}\th_d\ra\frac{\pi}{2}=90^\circ\)

This implies that in high dimensions all of the diagonal vectors are perpendicular (or orthogonal) to all the coordinates axes. Because there are \(2^d\) corners in a \(d\)-dimensional hyperspace, there are \(2^d\) diagonal vectors from the origin to each of the corners. Because the diagonal vectors in opposite directions define a new axis, we obtain \(2^{d-1}\) new axes, each of which is essentially orthogonal to all of the \(d\) principal coordinate axes.

6.6 Density of The Multivariate Normal

Consider the probability of a point being with a fraction \(\alpha>0\), of the peack density at the mean.

For a multivariate normal distribution, with \(\mmu=\0_d\), and \(\Sg=\I_d\), we have

\[f(\x)=\frac{1}{(\sqrt{2\pi})^d}\exp\bigg\{-\frac{\x^T\x}{2}\bigg\}\]

At the mean \(\mmu=\0_d\), the peak density is \(f(\0_d)=\frac{1}{(\sqrt{2\pi})^d}\). Thus, the set of points \(\x\) with density at least \(\alpha\) fraction of the density at the mean, with \(0<\alpha<1\), is given as

\[\frac{f(\x)}{f(\0)}\geq\alpha\]

which implies that

\[ \begin{align}\begin{aligned}\exp\bigg\{-\frac{\x^T\x}{2}\bigg\}&\geq\alpha\\\rm{or\ }\x^T\x&\leq-2\ln(\alpha)\\\rm{and\ thus\ }\sum_{i=1}^d(x_i)^2&\leq-2\ln(\alpha)\end{aligned}\end{align} \]

The probability that a point \(\x\) is within \(\alpha\) times the density at the mean can be computed from the \(\chi_d^2\) density function

Note

\(\dp P\bigg(\frac{f(\x)}{f(\0)}\geq\alpha\bigg)=P(\x^T\x\leq-2\ln(\alpha))=\) \(\dp\int_0^{-2\ln(\alpha)}f_{\chi_d^2}(\x^T\x)=F_{\chi_d^2}(-2\ln(\alpha))\)

where \(f_{\chi_d^2}\) is the chi-squared probability density function with \(q\) degrees of freedom:

\[f_{\chi_d^2}(x)=\frac{1}{2^{q/2}\Gamma(q/2)}x^{\frac{q}{2}-1}e^{-\frac{x}{2}}\]

and \(F_{\chi_d^2}\) is its cumulative distribution function.

As dimensionality increases, this probability decreases sharply, and eventually tends to zero, that is,

Note

\(\dp\lim_{d\ra\infty}P(\x^T\x\leq-2\ln(\alpha))\ra 0\)

Thus, in higher dimensions the probability density around the mean decreases very rapidly as one moves away from the mean. In essence the entire probability mass migrates to the tail regions.

Distance of Points from the Mean

Let \(r^2\) denote the square of the distance of a point \(\x\) to the center \(\mmu=\0\), given as

\[r^2=\lv\x-\0\rv^2=\x^T\x=\sum_{i=1}^dx_i^2\]

\(\x^T\x\) follows a \(\chi^2\) distribution with \(d\) degress of freedom, which has mean \(d\) and variance \(2d\). It follows that the mean and variance of the random variable \(r^2\) is

\[\mu_{r^2}=d\quad\sg_{r^2}^2=2d\]

We conclude that for large \(d\), the radius \(r\) follows a normal distribution with mean \(\sqrt{d}\) and standard deviation \(1/\sqrt{2}\). The density at the mean distance \(r=\sqrt{d}\) is exponentially smaller than that at the peak density because

\[\frac{f(\x)}{f(\0)}=\exp\{-\x^T\x/2\}=\exp\{-d/2\}\]

Whereas the density of the standard multivariate normal is maximized at the center \(\0\), most of the probability mass is concentrated in a small band around the mean distance of \(\sqrt{d}\) from the center.