Experimental Study of Random Projections Below the JL Limit
Random projection is a method for reducing the dimensionality of data while preserving pairwise distances with high probability. The mathematical theory behind it is the Johnson-Lindenstrauss (JL) lemma. The basic idea of the JL lemma is that a set of points in a high-dimensional space p is randomly projected down to a lower-dimensional space q. This q can be as low as q0 while still guaranteeing, with a certain probability, that the projected pairwise distances lie within a factor of (1 ± ε) of the pairwise distances before the projection, where ε is usually a very small value. This technique has already been used in a variety of areas such as clustering and image and text data processing. Many researchers have studied the properties and performance of the JL lemma above q0 (q0 is usually called the JL limit or JL bound), that is, for q = p-1, p-2, ..., q0, but no research has investigated using the JL lemma below the JL limit (q = q0-1, q0-2, ..., 2). With a much lower dimension, data processing and storage become far easier, and we can visualize the cluster structure of data sets in 2D plots. One thing that should not be forgotten is that the distance preservation is probabilistic. How well are distances preserved below the JL bound? Will the projection distort or even completely destroy the cluster structure? What is a good projection method? We study and answer these questions as far as we can in this thesis.
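As a rough illustration of the idea described above (not part of the thesis itself), the following sketch projects points from a high dimension p down to a lower dimension q with a Gaussian random matrix scaled by 1/sqrt(q), then compares pairwise distances before and after the projection. The function names, dimensions, and sample sizes are illustrative assumptions, not the method evaluated in the thesis.

import numpy as np

def random_projection(X, q, seed=None):
    """Project the rows of X (n x p) down to q dimensions using a
    Gaussian random matrix scaled by 1/sqrt(q)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    R = rng.standard_normal((p, q)) / np.sqrt(q)
    return X @ R

def pairwise_distances(A):
    """Euclidean distance matrix between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy check of pairwise-distance distortion after projection.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 1000))        # 50 points in p = 1000 dimensions
Y = random_projection(X, q=200, seed=1)    # project down to q = 200

d_before = pairwise_distances(X)
d_after = pairwise_distances(Y)
mask = ~np.eye(len(X), dtype=bool)         # ignore zero self-distances
ratio = d_after[mask] / d_before[mask]
print("distortion range:", ratio.min(), ratio.max())

In such an experiment the printed ratios typically stay close to 1 when q is at or above the JL bound; how they behave when q is pushed far below that bound is exactly the kind of question the thesis investigates.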