3.2.4  :  Spatial Indexes

A class of queries that are of interest and that are not trivial to treat efficiently with standard relational database design are those that aim to find objects that are (or have been) spatially close to each other. A typical question is:

Find all halos within a distance of :R> Mpc from a given position (:X,:Y,:Z)
(a parameter in SQL is indicated by a name prefixed with a colon), This can be trivially translated to SQL as follows:
SELECT *
  FROM MPAHaloTrees..MHalo H
 WHERE SQRT((H.X-:X)*(H.X-:X)+(H.Y-:Y)*(H.Y-:Y)+(H.Z-:Z)*(H.Z-:Z)) <= :R
To execute this query though the database can not make use of standard techniques for performance improvement. Such techniques generally require a representation of the data in which desired objects are placed nearby on disk. Disks are basically 1-dimentsional structures, so an ordering in some quantity is generally used. For spatial structures in 2 dimensions or higher, there is no exact way to accomplish this. This implies that the database will have to scan through the whole table and evaluate the expression in the WHERE clause. During this scan the whole table will have to be read from disk and it is this IO part of the query execution which is really the expensive step and should if possible be minimized.

To support spatial queries that perform reasonably well we provide a number of options, some of which are still under development. These are described below.

Zone index

To an object we associate integer valued positional coordinates, obtained from binning the simulation box by a relatively course grid. We have chosen grid cells of size about 10 h-1 Mpc, 503 cells in the Millennium, 73 in the millimil. When the size of the grid cell is L, the index columns are defined as ix=floor(x/L) and similar for y and z.

For an interesting general discussion of this problem and various approaches solutions see the PhD thesis by Volker Markl and references therein.

A very general reference to spatial indexes is