Friday, May 27, 2011

Handling bad catalog data

I am making progress with processing the "Hyg 2.0" dataset. One of the challenges of working with any large body of data is coping with imperfect data reduction. The Hipparcos mission is a dismaying introduction to this issue for people accustomed to hand-editable volumes of data. The author of the "Hyg 2.0" file made this decision:

Distance: The star's distance in parsecs, the most common unit in astrometry. To convert parsecs to light years, multiply by 3.262. A value of 10000000 indicates missing or dubious (e.g., negative) parallax data in Hipparcos.
There are 705 such stars in the data. Only two are brighter than 6th magnitude, and only one has a Flamsteed number. Amazingly, one is in the Gliese catalog. The brightest are hot giants and so will be well out of our range of interest.
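
Counting those stars comes down to checking the distance column against the sentinel value. Here is a minimal sketch of that check, not the actual processing code: the column names and the three sample rows are made up for illustration, and the real file's header may differ.

```python
import csv
import io

# Sentinel for missing or dubious parallax, per the catalog notes quoted above.
MISSING_DISTANCE = 10000000


def split_by_distance(rows, column="Distance"):
    """Separate rows with a usable distance from rows carrying the sentinel.

    `column` is an assumed header name; adjust it to match the real file.
    """
    good, bad = [], []
    for row in rows:
        if float(row[column]) >= MISSING_DISTANCE:
            bad.append(row)
        else:
            good.append(row)
    return good, bad


# Illustrative rows only; these are not real catalog entries.
SAMPLE = """StarID,Distance,Mag
1,25.3,5.1
2,10000000,4.2
3,310.0,7.9
"""

good, bad = split_by_distance(csv.DictReader(io.StringIO(SAMPLE)))
print(len(good), "usable;", len(bad), "flagged")
```

Against the real file, the same function could be fed a `csv.DictReader` opened on the catalog instead of the in-memory sample.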

For the solitaries with simple spectral types it's not hard to compute a magnitude-based distance to replace the missing value. For multiple star systems it gets a bit more complicated. My thinking right now is to ignore the lot. Nothing there is going to be familiar enough to draw attention to itself for anyone but an expert. I may re-digest them later, but I think that the "Backfill" approach will be close enough for gaming and fiction purposes.
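
For reference, a magnitude-based distance for a single star can be sketched with the standard distance modulus, d = 10^((m - M + 5) / 5) parsecs. This is only an illustration: the function name is hypothetical, the absolute magnitude M has to be estimated from the spectral type by some outside means, and interstellar extinction is ignored.

```python
def photometric_distance_pc(apparent_mag, absolute_mag):
    """Distance in parsecs from the distance modulus m - M = 5*log10(d) - 5.

    absolute_mag is assumed to come from a spectral-type estimate;
    extinction is not accounted for.
    """
    return 10 ** ((apparent_mag - absolute_mag + 5) / 5)


# A star that appears 5 magnitudes fainter than its absolute magnitude
# lies at 100 parsecs.
d_pc = photometric_distance_pc(10.0, 5.0)

# Light-year conversion factor taken from the catalog notes quoted above.
d_ly = d_pc * 3.262
print(round(d_pc, 1), "pc =", round(d_ly, 1), "ly")
```

Any error in the assumed absolute magnitude carries straight into the distance, which is one more reason to treat these values as good enough for gaming rather than as measurements.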

