Introducing First-Sighting Rate

Well, this is interesting. We’ve got a new graph on oceansofnyc.com/stats/ which shows the “first sighting rate” on each day, compared to what we’d expect given the TLC data.

The first-sighting rate is the fraction of recent sightings that turned out to be brand-new Oceans, i.e., not already in the catalog. Early on, it was near 100% — almost every sighting added a new vehicle. As the catalog has filled in, the rate has dropped, because more and more sightings are duplicates of Oceans we’ve already logged.

The interesting move is comparing the actual first-sighting rate to a theoretical rate computed from the NYC TLC dataset. If we had perfect, uniform random sampling of every Ocean on the road, what fraction of new sightings would we expect to be first-time finds? That’s the baseline.

Where the actual rate sits relative to that baseline tells us something:

Below the line — we’re systematically missing some segment of the fleet. Same vehicles being seen over and over, while others sit unwatched.
At the line — sampling is roughly uniform.
Above the line — contributors are doing a better-than-random job of seeking out unseen Oceans. Quite possibly the case.

The graph is live on the stats page. It’s also the basis for Ocean Points, which scale inversely with the recent first-sighting rate: rarer finds, more points.