Forum Replies Created
-
AuthorPosts
-
Francis Boscoe
Spectator
After many rounds of improvements on AGGIE’s part, here is my assessment of how it compares with the previously existing data in the New York State Cancer Registry. It’s not as detailed as what New Jersey did, but should suffice.
-
County level
94.9% – AGGIE returns same county
4.9% – AGGIE replaces known with unknown – this reflects our conservative approach (requiring high match score) and is similar to what we had in our old system. We can handle manual review of this many cases.
0.2% – AGGIE replaces unknown with known – I spot-checked a few and AGGIE looked good
0.02% – AGGIE replaces known with known – I spot-checked a few and AGGIE looked good
Latitude/longitude (restricting to where it is known on both databases)
96.1% – AGGIE is within 100 meters of existing registry value
3.7% – AGGIE is between 100 m and 1 km from existing registry value
0.2% – AGGIE is more than 1 km from existing registry value
Most of the differences in the 1-3 km range seem to arise from choosing different points on the same road. Of the ones I’ve spot-checked, AGGIE was correct in 63%, the original registry value in 27%, and neither in 12%.
In the 20-30 km range, AGGIE was correct 3 times and the registry 6 times (only 9 examples total). Two of the AGGIE errors were where it replaced COUNTY ROUTE 2 with COUNTY ROUTE and placed a point on a seemingly random county route (matched to the county parcel layer with a score of 100). Maybe this can be fixed, but obviously it is an infrequent occurrence. Three of the discrepancies were on Route 12 in Watertown, but they were inconsistent – the registry was right twice and AGGIE once.
All 18 differences of more than 50 km were typos by the registry.
On balance, AGGIE wins. Time to turn it back on for NY.
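For anyone who wants to run the same kind of comparison on their own registry, here is a minimal sketch of how the county categories and distance bins above could be computed. It is not the code I actually used: the function names are made up for illustration, the distance is a haversine approximation on a spherical Earth, and treating two unknown counties as "same county" is my assumption.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_bin(metres):
    """Assign a distance to one of the three bins used in the comparison."""
    if metres <= 100:
        return "within 100 m"
    if metres <= 1000:
        return "100 m to 1 km"
    return "more than 1 km"


def county_change(old, new):
    """Classify a county comparison; None stands for an unknown county.

    Assumption: a record unknown in both sources is counted as 'same county'.
    """
    if old == new:
        return "same county"
    if new is None:
        return "known replaced with unknown"
    if old is None:
        return "unknown replaced with known"
    return "known replaced with known"
```

Tallying `county_change` and `distance_bin` over all record pairs and dividing by the number of records would give percentages in the form reported above.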
Francis Boscoe
Spectator
Thank you Jim and David – I am helping Lindsey write up these results. It reminds me of the variations in lip cancer I looked at a few years ago. It’s almost like you could have a rule of thumb – wherever you have debate about what should count and what shouldn’t count, expect an order of magnitude variation in rates between registries.
July 27, 2017 at 7:56 pm in reply to: Facility/Hospital Addresses – Reliability and Use for Research #6030
Francis Boscoe
Spectator
So, Recinda just reminded me (again) of the existence of this forum and I think it’s time to make more use of it.
Is this still an active question? Do you mean that the facility address is being used as the patient address? If so, it would be neat if we could flag these in SEER*DMS as non-residential addresses. We tend to find them ad hoc.
Francis Boscoe
Spectator
I am unable to edit the above post – I just get taken to a blank screen. Anyhow, it was a screenshot showing how the two streets are miles apart. 512 KB is a tiny file size limit; you might want to rethink that.
Francis Boscoe
Spectator
Another example. Ocean Avenue is not the same as Ocean Parkway. There are tens of thousands more like this.
Francis Boscoe
Spectator
Here is the example that was discussed on today’s call (the house numbers are real, but are altered from any patient’s):
For the non-existent address 182 Washington Avenue, Albany, NY 12203
AGGIE currently gives
182 Washington Avenue Extension, Albany, NY 12203 (in the NAACCR version)
182 Washington Avenue Avenue, Albany, NY 12203 (in the SEER*DMS version – I suspect a small bug here)
with a match score of 100 (*update – based on the call, this result will be penalized in the future to have a score less than 100).
However, in this case, the correct address is 182 Washington Avenue, Albany, NY 12210
Other viable candidates would have been:
282 Washington Avenue, Albany, NY 12203
782 Washington Avenue, Albany, NY 12203
982 Washington Avenue, Albany, NY 12203
1082 Washington Avenue, Albany, NY 12203
Here are my own Bayesian prior probabilities for each of these possibilities:
182 Washington Avenue Extension, Albany, NY 12203 (0.35)
182 Washington Avenue, Albany, NY 12210 (0.35)
282 Washington Avenue, Albany, NY 12203 (0.05)
782 Washington Avenue, Albany, NY 12203 (0.05)
982 Washington Avenue, Albany, NY 12203 (0.05)
1082 Washington Avenue, Albany, NY 12203 (0.14)
None of the above (0.01)
So AGGIE is picking the most likely choice here (at least a tie for the most likely choice) – and I think that most of the time, this would be the case – but would still be incorrect 65% of the time.
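The arithmetic behind that 65% can be sketched in a few lines. The probabilities are my subjective priors from the list above, not anything AGGIE produces, and the function name is hypothetical; the point is only that picking the single most likely candidate still fails with probability one minus that candidate's prior.

```python
import math

# Subjective prior probabilities from the list above (not AGGIE output).
PRIORS = {
    "182 Washington Avenue Extension, Albany, NY 12203": 0.35,
    "182 Washington Avenue, Albany, NY 12210": 0.35,
    "282 Washington Avenue, Albany, NY 12203": 0.05,
    "782 Washington Avenue, Albany, NY 12203": 0.05,
    "982 Washington Avenue, Albany, NY 12203": 0.05,
    "1082 Washington Avenue, Albany, NY 12203": 0.14,
    "None of the above": 0.01,
}


def top_choice_error(priors):
    """Return the most likely candidate and the chance that it is wrong.

    Ties are broken by dictionary order, so the first listed candidate wins.
    """
    if not math.isclose(sum(priors.values()), 1.0):
        raise ValueError("prior probabilities must sum to 1")
    best = max(priors, key=priors.get)
    return best, 1.0 - priors[best]


best, error = top_choice_error(PRIORS)
print(f"Top choice: {best}")
print(f"Probability the top choice is wrong: {error:.2f}")
```

With a two-way tie at 0.35, either top candidate is wrong about 65% of the time, which is why a single best guess cannot be trusted here.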
I think this is a typical example, in that there will usually be a handful of possible alternatives for every typo. No matter how much we tweak the weights and penalties, I don’t see how AGGIE could ever guess correctly more than half the time. Certain kinds of analyses can tolerate having a few percent of the records geocoded to the wrong place. In New York, because we are legally mandated to publish small-area case counts, and because we do many small-area cancer investigations, we can’t. Hence requiring a match score of 100.