It's important for the country to move fast on this. Fingal did exactly the right thing by not talking the idea to death or worrying too much about which data to offer or how to serve it up. They continue to publish all sorts of data in a couple of reasonable formats (CSV, XML) and will let "the market" figure out what's useful. The app competition was a great way to quickly coalesce that market - software developers and companies or even members of the public with ideas for data reuse.
I have a few modest suggestions for continuing to improve the quality of the Fingal data offerings, born of working with a selection of the data sets for my own contest entry. Nothing controversial, I think, and it's certainly not meant as a criticism. I have really enjoyed the challenge and thoroughly believe in what Fingal is trying to achieve.
Cleanliness
My application sucks in data from the Fingal site and caches it locally in a database. I have a script that will check regularly if the Fingal files have changed and reimport the data if they have. It's an automated process so I don't have the opportunity to manually review the content before pushing it out again to users of my application. I depend, therefore, on the source data being clean. It isn't at the moment. For example, the address of The Little Theatre in Skerries is given as "The Old School, Skerries COmmujnity Centre".
Slightly more dangerously,
<Schools>
  <ID>122</ID>
  <School_Roll_No>Blanchardstown Youthreach</School_Roll_No>
  <Name>Blanchardstown Youthreach</Name>
  <Address1>The Brace Centre</Address1>
  <Address2>Main Street
Main Street
Main Street
Main Street</Address2>
  <Address3>Blanchardstown</Address3>
  <Phone>(01) 8217007</Phone>
  <School_Level>Youthreach</School_Level>
  <Mixed_Status>Mixed</Mixed_Status>
  <Fee_paying>No</Fee_paying>
  <Completion_Prog></Completion_Prog>
  <Lat>53.3874624894966</Lat>
  <Long>-6.37903666361972</Long>
</Schools>
Those carriage returns in the (erroneous) Address2 field will bite you if you try to, say, generate some JavaScript and use the value to initialise a string without normalizing the whitespace first.
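A minimal sketch of that normalisation step in Python, assuming the value has already been read out of the XML. The function name js_string_literal is my own invention for illustration; json.dumps is used because a JSON string is also a valid JavaScript string literal.

```python
import json


def js_string_literal(value: str) -> str:
    """Collapse whitespace runs (including CR/LF) and quote the result for JavaScript."""
    # str.split() with no arguments splits on any whitespace run, so joining
    # with a single space removes embedded carriage returns and newlines.
    normalized = " ".join(value.split())
    # json.dumps escapes quotes, backslashes and control characters.
    return json.dumps(normalized)


# Hypothetical input mimicking the Address2 value with embedded line breaks.
address2 = "Main Street\r\nMain Street\r\nMain Street\r\nMain Street"
print("var address2 = " + js_string_literal(address2) + ";")
```

If the generated script is embedded directly in an HTML page, further escaping (for example of a literal closing script tag) would still be needed.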
I have a "Tourism" category in my application but had to take out the Architectural and Protected Structure data at the last minute when I noticed that not all items have names, which made my generated listings look corrupt. With further thought, I might be able to adapt to the lack of a handy reference, or drop the nameless items. Probably, though, each item should just have a name. I do recognise that Fingal County Council is not the primary source for this particular data, so maybe that requirement could be pushed upstream.
Consistency
I'm using quite a few of the data sets so I notice inconsistencies between them. Most of the time, for example, latitude and longitude are given like this:
<LAT>53.5861182846244</LAT>
<LONG>-6.2898600784955</LONG>
Occasionally, though, they come like this:
<Lat>53.3976219424703</Lat>
<Long>-6.4010389300682</Long>
Since XML is case-sensitive, these are entirely different names for the same field. Of course I can handle this in code but the more consistency built in, the more chance I have of being able to use new data sets as they are published without further impact on code.
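One way to handle the case difference on the consuming side is to lower-case element names while reading each item. This is only a sketch using Python's standard xml.etree.ElementTree; the Item wrapper element and the function names are hypothetical, not taken from the Fingal files.

```python
import xml.etree.ElementTree as ET


def fields_by_lowercase_tag(item: ET.Element) -> dict:
    """Map each child element's lower-cased tag name to its stripped text."""
    return {child.tag.lower(): (child.text or "").strip() for child in item}


def coordinates(item: ET.Element) -> tuple:
    """Return (latitude, longitude) whether the tags are LAT/LONG or Lat/Long."""
    fields = fields_by_lowercase_tag(item)
    return float(fields["lat"]), float(fields["long"])


# Hypothetical wrapper elements around the two spellings shown above.
upper = ET.fromstring(
    "<Item><LAT>53.5861182846244</LAT><LONG>-6.2898600784955</LONG></Item>"
)
mixed = ET.fromstring(
    "<Item><Lat>53.3976219424703</Lat><Long>-6.4010389300682</Long></Item>"
)

print(coordinates(upper))
print(coordinates(mixed))
```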
Data that is republished by Fingal could be automatically transformed to fit local data conventions.
IDs
Generally, there are no IDs in the Fingal data. Each item should have an ID, something that doesn't vary even as the item is updated. That way, I can isolate changes in the source and apply the same changes in turn to wherever I have used that data. My application generates web content for a community site. I originally considered a facility to apply updates to a previously-generated site but without IDs it's awkward.
Also, the further upstream the IDs are added, the more chance downstream applications have to link to each other's data.
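To illustrate why stable IDs matter, here is a rough sketch of how a consumer could isolate changes between two imports once every item carries one. It assumes each import has already been loaded into a dictionary keyed by ID; the function name and the sample records are made up for the example.

```python
def diff_by_id(old: dict, new: dict) -> tuple:
    """Compare two {id: record} snapshots and return (added, removed, changed)."""
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {
        key: new[key]
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return added, removed, changed


# Hypothetical snapshots: item 122 is corrected, 123 disappears, 124 is new.
previous = {
    "122": {"Name": "Blanchardstown Youthreach", "Address2": "Main Stret"},
    "123": {"Name": "Example School"},
}
current = {
    "122": {"Name": "Blanchardstown Youthreach", "Address2": "Main Street"},
    "124": {"Name": "Another Example School"},
}

added, removed, changed = diff_by_id(previous, current)
print(sorted(added), sorted(removed), sorted(changed))
```

Without IDs, the corrected address in item 122 would be indistinguishable from one item being deleted and a different one being added.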
Schemas
The great advantage of XML as a text data format is that it can be so tightly constrained, with consequent reliability benefits to the software that consumes that XML. The Fingal data is schema-less, which means I will inevitably make assumptions about the data format that prove to be untrue when the data is later updated.
For example, a field might have the following values (a real example):
<cafe>Yes</cafe>
<cafe>No</cafe>
<cafe/>
If this empty element variation appears for the first time in a future update, it's very likely to break code that assumed an explicit boolean value. Better to define a schema and make what looks like a boolean actually a boolean (at the same time limiting the inconsistency of Yes/yes/Y/true... values). My code will know what to expect, and Fingal can use the same schemas to validate any updates they make.
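Until a schema exists, a consumer can at least parse such fields defensively. The sketch below treats the field as three-state (yes, no, or unknown) and rejects anything unexpected. The accepted spellings are my own assumption, not a documented Fingal convention, and parse_flag is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed spellings; adjust to whatever the published data actually contains.
TRUE_VALUES = {"yes", "y", "true", "1"}
FALSE_VALUES = {"no", "n", "false", "0"}


def parse_flag(text: Optional[str]) -> Optional[bool]:
    """Return True/False for a recognised value, None for an empty element."""
    if text is None or not text.strip():
        return None  # <cafe/> or <cafe></cafe>: value not stated
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognised boolean value: {text!r}")


for xml in ("<cafe>Yes</cafe>", "<cafe>No</cafe>", "<cafe/>"):
    print(xml, parse_flag(ET.fromstring(xml).text))
```

Raising on unknown values is deliberate: a failed import is easier to notice than a silently wrong listing.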
Changes
I don't want to hit the Fingal server every time I need data so of course I cache it locally. When checking that my cached data is up-to-date I should be polite and not re-download the files from Fingal's server if they have not changed. It seems that the Fingal server uses the Last-Modified and ETag HTTP headers so I can work with that. Perhaps it could be made explicit what the supported change mechanism is. This is probably enough for data that comes in discrete files but a changes feed would be a nice touch, perhaps.
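A polite update check along those lines can be sketched with Python's standard urllib. The idea is to send back the ETag and Last-Modified values saved from the previous download and treat a 304 Not Modified response as "nothing to do". The function names are hypothetical, and the URL would be whichever data file is being cached.

```python
import urllib.error
import urllib.request
from typing import Optional


def conditional_headers(etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> dict:
    """Build the request headers for a conditional GET from cached validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url: str,
                     etag: Optional[str] = None,
                     last_modified: Optional[str] = None):
    """Return (body, etag, last_modified), or None if the file is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None
        raise
```

The returned ETag and Last-Modified values would be stored alongside the cached data and passed in on the next check.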