Friday, January 13, 2012

Refreshing the toolbox

Learning a new programming tool seems to follow a pattern:
  1. Read up on it to understand what problems it addresses and how to get started.
  2. Use it for a decent-sized, real-world task.
  3. Read the documentation to fill in the gaps and realise what you could have done better.
I've had a bunch of tools on my radar for a long time which I never got to use on a paid project. Aside from the occasional minor personal project or experiment, these tools stayed in the toolbox. The recent Fingal Open Data app contest was the opportunity I was waiting for. There were no restrictions on choice of technology so I lined up everything on my list of things to try:
  • Groovy. I've been using Groovy for small scripts for some years now. It's a natural choice of dynamic language for a Java developer. I found it very rapid for development on the Fingal app. It's the small things that make the biggest difference: all the convenience methods on collections, for example, and GStrings, which are mini-templates. The XmlSlurper was handy for processing the Fingal data.

    I attended the Groovy & Grails Exchange in London in December so Groovy and associated technologies were all very fresh in my mind when I began work on my contest entry.
  • Google App Engine. This was my deployment platform. It has traditionally been difficult to experiment with Java-based web apps because of the cost of hosting a minimally-capable web container. GAE is good enough though, and it comes with a NoSQL database, another trend I've been following lately.
  • Gaelyk. This is a thin Groovy layer on top of GAE. Besides making Groovy a native language on that platform, it adds templating and various convenience methods for accessing the underlying platform services.
  • Gradle. Built on Groovy, this is my candidate build tool to replace Ant. I pretty much used an out-of-the-box Gradle script for this application, only adding a CodeNarc plugin for static code analysis. It worked perfectly, took care of deployment to development and production servers, and it looks very clean and expressive so I will continue to use it when possible.
  • Sublime Text 2. A full-blown IDE felt too heavyweight for this project, and I seem to waste a lot of time keeping Eclipse's plug-ins and build classpath up-to-date, so I looked around for a good code editor. I see all the Apple users at conferences rocking their TextMate editors so I wanted a Windows equivalent. I found Sublime Text. It's not free but you can try it for as long as you like before purchase. Having used it for this project, I'll be buying. It's very slick. I didn't have Groovy code-completion or refactoring but it's a great editor.
  • Bootstrap 2.0. Twitter's Bootstrap project has made getting a website or web application frontend up and running very simple indeed. I've tried other CSS frameworks but this one seems particularly solid and straightforward. The 2.0 version is not released yet but I didn't have any funky UI requirements so it was safe to deploy the work-in-progress here.
It all worked out very well. No roadblocks and there was plenty of spare horsepower in each of these tools to handle a much larger application. I would add in Objectify if the data structure got any more complicated though. In fact, I might add it to Whatz n the Hood anyway, just to get a feel for it.

Thursday, January 12, 2012

Fingal open data

Irish software developers had a real treat over Christmas and New Year: a challenge to create an application using any of the dozens of data sets published by Fingal County Council. It's all in the name of open data, a movement to free up the information collected and generated by public administrations at local, national and EU levels. The hope is that this will seed economic growth, as well as contribute to greater transparency and accountability.

It's important for the country to move fast on this. Fingal did exactly the right thing by not talking the idea to death or worrying too much about which data to offer or how to serve it up. They continue to publish all sorts of data in a couple of reasonable formats (CSV, XML) and will let "the market" figure out what's useful. The app competition was a great way to quickly coalesce that market - software developers and companies or even members of the public with ideas for data reuse.

I have a few modest suggestions for continuing to improve the quality of the Fingal data offerings, born of working with a selection of the data sets for my own contest entry. Nothing controversial, I think, and it's certainly not meant as a criticism. I have really enjoyed the challenge and thoroughly believe in what Fingal is trying to achieve.

Cleanliness

My application sucks in data from the Fingal site and caches it locally in a database. I have a script that will check regularly if the Fingal files have changed and reimport the data if they have. It's an automated process so I don't have the opportunity to manually review the content before pushing it out again to users of my application. I depend, therefore, on the source data being clean. It isn't at the moment. For example, the address of The Little Theatre in Skerries is given as "The Old School, Skerries COmmujnity Centre".

Slightly more dangerously,

<Schools>
  <ID>122</ID>
  <School_Roll_No>Blanchardstown Youthreach</School_Roll_No>
  <Name>Blanchardstown Youthreach</Name>
  <Address1>The Brace Centre</Address1>
  <Address2>Main Street

  Main Street

  Main Street

  Main Street</Address2>
  <Address3>Blanchardstown</Address3>
  <Phone>(01) 8217007</Phone>
  <School_Level>Youthreach</School_Level>
  <Mixed_Status>Mixed</Mixed_Status>
  <Fee_paying>No</Fee_paying>
  <Completion_Prog></Completion_Prog>
  <Lat>53.3874624894966</Lat>
  <Long>-6.37903666361972</Long>
</Schools>

Those carriage returns in the (erroneous) Address2 field will bite you if you try to, say, generate some JavaScript and use the value to initialise a string without normalizing the whitespace first.
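
To make that concrete, here is a minimal sketch in Python (not the Groovy I used for the app) of normalizing whitespace before emitting a JavaScript string. The sample XML is a cut-down copy of the record above, and the helper names clean_text and js_string_literal are my own inventions for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Cut-down version of the school record, keeping the stray line breaks.
SAMPLE = """<Schools>
  <Name>Blanchardstown Youthreach</Name>
  <Address2>Main Street

  Main Street</Address2>
</Schools>"""


def clean_text(value):
    """Collapse every run of whitespace (including newlines) to one space."""
    return " ".join((value or "").split())


def js_string_literal(value):
    """Return a quoted, escaped string literal for use in generated JavaScript."""
    # json.dumps escapes quotes, backslashes and control characters.
    return json.dumps(clean_text(value))


school = ET.fromstring(SAMPLE)
address2 = school.findtext("Address2")
print("var address2 = " + js_string_literal(address2) + ";")
```

This only protects the generated script from the line breaks. The repeated "Main Street" is still wrong and can really only be fixed at source.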

I have a "Tourism" category in my application but had to take out the Architectural and Protected Structure data at the last minute when I noticed that not all items have names, which made my generated listings look corrupt. With further thought, I might be able to adapt to the lack of a handy reference, or drop the nameless items. Probably, though, each item should just have a name. I do recognise that Fingal County Council is not the primary source for this particular data so maybe that requirement could be pushed upstream.

Consistency

I'm using quite a few of the data sets so I notice inconsistencies between them. Most of the time, for example, latitude and longitude are given like this:
<LAT>53.5861182846244</LAT>
<LONG>-6.2898600784955</LONG>
Occasionally, though, they come like this:
<Lat>53.3976219424703</Lat>
<Long>-6.4010389300682</Long>
Since XML is case-sensitive, these are entirely different names for the same field. Of course I can handle this in code but the more consistency built in, the more chance I have of being able to use new data sets as they are published without further impact on code.
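
Handling it in code might look something like the following Python sketch, which looks up child elements without regard to case. The wrapping Item element and the function names are assumptions for the sake of a self-contained example; the real data sets use their own record element names.

```python
import xml.etree.ElementTree as ET

# The two spellings seen in the data, wrapped in a hypothetical <Item> record.
UPPER = "<Item><LAT>53.5861182846244</LAT><LONG>-6.2898600784955</LONG></Item>"
MIXED = "<Item><Lat>53.3976219424703</Lat><Long>-6.4010389300682</Long></Item>"


def child_text(element, name):
    """Find a direct child by tag name, ignoring case; return its text or None."""
    wanted = name.lower()
    for child in element:
        if child.tag.lower() == wanted:
            return child.text
    return None


def coordinates(element):
    """Return (latitude, longitude) as floats, whatever the tag capitalisation."""
    return float(child_text(element, "lat")), float(child_text(element, "long"))


for xml_text in (UPPER, MIXED):
    print(coordinates(ET.fromstring(xml_text)))
```

It works, but every consumer of the data has to write the same workaround, which is the argument for fixing the convention at source.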

Data that is republished by Fingal could be automatically transformed to fit local data conventions.

IDs

Generally, there are no IDs in the Fingal data. Each item should have an ID, something that doesn't vary even as the item is updated. That way, I can isolate changes in the source and apply the same changes in turn to wherever I have used that data. My application generates web content for a community site. I originally considered a facility to apply updates to a previously-generated site but without IDs it's awkward.

Also, the further upstream the IDs are added, the more chance downstream applications have to link to each other's data.
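
As a rough illustration of what stable IDs would make possible, this Python sketch compares two snapshots of a data set and reports what was added, removed or changed. The record layout (dictionaries with an "id" key) and the sample values are assumptions, not the actual Fingal format.

```python
def diff_by_id(old_items, new_items):
    """Compare two snapshots keyed on a stable 'id' field.

    Returns three sorted lists of IDs: added, removed and changed.
    """
    old = {item["id"]: item for item in old_items}
    new = {item["id"]: item for item in new_items}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Made-up snapshots for illustration only.
before = [
    {"id": 1, "name": "Library", "phone": "111"},
    {"id": 2, "name": "Theatre", "phone": "222"},
]
after = [
    {"id": 2, "name": "Theatre", "phone": "333"},
    {"id": 3, "name": "Playground", "phone": "444"},
]

print(diff_by_id(before, after))
```

Without an ID, the only way to match a record across updates is by name or position, and either can change.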

Schemas

The great advantage of XML as a text data format is that it can be so tightly constrained, with consequent reliability benefits to the software that consumes that XML. The Fingal data is schema-less which means I will inevitably make assumptions about the data format that prove to be untrue when the data is later updated.

For example, a field might have the following values (a real example):
<cafe>Yes</cafe>
<cafe>No</cafe>
<cafe/>
If this empty element variation appears for the first time in a future update, it's very likely to break code that assumed an explicit boolean value. Better to define a schema and make what looks like a boolean actually a boolean (at the same time limiting the inconsistency of Yes/yes/Y/true... values). My code will know what to expect, and Fingal can use the same schemas to validate any updates they make.
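
In the absence of a schema, the consuming code has to be defensive. A minimal Python sketch, assuming that an empty element means "unknown" (my guess, since nothing in the data says so) and using a made-up parse_flag helper:

```python
import xml.etree.ElementTree as ET

TRUE_VALUES = {"yes", "y", "true", "1"}
FALSE_VALUES = {"no", "n", "false", "0"}


def parse_flag(element):
    """Map a yes/no style element to True, False or None (empty or missing).

    Raises ValueError for anything unrecognised, so a surprise in a future
    update fails loudly instead of silently becoming False.
    """
    if element is None:
        return None
    text = (element.text or "").strip().lower()
    if not text:
        return None
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError("Unexpected flag value: %r" % element.text)


for sample in ("<cafe>Yes</cafe>", "<cafe>No</cafe>", "<cafe/>"):
    print(sample, "->", parse_flag(ET.fromstring(sample)))
```

A real boolean type in a schema would make most of this unnecessary.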

Changes

I don't want to hit the Fingal server every time I need data so of course I cache it locally. When checking that my cached data is up-to-date I should be polite and not re-download the files from Fingal's server if they have not changed. It seems that the Fingal server uses the Last-Modified and ETag HTTP headers so I can work with that. Perhaps it could be made explicit what the supported change mechanism is. This is probably enough for data that comes in discrete files but a changes feed would be a nice touch, perhaps.
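
For what it's worth, a polite check might be sketched like this in Python using a conditional GET. The function names and the shape of the returned dictionary are my own, and I am assuming the server answers 304 Not Modified when the validators still match, which is standard HTTP behaviour but not something Fingal documents.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the last download."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Download url only if it has changed since the saved validators."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request) as response:
            return {
                "changed": True,
                "body": response.read(),
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not modified: keep using the cached copy and the old validators.
            return {"changed": False, "body": None,
                    "etag": etag, "last_modified": last_modified}
        raise
```

The caller stores the returned etag and last_modified values alongside the cached data and passes them back in on the next check.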

Friday, July 9, 2010

Recession cuisine

It's hard to beat an Irish stew for easy, tasty and healthy. It's also great value. But exactly what does a portion cost? I've just worked it out based on what I throw into a pot. These quantities make three hearty meals. Prices are Lidl's.
  • 250g mushrooms €0.99
  • 400g diced beef steak €3.49
  • 1kg potatoes €1.75
  • 0.5kg carrots €0.50
  • 1 large onion €0.33
  • Half a bulb of garlic €0.16
  • 2 beef stock cubes €0.23
There are some dried herbs and seasoning in there too but this will be close enough. I figure that comes out at just under €2.50 per meal. Wonderful! And it freezes great too. Why would you eat anything else?
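
For anyone who wants to check my sums, here is the arithmetic as a few lines of Python, working in cent to avoid rounding surprises. The variable names are my own.

```python
# Lidl prices from the list above, in cent.
PRICES_IN_CENT = {
    "mushrooms": 99,
    "diced beef steak": 349,
    "potatoes": 175,
    "carrots": 50,
    "onion": 33,
    "garlic": 16,
    "stock cubes": 23,
}
MEALS = 3

total_cent = sum(PRICES_IN_CENT.values())
per_meal_cent = total_cent / MEALS

print("Total: €%.2f, per meal: €%.2f" % (total_cent / 100, per_meal_cent / 100))
# prints "Total: €7.45, per meal: €2.48"
```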

Sunday, June 13, 2010

Day 4 Epicenter 2010

Gaelyk: Groovy in the Cloud - Tim Berglund

One hears suspiciously little about practical applications running on Google App Engine (GAE). And yet it offers lots of features to the developer, mostly for free. You get to run on Google's infrastructure so you get high availability and scalability. There is a good selection of services provided like a datastore, authentication, caching, messaging, email, task queue, etc.

GAE supports Java too (and Python). Unfortunately limitations of the environment mean we can't bring all of our favourite Java tools to the party. Grails, for example, is not really an option at the moment. It's wedded to Hibernate and thus to relational databases. There is a plugin for Google App Engine but it "feels goofy" according to Tim Berglund.

Enter Gaelyk, a lightweight Groovy toolkit for Google App Engine. It wraps the various GAE services making their use less verbose. It injects various variables (eg for the datastore) so they are always available when needed. It also supplies Groovlets (like servlets) and Groovy templates for serving web pages. It's more than a little retro but it's fine for small applications.

Worth a try.


jQTouch - Marc Grabanski
Being a Java developer myself and not even owning a Mac, I feel a certain reluctance to switch to Objective-C just to write an iPhone app. Marc Grabanski reminded us that web-based apps can often do the job just as well. They can also look like native apps, be responsive like native apps, and take advantage of the HTML 5 and CSS 3 features already built into WebKit.

With this in mind, the jQTouch jQuery plugin has been developed to make iPhone web app development a breeze. It's got attractive visual themes and CSS animations for wow factor. It also supports a clever nested menu navigation system that is not dependent on page loads so response time is instant.

It's important to note that at this point you have, really, a website that looks well on an iPhone, and not something you can sell via the app store. That might be enough but if you want to make a downloadable app, PhoneGap can apparently do the conversion. I'm unclear how this works - perhaps it wraps the HTML in a WebKit control? - but it was emphasised that however it does it, PhoneGap does not fall foul of Apple's new rules on cross-compilation.

I see PhoneGap supports quite a few phone platforms (including WebOS, which is what my Palm uses). jQTouch itself seems limited to iPhone for the moment though. Probably because it takes advantage of cutting edge browser features.

According to Marc, jQuery's next focus is mobile. John Resig is consulting widely among interested parties to find out what makes most sense for mobile platforms.


Enterprise Java Hybrids for Cloud Computing - Eugene Ciurana
This was quite a high-level architectural discussion that covered the many components of a real-life company's IT infrastructure, the split between in-house and cloud servers, and the migration from a restricted to a scalable system as the company grew.

As usual from Eugene, it was a very well-informed presentation from someone who works at the coalface.


Java developer to iPhone coder - Matthew McCullough
If the aforementioned jQTouch and PhoneGap don't suffice then you will just have to get your hands dirty with Objective-C. Matthew McCullough introduced the development environment and language from the point of view of a Java developer. He squeezed an awful lot of information into one hour. I don't think I've ever been to a talk delivered so fast and with so little fluff. However he does it, it works.

It was extremely practical, even advising how to avoid the iPhone Developer registration fee (up until you want to test on a real phone or sell through the app store anyway).

He explained actual lines of Objective-C code and highlighted the differences in terminology with the Java world.

Matthew also tried to tempt us with stories of enormous cheques paid out by Apple to iPhone developers.

Objective-C is a little less scary now, though it still doesn't look beautiful. I have an iPad but I don't have a Mac so I won't be dipping my toes in the water just yet.

Day 3 Epicenter 2010

I originally intended to see Matt Raible's "Comparing Kick-Ass Web Frameworks" first thing on Thursday morning. I went to his Evangelist talk on the "Future of Web Frameworks" the night before, however, and suddenly realised I'm not dithering over web frameworks any more. Grails it is for now. The only thing I wish for is better support for NoSQL databases but that's not particular to Grails.

I was able to give Matt's comparison of Grails, Rails, GWT and Flex a miss, therefore, to see:


Microformats and the Semantic Web - Tim Berglund
I had the impression that microformats were an idea that never really happened. It seems I might have written them off prematurely. Google is pulling useful info out of microformats and RDFa, for example, and some big sites are using them (eg Best Buy, O'Reilly).

I've installed the Operator extension in Firefox now, as suggested by Tim, so I'll notice the use of this extra markup as I'm surfing.

Two more random points:
  • Use Google Labs' Social Graph to explore your online connections (try your Twitter a/c)
  • Best Buy is "sorbet for the mind", ie somewhere you can go after a hard day to cleanse the brain cells :-)

PHP and Symfony - Andy Gibson
I have a bunch of PHP hosting accounts for my various websites but I really don't exploit this resource. I do use PHP occasionally to crunch calendar data or manipulate a web page based on URL parameters but I've never considered using it to write a proper web app.

That's why I was attracted to Andy Gibson's talk. Perhaps a Java developer can develop in PHP and not feel dirty.

I can't claim to know a lot about Symfony since the introduction was necessarily brief but it looks like Struts (ie it's MVC) with added ORM. Which is fine.

The final generated web files did look like the usual mix of HTML and PHP though, so perhaps the templating part is not up to much. Still, it does put PHP back on the table for me, in the event I want to write a simple database-backed application.


iPad
At this point I won an iPad thanks to the wonderful folks at WIBU Systems. And while on a high from this I missed the next session :-) But I did pull it together for the final slots of the day...


Productive Application Development with Grails Plugins - Peter Ledbrook
It's old news that Grails development is accelerated by incorporating neatly pre-packaged units of functionality, ie plugins. But Peter Ledbrook's talk brought us news of exciting improvements to the plugin system.

When building a modular application based on plugins, a change to one of those plugins was not reflected in the running code until the plugin was repackaged and reinstalled. There is a solution now - in-place plugins. You specify the location of the plugin's development directory in BuildConfig.groovy and updates are then reflected immediately in the running code.

There are more details of plugin management here, including via a repository manager.


jQuery UI and Plugins - Marc Grabanski
I find myself looking for nice web page widgets more and more. I settled on Yahoo! YUI a couple of years ago. That's still a great framework but it's kind of falling between two stools at the moment. The cool JavaScript language features are in version 3 while many of the fancy widgets are in version 2. Looking at it I have option paralysis.

jQuery is renowned for making DOM manipulations ridiculously easy and I've used it (for example on my home page) when I just want some quick effect. I hadn't taken a proper look at jQuery UI, however, until this talk.

Obviously jQuery UI comes with a bunch of widgets - slider, date picker, etc. That's great but what impressed me was how skinnable they are. I'm a terrible designer so I like stuff that looks good out of the box. jQuery UI comes with a very nice set of skins.

Not only that, the CSS classes are designed to be easy to apply to the rest of your web page so you can make your web app visually consistent. That's worth a lot.

I also noticed that jQuery UI comes with a large set of icons, conveniently combined into a single sprite. It's a really thoughtful library.

For functionality not in the core UI library, jQuery has a plugin architecture and an enthusiastic community contributing new widgets. It's rather a lot to navigate and evaluate though, so Marc has curated his own selection of the very best widgets that play well with jQuery UI.

Day 2 Epicenter 2010

Joda-Time and JSR-310 - Stephen Colebourne

Stephen Colebourne is something of a hero in the Java community. The standard Java date and time APIs are riddled with head-scratching quirks and bug-inducing complications. Stephen wrote the widely used drop-in replacement, Joda-Time, that has greatly eased the lives of developers.

Building on the experience of Joda-Time, Stephen now leads the effort to recast the date and time handling in Java under JSR-310. Work here is ongoing and the result is unlikely to appear in JDK 7.

In this talk, Stephen outlined the motivation for JSR-310 and the many nuances of calendars and time that a good API should handle.

Some of the principles that the JSR has adopted are
  • Immutability (for thread-safety, something not true of the standard Java implementation)
  • Fluency (API should be easy to read and learn)
  • Clear, explicit and expected (general API design tip: if the JavaDoc is not simple, refactor)
  • Extensible (allow for particular uses of date-time, eg financial years) but make default suit most use cases
We heard about some of the classes in JSR-310 (instants, durations, zone-offsets, time-zones, etc), precision, calculations on date-times (eg what is one month from Jan 31st?), political and scientific meddling with human time (DST and leap-seconds), testing (can inject your own clock during testing for repeatability)...
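
Two of those points are easy to illustrate. The Python sketch below is not the JSR-310 API, just my own illustration: plus_months shows one possible answer to the "one month from Jan 31st" question (clamp to the last valid day of the target month), and days_until shows the idea of injecting a clock so that tests are repeatable. Both function names are made up.

```python
import calendar
from datetime import date


def plus_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def days_until(target, today=date.today):
    """Days from 'today' to target; 'today' is a clock that tests can replace."""
    return (target - today()).days


print(plus_months(date(2010, 1, 31), 1))   # 2010-02-28
print(plus_months(date(2012, 1, 31), 1))   # 2012-02-29 (leap year)
print(days_until(date(2010, 6, 13), today=lambda: date(2010, 6, 9)))  # 4
```

Clamping is only one policy. Rolling over into March or raising an error are equally defensible, which is exactly why an API has to be explicit about it.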

It's head-spinning stuff and always more complicated than you think. It would make most people despair but Stephen seems remarkably chilled.

The JSR is unfinished but Joda-Time is available right now, of course, and will implement the JSR interfaces.

Groovy & Grails - Tim Berglund

I've been to quite a few Groovy and Grails intros at this stage and I've even used Groovy for a few small scripts so I probably didn't need this. But it's no harm to be reminded how powerful these tools are.

Tim Berglund is an excellent speaker - very assured, knowledgeable, conversational and funny.

Java Constraint Programming with JSR-331 - Jacob Feldman

Constraint programming is an easy concept to understand but it tends to be tricky to implement and more of an academic pursuit than a practical tool in business. The purpose of JSR-331 is to bridge the gap between academia and professional programmers.

An example of a real world constraint might be in the management of a share portfolio: the portfolio may not contain more than 15% of tech stocks unless it also contains at least 7% utility stocks.

After a few such constraints, implemented efficiently in Java code, you quickly get complicated, incomprehensible and unmodifiable code. Ideally you would have a language with which to express constraints and a black box that would take them and return an efficiently-calculated solution.

Search strategies for turning out solutions are the subject of much research but should be hidden from the programmer. A library conforming to JSR-331 will do that.
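
To show the separation between stating constraints and searching for solutions, here is a toy Python sketch of the portfolio example. It is emphatically not the JSR-331 API: the three sectors, the 5% step and the brute-force search are all my own assumptions, and a real solver would use a far smarter strategy than trying every combination.

```python
from itertools import product

STEP = 5  # allocation granularity in percentage points (an assumption)
SECTORS = ("tech", "utilities", "other")


def fully_invested(portfolio):
    """All allocations must add up to 100%."""
    return sum(portfolio.values()) == 100


def tech_needs_utilities(portfolio):
    """No more than 15% tech unless there is at least 7% utilities."""
    return portfolio["tech"] <= 15 or portfolio["utilities"] >= 7


CONSTRAINTS = [fully_invested, tech_needs_utilities]


def solutions(constraints):
    """The 'black box': here just an exhaustive search over all allocations."""
    levels = range(0, 101, STEP)
    for values in product(levels, repeat=len(SECTORS)):
        portfolio = dict(zip(SECTORS, values))
        if all(check(portfolio) for check in constraints):
            yield portfolio


valid = list(solutions(CONSTRAINTS))
print(len(valid), "valid portfolios, for example", valid[0])
```

The point is that the constraints are declared once, as data, and the search code never needs to change when a constraint is added or removed.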

Sometimes it is not possible to satisfy all constraints. In that case you can request a minimal violation of constraints by assigning a cost to each constraint.

Examples of problems amenable to a constraint-based approach are train scheduling, truck loading & routing, and the allocation of field service personnel. The canonical tutorial example is solving sudoku puzzles (where the constraints are that the digits 1-9 can each appear only once in each row, etc).

A constraint-based programming library sounds very useful and powerful. I'm not sure to what extent this exists today in Java open source. I imagine expressing constraints is straightforward enough, but that supplying efficient generic search strategies is quite difficult. Jacob mentioned Choco and Constrainer as a couple of Java libs in this area. More work has been done in C++ and solid commercial tools are available.

Apparently "Ireland is a major player in this area" thanks to the Cork Constraint Computation Centre.

Polyglot Programming in the JVM - Andres Almiray

There is a general realisation among Java programmers that, while the language is still very useful, its core development has come to the end of the road. The JVM, on the other hand, is a different story. There is lots of innovation in non-Java languages running on the JVM.

Polyglot programming is the idea that programmers should have a few languages in their toolbox and mix and match them within projects as appropriate.

The languages chosen by Andres Almiray for comparison in this talk were Groovy, Scala and Clojure. Interoperability between these and Java is first class and bidirectional so they are very good tools to use in an otherwise Java environment.

It sounds on the surface a little messy to mix languages but you can easily imagine tests on Java code being written in Groovy, say (or Spock). Andres also gave an example from Griffon of Scala and Clojure code being handled and integrated transparently during the build.

While I'd heard of Polyglot Programming before, Andres made a brief reference to Polyglot Persistence right at the end. That's new to me but he says it's on the way. What it means, I think, now that we have "NoSQL" databases, is using more than one storage engine in an application. There are interesting times ahead!

Wednesday, June 9, 2010

Day 1 Epicenter 2010

These are the sessions I attended on the first day of Ireland's biggest software conference...

Introducing OpenXML - Craig Murphy
I'm working on a document conversion project at the moment so I'm already poking around Microsoft's new Office file format. It's easy enough to navigate and it's properly documented but I wouldn't rate it easy to generate.

I'm not a .NET programmer though. Microsoft supplies an OpenXML SDK that provides high-level access to document constructs like paragraphs from C#. Craig Murphy demonstrated what is possible using the right tools.

One slick utility in the SDK - the Document Reflector - will generate a C# program from, say, a Word invoice. It's trivial then to adapt that program to bulk generate Word invoices from a database.

I can't see myself moving to C# to take advantage of that but I did notice a couple of other utilities that I might be able to use standalone: a diff tool for comparing OpenXML files and an OpenXML validator.

Another neat feature that Craig showed us, and which was entirely unknown to me, is that of "Content Controls". These are well-defined editable regions of a document. They are particularly easy to read from and write to from software.

Most impressively, these controls can be bound to arbitrary XML. It's a two-way mapping, so XML can be automatically generated from Office documents or vice versa. Knowing that this feature exists gives me the confidence that pretty much anything is possible and easily achievable from code with Microsoft Office.

Beyond Java Coding: NoSQL and Non-Java on the JVM - Eugene Ciurana
I was at Eugene Ciurana's talk last year on Google App Engine. This time around he was offering a "free your mind" survey of the possibilities and potential of the JVM once you stop thinking Java and relational databases.

What's great about Eugene's talks is that he has clearly tried all the major alternatives under field conditions. So when he says that Python (in the guise of Jython on the JVM) is the right tool in a particular context you have to take his opinion seriously.

There were plenty of nuggets of wisdom here. It would never have occurred to me, for example, to look for a Java version of awk for a text processing task. Jawk is a little slower than Java in execution but development time is radically reduced.

What language choice is best to ensure portability? In the enterprise, it's Java. Otherwise, according to Eugene, it's Python. As long as you avoid the bleeding edge features of CPython and code to the slightly less advanced Jython, that is.

There was more, about MongoDB and Mule, but it boils down to knowing what's out there and choosing the right tool for the job. The payoff is substantially reduced development time and fewer lines of source code.

Flying with Griffon - Andres Almiray
This was an opportunity for a bit of celebrity spotting because Andres Almiray is often mentioned on the Grails Podcast, which I listen to religiously.

Griffon is a framework for rapid development of desktop Java applications. In concept and usage it's very similar to Grails.

I can't imagine wanting to develop a desktop Java application but I went along anyway because I came up against Griffon recently. I wanted to use Groovy to quickly generate some images but found that the various graphics classes therein were changing to fit better with Griffon. It was all quite confusing and I didn't have time to tease it out. I was hoping to gain some insight from the talk that would put me back on track.

I'm not sure I got that but Griffon is a very elegant system anyway and worth seeing. Having developed Swing UIs in the past I can appreciate what this new framework brings to the party.

Enterprise Business Intelligence Solutions - Neil Barry
The full title continues: To Stop the Data Deluge. That's what attracted me to the talk; I find the task of crunching huge amounts of data to create useful information fascinating.

Jaspersoft is a surprisingly substantial company doing well off the open source software business model. Their products covering data visualisation, reporting and dashboards are downloaded 120 times an hour and are in use in a huge array of enterprises. The Revenue Commissioners in Ireland are one happy customer.

What appeals to companies is that there is no per-user charge for the software. A division can download and deploy it without getting permission from head office. This has led to the "democratisation" of business intelligence.

A couple of case studies rounded out the talk. One detailed BT's use of the software on a database of 11 billion records, to which 30 million more are being added daily. To manage such amounts of data and to extract business value from it is clearly a job for industrial-strength software. Apparently Jaspersoft has the solution.

I hope I have an excuse to test this out sometime.