Here is a complete definition of the CoNLL 2007 data format. It’s on a wiki, so there’s a chance it could be incorrect.
However, the last editor is Joakim Nivre, author of MaltParser, so it’s probably right.
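For reference, here is a minimal sketch of a reader for the format, not a definitive implementation. It assumes the ten tab-separated columns of the CoNLL-X/2007 layout as I understand it (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `_` for unspecified values and a blank line between sentences; check the wiki page before relying on it. The names `FIELDS` and `read_sentences` are my own, purely for illustration.

```python
from typing import Dict, Iterable, Iterator, List

# Column names assumed from the CoNLL-X/2007 layout; verify against the spec.
FIELDS = (
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts (all values kept as strings)."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence
```

Passing an open file works directly, e.g. `read_sentences(open(path, encoding="utf-8"))`, since the function only needs an iterable of lines.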
On a side note, UTF-8 is the specified encoding, so I need to change my build script, which currently standardises on ISO-8859-1 (latin1) early on.
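The transcoding step itself is small. A rough sketch, assuming the inputs really are ISO-8859-1 (the function names here are hypothetical and not taken from my build script):

```python
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    Every byte value is valid ISO-8859-1, so the decode never fails;
    the caller has to know the input really is latin1.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def transcode_file(src: Path, dst: Path) -> None:
    """Read a latin1 file and write a UTF-8 copy (hypothetical helper)."""
    dst.write_bytes(latin1_to_utf8(src.read_bytes()))
```

Because the decode cannot fail, running this on a file that is already UTF-8 silently produces double-encoded text, so it should only be applied once per file.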