
Definition of CoNLL data format

02/06/09

Permalink 09:02:26 am, 61 words
Categories: Code, Linguistics

Here is a complete definition of the CoNLL 2007 data format. It’s on a wiki, so there’s a chance it could be incorrect.

However, the last editor is Joakim Nivre, author of MaltParser, so it’s probably right.
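For my own reference, here is a rough sketch of reading the format in Python. It assumes the usual ten tab-separated columns of the CoNLL-X/2007 shared task (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), one token per line, with a blank line between sentences and "_" standing in for an empty value. The names `Token` and `parse_conll` are mine, not part of any library, and the wiki definition remains the authority if the two disagree.

```python
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    """One token line; all fields kept as raw strings ("_" means unspecified)."""
    id: str
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: str
    deprel: str
    phead: str
    pdeprel: str


def parse_conll(text: str) -> Iterator[List[Token]]:
    """Yield sentences (lists of Token) from CoNLL-formatted text."""
    sentence: List[Token] = []
    for line in text.split("\n"):
        line = line.rstrip("\r")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != len(Token._fields):
            raise ValueError(f"expected {len(Token._fields)} columns, got {len(cols)}: {line!r}")
        sentence.append(Token(*cols))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence
```

Fields are left as strings on purpose, so that "_" placeholders pass through untouched; converting `head` to an integer is left to the caller.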

On a side note, UTF-8 is the specified encoding, so I need to change my build script, which currently standardises on ISO-8859-1 (latin1) early on.
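The conversion itself is small. A minimal sketch, assuming the input files really are ISO-8859-1 (the function names `latin1_to_utf8` and `convert_file` are made up for illustration, and my actual build script is not shown here):

```python
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8."""
    # Decoding as ISO-8859-1 never fails, because every byte value is a
    # valid character. That also means input that is already UTF-8 would
    # be silently double-encoded, so only run this on known latin1 files.
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Read src as latin1 and write it to dst as UTF-8."""
    # Working on bytes keeps the original line endings intact.
    dst.write_bytes(latin1_to_utf8(src.read_bytes()))
```

Plain ASCII passes through unchanged; only bytes above 0x7F grow into two-byte UTF-8 sequences.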

2 comments

Comment from: An inquiring mind [Visitor]
What's the purpose of the new format?

jpg and bmp are for pictures
.txt is for text

I didn't see what this thing's purpose was. I'm going to guess it's either linguistics or programming related, but I don't know.
02/06/09 @ 11:48
Comment from: sandersn [Member]
Yes, CoNLL format is a text format for storing sentence structure, like Subject, Predicate, Direct Object. It's not XML, but the linguistics world hasn't switched to XML for the most part.

This is a note to myself, and to Google. So it may not be very interesting to regular readers.
02/06/09 @ 14:53
