So far I've only talked about code that I've developed or played around with in my own time. In preparation for future blog posts I thought I'd spend a little time talking about the code I'm paid to work on.
As some of you may already know, I work in the Department of Computer Science at the University of Sheffield, in the Natural Language Processing (NLP) Group, where my interests have focused on information extraction -- getting useful information about entities and events from unstructured text such as newspaper articles or blog posts. The main piece of software that makes this work possible is GATE.
GATE is a General Architecture for Text Engineering. This means that it provides both the basic components required for building applications that work with natural language and a framework in which these components can be easily linked together and reused. Because I never have to worry about basic processing such as tokenization (splitting text into individual words and punctuation), sentence splitting, and part-of-speech tagging, I'm free to concentrate on extracting information from the text. I've used GATE since 2001, when I started work on my PhD, and for the last two years I've been employed as part of the core GATE team. Technically I'm not paid to develop GATE (I don't think any of us actually are), but the projects we work on all rely on GATE, so we contribute new plugins or add new features as the need arises.
One of the things I really like about working on GATE is that it is open-source software (released under the LGPL), which means not only that I am free to talk about the work I do, but also that anyone can freely use it and contribute to its development. This openness has also led to GATE being adopted by a large number of companies and universities around the world for all sorts of interesting tasks -- I'm currently involved in three projects that use GATE for cancer research, mining of medical records, and government transparency.
So if you are interested in text engineering and you haven't heard of GATE, 1) shame on you and 2) go try it out and see just what it can do. And for those of you who don't need to process text, at least you'll know what I'm talking about when I refer to it in future posts.