UAM CorpusTool is a set of tools for the linguistic annotation of text. Core concepts include:
• The user defines a ‘project’: a set of files, together with a set of analyses which are applied to each of these files.
• Each ‘analysis’ can be seen as a ‘layer’ of annotation. CorpusTool currently allows two types of annotation:
1. Document Coding: The text as a whole is assigned features. For instance, these features could represent the register of the document (field, tenor, mode) or its text type.
2. Segment Coding: The user selects segments within a file and assigns features to each of them. A segment is specified by dragging the mouse over a span of text; the user is then prompted to specify the features of that segment.
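The concepts above can be pictured as a small data model. The following Python sketch is purely illustrative and is not CorpusTool's actual file format or internal design: the class names (Project, Layer, Segment), the method names, the use of character offsets for segments, and the example layer and feature names are all assumptions made for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """A selected span of text (character offsets, end exclusive) and its features."""
    start: int
    end: int
    features: list[str]


@dataclass
class Layer:
    """One analysis applied to one file; kind is 'document' or 'segment'."""
    name: str
    kind: str
    document_features: list[str] = field(default_factory=list)
    segments: list[Segment] = field(default_factory=list)


@dataclass
class Project:
    """A set of files plus a set of analyses applied to each of those files."""
    analyses: dict[str, str]  # layer name -> 'document' or 'segment'
    texts: dict[str, str] = field(default_factory=dict)
    annotations: dict[tuple[str, str], Layer] = field(default_factory=dict)

    def add_file(self, filename: str, text: str) -> None:
        # Every analysis defined in the project becomes a layer on the new file.
        self.texts[filename] = text
        for name, kind in self.analyses.items():
            self.annotations[(filename, name)] = Layer(name, kind)

    def code_document(self, filename: str, layer: str, features: list[str]) -> None:
        # Document Coding: features describe the text as a whole.
        target = self.annotations[(filename, layer)]
        if target.kind != "document":
            raise ValueError(f"{layer!r} is not a document-coding layer")
        target.document_features = list(features)

    def code_segment(self, filename: str, layer: str, start: int, end: int,
                     features: list[str]) -> str:
        # Segment Coding: features describe one selected span of the text.
        target = self.annotations[(filename, layer)]
        if target.kind != "segment":
            raise ValueError(f"{layer!r} is not a segment-coding layer")
        text = self.texts[filename]
        if not 0 <= start < end <= len(text):
            raise ValueError("segment offsets fall outside the text")
        target.segments.append(Segment(start, end, list(features)))
        return text[start:end]


# Hypothetical project with one layer of each annotation type.
project = Project(analyses={"register": "document", "participants": "segment"})
project.add_file("letter.txt", "The committee approved the plan.")
project.code_document("letter.txt", "register", ["written", "formal"])
selected = project.code_segment("letter.txt", "participants", 0, 13, ["actor"])
print(selected)
```

In this sketch each (file, analysis) pair holds its own layer, which mirrors the idea that every analysis in the project is applied to every file.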
Other annotation types will be added in later versions, allowing annotation of rhetorical structure (RST), Generic Structure (GSP), participant chaining, sentence structure (e.g., Subj, Pred, Mood, Adjunct), and spoken data.
UAM CorpusTool replaces the author's earlier software, Systemic Coder, which allowed coding of single documents at a single layer. CorpusTool is an attempt to overcome the various limitations that constrained users of Coder. I wish to thank the many users of Coder who forwarded their comments over the years, and those who have sent me comments on this new tool.