org.annotation.wordfreak.annotator
Class ChunkerAnnotator

java.lang.Object
  extended by org.annotation.wordfreak.annotator.Annotator
      extended by org.annotation.wordfreak.annotator.DocumentProcessor
          extended by org.annotation.wordfreak.annotator.ParagraphProcessor
              extended by org.annotation.wordfreak.annotator.SentenceProcessor
                  extended by org.annotation.wordfreak.annotator.ChunkerAnnotator
All Implemented Interfaces:
java.awt.event.ActionListener, AnnotatedFileListener, java.util.EventListener, Plugin

public abstract class ChunkerAnnotator
extends SentenceProcessor

This annotator creates annotations for sequence data by converting a series of tags into the appropriate chunks. It can be extended to create named-entity recognizers, noun-phrase detectors, or other annotators of this kind.
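A minimal sketch of the tag-to-chunk conversion described above, assuming a BIO-style tag set in which `B-X` begins a chunk of type X, `I-X` continues it, and `O` marks a token outside any chunk. The tag set actually used is defined by each subclass, and the `ChunkDecoder` class and its `Chunk` type are hypothetical illustrations, not part of WordFreak.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: decodes BIO-style chunk tags into chunk spans (not WordFreak API). */
public class ChunkDecoder {

    /** A chunk covering tokens [start, end) with a chunk type such as "NP". */
    public static final class Chunk {
        public final int start;
        public final int end;
        public final String type;

        Chunk(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /** Assumes every tag has the form "B-X", "I-X" or "O". */
    public static List<Chunk> decode(String[] tags) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;      // start index of the open chunk, or -1 if none is open
        String type = null;  // type of the open chunk
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            boolean begins = tag.startsWith("B-");
            boolean continues = tag.startsWith("I-") && start >= 0 && tag.substring(2).equals(type);
            // Close the open chunk unless this token continues it.
            if (start >= 0 && !continues) {
                chunks.add(new Chunk(start, i, type));
                start = -1;
                type = null;
            }
            // A B- tag, or an I- tag with nothing matching to continue, opens a new chunk.
            if (begins || (tag.startsWith("I-") && !continues)) {
                start = i;
                type = tag.substring(2);
            }
        }
        if (start >= 0) {
            chunks.add(new Chunk(start, tags.length, type));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tags = {"B-NP", "I-NP", "O", "B-VP", "B-NP", "I-NP"};
        System.out.println(decode(tags)); // prints [NP[0,2), VP[3,4), NP[4,6)]
    }
}
```

An `I-X` tag with no open chunk of type X is treated here as starting a new chunk, a lenient choice for noisy tagger output; a stricter decoder could instead reject such sequences.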

Author:
Tom Morton

Nested Class Summary
static class ChunkerAnnotator.ChunkAction
           
protected static class ChunkerAnnotator.ChunkActionEnum
           
 
Field Summary
protected  boolean createNons
          Set to true when non-chunks should be created along with regular chunks for active learning.
 
Fields inherited from class org.annotation.wordfreak.annotator.SentenceProcessor
sentenceTypes
 
Fields inherited from class org.annotation.wordfreak.annotator.Annotator
annotationFilter, dataDirectory, DEFAULT_ANNOTATOR_NAME, files, guiListener, listeners, loaded, progress, trainingFilter
 
Constructor Summary
ChunkerAnnotator(java.lang.String type)
           
 
Method Summary
protected  void endOfDocument()
          This function is called after each document has been processed.
protected abstract  ChunkerAnnotator.ChunkAction getChunkAction(java.lang.String tag, Annotation ann)
          Determines the chunk action for the specified chunk tag.
protected abstract  java.lang.String[] getChunkTags(java.lang.String[] toks, java.lang.String[] tags, java.lang.String[] pretags, double[] tprobs)
          Computes a list of chunk tags which can be converted into chunk actions using getChunkAction.
protected  void processDocument(Annotation document, double percentage)
          Processes the specified document, which constitutes the specified percentage of the total work to be performed by this annotator.
 void processSentence(Annotation sentence, double percentage)
          Processes the specified sentence, which constitutes the specified percentage of the total work to be performed by this annotator.
 
Methods inherited from class org.annotation.wordfreak.annotator.SentenceProcessor
processParagraph
 
Methods inherited from class org.annotation.wordfreak.annotator.DocumentProcessor
annotating
 
Methods inherited from class org.annotation.wordfreak.annotator.Annotator
actionPerformed, addAnnotatorListener, annotate, annotatedFile, closeAnnotatedFile, done, getDataDirectory, hideWaitDialog, loadAnnotator, loaded, removeAnnotatorListener, setAnnotationFilter, setDataDirectory, setGuiListener, setProgress, setTrainingFilter, showWaitDialog, sortedOutcomes, supportsTraining, train, training, updateProgress
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

createNons

protected boolean createNons
Set to true when non-chunks should be created along with regular chunks for active learning.

Constructor Detail

ChunkerAnnotator

public ChunkerAnnotator(java.lang.String type)
Parameters:
type - Unused parameter.
Method Detail

processDocument

protected void processDocument(Annotation document,
                               double percentage)
Description copied from class: DocumentProcessor
Processes the specified document, which constitutes the specified percentage of the total work to be performed by this annotator.

Overrides:
processDocument in class ParagraphProcessor

processSentence

public void processSentence(Annotation sentence,
                            double percentage)
Description copied from class: SentenceProcessor
Processes the specified sentence, which constitutes the specified percentage of the total work to be performed by this annotator.

Specified by:
processSentence in class SentenceProcessor
Parameters:
sentence - The sentence to be annotated.
percentage - The percentage of work this sentence represents.

getChunkTags

protected abstract java.lang.String[] getChunkTags(java.lang.String[] toks,
                                                   java.lang.String[] tags,
                                                   java.lang.String[] pretags,
                                                   double[] tprobs)
Computes a list of chunk tags which can be converted into chunk actions using getChunkAction.

Parameters:
toks - The tokens to be chunked.
tags - The POS tags of the words.
pretags - An array of chunk tags that should be maintained. A null entry for a particular token indicates that no pre-tag needs to be maintained for that token, and passing null for the whole array indicates that no pre-tags are to be maintained.
tprobs - The chunk tag probabilities for the returned chunk tags. This is populated by this function.
Returns:
A chunk tag for each token in toks.
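The contract of getChunkTags can be illustrated with a toy rule-based noun-phrase chunker. Everything here is hypothetical: the class name `RuleNpChunker`, the BIO tag strings, and the Penn Treebank POS rule are assumptions, and the method is static only so the example runs on its own (in ChunkerAnnotator it is a protected abstract instance method, typically backed by a trained model). The sketch keeps non-null pretags unchanged and fills the caller-allocated tprobs array with one probability per returned tag.

```java
import java.util.Arrays;

/** Hypothetical rule-based stand-in for a getChunkTags implementation (not WordFreak code). */
public class RuleNpChunker {

    /** True for Penn Treebank POS tags that this toy rule treats as part of a noun phrase. */
    static boolean nominal(String pos) {
        return pos.equals("DT") || pos.equals("PRP$") || pos.equals("CD")
                || pos.startsWith("JJ") || pos.startsWith("NN");
    }

    /**
     * Mirrors the documented contract: one chunk tag per token, non-null pretags are kept
     * verbatim, and tprobs (allocated by the caller) receives one probability per tag.
     */
    public static String[] getChunkTags(String[] toks, String[] tags, String[] pretags, double[] tprobs) {
        String[] out = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            if (pretags != null && pretags[i] != null) {
                out[i] = pretags[i];                 // pre-tag must be maintained
            } else if (nominal(tags[i])) {
                boolean inNp = i > 0 && out[i - 1].endsWith("-NP");
                boolean opener = tags[i].equals("DT") || tags[i].equals("PRP$");
                out[i] = (inNp && !opener) ? "I-NP" : "B-NP";
            } else {
                out[i] = "O";                        // token lies outside any chunk
            }
            tprobs[i] = 1.0;                         // a deterministic rule is fully confident
        }
        return out;
    }

    public static void main(String[] args) {
        String[] toks = {"The", "old", "dog", "saw", "a", "cat"};
        String[] pos = {"DT", "JJ", "NN", "VBD", "DT", "NN"};
        double[] probs = new double[toks.length];
        System.out.println(Arrays.toString(getChunkTags(toks, pos, null, probs)));
        // prints [B-NP, I-NP, I-NP, O, B-NP, I-NP]
    }
}
```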

getChunkAction

protected abstract ChunkerAnnotator.ChunkAction getChunkAction(java.lang.String tag,
                                                               Annotation ann)
Determines the chunk action for the specified chunk tag.

Parameters:
tag - The chunk tag.
ann - The annotation the tag was applied to.
Returns:
A chunk action with the appropriate action and tag fields.
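A sketch of how a subclass might implement getChunkAction for BIO-style tags. The real members of ChunkerAnnotator.ChunkAction and ChunkerAnnotator.ChunkActionEnum are not documented on this page, so the `Action` values (`START`, `CONTINUE`, `OTHER`) and the `ChunkAction` class below are hypothetical stand-ins, and the annotation argument is omitted because this simple mapping does not need it.

```java
/** Hypothetical mirror of a tag-to-action mapping; not the real WordFreak types. */
public class TagToAction {

    /** Assumed action kinds: open a new chunk, extend the current one, or stay outside. */
    enum Action { START, CONTINUE, OTHER }

    /** Pairs an action with the chunk type it applies to (null for OTHER). */
    static final class ChunkAction {
        final Action action;
        final String tag;

        ChunkAction(Action action, String tag) {
            this.action = action;
            this.tag = tag;
        }
    }

    /** Maps a BIO-style chunk tag such as "B-PER", "I-PER" or "O" to a chunk action. */
    static ChunkAction getChunkAction(String tag) {
        if (tag.startsWith("B-")) {
            return new ChunkAction(Action.START, tag.substring(2));
        } else if (tag.startsWith("I-")) {
            return new ChunkAction(Action.CONTINUE, tag.substring(2));
        }
        return new ChunkAction(Action.OTHER, null);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"B-PER", "I-PER", "O"}) {
            ChunkAction a = getChunkAction(t);
            System.out.println(t + " -> " + a.action + " " + a.tag);
        }
    }
}
```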

endOfDocument

protected void endOfDocument()
This function is called after each document has been processed. It can be overridden to perform processing between documents, which is useful for document-level tag caching in named-entity detection.
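As an illustration of document-level tag caching, a named-entity subclass might remember the type assigned to each token and consult it for later mentions in the same document, then clear it when endOfDocument is called. The `DocumentTagCache` class and its methods are hypothetical; in a real subclass the clearing would happen inside an overridden endOfDocument().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical document-level tag cache of the kind an overridden endOfDocument() could reset. */
public class DocumentTagCache {

    // Maps a token (e.g. "Morton") to the entity type it was last tagged with in this document.
    private final Map<String, String> cache = new HashMap<>();

    /** Records the type assigned to a token so later mentions in the same document can reuse it. */
    public void remember(String token, String type) {
        cache.put(token, type);
    }

    /** Returns the cached type for a token, or null if it has not been tagged in this document. */
    public String lookup(String token) {
        return cache.get(token);
    }

    /** Clears the cache so that no evidence leaks from one document into the next. */
    public void endOfDocument() {
        cache.clear();
    }

    public static void main(String[] args) {
        DocumentTagCache c = new DocumentTagCache();
        c.remember("Morton", "PER");
        System.out.println(c.lookup("Morton")); // prints PER
        c.endOfDocument();
        System.out.println(c.lookup("Morton")); // prints null
    }
}
```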



Copyright © 2004 Thomas Morton and Jeremy LaCivita. All Rights Reserved.