org.annotation.wordfreak.annotator
Class TokenAnnotator

java.lang.Object
  extended by org.annotation.wordfreak.annotator.Annotator
      extended by org.annotation.wordfreak.annotator.DocumentProcessor
          extended by org.annotation.wordfreak.annotator.ParagraphProcessor
              extended by org.annotation.wordfreak.annotator.SentenceProcessor
                  extended by org.annotation.wordfreak.annotator.TokenAnnotator
All Implemented Interfaces:
java.awt.event.ActionListener, AnnotatedFileListener, java.util.EventListener, Plugin
Direct Known Subclasses:
SimpleTokenAnnotator

public abstract class TokenAnnotator
extends SentenceProcessor

Provides common functionality for automatic annotation of tokens. Most token annotators should extend this class and implement its abstract methods.


Field Summary
 
Fields inherited from class org.annotation.wordfreak.annotator.SentenceProcessor
sentenceTypes
 
Fields inherited from class org.annotation.wordfreak.annotator.Annotator
annotationFilter, dataDirectory, DEFAULT_ANNOTATOR_NAME, files, guiListener, listeners, loaded, progress, trainingFilter
 
Constructor Summary
TokenAnnotator(java.lang.String type)
           
 
Method Summary
protected abstract  double[] getTokProbs()
          Returns a confidence associated with each token returned in the most recent call to tokenize.
protected abstract  void initTraining()
          Initializes annotator for training.
protected  void processSentence(Annotation sentence, double percentage)
          Processes the specified sentence, which represents the specified percentage of the total work to be performed by this annotator.
protected abstract  Span[] tokenize(java.lang.String text)
          Returns the character offsets of the tokens in the text parameter.
protected abstract  void train()
          Trains a model based on the tokens provided in previous calls to trainWithTokens.
 void training(java.util.List files)
           
 void training(java.lang.String[] files)
           
protected abstract  void trainWithTokens(Span[] tokens, java.lang.String text)
          Uses the tokens provided to construct events for training the current tokenizer model.
 
Methods inherited from class org.annotation.wordfreak.annotator.SentenceProcessor
processParagraph
 
Methods inherited from class org.annotation.wordfreak.annotator.ParagraphProcessor
processDocument
 
Methods inherited from class org.annotation.wordfreak.annotator.DocumentProcessor
annotating
 
Methods inherited from class org.annotation.wordfreak.annotator.Annotator
actionPerformed, addAnnotatorListener, annotate, annotatedFile, closeAnnotatedFile, done, getDataDirectory, hideWaitDialog, loadAnnotator, loaded, removeAnnotatorListener, setAnnotationFilter, setDataDirectory, setGuiListener, setProgress, setTrainingFilter, showWaitDialog, sortedOutcomes, supportsTraining, train, updateProgress
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenAnnotator

public TokenAnnotator(java.lang.String type)

Method Detail

tokenize

protected abstract Span[] tokenize(java.lang.String text)
Returns the character offsets of the tokens in the text parameter.

Parameters:
text - the string to be tokenized. Typically a sentence.
Returns:
character offsets into text which are the tokens
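
A minimal sketch of the kind of result tokenize is described as returning, assuming the simplest possible strategy (splitting on whitespace). The Span class below is a hypothetical stand-in with half-open [start, end) offsets; the real Span type and its offset convention are not shown on this page, and the class name WhitespaceTokenizeSketch is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceTokenizeSketch {

    /** Hypothetical stand-in for Span: a half-open [start, end) character range. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits text on whitespace and returns the character offsets of each token. */
    static Span[] tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int tokenStart = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (tokenStart >= 0) {
                    spans.add(new Span(tokenStart, i)); // close the current token
                    tokenStart = -1;
                }
            } else if (tokenStart < 0) {
                tokenStart = i; // first character of a new token
            }
        }
        if (tokenStart >= 0) {
            spans.add(new Span(tokenStart, text.length())); // token runs to end of text
        }
        return spans.toArray(new Span[0]);
    }

    public static void main(String[] args) {
        String sentence = "The cat sat.";
        for (Span s : tokenize(sentence)) {
            // Each span indexes back into the original string.
            System.out.println(s.start + "-" + s.end + " " + sentence.substring(s.start, s.end));
        }
    }
}
```

Returning offsets rather than substrings lets the caller map each token back to its position in the original text, which is what an annotation over a sentence needs.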

getTokProbs

protected abstract double[] getTokProbs()
Returns a confidence associated with each token returned in the most recent call to tokenize.

Returns:
array of confidences associated with each token returned in the most recent call to tokenize.
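
The contract described above (one confidence per token, tied to the most recent tokenize call) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the Span class is a hypothetical stand-in, the class name TokProbsSketch is invented, and the confidence values (1.0 for purely alphanumeric tokens, 0.5 otherwise) are a made-up heuristic chosen only to show the array alignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokProbsSketch {

    /** Hypothetical stand-in for Span: a half-open [start, end) character range. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    private static final Pattern NON_SPACE = Pattern.compile("\\S+");

    // Confidences recorded by the most recent tokenize call; empty before the first call.
    private double[] lastProbs = new double[0];

    /** Tokenizes on whitespace and records one confidence per returned token. */
    Span[] tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        List<Double> probs = new ArrayList<>();
        Matcher m = NON_SPACE.matcher(text);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end()));
            // Illustrative heuristic only: lower confidence when punctuation is attached.
            probs.add(isAlphanumeric(m.group()) ? 1.0 : 0.5);
        }
        lastProbs = new double[probs.size()];
        for (int i = 0; i < lastProbs.length; i++) {
            lastProbs[i] = probs.get(i);
        }
        return spans.toArray(new Span[0]);
    }

    /** Returns a copy so callers cannot modify the cached confidences. */
    double[] getTokProbs() {
        return lastProbs.clone();
    }

    private static boolean isAlphanumeric(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isLetterOrDigit(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TokProbsSketch tokenizer = new TokProbsSketch();
        Span[] spans = tokenizer.tokenize("The cat sat.");
        double[] probs = tokenizer.getTokProbs();
        for (int i = 0; i < spans.length; i++) {
            System.out.println(spans[i].start + "-" + spans[i].end + " " + probs[i]);
        }
    }
}
```

Because the confidences are overwritten on every tokenize call, a caller should read getTokProbs before tokenizing the next sentence.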

initTraining

protected abstract void initTraining()
Initializes annotator for training.


trainWithTokens

protected abstract void trainWithTokens(Span[] tokens,
                                        java.lang.String text)
Uses the tokens provided to construct events for training the current tokenizer model.

Parameters:
tokens - character offsets into text which are tokens to be used for training.
text - string to which the offsets specified in tokens refer.

train

protected abstract void train()
Trains a model based on the tokens provided in previous calls to trainWithTokens.
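
The three training hooks suggest a collect-then-train flow: initTraining resets state, trainWithTokens accumulates events, and train builds a model from them. The sketch below illustrates that flow under assumptions. The call order is inferred from the method descriptions and is not confirmed by this page; the Span class is a hypothetical stand-in; and the "model" here (a token frequency table, exposed through an invented count method) is a toy chosen only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrainingFlowSketch {

    /** Hypothetical stand-in for Span: a half-open [start, end) character range. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Events collected since the last initTraining call.
    private final List<String> events = new ArrayList<>();

    // Toy "model": how often each token string was seen in training.
    private Map<String, Integer> model = new HashMap<>();

    /** Discards events left over from any earlier training run. */
    void initTraining() {
        events.clear();
    }

    /** Records one event per token; the offsets index into text. */
    void trainWithTokens(Span[] tokens, String text) {
        for (Span token : tokens) {
            events.add(text.substring(token.start, token.end));
        }
    }

    /** Builds the model from all events collected so far. */
    void train() {
        Map<String, Integer> counts = new HashMap<>();
        for (String event : events) {
            counts.merge(event, 1, Integer::sum);
        }
        model = counts;
    }

    /** Hypothetical accessor used only to inspect the toy model. */
    int count(String token) {
        return model.getOrDefault(token, 0);
    }

    public static void main(String[] args) {
        TrainingFlowSketch annotator = new TrainingFlowSketch();
        annotator.initTraining();
        annotator.trainWithTokens(new Span[] {new Span(0, 3), new Span(4, 7)}, "the cat");
        annotator.trainWithTokens(new Span[] {new Span(0, 3), new Span(4, 7)}, "the dog");
        annotator.train();
        System.out.println("the: " + annotator.count("the"));
        System.out.println("cat: " + annotator.count("cat"));
    }
}
```

Separating event collection from model building means train runs once over the complete set of events, however many sentences contributed to it.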


training

public void training(java.util.List files)

training

public void training(java.lang.String[] files)
Overrides:
training in class Annotator

processSentence

protected void processSentence(Annotation sentence,
                               double percentage)
Description copied from class: SentenceProcessor
Processes the specified sentence, which represents the specified percentage of the total work to be performed by this annotator.

Specified by:
processSentence in class SentenceProcessor
Parameters:
sentence - The sentence to be annotated.
percentage - The percentage of work this sentence represents.


Copyright © 2004 Thomas Morton and Jeremy LaCivita. All Rights Reserved.