org.annotation.wordfreak.annotator
Class ChunkerAnnotator
java.lang.Object
org.annotation.wordfreak.annotator.Annotator
org.annotation.wordfreak.annotator.DocumentProcessor
org.annotation.wordfreak.annotator.ParagraphProcessor
org.annotation.wordfreak.annotator.SentenceProcessor
org.annotation.wordfreak.annotator.ChunkerAnnotator
- All Implemented Interfaces:
- java.awt.event.ActionListener, AnnotatedFileListener, java.util.EventListener, Plugin
- public abstract class ChunkerAnnotator
- extends SentenceProcessor
This annotator creates annotations for sequence data by converting a series of tags into appropriate chunks.
It can be extended to create named-entity recognizers, noun-phrase detectors, or other annotators of this category.
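The tag-to-chunk conversion described above can be pictured with a small standalone sketch. It is not WordFreak code: the class `BioChunkSketch`, its `Chunk` type, and the `B-`/`I-`/`O` tag scheme are all assumptions made for illustration, since this page does not say which tag scheme a subclass uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not part of WordFreak: groups B-/I-/O tags into chunks. */
final class BioChunkSketch {

    /** A chunk covering tokens [start, end) with a type such as "NP". */
    static final class Chunk {
        final int start;
        final int end;
        final String type;

        Chunk(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /** Converts one (non-null) tag per token into a list of chunks. */
    static List<Chunk> toChunks(String[] tags) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;      // index where the open chunk began, or -1 if none is open
        String type = null;  // type of the open chunk
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            boolean extendsOpenChunk =
                    start >= 0 && tag.startsWith("I-") && tag.substring(2).equals(type);
            if (extendsOpenChunk) {
                continue; // the open chunk simply grows by one token
            }
            if (start >= 0) {
                chunks.add(new Chunk(start, i, type)); // close the open chunk
                start = -1;
                type = null;
            }
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                // "B-" opens a chunk; a stray "I-" is treated as an opening tag too
                start = i;
                type = tag.substring(2);
            }
        }
        if (start >= 0) {
            chunks.add(new Chunk(start, tags.length, type)); // close a chunk that runs to the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tags = {"B-NP", "I-NP", "O", "B-VP", "B-NP"};
        System.out.println(toChunks(tags));
    }
}
```

In the real class this grouping is driven by the ChunkAction objects returned from getChunkAction, so a subclass decides what each tag means rather than relying on fixed prefixes as the sketch does.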
- Author:
- Tom Morton
Field Summary |
protected boolean |
createNons
Set to true when non-chunks should be created along with regular chunks for active learning. |
Method Summary |
protected void |
endOfDocument()
This function is called after each document has been processed. |
protected abstract ChunkerAnnotator.ChunkAction |
getChunkAction(java.lang.String tag,
Annotation ann)
Determines the chunk action for the specified chunk tag. |
protected abstract java.lang.String[] |
getChunkTags(java.lang.String[] toks,
java.lang.String[] tags,
java.lang.String[] pretags,
double[] tprobs)
Computes a list of chunk tags which can be converted into chunk actions using getChunkAction. |
protected void |
processDocument(Annotation document,
double percentage)
Processes the specified document, which accounts for the specified percentage of the total work to be performed by this annotator. |
void |
processSentence(Annotation sentence,
double percentage)
Processes the specified sentence, which accounts for the specified percentage of the total work to be performed by this annotator. |
Methods inherited from class org.annotation.wordfreak.annotator.Annotator |
actionPerformed, addAnnotatorListener, annotate, annotatedFile, closeAnnotatedFile, done, getDataDirectory, hideWaitDialog, loadAnnotator, loaded, removeAnnotatorListener, setAnnotationFilter, setDataDirectory, setGuiListener, setProgress, setTrainingFilter, showWaitDialog, sortedOutcomes, supportsTraining, train, training, updateProgress |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
createNons
protected boolean createNons
- Set to true when non-chunks should be created along with regular chunks for active learning.
ChunkerAnnotator
public ChunkerAnnotator(java.lang.String type)
- Parameters:
type
- Unused parameter.
processDocument
protected void processDocument(Annotation document,
double percentage)
- Description copied from class:
DocumentProcessor
- Processes the specified document, which accounts for the specified percentage of the total work to be performed by this annotator.
- Overrides:
processDocument
in class ParagraphProcessor
processSentence
public void processSentence(Annotation sentence,
double percentage)
- Description copied from class:
SentenceProcessor
- Processes the specified sentence, which accounts for the specified percentage of the total work to be performed by this annotator.
- Specified by:
processSentence
in class SentenceProcessor
- Parameters:
sentence
- The sentence to be annotated.
percentage
- The percentage of work this sentence represents.
getChunkTags
protected abstract java.lang.String[] getChunkTags(java.lang.String[] toks,
java.lang.String[] tags,
java.lang.String[] pretags,
double[] tprobs)
- Computes a list of chunk tags which can be converted into chunk actions using getChunkAction.
- Parameters:
toks
- The tokens to be chunked.
tags
- The POS tags of the words.
pretags
- An array containing chunk tags which should be maintained. A value of null for a particular tag indicates that no pre-tag needs to be maintained for that token, and passing null for the array indicates that no pre-tags are to be maintained.
tprobs
- The chunk tag probabilities for the returned chunk tags. This array is populated by this function.
- Returns:
- A chunk tag for each token in toks.
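The contract for pretags and tprobs can be illustrated with a toy method that has the same parameter list. This is only a sketch under stated assumptions: `ChunkTagContractSketch` is a hypothetical standalone class, the noun-based rule and the probability values are made up, and the `B-NP`/`I-NP`/`O` tag names are not taken from this page. A real subclass would obtain tags and scores from its model.

```java
import java.util.Arrays;

/** Hypothetical stand-in illustrating the getChunkTags contract; not WordFreak code. */
final class ChunkTagContractSketch {

    /**
     * Toy tagger: keeps any non-null pre-tag, otherwise tags nouns ("NN...") as part
     * of an NP chunk and everything else as "O". Writes one score per token into tprobs,
     * which is assumed to be at least as long as toks.
     */
    static String[] getChunkTags(String[] toks, String[] tags, String[] pretags, double[] tprobs) {
        String[] chunkTags = new String[toks.length];
        for (int i = 0; i < toks.length; i++) {
            if (pretags != null && pretags[i] != null) {
                chunkTags[i] = pretags[i]; // a non-null pre-tag is maintained as given
                tprobs[i] = 1.0;           // assumption: a fixed tag is treated as certain
            } else if (tags[i].startsWith("NN")) {
                boolean insideNp = i > 0 && chunkTags[i - 1].endsWith("-NP");
                chunkTags[i] = insideNp ? "I-NP" : "B-NP";
                tprobs[i] = 0.9;           // made-up score for illustration
            } else {
                chunkTags[i] = "O";
                tprobs[i] = 0.8;           // made-up score for illustration
            }
        }
        return chunkTags;
    }

    public static void main(String[] args) {
        String[] toks = {"The", "stock", "market", "fell"};
        String[] tags = {"DT", "NN", "NN", "VBD"};
        String[] pretags = {"B-NP", null, null, null}; // only the first token is fixed
        double[] tprobs = new double[toks.length];     // filled in by the call below
        System.out.println(Arrays.toString(getChunkTags(toks, tags, pretags, tprobs)));
        System.out.println(Arrays.toString(tprobs));
    }
}
```

Passing null for the whole pretags array makes the sketch tag every token itself, which mirrors the "no pre-tags are to be maintained" case described for the parameter.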
getChunkAction
protected abstract ChunkerAnnotator.ChunkAction getChunkAction(java.lang.String tag,
Annotation ann)
- Determines the chunk action for the specified chunk tag.
- Parameters:
tag
- The chunk tag.
ann
- The annotation the tag was applied to.
- Returns:
- A chunk action with appropriate action and tag fields.
endOfDocument
protected void endOfDocument()
- This function is called after each document has been processed. It can be overridden to perform processing between documents.
This is useful for document-level tag caching, which is helpful in named-entity detection.
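One way to picture document-level tag caching is a map that is filled while sentences are processed and cleared when the document ends. The sketch below is a standalone illustration, not WordFreak code: `DocumentTagCacheSketch`, `remember`, and `lookup` are hypothetical names, and only the `endOfDocument` method name mirrors the hook documented above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in, not WordFreak code: a per-document cache of entity tags. */
final class DocumentTagCacheSketch {

    // Maps a token to the entity type it received earlier in the current document.
    private final Map<String, String> seenInDocument = new HashMap<>();

    /** Records the type assigned to a token so later sentences can reuse it. */
    void remember(String token, String type) {
        seenInDocument.put(token, type);
    }

    /** Returns the cached type for a token, or null if none was seen in this document. */
    String lookup(String token) {
        return seenInDocument.get(token);
    }

    /** Plays the role of the endOfDocument() hook: drop state before the next document. */
    void endOfDocument() {
        seenInDocument.clear();
    }

    public static void main(String[] args) {
        DocumentTagCacheSketch cache = new DocumentTagCacheSketch();
        cache.remember("Morton", "person");
        System.out.println(cache.lookup("Morton")); // cached within the same document
        cache.endOfDocument();
        System.out.println(cache.lookup("Morton")); // cleared for the next document
    }
}
```

Clearing the cache at the document boundary keeps a tag decision made in one document from leaking into the next, which is the kind of between-document processing the hook allows.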
Copyright © 2004 Thomas Morton and Jeremy LaCivita. All Rights Reserved.