This module takes free text and produces tokens with sentence boundaries marked.
A token may be a word, an abbreviation, a punctuation mark, a real number, a special symbol, and so on. No token contains white space. Special symbols such as ‘|’ and ‘.’, as well as two consecutive newlines, are treated as end-of-sentence markers. Each period is analyzed to decide whether it marks the end of a sentence. An abbreviation such as Mr. or Dr. is treated as a single token. When a period is found, a list of acronyms is consulted; based on the list and some rules, the module decides whether the period belongs to an abbreviation. By the end of processing, each sentence contains all the tokens that make it up.
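The behaviour described above can be illustrated with a minimal Python sketch. It is not the module's actual code: the function name `tokenize`, the `ABBREVIATIONS` set, and the token pattern are all assumptions made for illustration, and the single rule used here (a period following a listed abbreviation is not a sentence end) stands in for the unspecified "list and some rules".

```python
import re

# Hypothetical abbreviation list; the real module consults its own acronym list.
ABBREVIATIONS = {"Mr.", "Dr.", "Mrs.", "Prof."}

# Symbols that close a sentence, as described in the text above.
SENTENCE_END = {".", "|"}

# A raw token is a real number, a run of word characters, or one non-space symbol.
TOKEN_RE = re.compile(r"\d+\.\d+|\w+|[^\w\s]")


def tokenize(text):
    """Return a list of sentences, each a list of whitespace-free tokens."""
    sentences = []
    # Two consecutive newlines always end a sentence.
    for block in re.split(r"\n\s*\n", text):
        current = []
        for tok in TOKEN_RE.findall(block):
            # A period that completes a known abbreviation is merged into it
            # and does not end the sentence.
            if tok == "." and current and current[-1] + "." in ABBREVIATIONS:
                current[-1] += "."
                continue
            current.append(tok)
            if tok in SENTENCE_END:
                sentences.append(current)
                current = []
        # Flush whatever remains when the block ends without a marker.
        if current:
            sentences.append(current)
    return sentences


if __name__ == "__main__":
    for sentence in tokenize("Dr. Rao paid 3.5 rupees. He left | Done\n\nNext line"):
        print(sentence)
```

In this sketch the end-of-sentence marker is kept as the last token of its sentence, and the decimal point in a real number such as 3.5 is never treated as a period because the number pattern is tried first.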

Added on November 22, 2010


  More Details
  • Product Type : Tool
  • License Type : Research
  • System Requirement : Linux