
Search Results | Total Results found : 1214

Digitization of newspaper articles is important for recording historical events. Layout analysis of Indian newspapers is a challenging task because of the variety of font sizes and font styles and the random placement of text and non-text regions. In this paper we propose a novel framework for learning optimal parameters for text-graphic separation in the presence of complex layouts. The learning problem is formulated as an optimization problem that uses the EM algorithm to learn optimal parameters depending on the nature of the document content.

Added on August 28, 2018

89

  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Ritu Garg, Anukriti Bansal, Santanu Chaudhury, Sumantra Dutta Roy

Active learning and crowdsourcing are becoming increasingly popular in the machine learning community for fast and cost-effective generation of labels for large volumes of data. However, such labels may be noisy, so it is important to ignore the noisy labels when building a good classifier. We propose a framework for finding the best possible augmentation of a classifier for the character recognition problem using a minimum number of crowd-labeled samples. The approach inherently rejects the noisy data and tries to accept a subset of correctly labeled data to maximize classifier performance.

Added on August 27, 2018

27

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Arpit Agarwal, Ritu Garg, Santanu Chaudhury

We propose here a technique for transforming the layout of a printed document image into a new, reader-friendly layout. Its objective is to produce a better display on a low-resolution screen, making reading more comfortable and convenient for the viewer. The re-targeting task starts by analyzing the document image in the spatial domain to identify its paragraphs. Text lines, words, characters, and hyphenations are then recognized in each paragraph, and the necessary word stitching is performed to reproduce the paragraph as appropriate to the resolution of the display device. Test results and related subjective evaluation on different datasets, especially pages scanned from some Bengali and English magazines, demonstrate the strength and effectiveness of the proposed technique.

Added on August 27, 2018

23

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Soumyadeep Dey, Jayanta Mukherjee, Shamik Sural, Partha Bhowmick
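
The word-stitching and re-targeting steps described in the abstract above can be illustrated on plain text. This is only a minimal sketch and not the authors' method: it assumes the words have already been recognized and are available as strings, whereas the paper works on document images. The function names `stitch_hyphenated` and `reflow` and the character-based width limit are hypothetical choices made for illustration.

```python
def stitch_hyphenated(lines):
    """Join words that were split across line ends with a trailing hyphen.

    Assumption: every line-final hyphen marks a split word. Genuinely
    hyphenated compounds broken at the hyphen would be merged too.
    """
    words = []
    carry = ""  # first half of a word split at the previous line end
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if carry:
            tokens[0] = carry + tokens[0]
            carry = ""
        if tokens[-1].endswith("-") and len(tokens[-1]) > 1:
            carry = tokens.pop()[:-1]
        words.extend(tokens)
    if carry:
        # Dangling hyphen at the very end: keep the fragment as it was.
        words.append(carry + "-")
    return words


def reflow(words, max_chars):
    """Greedily pack words into lines of at most max_chars characters.

    A single word longer than max_chars is placed on its own line.
    """
    lines = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    source = ["Text lines and hyphen-", "ations are recognized"]
    for line in reflow(stitch_hyphenated(source), max_chars=12):
        print(line)
```

A character count stands in for the display width here; a real re-targeting system would measure word images in pixels against the target screen resolution.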

The performance of an OCR system degrades badly in the presence of hand-drawn annotation lines in various forms, such as underlines, circular lines, and other text-surrounding curves. Such annotation lines are usually drawn freehand by a reader to summarize some text or to mark keywords within a document page. In this paper, we propose a generalized scheme for detecting and removing these hand-drawn annotations from a scanned document page. An underline drawn by hand is roughly horizontal or has a tolerable undulation, whereas for a hand-drawn curved line, the slope usually changes at a gradual pace.

Added on August 27, 2018

14

  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Sanjoy Pratihar, Partha Bhowmick, Shamik Sural, Jayanta Mukhopadhyay
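
The two geometric observations in the abstract above (an underline is roughly horizontal, and a curved annotation changes slope gradually) can be sketched as simple tests on a traced stroke. This is an illustrative sketch, not the paper's scheme: it assumes the stroke is already available as an ordered list of `(x, y)` points, and the function names and both thresholds (`max_undulation_ratio`, `max_turn_degrees`) are hypothetical values chosen for the example.

```python
import math


def is_roughly_horizontal(points, max_undulation_ratio=0.05):
    """True if the stroke's vertical extent is small relative to its width.

    max_undulation_ratio is an assumed tolerance, not a value from the paper.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = max(xs) - min(xs)
    if width == 0:
        return False
    return (max(ys) - min(ys)) / width <= max_undulation_ratio


def has_gradual_slope_change(points, max_turn_degrees=20.0):
    """True if the direction never turns sharply between consecutive segments.

    max_turn_degrees is an assumed tolerance, not a value from the paper.
    """
    angles = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    for a0, a1 in zip(angles, angles[1:]):
        turn = abs(a1 - a0)
        turn = min(turn, 2 * math.pi - turn)  # handle wrap-around at +/- pi
        if math.degrees(turn) > max_turn_degrees:
            return False
    return True


if __name__ == "__main__":
    underline = [(0, 10), (20, 11), (40, 10), (60, 9), (80, 10)]
    circle = [
        (math.cos(math.radians(15 * k)), math.sin(math.radians(15 * k)))
        for k in range(25)
    ]
    print(is_roughly_horizontal(underline), has_gradual_slope_change(circle))
```

Printed text strokes tend to contain sharp corners, so a check of this kind could help tell them apart from freehand curves, although thresholds would need tuning on real scans.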

Rubber stamps on document pages often overlap the text and obscure it badly, impairing its readability and degrading the performance of an optical character recognition system. Removing rubber stamps from a document image is therefore essential for successfully converting it into an editable electronic form. We propose here an effective technique for rubber stamp removal from scanned document images. It is based on the novel idea of a single feature obtained by projecting the pixel colors of the image foreground along the eigenvector corresponding to the first principal component in HSV color space.

Added on August 27, 2018

33

  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Soumyadeep Dey, Jayanta Mukherjee, Shamik Sural
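
The single-feature idea in the abstract above, projecting foreground pixel colors onto the first principal component in HSV space, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the foreground pixels are assumed to be given as RGB triples, hue is treated as an ordinary linear coordinate (it is actually circular), and the eigenvector is approximated by power iteration. The function names are hypothetical.

```python
import colorsys


def first_principal_component(samples, iterations=100):
    """Return (mean, unit eigenvector) for the largest covariance eigenvalue.

    Uses power iteration, which can fail if the start vector happens to be
    orthogonal to the dominant eigenvector. The sign of the axis is arbitrary.
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
        for i in range(dim)
    ]
    vec = [1.0 / (2 ** i) for i in range(dim)]  # arbitrary start vector
    norm = sum(v * v for v in vec) ** 0.5
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = sum(v * v for v in nxt) ** 0.5
        if norm == 0:
            break  # all samples identical: no dominant direction
        vec = [v / norm for v in nxt]
    return mean, vec


def project_onto_first_pc(rgb_pixels):
    """Map each RGB pixel (0-255 per channel) to one scalar feature."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]
    mean, axis = first_principal_component(hsv)
    return [sum((p[i] - mean[i]) * axis[i] for i in range(3)) for p in hsv]


if __name__ == "__main__":
    text_pixels = [(20, 20, 20), (30, 30, 30), (25, 25, 25)]       # dark ink
    stamp_pixels = [(40, 60, 200), (50, 70, 210), (45, 65, 190)]   # blue stamp
    for value in project_onto_first_pc(text_pixels + stamp_pixels):
        print(round(value, 3))
```

In this toy input the dark text pixels and the blue stamp pixels fall on opposite sides of zero along the projected axis, so a threshold on the single feature would separate them. How the paper chooses that threshold is not stated in the abstract.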