•    Freeware
  •    Shareware
  •    Research
  •    Localization Tools 20
  •    Publications 696
  •    Validators 2
  •    Mobile Apps 22
  •    Fonts 31
  •    Guidelines/ Draft Standards 3
  •    Documents 13
  •    General Tools 38
  •    NLP Tools 105
  •    Linguistic Resources 251

Search Results | Total Results found :   1181

You refine search by : All Results
  Catalogue
The OCR technology for Indian documents is in emerging stage and most of these Indian OCR systems can read the documents written in only a single script. As many commercial and official documents of different states of India are tri-lingual in nature, therefore identification of script and/ or language is one of the elementary tasks for multi-script document recognition. A script recognizer simplifies the task of multi-lingual OCR by improving the accuracy and reducing the computational complexity. This script recognition may be at line, word or character level depending on interlacing of different scripts at different levels.

Added on August 28, 2018

75

  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Rajneesh Rani,Renu Dhir,Gurpreet Singh Lehal

Script Identification is one of the challenging step in the Optical Character Recognition system for multi-script documents. In Indian and Non-Indian context some results have been reported, but research in this field is still emerging. This paper presents a research work in the identification of Gurmukhi and English scripts at word level. It also identifies English Numerals from Gurmukhi text. Gabor feature extraction is one of most popular method for script recognition. This paper presents a zone based gabor feature extraction technique.

Added on August 28, 2018

6

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Rajneesh Rani,Renu Dhir,Gurpreet Lehal

Digitization of newspaper article is important for registering historical events. Layout analysis of Indian newspaper is a challenging task due to the presence of different font size, font styles and random placement of text and non-text regions. In this paper we propose a novel framework for learning optimal parameters for text graphic separation in the presence of complex layouts. The learning problem has been formulated as an optimization problem using EM algorithm to learn optimal parameters depending on the nature of the document content.

Added on August 28, 2018

15

  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Ritu Garg,Anukriti Bansal,Santanu Chaudhury,Sumantra Dutta Roy

Active learning and crowd sourcing are becoming increasingly popular in the machine learning community for fast and cost effective generation of labels for large volumes of data. However, such labels may be noisy. So, it becomes important to ignore the noisy labels for building of a good classifier. We propose a framework for finding the best possible augmentation of a classifier for the character recognition problem using minimum number of crowd labeled samples. The approach inherently rejects the noisy data and tries to accept a subset of correctly labeled data to maximize the classifier performance.

Added on August 27, 2018

4

  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Arpit Agarwal,Ritu Garg,Santanu Chaudhury

We propose here a technique for transforming the layout of a printed document image to a new user-conducive layout. Its objective is to effectuate better display in a low-resolution screen for providing comfort and convenience to a viewer while reading. The task of re-targeting starts with analyzing the document image in the spatial domain for identifying its paragraphs. Text lines, words, characters, and hyphenations are then recognized from each paragraph, and necessary word stitching is performed to reproduce the paragraph, as appropriate to the resolution of the display device. Test results and related subjective evaluation for different datasets, especially the pages scanned from some Bengali and English magazines, demonstrate the strength and effectiveness of the proposed technique.

Added on August 27, 2018

5

  More Details
  • Contributed by : OCR Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Soumyadeep Dey, Jayanta Mukherjee, Shamik Sural,Partha Bhowmick