
Search Results | Total results found: 255

  Catalogue
500 scanned images in the Malayalam language, in TIFF format. The images were scanned at 600 dpi in greyscale mode. These scans are a byproduct of an OCR project.

Last updated on May 22, 2017

  More Details
  • Contributed by : C-DAC Noida, OCR Consortium
  • Product Type : Linguistic Resources
  • License Type : Research
  • System Requirement : Not Applicable

500 scanned images in the Punjabi language, in TIFF format. The images were scanned at 300 dpi in greyscale mode. These scans are a byproduct of an OCR project.

Last updated on May 23, 2017

  More Details
  • Contributed by : C-DAC Noida, OCR Consortium
  • Product Type : Linguistic Resources
  • License Type : Research
  • System Requirement : Not Applicable

500 scanned images in the Telugu language, in TIFF format. The images were scanned at 300 dpi in greyscale mode. These scans are a byproduct of an OCR project.

Last updated on May 23, 2017

  More Details
  • Contributed by : C-DAC Noida, OCR Consortium
  • Product Type : Linguistic Resources
  • License Type : Research
  • System Requirement : Not Applicable

Under the Indian Languages Corpora Initiative (ILCI) project initiated by MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi collected a corpus with Hindi as the source language and translated it into Bodo as the target language. The corpus contains 70,000 sentences from domains including Health, Tourism, Agriculture and Entertainment. Each sentence has a unique sentence ID, and the corpus is provided as UTF-8 encoded text files. The translated sentences have been POS-tagged and chunked. The chunking guideline used in creating this corpus is provided in the supporting document.

Added on April 26, 2019

  More Details
  • Contributed by : ILCI Consortium, JNU
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

Under the Indian Languages Corpora Initiative (ILCI) project initiated by MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi collected a corpus with Hindi as the source language and translated it into English as the target language. The corpus contains 70,000 sentences from domains including Health, Tourism, Agriculture and Entertainment. Each sentence has a unique sentence ID, and the corpus is provided as UTF-8 encoded text files. The translated sentences have been POS-tagged and chunked. The chunking guideline is provided in the supporting document.

Last updated on April 29, 2019

  More Details
  • Contributed by : ILCI Consortium, JNU
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable