Rosetta Stone

Software

iATROS (Improved ATROS)

iATROS is a new implementation of an earlier speech recogniser (ATROS), adapted for both speech and handwritten text recognition. It provides a modular structure for building different systems whose core is a Viterbi-like search on a Hidden Markov Model network. iATROS includes standard tools for off-line recognition and on-line speech recognition (based on ALSA modules). Download.

  • Míriam Luján-Mares, Vicent Tamarit, Vicent Alabau, Carlos-D. Martínez-Hinarejos, Moisés Pastor, Alberto Sanchis, and Alejandro Toselli. iATROS: A speech and handwritting recognition system. In V Jornadas en Tecnologías del Habla (VJTH'2008), pages 75-78, Bilbao, Spain, November 2008.
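To illustrate the kind of search the iATROS description refers to, here is a minimal sketch of textbook Viterbi decoding on a small discrete HMM. It is not iATROS code and does not use its API: the function name `viterbi`, the dictionary-based model layout, and the toy Healthy/Fever model are all assumptions made for this example. A real recogniser would typically work on a much larger network with continuous emission densities and pruning.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a discrete HMM.

    All probabilities are given as natural logarithms to avoid underflow.
    This layout (nested dicts keyed by state) is an illustrative assumption.
    """
    if not observations:
        raise ValueError("observations must not be empty")

    # best[s]: log-probability of the best path that ends in state s.
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []

    for obs in observations[1:]:
        new_best = {}
        pointers = {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            prev, score = max(
                ((p, best[p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            new_best[s] = score + log_emit[s][obs]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Toy model (hypothetical values, for illustration only).
states = ["Healthy", "Fever"]
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

log_prob, path = viterbi(
    ["normal", "cold", "dizzy"], states, to_log(start), to_log(trans), to_log(emit)
)
print(path, math.exp(log_prob))
```

For this toy model the best state sequence is Healthy, Healthy, Fever, with probability 0.01512.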

GIDOC (Gimp-based Interactive transcription of old text DOCuments)

GIDOC is a computer-assisted transcription prototype for handwritten text in old documents. It is a first attempt to provide integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. GIDOC is built on top of the well-known GNU Image Manipulation Program (GIMP), and uses standard techniques and tools for handwritten text preprocessing and feature extraction, HMM-based image modelling, and language modelling. Download.

Thot: a toolkit to train phrase-based models for statistical machine translation.

Thot is a toolkit to train phrase-based models for statistical machine translation. It estimates the phrase-based models described in (Och, 2002) and (Ortiz et al. 2005), and can also obtain the best phrase alignment given a phrase model, as described in (García-Varea et al. 2005). A description of the toolkit can be found in (Ortiz et al. 2005). Download.

  • (Och, 2002) F.J. Och. Statistical Machine Translation: From Single-Word Models to Alignment Templates, Dissertation, Aachen, Germany, October, 2002
  • (Ortiz et al. 2005) D. Ortiz, I. García-Varea, and F. Casacuberta. Thot: a toolkit to train phrase-based statistical translation models. In Tenth Machine Translation Summit, pp. 141-148. Phuket, Thailand, September 2005
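As a rough illustration of what estimating a phrase-based model involves, the sketch below computes relative-frequency translation probabilities p(target | source) from a list of already extracted phrase pairs. This is a simplified, commonly used estimator, not Thot's implementation or interface: the function name `estimate_phrase_table` and the example phrase pairs are hypothetical, and the extraction of phrase pairs from word-aligned sentences is assumed to have been done beforehand.

```python
from collections import Counter, defaultdict


def estimate_phrase_table(phrase_pairs):
    """Estimate p(target | source) by relative frequency.

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples,
    one entry per extracted occurrence (an assumption of this sketch).
    """
    pairs = list(phrase_pairs)
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)

    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        # Relative frequency: count(src, tgt) / count(src).
        table[src][tgt] = count / source_counts[src]
    return dict(table)


# Hypothetical extracted phrase pairs (Spanish to English).
pairs = [
    ("la casa", "the house"),
    ("la casa", "the house"),
    ("la casa", "the home"),
    ("verde", "green"),
]
table = estimate_phrase_table(pairs)
print(table["la casa"])
```

The probabilities for each source phrase sum to one, so a decoder can combine them directly with other model scores.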

LPD: Learning Prototypes and Distances

An implementation of the LPD algorithm is available at: LPD C-Version.

  • R. Paredes and E. Vidal. Learning prototypes and distances: a prototype reduction technique based on nearest neighbor error minimization. Pattern Recognition, 39(2):180-188, 2006. Available: abstract(<1KB) bib(<1KB).

CPW: Class and Prototype Weights learning

An implementation of the CPW algorithm is available at: CPW C-Version

  • R. Paredes and E. Vidal. Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7), 2006. Available: abstract(<1KB) bib(<1KB).

GREAT: Giati and Refx Enhanced via Annotation Techniques

GREAT is a toolkit to train stochastic finite-state transducers with the GIATI method; decoding uses the Viterbi beam-search algorithm. Download.

  • J. González, G. Sanchis, and F. Casacuberta. Learning finite state transducers using bilingual phrases. In 9th International Conference on Intelligent Text Processing and Computational Linguistics. Lecture Notes in Computer Science, Haifa, Israel, February 17 to 23 2008.
  • J. González and F. Casacuberta. GREAT: a finite-state machine translation toolkit implementing a Grammatical Inference Approach for Transducer Inference (GIATI). In EACL Workshop on Computational Linguistics Aspects of Grammatical Inference, pages 24-32, Athens, Greece, March 30 2009.