Jon Safari's Home Page
Downloads  

Character Sets:

PC-Kimmo:

Corpus Stuff:

  • buckwalter_morphan_1.tar.gz - Tim Buckwalter's Arabic Morphological Analyzer. It contains a massive Arabic dictionary, and it's Free.
  • wc-freq_1-1.pl - A Perl script that sorts the words of an English corpus by frequency and reports the total number of types and tokens.
  • wc-freq-farsi_1-1.pl - A Perl script that sorts the words of a Romanized Persian corpus by frequency and reports the total number of types and tokens.
  • newline_1-4.pl - A Perl script that converts a text corpus into a one-word-per-line file.
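
For readers who want to see what the corpus scripts above compute, here is a minimal sketch in Python. It is not the code from the Perl scripts. The function names and the tokenization rule (lowercased runs of ASCII letters and apostrophes) are assumptions made for illustration, and the actual scripts may split words differently.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (ranked word counts, number of types, number of tokens).

    Assumption: a token is a lowercased run of ASCII letters or apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked, len(counts), len(tokens)


def one_word_per_line(text):
    """Put each whitespace-separated word of the text on its own line."""
    return "\n".join(text.split())
```

Calling word_frequencies("The cat saw the dog. The dog ran.") gives 5 types and 8 tokens, with "the" (3 occurrences) ranked first. Here a type is a distinct word form and a token is any single occurrence of a word.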