# About Hunspell
Hunspell is a free spell checker and morphological analyzer library
and command-line tool, licensed under the LGPL/GPL/MPL tri-license.

Hunspell is used by the LibreOffice office suite, by free browsers such as
Mozilla Firefox and Google Chrome, and by other tools and operating systems
such as Linux distributions and macOS. It is also a command-line tool for
Linux, Unix-like and other operating systems.

It is designed for fast, high-quality spell checking and correction for
languages with word-level writing systems, including languages with rich
morphology, complex word compounding and character encoding.

Hunspell interfaces: an Ispell-like terminal interface using the Curses
library, an Ispell pipe interface, C++/C APIs and a shared library, plus
existing language bindings for other programming languages.

Hunspell's code base comes from OpenOffice.org's MySpell library,
developed by Kevin Hendricks (originally a from-scratch C++ reimplementation
of the spell checking and affixation of Geoff Kuenning's International
Ispell, later extended with e.g. n-gram suggestions); see
its README, CONTRIBUTORS and license.readme (here: license.myspell) files.

Main features of Hunspell library, developed by László Németh:
- Unicode support
- Highly customizable suggestions: word-part replacement tables and
  stem-level phonetic and other alternative transcriptions to recognize
  and fix all typical misspellings, avoid suggesting offensive words, etc.
- Complex morphology: dictionary and affix homonyms; twofold affix
  stripping to handle inflectional and derivational morpheme groups for
  agglutinative languages such as Azeri, Basque, Estonian, Finnish, Hungarian
  and Turkish; 64 thousand affix classes with an arbitrary number of affixes;
  conditional affixes, circumfixes, fogemorphemes, zero morphemes,
  virtual dictionary stems, forbidden words to avoid overgeneration, etc.
- Handling complex compounds (for example, for Finno-Ugric, German and
  Indo-Aryan languages): recognizing compounds made of an arbitrary
  number of words, handling affixation within compounds, etc.
- Custom dictionaries with affixation
- Stemming
- Morphological analysis (in custom item and arrangement style)
- Morphological generation
- SPELLML XML API over the plain spell() API function for easier integration
  of stemming, morphological generation and custom dictionaries with affixation
- Language-specific algorithms, like special casing of Azeri or Turkish
  dotted i and German sharp s, and special compound rules of Hungarian.

Main features of Hunspell command line tool, developed by László Németh:
- Reimplementation of quick interactive interface of Geoff Kuenning's Ispell
- Parsing formats: text, OpenDocument, TeX/LaTeX, HTML/SGML/XML, nroff/troff
- Custom dictionaries with optional affixation, specified by a model word
- Multiple dictionary usage (for example hunspell -d en_US,de_DE,de_medical)
- Various filtering options (bad or good words/lines)
- Morphological analysis (option -m)
- Stemming (option -s)

See `man hunspell`, `man 3 hunspell` and `man 5 hunspell` for the complete manual.

Translations: Hunspell has been translated into several languages already. If your language is missing or incomplete, please use [Weblate](https://hosted.weblate.org/engage/hunspell/) to help translate Hunspell.

# Dependencies
Build-only dependencies:

    g++ make autoconf automake autopoint libtool

Runtime dependencies:

|               | Mandatory        | Optional         |
|---------------|------------------|------------------|
| libhunspell   |                  |                  |
| hunspell tool | libiconv gettext | ncurses readline |

# Compiling on GNU/Linux and Unixes
We first need to download the dependencies. On Linux, `gettext` and
`libiconv` are part of the standard library. On other Unixes we
need to install them manually.

For Ubuntu:

    sudo apt install autoconf automake autopoint libtool

Then run the following commands:

    autoreconf -vfi
    ./configure
    make
    sudo make install
    sudo ldconfig

For dictionary development, use the `--with-warnings` option of
configure.

For the interactive user interface of the Hunspell executable, use the
`--with-ui` option.

Optional developer packages:

- ncurses (needed for `--with-ui`), e.g. libncursesw5 for UTF-8
- readline (for fancy input line editing, configure parameter:
  `--with-readline`)

In Ubuntu, the packages are:

    libncurses5-dev libreadline-dev

# Compiling on OSX and macOS
On macOS, always use `clang` as the compiler and not `g++`, because Homebrew
dependencies are built with it.

    brew install autoconf automake libtool gettext
    brew link gettext --force

Then run:

    autoreconf -vfi
    ./configure
    make

# Compiling on Windows
## Compiling with Mingw64 and MSYS2
Download MSYS2, update everything and install the following
packages:

    pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool

Open the Mingw-w64 Win64 prompt and compile the same way as on Linux (see
above).
## Compiling in Cygwin environment
Download and install Cygwin environment for Windows with the following
extra packages:
- make
- automake
- autoconf
- libtool
- gcc-g++ development package
- ncurses, readline (for user interface)
- iconv (character conversion)

Then compile the same way as on Linux. Cygwin builds depend on
Cygwin1.dll.
# Debugging
It is recommended to install a debug build of the standard library:

    libstdc++6-6-dbg

For debugging we need to create a debug build and then start
`gdb`.

    ./configure CXXFLAGS='-g -O0 -Wall -Wextra'
    make
    ./libtool --mode=execute gdb src/tools/hunspell

You can also pass the `CXXFLAGS` directly to `make` without calling
`./configure`, but we don't recommend this during long development
sessions.

If you like to develop and debug with an IDE, see documentation at
# Testing
Testing Hunspell (see tests in the tests/ subdirectory):

    make check

or with the Valgrind debugger:

    make check
    VALGRIND=[Valgrind_tool] make check

For example:

    make check
    VALGRIND=memcheck make check

# Documentation
Features and dictionary format:

    man 5 hunspell
    man hunspell
    hunspell -h

# Usage
After compiling and installing (see INSTALL) you can run the Hunspell
spell checker (compiled with user interface) with a Hunspell or MySpell
dictionary:

    hunspell -d en_US text.txt

or without interface:

    hunspell
    hunspell -d en_GB -l <text.txt

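The non-interactive invocation can also be driven from a script. The following Python sketch is only an illustration: the helper names `hunspell_list_command` and `misspelled_words` are hypothetical, and it assumes that a `hunspell` executable and the named dictionaries are installed and that the tool's input and output use the current locale's encoding. It builds the same `-d ... -l` command line as shown here and collects the words the tool prints.

```python
import subprocess
from typing import List, Sequence


def hunspell_list_command(dictionaries: Sequence[str]) -> List[str]:
    """Build the argv for `hunspell -d dict1,dict2 -l` (hypothetical helper)."""
    if not dictionaries:
        raise ValueError("at least one dictionary name is required")
    # Several dictionaries are passed as one comma-separated -d argument,
    # as in `hunspell -d en_US,de_DE,de_medical`.
    return ["hunspell", "-d", ",".join(dictionaries), "-l"]


def misspelled_words(text: str, dictionaries: Sequence[str]) -> List[str]:
    """Feed text to `hunspell -l` on stdin and return the words it reports.

    Assumes hunspell is on PATH; raises CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        hunspell_list_command(dictionaries),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

With an `en_GB` dictionary installed, `misspelled_words(open("text.txt").read(), ["en_GB"])` would correspond to the `hunspell -d en_GB -l <text.txt` call.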
Dictionaries consist of an affix (.aff) and dictionary (.dic) file, for
example, download American English dictionary files of LibreOffice
(older version, but with stemming and morphological generation) with
and with command line input and output, it's possible to check its work quickly,
for example with the input words "example", "examples", "teached" and
"verybaaaaaaaaaaaaaaaaaaaaaad":

    $ hunspell -d en_US
    Hunspell 1.7.0
    example
    *
    examples
    + example
    teached
    & teached 9 0: taught, teased, reached, teaches, teacher, leached, beached
    verybaaaaaaaaaaaaaaaaaaaaaad
    # verybaaaaaaaaaaaaaaaaaaaaaad 0

Where in the output, `*` and `+` mean correct (accepted) words (`*` = dictionary stem,
`+` = affixed forms of the following dictionary stem), and
`&` and `#` mean bad (rejected) words (`&` = with suggestions, `#` = without suggestions)
(see man hunspell).

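The four markers described here are easy to interpret in a program that reads this output. The Python sketch below is a minimal illustration based only on the sample lines in this README; the names `CheckResult` and `parse_pipe_line` are hypothetical, and the field layout after `&` and `#` is inferred from the sample, so `man hunspell` remains the reference.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CheckResult:
    status: str                  # "ok", "affixed", "miss" or "none"
    word: Optional[str] = None   # the rejected word, when the line repeats it
    stem: Optional[str] = None   # dictionary stem reported on "+" lines
    suggestions: List[str] = field(default_factory=list)


def parse_pipe_line(line: str) -> Optional[CheckResult]:
    """Interpret one output line; return None for banners and blank lines."""
    line = line.strip()
    if not line:
        return None
    marker, _, rest = line.partition(" ")
    if marker == "*":            # accepted: dictionary stem
        return CheckResult("ok")
    if marker == "+":            # accepted: affixed form of the given stem
        return CheckResult("affixed", stem=rest)
    if marker == "&":            # rejected, with suggestions after ": "
        head, _, tail = rest.partition(": ")
        suggestions = [s.strip() for s in tail.split(",") if s.strip()]
        return CheckResult("miss", word=head.split(" ")[0], suggestions=suggestions)
    if marker == "#":            # rejected, no suggestions
        return CheckResult("none", word=rest.split(" ")[0])
    return None                  # anything else, e.g. the version banner
```

For example, `parse_pipe_line("+ example").stem` yields `"example"`, and the `&` line for "teached" yields a result whose `suggestions` list starts with `"taught"`.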
Example for stemming:

    $ hunspell -d en_US -s
    mice
    mice mouse

Example for morphological analysis (very limited with this English dictionary):

    $ hunspell -d en_US -m
    mice
    mice st:mouse ts:Ns
    cats
    cats st:cat ts:0 is:Ns
    cats st:cat ts:0 is:Vs

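Each analysis line consists of the word followed by `key:value` fields, with `st:` carrying the stem in the sample output. As a rough sketch, assuming only that whitespace-separated layout (the function names `parse_morph_line` and `stems` are hypothetical, and the meaning of the other field codes is documented in `man 5 hunspell`), such lines could be split in Python like this:

```python
from typing import Dict, Iterable, List, Tuple


def parse_morph_line(line: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split 'word key:value key:value ...' into the word and its fields."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty analysis line")
    word = tokens[0]
    fields: Dict[str, List[str]] = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            continue  # skip tokens that are not key:value pairs
        # A key may occur more than once, so values are kept in a list.
        fields.setdefault(key, []).append(value)
    return word, fields


def stems(lines: Iterable[str]) -> List[str]:
    """Collect the distinct st: values over several lines, in first-seen order."""
    seen: List[str] = []
    for line in lines:
        if not line.strip():
            continue
        _, fields = parse_morph_line(line)
        for stem in fields.get("st", []):
            if stem not in seen:
                seen.append(stem)
    return seen
```

Applied to the two `cats` lines, both analyses share the stem `cat` and differ only in their `is:` field.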
# Other executables
The src/tools directory contains the following executables after compiling.

- The main executable:
    - hunspell: main program for spell checking and others (see
      manual)
- Example tools:
    - analyze: example of spell checking, stemming and morphological
      analysis
    - chmorph: example of automatic morphological generation and
      conversion
    - example: example of spell checking and suggestion
- Tools for dictionary development:
    - affixcompress: dictionary generation from large (millions of
      words) vocabularies
    - makealias: alias compression (Hunspell only, not backward compatible
      with MySpell)
    - wordforms: word generation (Hunspell version of unmunch)
    - hunzip: decompressor of hzip format
    - hzip: compressor of hzip format
    - munch (DEPRECATED, use affixcompress): dictionary generation
      from vocabularies (it needs an affix file, too).
    - unmunch (DEPRECATED, use wordforms): list all recognized words
      of a MySpell dictionary

Example for morphological generation:

    $ ~/hunspell/src/tools/analyze en_US.aff en_US.dic /dev/stdin
    cat mice
    generate(cat, mice) = cats
    mouse cats
    generate(mouse, cats) = mice
    generate(mouse, cats) = mouses

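In this session each input line is a word followed by a model word, and the tool answers with one `generate(word, model) = form` line per generated form. The Python sketch below groups such answers by input pair. It assumes exactly the line shape shown in the sample, and `parse_generate_output` is a hypothetical helper, not part of Hunspell.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches lines such as "generate(mouse, cats) = mice".
_GENERATE = re.compile(
    r"^generate\((?P<word>[^,]+),\s*(?P<model>[^)]+)\)\s*=\s*(?P<form>.+)$"
)


def parse_generate_output(lines: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Map each (word, model) pair to the list of forms generated for it."""
    results: Dict[Tuple[str, str], List[str]] = {}
    for line in lines:
        match = _GENERATE.match(line.strip())
        if match is None:
            continue  # echoed input and other lines are ignored
        key = (match["word"], match["model"])
        results.setdefault(key, []).append(match["form"])
    return results
```

For the session shown, the pair `("mouse", "cats")` maps to both `mice` and `mouses`, which makes multiple generated forms easy to spot.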
# Using Hunspell library with GCC
Including in your program:

    #include <hunspell.hxx>

Linking with the Hunspell static library (libraries are listed after the
source file so that the linker can resolve its symbols):

    g++ example.cxx -lhunspell-1.7
    # or better, use pkg-config
    g++ example.cxx $(pkg-config --cflags --libs hunspell)

## Dictionaries
Hunspell (MySpell) dictionaries:
Aspell dictionaries (conversion: man 5 hunspell):
László Németh, nemeth at numbertext org