utf8proc

a clean C library for processing UTF-8 Unicode data: normalization, case-folding, graphemes, and more

utf8proc is a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the UTF-8 encoding, supporting Unicode version 16.0. It was initially developed by Jan Behrens and the rest of the Public Software Group, who deserve nearly all of the credit for this package. With the blessing of the Public Software Group, the Julia developers have taken over development of utf8proc, since the original developers have moved to other projects.

(utf8proc is used for basic Unicode support in the Julia language, and the Julia developers became involved because they wanted to add Unicode 7 support and other features.)

The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.

Documentation

See the utf8proc manual (or the utf8proc.h header file included with utf8proc) for a description of the utf8proc API.

utf8proc can be compiled on Unix-like (e.g. GNU/Linux or macOS) or Windows systems with a C or C++ compiler. It can be called from the Ruby language via the utf8_proc gem.

On Unix, run make to compile static and dynamic libraries, make check to run some self-tests, and optionally sudo make install to install the libraries and header files in /usr/local (or sudo make prefix=/some/dir install to install in /some/dir). You can then link with the resulting libutf8proc library by adding -lutf8proc to your link flags and #include <utf8proc.h> to your source code. If you installed in a nonstandard directory, also pass -L/some/dir/lib when linking and -I/some/dir/include when compiling.

Alternatively, you can compile utf8proc using cmake (which runs on both Unix-like systems and Windows). Assuming cmake and a compiler are installed, run mkdir build in the utf8proc directory (to create a build directory), followed by cd build, cmake .. to configure the build, and cmake --build . to build the library.

Releases

See the utf8proc releases page for links to download the current and previous utf8proc releases, along with information on the changes in each release.

Contact

Bug reports, feature requests, and other queries can be filed at the utf8proc issues page on GitHub.