a clean C library for processing UTF-8 Unicode data: normalization, case-folding, graphemes, and more


utf8proc is a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the UTF-8 encoding, supporting Unicode version 15. It was initially developed by Jan Behrens and the rest of the Public Software Group, who deserve nearly all of the credit for this package. With the blessing of the Public Software Group, the Julia developers have taken over development of utf8proc, since the original developers have moved to other projects.

(utf8proc is used for basic Unicode support in the Julia language, and the Julia developers became involved because they wanted to add Unicode 7 support and other features.)

The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the license file included with the package for more detailed information.


See the utf8proc manual (or the utf8proc.h header file included with utf8proc) for a description of the utf8proc API.

utf8proc can be compiled with a C or C++ compiler on Unix-like systems (e.g. GNU/Linux or macOS) and on Windows. It can be called from the Ruby language via the utf8_proc gem.


See the utf8proc releases page for links to download the current and previous utf8proc releases, along with information on the changes in each release.


Bug reports, feature requests, and other queries can be filed at the utf8proc issues page on GitHub.