utf8proc

The following are the current and past releases of the utf8proc library. See also the development version of utf8proc on GitHub.

The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.

utf8proc 2.11.3 (2025-12-30):
- Correct out-of-bounds memory access when calling utf8proc_map with both UTF8PROC_CHARBOUND and UTF8PROC_COMPOSE (#323).
utf8proc 2.11.2 (2025-11-2):
- Fix composition for Hangul character U+11a7 (#317).
utf8proc 2.11.1 (2025-11-13):
- Correct out-of-bounds memory access when calling utf8proc_map with both UTF8PROC_CHARBOUND and UTF8PROC_DECOMPOSE (#311).
utf8proc 2.11.0 (2025-09-10):
- Unicode 17 support (#292, #294).
- Documentation improvements (#295, #291).
- Build fix for C90 (#284), silence ASAN warning (#240), CMake modernization (#260).
utf8proc 2.10.0 (2024-12-31):
- Unicode 16 support (#277).
- New utf8proc_charwidth_ambiguous function to return whether a character has East Asian width class A (Ambiguous) (#270).
utf8proc 2.9.0 (2023-10-20):
- Unicode 15.1 support (#253).
utf8proc 2.8.0 (2022-10-30):
- Unicode 15 support (#247).
utf8proc 2.7.0 (2021-12-16):
- Unicode 14 support (#233).
- Support GNUInstallDirs in CMake build (#159).
- cmake build now installs pkg-config file (#224).
- Various build and portability improvements.
utf8proc 2.6.1 (2020-12-15):
- Bugfix in utf8proc_grapheme_break_stateful for NULL state argument, which also broke utf8proc_grapheme_break.
utf8proc 2.6.0 (2020-11-23):
- New utf8proc_islower and utf8proc_isupper functions (#196).
- Bugfix for manual calls to grapheme_break_extended for initial characters (#205).
- Various build and portability improvements.
utf8proc 2.5.0 (2020-03-27):
- Unicode 13 support (#179).
- No longer report zero width for category Sk (#167).
- cmake support improvements (#173).
utf8proc 2.4.0 (2019-05-10):
- Unicode 12.1 support (#156).
- New -DUTF8PROC_INSTALL=No option for cmake builds to disable installation (#152).
- Better make support for HP-UX (#154).
- Fixed incorrect UTF8PROC_VERSION_MINOR version number in header and bumped shared-library version.
utf8proc 2.3.0 (2019-03-30):
- Unicode 12 support (#148).
- New function utf8proc_unicode_version to return the supported Unicode version (#151).
- Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide characters have width 2, and all other printable characters have width 1 (#150).
- Fix CHARBOUND option for utf8proc_map to preserve U+FFFE and U+FFFF non-characters (#149).
- Various build-system improvements (#141, #142, #147).
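
The simplified width rule from #150 can be pictured with a small self-contained sketch. This is not utf8proc's implementation: the function name `sketch_charwidth` and the abbreviated range table are assumptions for illustration, the table lists only a few well-known East Asian wide/fullwidth blocks rather than the full Unicode data, and treating control characters as width 0 is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, abbreviated table of East Asian wide/fullwidth ranges.
   A real implementation would derive this from the Unicode data files. */
static const struct { int32_t lo, hi; } sketch_wide_ranges[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
    { 0x4E00, 0x9FFF },   /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },   /* Hangul Syllables */
    { 0xFF01, 0xFF60 }    /* Fullwidth ASCII variants */
};

/* Return 2 for code points in the table, 0 for C0/C1 controls (assumption),
   and 1 for everything else. */
int sketch_charwidth(int32_t c) {
    size_t i;
    size_t n = sizeof sketch_wide_ranges / sizeof sketch_wide_ranges[0];
    if (c < 0x20 || (c >= 0x7F && c <= 0x9F)) return 0;
    for (i = 0; i < n; i++) {
        if (c >= sketch_wide_ranges[i].lo && c <= sketch_wide_ranges[i].hi)
            return 2;
    }
    return 1;
}
```

With this table, `sketch_charwidth(0x4E2D)` (a CJK ideograph) yields 2, while `sketch_charwidth('A')` yields 1.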
utf8proc 2.2.0 (2018-07-24):
- Unicode 11 support (#132 and #140).
- utf8proc_NFKC_Casefold convenience function for NFKC_Casefold normalization (#133).
- UTF8PROC_STRIPNA option to strip unassigned codepoints (#133).
- Support building static libraries on Windows (callers need to #define UTF8PROC_STATIC) (#123).
- cmake fix to avoid defining UTF8PROC_EXPORTS globally (#121).
- toupper of ß (U+00df) now yields ẞ (U+1E9E) (#134), similar to musl; case-folding still yields the standard "ss" mapping.
- utf8proc_charwidth now returns 1 for U+00AD (soft hyphen) and for unassigned/PUA codepoints (#135).
utf8proc 2.1.1 (2018-04-27):
- Fixed composition bug (#128).
- Minor build fixes (#94, #99, #113, #125).
utf8proc v2.1.0 (2016-12-26):
- New functions utf8proc_map_custom and utf8proc_decompose_custom to allow user-supplied transformations of codepoints, in conjunction with other transformations (#89).
- New function utf8proc_normalize_utf32 to apply normalizations directly to UTF-32 data (not just UTF-8) (#88).
- Fixed stack overflow that could occur due to incorrect definition of UINT16_MAX with some compilers (#84).
- Fixed conflict with stdbool.h in Visual Studio (#90).
- Updated font metrics to use Unifont 9.0.04.
utf8proc v2.0.2 (2016-07-27):
- Move -Wmissing-prototypes warning flag from Makefile to .travis.yml since MSVC does not understand this flag and it is occasionally useful to build using MSVC through the Makefile (#79).
- Use a different variable name for a nested loop in bench/bench.c, and declare it in a C89 way rather than inside the for to avoid "error: 'for' loop initial declarations are only allowed in C99 mode" (#80).
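
The C89 constraint behind #80 is easy to reproduce outside the library. The sketch below is unrelated to the actual contents of bench/bench.c, and the function `sum_rows` is hypothetical. It declares both loop counters at the top of the block and gives the nested loop its own variable name, so it compiles in C89 mode as well as C99.

```c
#include <stddef.h>

/* Sum a rows x cols matrix stored in row-major order. */
long sum_rows(const int *data, size_t rows, size_t cols) {
    size_t i, j;      /* C89: declared at block start, not inside for(...) */
    long total = 0;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {   /* distinct name for the nested loop */
            total += data[i * cols + j];
        }
    }
    return total;
}
```

Writing `for (size_t i = 0; ...)` instead is what triggers the quoted "only allowed in C99 mode" error on compilers running in C89 mode.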
utf8proc v2.0.1 (2016-07-13):
- Bug fix in utf8proc_grapheme_break_stateful (#77).
- Tests now use versioned Unicode files, so they will no longer break when a new version of Unicode is released (#78).
utf8proc v2.0 (2016-07-13):
- Updated for Unicode 9.0 (#70).
- New utf8proc_grapheme_break_stateful to handle the complicated grapheme-breaking rules in Unicode 9. The old utf8proc_grapheme_break is still provided, but may incorrectly identify grapheme breaks in some Unicode-9 sequences.
- Smaller Unicode tables (#62, #68). This required changes in the utf8proc_property_t structure, which breaks backward compatibility if you access this struct directly. The functions in the API remain backward-compatible, however.
- Buffer overrun fix (#66).
utf8proc v1.3.1 (2015-11-02):
- Do not export symbol for internal function unsafe_encode_char() (#55).
- Install relative symbolic links for shared libraries (#58).
- Enable and fix compiler warnings (#55, #58).
- Add missing files to make clean (#58).
utf8proc v1.3 (2015-07-06):
- Updated for Unicode 8.0 (#45).
- New utf8proc_tolower and utf8proc_toupper functions, portable replacements for towlower and towupper in the C library (#40).
- Don't treat Unicode "non-characters" as invalid, and improved validity checking in general (#35).
- Prefix all typedefs with utf8proc_, e.g. utf8proc_int32_t, to avoid collisions with other libraries (#32).
- Rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent collisions.
- Fix build breakage in the benchmark routines.
- More fine-grained Makefile variables (PICFLAG etcetera), so that compilation flags can be selectively overridden, and in particular so that CFLAGS can be changed without accidentally eliminating necessary flags like -fPIC and -std=c99 (#43).
- Updated character-width tables based on Unifont 8.0.01 (#51) and the Unicode 8 character categories (#47).
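
The validity change in #35 can be sketched independently of the library. Assuming "valid" means "a Unicode scalar value" (any code point up to U+10FFFF that is not a UTF-16 surrogate), non-characters such as U+FFFE and U+FFFF pass the check. The name `sketch_codepoint_valid` is hypothetical; the library's own function is utf8proc_codepoint_valid, and its exact rules may differ from this sketch.

```c
#include <stdint.h>

/* Return 1 if c is a Unicode scalar value, 0 otherwise (assumed rule). */
int sketch_codepoint_valid(int32_t c) {
    if (c < 0 || c > 0x10FFFF) return 0;        /* outside the code space */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;   /* UTF-16 surrogates */
    return 1;                                   /* non-characters are accepted */
}
```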
utf8proc v1.2 (2015-03-28):
- Updated for Unicode 7.0 (#6).
- New function utf8proc_grapheme_break(c1,c2) that returns whether there is a grapheme break between c1 and c2 (#20).
- New function utf8proc_charwidth(c) that returns the number of column-positions that should be required for c; essentially a portable replacement for wcwidth(c) (#27).
- New function utf8proc_category(c) that returns the Unicode category of c (as one of the constants UTF8PROC_CATEGORY_xx). Also, a function utf8proc_category_string(c) that returns the Unicode category of c as a two-character string.
- cmake script CMakeLists.txt, in addition to Makefile, for easier compilation on Windows (#28).
- Various Makefile improvements: a make check target to perform tests (#13), make install, a rule to automate updating the Unicode tables, etcetera.
- The shared library is now versioned (e.g. has a soname on GNU/Linux) (#24).
- C++/MSVC compatibility (#17).
- Most #defined constants are now enums (#29).
- New preprocessor constants UTF8PROC_VERSION_MAJOR, UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH for compile-time detection of the API version.
- Doxygen-formatted documentation (#29).
- The Ruby and PostgreSQL plugins have been removed due to lack of testing (#22).
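
A minimal sketch of compile-time API detection with the version macros follows. It includes utf8proc.h only when the compiler can find it; the fallback definitions (version 0.0.0) and the helper macros `SKETCH_UTF8PROC_AT_LEAST` and `SKETCH_HAVE_STATEFUL_BREAK` are hypothetical, added only so the snippet compiles on its own.

```c
/* Pull in the real header when the compiler can locate it. */
#if defined(__has_include)
#  if __has_include(<utf8proc.h>)
#    include <utf8proc.h>
#  endif
#endif

/* Hypothetical fallbacks: without the header (or with a release older
   than 1.2) no version macros exist, so treat the version as 0.0.0. */
#ifndef UTF8PROC_VERSION_MAJOR
#  define UTF8PROC_VERSION_MAJOR 0
#  define UTF8PROC_VERSION_MINOR 0
#  define UTF8PROC_VERSION_PATCH 0
#endif

/* True when the detected version is at least maj.min. */
#define SKETCH_UTF8PROC_AT_LEAST(maj, min) \
    (UTF8PROC_VERSION_MAJOR > (maj) || \
     (UTF8PROC_VERSION_MAJOR == (maj) && UTF8PROC_VERSION_MINOR >= (min)))

/* Per these release notes, utf8proc_grapheme_break_stateful arrived in 2.0. */
#if SKETCH_UTF8PROC_AT_LEAST(2, 0)
#  define SKETCH_HAVE_STATEFUL_BREAK 1
#else
#  define SKETCH_HAVE_STATEFUL_BREAK 0
#endif
```

Calling code can then guard newer functions with `#if SKETCH_HAVE_STATEFUL_BREAK` and fall back to utf8proc_grapheme_break otherwise.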
utf8proc v1.1.6 (2013-11-27):
- PostgreSQL 9.2 and 9.3 compatibility (lower case 'c' language name)
utf8proc-v1.1.5.tar.gz (2009-10-16)
- Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for ruby-1.9 compatibility (and #define them, if nonexistent)
- Patches for compatibility with Microsoft Visual Studio
- Fixes to make utf8proc usable in C++ programs
utf8proc-v1.1.4.tar.gz (2009-08-19)
- Replaced C++ style comments for compatibility reasons
- Added typecasts to suppress compiler warnings
- Removed redundant source files for ruby-gemfile generation
- Changed copyright notice for Public Software Group e. V.
- Minor changes in the README file
utf8proc-v1.1.3.tar.gz
- PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
- Added a function utf8proc_version returning a string containing the version number of the library.
- Included a target libutf8proc.dylib for MacOSX.
utf8proc-v1.1.2.tar.gz
- Fixed a serious bug in the data file generator, which caused characters to be treated incorrectly when stripping default ignorable characters or calculating grapheme cluster boundaries.
utf8proc-v1.1.1.tar.gz
- Changed license from BSD to MIT style.
- Added a new function utf8proc_codepoint_valid to the C library.
- Changed compiler flags in Makefile from -g -O0 to -O2
- The ruby script, which was used to build the utf8proc_data.c file, is now included in the distribution.
- Added a new PostgreSQL function unistrip, which behaves like unifold, but also removes all character marks (e.g. accents).
utf8proc-v1.0.3.tar.gz
- Fixed a bug in the ruby library that caused an error when splitting an empty string at grapheme cluster boundaries (method String#utf8chars).
utf8proc-v1.0.2.tar.gz
- added support for PostgreSQL version 8.2
- included a check in Integer#utf8 that raises an exception if the given code point is invalid because it is too high (this check was previously missing)
utf8proc-v1.0.1.tar.gz
- included a gem file for the ruby version of the library
utf8proc-v1.0.tar.gz
- added the LUMP option, which lumps certain characters together (see lump.txt) (also used for the PostgreSQL unifold function)
- added the STRIPMARK option, which strips marking characters (or marks of composed characters)
- deprecated ruby method String#char_ary in favour of String#utf8chars
utf8proc-v0.3.tar.gz
- added support to mark the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
- added the ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
- added NLF2LF transformation in postgresql unifold function
- added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization will be performed (different from previous versions)
- using integer constants rather than C-strings for character properties
- fixed (hopefully) a problem with the ruby library on Mac OS X, which occurred when compiler optimization was switched on
- changed normalization from NFC to NFKC for postgresql unifold function
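
Because the byte 0xFF never occurs in well-formed UTF-8, it can serve as an unambiguous in-band marker for the CHARBOUND option. As a minimal sketch, assuming a buffer that was already produced with CHARBOUND enabled, counting grapheme clusters reduces to counting marker bytes. The function name `sketch_count_clusters` is hypothetical and not part of the library.

```c
#include <stddef.h>

/* Count grapheme clusters in a CHARBOUND-marked buffer by counting
   the 0xFF bytes that precede each cluster. */
size_t sketch_count_clusters(const unsigned char *buf, size_t len) {
    size_t i, count = 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == 0xFF) count++;
    }
    return count;
}
```

For example, a marked buffer holding "a", then "e" followed by a combining acute accent (bytes 0xCC 0x81), then "z" contains three markers and therefore three clusters, even though it holds four code points.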
utf8proc-v0.2.tar.gz
- added -fpic compiler flag in Makefile
- fixed bug in the C code for the ruby library (usage of non-existent function)
- changed behaviour of PostgreSQL function to return NULL in case of invalid input, rather than raising an exceptional condition
- improved efficiency of PostgreSQL function (no transformation to C string is done)
utf8proc-v0.1.tar.gz (2006-06-02): Initial public release.