utf8proc
The following are the current and past releases of the utf8proc library. See also the development version of utf8proc on GitHub.
The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.
- utf8proc 2.11.0 (2025-09-10):
- utf8proc 2.10.0 (2024-12-31):
- utf8proc 2.9.0 (2023-10-20):
    - Unicode 15.1 support (#253).
 
- utf8proc 2.8.0 (2022-10-30):
    - Unicode 15 support (#247).
 
- utf8proc 2.7.0 (2021-12-16):
- utf8proc 2.6.1 (2020-12-15):
    - Bugfix in utf8proc_grapheme_break_stateful for NULL state argument, which also broke utf8proc_grapheme_break.
 
- utf8proc 2.6.0 (2020-11-23):
- utf8proc 2.5.0 (2020-03-27):
- utf8proc 2.4.0 (2019-05-10):
- utf8proc 2.3.0 (2019-03-30):
    - Unicode 12 support (#148).
    - New function utf8proc_unicode_version to return the supported Unicode version (#151).
    - Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide characters have width 2, and all other printable characters have width 1 (#150).
    - Fix CHARBOUND option for utf8proc_map to preserve U+FFFE and U+FFFF non-characters (#149).
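The simplified width rule from the 2.3.0 notes above (East-Asian wide characters count as 2 columns, other printable characters as 1) can be illustrated with a small sketch. This is not utf8proc's implementation: the function name sketch_charwidth and the three-entry range table are hypothetical, and the table is a deliberately incomplete subset of the wide/fullwidth ranges. Non-printable characters are not handled.

```c
#include <stddef.h>

/* Hypothetical, deliberately incomplete table of East-Asian
   wide/fullwidth ranges; a real table is generated from Unicode data. */
typedef struct {
    long first;
    long last;
} sketch_range;

static const sketch_range sketch_wide_ranges[] = {
    {0x4E00, 0x9FFF}, /* CJK Unified Ideographs */
    {0xAC00, 0xD7A3}, /* Hangul Syllables */
    {0xFF01, 0xFF60}  /* Fullwidth ASCII variants */
};

/* Returns 2 for code points inside the toy table and 1 otherwise. */
static int sketch_charwidth(long c)
{
    size_t i;
    size_t n = sizeof sketch_wide_ranges / sizeof sketch_wide_ranges[0];
    for (i = 0; i < n; i++) {
        if (c >= sketch_wide_ranges[i].first && c <= sketch_wide_ranges[i].last)
            return 2;
    }
    return 1;
}
```

In real code the library's own utf8proc_charwidth should be preferred, since it also accounts for characters this sketch ignores.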
 
- utf8proc 2.2.0 (2018-07-24):
    - utf8proc_NFKC_Casefold convenience function for NFKC_Casefold normalization (#133).
    - UTF8PROC_STRIPNA option to strip unassigned codepoints (#133).
    - Support building static libraries on Windows (callers need to #define UTF8PROC_STATIC) (#123).
    - cmake fix to avoid defining UTF8PROC_EXPORTS globally (#121).
    - toupper of ß (U+00df) now yields ẞ (U+1E9E) (#134), similar to musl; case-folding still yields the standard "ss" mapping.
    - utf8proc_charwidth now returns 1 for U+00AD (soft hyphen) and for unassigned/PUA codepoints (#135).
 
- utf8proc 2.1.1 (2018-04-27):
- utf8proc v2.1.0 (2016-12-26):
    - New functions utf8proc_map_custom and utf8proc_decompose_custom to allow user-supplied transformations of codepoints, in conjunction with other transformations (#89).
    - New function utf8proc_normalize_utf32 to apply normalizations directly to UTF-32 data (not just UTF-8) (#88).
    - Fixed stack overflow that could occur due to incorrect definition of UINT16_MAX with some compilers (#84).
    - Fixed conflict with stdbool.h in Visual Studio (#90).
    - Updated font metrics to use Unifont 9.0.04.
 
- utf8proc v2.0.2 (2016-07-27):
    - Move -Wmissing-prototypes warning flag from Makefile to .travis.yml since MSVC does not understand this flag and it is occasionally useful to build using MSVC through the Makefile (#79).
    - Use a different variable name for a nested loop in bench/bench.c, and declare it in a C89 way rather than inside the for to avoid "error: 'for' loop initial declarations are only allowed in C99 mode" (#80).
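The C89-style loop fix described in the 2.0.2 notes above can be shown with a generic sketch. The function count_cells is hypothetical and is not taken from bench/bench.c; it only demonstrates declaring both loop counters at the top of the block, with distinct names, so that compilers in pre-C99 mode accept the nested loops.

```c
#include <stddef.h>

/* Counts the cells of a rows x cols grid with two nested loops. */
static size_t count_cells(size_t rows, size_t cols)
{
    size_t i, j; /* C89: declared at the top of the block, not inside for(...) */
    size_t n = 0;

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) { /* inner counter uses its own name */
            n++;
        }
    }
    return n;
}
```

Writing `for (size_t i = 0; ...)` instead is what triggers the quoted error when the compiler is not in C99 (or later) mode.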
 
- utf8proc v2.0.1 (2016-07-13):
- utf8proc v2.0 (2016-07-13):
    - Updated for Unicode 9.0 (#70).
    - New utf8proc_grapheme_break_stateful to handle the complicated grapheme-breaking rules in Unicode 9. The old utf8proc_grapheme_break is still provided, but may incorrectly identify grapheme breaks in some Unicode-9 sequences.
    - Smaller Unicode tables (#62, #68). This required changes in the utf8proc_property_t structure, which breaks backward compatibility if you access this struct directly. The functions in the API remain backward-compatible, however.
    - Buffer overrun fix (#66).
 
- utf8proc v1.3.1 (2015-11-02):
- utf8proc v1.3 (2015-07-06):
    - Updated for Unicode 8.0 (#45).
    - New utf8proc_tolower and utf8proc_toupper functions, portable replacements for towlower and towupper in the C library (#40).
    - Don't treat Unicode "non-characters" as invalid, and improved validity checking in general (#35).
    - Prefix all typedefs with utf8proc_, e.g. utf8proc_int32_t, to avoid collisions with other libraries (#32).
    - Rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent collisions.
    - Fix build breakage in the benchmark routines.
    - More fine-grained Makefile variables (PICFLAG etcetera), so that compilation flags can be selectively overridden, and in particular so that CFLAGS can be changed without accidentally eliminating necessary flags like -fPIC and -std=c99 (#43).
    - Updated character-width tables based on Unifont 8.0.01 (#51) and the Unicode 8 character categories (#47).
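The v1.3 note above about no longer treating "non-characters" as invalid can be illustrated with a minimal validity check. This is a sketch under assumptions, not the library's code: the name sketch_codepoint_valid is hypothetical, and the rule shown (inside the Unicode range and not a UTF-16 surrogate) is one plausible reading of that note.

```c
/* Returns 1 if c is a Unicode scalar value, 0 otherwise.
   Non-characters such as U+FFFE and U+FFFF are accepted. */
static int sketch_codepoint_valid(long c)
{
    if (c < 0 || c > 0x10FFFF)
        return 0; /* outside the Unicode code space */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0; /* surrogates cannot be encoded in UTF-8 */
    return 1;
}
```

Under this rule U+FFFE passes while U+D800 and 0x110000 are rejected.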
 
- utf8proc v1.2 (2015-03-28):
    - Updated for Unicode 7.0 (#6).
    - New function utf8proc_grapheme_break(c1,c2) that returns whether there is a grapheme break between c1 and c2 (#20).
    - New function utf8proc_charwidth(c) that returns the number of column-positions that should be required for c; essentially a portable replacement for wcwidth(c) (#27).
    - New function utf8proc_category(c) that returns the Unicode category of c (as one of the constants UTF8PROC_CATEGORY_xx). Also, a function utf8proc_category_string(c) that returns the Unicode category of c as a two-character string.
    - cmake script CMakeLists.txt, in addition to Makefile, for easier compilation on Windows (#28).
    - Various Makefile improvements: a make check target to perform tests (#13), make install, a rule to automate updating the Unicode tables, etcetera.
    - The shared library is now versioned (e.g. has a soname on GNU/Linux) (#24).
    - C++/MSVC compatibility (#17).
    - Most #defined constants are now enums (#29).
    - New preprocessor constants UTF8PROC_VERSION_MAJOR, UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH for compile-time detection of the API version.
    - Doxygen-formatted documentation (#29).
    - The Ruby and PostgreSQL plugins have been removed due to lack of testing (#22).
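The version constants introduced in v1.2 above allow feature checks at compile time. The following is a rough sketch of that idea: the SKETCH_ macros and the helper function are hypothetical names, and the fallback values 1.2.0 are stand-ins so the fragment compiles without utf8proc.h. A real program would include the library header instead of defining the constants itself.

```c
/* Stand-in values (assumption) used only when utf8proc.h is not included. */
#ifndef UTF8PROC_VERSION_MAJOR
#define UTF8PROC_VERSION_MAJOR 1
#define UTF8PROC_VERSION_MINOR 2
#define UTF8PROC_VERSION_PATCH 0
#endif

/* True when the library version is at least maj.min. */
#define SKETCH_AT_LEAST(maj, min)              \
    (UTF8PROC_VERSION_MAJOR > (maj) ||         \
     (UTF8PROC_VERSION_MAJOR == (maj) &&       \
      UTF8PROC_VERSION_MINOR >= (min)))

/* utf8proc_charwidth is listed as new in v1.2, so gate on that version. */
#if SKETCH_AT_LEAST(1, 2)
#define SKETCH_HAVE_CHARWIDTH 1
#else
#define SKETCH_HAVE_CHARWIDTH 0
#endif

/* Exposes the compile-time decision as a runtime value. */
static int sketch_have_charwidth(void)
{
    return SKETCH_HAVE_CHARWIDTH;
}
```

Code that must also build against older releases can use such a guard to fall back to wcwidth or a similar substitute.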
 
- utf8proc v1.1.6 (2013-11-27):
    - PostgreSQL 9.2 and 9.3 compatibility (lower case 'c' language name)
 
- utf8proc-v1.1.5.tar.gz (2009-10-16)
    - Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for ruby-1.9 compatibility (and #define them, if nonexistent)
    - Patches for compatibility with Microsoft Visual Studio
    - Fixes to make utf8proc usable in C++ programs
 
- utf8proc-v1.1.4.tar.gz (2009-08-19)
    - Replaced C++ style comments for compatibility reasons
    - Added typecasts to suppress compiler warnings
    - Removed redundant source files for ruby-gemfile generation
    - Changed copyright notice for Public Software Group e. V.
    - Minor changes in the README file
 
- utf8proc-v1.1.3.tar.gz
    - PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
    - Added a function utf8proc_version returning a string containing the version number of the library.
    - Included a target libutf8proc.dylib for MacOSX.
 
- utf8proc-v1.1.2.tar.gz
    - Fixed a serious bug in the data file generator, which caused characters to be treated incorrectly when stripping default ignorable characters or calculating grapheme cluster boundaries.
 
- utf8proc-v1.1.1.tar.gz
    - Changed license from BSD to MIT style.
    - Added a new function utf8proc_codepoint_valid to the C library.
    - Changed compiler flags in Makefile from -g -O0 to -O2
    - The ruby script, which was used to build the utf8proc_data.c file, is now included in the distribution.
    - Added a new PostgreSQL function unistrip, which behaves like unifold, but also removes all character marks (e.g. accents).
 
- utf8proc-v1.0.3.tar.gz
    - Fixed a bug in the ruby library, which caused an error when splitting an empty string at grapheme cluster boundaries (method String#utf8chars).
 
- utf8proc-v1.0.2.tar.gz
    - added support for PostgreSQL version 8.2
    - included a check in Integer#utf8, which raises an exception if the given code-point is invalid because it is too high (this check was previously missing)
 
- utf8proc-v1.0.1.tar.gz
    - included a gem file for the ruby version of the library
 
- utf8proc-v1.0.tar.gz
    - added the LUMP option, which lumps certain characters together (see lump.txt) (also used for the PostgreSQL unifold function)
    - added the STRIPMARK option, which strips marking characters (or marks of composed characters)
    - deprecated ruby method String#char_ary in favour of String#utf8chars
 
- utf8proc-v0.3.tar.gz
    - added support to mark the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
    - added the ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
    - added NLF2LF transformation in postgresql unifold function
    - added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization will be performed (different from previous versions)
    - using integer constants rather than C-strings for character properties
    - fixed (hopefully) a problem with the ruby library on Mac OS X, which occurred when compiler optimization was switched on
    - changed normalization from NFC to NFKC for postgresql unifold function
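The CHARBOUND option from the v0.3 notes above marks the start of each grapheme cluster with a 0xFF byte, which can never occur in valid UTF-8. A consumer of such output could count clusters roughly as follows. This is a hedged sketch: the function name count_marked_clusters is hypothetical, and it assumes the buffer already contains the markers rather than calling the library.

```c
#include <stddef.h>

/* Counts grapheme clusters in a buffer where each cluster
   is preceded by a 0xFF marker byte. */
static size_t count_marked_clusters(const unsigned char *buf, size_t len)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0xFF)
            count++; /* every marker opens a new cluster */
    }
    return count;
}
```

For example, a buffer holding 0xFF, 'e', 0xCC, 0x81, 0xFF, 'x' (an "e" with a combining acute accent, followed by "x") contains two markers and therefore two clusters.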
 
- utf8proc-v0.2.tar.gz
    - added -fpic compiler flag in Makefile
    - fixed bug in the C code for the ruby library (usage of non-existent function)
    - changed behaviour of PostgreSQL function to return NULL in case of invalid input, rather than raising an exceptional condition
    - improved efficiency of PostgreSQL function (no transformation to C string is done)
 
- utf8proc-v0.1.tar.gz (2006-06-02): Initial public release.