utf8proc
The following are the current and past releases of the utf8proc library. See also the development version of utf8proc on GitHub.
The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.
- utf8proc 2.10.0 (2024-12-31):
- utf8proc 2.9.0 (2023-10-20):
- Unicode 15.1 support (#253).
- utf8proc 2.8.0 (2022-10-30):
- Unicode 15 support (#247).
- utf8proc 2.7.0 (2021-12-16):
- utf8proc 2.6.1 (2020-12-15):
- Bugfix in utf8proc_grapheme_break_stateful for NULL state argument, which also broke utf8proc_grapheme_break.
- utf8proc 2.6.0 (2020-11-23):
- utf8proc 2.5.0 (2020-03-27):
- utf8proc 2.4.0 (2019-05-10):
- utf8proc 2.3.0 (2019-03-30):
- Unicode 12 support (#148).
- New function utf8proc_unicode_version to return the supported Unicode version (#151).
- Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide characters have width 2, and all other printable characters have width 1 (#150).
- Fix CHARBOUND option for utf8proc_map to preserve U+FFFE and U+FFFF non-characters (#149).
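The simplified width rule from the 2.3.0 notes can be pictured with a minimal sketch. This is not utf8proc's implementation: the `sketch_props` structure and `sketch_charwidth` function are hypothetical names, the two flags stand in for a real Unicode table lookup, and returning 0 for non-printable characters is an assumption made here for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-codepoint property lookup.
   A real library would derive these flags from Unicode data tables. */
typedef struct {
    bool printable;       /* false for control characters and similar */
    bool east_asian_wide; /* true if the codepoint is East-Asian wide */
} sketch_props;

/* Width rule as stated in the notes: East-Asian wide -> 2,
   every other printable character -> 1.
   Assumption for this sketch: non-printable characters get width 0. */
static int sketch_charwidth(sketch_props p)
{
    if (!p.printable)
        return 0;
    return p.east_asian_wide ? 2 : 1;
}
```

The point of the rule is that the width depends only on two properties of the codepoint, with no font metrics involved.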
- utf8proc 2.2.0 (2018-07-24):
- utf8proc_NFKC_Casefold convenience function for NFKC_Casefold normalization (#133).
- UTF8PROC_STRIPNA option to strip unassigned codepoints (#133).
- Support building static libraries on Windows (callers need to #define UTF8PROC_STATIC) (#123).
- cmake fix to avoid defining UTF8PROC_EXPORTS globally (#121).
- toupper of ß (U+00DF) now yields ẞ (U+1E9E) (#134), similar to musl; case-folding still yields the standard "ss" mapping.
- utf8proc_charwidth now returns 1 for U+00AD (soft hyphen) and for unassigned/PUA codepoints (#135).
- utf8proc 2.1.1 (2018-04-27):
- utf8proc v2.1.0 (2016-12-26):
- New functions utf8proc_map_custom and utf8proc_decompose_custom to allow user-supplied transformations of codepoints, in conjunction with other transformations (#89).
- New function utf8proc_normalize_utf32 to apply normalizations directly to UTF-32 data (not just UTF-8) (#88).
- Fixed stack overflow that could occur due to incorrect definition of UINT16_MAX with some compilers (#84).
- Fixed conflict with stdbool.h in Visual Studio (#90).
- Updated font metrics to use Unifont 9.0.04.
- utf8proc v2.0.2 (2016-07-27):
- Move -Wmissing-prototypes warning flag from Makefile to .travis.yml since MSVC does not understand this flag and it is occasionally useful to build using MSVC through the Makefile (#79).
- Use a different variable name for a nested loop in bench/bench.c, and declare it in a C89 way rather than inside the for to avoid "error: 'for' loop initial declarations are only allowed in C99 mode" (#80).
- utf8proc v2.0.1 (2016-07-13):
- utf8proc v2.0 (2016-07-13):
- Updated for Unicode 9.0 (#70).
- New utf8proc_grapheme_break_stateful to handle the complicated grapheme-breaking rules in Unicode 9. The old utf8proc_grapheme_break is still provided, but may incorrectly identify grapheme breaks in some Unicode-9 sequences.
- Smaller Unicode tables (#62, #68). This required changes in the utf8proc_property_t structure, which breaks backward compatibility if you access this struct directly. The functions in the API remain backward-compatible, however.
- Buffer overrun fix (#66).
- utf8proc v1.3.1 (2015-11-02):
- utf8proc v1.3 (2015-07-06):
- Updated for Unicode 8.0 (#45).
- New utf8proc_tolower and utf8proc_toupper functions, portable replacements for towlower and towupper in the C library (#40).
- Don't treat Unicode "non-characters" as invalid, and improved validity checking in general (#35).
- Prefix all typedefs with utf8proc_, e.g. utf8proc_int32_t, to avoid collisions with other libraries (#32).
- Rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent collisions.
- Fix build breakage in the benchmark routines.
- More fine-grained Makefile variables (PICFLAG etcetera), so that compilation flags can be selectively overridden, and in particular so that CFLAGS can be changed without accidentally eliminating necessary flags like -fPIC and -std=c99 (#43).
- Updated character-width tables based on Unifont 8.0.01 (#51) and the Unicode 8 character categories (#47).
- utf8proc v1.2 (2015-03-28):
- Updated for Unicode 7.0 (#6).
- New function utf8proc_grapheme_break(c1,c2) that returns whether there is a grapheme break between c1 and c2 (#20).
- New function utf8proc_charwidth(c) that returns the number of column-positions that should be required for c; essentially a portable replacement for wcwidth(c) (#27).
- New function utf8proc_category(c) that returns the Unicode category of c (as one of the constants UTF8PROC_CATEGORY_xx). Also, a function utf8proc_category_string(c) that returns the Unicode category of c as a two-character string.
- cmake script CMakeLists.txt, in addition to Makefile, for easier compilation on Windows (#28).
- Various Makefile improvements: a make check target to perform tests (#13), make install, a rule to automate updating the Unicode tables, etcetera.
- The shared library is now versioned (e.g. has a soname on GNU/Linux) (#24).
- C++/MSVC compatibility (#17).
- Most #defined constants are now enums (#29).
- New preprocessor constants UTF8PROC_VERSION_MAJOR, UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH for compile-time detection of the API version.
- Doxygen-formatted documentation (#29).
- The Ruby and PostgreSQL plugins have been removed due to lack of testing (#22).
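As an illustration of compile-time version detection with the UTF8PROC_VERSION_MAJOR, UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH constants, here is a minimal sketch. The macro `SKETCH_UTF8PROC_AT_LEAST` and the function `sketch_have_stateful_break` are hypothetical helpers, not part of utf8proc. The three version macros are given stand-in values (2.3.0) only so the fragment compiles on its own; in real code they would come from utf8proc.h.

```c
/* Stand-in values so this fragment compiles without the library.
   In real code, include utf8proc.h instead; it defines these macros. */
#ifndef UTF8PROC_VERSION_MAJOR
#define UTF8PROC_VERSION_MAJOR 2
#define UTF8PROC_VERSION_MINOR 3
#define UTF8PROC_VERSION_PATCH 0
#endif

/* Hypothetical helper (not a utf8proc macro): nonzero when the header's
   version is at least maj.min.pat. Usable in #if directives. */
#define SKETCH_UTF8PROC_AT_LEAST(maj, min, pat)                      \
    (UTF8PROC_VERSION_MAJOR > (maj) ||                               \
     (UTF8PROC_VERSION_MAJOR == (maj) &&                             \
      (UTF8PROC_VERSION_MINOR > (min) ||                             \
       (UTF8PROC_VERSION_MINOR == (min) &&                           \
        UTF8PROC_VERSION_PATCH >= (pat)))))

/* Returns 1 if the header is new enough (v2.0 or later, per these notes)
   to provide utf8proc_grapheme_break_stateful, else 0. */
static int sketch_have_stateful_break(void)
{
#if SKETCH_UTF8PROC_AT_LEAST(2, 0, 0)
    return 1;
#else
    return 0;
#endif
}
```

Because the comparison is done by the preprocessor, code that relies on newer functions can be excluded entirely when building against an older header.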
- utf8proc v1.1.6 (2013-11-27):
- PostgreSQL 9.2 and 9.3 compatibility (lower case 'c' language name)
- utf8proc-v1.1.5.tar.gz (2009-10-16)
- Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for ruby-1.9 compatibility (and #define them, if nonexistent)
- Patches for compatibility with Microsoft Visual Studio
- Fixes to make utf8proc usable in C++ programs
- utf8proc-v1.1.4.tar.gz (2009-08-19)
- Replaced C++ style comments for compatibility reasons
- Added typecasts to suppress compiler warnings
- Removed redundant source files for ruby-gemfile generation
- Changed copyright notice for Public Software Group e. V.
- Minor changes in the README file
- utf8proc-v1.1.3.tar.gz
- PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
- Added a function utf8proc_version returning a string containing the version number of the library.
- Included a target libutf8proc.dylib for MacOSX.
- utf8proc-v1.1.2.tar.gz
- Fixed a serious bug in the data file generator, which caused characters to be treated incorrectly when stripping default ignorable characters or calculating grapheme cluster boundaries.
- utf8proc-v1.1.1.tar.gz
- Changed license from BSD to MIT style.
- Added a new function utf8proc_codepoint_valid to the C library.
- Changed compiler flags in Makefile from -g -O0 to -O2
- The ruby script, which was used to build the utf8proc_data.c file, is now included in the distribution.
- Added a new PostgreSQL function unistrip, which behaves like unifold, but also removes all character marks (e.g. accents).
- utf8proc-v1.0.3.tar.gz
- Fixed a bug in the ruby library, which caused an error when splitting an empty string at grapheme cluster boundaries (method String#utf8chars).
- utf8proc-v1.0.2.tar.gz
- added support for PostgreSQL version 8.2
- included a check in Integer#utf8, which raises an exception if the given code-point is invalid because it is too high (this check was previously missing)
- utf8proc-v1.0.1.tar.gz
- included a gem file for the ruby version of the library
- utf8proc-v1.0.tar.gz
- added the LUMP option, which lumps certain characters together (see lump.txt) (also used for the PostgreSQL unifold function)
- added the STRIPMARK option, which strips marking characters (or marks of composed characters)
- deprecated ruby method String#char_ary in favour of String#utf8chars
- utf8proc-v0.3.tar.gz
- added support to mark the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
- added the ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
- added NLF2LF transformation in postgresql unifold function
- added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization will be performed (different from previous versions)
- using integer constants rather than C-strings for character properties
- fixed (hopefully) a problem with the ruby library on Mac OS X, which occurred when compiler optimization was switched on
- changed normalization from NFC to NFKC for postgresql unifold function
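The 0xFF marker used by the CHARBOUND option works because the byte 0xFF never occurs in valid UTF-8, so it can delimit grapheme clusters unambiguously. The following minimal sketch counts clusters in a buffer that is assumed to already contain such markers, one before each cluster. The function name `sketch_count_clusters` is hypothetical and the buffer layout is an assumption for illustration, not a description of the library's exact output.

```c
#include <stddef.h>

/* Counts grapheme clusters in a byte buffer where, by assumption, each
   cluster is preceded by a single 0xFF marker byte. Since 0xFF is never
   part of valid UTF-8, counting the markers counts the clusters. */
static size_t sketch_count_clusters(const unsigned char *buf, size_t len)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0xFF)
            count++;
    }
    return count;
}
```

For example, a buffer holding the marker followed by "a", the marker followed by the two UTF-8 bytes of "é", and the marker followed by "e" plus a combining acute accent contains three markers and therefore three clusters.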
- utf8proc-v0.2.tar.gz
- added -fpic compiler flag in Makefile
- fixed bug in the C code for the ruby library (usage of non-existent function)
- changed behaviour of PostgreSQL function to return NULL in case of invalid input, rather than raising an exceptional condition
- improved efficiency of PostgreSQL function (no transformation to C string is done)
- utf8proc-v0.1.tar.gz (2006-06-02): Initial public release.