#include <utf8proc.h>

Data Fields
utf8proc_propval_t	category

utf8proc_propval_t	combining_class

utf8proc_propval_t	bidi_class

utf8proc_propval_t	decomp_type

utf8proc_uint16_t	decomp_seqindex

utf8proc_uint16_t	casefold_seqindex

utf8proc_uint16_t	uppercase_seqindex

utf8proc_uint16_t	lowercase_seqindex

utf8proc_uint16_t	titlecase_seqindex

utf8proc_uint16_t	comb_index:10

utf8proc_uint16_t	comb_length:5

utf8proc_uint16_t	comb_issecond:1

unsigned	bidi_mirrored:1

unsigned	comp_exclusion:1

unsigned	ignorable:1

unsigned	control_boundary:1

unsigned	charwidth:2

unsigned	ambiguous_width:1

unsigned	pad:1

unsigned	boundclass:6

unsigned	indic_conjunct_break:2

Detailed Description

Struct containing information about a codepoint.
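To make the packed layout concrete, here is a minimal sketch of a stand-in struct with the same field names and bit widths as the Data Fields list above. The names demo_propval_t, demo_property, and bitfield_max are hypothetical. demo_propval_t assumes that utf8proc_propval_t is a 16-bit signed integer, and the sketch uses plain unsigned bitfields throughout because bitfields of a 16-bit type are an implementation-defined extension in standard C. The widths bound each field. For example, a 10-bit comb_index can hold at most 0x3ff, which is the "no entry" value described under comb_index.

```c
#include <stdint.h>

/* Assumption: utf8proc_propval_t is a 16-bit signed integer. */
typedef int16_t demo_propval_t;

/* Hypothetical mirror of the documented fields, for illustration only. */
typedef struct {
  demo_propval_t category;
  demo_propval_t combining_class;
  demo_propval_t bidi_class;
  demo_propval_t decomp_type;
  uint16_t decomp_seqindex;
  uint16_t casefold_seqindex;
  uint16_t uppercase_seqindex;
  uint16_t lowercase_seqindex;
  uint16_t titlecase_seqindex;
  unsigned comb_index:10;     /* 0x3ff: not the first character of any pair */
  unsigned comb_length:5;
  unsigned comb_issecond:1;
  unsigned bidi_mirrored:1;
  unsigned comp_exclusion:1;
  unsigned ignorable:1;
  unsigned control_boundary:1;
  unsigned charwidth:2;
  unsigned ambiguous_width:1;
  unsigned pad:1;
  unsigned boundclass:6;
  unsigned indic_conjunct_break:2;
} demo_property;

/* Largest value an unsigned bitfield of the given width can hold (bits < 32). */
static unsigned bitfield_max(unsigned bits) {
  return (1u << bits) - 1u;
}
```

With these widths, bitfield_max(10) is 0x3ff and bitfield_max(5) is 31, so comb_length can record at most 31 entries per first character.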

Field Documentation

◆ ambiguous_width

unsigned utf8proc_property_struct::ambiguous_width

Set if the codepoint has East Asian Width class A (Ambiguous).
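Ambiguous-width characters are usually displayed one column wide in Western contexts and two columns wide in East Asian contexts. The following minimal sketch shows how a caller might combine charwidth with this flag. It assumes a caller-chosen east_asian_context switch, and the function name demo_display_width is hypothetical. The widen-to-2 policy is a common terminal convention that the caller applies; utf8proc itself does not apply it.

```c
#include <stdbool.h>

/* Hypothetical helper: charwidth and ambiguous_width are the two property
   fields; east_asian_context is a caller-side setting (assumption). */
static int demo_display_width(unsigned charwidth, unsigned ambiguous_width,
                              bool east_asian_context) {
  /* Only widen narrow (width-1) ambiguous codepoints; zero-width ones,
     such as combining marks, stay at 0. */
  if (ambiguous_width && east_asian_context && charwidth == 1)
    return 2;
  return (int)charwidth;
}
```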

◆ bidi_class

utf8proc_propval_t utf8proc_property_struct::bidi_class

Bidirectional class.

See also: utf8proc_bidi_class_t.

◆ boundclass

unsigned utf8proc_property_struct::boundclass

Boundclass: the boundary class used for grapheme cluster segmentation (Unicode TR #29).

See also: utf8proc_boundclass_t.

◆ category

utf8proc_propval_t utf8proc_property_struct::category

Unicode category.

See also: utf8proc_category_t.

◆ charwidth

unsigned utf8proc_property_struct::charwidth

The display width of the codepoint in monospace columns, as returned by utf8proc_charwidth().
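Because charwidth is a per-codepoint value, a simple column count for a string can add it up over the string's codepoints. The sketch below uses a hypothetical lookup, demo_charwidth, in place of a real property lookup. In utf8proc, the value would come from utf8proc_charwidth() or from the charwidth field returned by utf8proc_get_property(). The lookup is hard-coded for two codepoints: U+4E2D, a wide CJK ideograph, and U+0301, a zero-width combining accent. Every other codepoint counts as one column. demo_string_width is also hypothetical. Summing per codepoint ignores grapheme clustering, so multi-codepoint clusters such as emoji sequences can be miscounted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real property lookup (assumption). */
static unsigned demo_charwidth(int32_t cp) {
  if (cp == 0x0301) return 0;   /* COMBINING ACUTE ACCENT: zero width */
  if (cp == 0x4E2D) return 2;   /* CJK ideograph: wide */
  return 1;                     /* everything else: one column in this sketch */
}

/* Sums per-codepoint widths; does not account for grapheme clusters. */
static size_t demo_string_width(const int32_t *cps, size_t n) {
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += demo_charwidth(cps[i]);
  return total;
}
```

For the sequence U+0065, U+0301, U+4E2D, the sketch returns 1 + 0 + 2 = 3 columns.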

◆ comb_index

utf8proc_uint16_t utf8proc_property_struct::comb_index

Character combining table.

The character combining table is formally indexed by two characters: the first and second characters that might form a combining pair. The table entry then contains the combined character. Most character pairs cannot be combined. About 1,000 characters can be the first character in a combining pair, and most of them have only a handful of possible second characters.

The combining table is stored as a sparse matrix in the CSR (compressed sparse row) format. That is, it is stored as two arrays, utf8proc_uint32_t utf8proc_combinations_second[] and utf8proc_uint32_t utf8proc_combinations_combined[]. For every combining pair, these hold the second character and the resulting combined character, respectively.

comb_index: Index into the combining table if this character is the first character in a combining pair, otherwise 0x3ff.
comb_length: Number of table entries for this first character.
comb_issecond: As an optimization, we also record whether this character is the second character in any combining pair. If it is not, the table lookup can be skipped.

A table lookup starts from a given character pair. It first checks whether the first character has an entry in the table (its comb_index is not 0x3ff) and whether the second character occurs as the second character of any pair (its comb_issecond flag is set). If both conditions hold, the comb_length table entries starting at comb_index are checked sequentially for a match.
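The lookup described above can be sketched in C as follows. The arrays demo_second[] and demo_combined[] are hypothetical stand-ins for utf8proc_combinations_second[] and utf8proc_combinations_combined[]. They hold a tiny subset of real canonical compositions: A + U+0300 gives U+00C0, A + U+0301 gives U+00C1, and E + U+0301 gives U+00C9. The struct demo_comb_props and the function demo_compose_pair are also hypothetical. A real implementation would read these fields from each codepoint's property record. The function returns -1 when the pair does not combine.

```c
#include <stdint.h>

#define DEMO_NO_INDEX 0x3ff  /* sentinel: not the first character of any pair */

/* Hypothetical CSR data (tiny subset): the row for 'A' is entries 0..1,
   the row for 'E' is entry 2. */
static const int32_t demo_second[]   = {0x0300, 0x0301, 0x0301};
static const int32_t demo_combined[] = {0x00C0, 0x00C1, 0x00C9};

/* Hypothetical subset of the comb_* property fields. */
typedef struct {
  unsigned comb_index:10;
  unsigned comb_length:5;
  unsigned comb_issecond:1;
} demo_comb_props;

/* Returns the combined codepoint, or -1 if first + second do not combine. */
static int32_t demo_compose_pair(demo_comb_props first, demo_comb_props second,
                                 int32_t second_cp) {
  /* Skip the scan unless the first character has a row and the second
     character occurs as a second character somewhere in the table. */
  if (first.comb_index == DEMO_NO_INDEX || !second.comb_issecond)
    return -1;
  /* Check the comb_length entries of the first character's row in order. */
  for (unsigned off = 0; off < first.comb_length; off++) {
    unsigned i = first.comb_index + off;
    if (demo_second[i] == second_cp)
      return demo_combined[i];
  }
  return -1;
}
```

Storing only an index and a length per first character keeps each codepoint's property record small. The 0x3ff and comb_issecond checks reject most pairs before the scan even starts.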

◆ decomp_type

utf8proc_propval_t utf8proc_property_struct::decomp_type

Decomposition type.

See also: utf8proc_decomp_type_t.

◆ ignorable

unsigned utf8proc_property_struct::ignorable

Whether this codepoint is a default ignorable character, such as SOFT HYPHEN or ZERO WIDTH SPACE.

Used by utf8proc_decompose_char() when UTF8PROC_IGNORE is passed as an option.
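When UTF8PROC_IGNORE is set, ignorable codepoints are stripped from the output. The following minimal sketch shows that filtering step over an array of codepoints. Instead of reading the ignorable field, it uses a hypothetical demo_is_ignorable() lookup hard-coded for U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE. demo_strip_ignorable and its ignore flag are also hypothetical and are not part of the utf8proc API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for reading the ignorable property field. */
static bool demo_is_ignorable(int32_t cp) {
  return cp == 0x00AD || cp == 0x200B;  /* SOFT HYPHEN, ZERO WIDTH SPACE */
}

/* Copies cps[0..n) to out, dropping ignorable codepoints when ignore is set.
   Returns the number of codepoints written; out must hold at least n entries. */
static size_t demo_strip_ignorable(const int32_t *cps, size_t n,
                                   int32_t *out, bool ignore) {
  size_t w = 0;
  for (size_t i = 0; i < n; i++) {
    if (ignore && demo_is_ignorable(cps[i]))
      continue;  /* an ignorable codepoint contributes no output */
    out[w++] = cps[i];
  }
  return w;
}
```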

The documentation for this struct was generated from the following file:

utf8proc.h
