C16RTOMB(3C) Standard C Library Functions C16RTOMB(3C)
NAME
c16rtomb,
c32rtomb,
wcrtomb,
wcrtomb_l - convert wide-characters to
character sequences
SYNOPSIS
#include <uchar.h>
size_t c16rtomb(
char *restrict str,
char16_t c16,
mbstate_t *restrict ps);
size_t c32rtomb(
char *restrict str,
char32_t c32,
mbstate_t *restrict ps);
#include <wchar.h>
size_t wcrtomb(
char *restrict str,
wchar_t wc,
mbstate_t *restrict ps);
#include <wchar.h>
#include <xlocale.h>
size_t wcrtomb_l(
char *restrict str,
wchar_t wc,
mbstate_t *restrict ps,
locale_t loc);
DESCRIPTION
The
c16rtomb(),
c32rtomb(),
wcrtomb(), and
wcrtomb_l() functions
convert wide characters into the corresponding multi-byte character
sequences. The input encoding that each function accepts is as follows:
c16rtomb()
A UTF-16 code sequence, where every code point is
represented by one or two
char16_t. The UTF-16 encoding
will encode certain Unicode code points as a pair of two
16-bit code sequences, commonly referred to as a surrogate
pair.
c32rtomb()
A UTF-32 code sequence, where every code point is
represented by a single
char32_t. It is illegal to pass
reserved Unicode code points.
wcrtomb(),
wcrtomb_l()
Wide characters, which are 32-bit values where every code point
is represented by a single
wchar_t. Although
wchar_t and
char32_t are different types, in this implementation they
use similar encodings.
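As a minimal sketch of the UTF-16 case described above, a code point
outside the Basic Multilingual Plane reaches c16rtomb() as two char16_t
values. The helper name encode_utf16_pair() is hypothetical, and the
sketch assumes that the caller has already selected a UTF-8 locale with
setlocale(). The first call is expected to store nothing while the
conversion state holds the high surrogate; the second call produces the
bytes.

```c
#include <limits.h>
#include <locale.h>	/* setlocale(), called by the caller beforehand */
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: encode one surrogate pair (hi, lo) into out, which
 * must hold at least MB_LEN_MAX bytes. Returns the total number of bytes
 * stored, or (size_t)-1 if either call reports an error.
 */
size_t
encode_utf16_pair(char *out, char16_t hi, char16_t lo)
{
    mbstate_t st;
    size_t first, second;

    (void) memset(&st, 0, sizeof (st));

    /* The high surrogate is only recorded in st. */
    first = c16rtomb(out, hi, &st);
    if (first == (size_t)-1)
        return (first);

    /* The low surrogate completes the code point and emits the bytes. */
    second = c16rtomb(out + first, lo, &st);
    if (second == (size_t)-1)
        return (second);

    return (first + second);
}
```

In a UTF-8 locale, passing 0xD83D and 0xDE00 (the surrogate pair for
U+1F600) corresponds to the four-byte sequence f0 9f 98 80.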
The functions all work by taking the passed-in wide-character (
c16,
c32,
wc) and appending it to the current conversion state,
ps. Once a
valid code point, based on the current locale, has been assembled, it is
converted into a series of characters that are stored in
str. Up to
MB_CUR_MAX bytes will be stored in
str. It is the caller's
responsibility to ensure that there is sufficient space in
str.
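The caller's responsibility for buffer space can be sketched as follows.
This is an illustration only: the helper name wide_to_multibyte() is
hypothetical, and it uses the compile-time constant MB_LEN_MAX from
<limits.h> as an upper bound for MB_CUR_MAX so that the scratch buffer is
large enough in any locale.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the null-terminated wide string src into
 * dst, which holds cap bytes. Each character is first converted into a
 * scratch buffer of MB_LEN_MAX bytes so that dst is never overrun.
 * Returns the number of bytes stored, excluding the terminating null
 * byte, or (size_t)-1 on an encoding error or if dst is too small.
 */
size_t
wide_to_multibyte(char *dst, size_t cap, const wchar_t *src)
{
    mbstate_t st;
    char scratch[MB_LEN_MAX];
    size_t used = 0;

    (void) memset(&st, 0, sizeof (st));

    for (;;) {
        size_t n = wcrtomb(scratch, *src, &st);

        /* Stop on a conversion error or when dst cannot hold n bytes. */
        if (n == (size_t)-1 || n > cap - used)
            return ((size_t)-1);
        (void) memcpy(dst + used, scratch, n);

        /* The null wide-character ends the string. */
        if (*src == L'\0')
            return (used + n - 1);
        used += n;
        src++;
    }
}
```

Converting into a scratch buffer first costs one extra copy per character,
but it keeps the bounds check independent of the locale's encoding.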
The functions are all influenced by the LC_CTYPE category of the
current locale for determining what is considered a valid character.
For example, in the
C locale, only ASCII characters are recognized,
while in a
UTF-8 based locale like
en_US.UTF-8, all valid Unicode code
points are recognized and will be converted into the corresponding
multi-byte sequence. The
wcrtomb_l() function uses the locale passed
in
loc rather than the locale of the current thread.
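The effect of LC_CTYPE can be sketched with a small helper. The name
convert_in_locale() is hypothetical, and the locale names passed to it
are assumptions about what is installed on a given system. Note that the
helper changes the locale of the whole process.

```c
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: select the LC_CTYPE locale called name and convert
 * one UTF-32 value in it. Returns -1 if the locale could not be selected.
 * Otherwise returns 0 and stores the return value of c32rtomb() in *lenp.
 */
int
convert_in_locale(const char *name, char32_t c32, size_t *lenp)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];

    if (setlocale(LC_CTYPE, name) == NULL)
        return (-1);

    (void) memset(&st, 0, sizeof (st));
    *lenp = c32rtomb(buf, c32, &st);
    return (0);
}
```

For the value 0x5149, the C locale is expected to report (size_t)-1,
while a UTF-8 based locale is expected to report a length of 3.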
The
ps argument represents a multi-byte conversion state which can be
used across multiple calls to a given function (but not mixed between
functions). This allows for characters to be consumed from subsequent
buffers, e.g. different values of
str. The functions may be called
from multiple threads as long as they use unique values for
ps. If
ps is NULL, then a function-specific buffer will be used for the
conversion state; however, this buffer is shared between all threads and
its use is not recommended.
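The role of separate conversion states can be sketched by interleaving two
UTF-16 streams. The helper name interleave_streams() is hypothetical, and
the sketch assumes that a UTF-8 locale has already been selected. Because
each stream has its own mbstate_t, the partially converted surrogate pair
in the first stream is not disturbed by the call made for the second one.

```c
#include <limits.h>
#include <locale.h>	/* setlocale(), called by the caller beforehand */
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical sketch: stream A carries the surrogate pair for U+1F600
 * and stream B carries U+00E9. out_a and out_b must each hold at least
 * MB_LEN_MAX bytes. Returns the number of bytes stored in out_a, or
 * (size_t)-1 if any call fails.
 */
size_t
interleave_streams(char *out_a, char *out_b)
{
    mbstate_t st_a, st_b;

    (void) memset(&st_a, 0, sizeof (st_a));
    (void) memset(&st_b, 0, sizeof (st_b));

    /* Stream A: the high surrogate leaves partial state in st_a. */
    if (c16rtomb(out_a, 0xD83D, &st_a) == (size_t)-1)
        return ((size_t)-1);

    /* Stream B: a complete character, converted with its own state. */
    if (c16rtomb(out_b, 0x00E9, &st_b) == (size_t)-1)
        return ((size_t)-1);

    /* Stream A: the low surrogate completes the pending code point. */
    return (c16rtomb(out_a, 0xDE00, &st_a));
}
```

The same separation applies to threads: giving each thread its own
mbstate_t avoids sharing the function-specific state that is used when ps
is NULL.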
The functions all have a special behavior when NULL is passed for
str.
They will instead behave as though the null wide-character was
passed in
c16,
c32, or
wc and an internal buffer (buf) will be used to
write out the results of the conversion. In other words, the functions
would be called as:
c16rtomb(buf, L'\0', ps)
c32rtomb(buf, L'\0', ps)
wcrtomb(buf, L'\0', ps)
wcrtomb_l(buf, L'\0', ps, loc)
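As a small sketch of the NULL form, the hypothetical helper
finish_conversion() below passes NULL for str to wcrtomb(). The return
value then describes the bytes that converting the null wide-character
would have produced. For a stateless encoding such as ASCII or UTF-8 this
is expected to be the single terminating null byte.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the null wide-character into an internal
 * buffer, which returns ps to the initial conversion state. Returns the
 * number of bytes that would have been written, or (size_t)-1 on error.
 */
size_t
finish_conversion(mbstate_t *ps)
{
    /* Behaves like wcrtomb(buf, L'\0', ps) with an internal buf. */
    return (wcrtomb(NULL, L'\0', ps));
}
```

After a successful call, mbsinit(ps) can be used to confirm that the
state is the initial conversion state again.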
Locale Details
Not all locales in the system are Unicode based locales. For example,
ISO 8859 family locales have code points with values that do not match
their counterparts in Unicode. When using these functions with non-
Unicode based locales, the code points returned will be those
determined by the locale. They will not be converted from the
corresponding Unicode code point. For example, if using the Euro sign
in ISO 8859-15, these functions will not encode the Unicode value
0x20ac into the ISO 8859-15 value 0xa4.
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode. This
means that when using UTF-16, if the corresponding code point were in
the range for surrogate pairs, then the
c16rtomb() function will expect
to receive that code point in that fashion.
This behavior of the
c16rtomb() and
c32rtomb() functions in non-Unicode
locales is not portable, is subject to change, and should not be relied
upon.
RETURN VALUES
Upon successful completion, the
c16rtomb(),
c32rtomb(),
wcrtomb(), and
wcrtomb_l() functions return the number of bytes stored in
str.
Otherwise,
(size_t)-1 is returned to indicate an encoding error and
errno is set.
EXAMPLES
Example 1 Converting a UTF-32 character into a multi-byte character
sequence.
#include <err.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <uchar.h>

int
main(void)
{
    mbstate_t mbs;
    size_t ret;
    /*
     * MB_CUR_MAX depends on the locale in effect when it is evaluated,
     * so size the buffer with the compile-time bound MB_LEN_MAX.
     */
    char buf[MB_LEN_MAX];
    char32_t val = 0x5149;
    const char *uchar_exp = "\xe5\x85\x89";

    (void) memset(&mbs, 0, sizeof (mbs));
    if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL) {
        errx(EXIT_FAILURE, "failed to set locale");
    }

    ret = c32rtomb(buf, val, &mbs);
    if (ret != strlen(uchar_exp)) {
        errx(EXIT_FAILURE, "failed to convert string, got %zd",
            ret);
    }

    if (strncmp(buf, uchar_exp, ret) != 0) {
        errx(EXIT_FAILURE, "converted char32_t does not match "
            "expected value");
    }

    return (0);
}
ERRORS
The
c16rtomb(),
c32rtomb(),
wcrtomb(), and
wcrtomb_l() functions will
fail if:
EINVAL The conversion state in
ps is invalid.
EILSEQ An invalid character sequence has been detected.
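A caller might distinguish the two errors as sketched below. The helper
name describe_c32rtomb_failure() and the message strings are hypothetical
and serve only as an illustration.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: convert one UTF-32 value in the current locale.
 * Returns NULL on success, or a short description of the failure based
 * on errno.
 */
const char *
describe_c32rtomb_failure(char32_t c32)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];

    (void) memset(&st, 0, sizeof (st));
    errno = 0;
    if (c32rtomb(buf, c32, &st) != (size_t)-1)
        return (NULL);

    switch (errno) {
    case EILSEQ:
        return ("invalid character for the current locale");
    case EINVAL:
        return ("invalid conversion state");
    default:
        return ("unexpected error");
    }
}
```

A surrogate value such as 0xD800 is not a valid UTF-32 code point, so
passing it is expected to take the EILSEQ branch.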
MT-LEVEL
The
c16rtomb(),
c32rtomb(),
wcrtomb(), and
wcrtomb_l() functions are
MT-Safe as long as different
mbstate_t structures are passed in
ps. If
ps is NULL or different threads use the same value for
ps, then the
functions are
Unsafe.
INTERFACE STABILITY
Committed
SEE ALSO
mbrtoc16(3C),
mbrtoc32(3C),
mbrtowc(3C),
newlocale(3C),
setlocale(3C),
uselocale(3C),
uchar.h(3HEAD),
environ(7)
illumos December 2, 2023 illumos