U8_TEXTPREP_STR(3C)     Standard C Library Functions     U8_TEXTPREP_STR(3C)
NAME
       u8_textprep_str - string-based UTF-8 text preparation function
SYNOPSIS
       #include <sys/u8_textprep.h>
       size_t u8_textprep_str(char *inarray, size_t *inlen,
           char *outarray, size_t *outlen, int flag,
           size_t unicode_version, int *errnum);
PARAMETERS
       inarray
                           A pointer to a byte array containing a sequence
                           of UTF-8 character bytes to be prepared.
       inlen
                           As input argument, the number of bytes to be
                           prepared in inarray. As output argument, the
                           number of bytes in inarray still not consumed.
       outarray
                           A pointer to a byte array where prepared UTF-8
                           character bytes can be saved.
       outlen
                           As input argument, the number of available bytes
                           at outarray where prepared character bytes can be
                           saved. As output argument, after the conversion,
                           the number of bytes still available at outarray.
       flag
                           The possible preparation options constructed by a
                           bitwise-inclusive-OR of the following values:
                           U8_TEXTPREP_IGNORE_NULL
                               Normally u8_textprep_str() stops the
                               preparation if it encounters a null byte,
                               even if the value pointed to by inlen is
                               still greater than zero.
                               With this option, a null byte does not stop
                               the preparation, which continues until all
                               inlen bytes of inarray have been consumed or
                               an error occurs.
                           U8_TEXTPREP_IGNORE_INVALID
                               Normally u8_textprep_str() stops the
                               preparation if it encounters illegal or
                               incomplete characters and sets the
                               corresponding errnum value.
                               When this option is set, u8_textprep_str()
                               does not stop the preparation and instead
                               passes such characters through without
                               preparing them.
                           U8_TEXTPREP_TOUPPER
                               Map lowercase characters to uppercase
                               characters if applicable.
                           U8_TEXTPREP_TOLOWER
                               Map uppercase characters to lowercase
                               characters if applicable.
                           U8_TEXTPREP_NFD
                               Apply Unicode Normalization Form D.
                           U8_TEXTPREP_NFC
                               Apply Unicode Normalization Form C.
                           U8_TEXTPREP_NFKD
                               Apply Unicode Normalization Form KD.
                           U8_TEXTPREP_NFKC
                               Apply Unicode Normalization Form KC.
                           Only one case folding option is allowed. Only one
                           Unicode Normalization option is allowed.
                           When a case folding option and a Unicode
                           Normalization option are specified together,
                           UTF-8 text preparation is done by doing case
                           folding first and then Unicode Normalization.
                           If no option is specified, no processing occurs
                           except the simple copying of bytes from input to
                           output.
       unicode_version
                           The version of Unicode data that should be used
                           during UTF-8 text preparation. The following
                           values are supported:
                           U8_UNICODE_320
                               Use Unicode 3.2.0 data during preparation.
                           U8_UNICODE_500
                               Use Unicode 5.0.0 data during preparation.
                           U8_UNICODE_LATEST
                               Use the latest Unicode version data available,
                               which is currently Unicode 5.0.0.
       errnum
                           The error value when preparation is not completed
                           or fails. The following values are supported:
                           E2BIG
                               Text preparation stopped due to lack of
                               space in the output array.
                           EBADF
                               Specified option values are conflicting
                               and cannot be supported.
                           EILSEQ
                               Text preparation stopped due to an
                               input byte that does not belong to
                               UTF-8.
                           EINVAL
                               Text preparation stopped due to an
                               incomplete UTF-8 character at the end
                               of the input array.
                           ERANGE
                               The specified Unicode version value is
                               not a supported version.
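       The errnum values listed above can be turned into diagnostics with
       a small lookup. The following is a minimal sketch, not part of the
       library: the helper name textprep_errstr() and the message strings
       are assumptions made for illustration. Only the errno constants come
       from this page.

```c
#include <errno.h>

/*
 * Hypothetical helper (not provided by the library): map an errnum value
 * stored by u8_textprep_str() to a short description paraphrased from
 * the list above.
 */
static const char *
textprep_errstr(int errnum)
{
    switch (errnum) {
    case E2BIG:
        return ("output array too small");
    case EBADF:
        return ("conflicting option values");
    case EILSEQ:
        return ("input byte does not belong to UTF-8");
    case EINVAL:
        return ("incomplete UTF-8 character at end of input");
    case ERANGE:
        return ("unsupported Unicode version value");
    default:
        return ("unknown error");
    }
}
```

       A caller would typically consult such a helper only after
       u8_textprep_str() has returned (size_t)-1.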
DESCRIPTION
       The u8_textprep_str() function prepares the sequence of UTF-8
       characters in the array specified by inarray and stores the
       corresponding prepared UTF-8 characters in the array specified by
       outarray. The inarray argument points to the first character in the
       input array and inlen indicates the number of bytes to the end of
       the array to be converted. The outarray argument points to the first
       available byte in the output array and outlen indicates the number
       of available bytes to the end of the array. Unless flag includes
       U8_TEXTPREP_IGNORE_NULL, u8_textprep_str() stops when it encounters
       a null byte in the input array regardless of the current inlen
       value.
       Unless flag includes U8_TEXTPREP_IGNORE_INVALID, preparation stops
       after the previous successfully prepared character if a sequence of
       input bytes does not form a valid UTF-8 character, and stops after
       the previous successfully prepared bytes if the input array ends
       with an incomplete UTF-8 character. If the output array is not large
       enough to hold the entire prepared text, preparation stops just
       prior to the input bytes that would cause the output array to
       overflow. The value pointed to by inlen is decremented to reflect
       the number of bytes still not prepared in the input array. The value
       pointed to by outlen is decremented to reflect the number of bytes
       still available in the output array.
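       One way to cope with an output array that turns out to be too small
       is to retry with a larger one. The sketch below is an illustration
       under stated assumptions, not a library facility: textprep_alloc()
       and demo_prep() are hypothetical names. The helper takes a function
       pointer with the signature shown in the SYNOPSIS, so that
       u8_textprep_str() can be passed where it is available, and it
       restarts from the beginning of the input on each attempt rather
       than relying on partial progress. demo_prep() is only a stand-in
       that doubles every byte so the retry path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Same signature as u8_textprep_str(), per the SYNOPSIS. */
typedef size_t (*textprep_fn)(char *, size_t *, char *, size_t *, int,
    size_t, int *);

/*
 * Hypothetical helper: prepare in[0..inlen) into a malloc'd buffer,
 * doubling the buffer and starting over whenever the call fails with
 * E2BIG. On success, returns the buffer, stores the number of bytes
 * written in *produced and the call's return value in *ret. Returns
 * NULL with *errnum set on any other failure.
 */
static char *
textprep_alloc(textprep_fn prep, char *in, size_t inlen, int flag,
    size_t version, size_t *produced, size_t *ret, int *errnum)
{
    size_t cap = (inlen > 0) ? inlen : 1;

    assert(prep != NULL && produced != NULL && ret != NULL &&
        errnum != NULL);
    for (;;) {
        char *out = malloc(cap);
        size_t il = inlen;      /* reset: every attempt starts over */
        size_t ol = cap;

        if (out == NULL) {
            *errnum = ENOMEM;
            return (NULL);
        }
        *errnum = 0;
        *ret = prep(in, &il, out, &ol, flag, version, errnum);
        if (*ret != (size_t)-1) {
            *produced = cap - ol;   /* outlen was decremented */
            return (out);
        }
        free(out);
        if (*errnum != E2BIG || cap > (size_t)-1 / 2)
            return (NULL);
        cap *= 2;
    }
}

/*
 * Stand-in for demonstration only, NOT the library routine: writes each
 * input byte twice and fails with E2BIG when the output is too small.
 */
static size_t
demo_prep(char *in, size_t *il, char *out, size_t *ol, int flag,
    size_t version, int *errnum)
{
    (void) flag;
    (void) version;
    while (*il > 0) {
        if (*ol < 2) {
            *errnum = E2BIG;
            return ((size_t)-1);
        }
        *out++ = *in;
        *out++ = *in++;
        *ol -= 2;
        (*il)--;
    }
    return (0);
}
```

       Restarting from scratch costs extra work on large inputs, but it
       avoids any assumption about how much of the output array is valid
       after a failed call.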
RETURN VALUES
       The u8_textprep_str() function updates the values pointed to by the
       inlen and outlen arguments to reflect the extent of the preparation.
       When U8_TEXTPREP_IGNORE_INVALID is specified, u8_textprep_str()
       returns the number of illegal or incomplete characters found during
       the text preparation. When U8_TEXTPREP_IGNORE_INVALID is not
       specified and the text preparation is entirely successful, the
       function returns 0. If the entire string in the input array is
       prepared, the value pointed to by inlen will be 0. If the text
       preparation is stopped due to any of the conditions mentioned above,
       the value pointed to by inlen will be non-zero and errnum is set to
       indicate the error. If such an error or any other error occurs,
       u8_textprep_str() returns (size_t)-1 and sets errnum to indicate the
       error.
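       The three kinds of return value described above can be told apart
       with a few comparisons. This is a minimal sketch. The enum and the
       function prep_classify() are hypothetical names introduced for
       illustration. A non-zero count is only expected when
       U8_TEXTPREP_IGNORE_INVALID was specified.

```c
#include <stddef.h>

/* Hypothetical classification of a u8_textprep_str() return value. */
enum prep_status {
    PREP_CLEAN,     /* returned 0: nothing illegal or incomplete seen */
    PREP_SKIPPED,   /* returned N > 0: N invalid characters passed over */
    PREP_FAILED     /* returned (size_t)-1: consult errnum */
};

static enum prep_status
prep_classify(size_t ret)
{
    /* The failure value must be tested first; it is also non-zero. */
    if (ret == (size_t)-1)
        return (PREP_FAILED);
    return (ret == 0 ? PREP_CLEAN : PREP_SKIPPED);
}
```

       Comparing against (size_t)-1 before testing for a non-zero count
       matters because size_t is unsigned, so the failure value would
       otherwise look like a very large count.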
EXAMPLES
       Example 1: Simple UTF-8 text preparation
         #include <sys/param.h>
         #include <sys/u8_textprep.h>
         #include <errno.h>
         #include <string.h>
         .
         .
         .
         size_t ret;
         char ib[MAXPATHLEN];
         char ob[MAXPATHLEN];
         size_t il, ol;
         int err;
         .
         .
         .
         /*
          * We got a UTF-8 pathname from somewhere.
          *
          * Calculate the length of the input string including the
          * terminating null byte and prepare other arguments.
          */
         (void) strlcpy(ib, pathname, MAXPATHLEN);
         il = strlen(ib) + 1;
         ol = MAXPATHLEN;
         /*
          * Do toupper case folding, apply Unicode Normalization Form D,
          * ignore null bytes, and ignore any illegal/incomplete characters.
          */
         ret = u8_textprep_str(ib, &il, ob, &ol,
             (U8_TEXTPREP_IGNORE_NULL|U8_TEXTPREP_IGNORE_INVALID|
             U8_TEXTPREP_TOUPPER|U8_TEXTPREP_NFD), U8_UNICODE_LATEST, &err);
         if (ret == (size_t)-1) {
             if (err == E2BIG)
                 return (-1);
             if (err == EBADF)
                 return (-2);
             if (err == ERANGE)
                 return (-3);
             return (-4);
         }
ATTRIBUTES
       See attributes(7) for descriptions of the following attributes:
       +--------------------+-----------------+
       |  ATTRIBUTE TYPE    | ATTRIBUTE VALUE |
       +--------------------+-----------------+
       |Interface Stability | Committed       |
       +--------------------+-----------------+
       |MT-Level            | MT-Safe         |
       +--------------------+-----------------+
SEE ALSO
       u8_strcmp(3C), u8_validate(3C), attributes(7), u8_strcmp(9F),
       u8_textprep_str(9F), u8_validate(9F)
       The Unicode Standard (http://www.unicode.org)
NOTES
       After the text preparation, the number of prepared UTF-8 characters
       and the total number of bytes may be smaller or larger than the
       corresponding numbers for the input buffer.
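       As an illustration of why the byte count can grow, consider Unicode
       Normalization Form D applied to U+00E9 (LATIN SMALL LETTER E WITH
       ACUTE). The sketch below does not call the library: the two byte
       sequences are hand-encoded UTF-8, and nfd_growth() is a
       hypothetical name used only to show the size difference.

```c
#include <string.h>

/* U+00E9 as one precomposed character: 2 bytes in UTF-8. */
static const char precomposed[] = "\xC3\xA9";

/* Its canonical decomposition, "e" followed by U+0301: 3 bytes. */
static const char decomposed[] = "e\xCC\x81";

/* Extra output bytes needed relative to the input for this character. */
static size_t
nfd_growth(void)
{
    return (strlen(decomposed) - strlen(precomposed));
}
```

       Because a single character can expand in this way, an output array
       that is exactly as large as the input is not always sufficient.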
       Case conversions are performed using Unicode data of the
       corresponding version.  There are no locale-specific case conversions
       that can be performed.
                             September 18, 2007          U8_TEXTPREP_STR(3C)