Unicode::UTF8(3pm) | User Contributed Perl Documentation | Unicode::UTF8(3pm) |
Unicode::UTF8 - Encoding and decoding of UTF-8 encoding form
use Unicode::UTF8 qw[decode_utf8 encode_utf8];
use warnings FATAL => 'utf8'; # fatalize encoding glitches
$string = decode_utf8($octets);
$octets = encode_utf8($string);
This module provides functions to encode and decode UTF-8 encoding form as specified by Unicode and ISO/IEC 10646:2011.
$string = decode_utf8($octets);
$string = decode_utf8($octets, $fallback);
Returns a decoded representation of $octets in UTF-8 encoding as a character string.
$fallback is an optional "CODE" reference that provides a custom error-handling mechanism. By default, any ill-formed UTF-8 sequence, and any encoded code point that can't be interchanged, is replaced with REPLACEMENT CHARACTER (U+FFFD).
$string = $fallback->($octets, $is_usv, $position);
$fallback is invoked with three arguments: $octets, $is_usv and $position. $octets is a sequence of one or more octets containing the maximal subpart of the ill-formed subsequence, or the encoded code point which can't be interchanged. $is_usv is a boolean indicating whether or not $octets represents an encoded Unicode scalar value. $position is an unsigned integer containing the zero-based octet position at which the error occurred within the octets provided to "decode_utf8()". $fallback must return a character string consisting of zero or more Unicode scalar values. Unicode scalar values consist of code points in the ranges U+0000..U+D7FF and U+E000..U+10FFFF.
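The fallback contract described above can be sketched as follows. This is a minimal illustration, not a recommended policy: the input string and the "<hex>" replacement format are made up for the example, and it assumes the lone octet 0x80 is reported to the fallback as a single ill-formed subsequence.

```perl
use strict;
use warnings;
use Unicode::UTF8 qw(decode_utf8);

# Hypothetical input: ASCII text with one stray continuation octet.
my $octets = "foo\x80bar";

# The fallback receives the offending octets, a flag telling whether they
# encode a Unicode scalar value, and the zero-based octet position.
# Here only the octets are used; they are rendered as hex in angle brackets.
my $string = decode_utf8($octets, sub {
    my ($bad_octets, $is_usv, $position) = @_;
    return sprintf '<%s>', unpack 'H*', $bad_octets;
});

print "$string\n";
```

Returning an empty string from the fallback would drop the offending octets instead of marking them.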
$octets = encode_utf8($string);
$octets = encode_utf8($string, $fallback);
Returns an encoded representation of $string in UTF-8 encoding as an octet string.
$fallback is an optional "CODE" reference that provides a custom error-handling mechanism. By default, any code point that can't be interchanged or represented in UTF-8 encoding form is replaced with REPLACEMENT CHARACTER (U+FFFD).
$string = $fallback->($codepoint, $is_usv, $position);
$fallback is invoked with three arguments: $codepoint, $is_usv and $position. $codepoint is an unsigned integer containing the code point which can't be interchanged or represented in UTF-8 encoding form. $is_usv is a boolean indicating whether or not $codepoint is a Unicode scalar value. $position is an unsigned integer containing the zero-based character position at which the error occurred within the string provided to "encode_utf8()". $fallback must return a character string consisting of zero or more Unicode scalar values. Unicode scalar values consist of code points in the ranges U+0000..U+D7FF and U+E000..U+10FFFF.
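As a rough sketch of the encoding fallback, the example below encodes a string containing a code point above U+10FFFF, which can't be represented in UTF-8 encoding form. The input string, the @seen array and the "?" substitute are illustrative choices, not part of the module.

```perl
use strict;
use warnings;
use Unicode::UTF8 qw(encode_utf8);

# Hypothetical input: U+110000 lies outside the Unicode code space.
my $string = "a\x{110000}b";

my @seen;    # records the code points handed to the fallback
my $octets = encode_utf8($string, sub {
    my ($codepoint, $is_usv, $position) = @_;
    push @seen, $codepoint;
    return '?';    # must be a string of zero or more Unicode scalar values
});

printf "encoded: %s, rejected code point: 0x%X\n", $octets, $seen[0];
```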
$boolean = valid_utf8($octets);
Returns a boolean indicating whether or not the given $octets consist of well-formed UTF-8 sequences.
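A short sketch of how "valid_utf8()" might be used to screen input before decoding. The sample octet strings are chosen for illustration: a three-octet encoding of U+20AC, an overlong encoding of U+0000, and an encoded surrogate, the last two being ill-formed under the Unicode definition of UTF-8.

```perl
use strict;
use warnings;
use Unicode::UTF8 qw(valid_utf8);

my $euro      = "\xE2\x82\xAC";    # well-formed: U+20AC EURO SIGN
my $overlong  = "\xC0\x80";        # ill-formed: overlong encoding
my $surrogate = "\xED\xA0\x80";    # ill-formed: encoded surrogate U+D800

for my $octets ($euro, $overlong, $surrogate) {
    printf "%s: %s\n", unpack('H*', $octets),
        valid_utf8($octets) ? 'well-formed' : 'ill-formed';
}
```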
Nothing is exported by default. All functions can be exported using the ":all" tag or individually.
The "utf8" warning sub-categories "nonchar", "surrogate" and "non_unicode" are only available on Perl 5.14 or later. See perllexwarn for available categories and hierarchies.
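For illustration, the ":all" tag can be used to import every function at once; the round trip below is a minimal sketch with a made-up sample character.

```perl
use strict;
use warnings;
use Unicode::UTF8 qw(:all);    # decode_utf8, encode_utf8, valid_utf8

my $string = "\x{263A}";                # U+263A WHITE SMILING FACE
my $octets = encode_utf8($string);      # three octets: E2 98 BA
my $again  = decode_utf8($octets);      # back to a one-character string

printf "%s octets, round trip %s\n",
    length($octets), $again eq $string ? 'ok' : 'failed';
```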
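The sketch below shows one way the "utf8" warning category might be combined with "eval" so that ill-formed input raises an exception instead of being silently replaced. It assumes that a decoding error with no $fallback is reported through the "utf8" category; the input string is made up and the exact error text is not relied on.

```perl
use strict;
use warnings FATAL => 'utf8';    # turn utf8 warnings into exceptions
use Unicode::UTF8 qw(decode_utf8);

# Hypothetical input ending in an octet that never occurs in UTF-8.
my $string = eval { decode_utf8("abc\xFF") };
my $error  = $@;

if (defined $string) {
    print "decoded without errors\n";
}
else {
    print "decoding failed: $error";
}
```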
Here is a summary of features for comparison with Encode's UTF-8 implementation:
It's the author's belief that this UTF-8 implementation is conformant with the Unicode Standard Version 6.0. Any deviation from the Unicode Standard is to be considered a bug.
Please report any bugs by email to "bug-unicode-utf8 at rt.cpan.org", or through the web interface at <http://rt.cpan.org/Public/Dist/Display.html?Name=Unicode-UTF8>. You will be automatically notified of any progress on the request by the system.
This is open source software. The code repository is available for public review and contribution under the terms of the license.
<http://github.com/chansen/p5-unicode-utf8>
git clone http://github.com/chansen/p5-unicode-utf8
Christian Hansen "chansen@cpan.org"
Copyright 2011-2017 by Christian Hansen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
2022-12-04 | perl v5.36.0 |