utf8::all(3pm) | User Contributed Perl Documentation | utf8::all(3pm) |
utf8::all - turn on Unicode - all of it
version 0.024
    use utf8::all;                      # Turn on UTF-8, all of it.

    open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
    print length 'føø bār';             # 7 UTF-8 characters
    my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)
The "use utf8" pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. This also means that you can now use literal Unicode characters as part of strings, variable names, and regular expressions.
"utf8::all" goes further:
The pragma is lexically-scoped, so you can do the following if you had some reason to:
    {
        use utf8::all;
        open my $out, '>', 'outfile';
        my $utf8_str = 'føø bār';
        print length $utf8_str, "\n";   # 7
        print $out $utf8_str;           # out as utf8
    }
    open my $in, '<', 'outfile';        # in as raw
    my $text = do { local $/; <$in>};
    print length $text, "\n";           # 10, not 7!
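The difference between 7 and 10 in the example above is the difference between characters and UTF-8 encoded bytes. The following is a minimal sketch of that distinction using only the core Encode module rather than "utf8::all" itself; the variable names are illustrative, and the source file is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                       # the string literal below contains non-ASCII characters
use Encode qw(encode decode);

my $chars = 'føø bār';                     # a character string
my $bytes = encode('UTF-8', $chars);       # the bytes a UTF-8 output layer would write

my $char_count = length $chars;            # counts characters
my $byte_count = length $bytes;            # counts bytes: ø and ā take two bytes each
my $round_trip = decode('UTF-8', $bytes);  # decoding the bytes recovers the characters
```

Reading the file without a UTF-8 layer, as the last lines of the example above do, yields the byte string, which is why "length" reports the larger number.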
Instead of lexical scoping, you can also use "no utf8::all" to turn off the effects.
Note that the effect on @ARGV and the "STDIN", "STDOUT", and "STDERR" file handles is always global and cannot be undone!
As described above, when "utf8::all" is used from the "main" package, its default behaviour is to convert @ARGV, to open the "STDIN", "STDOUT", and "STDERR" file handles with UTF-8 encoding, and to override the "readlink" and "readdir" functions and the "glob" operators.
If you want to disable these features even when "utf8::all" is used from the "main" package, add the option "NO-GLOBAL" (or "LEXICAL-ONLY") to the use line. E.g.:
    use utf8::all 'NO-GLOBAL';
If on the other hand you want to enable these global effects even when "utf8::all" was used from another package than "main", use the option "GLOBAL" on the use line:
    use utf8::all 'GLOBAL';
"utf8::all" treats invalid code points (i.e., UTF-8 that does not map to a valid Unicode "character") as a fatal error.
For "glob", "readdir", and "readlink", you can change this behaviour by setting the package variable $utf8::all::UTF8_CHECK.
By default "utf8::all" marks decoding errors as fatal (default value for this setting is "Encode::FB_CROAK"). If you want, you can change this by setting $utf8::all::UTF8_CHECK. The value "Encode::FB_WARN" reports the encoding errors as warnings, and "Encode::FB_DEFAULT" will completely ignore them. Please see Encode for details. Note: "Encode::LEAVE_SRC" is always enforced.
Important: This setting only controls the handling of decoding errors in "glob", "readdir", and "readlink".
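To make the check values concrete, here is a small sketch of how two of them behave when passed directly to "Encode::decode" from the core Encode module. It does not load "utf8::all"; the sample byte string and variable names are assumptions made for illustration. With "utf8::all" loaded, the equivalent choice would be made by assigning one of these constants to $utf8::all::UTF8_CHECK.

```perl
use strict;
use warnings;
use Encode ();

# A Latin-1 encoded byte string; the final byte is not valid UTF-8.
my $bad = "caf\xE9";

# FB_CROAK: the decoding error is fatal, so the eval block fails.
# LEAVE_SRC keeps decode() from modifying $bad.
my $croaked = !eval {
    Encode::decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};

# FB_DEFAULT: no error is reported; the malformed byte is replaced
# by the Unicode replacement character U+FFFD.
my $lenient = Encode::decode('UTF-8', $bad, Encode::FB_DEFAULT() | Encode::LEAVE_SRC());

# With utf8::all loaded, the corresponding setting would be, for example:
#   $utf8::all::UTF8_CHECK = Encode::FB_WARN;
```

"Encode::FB_WARN" sits between the two: decoding stops at the malformed data and a warning is issued instead of an exception.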
If you use autodie, which is a great idea, you need to use at least version 2.12, released on June 26, 2012 <https://metacpan.org/source/PJF/autodie-2.12/Changes#L3>. Otherwise, autodie obliterates the IO layers set by the open pragma. See RT #54777 <https://rt.cpan.org/Ticket/Display.html?id=54777> and GH #7 <https://github.com/doherty/utf8-all/issues/7>.
Please report any bugs or feature requests on the bugtracker website <https://github.com/doherty/utf8-all/issues>.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8. The "readlink" and "readdir" functions and the "glob" operators are therefore not replaced on these systems.
This software is copyright (c) 2009 by Michael Schwern <mschwern@cpan.org>; he originated it.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
2018-01-06 | perl v5.26.1 |