I18N::Charset(3pm) | User Contributed Perl Documentation | I18N::Charset(3pm) |
I18N::Charset - IANA Character Set Registry names and Unicode::MapUTF8 (et al.) conversion scheme names
use I18N::Charset;

$sCharset = iana_charset_name('WinCyrillic');
# $sCharset is now 'windows-1251'
$sCharset = umap_charset_name('Adobe DingBats');
# $sCharset is now 'ADOBE-DINGBATS' which can be passed to Unicode::Map->new()
$sCharset = map8_charset_name('windows-1251');
# $sCharset is now 'cp1251' which can be passed to Unicode::Map8->new()
$sCharset = umu8_charset_name('x-sjis');
# $sCharset is now 'sjis' which can be passed to Unicode::MapUTF8->new()
$sCharset = libi_charset_name('x-sjis');
# $sCharset is now 'MS_KANJI' which can be passed to `iconv -f $sCharset ...`
$sCharset = enco_charset_name('Shift-JIS');
# $sCharset is now 'shiftjis' which can be passed to Encode::from_to()

I18N::Charset::add_iana_alias('my-japanese' => 'iso-2022-jp');
I18N::Charset::add_map8_alias('my-arabic' => 'arabic7');
I18N::Charset::add_umap_alias('my-hebrew' => 'ISO-8859-8');
I18N::Charset::add_libi_alias('my-sjis' => 'x-sjis');
I18N::Charset::add_enco_alias('my-japanese' => 'shiftjis');
The "I18N::Charset" module provides access to the IANA Character Set Registry names for identifying character encoding schemes. It also provides a mapping to the character set names used by the Unicode::Map and Unicode::Map8 modules.
So, for example, if you get an HTML document with a META CHARSET="..." tag, you can fairly quickly determine which Unicode::MapXXX module can be used to convert it to Unicode.
If you don't have the module Unicode::Map installed, the umap_ functions will always return undef. If you don't have the module Unicode::Map8 installed, the map8_ functions will always return undef. If you don't have the module Unicode::MapUTF8 installed, the umu8_ functions will always return undef. If you don't have the iconv library installed, the libi_ functions will always return undef. If you don't have the Encode module installed, the enco_ functions will always return undef.
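As a rough illustration of the META CHARSET use case and of the undef return described above, the sketch below maps a declared charset to an Encode name and decodes the document. The helper name decode_html_bytes, the regular expression, the sample bytes, and the ISO-8859-1 fallback are all assumptions made for this example, and Encode is used here instead of a Unicode::MapXXX module. I18N::Charset is loaded at run time so the script still runs where it is not installed.

```perl
use strict;
use warnings;
use Encode ();

# Load I18N::Charset at run time so this sketch still runs without it.
my $have_i18n = eval { require I18N::Charset; 1 };

# Hypothetical helper: decode raw HTML bytes using the charset named in
# a META tag, falling back to ISO-8859-1 when no usable name is found.
sub decode_html_bytes {
    my ($bytes) = @_;
    my ($declared) = $bytes =~ /charset\s*=\s*["']?([\w.:-]+)/i;

    my $enc;
    $enc = I18N::Charset::enco_charset_name($declared)
        if $have_i18n && defined $declared;
    $enc = 'iso-8859-1' unless defined $enc;    # assumed fallback

    return Encode::decode($enc, $bytes);
}

# Sample document: six bytes of windows-1251 text inside a paragraph.
my $html = qq{<meta charset="windows-1251"><p>\xCF\xF0\xE8\xE2\xE5\xF2</p>};
my $text = decode_html_bytes($html);

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

The explicit defined check matters because every lookup routine signals "no mapping" with undef rather than an exception.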
There are four main conversion routines: "iana_charset_name()", "map8_charset_name()", "umap_charset_name()", and "umu8_charset_name()". Companion routines cover MIME names, MIBenum numbers, iconv names, and Encode names.
$sCharset = iana_charset_name('WinCyrillic');
$sCharset = mime_charset_name('Extended_UNIX_Code_Packed_Format_for_Japanese');
$sCharset = enco_charset_name('Extended_UNIX_Code_Packed_Format_for_Japanese');
$sCharset = libi_charset_name('Extended_UNIX_Code_Packed_Format_for_Korean');
$sCharset = mib_to_charset_name('3');
$iMIB = charset_name_to_mib('US-ASCII');
$sCharset = map8_charset_name('windows-1251');
$sCharset = umap_charset_name('hebrew');
$sCharset = umu8_charset_name('windows-1251');
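The calls above can be combined. The following is a minimal sketch, not the module's own example: describe_charset is a hypothetical helper that resolves a name to its preferred IANA name and then looks up the MIBenum number. The name 'no-such-charset' is made up to exercise the undef path.

```perl
use strict;
use warnings;

# Load I18N::Charset at run time so this sketch still runs without it.
my $have_i18n = eval { require I18N::Charset; 1 };

# Hypothetical helper: return (IANA name, MIBenum) for a charset name,
# or an empty list when the module is missing or the name is unknown.
sub describe_charset {
    my ($name) = @_;
    return unless $have_i18n;
    my $iana = I18N::Charset::iana_charset_name($name);
    return unless defined $iana;
    my $mib = I18N::Charset::charset_name_to_mib($iana);
    return ($iana, $mib);
}

for my $name ('WinCyrillic', 'US-ASCII', 'no-such-charset') {
    my ($iana, $mib) = describe_charset($name);
    printf "%-16s => %s (MIBenum %s)\n", $name,
        $iana // 'unrecognized', $mib // 'n/a';
}
```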
There is one function which can be used to obtain a list of all IANA-registered character set names.
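The function's name is missing from this copy of the text. The sketch below assumes it is all_iana_charset_names(), as in the releases of the module I am aware of, and calls it fully qualified in case it is not exported by default. The ISO-8859 filter is only an illustration.

```perl
use strict;
use warnings;

# Load I18N::Charset at run time so this sketch still runs without it.
my $have_i18n = eval { require I18N::Charset; 1 };

# Assumption: the listing function is named all_iana_charset_names().
my @names = $have_i18n ? I18N::Charset::all_iana_charset_names() : ();

printf "%d IANA character set names known\n", scalar @names;

# Show only the ISO-8859 family, sorted for readability.
my @latin = sort { $a cmp $b } grep { /^ISO-8859-/i } @names;
print "  $_\n" for @latin;
```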
This module supports several semi-private routines for specifying character set name aliases.
Returns the target character set name of the successfully installed alias. Returns 'undef' if the target character set name is not registered, or if the target is itself an alias whose character set name is not registered.
I18N::Charset::add_iana_alias('my-alias1' => 'Shift_JIS');
With this code, "my-alias1" becomes an alias for the existing IANA character set name 'Shift_JIS'.
I18N::Charset::add_iana_alias('my-alias2' => 'sjis');
With this code, "my-alias2" becomes an alias for the IANA character set name referred to by the existing alias 'sjis' (which happens to be 'Shift_JIS').
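Because a failed installation is reported only through the return value, it can be worth checking it. A minimal sketch, assuming a hypothetical helper install_iana_alias that installs an alias and then resolves it again through iana_charset_name():

```perl
use strict;
use warnings;

# Load I18N::Charset at run time so this sketch still runs without it.
my $have_i18n = eval { require I18N::Charset; 1 };

# Hypothetical helper: install an alias and confirm that it resolves.
# Returns the resolved IANA name, or undef if nothing was installed.
sub install_iana_alias {
    my ($alias, $target) = @_;
    return unless $have_i18n;
    my $installed = I18N::Charset::add_iana_alias($alias => $target);
    return unless defined $installed;    # target not registered
    return I18N::Charset::iana_charset_name($alias);
}

my $direct  = install_iana_alias('my-alias1' => 'Shift_JIS');
my $chained = install_iana_alias('my-alias2' => 'sjis');

print 'my-alias1 => ', $direct  // 'not installed', "\n";
print 'my-alias2 => ', $chained // 'not installed', "\n";
```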
If the first argument is a registered IANA character set name, then all aliases of that IANA character set name will end up pointing to the target Map8 mapping name.
Returns the target mapping name of the successfully installed alias. Returns 'undef' if the target mapping name is not registered, or if the target is itself an alias whose mapping name is not registered.
I18N::Charset::add_map8_alias('normal' => 'ANSI_X3.4-1968');
With the above statement, "normal" becomes an alias for the existing Unicode::Map8 mapping name 'ANSI_X3.4-1968'.
I18N::Charset::add_map8_alias('normal' => 'US-ASCII');
With the above statement, "normal" becomes an alias for the existing Unicode::Map8 mapping name 'ANSI_X3.4-1968' (which is what "US-ASCII" is an alias for).
I18N::Charset::add_map8_alias('IBM297' => 'EBCDIC-CA-FR');
With the above statement, "IBM297" becomes an alias for the existing Unicode::Map8 mapping name 'EBCDIC-CA-FR'. As a side effect, all the aliases for 'IBM297' (i.e. 'cp297' and 'ebcdic-cp-fr') also become aliases for 'EBCDIC-CA-FR'.
Returns the target conversion scheme name of the successfully installed alias. Returns 'undef' if there is no such target conversion scheme or alias.
Examples:
I18N::Charset::add_libi_alias('my-chinese1' => 'CN-GB');
With this code, "my-chinese1" becomes an alias for the existing iconv conversion scheme 'CN-GB'.
I18N::Charset::add_libi_alias('my-chinese2' => 'EUC-CN');
With this code, "my-chinese2" becomes an alias for the iconv conversion scheme referred to by the existing alias 'EUC-CN' (which happens to be 'CN-GB').
Returns the target encoding name of the successfully installed alias. Returns 'undef' if there is no such encoding or alias.
Examples:
I18N::Charset::add_enco_alias('my-japanese1' => 'jis0201-raw');
With this code, "my-japanese1" becomes an alias for the existing encoding 'jis0201-raw'.
I18N::Charset::add_enco_alias('my-japanese2' => 'my-japanese1');
With this code, "my-japanese2" becomes an alias for the encoding referred to by the existing alias 'my-japanese1' (which happens to be 'jis0201-raw' after the previous call).
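An installed alias can then be resolved with enco_charset_name() and handed to Encode. The sketch below is an illustration only: to_utf8_via_alias, the alias 'my-cyrillic', and the sample bytes are made up for this example, and it assumes that 'cp1251' is accepted as a target encoding name.

```perl
use strict;
use warnings;
use Encode ();

# Load I18N::Charset at run time so this sketch still runs without it.
my $have_i18n = eval { require I18N::Charset; 1 };

# Hypothetical helper: convert bytes to UTF-8 through a user-defined
# alias. Returns undef when the alias cannot be installed or resolved.
sub to_utf8_via_alias {
    my ($bytes, $alias, $target) = @_;
    return unless $have_i18n;
    return unless defined I18N::Charset::add_enco_alias($alias => $target);
    my $enc = I18N::Charset::enco_charset_name($alias);
    return unless defined $enc;
    Encode::from_to($bytes, $enc, 'UTF-8');    # converts $bytes in place
    return $bytes;
}

# Three Cyrillic letters encoded as windows-1251 bytes.
my $utf8 = to_utf8_via_alias("\xCF\xF0\xE8", 'my-cyrillic' => 'cp1251');

print defined $utf8
    ? length($utf8) . " bytes of UTF-8\n"
    : "alias unavailable\n";
```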
Similarly, the only character set names which have a corresponding mapping in the Unicode::Map module are the character sets that Unicode::Map can convert.
Martin 'Kingpin' Thurn, "mthurn at cpan.org", <http://tinyurl.com/nn67z>.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
2021-02-27 | perl v5.32.1 |