Last updated: Sun, 21 May 2006

LVII. iconv Functions

Introduction

This module provides an interface to the iconv character set conversion facility. With it, you can convert a string from one character set to another, including Unicode encodings. The supported character sets depend on the iconv implementation of your system. Note that the iconv function on some systems may not work as you expect. In that case, consider installing the GNU libiconv library, which will most likely give more consistent results.

Since PHP 5.0.0, this extension comes with various utility functions that help you write multilingual scripts. The following sections describe these new features.
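
As a minimal sketch of the conversion described in the introduction, the following call turns an ISO-8859-1 string into UTF-8. The sample string and variable names are illustrative only, and the encoding names accepted depend on the iconv implementation in use.

```php
<?php
// "café" encoded in ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "caf\xE9";

// iconv(from, to, string) returns the converted string, or false on failure.
$utf8 = iconv("ISO-8859-1", "UTF-8", $latin1);

if ($utf8 === false) {
    echo "conversion failed\n";
} else {
    echo bin2hex($utf8), "\n"; // hex dump of the resulting UTF-8 bytes
}
```

Checking the return value against false matters because a failed conversion does not throw.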

Requirements

Nothing extra is needed on a recent POSIX-compliant system, because the standard C library supplied with it must provide the iconv facility. Otherwise, you have to install the libiconv library on your system.

Installation

To use functions provided by this module, the PHP binary must be built with the following configure line: --with-iconv[=DIR].

Note to Windows® Users: To enable this module in a Windows® environment, put the DLL file named iconv.dll (or iconv-1.3.dll prior to 4.2.1), which is bundled with the PHP/Win32 binary package, into a directory listed in the PATH environment variable or into one of the system directories of your Windows® installation.

As of PHP 5 this module is part of PHP, so iconv.dll and php_iconv.dll are no longer needed.

Runtime Configuration

The behaviour of these functions is affected by settings in php.ini.


Table 1. Iconv configuration options

Name                     Default       Changeable   Changelog
iconv.input_encoding     "ISO-8859-1"  PHP_INI_ALL  Available since PHP 4.0.5.
iconv.output_encoding    "ISO-8859-1"  PHP_INI_ALL  Available since PHP 4.0.5.
iconv.internal_encoding  "ISO-8859-1"  PHP_INI_ALL  Available since PHP 4.0.5.

For further details and definitions of the PHP_INI_* constants, see Appendix G.

Warning

Some systems (like IBM AIX) use "ISO8859-1" instead of "ISO-8859-1", so on those systems this spelling has to be used in configuration options and function parameters.

Note: Configuration option iconv.input_encoding is currently not used for anything.
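
The settings in Table 1 can be inspected at runtime with iconv_get_encoding(), which is listed in the table of contents of this page. The sketch below only reads the values; the variable names are illustrative. Note that later PHP releases deprecate the iconv.* ini settings, so this is best read as applying to the versions this page covers.

```php
<?php
// With no argument (or "all") the function returns an array of the settings.
$settings = iconv_get_encoding();
foreach ($settings as $name => $value) {
    echo $name, " = ", $value, "\n";
}

// A single setting can also be requested by name.
$internal = iconv_get_encoding("internal_encoding");
echo "internal: ", $internal, "\n";
```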

Resource Types

This extension has no resource types defined.

Predefined Constants

Since PHP 4.3.0 it is possible to identify at runtime which iconv implementation is adopted by this extension.

Table 2. iconv constants

Name           Type    Description
ICONV_IMPL     string  The implementation name
ICONV_VERSION  string  The implementation version

Note: Writing implementation-dependent scripts with these constants is strongly discouraged.
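
For diagnostics (not for branching program logic, in keeping with the note above), the two constants can simply be printed. This is a minimal sketch; the output differs from system to system.

```php
<?php
// Both constants are strings describing the iconv library PHP was built with.
echo "implementation: ", ICONV_IMPL, "\n";
echo "version: ", ICONV_VERSION, "\n";
```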

Since PHP 5.0.0, the following constants are also available:

Table 3. iconv constants available since PHP 5.0.0

Name                                 Type     Description
ICONV_MIME_DECODE_STRICT             integer  A bitmask used for iconv_mime_decode()
ICONV_MIME_DECODE_CONTINUE_ON_ERROR  integer  A bitmask used for iconv_mime_decode()
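
The following sketch shows how one of these bitmasks might be passed to iconv_mime_decode(). The header value is a made-up example carrying the word "Prüfung" as base64-encoded UTF-8; the variable names are illustrative.

```php
<?php
// An encoded-word carrying "Prüfung" as base64-encoded UTF-8.
$header = "Subject: =?UTF-8?B?UHLDvGZ1bmc=?=";

// The second argument is the mode bitmask. CONTINUE_ON_ERROR asks the
// decoder to keep going past malformed parts instead of giving up.
$decoded = iconv_mime_decode($header, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, "UTF-8");

echo $decoded, "\n";
```

The third argument names the character set of the returned string.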

Table of Contents
iconv_get_encoding -- Retrieve internal configuration variables of iconv extension
iconv_mime_decode_headers -- Decodes multiple MIME header fields at once
iconv_mime_decode -- Decodes a MIME header field
iconv_mime_encode -- Composes a MIME header field
iconv_set_encoding -- Set current setting for character encoding conversion
iconv_strlen -- Returns the character count of string
iconv_strpos -- Finds position of first occurrence of a needle within a haystack
iconv_strrpos -- Finds the last occurrence of a needle within the specified range of haystack
iconv_substr -- Cut out part of a string
iconv -- Convert string to requested character encoding
ob_iconv_handler -- Convert character encoding as output buffer handler
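
To illustrate how the character-aware string functions listed above differ from their byte-oriented counterparts, here is a small sketch on a UTF-8 string. The sample string and variable names are illustrative only.

```php
<?php
$s = "\xC3\xA9t\xC3\xA9"; // "été" in UTF-8: 3 characters, 5 bytes

$bytes = strlen($s);                        // counts bytes
$chars = iconv_strlen($s, "UTF-8");         // counts characters
$pos   = iconv_strpos($s, "t", 0, "UTF-8"); // character offset of "t"
$head  = iconv_substr($s, 0, 2, "UTF-8");   // first two characters

echo $bytes, " bytes, ", $chars, " characters, 't' at offset ", $pos, "\n";
```

Passing the charset explicitly avoids depending on the iconv.internal_encoding setting.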


User Contributed Notes
iconv Functions
CoolCode.cn

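A web page normally declares a character encoding such as GB2312, UTF-8, or ISO-8859-1 so that text in that encoding displays correctly. But we may well run into cases where we want to show Chinese characters on an ISO-8859-1 page, or Korean text on a GB2312 page.
One solution, of course, is to drop ISO-8859-1 or GB2312 and use UTF-8 for everything, which lets text in any language be mixed on one page. Many sites do this today.

That is not the method I describe here, because it only works when the character set really is UTF-8. If the user manually selects another character set, or the charset declaration fails to take effect for some reason and the browser does not detect it correctly, the page still shows up garbled. This happens in particular on framed pages: if the charset setting of a page inside a frame does not take effect, Firefox displays garbage and offers no way to change it (that is, without the RightEncode extension installed).

The method I present here displays Chinese, Japanese, and so on correctly even when the page is declared as ISO-8859-1. The principle is simple: every character other than the first 128 characters of ISO-8859-1 is written as an NCR (numeric character reference). For example, the two characters "汉字", written as "&#27721;&#23383;", display correctly under any character set. Based on this principle I wrote the program below, which converts an existing page into one that displays under any character set. You only need to specify the source page and its character set and click submit to get the target page. You can also convert only some text: enter it in the text box, specify its original character set, click submit, and the encoded text is shown on the page. I have also written a WordPress plugin, and my blog now displays correctly under any character set.

Implementation:
The first step is to convert the string from the source character set to UTF-16. This is done because every character in UTF-16 is two bytes (true for the Basic Multilingual Plane; characters outside it take a four-byte surrogate pair, which this simple approach does not treat specially), which makes the later processing easy, whereas working directly in the source character set is complicated. The source character set can be taken from the meta tag of the original page or specified separately. My program lets the user specify it in a form, because I cannot guarantee that the submitted file is an HTML file (other files work too; for example, the source of the WordPress Chinese language pack is a po file, and its content can be processed the same way), and even an HTML file does not necessarily contain a meta tag that declares the character set, so specifying it through the form is safer. You might think that converting one character set to another is complicated, and it is: implementing it yourself would be a great deal of work. In PHP it is easy, because the functions already exist. The iconv function converts between character sets; if the iconv extension is not installed on your machine, you can use mb_convert_encoding instead. If the Multibyte String extension is not installed either, you are out of luck, because implementing that many conversions yourself is practically impossible unless you are a top-notch guru! I recommend iconv because it is more efficient and supports more character sets.

After that step, process the string two bytes at a time. The two bytes converted directly to a number give the xxxxx in &#xxxxx;. If the number is less than 128, use the character itself (note that it becomes a single byte here); otherwise use the &#xxxxx; form. One thing to note: when the number is 65279 (0xFEFF in hexadecimal), skip it. It is the Unicode byte order mark, and since our string now contains only the first 128 characters of ISO-8859-1, we no longer need it.
That is the basic idea. Here is the implementation: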

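<?php
// Convert $str from the $encode character set into ASCII plus numeric
// character references, so it displays under any page charset.
// Characters outside the BMP (surrogate pairs) are not treated specially.
function nochaoscode($encode, $str) {
    $str = iconv($encode, "UTF-16BE", $str);
    $output = "";
    $len = strlen($str);
    for ($i = 0; $i + 1 < $len; $i += 2) {
        // Combine two big-endian bytes into one UTF-16 code unit.
        $code = ord($str[$i]) * 256 + ord($str[$i + 1]);
        if ($code < 128) {
            $output .= chr($code);
        } else if ($code != 65279) { // skip the byte order mark 0xFEFF
            $output .= "&#" . $code . ";";
        }
    }
    return $output;
}
?>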

In the function's parameters, $encode is the source character set and $str is the string to convert. The return value is the converted string.
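
As a hedged variation on the same idea, the sketch below goes through UTF-32BE instead of UTF-16BE, so that every code point, including those outside the Basic Multilingual Plane, is one fixed four-byte unit. The function name to_ncr is hypothetical and not part of the original note, and it assumes the local iconv implementation supports the "UTF-32BE" encoding name.

```php
<?php
// Hypothetical variant: encode everything outside ASCII as a numeric
// character reference, using UTF-32BE as the intermediate form.
function to_ncr($encode, $str) {
    $utf32 = iconv($encode, "UTF-32BE", $str);
    if ($utf32 === false || $utf32 === "") {
        return $utf32; // conversion failed, or nothing to do
    }
    $output = "";
    // unpack("N*") reads consecutive big-endian 32-bit integers.
    foreach (unpack("N*", $utf32) as $code) {
        if ($code < 128) {
            $output .= chr($code);
        } elseif ($code != 0xFEFF) { // skip a byte order mark
            $output .= "&#" . $code . ";";
        }
    }
    return $output;
}

// "A" followed by the UTF-8 bytes of the two characters 汉字.
$result = to_ncr("UTF-8", "A\xE6\xB1\x89\xE5\xAD\x97");
echo $result, "\n";
```

The trade-off is a larger intermediate string in exchange for not having to think about surrogate pairs.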

Source: CoolCode.CN

08-Nov-2005 01:05
But this is a very slow method to convert this:

// function to change german umlauts into ue, oe, etc.
function cv_input($str){

Better try this:
$tr = array(/* chr(xyz) => '...', */ chr(160) => ' '); // Just a simple example (xyz is a placeholder), put all your characters in there
$string = strtr($string, $tr);
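
A fuller sketch of the strtr() approach suggested in this note, applied to German umlauts in a UTF-8 string. The function name umlauts_to_ascii and the choice of "ss" for ß are assumptions made for illustration, not part of the original note.

```php
<?php
// Hypothetical helper: transliterate German umlauts in a UTF-8 string.
function umlauts_to_ascii($str) {
    // Keys are the UTF-8 byte sequences of the characters to replace.
    $map = array(
        "\xC3\xA4" => "ae", "\xC3\xB6" => "oe", "\xC3\xBC" => "ue",
        "\xC3\x84" => "Ae", "\xC3\x96" => "Oe", "\xC3\x9C" => "Ue",
        "\xC3\x9F" => "ss", // assumption: ß becomes "ss"
    );
    // With an array argument, strtr() tries the longest keys first and
    // never rescans text it has already replaced.
    return strtr($str, $map);
}

$plain = umlauts_to_ascii("M\xC3\xBCller"); // "Müller" in UTF-8
echo $plain, "\n";
```

Because whole two-byte sequences are matched, other multibyte characters that happen to share a byte with an umlaut are left alone.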
Christophe Lienert
27-Sep-2005 03:09
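In addition to The Godfather's note below, you may find this function useful as well.

// Function to change German umlauts into ue, oe, etc.
// Assumes UTF-8 input, where each umlaut is the lead byte 195 (0xC3)
// followed by one of the bytes handled below.
// Note: bytes are matched one at a time, so other multibyte characters
// that contain these byte values are altered too.
function cv_input($str){
     $out = "";
     $len = strlen($str);
     for ($i = 0; $i < $len; $i++){
           $ch = ord($str[$i]);
           switch($ch){
               case 195: $out .= ""; break;   // drop the UTF-8 lead byte
               case 164: $out .= "ae"; break;
               case 188: $out .= "ue"; break;
               case 182: $out .= "oe"; break;
               case 132: $out .= "Ae"; break;
               case 156: $out .= "Ue"; break;
               case 150: $out .= "Oe"; break;
               default : $out .= chr($ch);
           }
     }
     return $out;
}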
The Godfather
15-Dec-2004 06:36
With this function you can translate the German symbols from one single-byte character set to another. (The byte values below map ISO-8859-1/windows-1252 code points to their DOS code page 437/850 equivalents; the input is not UTF-8.)

function convert_text($str){
  $out = '';
  $len = strlen($str);
  for ($i = 0; $i < $len; $i++){
   $ch = ord($str[$i]);
   switch($ch){
         case 252: $out .= chr(129); break; // u umlaut
         case 220: $out .= chr(154); break; // U umlaut
         case 228: $out .= chr(132); break; // a umlaut
         case 196: $out .= chr(142); break; // A umlaut
         case 214: $out .= chr(153); break; // O umlaut
         case 246: $out .= chr(148); break; // o umlaut
         case 223: $out .= chr(225); break; // sharp s (SZ)
         default : $out .= chr($ch);
   }
  }
  return $out;
}
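
A hand-written byte table like this can often be left to iconv itself. The sketch below assumes the local iconv implementation knows the "CP850" encoding name, which is not guaranteed on every system; the variable names are illustrative.

```php
<?php
// "ü" in ISO-8859-1 is byte 0xFC (252); in code page 850 it is 0x81 (129).
$latin1 = "\xFC";

$dos = iconv("ISO-8859-1", "CP850", $latin1);

if ($dos === false) {
    echo "CP850 is not supported by this iconv implementation\n";
} else {
    echo bin2hex($dos), "\n"; // hex dump of the converted byte
}
```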
tokiee at hotmail dot com
19-Aug-2004 02:40
iconv is now built in, at least in PHP >= 5.0.1 for win32. You don't have to modify php.ini for this, and in fact you should not. Clearly, libiconv does not need to be installed either.
thierry.bo
23-Dec-2003 03:26
Windows users:

Personally, I left all the PHP DLLs in the \php\dlls\ directory and simply added this path to my system path. The iconv.dll supplied with PHP 4.3.2 works fine that way, with the supplied php_iconv.dll left in my \php\extensions\ directory. This worked fine with Apache and the Omnihttpd server I use.

As soon as I installed IIS on the same server, PHP complained that it could not find php_iconv.dll in the extensions directory. In fact, PHP with IIS loads every extension in my \php\extensions directory correctly except php_iconv.dll.
Although iconv.dll is in my system path, the only way to load php_iconv.dll was to copy iconv.dll into the \%winnt\system32 directory. With other servers, iconv.dll can be anywhere in the system path.
ALecFFer
06-Nov-2003 05:10
To Windows users:

Download iconv version 1.9.1 here:
http://www.zlatkovic.com/pub/libxml/iconv-1.9.1.win32.zip
13-Sep-2002 06:23
I'm not sure how recent a version of glibc 2.x Slackware 7.x/8.x comes with, but it very likely comes with glibc 2.2.x. In that case, you don't need to install libiconv in /usr/local at all. iconv(3) in glibc 2.2.x is very good (thanks to Ulrich Drepper and Bruno Haible; the latter is the author of libiconv). libiconv is very handy for outdated or non-standards-compliant Unix and non-Unix systems that lack a sufficiently good iconv(3) in their C library.
elk at NOSPAMmodel-fx dot com
26-Jul-2002 10:07
If you use the libiconv library instead of the libc's iconv support, don't forget to use libiconv() instead of iconv()
elk at NOSPAMmodel-fx dot com
25-Jul-2002 09:39
To compile libiconv under Slackware 7.0 or 8.0 without errors (either with the apache module of PHP or the CGI version), you must specify the full path of the libiconv installation.

Example:

       --with-iconv=/usr/local
