MySQL 5.1 Reference Manual :: 10 Character Set Support :: 10.4 Connection Character Sets and Collations


10.4. Character Encoding, Internal Codes, and Chinese Character Encodings

 

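Early ASCII mainly covered English letters and digits. To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese. GB2312 contains 7,445 characters, which turned out to be too few. GBK 1.0, the Chinese character extension specification of 1995, contains 21,886 symbols. GB18030, published in 2000, added the scripts of the major ethnic minorities, such as Tibetan, Mongolian, and Uyghur, and contains 27,484 Chinese characters.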

However, ordinary users rarely need the characters that GB18030 added, so the default internal code of a Chinese Windows operating system is usually still GBK or Big5.

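As the figure at right shows, later character tables were built on top of earlier ones, so these encodings are backward compatible. For example, the character B has the same code, 66, in ASCII and in GB18030.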

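Unicode, by contrast, is compatible only with ASCII (more precisely, with ISO-8859-1) and is not compatible with the GB codes. Big5 and the GB codes are independent character sets, also known as Far East character sets. For example, the Unicode code of 汉 is 6C49, while its GB code is BABA.
If a Unicode string in a resource cannot be mapped to a character in the current code page, ?? is displayed instead. To learn more, click through to the expert's original article, an excellent piece that makes a complicated subject simple.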
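
The two codes quoted for 汉 can be checked with any language that ships the relevant codecs. The following is a minimal sketch in Python (not part of the original page; the helper name `describe` is made up for illustration). It prints the Unicode code point next to the bytes the character takes in GB2312 and UTF-8, and confirms that an ASCII letter such as B keeps the value 66 in every encoding listed.

```python
# Compare the Unicode code point of a character with its bytes in
# several encodings. `describe` is an illustrative helper name.
def describe(ch):
    return {
        "code_point": format(ord(ch), "04X"),
        "gb2312": ch.encode("gb2312").hex().upper(),
        "utf-8": ch.encode("utf-8").hex().upper(),
    }

info = describe("汉")
print(info)

# ASCII characters keep the same single byte in all of these encodings.
b_codes = {
    enc: list("B".encode(enc))
    for enc in ("ascii", "gb2312", "gb18030", "utf-8")
}
print(b_codes)
```

The code point (6C49) and the GB bytes (BABA) are different numbers for the same character, which is what "not compatible" means here: text has to be converted, not merely relabeled, when moving between the two.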

Binary     Decimal  Hex  Graphic
0100 0000  64       40   @
0100 0001  65       41   A
0100 0010  66       42   B
0100 0011  67       43   C
0101 1010  90       5A   Z
0101 1011  91       5B   [
0101 1100  92       5C   \

Character sets (also called character repertoires) ....... Wikipedia

The table is an abridged excerpt of ASCII (American Standard Code for Information Interchange). The full table is lengthy; for the complete text see ......... Wikipedia
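
The rows of the ASCII excerpt can be regenerated rather than copied by hand. This is a small sketch in Python, not taken from the original page; `ascii_row` is a hypothetical helper name.

```python
# Rebuild the binary / decimal / hex / graphic columns for a few
# ASCII characters. `ascii_row` is an illustrative helper name.
def ascii_row(ch):
    code = ord(ch)
    bits = format(code, "08b")
    return (bits[:4] + " " + bits[4:], code, format(code, "02X"), ch)

rows = [ascii_row(ch) for ch in "@ABCZ[\\"]
for binary, decimal, hexa, graphic in rows:
    print(binary, decimal, hexa, graphic)
```

The last row, the backslash at 5C, is the one that matters for the Big5 problem discussed further down.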

Why use UTF-8

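Unicode is a very large table that assigns a code to the characters and punctuation marks of the world's languages, and UTF-8 (called utf8 in MySQL) is one of the ways of encoding it. UTF stands for Unicode Transformation Format (or, by another account, UCS Transformation Format), that is, a way of transforming Unicode into a particular byte format.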

This site uses UTF-8. A web page authored in UTF-8 can display text in any language, which is why many web pages now take this approach:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

The Big5 5C problem: 許 功 蓋 餐 .......

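In Big5, 許, 功, and 蓋 are encoded as 許 (B35C), 功 (A55C), and 蓋 (BB5C), and the character \ in the last row of the table also has the code 5C.
PHP automatically puts a \ in front of every \ (for example through magic quotes or addslashes()), so after passing through PHP, 功 becomes 功\ : the client sends 功 (A5 5C) and the server receives 功 (A5 5C 5C). Once that is stored in the database, the trouble starts ........more
UTF-8 does not have this problem, because every byte of a multibyte UTF-8 character is 80 or higher and so can never be mistaken for 5C. 許功蓋 are encoded as 許 (E8A8B1), 功 (E58A9F), and 蓋 (E8938B).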
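
The corruption described above can be reproduced without PHP. The sketch below (Python, not from the original page) imitates a byte-wise escaping step that knows nothing about multibyte characters; `naive_escape` is a hypothetical stand-in for that step, not PHP's actual implementation.

```python
def hex_bytes(data):
    return data.hex().upper()

def naive_escape(data):
    # Byte-wise escaping that is unaware of multibyte characters:
    # every 0x5C byte gets another 0x5C placed in front of it.
    return data.replace(b"\x5c", b"\x5c\x5c")

big5 = "功".encode("big5")
escaped = naive_escape(big5)
utf8 = "功".encode("utf-8")

# Big5: the trail byte 5C is doubled, so the stored data is damaged.
print(hex_bytes(big5), "->", hex_bytes(escaped))
# UTF-8: no byte equals 5C, so the escaping step leaves it alone.
print(hex_bytes(utf8), "->", hex_bytes(naive_escape(utf8)))

# Which of the three example characters contain a 5C byte in Big5?
affected = [ch for ch in "許功蓋" if b"\x5c" in ch.encode("big5")]
print(affected)
```

Escaping that is aware of the connection character set avoids the damage, which is one more reason to make sure the connection character set is declared correctly.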

可慧网络 has a lot of related material.

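One kind of garbled-Chinese problem arises when you receive a batch of Chinese data, do not know whether it is Big5 or UTF-8, and store it in MySQL Server anyway.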

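When we see 功 on the screen, the Windows operating system is mapping the internal code 1010 0101 0101 1100 (A5 5C) to a glyph in a bitmap font and drawing it on the screen.
As the figure at right shows, 功 (A5 5C) and 功 (E58A9F) look identical on the surface.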

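You can follow the four steps in the figure at right to confirm that the encoding of the web page itself matches the encoding MySQL is using.
You can also check Dreamweaver's Edit / Preferences / New Document / Default encoding setting.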



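UltraEdit already supports UTF-16, BOM, and big-endian byte order.
If there is not much data, use Notepad (bundled with Windows) or UltraEdit to re-encode the data as UTF-8 with Save As, then load it into MySQL. If there is a lot of data, try mb_convert_encoding from the PHP manual, shown below.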

CREATE DATABASE `daily-food` 
DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;


USE `daily-food` ;


CREATE TABLE `healthy-food` (
`早上` VARCHAR( 50 ) NULL ,
`中午` VARCHAR( 50 ) NOT NULL ,
`晚上` VARCHAR( 50 ) NULL 
) ENGINE = InnoDB;


INSERT INTO `healthy-food` ( `早上` , `中午` , `晚上` ) 
VALUES (
'烤地瓜', '林光常健康餐', '全麦面包'
);

 
mysql> SELECT * FROM `daily-food`.`healthy-food` ;
+------+---------+------+
| ??   | ??      | ??   |
+------+---------+------+
| ???  | ??? ??? | ???? |
+------+---------+------+
1 row in set (0.00 sec)
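
The page does not show the session settings behind the output above, so the following is only an assumption: question marks of this kind typically appear when text is converted to a character set, such as latin1, that has no Chinese characters. The Python sketch below mimics that lossy conversion with the standard codecs; it is not MySQL's conversion code, and `to_latin1_lossy` is a made-up helper name.

```python
def to_latin1_lossy(text):
    # errors="replace" substitutes "?" for every character that
    # latin-1 cannot represent.
    return text.encode("latin-1", errors="replace").decode("latin-1")

row = ["烤地瓜", "林光常健康餐", "全麦面包"]
converted = [to_latin1_lossy(value) for value in row]
print(converted)
```

Each Chinese character collapses into a single question mark, and the original text cannot be recovered from the result.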
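<?php
// The following script is from the PHP 5 book of 500 function examples
// by 高島 優作 (an excellent book).
// It assumes this source file itself is saved in Big5.

$s = "從「big5」轉換成「UTF-8」\n";
$a = array("將陣列所有元素", "一起從「big5」", "轉換成「UTF-8」\n");

header("Content-Type: text/plain; charset=UTF-8");

// Convert a single string from Big5 to UTF-8.
echo "[mb_convert_encoding]\n";
echo mb_convert_encoding($s, "UTF-8", "big5") . "\n";

// Convert every element of the array in place.
echo "[mb_convert_variables]\n";
mb_convert_variables("UTF-8", "big5", $a);
echo implode("", $a);
?>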

To learn more, see the PHP manual: mb-convert-encoding ................. mb_convert_variables
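
For comparison, the same Big5 to UTF-8 conversion can be sketched in Python with the standard codecs (this is not from the original page, and both function names are illustrative). The second helper is a rough check for the situation described earlier, where a batch of data arrives and it is unclear whether it is already UTF-8; it is a heuristic, not reliable encoding detection.

```python
def big5_to_utf8(data):
    """Re-encode Big5 bytes as UTF-8 bytes.

    Raises UnicodeDecodeError if the input is not valid Big5.
    """
    return data.decode("big5").encode("utf-8")

def looks_like_utf8(data):
    # Heuristic only: valid UTF-8 decodes cleanly, and most Big5 text
    # does not, because Big5 lead bytes are rarely valid UTF-8 sequences.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

source = "許功蓋".encode("big5")
converted = big5_to_utf8(source)

print(looks_like_utf8(source), source.hex().upper())
print(looks_like_utf8(converted), converted.hex().upper())
```

Running a check like this before loading data into MySQL helps avoid converting text that is already UTF-8 a second time.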


 

 


User Comments

Posted by Peter Didenko on April 23 2004 4:58am

This example is useful for Russian users who want windows-1251 encoding on the site and koi8-r encoding in the database:

set CHARACTER SET cp1251_koi8

Posted by N Bernhardt on August 17 2005 4:51pm

If you are wondering why -despite all UTF8 settings- you still don't get non-ASCII characters right, it might be the case that the _connection_ character set is still standard latin1.

To change the connection charset permanently to UTF-8, add the following line to the [mysqld] section of the option file:

[mysqld]
init-connect='SET NAMES utf8'

(init is short for initial.)
The other way to let MySQL know what connection charset you intend to use is per-connection based. After a connection is established (with host, name, password), add the following two lines in your application:

SET NAMES utf8;
SET CHARACTER SET utf8;

The last hint is the one given most of the time, but not everybody is happy to change every application (especially when some lazy add-on and extension programmers use their own connection code instead of the (PHP) application's).

Posted by Roger Wu on September 30 2005 1:41am

A SET CHARACTER SET x statement is equivalent to these three statements:

mysql> SET character_set_client = x;
mysql> SET character_set_results = x;
mysql> SET collation_connection = @@collation_database;

When a client connects, it sends to the server the name of the character set that it wants to use. The server sets the character_set_client, character_set_results, and character_set_connection variables to that character set. (In effect, the server performs a SET NAMES operation using the character set.)

With the mysql client, it is not necessary to execute SET NAMES every time you start up if you want to use a character set different from the default. You can add the --default-character-set option to the mysql command line or to your option file. For example, the following option file setting changes the three connection-related character set variables to koi8r each time you run mysql:

[mysql]
default-character-set=koi8r

Example: Suppose that column1 is defined as CHAR(5) CHARACTER SET latin2. If you do not say SET NAMES or SET CHARACTER SET, then for SELECT column1 FROM t, the server sends back all the values for column1 using the character set that the client specified when it connected. On the other hand, if you say SET NAMES 'latin1' or SET CHARACTER SET latin1, then just before sending results back, the server converts the latin2 values to latin1. Conversion may be lossy if there are characters that are not in both character sets.

If you do not want the server to perform any conversion, set character_set_results to NULL:

mysql> SET character_set_results = NULL;






real_vine@hotmail.com