MySQL 5.0 Reference Manual :: 5 Database Administration :: 5.11 MySQL Localization and International Usage :: 5.11.7 Problems With Character Sets

5.11.7. Problems With Character Sets

If you try to use a character set that is not compiled into your binary, you might run into the following problems:

  • Your program uses an incorrect path to determine where the character sets are stored (default: /usr/local/mysql/share/mysql/charsets). This can be fixed by using the --character-sets-dir option when you run the program in question.

  • The character set is a multi-byte character set that cannot be loaded dynamically. In this case, you must recompile the program with support for the character set.

  • The character set is a dynamic character set, but you do not have a configuration file for it. In this case, you should install the configuration file for the character set from a new MySQL distribution.

  • If your Index file does not contain the name for the character set, your program displays the following error message:

    ERROR 1105: File '/usr/local/share/mysql/charsets/?.conf'
    not found (Errcode: 2)
    

    In this case, you should either get a new Index file or manually add the name of any missing character sets to the current file.
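The last check can be illustrated with a short Python sketch. It assumes the old plain-text Index layout of one "name number" pair per line with # comments (the shape of the "cp1251 51" row quoted in the user comments on this page). The function name and the sample contents are hypothetical, and newer distributions ship Index.xml instead, which this sketch does not handle.

```python
def parse_index(text):
    """Parse old-style Index contents into a {number: name} mapping."""
    charsets = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, number = parts
        charsets[int(number)] = name
    return charsets


# Hypothetical Index contents, for illustration only.
SAMPLE_INDEX = """\
# name   number
latin1   8
cp1251   51
"""

charsets = parse_index(SAMPLE_INDEX)
for wanted in (51, 14):
    # A number with no row here is the situation that produces the '?.conf' error.
    print(wanted, charsets.get(wanted, "not listed: add a row to Index"))
```

A number that is not listed is the case in which the missing name has to be added to the Index file by hand.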

For MyISAM tables, you can check the character set name and number for a table with myisamchk -dvv tbl_name.


User Comments

Posted by Alexey Mikhailov on October 25 2003 11:15pm

This is a bug in PHP, not in MySQL (PHP 4.3.1, MySQL 3.23.51, Apache 2.0.43, Windows XP). I agree with Lubomir Krajcovic about the first item, but the second didn't work. I changed "c:\mysql\" to "c:\mysql3" not in libmysql.dll but in php4ts.dll.

Posted by Eugene on February 26 2004 12:24am

Recently I was working on a C++ program (Visual Studio 7.0) that should run as a service on a Win2k SP3 station, stuffing some data into a remote mysql-3.23.53 server via ODBC 3.51. It worked fine in user mode, but when I tried to run it as a service it instantly died with an error:

File 'C:\mysql\\share\charsets\?.conf' not found (Errcode: 2)
Character set '#14' is not a compiled character set and is not specified in the 'C:\mysql\\share\charsets\Index' file.

A web search brought no success, as most people had this trouble with the mysql server service, not with an ODBC client. It was great luck indeed when I ran into a Russian USENET thread with a very strange but effective solution:
I just copied the entire contents of the "/usr/local/mysql/share/mysql/charsets" directory from my remote FreeBSD 4.4 box to "C:\mysql\share\charsets\" on my Win2k SP3 station. And it worked!
I would really appreciate an explanation of such strange behaviour from anybody with a deeper understanding of MySQL and MySQL ODBC 3.51.

Posted by Alex Smith on October 23 2004 12:28am

I also got all that pretty stuff with cp1251:
1) created an empty database
2) restored data into it from an SQL file
3) ran myisamchk - ALL tables with varchar columns (i.e. Russian strings) were reported as having errors! In a clean database!!!
I was really confused... :(((
Then I found some good advice somewhere - I wrote "character-sets-dir=" into the [client] and [myisamchk] sections of my.cnf as well (before, it was only in [mysqld]) and "default-character-set=cp1251" in [client] (don't know if it's really needed), then ran myisamchk with "--set-character-set=cp1251" on all tables - MIRACLE! All the damn things were fixed! ...
Still, I don't understand why myisamchk did not understand the command-line option "--character-sets-dir" and only modifying my.cnf helped. I wonder...

Posted by Alexandre Koriakine on November 11 2004 11:46am

File 'C:\mysql\\share\charsets\?.conf' not found (Errcode: 22)
Character set '#51' is not a compiled character set and is not specified in the 'C:\mysql\\share\charsets\Index' file

I had the same error on Windows XP. Copying the files from both Linux and BSD installations didn't help me. I even have an installation in c:\program files\mysql.

My solution: I just created c:\mysql\share\charsets, copied all the files from the original folder into it, and added a new row to the "Index" file (with the encoding I use):
cp1251 51

Now I don't get any errors. On all my Linux/BSD machines I've never seen this error.

Posted by K. Swartz on January 1 2005 4:39am

The C:\mysql directory appears to be hardcoded only for looking up information on character sets that are not compiled into the installation. Given that, here is a better way to work around the problem:

First, go into mysql for your database, run "show collation;", and look for collations with "Yes" under the Compiled column. Then edit your my.cnf file and set collation-server (under [mysqld]) to one of these collations -- preferably the one closest to what your data uses. Bounce the mysqld server, and the errors should go away.

Posted by Andrew Smyk on January 16 2005 11:16am

I had this problem when my MySQL was using utf-8. The file utf-8.xml was not present where it has to be: c:\mysql\share\charsets. I changed the default encoding to cp1250 and now it is working.

Posted by Alexander Amelkin on January 21 2005 5:01pm

The "?.conf" problem may also be caused by an outdated DBD::mysql Perl package.

If you used an old version of MySQL (say, 3.23) and then upgraded your system to MySQL 4.1, you may encounter this problem with your Perl scripts.

Just run cpan and issue an 'install DBD::mysql' command.
After you update your DBD::mysql package the charset problem should be gone.

At least it worked for me.

Posted by Alexander Splinter on February 10 2005 1:41pm

Hi guys!
Here is an OVERALL solution for 4.1 that makes your PHP code (for example) "cross-hosted", i.e. it will not depend on the MySQL server charset options (very useful when the host admin is not available :) ):

After connecting (@mysql_connect) and selecting a database (@mysql_select_db), you should do the following:
$set = @mysql_query ('SET NAMES CP1251');
$set = @mysql_query ('SET COLLATION_CONNECTION=CP1251_GENERAL_CI');

Maybe just the first SET is enough.

Remove ANY stuff concerning charsets from my.ini (my.cnf), just forget it, and set the connection properties as above. Some stupid lines in the error log will be gone as a bonus :)

Posted by Brad Rathbun on March 12 2005 6:57pm

The comments by Gotz helped me out a lot. I thought I would also add the following comments to expand on his line of thought:

This is a quick hack to generate the necessary Index file:
#!/bin/bash
# Append one "filename number" row to Index for every charset *.xml file
# in the current directory, numbering them in sorted order.
mv Index.xml Index.xml.tmp    # keep Index.xml out of the glob below
i=1
for filename in *.xml         # the shell expands globs in sorted order
do
    echo "$filename $i" >> Index
    i=$((i + 1))
done
mv Index.xml.tmp Index.xml    # put the original Index.xml back

I would also like to mention that I got the errors when I configured using "--with-charset=utf8". When I removed that, the problem went away. The docs say that latin1 is the default, which is good enough for me. The easy fix is to avoid specifying the charset if you can live with the defaults.

Posted by B!rd Feathery on March 17 2005 8:04pm

Hello. I am using ActivePerl 5.8.6.811 and MySQL 4.1.9 (in "D:\web\mysql") on Win2k SP4.
I got the error
"File 'C:\mysql\\share\charsets\?.conf' not found (Errcode: 2)Character set '#51' is not a compiled character set and is not specified in the 'C:\mysql\\share\charsets\Index' file"
when connecting using DBI:mysql.

Copying to C:\mysql\share... did not solve this problem.
So I just installed DBI:mysqlPP and now use "DBI->connect('DBI:mysqlPP...". It works!

Posted by DOGUS YILDIRIM on May 25 2005 10:49am

Ha haa :)) I know the answer, finally!!!
If you see this error, just create "C:/mysql/share/" like Alexandre said, and copy a working "charsets" folder into it. I use mysql 4.1.11, and the files in its charsets folder are XML files, and they didn't work. I copied the correct files from a mysql 3.23.33 charsets folder, which contains .conf files. Don't edit any .ini file! Just do what I said, and refresh your page...
Have fun!!!

Posted by Ozan Hazer on August 27 2005 5:09pm

Windows XP, MySQL 4.1.14
About the charset problem: I added character-sets-dir="C:\Program Files\MySQL\MySQL Server 4.1\\share\charsets"
under [client] in my.ini and it worked!

Watch out for the double \ before share...
\s is converted to a space character, so we have to escape the backslash...

Posted by bossk on November 10 2005 5:28pm

I had the same problem.
I ran a check with myisamchk -dvv and found out that one of the tables had a problem with its charset: the charset was not in /usr/local/mysql/charset/Index. So I edited this file and put the character set in there. Next I set the right charset with myisamchk --set-character-set=<your_character>.

Another thing you could do is find the table with the character set problem, back up its content, create a new table, and put the content back.

Posted by Andrei Bastun on November 14 2005 11:20am

I had a problem with inserting Cyrillic data into a mysql table. I added
default-character-set=cp1251
character_set_client=cp1251

to my my.ini file, and now it works fine.

Posted by Stepan Francl on December 29 2005 8:40pm

Win2000 Czech, MySQL 5.0.17-nt
On the Character Set dialog page of the MySQL Configuration Wizard I chose the 3rd option (Manual Selected) and set the character set to cp1250.
When I tried to connect to the mysql server via the mysql.exe command, it failed with the message "mysql: Character set 'cp1250' is not a compiled character set and is not specified in the 'C:\mysql\\share\charsets\Index.xml' file".
The solution was simple (thanks to Ozan Hazer's message above) - I added
character-sets-dir="C:\Program Files\MySQL\MySQL Server 5.0\\share\charsets\"
under the [client] section in my.ini and everything works :-)

Posted by Mr. Sea on February 9 2006 11:16am

I set up MySQL 5.0.18 on a Windows 2000 SP4 machine.
I used the "mysql-essential-5.0.18-win32.msi" installer package for installation. I
configured the character set to latin5 (with the MySQL Server Instance Config Wizard).
The server started without any problem, but when I tried to use the "mysql" command-line
tool I saw the following error message and the mysql client could not start:

mysql: Character set 'latin5' is not a compiled character set and is not
specified in the 'C:\mysql\\share\charsets\Index.xml' file

After this error I added the following line (character-sets-dir) to my.ini and
now it is working.

# CLIENT SECTION
...
[mysql]
default-character-set=latin5
character-sets-dir="C:/Program Files/MySQL/MySQL Server 5.0/share/charsets"

I reported this as a bug. Bug #17271


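All of this text will be displayed as unrecognizable symbols (garbled characters).

The so-called "internal code" refers to the way Asian scripts such as Chinese, Japanese, and Korean are arranged and encoded in a computer system.

Because the scripts of these three countries cannot be represented with Roman characters, computer systems generally combine two single-byte (English) characters to form one Chinese, Japanese, or Korean character. The computer's language system then detects these byte pairs automatically in order to display the correct character. The Chinese internal-code standards in wide international use today are:

 GB (national standard code): widely used in mainland China and in regions that use simplified characters, such as Singapore and Malaysia.
 BIG5: used in Taiwan, Hong Kong, and regions that use traditional characters.
 HZ: an encoding popular in some network news forums and e-mail; the internal code it carries is GB.

Handling Unicode text, by contrast, is like handling orderly text. You may be glad to know that the first 128 Unicode characters (16-bit codes 0x0000 to 0x007F) are the ASCII characters, and the next 128 Unicode characters (codes 0x0080 to 0x00FF) are the ISO 8859-1 extensions to ASCII. Characters in other parts of Unicode are likewise based on existing standards, which makes conversion easier.
The Greek alphabet uses codes 0x0370 to 0x03FF, Cyrillic uses 0x0400 to 0x04FF, Armenian uses 0x0530 to 0x058F, and Hebrew uses 0x0590 to 0x05FF. The Chinese, Japanese, and Korean ideographs (collectively called CJK) occupy codes 0x3000 to 0x9FFF. The greatest advantage of Unicode is that there is only one character set, with no ambiguity at all.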

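Unicode is a standard that defines the character set; UTF-8 is one concrete encoding of it.
Unicode is the standard for everyone who wants a single, worldwide character encoding.

The UTF-8 standard is a transformation format of the Unicode (ISO 10646) standard. UTF stands for Unicode/UCS Transformation Format. There are several UTFs, the best known being UTF-8 and UTF-16. The mapping for UTF-8 is as follows:

Unicode code points 0000 - 007F are encoded in UTF-8 as: 0xxxxxxx
Unicode code points 0080 - 07FF are encoded in UTF-8 as: 110xxxxx 10xxxxxx
Unicode code points 0800 - FFFF are encoded in UTF-8 as: 1110xxxx 10xxxxxx 10xxxxxx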
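
The three bit patterns can be checked with a small Python sketch. The function name is made up for illustration, and the sketch covers only code points up to U+FFFF (one to three bytes), rejecting the surrogate range. In practice Python's built-in str.encode("utf-8") does this work, and the loop at the end compares the two.

```python
def utf8_encode_bmp(code_point):
    """Encode one code point in the range 0x0000-0xFFFF by hand."""
    if not 0 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("only non-surrogate code points up to U+FFFF are handled")
    if code_point <= 0x7F:
        # 0000 - 007F: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 0080 - 07FF: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 0800 - FFFF: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# One sample per row: Latin "A", Cyrillic "Ж", and the CJK character "中".
for cp in (0x0041, 0x0416, 0x4E2D):
    encoded = utf8_encode_bmp(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
    print(f"U+{cp:04X} -> {encoded.hex()}")
```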

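UTF-8 is a newer encoding form of Unicode; in fact Unicode has had several. Unicode character codes were originally 16 bits wide, which is not actually enough to fit all of the world's characters into a single plane, for example once lesser-used scripts are added. The code space was therefore extended beyond 16 bits, and UTF-8 as currently defined uses one to four bytes per character to cover every code point up to U+10FFFF.
The idea of Unicode is to encode all characters uniformly and achieve one unified standard. Big5 and GB are independent character sets, also called Far East character sets; taking them to a German version of Windows may cause character-encoding conflicts.
The default character set of early Windows was ANSI. Chinese characters typed into Notepad are stored in the local encoding,
but NT/2000 support Unicode directly internally.

For existing ANSI characters in the ASCII range, Unicode simply widens them to 16 bits:
for example, ANSI "A" = 0x41 corresponds to Unicode "A" = 0x0041.
ASCII uses 7 bits to store 128 characters. ASCII is a truly American standard, so it cannot meet the needs of other countries, for example Cyrillic letters and Chinese characters. Hence the Windows ANSI character sets appeared: extended ASCII codes that use 8 bits per character, in which the lower 128 values still hold the original ASCII codes and the upper 128 values add Greek letters and so on.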
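
A short Python sketch of the widening described here, using a hypothetical helper name. It shows that a byte in the ASCII range (and, more generally, in ISO 8859-1) keeps its numeric value as a Unicode code point, while a byte from the upper half of another 8-bit code page such as cp1251 does not.

```python
def code_point_of(byte_value, encoding):
    """Decode a single byte in the given 8-bit encoding and return its code point."""
    return ord(bytes([byte_value]).decode(encoding))


# ASCII "A" is 0x41 and stays 0x0041 in Unicode.
print(hex(code_point_of(0x41, "ascii")))

# ISO 8859-1 maps every byte value to the same code point.
print(hex(code_point_of(0xC0, "latin-1")))

# In cp1251 the byte 0xC0 is a Cyrillic letter, so the value changes.
print(hex(code_point_of(0xC0, "cp1251")))

# Widened to 16 bits (big-endian), "A" becomes the two bytes 00 41.
print("A".encode("utf-16-be").hex())
```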

#ifdef UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

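You need to add UNICODE and _UNICODE under Project\Settings\C/C++\Preprocessor definitions. Both UNICODE and _UNICODE must be defined. If UNICODE is not defined, a call to SetText(HWND,LPCTSTR) is resolved to SetTextA(HWND,LPTSTR); the API then treats the Unicode string you pass as an ANSI string and displays garbage. This happens because the Windows API is already compiled and lives in DLLs, and a string, whether Unicode or ANSI, is just seen as a buffer, such as "0B A3 00 35 24 3C 00 00". Read as ANSI, the string ends at the first '\0', so only the two bytes "0B A3" are read before the terminator; read as Unicode, the whole string is read up to the terminating '\0\0'.

Because Unicode strings carry no extra indicator bits, the system has to know which format the string you supply is in.

In addition, UNICODE is the macro checked by the Windows SDK headers, while _UNICODE is the one checked by the C run-time headers (tchar.h). If you do not call the Windows API and only use the run-time's generic-text routines, defining _UNICODE alone is enough.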
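
The buffer example can be reproduced in a few lines of Python. This is only a sketch of the two termination rules (a single zero byte versus a zero 16-bit unit); it does not call the Windows API, and the function names are made up for illustration.

```python
BUFFER = bytes.fromhex("0B A3 00 35 24 3C 00 00")


def read_ansi(buf):
    """Return the bytes before the first zero byte (ANSI-style termination)."""
    end = buf.find(0)
    return buf if end == -1 else buf[:end]


def read_wide(buf):
    """Return the bytes before the first zero 16-bit unit (wide-string termination)."""
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return buf[:i]
    return buf


print(len(read_ansi(BUFFER)), read_ansi(BUFFER).hex())  # stops after 0B A3
print(len(read_wide(BUFFER)), read_wide(BUFFER).hex())  # stops at the 00 00 pair
```

The same eight bytes therefore give a 2-byte string under one rule and a 6-byte string under the other, which is why the callee has to be told which kind of string it is receiving.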

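Development process:
The work revolves around file reading and writing and string handling. There are mainly two kinds of files: .txt and .ini files.
1. Where strings are handled differently in Unicode and non-Unicode environments, refer to items 9 and 10 above to meet the string-handling requirements of each environment.
The same applies to file reading and writing: whenever you call the relevant interface functions, wrap the string arguments in _TEXT or a related macro. If the file being written needs to be saved in Unicode format, a header (the byte order mark) must be written when the file is created.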
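
As a rough illustration of that header, the sketch below builds the bytes of a UTF-16 little-endian file in Python, with the byte order mark FF FE in front. The helper name is hypothetical, and the choice of little-endian order is an assumption (it is the order Windows normally uses).

```python
import codecs


def encode_with_bom(text):
    """Return UTF-16 little-endian bytes preceded by the byte order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


data = encode_with_bom("A")
print(data.hex())             # the first two bytes are the BOM, ff fe
print(data.decode("utf-16"))  # the utf-16 codec consumes the BOM when decoding
```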

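Some languages (such as Korean) can only be displayed in a Unicode environment. In that case, when you develop in a non-Unicode environment, even converting with string functions will not get the text displayed, because the API functions being called are the ANSI ones (the lower layers all work in Unicode, but the result is displayed according to the API the programmer calls). So you must develop with Unicode.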





real_vine@hotmail.com