MySQL 3.23, 4.0, 4.1 Reference Manual :: 10 Character Set Support :: 10.9 Upgrading Character Sets from MySQL 4.0



10.9. Upgrading Character Sets from MySQL 4.0

10.9.1. 4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs
10.9.2. Converting 4.0 Character Columns to 4.1 Format

What about upgrading from older versions of MySQL? MySQL 4.1 is almost upward compatible with MySQL 4.0 and earlier, for the simple reason that nearly all of its character set features are new, so there is little in earlier versions for them to conflict with. However, there are some differences and a few things to be aware of.

Note that a “MySQL 4.0 character set” combines character set and collation information in a single entity. Beginning with MySQL 4.1, character sets and collations are separate entities. Although each collation belongs to a particular character set, the two are not bundled together.
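Because each 4.1 collation belongs to exactly one character set, the character set can be read off a collation name. The sketch below assumes MySQL's naming convention, in which a collation name starts with its character set name followed by an underscore (every collation in the table in Section 10.9.1 follows it). `collation_charset` is a hypothetical helper, not part of MySQL.

```shell
#!/bin/sh
# Print the character set that a MySQL 4.1 collation belongs to.
# Assumption: the collation name begins with the character set name,
# followed by an underscore (e.g. latin1_danish_ci -> latin1).
# collation_charset is a hypothetical helper name.
collation_charset() {
    # ${1%%_*} removes everything from the first underscore onward.
    printf '%s\n' "${1%%_*}"
}

collation_charset latin1_danish_ci     # latin1
collation_charset latin2_hungarian_ci  # latin2
```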

MySQL 4.1 treats national character sets specially (its predefined national character set is utf8): NCHAR is not the same as CHAR, and N'...' literals are not the same as '...' literals.

Finally, the file format used to store character set and collation information has changed. Make sure that you have reinstalled the /share/mysql/charsets/ directory, which contains the new configuration files.
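One way to spot a stale charsets directory is to look at its index file. This sketch assumes that the 4.1 files are XML with an `Index.xml` index, while 4.0 used a plain `Index` file; check your own distribution before relying on it. `charsets_format` is a hypothetical helper, and the directory is passed as an argument because its location depends on the installation prefix.

```shell
#!/bin/sh
# Guess which format a charsets directory uses.
# Assumption: 4.1 ships Index.xml; 4.0 shipped a plain "Index" file.
# charsets_format is a hypothetical helper name.
charsets_format() {
    dir=$1
    if [ -f "$dir/Index.xml" ]; then
        echo "4.1"        # new XML-based configuration files present
    elif [ -f "$dir/Index" ]; then
        echo "4.0"        # only the old index found; reinstall needed
    else
        echo "unknown"    # neither index file exists
    fi
}
```

For example, `charsets_format /usr/local/mysql/share/mysql/charsets` (the path is illustrative) should print `4.1` after a correct reinstall.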

If you want to start mysqld from a 4.1.x distribution with data created by MySQL 4.0, start the server with the same character set and collation that the 4.0 server used. In that case, you do not need to reindex your data.

There are two ways to do so: set the defaults when you build MySQL with configure, or pass them as mysqld options at startup:

shell> ./configure --with-charset=... --with-collation=...
shell> ./mysqld --default-character-set=... --default-collation=...

For example, if your MySQL 4.0 server used the danish character set, use the latin1 character set and the latin1_danish_ci collation:

shell> ./configure --with-charset=latin1 \
           --with-collation=latin1_danish_ci
shell> ./mysqld --default-character-set=latin1 \
           --default-collation=latin1_danish_ci

Use the table in Section 10.9.1, “4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs”, to find the 4.1 character set/collation pair that corresponds to each 4.0 character set name.

If you have non-latin1 data stored in a 4.0 latin1 table and want to convert the table column definitions to reflect the actual character set of the data, use the instructions in Section 10.9.2, “Converting 4.0 Character Columns to 4.1 Format”.


User Comments

Posted by Boot Zero on May 21 2006 7:53pm

This took me several hours to figure out, so I hope it saves someone else some time:

I upgraded from MySQL 4.0.x on Red Hat to 4.1.x on Gentoo. The upgrade went smoothly; I optimized and repaired the tables and we were on our way.

After running etc-update, the sizes of CHAR and VARCHAR fields were smaller than they were supposed to be. This, I found, was caused by a difference in character sets: MySQL 4.1 was defaulting to a multibyte character set for single-byte data, which shrank my field sizes.

An examination of /var/log/mysql/mysqld.err showed the following when starting mysqld:

060521 11:42:51 [Warning] './bigdata/update_log' had no or invalid character set, and default character set is multi-byte, so character column sizes may have changed
060521 11:42:51 [Warning] './bigdata/user_tracking' had no or invalid character set, and default character set is multi-byte, so character column sizes may have changed
060521 11:42:51 [Warning] './bigdata/users' had no or invalid character set, and default character set is multi-byte, so character column sizes may have changed

My install of MySQL 4.0.x was using a latin1 default character set. The solution was to switch the default character set in my.cnf on the 4.1.x box like so:

[mysqld]
character-set-server = latin1
default-character-set = latin1

Poof! Tables back to normal.

-BZ
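When many tables are affected, warning lines like those in the comment above can be turned into a list of tables to check. A minimal sketch: `affected_tables` is a hypothetical helper that reads an error log on standard input; the message text is copied from the excerpt above, and the log location varies by installation.

```shell
#!/bin/sh
# Print db.table for every "had no or invalid character set" warning
# in an error log read from standard input.
# affected_tables is a hypothetical helper name.
affected_tables() {
    grep 'had no or invalid character set' |
        # Turn './bigdata/users' into bigdata.users.
        sed -n "s|.*'\./\([^/]*\)/\([^']*\)'.*|\1.\2|p"
}
```

Usage: `affected_tables < /var/log/mysql/mysqld.err`.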


10.9.1. 4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs

ID  4.0 Character Set  4.1 Character Set  4.1 Collation
--  -----------------  -----------------  --------------------
1   big5               big5               big5_chinese_ci
2   czech              latin2             latin2_czech_ci
3   dec8               dec8               dec8_swedish_ci
4   dos                cp850              cp850_general_ci
5   german1            latin1             latin1_german1_ci
6   hp8                hp8                hp8_english_ci
7   koi8_ru            koi8r              koi8r_general_ci
8   latin1             latin1             latin1_swedish_ci
9   latin2             latin2             latin2_general_ci
10  swe7               swe7               swe7_swedish_ci
11  usa7               ascii              ascii_general_ci
12  ujis               ujis               ujis_japanese_ci
13  sjis               sjis               sjis_japanese_ci
14  cp1251             cp1251             cp1251_bulgarian_ci
15  danish             latin1             latin1_danish_ci
16  hebrew             hebrew             hebrew_general_ci
17  win1251            (removed)          (removed)
18  tis620             tis620             tis620_thai_ci
19  euc_kr             euckr              euckr_korean_ci
20  estonia            latin7             latin7_estonian_ci
21  hungarian          latin2             latin2_hungarian_ci
22  koi8_ukr           koi8u              koi8u_ukrainian_ci
23  win1251ukr         cp1251             cp1251_ukrainian_ci
24  gb2312             gb2312             gb2312_chinese_ci
25  greek              greek              greek_general_ci
26  win1250            cp1250             cp1250_general_ci
27  croat              latin2             latin2_croatian_ci
28  gbk                gbk                gbk_chinese_ci
29  cp1257             cp1257             cp1257_lithuanian_ci
30  latin5             latin5             latin5_turkish_ci
31  latin1_de          latin1             latin1_german2_ci
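The mapping in the table above can be encoded in a small shell function that prints the mysqld startup options used earlier in this section (--default-character-set and --default-collation). This is a minimal sketch: `charset_41_options` is a hypothetical helper, not part of MySQL. For a source build, the same pair goes to configure as --with-charset and --with-collation.

```shell
#!/bin/sh
# Print mysqld options selecting the 4.1 character set/collation pair
# that corresponds to a 4.0 character set name (per the table above).
# charset_41_options is a hypothetical helper name.
charset_41_options() {
    case $1 in
        big5)       cs=big5   coll=big5_chinese_ci ;;
        czech)      cs=latin2 coll=latin2_czech_ci ;;
        dec8)       cs=dec8   coll=dec8_swedish_ci ;;
        dos)        cs=cp850  coll=cp850_general_ci ;;
        german1)    cs=latin1 coll=latin1_german1_ci ;;
        hp8)        cs=hp8    coll=hp8_english_ci ;;
        koi8_ru)    cs=koi8r  coll=koi8r_general_ci ;;
        latin1)     cs=latin1 coll=latin1_swedish_ci ;;
        latin2)     cs=latin2 coll=latin2_general_ci ;;
        swe7)       cs=swe7   coll=swe7_swedish_ci ;;
        usa7)       cs=ascii  coll=ascii_general_ci ;;
        ujis)       cs=ujis   coll=ujis_japanese_ci ;;
        sjis)       cs=sjis   coll=sjis_japanese_ci ;;
        cp1251)     cs=cp1251 coll=cp1251_bulgarian_ci ;;
        danish)     cs=latin1 coll=latin1_danish_ci ;;
        hebrew)     cs=hebrew coll=hebrew_general_ci ;;
        tis620)     cs=tis620 coll=tis620_thai_ci ;;
        euc_kr)     cs=euckr  coll=euckr_korean_ci ;;
        estonia)    cs=latin7 coll=latin7_estonian_ci ;;
        hungarian)  cs=latin2 coll=latin2_hungarian_ci ;;
        koi8_ukr)   cs=koi8u  coll=koi8u_ukrainian_ci ;;
        win1251ukr) cs=cp1251 coll=cp1251_ukrainian_ci ;;
        gb2312)     cs=gb2312 coll=gb2312_chinese_ci ;;
        greek)      cs=greek  coll=greek_general_ci ;;
        win1250)    cs=cp1250 coll=cp1250_general_ci ;;
        croat)      cs=latin2 coll=latin2_croatian_ci ;;
        gbk)        cs=gbk    coll=gbk_chinese_ci ;;
        cp1257)     cs=cp1257 coll=cp1257_lithuanian_ci ;;
        latin5)     cs=latin5 coll=latin5_turkish_ci ;;
        latin1_de)  cs=latin1 coll=latin1_german2_ci ;;
        win1251)
            # Row 17: removed in 4.1, so there is nothing to map to.
            echo "win1251 has no 4.1 equivalent (removed)" >&2
            return 1 ;;
        *)
            echo "unknown 4.0 character set: $1" >&2
            return 1 ;;
    esac
    printf '%s\n' "--default-character-set=$cs --default-collation=$coll"
}

# Prints:
# --default-character-set=latin1 --default-collation=latin1_danish_ci
charset_41_options danish
```

To start the server with the result, leave the command substitution unquoted so it splits into two arguments: `./mysqld $(charset_41_options danish)`.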