Jump to Navigation

Drupal 7 Feed Aggregator issue - MySQL Character Set

In the last couple of months, the Feed Aggregator updates broke cron, when a blog post that I wanted to import contained a special character. The log message for the import of the offending feed looks like this:
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC6\x83u\xC4\xB1u

Similar problems are described on http://drupal.org as an issue with the default character set settings in the MySQL installation.
Backup & Migrate issue queue: http://drupal.org/node/1100146
Feeds issue queue: http://drupal.org/node/1140194
Commerce Feeds issue queue: http://drupal.org/node/1189708

As suggested by several posts in these issue queues, I fixed the problem by converting all settings in MySQL from the character set "latin-1" to "uft-8".
A great very long post (http://www.bluebox.net/news/2009/07/mysql_encoding, not accessible today, hopefully it will be back soon) describes the issue in detail, but not from a drupal perspective.

If you are having this problem, some of your variables in MySQL are probably set the wrong way by default:

On the mysql prompt:
show variables like 'char%';
Probably returns something similar to the printout below, or a mixture of different encodings (latin1, utf8, and even something swedish).

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

More about tables in latin1 (on the mysql prompt using your database):

show table status where collation like '%latin1%';
result: | tablename | MyISAM |      ... cut some columns    | latin1_swedish_ci |     NULL |

(Yes, there is swedish in MySQL by default!)

First: Fix your mysql installation, so that it always uses utf8:
Edit your my.cnf file (usually in /etc). Add the following lines:
[client]
default-character-set = utf8
[mysqld]
default-character-set = utf8

(If you already have [client] and [mysqld] sections, add the character-set lines to these sections)

Now, restart mysql to set these new variables. In CentOS: service mysqld stop;service mysqld start
The variables should now look like this:
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Next, you'll have to change the character set settings for all of your existing database tables.
Some special characters in existing tables might get screwed up in the process, so it's important to make a backup first.
In my case, the offending tables were all feeds, and I didn't care about these. I did try to create a page once with special characters, before I converted the tables to the correct character encoding, and the insert got rejected with the error message above.

Create a backup of your database:
drush cc all 
drush bb

Optional, add the name of your site as defined in your .drush directory like drush @example.com bb
This should have created a backup in the designated manual backup destination.

Use a text editor to change all instances of latin1 to utf8:

cp yourbackupfile newbackupfile # makes a copy
vi newbackupfile   # starts vi editor with newbackupfile
:1,$s/DEFAULT CHARSET=latin1 /DEFAULT CHARSET=utf8 /g  # substitution of latin1 with utf8
:wq  #saves the file

This substitutes all occurrences of "latin1 " with "utf8 "; You should now search for latin1 in the document, in case you have latin1_swedish_ci in your tables, exchange those, too. I purposely added a space after latin1 to avoid substituting latin1 derivatives. I don't know whether the character set gets dumped using the swedish specification, didn't happen to me.

To check on remaining latin1 occurrences, use grep:
grep -i latin1 newbackupfile
If you get a result, you'd need to fix those lines, too.

Now, load your new file:
mysql -u user -p"yourpassword" yourdatabase < newbackupfile
(or use drush for this. I usually use the commandline mysql, so I haven't tested the drush version)

Check whether the site is working properly. This should do it, and also fix it for future drupal sites in the same mysql installation.

If the site is broken, restore the un-altered backup file, and repeat the procedure while making sure that you didn't accidentally break an insert statement.

Comments

Fine way of telling, and good article to get information regarding
my presentation focus, which i am going to deliver in academy.

hmm great written. are you a professional writer? i'm looking for help for my own website.

You can contact me using the contact form, if you are interested in working with me.

You can use the following code:
$val = mb_check_encoding($val, 'UTF-8') ? $val : utf8_encode($val);

if it is just matter of one field value.

Add new comment

(If you're a human, don't change the following field)
Your first name.

Filtered HTML

  • Lines and paragraphs break automatically.
  • To post pieces of code, surround them with <code>...</code> tags. For PHP code, you can use <?php ... ?>, which will also colour it based on syntax.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <blockquote> <code> <ul> <ol> <li> <dl> <dt> <dd><h3><h4><img><pre><br><p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
By submitting this form, you accept the Mollom privacy policy.