phpMyadmin Import doesn't like the specials
September 7, 2008 2:23 AM
I'm trying to replicate the setup of an existing blog by copying over the wordpress folder and database using an SQL export.
After I got the site up and running, I discovered that a lot of posts were abruptly cut off. A little bit of investigation showed that the posts had got truncated wherever some special characters appeared like - and ' .
Looking at the SQL file itself, I can see that the text isn't cut off on export. So how do I get the import to go through without cutting off every time it hits a special character?
I'm looking for a solution that works solely with phpmyadmin or tools available within Cpanel (since that would mean I could do this across any shared hosting account), but if shell access is a must I can do that for now.
Version info (Main Blog)
- Wordpress 2.6.1 (upgraded from Wordpress 2.5 onwards)
- mySQL 5.0.45-community-log
- phpmyadmin 2.11.6
Version info (Blog copy)
- mySQL 5.0.45
- phpmyadmin 2.11.8
Looking at the SQL file itself, I can see that the text isn't cut off on export. So how do I get the import to go through without cutting off every time it hits a special character?
I'm looking for a solution that works solely with phpmyadmin or tools available within Cpanel (since that would mean I could do this across any shared hosting account), but if shell access is a must I can do that for now.
Version info (Main Blog)
- Wordpress 2.6.1 (upgraded from Wordpress 2.5 onwards)
- mySQL 5.0.45-community-log
- phpmyadmin 2.11.6
Version info (Blog copy)
- mySQL 5.0.45
- phpmyadmin 2.11.8
I've had some trouble with this, since the phpMyAdmin of my host was fixed in exporting dumps as Latin-1 text, and the phpMyAdmin I used on my test server insisted on translating everything to the character set UTF-8.
Not being an expert in phpMyAdmin, or using the command line, I ended up using MySQL Administrator instead, for more control.
posted by ijsbrand at 4:29 AM on September 7, 2008
Not being an expert in phpMyAdmin, or using the command line, I ended up using MySQL Administrator instead, for more control.
posted by ijsbrand at 4:29 AM on September 7, 2008
Seconding maxwelton
I had similar problems, but phpadmin was my solution as I was previously trying to use the Wordpress inbuilt tools and found the XML was getting truncated by special characters.
Can you do a find and replace on the SQL export if you know what chars are problematic? ijsbrand's solution sounds more plausible though.
posted by kaydo at 5:32 AM on September 7, 2008
I had similar problems, but phpadmin was my solution as I was previously trying to use the Wordpress inbuilt tools and found the XML was getting truncated by special characters.
Can you do a find and replace on the SQL export if you know what chars are problematic? ijsbrand's solution sounds more plausible though.
posted by kaydo at 5:32 AM on September 7, 2008
Also, not sure if it helps but here's the other stuff I tried:
On recommendation from TechDuke, I tried using the bigdump script to import my db from the db, though this is generally targeted at people with specific import issues. Laziness was my specific issue. It didn’t work on the first run, but I think I’ll bookmark it to review again.
Spam comments were the main reason for my import issues, so I tried this one successfully.
posted by kaydo at 5:35 AM on September 7, 2008
On recommendation from TechDuke, I tried using the bigdump script to import my db from the db, though this is generally targeted at people with specific import issues. Laziness was my specific issue. It didn’t work on the first run, but I think I’ll bookmark it to review again.
Spam comments were the main reason for my import issues, so I tried this one successfully.
posted by kaydo at 5:35 AM on September 7, 2008
Can you paste an example of one of the lines of SQL in which a post is getting cut off? As you probably know, a double-hyphen is a comment indicator in SQL, while a single quote (apostrophe) is a field delimiter, so it sounds like when you're doing the export (as maxwelton pointed out) you're either not delimiting the fields properly or escaping special characters (i.e., in your SQL you should see:
posted by Doofus Magoo at 6:20 AM on September 7, 2008
INSERT INTO TABLE VALUES ('I DON\'T KNOW')...not:
INSERT INTO TABLE VALUES ('I DON'T KNOW')Also, definitely seconding MySQL administrator; you can set it up to do a direct DB transfer without having to muck around with exporting on one end and importing on the other end.
posted by Doofus Magoo at 6:20 AM on September 7, 2008
if you have shell access and your export is in a proper sql format, the easiest way to do this is surely:
posted by fishfucker at 12:46 PM on September 7, 2008
mysql -u<mysql_user> -p <database name> < <your export file>you will be queried for your mysql user's password. You need a blank database to import to.
posted by fishfucker at 12:46 PM on September 7, 2008
I've had exactly the same problem, but doing manual mysql exports and imports. The problem was down to some exotic characters in the text. For example, I'm guessing that the apostrophe and dash you mention aren't normal apostrophes and dashes but curly apostrophes and em-dashes (slightly wider than a standard hyphen).
I think this becomes a problem because, as ijsbrand suggested, different databases (and the SQL export file itself) can have different character encodings. For example, it might be that your original database could handle UTF-8 encodings but you're importing the data into a database that's only set up for Latin-1, which won't like these relatively exotic characters.
If you haven't added anything to the new site, and are able to reimport the old data, I'd suggest creating a new database and, using phpMyAdmin, set the encodings of the database to UTF-8 before importing (if you have permissions to do this).
I wish I could offer a more concrete solution but I don't 100% understand it all myself unfortunately. But I'm pretty sure it's this kind of issue that is causing the problem. Good luck.
posted by fabius at 2:19 PM on September 7, 2008
I think this becomes a problem because, as ijsbrand suggested, different databases (and the SQL export file itself) can have different character encodings. For example, it might be that your original database could handle UTF-8 encodings but you're importing the data into a database that's only set up for Latin-1, which won't like these relatively exotic characters.
If you haven't added anything to the new site, and are able to reimport the old data, I'd suggest creating a new database and, using phpMyAdmin, set the encodings of the database to UTF-8 before importing (if you have permissions to do this).
I wish I could offer a more concrete solution but I don't 100% understand it all myself unfortunately. But I'm pretty sure it's this kind of issue that is causing the problem. Good luck.
posted by fabius at 2:19 PM on September 7, 2008
Wow - thank you all for the responses.
ijsbrand - In fact, there is a character-set encoding problem in my Wordpress database. This article explains it in great detail. In fact it's in order to fix the encoding mismatch that I'm setting up a copy of the blog. However, the target database was empty when I imported into it and thus the encoding (atleast at the table level) is identical on both sides. Not sure if there is still a problem due to encoding though..
kaydo - I've had very little success with the Wordpress WXR Import as well. The 2MB limit is a nightmare and the import doesn't seem to get all the posts in one go. That's why I tried the SQL dump approach instead.
Doofus Magoo / fabius - Nice detective work. I just went through the SQL file that I exported from the server where the blog is running and discovered (what I think is) the source of the problem. In some cases, the SQL has the delimiter like so:
I'm now in deeper trouble that I had expected it seems :-(. I had assumed thus far that the problem was with the "import" but it looks like the real problem is on the "export" side. On the export side, I don't have quite as much control. It's on a shared hosting account using Cpanel (although I do have SSH access) - would exporting via mysql on the command line or mySQL administrator help with properly escaping em-dashes etc.?
posted by your mildly obsessive average geek at 9:00 PM on September 7, 2008
ijsbrand - In fact, there is a character-set encoding problem in my Wordpress database. This article explains it in great detail. In fact it's in order to fix the encoding mismatch that I'm setting up a copy of the blog. However, the target database was empty when I imported into it and thus the encoding (atleast at the table level) is identical on both sides. Not sure if there is still a problem due to encoding though..
kaydo - I've had very little success with the Wordpress WXR Import as well. The 2MB limit is a nightmare and the import doesn't seem to get all the posts in one go. That's why I tried the SQL dump approach instead.
Doofus Magoo / fabius - Nice detective work. I just went through the SQL file that I exported from the server where the blog is running and discovered (what I think is) the source of the problem. In some cases, the SQL has the delimiter like so:
INSERT INTO `wp_posts` VALUES ('I think I\'ve worked out...')But in other cases it looks like this
INSERT INTO `wp_posts` VALUES ('it is this – You ...')Like fabius points out, this might be because it has been posted as an em-dash and not a regular hyphen
I'm now in deeper trouble that I had expected it seems :-(. I had assumed thus far that the problem was with the "import" but it looks like the real problem is on the "export" side. On the export side, I don't have quite as much control. It's on a shared hosting account using Cpanel (although I do have SSH access) - would exporting via mysql on the command line or mySQL administrator help with properly escaping em-dashes etc.?
posted by your mildly obsessive average geek at 9:00 PM on September 7, 2008
I think you're getting two issues slightly confused (but apologies if I'm teaching you to suck eggs!).
The delimiters for the SQL data are 's, ie, straight apostrophes. The kind you get by just hitting the apostrophe key. So any straight apostrophes within some SQL text needs to be escaped with a backslash. This is the only character that needs to be escaped in such a manner. (I think it can also be escaped by another straight apostrophe, ie '').
What I think is happening is that you have some extended characters in there, such as curly apostrophes and em dashes. Something in the process isn't understanding these characters and when it reaches one during the import it's throwing up its hands and giving up - it thinks the character must be the end of that field.
Escaping these extended characters isn't the solution as there's no way to do so (that I know of).
First thing would be to check that what I'm guessing is in fact happening. Find some text on the new blog that has been truncated and then find the same text in your SQL file - if the next character in the file after the text stops in the blog is a curly quote or em dash or something, then this is probably the problem.
I don't think your problem is 100% identical to this or this or this but it's along the same lines. Hope that helps.
posted by fabius at 1:55 AM on September 8, 2008
The delimiters for the SQL data are 's, ie, straight apostrophes. The kind you get by just hitting the apostrophe key. So any straight apostrophes within some SQL text needs to be escaped with a backslash. This is the only character that needs to be escaped in such a manner. (I think it can also be escaped by another straight apostrophe, ie '').
What I think is happening is that you have some extended characters in there, such as curly apostrophes and em dashes. Something in the process isn't understanding these characters and when it reaches one during the import it's throwing up its hands and giving up - it thinks the character must be the end of that field.
Escaping these extended characters isn't the solution as there's no way to do so (that I know of).
First thing would be to check that what I'm guessing is in fact happening. Find some text on the new blog that has been truncated and then find the same text in your SQL file - if the next character in the file after the text stops in the blog is a curly quote or em dash or something, then this is probably the problem.
I don't think your problem is 100% identical to this or this or this but it's along the same lines. Hope that helps.
posted by fabius at 1:55 AM on September 8, 2008
fabius - My apologies those were probably bad examples to offer up as evidence. Simply put here's what I'm seeing:
This insert:
This insert:
This insert:
So overall, it definitely looks like a problem with extended characters. As you have pointed out, that just makes the problem a whole lot more complicated :(
With respect to the articles you have linked to - there is a character set problem in the database as well. So the posts that make it through in the entirety wind up with sentences filled with odd characters like â€. There is a script that can help fix this, but starting with an incomplete database is not the best idea :(
posted by your mildly obsessive average geek at 2:48 AM on September 8, 2008
This insert:
INSERT INTO `wp_posts` VALUES ('it is this – You ...')fails and causes a truncated post (Looks like an em-dash)
This insert:
INSERT INTO `wp_posts` VALUES ('A recent example - Miyagi.sg')works properly. (Looks like a regular hyphen)
This insert:
INSERT INTO `wp_posts` VALUES ('in the chaos from packing/unpacking just isn’t ')fails after the letters isn (I think this is a curly quote)
So overall, it definitely looks like a problem with extended characters. As you have pointed out, that just makes the problem a whole lot more complicated :(
With respect to the articles you have linked to - there is a character set problem in the database as well. So the posts that make it through in the entirety wind up with sentences filled with odd characters like â€. There is a script that can help fix this, but starting with an incomplete database is not the best idea :(
posted by your mildly obsessive average geek at 2:48 AM on September 8, 2008
If you'd try to import your existing database dump with MySQL Administrator in that empty database, you could choose to import it in either the Latin-1 character set, or the UTF-8. Just to see what happens. Should one of them import everything correctly, you know a lot more.
posted by ijsbrand at 5:38 AM on September 8, 2008
posted by ijsbrand at 5:38 AM on September 8, 2008
Yay! So it took a bit of fiddling around with SSH tunneling before I got it working but mySQL Administrator is indeed the answer! Here's what I did:
1. Installed mySQL administrator on my desktop
2. Via Putty, setup a SSH tunnel to my primary blog database as follows: L13306 localhost:3306 (Yes it is odd to refer to destination as localhost, but it works)
3. Fire up mySQL Administrator and connected to the main database via localhost:13306. Started a Backup project and exported a SQL file of the database.
4. Another SSH tunnel, this time to the test server with a slightly different forwarding setup - L14306 localhost:3306.
5. Connected to the test database on localhost:14306. Chose Restore, Open Backup file. Since I've used mySQL administrator to export (and only mySQL administrator generated backup files work in Restore) other settings are fine, so "Start Restore".
6. Look at the homepage and not only are all the posts complete with no cut off characters, I don't have weird characters from charset conversions either. Awesome!
Thank you all for your help and suggestions.
posted by your mildly obsessive average geek at 12:28 AM on September 11, 2008
1. Installed mySQL administrator on my desktop
2. Via Putty, setup a SSH tunnel to my primary blog database as follows: L13306 localhost:3306 (Yes it is odd to refer to destination as localhost, but it works)
3. Fire up mySQL Administrator and connected to the main database via localhost:13306. Started a Backup project and exported a SQL file of the database.
4. Another SSH tunnel, this time to the test server with a slightly different forwarding setup - L14306 localhost:3306.
5. Connected to the test database on localhost:14306. Chose Restore, Open Backup file. Since I've used mySQL administrator to export (and only mySQL administrator generated backup files work in Restore) other settings are fine, so "Start Restore".
6. Look at the homepage and not only are all the posts complete with no cut off characters, I don't have weird characters from charset conversions either. Awesome!
Thank you all for your help and suggestions.
posted by your mildly obsessive average geek at 12:28 AM on September 11, 2008
This thread is closed to new comments.
* Check that your old field type and new field type are the same (assuming you're not using your dump to write the new tables)...if the old field was "text" and the new was "varchar 200" or similar, that wouldn't be helpful.
* If you didn't dump the old data using phpmyadmin, consider surrounding all data fields with quotes or whatever you define on import. Consider escaping the characters you feel are causing problems.
Did phpmyadmin import the new data without throwing any errors, but still truncated the data? That sounds like field-type mismatch to me, but my expertise is laughable. Could it be a character encoding difference?
I've never had an import problem when I dump using phpmyadmin and then import using the same tool on a new server. I've had endless problems with CSVs and tab-delimited files, so I feel your pain.
posted by maxwelton at 3:31 AM on September 7, 2008