Mapping site should be faster now

Over the past few weeks, I have been noticing that the site I created to allow users to plot multiple addresses on Google Maps would sometimes become very slow when a lot of people were using it. So, I started to investigate the cause.

(I am going to split this post into two sections, the first aimed at non-programmers and the second at programmers, in case they are interested. I realize that most of the people who use my maps site probably don’t care how it works under the hood, so the first part is for them. On the other hand, I would like to share what I learned from this experience with other programmers, so they can read the whole post, focusing on the second half.)

Overview

The site uses a database to store the information about maps that people choose to save. I found that one of the database parameters was the most likely culprit for the slowness. Basically, it was optimized for a database that gets read from almost all the time; when data is written to it, though, the whole thing locks and doesn’t allow anyone else to read from it until that write finishes.

So, now that I had found the likely cause of the issue, I started investigating how to fix it. I found that there were two methods, so I just picked the one that seemed easier. A few hours after issuing the command to make this change, it still appeared to be running. Unsure whether something was wrong and the process had stalled, I cancelled and re-issued the command. This was probably my biggest mistake; I later realized that the process just takes a long time, and I should have let it run. I basically did this a couple of times, trying the other method to make this change as well. Of course, this site is something I do in my free time, so I had to do other things during this process too, such as go to work, sleep, and eat. In the end, once I realized that the process was working and I should just let it run, it took about 10.5 hours.

Long story short, the site is back up, and hopefully it will be faster from here on out.

Technical Details

The parameter that I was trying to change was the MySQL storage engine. Originally I had it set as MyISAM, most likely because that was the default when I created the database. MyISAM is extremely fast when you are doing just reads from a table, but when you do a write, it will lock the entire table, preventing any other threads from accessing it for reading or writing. I suspect this is what was causing the performance issues with the site.

So, I decided to change it to the InnoDB storage engine. InnoDB is still fast for reads, but not as fast as MyISAM. One of the big advantages of InnoDB, though, is that it uses row-level locking instead of table-level locking. So, if one request is adding a new address to the table, another request can be reading other addresses at the same time. (The row that is currently being written can’t be read, but it is very unlikely that someone would be trying to view a map that has not been fully created yet.)
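
If you are curious which engine your own tables are currently using, a query along these lines will tell you (this is just a sketch; ‘mapdata’ is a placeholder for the actual schema name):

SELECT table_name, engine FROM information_schema.tables WHERE table_schema = 'mapdata';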

First Approach

So, there are two main approaches to changing the database storage engine. The first one I tried, which seemed simpler at the time, was to issue the following command:

ALTER TABLE tablename ENGINE = InnoDB

I found this on Farhan Mashraqi’s site during my research, so I decided to try it first (after making a backup of the database, of course).
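
For anyone following along, mysqldump is the usual tool for that kind of backup; something along these lines (with ‘username’ and ‘databasename’ as placeholders) will produce the backup file that shows up again later in this post:

mysqldump -u username -p databasename > backupfile.sql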

I restored the backup on my laptop, and decided to give it a test run. I tried it on some of the smaller tables, and it seemed to work. Since I wanted to get the upgrade done quickly, and get on with my life, I didn’t try upgrading the whole database on my laptop, and just started doing it on the production server.

This was really my biggest mistake. I should have done the whole upgrade on my laptop, start to finish, before trying to change anything on the real site. (I don’t really know why I did this; I never would have been so sloppy with a similar task at work.)
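
One other practical note: the ALTER TABLE statement only converts one table at a time, so converting the whole database means running it once per table. If you want to generate all of the statements in one shot, a query along these lines should work (again just a sketch, with ‘mapdata’ standing in for the real schema name):

SELECT CONCAT('ALTER TABLE ', table_name, ' ENGINE = InnoDB;') FROM information_schema.tables WHERE table_schema = 'mapdata' AND engine = 'MyISAM';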

So, as I said, I started making the change on the real site. After a couple of hours, it still wasn’t finished, and I noticed that in the MySQL Administrator GUI, on the Catalogs tab, the value in the ‘rows’ column was jumping all over the place. It would go up and down, and change in multiple different tables. This didn’t seem to make any sense, and led me to believe that something was wrong, and that perhaps I should restart the process. As it turns out, that row count is just bogus, and shouldn’t be trusted. But I didn’t realize that at the time, so I restarted the process and went to bed.
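
As an aside, if you need a row count you can actually trust while something like this is running, the only reliable way I know of is to count directly, for example:

SELECT COUNT(*) FROM tablename;

(Keep in mind that on a large InnoDB table, that query can itself take a while, since it has to scan the rows.)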

Second Approach

In the morning, I checked on it, and it was still chugging along. I figured that something must be wrong, since it was taking so long and the row counts were still acting weirdly (I didn’t know they were bogus yet). So, I decided to try the other method of changing the storage engine, which is to take your backup file from mysqldump, change the parts where it says ‘MyISAM’ to ‘InnoDB’, and then restore it. I decided to do this with the sed command, since the backup file is rather large, and I wasn’t sure if vim would choke on it. The following command did the trick:

sed s/MyISAM/InnoDB/ backupfile.sql > updatedbackupfile.sql

Then, I just ran that SQL script. It also took forever. I thought that perhaps the issue was that my site runs on a virtual server (from slicehost, and overall I am very happy with their service), which means that disk I/O can sometimes be slow. It generally isn’t a problem, but it is one of the disadvantages of a virtualized server. Since what I was doing was very I/O intensive (reading a large file, doing a little bit of processing, and then writing the data out to the database, which resides on disk), I thought that perhaps this was the culprit. So, once again, I canceled the process.

On the next (and final) try, I did the same thing, except that the modified backup file I was trying to restore was sitting on my laptop, and I restored it over the network. This meant that the server, with its virtualized disks, was only responsible for writing the data to disk, not for reading it from disk; it read the data from the network. I also learned that the row count in the MySQL Administrator GUI was bogus, and I shouldn’t worry myself with it.
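
Restoring over the network just means pointing the mysql client on the laptop at the remote database server; something like the following, with placeholder host, user, and database names, is what I mean:

mysql -h db.example.com -u username -p databasename < updatedbackupfile.sql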

A little more than 10 hours later, the database is back up and running, and I removed the maintenance message from the site!

Lessons Learned

  • Always do something first in a test environment before touching the production environment.
  • Database actions that change entire large tables take a long time. Let them run.
  • Always do something first in a test environment before touching the production environment.
  • The row count in the MySQL Administrator GUI is bogus. Don’t even look at it.
  • Always do something first in a test environment before touching the production environment.