Sunday, February 24, 2008

Custom Caching Dynamic Pages For Better Performance


Over a year ago I decided to make my community site homepage a static page. It combined a high number of complicated queries, and hence put quite a load on the database every time it was called. Since the homepage usually updated with new content every 5-10 minutes, I made the call to have a static version of it generated via CRON every 5 minutes.
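
The setup for that is basically a one-liner in cron: run the page-building script and swap its output into place. The paths and script names here are just for illustration, not my actual ones; the temp-file-then-move dance keeps visitors from ever seeing a half-written file:

    # crontab entry: rebuild the static homepage every 5 minutes
    */5 * * * * /var/www/cgi-bin/homepage.pl > /tmp/index.new && mv /tmp/index.new /var/www/html/index.html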

The result was great: it took a huge load off the database server and allowed our homepage to stay "up" even when the database went down.

Building on that success, I had long been toying with the idea of caching some of my inner pages. One script in particular is our most popular, serving over 30,000 pages of content and accounting for over 50% of our total pageviews.

The main problem I faced was that these pages could change quite often, with user comments and the like being added. Having a CRON job create 30,000+ static pages every hour just didn't seem to make much sense when I was trying to save on performance... ;-)

I decided on a middle-of-the-road solution. I realized that I could capture the output from that page and store it in a "cache" table in my database, turning a page that was doing 20 queries into a page doing a single query to retrieve the cached copy.
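
The cache table itself only needs a page ID, the stored HTML, and a dirty flag. Something along these lines would do; the table and column names here are just for illustration:

    use DBI;

    # Connect to MySQL (swap in your own database name and credentials).
    my $dbh = DBI->connect('dbi:mysql:mysite', 'user', 'pass',
                           { RaiseError => 1 });

    # One row per cached page: the rendered HTML plus a dirty bit.
    $dbh->do(q{
        CREATE TABLE page_cache (
            page_id INT UNSIGNED NOT NULL PRIMARY KEY,
            html    MEDIUMTEXT   NOT NULL,
            dirty   TINYINT(1)   NOT NULL DEFAULT 0
        )
    });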

To set it all up, I had to learn some new tricks involving redirecting Perl/CGI output to a variable and then back to standard output. Once I got that figured out, I didn't have to change my existing script much at all. I basically added some code to check the cache table at the beginning of the script. If it found a cached copy that was clean (i.e., not dirty), it would serve it up. If it found a dirty cached copy (dirty bit set), I would let the script run all 20 queries, update the cache, and serve the new updated page.
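
The redirection trick boils down to the fact that Perl 5.8 can open a filehandle on a scalar, and select() can make it the default output handle, so every print in the existing page code lands in a variable instead of going straight to the browser. A bare-bones illustration:

    # Capture everything the page prints into $buffer (Perl 5.8+).
    my $buffer = '';
    open(my $capture, '>', \$buffer) or die "can't open in-memory handle: $!";
    my $old_fh = select($capture);   # print now writes to $buffer

    # ... the existing page code runs here, unchanged ...
    print "<html>...page content...</html>";

    select($old_fh);                 # restore standard output
    close($capture);

    print $buffer;                   # serve the page; $buffer can also be cached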

In one script I could handle serving the live dynamic page, serving the cached page, and updating the cache. And it really only took a few lines of code at the top of the script, plus some at the bottom to save the updated cache copy.
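
Put together, the top and bottom of the script look roughly like this. This is a sketch, not my exact code: build_page() stands in for the original 20 queries, and the table layout is the illustrative one from above.

    use strict;
    use DBI;
    use CGI;

    my $q   = CGI->new;
    my $id  = int($q->param('id') || 0);
    my $dbh = DBI->connect('dbi:mysql:mysite', 'user', 'pass',
                           { RaiseError => 1 });

    # Top of script: a single query looks for a clean cached copy.
    my ($html, $dirty) = $dbh->selectrow_array(
        'SELECT html, dirty FROM page_cache WHERE page_id = ?', undef, $id);

    if (defined $html && !$dirty) {
        print $q->header, $html;    # cache hit: serve it and we're done
        exit;
    }

    # Cache miss or dirty copy: run the page as usual, capturing its output.
    my $buffer = build_page($id);

    # Bottom of script: serve the fresh page and refresh the cache row.
    print $q->header, $buffer;
    $dbh->do('REPLACE INTO page_cache (page_id, html, dirty) VALUES (?, ?, 0)',
             undef, $id, $buffer);

    sub build_page {
        my ($page_id) = @_;
        my $out = '';
        open(my $capture, '>', \$out) or die $!;
        my $old_fh = select($capture);
        # ... the original 20 queries and print statements go here ...
        select($old_fh);
        close($capture);
        return $out;
    }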

The next step was to figure out what made the cache dirty. I went through and reviewed every action that could change the content of that page. In each of those places I put a small piece of code to mark the cache "dirty" when the change took place.
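
That marker is a one-line UPDATE against the same table. Wrapped in a little helper (again using my illustrative table name), it can be dropped into the comment-posting script and anywhere else the content changes:

    # Flag a page's cached copy as stale; the next request rebuilds it.
    sub mark_cache_dirty {
        my ($dbh, $page_id) = @_;
        $dbh->do('UPDATE page_cache SET dirty = 1 WHERE page_id = ?',
                 undef, $page_id);
    }

    # For example, right after a new comment is saved:
    mark_cache_dirty($dbh, $page_id);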

After a cached page was marked dirty, the next time it was called it would automatically update itself.

One last decision was to bypass the cache if the user was logged in. We have a fairly active user community, but they only make up about 10% of our active visitors at any given time. The other 90% of the traffic is lurkers, folks just browsing content who are not logged in.

We decided to only use the cached copy for the non-authenticated users (and robots). This allowed us to continue to offer all our custom user-based features on those pages, and still save a ton of database processing time.
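
In practice that just means wrapping the cache lookup in a login check. How you detect a login depends on your setup; here I'm assuming a simple session cookie:

    # Only consult the cache for anonymous visitors.
    my $logged_in = defined $q->cookie('session_id');

    if (!$logged_in) {
        my ($html, $dirty) = $dbh->selectrow_array(
            'SELECT html, dirty FROM page_cache WHERE page_id = ?', undef, $id);
        if (defined $html && !$dirty) {
            print $q->header, $html;
            exit;
        }
    }
    # Logged-in users always fall through to the live, personalized page.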

We have been monitoring it for the last few weeks. So far so good. Site performance is way up, and the database load is way down. Most importantly, the end users never even noticed a change was made.

Another benefit: if one of the pages gets "dug" or "stumbled", that page would normally get hammered by anonymous visitors and slow the server down for everyone. Not anymore. Now the first person to access the page (if it is new) triggers the cache copy to be saved. After that, everyone else is pulling the page from our cache, and our main tables are freed up for the regular users to continue on without noticing a slowdown.

I would highly recommend trying something like this if you have a single-server setup with dynamic content like I do, and you get hit pretty hard with traffic. It took me less than a day to get it all set up and working. Once I turned it on, the cache basically built itself over the next few days, caching all new pages as they were served up.

If you have any questions, by all means drop me a line or leave a comment. Hope this helps someone out there! ;-)
