Boost and Varnish and general tips and rants about Drupal caching

This topic has always been a big pain in the ass for me. Drupal takes a lot of resources, which means money for hosting. You should have at least a VPS to get decent hosting for Drupal, and if you don't want OOMing and high CPU usage all the time, you should put some caching system in front of it.

Boost

Boost is a nice system that basically creates static pages of all your content (or some of it, if you use filters) and serves them as if they were plain HTML pages. This is pretty fast, and no DB queries are run when visitors come, except when the first visitor requests a page and it has to be pulled from the DB to create the static copy (this also happens when a page expires, so you get a refreshed page). In Drupal 6 you had a big admin page with lots of different options for Boost, but in Drupal 7 this has been cut down to just a few, and best practices/defaults are used for the rest. You can't mess this up much. Just make sure the "cache" directory has write permissions set and that Drupal can write to it. Using 777 and www-data www-data (user, group) will work for sure, but a stricter, more secure setting should be used here. Also note that in Drupal 6 you could, for some time, have Drupal change your .htaccess to work with Boost; in D7 you need to do that yourself.

That is it. The only problem you can have here is if you want some dynamic pages, comments, forms, etc. But this also works nicely now, and you can always use a third-party service like Disqus for comments, which handles everything and is JS based, so it works and refreshes without your Boost page having to be refreshed.
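To double-check the permissions on the Boost "cache" directory, something like the following could help. It is only a sketch: the function name is made up for this post, the path you pass in is whatever your own cache directory is, and the result is only meaningful if you run it as the same user your web server runs as (for example www-data).

```python
import os
import stat


def describe_cache_dir(path):
    """Report whether the current user can write to `path` and how open it is.

    Hypothetical helper, not part of Boost or Drupal.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return {
        # Creating files in a directory needs both write and execute bits.
        "writable": os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK),
        "mode": oct(mode),
        # True for modes such as 777, which let every user on the box write here.
        "world_writable": bool(mode & stat.S_IWOTH),
    }
```

If "writable" is true and "world_writable" is false, you have the directory working without falling back to 777.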

BTW, all of this talk is about serving anonymous users, which is what most sites do. So bear this in mind.

Varnish

Varnish is a different beast. It is a caching reverse proxy, meaning cached pages are served without running PHP or Apache, so your server works less and its resources can be used for something else, like serving logged-in users, who pull data directly from the DB. In most cases that could be just your admin user, but hey, quicker administration is a big plus, especially if you are upgrading, changing or adding something on your Drupal site.

So, to install Varnish there are a bunch of tutorials out there. I won't get into that here, but you need to do it on your server; there is not much to be done in Drupal for the install itself.
I did this and had Varnish running for some time without realizing that it was not working properly on my sites and not caching things. Why? Because you do have to set some things in Drupal as well. I accidentally bumped into this site, http://www.isvarnishworking.com/, which is sort of a blessing, as it tells you whether Varnish caching is working on your site and, if not, why. The first message I got was "Yes, Sort of", meaning Varnish was up and running but not really caching data. Why? Because the page headers contained "Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0", which basically tells Varnish not to cache these pages. And why was that header there? Because of what was set on the performance page of Drupal. Usually you enable caching there if you want to use the Drupal core (caching in DB) option, and if you use Boost you usually disable it, but with Varnish you MUST enable it, together with the settings below it that say when pages should expire (min and max time). This changes the header above to a specific lifetime, so you get something like "Cache-Control: public, max-age=86400", and the site above will confirm that Varnish is working. If you need more advanced expire times and cache control headers, check out https://drupal.org/project/cache_control, and there are also the https://drupal.org/project/expire, https://drupal.org/project/purge and https://drupal.org/project/varnish modules for Drupal, which can all help with specific actions for when and how data gets refreshed.
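If you would rather check the header yourself, here is a rough sketch of the same kind of test, run against a Cache-Control value you copied from the response headers (for example from `curl -I`). The function names are made up for this post, and the rule is only an approximation: what Varnish really does depends on its version and your VCL.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def looks_cacheable(value):
    """Rough guess: would a shared cache such as Varnish likely keep this page?

    Assumption: the page is treated as cacheable only if nothing forbids
    caching and a positive s-maxage or max-age is present.
    """
    d = parse_cache_control(value)
    if "no-cache" in d or "no-store" in d or "private" in d:
        return False
    ttl = d.get("s-maxage") or d.get("max-age")
    return ttl is not None and ttl.isdigit() and int(ttl) > 0
```

With the two headers quoted above, the "no-cache, must-revalidate, ..." value comes out as not cacheable and "public, max-age=86400" comes out as cacheable.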

Some notes
When checking your headers you may see "Expires: Sun, 19 Nov 1978 05:00:00 GMT", which is set in the past. This is not a problem, as the max-age in Cache-Control overrides it; the date is just another bit of self-promotion, being the birthday of Drupal's founder.
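The "max-age wins over Expires" rule can be sketched like this. It is a simplified illustration of how HTTP caches work out how long a response stays fresh, not Varnish's actual code; the function name and the sample Date value are made up, and it ignores s-maxage and other details.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a response may be served from cache, in simplified form.

    `headers` is a plain dict of response headers (hypothetical input shape).
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age is present, so Expires is not consulted at all.
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        # An Expires date in the past means "already stale", i.e. zero.
        return max(0, int((expires - date).total_seconds()))
    return 0


sample = {
    "Date": "Mon, 06 Jan 2014 10:00:00 GMT",  # made-up request time
    "Expires": "Sun, 19 Nov 1978 05:00:00 GMT",
    "Cache-Control": "public, max-age=86400",
}
print(freshness_lifetime(sample))  # prints 86400
```

Drop the Cache-Control entry from the sample and the same function returns 0, which is why the 1978 date only hurts when no max-age is sent.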

There is also an article about using both Boost and Varnish, http://www.trellon.com/content/blog/boosted-varnish-high-performance-ca…
but I don't see much use in this. Boost could only help you avoid hitting the database, so that Varnish loads its data from a pregenerated file instead. But guess what: whenever that page needs to be refreshed, Boost has to hit the DB anyway, so what is the gain? Maybe there is some odd scenario where you set Boost to refresh every 24 hours and Varnish to refresh every hour, but I don't see a gain there either. In the end you can just set Varnish to refresh every 24 hours and that is all.

Varnish will have problems if a cookie is set for anonymous users: it will not cache the page. There are reports that some modules do that, see http://2bits.com/articles/beware-drupal-modules-disable-page-cache.html and http://akyl.net/modules-break-varnish-caching-and-how-fix-them, but this is mostly being fixed now. For example, if you use a captcha, just install the recaptcha module; it loads everything with JS and your page can still be cached.
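To spot a module that sets cookies for anonymous visitors, you can look at the Set-Cookie headers of a page fetched while logged out. A small sketch, where the function name and the sample headers are made up for illustration:

```python
def cookies_set(headers):
    """Return the names of cookies a response tries to set.

    `headers` is a list of (name, value) pairs, because Set-Cookie can
    appear more than once in a single response.
    """
    names = []
    for name, value in headers:
        if name.lower() == "set-cookie":
            # The cookie itself is the first "name=value" before any attributes.
            pair = value.split(";", 1)[0]
            names.append(pair.split("=", 1)[0].strip())
    return names


# Hypothetical response headers for an anonymous page view.
example = [
    ("Content-Type", "text/html; charset=utf-8"),
    ("Set-Cookie", "SESSabc123=xyz; path=/; HttpOnly"),
    ("Set-Cookie", "has_js=1; path=/"),
]
print(cookies_set(example))  # prints ['SESSabc123', 'has_js']
```

An empty list for an anonymous request is what you want to see; any name that shows up points at something to track down.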