It has been a while since I last wanted to write something about Urbanesia. Technology-wise, it was one of my best products. I learned a lot of techniques from my colleagues, especially from Andri Burman. Without further ado, here are a few important points you need to know about Urbanesia.
Squid as Reverse Proxy
By putting Squid in front of your web server, you can make a single server look like it has four times its capacity! It is a must for any website that expects heavy traffic. Urbanesia was still running on one server while it received millions of pageviews per month and ranked around 280th in Alexa Indonesia.
Basically, set Squid up as a reverse proxy as described in the Squid manual, and send two headers from your application:
- Vary: Accept-Encoding, so Squid keeps two versions of each document, gzipped and raw;
- Cache-Control: must-revalidate, max-age=0, s-maxage=600, which lets Squid (a shared cache) keep the page for 600 seconds while browsers always revalidate.
But the most important part is the cookie technique.
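A minimal sketch of sending the two headers listed above from PHP. The helper name squidCacheHeaders() is hypothetical; only the header values come from the setup described here.

```php
<?php
// Hypothetical helper returning the two cache headers for Squid.
// The default of 600 seconds matches s-maxage=600.
function squidCacheHeaders(int $sharedMaxAge = 600): array
{
    return [
        // Keep separate cached copies for gzipped and raw responses.
        'Vary: Accept-Encoding',
        // Browsers must revalidate every time (max-age=0), while shared caches
        // such as Squid may reuse the page for $sharedMaxAge seconds (s-maxage).
        "Cache-Control: must-revalidate, max-age=0, s-maxage={$sharedMaxAge}",
    ];
}

// Headers must go out before any page output.
if (!headers_sent()) {
    foreach (squidCacheHeaders() as $line) {
        header($line);
    }
}
```

Pages that change often can pass a smaller value, e.g. squidCacheHeaders(60).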
For logged-in users, create this cookie:
// LOGGED_IN=Y for 7 days (604800 seconds), on every *.urbanesia.com host
setcookie('LOGGED_IN', 'Y', $_SERVER['REQUEST_TIME'] + 604800, '/', '.urbanesia.com');
Then add this to squid.conf:
acl cookie_logged_in_set rep_header Set-Cookie LOGGED_IN=Y
cache deny cookie_logged_in_set
acl cookie_logged_in_out rep_header Cookie LOGGED_IN=Y
cache deny cookie_logged_in_out
acl cookie_logged_in req_header Cookie LOGGED_IN=Y
cache deny cookie_logged_in
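To make the effect of these rules concrete, here is an illustration only: the three acl/cache deny pairs above restated as a PHP function. Squid does the real matching itself; squidMayCache() is a hypothetical name, and it assumes header names arrive with the capitalization shown (Squid matches them case-insensitively).

```php
<?php
// Mirrors the three acl lines as written: any LOGGED_IN=Y match means "cache deny".
function squidMayCache(array $requestHeaders, array $responseHeaders): bool
{
    $needle = 'LOGGED_IN=Y';
    $checks = [
        $requestHeaders['Cookie'] ?? '',      // acl cookie_logged_in (req_header Cookie)
        $responseHeaders['Set-Cookie'] ?? '', // acl cookie_logged_in_set (rep_header Set-Cookie)
        $responseHeaders['Cookie'] ?? '',     // acl cookie_logged_in_out (rep_header Cookie)
    ];
    foreach ($checks as $value) {
        if (str_contains($value, $needle)) {
            return false; // a "cache deny" rule matched
        }
    }
    return true; // no logged-in marker: the page may be cached
}
```

Anonymous visitors and crawlers never carry the cookie, so for them the function always returns true.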
Don’t forget to remove the cookie when the user logs out.
This is a real lifesaver. Squid always serves the cached page when this cookie doesn’t exist, so crawlers and bots, which never log in, always receive Squid’s cache. Some badly behaved crawlers and bots ignore caching headers, but they still get the cached copy. Since 92% of Urbanesia’s traffic is organic, we can confidently say that those 92% of visits reach Apache only when a page’s cache has to be generated, and Squid handles the rest.
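Setting and removing the cookie are easy to get out of sync, so one option is to build both from a single helper. This is a sketch; loggedInCookieArgs(), markLoggedIn() and markLoggedOut() are hypothetical names, while the cookie name, lifetime, path and domain follow the setcookie() call above. A cookie is only deleted if the path and domain match the ones used to set it.

```php
<?php
// Builds the positional setcookie() arguments for the LOGGED_IN flag.
function loggedInCookieArgs(bool $loggedIn, int $now): array
{
    return [
        'LOGGED_IN',
        $loggedIn ? 'Y' : '',
        // Logged in: valid for 7 days. Logged out: an expiry in the past tells
        // the browser to drop the cookie, so Squid serves cached pages again.
        $loggedIn ? $now + 604800 : $now - 3600,
        '/',              // same path as when it was set
        '.urbanesia.com', // same domain as when it was set
    ];
}

function markLoggedIn(): void
{
    setcookie(...loggedInCookieArgs(true, time()));
}

function markLoggedOut(): void
{
    setcookie(...loggedInCookieArgs(false, time()));
}
```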
Memcached
The next one is Memcache. Don’t know about it? Please Google it. This is the most important knowledge you’ll ever get. Basically, Memcache stores keys and values in memory. That’s it! When its memory quota is full, it evicts items based on LRU (Least Recently Used), which means the keys that haven’t been used for the longest time are removed first. The good thing about Memcache is that every key can have an expiry time, so when it is time for a key and its value to expire, they simply disappear from Memcache.
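To illustrate the two behaviours described above, LRU eviction and expiry, here is a toy model in PHP 8. It is not how Memcached is implemented internally; TinyLruCache is a hypothetical class, and time is passed in explicitly so the behaviour is easy to follow.

```php
<?php
// Toy model of LRU eviction plus expiry (not Memcached's real internals).
final class TinyLruCache
{
    /** key => [value, expiresAt]; array order = recency, oldest first */
    private array $items = [];

    public function __construct(private int $capacity) {}

    public function set(string $key, mixed $value, int $ttl, int $now): void
    {
        unset($this->items[$key]);              // re-insert at the "most recent" end
        $this->items[$key] = [$value, $now + $ttl];
        if (count($this->items) > $this->capacity) {
            // Memory "full": evict the least recently used key.
            unset($this->items[array_key_first($this->items)]);
        }
    }

    public function get(string $key, int $now): mixed
    {
        if (!isset($this->items[$key])) {
            return false;                       // miss
        }
        [$value, $expiresAt] = $this->items[$key];
        unset($this->items[$key]);
        if ($now >= $expiresAt) {
            return false;                       // expired keys simply disappear
        }
        $this->items[$key] = [$value, $expiresAt]; // mark as most recently used
        return $value;
    }
}
```

With a capacity of 2, writing a third key evicts whichever of the first two was read least recently.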
Basically, I created a memcache model that replaces $this->db->query() with $this->mdb->result(). When I run a query, it first checks Memcache for a key (the query is converted to a hash key, basically md5($sql)). If the key exists, it returns the cached value. If the key doesn’t exist, it queries MySQL, stores the result in Memcache with the defined expiry time, and returns it. So the next time the same query runs, it doesn’t go to MySQL anymore.
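The read-through flow above can be sketched like this in PHP 8. FakeMemcache is a hypothetical in-memory stand-in whose get() returns false on a miss (as Memcache::get() does), but with a simplified set() signature; CachedDb and the $runQuery callback (which would really execute the SQL on MySQL) are also hypothetical, not the original model.

```php
<?php
// Hypothetical stand-in for a Memcache client; get() returns false on a miss.
final class FakeMemcache
{
    private array $items = [];

    public function get(string $key): mixed
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, mixed $value, int $ttl): bool
    {
        $this->items[$key] = $value; // expiry is ignored in this stand-in
        return true;
    }
}

// Read-through query cache in the spirit of $this->mdb->result().
final class CachedDb
{
    public int $mysqlHits = 0; // how many queries actually reached "MySQL"

    public function __construct(
        private FakeMemcache $cache,
        private \Closure $runQuery // hypothetical: executes the SQL, returns rows
    ) {}

    public function result(string $sql, int $ttl = 60): mixed
    {
        $key = 'sql_' . md5($sql);            // the query text becomes the key
        $cached = $this->cache->get($key);
        if ($cached !== false) {
            return $cached;                   // hit: MySQL is not touched
        }
        $this->mysqlHits++;
        $rows = ($this->runQuery)($sql);      // miss: ask MySQL...
        $this->cache->set($key, $rows, $ttl); // ...and keep the result for $ttl seconds
        return $rows;
    }
}

// Usage: $db = new CachedDb(new FakeMemcache(), fn (string $sql): array => runOnMysql($sql));
```

A query result that is itself false would always look like a miss here; the falsify/defalsify tip below deals with exactly that.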
My tips for Memcache:
- Create a falsify/defalsify technique. The problem with Memcache is that you cannot tell a ‘false’ value stored under a key apart from the false returned when the key is not found. So when I wanted to store false, I stored '' instead, and when I received '', I returned boolean false. I got this technique from Adam Gotterer of CollegeHumor.com.
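- The maximum expiry time is 30 days. Memcached treats any expiry value larger than 30 days (2,592,000 seconds) as an absolute Unix timestamp rather than a number of seconds from now, so a longer relative value is read as a moment in 1970 and the key expires immediately. So I check the expiry time first, and when it is over 30 days, I change it to 30 days.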
- Create namespaces to delete multiple keys automagically. In Memcache you HAVE to know the exact key to get its value or delete it. Using namespaces, I can delete multiple keys at once. For instance, I create the keys business_id#1#attribute#10, business_id#1#attribute#11 and business_id#2#attribute#12. When I call delete('business_id#1#*') on my model, it deletes both attribute 10 and 11. When I call delete('business_id#*'), it deletes all three keys. The namespaces are stored in Memcache as arrays.
- It’s better to cache a MySQL query result for 1 minute than to run the query on every request. Imagine you are hit by a DDoS and receive thousands of requests per minute: the query goes to MySQL only once per minute! Also use the dogpile-prevention technique explained by Adam Gotterer. Otherwise, when a key expires or is lost, every request that arrives between the first MySQL query and the moment the new Memcache key is stored also goes to MySQL, because the key is not ready yet.
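The first three tips can be combined in one wrapper. This is a sketch in PHP 8 under stated assumptions: FakeMemcache is a hypothetical in-memory stand-in (get() returns false on a miss), CacheTips is a hypothetical class rather than the original model, and the namespace index follows the description above by storing an array of key names per prefix. The dogpile technique is not shown.

```php
<?php
// Hypothetical in-memory stand-in for Memcache (get() returns false on a miss).
final class FakeMemcache
{
    private array $items = [];
    public function get(string $key): mixed { return $this->items[$key] ?? false; }
    public function set(string $key, mixed $value, int $ttl): bool { $this->items[$key] = $value; return true; }
    public function delete(string $key): bool { unset($this->items[$key]); return true; }
}

final class CacheTips
{
    // Memcached reads relative expiry values above 30 days as Unix timestamps.
    private const MAX_TTL = 2592000; // 60 * 60 * 24 * 30

    public function __construct(private FakeMemcache $mc) {}

    // Tip: clamp expiry times to 30 days.
    public static function clampTtl(int $ttl): int
    {
        return min($ttl, self::MAX_TTL);
    }

    // Tip: "falsify" -- store boolean false as '' so it differs from a miss.
    public function set(string $key, mixed $value, int $ttl): bool
    {
        $ok = $this->mc->set($key, $value === false ? '' : $value, self::clampTtl($ttl));
        $this->register($key);
        return $ok;
    }

    // Returns [found, value]; a stored '' is "defalsified" back into false.
    // Trade-off: a genuine empty string also comes back as false.
    public function get(string $key): array
    {
        $raw = $this->mc->get($key);
        if ($raw === false) {
            return [false, null]; // a real miss
        }
        return [true, $raw === '' ? false : $raw];
    }

    // Tip: namespaces. business_id#1#attribute#10 is recorded in the index
    // arrays 'business_id#*', 'business_id#1#*' and 'business_id#1#attribute#*'.
    private function register(string $key): void
    {
        $parts = explode('#', $key);
        for ($i = 1; $i < count($parts); $i++) {
            $ns = implode('#', array_slice($parts, 0, $i)) . '#*';
            $index = $this->mc->get($ns);
            $index = is_array($index) ? $index : [];
            $index[$key] = true;
            $this->mc->set($ns, $index, self::MAX_TTL);
        }
    }

    // Deletes every key recorded under $ns; returns how many were listed.
    // Stale entries in other indexes are harmless: deleting a missing key is a no-op.
    public function deleteNamespace(string $ns): int
    {
        $index = $this->mc->get($ns);
        if (!is_array($index)) {
            return 0;
        }
        foreach (array_keys($index) as $key) {
            $this->mc->delete((string) $key);
        }
        $this->mc->delete($ns);
        return count($index);
    }
}
```

With the three example keys stored, deleteNamespace('business_id#1#*') removes attributes 10 and 11, and deleteNamespace('business_id#*') then removes whatever is left.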
What’s the impact? Most of the time, logged-in users (who bypass Squid) never touched MySQL when they opened the Urbanesia landing page, and the landing page averaged 70 ms per request. You turn 20-200 ms of MySQL query time and CPU into a Memcache request of less than 1 ms. The good part is that Memcache scales easily: one addServer() call and you are using an additional Memcache server.
That’s why Urbanesia’s server had 8 GB of memory (later upgraded to 16 GB): memory is cheap, but CPU is not!
Cache your PHP code with eAccelerator
PHP first compiles your source code into bytecode (opcodes) and then executes it to produce the page sent to the browser. The problem is that, by default, PHP repeats this compilation on every request. eAccelerator caches the compiled bytecode, so later requests skip the compile step and run much faster (up to 2x the performance). The installation is very easy. My tip: let eAccelerator cache your PHP scripts in memory. If I am not mistaken, it can also do a hybrid cache, in memory and on disk when it runs out of its memory quota.
Other techniques
We also implemented several techniques to optimize the browsing experience. For instance:
- Put the JavaScript code at the bottom, just before the closing body tag. This way, users experience a fast website. Why? Because when the browser meets a regular script tag, it stops parsing the HTML until that script has been downloaded and executed, since the script might change the DOM (for example with document.write()).
- Because the JavaScript is at the bottom, we use the Hijax technique (hijack + Ajax). If the JavaScript hasn’t loaded yet and someone clicks a link with Ajax behaviour, the link works normally and goes to its page (for instance, clicking the sign-in link while the JavaScript is still loading takes you to the /signin page). Once the JavaScript is loaded and the binding is complete, the click runs the JavaScript instead (for instance, clicking the sign-in link shows a pop-up window). This way Googlebot can still crawl your website without JavaScript (though Googlebot now runs JavaScript too). More about Hijax here.
- Minify and gzip your JavaScript and CSS as much as you can. I used Minify from Google Code and stored its cache in memory using Memcache.
- Create special subdomains for images, JS and CSS. Set their expiry time to one year with mod_expires, since images, JS and CSS rarely change, and you get persistent connections to those hosts too. Urbanesia used static-10.urbanesia.com and static-20.urbanesia.com: browsers limit the number of parallel connections per hostname, so two hostnames let the assets download a lot faster.
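The server-side halves of these techniques can be sketched in PHP 8 as follows. All function names are hypothetical. The Hijax part assumes the client-side script marks its Ajax calls with the X-Requested-With: XMLHttpRequest header (jQuery does this for same-origin requests); the client-side binding itself is JavaScript and not shown. The gzip part is a plain zlib sketch, not Minify’s API, and the static-host picker is one way to spread files over the two hosts while keeping each file on a fixed host.

```php
<?php
// Hijax, server side: detect an Ajax call by its X-Requested-With header.
function isAjaxRequest(array $server): bool
{
    return strtolower($server['HTTP_X_REQUESTED_WITH'] ?? '') === 'xmlhttprequest';
}

// Hypothetical /signin handler: a full page for a plain click, only the
// form fragment for the pop-up once the JavaScript binding is in place.
function renderSignin(array $server): string
{
    $form = '<form action="/signin" method="post"><!-- fields --></form>';
    if (isAjaxRequest($server)) {
        return $form;
    }
    return '<!DOCTYPE html><html><body>' . $form . '</body></html>';
}

// Gzip a minified asset when the client accepts it (needs the zlib extension).
// This crude check ignores q-values such as "gzip;q=0".
function encodeAsset(string $minified, string $acceptEncoding): array
{
    $headers = ['Vary: Accept-Encoding']; // caches keep gzipped and raw copies apart
    if (stripos($acceptEncoding, 'gzip') !== false) {
        $headers[] = 'Content-Encoding: gzip';
        return [gzencode($minified, 9), $headers];
    }
    return [$minified, $headers];
}

// Pick a static host deterministically, so a given file is always requested
// from the same host and stays in the browser cache.
function staticUrl(string $path, array $hosts = ['static-10.urbanesia.com', 'static-20.urbanesia.com']): string
{
    $path = ltrim($path, '/');
    $index = (crc32($path) & 0xFFFF) % count($hosts); // mask keeps it non-negative
    return '//' . $hosts[$index] . '/' . $path;
}

// In a controller: echo renderSignin($_SERVER);
```

Hashing the path, rather than alternating hosts per page view, matters because a file fetched from a different host is a different URL and would be downloaded again despite the one-year expiry.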