Friday, September 21, 2012

Gumball: A race condition prevention technique for cache augmented SQL Database Management systems - Developed by Shahram Ghandeharizadeh and Jason Yap

It is not uncommon for a system these days to increase its scalability by addition of cache servers(key-value stores,KVS) complementing its RDBMS systems. Most frequently accessed RDBMS records are stored in caching systems as key-value pairs. When there is an update to the record in database, its corresponding cache entry is invalidated and gets recomputed on next cache miss. KVS such as memcached are widely used by popular sites such as Facebook, Youtube, Wikipedia, etc.

On a high traffic site, it is highly likely to end up with inconsistent or stale data when that there could be undesirable race conditions between two cache update operations. The paper chooses a social networking website scenario as an example. Consider a time when Alice is retrieving her profile after her update to it while the administrator is deleting her profile for breach of terms and conditions. There could be a race on KVS between update operation(from profile updates of Alice) and delete operation(from delete profile action of administrator). Gumball [1] Technique(GT), developed by Shahram Ghandeharizadeh( and Jason Yap( at USC, tries to solves these race conditions.

GT understands that caching systems have taken the overall system too far on scalability spectrum ignoring consistency and thus introducing a new scale of data staleness(consistency) on the over all system. It also understands that this system's position on scalability spectrum is mostly rigid while it fluctuates on consistency spectrum depending on number of occurrences of race conditions. GT tries to flip these positions so that it is rigid and far up as possible on the consistency spectrum while trading off to small degree on scalability spectrum. It also ensures that its position on scalability spectrum adjusts automatically depending on number of race conditions.

GT introduces a new type of entry in the KVS called gumball entry that can be associated on a per key basis. These entries store the timestamp Tdelete when a delete operation happens on a key. Subsequent deletes on this key that has no value, continue to update the Tdelete metadata. This gumball entry sticks on to the KVS only for a small duration of time, Δ, after which it is dies. Get operations continue to function with a minor variation of reporting back a Tmiss whenever it encounters a cache miss. 

The purpose of gumball entry is to provide sufficient information to subsequent put operations on this key for them to be able to detect and act on race conditions. The put operation race detection test rejects the put operation when one of the following checks pass:

  1. If the difference between current time TC and Tmiss is greater than Δ, there is no way to tell for sure if there lived a gumball entry while the key-value pair was being recomputed on a cache miss. It takes a cautious outlook here and rejects the put operation
  2. When there is a gumball entry with timestamp Tdelete, it can detect a race condition if the cache miss Tmiss happened before the gumball was created at Tdelete when the key was deleted. This is a positive pass in detecting the race condition and the put is rejected.
  3. Similar to above condition, if it finds a value for the key, it detects a race if Tmiss is before the timestamp of existing value. This is a positive pass in detecting the race condition and the put is rejected.

Whenever a put is rejected, subsequent get operations observe a miss and are redirected to RDBMS. This ensures the system sees consistent data while trading to a small degree, its position on the scalability spectrum. This position depends on the value of Δ, the time duration for which a gumball sticks to KVS key. Ideally, this value is set to the average time duration it takes to recompute and begin put operation of updated key-value pairs(on encountering a cache miss). If this value is lower than ideal value, higher percentage of puts will be rejected and subsequent gets observe a cache miss, pulling the system's position down more vigourously on scalability spectrum; if this value is higher than ideal value, gumballs stick around for longer duration improving better quality results in race condition detection tests, impacting scalability to a smaller extent than the former case. To see a balance between the two, the paper suggests dynamic computation of Δ, where the system learns and adjusts by increasing Δ gradually(limited to a dynamically capped value based on maximum response time within the last few seconds) on each put reject and decreasing it on each overwrite of gumball entry with actual value. This way the system's position on scalability spectrum is adjusted based on the number of occurrences of race conditions.

The paper describes the benchmark evaluation experiments that measure the amount of stale data elimited by Gumball Technique(GT), the impact of GT on system performance(while managing gumball entries and going through race condition check on each put),  and how quickly the technique learns about the workload and adjusts Δ. In their specially setup experiment(so that RDBMS query computations do not shadow the overhead of GT), it was found that there was 300 fold reduction in stale data, GT's impact on system performance was negligible in their experiments that encountered race conditions less than 3% of the time and GT always adjusted the value of Δ quickly for varying workloads ensuring it became cautious and maintains consistency.

GT is a simple technique that acknowledges that while caching system increase scalability, it adds an overhead of maintaining consistency in data between KVS and RDBMS. The technique is simple to implement and requires no specific feature from RDBMS or KVS, thus translating to better rate of returns via increased reduction in stale data. It also works well in a replicated setup of caching servers when there are is improper ordering of cache operations only on a few of the servers, hence it has no clock synchronization requirements. 

The paper on ACM is here and a detailed lab report is here


[1]  S. Ghandeharizadeh and J. Yap., Gumball: A Race Condition Prevention Technique for Cache Augmented SQL Database Management Systems., In the Second ACM SIGMOD Workshop on Databases and Social Networks (DBSocial), Scottsdale, Arizona, May 2012.

P.S. 1. Please note that Gumball Technique has a patent application. The patent reference for Gumball is:
US Provisional Patent Application No. 61/669,257.
Race Condition Technique That Prevents Caches From Becoming Inconsistent With The Database.S. Ghandeharizadeh and J.Yap.
Issued August 22, 2012.

P.S. 2. I am on the look out for a project using memcached that is willing to experiment and tweak its caching system by introduction of Gumball technique in an offline manner and then roll it out to production if the results are satisfactory. Please direct message me on twitter @abineshtd if you are interested.

No comments:

Post a Comment