Key Value Pair Performance Awareness In C#


Traversing a long and distinguished career in various roles in software development it is common to reuse several "killer" patterns or structures over and over again as their application and benefits has been proven multiple times over. One of those "killer" patterns involve the use of various key/value pair data structures that are available within the .NET framework, each having their own personality and use case. While there are many benefits of using these in-memory data stores we have seen the most benefit in using a key/value pair as a dynamic caching mechanism that can significantly reduce the cost of accessing a database to obtain values that often times don't see much change or volatility, or in a simpler case, accessing an expensive item of value and retaining for quick access later.

Key/Value Defined

A key value pair is a data structure that consists of two parts: a key and the value associated with it. A key is usually an identifier for an entity in your application, while a value might be any type of data. The key and value together make up a single piece of information in the database system.

Key/Value Types

As previously mentioned, several different flavors of key/value data structures are available within the .NET framework with each having subtle differences that can play a role in the performance of your solution. Perhaps a quick overview of the data structures that we'll be comparing later:
  • Hashtable: a non-generic collection of key/value pairs that optimizes lookups by computing the hash code of each key and stores it in a different bucket internally and then matches the hash code of the specified key at the time of accessing the values.
  • HybridDictionary: a non-generic collection that attempts to optimize the Hashtable by implementing a linked list in combination of a Hashtable internally. When the volume of entries is small a ListDictionary is used, once the volume exceeds a certain threshold then a Hashtable is dynamically switched.
  • SortedDictionary: belonging to the generic collection variety that applies a sort on the key upon entry, the keys are immutable, always unique, and cannot be null, while providing the fastest insertion and removal operations for unsorted data.
  • ConcurrentDictionary: belonging to the generic collection variety but offering a thread-safety that can be used with multiple threads concurrently.
  • Dictionary: belonging to the generic collection variety and thus requiring the same types of key/value types.
  • List: not belonging to a key/value option, but a collection that can have its contents queried via LINQ (Language-Integrated Query).

Sample Use Case

A tax collection process needs to obtain the applicable sales and use taxes for all assets that were placed in service over the last year in order to bill the taxpayer. The data consists of 250,000 records containing 75 distinct counties within the state of Texas. Each county maintains its own facility to obtain the county tax rate, thus imagine a remote procedure call (RPC) to lookup the tax rate for the consumer taking approximately 5 seconds to return a result. If we were to not implement any added "store and retrieve" capability to the solution, we would see that 5 second lookup function being performed 250,000 times at an elapsed time of approximately 347 hours. 

Implementing a smarter lookup and essentially a cache mechanism we could only lookup tax rates for the counties that we haven't already obtained and theoretically see the previous 347 hours reduced to an around 6.5 minutes. Once again, the "killer" pattern to the rescue, the only question remains what are some performance differences among the key/value pair options, and which one would be best selected for the aforementioned use case?

Key/Value Test Preparation

While we have established a use case for a key/value pair data structure, we now need to take a look at how different key/value types perform when tasked with looking up different volumes of elements in the collection. In order to load up the key/value structures we will use a GUID (globally unique identifier) as key to prevent any benefits of a numerical sort being added into the equation, and a value being a simple padded string such as '0000001'. We can now just load the key/value pair collections with a simple loop, starting with an initial count of 500,000.




Additionally, we would like select keys that are having an ordinal position that is between the top 25% and bottom 25% to prevent any skewed access times for those values that might be closer to the top and benefit from any internal sequential lookup.




Key/Value Test Execution

Having established the test data, we can now perform some lookups over each key/value pair collection using the randomly selected elements to determine any performance patterns. In order to get a good representation of various scenarios we used multiple keys to spread the ordinal positions of the keys around in the collection to provide more of a real-world scenario. 

Additionally, we observed the first access of the collection saw a more costly operation in terms of time so omitted the first run in order to properly initialize the structures and get more accurate times (hence the 'isInitialized' variable).





Key/Value Pair Performance

Testing the lookup times, we were not sure what to expect; Would the different key/value pair types perform differently given the number of elements in the collection? To what degree would the non-generic collections boxing and unboxing of types impact performance?

Boxing is like creating a reference type box and putting a value of a value type inside it. In other words, it consists of converting a value type to 'object' or to an interface type this value type implements. Unboxing is the opposite; it opens the box and extracts the value type from inside it. This is important to understand as boxing and unboxing can be an expensive process, when a value is boxed another object is created on the heap, resulting in additional overhead from the garbage collector.

Test 1: 500,000 Elements
Type Time (In Milliseconds)
Hashtable 0.0010366336633663373
HybridDictionary 0.0030544554455445567
ConcurrentDictionary 0.004181188118811883
Dictionary 0.004277227722772278
SortedDictionary 0.02198217821782178
Linq-Find 6.2869148514851485
Linq-Query 7.300021782178217
Linq-First 7.533993069306933
Linq-Where 7.554212871287127
Linq-Single 14.860235643564364
Despite the boxing/unboxing cost Hashtable out in front
Test 2: 1,500,000 Elements
Type Time (In Milliseconds)
Hashtable 0.0009376237623762377
HybridDictionary 0.0035623762376237617
Dictionary (+1) 0.004427722772277229
ConcurrentDictionary 0.0047544554455445556
SortedDictionary 0.024322772277227733
Linq-Find 17.7169198019802
Linq-Query 20.017012871287132
Linq-Where (+1) 20.102739603960394
Linq-First 21.869850495049512
Linq-Single 43.984086138613854
Did the lack of boxing/unboxing cost give the Dictionary a boost?
Test 3: 2,500,000 Elements
Type Time (In Milliseconds)
Hashtable 0.0020019801980198024
HybridDictionary 0.00330990099009901
ConcurrentDictionary (+1) 0.003987128712871289
Dictionary 0.006721782178217826
SortedDictionary 0.02225247524752475
Linq-Find 26.732295049504955
Linq-Query 29.738853465346534
Linq-Where 31.412106930693067
Linq-First 31.572389108910876
Linq-Single 66.31818712871288
ConcurrentDictionary making gains despite no parallel activity...
Test 4: 3,500,000 Elements
Type Time (In Milliseconds)
Hashtable 0.0010940594059405937
ConcurrentDictionary (+1) 0.004739603960396042
Dictionary (+1) 0.00505940594059406
HybridDictionary (-2) 0.005286138613861386
SortedDictionary 0.02706831683168318
Linq-Find 46.741195049504945
Linq-Query 52.44522673267326
Linq-Where 55.80127722772276
Linq-First 57.466306930693065
Linq-Single 125.0834524752475
Appears the lack of boxing/unboxing benefit finally kicking in for ConcurrentDictionary and Dictionary...
Test 5: 4,500,000 Elements
Type Time (In Milliseconds)
HashTable 0.0014000000000000002
HybridDictionary (+2) 0.0030188118811881183
ConcurrentDictionary (-1) 0.0039000000000000003
Dictionary (-1) 0.004732673267326734
SortedDictionary 0.021590099009900988
Linq-Find 50.61095841584158
Linq-Query 54.99771683168315
Linq-Where 57.0718811881188
Linq-First 59.58632673267328
Linq-Single 124.96653069306927
Not so fast, non-generic collections hold the line...
Test 6: 6,000,000 Elements
Type Time (In Milliseconds)
HashTable 0.0020801980198019805
HybridDictionary 0.0038871287128712864
ConcurrentDictionary 0.004108910891089109
Dictionary 0.005804950495049501
SortedDictionary 0.023022772277227717
Linq-Find 63.95591881188121
Linq-Query 72.04433465346533
Linq-Where 72.28270495049506
Linq-First 75.46057722772277
Linq-Single 156.7921326732673
LINQ simply does not scale at the same rate of the traditional key/value pair options...

Conclusion

Hopefully this demonstration sheds some light on how different key/value pair types respond to pressure. It can be argued given the small variance in performance among the key/value pair types (excluding LINQ) that would such small differences really matter in an actual solution? It could be argued that using a more performative key/value pair collection could result in getting more concurrent users out of a single server, essentially doing more with less, at any rate it is always good practice to know the numbers and see actual results rather than relying on some white paper or documentation.

Comments

Popular posts from this blog

Exploring C# Optimization Techniques from Entry-Level to Seasoned Veteran

Is Cloud Computing a Digital Transformation Enabler or Obstacle?

Implementing Enhanced Policing With Big Data and Predictive Analytics