Key Value Pair Performance Awareness In C#
Traversing a long and distinguished career in various roles in software development it is common to reuse several "killer" patterns or structures over and over again as their application and benefits has been proven multiple times over. One of those "killer" patterns involve the use of various key/value pair data structures that are available within the .NET framework, each having their own personality and use case. While there are many benefits of using these in-memory data stores we have seen the most benefit in using a key/value pair as a dynamic caching mechanism that can significantly reduce the cost of accessing a database to obtain values that often times don't see much change or volatility, or in a simpler case, accessing an expensive item of value and retaining for quick access later.
Key/Value Defined
A key value pair is a data structure that consists of two parts: a key and the value associated with it. A key is usually an identifier for an entity in your application, while a value might be any type of data. The key and value together make up a single piece of information in the database system.
Key/Value Types
- Hashtable: a non-generic collection of key/value pairs that optimizes lookups by computing the hash code of each key and stores it in a different bucket internally and then matches the hash code of the specified key at the time of accessing the values.
- HybridDictionary: a non-generic collection that attempts to optimize the Hashtable by implementing a linked list in combination of a Hashtable internally. When the volume of entries is small a ListDictionary is used, once the volume exceeds a certain threshold then a Hashtable is dynamically switched.
- SortedDictionary: belonging to the generic collection variety that applies a sort on the key upon entry, the keys are immutable, always unique, and cannot be null, while providing the fastest insertion and removal operations for unsorted data.
- ConcurrentDictionary: belonging to the generic collection variety but offering a thread-safety that can be used with multiple threads concurrently.
- Dictionary: belonging to the generic collection variety and thus requiring the same types of key/value types.
- List: not belonging to a key/value option, but a collection that can have its contents queried via LINQ (Language-Integrated Query).
Sample Use Case
Key/Value Test Preparation
While we have established a use case for a key/value pair data structure, we now need to take a look at how different key/value types perform when tasked with looking up different volumes of elements in the collection. In order to load up the key/value structures we will use a GUID (globally unique identifier) as key to prevent any benefits of a numerical sort being added into the equation, and a value being a simple padded string such as '0000001'. We can now just load the key/value pair collections with a simple loop, starting with an initial count of 500,000.
Additionally, we would like select keys that are having an ordinal position that is between the top 25% and bottom 25% to prevent any skewed access times for those values that might be closer to the top and benefit from any internal sequential lookup.
Key/Value Test Execution
Key/Value Pair Performance
Testing the lookup times, we were not sure what to expect; Would the different key/value pair types perform differently given the number of elements in the collection? To what degree would the non-generic collections boxing and unboxing of types impact performance?
Boxing is like creating a reference type box and putting a value of a value type inside it. In other words, it consists of converting a value type to 'object' or to an interface type this value type implements. Unboxing is the opposite; it opens the box and extracts the value type from inside it. This is important to understand as boxing and unboxing can be an expensive process, when a value is boxed another object is created on the heap, resulting in additional overhead from the garbage collector. |
Test 1: 500,000 Elements | |
Type | Time (In Milliseconds) |
Hashtable | 0.0010366336633663373 |
HybridDictionary | 0.0030544554455445567 |
ConcurrentDictionary | 0.004181188118811883 |
Dictionary | 0.004277227722772278 |
SortedDictionary | 0.02198217821782178 |
Linq-Find | 6.2869148514851485 |
Linq-Query | 7.300021782178217 |
Linq-First | 7.533993069306933 |
Linq-Where | 7.554212871287127 |
Linq-Single | 14.860235643564364 |
Despite the boxing/unboxing cost Hashtable out in front | |
Test 2: 1,500,000 Elements | |
Type | Time (In Milliseconds) |
Hashtable | 0.0009376237623762377 |
HybridDictionary | 0.0035623762376237617 |
Dictionary (+1) | 0.004427722772277229 |
ConcurrentDictionary | 0.0047544554455445556 |
SortedDictionary | 0.024322772277227733 |
Linq-Find | 17.7169198019802 |
Linq-Query | 20.017012871287132 |
Linq-Where (+1) | 20.102739603960394 |
Linq-First | 21.869850495049512 |
Linq-Single | 43.984086138613854 |
Did the lack of boxing/unboxing cost give the Dictionary a boost? | |
Test 3: 2,500,000 Elements | |
Type | Time (In Milliseconds) |
Hashtable | 0.0020019801980198024 |
HybridDictionary | 0.00330990099009901 |
ConcurrentDictionary (+1) | 0.003987128712871289 |
Dictionary | 0.006721782178217826 |
SortedDictionary | 0.02225247524752475 |
Linq-Find | 26.732295049504955 |
Linq-Query | 29.738853465346534 |
Linq-Where | 31.412106930693067 |
Linq-First | 31.572389108910876 |
Linq-Single | 66.31818712871288 |
ConcurrentDictionary making gains despite no parallel activity... | |
Test 4: 3,500,000 Elements | |
Type | Time (In Milliseconds) |
Hashtable | 0.0010940594059405937 |
ConcurrentDictionary (+1) | 0.004739603960396042 |
Dictionary (+1) | 0.00505940594059406 |
HybridDictionary (-2) | 0.005286138613861386 |
SortedDictionary | 0.02706831683168318 |
Linq-Find | 46.741195049504945 |
Linq-Query | 52.44522673267326 |
Linq-Where | 55.80127722772276 |
Linq-First | 57.466306930693065 |
Linq-Single | 125.0834524752475 |
Appears the lack of boxing/unboxing benefit finally kicking in for ConcurrentDictionary and Dictionary... | |
Test 5: 4,500,000 Elements | |
Type | Time (In Milliseconds) |
HashTable | 0.0014000000000000002 |
HybridDictionary (+2) | 0.0030188118811881183 |
ConcurrentDictionary (-1) | 0.0039000000000000003 |
Dictionary (-1) | 0.004732673267326734 |
SortedDictionary | 0.021590099009900988 |
Linq-Find | 50.61095841584158 |
Linq-Query | 54.99771683168315 |
Linq-Where | 57.0718811881188 |
Linq-First | 59.58632673267328 |
Linq-Single | 124.96653069306927 |
Not so fast, non-generic collections hold the line... | |
Test 6: 6,000,000 Elements | |
Type | Time (In Milliseconds) |
HashTable | 0.0020801980198019805 |
HybridDictionary | 0.0038871287128712864 |
ConcurrentDictionary | 0.004108910891089109 |
Dictionary | 0.005804950495049501 |
SortedDictionary | 0.023022772277227717 |
Linq-Find | 63.95591881188121 |
Linq-Query | 72.04433465346533 |
Linq-Where | 72.28270495049506 |
Linq-First | 75.46057722772277 |
Linq-Single | 156.7921326732673 |
LINQ simply does not scale at the same rate of the traditional key/value pair options... |
Conclusion
Hopefully this demonstration sheds some light on how different key/value pair types respond to pressure. It can be argued given the small variance in performance among the key/value pair types (excluding LINQ) that would such small differences really matter in an actual solution? It could be argued that using a more performative key/value pair collection could result in getting more concurrent users out of a single server, essentially doing more with less, at any rate it is always good practice to know the numbers and see actual results rather than relying on some white paper or documentation.
Comments
Post a Comment