Thread-safe generic collections, containers and dictionaries in .NET

It is very common in threaded applications to need to share data between threads, often in the form of dictionaries, lists, stacks, etc. However, the standard containers (generic or otherwise) in .NET are not thread-safe: if you simply reference a regular Dictionary<> or List<> from different threads without any locking, you will end up with some fairly hard-to-track-down bugs.

I recently worked on a project where an ArrayList was being used between threads with no sort of locking put in place and, amazingly enough, there were all sorts of hard-to-reproduce failures in the code. In general, locking code was used elsewhere, so it looked like an oversight on the part of one particular coder.

As it happens, the .NET libraries have all sorts of built-in thread-safe(ish) containers, collections, dictionaries, etc., but a lot of coders are unaware of some or all of them. This is not entirely surprising given the ways that they are exposed and the timing of their release.

Non-generic containers

This is one time when the terminology can get a bit confusing. A generic container is one that allows for the exact definition of the type of thing it contains, whereas a non-generic container is one that doesn’t do any type checking. Read that again—I assure you that it is correct!

To be more specific, generic containers make use of the generics functionality added in .NET 2.0, and take the form of:

Container                   Example
List<T>                     List<string>
Dictionary<TKey, TValue>    Dictionary<int, string>
Stack<T>                    Stack<DateTime>
Cup<T>                      Cup<Oolong>

As opposed to the older containers:

  • ArrayList
  • Hashtable
  • Queue
  • Stack
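
The difference in type checking can be seen in a small sketch. The class and method names here (TypeSafetySketch and friends) are made up for illustration; only ArrayList and List<T> come from the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetySketch
{
    // Non-generic: anything can be added, and a bad cast only fails at run time.
    public static bool NonGenericFailsAtRuntime()
    {
        ArrayList items = new ArrayList();
        items.Add("tea");
        items.Add(42); // compiles fine: every element is stored as object

        try
        {
            // foreach casts each element to string; the cast fails on the int.
            foreach (string s in items)
            {
                Console.WriteLine(s);
            }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Generic: the element type is part of the container's type.
    public static int GenericTotalLength()
    {
        List<string> items = new List<string>();
        items.Add("tea");
        items.Add("oolong");
        // items.Add(42); // would not compile: an int is not a string

        int total = 0;
        foreach (string s in items)
        {
            total += s.Length;
        }
        return total;
    }
}
```

With the ArrayList the mistake only surfaces when the offending element is read back; with List<string> the compiler rejects it before the program ever runs.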

In general, you should use the generic versions if you can, since they provide type safety and can be faster as well. However, you might run into the non-generic versions in older code, either because it predates generics or because it was written after generics arrived (.NET 2.0) but before the thread-safe generic containers did (.NET 4.0). In that window the non-generic classes had one real advantage: each of them comes with a very handy static method that provides a synchronized version of the collection:

ArrayList myList = ArrayList.Synchronized(new ArrayList());

Hashtable myDict = Hashtable.Synchronized(new Hashtable());

…and so on.

These static Synchronized() methods each return an appropriate derivation from the main class that wraps each method and property in a lock. For example, if you look at the source for, say, the Count property of the wrapper returned by ArrayList.Synchronized(), it looks something like this:

public override int Count
{
  get
  {
    lock(_root)
    {
      return _list.Count;
    }
  }
}

Here, _root is the SyncRoot of the contained ArrayList and _list is the ArrayList itself. Of course, you could just wrap every single call to a regular ArrayList with your own explicit lock, but that a) bloats the code and b) makes it very easy to miss a spot or two.
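
To make the comparison concrete, here is a minimal sketch of both approaches side by side: the synchronized wrapper, and a plain ArrayList with an explicit lock around each call. The class and method names (SynchronizedWrapperSketch, AddFromManyThreads, AddWithExplicitLocks) are hypothetical, and Parallel.For is used only as a convenient way to hit the list from several threads.

```csharp
using System.Collections;
using System.Threading.Tasks;

public static class SynchronizedWrapperSketch
{
    // The wrapper takes the lock inside Add(), so callers need no locking code.
    public static int AddFromManyThreads(int itemCount)
    {
        ArrayList shared = ArrayList.Synchronized(new ArrayList());
        Parallel.For(0, itemCount, i =>
        {
            shared.Add(i);
        });
        return shared.Count;
    }

    // The do-it-yourself version: every call site must remember the lock.
    public static int AddWithExplicitLocks(int itemCount)
    {
        ArrayList plain = new ArrayList();
        Parallel.For(0, itemCount, i =>
        {
            lock (plain.SyncRoot)
            {
                plain.Add(i);
            }
        });
        lock (plain.SyncRoot)
        {
            return plain.Count;
        }
    }
}
```

Both methods should end up with every item in the list. The difference is only in where the locking lives, and therefore in how easy it is to forget.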

These synchronized collections are very handy, but there are some potential issues with them:

  1. They are not generic, so you get no type-safety.
  2. The locking code wraps a call to the original method, which means that locks might not be as efficient as they could be (a completely thought-through stand-alone implementation might take advantage of the arrangement of the internal structures).
  3. Enumerating collections is not thread-safe!!

The last one is particularly likely to cause bugs because it is easy to assume that, since the collection is “Synchronized”, everything will automatically be thread-safe. This is, however, the reason why I added the “ish” when I said that these collections were thread-safe. Note that this is also true of some of the generic collections, so it is worth explaining in more detail.

Consider the following code:

_mySynchronizedList = ArrayList.Synchronized(new ArrayList());
//...
foreach(string str in _mySynchronizedList)
{
  DoSomething(str);
}

This code is not thread-safe! The way that foreach works is that it gets an enumerator from the underlying ArrayList, and then uses it to step through the list. Following the pattern, the call to create the enumerator is thread-safe, but the enumerator itself is not. If another thread adds or removes something from the collection while you are enumerating, the results are unpredictable: you will often get an InvalidOperationException, but you might instead skip items, or see something much stranger (punching a hole in the space-time continuum is not unheard of in this situation).
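
The failure is timing-dependent with real threads, so it is hard to show reliably. As a rough stand-in, the sketch below modifies the list from inside the loop on a single thread, which is an assumption made purely to get a deterministic result; the synchronized wrapper offers no more protection against this than it does against a second thread. The names EnumerationHazardSketch and ModifyingDuringForeachThrows are invented for the example.

```csharp
using System;
using System.Collections;

public static class EnumerationHazardSketch
{
    public static bool ModifyingDuringForeachThrows()
    {
        ArrayList list = ArrayList.Synchronized(new ArrayList());
        list.Add("a");
        list.Add("b");
        list.Add("c");

        try
        {
            foreach (string s in list)
            {
                // Stands in for "another thread changed the list mid-enumeration".
                list.Add(s + "!");
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // The enumerator noticed that the list changed underneath it.
            return true;
        }
    }
}
```

In this single-threaded form the enumerator detects the change and throws. With genuinely concurrent modification there is no guarantee that the problem will be detected so cleanly.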

Of course, you can make your code thread-safe by adding your own explicit lock:

lock(_mySynchronizedList.SyncRoot)
{
  foreach (string str in _mySynchronizedList)
  {
    DoSomething(str);
  }
}

But, again, the issue here is that it is easy for future coders to forget to do this. Also, this may not be ideal—particularly if the DoSomething() method is time-consuming. If it is, then you might want to first copy the items from the list (inside your lock) and then use your copy to do the actual work – possibly testing to see if each item is still in the collection before doing anything with it (in case it was removed by another thread).
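
A minimal sketch of that copy-then-process approach might look like the following. SnapshotSketch and ProcessSnapshot are hypothetical names, and the doSomething delegate stands in for the DoSomething() method above.

```csharp
using System;
using System.Collections;

public static class SnapshotSketch
{
    // Copies the list under the lock, then does the slow work outside it.
    // Returns how many items were actually processed.
    public static int ProcessSnapshot(ArrayList synchronizedList, Action<string> doSomething)
    {
        object[] snapshot;
        lock (synchronizedList.SyncRoot)
        {
            // Hold the lock only for as long as the copy takes.
            snapshot = synchronizedList.ToArray();
        }

        int processed = 0;
        foreach (string str in snapshot)
        {
            // Skip items that were removed after the snapshot was taken.
            // Another thread could still remove the item right after this check.
            if (synchronizedList.Contains(str))
            {
                doSomething(str);
                processed++;
            }
        }
        return processed;
    }
}
```

The trade-off is that the work is done against a copy that may already be slightly out of date; the Contains() check narrows that window but does not close it.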

The tl;dr from all of this is that the Synchronized containers are very useful, but have their own dangers. In general, it is better to use the types of containers that tend to have atomic operations (Dictionaries, Queues and Stacks). Of course, you can enumerate all of these types of containers, but it is not as likely that you would need to.

Generic Containers

It was not until .NET 4.0 that an explicit set of generic thread-safe containers was added. Unlike with the non-generic collections, these are explicit implementations, and are all in their own namespace: System.Collections.Concurrent. The most important items in the namespace are:

  • ConcurrentDictionary<TKey, TValue>
  • ConcurrentQueue<T>
  • ConcurrentStack<T>
  • ConcurrentBag<T>

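Other than the class names, these work much like their non-concurrent namesakes, but with the synchronization built in rather than wrapped around each call (fine-grained locks in some, lock-free techniques in others). The method names differ in places, with Try-style methods such as TryDequeue() and TryPop() replacing Dequeue() and Pop(). Also, because of the way these are written, you can enumerate these collections! There is some fancy code in the enumeration handling to make that legal, although you are not guaranteed that the contents of the collection won’t change while you are enumerating it.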
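
As a rough illustration, the sketch below counts words from several threads with ConcurrentDictionary, and enumerates a ConcurrentQueue while adding to it. The class and method names are invented for the example; AddOrUpdate() and Enqueue() are the real framework methods.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentCollectionsSketch
{
    // AddOrUpdate performs the whole "insert 1 or increment" step atomically,
    // so no explicit lock is needed around the read-modify-write.
    public static ConcurrentDictionary<string, int> CountInParallel(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            counts.AddOrUpdate(word, 1, (key, oldCount) => oldCount + 1);
        });
        return counts;
    }

    // Adding to the queue while enumerating it is legal here; the same loop
    // over a plain Queue<T> would throw InvalidOperationException.
    public static int EnumerateWhileEnqueueing(ConcurrentQueue<int> queue)
    {
        int visited = 0;
        foreach (int item in queue)
        {
            queue.Enqueue(item * 10);
            visited++;
        }
        return visited;
    }
}
```

The queue's enumerator works from a snapshot taken when the enumeration starts, which is why the loop terminates even though items keep arriving. It also means the loop never sees the items added along the way.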

However, you may notice that there is no ConcurrentList<> available. The closest is the ConcurrentBag<>, which doesn’t have ordering, and misses a number of the capabilities of a List<>.

This comes back to the issues we discussed with the non-generic ArrayList: it is simply not possible to make an implementation of List<> that is guaranteed to be thread-safe under all circumstances without requiring outside assistance. So, it seems, Microsoft decided that it is better to not have one at all, rather than have one that would cause trouble. I agree with them, and, again, recommend that you avoid sharing lists between threads if you possibly can use one of the other types of containers.

But…

If you absolutely, positively need to have a List<> that is shared between threads, there are some options. Interestingly, Microsoft themselves obviously decided that they needed an implementation internally, since they created an internal version for their own use (which we cannot access). But, there is also an implementation that is public:

SynchronizedCollection<T>

This collection was added in .NET 3.0, and, rather than being part of the core libraries, is implemented inside of System.ServiceModel.dll (which means that you might need to add a reference—and it might not be available for all profiles), although it is in the System.Collections.Generic namespace.

Technically, this class should be called SynchronizedList<> because that is really what it is. It works the same way as the synchronized wrapper you get from ArrayList.Synchronized(): it contains a List<T> and just puts a lock around each call to the internal implementation. It also implements the IList<T> interface, so it can be used in most places that a list is required.

The nice thing is that this gives you a type-safe replacement for ArrayList.Synchronized(), but it does suffer from the same limitations. In particular, enumeration is still not thread-safe!
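
To show the shape of such a class without needing a reference to System.ServiceModel.dll, here is a hand-rolled sketch of the same idea. SynchronizedListSketch<T> is a hypothetical class written for this article; it is not the source of SynchronizedCollection<T> and implements only a handful of members.

```csharp
using System.Collections.Generic;

// Hypothetical illustration: a List<T> with a lock around each call.
public class SynchronizedListSketch<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly object _sync = new object();

    // Exposed so callers can hold the lock across several operations.
    public object SyncRoot
    {
        get { return _sync; }
    }

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }

    public T this[int index]
    {
        get { lock (_sync) { return _items[index]; } }
        set { lock (_sync) { _items[index] = value; } }
    }

    public void Add(T item)
    {
        lock (_sync) { _items.Add(item); }
    }

    public bool Remove(T item)
    {
        lock (_sync) { return _items.Remove(item); }
    }

    // Returns a copy made under the lock, which is safe to enumerate
    // even while other threads keep changing the list.
    public T[] ToArray()
    {
        lock (_sync) { return _items.ToArray(); }
    }
}
```

Each individual call is safe, but a sequence such as checking Count and then reading the last index is not, unless the caller holds SyncRoot for the whole sequence. Enumerating the copy returned by ToArray() sidesteps the enumeration problem at the cost of working with slightly stale data.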

 

Building thread-safe applications is not easy, although the new async capabilities of .NET definitely make life simpler. However, if you have to share data containers between threads, you at least now have a fair number of options.
