Iterating over a collection in multithreaded code presents a problem. At
the most basic level, the problem stems from the premise that while a
thread is iterating over the collection, another thread may be adding or
removing items from the same collection. When this happens, the runtime throws
an Invalid
Operation Exception with the description of 'The Collection was modified...".
Iterating over a collection is further complicated when you try to follow good Object
Oriented techniques, including abstraction and encapsulation. For example, this means that
your GUI code responsible for dynamically building a Context Menu is separate from
the business entity that represents your collection of Contacts.
Additional complications arise when building frameworks. In the case of a Contact Collection
inside a framework, this collection is likely to be used in a number of different projects
and across a number of different use cases. Some of these projects will be
single threaded WinForms apps, other will highly concurrent ASP.Net applications, and others will
be multithreaded Service Applications.
Anti-Pattern: Make the Application Developer do the work
Many frameworks take a "place the burden on the application developer" approach, which means
that locking and threaded operations are the responsibility of the application developer
building the final application. This typically means the framework collection classes will
look something like:
public class ContactManager
{
private List<string> _contacts = new List<string>();
private object _syncroot = new object();
public object SyncRoot { get { return _syncroot; } }
/// <remarks>
/// Anyone iterating over this must do so inside
/// a lock on SyncRoot
/// </remarks>
public List<string> Contacts
{
get { return _contacts; }
}
// Various Contact Operations Go here
// - Add
// - Delete
// - Filter
}
In this code the collection itself is directly exposed, along with a locking primitive. To use this code, developers
must know to use the lock primitive, and write their code to look like:
public void AddContactsToMenu()
{
lock (_manager.SyncRoot)
{
foreach (string s in _manager.Contacts)
{
// Do something here
}
}
}
Although this works, the drawbacks are many. Every access to the Contact collection needs to be
done within the context of the correct lock. The odds are very high (and I speak from personal experience
as a Framework Architect) that developers will:
- Not use the lock construct at all.
- Use the lock construct incorrectly.
- Use the lock construct in some places, but forget all about it in others.
- Blame you for having a crappy Framework that has threading issues.
This pattern typically ends in frustration and failure for the application developer. They simply
don't know enough about threading, and don't know enough about the framework to properly use the
locking mechanisms exposed.
In general, Frameworks should ease the burden on application developers. Pushing thread management out of
the framework and into the application is not a good way to go.
Anti-Pattern: Using ReadOnlyCollection to eliminate locks
A common solution that we used for some time was to allow application developers the ability to iterate
over a ReadOnlyCollection that was populated from
the real collection. The code that we wrote to do this typically looked like:
public ReadOnlyCollection<string> Contacts
{
get
{
lock (_syncroot)
{
return new ReadOnlyCollection<string>(_contacts);
}
}
}
This approach looks and sounds great, and is conceptually sound. After all, if we only give the application developer a ReadOnly copy of the collection, they don't
have to worry about the collection being changed out from under them.
Unfortunately, the .NET Framework version of the ReadOnlyCollection doesn't work this way. I wrote all about that in
a previous blog.
If you end up taking this route, make sure the ReadOnly version of the collection really is ReadOnly. This means a fair bit of Cloning and Copying.
Anti-Pattern: Abandoning Object Oriented Principles
Another common solution is to start throwing everything into the framework, including GUI code. It's easy to envision a business class that has the collection, yet also has in it the
GUI code to build a WinForms Context Menu, Populate a ListBox, or build an ASP.Net menu. This is especially true with frameworks to which application developers have source code access to.
Each time a developer needs something new, it's easy to add one more method into the factory: Populate a ListView, Build a tree, add items to a context menu. After all, the locking
constructs are already there, all the relevant business objects are handy, and you can just create a method that takes a DataGridView as a parameter...
18 months and 300 methods (with at least 75% code duplication) later, the code is now a jumbled mess, with no sense of structure or ownership.
This, um, Organic approach to building a framework leads to chaos and darkness.
Pattern: Using Lambda Functions
After spending a few years going through the various (in hindsight) Anti-Patterns to solving the Collection
Enumeration problem, I think I've finally settled on a solid approach. The requirements that the problem needs
to solve is:
- Allow applications developers to enumerate over collections of items in the framework.
- Not require application developers to perform any explicit Thread or Locking operations.
- Participate in any locking mechanism required by the Framework, so that multi-threaded code can be written and leveraged.
The best solution to this overall problem seems to be using Lambda Functions - or
in C# 2.0 terms, passing in an Action
or a custom ActionWithState delegate. Using the same code as earlier, our Application code will typically end up looking like:
public void BuildMenu()
{
_manager.EnumerateContacts(BuildMenuCallback);
}
private void BuildMenuCallback(string s)
{
_myMenu.Add(s);
}
Alternatively, we could do this same thing with an Anonymous method. Using outer/captured variables for state, and doing lightweight operations, this is a great way to go.
public void BuildMenu()
{
List<string> menu = new List<string>();
_manager.EnumerateContacts(delegate(string s)
{
menu.Add(s);
});
}
The key to this application code is the callback method. With the framework class controlling the
callback, it's easy to participate in any locking infrastructure.
The Framework implementation of this code is straightforward:
public void EnumerateContacts(Action<string> contactAction)
{
lock (_syncroot)
{
foreach (string s in _contacts)
{
contactAction(s);
}
}
}
One drawback to this approach is the difficulty of passing and managing
state. In the anonymous method example above, I used a captured variable
that was implicitly passed in for state. In general, I prefer explicit
methods over anonymous, yet I still want to be able to pass in a state
variable. For this, I borrow a page from the .Net Async Delegate pattern,
and add on an Object parameter to the Action delegate.
public delegate void ActionWithState<T>(T obj, object state);
Now, in the framework code, I create another overload of the EnumerateContacts method:
public void EnumerateContacts(ActionWithState<string> contactAction, object state)
{
lock (_syncroot)
{
foreach (string s in _contacts)
contactAction(s, state);
}
}
Now that we have state, it's easy for the client code to do more interesting things:
public void BuildMenu()
{
List<string> menu = new List<string>();
_manager.EnumerateContacts(BuildMenuCallback, menu);
// Now we can actually build our menu right here...
}
private void BuildMenuCallback(string s, object state)
{
List<string> menu = state as List<string>;
if (s.StartsWith("C", StringComparison.OrdinalIgnoreCase))
menu.Add(s);
}
Using C# 3.0, we could certainly get fancier with the delegates, and turn them into true
Lambda functions, Predicate Functions, and other classic Functional Programming techniques. Even
with C# 2.0 we could get most of the way there, but the syntax is a bit cumbersome.
There is one big drawback to this approach: If the application developer does
'significant' work in the callback function, the lock pattern will be held much longer
than we would like. Fortunately, this is pretty easy to mitigate by telling
application developers not to get too crazy in their callback methods. It's also easy to
debug later if performance becomes an issue.
Conclusions
Overall, I've been pretty happy with this approach so far. It certainly beats the other
approaches I've used - so much so that I now consider them to be Anti-Patterns.
While I'm sure this pattern isn't new, I haven't seen it before. There are
a number of concurrency patterns discussed on the web, including Thread Safe Iterators,
and a few other related examples. Certainly a close variation of this would be the
Loop-Tiling pattern Joe Duffy talked
about in his recent MSDN article.
I would love to hear suggestions and other variations to solving the underlying thread safe enumeration problem.