Wednesday, May 21, 2014

Arrays are bound to get you into trouble

A few days ago I read this old yet informative blog post by Eric Lippert, about why it is good practice to avoid arrays as much as possible in your C# code. Since I am working a lot with numerical computations, I do use arrays quite a lot since they tend to be faster than class-based collections such as List<T> etc. for my particular purposes. Nevertheless, Eric's article is a good eye-opener on the caveats associated with arrays.

Ironically, I today stumbled into a serious problem involving arrays and multiple threads. It was not obvious from the start that it would lead to problems, but it certainly did.

Somewhat simplified, I had one pre-sorted key array with no duplicate entries and a collection of value arrays. The value arrays were subject to some operation from a 3rd-party library that involved sorting based on the key array. The operation on one value array would be independent of the operations on the other arrays, so I decided to parallelize the process to gain time. Schematically, the code looked like this:

double[] x = GetSortedKeyArray();

var y = new double[samples][];
for (var i = 0; i < samples; ++i) y[i] = GetValueArray(i);

Parallel.For(
    0,
    samples,
    i =>
    {
        ThirdPartyLibMethodUsingArraySort(x, y[i]);
    });

And in the 3rd-party library:

public void ThirdPartyLibMethodUsingArraySort(
    double[] x, 
    double[] y)
{
    ...
    Array.Sort(x, y);
    ...
}

Since the key array was already sorted before the first call to the 3rd-party library method, I never imagined that the Array.Sort call would cause any trouble. As long as the number of items in the key and value arrays were no more than 16, it did not either. But for 17 items and above, chaos!

If the number of items in the key and value arrays exceeded 16 and the number of value arrays were large enough (of the order of hundreds on my machine), the Array.Sort method would on some calls return duplicates in the key array x. Of course, once this error started appearing, it would continue to propagate and completely run havoc with the supposedly immutable key array.

Reading the MSDN documentation on Array.Sort, it seems like array size 16 is an important limit, where the sorting internally switch from an insertion sort algorithm to a Quicksort algorithm. The Quicksort algorithm apparently does not account for the fact that the incoming array might already be sorted and shuffles around the data. When the same array is then entering Array.Sort from another thread before the previous thread is finished, things are bound to go wrong.

What I had to do in the end was to copy the contents of x to a temporary array before each call to the 3rd-party library method, and call the method using the temporary array instead:

    {
        var tmp = new double[x.Length];
        Array.Copy(x, tmp, x.Length);
        ThirdPartyLibMethodUsingArraySort(tmp, y[i]);
    }

Take-home message: you can never be too careful with array immutability in a multi-threaded context.

No comments:

Post a Comment