A Comparison Between Maximum Subarray Algorithms

One simple but intriguing problem about math and computer science would be the Maximum Subarray problem. Suppose given an array with finite number of arbitrary integers, e.g., {1, 2, 90, -4, 2, 8, -13, -4, 132, -9483, 235, 95, -6}. Now our goal is to find a contiguous subarray that has the greatest sum among all possible subarrays. Say, if to choose 1, 2, 90, the sum 93 is smaller than that of 1, 2, 90, -4, 2, 8, and this new one is also smaller than 1, 2, …, 132. This one might be easy to tell that 235, 95 is the maximum subarray. If given an extremely long array with randomly distributed positive and negative numbers, however, one cannot easily find out the maximum subarray.

One brute force algorithm would be to find out all subarrays and compare their sums; this is apparently unrealistic. The first reasonable idea jumps into your head, is probably recursion. This method is extensively discussed in Introduction to Algorithm. Basically, it divides the main array into left and right parts, runs the algorithm recursively on both left and right parts to find the maximum sum, and calls a helper method to check the array that passes midpoint. Below is the python implementation of this recursive method:

def maximum_subarray_recursion(numlist,low,high):
    """
    This function takes an array of numbers, and find the sum of the maximum
    subarray in the list.
    Input: A list of real numbers; the beginning and the end of the part in the list.
    Output: Indices and sum.
    """
    if low == high: # base case
        return low,high,numlist[low]
    else:
        mid = int(math.ceil(float(low + high)/2))
        left_l,left_h,left_sum = maximum_subarray_recursion(numlist,low,mid - 1)
        right_l,right_h,right_sum = maximum_subarray_recursion(numlist,mid,high)
        cross_l,cross_h,cross_sum = max_crossing_array(numlist,low,mid,high)
        final = max(left_sum,right_sum,cross_sum)        
        if left_sum == final:
            return left_l,left_h,left_sum
        elif right_sum == final:
            return right_l,right_h,right_sum
        else:
            return cross_l,cross_h,cross_sum

def max_crossing_array(sublist,low,mid,high):
    """
    This helper function tries to find the indices of a maximum subarray that
    crosses the midpoint, and return the sum.
    Input: A list of real numbers.
    Output: The indices of a maximum subarray that crosses the midpoint, together
    with its sum.
    """
    inf = float('inf')
    left_sum = -inf
    s = 0
    i = mid - 1
    # beware that i and j should be consistent with maximum_subarray configurations;
    # that is, if use ceiling, then (mid - 1) and mid; if using floor,
    # then mid and (mid + 1).
    while i >= low:
        s = s + sublist[i]
        if s > left_sum:
            left_sum = s
            left_index = i
        i -= 1
    right_sum = -inf
    s = 0
    j = mid
    while j <= high:         
        s = s + sublist[j]         
        if s > right_sum:
            right_sum = s
            right_index = j
        j += 1
    total = left_sum + right_sum
    return (left_index,right_index,total)

This recursive method, however, is not always the most efficient one; under certain situation, it is even slower than brute force algorithm!

def bruteforce_maximum_subarray(nlist,low,high):
    """
    This function takes an array of numbers, and find the sum of the maximum
    subarray in the list.
    Input: A list of real numbers; the beginning and the end of the part in the list.
    Output: Indices and sum.
    """
    S = {}
    p1 = low
    S[p1 - 1] = 0
    for p1 in range(low,high + 1):
        S[p1] = S[p1 - 1] + nlist[p1]
    maxsum = -float('inf')
    for p1 in range(low,high + 1):
        for p2 in range(p1,high + 1):
            subsum = S[p2] - S[p1 - 1]
            if subsum > maxsum:
                maxsum = subsum
                left,right = p1,p2
    return (left,right,maxsum)

This brute force algorithm, by testing, is faster than the recursion when input array length is smaller than 25 due to the function calls and case setup in recursion; the actual length may differ on different machines.

So to improve the speed of recursion, a small modification is applied; that is, run brute force algorithm when input length is below 25…


def modified_maximum_subarray_recursion(array,low,high):
    """
    This function takes an array of numbers, and find the sum of the maximum
    subarray in the list by recursion. However, the base case has been modified
    such that it will do bruteforce when its speed beats recursion, such that the
    algorithm is even faster.
    Input: A list of real numbers; the beginning and the end of the part in the list.
    Output: Indices and sum.
    """
    if (high - low + 1) <= 25: # base case
        return bruteforce_maximum_subarray(array,low,high)
    # the advantage of using bruteforce
    else:
        mid = int(math.ceil(float(low + high)/2))
        left_l,left_h,left_sum = maximum_subarray_recursion(array,low,mid - 1)
        right_l,right_h,right_sum = maximum_subarray_recursion(array,mid,high)
        cross_l,cross_h,cross_sum = max_crossing_array(array,low,mid,high)
        final = max(left_sum,right_sum,cross_sum)        
        if left_sum == final:
            return left_l,left_h,left_sum
        elif right_sum == final:
            return right_l,right_h,right_sum
        else:
            return cross_l,cross_h,cross_sum

This modification barely improves the performance and does not change the nature of the recursive method. A possibly better way is to apply the method of Dynamic Programming. This is the linear time Kadane’s Algorithm. It is extremely clear and simple: go along the array, keep track of the maximum sum and update it by comparing itself with the new sum.


def maximum_subarray_dp(A,low,high):
    """
    This function takes an array of numbers, and find the sum of the maximum
    subarray in the list.
    Input: A list of real numbers; the beginning and the end of the part in the list.
    Output: Indices and sum.
    """
    low_here,high_here,max_here = low,high,A[low]
    low_far,high_far,max_far = low,high,A[low]
    for x in A[(low + 1):high]:
        max_here = max(x,max_here + x)
        if x == max_here:
            low_here,high_here = A.index(x),A.index(x)
        else:
            low_here,high_here = low_here,A.index(x)
        max_far = max(max_far,max_here)
        if max_far == max_here:
            low_far,high_far = low_here,high_here
    return low_far,high_far,max_far

Now let’s do a comparison of the three methods described above. The array length goes from 1 to 8000.

Algo Plot

You can easily see that DP has a tiny advantage over the recursive methods. Although this is not a huge “victory”, it wins!