ITNEXT

ITNEXT is a platform for IT developers & software engineers to share knowledge, connect, collaborate, learn and experience next-gen technologies.

Follow publication

Coltrane and Kadane — How We Can Apply Lessons from Legends to Improve our Coding Craft (Code as Craft vol. 1)

--

The Heavyweight Champion

John Coltrane was and remains a jazz legend. Widely considered the greatest tenor saxophonist ever, he left behind a broad and diverse set of virtuosic recordings, and he owed his greatness to a searching musicality which was both intellectual and spiritual, as well as to a nearly unparalleled devotion to his craft.

Okay…so what’s this got to do with programming?

As developers and software engineers, we spend a great deal of our time looking for the “next thing” — it might be a hot language, a bleeding-edge technology, a cool software pattern, whatever. But as a group, we tend to be attracted to novelty — things that we often perceive either as innovative, or topics which are simply entirely new to us, and so we are drawn to them. We are sort of like moths and lamps in this sense. And being drawn to novelty often pairs with a drive to innovate on one’s own, to help push the field forward. I would argue this is a positive quality to have, to be sure. Stagnation can be the kiss of death in what is still a relatively young field.

BUT…there is a great deal of value in periodically revisiting things we’ve already learned. John Coltrane was an innovator in many ways. But one of the ways in which he enabled his own innovations was to master thoroughly the musical building blocks that he could then assemble into the melodic and harmonic ideas that flowed from his brain — without the hours upon hours of assiduous practice, the ideas would have stopped before making it from his brain, through his fingers, and out of the mouth of his saxophone. And we would all be much poorer from it.

As a software engineer, I find myself turning, from time to time, back to seemingly basic (in the sense of being foundational) concepts and skills. I like to go back and review ideas that I had perhaps simply taken for granted. Why does a given software pattern, perhaps one I’ve used repeatedly, actually work in the way that it does? Why is it the right one, or is it perhaps time to rethink my approach?

I also like to engage in the seemingly (but sometimes misleadingly) simple activity of putting these foundational ideas into code. It’s a form of repetition — or, to use a different word…of practice.

Jazz musicians use the term “woodshedding” to describe this form of practice — of repeated execution of a given idea (a scale, a melody, a progression) until they have fully mastered it. Then they can use it almost instinctively to build larger, more complex ideas.

This article quotes Paul Klemperer (the jazz saxophonist, _not_ the Oxford economist) as describing woodshedding as

…a recognition of the need to sequester oneself and dig into the hard mechanics of the music before you can come back and play with a group in public…a humbling but necessary chore, like chopping wood before you can start the fire.

I really like that description. So what I’m writing about today is a form of software development “woodshedding” — returning to ideas and tools we may have learned long ago to review them, to ensure that we still understand why they exist and therefore why we learned them in the first place (beyond “they were assigned to me in some class I took as an undergrad”), and to ensure that not only do we remember their form and function, but that we maintain or even grow our facility with them, so that we can use them effectively in our work. Software development is both art and science, and in that respect, it can be a lot like music, I would argue.

And if, perchance, you’ve never had the occasion to learn the specific topics I’m addressing, then I hope this article proves doubly useful and gives you ideas for future exploration and improvement of your development skills.

Development Woodshedding in Practice

In the rest of this article, I am going to walk through a well-known problem category. Along the way, I will introduce some variations on the base problem and how they may relate to real world problem domains, and I will also show some sample code (in Go) for implementing solutions. While I would not pretend that any of my code is ready to be used as-is in a production application, you are free to borrow/copy it and reuse it for whatever purpose you wish.

The repository in which the sample code can be found is https://github.com/ScarletTanager/algorithms, specifically in the files subset/max_subarray.go and subset/interval.go, along with their corresponding test files (found in the same subset directory).

Maximum Subarray Problems

You probably remember working on maximum subarray problems. This form of problem is a topic that developers often learn fairly early on, because it is a good tool for teaching how to compare the computational complexity of different solutions to the same problem. Two primary approaches (a recursive divide-and-conquer approach and the well-known Kadane’s algorithm) are both fairly easy to grasp, but there are nuances which can be useful to note even for experienced developers.

If we assume an ordered list of of numeric values (an array, or a slice/tuple/list in different programming languages), there should be some contiguous set of values within that larger array which, when taken in the aggregate (which can mean added together, averaged, etc. as the case may be), yield the highest possible value. For example, if we have the array [5, 1, -13, 8, -3, 4, -6, 0, 0, 7], what subarray of values, added together, gives the greatest sum? We could simply do a brute force manual set of calculations and arrive at the subarray [8..7], but that method (even if you automate it) is extremely inefficient — imagine trying that for a list of, say, thousands of values. Maximum subarray algorithms try to find efficient approaches to finding this contiguous subset. Why would we even need such an approach?

Stock Trading

One use case for maximum subarray is that of selecting when a stock should be bought and sold to maximize profit. This is the approach Cormen (2009) takes to introduce the problem, and we will borrow it here. If we pretend we have a time machine, and we can go back in time to buy/sell shares of a particular stock knowing precisely the share price on each day, when should we buy and sell in order to make the highest possible profit from our trading?

Consider the following graph of a stock’s price over time — each point being the price in dollars on a given day:

Clearly, the stock sold for its highest price early in this period — the highest two prices are the first two points on the graph. But we have no way to take advantage of that price, if we cannot find a point before then with a lower price at which to purchase the stock. So let us start with our first key point:

The solution to the problem is not just finding the greatest single value over the interval.

But the solution here is also not simply to find the values in this specific dataset, which, when totalled, give us the greatest sum. These are prices (and are therefore all positive values). Why isn’t the answer just adding all the values — in other words, the same as our original array? This brings us to our second key point:

Sometimes you need to perform an intermediate operation on your dataset to transform it.

What we are looking for is not the same as maximizing price — we are trying to maximize profit — which is, at its simplest, the delta in price between the purchasing of a thing and its sale. So first we need to convert our values into price deltas between the days with something like this (I am using int values here just to keep things simple, but using floating point values does not change anything materially important beyond type conversions/formatting):

func PriceToProfit(prices []int) []int {
deltas := make([]int, len(prices) - 1)
for i, price := range prices[1:] {
deltas[i-1] = price - prices[i-1]
}
return deltas
}

Running this on our original price array yields an array like this:

[5, 12, -173, 9, -20, -100, 13, 12, 0, 4, -76, 243, 17, 16, -343]

In go, this is technically a slice, not an array — see here if you do not understand the distinction. I will be using the two terms interchangeably throughout the article, along with terms like “list”, despite the seeming inaccuracy when taken in a language-specific context. When I mean a slice specifically, I will make that clear.

Some observations:

  1. This is a list of both negative and positive values.
  2. Based on visual inspection, the largest profits appear to be available at the right side of the list, rather than at the left (which is where the price peak was). This matches our graph, which is a good confirmation that we are at least on the right track. It never hurts to make sure that your data “looks right.”

Let us now bring up our third key point:

In an array of both negative and positive values, the maximum subarray will be bounded on the left and right by positive (or nonnegative in some approaches) values.

This point gives us a clue to where we’re heading. If we scan the array from left to right, we may find a maximum subarray which spans (includes) one or more negative values, but it will never be bounded by them. If our solution returns a subarray in which either the beginning or ending value is negative (assuming an original array of both negative and positive/nonnegative values), then our solution cannot be correct.

There is a special sub-case in which the array contains no positive values— we will address that later on.

Kadane’s Algorithm

Joseph “Jay” Kadane is one of those individuals who seems to exist to make the rest of us feel, well, a bit dim. Kadane is a professor emeritus of statistics at Carnegie Mellon University, and his list of publications is…let’s go with “lengthy.” In computer science, however, he is probably best known for his algorithm for finding the maximum subarray (which is around forty years old), and it is this to which we now turn.

There are other methods for finding maximum subarrays — you can implement a good divide-and-conquer solution (which will, predictably enough, run in O(nlgn), so it isn’t horrible). But Kadane formulated a base algorithm which runs in linear time (so O(n)), and we are going to spend the rest of this article looking at it (and some slight variations).

Kadane’s eponymous algorithm can be loosely paraphrased as:

Scan array from left to right
remember the maximum sum seen so far
if the current sum is greater than the maximum,
replace maximum with current

I think one unfortunate habit that most examples and descriptions (for any algorithm, not just this one) fall into is that of only providing correct solutions — but I think you learn more by starting off with a solution that is incorrect, but not obviously so, at least for some cases. So we’re going to begin with an implementation that works…some of the time, and then we’ll look at how and why it needs to be adjusted to arrive at a correct implementation. As part of that, discerning readers may have noticed that I’ve glossed over a couple of key details in my pseudo-paraphrase of Kadane above — never fear, I’ve done this on purpose.

Implementing Kadane

Let us start with a list of values, represented here (and henceforth) as one of several go slices:

a = []int{5, 4, -6, 3, -12, 16, 23, -18, -47, 7, 27, -3}

The maximum subarray here is the slice a[5, 7] — which is the two value slice {16, 23}, adding up to 39. Let’s see if we can write some code that produces the correct result:

func MaxSubarrayLinear(a []int) (max, leftBound, rightBound int) {
var (
currentSum int
)
if len(a) == 0 {
return 0, 0, 0
}
max = minValueInt()for idx, val := range a {
currentSum += val
if currentSum > max {
if val > currentSum {
max = val
currentSum = val
leftBound = idx
} else {
max = currentSum
}
rightBound = idx
}
}
return
}

So this function starts by setting max to the minimum possible value for a signed integer and then continues by scanning the slice from left to right. It then:

  1. Computes a current sum by adding each value in turn;
  2. If the current sum is greater than the current max, check if the current value is greater than the sum. Since the current sum _includes_ that value, val > currentSum means that currentSum was negative before adding val.
  3. If currentSum is determined to have been negative, replace it with val (remember our point about the interval being bounded by positive values)
  4. If currentSum is and was positive, it becomes our new max value.

You’ll note that the value of rightBound is the index of the rightmost value, _not_ the bounding value you’d use in a golang slice (since the right bound is excluded when taking a subslice in go). But I’ve done it this way (and continue to do so below) because I think it is more readable. So we expect this code to produce the result 39, 5, 6, and so it does, according to our test code ( ais declared earlier in the test file as being of type []int):

Describe("MaxSubarrayLinear", func() {
BeforeEach(func() {
a = []int{5, 4, -6, 3, -12, 16, 23, -18, -47, 7, 21, -3}
})
It("Finds the maximum subarray", func() {
s, l, r := subset.MaxSubarrayLinear(a)
Expect(l).To(Equal(5))
Expect(r).To(Equal(6))
Expect(s).To(Equal(39))
})
})

This passes! This looks pretty successful so far. Let’s tweak our input list slightly by appending 85 to it:

a = []int{5, 4, -6, 3, 4, -12, 16, 23, -18, -47, 7, 21, -3, 85}

Now the maximum interval should have a starting index of 10 and a terminating index of 13 (the subslice a[10:]), and a sum of 110:

It("Returns the maximum subarray", func() {
s, l, r := subset.MaxSubarrayLinear(a)
Expect(l).To(Equal(10))
Expect(r).To(Equal(13))
Expect(s).To(Equal(110))
})

Unfortunately, no such luck — our code returns a left bound of 13 and a sum of 85, which is incorrect. So what’s going on here? Kadane’s algorithm hinges in part on our third point above — the maximum subarray will never be bounded by negative numbers (and, unless the array contains only negative values, the maximum subarray will never add to less than 0). But our code has a bug — it will find the previous maximum subarray of {16, 23}, but it will then continue extending that subarray even after the sum drops below zero, when it should have discarded the current sum (and started a new subarray) once it reached that point. So it extended {16, 23} to {16, 23, -18, -47, 7, 21...}, and the condition if currentSum > max was never met until it encountered the last value of 85, and even then it fell through (again, incorrectly) via the if val > currentSum conditional.

How do we fix this? For starters, we need a check on each iteration to see if the current sum is at or below zero, and we should do this before doing any other comparison. We should never be extending a subarray which is nonpositive, in other words. Making this change leaves us with this:

func MaxSubarrayLinear(a []int) (max, leftBound, rightBound int) {
var (
currentSum, currentLeftBound, currentRightBound int
)
if len(a) == 0 {
return 0, 0, 0
}
max = minValueInt() for idx, val := range a {
// Is our current sum at or below zero?
if currentSum <= 0 {
// If the value is greater than the current sum, start a new subarray
// Remember that we only get here if currentSum is <= 0
if val > currentSum {
currentSum = val
currentLeftBound = idx
currentRightBound = idx
}
} else {
// Extend the current subarray
currentSum += val
currentRightBound = idx
}
if currentSum > max {
max = currentSum
leftBound = currentLeftBound
rightBound = currentRightBound
}
}
return
}

This seems better, and running this against our existing test cases confirms that it does, in fact, yield the correct results.

BUT…(there’s always a but)…what if a contains only negative values?

a = []int{-20, -25, -13, -100, -1, -2}

A maximum subarray still exists — it contains a single value which is simply the “largest” value (the array element v with the lowest absolute value |v|, specifically). It is useful here to recall our example from the stock price data above — we treated each value as a deviation from the previous value (a delta), so that we had signed values. This is no different — these are all deviations from zero, and the “largest” negative value is the one with the smallest deviation from zero. So finding the maximum subarray still makes sense here.

Unfortunately, our code does not work as currently implemented. The bug lies in the fact that an int (such as currentSum) is initialized to a value of 0 unless otherwise specified. So the conditional if val > currentSum will never evaluate to true over a set of all negative values, and currentSum will retain its initial value of 0 over every iteration.

There are two possible fixes for this, one simpler and more elegant than the other. Let’s look at the slightly more complicated fix first. We need our current value to be greater than our currentSum, and we can easily implement this by assigning currentSum the lowest possible value for an int (which is -9223372036854775808 on 64-bit architectures) — we do this by simply adding this line prior to entering our for loop:

currentSum = max // We had previously set max to the min value for int

Now our function returns the correct tuple of -1, 4, 4 — the single value of -1 with index 4.

As it happens, there is a more elegant solution, and we implement this by removing the if val > currentSum conditional entirely (but leaving the previously enclosed lines in place):

.
.
.
if currentSum <= 0 {
currentSum = val
currentLeftBound = idx
currentRightBound = idx
} else {
.
.
.

There is no need to assign a specific value to currentSum at startup time (let it be 0), and the rest of our code stays the same as above.

Using Kadane

So why would I care about this?

Maximum subarray problems are useful for a lot more than academic exercise, and they are more frequently encountered than we think. Maybe, like, me, you are a cyclist, and you like to look at your power data. You might well wonder, what is the maximum power output I was able to sustain for any 20 minute interval during my most recent ride? This is an example of what I’ll call a “bounded maximum subarray” — meaning a maximum subarray with a defined length. Here’s a dataset we can use to demonstrate using the concept:

Daily average temperature in C, Delhi, India, May 2018

This table contains the average daily temperatures for New Delhi, India, for May 2018. What if we were to ask, what is the three day period during the month when the daily average was the highest? We would be looking for a bounded maximum subarray — a maximum subarray with a specified length, in which “maximum” refers only to the sum of the elements, not their cardinality. We might implement the algorithm to compute this as follows:

func MaxAverageInterval(a []float64, intervalLen int) (
maxAverage float64,
leftBound, rightBound int) {
var (
currentSum, maxSum float64
)
if len(a) == 0 || intervalLen <= 0 {
return 0, 0, 0
}
maxSum = float64(-9223372036854775808) for idx, val := range a[:len(a)-intervalLen+1] {
currentSum = val
var j int
for j = idx + 1; j < idx+intervalLen; j++ {
currentSum += a[j]
}
if currentSum > maxSum {
maxSum = currentSum
leftBound = idx
rightBound = j - 1
}
}
maxAverage = maxSum / float64(intervalLen)
return
}

This is not identical to Kadane, but it’s somewhat similar. The logic is actually simpler — we simply iterate left to right, remembering the maximum (in terms of sum) interval of fixed length intervalLen seen — but the computational complexity is not. Since the length is fixed, maximizing the sum in turn maximizes the average (mean). The inner loop will execute ln+l-l² times for an input of size n and an intervalLen of l, and this is maximized when l = (n+1)/2, which, for any given n, causes the inner loop to execute (n²+2n+1)/4 times. When l=n or l=1, the inner loop is traversed exactly n times (but these are unrealistically and nonsensically trivial cases, since the function would return the original array and the single largest value, respectively).

But we can apply Kadane’s algorithm directly to answering a different question about the same dataset — what is the longest interval (maximum subarray) whose mean value exceeds a given threshold? In New Delhi, what was the longest period of May 2018 over which the daily average temperature, taken collectively, averaged in excess of 35 C? In excess of 38 C? Answering these questions requires, much like our initial example of maximizing stock profit, a straightforward application of Kadane…after first doing a modest data transformation:

func LongestIntervalExceeds(a []float64, threshold float64) (
leftBound, rightBound int) {
// Convert every value into a deviation
devs := make([]float64, len(a))
for i, v := range a {
devs[i] = v - threshold
}
var (
currentSum, maxSum float64
currentLeftBound, currentRightBound int
)
maxSum = float64(-9223372036854775808) // Now we just run Kadane...
for idx, val := range devs {
if currentSum <= 0 {
currentSum = val
currentLeftBound = idx
} else {
currentSum += val
currentRightBound = idx
}
if currentSum > maxSum {
maxSum = currentSum
leftBound = currentLeftBound
rightBound = currentRightBound
}
}
return
}

Once again, we convert the values into deviations, but this time the results are relative to a value passed in via the threshold argument. But we’ve seen all the rest before by this point.

Wrapping Up

I have broken precisely zero new ground in this article, but that’s kind of my point. Sometimes we need to go back to the basics, to foundational considerations and concepts, and review them repeatedly to make sure that our command of them does not fail us whenever we finally need those concepts. Some of the ideas that, at first glance, seem completely straightforward may nonetheless hold some hidden corners that are worth exploring.

When I’m in the mood for woodshedding, or when I feel like I just need it, sometimes I’ll break out a chestnut like Kadane and read through it several times, and then I will (re)implement it. Sometimes I will be doing this in a language in which I’ve never implemented the algorithm or pattern, and this is a mechanism to explore new syntax for doing something familiar. Other times, it’s about performing (and reperforming) a task I already know, with the intent being to solidify it in my (if you’ll forgive the oxymoronic phrasing) “intellectual muscle memory”. Or maybe I’ve noticed something I never noticed before — I’ve done this after using the same code snippet or pattern perhaps ten or more times, when a little shiny fragment gleams forth from the first time from some shady corner, prompting me to get out my pickaxe and shovel and dig in an attempt to discover the fragment’s true shape.

Hopefully you found the article useful, and I especially hope some of you are inspired to spend some time reviewing and strengthening your command of your toolbox. We should not let the constant pressure to give our attention and energy to the novel prevent us from appreciating the well-established.

And by all means…go listen to Trane!

An Updated Postscript

I recently started a new article addressing heap implementations in Go, and I think that article follows nicely from this one. So I’m going to put them both into a series I’m going to call “Code As Craft,” focusing on how we, as developers, can practice and improve our craft as a purpose in and of itself. Hopefully you’ll find these articles useful.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Published in ITNEXT

ITNEXT is a platform for IT developers & software engineers to share knowledge, connect, collaborate, learn and experience next-gen technologies.

Written by Sandy Cash

Software engineer, birder, cyclist, language nerd, maybe just a nerd. Never stop learning. A friendly smile costs you nothing.

No responses yet

Write a response