Lost AirPods, arrays, and sorting algorithms
The search for the lost AirPod
My wife Dylan is a math and physics teacher. Yesterday while she was on a Zoom call with a student I got a text message from her: “the kids have my AirPods.” She knew because her student was cracking up laughing at the babblings of our toddlers, captured by the built-in microphone on the AirPods.
I tracked down the kids. They were carrying around an AirPod case with one AirPod in it. The other one was nowhere to be seen. Awesome. Now we had to search the whole house for a tiny white piece of plastic worth over $100. It could have been anywhere.
Luckily, we realized we had two important tools at our disposal: the missing AirPod was in Bluetooth range and it had a microphone. So Dylan connected to it and called me. She walked around the house saying “hello” while I listened on the other end of the line, and I told her when she was getting louder and quieter. Soon we’d narrowed down the search to a laundry basket chock-full of dirty toddler clothes.
We started digging around in the clothes, but to no avail. There had to be 100 different items of clothing in there. It would take forever to search them all. Then I had an idea: we could do a binary search on the laundry basket! This made me happy because I’m a huge geek.
We pulled half the clothes out of the basket. Then Dylan called me again, took the basket into the garage and said “hello.” I heard the “hello” on my end of the line, so that meant the AirPod was still in the basket—we could ignore all the clothes we’d pulled out. She came in and we repeated the process: remove half the clothes, take the basket into the garage, and listen for a “hello.” This time I couldn’t hear her, so that meant the clothes in the basket were ruled out and the clothes we’d just removed were ruled in. After a few more iterations we narrowed down the AirPod’s location to a small pile of six items or so. At that point it was faster to search them one at a time than to keep going back and forth to the garage. At long last we found the missing AirPod, in the pocket of a cute little shirt that we didn’t even know had pockets.
So if you ever lose an AirPod in a laundry basket, now you know what to do. But more importantly, you’ve learned the two most common ways to search an array!
Sequential search
The first way is a sequential search, where you look at each item in turn. If the item is exactly what you’re looking for, you’re done. If not, you keep going. This is the slowest but easiest way to search an array (or a laundry basket). A sequential search function is built into most programming languages. In JavaScript, it’s the Array.find method. In C#, it’s the Array.Find method or the First extension method on IEnumerable<T>. You don’t need to write your own, but if you did it might look something like this:
T SequentialSearch<T>(IEnumerable<T> laundry, Func<T, bool> predicate) {
    foreach (var item in laundry) {
        if (predicate(item)) {
            return item;
        }
    }
    return default(T);
}
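For reference, calling the built-in C# versions might look something like this (the laundrySizes array is made up for illustration, and First and FirstOrDefault come from the System.Linq namespace):

var laundrySizes = new long[] { 3, 7, 2, 9 };

// Array.Find returns the first match, or the default value (0 for long) if none.
var firstBigItem = Array.Find(laundrySizes, size => size > 5);        // 7

// LINQ's First does the same but throws if nothing matches;
// FirstOrDefault returns the default value instead.
var alsoBigItem = laundrySizes.First(size => size > 5);               // 7
var maybeHugeItem = laundrySizes.FirstOrDefault(size => size > 100);  // 0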
Binary search
Sometimes a sequential search is much too slow. This is where a binary search may come in handy. You divide the array in half, figure out (without checking each item) which half has what you want, then divide that one in half again and repeat. When your array is too small to divide in half anymore (that is, exactly one element) you’re done. The classic example is an array of numbers sorted from smallest to largest. Say you’re looking for the number 4. You find the midpoint of the array and see if it’s smaller than, larger than, or equal to 4. If it’s equal, you’re done. If it’s smaller, you ignore everything before the midpoint and start over with everything after the midpoint. If it’s larger, you do the opposite: ignore everything after the midpoint and start over with everything before it. You might code it like this:
long? BinarySearch(long[] laundry, long searchTarget) {
    var startIndex = 0;
    var endIndex = laundry.Length - 1;
    while (startIndex <= endIndex) {
        var midpointIndex = (endIndex + startIndex) / 2;
        var middleItem = laundry[midpointIndex];
        if (middleItem == searchTarget) {
            return middleItem;
        }
        if (middleItem > searchTarget) {
            endIndex = midpointIndex - 1;
            continue;
        }
        if (middleItem < searchTarget) {
            startIndex = midpointIndex + 1;
            continue;
        }
    }
    return null;
}
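If you’d rather not write it yourself, .NET has a built-in Array.BinarySearch that returns the index of the match (or a negative number when there’s no match). A quick sketch of both, using a made-up sorted array:

var sortedLaundry = new long[] { 1, 2, 4, 8, 16 };

var found = BinarySearch(sortedLaundry, 4);                 // 4
var notFound = BinarySearch(sortedLaundry, 5);              // null

// The built-in version returns the index (2 here) instead of the value,
// and a negative number when the target isn't in the array.
var builtInIndex = Array.BinarySearch(sortedLaundry, 4L);   // 2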
I think I’ve used a binary search maybe once. Most of the time you either don’t need it (sequential search is a snap for arrays of a few hundred elements or so) or it’s not an option (say, if there’s no easy way to compare what you’re looking for against a whole group of elements). If you’re working with a very large sorted array of numbers or words, binary search may be a good solution. Really, anything sortable could be a candidate.
Speaking of sortable, what if you want to sort your laundry instead of finding a lost AirPod? Say you want to sort it from smallest to largest. Baby socks should be all the way on the left, footie pajamas should be all the way on the right. There are a few different ways to do that.
(By the way, if you’re unfamiliar with Big O notation, read this for an excellent introduction. We’re going to be talking about it because it’s a great way to compare the speed of different algorithms.)
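For a rough sense of scale: with an array of 1,000 items, an O(n) algorithm does on the order of 1,000 steps, an O(n log n) algorithm roughly 10,000, and an O(n²) algorithm a full 1,000,000.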
Selection sort
We’ll start with selection sort, mostly because it’s easy. If you had to invent your own sorting algorithm in five minutes you might come up with something like this.
To selection sort your laundry, you start by pulling the top item of clothing out of the basket and putting it on the floor. Then you go through every other item of clothing one at a time. If it’s smaller than the one on the floor, they switch places: the one on the floor goes in the basket, the smaller one goes on the floor. Once you’ve been through the whole basket, you know for sure the item of clothing on the floor is the smallest one of all.
You move that item to the left, then pull the next item from the top of the basket and start over. You’re looking for the smallest item of the remaining clothes. Once you’ve found it you put it next to the previous smallest item, grab another item from the basket, and do it all again. For each item in the basket, you have to go through the entire basket of unsorted clothes.
Here’s how you’d code that:
// This method switches two items in an array
void SwapItemsInArray<T>(T[] array, int index1, int index2) {
    (array[index1], array[index2]) = (array[index2], array[index1]);
}

void SelectionSort(long[] laundry) {
    var numberOfItems = laundry.Length;
    // We look at each element of the array. If it's not the minimum value
    // of the rest of the array, we swap it for the one that is.
    for (var outerLoopIndex = 0; outerLoopIndex < numberOfItems; outerLoopIndex++) {
        var possibleMinimumIndex = outerLoopIndex;
        for (var innerLoopIndex = outerLoopIndex + 1; innerLoopIndex < numberOfItems; innerLoopIndex++) {
            if (laundry[innerLoopIndex] < laundry[possibleMinimumIndex]) {
                possibleMinimumIndex = innerLoopIndex;
            }
        }
        SwapItemsInArray(laundry, possibleMinimumIndex, outerLoopIndex);
    }
}
This is a complete sorting algorithm in less than 20 lines of code. That’s the best thing about it, unfortunately. If nested for loops strike fear in your heart as they do in mine, you understand the problem: we’re doing a lot of iterations.

Selection sort has a time complexity of O(n²). An array of 16 items will theoretically take 64 times as long to sort as an array of 2 items. Generally speaking you should avoid it.
Insertion sort
Insertion sort is probably what you’d do if you had to sort laundry by size in real life.
You’d pick an item out of the basket and put it on the floor. Cool, that item is sorted. Then you’d pick another item. If it’s bigger than the one on the floor, put it next to it on the right. If it’s smaller, it goes on the left. Then take another item. If it’s smaller than both the items on the floor, it goes to the left. If it’s bigger than both, to the right. If it’s in between, it goes in the middle. You’d do this with all the clothes in the basket. When the basket is empty, all the clothes are sorted.
Here’s how you’d code that:
void InsertionSort(long[] laundry) {
    // We start at 1 here because the first item in the list is already sorted.
    // On each iteration, everything left of loopIndex will be already sorted.
    for (var loopIndex = 1; loopIndex < laundry.Length; loopIndex++) {
        var unsortedItem = laundry[loopIndex];
        var comparingItemIndex = loopIndex - 1;
        // We walk left from the current unsorted item, shifting each larger
        // item one spot to the right, until we hit something that isn't
        // larger than the current item.
        while (comparingItemIndex >= 0 && unsortedItem < laundry[comparingItemIndex]) {
            laundry[comparingItemIndex + 1] = laundry[comparingItemIndex];
            comparingItemIndex--;
        }
        // We set our unsorted item down just after the first item that isn't
        // larger than it (or at the very start of the array). Now it's sorted.
        laundry[comparingItemIndex + 1] = unsortedItem;
    }
}
Insertion sort has an average time complexity of O(n²), but it can be very fast on small arrays, even faster than Quicksort, which we’ll look at next.
Quicksort
Quicksort is a famous sorting algorithm named for its speed. Here’s how you Quicksort your laundry.
You start by picking an item of clothing (for best performance here you’d pick something of medium size, like a t-shirt). You put it in the middle of the floor. Everything larger than the t-shirt goes in a pile to the right, and everything smaller than the t-shirt goes in a pile to the left. Then you focus on the left pile: pick something, put it center-left on the floor. Everything smaller than it goes to the left, everything larger than it goes to the right (but not further right than the t-shirt!). Switch to the right pile and do the same thing.
Now you have four piles. You focus on each pile in turn, doing the same thing over and over again until you have eight piles. Do it again and you’ll end up with 16 piles. Then 32. Then 64. You’ll be done when the number of piles is the same as the number of items. And they’ll be sorted perfectly from smallest to largest.
Here’s how you’d code it:
// This method switches two items in an array
void SwapItemsInArray<T>(T[] array, int index1, int index2) {
    (array[index1], array[index2]) = (array[index2], array[index1]);
}

// This method divides a set into two subsets, where the items
// in the first subset are smaller than the items in the second
// subset.
int Partition(long[] laundry, int smallIndex, int largeIndex) {
    // Pick an item (a "pivot") to divide the set.
    // There are many ways to pick an item, but a common way
    // is just to choose the last one.
    var tshirt = laundry[largeIndex];
    // We're going to leave the pivot in last place for now.
    // We'll keep a placeholder index that starts just before the range.
    // Every time we find something no bigger than the t-shirt, we advance
    // the placeholder and swap that item into the placeholder's spot.
    // When we reach the pivot at the end of the range, everything no
    // bigger than the t-shirt will be at or left of the placeholder. So we
    // can swap the pivot into the spot just after the placeholder and
    // everything will be divided correctly.
    var possibleFootiesIndex = smallIndex - 1;
    for (var loopIndex = smallIndex; loopIndex < largeIndex; loopIndex++) {
        if (laundry[loopIndex] <= tshirt) {
            possibleFootiesIndex++;
            SwapItemsInArray(laundry, loopIndex, possibleFootiesIndex);
        }
    }
    possibleFootiesIndex++;
    SwapItemsInArray(laundry, possibleFootiesIndex, largeIndex);
    // Since we just did a swap, this is now the index of the pivot element.
    return possibleFootiesIndex;
}
void Quicksort(long[] laundry, int leftIndex, int rightIndex) {
    if (leftIndex >= rightIndex) {
        return;
    }
    // First we partition the array and get the pivot
    var pivotIndex = Partition(laundry, leftIndex, rightIndex);
    // Then we sort the left pile
    Quicksort(laundry, leftIndex, pivotIndex - 1);
    // Then we sort the right pile
    Quicksort(laundry, pivotIndex + 1, rightIndex);
}

// You could use these to quicksort an array like so:
// Quicksort(myArray, 0, myArray.Length - 1);
Quicksort has commonly been used in web browsers to implement JavaScript’s Array.sort() method (although modern engines may opt for a different algorithm, especially for very small arrays).
Quicksort is not a stable sorting algorithm. This means if you have elements in the array that are equal to each other, they may not appear in the same order after the sort! In other words, if you have several 3s in the array, the first 3 before sorting might be the last 3 after sorting. They’ll still be sorted correctly, though—all the 3s will end up next to each other—so most of the time it doesn’t matter.
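To make that concrete, here’s a small made-up example. In .NET, Array.Sort is not guaranteed to be stable, while LINQ’s OrderBy is documented as a stable sort:

// Two shirts share the same size but have different colors.
var shirts = new[] {
    (Size: 3, Color: "red"),
    (Size: 1, Color: "blue"),
    (Size: 3, Color: "green"),
};

// A stable sort by Size keeps "red" before "green", because that's
// the order they started in: blue, red, green.
var stableSorted = shirts.OrderBy(shirt => shirt.Size).ToArray();

// An unstable sort (like the Quicksort above, or Array.Sort with a
// custom comparison) still groups the size-3 shirts together, but
// "green" might come out ahead of "red".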
In Big O notation, quicksort’s time complexity averages O(n log n) but can be O(n²) in the worst case. O(n log n) is pretty good, about as good as we can generally expect from a sorting algorithm. O(n²) is pretty bad, and would only happen if you chose the worst possible pivot at every step. For example, the code above would perform poorly on a pre-sorted array, since it would always choose the largest element in each subset as the pivot, making each partition as lopsided as possible. To avoid this, implementations of Quicksort sometimes sample a few random elements from each subset and use their median as a pivot.
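Here’s a sketch of one such strategy, a “median of three” pivot picker (the method name is mine, and it assumes the SwapItemsInArray helper from above). You’d call it at the top of Partition, so that the last element, which Partition treats as the pivot, ends up being the median of the first, middle, and last items:

// A sketch of median-of-three pivot selection. Call this at the top of
// Partition, before reading laundry[largeIndex] as the pivot.
void MoveMedianOfThreeToEnd(long[] laundry, int smallIndex, int largeIndex) {
    var midIndex = (smallIndex + largeIndex) / 2;
    // Order the three sampled elements so that
    // laundry[smallIndex] <= laundry[midIndex] <= laundry[largeIndex].
    if (laundry[midIndex] < laundry[smallIndex]) {
        SwapItemsInArray(laundry, midIndex, smallIndex);
    }
    if (laundry[largeIndex] < laundry[smallIndex]) {
        SwapItemsInArray(laundry, largeIndex, smallIndex);
    }
    if (laundry[largeIndex] < laundry[midIndex]) {
        SwapItemsInArray(laundry, largeIndex, midIndex);
    }
    // The median of the three now sits in the middle; move it to the end
    // so it becomes the pivot.
    SwapItemsInArray(laundry, midIndex, largeIndex);
}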
Quicksort’s a doozy, isn’t it? It took me a couple hours’ study to really understand it. Let’s take a break with another simple one: bubble sort.
Bubble sort
To bubble sort your laundry, you lay out the items on the floor in any order. Then you start with the leftmost item of clothing. If it’s bigger than the one next to it, you swap them. If not, you don’t. Then you move to the second item from the left. If it’s bigger than the one on its right, swap ’em. If not, leave ’em. Then move on to the third item from the left. Repeat until you’ve reached the last item. At the end of this loop, the largest item of clothing will be furthest to the right. You can consider it sorted.
Then you start over. Hooray! Again compare each item with the one on its right, but stop when you’ve reached the second-to-last item (since we know the last item is sorted). The second-largest item will end up second from the right, and you can ignore it on the next go-around. We call this “bubble sort” because on each loop the largest unsorted item goes as far to the right as it can go (toward the “top” of the list), like a bubble rising to the top of a pond.
Do it again and again and again until you can make a full loop without swapping any items. Then you’re done. At most you’ll have done as many loops as there are items of clothing.
Here’s how you’d code it:
// This method switches two items in an array
void SwapItemsInArray<T>(T[] array, int index1, int index2) {
    (array[index1], array[index2]) = (array[index2], array[index1]);
}

void BubbleSort(long[] laundry) {
    for (var outerLoopIndex = 0; outerLoopIndex < laundry.Length; outerLoopIndex++) {
        var lastUnsortedIndex = laundry.Length - outerLoopIndex;
        var isSorted = true;
        for (var innerLoopIndex = 0; innerLoopIndex < lastUnsortedIndex - 1; innerLoopIndex++) {
            if (laundry[innerLoopIndex] > laundry[innerLoopIndex + 1]) {
                SwapItemsInArray(laundry, innerLoopIndex, innerLoopIndex + 1);
                isSorted = false;
            }
        }
        if (isSorted) {
            break;
        }
    }
}
Bubble sort generally has a time complexity of O(n²). It performs much better on pre-sorted lists, but that’s not very useful, is it? Mechanics also perform better on cars that aren’t broken.
Bubble sort is probably the least practical sorting algorithm, all things considered. Still, it’s often taught in introductory CS courses so it’s good to know about.
Merge sort
Let’s sort our laundry one more time. We’ll use merge sort.
Start by dividing the laundry in half. Then divide each half in half. You have four piles; divide them into eight. Then divide those eight into 16. Keep going until each item of clothing is its own pile.
Now go through the laundry items in groups of 2 (if there’s an odd one at the end, that’s okay, it can be its own group). In each group, swap the items if necessary so that the smaller one is on the left.
Now go through all the groups two at a time. Pick up the first item of the left group in your left hand, the first item of the right group in your right hand. Compare them and put the smaller item in a brand new group. Whichever hand is empty, pick up the next item from the pile on that side. Again, compare the items you’re holding and add the smaller one to the new group. Refill the empty hand as before. If the pile on that side is empty, then add the item in your other hand to the new group, and add each remaining item from that pile to the new group as well. Your new group will be sorted.
Once all the groups are sorted you can start again, sorting two (larger) groups at a time into a new group. Eventually you’ll only have one group left and your laundry will be sorted.
Here’s how you’d code it:
void MergeSort(long[] laundry) {
    if (laundry.Length <= 1) {
        return;
    }
    // Divide the pile into two piles
    var midpointIndex = laundry.Length / 2;
    var leftPile = laundry.Take(midpointIndex).ToArray();
    var rightPile = laundry.Skip(midpointIndex).ToArray();
    // Sort each pile using this same method
    MergeSort(leftPile);
    MergeSort(rightPile);
    // Treat `laundry` as a whole new pile and walk through the left and right
    // piles. On each step put the smaller element next in the new pile.
    int leftPileIndex = 0, rightPileIndex = 0, newPileIndex = 0;
    while (leftPileIndex < leftPile.Length && rightPileIndex < rightPile.Length) {
        if (leftPile[leftPileIndex] < rightPile[rightPileIndex]) {
            laundry[newPileIndex] = leftPile[leftPileIndex];
            leftPileIndex++;
        } else {
            laundry[newPileIndex] = rightPile[rightPileIndex];
            rightPileIndex++;
        }
        newPileIndex++;
    }
    // Finish adding any leftover items in the left pile to the new pile...
    while (leftPileIndex < leftPile.Length) {
        laundry[newPileIndex] = leftPile[leftPileIndex];
        newPileIndex++;
        leftPileIndex++;
    }
    // ...and do the same for any leftover items in the right pile.
    // There will only be leftovers in one pile!
    while (rightPileIndex < rightPile.Length) {
        laundry[newPileIndex] = rightPile[rightPileIndex];
        newPileIndex++;
        rightPileIndex++;
    }
}
Merge sort has a time complexity of O(n log n) in the best, worst, and average cases, so unlike Quicksort it never degrades to O(n²). However, it also uses more memory (O(n) for merge sort versus O(log n) for Quicksort) because of all the extra arrays.
Other algorithms
There are many other sorting algorithms you could learn about. Most of them are variations of the above, adding complexity to the code in order to achieve faster average completion times. Some are more useful for sorting words than numbers; others are very useful when there are only a few different possibilities for each element in the array. Still others are a combination of different algorithms—they might start with one algorithm, then switch to another one under certain conditions, like when a subset is very small or when the sort method has recursed on itself a given number of times.
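As an illustrative sketch (not how any particular library does it), a hybrid sort might reuse the Partition method from the Quicksort section and fall back to insertion sort once a subset gets small. The 16-item threshold and the range-limited InsertionSortRange helper are both made up for this example:

// A sketch of a hybrid sort: quicksort for big ranges, insertion sort
// for small ones. The threshold of 16 is arbitrary.
void HybridSort(long[] laundry, int leftIndex, int rightIndex) {
    if (leftIndex >= rightIndex) {
        return;
    }
    if (rightIndex - leftIndex + 1 <= 16) {
        // Small subset: insertion sort's low overhead wins here.
        InsertionSortRange(laundry, leftIndex, rightIndex);
        return;
    }
    var pivotIndex = Partition(laundry, leftIndex, rightIndex);
    HybridSort(laundry, leftIndex, pivotIndex - 1);
    HybridSort(laundry, pivotIndex + 1, rightIndex);
}

// Insertion sort limited to a slice of the array.
void InsertionSortRange(long[] laundry, int leftIndex, int rightIndex) {
    for (var loopIndex = leftIndex + 1; loopIndex <= rightIndex; loopIndex++) {
        var unsortedItem = laundry[loopIndex];
        var comparingItemIndex = loopIndex - 1;
        while (comparingItemIndex >= leftIndex && unsortedItem < laundry[comparingItemIndex]) {
            laundry[comparingItemIndex + 1] = laundry[comparingItemIndex];
            comparingItemIndex--;
        }
        laundry[comparingItemIndex + 1] = unsortedItem;
    }
}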
I like to point people to one of the most famous sayings in Computer Science:
Premature optimization is the root of all evil.
Donald Knuth, The Art of Computer Programming
In other words, you should almost always start with whatever form of Array.sort() is built into your programming language or platform. That will be the easiest thing for you and any other programmers who use your code. And if you find out it’s too slow, only then should you implement your own sorting algorithm (very carefully, and with plenty of unit tests).
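In C#, that usually means reaching for something like this (Array.Sort sorts the array in place; LINQ’s OrderBy leaves the original alone and gives you a new sorted sequence):

var laundrySizes = new long[] { 5, 2, 9, 1 };

// Sort in place with the built-in sort.
Array.Sort(laundrySizes);

// Or keep the original and produce a sorted copy instead.
var sortedCopy = laundrySizes.OrderBy(size => size).ToArray();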
What? Array is evolving!
Most arrays don’t need to be sorted. And much of the time an array is not the best thing for what you’re trying to do. If you ever see an Array.find() inside of a loop, for example, you should take a closer look and see if you’d be better off transforming the array into something else. Take the following code:
void MatchPeopleToAddresses(Person[] people, Address[] addresses) {
    foreach (var person in people) {
        var matchingAddress = addresses.First(ad => ad.PersonName == person.Name);
        person.Address = $"{matchingAddress.StreetAddress} {matchingAddress.CityState} {matchingAddress.Zip}";
    }
}
This is a pretty common sort of operation. We have data in two arrays and we want to match them up according to some common property. The problem is that we’re iterating through addresses on each loop (it’s a loop inside a loop!) and evaluating each Address multiple times. We don’t have to do that. Consider this:
void MatchPeopleToAddresses2(Person[] people, Address[] addresses) {
    var addressesByPersonName = addresses.ToDictionary(ad => ad.PersonName);
    foreach (var person in people) {
        var matchingAddress = addressesByPersonName[person.Name];
        person.Address = $"{matchingAddress.StreetAddress} {matchingAddress.CityState} {matchingAddress.Zip}";
    }
}
We start with one official, all-the-way-through iteration of addresses. We add all the elements to a Dictionary (you might know it in other languages as an object, an associative array, or a hash table) where the keys come from the PersonName field. Then on each iteration of the foreach loop, all we have to do is look up the address by that key. Hash table lookups are O(1) on average, which is as good as it gets. A hash table lookup will almost always be fast enough. Sometimes it isn’t ideal for other reasons, but speed won’t be the issue.
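One caveat: the dictionary indexer above (like the First call before it) throws if a person has no matching address. If missing addresses are possible in your data, a safer version of the loop might use TryGetValue:

foreach (var person in people) {
    if (addressesByPersonName.TryGetValue(person.Name, out var matchingAddress)) {
        person.Address = $"{matchingAddress.StreetAddress} {matchingAddress.CityState} {matchingAddress.Zip}";
    }
    // If there's no match, we leave person.Address alone
    // (or handle the miss however makes sense for your app).
}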
You may find uses for other data structures such as hashsets, tries, graphs, queues and so forth, especially in extremely performance-sensitive situations. Generally the goal is to balance speed of access, speed of mutation, and storage space with known constraints. These are all good tools to have in your toolbox. But most of the time, in day-to-day coding tasks, at companies that aren’t in the news every day, arrays and dictionaries will be your best friends.
Knowing what’s out there
Do you need to memorize all this stuff? Maybe, if you’re cramming for an interview at Google. (I don’t envy you.) But otherwise, probably not. It’s a fun exercise to think about all the ways you can search and sort an array, but you’ll go back to your day job and use Array.find, Array.sort, or a dictionary transformation pretty much always. Maybe an occasional hashset.
But someday, if you’re very lucky, it may turn out that none of the basic tools are good enough. You’ll need to squeeze some performance out of a heavily-used method or comb through an enormous amount of data. And in that situation, you’ll be glad you know what else is out there.
That’s half the battle.