Anagrams & Big O
To explore different orders of magnitude, let's consider four different solutions to the problem of detecting if a string is an anagram. One string is an anagram of another if the second is a rearrangement of the letters in the first. For example, 'heart' and 'earth' are anagrams.
For the sake of simplicity, let's assume that the two strings in question use symbols from the set of 26 lowercase alphabetic characters. Our goal is to write a boolean function that will take two strings and return whether they are anagrams.
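As a concrete preview, one straightforward check (not necessarily one of the solutions examined in this chapter) is to sort both strings and compare the results; the function name below is just illustrative.

```python
def is_anagram(first, second):
    # Two strings are anagrams exactly when sorting their
    # characters produces the same sequence.
    return sorted(first) == sorted(second)


print(is_anagram('heart', 'earth'))   # True
print(is_anagram('heart', 'hearts'))  # False
```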
Big O
An algorithm is little more than a series of steps required to perform some task. If we treat each step as a basic unit of computation, then an algorithm's execution time can be expressed as the number of steps required to solve the problem.
This abstraction is exactly what we need: it characterizes an algorithm's efficiency in terms of execution time while remaining independent of any particular program or computer. Now we can take a closer look at those two summation algorithms we introduced last chapter.
Intuitively, we can see that the first algorithm (sum_of_n) is doing more work than the second (arithmetic_sum): some program steps are being repeated, and the program takes even longer if we increase the value of n. But we need to be more precise.
The most expensive unit of computation in sum_of_n is variable assignment. If we counted the number of assignment statements, we would have a worthy approximation of the algorithm's execution time: there's an initial assignment statement (total = 0) that is performed only once, followed by a loop that executes the loop body (total += i) n times.
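Based on that description, sum_of_n presumably looks something like this (a sketch; the original listing may differ in minor details):

```python
def sum_of_n(n):
    total = 0                  # one assignment, performed once
    for i in range(1, n + 1):
        total += i             # one assignment, performed n times
    return total
```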
We can denote this more succinctly with the function $T$, where $T(n) = 1 + n$.
The parameter $n$ is often referred to as the "size of the problem", so we can read this as "$T(n)$ is the time it takes to solve a problem of size $n$, namely $1 + n$ steps."
For our summation functions, it makes sense to use the number of terms being summed to denote the size of the problem. Then, we can say that "The sum of the first 100,000 integers is a bigger instance of the summation problem than the sum of the first 1,000 integers."
Based on that claim, it seems reasonable that the time required to solve the larger case would be greater than for the smaller case. "Seems reasonable" isn't quite good enough, though; we need to prove that the algorithm's execution time depends on the size of the problem.
To do this, we're going to stop worrying about the exact number of operations an algorithm performs and determine the dominant part of the function. We can do this because, as the problem gets larger, some portion of the function tends to overpower the rest; it's this dominant portion that is ultimately most helpful for algorithm comparisons.
The order of magnitude function describes the part of $T(n)$ that increases fastest as the value of $n$ increases. "Order of magnitude function" is a bit of a mouthful, though, so we call it big O. We write it as $O(f(n))$, where $f(n)$ is the dominant part of the original $T(n)$. This is called "Big O notation" and provides a useful approximation for the actual number of steps in a computation.
In the above example, we saw that $T(n) = 1 + n$. As $n$ gets larger, the constant 1 will become less significant to the final result. If we are simply looking for an approximation of $T(n)$, then we can drop the 1 and say that the running time is $O(n)$.
Let's be clear, though: the 1 is important to $T(n)$ and can only be safely ignored when we are looking for an approximation of $T(n)$.
As another example, suppose that the exact number of steps in some algorithm is $T(n) = 5n^2 + 27n + 1005$. When $n$ is small (1 or 2), the constant 1005 seems to be the dominant part of the function. However, as $n$ gets larger, the $n^2$ term becomes enormous, dwarfing the other two terms in its contribution to the final result.
Again, for an approximation of $T(n)$ at large values of $n$, we can focus on $5n^2$ and ignore the other terms. Similarly, the coefficient 5 becomes insignificant as $n$ gets larger. We would then say that the function $T(n)$ has an order of magnitude $f(n) = n^2$; more simply, the function is $O(n^2)$.
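A quick numerical check illustrates this, using the example function above (the coefficients are just those of this example):

```python
def T(n):
    return 5 * n**2 + 27 * n + 1005

# The 5n^2 term accounts for nearly all of T(n) as n grows.
for n in [1, 10, 100, 1000, 10000]:
    share = (5 * n**2) / T(n)
    print(f'n={n:>6}  T(n)={T(n):>12,}  5n^2 share={share:.1%}')
```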
Although we don't see this in the summation example, sometimes the performance of an algorithm depends on the problem's exact data values rather than its size. For these kinds of algorithms, we need to characterize their performance as worst case, best case, or average case.
The worst case performance refers to a particular data set where the algorithm performs especially poorly, while the best case performance refers to a data set that the algorithm can process extremely fast. The average case, as you can probably infer, performs somewhere in between these two extremes. Understanding these distinctions can help prevent any one particular case from misleading us.
There are several common order of magnitude functions that will frequently appear as you study algorithms. These functions are listed below from lowest order of magnitude to highest. Knowing these goes a long way toward recognizing them in your own code.
Constant: $O(1)$
Logarithmic: $O(\log n)$
Linear: $O(n)$
Log Linear: $O(n \log n)$
Quadratic: $O(n^2)$
Cubic: $O(n^3)$
Exponential: $O(2^n)$
In order to decide which of these functions is the dominant part of any given $T(n)$, we must see how it compares with the others as $n$ becomes larger. We've taken the liberty of graphing these functions together below.
Notice that, when $n$ is small, the functions inhabit a similar area; it's hard to tell which is dominant. However, as $n$ grows, they branch, making it easy for us to distinguish them.
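To see that branching numerically rather than graphically, here is a small sketch (standard library only) that prints each function's value for a few sizes of n:

```python
import math

growth_functions = [
    ('constant', lambda n: 1),
    ('logarithmic', lambda n: math.log2(n)),
    ('linear', lambda n: n),
    ('log linear', lambda n: n * math.log2(n)),
    ('quadratic', lambda n: n ** 2),
    ('cubic', lambda n: n ** 3),
    ('exponential', lambda n: 2 ** n),
]

for n in [1, 2, 4, 8, 16, 32]:
    values = '  '.join(f'{name}={f(n):,.0f}' for name, f in growth_functions)
    print(f'n={n:>2}: {values}')
```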
As a final example, suppose that we have the fragment of Python code shown below. Although this program does nothing useful, it's instructive to see how we can take actual code and analyze its performance.
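One possible fragment consistent with the analysis that follows (the variable names are invented purely for illustration):

```python
def fragment(n):
    # First group: three assignment statements.
    a = 5
    b = 6
    c = 10

    # Second group: three assignments inside a nested iteration,
    # each executed n * n times.
    for i in range(n):
        for j in range(n):
            x = i * i
            y = j * j
            z = i * j

    # Third group: two assignments iterated n times.
    for k in range(n):
        w = a * k + 45
        v = b * b

    # Fourth "group": one final assignment statement.
    d = 33
```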
To calculate $T(n)$ for this fragment, we need to count the assignment operations, which is easier if we group them logically.
The first group consists of three assignment statements, giving us the term 3. The second group consists of three assignments performed $n^2$ times in the nested iteration: $3n^2$. The third group has two assignments iterated $n$ times: $2n$. The fourth "group" is the last assignment statement, which is just the constant 1.
Putting those all together: $T(n) = 3 + 3n^2 + 2n + 1 = 3n^2 + 2n + 4$. By looking at the exponents, we can see that the $n^2$ term will be dominant, so this fragment of code is $O(n^2)$. Remember that we can safely ignore all the terms and coefficients as $n$ grows larger.
The diagram below shows a few of the common big O functions as they compare with the $T(n)$ function discussed above.
Note that $T(n)$ is initially larger than the cubic function but, as $n$ grows, it cannot compete with the rapid growth of the cubic function. Instead, it heads in the same direction as the quadratic function as $n$ continues to grow.
What makes one computer program better than another?
Take a moment to answer this for yourself. If you were given two programs that solve the same problem, how would you decide between them?
The truth is there are many valid criteria, which are often in conflict.
We typically want our program to be correct. In other words, we'd like the program's output to match our expectations. Unfortunately, correctness is not always clear. For instance, what does it mean for Google to return the "correct" top 10 search results for your search query?
Good software engineers often want their code to be readable, reusable, elegant or testable. These are admirable goals, but you may not be able to achieve them all at the same time. It's also not entirely clear what something like "elegance" looks like, and we certainly haven't been able to model it mathematically, so computer scientists haven't given these aspects of programs much consideration.
Two factors that computer scientists love to model mathematically, though, are how long a program will take to run, and how much space (typically, memory) it will use. We call these time and space efficiency, and they'll be at the core of our study of algorithms.
We may need to trade these off against other concerns: algorithm A may be faster but use more memory than algorithm B. They might both be less elegant than algorithm C, in a context where elegance is the priority. We'll be focusing on time and space because they happen to be both interesting and measurable, but please don't go away thinking they're always the most important factors. The only truly correct answer is: "it depends".
Another aspect of "it depends", even when we focus on just time or space, is the context in which the program runs. There is often a relationship between the inputs of a program and its running time or space usage. For instance, if you grep over many large files, it will take longer than if you grep over fewer, smaller files. This relationship between inputs and behavior will be an important part of our analysis.
Beyond this, the exact time and space that your program uses will also depend on many other factors. Can you think of at least three?
Here are some:
How long it takes your computer to execute every instruction
Your computer's "Instruction Set Architecture", for instance ARM or Intel x86
How many cores of your machine the program uses
What language your program is written in
How your operating system chooses to schedule processes
What other programs are running at the same time
… and there are many more.
All of these are important in practice, but none address the core question of whether an algorithm is generally better or worse than another. Sometimes we'd like to be able to ask: generally speaking, irrespective of whether a program is written in Fortran for the IBM 704 or in Python running on a shiny new Macbook, will it be more time and/or space efficient than an alternative? Will it use less space? This is the crux of algorithm analysis.
Algorithm analysis is a way to compare the time and space efficiency of programs with respect to their possible inputs, but irrespective of other context.
In the real world, we measure the time used by a program in some unit of… err… time, such as seconds. Similarly we measure space used in something like bytes. In analysis, this would be too specific. If we measure the time that it takes to finish, this number would incorporate details like language choice and CPU speed. We will need new models and vocabulary in order to speak with the level of generality that we're seeking.
Let's explore this idea with an example.
Say I wanted to calculate the sum of the first n numbers, and I'm wondering how long this will take. Firstly, can you think of a simple algorithm to do the calculation? It should be a function that has n as a parameter, and returns the sum of the first n numbers. You don't need to do anything fancy, but please do take the time to write out an algorithm and think about how long it will take to run on small or large inputs.
Here is a simple Python solution:
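Something along these lines works (the exact original listing may differ slightly, but the name sum_to_n matches the discussion that follows):

```python
def sum_to_n(n):
    # Add up 1 + 2 + ... + n with a simple loop.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total
```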
Will sum_to_n take longer to run given a larger n? Intuitively, the answer seems to be yes, as it will loop more times.
Will sum_to_n take the same amount of time to run each time it's invoked with the same input? Intuitively the answer seems to be yes, since the same instructions are executed.
Let's now add some profiling code:
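For instance, using time.perf_counter from the standard library (the original timing code may have used a different approach):

```python
import time


def sum_to_n(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


start = time.perf_counter()
result = sum_to_n(1_000_000)
elapsed = time.perf_counter() - start
print(f'sum_to_n(1000000) = {result}, computed in {elapsed:.4f} seconds')
```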
Let's say I ran this with n=1000000 (1 million) and noticed that it took 0.11 seconds. What would you expect to see if I ran it five more times?
Interestingly, it takes a slightly different amount of time on each invocation, due to the slightly different state of my computer and the Python virtual machine each time. We'd generally like to ignore such small and random differences.
Now, what if we were to run it again with a range of different inputs, say 1 million, 2 million, 3 million and so on up to 9 million? What would you expect to see?
Do you see the general relationship between n and time elapsed? Is this what you expected? How would the relationship look if you were to plot values of n on the x-axis and execution time on the y-axis?
[Scatter plot: n (millions) on the x-axis, execution time (seconds) on the y-axis. The points rise roughly in a straight line, from about 0.12 seconds at n = 1 million to about 1.05 seconds at n = 9 million.]
It turns out that our simple strategy is not the most efficient. In fact there is a short formula that will give us the answer to our question without any looping. Can you determine (or perhaps remember) what it is? Here's a hint: try summing the numbers 1 to 6 by grouping 1 and 6, 2 and 5, and 3 and 4 together, noticing that there are three pairs which each total 7.
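To make the hint concrete, here is the arithmetic for n = 6:

$$1 + 2 + 3 + 4 + 5 + 6 = (1 + 6) + (2 + 5) + (3 + 4) = 3 \times 7 = 21 = \frac{6 \times 7}{2}$$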
Mathematically, the formula is:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$

If you don't quite understand this formula, take a moment to explore one of these visual explanations.
How would we implement this as a Python function, again with our timing code?
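A sketch, with the same caveat that the exact timing code may differ; the name arithmetic_sum matches the one referenced below:

```python
import time


def arithmetic_sum(n):
    # Use the closed-form formula n(n + 1) / 2 instead of looping.
    # Integer division is exact here because n(n + 1) is always even.
    return n * (n + 1) // 2


start = time.perf_counter()
result = arithmetic_sum(1_000_000)
elapsed = time.perf_counter() - start
print(f'arithmetic_sum(1000000) = {result}, computed in {elapsed:.6f} seconds')
```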
What do you expect to see if we run this with a range of inputs as we did with sum_to_n?
Notice that our answers are all correct. Did you expect each calculation to take around the same amount of time? What would this look like if we were to again plot values of n on the x-axis and execution times on the y-axis?
[Scatter plot: n (millions) on the x-axis, execution time (microseconds) on the y-axis. The points hover between roughly 2 and 3 microseconds regardless of n.]
Notice that our y-axis is now marked in microseconds, which are millionths of a second. Also notice that the execution time appears to be largely independent of the size of the input.
We describe sum_to_n as "linear" or $O(n)$, and arithmetic_sum as "constant" or $O(1)$. Hopefully you can start to see why. Irrespective of the exact times that these functions take to execute, we can spot a general trend: the execution time for sum_to_n grows in proportion to n, whereas the execution time for arithmetic_sum remains constant. All else being equal, arithmetic_sum is the better algorithm, for this reason.
In the following sections, we'll apply a little more rigor to our reasoning, and explore a method for determining these time and space characteristics without direct measurement.