Introduction
1. Variables:
Variables are placeholders for representing data.
Ex: x=1, y=2
2. Data Types:
A data type in a programming language is a set of data with predefined values.
Ex: integer, floating point, unsigned number, character, string, etc.
At the top level, there are two types of data types:
System-defined data types (also called Primitive data types): int, string...
User-defined data types: struct, class...
3. Data Structures:
A data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.
Ex: arrays, files, linked lists, stacks, queues, trees, graphs and so on.
Depending on the organization of the elements, data structures are classified into two types:
1) Linear data structures: Linked Lists, Stacks and Queues.
2) Non-linear data structures: Trees and Graphs.
4. Abstract Data Types (ADTs):
To simplify the process of solving problems, we combine the data structures with their operations and we call this Abstract Data Types (ADTs).
An ADT consists of two parts:
Declaration of data.
Declaration of operations.
Ex: Linked Lists, Stacks, Queues, Priority Queues, Binary Trees, Dictionaries, Disjoint Sets (Union and Find), Hash Tables, Graphs, and many others.
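As a minimal sketch of this two-part split, a stack ADT in Python might look like the class below (the class name, method names and list-backed representation are illustrative choices, not mandated by the ADT):
class Stack:
    def __init__(self):
        self._items = []          # declaration of data: a list is one possible representation

    # declaration of operations
    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop() # raises IndexError when the stack is empty

    def top(self):
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0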
5. Algorithms:
An algorithm consists of step-by-step unambiguous instructions to solve a given problem.
Two main criteria for judging the merits of algorithms:
Correctness (does the algorithm give a solution to the problem in a finite number of steps?)
Efficiency (how many resources, in terms of memory and time, does it take to execute?)
6. Why the Analysis of Algorithms:
Algorithm analysis helps us to determine which algorithm is most efficient in terms of time and space consumed.
Ex: Sorting problem has many algorithms: insertion sort, selection sort, quick sort ...
7. Goal of the Analysis of Algorithms
To compare algorithms (or solutions) mainly in terms of running time but also in terms of other factors (e.g., memory, developer effort, etc.)
8. What is Running Time Analysis?
It is the process of determining how processing time increases as the size of the problem (input size) increases.
9. How to Compare Algorithms
Execution times? Not a good measure, since execution times are specific to a particular computer.
Number of statements executed? Not a good measure, since the count varies with the programming language and with the programmer's style.
Ideal solution? Express the running time as a function of the input size n (i.e., f(n)), and compare these functions independent of machine, language and style.
10. What is the Rate of Growth?
The rate at which the running time increases as a function of the input is called the rate of growth.
11. Commonly Used Rates of Growth
1 (constant), logn (logarithmic), n (linear), nlogn (linear logarithmic), n^2 (quadratic), n^3 (cubic), 2^n (exponential).
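To make the gaps between these rates concrete, the quick illustrative Python sketch below tabulates them for a few input sizes (2^n is replaced by a placeholder once it becomes astronomically large):
import math

# Tabulate the common rates of growth for a few input sizes.
for n in [10, 100, 1000]:
    print(n, round(math.log2(n), 1), round(n * math.log2(n), 1),
          n**2, n**3, 2**n if n <= 30 else "astronomical")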
12. Types of Analysis
To analyze an algorithm we need some kind of syntax, and that forms the base for asymptotic analysis/notation. There are three types of analysis:
Worst case: defines the input for which the algorithm takes the longest time to complete (the input on which it is slowest).
Best case: defines the input for which the algorithm takes the least time to complete (the input on which it is fastest).
Average case: run the algorithm many times, using many different random inputs drawn from some distribution that generates those inputs, compute the total running time (by adding the individual times), and divide by the number of trials.
Lower Bound <= Average Time <= Upper Bound
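As a concrete sketch of these three cases, consider linear search (the standard illustration; the function below is our own, not from the text):
# Linear search illustrates all three cases on the same algorithm.
def linear_search(arr, key):
    for i, value in enumerate(arr):
        if value == key:
            return i   # found at position i
    return -1          # not found

arr = [7, 3, 9, 1, 5]
print(linear_search(arr, 7))   # best case: key at index 0, one comparison -> O(1)
print(linear_search(arr, 2))   # worst case: key absent, n comparisons -> O(n)
# Average case: if the key is equally likely to be at any position,
# about n/2 comparisons are needed on average, which is still O(n).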
13. Asymptotic Notation
Having expressions for the best, average and worst cases, we need to identify the upper and lower bounds for each of the three cases.
Let us assume that the given algorithm is represented in the form of a function f(n).
14. Big-O Notation [Upper Bounding Function]
14.1 Definition:
O-notation is defined as O(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0}. This notation gives the tight upper bound g(n) of the given function.
Objective: to give the smallest rate of growth g(n) which is greater than or equal to the given algorithm's rate of growth f(n).
Generally we discard lower values of n (the rate of growth at lower values of n is not important).
n0 is the point from which we need to consider the rate of growth for a given algorithm.
Below n0, the rate of growth could be different and is not important.
n0 is called the threshold for the given function.
14.2 Big-O Visualization:
O(g(n)) is the set of functions with smaller or the same order of growth as g(n).
Ex: O(n^2) includes O(1), O(n), O(nlogn), etc.
14.3 Big-O Examples:
Example-1
Find upper bound for f(n) = 3n + 8
Solution: 3n + 8 ≤ 4n, for all n ≥ 8
∴ 3n + 8 = O(n) with c = 4 and n0 = 8
Example-2
Find upper bound for f(n) = n^2 + 1
Solution: n^2 + 1 ≤ 2n^2, for all n ≥ 1
∴ n^2 + 1 = O(n^2) with c = 2 and n0 = 1
Example-3
Find upper bound for f(n) = n^4 + 100n^2 + 50
Solution: n^4 + 100n^2 + 50 ≤ 2n^4, for all n ≥ 11
∴ n^4 + 100n^2 + 50 = O(n^4) with c = 2 and n0 = 11
Example-4
Find upper bound for f(n) = 2n^3 − 2n^2
Solution: 2n^3 − 2n^2 ≤ 2n^3, for all n ≥ 1
∴ 2n^3 − 2n^2 = O(n^3) with c = 2 and n0 = 1
Example-5
Find upper bound for f(n) = n
Solution: n ≤ n, for all n ≥ 1
∴ n = O(n) with c = 1 and n0 = 1
Example-6
Find upper bound for f(n) = 410
Solution: 410 ≤ 410 · 1, for all n ≥ 1
∴ 410 = O(1) with c = 410 and n0 = 1
No Uniqueness?
There is no unique set of values for n0 and c in proving the asymptotic bounds.
Let us consider, 100n + 5 = O(n).
For this function there are multiple n0 and c values possible.
Solution 1: 100n + 5 ≤ 100n + n = 101n, for all n ≥ 5; n0 = 5 and c = 101 is a solution.
Solution 2: 100n + 5 ≤ 100n + 5n = 105n, for all n ≥ 1; n0 = 1 and c = 105 is also a solution.
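Such witnesses can be spot-checked numerically. The helper below is a hypothetical sketch (check_upper_bound is our own name, and sampling a finite range is not a proof); it tests f(n) ≤ c·g(n) for n ≥ n0:
# Hypothetical helper: numerically spot-check a claimed Big-O witness (c, n0).
def check_upper_bound(f, g, c, n0, n_max=10000):
    return all(f(n) <= c * g(n) for n in range(n0, n_max))

# Both witnesses for 100n + 5 = O(n) pass the check:
print(check_upper_bound(lambda n: 100 * n + 5, lambda n: n, c=101, n0=5))   # True
print(check_upper_bound(lambda n: 100 * n + 5, lambda n: n, c=105, n0=1))   # True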
15. Omega-Ω Notation [Lower Bounding Function]
15.1 Definition:
Ω notation is defined as Ω(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0}.
g(n) is an asymptotic tight lower bound for f(n).
Our objective is to give the largest rate of growth g(n) which is less than or equal to the given algorithm’s rate of growth f(n).
Similar to the O discussion, this notation gives the tight lower bound of the given algorithm and we represent it as f(n) = Ω(g(n)).
That means, at larger values of n, the tight lower bound of f(n) is g(n).
Ex: if f(n) = 100n^2 + 10n + 50, then f(n) = Ω(n^2).
15.2 Ω Examples:
Example-1
Find lower bound for f(n) = 5n2.
Solution: ∃ c, n0 such that 0 ≤ c·n^2 ≤ 5n^2 ⇒ c = 5 and n0 = 1
∴ 5n^2 = Ω(n^2) with c = 5 and n0 = 1
Example-2
Prove f(n) = 100n + 5 ≠ Ω(n2).
Solution: suppose ∃ c, n0 such that 0 ≤ c·n^2 ≤ 100n + 5. Since 100n + 5 ≤ 100n + 5n = 105n for all n ≥ 1, we get c·n^2 ≤ 105n ⇒ n(cn − 105) ≤ 0. Since n is positive, cn − 105 ≤ 0 ⇒ n ≤ 105/c. Contradiction: n cannot be bounded above by a constant for all n ≥ n0.
Example-3
2n = Ω(n), n^3 = Ω(n^3), logn = Ω(logn).
16. Theta-Θ Notation [Order Function]
16.1 Definition:
This notation decides whether the upper and lower bounds of a given function (algorithm) are the same. The average running time of an algorithm is always between the lower bound and the upper bound. If the upper bound (O) and lower bound (Ω) give the same result, then the Θ notation will also have the same rate of growth.
Ex: f(n) = 10n + n. Then, its tight upper bound g(n) is O(n). The rate of growth in the best case is g(n) = O(n). In this case, the rates of growth in the best case and worst case are the same. As a result, the average case will also be the same. For a given function (algorithm), if the rates of growth (bounds) for O and Ω are not the same, then the rate of growth for the Θ case may not be the same.
In this case, we need to consider all possible time complexities and take the average of those (for example, for a quick sort average case, refer to the Sorting chapter).
Now consider the definition of Θ notation. It is defined as Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0}. g(n) is an asymptotic tight bound for f(n).
Θ(g(n)) is the set of functions with the same order of growth as g(n).
16.2 Θ Examples:
Example 1
Find the Θ bound for f(n) = n^2/2 − n/2
Solution: n^2/5 ≤ n^2/2 − n/2 ≤ n^2, for all n ≥ 2
∴ n^2/2 − n/2 = Θ(n^2) with c1 = 1/5, c2 = 1 and n0 = 2
Example 2
Prove n ≠ Θ(n^2)
Solution: c1·n^2 ≤ n ≤ c2·n^2 ⇒ the left inequality holds only for n ≤ 1/c1
∴ n ≠ Θ(n^2)
Example 3
Prove 6n^3 ≠ Θ(n^2)
Solution: c1·n^2 ≤ 6n^3 ≤ c2·n^2 ⇒ the right inequality holds only for n ≤ c2/6
∴ 6n^3 ≠ Θ(n^2)
Example 4
Prove n ≠ Θ(logn)
Solution: c1·logn ≤ n ≤ c2·logn ⇒ c2 ≥ n/logn for all n ≥ n0, which is impossible since n/logn grows without bound.
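The same numeric spot-check idea extends to Θ by testing both inequalities at once (again a hypothetical helper over a sampled range, not a proof):
# Hypothetical helper: spot-check c1*g(n) <= f(n) <= c2*g(n) for n >= n0.
def check_theta(f, g, c1, c2, n0, n_max=10000):
    return all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, n_max))

# Example 1 above: n^2/2 - n/2 = Θ(n^2) with c1 = 1/5, c2 = 1, n0 = 2.
print(check_theta(lambda n: n * n / 2 - n / 2, lambda n: n * n,
                  c1=1/5, c2=1, n0=2))   # True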
17. Important Notes
For the analysis (best case, worst case and average), we try to give the upper bound (O), the lower bound (Ω) and the average running time (Θ).
It is not always possible to determine all three bounds.
In practice, the lower bound (Ω) is of little importance; we mostly work with upper bounds.
We use Θ if the O and Ω bounds are the same.
18. Why is it called Asymptotic Analysis?
In every case, for a given function f(n) we try to find another function g(n) which approximates f(n) at higher values of n; that is, g(n) is a curve that approximates f(n) at higher values of n.
In mathematics, such a curve is called an asymptotic curve.
For this reason, we call algorithm analysis asymptotic analysis.
19. Guidelines for Asymptotic Analysis
There are some general rules to help us determine the running time of an algorithm:
Loops: O(n)
The running time of a loop is, at most, the running time of the statements inside the loop (including tests) multiplied by the number of iterations.
# executes n times
for i in range(n):
    m += 1    # constant time, c
Total time = a constant c × n = cn = O(n)
Nested loops: O(n^2)
Analyze from the inside out. Total running time is the product of the sizes of all the loops.
# outer loop executes n times
for i in range(n):
    # inner loop executes n times
    for j in range(n):
        k += 1    # constant time, c
Total time = c × n × n = cn^2 = O(n^2).
Consecutive statements: O(n^2)
Add the time complexities of each statement.
x += 1    # constant time
# first loop executes n times
for i in range(n):
    m += 1    # constant time, c
# outer loop executes n times
for i in range(n):
    # inner loop executes n times
    for j in range(n):
        k += 1    # constant time, c
Total time = c0 + c1·n + c2·n^2 = O(n^2)
If-then-else statements: O(n)
Worst-case running time: the test, plus either the then part or the else part (whichever is the larger).
# test: constant time
if n == 0:
    return False          # then part: constant
else:                     # else part: (constant + constant) * n
    for j in range(n):
        # another if: constant + constant (no else part)
        if True:          # constant
            return True
Total time = c0 + c1 + (c2 + c3) × n = O(n).
Logarithmic complexity: O(logn)
An algorithm is O(logn) if it takes constant time to cut the problem size by a fraction (usually by ½).
i = 1
while i < n:
    i = i * 2
The value of i doubles every iteration: initially i = 1, in the next step i = 2, and in subsequent steps i = 4, 8 and so on.
Let us assume that the loop executes k times. At the k-th step 2^k = n, and at the (k + 1)-th step we come out of the loop.
Taking the logarithm on both sides gives:
log(2^k) = logn
k · log2 = logn
k = logn    # assume base 2
Total time = O(logn).
Another example: binary search (finding a word in a dictionary of n pages)
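A minimal binary search sketch makes the halving explicit: the live range [lo, hi) shrinks by half on every iteration, so the loop runs at most about log2(n) times (this is our own sketch of the classic algorithm):
# Binary search on a sorted list: the range halves each iteration -> O(logn).
def binary_search(arr, key):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] == key:
            return mid
        elif arr[mid] < key:
            lo = mid + 1      # discard the left half
        else:
            hi = mid          # discard the right half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))   # -1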
20. Simplifying properties of asymptotic notations
Transitivity: f(n) = Θ(g(n)) and g(n) = Θ(h(n)) ⇒ f(n) = Θ(h(n)). (Valid for O and Ω).
Reflexivity: f(n) = Θ(f(n)). (Valid for O and Ω).
Symmetry: f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n)).
Transpose symmetry: f(n) = O(g(n)) if and only if g(n) = Ω(f(n)).
If f(n) is in O(kg(n)) for any constant k > 0, then f(n) is in O(g(n)).
If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then (f1 + f2)(n) is in O(max(g1(n), g2(n))).
If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)) then f1(n)f2(n) is in O(g1(n)g2(n)).
21. Commonly used Logarithms and Summations
21.1 Logarithm: log x^y = y·log x; log(xy) = log x + log y; log_b x = (log_a x)/(log_a b); a^(log_b n) = n^(log_b a)
21.2 Arithmetic series: 1 + 2 + ... + n = n(n + 1)/2 ≈ n^2/2
21.3 Geometric series: 1 + x + x^2 + ... + x^n = (x^(n+1) − 1)/(x − 1), for x ≠ 1
21.4 Harmonic series: 1 + 1/2 + 1/3 + ... + 1/n ≈ logn
21.5 Other important formula: 1^k + 2^k + ... + n^k ≈ n^(k+1)/(k + 1)
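These closed forms are easy to sanity-check numerically (a quick sketch, not a proof; the harmonic sum differs from the natural log by roughly the Euler-Mascheroni constant ≈ 0.577):
import math

n = 1000
# Arithmetic series: 1 + 2 + ... + n equals n(n + 1)/2.
assert sum(range(1, n + 1)) == n * (n + 1) // 2
# Geometric series with x = 2: 2^0 + ... + 2^n equals 2^(n+1) - 1.
assert sum(2**i for i in range(n + 1)) == 2**(n + 1) - 1
# Harmonic series: 1 + 1/2 + ... + 1/n is close to log n.
print(sum(1 / i for i in range(1, n + 1)), math.log(n))   # ~7.49 vs ~6.91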
22. Master Theorem for Divide and Conquer Recurrences
22.1 Definition:
All divide and conquer algorithms divide the problem into sub-problems, each of which is part of the original problem, and then perform some additional work to compute the final answer.
22.2 Example:
A merge sort algorithm operates on two sub-problems, each of which is half the size of the original, and then performs O(n) additional work for merging. This gives the running time equation:
T(n) = 2 T(n/2) + O(n)
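A compact merge sort sketch matches this recurrence exactly: two recursive calls on halves (the 2T(n/2) term) followed by a linear-time merge (the O(n) term):
# Merge sort: two recursive calls on halves plus an O(n) merge.
def merge_sort(arr):
    if len(arr) <= 1:
        return arr                       # base case: already sorted
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])         # T(n/2)
    right = merge_sort(arr[mid:])        # T(n/2)
    merged, i, j = [], 0, 0              # O(n) merge of two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 8, 1, 9, 3]))    # [1, 2, 3, 5, 8, 9]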
22.3 Determine the running time:
For a given program (algorithm), first we try to find the recurrence relation for the problem.
If the recurrence is of the below form then we can directly give the answer without fully solving it.
If the recurrence is of the form:
T(n) = a·T(n/b) + Θ(n^k · log^p n),
where a ≥ 1, b > 1, k ≥ 0 and p is a real number, then:
1) If a > b^k, then T(n) = Θ(n^(log_b a)).
2) If a = b^k:
   a. If p > −1, then T(n) = Θ(n^(log_b a) · log^(p+1) n).
   b. If p = −1, then T(n) = Θ(n^(log_b a) · loglogn).
   c. If p < −1, then T(n) = Θ(n^(log_b a)).
3) If a < b^k:
   a. If p ≥ 0, then T(n) = Θ(n^k · log^p n).
   b. If p < 0, then T(n) = O(n^k).
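The case analysis mechanizes nicely; the sketch below (master_theorem is our own hypothetical helper) classifies a recurrence by comparing a with b^k:
import math

# Hypothetical helper: apply the master theorem cases for
# T(n) = a*T(n/b) + Θ(n^k * log^p n).
def master_theorem(a, b, k, p):
    if a > b**k:
        return f"Θ(n^{math.log(a, b):.2f})"
    if a == b**k:
        if p > -1:
            return f"Θ(n^{k} * log^{p + 1} n)"
        if p == -1:
            return f"Θ(n^{k} * loglog n)"
        return f"Θ(n^{k})"
    # a < b^k
    if p >= 0:
        return f"Θ(n^{k} * log^{p} n)" if p > 0 else f"Θ(n^{k})"
    return f"O(n^{k})"

print(master_theorem(2, 2, 1, 0))   # merge sort T(n)=2T(n/2)+Θ(n): Θ(n^1 * log^1 n)
print(master_theorem(1, 2, 0, 0))   # binary search T(n)=T(n/2)+Θ(1): Θ(n^0 * log^1 n)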
23. Master Theorem for Subtract and Conquer Recurrences
Let T(n) be a function defined on positive n, having the property:
T(n) = c,                    if n ≤ 1
T(n) = a·T(n − b) + f(n),    if n > 1
for some constants c, a > 0, b > 0, k ≥ 0, and a function f(n). If f(n) is in O(n^k), then:
T(n) = O(n^k)              if a < 1
T(n) = O(n^(k+1))          if a = 1
T(n) = O(n^k · a^(n/b))    if a > 1
For instance, T(n) = T(n − 1) + n has a = 1, b = 1 and f(n) = n ∈ O(n^1), so T(n) = O(n^2).
24. Variant of Subtraction and Conquer Master Theorem
The solution to the equation T(n) = T(α n) + T((1 – α)n) + βn, where 0 < α < 1 and β > 0 are constants, is O(nlogn).
25. Method of Guessing and Confirming
The main idea here is to guess the form of the answer and then prove the guess correct by induction; this approach is sometimes called the substitution method.