==== DRAFT FOR DISCUSSION IN CS252, 28 Sep 2015 ==== Unweighted Interval Scheduling: We are given a set of activities A = {a_1,...,a_n}, where each activity a_j = (s_j, f_j) consists of a start time s_j and a finish time f_j, with s_j < f_j. Two activities are said to be "compatible" if their time intervals do not overlap (except possibly at their endpoints). That is, a_i and a_j are compatible if s_i >= f_j or s_j >= f_i. A set of activities is said to be "compatible" if all pairs of activities from the set are compatible. Note that the empty set and one-element sets of activities are compatible since they have no pairs to test. A compatible subset B of A is said to be maximal if no other compatible subset of A has a greater size than B. (There may, however, be many maximal compatible subsets, which thus will all be of the same size.) Our goal for the Unweighted Interval Scheduling problem is to describe an efficient algorithm for computing a maximal compatible subset of a given set of activities. Our strategy will be: (1) Sort the set in increasing order by finishing times. Thus, rather than a set, we will assume we start with a sequence of activities A = (a_1,...,a_n) where (i <= j) => (f_i <= f_j). (2) Use a greedy approach, adding activities to our subsequence in increasing order of finish time, as shown in the algorithm below. ==== The algorithm ==== // Preconditions: (1) Each activity a_k = (s_k, f_k), where // s_k is the activity's start time and f_k is its finish time. // (2) The a_k's are sorted in increasing order of f_k. 1 MAX_SIZE_COMPATIBLE_SCHEDULE(A = [a_1,...,a_n]): 2 B = [] 3 i = 1 4 while i <= n: 5 B.append(a_i) 6 j = i + 1 7 while j <= n and s_j < f_i: 8 j++ 9 i = j 10 11 return B ==== Correctness proof ==== Claim: MAX_SIZE_COMPATIBLE_SCHEDULE returns a compatible subsequence of A that has maximal length among all compatible subsequences of A. Proof: First, note that the loop at line 4 and the loop at line 7 both terminate, because their counting indexes i and j each advance by at least 1 per iteration, and are bounded above by n. Thus, our algorithm terminates for any finite sequence A. Let B = [b_1,...,b_k] be the sequence returned by MAX_SIZE_COMPATIBLE_SCHEDULE. Second, note that the claim is trivial if n = 0 (in which case B is the empty sequence, which is maximal) or n = 1 (in which case B = A, which is obviously maximal). Thus, throughout this proof, we will assume that n >= 2. To complete our proof, we need to show: (a) B is a subsequence of A (b) B is a compatible sequence (c) Any compatible subsequence C has length <= k (a) B starts out empty (line 2), and is modified only in line 5, where an element of A is appended to it. Thus, B is a subsequence of A. (b) The first time line 5 is executed, B's length goes from 0 to 1, and thus there are no compatibility concerns. As we enter line 9, either j > n (in which case loop 4 will terminate and B will receive no new activities) or a_j is compatible with the most recently added activity a_i because s_j >= f_i. Furthermore, since the activities in A are sorted in increasing order of finishing time, all activities a_h in schedule have f_h <= f_i <= s_j. Thus, a_j is compatible with all the activities in B, so when a_j gets added to B at line 5, B remains compatible. Therefore, B remains compatible throughout the algorithm. This proves subclaim (b). (c) Let C = [c_1,...,c_m] be any compatible subsequence of A. Suppose that m > k. We will argue that this assumption leads to a contradiction. We will use the notation sc_i to mean "starting time for c_i," and similarly for fc_i, sb_i, fb_i, etc. First, observe from a trace of lines 2-5 that b_1 = a_1. Therefore, since c_1 = a_i for some value of i, fb_1 <= fc_1. That is, b_1 finishes earlier than or at the same time as c_1. Now consider c_2 and b_2. We see that b_2 gets added to B at line 5, after the loop at lines 7-8 executes with i = 1. During that loop, j skips over zero or more elements of A, say a_2,...,a_h, because they all have starting times < fa_1. But since fa_1 = fb_1 <= f_c1, that means a_2,...,a_h are also incompatible with c_1. Therefore, c_2 must be a_i with i > h. That is, c_2 is either equal to b_2 or c_2 comes from later in the a_i sequence than b_2 does. Therefore, fc_2 >= fb_2. Arguing similarly for c_3 and beyond, we can conclude that fb_i <= fc_i for all i <= k. Now, recall our assumption that m > k. Then c_(k+1) = a_h for some h, where a_h comes later in A than does b_k. Therefore: sa_h = sc_(k+1) >= fc_k >= fb_k But if sa_h >= fb_k and a_h is later in A than is b_k, then our final iteration of the loop at lines 7-8 would have stopped with j = h, and thus added a_h to B. This is a contradiction. Therefore, our assumption that m > k must be false. So we can conclude that any compatible subsequence C of A must have length <= the length of B. Therefore, B is of maximal length among compatible subsequences of A. Little white square, or FIN, or QED, or Booyah, or whatever you prefer to use to terminate your proofs. ==== Running time analysis ==== The outer loop's commands execute only once per execution of the inner loop, which in turn consumes at least one index value (from 1 through n) per execution. Thus, we can evaluate the running time of MAX_SIZE_COMPATIBLE_SCHEDULE by counting the number of iterations of the inner loop. The inner loop at lines 7-8 iterates exactly once for each value of j from 1 through n. Therefore, MAX_SIZE_COMPATIBLE_SCHEDULE takes \Theta(n) time. If you include the sorting as part of the algorithm, then the whole solution requires \Theta(n log n) time.