CPSC 421/501: Skeletal notes of what will be covered in class. THIS PLAN IS TENTATIVE: MATERIAL WILL BE ADDED AND MODIFIED.
--------------------------------------------------------
September 3-19 or so: Paradoxes, Counting, and the Acceptance/Halting Problem.

The main point of this article is to discuss two subtle ideas in this course, used to prove that certain problems "cannot be solved": namely counting arguments, and specific problems which--in a sense--involve self-reference.

Remarks on the handout "Uncomputability and Self-Referencing in CPSC 421":
* Sections 1 and 2 describe some very general "paradoxes," and not all "paradoxes" listed there are true paradoxes.
* Section 3 is very easy (regarding finite sets), but is really a warm-up for Section 4 (regarding countable and uncountable infinite sets).
* The rest of the article deals with the Halting Problem and uncomputability in a number of contexts. Most first courses in computation theory (in my humble opinion) do this too quickly, so that the fundamental ideas are not well understood. In particular, the Halting Problem can be stated for C programs, Java programs, Turing Machines, RAMs (Random-Access Machines), etc.

List paradoxes in handout. Give an example of making things precise. We typically use languages--subsets of the strings/words over an alphabet--to describe any yes/no, 0/1, true/false question.
* Introduce Sigma, Sigma^*, the empty string, etc.
* Give examples of how to make "paradoxes" precise, when they lead to a real paradox, and when they don't.
* Embark on making some of them precise.
* Make the Halting Problem precise.

General Counting:
* The rational numbers are countable, the real numbers uncountable (see Sipser 4.2).
* The set of strings over a finite alphabet is countable.
* The set of languages over a non-empty alphabet is uncountable.
* The set of subsets of a countable set is not countable. Generally, the set of subsets of a set, S, cannot be put into 1-1 correspondence with S.
Proof: if f : S -> P(S), let T = { s in S s.t. f(s) does not contain s }. Contradiction (similar to Russell's Paradox and to diagonalization).
----------------------------------------------------------------
Specifics:

September 3: Some paradoxes (they are all self-referential):
(2) This statement is a lie.
(3) This sentence refers to the smallest positive integer not defined by a sentence of thirty words or fewer.
(6) Let S = "the set of all sets that don't contain themselves". Does S contain itself?
(7) "This is a C program that takes a string, s, as input and (i) loops infinitely if s is a valid C program that halts on input s, and (ii) halts otherwise." How does this program behave on its own input?

Sometimes paradoxes give theorems, sometimes they are resolved by telling you what you can't allow yourself to do, etc.
-------------
Self-referencing occurs--in a sense--in "diagonalization arguments." Define two types of infinities: "countably infinite" and "uncountably infinite".

Def: A set, S, is **countable** if there is a sequence of elements of S, s1, s2, s3, ..., such that each element of S appears in this sequence at least once.

Remark 1: Any finite set is countable: e.g., if S = {1,3,4}, then the sequence 1,3,4,4,4,4,... is a sequence in S that contains each element of S.

Remark 2: The rational numbers are countable: consider the following lists:
List 1: 0
List 2: 1/1, -1/1
List 3: 2/1, -2/1, 1/2, -1/2
List 4: 3/1, -3/1, 2/2, -2/2, 1/3, -1/3
etc. The lists in order give a sequence: 0, 1/1, -1/1, 2/1, -2/1, 1/2, -1/2, ... such that any rational number occurs at least once in this sequence.

Theorem: The real numbers are uncountable (i.e., not countable).

Two proofs (very similar). The first proof is like that in Sipser 4.2.

First proof: If r1, r2, r3, ... is any sequence of reals, let r be a real number whose i-th digit to the right of the decimal point disagrees with that of ri (and doesn't equal 9 or 0, to avoid problems such as 3.2699999... = 3.270000...). If you draw a picture, you read off the digits of r from the "diagonal".

Second proof: This requires an important lemma.

Lemma: Let S be a set. Then Power(S), the set of all subsets of S, is "larger" than S, i.e., there is no map f : S -> Power(S) such that the image of f is all of Power(S).

Proof: Assume (for the sake of contradiction) that such an f exists. Let T = { s in S such that s is not in f(s) }. Can there exist a t in S such that f(t) = T? No: we get a contradiction.

You can actually see this proof in action. For example, let S = {1,2,3}, and let f : S -> Power(S) be given by f(1) = {1,3}, f(2) = {1,2,3}, f(3) = {1}. Then T = {3}. But now we claim that for t = 1, 2, or 3, f(t) cannot equal T.

Second proof (continued): To each subset, S, of {1,2,3,...} assign r(S) = sum_{s in S} 1/10^s. Then r : Power({1,2,3,...}) -> reals is injective (i.e., two distinct subsets are taken to distinct real numbers). "Hence" (see class discussion) the real numbers are at least as "large in cardinality" as Power({1,2,3,...}); but this power set is uncountable. Hence the reals are uncountable. The term "Hence" is really the following lemma:

Lemma: Let A be uncountable, and let r : A -> B be an injective map (i.e., r takes distinct elements of A to distinct elements of B). Then B is uncountable.
-----------------------
Starting September 10: Discuss Godel's Incompleteness Theorems. They say that in any "sufficiently expressive system," either (1) some statement is both true and false (which usually implies that any statement is both true and false), (2) some statement is neither true nor false (which is bad news as well), or (3) some statement is true but not provably true. The only problem is that mathematicians might want to know if, for example, the Riemann Hypothesis is true but not provably true, but Godel's Incompleteness Theorem only gives a somewhat artificial theorem that is true but not provably true.

Discuss universal machines and 421Simple and 421TRAN, as in the handout.
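The diagonal set T in the lemma's proof can be watched in action on the finite example above. The following Python sketch (my own illustration; the names S, f, T mirror the notes) builds T = { s in S : s not in f(s) } for S = {1,2,3} and checks that no t has f(t) = T:

```python
# The finite example from the notes: S = {1,2,3},
# f(1) = {1,3}, f(2) = {1,2,3}, f(3) = {1}.
S = {1, 2, 3}
f = {1: {1, 3}, 2: {1, 2, 3}, 3: {1}}

# The "diagonal" set: s is in T exactly when s is NOT in f(s).
T = {s for s in S if s not in f[s]}
print(T)  # {3}

# No t in S can have f(t) == T: t in T <=> t not in f(t), a contradiction.
for t in S:
    assert f[t] != T
print("T is not in the image of f, as the lemma predicts")
```

The same two lines defining T work verbatim for any finite S and f, which is why the finite case is a faithful warm-up for the general (uncountable) argument.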
Universal machines are relatively easy to build in powerful languages. 421Simple is a simple language that resembles the way Turing machines work.

421Simple reserved words:
arrays: INPUT, WORKTAPE, OUTPUT
integers: AUG, DEC, RESET
flow: IF, THEN GOTO, GOTO, END, EOF, LET

421TRAN: This is 421Simple with various extras. There could be numerous dialects. The key is to see that even in very limited languages, such as 421Simple, one can build universal machines.

Begin giving axioms.
-----------------------
September 22, 2014: We have finished the axioms and the proof that A_P is recognizable but not decidable, and also that the complement of A_P is not even recognizable. We will leave the discussion of Halt_P to the exercises. We will start our coverage of Turing machines, covering Sections 3.1, 3.2, 3.3, 4.2, 7.1, 7.2, and 7.3 of the text; at this point we will see languages that are not decidable, or even not recognizable (Section 4.2), and by Section 7.3 we will be able to formally state the problem "P versus NP," and more generally describe TIME( f(n) ) and NTIME( f(n) ) for functions f defined on the integers.

September 24, 2014: Sipser distinguishes between: (1) a "high-level description" of a Turing machine (English prose ignoring how things are implemented), (2) an "implementation description," discussing what is written where and how the tape head moves, and (3) a "formal description," where you specify Q, Gamma, delta (writing out delta's values in a table or a diagram), etc.

Consider: (1) adding one to a base three integer (you can fairly easily give a formal description), (2) adding two base ten integers (an implementation description is not hard, a formal description is more difficult), (3) a universal Turing machine (a high-level description, perhaps with a detail or two, is most reasonable).

Variants: k-tape machine, RAM (random-access machine), PRAM (parallel RAM, to model parallel computation), Non-deterministic Turing machine.
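As an illustration of item (1), here is a minimal Python sketch of a formal description for adding one to a base-three integer, together with a tiny single-tape simulator. The state names ("right", "carry", "accept") and the dictionary encoding of delta are my own conventions for this sketch, not anything fixed by the course:

```python
# A minimal single-tape Turing machine simulator, plus the formal description
# (i.e., the full delta table) of a machine that adds one to a base-3 integer.
BLANK = "_"

# delta maps (state, symbol) -> (new_state, write_symbol, move), move in {"L","R"}.
delta = {
    # Phase 1: scan right to find the end of the input.
    ("right", "0"): ("right", "0", "R"),
    ("right", "1"): ("right", "1", "R"),
    ("right", "2"): ("right", "2", "R"),
    ("right", BLANK): ("carry", BLANK, "L"),
    # Phase 2: add 1 with carry, moving left.
    ("carry", "0"): ("accept", "1", "L"),
    ("carry", "1"): ("accept", "2", "L"),
    ("carry", "2"): ("carry", "0", "L"),     # 2 + 1 = 10 in base 3: write 0, carry left
    ("carry", BLANK): ("accept", "1", "L"),  # carry past the leftmost digit
}

def run(tape_str):
    tape = dict(enumerate(tape_str))  # cell index -> symbol
    state, head = "right", 0
    while state != "accept":
        sym = tape.get(head, BLANK)
        state, write, move = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, BLANK) for i in range(lo, hi + 1)).strip(BLANK)

print(run("122"))  # 200  (122 + 1 in base 3)
print(run("222"))  # 1000 (carry propagates past the leftmost digit)
```

Note that the entire machine is the eight-entry delta table; this is exactly why a formal description is feasible for this problem but hopeless for, say, a universal machine.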
Remark: PALINDROME requires time proportional to n^2 on a one-tape machine, but linear time on a 2-tape machine or a RAM.

After discussing P we jump briefly to:

Theorem 9.10: For any time constructible function t from the integers to the integers, there is a language decidable in time order t(n) but not decidable in time asymptotically smaller than t(n)/log(t(n)).
-------------------------------
October 6, 2014: We will prove a weaker form of the "Time Hierarchy Theorem."

Theorem: For any integer a >= 1 there is a language computable in time n^{2a+1} that is not computable in time order n^a.

The proof relies on two ideas:
(1) There exist universal Turing machines which, given a description <M,w> of a Turing machine M and input w with |w| = n, can simulate n^a steps of M in time at most C_M n^(2a), where C_M is a constant depending on M, and
(2) One can take a Turing machine, M, and pad it with extra states (and hence a larger delta function).

To make the proof simple, we assume:
(3) A Turing machine has state set Q = {1,2,...,q} for some integer q, where 1 is the initial state, 2 is the accepting state, and 3 is the rejecting state; it has input alphabet Sigma = {1,...,s} for some integer s, and worktape(s) alphabet Gamma = {0,1,...,g} for some integer g, where 0 is the blank symbol. We will describe our Turing machine by writing only the description of the delta function, delta(state,symbol), where we first list state 1 and symbol 1, then state 1 and symbol 2, etc. As such:
(3') If we "pad" a Turing machine with extra states, a simulation will never reach the delta values of these extra states.
An alternative to (3) and (3') is:
(3'') We can "pad" our Turing machine descriptions (in class we typically used the alphabet {0,1,...,9,#,L,R}) with special symbols {a,b,...,z} which are ignored; we are free to take any Turing machine and pad it with a string from {a,b,...,z} of arbitrary length, which is then ignored.
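The padding convention (3'') is easy to make concrete. In this Python sketch (the particular description string is a made-up placeholder, not a real machine encoding), letters from {a,...,z} are stripped before a description is interpreted, so each machine has descriptions of every sufficiently large length:

```python
# Sketch of convention (3''): descriptions over {0..9,#,L,R} may carry an
# arbitrary suffix from {a..z}, which is ignored when the machine is decoded.
IGNORED = set("abcdefghijklmnopqrstuvwxyz")

def canonical(description):
    """Strip the ignored padding symbols; what remains is the real description."""
    return "".join(c for c in description if c not in IGNORED)

d = "120#R31L"               # hypothetical 8-symbol machine description
padded = d + "x" * 1000      # the same machine, now with a description of length 1008
assert canonical(d) == canonical(padded)
print(len(padded), canonical(padded))  # 1008 120#R31L
```

This is the whole point of padding in the proof below: one fixed machine M' yields machines M'' with arbitrarily long descriptions but identical behavior.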
The original Time Hierarchy Theorem was stated as above, due to Hartmanis and Stearns (the journal publication appeared in 1965); later, a better universal Turing machine (Hennie and Stearns, journal publication in 1966) allowed one to improve the theorem to essentially the form in which it is stated these days.

Proof of the Time Hierarchy Theorem (our proof is slightly different from that in Sipser's textbook): Consider a Turing machine, M, which operates as follows: on an input, w, of size n, M first copies w to a special worktape to store w. Then M runs for n^(2a+1) additional steps, during which it simulates w, viewed as a Turing machine description, on input w. (If w is not the description of a Turing machine, then it doesn't matter what we do, as long as M stops within n^(2a+1) steps.)

We claim that if M' is a Turing machine that recognizes a language L in time order n^a, then M and M' will disagree on some input, w: indeed, we can pad M' to an arbitrarily long machine M'' (we are careful to do this by appending a long, irrelevant suffix to M') such that simulating M'' on any input, w, for |w|^(a+0.4) steps takes time at most C = C_{M'} times |w|^(2a+0.8). Since M runs its simulation for |w|^(2a+1) steps, if |w| is sufficiently large then M will determine whether M'' on input w has reached the accepting or rejecting state. Since M simulates the description of M'' on input w, it follows that if M'' is sufficiently padded, then M will determine what M'' (and therefore M') does on input w, and then do the opposite. Hence if M' is a Turing machine that recognizes a language L in time order n^a, then M and M' will disagree on some input, w. Hence M recognizes a language that is different from that recognized by M'. Hence the language recognized by M, which takes time order n^{2a+1}, cannot be recognized by a Turing machine running in time order n^a.
(End of proof)

The improvement of Hennie and Stearns is due to a k-tape simulation of an arbitrary Turing machine, M, which on input <M,i> takes time C_M n^a log(n^a) steps (for any real a) to simulate n^a steps of M on input i, with |i| = n. Hence one can say that for any real a, TIME( n^a ) is properly contained in TIME( n^a (log n)^1.00001 ), for example. The general statement is that TIME( f(n) ) is properly contained in TIME( g(n) ), provided that f(n) log(f(n)) = o(g(n)), and that f(n) or g(n) can be computed in time order g(n) (if f(n) and g(n) are very difficult to compute, or even to reasonably bound, then it may be hard to know when to stop your simulation...).
------------------------
October 8 or 10, 2014: We now know that the class P is fairly rich in structure. We turn our attention to the class NP and the Cook-Levin theorem (Theorem 7.37 in Sipser's textbook):

Theorem: If L is a language in NP, then there is a polynomial time reduction of L to SAT.

From this it follows that SAT is an example of a "hardest problem in NP," which we call "NP-complete" (see Sipser's textbook and class for details). It then follows that a slew of problems, including 3SAT, 3COLOR, PARTITION, SUBSET-SUM, etc., are also NP-complete.

Proof of Cook-Levin: Given a non-deterministic Turing machine, M, and an input, w, to M, we wish to simulate M on w for m = c n^c steps (we may assume |w| >= 1) for any given positive integer, c.

Idea: Represent all configurations of M with Boolean variables which tell us the configuration. We assume that the non-deterministic Turing machine has only one tape. We identify Q with {1,2,...,q} and Gamma with {1,2,...,gamma}, and set:
x_{i,j,s} = "At time i, cell j of the tape has the value that is the s-th element of Gamma,"
y_{i,j} = "At time i, the tape head is on the j-th cell," and
z_{i,t} = "At time i, we are in the t-th state of Q."
(Sipser does things slightly differently, combining Q and Gamma, which (to me) seems to complicate matters...)
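To get a feel for the size of this construction, the following Python sketch numbers CNF variables for the family x_{i,j,s} and emits one sample group of clauses ("each tape cell holds exactly one symbol at each time"). The small parameter values and the clause group shown are illustrative assumptions; the full formula also needs clauses for y and z, the start and accept conditions, and the transition function:

```python
# Sketch of the variable bookkeeping in the Cook-Levin construction,
# for the x_{i,j,s} family only ("cell j holds symbol s at time i").
from itertools import combinations

m = 4        # number of simulated steps (in general m = c * n^c)
cells = 4    # tape cells in play (at most m + 1 on a one-tape machine)
gamma = 3    # |Gamma|, tape symbols numbered 1..gamma

var_id = {}
def var(*key):
    """Map a symbolic variable like ('x', i, j, s) to a CNF variable number."""
    return var_id.setdefault(key, len(var_id) + 1)

clauses = []  # each clause is a list of signed variable numbers
for i in range(m + 1):
    for j in range(cells):
        # Cell (i, j) holds at least one symbol of Gamma...
        clauses.append([var("x", i, j, s) for s in range(1, gamma + 1)])
        # ...and at most one: for each pair of symbols, not both.
        for s, t in combinations(range(1, gamma + 1), 2):
            clauses.append([-var("x", i, j, s), -var("x", i, j, t)])

# Both counts are polynomial in m, hence polynomial in |w|.
print(len(var_id), len(clauses))  # 60 80
```

The only point that matters for the theorem is visible in the final print: the number of variables and clauses grows polynomially in m, so the whole formula can be written down in time polynomial in |w|.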
We now want to write a Boolean formula (in time polynomial in |w|) that is satisfiable if and only if the non-deterministic Turing machine has an accepting computation that terminates within m = c n^c steps. This is tedious but not difficult. We will give some details in class. (Also see the textbook.)

Theorem (slightly improved, Corollary 7.42 in the textbook): 3SAT is NP-complete, i.e., any problem in NP can be polynomial-time reduced to 3SAT.
--------------------------
Using SAT and 3SAT (most often 3SAT), we show that SUBSET-SUM, 3COLOR, CLIQUE, etc. are NP-complete.
--------------------------
November 3: Starting the Baker-Gill-Solovay Theorem, and a discussion of SPACE.

The B-G-S theorem says: (1) There is an oracle A such that P^A = NP^A, and (2) there is an oracle B such that P^B is not equal to NP^B.

Our approach: part (2) is fairly tricky and far less interesting than (1). So we will discuss part (1) fully, and perhaps prove (2) if time and energy permit...

Remark 1: (1) is far more interesting than (2). Reason: most people believe that P is not NP; most people believe that if P = NP, then it is a matter of coming up with an algorithm for SAT, 3COLOR, etc. and showing that it runs in polynomial time (this is, of course, not necessarily the case, but this is how most languages are shown to be in P...). If you believe that P is not NP, you would like to give a proof. Most of our theorems and techniques about P, NP, TIME(g(n)), NTIME(g(n)) generalize immediately to oracle Turing machines. So if you have a proof that P is not NP, it had better not hold true of oracle Turing machines (recall the "blah blah blah Turing machine blah blah" principle discussed in class). The fact that P^A = NP^A for some oracle A (in fact, for A any PSPACE-complete problem) shows that you must be doing something that does not generalize to Turing machines with such an oracle.
Remark 2: (2) is only interesting if you think that P = NP and that you are going to prove this using the aforementioned ideas that generalize to Turing machines with oracles (simulation, diagonalization, etc.). Few people believe that P = NP, and those who do don't think you will prove it by such methods.

Remark 3: Showing (1) will involve understanding some principles of SPACE-bounded computations, which are very different from TIME-bounded computations.

Remark 4: If A is any PSPACE-complete problem, then NP^A is contained in NPSPACE (almost immediate), NPSPACE is contained in PSPACE (Savitch's theorem), PSPACE is contained in P^A (immediate), and P^A is contained in NP^A (immediate). Hence P^A = NP^A = PSPACE ( = NPSPACE ).

Remark 5: One can usually find languages that are complete for a complexity class for almost trivial reasons.

Example 1--a language that is NP-complete for trivial reasons: Let L = { <M,i,t> where M is a non-deterministic Turing machine, i is an input to M, and t is an integer given in unary (!!), such that M accepts i after some computation branch that halts within t steps }. (It is crucial that t is given in unary!)
Claim: L is NP-complete, and the proof of this is almost trivial(!).
Proof: Discussed in class, but it is very easy.

Example 2--a language that is PSPACE-complete for trivial reasons: Let L = { <M,i,s> where M is a non-deterministic Turing machine, i is an input to M, and s is an integer given in unary (!!), such that M accepts i after some computation branch that halts using at most s space }. (It is crucial that s is given in unary!)
Claim: L is PSPACE-complete, and the proof of this is almost trivial(!).
Proof: Discussed in class, but it is very easy.

Remark 6: Let A be the language in Example 2 above. Then P^A = NP^A; indeed, this follows from the above remarks.

November 14: Discuss circuits and formulas. Formulas are circuits whose DAG (directed acyclic graph) structure is that of a tree.
Remark: Using the gates AND, OR, and negation, the XOR (or parity) of n Boolean variables can be computed by a circuit of size 4n-3. However, a formula for XOR is known to require size of order n^2 (Khrapchenko, 1971). (Of course, if we allow an XOR gate, we see that XOR has a formula of size n-1 (or n, depending on how one counts).)

Remark: Since we have XOR(x_1,x_2) = ( x_1 AND (NOT x_2) ) OR ( (NOT x_1) AND x_2 ), we see that if f(n) is the formula size required for the XOR of n variables, then f(n) <= 4 f(n/2) (perhaps plus a constant, depending on how one is counting), which gives f(n) <= order(n^2). To prove that a formula for the XOR of n variables requires order(n^2) size is much more difficult.

Remark: Min circuit depth = min formula depth, which is proportional to the log of min formula size. Min circuit size is probably more difficult to compute.

Remark: Most Boolean functions require a circuit of size of order 2^n / n (and size 2^n n is clearly sufficient). We can give a proof by counting (very crudely).
Proof: If we have inputs x_1,...,x_n and "interior vertices" y_1,...,y_m, then each y_i can be chosen as (1) the AND of two previous variables, (2) the OR of two previous variables, or (3) the negation of a previous variable, for at most 2 ( (n+m) choose 2 ) + (n+m) - 1 possibilities. Set r = n+m. Then the number of circuits of size r is at most (2r^2)^r (very crudely). Since there are 2^(2^n) Boolean functions, the majority of Boolean functions require more than r gates whenever (2r^2)^r < 2^(2^n)/2. Taking logs, this requires r order(log r) < 2^n, which is satisfied for r as large as order(2^n/n).
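The counting inequality at the end can be checked numerically. In this Python sketch (the cutoff 2^n/(3n) is my own crude stand-in for "order 2^n/n"), circuits of size of order 2^n/n fall short of covering half of all Boolean functions, while size 2^n comfortably suffices by the same crude count:

```python
def covers_half(n, r):
    """True iff (2 r^2)^r >= 2^(2^n)/2, i.e. the crude count of circuits
    with r gates could account for at least half of all Boolean functions
    on n variables."""
    return (2 * r * r) ** r >= 2 ** (2 ** n - 1)

for n in range(6, 11):
    small = 2 ** n // (3 * n)   # r of order 2^n / n: too few circuits
    big = 2 ** n                # r = 2^n: more than enough by this count
    print(n, covers_half(n, small), covers_half(n, big))
```

Python's arbitrary-precision integers make the comparison exact even for 2^(2^10 - 1); each printed row shows False for the small size and True for the large one, matching the conclusion that most functions need circuits of size of order 2^n/n.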