CPSC 421/501: Skeletal notes of what will be covered in class. THIS PLAN IS TENTATIVE: MATERIAL WILL BE ADDED AND MODIFIED.
--------------------------------------------------------
September 3-19 or so: Paradoxes, Counting, and the Acceptance/Halting Problem.

The main point of this article is to discuss two subtle ideas in this course, used to prove that certain problems "cannot be solved": namely counting arguments, and specific problems which--in a sense--involve self-reference.

Remarks on the handout "Uncomputability and Self-Referencing in CPSC 421":
* Sections 1 and 2 describe some very general "paradoxes," and not all "paradoxes" listed there are true paradoxes.
* Section 3 is very easy (regarding finite sets), but is really a warm-up for Section 4 (regarding countable and uncountable infinite sets).
* The rest of the article deals with the Halting Problem and uncomputability in a number of contexts. Most first courses in computation theory (in my humble opinion) do this too quickly, so that the fundamental ideas are not well understood. In particular, the Halting Problem can be stated for C programs, Java programs, Turing Machines, RAMs (Random-Access Machines), etc.

List paradoxes in handout. Give an example of making things precise. We typically use languages--subsets of the strings/words over an alphabet--to describe any yes/no, 0/1, true/false question.
* Introduce Sigma, Sigma^*, the empty string, etc.
* Give examples of how to make "paradoxes" precise, when they lead to a real paradox, and when they don't.
* Embark on making some of them precise.
* Make the Halting Problem precise.

General Counting:
* The rational numbers are countable, the real numbers uncountable (see Sipser 4.2).
* The set of strings over a finite alphabet is countable.
* The set of languages over a non-empty alphabet is uncountable.
* The set of subsets of a countable set is not countable. Generally, the set of subsets of a set, S, cannot be put into 1-1 correspondence with S.
Proof: if f : S -> P(S), let T = { s in S s.t. f(s) does not contain s }. Contradiction (similar to Russell's Paradox and to diagonalization).
----------------------------------------------------------------
Specifics:

September 3: Some paradoxes (they are all self-referential):
(2) This statement is a lie.
(3) This sentence refers to the smallest positive integer not defined by a sentence of thirty words or fewer.
(6) Let S = "the set of all sets that don't contain themselves". Does S contain itself?
(7) "This is a C program that takes a string, s, as input and (i) loops infinitely if s is a valid C program that halts on input s, and (ii) halts otherwise." How does this program behave on its own input?

Sometimes paradoxes give theorems, sometimes they are resolved by telling you what you can't allow yourself to do, etc.
-------------
Self-referencing occurs--in a sense--in "diagonalization arguments." Define two types of infinities: "countably infinite" and "uncountably infinite".

Def: A set, S, is **countable** if there is a sequence of elements of S, s1, s2, s3, ..., such that each element of S appears in this sequence at least once.

Remark 1: Any finite set is countable: e.g., if S = {1,3,4}, then the sequence 1,3,4,4,4,4,... is a sequence in S that contains each element of S.

Remark 2: The rational numbers are countable: consider the following lists:
List 1: 0
List 2: 1/1, -1/1
List 3: 2/1, -2/1, 1/2, -1/2
List 4: 3/1, -3/1, 2/2, -2/2, 1/3, -1/3
etc. The lists in order give a sequence: 0, 1/1, -1/1, 2/1, -2/1, 1/2, -1/2, ... such that any rational number occurs at least once in this sequence.

Theorem: The real numbers are uncountable (i.e., not countable).

Two proofs (very similar). The first proof is like that in Sipser 4.2.

First proof: If r1, r2, r3, ... is any sequence of reals, let r be a real number whose i-th digit to the right of the decimal point disagrees with that of ri (and doesn't equal 9 or 0, to avoid problems such as 3.2699999... = 3.270000...). If you draw a picture, you read off the digits of r from the "diagonal".

Second proof: This requires an important lemma.

Lemma: Let S be a set. Then Power(S), the set of all subsets of S, is "larger" than S, i.e., there is no map f : S -> Power(S) such that the image of f is all of Power(S).

Proof: Assume (for the sake of contradiction) that such an f exists. Let T = { s in S such that s is not in f(s) }. Can there exist a t in S such that f(t) = T? No: we get a contradiction.

You can actually see this proof in action. For example, let S = {1,2,3}, and let f : S -> Power(S) be given by f(1) = {1,3}, f(2) = {1,2,3}, f(3) = {1}. Then T = {3}. But now we claim that for t = 1, 2, or 3, f(t) cannot equal T.

Second proof (continued): To each subset, S, of {1,2,3,...} assign r(S) = sum_{s in S} 1/10^s. Then r : Power({1,2,3,...}) -> reals is injective (i.e., two distinct subsets are taken to distinct real numbers). "Hence" (see class discussion) the real numbers are at least as "large in cardinality" as Power({1,2,3,...}); but this power set is uncountable. Hence the reals are uncountable. The term "Hence" is really the following lemma:

Lemma: Let A be uncountable, and let r : A -> B be an injective map (i.e., r takes distinct elements of A to distinct elements of B). Then B is uncountable.
-----------------------
Starting September 10: Discuss Godel's Incompleteness Theorems. They say that in any "sufficiently expressive system," either (1) some statement is both true and false (which usually implies that any statement is both true and false), (2) some statement is neither true nor false (which is bad news as well), or (3) some statement is true but not provably true. The only problem is that mathematicians might want to know if, for example, the Riemann Hypothesis is true but not provably true, but Godel's Incompleteness Theorem only gives a somewhat artificial theorem that is true but not provably true.

Discuss universal machines and 421Simple and 421TRAN, as in the handout.
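The diagonal set T in the lemma's proof can be watched in action on the finite example above. The following Python sketch (my own illustration; the names S, f, T mirror the notes) builds T = { s in S : s not in f(s) } for S = {1,2,3} and checks that no t has f(t) = T:

```python
# The finite example from the notes: S = {1,2,3},
# f(1) = {1,3}, f(2) = {1,2,3}, f(3) = {1}.
S = {1, 2, 3}
f = {1: {1, 3}, 2: {1, 2, 3}, 3: {1}}

# The "diagonal" set: s is in T exactly when s is NOT in f(s).
T = {s for s in S if s not in f[s]}
print(T)  # {3}

# No t in S can have f(t) == T: t in T <=> t not in f(t), a contradiction.
for t in S:
    assert f[t] != T
print("T is not in the image of f, as the lemma predicts")
```

The same two lines defining T work verbatim for any finite S and f, which is why the finite case is a faithful warm-up for the general (uncountable) argument.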
Universal machines are relatively easy to build in powerful languages. 421Simple is a simple language that resembles the way Turing machines work.

421Simple reserved words:
arrays: INPUT, WORKTAPE, OUTPUT
integers: AUG, DEC, RESET
flow: IF, THEN GOTO, GOTO, END, EOF, LET

421TRAN: This is 421Simple with various extras. There could be numerous dialects. The key is to see that even in very limited languages, such as 421Simple, one can build universal machines.

Begin giving axioms.
-----------------------
September 22, 2014: We have finished the axioms and the proof that A_P is recognizable but not decidable, and also that the complement of A_P is not even recognizable. We will leave the discussion of Halt_P to the exercises. We will start our coverage of Turing machines, covering Sections 3.1, 3.2, 3.3, 4.2, 7.1, 7.2, and 7.3 of the text; at this point we will see languages that are not decidable, or even not recognizable (Section 4.2), and by Section 7.3 we will be able to formally state the problem "P versus NP," and more generally describe TIME( f(n) ) and NTIME( f(n) ) for functions f defined on the integers.

September 24, 2014: Sipser distinguishes between: (1) a "high-level description" of a Turing machine (English prose ignoring how things are implemented), (2) an "implementation description," discussing what is written where and how the tape head moves, and (3) a "formal description," where you specify Q, Gamma, delta (writing out delta's values in a table or a diagram), etc.

Consider: (1) adding one to a base three integer (you can fairly easily give a formal description), (2) adding two base ten integers (an implementation description is not hard, a formal description is more difficult), (3) a universal Turing machine (a high-level description, perhaps with a detail or two, is most reasonable).

Variants: k-tape machine, RAM (random-access machine), PRAM (parallel RAM, to model parallel computation), Non-deterministic Turing machine.
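As an illustration of item (1), here is a minimal Python sketch of a formal description for adding one to a base-three integer, together with a tiny single-tape simulator. The state names ("right", "carry", "accept") and the dictionary encoding of delta are my own conventions for this sketch, not anything fixed by the course:

```python
# A minimal single-tape Turing machine simulator, plus the formal description
# (i.e., the full delta table) of a machine that adds one to a base-3 integer.
BLANK = "_"

# delta maps (state, symbol) -> (new_state, write_symbol, move), move in {"L","R"}.
delta = {
    # Phase 1: scan right to find the end of the input.
    ("right", "0"): ("right", "0", "R"),
    ("right", "1"): ("right", "1", "R"),
    ("right", "2"): ("right", "2", "R"),
    ("right", BLANK): ("carry", BLANK, "L"),
    # Phase 2: add 1 with carry, moving left.
    ("carry", "0"): ("accept", "1", "L"),
    ("carry", "1"): ("accept", "2", "L"),
    ("carry", "2"): ("carry", "0", "L"),     # 2 + 1 = 10 in base 3: write 0, carry left
    ("carry", BLANK): ("accept", "1", "L"),  # carry past the leftmost digit
}

def run(tape_str):
    tape = dict(enumerate(tape_str))  # cell index -> symbol
    state, head = "right", 0
    while state != "accept":
        sym = tape.get(head, BLANK)
        state, write, move = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, BLANK) for i in range(lo, hi + 1)).strip(BLANK)

print(run("122"))  # 200  (122 + 1 in base 3)
print(run("222"))  # 1000 (carry propagates past the leftmost digit)
```

Note that the entire machine is the eight-entry delta table; this is exactly why a formal description is feasible for this problem but hopeless for, say, a universal machine.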
Remark: PALINDROME requires time proportional to n^2 on a one-tape machine, but linear time on a 2-tape machine or a RAM.

After discussing P we jump briefly to:

Theorem 9.10: For any time constructible function t from the integers to the integers, there is a language decidable in time order t(n) but not decidable in time asymptotically smaller than t(n)/log(t(n)).
-------------------------------
October 6, 2014: We will prove a weaker form of the "Time Hierarchy Theorem."

Theorem: For any integer a >= 1 there is a language computable in time n^{2a+1} that is not computable in time order n^a.

The proof relies on two ideas:
(1) There exist universal Turing machines which, given a description <M,w> of a Turing machine M and input w with |w| = n, can simulate n^a steps of M in time at most C_M n^(2a), where C_M is a constant depending on M, and
(2) One can take a Turing machine, M, and pad it with extra states (and hence a larger delta function).

To make the proof simple, we assume:
(3) A Turing machine has state set Q = {1,2,...,q} for some integer q, where 1 is the initial state, 2 is the accepting state, and 3 is the rejecting state; it has input alphabet Sigma = {1,...,s} for some integer s, and worktape(s) alphabet Gamma = {0,1,...,g} for some integer g, where 0 is the blank symbol. We will describe our Turing machine by writing only the description of the delta function, delta(state,symbol), where we first list state 1 and symbol 1, then state 1 and symbol 2, etc. As such:
(3') If we "pad" a Turing machine with extra states, a simulation will never reach the delta values of these extra states.
An alternative to (3) and (3') is:
(3'') We can "pad" our Turing machine descriptions (in class we typically used the alphabet {0,1,...,9,#,L,R}) with special symbols {a,b,...,z} which are ignored; we are free to take any Turing machine and pad it with a string from {a,b,...,z} of arbitrary length, which is then ignored.
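The padding convention (3'') is easy to make concrete. In this Python sketch (the particular description string is a made-up placeholder, not a real machine encoding), letters from {a,...,z} are stripped before a description is interpreted, so each machine has descriptions of every sufficiently large length:

```python
# Sketch of convention (3''): descriptions over {0..9,#,L,R} may carry an
# arbitrary suffix from {a..z}, which is ignored when the machine is decoded.
IGNORED = set("abcdefghijklmnopqrstuvwxyz")

def canonical(description):
    """Strip the ignored padding symbols; what remains is the real description."""
    return "".join(c for c in description if c not in IGNORED)

d = "120#R31L"               # hypothetical 8-symbol machine description
padded = d + "x" * 1000      # the same machine, now with a description of length 1008
assert canonical(d) == canonical(padded)
print(len(padded), canonical(padded))  # 1008 120#R31L
```

This is the whole point of padding in the proof below: one fixed machine M' yields machines M'' with arbitrarily long descriptions but identical behavior.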
The original Time Hierarchy Theorem was stated as above, due to Hartmanis and Stearns (the journal publication appeared in 1965); later, a better universal Turing machine (Hennie and Stearns, journal publication in 1966) allowed one to improve the theorem to essentially the form in which it is stated these days.

Proof of the Time Hierarchy Theorem (our proof is slightly different from that in Sipser's textbook): Consider a Turing machine, M, which operates as follows: on an input, w, of size n, M first copies w to a special worktape to store w. Then M runs for n^(2a+1) additional steps, during which it simulates w, viewed as a Turing machine description, on input w. (If w is not the description of a Turing machine, then it doesn't matter what we do, as long as M stops within n^(2a+1) steps.)

We claim that if M' is a Turing machine that recognizes a language L in time order n^a, then M and M' will disagree on some input, w: indeed, we can pad M' to an arbitrarily long machine M'' (we are careful to do this by appending a long, irrelevant suffix to M') such that simulating M'' on any input, w, for |w|^(a+0.4) steps takes time at most C = C_{M'} times |w|^(2a+0.8). Since M runs its simulation for |w|^(2a+1) steps, if |w| is sufficiently large then M will determine whether M'' on input w has reached the accepting or rejecting state. Since M simulates the description of M'' on input w, it follows that if M'' is sufficiently padded, then M will determine what M'' (and therefore M') does on input w, and then do the opposite. Hence if M' is a Turing machine that recognizes a language L in time order n^a, then M and M' will disagree on some input, w. Hence M recognizes a language that is different from that recognized by M'. Hence the language recognized by M, which takes time order n^{2a+1}, cannot be recognized by a Turing machine running in time order n^a.
(End of proof)

The improvement of Hennie and Stearns is due to a k-tape simulation of an arbitrary Turing machine, M, which on input <M,i> takes time C_M n^a log(n^a) steps (for any real a) to simulate n^a steps of M on input i, with |i| = n. Hence one can say that for any real a, TIME( n^a ) is properly contained in TIME( n^a (log n)^1.00001 ), for example. The general statement is that TIME( f(n) ) is properly contained in TIME( g(n) ), provided that f(n) log(f(n)) = o(g(n)), and that f(n) or g(n) can be computed in time order g(n) (if f(n) and g(n) are very difficult to compute, or even to reasonably bound, then it may be hard to know when to stop your simulation...).
------------------------
October 8 or 10, 2014: We now know that the class P is fairly rich in structure. We turn our attention to the class NP and the Cook-Levin theorem (Theorem 7.37 in Sipser's textbook):

Theorem: If L is a language in NP, then there is a polynomial time reduction of L to SAT.

From this it follows that SAT is an example of a "hardest problem in NP," which we call "NP-complete" (see Sipser's textbook and class for details). It then follows that a slew of problems, including 3SAT, 3COLOR, PARTITION, SUBSET-SUM, etc., are also NP-complete.

Proof of Cook-Levin: Given a non-deterministic Turing machine, M, and an input, w, to M, we wish to simulate M on w for m = c n^c steps (we may assume |w| >= 1) for any given positive integer, c.

Idea: Represent all configurations of M with Boolean variables which tell us the configuration. We assume that the non-deterministic Turing machine has only one tape. We identify Q with {1,2,...,q} and Gamma with {1,2,...,gamma}, and set:
x_{i,j,s} = "At time i, cell j of the tape has the value that is the s-th element of Gamma,"
y_{i,j} = "At time i, the tape head is on the j-th cell," and
z_{i,t} = "At time i, we are in the t-th state of Q."
(Sipser does things slightly differently, combining Q and Gamma, which (to me) seems to complicate matters...)
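To get a feel for the size of this construction, the following Python sketch numbers CNF variables for the family x_{i,j,s} and emits one sample group of clauses ("each tape cell holds exactly one symbol at each time"). The small parameter values and the clause group shown are illustrative assumptions; the full formula also needs clauses for y and z, the start and accept conditions, and the transition function:

```python
# Sketch of the variable bookkeeping in the Cook-Levin construction,
# for the x_{i,j,s} family only ("cell j holds symbol s at time i").
from itertools import combinations

m = 4        # number of simulated steps (in general m = c * n^c)
cells = 4    # tape cells in play (at most m + 1 on a one-tape machine)
gamma = 3    # |Gamma|, tape symbols numbered 1..gamma

var_id = {}
def var(*key):
    """Map a symbolic variable like ('x', i, j, s) to a CNF variable number."""
    return var_id.setdefault(key, len(var_id) + 1)

clauses = []  # each clause is a list of signed variable numbers
for i in range(m + 1):
    for j in range(cells):
        # Cell (i, j) holds at least one symbol of Gamma...
        clauses.append([var("x", i, j, s) for s in range(1, gamma + 1)])
        # ...and at most one: for each pair of symbols, not both.
        for s, t in combinations(range(1, gamma + 1), 2):
            clauses.append([-var("x", i, j, s), -var("x", i, j, t)])

# Both counts are polynomial in m, hence polynomial in |w|.
print(len(var_id), len(clauses))  # 60 80
```

The only point that matters for the theorem is visible in the final print: the number of variables and clauses grows polynomially in m, so the whole formula can be written down in time polynomial in |w|.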
We now want to write a Boolean formula (in time polynomial in |w|) that is satisfiable if and only if the non-deterministic Turing machine has an accepting computation that terminates within m = c n^c steps. This is tedious but not difficult. We will give some details in class. (Also see the textbook.)

Theorem (slightly improved, Corollary 7.42 in the textbook): 3SAT is NP-complete, i.e., any problem in NP can be polynomial-time reduced to 3SAT.
--------------------------
Using SAT and 3SAT (most often 3SAT), we show that SUBSET-SUM, 3COLOR, CLIQUE, etc. are NP-complete.
--------------------------
November 3: Starting the Baker-Gill-Solovay Theorem, and a discussion of SPACE.

The B-G-S theorem says: (1) There is an oracle A such that P^A = NP^A, and (2) there is an oracle B such that P^B is not equal to NP^B.

Our approach: part (2) is fairly tricky and far less interesting than (1). So we will discuss part (1) fully, and perhaps prove (2) if time and energy permit...

Remark 1: (1) is far more interesting than (2). Reason: most people believe that P is not NP; most people believe that if P = NP, then it is a matter of coming up with an algorithm for SAT, 3COLOR, etc. and showing that it runs in polynomial time (this is, of course, not necessarily the case, but this is how most languages are shown to be in P...). If you believe that P is not NP, you would like to give a proof. Most of our theorems and techniques about P, NP, TIME(g(n)), NTIME(g(n)) generalize immediately to oracle Turing machines. So if you have a proof that P is not NP, it had better not hold true of oracle Turing machines (recall the "blah blah blah Turing machine blah blah" principle discussed in class). The fact that P^A = NP^A for some oracle A (in fact, for A any PSPACE-complete problem) shows that you must be doing something that does not generalize to Turing machines with such an oracle.
Remark 2: (2) is only interesting if you think that P = NP and that you are going to prove this using the aforementioned ideas that generalize to Turing machines with oracles (simulation, diagonalization, etc.). Few people believe that P = NP, and those who do don't think you will prove it by such methods.

Remark 3: Showing (1) will involve understanding some principles of SPACE-bounded computations, which are very different from TIME-bounded computations.

Remark 4: If A is any PSPACE-complete problem, then NP^A is contained in NPSPACE (almost immediate), NPSPACE is contained in PSPACE (Savitch's theorem), PSPACE is contained in P^A (immediate), and P^A is contained in NP^A (immediate). Hence P^A = NP^A = PSPACE ( = NPSPACE ).

Remark 5: One can usually find languages that are complete for a complexity class for almost trivial reasons.

Example 1--a language that is NP-complete for trivial reasons: Let L = { <M,i,t> where M is a non-deterministic Turing machine, i is an input to M, and t is an integer given in unary (!!), such that M accepts i after some computation branch that halts within t steps }. (It is crucial that t is given in unary!)
Claim: L is NP-complete, and the proof of this is almost trivial(!).
Proof: Discussed in class, but it is very easy.

Example 2--a language that is PSPACE-complete for trivial reasons: Let L = { <M,i,s> where M is a non-deterministic Turing machine, i is an input to M, and s is an integer given in unary (!!), such that M accepts i after some computation branch that halts using at most s space }. (It is crucial that s is given in unary!)
Claim: L is PSPACE-complete, and the proof of this is almost trivial(!).
Proof: Discussed in class, but it is very easy.

Remark 6: Let A be the language in Example 2 above. Then P^A = NP^A; indeed, this follows from the above remarks.

November 14: Discuss circuits and formulas. Formulas are circuits whose DAG (directed acyclic graph) structure is that of a tree.
Remark: Using the gates AND, OR, and negation, the XOR (or parity) of n Boolean variables can be computed by a circuit of size 4n-3. However, a formula for XOR is known to require size of order n^2 (Khrapchenko, 1971). (Of course, if we allow an XOR gate, we see that XOR has a formula of size n-1 (or n, depending on how one counts).)

Remark: Since we have XOR(x_1,x_2) = ( x_1 AND (NOT x_2) ) OR ( (NOT x_1) AND x_2 ), we see that if f(n) is the formula size required for the XOR of n variables, then f(n) <= 4 f(n/2) (perhaps plus a constant, depending on how one is counting), which gives f(n) <= order(n^2). To prove that a formula for the XOR of n variables requires order(n^2) size is much more difficult.

Remark: Min circuit depth = min formula depth, which is proportional to the log of min formula size. Min circuit size is probably more difficult to compute.

Remark: Most Boolean functions require a circuit of size of order 2^n / n (and size 2^n n is clearly sufficient). We can give a proof by counting (very crudely).
Proof: If we have inputs x_1,...,x_n and "interior vertices" y_1,...,y_m, then each y_i can be chosen as (1) the AND of two previous variables, (2) the OR of two previous variables, or (3) the negation of a previous variable, for at most 2 ( (n+m) choose 2 ) + (n+m) - 1 possibilities. Set r = n+m. Then the number of circuits of size r is at most (2r^2)^r (very crudely). Since there are 2^(2^n) Boolean functions, the majority of Boolean functions require more than r gates whenever (2r^2)^r < 2^(2^n)/2. Taking logs, this requires r order(log r) < 2^n, which is satisfied for r as large as order(2^n/n).
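The counting inequality at the end can be checked numerically. In this Python sketch (the cutoff 2^n/(3n) is my own crude stand-in for "order 2^n/n"), circuits of size of order 2^n/n fall short of covering half of all Boolean functions, while size 2^n comfortably suffices by the same crude count:

```python
def covers_half(n, r):
    """True iff (2 r^2)^r >= 2^(2^n)/2, i.e. the crude count of circuits
    with r gates could account for at least half of all Boolean functions
    on n variables."""
    return (2 * r * r) ** r >= 2 ** (2 ** n - 1)

for n in range(6, 11):
    small = 2 ** n // (3 * n)   # r of order 2^n / n: too few circuits
    big = 2 ** n                # r = 2^n: more than enough by this count
    print(n, covers_half(n, small), covers_half(n, big))
```

Python's arbitrary-precision integers make the comparison exact even for 2^(2^10 - 1); each printed row shows False for the small size and True for the large one, matching the conclusion that most functions need circuits of size of order 2^n/n.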