I. The perceptron

Chapter 11.2 in your textbook introduces the idea of neural network learning. That chapter provides a nice explanation of how to train a collection of simulated neurons in what's called a feed-forward backpropagation network. Rather than repeat what's said in Chapter 11.2 (which you should most definitely read), I presented the evolutionary precursor to the artificial neuron, called a perceptron.

A perceptron is a computing unit which takes some number of inputs...each of those inputs can be a 1 or a 0. The individual inputs are multiplied by their corresponding weights, which are real numbers that can be negative or positive. The perceptron takes the sum of those products, and if the sum exceeds some threshold the perceptron outputs a 1; otherwise, the perceptron outputs a 0.

What makes this little device interesting is that it can learn to respond to certain input patterns with a 1 and to other input patterns with a 0. If those patterns represent positive and negative examples of some concept, then the perceptron can learn that concept...at least sometimes. The trick to perceptron learning is this: when the perceptron is presented with a positive example of the concept but outputs a 0 (where a 0 means that the perceptron is saying that the example is not representative of the concept), the weights corresponding to the non-zero inputs are increased to make the perceptron more sensitive to those inputs. Conversely, when the perceptron is presented with a negative example but outputs a 1, the weights on the non-zero inputs are decreased to make the perceptron less sensitive to those inputs.

In class, we applied a perceptron to the task of learning the concept of (what else?) an arch. We created a representation language consisting of 1's and 0's, and in that language we abstracted away all but the most relevant, although simplified, features.
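The update rule described above can be sketched in a few lines. This is an illustrative Python sketch, not the course's code (the Scheme version appears below); the names `predict` and `train_step` and the 0.1 step size are my own choices.

```python
# Illustrative sketch of the perceptron update rule described above, in Python
# rather than the course's Scheme.  The names (predict, train_step) and the
# 0.1 step size are my own choices, not part of the lecture.

def predict(weights, inputs, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

def train_step(weights, inputs, target, threshold, step=0.1):
    """Nudge the weights on the non-zero inputs when the guess is wrong."""
    guess = predict(weights, inputs, threshold)
    if guess == target:
        return weights                       # right answer: leave weights alone
    delta = step if target == 1 else -step   # missed a positive: raise weights;
                                             # false alarm: lower them
    return [w + delta if x == 1 else w for w, x in zip(weights, inputs)]
```

Note that only the weights on inputs that were 1 get adjusted, exactly as described above: an input of 0 contributed nothing to the sum, so its weight carries no blame for the mistake.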
We presented our perceptron with a series of examples and it learned to respond correctly to both positive and negative examples of archness. We successfully taught the perceptron another concept too. But as we found out, the perceptron can't learn everything we might want it to. In particular, we found out that the perceptron can't learn concepts involving the notion of exclusive-or, and the fact that it can't learn some things led to the decline of interest in perceptrons years ago. But perceptrons were resurrected with some improvements (replacing the threshold-based step function that controlled output with a continuous function, building networks of these units with multiple layers, and new ideas about how to adjust the weights) and gave rise to the current excitement about neural networks.

Here's the perceptron code, if you're interested. It's not CILOG.

    ;; very very simple perceptron learning - one cell
    ;;
    ;; wx = initial weights
    (define w1 '(-0.5 0 0.5 0 -0.5))    ;; range -1.0 to 1.0
    (define w2 '(0.5 0.5 0.5 0.5 0.5))

    ;; t = initial threshold -- the program doesn't yet know how to change this
    (define t 0.5)                      ;; range 0 to 1 - make it a float just to be safe

    ;; example 1 - a supports c and b supports c but a doesn't touch b (arch)
    (define ex1 '((#t 1 1 1 1 0) (#f 1 1 1 1 1) (#f 0 0 0 0 0) (#f 0 0 1 1 0)
                  (#f 1 0 1 0 1) (#f 1 0 1 0 0) (#f 0 1 0 1 1) (#f 0 1 0 1 0)
                  (#f 0 0 1 0 0) (#f 0 0 0 1 0)))

    ;; example 2 - a touches c and b touches c (all three blocks in contact - no islands)
    (define ex2 '((#t 1 1 1 1 0) (#t 1 1 1 1 1) (#f 0 0 0 0 0) (#t 0 0 1 1 0)
                  (#f 1 0 1 0 1) (#f 1 0 1 0 0) (#f 0 1 0 1 1) (#f 0 1 0 1 0)
                  (#f 0 0 1 0 0) (#f 0 0 0 1 0)))

    ;; example 3 - a touches c or b touches c but not both (exclusive-or)
    (define ex3 '((#f 1 1 1 1 0) (#f 1 1 1 1 1) (#f 0 0 0 0 0) (#f 0 0 1 1 0)
                  (#t 1 0 1 0 1) (#t 1 0 1 0 0) (#t 0 1 0 1 1) (#t 0 1 0 1 0)
                  (#t 0 0 1 0 0) (#t 0 0 0 1 0)))

    (define (learn_concept init_weights init_threshold init_training_set)
      (do ((weights init_weights)
           (threshold init_threshold)
           (examples init_training_set)
           (stop_training #f))
          (stop_training (display "done") (newline))
        (set! weights (train weights threshold examples))
        (display "continue training? (y/n): ")
        (set! stop_training (equal? (read) 'n))
        (newline)))

    (define (train weights threshold examples)    ;; this function could be better
      (do ((current_weights weights)
           (guess #f)
           (real #f)
           (inputs '()))
          ((null? examples) current_weights)
        (set! inputs (cdar examples))
        (set! real (caar examples))
        (set! guess (compute_g current_weights inputs threshold))
        (set! current_weights (adjust_weights current_weights inputs guess real))
        (display "inputs: ") (display inputs) (newline)
        (display_weights guess real current_weights)
        (set! examples (cdr examples))))

    (define (display_weights guess real current_weights)
      (display "guess: ") (display guess)
      (display " answer: ") (display real) (newline)
      (display_weights_2 current_weights)
      (newline))

    (define (display_weights_2 weights)
      (cond [(null? weights) #t]
            [else (display "  ")
                  (display (car weights))
                  (newline)
                  (display_weights_2 (cdr weights))]))

    (define (adjust_weights weights inputs guess real)    ;; increment shouldn't be hard coded
      (cond [(equal? guess real) weights]
            [guess (adjust_weights_2 weights inputs -0.1)]
            [else (adjust_weights_2 weights inputs 0.1)]))

    (define (adjust_weights_2 weights inputs increment)
      (cond [(null? inputs) '()]
            [(equal? (car inputs) 1)
             (cons (+ (car weights) increment)
                   (adjust_weights_2 (cdr weights) (cdr inputs) increment))]
            [else (cons (car weights)
                        (adjust_weights_2 (cdr weights) (cdr inputs) increment))]))

    (define (compute_weighted_input weights inputs)
      (cond [(null? inputs) 0]
            [else (+ (* (car weights) (car inputs))
                     (compute_weighted_input (cdr weights) (cdr inputs)))]))

    (define (compute_g weights inputs threshold)
      (> (compute_weighted_input weights inputs) threshold))

II. Learning as the search for the best representation again

As I said before, I'll let your textbook develop learning in neural networks from there, but whether we're talking about perceptrons or more sophisticated artificial neurons, both approaches represent what they've learned not as semantic networks or decision trees, but as a collection of weights...in other words, a representation of an arch is just a set of real numbers. And what those numbers mean may not be entirely obvious, but they were derived by searching through possible sets of weights until a set was found that resulted in all the right outputs for all the right inputs. Once again, as it has since September, it all boils down to search.
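The exclusive-or failure mentioned above, and the view of learning as a search through weight space, can be seen concretely. This is a hypothetical Python sketch, not the course's Scheme code; the function names, the 0.5 threshold, the 0.1 step size, and the two-input truth tables are my own choices.

```python
# A hypothetical demonstration (in Python, not the course's Scheme) of why a
# single perceptron masters a separable concept like AND but never masters
# exclusive-or.  The names, 0.5 threshold, and 0.1 step are my own choices.

def predict(weights, inputs, threshold=0.5):
    """1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def train(examples, n_inputs, epochs=100, step=0.1):
    """Cycle through labelled examples, nudging weights after each mistake."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for target, inputs in examples:
            if predict(weights, inputs) != target:
                delta = step if target == 1 else -step
                weights = [w + delta if x == 1 else w
                           for w, x in zip(weights, inputs)]
    return weights

def accuracy(weights, examples):
    """Fraction of the examples the trained unit classifies correctly."""
    return sum(predict(weights, x) == t for t, x in examples) / len(examples)

AND = [(0, (0, 0)), (0, (0, 1)), (0, (1, 0)), (1, (1, 1))]   # separable
XOR = [(0, (0, 0)), (1, (0, 1)), (1, (1, 0)), (0, (1, 1))]   # not separable
```

Training on AND settles on a perfect set of weights; training on XOR just oscillates forever, because getting (0,1) and (1,0) right forces both weights above the threshold, and then (1,1) fires when it shouldn't. No point in the space of weight sets gives all the right outputs, which is exactly why the search fails.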
Last revised: December 7, 2004