Compiled Messages

Message no. 1
Posted by David Poole (cpsc_422_term2) on Thursday, December 23, 2004 1:06pm
Subject: Welcome to CPSC 422
We will only be using WebCT for discussions and for seeing grades. See
the standard web page for all other information (e.g., assignments).

I hope you enjoy the course.

David

Message no. 2
Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 1:50pm
Subject: queries proved every time point (goals?)
I've been playing with the robot control program for the first assignment. It seems like 
the robot tries to prove the same queries at each time step, specifically:
1. assign(robot_pos,(X,Y),T).
2. assign(to_do,R,T).
3. assign(goal_pos, Coords, T).
4. assign(compass,C,T).

Is this actually what is happening? In other words, is the robot actually trying to prove 
these at each time step? And, is proving them what makes the robot "go" (it certainly 
seems to be)?

My first intuition of how to approach question 1 involved changing one of these queries, 
or at least adding to this set. Is this even possible? How is this set of goal (is that the 
right word?) queries determined?

My Prolog is a bit weak so apologies if the answers to these are obvious or fully covered 
in 312.

Dave
Message no. 3
Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 7:37pm
Subject: using val/3 is mangling unrelated lists
Hi

I know that Dr. Poole mentioned in class today that we didn't need to use val, but I found 
that in answering question 1 assign(goal_pos,_,_) and assign(to_do,_,_) were too closely 
intertwined to get away with only using was/4. One needs to see the consequences of the 
other.

Unfortunately, when I use val/3 it is somehow mangling unrelated lists. For 
example, when starting out my to do list looks like
[goto(@N1), goto(@N2), goto(@N3), goto(@N4), goto(@N5), goto(@N6)]
but after successfully arriving at N1, my to_do list becomes
goto(@N2)[goto(@N3), goto(@N4), goto(@N5), goto(@N6)]

Note the position of the "["! And I am not even using val/3 to assign to_do, only to assign 
goal_pos. Unfortunately the mangled list is no longer recognized by any of the clauses 
and the robot's brain dies.

Avoiding the use of val/3 entirely results in a nice tidy list deconstruction
[goto(@N1), goto(@N2), goto(@N3), goto(@N4), goto(@N5), goto(@N6)]
[goto(@N2), goto(@N3), goto(@N4), goto(@N5), goto(@N6)]
[goto(@N3), goto(@N4), goto(@N5), goto(@N6)]
But this approach has other problems.

I can see workarounds, but they're very untidy and don't seem to be in the spirit of the 
assignment. 

I don't know how CILog stores data, but could the mechanism val/3 uses to guarantee 
things are only calculated once be calculating and storing them incorrectly?

Please fix val/3. (code available if it'll help you fix val)

Dave
Message no. 4
Posted by Christopher John Hawkins (s93985018) on Tuesday, January 11, 2005 9:09pm
Subject: problems with fluents
i am having a hard time getting the robot simulator to recognize fluents
that i have created.  i tried the code "assign(time,T,T) <- arrived(T)"
as a test and it was never executed.  is there something i am missing
here (it's been a while since i took 312 and my prolog is rusty)?

thanks!
Message no. 5[Branch from no. 4]
Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 9:55pm
Subject: Re: problems with fluents
This is roughly what I was trying to do when I discovered that only those 4 assigns() are 
proven at each time point (see the queries proved every time thread). And I can't figure 
out how to sneak an assign() into the body of another assign and still have things make 
sense. This at least would cause it to be evaluated.

Dave
Message no. 6[Branch from no. 2]
Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:27pm
Subject: Re: queries proved every time point (goals?)
In message 2 on Tuesday, January 11, 2005 1:50pm, David Burns Cameron
writes:
>I've been playing with the robot control program for the first
assignment. It seems like 
>the robot tries to prove the same queries at each time step, specifically:
>1. assign(robot_pos,(X,Y),T).
>2. assign(to_do,R,T).
>3. assign(goal_pos, Coords, T).
>4. assign(compass,C,T).

>Is this actually what is happening? In other words, is the robot
actually trying to prove 
>these at each time step? 

Yes. That is how it implements its belief state.

>And, is proving them what makes the robot "go" (it certainly 
>seems to be)?

At the bottom level it just calls the compass and goal_pos and plots them.

>My first intuition of how to approach question 1 involved changing one
of these queries, 
>or at least adding to this set. Is this even possible? How is this set
of goal (is that the 
>right word?) queries determined?

Think about what it has to remember. And then what it has to do based on
what it remembers.

>My Prolog is a bit weak so apologies if the answers to these are
obvious or fully covered 
>in 312.

They were not covered in 312. But that course gives the logic
programming familiarity that I am assuming.

>Dave
>

David
Message no. 7[Branch from no. 3]
Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:29pm
Subject: Re: using val/3 is mangling unrelated lists
We will have a new version of the controller that doesn't use val
available tonight.

If you really need val, you can write it yourself. I didn't use val in
my solution.
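
For example, if all you need is "fluent F currently has value V at time
T", something like the following might do. This is only a sketch: it
assumes the was(Fluent,Value,TimeAssigned,T) predicate used elsewhere in
the controller, the name current_value is made up, and it does not give
you val's calculated-only-once behaviour:

% current_value(F,V,T) is true if fluent F has value V at time T.
current_value(F,V,T) <- was(F,V,_,T).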

David

Message no. 8[Branch from no. 5]
Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:30pm
Subject: Re: problems with fluents
In message 5 on Tuesday, January 11, 2005 9:55pm, David Burns Cameron
writes:
>This is roughly what I was trying to do when I discovered that only
those 4 assigns() are 
>proven at each time point (see the queries proved every time thread).
And I can't figure 
>out how to sneak an assign() into the body of another assign and still
have things make 
>sense. This at least would cause it to be evaluated.

It proves all assigns at each time step.

David

Message no. 9[Branch from no. 4]
Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:37pm
Subject: Re: problems with fluents
In message 4 on Tuesday, January 11, 2005 9:09pm, Christopher John
Hawkins writes:
>i am having a hard time getting the robot simulator to recognize fluents
>that i have created.  i tried the code "assign(time,T,T) <- arrived(T)"
>as a test and it was never executed.  is there something i am missing
>here (it's been a while since i took 312 and my prolog is rusty)?

Why do you think it was never executed? It works for me. I had the
appropriate assign in the controller log.

David

Message no. 10[Branch from no. 6]
Posted by David Poole (cpsc_422_term2) on Tuesday, January 11, 2005 10:41pm
Subject: Re: queries proved every time point (goals?)
In message 6 on Tuesday, January 11, 2005 10:27pm, David Poole writes:

>Yes. That is how it implements its belief state. 

It proves assign(F,V,T) for every time T. For every solution it
remembers the value proved. Then it plots the compass and the robot_pos.

David
Message no. 11[Branch from no. 9]
Posted by Christopher John Hawkins (s93985018) on Tuesday, January 11, 2005 10:54pm
Subject: Re: problems with fluents
when i run it, the assign does not show up in the controller log.  maybe
i should reinstall the simulator.
Message no. 12[Branch from no. 10]
Posted by David Burns Cameron (s66878984) on Tuesday, January 11, 2005 11:09pm
Subject: Re: queries proved every time point (goals?)
>It proves assign(F,V,T) for every time T. For every solution it

It works for arbitrary F now. I'm very sure that it didn't before. I
notice the comments are gone. Maybe there was a glitch in the old
controller code that was corrected at the same time?

Dave
Message no. 13[Branch from no. 11]
Posted by Christopher John Hawkins (s93985018) on Tuesday, January 11, 2005 11:19pm
Subject: Re: problems with fluents
if i access the applet directly through the CIspace webpage the code
works fine but it will NOT run with the simulator compiled for Win32.

another quick question; what is the syntax for negation or NAF?  

thanks, CJ

Message no. 14
Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 12:19pm
Subject: Assignment 1 notes
We have fixed up the controller. Please make sure you are using version
4.6 (you may need to clear the cache in your browser).  This is also
available for download.

We removed "val" from the controllers. Please don't use it; you can
always write your own equivalent function. Also, don't use a predicate
name that starts with "val".

The solution to question 1 is straightforward. You have to think about
"what should the agent remember?" All of the problems I have seen are
because the proposed solution gets the agent to remember too much (and
they interfere with each other, usually resulting in two locations being
taken off the to_do list).

There is negation as failure in the controller, but you don't need to
use it. (I didn't in my solution).

The documentation is wrong about built-in arithmetic. The current
version supports "is" and comparisons such as "<".
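
For example (a made-up predicate, just to illustrate the supported
syntax):

% succ_small(X,Y) is true if Y is X plus one and Y is less than 10.
succ_small(X,Y) <- Y is X + 1 & Y < 10.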

There may still be a problem with a solution to 2a. I will post a
revision to that. (I will not change the question, but will change the
way it is tested).

This assignment is not designed to be really difficult. If you are
having real problems, take a step back and think about the two questions:
what should the agent remember and what should the agent do. Have fun!

That's all I can think of for now. Please post your questions here and
we will do our best to answer them. 

David

Message no. 15
Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 12:42pm
Subject: Alternate question 2(a)
Here is a revised question 2(a) that you can optionally do instead of
the current question.

Suppose you get a job with "Future Software for Flakey Robots
Inc". Your job is to write a controller for a robot that has not yet
been released. You have a prototype for the robot, but it is buggy. You
do, however, have a specification of how it is supposed to work. You
need to be able to deliver a controller that will work when a robot
that fulfills the specification becomes available.
Unfortunately, you won't be able to do any empirical tests on how it
compares to the existing robot until the robot is released.

(a) Change the CIspace controller so that it is opportunistic; when it
    selects the next location to visit, it selects the location that is
    closest to its current position. It should still visit all of the
    locations.

(b) Explain how you could test your controller. While the robot
    controller may be buggy, you can assume that it implements the same
    language as CILog and that CILog is not buggy. What queries could
    you ask in CILog that would convince you (and your boss) that your
    controller will work in the to-be-released robot? You need to
    explain (in English) why you think that your tests are adequate.

(c) Show your tests in CILog.

(d) Try your controller in the existing applet. Why does it not work?
    (Be as specific as you can to pinpoint exactly what the bug (or
    bugs) in the current applet is.)

Note that we are not assuming your solution will not work in the
current applet. We have a solution that doesn't. There may be
solutions that do work.

Message no. 16
Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 4:29pm
Subject: TA office hours
Frank Hutter's office hours are regularly on Wednesdays 1:30-2:30 in
room CICSR 341. He will have an extra office hour this week on Friday
from 2:00-3:00.

Michael Chiang's office hours are regularly on Thursdays 12:50-1:50
(i.e., the hour before class) in the CICSR atrium.  He will hold a
special office hour on Monday from 2-3.

David

Message no. 17
Posted by Robert McGregor (s92140011) on Wednesday, January 12, 2005 9:06pm
Subject: Problems with controller
Hi,

I created a simple controller similar to the one in class that turns on a "follow_wall" 
variable when a wall is hit.  I can run it in the online applet, but when I try to copy/paste 
in the identical program from notepad, it will not work as expected.

In the debugger for the simple controller, it tries to evaluate the expression:
assign(follow_wall,Var1,T)

but in the copy/pasted one it tries to evaluate:
assign(follow_wall,Var1,Var2)

(which of course is true since    assign(follow_wall,off,0)    is a fact)

So it can never assign follow_wall to on because it will always assign it to off first.

I sort of solved the problem once by restarting the applet and typing it all bit by bit 
(starting from scratch each time).

Bob
Message no. 18[Branch from no. 17]
Posted by David Poole (cpsc_422_term2) on Wednesday, January 12, 2005 10:15pm
Subject: Re: Problems with controller
I am not sure what you are asking, but if you replace the code for steer in
the middle-level controller with the following, it will follow a wall when it hits one.
Does this work for you?

% Steer toward the goal direction when not wall-following and the
% whisker is off.
steer(D,T) <-
   was(wallfollow,off,_,T) &
   whisker_sensor(off,T) &
   goal_is(D,T).

% Steer left whenever the whisker hits a wall.
steer(left,T) <-
   whisker_sensor(on,T).

% While wall-following with the whisker off, steer right (back toward
% the wall).
steer(right,T) <-
   was(wallfollow,on,_,T) &
   whisker_sensor(off,T).

% Start wall-following when the whisker comes on.
assign(wallfollow,on,T) <-
   whisker_sensor(on,T) &
   was(wallfollow,off,T1,T).

% Initially the robot is not wall-following.
assign(wallfollow,off,0.0).
Message no. 19[Branch from no. 13]
Posted by Wing Hang Chan (s84098011) on Friday, January 14, 2005 9:55am
Subject: Re: problems with fluents
Ah I've been using the win32 executable...thanks for the hint.
Message no. 20[Branch from no. 19]
Posted by David Poole (cpsc_422_term2) on Friday, January 14, 2005 10:07am
Subject: Re: problems with fluents
In message 19 on Friday, January 14, 2005 9:55am, Wing Hang Chan writes:
>Ah I've been using the win32 executable...thanks for the hint.

Make sure you are using version 4.6 of the applet (look at the header of
the applet itself; it looks like the web page version number has not
been updated). If you downloaded it a few days ago, you may have to
reload it again.

David

Message no. 21
Posted by Danelle Abra Wettstein (s86800018) on Saturday, January 15, 2005 2:20pm
Subject: Java errors??
I'm getting a StringIndexOutOfBoundException when I try running the applet with the code I entered. 
There aren't even any strings in what I wrote, and why am I getting a JAVA error??? Aughh... this 
assignment may not be meant to be difficult, but it's sure got me turned inside out and upside down.
Message no. 22[Branch from no. 21]
Posted by Christopher John Hawkins (s93985018) on Saturday, January 15, 2005 3:29pm
Subject: Re: Java errors??
I got these messages, but only when I had a syntax error in my Prolog
code, usually a missing "." or "&".
Message no. 23[Branch from no. 21]
Posted by David Poole (cpsc_422_term2) on Saturday, January 15, 2005 8:04pm
Subject: Re: Java errors??
I have never seen this error. 

Remember that the controller is supposed to use CILog syntax.  Try
loading your controller into CILog and seeing if you get an error. The
"check" command in CILog is very useful as it tells you what  you need
to add to test your code.

David

Message no. 24[Branch from no. 21]
Posted by David Burns Cameron (s66878984) on Sunday, January 16, 2005 4:05pm
Subject: Re: Java errors??
In message 21 on Saturday, January 15, 2005 2:20pm, Danelle Abra
Wettstein writes:
>StringIndexOutOfBoundException when I try running the applet with the
code I entered. 

I saw that last night a lot, along with NullPointerExceptions. They all
seemed to be caused by syntax errors: "," instead of ".", missing ")",
and all that.

Dave
Message no. 25[Branch from no. 23]
Posted by Danelle Abra Wettstein (s86800018) on Sunday, January 16, 2005 6:31pm
Subject: Re: Java errors??
What is the command to open cilog (it's been too long)?
Message no. 26[Branch from no. 25]
Posted by David Poole (cpsc_422_term2) on Sunday, January 16, 2005 9:20pm
Subject: Re: Java errors??
In message 25 on Sunday, January 16, 2005 6:31pm, Danelle Abra Wettstein
writes:
>What is the command to open cilog (it's been too long)?

See: http://www.cs.ubc.ca/spider/poole/ci/code/cilog/cilog_man.html
CILOG User Manual; Section 1 says how to get it and use it.  Just download
http://www.cs.ubc.ca/spider/poole/ci/code/cilog/cilog_swi.pl
and load it using SWI prolog.

David
Message no. 27
Posted by David Burns Cameron (s66878984) on Monday, January 17, 2005 6:51pm
Subject: CISpacecraft
In question 2(a),

How can we go about maintaining the to_do list after the best goal has
been chosen? A single predicate that calculates the best goal and the
new list, for example:

% closest is true if Bestgoal is a goal that is closer than all of the
% goals on Goals. But where did Bestgoal come from? It couldn't have
% come from Goals, as would be possible in an imperative language.
closest(Bestgoal,Goals,T).

thanks
Message no. 28[Branch from no. 27]
Posted by David Poole (cpsc_422_term2) on Monday, January 17, 2005 8:33pm
Subject: Re: CISpacecraft
In message 27 on Monday, January 17, 2005 6:51pm, David Burns Cameron
writes:
>In question 2(a),
>
>How can we go about maintaining the to_do list after the best goal has
>been chosen? A single predicate that calculates the best goal and the
>new list, for example:
>
>% closest is true if Bestgoal is a goal that is closer than all of the
>% goals on Goals. But where did Bestgoal come from? It couldn't have
>% come from Goals, as would be possible in an imperative language.
>closest(Bestgoal,Goals,T).

I don't understand your question.  You take in a list and return the
smallest element and the rest of the list.  (Which imperative language is
it easier in?)  I could even imagine asking this in the first CPSC 312
assignment.

David
Message no. 29
Posted by Ryan Yee (s81483042) on Monday, January 17, 2005 8:45pm
Subject: Gripes with cilog.
Some things I learned while doing Assignment 1.

Comments are buggy. A percent (comment) on the end of the line causes a
syntax error (huh what?!). Seriously.

some_predicate(stuff_here). %

Furthermore, where are the cuts and semicolons?! How else can we do
mutual exclusion without rewriting vast blocks of code? (i.e. why can't
cilog be more like prolog?)

And lastly, debugging is a PITA with the applet. When a syntax error is
made, the entire goal + body is transformed into one long line (which
usually spans wider than the screen). Kinda discourages me from using
meaningful names.
Message no. 30
Posted by Christopher John Hawkins (s93985018) on Monday, January 17, 2005 9:43pm
Subject: The followwall example
Was the code from the followwall example given in class last Tuesday
(11th) posted online anywhere?

Thanks!
Message no. 31[Branch from no. 28]
Posted by David Burns Cameron (s66878984) on Monday, January 17, 2005 9:50pm
Subject: Re: CISpacecraft
It seems like you're suggesting

closest(Goals, BestGoal, GoalsThatAreNotBest, T)

which we tried. During the recursive call CILog was backing out like a
Mac truck in a runaway lane whenever it needed to assign a value to
BestGoal. It seemed that BestGoal needed to be known before any two
goals could be compared (since its position needed to be known for
comparison), and at the same time couldn't be known until it had been
compared to the other goals in Goals.

Would a "helper rule" be appropriate here so a worklist could be used?

Dave

PS, it's easier (for me) in just about every imperative language since
closest(BestGoal, Goals, T)
could use pass-by-reference/in-out parameter semantics for Goals. On the
way in it would be the complete list, on the way out the diminished list.
Message no. 32[Branch from no. 30]
Posted by David Poole (cpsc_422_term2) on Monday, January 17, 2005 10:44pm
Subject: Re: The followwall example
In message 30 on Monday, January 17, 2005 9:43pm, Christopher John
Hawkins writes:
>Was the code from the followwall example given in class last Tuesday
>(11th) posted online anywhere?
>
>Thanks!

Yes. It was posted on the bulletin board under  "Problems with controller".

David

Message no. 33[Branch from no. 31]
Posted by David Poole (cpsc_422_term2) on Monday, January 17, 2005 11:00pm
Subject: Re: CISpacecraft
In message 31 on Monday, January 17, 2005 9:50pm, David Burns Cameron
writes:
>It seems like you're suggesting
>
>closest(Goals, BestGoal, GoalsThatAreNotBest, T)
>
>which we tried. During the recursive call CILog was backing out like a
>Mac truck in a runaway lane whenever it needed to assign a value to
>BestGoal. 
I have no idea what that means. Is a Mac truck (in this context) like a
MS truck that just crashes less often?

>It seemed that BestGoal needed to be known before any two
>goals could be compared (since its position needed to be known for
>comparison), and at the same time couldn't be known until it had been
>compared to the other goals in Goals.

Did you try it in CILog itself (see the message "Alternate question
2(a)")?  It is easier to debug there.


>Would a "helper rule" be appropriate here so a worklist could be used?

Perhaps. This is a pretty basic 312-like question. (It would even be a
simple assignment in CPSC 124 when we offered that).
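
For what it's worth, here is a minimal sketch of the sort of rule I
mean, in CILog. It assumes a helper dist(G,D,T) that you would still
have to write (true if D is the distance from the robot's position at
time T to goal G), and that =< is available alongside <:

% closest(Goals,Best,Rest,T) is true if Best is the element of Goals
% nearest the robot at time T, and Rest is Goals with Best removed.
closest([G],G,[],_).
closest([G|Gs],G,Gs,T) <-
   closest(Gs,B,_,T) &
   dist(G,DG,T) & dist(B,DB,T) &
   DG =< DB.
closest([G|Gs],B,[G|R],T) <-
   closest(Gs,B,R,T) &
   dist(G,DG,T) & dist(B,DB,T) &
   DB < DG.

The recursion just compares the head of the list against the best of
the rest; no in-out parameters are needed.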

>Dave
>
>PS, it's easier (for me) in just about every imperative language since
>closest(BestGoal, Goals, T)
>could use pass-by-reference/in-out parameter semantics for Goals. On the
>way in it would be the complete list, on the way out the diminished list.

That is almost always a *bad* thing to do, as you lose the previous list
of Goals. What if you add to the program and need to know what used to
be the goal? What if you are not actually going to do it, but are just
thinking about what would happen if you did do it? Side effects are
generally bad things in programs unless there is no way to get around
them; they prevent sensible debugging (where you know what the symbols
mean) and most optimizations that you would like to be carried out on
your code. But that is an entirely different debate.

David
Message no. 34
Posted by Michael Chiang (s27992023) on Wednesday, January 19, 2005 1:11pm
Subject: TA hours (Michael C.)
Hi everyone,

This is just to inform you that I will be moving the location of my TA
consultation hour from the CICSR atrium to lab #106, which is only
accessible from the atrium itself. The time will be unchanged, Thursdays
12.50pm ~ 1.50pm. 

Thanks,
Michael 
Message no. 35
Posted by David Poole (cpsc_422_term2) on Friday, January 21, 2005 8:01pm
Subject: Updated notes on decision-theoretic planning and reinforcement learning
There are some more notes available from:
http://www.cs.ubc.ca/spider/poole/cs422/2005/slides.html

On Tuesday, I am planning on continuing to talk about reinforcement
learning. It is really important that you understand the basics, so
please come with lots of questions unless it is all very obvious to you.

David

Message no. 36
Posted by David Poole (cpsc_422_term2) on Sunday, January 23, 2005 10:17pm
Subject: Assignment 1 solution
A solution to assignment 1 (the controller changes) is posted to the
course home page:
http://www.cs.ubc.ca/spider/poole/cs422/2005/#assignments

David

Message no. 37
Posted by David Poole (cpsc_422_term2) on Sunday, January 23, 2005 10:20pm
Subject: Assignment 2
There will be bonus marks to the students who give the smallest number
of states for question 1 (assuming that it is correct).

Think about it!

David

Message no. 38
Posted by Wing Hang Chan (s84098011) on Friday, January 28, 2005 12:10am
Subject: assignment 2: simulating random rewards
Hi,

I am having trouble simulating the rewards at each of the corners of the
grid.
The assignment says that they appear with a probability of 0.2, but how
do we simulate that?

would it be:
discount * 10 * 0.2 ?

or am I totally wrong. 

Thanks in advance.
Message no. 39[Branch from no. 38]
Posted by Frank Hutter (s62336011) on Saturday, January 29, 2005 5:34pm
Subject: Re: assignment 2: simulating random rewards
I misread this part as well the first time, but the wording is actually
pretty clear:

"When there is no treasure, at each time step, there is a probability P1
= 0.2 that a treasure appears, and it appears with equal probability at
each corner."

(there is a probability 0.2 for it to appear and if it appears, this
happens with probability 0.25 in any of the corners)

That means there can be only one treasure at a time, or no treasure at all.
Once a treasure has appeared, it remains there until it is collected (and
there are no other treasures until this one is collected).

This domain is fully observable, i.e. you know at each time step whether
there is a treasure or not and, if there is one, where it is. You also
know where you are. (But you can't plan ahead deterministically due to
the randomness in the actions.)
Message no. 40
Posted by Frank Hutter (s62336011) on Saturday, January 29, 2005 5:55pm
Subject: Some clarifications on assignment 2
I got a few questions about assignment 2 in my office hour, so here's
some general clarifications.

Question 1 is very important, so think about it before you start
programming something. 
The robot can be in any of the 25 fields; the treasure can be in any of
the 4 corners or not there at all; each of the monsters can check or not
check. Which of these do you need to represent as states, and which ones
do you get around? Recall that the monsters do not move and that they
check independently of earlier checks.
Also, you possibly can exploit a lot of symmetries in the domain which
reduces the number of states. There will be a bonus on the lowest
correct number of states. (but if you do fancy things make sure to
explain them well and not just give a number)

I guess the standard solution is easiest to implement (I did it this
way, too), but if you exploit the symmetries in a clever way the problem
may actually get small enough to do it by hand. You're totally not
required to hand in any code, just the optimal policy, i.e.
which action to perform in each state (not just in every field, the
states subsume more than that!).

In my office hour, I advised people to check out the Java code for the
applet and possibly use this implementation of value iteration as a code
base. After implementing it myself I don't really see the need for this
anymore. The algorithm can be written in a few lines anyways and the
applet code has a lot of special purpose elements to it, so don't get
hung up on it. Michael Chiang (the other TA), for example, uses Matlab. Again,
you may be able to do it by hand, too ...

Hope that helps,
Frank
Message no. 41
Posted by Guan Wang (s77942019) on Sunday, January 30, 2005 9:04am
Subject: optimal policy
For the optimal policy can I assume that the treasure appears in just
one corner, and show the actions based on that assumption?
Message no. 42[Branch from no. 41]
Posted by Guan Wang (s77942019) on Sunday, January 30, 2005 9:18am
Subject: Re: optimal policy
... but I guess that means that there's more than one optimal policy? Is
this correct?

Thanks
Message no. 43[Branch from no. 39]
Posted by Wing Hang Chan (s84098011) on Sunday, January 30, 2005 12:21pm
Subject: Re: assignment 2: simulating random rewards
Ooh thanks for clearing that up.  So how would we implement that in our
Q function?
I am guessing that we don't multiply the reward of 10 by 0.2 and 0.25,
as that would make the reward value very small.

Thanks in advance.
Message no. 44[Branch from no. 42]
Posted by Frank Hutter (s62336011) on Sunday, January 30, 2005 6:45pm
Subject: Re: optimal policy
> For the optimal policy can I assume that the treasure appears in just
one corner, and show the actions based on that assumption?

The treasure is always in only one corner (or not there).
In every possible state, the robot needs to know what to do (that's what
a policy is for).
If you can treat the case "treasure in some corner" in one case by using
symmetry, that's fine.

(In general you can't do this, though.
If the treasure appears in, say, the lower left corner and you go there,
you already need to take into account that it could next appear in the
opposite corner or in one of the adjacent corners, so you need to factor
in future uncertainty.
But don't bother with this comment for the assignment.)

> ... but I guess that means that there's more than one optimal policy? Is
this correct?
There is only one optimal policy.
Think about what's part of the state!

Frank
Message no. 45[Branch from no. 43]
Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 5:51pm
Subject: Re: assignment 2: simulating random rewards
> I am guessing that we don't multiply the reward of 10 by 0.2 and 0.25,
as that would make the reward value very small.

If the treasure is in some corner and you know that, then it's gonna
stay there until you get there. So you have the full reward in that case.

If there is no treasure you need to reason about all possible future
states at once. Here, for each corner you have a 0.2*0.25 = 0.05
probability for the treasure to appear there, and you're reasoning about
all future states at once by weighting them by the probability of
getting there.
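
In symbols (a sketch, ignoring the randomness in the actions, with gamma
the discount): from a state s with no treasure, the backup looks roughly
like

  V(s) = \max_a [ r(s,a) + \gamma ( 0.8 V(s'_none)
                                    + 0.05 \sum_{c in corners} V(s'_c) ) ]

where s'_none is the successor state with still no treasure and s'_c
the successor state with a treasure at corner c.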

Frank
Message no. 47
Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 7:03pm
Subject: Re: CPSC 422 A2 Question
[Sorry for the double posting, I first posted to Main by accident]

[I'm posting this email question I got and my answer here. Please ask
your questions here, not via email]

> I do not understand how the optimal policy relates to the position of 
> the treasure. It seems to me that the optimal action at any square 
> depends on the current location
> of the treasure, but that would mean that there isn't one optimal 
> policy. Do I want 4 arrays of qvalues, one for each possible position of 
> treasure, or what am I misunderstanding?
> Does it follow the same policy not planning based on the location of the 
> treasure, and in this way form a plan for what is generally best?
> 
> Thanks,
> Blake

You're misunderstanding the difference between a robot position and a state.
The state can subsume much more, whatever you need.
You want one utility value for each state, and one Q value for each
state-action pair.
You can e.g. code the states as a multidimensional array, where each
dimension defines some part of the state. For Q, the action is yet
another dimension.
No one holds you back from including where the treasure is in your state.
Then the state  is different from
the state , and you're all good.

Frank
Message no. 48[Branch from no. 47]
Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 7:08pm
Subject: Re: CPSC 422 A2 Question
Oops, 
apparently WebCT doesn't like it when you enclose text with the keys
"smaller" and "larger".
I did this with my states. The last sentence was supposed to say:
"Then the state [robot in square X, treasure in square Z1] is different from
the state [robot in square X, treasure in square Z2], and you're all good."

Frank
Message no. 49
Posted by Frank Hutter (s62336011) on Monday, January 31, 2005 7:14pm
Subject: Change of office hour
Sorry, I need to reschedule my office hour.
(One of the reading groups I am attending now got scheduled exactly for
that time slot.)

My new slot is also on Wednesdays, but now from 9:50 to 10:50 am.
This is only gonna be updated on the course webpage by next week when
David returns.

Sorry for any inconvenience!
Frank
Message no. 50[Branch from no. 45]
Posted by Danelle Abra Wettstein (s86800018) on Monday, January 31, 2005 9:52pm
Subject: Re: assignment 2: simulating random rewards
>If there is no treasure you need to reason about all possible future
>states at once. 

What does this mean? Reason about all possible future states at once? That sounds scary, and like a 
lot to do in one step.

>probability for the treasure to appear there, and you're reasoning about
>all future states at once by weighting them by the probability of
>getting there.

Also confused about what this means... reasoning about all future states at once by weighting them by 
the probability of getting there.

Clarification?

TIA
Message no. 51[Branch from no. 44]
Posted by Vivian Luk (s82215013) on Monday, January 31, 2005 10:50pm
Subject: Re: optimal policy
I'm a little confused as to what format the optimal policy should be
expressed in (couldn't find any examples in our notes).  Would inference
rules suffice?

Thanks.
Message no. 52[Branch from no. 51]
Posted by Samuel Douglas Davis (s85850014) on Monday, January 31, 2005 11:56pm
Subject: Re: optimal policy
I would also like some clarification on this, if it's not too late.
Right now I have a *very* crude ascii art output of a multidimensional
array (which I suppose I could hand-copy to make clearer), along the
lines of the VI applet. Is that what we were expected to produce? Simply
making a huge list of rules mapping from states to actions seems like a
really bad idea. I'm tempted to hand in my Java code too, just in case.
Message no. 53
Posted by David Poole (cpsc_422_term2) on Tuesday, February 1, 2005 6:25am
Subject: I am not here
Greetings from Germany.
I forgot to tell everyone that I won't be in my office hour today. Alan
Mackworth will be teaching the classes for this week.

When I get back next week I will post a message saying what will be on
the midterm.

I hope you had fun with the assignment, and learnt lots,
David
Message no. 54[Branch from no. 52]
Posted by Frank Hutter (s62336011) on Tuesday, February 1, 2005 12:44pm
Subject: Re: optimal policy
An ascii output of what to do in which state is good.
I've also done it like this, along the lines of the applet, but with ascii

If you want, you can also write the rules down in English (e.g. "if
there is no treasure, move straight away from walls that are not
corners", but they need to cover all states for full marks)

Frank
Message no. 55[Branch from no. 50]
Posted by Frank Hutter (s62336011) on Tuesday, February 1, 2005 12:47pm
Subject: Re: assignment 2: simulating random rewards
> Also confused about what this means... reasoning about all future
states at 
> once by  weighting them by the probability of getting there.
> Clarification?

Check out the class notes. This is just, in English, what the formula
for the update of the Q-values is saying.
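
For reference, assuming the notes use the standard value-iteration
backup, the update being paraphrased is roughly

  Q[s,a] <- \sum_{s'} P(s'|s,a) ( r(s,a,s') + \gamma \max_{a'} Q[s',a'] )

i.e. each successor state s' is weighted by the probability of getting
there.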

Frank
Message no. 56
Posted by David Poole (cpsc_422_term2) on Sunday, February 6, 2005 8:40pm
Subject: Slides page has been updated
The slides page at
http://www.cs.ubc.ca/spider/poole/cs422/2005/slides.html
has been updated.

Note that there is a new version of the reinforcement learning draft
notes. If you have any comments or questions about these, please ask.
These are planned to be expanded into part of a new chapter of the
second edition of our book; I'm happy to expand parts that you think need
to be expanded now. So let me know what you want to be explained better!

Also note that the tentative date for the midterm was 22 Feb, and that
would be confirmed two weeks before. So let's discuss it on Tuesday.  I
will have a more detailed outline of what will be on the midterm later
this week.

David
Message no. 57
Posted by Danelle Abra Wettstein (s86800018) on Sunday, February 6, 2005 10:38pm
Subject: Chapters
What chapters in the textbook should we have read by now?

Thanks.
Message no. 58[Branch from no. 57]
Posted by David Poole (cpsc_422_term2) on Monday, February 7, 2005 5:47pm
Subject: Re: Chapters
In message 57 on Sunday, February 6, 2005 10:38pm, Danelle Abra
Wettstein writes:
>What chapters in the textbook should we have read by now?

We have covered or will have covered by the midterm:
Chapter 12
Notes on decision-theoretic planning (see slides page)
Notes on reinforcement learning (see slides page)
Sections 6.3-6.6
Section 7.3
Chapter 9

David
Message no. 59
Posted by Stephen Shui Fung Mak (s36743003) on Monday, February 7, 2005 11:32pm
Subject: MT and Assignment question
1) Will there be a practice MT posted before the study break?
2) When will we get back our assignment?
3) Will there be office hour (or TA office hour) on Feb 21 Monday?

Thanks,

Stephen
Message no. 60[Branch from no. 59]
Posted by David Poole (cpsc_422_term2) on Wednesday, February 9, 2005 10:50am
Subject: Re: MT and Assignment question
In message 59 on Monday, February 7, 2005 11:32pm, Stephen Shui Fung Mak
writes:
>1) Will there be a practice MT posted before the study break?

Yes.

>2) When will we get back our assignment?

Thursday (tomorrow) for both assignments.

>3) Will there be office hour (or TA office hour) on Feb 21 Monday?

The midterm is now on Thursday the 24th. There will be office hours on the
Tuesday, Wednesday and Thursday.

David

Message no. 61
Posted by David Poole (cpsc_422_term2) on Wednesday, February 9, 2005 12:23pm
Subject: Midterm, new date
As discussed in class yesterday, the midterm will now be on Feb 24th.

David

Message no. 62
Posted by Danelle Abra Wettstein (s86800018) on Wednesday, February 9, 2005 8:38pm
Subject: Assignment 3, Question 1
Hi,

I don't feel like reading the notes (or the "rough notes") has helped me understand how 
we're supposed to do the assignment. Could you give an example as to how this question 
should be done?

TIA
Message no. 63[Branch from no. 62]
Posted by David Poole (cpsc_422_term2) on Thursday, February 10, 2005 1:00pm
Subject: Re: Assignment 3, Question 1
In message 62 on Wednesday, February 9, 2005 8:38pm, Danelle Abra
Wettstein writes:
>Hi,
>
>I don't feel like reading the notes (or the "rough notes") has helped
me understand how 
>we're supposed to do the assignment. Could you give an example as to
how this question 
>should be done?
>
>TIA

For question 1,

In the notes (p. 411, bottom) is the formula that specifies when it is
guaranteed to converge in theory. Hint: 1/k is guaranteed to converge.
(It is only "in theory" because it only guarantees convergence
eventually; it does not guarantee how long it takes.) Some of the ways
to vary alpha follow these conditions and some do not.
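
(For reference, the standard conditions of this form are

  \sum_k \alpha_k = \infty   and   \sum_k \alpha_k^2 < \infty.

\alpha_k = 1/k satisfies both, since the harmonic series diverges while
\sum_k 1/k^2 converges; a fixed \alpha satisfies the first but fails
the second.)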

Run the applet to find the answer to (b). You may have to change a line
in the code to test the 10/(9+k), but I presume you can all read Java. I
wanted you to get to look at the code (the core of the algorithm
happens at the bottom of doStep).

For part (c), you just have to think about it.

As to questions 2 & 3, I'd suggest doing question 2 only if you want to get
your hands dirty in Java code. 90% of the code is the UI which doesn't
need to be changed. The rest is as given in the notes.

Question 3 is a matter of getting you to think about the qualitative
notion of the algorithm. It is an extremely simple problem. The numbers
are chosen so that it takes a long time to jump out of the states with a
low probability of exiting; just think about what happens eventually.
This is one of those "ah ha" problems, where being able to explain it
will make sure you understand it.

I hope that helps. Please ask us if you have more specific questions.

David

Message no. 64[Branch from no. 63]
Posted by Robin McQuinn (s12331039) on Tuesday, February 15, 2005 12:14am
Subject: Re: Assignment 3, Question 1
for question 3, aka 2b (I hope) it seems as if you're saying that the
actual formula/algorithm to calculate Q(lambda) does not need to be
known to answer the question,  yet I find it important, in as much as I
don't quite understand how it works.  I've looked through the draft
notes (which seem to be the only place where this is covered), where the
concepts are fleshed out much more thoroughly than in class. The concept
of the eligibility trace is also only covered in the draft notes,
strangely.

The closest that the description comes to deriving an algorithm for
calculating Q(lambda) is at the top of page 417.  This algorithm only
obscures the concept more, which leads me to think that I'm barking up a
tree a mile from the explanation.  

More specifically, my question is, to calculate an eligibility trace
from a state-action pair, are all possible future paths accounted for? 
and if so, how?  simply added?

I'm also wondering if the draft notes are intended for publication in
the next version of the book, because there are some minor spelling errors.
Message no. 65[Branch from no. 60]
Posted by Robin McQuinn (s12331039) on Tuesday, February 15, 2005 12:18am
Subject: Re: MT and Assignment question
I'm just wondering if discussing the posted midterm here is sanctioned? 
Message no. 66[Branch from no. 64]
Posted by David Poole (cpsc_422_term2) on Tuesday, February 15, 2005 10:46am
Subject: Re: Assignment 3, Question 1
In message 64 on Tuesday, February 15, 2005 12:14am, Robin McQuinn writes:
>for question 3, aka 2b (I hope) it seems as if you're saying that the
>actual formula/algorithm to calculate Q(lambda) does not need to be
>known to answer the question,  yet I find it important, in as much as I
>don't quite understand how it works.  I've looked through the draft
>notes (which seems to be the only place where this is covered) where the
>concepts are flesched  much more thoroughly than in class.   The concept
>of the eligibility trace, is also only covered in the draft notes,
>strangely. 

>The closest that the description comes to deriving an algorithm for
>calculating Q(lambda) is at the top of page 417.  This algorithm only
>obscures the concept more, which leads me to think that I'm barking up a
>tree a mile from the explanation.  

Use the SARSA(lambda) algorithm for this question, the one that is given
on page 418 of the notes.

I was going to describe only Q(lambda), but the mathematics doesn't work
for Q(lambda), as you are using different values for V(s) in different
parts of the algorithm (one for the best action and one for the action
the agent is actually following). You only need to know about
SARSA(lambda).

>More specifically, my question is, to calculate an eligibility trace
>from a state-action pair, are all possible future paths accounted for? 
>and if so, how?  simply added?

YES!

Think of it this way: imagine there was an infinite sequence of people,
and each person had to get a penny from each of the people after them.
The easiest way to implement this is for each person to give a penny to
all of the people before them in the sequence.

The eligibility trace is an implementation of this, where the e[s,a]
number indicates how much Q[s,a] should be updated by the current error.
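
Concretely, the standard SARSA(lambda) update with accumulating traces
(as in Sutton and Barto; check it against page 418 of the notes) is,
after each transition (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}):

  \delta <- r_{t+1} + \gamma Q[s_{t+1},a_{t+1}] - Q[s_t,a_t]
  e[s_t,a_t] <- e[s_t,a_t] + 1

and then, for every pair (s,a):

  Q[s,a] <- Q[s,a] + \alpha \delta e[s,a]
  e[s,a] <- \gamma \lambda e[s,a]

so each pair gets a share of the current error in proportion to its
trace (those are the pennies).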

Does this make sense?


>I'm also wondering if the draft notes are intended for publication in
>the next version of the book, because there are some minor spelling errors.

Yes; we are currently working on the second edition. At the moment we'd
rather hear about things you don't understand and that need to be
explained better than about minor spelling errors (but we want to hear
about those too). Please email them to me or post them here.

David
Message no. 67[Branch from no. 65]
Posted by David Poole (cpsc_422_term2) on Tuesday, February 15, 2005 10:47am
Subject: Re: MT and Assignment question
In message 65 on Tuesday, February 15, 2005 12:18am, Robin McQuinn writes:
>I'm just wondering if discussing the posted midterm here is sanctioned? 

Certainly.

David
Message no. 68[Branch from no. 63]
Posted by Danelle Abra Wettstein (s86800018) on Tuesday, February 15, 2005 3:05pm
Subject: Re: Assignment 3, Question 1
>Run the applet to find the answer to (b). You may have to change a line
>in the code to test the 10/(9+k), but I presume you can all read Java. I
>wanted you to get to look at the code, (the core of the algorithm
>happens at the bottom of do-step).
>

In regards to the applet, how do we know if it "converges"?
Message no. 69[Branch from no. 68]
Posted by Danelle Abra Wettstein (s86800018) on Tuesday, February 15, 2005 3:15pm
Subject: Re: Assignment 3, Question 1
And what is meant by "the environment changes slowly" in part c?
Message no. 70
Posted by David Poole (cpsc_422_term2) on Wednesday, February 16, 2005 1:50pm
Subject: Assignment 3, question 2(a)
For updating the applet, you *only* need to change the method doStep and
add some global variables.

The only trick is to keep it very clear in your mind which is the
current state and the previous state, and the current action and the
previous action. (Be clear about which S and A in SARSA each variable is
referring to).

Good luck. If you spend more than a couple of hours on this, you are
doing something wrong, and you should step back and rethink what you are
doing.

Of course, if you are not familiar with Java, I'd recommend not doing
this question.

David




Message no. 71
Posted by Frank Hutter (s62336011) on Saturday, February 19, 2005 8:15pm
Subject: Q(lambda) description
Q(lambda) is mentioned in David's notes, but it's not explained in
detail. So if you're having problems with it, you may want to have a
look at section 7.6 in the Sutton and Barto book whose HTML version is
linked off the course webpage under "slides used in class".
You don't need to understand this section in detail, just give it a look
- that should already clarify many questions ...

Cheers,
Frank
Message no. 72
Posted by Vivian Luk (s82215013) on Sunday, February 20, 2005 11:59am
Subject: A3 Ques2B - Eligibility Trace
Hi,

Does anyone know any good sites that go into more detail about
eligibility traces?  The class notes, text, and rough notes don't really
say much about it.

Thanks :)
Vivian
Message no. 73[Branch from no. 72]
Posted by Vivian Luk (s82215013) on Sunday, February 20, 2005 12:09pm
Subject: Re: A3 Ques2B - Eligibility Trace
Ah, nvm.  I just noticed Frank's post above.  Thanks anyways.
Message no. 74
Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 2:49pm
Subject: Comments on reinforcement notes

I just read over the reinforcement notes, and I made some comments as I went. Some of these comments may be unnecessary, I'm still working on understanding reinforcement learning, but I think these would be trouble areas. Particularly for someone who didn't have the benefit of the lectures and was only working from the text.

I'd be especially interested to hear if my comments suggest a deeper misunderstanding, what with the midterm coming up.

p410

There could be more context for the v values at the beginning of the temporal differences section.

p411

"you increase the predicted value in proportion to that difference. If the new value is less than the old prediction, we decrease the predicted value by that amount." makes it sound as if increases are affected by alpha (proportional) but decreases are not (by that amount), but they are both affected by alpha, aren't they?

"but it may be a better estimate of the next value if the dynamics is non-stationary." is rather jargon loaded. Could this refer to the environment changing or the real value of vk changing rather than non-stationary dynamics?

p414

"This does encourage exploration, however the agent can hallucinate that some actions are good for a long time, even though there is no real evidence for it. A state only gets to look bad, when all its actions look bad, but when all of these* lead to states that look good, it takes a long time to get a realistic view of the actual values."

Ok, but if all the actions are good, and it's hallucinating that they're all good, then isn't that fairly realistic? At least it isn't mistaking bad actions for good actions. I think this is unclear because the "these" that I've starred refers to the states the actions lead to, but on first reading seems to refer to a second state where the actions lead to good results that is being compared to a state where the actions lead to bad results.

At the bottom of the page, the text jumps directly from explaining backups to talking about lookaheads. Some definition or explanation of lookaheads would be good.

 p415

"However, this is provides an improved estimate of the policy that the agent is actually following. If the agent is following policy p this gives an improved estimate of Qp ." This wasn't clear to me either in class or in the text. What is "the policy that the agent is actually following"? If it's the current function used to select actions, then how can we be estimating it? Don't we know it's exact value? And isn't its value changing as we continue to update Q? What is the policy converging on, if it's an estimate of what we're already doing? The following paragraph does a good job of explaining this, but this paragraph standing alone sounds incredibly confusing.

p417

"One could imagine a version of Q-learning g that uses eligibility traces, however when an action that isn’t optimal is chosen, you need to be careful to make sure that the values cancel out as we did above."

I really don't know where we made what values cancel out, or how we did it. By using all of the n-step lookaheads?

There is this passage:

"Unfortunately this is not a good estimate of the optimal Q-value, Q., as action at+1 may not be the optimal action. For example, if action at+1 were the action that takes the robot into a .10 position, where there were better actions available, you shouldn't update Q[s0, a0]."

but it doesn't explicitly refer to cancelling out.

The notes were certainly a help, particularly with the assignment.

Dave

Message no. 75
Posted by Kaili Elizabeth Vesik (s83834010) on Sunday, February 20, 2005 4:48pm
Subject: Assignment 3, Question 2(a)
I'm working on Question 2 (a), so I've been wandering around the java
code for the Q(lambda) applet, as well as figuring out the differences
in implementation between Q(lambda) and SARSA(lambda) according to the
algorithms in section 7.5 and 7.6 of the Sutton/Barto online text.

Now here's the fun part: I was trying to decide what parts of the code I
needed to change in order to convert the applet to do SARSA(lambda), and
along the way, realized that it looks to me suspiciously like the java
code implements an algorithm that is a hybrid of the two we're concerned
with. Has anyone else noticed this, or am I just horribly confused? It'd
be nice to know either way. ;)

Kaili
Message no. 76
Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 4:58pm
Subject: alpha in Q(lambda) demo
I'm checking out the Q(lambda) applet, and noticed this in the instructions:

"The alpha value, by default uses the counts (so the value is the
average of the experiences). You can also make it a fixed value."

But the usual "fixed" checkbox isn't there. Is this an error? Does alpha
have to be fixed for Q(lambda)? It seems like you could have an alpha
for each state based off a K for each state just like in Q-learning.

Alpha doesn't show up in too many of the Q(lambda) equations, but it
does make an appearance in the first one on p415:
Q[s,a] <- Q[s,a] + alpha * delta_t
and the rest of the section doesn't seem incompatible with either fixed
or dynamic alphas.

I guess I should dive in to the code and find out?

Dave
Message no. 77[Branch from no. 76]
Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 7:10pm
Subject: Re: alpha in Q(lambda) demo
I found it in the code and it's definitely fixed. But the code is also
there to count visits. Why would it not allow a variable alpha based on
the number of visits? Nothing about SARSA would seem to forbid this.

Dave
Message no. 78[Branch from no. 71]
Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 7:24pm
Subject: Re: Q(lambda) description
It should be SARSA(lambda). Please answer question 2(b) assuming
SARSA(lambda).

Sorry about that,
David


In message 71 on Saturday, February 19, 2005 8:15pm, Frank Hutter writes:
>Q(lambda) is mentioned in David's notes, but it's not explained in
>detail. So if you're having problems with it, you may want to have a
>look at section 7.6 in the Sutton and Barto book whose HTML version is
>linked of the course webpage under "slides used in class".
>You don't need to understand this section in detail, just give it a look
>- that should already clarify many questions ...
>
>Cheers,
>Frank
Message no. 79[Branch from no. 75]
Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 7:37pm
Subject: Re: Assignment 3, Question 2(a)
It looks to me like Q(lambda) because it only looks at max_{a'} Q[s',a']
for the expected future value (i.e. for the V term in the delta equation,
using Poole's notation from his notes). I'll take a peek at the
Sutton/Barto material before I go and modify anything rashly, though.

Why do you think it's a mixture?

If you want to chat about it, I'm on msn at davcamer@hotmail.com

Dave
Message no. 80[Branch from no. 76]
Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 8:00pm
Subject: Re: alpha in Q(lambda) demo
I could not work out how to use alpha=1/k using Q(lambda). [It isn't
clear what to use as k]. The naive way of doing it doesn't work.  I
asked Rich Sutton (who is the world expert on reinforcement learning)
and he said that it wasn't appropriate to use a varying alpha. I thought I
had removed all references to a varying alpha in the code, but obviously
I didn't.

In any case, alpha should be fixed.

David
Message no. 81[Branch from no. 74]
Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 9:44pm
Subject: Re: Comments on reinforcement notes

In message 74 on Sunday, February 20, 2005 2:49pm, David Burns Cameron writes:

I just read over the reinforcement notes, and I made some comments as I went. Some of these comments may be unnecessary, I'm still working on understanding reinforcement learning, but I think these would be trouble areas. Particularly for someone who didn't have the benefit of the lectures and was only working from the text.

I'd be especially interested to hear if my comments suggest a deeper misunderstanding, what with the midterm coming up.

p410

There could be more context for the v values at the beginning of the temporal differences section.

p411

"you increase the predicted value in proportion to that difference. If the new value is less than the old prediction, we decrease the predicted value by that amount." makes it sound as if increases are affected by alpha (proportional) but decreases are not (by that amount), but they are both affected by alpha, aren't they?

"but it may be a better estimate of the next value if the dynamics is non-stationary." is rather jargon loaded. Could this refer to the environment changing or the real value of vk changing rather than non-stationary dynamics?

p414

"This does encourage exploration, however the agent can hallucinate that some actions are good for a long time, even though there is no real evidence for it. A state only gets to look bad, when all its actions look bad, but when all of these* lead to states that look good, it takes a long time to get a realistic view of the actual values."

Ok, but if all the actions are good, and it's hallucinating that they're all good, then isn't that fairly realistic? At least it isn't mistaking bad actions for good actions. I think this is unclear because the "these" that I've starred refers to the states the actions lead to, but on first reading seems to refer to a second state where the actions lead to good results that is being compared to a state where the actions lead to bad results.

At the bottom of the page, the text jumps directly from explaining backups to talking about lookaheads. Some definition or explanation of lookaheads would be good.

p415

"However, this is provides an improved estimate of the policy that the agent is actually following. If the agent is following policy p this gives an improved estimate of Qp ." This wasn't clear to me either in class or in the text. What is "the policy that the agent is actually following"? If it's the current function used to select actions, then how can we be estimating it? Don't we know it's exact value? And isn't its value changing as we continue to update Q? What is the policy converging on, if it's an estimate of what we're already doing? The following paragraph does a good job of explaining this, but this paragraph standing alone sounds incredibly confusing.

p417

"One could imagine a version of Q-learning g that uses eligibility traces, however when an action that isn’t optimal is chosen, you need to be careful to make sure that the values cancel out as we did above."

I really don't know where we made what values cancel out, or how we did it. By using all of the n-step lookaheads?

There is this passage:

"Unfortunately this is not a good estimate of the optimal Q-value, Q., as action at+1 may not be the optimal action. For example, if action at+1 were the action that takes the robot into a .10 position, where there were better actions available, you shouldn't update Q[s0, a0]."

but it doesn't explicitly refer to cancelling out.

The notes were certainly a help, particularly with the assignment.

Dave

Message no. 82[Branch from no. 75]
Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 9:59pm
Subject: Re: Assignment 3, Question 2(a)
In message 75 on Sunday, February 20, 2005 4:48pm, Kaili Elizabeth Vesik
writes:
>I'm working on Question 2 (a), so I've been wandering around the java
>code for the Q(lambda) applet, as well as figuring out the differences
>in implementation between Q(lambda) and SARSA(lambda) according to the
>algorithms in section 7.5 and 7.6 of the Sutton/Barto online text.
>
>Now here's the fun part: I was trying to decide what parts of the code I
>needed to change in order to convert the applet to do SARSA(lambda), and
>along the way, realized that it looks to me suspiciously like the java
>code implements an algorithm that is a hybrid of the two we're concerned
>with. Has anyone else noticed this, or am I just horribly confused? It'd
>be nice to know either way. ;)

Yes. It is. Sutton and Barto describe 3 versions of Q(lambda), none of
which actually work (very well). I implemented what they called the
naive version (or at least I tried to). Fortunately (for you), you have
to implement SARSA(lambda), for which there is a well-defined algorithm.
Instead of using max_a Q(s',a) like I did, you should use Q(s',a') where
a' is the action actually used. This makes the math work.

In the next version of the book, Q(lambda) will not be mentioned. I was
originally trying to minimize the number of things to present. I didn't
want to pretend that Q-learning was the only thing in reinforcement
learning. I originally was trying to present averaging over k-step
lookaheads without introducing SARSA, but it doesn't really work.

You only need to change one method, doStep, and some global variables.
You don't even need to look at the other methods, most of which are just
doing UI stuff or calling doStep with appropriate arguments.
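
For concreteness, here is a rough sketch of the tabular SARSA(lambda)
backup with accumulating traces (this is not the applet's actual doStep;
the class name, fields and parameter values are made up for illustration):

class SarsaLambdaSketch {
    double alpha = 0.1, gamma = 0.9, lambda = 0.8; // illustrative values
    double[][] Q, e; // Q-values and traces, [state][action], allocated elsewhere

    // One backup. The TD error uses Q[sPrime][aPrime], the action actually
    // chosen next, instead of max_a Q[sPrime][a] as in Q-learning.
    void step(int s, int a, double reward, int sPrime, int aPrime) {
        double delta = reward + gamma * Q[sPrime][aPrime] - Q[s][a];
        e[s][a] += 1.0; // bump the eligibility of the pair just taken
        for (int i = 0; i < Q.length; i++)
            for (int j = 0; j < Q[i].length; j++) {
                Q[i][j] += alpha * delta * e[i][j]; // credit all eligible pairs
                e[i][j] *= gamma * lambda;          // decay every trace
            }
    }
}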


David

>Kaili
Message no. 83[Branch from no. 81]
Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 10:37pm
Subject: Re: Comments on reinforcement notes
I just typed in a big long reply using the HTML editor and it didn't
seem to work at all!!!

Message no. 85[Branch from no. 84]
Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 10:45pm
Subject: Re: Comments on reinforcement notes
In message 84 on Sunday, February 20, 2005 10:41pm, David Poole writes:
>As far as I can tell the HTML editor doesn't work at all. Here is as
>much as I saved of my reply. There were more comments at the end I will
>try to reconstruct tomorrow.

Let's see if it works better inline... (This is the same as the previous message; it has not been updated). Did I say I hated WebCT?

In message 74 on Sunday, February 20, 2005 2:49pm, David Burns Cameron writes:

I just read over the reinforcement notes, and I made some comments as I went. Some of these comments may be unnecessary, I'm still working on understanding reinforcement learning, but I think these would be trouble areas. Particularly for someone who didn't have the benefit of the lectures and was only working from the text.

Thanks! All comments are appreciated and will help. I will reply here and post a revised set of notes.

I'd be especially interested to hear if my comments suggest a deeper misunderstanding, what with the midterm coming up.

p410

There could be more context for the v values at the beginning of the temporal differences section.

The v's can be any numbers. The fact that they are estimates of future expected value is irrelevant. They can be 0's and 1's to learn probabilities, can be heights to infer the average height, or anything else.

I am not sure what context you want. I thought it was easier to separate out the alpha as purely a way to get average values, and not have it more mixed up with the other parameters of reinforcement learning.

p411

"you increase the predicted value in proportion to that difference. If the new value is less than the old prediction, we decrease the predicted value by that amount." makes it sound as if increases are affected by alpha (proportional) but decreases are not (by that amount), but they are both affected by alpha, aren't they?

Yes. "that amount" was intended to refer to alpha times the difference.

"but it may be a better estimate of the next value if the dynamics is non-stationary." is rather jargon loaded. Could this refer to the environment changing or the real value of vk changing rather than non-stationary dynamics?

OK. I wanted to say something like the process that generates the vi's changes. For example, if you wanted to keep a running prediction of the average grade, but where the average moves over time because schools become more generous when giving grades. Again this is an example where the basic idea has nothing to do with reinforcement learning, but is just about running averages. See if the new version is better.
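
To make the running-average point concrete, here is a toy sketch (the
numbers are invented and this is not from the notes): with a drifting
mean, a fixed alpha keeps tracking, while alpha = 1/k settles on the
long-run average of everything it has seen.

public class RunningAverage {
    public static void main(String[] args) {
        java.util.Random rng = new java.util.Random(0);
        double fixed = 0.0, oneOverK = 0.0;
        for (int k = 1; k <= 2000; k++) {
            double trueMean = (k <= 1000) ? 70.0 : 80.0; // the mean drifts upward
            double v = trueMean + rng.nextGaussian() * 5.0;
            fixed    += 0.05 * (v - fixed);          // fixed alpha: adapts to the drift
            oneOverK += (1.0 / k) * (v - oneOverK);  // alpha = 1/k: exact mean of all v's
        }
        System.out.println("fixed alpha:  " + fixed);    // close to 80
        System.out.println("alpha = 1/k:  " + oneOverK); // close to 75
    }
}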

p414

"This does encourage exploration, however the agent can hallucinate that some actions are good for a long time, even though there is no real evidence for it. A state only gets to look bad, when all its actions look bad, but when all of these* lead to states that look good, it takes a long time to get a realistic view of the actual values."

Ok, but if all the actions are good, and it's hallucinating that they're all good, then isn't that fairly realistic? At least it isn't mistaking bad actions for good actions. I think this is unclear because the "these" that I've starred refers to the states the actions lead to, but on first reading seems to refer to a second state where the actions lead to good results that is being compared to a state where the actions lead to bad results.

The point is that it takes a long time for anything to look bad. Because we are maximizing, a state looks bad only if all of the states leading from it are bad. A state looks good if some of the actions from it lead to states that are good. To see the asymmetry, try the Q-learning applet with a greedy exploit of 0% (i.e., it is only exploring and not exploiting), and compare a high initial value (say 10) with a low initial value (say -10) and an intermediate one (say 0). The starred "these" refers to the actions (doesn't it?).

At the bottom of the page, the text jumps directly from explaining backups to talking about lookaheads. Some definition or explanation of lookaheads would be good.

It is difficult to explain. A backup and a lookahead are the same thing; it depends on where you are standing (from which state's perspective you are looking). I'll try to explain it better.

p415

"However, this is provides an improved estimate of the policy that the agent is actually following. If the agent is following policy p this gives an improved estimate of Qp ." This wasn't clear to me either in class or in the text. What is "the policy that the agent is actually following"? If it's the current function used to select actions, then how can we be estimating it? Don't we know it's exact value? And isn't its value changing as we continue to update Q? What is the policy converging on, if it's an estimate of what we're already doing? The following paragraph does a good job of explaining this, but this paragraph standing alone sounds incredibly confusing.

The current policy the agent is following may be "choose the action that maximizes Q 80% of the time and choose a random action 20% of the time." The value of this policy is not the same as the value of the optimal policy. The policy that the agent is following includes its exploration steps. These may be dangerous. SARSA takes this into account; Q-learning doesn't.

p417

"One could imagine a version of Q-learning g that uses eligibility traces, however when an action that isn’t optimal is chosen, you need to be careful to make sure that the values cancel out as we did above."

I really don't know where we made what values cancel out, or how we did it. By using all of the n-step lookaheads?

If you look at the math, we are using a consistent value for V(st+i). If you tried to push it through for Q(lambda), e.g., in the implementation, sometimes we are using max_a Q(st+i,a) and sometimes Q(st+i,at+i); sometimes the best action and sometimes the action we actually did. The math doesn't make sense if we inconsistently substitute different values for the V(st+i).

There is this passage:

"Unfortunately this is not a good estimate of the optimal Q-value, Q., as action at+1 may not be the optimal action. For example, if action at+1 were the action that takes the robot into a .10 position, where there were better actions available, you shouldn't update Q[s0, a0]."

but it doesn't explicitly refer to cancelling out.

The notes were certainly a help, particularly with the assignment.

Dave

Message no. 86[Branch from no. 83]
Posted by David Poole (cpsc_422_term2) on Sunday, February 20, 2005 10:48pm
Subject: Re: Comments on reinforcement notes
In message 83 on Sunday, February 20, 2005 10:37pm, David Poole writes:
>I just typed in a big long reply using the HTML editor and it didn't
>seem to work at all!!!

Here is as much of the message as I can reconstruct. There was more at the end, but my brain isn't working now. I'll try to reconstruct it tomorrow. Did I say I hated WebCT?

In message 74 on Sunday, February 20, 2005 2:49pm, David Burns Cameron writes:

I just read over the reinforcement notes, and I made some comments as I went. Some of these comments may be unnecessary, I'm still working on understanding reinforcement learning, but I think these would be trouble areas. Particularly for someone who didn't have the benefit of the lectures and was only working from the text.

Thanks! All comments are appreciated and will help. I will reply here and post a revised set of notes.

I'd be especially interested to hear if my comments suggest a deeper misunderstanding, what with the midterm coming up.

p410

There could be more context for the v values at the beginning of the temporal differences section.

The v's can be any numbers. The fact that they are estimates of future expected value is irrelevant. They can be 0's and 1's to learn probabilities, can be heights to infer the average height, or anything else.

I am not sure what context you want. I thought it was easier to separate out the alpha as purely a way to get average values, and not have it more mixed up with the other parameters of reinforcement learning.

p411

"you increase the predicted value in proportion to that difference. If the new value is less than the old prediction, we decrease the predicted value by that amount." makes it sound as if increases are affected by alpha (proportional) but decreases are not (by that amount), but they are both affected by alpha, aren't they?

Yes. "that amount" was intended to refer to alpha times the difference.

"but it may be a better estimate of the next value if the dynamics is non-stationary." is rather jargon loaded. Could this refer to the environment changing or the real value of vk changing rather than non-stationary dynamics?

OK. I wanted to say something like the process that generates the vi's changes. For example, if you wanted to keep a running prediction of the average grade, but where the average moves over time because schools become more generous when giving grades. Again this is an example where the basic idea has nothing to do with reinforcement learning, but is just about running averages. See if the new version is better.

p414

"This does encourage exploration, however the agent can hallucinate that some actions are good for a long time, even though there is no real evidence for it. A state only gets to look bad, when all its actions look bad, but when all of these* lead to states that look good, it takes a long time to get a realistic view of the actual values."

Ok, but if all the actions are good, and it's hallucinating that they're all good, then isn't that fairly realistic? At least it isn't mistaking bad actions for good actions. I think this is unclear because the "these" that I've starred refers to the states the actions lead to, but on first reading seems to refer to a second state where the actions lead to good results that is being compared to a state where the actions lead to bad results.

The point is that it takes a long time for anything to look bad. Because we are maximizing, a state looks bad only if all of the states leading from it are bad. A state looks good if some of the actions from it lead to states that are good. To see the asymmetry, try the Q-learning applet with a greedy exploit of 0% (i.e., it is only exploring and not exploiting), and compare a high initial value (say 10) with a low initial value (say -10) and an intermediate one (say 0). The starred "these" refers to the actions (doesn't it?).

At the bottom of the page, the text jumps directly from explaining backups to talking about lookaheads. Some definition or explanation of lookaheads would be good.

It is difficult to explain. A backup and a lookahead are the same thing; it depends on where you are standing (from which state's perspective you are looking). I'll try to explain it better.

p415

"However, this is provides an improved estimate of the policy that the agent is actually following. If the agent is following policy p this gives an improved estimate of Qp ." This wasn't clear to me either in class or in the text. What is "the policy that the agent is actually following"? If it's the current function used to select actions, then how can we be estimating it? Don't we know it's exact value? And isn't its value changing as we continue to update Q? What is the policy converging on, if it's an estimate of what we're already doing? The following paragraph does a good job of explaining this, but this paragraph standing alone sounds incredibly confusing.

The current policy the agent is following may be "choose the action that maximizes Q 80% of the time and choose a random action 20% of the time." The value of this policy is not the same as the value of the optimal policy. The policy that the agent is following includes its exploration steps. These may be dangerous. SARSA takes this into account; Q-learning doesn't.

p417

"One could imagine a version of Q-learning g that uses eligibility traces, however when an action that isn’t optimal is chosen, you need to be careful to make sure that the values cancel out as we did above."

I really don't know where we made what values cancel out, or how we did it. By using all of the n-step lookaheads?

If you look at the math, we are using a consistent value for V(st+i). If you tried to push it through for Q(lambda), e.g., in the implementation, sometimes we are using max_a Q(st+i,a) and sometimes Q(st+i,at+i); sometimes the best action and sometimes the action we actually did. The math doesn't make sense if we inconsistently substitute different values for the V(st+i).

There is this passage:

"Unfortunately this is not a good estimate of the optimal Q-value, Q., as action at+1 may not be the optimal action. For example, if action at+1 were the action that takes the robot into a .10 position, where there were better actions available, you shouldn't update Q[s0, a0]."

but it doesn't explicitly refer to cancelling out.

The notes were certainly a help, particularly with the assignment.

Dave

Message no. 87[Branch from no. 80]
Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 11:23pm
Subject: Re: alpha in Q(lambda) demo
It's mostly gone. There are some lines commented out, so it seemed like
you had probably taken it out, but I wasn't sure why. The more
confusing part was that the documentation at the bottom of the webpage
the applet is served from specifically says that alpha can be fixed or
varied.

I'm still curious why alpha won't work, so I'll try to explain how I was
thinking it would, which is undoubtedly the broken naive approach. So...

Every action taken results in an infinite series of data that will be
reported back, but will eventually stop being relevant. The factors
(1-lambda)*lambda^n by which these data are discounted sum to 1 (since
the sum over n of (1-lambda)*lambda^n is 1), which makes them equivalent
in magnitude to a one-step backup.

An action can be taken again, and when this happens we want to average
the resulting infinite series in some way. Because they are equivalent
in magnitude to a one-step backup, we can use the same technique of
temporal differences. That's where alpha comes in. Alpha is inversely
proportional to the number of items in the average, and this can be
tracked by counting the number of times the action has been taken.

However, there is an implementation issue because the value of alpha
changes on subsequent visits, but the eligibility traces generated on
each visit are summed and collapsed together. This could be avoided by
changing the eligibility values to be tuples of alpha and e, and then
updating with an equation like

  [equation image lost in the export; from the surrounding description it
  is presumably something like
  Q[s,a] <- Q[s,a] + delta * (alpha_1*e_1[s,a] + alpha_2*e_2[s,a] + ...)]

where alpha and eligibility are now subscripted to indicate the visit
which first generated them. (alpha,e) pairs could be removed from the
list when e decayed past a certain threshold. Storing a list of tuples
would be a lot of extra work but I can't see a way to simplify it.
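
A rough sketch of the bookkeeping I have in mind (illustrative only and
untested; all the names are made up):

class TraceEntry {
    double alpha, e;
    TraceEntry(double alpha, double e) { this.alpha = alpha; this.e = e; }
}

class PairTraces { // per state-action pair
    java.util.List<TraceEntry> entries = new java.util.ArrayList<>();
    int visits = 0;

    void visit() { // called each time this state-action pair is taken
        visits++;
        entries.add(new TraceEntry(1.0 / visits, 1.0)); // alpha_k = 1/k
    }

    double backup(double delta, double gammaLambda) {
        double dq = 0.0;
        for (TraceEntry t : entries) {
            dq += t.alpha * delta * t.e; // each visit contributes with its own alpha
            t.e *= gammaLambda;          // decay every trace
        }
        entries.removeIf(t -> t.e < 1e-4); // prune negligible traces
        return dq; // add this to Q[s,a]
    }
}

The point is just that each visit's contribution keeps the alpha that was
in force when that visit generated it.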

Would it be worth it? Q(lambda) in particular seems unstable, and a
dynamic alpha is designed to increase stability. But then that's trading
off against adaptability again.

Dave
Message no. 88[Branch from no. 85]
Posted by David Burns Cameron (s66878984) on Sunday, February 20, 2005 11:58pm
Subject: Re: Comments on reinforcement notes

Oh dear, I didn't realize what I set off by using the HTML Editor. And it was enough trouble getting it to work in the first place!

p410

The v's can be any numbers. ...

I am not sure what context you want. ...

I realized this after, and it is easier this way. Actually, I guess the problem was that I was expecting them to be in the context of reinforcement learning and they aren't. Something to escape the context, as simple as "In general" or "any values", would help to emphasize that they're arbitrary and that this isn't specific to learning.

p414

"This does encourage exploration, however the agent can hallucinate that some actions (some states?) are good for a long time, even though there is no real evidence for it. A state only gets to look bad, when all its actions look bad, but when all of these* lead to states that look good, it takes a long time to get a realistic view of the actual values."

The point is that it takes a long time for anything to look bad. Because we are maximizing, a state looks bad only if all of the states leading from it are bad. A state looks good if some of the actions from it lead to states that are good. To see the asymmetry, try the Q-learning applet with a greedy exploit of 0% (i.e., it is only exploring and not exploiting), and compare a high initial value (say 10) with a low initial value (say -10) and an intermediate one (say 0).

Thanks for clearing that up.

The starred "these" refers to the actions (doesn't it?).

It does, but it's a little ambiguous because there are two noun phrases on the stack. (A state (all of its actions (these...))) and my poor brain was already cluttered with heaps of maths.

It is difficult to explain. A backup and a lookahead are the same. It depends on where you are standing (from which state's perspective you are taking). I'll try to explain it better.

That's a good clear metaphor, and could work well in the text.

p417

If you look at the math. ...

I'll have to take a longer look at it.

I'm glad you found the comments helpful.

Dave

Message no. 89[Branch from no. 87]
Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 9:00am
Subject: Re: alpha in Q(lambda) demo
As you show, it could be done if you were to store more information for
each state. However,
storing two numbers for each state-action pair is still too much for
real applications. You need to be able to approximate for realistic size
applications. Question 1 will help you answer whether it is better if
you actually reduce alpha as 1/k. 
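
(For the record, here is a small check, with made-up numbers, of what
alpha = 1/k buys you in the plain averaging setting: the TD update then
computes the exact sample mean.)

public class OneOverK {
    public static void main(String[] args) {
        double[] xs = {4.0, 10.0, 1.0, 7.0}; // arbitrary data
        double v = 0.0, sum = 0.0;
        for (int k = 1; k <= xs.length; k++) {
            v += (1.0 / k) * (xs[k - 1] - v); // TD update with alpha = 1/k
            sum += xs[k - 1];
        }
        // both print 5.5: v is exactly the mean of the data seen so far
        System.out.println(v + " == " + (sum / xs.length));
    }
}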

This is how research works. You make conjectures about what will work,
work out the details, and then test to see what works in practice.

David

Message no. 90
Posted by Frank Hutter (s62336011) on Monday, February 21, 2005 5:30pm
Subject: Detailed hints on question 2(b)
Hi everybody,

Unfortunately, we only clarified today that question 2(b) should be done
for SARSA(lambda) instead of Q(lambda) (see David's reply to "Q(lambda)
description" above).

Given that the assignment is already due tomorrow, I thought it may only
be fair to give some hints on how to attack that question.
You only need to trace the algorithm, and you're likely to learn more
about it doing so. Just trace SARSA(lambda) on page 418 of the draft
notes for the following cases (write down the values it assigns to the
different variables):

(Say the eligibility traces and also all the Qs are initialized to zero.)

1) you're in state B and go left the first time. 
(Q[B,left] is increased)

... staying in state A for a long time (=> eligibility trace for
(B,left) goes to 0)

2) you're in state B and go left the second time. 
(Q[B,left] is increased more)

... staying in state A for a long time (=> eligibility trace for
(B,left) goes to 0)

3) you're in state B and go left the third time. 

Do you see a pattern? Against which value will Q[B,left] converge
eventually?


Now the slightly more complicated case:
1a) you're in state B and go right the first time. 
Let's assume you end up in state B this time (this happens with high
probability; you will eventually end up in state A at some point)
1b) you're in state B and go right the second time. 
Again, let's assume you end up in B.
1c) you're in state B and go right the third time. 
Again, let's assume you end up in B.

The eligibility trace e[B,right] is converging to something close to 4. 

1d) You're in state B and go right the Nth time for rather large N.
Assume N is large enough such that the eligibility trace converged to 4
already. Now assume finally you end up in A.
You're getting a reward, and since your eligibility trace is 4+1,
Q[B,right] becomes quite large.

... staying in state A for a long time (=> eligibility trace for
(B,right) goes to 0)

2a) you're visiting state B the second time and go right.
Do the same thing as in 1a) - 1c)

Do you see a pattern? Where do you end up after this? To which value
does Q[B,right] converge? Is it any different from the Q[B,right] value
after 1c)?

Doing the same thing as in 1d) then increases Q[B,right] lots again.

After 1d) (after ending up in A when going right in state B), Q[B,right]
is indeed larger than what Q[B,left] converges to. But Q[B,right] is not
stable.

How could you prevent such a funny behaviour of SARSA(lambda)? Would a
change in alpha make any difference?
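
If you want to double-check the trace arithmetic above, a tiny loop shows
where e[B,right] levels off (gamma*lambda = 0.8 here is only an
illustrative guess; plug in the values from the assignment):

public class TraceFixedPoint {
    public static void main(String[] args) {
        double gammaLambda = 0.8; // illustrative value
        double e = 0.0;
        for (int step = 1; step <= 50; step++) {
            e = gammaLambda * e + 1.0; // decay, then take "right" in B again
            System.out.println(step + ": " + e);
        }
        // e approaches 1/(1 - 0.8) = 5: about 4 just before each increment
        // and about 4+1 just after, matching the numbers above.
    }
}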


Hope that helps. It's more than we wanted to reveal, but I guess it's
only fair since you only got the clarification today.

Good luck,
Frank
Message no. 91[Branch from no. 90]
Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 7:49pm
Subject: Re: Detailed hints on question 2(b)
In message 90 on Monday, February 21, 2005 5:30pm, Frank Hutter writes:
>Hi everybody,
>
>unfortunately, we only clarified today that question 2(b) should be done
>for SARSA(lambda) instead of Q(lambda) (see David's reply to "Q(lambda)
>description" above)

From message 66 (last Tuesday the 15th):

Use the SARSA(lambda) algorithm for this question. The one that is given
on page 418 of the notes. 

David
Message no. 92[Branch from no. 90]
Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 9:51pm
Subject: Re: Detailed hints on question 2(b)
One more clarification for this question.

While thinking about Frank's questions will help you answer the
question, answering them will not answer question 2(b).

You need to actually answer the questions posed in the assignment. 

We want a qualitative description. What would you say to your friend?
What is wrong with your friend's argument? While answering all of Franks
questions may help, they do not provide the answer. The answer is a
concise description that would help your friend. Something to get them
to say "oh I see; I didn't think about it enough."

There are, of course, lots of possible answers to the second question
"What does this example show?" You need to write something sensible.

This isn't meant to be tricky. This is a simple problem. There are only
2 states, 2 actions and only 2 (deterministic) policies (as it doesn't
matter what you do in state A). The two policies are "go right in state
B" and "go left in state B".

David
Message no. 93[Branch from no. 88]
Posted by David Poole (cpsc_422_term2) on Monday, February 21, 2005 10:16pm
Subject: Re: Comments on reinforcement notes
I have uploaded a revised version to
http://www.cs.ubc.ca/spider/poole/ci2/excerpts/reinforcementlearning.pdf

These have not been proofread as carefully as I would like, but I
thought I'd post them just the same.

Note that you are not expected to know the details of SARSA(lambda) for
the midterm, just the general idea. (See the "what is on the midterm"
pointer from the homepage).

Please post or send me any comments on the revised version.

Thanks for your feedback.

David
Message no. 94
Posted by Stephen Shui Fung Mak (s36743003) on Monday, February 21, 2005 10:20pm
Subject: Is tomorrow (Tues) going to be a review session?
Just wondering... because the material presented relating to reinforcement 
learning is quite chaotic, and it would be great if Professor Poole could go 
over the key concepts and some examples one more time before the MT. 
Message no. 95[Branch from no. 90]
Posted by Vivian Luk (s82215013) on Monday, February 21, 2005 10:53pm
Subject: Re: Detailed hints on question 2(b)
I'm a little confused on the wording in the 'draft notes'.

On pg. 417, it states that when the state-action pair is first visited,
the eligibility is set to 1
On pg. 418, it states that e[s,a] is initialized to 0.

?

Thanks!
Message no. 96[Branch from no. 94]
Posted by Vivian Luk (s82215013) on Monday, February 21, 2005 10:58pm
Subject: Re: Is tomorrow (Tues) going to be a review session?
I second a review session. 
That would help to clarify some (most) concepts. :)
Message no. 97[Branch from no. 92]
Posted by Frank Hutter (s62336011) on Monday, February 21, 2005 11:07pm
Subject: Re: Detailed hints on question 2(b)
Oops, so it was already clear last week that you should use SARSA(lambda).

>While thinking about Frank's questions will help you answer 
> the question, answering them will not answer question 2(b).

True, I guess my posting could be misinterpreted a bit - it was only
meant to give you a couple of questions I would suggest answering for
yourself in order to understand the problem (and the friend's argument)
better. At least, they helped some students I explained stuff to.

Clearly, you still need to answer (solely) the actual assignment
questions for full marks.
Frank
Message no. 98[Branch from no. 95]
Posted by Frank Hutter (s62336011) on Monday, February 21, 2005 11:11pm
Subject: Re: Detailed hints on question 2(b)
> I'm a little confused on the wording in the 'draft notes'.

> On pg. 417, it states that when the state-action pair is first visited,
> the eligibility is set to 1
> On pg. 418, it states that e[s,a] is initialized to 0.

Yes, the eligibility traces are all initialized to 0. When you first
visit a state-action pair, its eligibility trace is set to 1, since you
increase its eligibility trace by 1 and 0+1 = 1.

So both formulations are equivalent.

Frank
Message no. 99[Branch from no. 96]
Posted by Michael Nightingale (s98742018) on Tuesday, February 22, 2005 12:53am
Subject: Re: Is tomorrow (Tues) going to be a review session?
Yes, perhaps we could go over some of the practice mt questions?
Message no. 100[Branch from no. 99]
Posted by Christopher John Hawkins (s93985018) on Tuesday, February 22, 2005 1:21am
Subject: Re: Is tomorrow (Tues) going to be a review session?
I agree.  Even a bit of time at the beginning of class where we could
ask questions would be great.
Message no. 101[Branch from no. 94]
Posted by Guan Wang (s77942019) on Tuesday, February 22, 2005 9:29am
Subject: Re: Is tomorrow (Tues) going to be a review session?
yeah, going over the sample mt questions is a good idea :-)
Message no. 102[Branch from no. 94]
Posted by David Poole (cpsc_422_term2) on Tuesday, February 22, 2005 9:32am
Subject: Re: Is tomorrow (Tues) going to be a review session?
In message 94 on Monday, February 21, 2005 10:20pm, Stephen Shui Fung
Mak writes:
>Just wondering... because the material presented relating to reinforcement 
>learning is quite chaotic, and it would be great if Professor Poole could go over 
>the key concepts and some examples one more time before the MT. 

It wasn't going to be, but I am more than happy to answer questions.

David

Message no. 103[Branch from no. 95]
Posted by David Poole (cpsc_422_term2) on Tuesday, February 22, 2005 9:35am
Subject: Re: Detailed hints on question 2(b)
In message 95 on Monday, February 21, 2005 10:53pm, Vivian Luk writes:
>I'm a little confused on the wording in the 'draft notes'.
>
>On pg. 417, it states that when the state-action pair is first visited,
>the eligibility is set to 1
>On pg. 418, it states that e[s,a] is initialized to 0.
>
>?

Yes. What is the problem? When you initialize, no state-action pair has
been visited. When you first visit a state-action pair you add 1 to 0.

David
Message no. 104[Branch from no. 100]
Posted by David Poole (cpsc_422_term2) on Tuesday, February 22, 2005 9:37am
Subject: Re: Is tomorrow (Tues) going to be a review session?
In message 100 on Tuesday, February 22, 2005 1:21am, Christopher John
Hawkins writes:
>I agree.  Even a bit of time at the beginning of class where we could
>ask questions would be great.

There is always time at the start of any lecture to ask as many
questions as you like. Please ask lots; if you don't ask questions, I
assume that you understand.

David

Message no. 105
Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, February 22, 2005 9:57pm
Subject: Assignment 3 solutions
David:

When will the solutions for assignment 3 be posted?

Thanks
Kaili
Message no. 106
Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, February 23, 2005 1:49am
Subject: Sample midterm
In section one (robot control) of the sample midterm, question (g) reads
"Why don't we run the logical specification of a hierarchical controller
using SLD resolution? How can it be implemented efficiently?"

My question is this: what does "it" refer to? The logical specification
of a hierarchical controller in a general sense? Or using SLD Resolution?

Thanks to anyone who can offer some insight on this.
Message no. 107[Branch from no. 106]
Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 9:37am
Subject: Re: Sample midterm
In message 106 on Wednesday, February 23, 2005 1:49am, Kaili Elizabeth
Vesik writes:
>In section one (robot control) of the sample midterm, question (g) reads
>"Why don't we run the logical specification of a hierarchical controller
>using SLD resolution? How can it be implemented efficiently?"
>
>My question is this: what does "it" refer to? The logical specification
>of a hierarchical controller in a general sense? Or using SLD Resolution?

A hierarchical controller (like the CIspace applet). It does not use the
logical specification of was:

was(Fl,V,T0,T) <-
   assign(Fl,V,T0) &
   T0 < T &
   ~ assignedbetween(Fl,T0,T).
assignedbetween(Fl,T0,T) <-
    assign(Fl,V1,T1) &
    T0 < T1 & T1 < T.

Why doesn't it? And how can the controller be implemented efficiently without just running 
those clauses in a logic programming language?

David
Message no. 108[Branch from no. 105]
Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 10:20am
Subject: Re: Assignment 3 solutions
CPSC 422 - Assignment 3 - Solution - Spring 2005

Question 1

(a) (i) and (ii) converge in theory. (iii) doesn't converge. (iv)
        converges too quickly (10000 steps may not be enough).

(b) 

(i) doesn't converge to the correct answer in any reasonable time
    (e.g., within 1000000 steps when initialized at 10.0).

(ii) converges to within 2 significant digits of the correct answer after
1000000 steps. (It even works if the initial value is set to 100 or
-100, after a few million steps). See:
http://www.cs.ubc.ca/spider/poole/demos/rl/q10.html

(iii) It gets a reasonable approximation reasonably quickly. It is
within one significant digit.

(iv) It converges, but not to the correct answer. Even if 10,000 is
replaced by 1,000,000 it doesn't seem to converge to the right answer.

Question 2a

See
http://www.cs.ubc.ca/spider/poole/demos/rl/sarsaLambda.html

Question 2b

This was discussed in class.

The brief answer is that Q[B,left] converges to 10 independently of the
value of the parameters.

When following the policy of going right in state B, Q[B,right] has a
high value when it ends up in state A, and then it decays to zero. The
height of this value is sensitive to the parameters lambda and
alpha. If alpha is reduced (as it is supposed to be) there is no
problem.


Message no. 109
Posted by David Burns Cameron (s66878984) on Wednesday, February 23, 2005 10:53am
Subject: race a robot car in the desert
Hi All

Have you heard about the DARPA Grand Challenge? It's a race across the
California desert, but only for entirely robotic cars! It was last run
in March 2004, and because no teams even came close to completing the
course, it is being run next in October 2005. More background on the
race and last year's running can be found here:
http://en.wikipedia.org/wiki/DARPA_Grand_Challenge

UBC has a team, originally started by the Mining Engineering school, that
has been working on a vehicle since last fall:
http://www.ubcthunderbird.com/ . The roboticization of the vehicle is
nearly complete, and the Discovery channel will be taping the first
teleoperation test this weekend.

But, remote-control won't cut it for the actual race and the software
for the robot has yet to be written. If you're interested in applying AI
techniques, this is a chance to do it on a real world project. The
software challenges include the actual decision making, as well as
integrating many different hardware and software sensing and actuation
systems, and ensuring the whole thing runs in real time.

Software team meetings happen Mondays at 5 o'clock in the Frank Forward
Building, room 519a:
http://www.maps.ubc.ca/PROD/index_detail.php?locat1=562 . If you're
interested, post a reply here or track me down before or after class.
I'm a keener up in the second row with brown hair, square black glasses
and a red backpack.

Dave
Message no. 110
Posted by Guan Wang (s77942019) on Wednesday, February 23, 2005 1:03pm
Subject: textbook exercises
Hi,

  I tried doing some exercises:

  Ex.7.1 in page 278, and came up with {e,g},{h},and {d} as the set of
minimal conflicts.

  
  Ex.9.1 page 343
             a) {hunting},{robbing} are all minimal explanations of get(gun)

             b) {robbing},{hunting,banking} min. explanations of
get(gun) ^goto(bank)

             c) If observe goto(forest) then you can remove {robbing}

  Do these look right?
Thanks !
Message no. 111[Branch from no. 109]
Posted by Christopher John Hawkins (s93985018) on Wednesday, February 23, 2005 2:04pm
Subject: Re: race a robot car in the desert
this sounds really interesting.  are there any prerequisites for getting
involved?
Message no. 112[Branch from no. 110]
Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, February 23, 2005 3:45pm
Subject: Re: textbook exercises
In no way am I saying that my answers are correct, but here's what I got:

Exercise 7.1 (p 278) - I got the same answers as you did.

Exercise 9.1 (p 343) - I got the same answers as you did for (a) and
(b),  but for (c), I think that if you observe puton(goodShoes), then
you can remove {robbing}.

My reasoning is this:
If your minimal explanation is {robbing}, then you can infer get(gun)
and goto(bank), neither of which could possibly lead to any contradictions.
However, if you look at the other explanation, {hunting, banking}, then
you can infer get(gun), goto(forest), and goto(bank).
I noticed that goto(forest) might lead to a contradiction if we also
have puton(goodShoes), so that is what I chose as my answer.
If we observe puton(goodShoes) as well as get(gun) ^ goto(bank), then we
reach a "false" conclusion, which allows us to remove the {hunting,
banking} explanation.

Does anyone else have any opinions on these exercises?
Message no. 113
Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, February 23, 2005 3:54pm
Subject: Questions about practice midterm
1. When the question says "show one step of something (Q-Learning,SARSA,etc)", what 
do I need to answer? Do I just write down the procedure and explain what each step in 
one iteration would do? Is this how I answer the question?

2. Where can I find more information regarding alpha and lambda in SARSA(lambda)? I 
read the draft notes already and went through the alpha=1/k proof, yet I don't know 
how I can describe in words what it does. Also, I can't seem to find any reference related to 
lambda anywhere...

3. Regarding the question about "explain why alpha_k should be reduced as a function of 
k; explain why you may not want to reduce alpha", I do not really understand the 
question. Can someone who has done this question give me some hints or a reference on 
solving this problem?

Thanks,

Stephen
Message no. 114[Branch from no. 113]
Posted by Michael Nightingale (s98742018) on Wednesday, February 23, 2005 4:14pm
Subject: Re: Questions about practice midterm
I would also like to get some feedback on my answers for these two
practice midterm questions, below are the questions that were asked
above, and my responses to them:


Explain why alpha_k should be reduced as function of k. Explain why you
may not want to reduce alpha.

- if alpha_k is a function of k, then each TD-error is given the same
weight, allowing you to weigh more recent values accurately
- you may not want to reduce if you do not want alpha_k as a function of
k when the dynamics are non-stationary

There are a number of parameters in SARSA-lambda: alpha, gamma, lambda.
Explain what each does. Which one affects what is the correct answer?
Which one affects whether it will converge?

- 	α: learning rate, affects the convergence
	γ: how much the future value is discounted
	λ: rate at which the eligibility traces fade with each step, correctness
Message no. 115[Branch from no. 113]
Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 7:38pm
Subject: Re: Questions about practice midterm
In message 113 on Wednesday, February 23, 2005 3:54pm, Stephen Shui Fung
Mak writes:
>1. When the question says "show one step of something (Q-Learning,SARSA,etc)", what 
>do I need to answer? Do I just write down the procedure and explain what each step in 
>one iteration would do? Is this how I answer the question?

You will be told exactly what is expected. You should expect to show
what value is changed and how.  We will give some values (Q or V) and
ask you to show what values get changed. (I will only ask the details
about Value iteration and/or Q-learning; you need to know the general
idea of SARSA(lambda) though).

>2. Where can I find more information regarding alpha and lambda in SARSA(lambda)? I 
>read the draft notes already and went through the alpha=1/k proof, yet I don't know 
>how I can describe in words what it does. Also, I can't seem to find any reference related to 
>lambda anywhere...

alpha provides a way to average a number of values.

For a detailed description of lambda, see the Sutton and Barto book
references from the slides web page.


>3. Regarding the question about "explain why alpha_k should be reduced as a function of 
>k; explain why you may not want to reduce alpha", I do not really understand the 
>question. Can someone who has done this question give me some hints or a reference on 
>solving this problem?

That is what you should have learned doing question 1 of assignment 3.
What happens when it is not reduced? That question gave 3 ways it could
be reduced, all with different properties.

David
Message no. 116[Branch from no. 110]
Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 7:49pm
Subject: Re: textbook exercises
Warning - don't read this till you have tried Ex.7.1 on page 278 and
Ex.9.1 on page 343, as it gives away the answer.....












In message 110 on Wednesday, February 23, 2005 1:03pm, Guan Wang writes:
>Hi,
>
>  I tried doing some exercises:
>
>  Ex.7.1 in page 278, and came up with {e,g},{h},and {d} as the set of
>minimal conflicts.

That is what I got too.

>  
>  Ex.9.1 page 343
>             a) {hunting},{robbing} are all minimal explanations of
get(gun)

Yes.

>             b) {robbing},{hunting,banking} min. explanations of
>get(gun) ^goto(bank)

Yes.

>             c) If observe goto(forest) then you can remove {robbing}

No, not really. It adds the explanation {robbing,walking}.

If you observed puton(goodShoes) then they can't be hunting. So that
removes {hunting,banking} - it is no longer consistent.

David

>  Do these look right?
>Thanks !
Message no. 117[Branch from no. 114]
Posted by David Poole (cpsc_422_term2) on Wednesday, February 23, 2005 8:25pm
Subject: Re: Questions about practice midterm
In message 114 on Wednesday, February 23, 2005 4:14pm, Michael
Nightingale writes:
>I would also like to get some feedback on my answers for these two
>practice midterm questions, below are the questions that were asked
>above, and my responses to them:
>
>
>Explain why alpha_k should be reduced as function of k. Explain why you
>may not want to reduce alpha.
>
>- if alpha_k is a function of k, then each TD-error is given the same
>weight, allowing you to weigh more recent values accurately

In assignment 3, question 1, there were 3 different functions of k, only
one of which gave each TD-error the same weight.


>- you may not want to reduce if you do not want alpha_k as a function of
>k when the dynamics are non-stationary

Right. (Do you know what that means?)

>There are a number of parameters in SARSA-lambda: alpha, gamma, lambda.
>Explain what each does. Which one affects what is the correct answer?
>Which one affects whether it will converge?
>
>- 	α: learning rate, affects the convergence
it also affects the correctness.
>	γ: how much the future value is discounted
it affects what is the correct answer (i.e., correctness is defined in
terms of gamma).
>	λ: rate at which the eligibility traces fade with each step, correctness
lambda doesn't affect correctness. It only affects convergence.
Message no. 118[Branch from no. 111]
Posted by David Burns Cameron (s66878984) on Wednesday, February 23, 2005 10:05pm
Subject: Re: race a robot car in the desert
enthusiasm

:)
Message no. 119
Posted by Stephen Shui Fung Mak (s36743003) on Tuesday, March 1, 2005 5:14pm
Subject: Anyone needs a group? Or any group needs member?
I am looking for a group to join or people to form a group as I don't know 
anyone in this class. Please leave a message at this discussion board if 
anyone is interested. Thanks.
Message no. 120
Posted by Michael Chiang (s27992023) on Tuesday, March 1, 2005 11:07pm
Subject: announcement: assignment marks
Hi all,

Marks for assignments #1 and #2 have been entered in the webct system.
Assignment #3 marks as well as that of the midterm will be posted soon.

Thanks for your patience,
Michael
Message no. 121
Posted by Samuel Douglas Davis (s85850014) on Wednesday, March 2, 2005 1:55am
Subject: Project Description
Could the project description please be posted on the website? I think
this was handed out today but I was late and forgot to pick one up after
class.

Thanks,
Sam
Message no. 122[Branch from no. 121]
Posted by David Poole (cpsc_422_term2) on Wednesday, March 2, 2005 6:39pm
Subject: Re: Project Description
In message 121 on Wednesday, March 2, 2005 1:55am, Samuel Douglas Davis
writes:
>Could the project description please be posted on the website? I think
>this was handed out today but I was late and forgot to pick one up after
>class.
>
>Thanks,
>Sam

It is available from the cs422 home page:

http://www.cs.ubc.ca/spider/poole/cs422/2005/#assignments

David
Message no. 123
Posted by Michael Chiang (s27992023) on Wednesday, March 2, 2005 11:48pm
Subject: assignment #3 and midterm marks
Dear all,

These marks are up, enjoy!

Michael
Message no. 124[Branch from no. 123]
Posted by Dan Shu-Zan Liu (s80395015) on Thursday, March 3, 2005 1:51am
Subject: Re: assignment #3 and midterm marks
Is it just me?  
None of my marks seem to be up.
Maybe only the prof has the function in webct to allow the students to
view their marks, and the TA's can only enter them in.

Message no. 125[Branch from no. 124]
Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 3, 2005 3:00pm
Subject: Re: assignment #3 and midterm marks
None of my grades are up, either. (Not that I really want to see them)
Message no. 126
Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 3, 2005 3:26pm
Subject: Midterm average
What was the class average on the midterm, and will there be any scaling?
Message no. 127[Branch from no. 124]
Posted by David Poole (cpsc_422_term2) on Thursday, March 3, 2005 10:15pm
Subject: Re: assignment #3 and midterm marks
In message 124 on Thursday, March 3, 2005 1:51am, Dan Shu-Zan Liu writes:
>Is it just me?  
>None of my marks seem to be up.
>Maybe only the prof has the function in webct to allow the students to
>view their marks, and the TA's can only enter them in.

Try it now. I changed some of the settings, but I can't really test them. 

David
Message no. 128[Branch from no. 126]
Posted by David Poole (cpsc_422_term2) on Thursday, March 3, 2005 10:19pm
Subject: Re: Midterm average
In message 126 on Thursday, March 3, 2005 3:26pm, Danelle Abra Wettstein
writes:
>What was the class average on the midterm, and will there be any scaling?

The stats will be available on the marks page.

Yes, there will be some scaling. I am not sure why the marks seem lower
this year; it is perhaps that I rearranged the course and misjudged the
difficulty of some of the topics. But I did tell you what was on the exam.

I will post a solution to the midterm tomorrow. Please look at the
solutions. One of the questions (perhaps reworded) will be on the final
exam!

David

Message no. 129[Branch from no. 127]
Posted by Samuel Douglas Davis (s85850014) on Thursday, March 3, 2005 10:23pm
Subject: Re: assignment #3 and midterm marks
>Try it now. I changed some of the settings, but I can't really test them. 
>
>David
>

It still doesn't work.

Sam
Message no. 130
Posted by Robin McQuinn (s12331039) on Friday, March 4, 2005 11:31am
Subject: Midterm/Lecture comments
After looking over the midterm, and everything that I got wrong, I feel
that many of the questions were a bit ambiguous.  I realize this is a
bit premature, not having the intended answers to compare to my own, but
I really do feel like I know many of the concepts that the midterm
thinks I don't.  

for example, it took me several readings of question 1b to incorrectly
understand what was being asked.  I thought that "communication between
layers and communication between time steps" referred to the method and
nature of the information passed between the layers of the controller. 
Now, I'm only sure what's NOT being asked (~horn clause, but incomplete
to reason with)

In a broader scope, I don't feel like the lectures prepare us well
enough for implementing the algorithms in assignments or on tests.  A
significant portion of lecture time is spent on manipulating the Java
applets, and testing them with different scenarios.  The applets are
very interesting but tend to consume a disproportionate amount of time
in the lectures.  The actual algorithms upon which the applets are based
are covered in much less time than necessary.   

In short, I would suggest 3 things:  
Less time on applets in class,
More time on Algorithms, 
Possible examples of specific cases in which the algorithms are applied 
(actually fitting numbers into the equation to calculate by hand the
values for a specific scenario)

Hope that helps
Robin
Message no. 131[Branch from no. 130]
Posted by David Poole (cpsc_422_term2) on Saturday, March 5, 2005 11:05am
Subject: Re: Midterm/Lecture comments
In message 130 on Friday, March 4, 2005 11:31am, Robin McQuinn writes:

>for example, it took me several readings of question 1b to incorrectly
>understand what was being asked.  I thought that "communication between
>layers and communication between time steps" referred to the method and
>nature of the information passed between the layers of the controller. 
>Now, I'm only sure what's NOT being asked (~horn clause, but incomplete
>to reason with)

This question was taken directly from the "what is on the midterm" web
page. Did you look at this?

>In a broader scope, I don't feel like the lectures prepare us well
>enough for implementing the algorithms in assignments or on tests.  A
>significant portion of lecture time is spent on manipulating the Java
>applets, and testing them with different scenarios.  The applets are
>very interesting but tend to consume a disproportionate amount of time
>in the lectures.  The actual algorithms upon which the applets are based
>are covered in much less time than necessary.   

Most of these algorithms are very short. You also had to use them in the
assignments.

You were also told what you were expected to do.  That same web page
said you should be able to do one step of value iteration and steps of
Q-learning. It also said you would use the game domain of assignment 2.

The most important part of these algorithms is the gestalt part; how a
very simple control structure can give rise to complicated behaviour. 
Unfortunately it is difficult to ask questions about this in an exam.

>In short, I would suggest 3 things:  
>Less time on applets in class,
>More time on Algorithms, 
>Possible examples of specific cases in which the algorithms are applied 
>(actually fitting numbers into the equation to calculate by hand the
>values for a specific scenario)

OK, thank you for your comments. I appreciate the feedback.

However, the aim of the lectures isn't to help you pass exams!  It is to
give you some background knowledge, to make you think about the
possibilities and to motivate you to learn more. Unfortunately students
want marks assigned (if not, they are more than welcome to sit in on
lectures), so I have to think up questions that indicate whether they
have got the ideas. I even told you the essence of what was on the exam,
on the grounds that the details of some things were essential to
understanding what was going on, even if for other things, you just need
to get the main idea.

What do others think?

David


>Hope that helps
>Robin
>
Message no. 132[Branch from no. 131]
Posted by Vivian Luk (s82215013) on Saturday, March 5, 2005 1:11pm
Subject: Re: Midterm/Lecture comments
I think Robin brought up many good points.

Though we were told what will be covered on the midterm, the problem
lies with the amount of practice we had for applying concepts/doing
algorithms.  In class, you manipulate Java applets. In assignments, we
manipulate Java applets. In exams, there are no Java applets.  It will
be useful, as per Robin’s suggestion, to spend some more time on
understanding underlying concepts and doing algorithms in class.

I also felt that the midterm could have benefited from a second (or
third) proofreading to ensure the wording is clear and easy to follow. 
With regards to Q3 on the midterm, I spent a lot of unnecessary time
understanding why there were 2 graphs instead of 1 and how they were
connected (in space/time/etc).  

Looking at the midterm solutions, it’s painfully evident that, “ah, of
course that’s the right answer”.  I guess from a professor’s
perspective, that is often what you feel.  But as students who may not
have been given adequate practice in class (on doing algorithms/applying
concepts/etc), the exams/assignments become more difficult than they
should be.

Hope this helps,
Vivian
Message no. 133[Branch from no. 131]
Posted by Sillard Jake Urbanovich (s82244013) on Saturday, March 5, 2005 6:16pm
Subject: Re: Midterm/Lecture comments
Midterm Feedback:

I liked how the lectures tried to engage us in the subject matter, like
the Java applets and when professor Poole brought some toys to class
(robot control).  

In contrast, the midterm seemed very abstract to me.  I came to the exam
and I felt like I entered a Biology or French 200 exam.  I could not use
any skills/knowledge that I acquired in any previous computer science
courses.  

Also, I agree with the previous posters about the wording of the
questions.  I might have gotten some questions completely wrong, not
necessarily because I didn't know the answer, but because the answer I
gave answered a different question than what was asked.  
Message no. 134
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 2:35pm
Subject: Propositions
Okay... a bit confused. The glossary of the textbook gives the following statement:
"Ground atoms denote propositions."

When you look up ground, though, it states this:
"A ground atom is one without any variables."

This is all fine and dandy but, the notes say this:
"A proposition is a boolean formula made from assignments of values to variables."

I'm really, really confused how something that should be ground also uses the 
word 'variables' in it. Could you explain further, and perhaps give an example of a 
proposition?

Thanks.
Message no. 135[Branch from no. 134]
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 2:55pm
Subject: Re: Propositions
Okay... I've mostly solved my confusion about this. Page 349 of the textbook was a big 
help.
Message no. 136
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 3:38pm
Subject: Understanding independence
Could someone who wrote down the answers to questions on page 2 please let me know 
what they are?

Thanks.
Message no. 137[Branch from no. 119]
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 3:55pm
Subject: Re: Anyone needs a group? Or any group needs member?
Hi Stephen, if you haven't yet formed a group, I have a group of 2 that needs more 
members. Let me know by emailing me at miznellie@shaw.ca
Message no. 138
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 4:02pm
Subject: Chapter 10, Lec 3
Can you explain how the two are independent given fire? It seems to be more like they 
are dependent, given fire, especially by the phrase "learning one can affect the other by 
changing your belief in fire".

Thanks.
Message no. 139
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 4:16pm
Subject: d-separation
Can you explain this concept a little further? The B E Z etc is confusing me... what is Z 
and how is B part of it?

Thanks.
Message no. 140[Branch from no. 128]
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 4:47pm
Subject: Re: Midterm average
Have the solutions been posted?
Message no. 141
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 6, 2005 5:26pm
Subject: Assignment 4 - 1a
For conditional probabilities, do you do a probability for every node, for every value of its 
parents? (i.e., a LOT of probabilities) You have an example in the notes, but I can't tell if 
it's just very incomplete or if I'm doing this wrong.

TIA
Message no. 142[Branch from no. 138]
Posted by Samuel Douglas Davis (s85850014) on Sunday, March 6, 2005 6:03pm
Subject: Re: Chapter 10, Lec 3
In message 138 on Sunday, March 6, 2005 4:02pm, Danelle Abra Wettstein
writes:
>Can you explain how the two are independent given fire? It seems to be more like they 
>are dependent, given fire, especially by the phrase "learning one can affect the other by 
>changing your belief in fire".
>
>Thanks.

If fire is not given, then observing smoke might increase your belief in
fire, which in turn increases your belief in alarm, and the 2 are thus
dependent. If fire is given, it means you know for certain whether fire
is true, so nothing that you learn about alarm or smoke can change your
belief in fire, and there is no way that changing your belief in smoke
can affect your belief in alarm, or vice versa.

To put it another way, if you *know* that fire is true, then you expect
alarm to be true with a certain probability, and this probability is not
affected by whether or not you observe smoke.

Sam
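
A quick numeric check of this (a minimal sketch in Python; the network
shape and all the numbers are made up, with smoke and alarm each
depending only on fire):

  # Smoke and alarm each depend only on fire, so given fire they are
  # independent: P(alarm | fire, smoke) = P(alarm | fire).
  p_fire = 0.01
  p_smoke = {True: 0.9, False: 0.05}   # P(smoke=true | fire)
  p_alarm = {True: 0.95, False: 0.1}   # P(alarm=true | fire)

  def joint(fire, smoke, alarm):
      p = p_fire if fire else 1 - p_fire
      p *= p_smoke[fire] if smoke else 1 - p_smoke[fire]
      p *= p_alarm[fire] if alarm else 1 - p_alarm[fire]
      return p

  # P(alarm | fire) and P(alarm | fire, smoke) come out identical.
  p_alarm_fire = joint(True, True, True) + joint(True, False, True)
  p_f = sum(joint(True, s, a) for s in (True, False) for a in (True, False))
  print(p_alarm_fire / p_f)                                      # 0.95
  print(joint(True, True, True)
        / (joint(True, True, True) + joint(True, True, False)))  # 0.95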
Message no. 143[Branch from no. 139]
Posted by Samuel Douglas Davis (s85850014) on Sunday, March 6, 2005 6:17pm
Subject: Re: d-separation
In message 139 on Sunday, March 6, 2005 4:16pm, Danelle Abra Wettstein
writes:
>Can you explain this concept a little further? The B E Z etc is
confusing me... what is Z 
>and how is B part of it?
>
>Thanks.

I found this slide confusing because of the way the definition of a path
is sort of inserted into the definition of d-separation; I think it
might be clearer if path were defined first.

If I understand correctly, X, Y, and Z are sets of variables, and A, B
and C are individual variables. Z is the set of variables that are
given, so "B in Z" means B is given, and "B not in Z" means B is not
given. I think this slide is just stating the ideas of the 3 previous
slides more formally. 

I'm curious what the d in d-separation stands for.

HTH,
Sam
Message no. 144[Branch from no. 134]
Posted by David Poole (cpsc_422_term2) on Monday, March 7, 2005 12:44pm
Subject: Re: Propositions
In message 134 on Sunday, March 6, 2005 2:35pm, Danelle Abra Wettstein
writes:

>I'm really, really confused how something that should be ground also
uses the 
>word 'variables' in it. Could you explain further, and perhaps give an
example of a 
>proposition?

It is because logicians and probabilists use the term "variable" for
different things.  A random variable is not (what the logicians call) a
variable.

In probability, a variable (often called a random variable, but this is
a misnomer as there is nothing random about it) is like an algebraic
variable that can take on a value, such as "today's maximum temperature",
which could take on integer values, say. Then "today's maximum
temperature = 14" is a proposition that is true or false.  This is the
same sort of "variable" you saw in CPSC 322 when you did CSPs.

This is contrasted with a logical variable, which denotes an individual
in a domain.  These are then quantified to specify whether you want a
formula true for all individuals or whether it is true if there exists an
individual for which the formula is true.

When teaching this we have two choices:
(a) we use the traditional notation and try to be clear about whether we
are talking about a random variable or a logical variable, or
(b) we try to think up different names for the two different concepts.
Unfortunately neither choice is very satisfactory as the course doesn't
sit in isolation from the rest of what you have learnt or will learn.

David


Message no. 145[Branch from no. 141]
Posted by David Poole (cpsc_422_term2) on Monday, March 7, 2005 2:55pm
Subject: Re: Assignment 4 - 1a
In message 141 on Sunday, March 6, 2005 5:26pm, Danelle Abra Wettstein
writes:
>For conditional probabilities, do you do a probability for every node,
for every value of its 
>parents? (ie, a LOT of probabilities) You have an example in the notes,
but I can't tell if 
>it's just very incomplete or if I'm doing this wrong.
>
>TIA


Yes. The number of parameters is exponential in the number of parents.
This only works well if there are few parameters (i.e., there are lots of
conditional independencies).

David
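
A quick way to see the blow-up (a minimal sketch; Boolean variables
assumed, the parent counts are made up):

  # One probability per assignment of values to the parents:
  # a Boolean node with k Boolean parents needs a 2^k-row table.
  for k in range(6):
      print(k, 'parents ->', 2 ** k, 'rows')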
Message no. 146[Branch from no. 143]
Posted by David Poole (cpsc_422_term2) on Monday, March 7, 2005 9:30pm
Subject: Re: d-separation
In message 143 on Sunday, March 6, 2005 6:17pm, Samuel Douglas Davis writes:
>In message 139 on Sunday, March 6, 2005 4:16pm, Danelle Abra Wettstein
>writes:
>>Can you explain this concept a little further? The B E Z etc is
>confusing me... what is Z 
>>and how is B part of it?
>>
>>Thanks.
>
>I found this slide confusing because of the way the definition of a path
>is sort of inserted into the definition of d-separation; I think it
>might be clearer if path were defined first.

The trouble with doing it that way is that the notion of a path depends
on what is observed (the Z's).  Z is the set of observed variables.

>If I understand correctly, X, Y, and Z are sets of variables, and A, B
>and C are individual variables. Z is the set of variables that are
>given, so "B in Z" means B is given, and "B not in Z" means B is not
>given. I think this slide is just stating the ideas of the 3 previous
>slides more formally. 

Exactly.

>I'm curious what the d in d-separation stands for.

d stands for "directed". There is a standard notion of separation for
undirected graphs: X and Y are separated by Z, where X, Y, and Z are sets
of variables, if every path from an element of X to an element of Y
contains an element of Z.
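
That undirected notion is easy to check with a graph search (a minimal
sketch in Python; the example graph is made up):

  from collections import deque

  def separated(graph, X, Y, Z):
      # True if every path from X to Y passes through Z, i.e. no
      # search from X that avoids Z can reach Y.
      seen = set(X) - set(Z)
      frontier = deque(seen)
      while frontier:
          node = frontier.popleft()
          if node in Y:
              return False          # found a path that avoids Z
          for nbr in graph[node] - set(Z) - seen:
              seen.add(nbr)
              frontier.append(nbr)
      return True

  # A - B - C: {B} separates {A} from {C}.
  g = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
  print(separated(g, {'A'}, {'C'}, {'B'}))   # True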

Message no. 147[Branch from no. 131]
Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, March 8, 2005 8:26am
Subject: Re: Midterm/Lecture comments
Snippet from David's message #131: 

This question was taken directly from the "what is on the midterm" web
page. Did you look at this?

-------------

While it was rather exciting to be given a superset of the questions
that would be on the exam,  it was also a fairly large drawback. That
is, even though there were many questions that I couldn't do even with
all of my notes and unlimited time while practicing (let alone trying to
do them in an exam situation), I couldn't very well ask for help,
because that would mean getting answers to the test questions.

Perhaps a more advantageous way to express your kindness would be to
provide a practice exam with solutions, instead of the actual questions
on the exam. That way we would still have an idea of the types of things
that would be required of us, but if we had any problems, at least we
would find out prior to the exam how to fix them.

Kaili
Message no. 148
Posted by David Poole (cpsc_422_term2) on Tuesday, March 8, 2005 9:03pm
Subject: Solution to midterm
		      CPSC 422 Midterm Solution 
			      March 2005

Question 1

(a) The belief state consists of Q[S,A], s, a
    Observe s', r
One possible control function is
  do a = argmax_a' Q[s',a'] with probability 1-epsilon
         random action with probability epsilon
State transition function:
   Q[s,a] = Q[s,a] + alpha( r + gamma max_a' Q[s',a'] - Q[s,a] )
   s = s'
   and remember the action it did (i.e., the "a" it selected in the
    control function).

[Answers much simpler than this got full marks, as long as they had
the right idea]
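
In code, the control function and state transition function above look
roughly like this (a minimal sketch in Python; the dictionary encoding of
Q and the epsilon/alpha/gamma values are made up, not part of the
solution):

  import random

  def control(Q, s, actions, epsilon=0.1):
      # Epsilon-greedy: exploit the best-looking action most of the time.
      if random.random() < epsilon:
          return random.choice(actions)
      return max(actions, key=lambda a: Q[(s, a)])

  def transition(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
      # Q-learning update of the belief state after observing s', r.
      best_next = max(Q[(s_next, a2)] for a2 in actions)
      Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
      return s_next   # the new "s"; the agent also remembers the new "a"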

 (b)
  between time steps is handled by fluents (using the relations assign and was)
  between layers is handled by sharing predicate symbols (predicates defined in 
    one layer can be used in another).

Question 2.

There are 15 possible states that could be entered, depending on which
direction the robot actually went (up, left or right) and whether the
treasure arrived, and where it arrived. Those that have a non-zero immediate
reward and/or a future value give:

Q[s13,a2] =
  0.8 * 0.8 * ( 0 + 0.9 * 2)                -- up, no treasure
+ 0.8 * 0.2 * 0.25 * ( 0 + 0.9 * 7)         -- up, treasure at top right
+ 0.1 * 0.8 * ( 0.2 * -10 + 0.9 * 0)        -- left, no treasure
+ 0.1 * 0.2 * ( 0.2 * -10 + 0.9 * 0)        -- left, treasure appears
+ 0.1 * 0.2 * 0.25 * (10 + 0.9*0)           -- right, treasure appears there

Every other value is 0.  The most common mistake was confusing the
immediate reward with the estimated future value.
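
If you want to check the arithmetic, the five terms can be summed
directly (a minimal Python restatement of the numbers above):

  terms = [
      0.8 * 0.8 * (0 + 0.9 * 2),          # up, no treasure
      0.8 * 0.2 * 0.25 * (0 + 0.9 * 7),   # up, treasure at top right
      0.1 * 0.8 * (0.2 * -10 + 0.9 * 0),  # left, no treasure
      0.1 * 0.2 * (0.2 * -10 + 0.9 * 0),  # left, treasure appears
      0.1 * 0.2 * 0.25 * (10 + 0.9 * 0),  # right, treasure appears there
  ]
  print(sum(terms))   # Q[s13,a2] is approximately 1.254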


Question 3

(a) Q[s4,right]=10

(b) Q[s2,right] = -10/2 = -5
    Q[s3,right]=0.9*10/2 = 4.5
    Q[s4,right]=10+0.5*(10-10)=10

(c) Q[s1,right], Q[s2,right], Q[s3,right], Q[s4,right] all get their
values updated when it receives the reward of 10 (i.e., when entering
s5).

(d) The first time, it only uses the new value (i.e., when k=1, alpha=1).
It is guaranteed to converge (we know that as it is between 1/k and 10*1/k).
More recent values are assigned higher weight than old values.

Question 4

(a) 
{at(2,0)}
{at(4,0)}
{at(6,0)}
{at(7,0)}

(b)
{at(2,0),do(right,0),do(right,1)}
{at(4,0),do(right,0),do(right,1)}

(c)
explain(observe(door,0) & do(right,0) & observe(nodoor,1) &
do(right,1) & observe(door, 2) & do(right,2) & at(L,3), E).


Message no. 149
Posted by Stanley Chi Hong Tso (s58635020) on Wednesday, March 9, 2005 8:56pm
Subject: assignment 4 - 1a
The problem I have is: if the student doesn't know how to do bit carry and bit addition, he 
should guess, but he should have an even lower probability of getting the answer right.

Does that make sense? I have DoProblemWithGuess as a variable; do I have to make 
another one like 'understandingMaterial' so that it also affects the outcome?

I'm not sure what I'm asking haha. Everything works fine except for the part when a 
student doesn't know both; what would be the effect of it?

Stan
Message no. 150[Branch from no. 149]
Posted by David Poole (cpsc_422_term2) on Wednesday, March 9, 2005 10:35pm
Subject: Re: assignment 4 - 1a
In message 149 on Wednesday, March 9, 2005 8:56pm, Stanley Chi Hong Tso
writes:
>The problem I have is: if the student doesn't know how to do bit carry
and bit addition, he 
>should guess, but he should have an even lower probability of
getting the answer right.
>
>Does that make sense? 

Not really. If they guess, they have a 50-50 chance of getting any bit
correct.

>I have DoProblemWithGuess as a variable; do I have to make 
>another one like 'understandingMaterial' so that it also affects the
outcome?

There are two parts to their understanding: understanding basic
arithmetic and understanding the carry. Guessing means the conditional
probability of an output bit given that they don't understand is 0.5. If
they do understand, the conditional probability of getting the correct
answer is much higher (but not 1, as students do make mistakes even if
they know the material).


>I'm not sure what I'm asking haha. Everything works fine except for
the part when a 
>student doesn't know both; what would be the effect of it?

Carrying and adding affect different parts. For example, the value of
C_0 is only affected by their understanding of the basic arithmetic.
They don't need to understand carrying to get this answer.  However,
they need to understand carrying to compute the carry bit that is needed
to compute C_1 (but given the carry bit, they only need to understand
basic arithmetic).

I hope this helps.

David


>Stan
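
Read concretely, the table for a single no-carry output bit such as C_0
might look like this (a minimal sketch; only the 0.5-for-a-guess number
comes from the discussion, the 0.9 is made up):

  # P(output bit correct | knows basic arithmetic)
  p_correct = {
      True:  0.9,   # knows the material: high, but not 1 (slips happen)
      False: 0.5,   # doesn't know: a pure guess between two values
  }
  print(p_correct[True], p_correct[False])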
Message no. 151
Posted by Michael Chiang (s27992023) on Wednesday, March 9, 2005 10:38pm
Subject: tomorrow's TA hour moved
Hi all,

Unfortunately it doesn't seem that I'll be able to hold my TA hour
tomorrow (10th March), as I'm feeling quite ill. I will post again
once I've recovered about running an extra TA hour some time.

Sorry for the late notice and any inconvenience this may cause,

Michael
Message no. 152[Branch from no. 151]
Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, March 9, 2005 11:59pm
Subject: Re: tomorrow's TA hour moved
Would it be possible for you to hold your office hour on Monday then? I just 
have some questions about the assignment that I want to ask...
Message no. 153
Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 10, 2005 9:11pm
Subject: "Guess"?
How do we emulate the "guess" in our graph? Can we just assume that if they guess, 
they get it wrong? :)
Message no. 154[Branch from no. 153]
Posted by David Poole (cpsc_422_term2) on Friday, March 11, 2005 8:15am
Subject: Re: "Guess"?
In message 153 on Thursday, March 10, 2005 9:11pm, Danelle Abra
Wettstein writes:
>How do we emulate the "guess" in our graph? Can we just assume that if
they guess, 
>they get it wrong? :)

No. Sometimes they guess right. Just make it so that they can pick a 1
or a 0 with uniform probability. The conditional probabilities can model
any distribution.

David

Message no. 155[Branch from no. 154]
Posted by Stanley Chi Hong Tso (s58635020) on Saturday, March 12, 2005 10:14pm
Subject: Re: "Guess"?
In other words, it's about the probability of getting it right. So guessing makes, e.g., C0 
become 0.5 for getting it right or making a mistake.


Stan
Message no. 156
Posted by Stanley Chi Hong Tso (s58635020) on Saturday, March 12, 2005 10:18pm
Subject: Probability question.
Regarding the notes, in the example David gave us about P(ABCDEFG), we break it 
initially into P(G|ABCDEF) * P(F|ABCDE) * P(C|ABDE) * P(D|ABE) * P(E|AB) * P(A|B) * P(B).

I wonder how P(D|ABE) or P(E|AB) come about. Why isn't D conditioned on only E? 
Why is P(D|ABE) != P(D|E)?

Stan
Message no. 157
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 2:57pm
Subject: Notes on summing
What is the difference between P(Z|Y1 = y1, Y2 = y2...,YK = yK) and P(Z,Y1 = y1, Y2 = 
y2, .., YK = yK)?
Message no. 158
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 3:44pm
Subject: Question 2, a
I get P(E) = 2 and P(not E) = 2 for this answer... what am I doing wrong? I have followed 
the chart way of doing this, given in class. 
Message no. 159
Posted by Guan Wang (s77942019) on Sunday, March 13, 2005 5:57pm
Subject: Assignt 4
Question 1:Do I have to model all the letters as nodes(that is A0, A1, 
B0, B1, C0, C1, C2) + the knows guess nodes? Is this the way of 
solving this question?

Question 2b)Querying p(e|~f) means F is false. I dont understand why 
according to online applet we have initial factor p(c) instead p(f|c).

Thanks,
Guan
Message no. 160[Branch from no. 155]
Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:39pm
Subject: Re: "Guess"?
In message 155 on Saturday, March 12, 2005 10:14pm, Stanley Chi Hong Tso
writes:
>In other words, it's about the probability of getting it right. So
guessing makes, e.g., C0 
>become 0.5 for getting it right or making a mistake.
>
>
>Stan

Right. For C0, there are two values. If you guess with a 0.5 chance for
each there is a 50% chance of getting it right.

It is like a multiple choice exam. If there are 4 alternatives, and you
guess each one, you would expect to get a grade of 25%.

I hope that helps,
David

Message no. 161[Branch from no. 156]
Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:42pm
Subject: Re: Probability question.
In message 156 on Saturday, March 12, 2005 10:18pm, Stanley Chi Hong Tso
writes:
>Regarding the notes, in the example David gave us about P(ABCDEFG),
we break it 
>initially into P(G|ABCDEF) * P(F|ABCDE) * P(C|ABDE) * P(D|ABE) * P(E|AB)
* P(A|B) * P(B).

This is just a theorem of probability theory.

>I wonder how P(D|ABE) or P(E|AB) come about. Why isn't D
conditioned on only E? 
>Why is P(D|ABE) != P(D|E)?

This is the assumption made in a belief network (a Bayes net): a node is
independent of its predecessors given its parents.

David

>Stan
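
The chain rule plus that assumption is what lets a belief network store
only small parent-conditional tables (a minimal sketch for a made-up
three-variable network A -> B -> C, where the assumption gives
P(C|A,B) = P(C|B)):

  p_a = {True: 0.3, False: 0.7}            # P(A)
  p_b_given_a = {True: 0.9, False: 0.2}    # P(B=true | A)
  p_c_given_b = {True: 0.8, False: 0.1}    # P(C=true | B)

  def joint(a, b, c):
      # P(A,B,C) = P(A) * P(B|A) * P(C|B), by the chain rule plus the
      # belief-network independence assumption.
      pa = p_a[a]
      pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
      pc = p_c_given_b[b] if c else 1 - p_c_given_b[b]
      return pa * pb * pc

  print(joint(True, True, True))   # 0.3 * 0.9 * 0.8 = 0.216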
Message no. 162[Branch from no. 157]
Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:44pm
Subject: Re: Notes on summing
In message 157 on Sunday, March 13, 2005 2:57pm, Danelle Abra Wettstein
writes:
>What is the difference between P(Z|Y1 = y1, Y2 = y2...,YK = yK) and
P(Z,Y1 = y1, Y2 = 
>y2, .., YK = yK)?

The first is a conditional probability and the second is the probability
of a conjunction. [Do you know what this means? Read the book/notes, and
if it still doesn't make sense, please ask.] You can easily compute the
first from the second.

David
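
Spelling out that last step (a minimal sketch; the joint table values
are made up): conditioning is just the joint renormalized.

  # P(Z | Y=y) = P(Z, Y=y) / sum_z P(Z=z, Y=y)
  joint = {'z1': 0.1, 'z2': 0.3}            # made-up P(Z=z, Y=y) values
  total = sum(joint.values())
  conditional = {z: p / total for z, p in joint.items()}
  print(conditional)                        # {'z1': 0.25, 'z2': 0.75}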

Message no. 163[Branch from no. 158]
Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:46pm
Subject: Re: Question 2, a
In message 158 on Sunday, March 13, 2005 3:44pm, Danelle Abra Wettstein
writes:
>I get P(E) = 2 and P(not E) = 2 for this answer... what am I doing
wrong? I have followed 
>the chart way of doing this, given in class. 

I have no idea. But this is wrong. The algorithm I gave *always*
produces numbers in the range [0,1] as it is a linear interpolation of
numbers in this range. Did you check your calculations using the applet?

David

Message no. 164[Branch from no. 159]
Posted by David Poole (cpsc_422_term2) on Sunday, March 13, 2005 8:48pm
Subject: Re: Assignt 4
In message 159 on Sunday, March 13, 2005 5:57pm, Guan Wang writes:
>Question 1:Do I have to model all the letters as nodes(that is A0, A1, 
>B0, B1, C0, C1, C2) + the knows guess nodes? Is this the way of 
>solving this question?

Yes, this is *a* way of solving the problem. It seems like a reasonable
way as then you can set observations easily.

>Question 2b)Querying p(e|~f) means F is false. I dont understand why 
>according to online applet we have initial factor p(c) instead p(f|c).

Because you have observed F=false. It is just a function of C.

David



>Thanks,
>Guan
Message no. 165
Posted by David Burns Cameron (s66878984) on Sunday, March 13, 2005 10:17pm
Subject: Assn 4 Q1 assumptions
I'm tempted to assume that the value of a carried digit depends only on
whether or not the student knows how to carry, and not whether or not
the student knows binary addition. But this doesn't quite seem right,
since carrying is the step you do after you have correctly done the
addition step, if it is necessary.

However, if I model the carry digit as depending on whether or not the
student knows binary addition and whether or not the student knows how
to carry, then that makes one variable dependent on 5 variables, and
therefore having 2^5 probabilities. Many are the same, but this still
feels excessive.

I guess we are trying to model a person though, so things should get a
little complicated. What does everyone else think?

Dave
Message no. 166
Posted by David Burns Cameron (s66878984) on Sunday, March 13, 2005 10:28pm
Subject: Assn 4 Q1 prior probabilities
Since we don't know whether the student knows binary addition or not
what is a reasonable prior probability for it? Should we just guess,
given no further information?

Dave
Message no. 167[Branch from no. 160]
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 11:17pm
Subject: Re: "Guess"?
Yes, definitely. Thanks both of you!
Message no. 168[Branch from no. 165]
Posted by Danelle Abra Wettstein (s86800018) on Sunday, March 13, 2005 11:20pm
Subject: Re: Assn 4 Q1 assumptions
I made them independent of one another... i.e., it's possible to not know how to carry even if they do 
know how to add, and the carry is only dependent on whether the person knows how to carry, not on 
whether he/she knows addition.
Message no. 169[Branch from no. 165]
Posted by Samuel Douglas Davis (s85850014) on Sunday, March 13, 2005 11:23pm
Subject: Re: Assn 4 Q1 assumptions
In message 165 on Sunday, March 13, 2005 10:17pm, David Burns Cameron
writes:
>I'm tempted to assume that the value of a carried digit depends only on
>whether or not the student knows how to carry, and not whether or not
>the student knows binary addition. But this doesn't quite seem right,
>since carrying is the step you do after you have correctly done the
>addition step, if it is necessary.

I think it depends on what is meant by "knowing how to carry." I assumed
that carrying is just a function from the inputs to a carry bit, so you
carry always, not just when it is necessary (ie. sometimes you carry a
0). In this case carrying should be independent of whether the student
knows how to do binary addition.

I suppose "knowing how to carry" could be defined as knowing what to do
with the extra bit you get when the sum is greater than 1, in which case
things do get messy. It seems like the answer actually depends on how
the student does addition (either as 2 independent adding and carrying
operations or 1 combined operation), and this could be different for
different people. The way the question is phrased, it sounds to me like
we should assume they are independent, but I'm not certain.

Sam
Message no. 170[Branch from no. 169]
Posted by Daniel Joseph Anderson (s76045996) on Monday, March 14, 2005 12:12am
Subject: Re: Assn 4 Q1 assumptions
Consider this: carrying is the operation of, given a value that is too large to fit in one 
digit, subtracting some amount and compensating it elsewhere to make it fit in the digit in 
question.

Not knowing how to add does not prevent the number under consideration from 
(correctly or in-) being larger than the digit in question can hold.

Thus "I have a number that won't fit, I have to do something with it" can result in 
CORRECT carrying from addition operations of unknown correctness.

If you make different assumptions - eg. can't know how to carry if don't know how to 
add - it dramatically decreases the size of the model. Since this is a "from real life" 
example, though, choosing the model is at least as important as getting the numbers 
right. Which model is right, I can't say - due in no small part to not being absolutely 
certain. *grin*
Message no. 171[Branch from no. 169]
Posted by Daniel Joseph Anderson (s76045996) on Monday, March 14, 2005 12:16am
Subject: Re: Assn 4 Q1 assumptions
Oh, and Samuel: if you assume that carrying is an operation performed even when there 
is no carry bit, you do realize that with correct addition, 00 + 01 gives a potentially 
incorrect result, right? (I was originally going to model it that way - nice and simple - but 
then realized that most people don't think about carrying 0s, and so it's maybe not very 
realistic to act as if they do.)
Message no. 172[Branch from no. 171]
Posted by Daniel Joseph Anderson (s76045996) on Monday, March 14, 2005 12:48am
Subject: Re: Assn 4 Q1 assumptions
And, just to be spammy: note that the problem can be much more efficiently stated if the 
correct answer is stored. Then info (don't want to give it away entirely) dictates whether 
or not knowing/not knowing comes into play for carrying, giving a T/F for whether that 
matters, which then acts upon the true value to give the value that the student in 
question would achieve. So you get a few extra nodes, roughly the same number of arcs, 
but a vastly smaller total for entries in probability tables.

Too bad I've already done it the long way - that would've been easier.
Message no. 173[Branch from no. 170]
Posted by Samuel Douglas Davis (s85850014) on Monday, March 14, 2005 1:15am
Subject: Re: Assn 4 Q1 assumptions
>If you make different assumptions - eg. can't know how to carry if
don't know how to 
>add - it dramatically decreases the size of the model. Since this is a
"from real life" 
>example, though, choosing the model is at least as important as getting
the numbers 
>right. Which model is right, I can't say - due in no small part to not
being absolutely 
>certain. *grin*

I don't think there was any suggestion that the student's knowledge of
how to carry should be dependent on their knowledge of addition. The
question is whether getting a wrong answer in the addition step affects
the chance of getting the carry bit right.
Message no. 174[Branch from no. 165]
Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 10:21am
Subject: Re: Assn 4 Q1 assumptions
In message 165 on Sunday, March 13, 2005 10:17pm, David Burns Cameron
writes:
>I'm tempted to assume that the value of a carried digit depends only on
>whether or not the student knows how to carry, and not whether or not
>the student knows binary addition. But this doesn't quite seem right,
>since carrying is the step you do after you have correctly done the
>addition step, if it is necessary.
>
>However, if I model the carry digit as depending on whether or not the
>student knows binary addition and whether or not the student knows how
>to carry, then that makes one variable dependent on 5 variables, and
>therefore having 2^5 probabilities. Many are the same, but this still
>feels excessive.

It is. You only need 4 parents. If the student needs to know both, you
could create a parent that says they know both (and has both as a
parent). Or you could make knowing how to carry depend on knowing
addition.  Which makes more sense depends on your semantics for the
various variables.  [Eventually we want to get to the stage where the
decisions are based on how the world works, not about the tool.]

David

>I guess we are trying to model a person though, so things should get a
>little complicated. What does everyone else think?
>
>Dave
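
To see the saving from an intermediate "knows both" node (a minimal
sketch; all variables are assumed Boolean, with the two input digits
plus the understanding variables as parents):

  # Carry bit with 4 direct parents (2 digits, knows-addition, knows-carry):
  direct = 2 ** 4                    # 16 rows in its table

  # With an intermediate KnowsBoth node (parents: the two understanding
  # variables), the carry bit has 3 parents (2 digits, KnowsBoth):
  intermediate = 2 ** 2 + 2 ** 3     # 4 + 8 = 12 rows in total
  print(direct, intermediate)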
Message no. 175[Branch from no. 166]
Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 10:22am
Subject: Re: Assn 4 Q1 prior probabilities
In message 166 on Sunday, March 13, 2005 10:28pm, David Burns Cameron
writes:
>Since we don't know whether the student knows binary addition or not
>what is a reasonable prior probability for it? Should we just guess,
>given no further information?
>
>Dave

For the moment, just guess. We will discuss how to learn probabilities
from data later.

David

Message no. 176[Branch from no. 169]
Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 10:26am
Subject: Re: Assn 4 Q1 assumptions
> The way the question is phrased, it sounds to me like
>we should assume they are independent, but I'm not certain.

I was trying to phrase the question to not prejudge any answer, but for
you to think about the domain (as you have done) and to make a choice
that seems reasonable.  I will post a solution, but there is no right
answer.  One of the things to learn is that modelling a domain is
non-trivial (even a seemingly trivial example), but once modelled, we
can answer interesting questions.

David


Message no. 177
Posted by David Poole (cpsc_422_term2) on Monday, March 14, 2005 11:58am
Subject: project proposal feeback
Everyone should have received email commenting on your proposal.  If you
didn't receive an email, please send me an email containing your proposal.

In general we want a project that is of manageable size so that you can
tell us in the presentation: here is one thing that we tried and it
did/didn't work, and we learned...

David

Message no. 178
Posted by David Poole (cpsc_422_term2) on Wednesday, March 16, 2005 1:14pm
Subject: Relevant talk tomorrow
There is an invited speaker talking tomorrow immediately before our
class. David Forsyth will be talking about tracking people. This is
closely related to what we have been covering in class. See:
http://www.cs.ubc.ca/~rbridson/EASS/#mar17

Some of you may find this interesting.

David
Message no. 179
Posted by David Poole (cpsc_422_term2) on Wednesday, March 16, 2005 2:23pm
Subject: web page on RL
I just came across the following web page on
" Common myths and misstatements about reinforcement learning
The ambition"
http://neuromancer.eecs.umich.edu/cgi-bin/twiki/view/Main/MythsofRL

It may make interesting reading given your projects. You should be able
to understand much of it (it isn't as technically complicated as many
other pages).

David

Message no. 180
Posted by Michael Chiang (s27992023) on Monday, March 21, 2005 2:41pm
Subject: project consultation with TAs
Hi all,

Below are available time slots for meeting with either Frank or I for
project consultation. Here are instructions for choosing slots:

(1) Each group is allowed 2 slots of 20 minutes per week leading up to
the due date. The chosen slots will be fixed for this period.
(2) Choose two empty slots by putting the name or student number of ONE
of your group members in the box to the left, and repost the modified
list to this thread using the reply function. We will use the latest
completed version of this list, and please do not alter the choices of
other groups!

----------------------------------------
Frank's slots:
@ room #341

[ ] - Mon 1 ~ 1.20pm
[ ] - Mon 1.20 ~ 1.40pm
[ ] - Mon 1.40 ~ 2pm
[ ] - Mon 2 ~ 2.20pm
[ ] - Mon 2.20 ~ 2.40pm
[ ] - Mon 2.40 ~ 3pm
[ ] - Mon 3 ~ 3.20pm

[ ] - Wed 9.50am ~ 10.10am
[ ] - Wed 10.10 ~ 10.30am
[ ] - Wed 10.30 ~ 10.50am

[ ] - Fri 1 ~ 1.20pm
[ ] - Fri 1.20 ~ 1.40pm
[ ] - Fri 1.40 ~ 2pm
[ ] - Fri 2 ~ 2.20pm
[ ] - Fri 2.20 ~ 2.40pm
[ ] - Fri 2.40 ~ 3pm
[ ] - Fri 3 ~ 3.20pm
[ ] - Fri 3.20 ~ 3.40pm
[ ] - Fri 3.40 ~ 4pm
[ ] - Fri 4 ~ 4.20pm

-----------------------------------------------
Mike's slots:
@ room # 206 (student learning centre)

[ ] - Tue 1 ~ 1.20pm
[ ] - Tue 1.20 ~ 1.40pm
[ ] - Tue 1.40 ~ 2pm
[ ] - Tue 2 ~ 2.20pm
[ ] - Tue 2.20 ~ 2.40pm
[ ] - Tue 2.40 ~ 3pm
[ ] - Tue 3 ~ 3.20pm
[ ] - Tue 3.20 ~ 3.40pm
[ ] - Tue 3.40 ~ 4pm
[ ] - Tue 4 ~ 4.20pm

[ ] - Thu 1 ~ 1.20pm
[ ] - Thu 1.20 ~ 1.40pm
[ ] - Thu 1.40 ~ 2pm
[ ] - Thu 2 ~ 2.20pm
[ ] - Thu 2.20 ~ 2.40pm
[ ] - Thu 2.40 ~ 3pm
[ ] - Thu 3 ~ 3.20pm
[ ] - Thu 3.20 ~ 3.40pm
[ ] - Thu 3.40 ~ 4pm
[ ] - Thu 4 ~ 4.20pm



Message no. 181[Branch from no. 180]
Posted by David Burns Cameron (s66878984) on Monday, March 21, 2005 8:35pm
Subject: Re: project consultation with TAs
----------------------------------------
Frank's slots:
@ room #341

[ ] - Mon 1 ~ 1.20pm
[ ] - Mon 1.20 ~ 1.40pm
[ ] - Mon 1.40 ~ 2pm
[ ] - Mon 2 ~ 2.20pm
[ ] - Mon 2.20 ~ 2.40pm
[ ] - Mon 2.40 ~ 3pm
[ ] - Mon 3 ~ 3.20pm

[ ] - Wed 9.50am ~ 10.10am
[ ] - Wed 10.10 ~ 10.30am
[ ] - Wed 10.30 ~ 10.50am

[ ] - Fri 1 ~ 1.20pm
[ ] - Fri 1.20 ~ 1.40pm
[ ] - Fri 1.40 ~ 2pm
[ ] - Fri 2 ~ 2.20pm
[ ] - Fri 2.20 ~ 2.40pm
[ ] - Fri 2.40 ~ 3pm
[ ] - Fri 3 ~ 3.20pm
[ ] - Fri 3.20 ~ 3.40pm
[ ] - Fri 3.40 ~ 4pm
[ ] - Fri 4 ~ 4.20pm

-----------------------------------------------
Mike's slots:
@ room # 206 (student learning centre)

[ ] - Tue 1 ~ 1.20pm
[ ] - Tue 1.20 ~ 1.40pm
[ ] - Tue 1.40 ~ 2pm
[ ] - Tue 2 ~ 2.20pm
[ ] - Tue 2.20 ~ 2.40pm
[ ] - Tue 2.40 ~ 3pm
[ ] - Tue 3 ~ 3.20pm
[ ] - Tue 3.20 ~ 3.40pm
[ ] - Tue 3.40 ~ 4pm
[ ] - Tue 4 ~ 4.20pm

[ Dave Cameron ] - Thu 1 ~ 1.20pm
[ Dave Cameron ] - Thu 1.20 ~ 1.40pm
[ ] - Thu 1.40 ~ 2pm
[ ] - Thu 2 ~ 2.20pm
[ ] - Thu 2.20 ~ 2.40pm
[ ] - Thu 2.40 ~ 3pm
[ ] - Thu 3 ~ 3.20pm
[ ] - Thu 3.20 ~ 3.40pm
[ ] - Thu 3.40 ~ 4pm
[ ] - Thu 4 ~ 4.20pm
Message no. 182[Branch from no. 181]
Posted by Ryan Yee (s81483042) on Tuesday, March 22, 2005 2:57pm
Subject: Re: project consultation with TAs
----------------------------------------
Frank's slots:
@ room #341

[ ] - Mon 1 ~ 1.20pm
[ ] - Mon 1.20 ~ 1.40pm
[ ] - Mon 1.40 ~ 2pm
[ ] - Mon 2 ~ 2.20pm
[ ] - Mon 2.20 ~ 2.40pm
[ ] - Mon 2.40 ~ 3pm
[ ] - Mon 3 ~ 3.20pm

[ ] - Wed 9.50am ~ 10.10am
[ ] - Wed 10.10 ~ 10.30am
[ ] - Wed 10.30 ~ 10.50am

[ ] - Fri 1 ~ 1.20pm
[ ] - Fri 1.20 ~ 1.40pm
[ ] - Fri 1.40 ~ 2pm
[ ] - Fri 2 ~ 2.20pm
[ ] - Fri 2.20 ~ 2.40pm
[ ] - Fri 2.40 ~ 3pm
[ ] - Fri 3 ~ 3.20pm
[ ] - Fri 3.20 ~ 3.40pm
[ ] - Fri 3.40 ~ 4pm
[ ] - Fri 4 ~ 4.20pm

-----------------------------------------------
Mike's slots:
@ room # 206 (student learning centre)

[ ] - Tue 1 ~ 1.20pm
[ ] - Tue 1.20 ~ 1.40pm
[ ] - Tue 1.40 ~ 2pm
[ ] - Tue 2 ~ 2.20pm
[ ] - Tue 2.20 ~ 2.40pm
[ ] - Tue 2.40 ~ 3pm
[ ] - Tue 3 ~ 3.20pm
[ ] - Tue 3.20 ~ 3.40pm
[ ] - Tue 3.40 ~ 4pm
[ ] - Tue 4 ~ 4.20pm

[ Dave Cameron ] - Thu 1 ~ 1.20pm
[ Dave Cameron ] - Thu 1.20 ~ 1.40pm
[ ] - Thu 1.40 ~ 2pm
[ Ryan Yee ] - Thu 2 ~ 2.20pm
[ Ryan Yee ] - Thu 2.20 ~ 2.40pm
[ ] - Thu 2.40 ~ 3pm
[ ] - Thu 3 ~ 3.20pm
[ ] - Thu 3.20 ~ 3.40pm
[ ] - Thu 3.40 ~ 4pm
[ ] - Thu 4 ~ 4.20pm
Message no. 183[Branch from no. 182]
Posted by Dan Shu-Zan Liu (s80395015) on Tuesday, March 22, 2005 3:26pm
Subject: Re: project consultation with TAs
----------------------------------------
Frank's slots:
@ room #341

[ ] - Mon 1 ~ 1.20pm
[ ] - Mon 1.20 ~ 1.40pm
[ ] - Mon 1.40 ~ 2pm
[ ] - Mon 2 ~ 2.20pm
[ ] - Mon 2.20 ~ 2.40pm
[ ] - Mon 2.40 ~ 3pm
[ ] - Mon 3 ~ 3.20pm

[ ] - Wed 9.50am ~ 10.10am
[ ] - Wed 10.10 ~ 10.30am
[ ] - Wed 10.30 ~ 10.50am

[ ] - Fri 1 ~ 1.20pm
[ ] - Fri 1.20 ~ 1.40pm
[ ] - Fri 1.40 ~ 2pm
[ ] - Fri 2 ~ 2.20pm
[ ] - Fri 2.20 ~ 2.40pm
[ ] - Fri 2.40 ~ 3pm
[ ] - Fri 3 ~ 3.20pm
[ ] - Fri 3.20 ~ 3.40pm
[ ] - Fri 3.40 ~ 4pm
[ ] - Fri 4 ~ 4.20pm

-----------------------------------------------
Mike's slots:
@ room # 206 (student learning centre)

[ ] - Tue 1 ~ 1.20pm
[ ] - Tue 1.20 ~ 1.40pm
[ ] - Tue 1.40 ~ 2pm
[ ] - Tue 2 ~ 2.20pm
[ ] - Tue 2.20 ~ 2.40pm
[ ] - Tue 2.40 ~ 3pm
[ ] - Tue 3 ~ 3.20pm
[ ] - Tue 3.20 ~ 3.40pm
[ ] - Tue 3.40 ~ 4pm
[ ] - Tue 4 ~ 4.20pm

[ Dave Cameron ] - Thu 1 ~ 1.20pm
[ Dave Cameron ] - Thu 1.20 ~ 1.40pm
[ ] - Thu 1.40 ~ 2pm
[ Ryan Yee ] - Thu 2 ~ 2.20pm
[ Ryan Yee ] - Thu 2.20 ~ 2.40pm
[ ] - Thu 2.40 ~ 3pm
[ ] - Thu 3 ~ 3.20pm
[Bob McGregor ] - Thu 3.20 ~ 3.40pm
[Bob McGregor] - Thu 3.40 ~ 4pm
[ ] - Thu 4 ~ 4.20pm
Message no. 184[Branch from no. 183]
Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, March 23, 2005 9:16am
Subject: Re: project consultation with TAs
----------------------------------------
Frank's slots:
@ room #341

[ ] - Mon 1 ~ 1.20pm
[ ] - Mon 1.20 ~ 1.40pm
[ ] - Mon 1.40 ~ 2pm
[ ] - Mon 2 ~ 2.20pm
[ ] - Mon 2.20 ~ 2.40pm
[ ] - Mon 2.40 ~ 3pm
[ ] - Mon 3 ~ 3.20pm

[ ] - Wed 9.50am ~ 10.10am
[ ] - Wed 10.10 ~ 10.30am
[ ] - Wed 10.30 ~ 10.50am

[ ] - Fri 1 ~ 1.20pm
[ ] - Fri 1.20 ~ 1.40pm
[ ] - Fri 1.40 ~ 2pm
[ ] - Fri 2 ~ 2.20pm
[ ] - Fri 2.20 ~ 2.40pm
[ ] - Fri 2.40 ~ 3pm
[ ] - Fri 3 ~ 3.20pm
[ ] - Fri 3.20 ~ 3.40pm
[ ] - Fri 3.40 ~ 4pm
[ ] - Fri 4 ~ 4.20pm

-----------------------------------------------
Mike's slots:
@ room # 206 (student learning centre)

[ ] - Tue 1 ~ 1.20pm
[ ] - Tue 1.20 ~ 1.40pm
[ ] - Tue 1.40 ~ 2pm
[ ] - Tue 2 ~ 2.20pm
[ ] - Tue 2.20 ~ 2.40pm
[ ] - Tue 2.40 ~ 3pm
[ ] - Tue 3 ~ 3.20pm
[ ] - Tue 3.20 ~ 3.40pm
[ ] - Tue 3.40 ~ 4pm
[ ] - Tue 4 ~ 4.20pm

[ Dave Cameron ] - Thu 1 ~ 1.20pm
[ Dave Cameron ] - Thu 1.20 ~ 1.40pm
[ Kaili Vesik ] - Thu 1.40 ~ 2pm
[ Ryan Yee ] - Thu 2 ~ 2.20pm
[ Ryan Yee ] - Thu 2.20 ~ 2.40pm
[ ] - Thu 2.40 ~ 3pm
[ ] - Thu 3 ~ 3.20pm
[Bob McGregor ] - Thu 3.20 ~ 3.40pm
[Bob McGregor] - Thu 3.40 ~ 4pm
[ ] - Thu 4 ~ 4.20pm
Message no. 185
Posted by David Poole (cpsc_422_term2) on Thursday, March 24, 2005 4:00pm
Subject: Assignment 5
The assignment and the decision network for question 2 are attached. Have
a good weekend.

David
See Attached
Message no. 186
Posted by Frank Hutter (s62336011) on Friday, March 25, 2005 9:17am
Subject: extra project consultation hours Tuesday, Mar 29
Hi all,

when Michael and I came up with project consultation hours, none of us
realized that my primary choices of Monday and Friday both coincided with
Easter in the first week.

I guess the least I can do to make up for this is to throw in a few extra
slots next Tuesday (the day after Easter Monday), so you have a chance
to talk about projects in case you're working on them over the weekend.

I'm busy until 5 on Tuesday, so here are a few slots after that:
[ ] - Tu, Mar 29: 5 ~ 5.20pm
[ ] - Tu, Mar 29: 5.20 ~ 5.40pm
[ ] - Tu, Mar 29: 5.40 ~ 6pm
[ ] - Tu, Mar 29: 6 ~ 6.20pm
[ ] - Tu, Mar 29: 6.20 ~ 6.40pm
[ ] - Tu, Mar 29: 6.40 ~ 7pm

Cheers,
Frank
Message no. 187[Branch from no. 184]
Posted by Kaili Elizabeth Vesik (s83834010) on Saturday, March 26, 2005 5:32pm
Subject: Re: project consultation with TAs
----------------------------------------
Frank's slots:
@ room #341

[ ] - Mon 1 ~ 1.20pm
[ ] - Mon 1.20 ~ 1.40pm
[ ] - Mon 1.40 ~ 2pm
[ ] - Mon 2 ~ 2.20pm
[ ] - Mon 2.20 ~ 2.40pm
[ ] - Mon 2.40 ~ 3pm
[ ] - Mon 3 ~ 3.20pm

[ ] - Wed 9.50am ~ 10.10am
[ ] - Wed 10.10 ~ 10.30am
[ ] - Wed 10.30 ~ 10.50am

[ ] - Fri 1 ~ 1.20pm
[ ] - Fri 1.20 ~ 1.40pm
[ ] - Fri 1.40 ~ 2pm
[ ] - Fri 2 ~ 2.20pm
[ ] - Fri 2.20 ~ 2.40pm
[ ] - Fri 2.40 ~ 3pm
[ Kaili Vesik ] - Fri 3 ~ 3.20pm
[ Kaili Vesik] - Fri 3.20 ~ 3.40pm
[ ] - Fri 3.40 ~ 4pm
[ ] - Fri 4 ~ 4.20pm

-----------------------------------------------
Mike's slots:
@ room # 206 (student learning centre)

[ ] - Tue 1 ~ 1.20pm
[ ] - Tue 1.20 ~ 1.40pm
[ ] - Tue 1.40 ~ 2pm
[ ] - Tue 2 ~ 2.20pm
[ ] - Tue 2.20 ~ 2.40pm
[ ] - Tue 2.40 ~ 3pm
[ ] - Tue 3 ~ 3.20pm
[ ] - Tue 3.20 ~ 3.40pm
[ ] - Tue 3.40 ~ 4pm
[ ] - Tue 4 ~ 4.20pm

[ Dave Cameron ] - Thu 1 ~ 1.20pm
[ Dave Cameron ] - Thu 1.20 ~ 1.40pm
[ ] - Thu 1.40 ~ 2pm
[ Ryan Yee ] - Thu 2 ~ 2.20pm
[ Ryan Yee ] - Thu 2.20 ~ 2.40pm
[ ] - Thu 2.40 ~ 3pm
[ ] - Thu 3 ~ 3.20pm
[Bob McGregor ] - Thu 3.20 ~ 3.40pm
[Bob McGregor] - Thu 3.40 ~ 4pm
[ ] - Thu 4 ~ 4.20pm
Message no. 188
Posted by Guan Wang (s77942019) on Wednesday, March 30, 2005 12:46pm
Subject: Finding policy
Hi, I'm confused about how to find the optimal policy. For the car-buying example discussed 
in class, how do I know which nodes to eliminate first (why eliminate car condition first)? What 
is the order of elimination? 

Thanks a lot,
Guan
Message no. 189[Branch from no. 188]
Posted by David Poole (cpsc_422_term2) on Wednesday, March 30, 2005 5:24pm
Subject: Re: Finding policy
In message 188 on Wednesday, March 30, 2005 12:46pm, Guan Wang writes:
>Hi, I'm confused about how to find the optimal policy. For the car-buying
example discussed 
>in class, how do I know which nodes to eliminate first (why eliminate car
condition first)? What 
>is the order of elimination? 
>
>Thanks a lot,
>Guan

Let's talk about that in class tomorrow. But in general, the elimination
order is arbitrary as long as you eliminate a decision variable when it
is in a factor that contains only (some of) its parents. That is, you
eliminate a decision variable's non-parents first, then the decision
variable itself (by maximizing). Apart from this, you can eliminate
variables in any order.

David
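
That constraint can be stated compactly (a minimal sketch in Python; the
variable names and the tiny decision network are made up):

  # A decision variable may be eliminated (by maximizing) only once the
  # factor containing it mentions nothing but (some of) its parents.
  def ok_to_eliminate(decision, factor_vars, parents):
      return factor_vars <= parents[decision] | {decision}

  parents = {'buy': {'report'}}      # made-up decision network
  print(ok_to_eliminate('buy', {'buy', 'report'}, parents))     # True
  print(ok_to_eliminate('buy', {'buy', 'condition'}, parents))  # False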
Message no. 190
Posted by Danelle Abra Wettstein (s86800018) on Wednesday, March 30, 2005 9:02pm
Subject: Solving the decision network?
I'm having trouble solving the decision network. Try to optimize, and it says to add the 
no-forgetting arcs. Try to add the arcs, and it tells me to order the decision variables. 
How do you do that?

And how do you give the decision variables policies, if at all?

Sorry... feel like I should know this from lecture.

Thanks in advance!
Message no. 191
Posted by David Poole (cpsc_422_term2) on Wednesday, March 30, 2005 10:37pm
Subject: CIspace Bayes net applet
We have a new version of the applet available from the CIspace page. It
has been fixed so that it works as long as you use verbose mode and
eliminate the variables manually. Also, remember to save your graph.
Don't add new variables after optimizing. If you do this, it should work
fine. (It is on the list to get fixed this summer, but that isn't much
help to you.)

As an alternative, you can use Netica (downloadable from
http://www.norsys.com/). The free version should be good enough for your
assignment.

Please let us know if you have any problems.

David


Message no. 192[Branch from no. 190]
Posted by David Poole (cpsc_422_term2) on Wednesday, March 30, 2005 10:40pm
Subject: Re: Solving the decision network?
In message 190 on Wednesday, March 30, 2005 9:02pm, Danelle Abra
Wettstein writes:
>I'm having trouble solving the decision network. Try to optimize, and
it says to add the 
>no-forgetting arcs. Try to add the arcs, and it tells me to order the
decision variables. 
>How do you do that?

Please clear your cache and try again. We have uploaded a new version
that should fix this. (Optimize in verbose mode and eliminate variables
by clicking on them).

>And how do you give the decision variables policies, if at all?

By optimizing.... then you can change the policies.

>Sorry... feel like I should know this from lecture.

Sorry that the applet isn't as bug-free as we would like...

>Thanks in advance!

You are welcome. I hope it works now,

David
Message no. 193[Branch from no. 187]
Posted by Blake William Edwards (s83251017) on Thursday, March 31, 2005 5:33pm
Subject: Re: project consultation with TAs
This is for this week right? or did i miss it

Frank's slots:
@ room #341

[ ] - Mon 1 ~ 1.20pm
[ ] - Mon 1.20 ~ 1.40pm
[ ] - Mon 1.40 ~ 2pm
[ ] - Mon 2 ~ 2.20pm
[ ] - Mon 2.20 ~ 2.40pm
[ ] - Mon 2.40 ~ 3pm
[ ] - Mon 3 ~ 3.20pm

[ ] - Wed 9.50am ~ 10.10am
[ ] - Wed 10.10 ~ 10.30am
[ ] - Wed 10.30 ~ 10.50am

[ ] - Fri 1 ~ 1.20pm
[ ] - Fri 1.20 ~ 1.40pm
[ ] - Fri 1.40 ~ 2pm
[ ] - Fri 2 ~ 2.20pm
[Blake Edwards ] - Fri 2.20 ~ 2.40pm
[Blake Edwards ] - Fri 2.40 ~ 3pm
[ Kaili Vesik ] - Fri 3 ~ 3.20pm
[ Kaili Vesik] - Fri 3.20 ~ 3.40pm
[ ] - Fri 3.40 ~ 4pm
[ ] - Fri 4 ~ 4.20pm

-----------------------------------------------
Mike's slots:
@ room # 206 (student learning centre)

[ ] - Tue 1 ~ 1.20pm
[ ] - Tue 1.20 ~ 1.40pm
[ ] - Tue 1.40 ~ 2pm
[ ] - Tue 2 ~ 2.20pm
[ ] - Tue 2.20 ~ 2.40pm
[ ] - Tue 2.40 ~ 3pm
[ ] - Tue 3 ~ 3.20pm
[ ] - Tue 3.20 ~ 3.40pm
[ ] - Tue 3.40 ~ 4pm
[ ] - Tue 4 ~ 4.20pm

[ Dave Cameron ] - Thu 1 ~ 1.20pm
[ Dave Cameron ] - Thu 1.20 ~ 1.40pm
[ ] - Thu 1.40 ~ 2pm
[ Ryan Yee ] - Thu 2 ~ 2.20pm
[ Ryan Yee ] - Thu 2.20 ~ 2.40pm
[ ] - Thu 2.40 ~ 3pm
[ ] - Thu 3 ~ 3.20pm
[Bob McGregor ] - Thu 3.20 ~ 3.40pm
[Bob McGregor] - Thu 3.40 ~ 4pm
[ ] - Thu 4 ~ 4.20pm

Message no. 194
Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 31, 2005 9:27pm
Subject: Project question
I know I'm going to get mocked for this question, but does the report have to be a 
certain length? Can we get a guideline? I just don't want to fall incredibly short of the 
suggested length. Amount of words? Something? The proposal is one thing, but I don't 
want to get marks taken off of the project due to poor format :)
Message no. 195[Branch from no. 194]
Posted by David Poole (cpsc_422_term2) on Thursday, March 31, 2005 10:11pm
Subject: Re: Project question
In message 194 on Thursday, March 31, 2005 9:27pm, Danelle Abra
Wettstein writes:
>I know I'm going to get mocked for this question, but does the report
have to be a 
>certain length? Can we get a guideline? I just don't want to fall
incredibly short of the 
>suggested length. Amount of words? Something? The proposal is one
thing, but I don't 
>want to get marks taken off of the project due to poor format :)
 
Good question. I was wondering when someone was going to ask this!

I would suggest about 3-8 pages typeset (depending on the size of the group).

Here is a suggested outline:

Title + Authors

Abstract: about 100 words

Introduction: what is the problem you are trying to solve

Background: what someone needs to know to read this paper (write it so
that one of your peers can understand what is going on).

Hypothesis: what is it that you are trying to show

Methodology: what you actually did to test the hypothesis

Results: what you discovered (and why we should believe it).

Conclusion and future work: sum up the paper and suggest what other
questions may be interesting based on what you did.

Acknowledgements and References: reference all sources used.

----
I hope that helps,
David
Message no. 196
Posted by David Poole (cpsc_422_term2) on Thursday, March 31, 2005 10:13pm
Subject: Project Presentation Schedule
Here is a schedule for the project presentations. You can switch
times, but you can only switch days with an equal number of people. (I
want to keep the days balanced.)

Each person should plan for 3 minutes + 1 minute for questions. Just
try to tell us one thing that is interesting. Groups should give a
multi-person coordinated talk (i.e., every person should talk, and the
whole presentation of the group should be coherent).

You can use slides (for use with an overhead projector), powerpoint or
pdf. You can bring it on a floppy (remember these?), a USB drive, a CD
or you can email it to me before 8:00pm on the previous day.

If your name is not on this list please email me ASAP.

         *******  Tuesday ********

Costa Vlachos, Tom Pospisil 

David Matheson, Ian Macdonald, Yavar Naddaf

Robin McQuinn, Dave Cameron

Daniel McLaren

Stanley Chiu, Dan Liu, Vivian Luk, Bob McGregor, Sillard Urbanovich,
Guan Wang

       *******  Thursday  *******

David Chong, Kaili Vesik

Bryan Chua

Onur Kamili, Ryan Yee

Wing Hang David Chan, Stanley Tso

Danelle Wettstein, Kevin Irmscher, Stephen Mak

Sam Davis

Blake Edwards

William Fong, Daniel Chang

Message no. 197[Branch from no. 195]
Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 31, 2005 10:17pm
Subject: Re: Project question
Most definitely. That was better than expected :) 

So, a group of 3 people should have about 5 or 6 pages... got it!
Message no. 198
Posted by David Poole (cpsc_422_term2) on Thursday, March 31, 2005 10:20pm
Subject: Today's logic programming & Bayes net example
If you are interested in the example of the multi-digit arithmetic from
today's class, see
http://www.cs.ubc.ca/spider/poole/ci2/code/cilog/CILog2.html
It is the arithmetic.cil that I showed. It is interesting to play with.

If you don't want to look at it that's fine. I promise I won't ask
anything on the final exam about it or the other stuff I covered in the
second half of the class. But I will ask about the value of information.

David

Message no. 199[Branch from no. 197]
Posted by Danelle Abra Wettstein (s86800018) on Thursday, March 31, 2005 10:27pm
Subject: Re: Project question
Actually...

Double-spaced?

And what about w/ images? Should we count those out when doing pages, or do they 
belong in the pages?

No, I'm  never happy with an answer ;)
Message no. 200[Branch from no. 196]
Posted by Christopher John Hawkins (s93985018) on Thursday, March 31, 2005 10:50pm
Subject: Re: Project Presentation Schedule
My group (C.J. Hawkins and Mike Nightingale) seems to have been left off
the list.  
Message no. 201[Branch from no. 199]
Posted by David Poole (cpsc_422_term2) on Friday, April 1, 2005 2:58pm
Subject: Re: Project question
In message 199 on Thursday, March 31, 2005 10:27pm, Danelle Abra
Wettstein writes:
>Actually...
>
>Double-spaced?

No. Just make it as readable as possible.

>And what about w/ images? Should we count those out when doing pages,
or do they 
>belong in the pages?

This is meant to be a rough estimate. Figures are good, but I can't tell
how many words a figure should replace. Use enough words to explain
clearly what you have done and what
you have learned. Use your common sense.

>No, I'm  never happy with an answer ;)

OK. So then I'll give you an answer to keep you unhappy ;^}

David
Message no. 202[Branch from no. 200]
Posted by David Poole (cpsc_422_term2) on Saturday, April 2, 2005 2:34pm
Subject: Re: Project Presentation Schedule
In message 200 on Thursday, March 31, 2005 10:50pm, Christopher John
Hawkins writes:
>My group (C.J. Hawkins and Mike Nightingale) seems to have been left off
>the list.  

You can present on Thursday.

David

Message no. 203
Posted by Kaili Elizabeth Vesik (s83834010) on Saturday, April 2, 2005 10:24pm
Subject: Assignment marks
David,

You mentioned in class that any assignments done prior to the midterm
with grades lower than our midterm grade would have their grades
increased to the value of our midterm grade. Should we expect to see
this reflected in the "grades" section of webct, or will it be
considered only when you calculate our final marks?

Thanks.
Kaili
Message no. 204[Branch from no. 203]
Posted by David Poole (cpsc_422_term2) on Sunday, April 3, 2005 10:55am
Subject: Re: Assignment marks
In message 203 on Saturday, April 2, 2005 10:24pm, Kaili Elizabeth Vesik
writes:
>David,
>
>You mentioned in class that any assignments done prior to the midterm
>with grades lower than our midterm grade would have their grades
>increased to the value of our midterm grade. Should we expect to see
>this reflected in the "grades" section of webct, or will it be
>considered only when you calculate our final marks?
>
>Thanks.
>Kaili

It will be reflected in my program to compute grades. (But it might be
good to remind me closer to the final exam ;^)

David

Message no. 205
Posted by David Poole (cpsc_422_term2) on Sunday, April 3, 2005 11:02am
Subject: Assignment 5
I had one student who had a problem with the assignment because they did
not put in the no-forgetting arcs. You need arcs from previous decisions
and the information available to them into subsequent decisions.
Otherwise the algorithm doesn't work: it never gets to the stage where
it can maximize. Unfortunately the applet doesn't give very good error
messages (it used to, but it was wrong, so we removed it).  That is the
only tricky part of question 1.

David

p.s. if you don't know where to start, it may be easier to start with
question 2. The questions are in this order in the assignment because
logically creating a decision network comes before solving one. But
playing with one may make it easier to know how to construct one.
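
The no-forgetting arcs can be described compactly (a minimal sketch in
Python; the decision names and information parents are made up):

  # No-forgetting: each decision gets arcs from every earlier decision
  # and from each earlier decision's information parents.
  decisions = [('test', {'symptom'}), ('treat', {'result'})]

  remembered = set()
  for name, info in decisions:
      extra = remembered - info      # the no-forgetting arcs to add
      print(name, 'also needs arcs from', sorted(extra))
      remembered |= info | {name}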
Message no. 206
Posted by Daniel Joseph Anderson (s76045996) on Sunday, April 3, 2005 11:42am
Subject: assignment 5
There's mention of assignment 5 here, but it's not on the website... what's up?
Message no. 207[Branch from no. 206]
Posted by David Poole (cpsc_422_term2) on Sunday, April 3, 2005 1:17pm
Subject: Re: assignment 5
In message 206 on Sunday, April 3, 2005 11:42am, Daniel Joseph Anderson
writes:
>There's mention of assignment 5 here, but it's not on the website...
what's up?

It was given out in class and the text is in message 185 (March 24).

David

Message no. 208
Posted by Stephen Shui Fung Mak (s36743003) on Monday, April 4, 2005 12:13am
Subject: Final Exam Practice Questions?
Will there be any given out anytime soon? It would be great if a set of practice final 
exam questions can be given this week so that we can have more time to ask questions 
and prepare for it.

 
Message no. 209[Branch from no. 208]
Posted by David Poole (cpsc_422_term2) on Monday, April 4, 2005 9:26am
Subject: Re: Final Exam Practice Questions?
In message 208 on Monday, April 4, 2005 12:13am, Stephen Shui Fung Mak
writes:
>Will there be any given out anytime soon? It would be great if a set of
practice final 
>exam questions can be given this week so that we can have more time to
ask questions 
>and prepare for it.

OK. I will try to get something out this week. But I can't promise it.

David

Message no. 210[Branch from no. 209]
Posted by Wing Hang Chan (s84098011) on Monday, April 4, 2005 12:03pm
Subject: Re: Final Exam Practice Questions?
It would be great to have the sample questions released early.  Also,
will we be allowed a cheat-sheet like for the midterm?
Message no. 211
Posted by Robert McGregor (s92140011) on Monday, April 4, 2005 8:16pm
Subject: Assignment 5
Hi,

I'm just wondering if there is an update for the decision network applet that has not 
been uploaded.  The current applet on CISpace does not allow you to create decision or 
value nodes...

Thanks,
Bob
Message no. 212[Branch from no. 211]
Posted by Samuel Douglas Davis (s85850014) on Monday, April 4, 2005 9:14pm
Subject: Re: Assignment 5
You have to select Belief/Decision Mode --> Decision Network Mode in the
Network Options menu.

Sam
Message no. 213[Branch from no. 210]
Posted by David Poole (cpsc_422_term2) on Monday, April 4, 2005 9:24pm
Subject: Re: Final Exam Practice Questions?
In message 210 on Monday, April 4, 2005 12:03pm, Wing Hang Chan writes:
>It would be great to have the sample questions released early. 

I agree; it would be great.

> Also,
>will we be allowed a cheat-sheet like for the midterm?

Yes. One sheet of letter sized paper. You can use as many sides of this
one sheet of paper as you like.

David
Message no. 214
Posted by Daniel Wen-Yen Chang (s81965014) on Monday, April 4, 2005 9:42pm
Subject: Assignment 5 Questions
Hi,

I'm just wondering if anyone could give me an example of a utility function, optimal 
decision function, and optimal policy? Probably using the car buying question?

I tried to look at the notes and the things that David has written on the board, but I'm 
still confused as to what is expected from us.

cheers,
dan
Message no. 215[Branch from no. 212]
Posted by Robert McGregor (s92140011) on Monday, April 4, 2005 10:10pm
Subject: Re: Assignment 5
Thanks for the help.

Bob
Message no. 216[Branch from no. 214]
Posted by David Poole (cpsc_422_term2) on Tuesday, April 5, 2005 9:30am
Subject: Re: Assignment 5 Questions
In message 214 on Monday, April 4, 2005 9:42pm, Daniel Wen-Yen Chang writes:
>Hi,
>
>I'm just wondering if anyone could give me an example of a utility
function, optimal 
>decision function, and optimal policy? Probably using the car buying
question?

Have a look at question 2. If you click on the utility (diamond shaped)
node (in the appropriate mode) it shows you the utility function. It
gives the utility for various values of its parents.

After you optimize decisions (use verbose mode, and click on the nodes
to eliminate), you can view the optimal decision functions by clicking
on them.

The optimal policy is the set of the optimal decision functions.

>I tried to look at the notes and the things that David has written on
the board, but I'm 
>still confused as to what is expected from us.
>
>cheers,
>dan
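
To make the three terms concrete (a minimal sketch; the car-buying
variable names and numbers are made up):

  # Utility function: a table over the utility node's parents.
  utility = {('buy', 'good'): 80, ('buy', 'lemon'): -20,
             ('dont_buy', 'good'): 0, ('dont_buy', 'lemon'): 0}

  # A decision function: one decision's choice for each value of its
  # parents (here, the report seen before deciding).
  decision_fn = {'good_report': 'buy', 'bad_report': 'dont_buy'}

  # The optimal policy is the set of optimal decision functions,
  # one per decision node.
  policy = {'buy_decision': decision_fn}
  print(policy)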
Message no. 217[Branch from no. 205]
Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 6, 2005 7:05pm
Subject: Re: Assignment 5
I am having some problems with Question One. Here's what I've done (I've
tried it both online and after downloading the applet onto my own machine):
- created network
- filled in probability/utility tables
- clicked "add no-forgetting arcs" button
- clicked "optimize decisions" button

There are two issues.

First of all, all the utilities I filled into the table are in the range
[0,100], but when I optimize, it says the expected utility is 451 (plus
some decimal). This seems rather strange to me.

Second problem: after I optimize decisions, when I click the
"View/modify decision" button and then a decision variable rectangle, I
get an error message saying "Policy has not been defined yet." When I
click the "Tell me more" option, it says "the network has not yet been
optimized". But didn't I just optimize it?

Could anyone be so kind as to point me in the right direction here?
Thanks!
Message no. 218[Branch from no. 217]
Posted by Samuel Douglas Davis (s85850014) on Wednesday, April 6, 2005 8:28pm
Subject: Re: Assignment 5
In message 217 on Wednesday, April 6, 2005 7:05pm, Kaili Elizabeth Vesik
writes:
>I am having some problems with Question One. Here's what I've done (I've
>tried it both online and after downloading the applet onto my own machine):
>- created network
>- filled in probability/utility tables
>- clicked "add no-forgetting arcs" button
>- clicked "optimize decisions" button
>
>There are two issues.
>
>First of all, all the utilities I filled into the table are in the range
>[0,100], but when I optimize, it says the expected utility is 451 (plus
>some decimal). This seems rather strange to me.
>
>Second problem: after I optimize decisions, when I click the
>"View/modify decision" button and then a decision variable rectangle, I
>get an error message saying "Policy has not been defined yet." When I
>click the "Tell me more" option, it says "the network has not yet been
>optimized". But didn't I just optimize it?
>
>Could anyone be so kind as to point me in the right direction here?
>Thanks!

I had the same problems. Are you using verbose mode and selecting the
nodes to eliminate manually? When I first did it, it wouldn't let me
select all of the nodes for some reason, but when I started over and
created the exact same network from scratch it did work, so you might
try that.

Sam
Message no. 219[Branch from no. 218]
Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 6, 2005 8:34pm
Subject: Re: Assignment 5
Yeah, that's exactly what's happening.

Thanks for the suggestion; I'll try again.
Message no. 220[Branch from no. 218]
Posted by David Poole (cpsc_422_term2) on Wednesday, April 6, 2005 8:49pm
Subject: Re: Assignment 5
>I had the same problems. Are you using verbose mode and selecting the
>nodes to eliminate manually? When I first did it, it wouldn't let me
>select all of the nodes for some reason, but when I started over and
>created the exact same network from scratch it did work, so you might
>try that.

I am not sure what the problem is, but the student who was maintaining
this code assured me that this works (see an earlier message).  We have
hired a student over the summer to fix it, but that doesn't help you.

Sorry about that!

David

Message no. 221[Branch from no. 218]
Posted by David Poole (cpsc_422_term2) on Wednesday, April 6, 2005 8:55pm
Subject: Re: Assignment 5
>I had the same problems. Are you using verbose mode and selecting the
>nodes to eliminate manually? When I first did it, it wouldn't let me
>select all of the nodes for some reason, but when I started over and
>created the exact same network from scratch it did work, so you might
>try that.

Before you try to create the problem again from scratch try the
following: save the graph (copy the text representation into the
clipboard and save it in a text file, or use the save in the downloaded
version), quit, then copy the text representation back. This usually
works (as it doesn't remember some state that otherwise messes things up).

David
Message no. 222[Branch from no. 221]
Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 6, 2005 9:17pm
Subject: Re: Assignment 5
Well, I've tried both suggestions that were given (recreating the entire
network, as well as saving and restarting), and neither has worked.
So it's off to banging my head against the wall some more.

Thanks for your suggestions.

Kaili
Message no. 223[Branch from no. 222]
Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 10:08pm
Subject: Re: Assignment 5
Glad to see I'm not alone when it comes to banging my head on the wall. Strange things 
keep happening when I use the applet, I'm not sure if it's me not understanding the 
material or if the applet is lying to me and has a bug.

Onur
Message no. 224
Posted by Guan Wang (s77942019) on Wednesday, April 6, 2005 10:36pm
Subject: applet messed up
Hi, I used the applet to do question 2 last week and it optimized to 
give 79; now it's giving me 71. Seems like the way it optimizes changed, 
but which one is correct?
Message no. 225[Branch from no. 223]
Posted by Wing Hang Chan (s84098011) on Wednesday, April 6, 2005 10:36pm
Subject: Re: Assignment 5
the applet works fine for my question 1 graph (and my conditional
probabilities)

i would recommend that you play with the question 2 graph first.  it
really helped me :)
Message no. 226[Branch from no. 224]
Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 10:44pm
Subject: Re: applet messed up
The applet optimizes to 79.2 for me still, but I'm not even sure if that's right. It's 
optimizing to 71 without you making any changes or observing anything?

Onur
Message no. 227[Branch from no. 225]
Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 10:46pm
Subject: Re: Assignment 5
For question 2, why does the probability of Trouble2 come out to be 0.16? I keep 
calculating 0.3 and can't seem to figure out how it's getting 0.16. I'm calculating things 
just like I did for the others, but it just doesn't seem to add up.

Onur
Message no. 228[Branch from no. 226]
Posted by Guan Wang (s77942019) on Wednesday, April 6, 2005 10:48pm
Subject: Re: applet messed up
Yeah, strange. I reloaded the graph and it optimizes to 79.2 again. I 
guess no need to panic? :-) 
Message no. 229[Branch from no. 217]
Posted by Onur Komili (s88435045) on Wednesday, April 6, 2005 11:32pm
Subject: Re: Assignment 5
I'm having the exact same problems you are now.

My Expected Utility comes out to 520, when my utility values range from 0 to 100. Also 
I'm getting the "policy not defined" messages you're having after trying to optimize it.

Did you figure out how to solve your problem yet?

Onur
Message no. 230[Branch from no. 229]
Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 6, 2005 11:44pm
Subject: Re: Assignment 5
Are you doing brief or verbose? I find brief gives me some whacked out answer, so I always use 
verbose now.
Message no. 231[Branch from no. 229]
Posted by David Poole (cpsc_422_term2) on Thursday, April 7, 2005 9:57am
Subject: Re: Assignment 5
In message 229 on Wednesday, April 6, 2005 11:32pm, Onur Komili writes:
>I'm having the exact same problems you are now.
>
>My Expected Utility comes out to 520, when my utility values range from
>0 to 100. Also I'm getting the "policy not defined" messages you're
>having after trying to optimize it.
>
>Did you figure out how to solve your problem yet?
>
>Onur

I looked at your code.

All of the decision nodes need to be connected together. And any of the
information from a previous decision needs to be available for the next
decision. It needs to be "no-forgetting".  Otherwise the algorithm
doesn't work. You did not do that, and that is why it doesn't work.

David

Message no. 232
Posted by Wing Hang Chan (s84098011) on Friday, April 8, 2005 1:35pm
Subject: Sample Exam Questions?
Just wondering when we can expect to see some sample exam questions. 
The sooner the better, as I would like to get a head start on studying
for this course.  

Thank you :)

p.s. I enjoyed the class presentations, they were all very interesting
Message no. 233
Posted by Stanley Chi Hong Tso (s58635020) on Saturday, April 9, 2005 9:36pm
Subject: Assignment solutions?
Are there assignment solutions to the recent assignments?

Stan
Message no. 234
Posted by Vivian Luk (s82215013) on Tuesday, April 12, 2005 9:18pm
Subject: March 29th/31st Lecture notes?
Are the lecture notes for March 29th and 31st going to be available? 
Thanks!

# 29 Mar. Value of information and control.
# 31 Mar.  Putting it together.
Message no. 235
Posted by David Poole (cpsc_422_term2) on Wednesday, April 13, 2005 12:32pm
Subject: practice final exam available
There is a practice final exam available from the course web page.  I
had hoped to post this earlier, but I couldn't; sorry about that.

I will post a solution on Monday sometime.

David

Message no. 236[Branch from no. 235]
Posted by Wing Hang Chan (s84098011) on Wednesday, April 13, 2005 3:10pm
Subject: Re: practice final exam available
thank you for posting the sample exam.  will there be solutions
posted for this?
Message no. 237[Branch from no. 236]
Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 13, 2005 6:33pm
Subject: Re: practice final exam available
>I will post a solution on Monday sometime.
>
>David
>


Wing Hang Chan writes:
>thank you for the posting the sample exam.  will there be solutions
>posted for this?



I'm sorry, ask that again?
Message no. 238
Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 13, 2005 11:24pm
Subject: Office hours?
Are the TA and Prof office hours the same for the exam period?
Message no. 239[Branch from no. 237]
Posted by Wing Hang Chan (s84098011) on Thursday, April 14, 2005 3:54am
Subject: Re: practice final exam available
oops i was in such a rush this afternoon i didn't finish reading the
post.  *waits for monday*
Message no. 240
Posted by Michael Chiang (s27992023) on Thursday, April 14, 2005 9:48am
Subject: TA hours (Michael)
Hi all,

I will be running my usual TA hour today at 1pm, in room 106 (accessible
from the atrium, just knock on the door).

Also, I will run a 2-hour session next Monday at 11am in the Student
Learning Centre, CISCR 206.

Good luck with the preparations,
Michael
Message no. 241
Posted by Frank Hutter (s62336011) on Thursday, April 14, 2005 10:37am
Subject: TA hours (Frank)
Hi everybody,

I'll have extra TA hours this Friday 11-1 and next Tuesday 11-12:30
(just before David's office hour). I'll also have my regular office hour
next Wednesday (but better not wait until then ;).

Please note that all these TA hours will be held in my new office in
the new building, office X563.

Cheers,
Frank
Message no. 242
Posted by Vivian Luk (s82215013) on Thursday, April 14, 2005 6:41pm
Subject: How long is the final exam?
Is the final exam 2hrs or 2.5hrs?

Thanks.
Message no. 243
Posted by Stephen Shui Fung Mak (s36743003) on Friday, April 15, 2005 12:37am
Subject: Assignment Solutions????
When will they be posted???
The exam is next Wednesday... and I feel so lost right now...

Message no. 244[Branch from no. 243]
Posted by Kaili Elizabeth Vesik (s83834010) on Friday, April 15, 2005 11:32am
Subject: Re: Assignment Solutions????
To add to the questions re: assignments, David, will we be able to pick
up our marked Assignment 5s sometime before the exam?

Regarding the post I'm replying to, is there anyone who's gotten perfect
on any of the assignments who happens to feel like being kind to the
rest of us and scanning/posting their solutions?
I assume that's not against the rules, since the due dates are all past
now, right?

Thanks!
Kaili
Message no. 245[Branch from no. 242]
Posted by Vivian Luk (s82215013) on Friday, April 15, 2005 12:06pm
Subject: Re: How long is the final exam?
Any reply soon would be greatly appreciated! (I have to give a
presentation at 3:30 right after the 422 final, so I need to contact my
prof if I can't make it on time...)

Thx
Message no. 246[Branch from no. 241]
Posted by Frank Hutter (s62336011) on Friday, April 15, 2005 12:19pm
Subject: Re: TA hours (Frank)
By "new building" I mean CS2, not DMP, in case someone was wondering ...
Message no. 247[Branch from no. 242]
Posted by David Poole (cpsc_422_term2) on Friday, April 15, 2005 1:27pm
Subject: Re: How long is the final exam?
In message 242 on Thursday, April 14, 2005 6:41pm, Vivian Luk writes:
>Is the final exam 2hrs or 2.5hrs?
>
>Thanks.

2.5 hours.

Remember you can bring in 1 letter-sized sheet of paper.

David
Message no. 248[Branch from no. 234]
Posted by David Poole (cpsc_422_term2) on Friday, April 15, 2005 1:46pm
Subject: Re: March 29th/31st Lecture notes?
In message 234 on Tuesday, April 12, 2005 9:18pm, Vivian Luk writes:
>Are the lecture notes for March 29th and 31st going to be available? 
>Thanks!
>
># 29 Mar. Value of information and control.
># 31 Mar.  Putting it together.

No. I don't think there were any.

David

Message no. 249[Branch from no. 233]
Posted by Vivian Luk (s82215013) on Saturday, April 16, 2005 12:01pm
Subject: Re: Assignment solutions?
Will there be solutions to the assignments? Thanks.
Message no. 250
Posted by Michael Chiang (s27992023) on Sunday, April 17, 2005 1:31am
Subject: assignment 4 solutions
Hi all, attached are solutions to assignment 4 (one of the many possible
sets at least) which Frank and I put together. They should be fairly
self-explanatory. If not, feel free to post your questions here and
we'll do our best to answer.

Michael
See Attached
Message no. 251[Branch from no. 250]
Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:29am
Subject: Re: assignment 4 solutions
I think Mike forgot to post this part of our solution to question 2 of
assignment 4 which does the computations by hand:

Question 2a)
Initial factors: f0(A), f1(B), f2(A,B,C), f3(B,D), f4(C,E), f5(C,F)

Eliminate D: sum f3(B,D) over D => new factor f6(B) (all 1s)
Eliminate F: sum f5(C,F) over F => new factor f7(C) (all 1s)
Eliminate A: sum f0(A)*f2(A,B,C) over A => new factor f8(B,C) (t,t:0.16;
t,f: 0.84; f,t:0.76; f,f: 0.24)
Eliminate B: sum f1(B)*f6(B)*f8(B,C) over B => new factor f9(C) (t:0.64,
f:0.36)
Eliminate C: sum f7(C)*f9(C)*f4(C,E) over C => new factor f10(E)
(t:0.52, f:0.48)

This is already normalized, so the result is P(E=t) = 0.52 and P(E=f) = 0.48


Question 2b)
You can reuse most of the computation.

f5(C,F) becomes f5(C) (t: 0.8, f: 0.1)
Note that this is NOT a CPT anymore; it is not normalized!

You also don't need to eliminate F anymore.
The last factor is then f9(E) (t: 0.3656, f: 0.1824)

Normalization yields the end result: P(E=t|F=f) = 0.6672 and P(E=f|F=f)
= 0.3328
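
In case the mechanics of a "sum ... over X" step are unclear, here is a
rough sketch in Python of what each elimination does. A factor is just a
dictionary from tuples of values to numbers. Note that the CPT entries
at the bottom are made-up placeholders, not the actual numbers from the
assignment; this only shows the operation itself.

from itertools import product

def sum_out(var, factors, domain=(True, False)):
    # Multiply all factors that mention `var`, then sum `var` out.
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    new_vars = sorted({v for vs, _ in touching for v in vs if v != var})
    table = {}
    for vals in product(domain, repeat=len(new_vars)):
        asg = dict(zip(new_vars, vals))
        total = 0.0
        for x in domain:
            asg[var] = x
            p = 1.0
            for vs, t in touching:
                p *= t[tuple(asg[v] for v in vs)]
            total += p
        table[vals] = total
    return rest + [(tuple(new_vars), table)]

# e.g. "Eliminate A: sum f0(A)*f2(A,B,C) over A => new factor f8(B,C)",
# with placeholder numbers:
f0 = (('A',), {(True,): 0.4, (False,): 0.6})
f2 = (('A', 'B', 'C'), {v: 0.5 for v in product((True, False), repeat=3)})
factors = sum_out('A', [f0, f2])   # one factor over ('B', 'C') remains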

Message no. 252
Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:42am
Subject: Assignment 5 solutions
Hi everybody,

here's the solution to assignment 5.
For question 1, David's solution is attached as an xml file.

For question 2, here's what Mike and I came up with (the xml file can be
found on the course website):

Question 2a)

The optimal decision function for variable Cheat2 is to always cheat,
except when you cheated before (Cheat1=t) and got trouble (Trouble1=t).
When you didn't cheat but still got trouble, it actually doesn't matter
whether
you cheat in Cheat2 (the utilities are then equal for Cheat2=t and
Cheat2=f, and 
the applet chooses Cheat2=t in that case).

You can compute this decision function with variable elimination as follows.
Like in assignment 4 for Bayesian networks, we eliminate one variable at a 
time. Random variables are eliminated by summing over them, and decision
variables 
are eliminated by taking the maximum of their outcomes (there's an
example later on).

Initial factors: 
f0(Cheat2, Trouble1, Watched, Trouble2) (from CPT for Trouble2)
f1(Trouble1, Cheat1, Watched) (from CPT of Trouble1)
f2(Watched) (from CPT of Watched)
f3(Trouble2, Cheat2, Utility) (from table for the utility node)

Eliminate Trouble2: sum f0 x f3 over Trouble2 => new factor f4(Trouble1,
Watched, Cheat2, Utility)
This is factor f4:
Trouble1   Watched   Cheat2    Utility
t          t         t         30
t          t         f         49
t          f         t         79
t          f         f         49
f          t         t         44
f          t         f         70
f          f         t         100
f          f         f         70

Eliminate Watched: sum f1 x f2 x f4 over Watched => new factor
f5(Trouble1, Cheat1, Cheat2, Utility)
This is factor f5:
Trouble1   Cheat1    Cheat2    Utility
t          t         t         9.6
t          t         f         15.68
t          f         t         0
t          f         f         0
f          t         t         63.52
f          t         f         47.6
f          f         t         77.6
f          f         f         70

From this, you can already extract the decision function for Cheat2:
Trouble1   Cheat1   Cheat2
t          t        f (since 15.68 > 9.6)
t          f        t (arbitrary, could just as well be f, since 0==0)
f          t        t (since 63.52 > 47.6)
f          f        t (since 77.6 > 70)

You read this as follows: 
"if Trouble1=t and Cheat1=t, I will not cheat the second time
(Cheat2=f)" etc.



Question 2b)

Eliminate Cheat2: maximize f5 over Cheat2 => new factor f6(Trouble1,
Cheat1, Utility)
(the maximization is the same as the one we did at the end of question
2a - just keep the higher entry)
This is factor f6:
Trouble1   Cheat1    Utility
t          t         15.68
t          f         0
f          t         63.52
f          f         77.6


Eliminate Trouble1: sum f6 over Trouble1 => new factor f7(Cheat1, Utility)
This is factor f7:
Cheat1   Utility
t        79.2
f        77.6

Thus, the decision function for Cheat1 always chooses Cheat1=t.

The optimal policy is the combination of the decision functions for
Cheat1 and Cheat2.

If you want to compute the expected utility (which was not asked for),
you need to eliminate Cheat1 as well:
Eliminate Cheat1: maximize f7 over Cheat1 => factor f8(Utility). 
This yields the expected utility 79.2 when you follow the optimal policy.

Note that the expected utility of a policy merely says how well you do
on average with this policy.
It is NOT the policy itself. The policy consists of decision functions
for each decision variable. 
(Many people mixed this up)
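
If you want to check these last steps mechanically, here is a small
Python sketch that reproduces the eliminations starting from factor f5
above. The numbers are exactly the ones in the f5 table; only the code
around them is mine.

# f5(Trouble1, Cheat1, Cheat2) -> utility, from the table above
f5 = {
    (True,  True,  True):  9.6,   (True,  True,  False): 15.68,
    (True,  False, True):  0.0,   (True,  False, False): 0.0,
    (False, True,  True):  63.52, (False, True,  False): 47.6,
    (False, False, True):  77.6,  (False, False, False): 70.0,
}

# Eliminate Cheat2 by maximizing; this also yields the decision function.
f6, cheat2_policy = {}, {}
for t1 in (True, False):
    for c1 in (True, False):
        best = max((True, False), key=lambda c2: f5[(t1, c1, c2)])
        cheat2_policy[(t1, c1)] = best
        f6[(t1, c1)] = f5[(t1, c1, best)]

# Eliminate Trouble1 by summing: f7(Cheat1)
f7 = {c1: f6[(True, c1)] + f6[(False, c1)] for c1 in (True, False)}

# Eliminate Cheat1 by maximizing: optimal first decision and the EU
best_c1 = max((True, False), key=lambda c1: f7[c1])
print(cheat2_policy)         # Cheat2=f only when Trouble1=t and Cheat1=t
print(best_c1, f7[best_c1])  # True 79.2 (up to float rounding)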
Message no. 253[Branch from no. 249]
Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:46am
Subject: Re: Assignment solutions?
Ok, now there should be solutions for all assignments:

Solutions for assignments 1 and 2 are on the course website.
David posted a solution to assignment 3 on February 23.
Mike posted a solution to assignment 4 earlier today.
I just posted a solution to assignment 5.

If you don't find or understand them, please ask.

Cheers,
Frank
Message no. 254[Branch from no. 228]
Posted by Frank Hutter (s62336011) on Sunday, April 17, 2005 2:48am
Subject: Re: applet messed up
79.2 is correct for the expected utility - see the solution I just posted.
I hope there's no new bug with the applet.

Frank
Message no. 255[Branch from no. 251]
Posted by Michael Chiang (s27992023) on Sunday, April 17, 2005 10:10am
Subject: Re: assignment 4 solutions
Oops, good save Frank! Thanks.

M.
Message no. 256[Branch from no. 253]
Posted by Vivian Luk (s82215013) on Sunday, April 17, 2005 11:04am
Subject: Re: Assignment solutions?
Thanks!!!
Message no. 257[Branch from no. 252]
Posted by Kaili Elizabeth Vesik (s83834010) on Sunday, April 17, 2005 11:49am
Subject: Re: Assignment 5 solutions
There doesn't seem to be an xml file attached. Was there supposed to be?

Kaili
Message no. 258[Branch from no. 245]
Posted by Samuel Douglas Davis (s85850014) on Sunday, April 17, 2005 2:12pm
Subject: Re: How long is the final exam?
In message 245 on Friday, April 15, 2005 12:06pm, Vivian Luk writes:
>Any reply soon would be greatly appreciated! (I have to give a
>presentation at 3:30 right after the 422Final so I need to contact my
>prof if I can't make it on time...)
>
>Thx

Isn't the 422 final at 3:30?
Message no. 259[Branch from no. 258]
Posted by Vivian Luk (s82215013) on Sunday, April 17, 2005 4:23pm
Subject: Re: How long is the final exam?
I meant 5:30 :)
Message no. 260
Posted by Samuel Douglas Davis (s85850014) on Sunday, April 17, 2005 6:11pm
Subject: SARSA(lambda)
I think I've misunderstood something all along. The notes on
reinforcement learning say that the algorithm given for SARSA(lambda)
"specifies that Q[s, a] is updated for every state s and action a
whenever a new reward is received"; however, it seems to me that it says
that Q is updated at each step, regardless of whether there is a reward
or not. Am I right in thinking that the algorithm only specifies what to
do when r != 0, and if so, what happens when r=0?

Thanks,
Sam
Message no. 261[Branch from no. 260]
Posted by David Poole (cpsc_422_term2) on Sunday, April 17, 2005 10:07pm
Subject: Re: SARSA(lambda)
In message 260 on Sunday, April 17, 2005 6:11pm, Samuel Douglas Davis
writes:
>I think I've misunderstood something all along. The notes on
>reinforcement learning say that the algorithm given for SARSA(lambda)
>"specifies that Q[s, a] is updated for every state s and action a
>whenever a new reward is received"; however, it seems to me that it says
>that Q is updated at each step, regardless of whether there is a reward
>or not. Am I right in thinking that the algorithm only specifies what to
>do when r != 0, and if so, what happens when r=0?
>
>Thanks,
>Sam

r=0 isn't treated differently from any other case. A reward is received
at every step; that includes the case when the reward is 0.

David
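
In code, the point is that there is no special case at all. Here is a
sketch of the plain one-step SARSA update (without the eligibility
traces that SARSA(lambda) adds; the names and defaults are mine):

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # The same update runs at every step, whatever r is (including 0).
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * Q.get((s_next, a_next), 0.0) - old)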
Message no. 264[Branch from no. 261]
Posted by Samuel Douglas Davis (s85850014) on Sunday, April 17, 2005 10:26pm
Subject: Re: SARSA(lambda)
In message 261 on Sunday, April 17, 2005 10:07pm, David Poole writes:
>r=0 isn't treated differently from any other case. A reward is received,
>includes the case when 0 is received. 
>
>David
>

Ok, I was confused by question 3c of the midterm. I thought it was
asking us how part b would be different using SARSA, but it was actually
asking us about part a, right?
Message no. 265
Posted by David Poole (cpsc_422_term2) on Sunday, April 17, 2005 10:26pm
Subject: Solution to the practice final exam
There is a solution to the practice final exam at:
http://www.cs.ubc.ca/spider/poole/cs422/2005/exams/prfinsol.html or
http://www.cs.ubc.ca/spider/poole/cs422/2005/exams/prfinsol.pdf

You can also expect something repeated from the midterm, and something
that you should have learned from the assignments and the project.

David
Message no. 266
Posted by Michael Chiang (s27992023) on Monday, April 18, 2005 11:31am
Subject: extra TA hour
Hi all,

I will be offering another hour of TA consultation tomorrow (due to some
demand). It will be held in the Student Learning Centre #206, 3:30pm -
4:30pm.

Michael
Message no. 267
Posted by Onur Komili (s88435045) on Monday, April 18, 2005 1:39pm
Subject: Midterm Questions Scan
I know it's a little late but I just realized I missed the lecture where we got our midterms 
back, and I never actually went to pick mine up since our grades were posted online. Could 
someone please scan the midterm (feel free to blank out your answers if you want) and 
just post the midterm questions, or possibly email it to me? 

David posted the solutions to the midterm, but they don't include the questions themselves 
and I can't remember what all the questions were.

Thanks in advance,

Onur
Message no. 268
Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 2:43pm
Subject: Final, Question 10c
I was going through this question, and have come across what I believe is an error.

Going to the 3rd equation, after you sum out S, you get f(A,C).  Then maximize over 
Shutdown and get f(C, Utility)... so that leaves P(C), f(A,C) and f(C, Utility). Then it sums 
out C... that leaves f(A, Utility)... but the answer in the solutions just says f(Utility). It's 
my assumption that A should have been summed out sometime before C, right? (or we 
can just sum out A now... but the bottom line is A should have been summed out)

Let me know if I'm somewhat right :)
Message no. 269
Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 2:45pm
Subject: Discounted reward
I'm a bit confused as to the purpose of discounted reward. My thoughts were that the 
values got more refined as the algorithms/applets ran, so why are the more refined, 
future values worth less than the original, unrefined values? I understand that discounted 
reward solves the problem of infinite rewards, but is there another reason for discounted 
rewards besides that?

Thanks.
Message no. 270[Branch from no. 268]
Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 2:55pm
Subject: Re: Final, Question 10c
Also, is there any specific reason the variables were summed/maximized in this order?

Thanks!
Message no. 271
Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 3:29pm
Subject: Midterm Q3b
For Q[S4,Right], how is the answer 10+0.5(10-10) = 10 generated? Where did the 
discount of 0.9 go, and how does S5 suddenly have a value of 10?
Message no. 272[Branch from no. 271]
Posted by Kaili Elizabeth Vesik (s83834010) on Monday, April 18, 2005 4:06pm
Subject: Re: Midterm Q3b
In message 271 on Monday, April 18, 2005 3:29pm, Danelle Abra Wettstein
writes:
>For Q[S4,Right], how is the answer 10+0.5(10-10) = 10 generated? Where
>did the discount of 0.9 go, and how does S5 suddenly have a value of 10?

The discount of 0.9 doesn't show up because it multiplies the current
value of s5, which is zero. The expression, written out fully, would be
Q[s4,right] <- 10 + 0.5(10 + 0.9*0 - 10).

s5 doesn't have a Q-value, but it does have a reward of 10; that's where
the first 10 inside the brackets comes from. Take a look at the diagram
on the test-- s5 is inside a circle; that is, a reward state.
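
To spell out where each number lands: assuming the standard one-step
Q-learning update Q[s,a] <- Q[s,a] + alpha*(r + gamma*max_a' Q[s',a'] - Q[s,a]),
which is what the expression above instantiates, here is a quick check
(the variable names are mine):

alpha, gamma = 0.5, 0.9
q_s4_right = 10.0   # previous Q[s4,right]
r = 10.0            # reward for entering s5
v_s5 = 0.0          # max over actions of Q[s5,.], still zero
q_s4_right += alpha * (r + gamma * v_s5 - q_s4_right)
print(q_s4_right)   # 10.0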
Message no. 273[Branch from no. 272]
Posted by Samuel Douglas Davis (s85850014) on Monday, April 18, 2005 4:29pm
Subject: Re: Midterm Q3b
I don't really understand why the value of s5 is 0. We visited it before
and must have carried out some action, and if we went right or crashed
into a wall it would have a non-zero Q-value, wouldn't it?
Message no. 274[Branch from no. 273]
Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 5:12pm
Subject: Re: Midterm Q3b
Oh.. I incorrectly had brackets around a statement... that totally explains why I was confused :)
Message no. 275[Branch from no. 273]
Posted by Danelle Abra Wettstein (s86800018) on Monday, April 18, 2005 5:15pm
Subject: Re: Midterm Q3b
In message 273 on Monday, April 18, 2005 4:29pm, Samuel Douglas Davis writes:
>I don't really understand why the value of s5 is 0. We visited it before
>and must have carried out some action, and if we went right or crashed
>into a wall it would have a non-zero Q-value, wouldn't it?

It could have moved down. I guess we just make the assumption that it
didn't hit anything, given that the information wasn't presented.
Message no. 276[Branch from no. 275]
Posted by Samuel Douglas Davis (s85850014) on Monday, April 18, 2005 5:48pm
Subject: Re: Midterm Q3b
In message 275 on Monday, April 18, 2005 5:15pm, Danelle Abra Wettstein
writes:
>In message 273 on Monday, April 18, 2005 4:29pm, Samuel Douglas Davis
>writes:
>>I don't really understand why the value of s5 is 0. We visited it before
>>and must have carried out some action, and if we went right or crashed
>>into a wall it would have a non-zero Q-value, wouldn't it?
>
>It could have moved down. I guess we just make the assumption that it
>didn't hit anything, given that the information wasn't presented.

Actually, I think I was wrong. Even if we went right and received a
negative reward, the value of the other actions would still be zero, so
the max of those values is still zero.
Message no. 277[Branch from no. 268]
Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 8:25pm
Subject: Re: Final, Question 10c
In message 268 on Monday, April 18, 2005 2:43pm, Danelle Abra Wettstein
writes:
>I was going through this question, and have come across what I believe
>is an error.
>
>Going to the 3rd equation, after you sum out S, you get f(A,C).  Then
>maximize over Shutdown and get f(C, Utility)... so that leaves P(C),
>f(A,C) and f(C, Utility). Then it sums out C... that leaves f(A,
>Utility)... but the answer in the solutions just says f(Utility). It's
>my assumption that A should have been summed out sometime before C,
>right? (or we can just sum out A now... but the bottom line is A should
>have been summed out)
>
>Let me know if I'm somewhat right :)

Yes, it is wrong. You need to sum out all of the variables apart from A
and Shutdown. You can sum out these variables in any order.  Then you
need to maximize over Shutdown. Then you sum out A.

Sorry about that. I will correct it. (So if someone loads the solutions
later tonight they may not understand this thread).

David
Message no. 278[Branch from no. 269]
Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 9:21pm
Subject: Re: Discounted reward
In message 269 on Monday, April 18, 2005 2:45pm, Danelle Abra Wettstein
writes:
>I'm a bit confused as to the purpose of discounted reward. My thoughts
>were that the values got more refined as the algorithms/applets ran, so
>why are the more refined, future values worth less than the original,
>unrefined values? I understand that discounted reward solves the problem
>of infinite rewards, but is there another reason for discounted rewards
>besides that?
>
>Thanks.

This has nothing to do with any algorithms. It is a way to compare $1.00
now with $1.00 in a year's time. Think about how much it is worth to you
now to get $1000 in a year. It would be less than $1000, but more than,
say, $500.

A discount of gamma means that a reward of 1 in one time step is worth
gamma to you now. And a reward of 1 in two time steps is worth gamma^2
to you now, etc.
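
For example (my numbers, just to illustrate): with gamma = 0.9, getting
$1000 one step from now is worth $900 now, and a constant reward of 1
per step from now on is worth 1/(1-gamma) = 10 in total:

gamma = 0.9
print(gamma * 1000)                        # 900.0
print(sum(gamma**t for t in range(1000)))  # ~10.0, i.e. 1/(1 - gamma)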

Does that make sense?

David
Message no. 279[Branch from no. 271]
Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 9:37pm
Subject: Re: Midterm Q3b
In message 271 on Monday, April 18, 2005 3:29pm, Danelle Abra Wettstein
writes:
>For Q[S4,Right], how is the answer 10+0.5(10-10) = 10 generated? Where
>did the discount of 0.9 go, and how does S5 suddenly have a value of 10?

S5 doesn't have a value of 10; it has a value of 0. You receive a reward
of 10 from entering S5. 

That was what the bold text at the top of page 4 explained.

David


Message no. 280[Branch from no. 273]
Posted by David Poole (cpsc_422_term2) on Monday, April 18, 2005 9:41pm
Subject: Re: Midterm Q3b
In message 273 on Monday, April 18, 2005 4:29pm, Samuel Douglas Davis
writes:
>I don't really understand why the value of s5 is 0. We visited it before
>and must have carried out some action, and if we went right or crashed
>into a wall it would have a non-zero Q-value, wouldn't it?

No. It was only the second time through, and even though one of the
Q-values was negative (because it crashed), one of the other Q-values
would be zero, so the future value would be zero. 

David

Message no. 281[Branch from no. 257]
Posted by Frank Hutter (s62336011) on Monday, April 18, 2005 10:26pm
Subject: Re: Assignment 5 solutions
Thanks for pointing that out.
I was so sure I attached it ... anyhow, it's there now.

Frank
See Attached
Message no. 282
Posted by William Hoy Fong (s77957017) on Monday, April 18, 2005 11:14pm
Subject: assignment pick up
have the assignments been marked yet? is there a chance that we could
pick them up tomorrow (tuesday)? if so when and where will they be
available?
Message no. 283[Branch from no. 282]
Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 12:42am
Subject: Re: assignment pick up
On a related note, have projects been marked?

Onur
Message no. 284
Posted by Stanley Chi Hong Tso (s58635020) on Tuesday, April 19, 2005 4:59am
Subject: May I have a softcopy of the midterm exam?
I would like to have a copy of the midterm exam if possible, so I can print it out and redo 
the questions for studying

thanks
Message no. 285[Branch from no. 282]
Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 9:22am
Subject: Re: assignment pick up
In message 282 on Monday, April 18, 2005 11:14pm, William Hoy Fong writes:
>have the assignments been marked yet? is there a chance that we could
>pick them up tomorrow (tuesday)? if so when and where will they be
>available?

They will be available from outside of my door at 12:30.  

The projects have not been marked (sorry).

David

Message no. 286[Branch from no. 284]
Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 9:43am
Subject: Re: May I have a softcopy of the midterm exam?
In message 284 on Tuesday, April 19, 2005 4:59am, Stanley Chi Hong Tso
writes:
>I would like to have a copy of the midterm exam if possible, so I can
>print it out and redo the questions for studying
>
>thanks

I just posted an HTML version at:
http://www.cs.ubc.ca/spider/poole/cs422/2005/exams/mid.html

David
Message no. 287
Posted by Robin McQuinn (s12331039) on Tuesday, April 19, 2005 9:50am
Subject: Applet XML Loading
I can't get the Belief and Decision Network applet to load local XML
files.  Is this indeed possible using the full document path?  And if so,
is it the "open location" menu option?  How else can the XML files be
loaded?  Or are we expected to scan through the XML files and extract
the info visually, which is what I have been doing.  

Thanks lots
Message no. 288[Branch from no. 287]
Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, April 19, 2005 10:13am
Subject: Re: Applet XML Loading
Are you using the version on the web, or have you downloaded it?

If you download the applet and run it yourself, there is a File menu
option called "Load Graph", which opens a file chooser and lets you
select your xml file to load.
Message no. 289[Branch from no. 288]
Posted by Robin McQuinn (s12331039) on Tuesday, April 19, 2005 10:35am
Subject: Re: Applet XML Loading
Brilliant!  I hadn't done that yet, because it seems to change so much! 
That had the side effect, though, of teaching me the XML representation,
which is obvious enough!

thanks
Message no. 290[Branch from no. 286]
Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 11:57am
Subject: Re: May I have a softcopy of the midterm exam?
Excellent, thank you.

Onur
Message no. 291
Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 12:57pm
Subject: Midterm, Question #2
In the solutions, David wrote the following for #2

Q[s13,a2] = 
   0.8 * 0.8 * ( 0 + 0.9 * 2)                -- up, no treasure 
 + 0.8 * 0.2 * 0.25 * ( 0 + 0.9 * 7)         -- up, treasure at top right 
 + 0.1 * 0.8 * ( 0.2 * -10 + 0.9 * 0)        -- left, no treasure 
 + 0.1 * 0.2 * ( 0.2 * -10 + 0.9 * 0)        -- left, treasure appears 
 + 0.1 * 0.2 * 0.25 (10 + 0.9*0)             -- right, treasure appears there 

I'm assuming he's using the Value Iteration algorithm ( 
http://www.cs.ubc.ca/spider/poole/ci2/excerpts/decisionprocesses.pdf top of page 7 ); 
however, his solution above doesn't match up with the algorithm. Where does the second 
0.8 in the first line come from, and where does the third factor, 0.25, come from in the 
2nd and 5th lines?

According to the algorithm it says...

P(s'|a, s)(r(s, a, s') + γ V_{k-1}(s'))  (pardon my lack of formatting, just look at page 7 for 
the exact algorithm)

If we're doing the first line, it should be.

P(s'|a,s) = 0.8 (probability of horizontal/vertical move)
r(s, a, s') = 0  (no reward since not in a corner)
γ = 0.9 (given to us)
V_{k-1}(s') = 2 (given on grid)

Wouldn't that mean the first line should be

0.8 * ( 0 + 0.9 * 2 ) + ...  ?

Please help! Thanks in advance,

Onur
Message no. 292[Branch from no. 291]
Posted by Robin McQuinn (s12331039) on Tuesday, April 19, 2005 1:08pm
Subject: Re: Midterm, Question #2
Hi Onur, that had me flummoxed for a bit too, but the answer is related
to the specific states for which there are non-zero rewards.  

The probabilities come from the following: 

Line 1: P(s'|a,s) = P(moving up) * P(treasure doesn't appear)
Line 2: P(s'|a,s) = P(moving up) * P(treasure appears) * P(treasure in
top right)
Line 3: P(s'|a,s) = P(moving left) * P(treasure doesn't appear)
Line 4: P(s'|a,s) = P(moving left) * P(treasure appears)
Line 5: P(s'|a,s) = P(moving right) * P(treasure appears) * P(treasure
in top right)

remember, if a treasure appears there is only a .25 probability that it
will appear in the top right corner.  

Hope that helps
Robin
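
Putting Robin's probabilities together with David's lines, the whole
expression is just a sum of P(s'|a,s) * (reward + gamma * V(s')) terms.
Here is the arithmetic as written, for checking (I haven't compared the
total against the posted solution, so treat the 1.254 as just what
these lines sum to):

gamma = 0.9
terms = [
    0.8 * 0.8        * (0 + gamma * 2),          # up, no treasure
    0.8 * 0.2 * 0.25 * (0 + gamma * 7),          # up, treasure at top right
    0.1 * 0.8        * (0.2 * -10 + gamma * 0),  # left, no treasure
    0.1 * 0.2        * (0.2 * -10 + gamma * 0),  # left, treasure appears
    0.1 * 0.2 * 0.25 * (10 + gamma * 0),         # right, treasure appears there
]
print(sum(terms))   # about 1.254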
Message no. 293[Branch from no. 292]
Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 1:13pm
Subject: Re: Midterm, Question #2
Ahh... I was starting to think about that after posting. So I assume that for the line...

 + 0.1 * 0.8 * ( 0.2 * -10 + 0.9 * 0)        -- left, no treasure 

the 0.2 * -10 reflects the fact that there's a 0.2 probability that this particular monster will 
check whether the person has landed, and if so they get the -10 reward.

That makes more sense. Thank you.

Onur
Message no. 294
Posted by Daniel Joseph Anderson (s76045996) on Tuesday, April 19, 2005 1:14pm
Subject: reminder for Dr. Poole
from before:

In message 203 on Saturday, April 2, 2005 10:24pm, Kaili Elizabeth Vesik
writes:
>David,
>
>You mentioned in class that any assignments done prior to the midterm
>with grades lower than our midterm grade would have their grades
>increased to the value of our midterm grade. Should we expect to see
>this reflected in the "grades" section of webct, or will it be
>considered only when you calculate our final marks?
>
>Thanks.
>Kaili

It will be reflected in my program to compute grades. (But it might be
good to remind me closer to the final exam ;^)

David
Message no. 295
Posted by Daniel Joseph Anderson (s76045996) on Tuesday, April 19, 2005 1:21pm
Subject: Chapters on the final
The course webpage says "6, 7, 9, 10, 11 and 12" but I know there's some
stuff in there that we didn't cover in class. Are we expected to know
all of each of these chapters, or only some sections of each?

Thanks,
-Dan
Message no. 296[Branch from no. 287]
Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 1:57pm
Subject: Re: Applet XML Loading
In message 287 on Tuesday, April 19, 2005 9:50am, Robin McQuinn writes:
>I can't get the Belief and Decision Network applet to load local XML
>files.  Is this indeed possible using the full document path?  And if so,
>is it the "open location" menu option?  How else can the XML files be
>loaded?  Or are we expected to scan through the XML files and extract
>the info visually, which is what I have been doing.  
>
>
>Thanks lots

If it is run as an application (the Windows executable or the jar file),
you should be able to load xml files. It works for me.

Otherwise you can just copy the whole file into the clipboard and paste
it into the view/edit text representation.

Let us know if this doesn't work.

David
Message no. 297[Branch from no. 295]
Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 1:59pm
Subject: Re: Chapters on the final
In message 295 on Tuesday, April 19, 2005 1:21pm, Daniel Joseph Anderson
writes:
>The course webpage says "6, 7, 9, 10, 11 and 12" but I know there's some
>stuff in there that we didn't cover in class. Are we expected to know
>all of each of these chapters, or only some sections of each?
>
>Thanks,
>-Dan

Only what we covered in class.

David
Message no. 298[Branch from no. 286]
Posted by Stanley Chi Hong Tso (s58635020) on Tuesday, April 19, 2005 5:04pm
Subject: Re: May I have a softcopy of the midterm exam?
Nice, thanks
Message no. 299
Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 8:01pm
Subject: Practice Final #10c
I'm having a really hard time figuring out when/how to do factorization. I know it's 
probably too late at this point, but if anyone wants to attempt to explain this to me and 
others that are probably unsure as well but too embarrassed to ask, I'd really appreciate it.

I understand what maximizing is and how to do it. I understand what "summing out" is. 
What I don't know is how to choose what needs to be factored and why. I tried reading 
the notes and the book and it's just not clicking in my brain at all. 

If anyone wants to explain why the solution for question 10c is the way it is I'd 
appreciate it. 

Thanks in advance,

Onur
Message no. 300[Branch from no. 299]
Posted by David Poole (cpsc_422_term2) on Tuesday, April 19, 2005 8:26pm
Subject: Re: Practice Final #10c
In message 299 on Tuesday, April 19, 2005 8:01pm, Onur Komili writes:
>I'm having a really hard time figuring out when/how to do
>factorization. I know it's probably too late at this point, but if
>anyone wants to attempt to explain this to me and others that are
>probably unsure as well but too embarrassed to ask, I'd really
>appreciate it.
>
>I understand what maximizing is and how to do it. I understand what
>"summing out" is. What I don't know is how to choose what needs to be
>factored and why. I tried reading the notes and the book and it's just
>not clicking in my brain at all.
>
>If anyone wants to explain why the solution for question 10c is the way
>it is I'd appreciate it.
>
>Thanks in advance,
>
>Onur

Sum out all of the random variables that are not a parent of a
decision node. These can be done in any order.

You should then have a decision variable with its parents; maximize this. 

Repeat till there are no more decision nodes. 

Then sum out the remaining variables: the resulting number is the
expected utility.

Does this make sense?

David
Message no. 301[Branch from no. 300]
Posted by Onur Komili (s88435045) on Tuesday, April 19, 2005 8:51pm
Subject: Re: Practice Final #10c
That makes a lot more sense. In your ch10/lect6.pdf page 16 slides you say something 
similar, but it didn't make any sense at the time, and even reading it now it doesn't make 
sense. Perhaps rewording it like this might make it a little clearer for 
future semesters.

I just tested it out on the practice final question and it seems to work out. I just wish I 
asked this before assignment 5. Ah well, better late than never I suppose.

Thanks for clearing that up,

Onur
Message no. 302
Posted by Danelle Abra Wettstein (s86800018) on Tuesday, April 19, 2005 11:26pm
Subject: ??
Is anyone as terrified as me? 
Message no. 303[Branch from no. 302]
Posted by David Burns Cameron (s66878984) on Tuesday, April 19, 2005 11:49pm
Subject: Re: ??
yes.
Message no. 304[Branch from no. 303]
Posted by Vivian Luk (s82215013) on Wednesday, April 20, 2005 12:33am
Subject: Re: ??
ditto

:(
Message no. 305[Branch from no. 304]
Posted by Onur Komili (s88435045) on Wednesday, April 20, 2005 12:34am
Subject: Re: ??
More so than any other exam I've ever had since high school....

Onur
Message no. 306[Branch from no. 305]
Posted by Ryan Yee (s81483042) on Wednesday, April 20, 2005 12:36am
Subject: Re: ??
Game over man!! Game over!
Message no. 307[Branch from no. 306]
Posted by Stanley Chi Hong Tso (s58635020) on Wednesday, April 20, 2005 1:22am
Subject: Re: ??
I'm sure I'll be f***ed after the exam.
Message no. 308[Branch from no. 307]
Posted by Daniel Wen-Yen Chang (s81965014) on Wednesday, April 20, 2005 1:29am
Subject: Re: ??
i just hope that the actual final's difficulty will be like the sample one
and yes..I'm terrified too
Message no. 309[Branch from no. 308]
Posted by Wing Hang Chan (s84098011) on Wednesday, April 20, 2005 2:18am
Subject: Re: ??
lol is this the pre-exam anxiety thread?

wish everyone good luck this afternoon :)
Message no. 310[Branch from no. 306]
Posted by Kaili Elizabeth Vesik (s83834010) on Wednesday, April 20, 2005 12:15pm
Subject: Re: ??
In message 306 on Wednesday, April 20, 2005 12:36am, Ryan Yee writes:
>Game over man!! Game over!

Expresses my feelings EXACTLY.   :)
(I don't really know why I added a smiley there... could be that
exam-fear-delirium I'm feeling).
Message no. 311[Branch from no. 302]
Posted by Christopher John Hawkins (s93985018) on Wednesday, April 20, 2005 1:19pm
Subject: Re: ??
Game over indeed!

impending doom... 
Message no. 312[Branch from no. 311]
Posted by Michael Nightingale (s98742018) on Wednesday, April 20, 2005 1:22pm
Subject: Re: ??
and to think that my graduation hinges on decision networks ;(

lol
Message no. 313[Branch from no. 312]
Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, April 20, 2005 2:45pm
Subject: Re: ??
In message 312 on Wednesday, April 20, 2005 1:22pm, Michael 
Nightingale writes:
>and to think that my graduation hinges on decision networks ;(
>
>lol

LOL...same feeling here
Message no. 314[Branch from no. 313]
Posted by Daniel Gayo McLaren (s40871022) on Wednesday, April 20, 2005 3:01pm
Subject: Re: ??
I had an instructor that used to tell us to bring Kleenex to his
midterms/exams because there would be lots of crying.  I'm bringing lots
for this one.

Good luck everyone!
Message no. 315[Branch from no. 313]
Posted by Danelle Abra Wettstein (s86800018) on Wednesday, April 20, 2005 3:09pm
Subject: Re: ??
In message 313 on Wednesday, April 20, 2005 2:45pm, Stephen Shui Fung
Mak writes:
>In message 312 on Wednesday, April 20, 2005 1:22pm, Michael 
>Nightingale writes:
>>and to think that my graduation hinges on decision networks ;(
>>
>>lol
>
>LOL...same feeling here

Me too! And my extended family has already bought non-refundable tickets
to Vancouver in May!
Message no. 316[Branch from no. 315]
Posted by David Burns Cameron (s66878984) on Wednesday, April 20, 2005 7:49pm
Subject: Re: ??
I survived!

And it wasn't nearly as bad as I thought!

Thank You, Practice Final!!!
Message no. 317
Posted by Vivian Luk (s82215013) on Monday, April 25, 2005 12:50am
Subject: Midterm total
Is the midterm out of 55 or 60?  (WebCT says 60 but the points on the
midterm add up to 55)

Tks :)

Vivian

Message no. 318[Branch from no. 317]
Posted by Kaili Elizabeth Vesik (s83834010) on Tuesday, April 26, 2005 7:38am
Subject: Re: Midterm total
I'm at my parents' house right now (without my exam), so I don't know
for certain, but I'm pretty sure there was a comment during the test
about how one of the questions had an incorrect point value. If nobody
else has answered this by the time I get back to my place, I will check
the exam and let you know.

Kaili
Message no. 319[Branch from no. 317]
Posted by David Poole (cpsc_422_term2) on Tuesday, April 26, 2005 9:51am
Subject: Re: Midterm total
In message 317 on Monday, April 25, 2005 12:50am, Vivian Luk writes:
>Is the midterm out of 55 or 60?  (WebCT says 60 but the points on the
>midterm add up to 55)

The questions are worth 10, 10, 25, 15 which adds up to 60.

If anyone has marks for assignments missing, please let me know.

David

Message no. 320
Posted by Michael Nightingale (s98742018) on Wednesday, April 27, 2005 12:52pm
Subject: Final Marks
Just wanted to mention that I noticed final exam marks are up on the
SSC, so you can check out how you did.

Have a great summer, and if applicable, a fun-filled graduation (see you
there) :D
Message no. 321[Branch from no. 320]
Posted by David Poole (cpsc_422_term2) on Wednesday, April 27, 2005 2:24pm
Subject: Re: Final Marks
In message 320 on Wednesday, April 27, 2005 12:52pm, Michael Nightingale
writes:
>Just wanted to mention that I noticed final exam marks are up on the
>SSC, so you can check out how you did.
>
>Have a great summer, and if applicable, a fun-filled graduation (see you
>there) :D


The final grades have been submitted (yesterday), so you should be able
to access them (but I'm not sure when they release them).

Thanks all. Have a good summer,

David
Message no. 322[Branch from no. 320]
Posted by David Poole (cpsc_422_term2) on Wednesday, April 27, 2005 2:27pm
Subject: Re: Final Marks
In message 320 on Wednesday, April 27, 2005 12:52pm, Michael Nightingale
writes:
>Just wanted to mention that I noticed final exam marks are up on the
>SSC, so you can check out how you did.

That's interesting; WebCT claims they are not released (and I can't
check this). Can you really see them?

David
Message no. 323[Branch from no. 322]
Posted by Stephen Shui Fung Mak (s36743003) on Wednesday, April 27, 2005 3:44pm
Subject: Re: Final Marks
Yes! It's out! =D

Stephen


In message 322 on Wednesday, April 27, 2005 2:27pm, David Poole writes:
>In message 320 on Wednesday, April 27, 2005 12:52pm, Michael Nightingale
>writes:
>>Just wanted to mention that I noticed final exam marks are up on the
>>SSC, so you can check out how you did.
>
>That's interesting, WebCT claims they are not released (and I can't
>check this). Can you really see them?
>
>David
>
Message no. 324[Branch from no. 323]
Posted by Daniel Gayo McLaren (s40871022) on Wednesday, April 27, 2005 9:01pm
Subject: Re: Final Marks
I can't see the final exam mark on WebCT, but I can see my final mark on
the Student Service Center website.  Happy summer!