Sample Assignment on Decision Tree Learning
Question 1
In electronic commerce applications we want to make predictions about what
a user will do. Consider the following made-up data used to predict whether
someone will ask for more information (more_info) based on whether
they accessed from an educational domain (edu), whether this is
a first visit (first), whether they have bought goods from an affiliated
company (bought), and whether they have visited a famous online
information store (visited).
Example |
bought |
edu |
first |
visited |
more_info |
e1 |
false |
true |
false |
false |
true |
e2 |
true |
false |
true |
false |
false |
e3 |
false |
false |
true |
true |
true |
e4 |
false |
false |
true |
false |
false |
e5 |
false |
false |
false |
true |
false |
e6 |
true |
false |
false |
true |
true |
e7 |
true |
false |
false |
false |
true |
e8 |
false |
true |
true |
true |
false |
e9 |
false |
true |
true |
false |
false |
e10 |
true |
true |
true |
false |
true |
e11 |
true |
true |
false |
true |
true |
e12 |
false |
false |
false |
false |
true |
|
We want to use this data to learn the value of more_info as a function
of the values of the other variables.
Suppose we measure the error of a decision tree as the number of misclassified
examples. The optimal decision tree from a class of decision trees is an
element of the class with minimal error.
-
Give the optimal decision tree with only one node. What is the error of
this tree?
-
Give the optimal decision tree of depth 2 (i.e., the root node is the only
node with children). For each node in the tree give the examples that are
filtered to that node. What is the error of this tree?
-
Give the decision tree that is produced by the DT algorithm discussed in
class, run to completion, where we split on the attribute that reduces
the error the most. For each node in the tree specify which examples are
filtered to that node. As well as growing the tree, give a logical representation
of the tree using the prop predicate, as we did in class.
-
Give two instances that don't appear in the examples above and show how
they are classified. Use this to explain the bias inherent in the tree
(how does the bias give you these particular predications?).
-
How can overfitting occur in the learned network? Explain in terms of this
example
S