Sample Assignment on Decision Tree Learning (+ Solution)

Question 1

In electronic commerce applications we want to make predictions about what a user will do. Consider the following made-up data used to predict whether someone will ask for more information (more_info) based on whether they accessed from an educational domain (edu), whether this is a first visit (first), whether they have bought goods from an affiliated company (bought), and whether they have visited a famous online information store (visited).
 
Example  bought  edu    first  visited  more_info
e1       false   true   false  false    true
e2       true    false  true   false    false
e3       false   false  true   true     true
e4       false   false  true   false    false
e5       false   false  false  true     false
e6       true    false  false  true     true
e7       true    false  false  false    true
e8       false   true   true   true     false
e9       false   true   true   false    false
e10      true    true   true   false    true
e11      true    true   false  true     true
e12      false   false  false  false    true
We want to use this data to learn the value of more_info as a function of the values of the other variables.

Suppose we measure the error of a decision tree as the number of misclassified examples. The optimal decision tree from a class of decision trees is an element of the class with minimal error.
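This error measure can be written down directly. The following is a sketch in Python (the encoding of the examples as tuples of booleans and the hypothesis interface are my own, not part of the assignment): a hypothesis maps the four attribute values to a prediction for more_info, and its error is the number of examples it misclassifies.

```python
EXAMPLES = [
    # (bought, edu, first, visited, more_info)
    (False, True,  False, False, True),   # e1
    (True,  False, True,  False, False),  # e2
    (False, False, True,  True,  True),   # e3
    (False, False, True,  False, False),  # e4
    (False, False, False, True,  False),  # e5
    (True,  False, False, True,  True),   # e6
    (True,  False, False, False, True),   # e7
    (False, True,  True,  True,  False),  # e8
    (False, True,  True,  False, False),  # e9
    (True,  True,  True,  False, True),   # e10
    (True,  True,  False, True,  True),   # e11
    (False, False, False, False, True),   # e12
]

def error(h):
    """Number of examples on which hypothesis h disagrees with more_info."""
    return sum(h(b, e, f, v) != y for b, e, f, v, y in EXAMPLES)

# For instance, the hypothesis "more_info = visited" misclassifies 6 examples:
print(error(lambda b, e, f, v: v))  # -> 6
```

Any decision tree over these attributes is one such hypothesis, so the same function measures the error of every tree considered below.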

  1. Give the optimal decision tree with only one node. What is the error of this tree?
  2. Give the optimal decision tree of depth 2 (i.e., the root node is the only node with children). For each node in the tree give the examples that are filtered to that node. What is the error of this tree?
  3. Give the decision tree that is produced by the decision tree learning algorithm we discussed in class, run to completion, where we split on the attribute that reduces the error the most. For each node in the tree specify which examples are filtered to that node. As well as drawing the tree, give its logical representation using the prop predicate, as we did in class.
  4. Give two instances that don't appear in the examples above and show how they are classified. Use this to explain the bias inherent in the tree (how does the bias give you these particular predictions?).
  5. How can overfitting occur in the learned tree? Explain in terms of this example.

Solution

  1. Give the optimal decision tree with only one node. What is the error of this tree?

     The optimal one-node tree is the single leaf that predicts more_info=true. It makes 5 errors (it misclassifies examples e2, e4, e5, e8 and e9).
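A quick check of this answer (my own sketch, not part of the assignment): try both one-node trees, i.e. the two constant predictions for more_info, and count the misclassified examples.

```python
EXAMPLES = {
    # name: (bought, edu, first, visited, more_info)
    "e1":  (False, True,  False, False, True),
    "e2":  (True,  False, True,  False, False),
    "e3":  (False, False, True,  True,  True),
    "e4":  (False, False, True,  False, False),
    "e5":  (False, False, False, True,  False),
    "e6":  (True,  False, False, True,  True),
    "e7":  (True,  False, False, False, True),
    "e8":  (False, True,  True,  True,  False),
    "e9":  (False, True,  True,  False, False),
    "e10": (True,  True,  True,  False, True),
    "e11": (True,  True,  False, True,  True),
    "e12": (False, False, False, False, True),
}

for leaf in (True, False):
    wrong = [n for n, (*_, y) in EXAMPLES.items() if y != leaf]
    print(f"predict more_info={leaf}: {len(wrong)} errors: {wrong}")
# predicting true misclassifies e2, e4, e5, e8, e9 (5 errors);
# predicting false misclassifies the other 7 examples.
```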

  2. Give the optimal decision tree of depth 2 (i.e., the root node is the only node with children). For each node in the tree give the examples that are filtered to that node. What is the error of this tree?

     There is one non-leaf node, labelled with first. Examples e2, e3, e4, e8, e9 and e10 are filtered to the first=true child, which predicts more_info=false; examples e1, e5, e6, e7, e11 and e12 are filtered to the first=false child, which predicts more_info=true.

     [Figure: classification of more_info for a tree of depth 2.]

     This tree has an error of 3 (it misclassifies examples e3, e10 and e5).
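That first is the best attribute to split on can be checked by computing, for each attribute, the error of the depth-2 tree that splits on it and predicts the majority value of more_info in each child. This is my own sketch; the names and data encoding are assumptions, not from the assignment.

```python
from collections import Counter

ATTRS = ["bought", "edu", "first", "visited"]
EXAMPLES = [
    # (bought, edu, first, visited, more_info)
    (False, True,  False, False, True),   # e1
    (True,  False, True,  False, False),  # e2
    (False, False, True,  True,  True),   # e3
    (False, False, True,  False, False),  # e4
    (False, False, False, True,  False),  # e5
    (True,  False, False, True,  True),   # e6
    (True,  False, False, False, True),   # e7
    (False, True,  True,  True,  False),  # e8
    (False, True,  True,  False, False),  # e9
    (True,  True,  True,  False, True),   # e10
    (True,  True,  False, True,  True),   # e11
    (False, False, False, False, True),   # e12
]

def split_error(i):
    """Misclassifications of the depth-2 tree that splits on attribute i
    and predicts the majority value of more_info in each child."""
    err = 0
    for val in (True, False):
        labels = [row[4] for row in EXAMPLES if row[i] == val]
        err += len(labels) - Counter(labels).most_common(1)[0][1]
    return err

split_errors = {a: split_error(i) for i, a in enumerate(ATTRS)}
print(split_errors)  # first gives the smallest error, 3
```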

  3. Give the decision tree that is produced by the decision tree learning algorithm run to completion, where we split on the attribute that reduces the error the most. For each node in the tree specify which examples are filtered to that node. As well as drawing the tree, give its logical representation using the prop predicate, as we did in class.

     There are many possible answers to this problem, depending on how the arbitrary choices are resolved. All solutions have first at the root.

     One solution is shown below. (Note: there is an error in the drawn tree; e7 should be classified with e6 under the path with arcs <false, false, true>, not under <false, true>.)

     [Figure: one learned decision tree, with first at the root.]

    Some of the prop clauses describing the tree are:

    prop(Obj, more_info, true)  <- prop(Obj, first, true) & prop(Obj, edu, true) & prop(Obj, bought, true)
    prop(Obj, more_info, false) <- prop(Obj, first, true) & prop(Obj, edu, true) & prop(Obj, bought, false)
    prop(Obj, more_info, true)  <- prop(Obj, first, true) & prop(Obj, edu, false) & prop(Obj, visited, true)
    prop(Obj, more_info, false) <- prop(Obj, first, true) & prop(Obj, edu, false) & prop(Obj, visited, false)
    prop(Obj, more_info, true)  <- prop(Obj, first, false) & prop(Obj, edu, true)
    ......................................................................
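The greedy learner itself can be sketched as follows. This is my own implementation, not the one from class: it splits on the attribute whose depth-2 majority error is smallest, breaking ties by attribute order, so the tree it produces may resolve the arbitrary choices differently from the one drawn above.

```python
from collections import Counter

ATTRS = ["bought", "edu", "first", "visited"]
ROWS = [
    # (bought, edu, first, visited, more_info)
    (False, True,  False, False, True),   # e1
    (True,  False, True,  False, False),  # e2
    (False, False, True,  True,  True),   # e3
    (False, False, True,  False, False),  # e4
    (False, False, False, True,  False),  # e5
    (True,  False, False, True,  True),   # e6
    (True,  False, False, False, True),   # e7
    (False, True,  True,  True,  False),  # e8
    (False, True,  True,  False, False),  # e9
    (True,  True,  True,  False, True),   # e10
    (True,  True,  False, True,  True),   # e11
    (False, False, False, False, True),   # e12
]
EXAMPLES = [dict(zip(ATTRS + ["more_info"], row)) for row in ROWS]

def majority(exs):
    return Counter(e["more_info"] for e in exs).most_common(1)[0][0]

def errors(exs, label):
    return sum(e["more_info"] != label for e in exs)

def learn(exs, attrs):
    """Return a leaf (a bool) or an internal node (attribute, {value: subtree})."""
    label = majority(exs)
    if errors(exs, label) == 0 or not attrs:
        return label
    def split_err(a):  # error after splitting on a and taking majorities
        err = 0
        for v in (True, False):
            sub = [e for e in exs if e[a] == v]
            if sub:
                err += errors(sub, majority(sub))
        return err
    best = min(attrs, key=split_err)  # split that most reduces the error
    rest = [a for a in attrs if a != best]
    children = {}
    for v in (True, False):
        sub = [e for e in exs if e[best] == v]
        children[v] = learn(sub, rest) if sub else label
    return (best, children)

def predict(tree, inst):
    while not isinstance(tree, bool):
        attr, children = tree
        tree = children[inst[attr]]
    return tree

tree = learn(EXAMPLES, ATTRS)
print(tree[0])  # the root split is first, as stated above
print(sum(predict(tree, e) != e["more_info"] for e in EXAMPLES))  # 0: run to completion
```

Run to completion, every training example ends up correctly classified, since all twelve examples have distinct attribute values.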
  4. Give two instances that don't appear in the examples above and show how they are classified. Use this to explain the bias inherent in the tree (how does the bias give you these particular predictions?).

     There are 4 instances that don't appear in the examples above:

     bought  edu    first  visited  more_info
     true    true   true   true     true
     true    true   false  false    true
     true    false  true   true     true
     false   true   false  true     true

     The bias is that, when first is true, visited is irrelevant when edu is true, and bought is irrelevant when edu is false. When first is false, more_info is always true except for example e5.
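The tree's predictions on these unseen instances can be checked by encoding it as a function (my own encoding, not from the solution; on the first=true, edu=false branch the prediction follows visited, which is what fits e2, e3 and e4). The first=false, edu=false subtree is not needed for any of the four instances.

```python
def more_info(bought, edu, first, visited):
    # first=true: split on edu, then on bought (edu=true) or visited (edu=false)
    if first:
        return bought if edu else visited
    # first=false, edu=true: always true (e1, e11)
    if edu:
        return True
    # first=false, edu=false: subtree not needed for the instances below
    raise NotImplementedError("first=false, edu=false subtree not encoded here")

UNSEEN = [
    # (bought, edu, first, visited)
    (True,  True,  True,  True),
    (True,  True,  False, False),
    (True,  False, True,  True),
    (False, True,  False, True),
]
for inst in UNSEEN:
    print(inst, "->", more_info(*inst))  # all four come out true
```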
  5. How can overfitting occur in the learned tree? Explain in terms of this example.

    There are two different things that you could notice from this dataset, both stemming from the fact that there is only one example (e5) that is false when first is false. This single example results in a complex tree for first=false.

    The first thing to notice is that the choice of which attribute to split on for first=false is completely arbitrary. Depending on that arbitrary choice, the instance

    bought  edu    first  visited
    false   true   false  true
    will be classified differently. We can (over)fit the data to models where this instance is true and to models where this instance is false.

    The second thing is that example e5 could be noisy. It may be a better model to say that the user asks for more_info whenever first=false, and that there is one noisy example, rather than there being a complex theory about the world.
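The role of e5 can be seen directly in the data (my own sketch, with an assumed encoding): among the first=false examples, e5 is the only one with more_info=false, so a single leaf predicting more_info=true for that branch has a training error of only 1, while classifying e5 correctly forces the learner to grow a subtree that exists only to fit that one example.

```python
EXAMPLES = {
    # name: (bought, edu, first, visited, more_info)
    "e1":  (False, True,  False, False, True),
    "e2":  (True,  False, True,  False, False),
    "e3":  (False, False, True,  True,  True),
    "e4":  (False, False, True,  False, False),
    "e5":  (False, False, False, True,  False),
    "e6":  (True,  False, False, True,  True),
    "e7":  (True,  False, False, False, True),
    "e8":  (False, True,  True,  True,  False),
    "e9":  (False, True,  True,  False, False),
    "e10": (True,  True,  True,  False, True),
    "e11": (True,  True,  False, True,  True),
    "e12": (False, False, False, False, True),
}

branch = {n: row for n, row in EXAMPLES.items() if not row[2]}  # first = false
false_examples = [n for n, (*_, y) in branch.items() if not y]
print(sorted(branch, key=lambda n: int(n[1:])))  # e1, e5, e6, e7, e11, e12
print(false_examples)                            # only e5
```

If e5 is dropped as noise, the first=false branch becomes pure and the learner would stop there with a single leaf.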