From Knowledge Discovery
>
> BAYES NET STRUCTURE LEARNING
> With regard to scoring a Bayes Net Structure, I'm a bit confused about
> the meaning of the equation we seek to minimize. Could you clarify
> what each term represents and reiterate the intuition for this error
> expression?
I'll have to look these up again. The basic idea is that you want to
find a minimum description length model, that is, a model structure
and parameters that give a good fit to the data without requiring too
many parameters. The details lie in how much one penalizes each
parameter in the model. The most common method is to use BIC (the
Bayesian Information Criterion).
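As a rough illustration of the "fit versus number of parameters" trade-off, here is a minimal sketch of a BIC-style description length for a discrete Bayes net. It assumes fully observed discrete data and maximum-likelihood conditional probability tables; the function name `bic_score` and the data layout are hypothetical, and the exact penalty used in lecture may differ. Lower is better: the first term is the negative log-likelihood (fit), the second is (number of free parameters / 2) * log N (the penalty).

```python
import math
from collections import Counter

def bic_score(data, parents, arity):
    """Description length (negative BIC) of a discrete Bayes net structure.

    data:    list of dicts mapping variable name -> observed value
    parents: dict mapping variable name -> list of parent names (the structure)
    arity:   dict mapping variable name -> number of possible values
    Returns -loglik + (num_params / 2) * log(N); lower is better.
    """
    n = len(data)
    loglik = 0.0
    num_params = 0
    for var, pa in parents.items():
        # Count (parent configuration, value) pairs and parent configurations.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood of the data under the maximum-likelihood CPT.
        for (cfg, _), count in joint.items():
            loglik += count * math.log(count / marg[cfg])
        # Free parameters: (arity - 1) for each possible parent configuration.
        n_cfgs = math.prod(arity[p] for p in pa)
        num_params += (arity[var] - 1) * n_cfgs
    return -loglik + 0.5 * num_params * math.log(n)

# Hypothetical toy data in which B always copies A.
data = [{"A": a, "B": a} for a in (0, 1)] * 50
arity = {"A": 2, "B": 2}
with_edge = bic_score(data, {"A": [], "B": ["A"]}, arity)
no_edge = bic_score(data, {"A": [], "B": []}, arity)
print(with_edge < no_edge)
```

Here the extra parameter for the edge A -> B is more than paid for by the improved fit, so the structure with the edge gets the smaller score.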
>
> L1, L2, Lp, Linfinity
> Could you recap your discussion from the last lecture of these measures
> and how they pertain to the learning methods we've talked about? Your
> description went by a little fast on Thursday.
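For a vector x of length n, with elements x_i (these versions are
normalized by n; the standard L0 norm is a count and the standard
L1 and L2 norms omit the 1/n):
|x|_0 = fraction of nonzero elements in x
|x|_1 = (1/n) Sum_{i=1..n} |x_i|
|x|_2 = sqrt[(1/n) Sum_{i=1..n} x_i^2]
|x|_inf = max_i |x_i|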
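To make the four definitions concrete, here is a small sketch that computes them exactly as written above (length-normalized). The function name `normalized_norms` is just for illustration.

```python
def normalized_norms(x):
    """Length-normalized L0, L1, L2 and L-infinity measures of a list x."""
    n = len(x)
    l0 = sum(1 for xi in x if xi != 0) / n        # fraction of nonzero entries
    l1 = sum(abs(xi) for xi in x) / n             # mean absolute value
    l2 = (sum(xi * xi for xi in x) / n) ** 0.5    # root mean square
    linf = max(abs(xi) for xi in x)               # largest magnitude
    return l0, l1, l2, linf

print(normalized_norms([3, 0, -4, 0]))  # (0.5, 1.75, 2.5, 4)
```

Note how L0 only cares whether entries are zero, while L-infinity only cares about the single largest entry; L1 and L2 sit in between, with L2 weighting large entries more heavily.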
>
> SUPPORT VECTOR MACHINES
> Can you discuss QP, its relationship to SVMs, and what level of
> understanding we should have of QP, and this relationship?
We did not cover QP, so you don't need to know about it. What you should know
is what SVMs maximize, and subject to which constraints.
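For reference, the standard textbook statement is sketched below (the notation in the lecture slides may differ). With training points x_i and labels y_i in {-1, +1}, the hard-margin SVM maximizes the margin 2/||w||, which is equivalent to the first problem; the soft-margin version adds slack variables xi_i with a penalty weight C. Both are quadratic programs (a quadratic objective with linear constraints), which is the only sense in which QP enters here.

```latex
% Hard margin: maximize 2/\|w\| by minimizing \|w\|^2
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1,\dots,n

% Soft margin: allow violations \xi_i, penalized by C
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
```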
>
> BOOSTING/BAGGING/RANDOM FORESTS
> The slides say bagging can reduce bias. How?
If you average (bag) a bunch of decision trees, you can get a more
general model, i.e., one that is no longer a single decision tree.
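A small sketch of this point, using the simplest possible tree: a one-split "decision stump" on 1-D inputs, which can only output two distinct values. Averaging stumps fit to bootstrap resamples gives a staircase with many levels, which no single stump can represent, so the averaged model can track a smooth target more closely. The function names, the target y = x^2, and the number of bootstrap models are assumptions chosen for illustration.

```python
import random

def fit_stump(xs, ys):
    """Least-squares decision stump on 1-D inputs: one split, two constant leaves."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # all x identical: predict the overall mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def bag_stumps(xs, ys, n_models=50, seed=0):
    """Average of stumps, each fit on a bootstrap resample of (xs, ys)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

# A smooth target on [0, 1].
xs = [i / 20 for i in range(21)]
ys = [x * x for x in xs]
single = fit_stump(xs, ys)
bagged = bag_stumps(xs, ys)
levels = {round(bagged(x), 9) for x in xs}
print(len({single(x) for x in xs}), len(levels))
```

The single stump has two output levels; the bagged average has more, because the bootstrap resamples move the split point around.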
> Can you elaborate on how margins apply to boosting and the relationship
> between boosting and SVMs?
That's a good topic for discussion.
>
> K-MEANS
> When should hierarchical clustering be used instead of k-means (slide 40)?
>
For smaller data sets, one sometimes wants to avoid picking a fixed k,
and instead to see a whole tree of nested clusters.
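To show what "a whole tree of clusters" means, here is a minimal sketch of agglomerative clustering with single linkage on 1-D points (the choice of single linkage and the name `single_linkage` are assumptions; other linkages work the same way). It records every merge, so cutting the history at any distance gives a flat clustering without choosing k in advance. The naive loops are roughly cubic in the number of points, one reason this approach suits smaller data sets.

```python
def single_linkage(points):
    """Agglomerative single-linkage clustering of 1-D points.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

for left, right, dist in single_linkage([0.0, 0.1, 5.0, 5.2, 20.0]):
    print(left, right, round(dist, 3))
```

Reading the merges from the top: two tight pairs form first, then they join into one group, and the outlier at 20.0 is merged last. Cutting between the last two merges gives k = 2; cutting earlier gives k = 3.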
> INFERRING MARKOV MODELS
> The application of EM to Markov Model inference is still esoteric.
> Could you discuss?
In estimating an HMM, the states are unobserved (hidden) variables, so
estimation (not inference) requires alternating between computing
the transition and emission probabilities, given the states that one
'expects' to be in, and computing the probability of being in each
state (at each time), given the model parameters that one has
estimated. This alternation is EM; for HMMs it is known as the
Baum-Welch algorithm.
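A minimal sketch of one EM iteration for an HMM with discrete emissions, assuming a single observation sequence. The E-step runs the forward-backward recursions to get the expected state occupancies (gamma) and expected transitions (xi) under the current parameters; the M-step re-estimates the initial, transition and emission probabilities from those expected counts. The function name and the toy parameters are hypothetical, and there is no numerical scaling, so this is only suitable for short sequences.

```python
def baum_welch_step(obs, pi, A, B):
    """One EM (Baum-Welch) iteration for a discrete-emission HMM.

    obs: list of symbols (ints in range(len(B[0])))
    pi:  pi[i] = P(first state = i)
    A:   A[i][j] = P(next state = j | state = i)
    B:   B[i][k] = P(symbol = k | state = i)
    Returns (new_pi, new_A, new_B, likelihood of obs under the input params).
    """
    T, N, M = len(obs), len(pi), len(B[0])
    # E-step, forward: alpha[t][i] = P(obs[0..t], state_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                      for i in range(N)])
    # E-step, backward: beta[t][i] = P(obs[t+1..] | state_t = i)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    like = sum(alpha[T - 1])
    # gamma[t][i] = P(state_t = i | obs): expected state occupancy
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(N)] for t in range(T)]
    # xi[t][i][j] = P(state_t = i, state_t+1 = j | obs): expected transitions
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: re-estimate parameters from the expected counts.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, like

# Hypothetical two-state model and a short binary sequence.
obs = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]
likes = []
for _ in range(20):
    pi, A, B, like = baum_welch_step(obs, pi, A, B)
    likes.append(like)
print(likes[0], likes[-1])
```

As with any EM procedure, the likelihood should not decrease from one iteration to the next, although it may converge to a local rather than global maximum.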
- lyle