2022-03-02#

Lead Scribe: Derek

Multi-Objective Optimization in Data Mining#

Presented by Chamudi

Intro#

  • Model quality is represented by an n-dimensional vector

    • n is the number of quality criteria to be optimized

  • 2 scenarios

    • Predictive tasks

      • Classification, regression, etc.

      • Which model is better?

        • An easily interpretable dozen-node decision tree with 92% test accuracy, or a non-interpretable hundred-node decision tree with 95%?

        • The answer depends on the problem at hand and what the user prefers

          • You may be after just accuracy, or you may want to better understand what’s going on

          • One may overfit/underfit, etc

          • Alex’s MNIST example → in another class, accuracy was the end goal, so the non-interpretable model would have been the better move in that instance

          • If a bank uses it, making mistakes may cost some money, but non-interpretability may cause a lawsuit

    • Attribute Selection

      • Select a subset of attributes that are relevant for the target data mining task

      • Maximize accuracy of the model, and minimize the number of selected attributes

      • Trade offs

        • Subset A is more accurate than B, but B has fewer attributes

Three Approaches for Coping with Multi-Objective Problems#

  • Transforming multi-objective to single-objective

    • Assign a weight to each objective and combine the values of the weighted criteria into a single value by adding or multiplying all the weighted criteria (see the sketch after this approach’s bullets)

      • Q = w1*c1 + w2*c2 + … + wn*cn, or

      • Q = c1^w1 * c2^w2 * … * cn^wn

    • A couple of scenarios

      • Rule induction algorithms for classification

      • Attribute selection algorithms for classification

    • Arguments For

      • Simplicity

      • Easy to use

    • Arguments Against

      • Setting of weights is ad-hoc

        • Each weight is “magic”, justified vaguely

        • Hard for the user to define the best setting of weights a priori, without knowing the results of the research

        • But you could basically brute force it and pick the best weights

      • Mixing different units of measurement

        • Model quality criteria have different scales and units of measurement

        • Can be fixed with normalization but only to a certain extent

      • Mixing apples/oranges

        • Non-commensurable criteria should not be added together

        • For example, 50k salary + 5 dependents is a meaningless sum
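
A minimal sketch of the weighted-formula idea (the function names and the min-max normalization choice are illustrative assumptions, not from the presentation): each criterion is normalized so the units become comparable, then the criteria are collapsed into one scalar with user-chosen weights. Choosing those weights a priori is exactly the ad-hoc part criticized above.

```python
def normalize(values):
    """Min-max normalize one criterion's raw values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                        # all candidates tie on this criterion
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def weighted_scores(candidates, weights):
    """candidates: list of criterion vectors (all to be maximized);
    weights: one weight per criterion. Returns one scalar per candidate."""
    columns = [normalize(col) for col in zip(*candidates)]  # per criterion
    rows = zip(*columns)                # back to per-candidate vectors
    return [sum(w * c for w, c in zip(weights, row)) for row in rows]

# Example: (accuracy, -num_attributes), so maximizing the second entry
# minimizes the number of selected attributes.
cands = [(0.92, -12), (0.95, -100), (0.90, -8)]
print(weighted_scores(cands, weights=[0.7, 0.3]))
```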

  • Lexicographic Approach

    • Assign different priorities to different objectives, then optimize them in order of priority

    • Compare candidates on the highest-priority objective first; if neither is significantly better than the other, fall back to the next-priority objective, and so on (see the sketch after this approach’s bullets)

    • Arguments For

      • Recognizes the non-commensurability of different quality criteria

        • Avoids the 3 drawbacks of the weighted-formula approach

      • Simpler and easier than Pareto

    • Arguments Against

      • Introduces a new ad-hoc parameter

        • Requires one to specify a tolerance threshold for each criterion
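
A minimal sketch of a lexicographic comparison, assuming all criteria are maximized (the names and threshold values are illustrative): a lower-priority objective is consulted only when the higher-priority one is tied within its tolerance threshold, which is the ad-hoc parameter noted above.

```python
def lex_compare(s1, s2, tolerances):
    """Return 1 if s1 wins, -1 if s2 wins, 0 if tied on every objective.
    s1, s2 are criterion vectors ordered from highest to lowest priority."""
    for a, b, tol in zip(s1, s2, tolerances):
        if a - b > tol:      # s1 significantly better on this objective
            return 1
        if b - a > tol:      # s2 significantly better
            return -1
        # difference within tolerance: tie here, consult the next objective
    return 0

# Accuracy first (tolerance 0.01), then an interpretability score.
print(lex_compare((0.95, 0.2), (0.92, 0.9), tolerances=(0.01, 0.0)))   # -> 1
print(lex_compare((0.925, 0.2), (0.92, 0.9), tolerances=(0.01, 0.0)))  # -> -1
```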

  • Pareto Approach

    • Use a multi-objective algorithm to solve the original multi-objective problem

    • Pareto Dominance

      • s1 Dominates s2 iff s1 is strictly better than s2 w.r.t. at least one of the objectives being optimized and s1 is not worse than s2 w.r.t. all criteria being optimized

      • there is some ci s.t. s1(ci) > s2(ci), and

      • for all ci, i = 1…k, s1(ci) >= s2(ci)

      • See figure 1 of the paper (a dominance sketch in code follows this section’s bullets)

    • Pareto vs Single Objective

      • Multi-objective returns a set of non-dominated solutions rather than a single solution

      • A multi-objective search should explore a wider area of the search space and keep track of the non-dominated solutions found, in order to find as many solutions as possible

      • Much more complex than the single-objective approach

    • Arguments For

      • Multiple runs of a single-objective optimization algorithm are inefficient and ineffective

      • Minimum description length principle comes with a price

        • How to encode a hypothesis and its data exceptions into bits of information

    • Arguments Against

      • Multiple runs of a single-objective optimization algorithm seems “enough”

      • Minimum description length principle seems “enough”
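
A minimal sketch of Pareto dominance and non-dominated filtering, assuming all criteria are maximized (function names are illustrative): dominates encodes the definition above, and pareto_front returns the set of non-dominated solutions that the Pareto approach reports instead of a single answer.

```python
def dominates(s1, s2):
    """True iff s1 is >= s2 on every criterion and > s2 on at least one."""
    return (all(a >= b for a, b in zip(s1, s2))
            and any(a > b for a, b in zip(s1, s2)))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of criterion vectors."""
    # dominates(s, s) is always False, so comparing s against the whole
    # list (itself included) is safe.
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# (accuracy, -num_attributes): (0.91, -12) is dominated by (0.92, -12).
cands = [(0.92, -12), (0.95, -100), (0.90, -8), (0.91, -12)]
print(pareto_front(cands))  # -> [(0.92, -12), (0.95, -100), (0.90, -8)]
```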

  • Moral of the story: Pareto and lexicographic are the way to go for multi-objective problems