CSE 142 UCSC Different Types of Machine Learning Programming Questions

Question Description

I'm working on a programming test / quiz prep and need guidance to help me study.

Please solve these problems for me. This is an introductory machine learning class, and I have barely any experience with ML before, so I am totally lost with the content. Thank you very much.

Unformatted Attachment Preview

CSE 142: Machine Learning, Winter 2021
Assignment #1 (v. 1)
Due Tuesday, January 26 by 23:59 PT

Notes:
§ This assignment is to be done individually. You may discuss the problems at a general level with others in the class (e.g., about the concepts underlying the question, or what lecture or reading material may be relevant), but the work you turn in must be solely your own.
§ Be sure to re-read the "Policy on Academic Integrity" on the course website.
§ Be aware of the late policy in the course syllabus, so turn in what you have by the due time.
§ Justify every answer you give – show the work that achieves the answer or explain your response.
§ Any updates or corrections will be posted on the Assignments page (of the course website), so check there occasionally.
§ To turn in your assignment: submit through Gradescope, and clearly indicate which question each part of your submission belongs to when you submit. If you don't do this correctly, the grader might see "no content is available".

Problem #1 [5 points]
Consider the problem of an adult learning to speak and understand a foreign language. Explain how this process fits into the general learning model (Fig. 3 in the textbook) – i.e., describe the domain objects, training data, model, learning algorithm, and output for this scenario. Discuss what kind(s) of learning takes place.

Problem #2 [6 points]
You are asked to build a machine learning system to estimate someone's blood pressure (two numbers: systolic and diastolic; consider them to be real-valued) based on the following inputs: the patient's sex, age, weight, average grams of fat consumed per day, number of servings of red meat per week, servings of fruits and vegetables per day, smoker or non-smoker. You are given a training data set of values for all of these variables and the blood pressure numbers for 10,000 patients. Answer (and explain) the following questions:
(a) What kind of machine learning problem is this?
(b) Is it a predictive task or a descriptive task?
(c) Are you likely to use a geometric model, a probabilistic model, or a logical model?
(d) Will your model be a grouping model or a grading model?
(e) What is the label space for this problem?
(f) What is the output space for this problem?

Problem #3 [8 points]
We (simplistically) describe a basketball player's value in terms of the following statistics:

  Statistic                 Stephen Curry (x1)   James Harden (x2)
  Minutes played per game   33.6                 36.7
  Points scored per game    30.6                 27.0
  Rebounds per game         4.8                  4.7
  Assists per game          6.8                  11.3
  Steals per game           1.22                 1.0
  Fouls per game            2.11                 1.67
  Turnovers per game        3.00                 3.83

Treating the statistics for each player (x1 and x2) as a feature vector, what is the distance between them, measured in terms of (a) L1 distance, (b) L2 distance, (c) L10 distance, (d) L100 distance?
(e) If a constant vector v = [5 5 2 2 0.5 0.1 1]^T is added to both x1 and x2, which (if any) of L1, L2, L10, or L100 will change?
(f) If x1 and x2 are multiplied by a constant k, which (if any) of L1, L2, or L10 will change?
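Problem 3 uses the family of Lp (Minkowski) distances, defined for vectors x and y as d_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p); L1 sums the absolute differences, L2 is the ordinary Euclidean distance, and for large p the sum is dominated by the single largest coordinate difference. The short Python sketch below is not part of the handout; it assumes NumPy is available and simply shows how such distances can be computed for the two feature vectors in the table above.

    import numpy as np

    # Feature vectors from the Problem 3 table (minutes, points, rebounds,
    # assists, steals, fouls, turnovers per game).
    x1 = np.array([33.6, 30.6, 4.8, 6.8, 1.22, 2.11, 3.00])  # Stephen Curry
    x2 = np.array([36.7, 27.0, 4.7, 11.3, 1.0, 1.67, 3.83])  # James Harden

    def lp_distance(x, y, p):
        """Minkowski (Lp) distance: (sum_i |x_i - y_i|^p)^(1/p)."""
        return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

    for p in [1, 2, 10, 100]:
        print(f"L{p} distance: {lp_distance(x1, x2, p):.4f}")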
Problem #4 [12 points]
The joint probability distribution of three variables (class, grade, and effort) can be computed from the following table, which shows the number of students in each bin:

                 class = 165B                      class = basketweaving
  grade   effort=Small   Medium   Large     effort=Small   Medium   Large
  A       0              25       100       50             100      150
  B       25             50       75        50             50       25
  C       25             50       25        50             25       0
  D       50             20       5         0              0        0
  F       50             0        0         0              0        0

(a) What is the conditional probability distribution P(grade | class, effort)?
(b) What is the marginal probability distribution P(grade, effort)?
(c) What is the marginal probability distribution P(effort)?
(d) What is P(grade=A | class)?
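Problem 4 turns a table of counts into probability distributions: dividing each count by the total number of students gives the joint P(class, grade, effort), summing out variables gives marginals, and dividing a joint by the appropriate marginal gives conditionals, e.g. P(grade | class, effort) = P(class, grade, effort) / P(class, effort). The Python sketch below (assuming NumPy; the count table is copied from the problem) illustrates the mechanics and is meant as a check on hand calculations rather than a substitute for them.

    import numpy as np

    # counts[c, g, e]: number of students with class c, grade g, effort e.
    # Classes: 0 = 165B, 1 = basketweaving; grades A..F; effort Small, Medium, Large.
    counts = np.array([
        [[0, 25, 100], [25, 50, 75], [25, 50, 25], [50, 20, 5], [50, 0, 0]],   # 165B
        [[50, 100, 150], [50, 50, 25], [50, 25, 0], [0, 0, 0], [0, 0, 0]],     # basketweaving
    ], dtype=float)

    joint = counts / counts.sum()            # P(class, grade, effort)

    p_effort = joint.sum(axis=(0, 1))        # P(effort): sum out class and grade
    p_grade_effort = joint.sum(axis=0)       # P(grade, effort): sum out class
    p_class_effort = joint.sum(axis=1)       # P(class, effort): sum out grade

    # P(grade | class, effort) = P(class, grade, effort) / P(class, effort)
    p_grade_given = joint / p_class_effort[:, None, :]

    print("P(effort) =", p_effort)
    print("P(grade | class=165B, effort=Large) =", p_grade_given[0, :, 2])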
Problem #5 [12 points]
There are 100,000 emails used to train a spam detection system – 5,000 of them are spam and the rest are non-spam. To test the system, you have 10,000 emails – 2,000 spam and 8,000 non-spam – in your test set. The results of the test are as follows: 250 of the spam emails are classified as non-spam, and the rest are classified as spam; 250 of the non-spam emails are classified as spam, and the rest are classified as non-spam.
(a) Show the contingency table for this binary classification experiment. Label it clearly and fill out the table entries.
(b) What is the false positive rate of the system in this experiment?
(c) What is the false negative rate?
(d) What is the error rate?
(e) What is the precision?
(f) What is the accuracy?
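With spam taken as the positive class, Problem 5 reduces to four counts: true positives, false negatives, false positives, and true negatives, from which all the requested rates follow. The plain-Python sketch below (counts taken from the problem statement) spells out the standard definitions; treat it as a reference for the formulas rather than as the graded write-up.

    # Counts from the problem statement, with spam = positive class.
    TP = 2000 - 250   # spam correctly classified as spam
    FN = 250          # spam classified as non-spam
    FP = 250          # non-spam classified as spam
    TN = 8000 - 250   # non-spam correctly classified as non-spam

    total = TP + FN + FP + TN

    false_positive_rate = FP / (FP + TN)   # fraction of negatives flagged as positive
    false_negative_rate = FN / (FN + TP)   # fraction of positives that were missed
    error_rate = (FP + FN) / total
    accuracy = (TP + TN) / total
    precision = TP / (TP + FP)

    print(f"FPR={false_positive_rate:.4f}  FNR={false_negative_rate:.4f}  "
          f"error={error_rate:.4f}  precision={precision:.4f}  accuracy={accuracy:.4f}")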
Problem #6 [8 points]
A ranking classifier ranks 25 training examples {xi}, from highest to lowest rank, in the following order:

  Highest ... Lowest:
  x2, x3, x1, x5, x13, x6, x8, x7, x9, x10, x12, x11, x15, x4, x14, x21, x17, x20, x18, x22, x16, x19, x25, x23, x24

Examples x1 through x12 are in the positive class (which should be ranked higher); examples x13 through x25 are in the negative class (which should be ranked lower).
(a) How many ranking errors are there?
(b) What is the ranking error rate?
(c) What is the ranking accuracy?
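For a ranking classifier, a ranking error is a positive-negative pair in which the negative example is ranked above the positive one (ties, if any, are usually counted as half an error), and the ranking error rate divides the number of such pairs by the total number of positive-negative pairs. The plain-Python sketch below, with the ranking and class labels copied from the problem, counts these pairs directly.

    # Ranking from highest to lowest, as given in Problem 6.
    ranking = [2, 3, 1, 5, 13, 6, 8, 7, 9, 10, 12, 11, 15, 4, 14,
               21, 17, 20, 18, 22, 16, 19, 25, 23, 24]
    positives = set(range(1, 13))    # x1..x12 are positive
    negatives = set(range(13, 26))   # x13..x25 are negative

    # Count positive-negative pairs where the negative is ranked above the positive.
    errors = 0
    for i, higher in enumerate(ranking):
        for lower in ranking[i + 1:]:
            if higher in negatives and lower in positives:
                errors += 1

    total_pairs = len(positives) * len(negatives)
    print(f"ranking errors = {errors} out of {total_pairs} pairs")
    print(f"error rate = {errors / total_pairs:.4f}, accuracy = {1 - errors / total_pairs:.4f}")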
Problem #7 [14 points]
The figure below shows training data with two features, with each example labeled as being in the positive (filled-in points) or negative (open points) class. Proposed linear discriminant functions (C1 through C6) are shown as dotted lines, each one indicating a different classifier for this data. Each classifier classifies points to the upper-right of its dotted line as positive and points to the lower-left of its dotted line as negative.

[Figure not reproduced in this preview: a scatter of positive and negative points in two-feature space with six dotted discriminant lines labeled C1 through C6.]

(a) Draw the coverage plot for this data and plot the different classifiers (and label them as C1, C2, etc.).
(b) Draw the ROC plot and label the classifiers on the plot.
(c) Which classifiers have the highest and lowest accuracy?
(d) Which classifiers have the highest and lowest precision?
(e) Which classifiers have the highest and lowest recall?
(f) Which classifiers (if any) are complete?
(g) Which classifiers (if any) are consistent?
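A coverage plot places each classifier at its raw (false positive, true positive) counts, while an ROC plot rescales these to the rates FPR = FP/Neg and TPR = TP/Pos; in Flach's terminology a classifier is complete when it classifies every positive example correctly (TPR = 1) and consistent when it produces no false positives (FPR = 0). Because the figure's data points are not reproduced in this preview, the matplotlib sketch below uses clearly hypothetical (TP, FP) counts purely to show how the two plots can be drawn; substitute the counts you read off the actual figure.

    import matplotlib.pyplot as plt

    # Hypothetical totals and per-classifier counts: replace with values
    # read off the Problem 7 figure.
    num_pos, num_neg = 10, 10
    counts = {  # classifier -> (true positives, false positives)
        "C1": (3, 0), "C2": (5, 1), "C3": (7, 2),
        "C4": (8, 4), "C5": (9, 6), "C6": (10, 8),
    }

    fig, (ax_cov, ax_roc) = plt.subplots(1, 2, figsize=(10, 4))
    for name, (tp, fp) in counts.items():
        ax_cov.plot(fp, tp, "o")                       # coverage plot: raw counts
        ax_cov.annotate(name, (fp, tp))
        ax_roc.plot(fp / num_neg, tp / num_pos, "o")   # ROC plot: rates
        ax_roc.annotate(name, (fp / num_neg, tp / num_pos))

    ax_cov.set(xlabel="False positives", ylabel="True positives", title="Coverage plot")
    ax_roc.set(xlabel="False positive rate", ylabel="True positive rate", title="ROC plot")
    ax_roc.plot([0, 1], [0, 1], "--")   # random-guessing diagonal
    plt.tight_layout()
    plt.show()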
The attachment preview also includes the front matter of the course textbook:

MACHINE LEARNING: The Art and Science of Algorithms that Make Sense of Data
Peter Flach, Cambridge University Press, 2012.
ISBN 978-1-107-09639-4 (hardback), ISBN 978-1-107-42222-3 (paperback).
Additional resources for this publication at www.cs.bris.ac.uk/home/flach/mlbook

Back-cover description: As one of the most comprehensive machine learning texts around, this book does justice to the field's incredible richness, but without losing sight of the unifying principles. Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. He covers a wide range of logical, geometric and statistical models, and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features. The Prologue and Chapter 1 are freely available online; the use of established terminology is balanced with the introduction of new and useful concepts; well-chosen examples and illustrations form an integral part of the text; boxes summarise relevant background material and provide pointers for revision; each chapter concludes with a summary and suggestions for further reading; and a list of "Important points to remember" is included at the back of the book together with an extensive index.

Brief contents:
Prologue: A machine learning sampler
1. The ingredients of machine learning
2. Binary classification and related tasks
3. Beyond binary classification
4. Concept learning
5. Tree models
6. Rule models
7. Linear models
8. Distance-based models
9. Probabilistic models
10. Features
11. Model ensembles
12. Machine learning experiments
Epilogue: Where to go from here

The preview continues with the book's detailed table of contents and the opening of the preface, which is cut off mid-sentence: "This book started life in the Summer of 2008, when my employer, the University of Bristol, awarded me a one-year research fellowship. I decided to embark on writing a general introduction to machine learning, for two reasons. One was that there was scope for such a book, to complement the many mor ..."

