Healthcare Analytics: Unstructured Classifiers
Trần Thế Truyền
Center for Pattern Recognition and Data Analytics
Deakin University, Australia
Email: truyen.tran@deakin.edu.au
URL: truyen.vietlabs.com
HUST, VN, Dec 2013

Introduction to Machine Learning
ML is the study of computer programs that learn from experience to perform better.
What are the fundamental laws that govern all learning processes?
Examples:
Spam filters in Gmail, Yahoo! Mail
Automated diagnoses
Personalized X (X = Medicine, Search, etc.)

Machine Learning Types
Supervised: corrections (the correct outputs) are given
Unsupervised: the program learns by itself, without given outputs
Reinforcement: the program learns from continuous rewards
Indeed, there are many more ...

What is an unstructured classifier?
A classifier is a system that takes an input and outputs a class label (ideally the correct one).
E.g., ABZBCSX -> “Bad pattern”
Thus the output is discrete.
The input can be anything: a vector, an object, a graph, etc.
Unstructured means there are no statistical relations between the data points; each (input, output) pair is treated independently of the others.
This is also known as “i.i.d.” (independently and identically distributed).

Risk prediction/stratification
Readmission
Death
Toxicity
Stress
Quality of life
Progression to advanced stages
Length of stay
Side effects
Suicide attempts

Setting
Training phase:
{Input, output} pairs are given
Learn a mapping function from input to output
Control for overfitting
Testing phase:
Map new inputs to outputs
Evaluate the prediction quality
[Figure: input (features), output (classes), decision boundaries]

K-Nearest Neighbors
K-nearest neighbours (k-NN): “Find me a patient like that”
Algorithm:
Given a vector representing a patient, compute its similarity to every vector in the training data
Choose the k most similar vectors (the neighbours) and count their outputs for each class label
Predict the class as the most popular label in the neighbourhood
This costs N operations for every test data point (N is the number of training data points)
So it is expensive if not done right!

k-NN (continued)
Properties:
Simple, but has strong theoretical generalization properties.
Often very competitive against more complex classifiers.
Should always be the first baseline that new methods are evaluated against.
Questions?
How to compute similarity?

Logistic Regression
Multivariate input, binary outcomes
[Figure: model parameters]
Logistic regression: regularized learning

Decision Trees
Data partitioning
Decision rules

Support Vector Machines
Kernel
Kernels for SVM: implicitly map the input into a feature space where a decision boundary that is nonlinear in the input space becomes linear

Naïve Bayes
Classes: {‘Heart Attack’, ‘Acid Reflux’, ‘Healthy’}
Features: {chest pain, hypertension, age = 50, male}

Others
Bagging
Random Forests
AdaBoost
Gradient Boosted Machines
MaxEnt
MART
Generalized Linear Models
Generalized Additive Models

Overfitting and Variable Selection
Overfitting
[Figure: training error vs. testing error]
Variable selection
Forward selection
Backward elimination
Lasso
[Figure: Ridge / Gaussian / L2-norm vs. Lasso / Laplace / L1-norm]

Performance Measures
Recall
Precision
F-score

Performance Measures (2)
Specificity
Sensitivity

Performance Measures (3)
ROC curve
Area under the ROC curve (AUC)
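The k-NN algorithm on the slides can be sketched in a few lines of Python. This is a minimal brute-force sketch. It assumes Euclidean distance as the (dis)similarity measure and a plain majority vote. The function names `euclidean` and `knn_predict` and the toy data are hypothetical and do not come from the slides.

```python
import heapq
import math
from collections import Counter

# Illustrative sketch; function names and data are hypothetical.


def euclidean(a, b):
    """Euclidean distance between two feature vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote among its k nearest neighbours.

    Brute force: one distance per training point, i.e. N operations per query.
    """
    neighbours = heapq.nsmallest(
        k, zip(train_X, train_y), key=lambda pair: distance(pair[0], query)
    )
    # Count the neighbours' labels and return the most common one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features per patient.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["low risk"] * 3 + ["high risk"] * 3

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # low risk
print(knn_predict(train_X, train_y, (5.0, 4.8), k=3))  # high risk
```

Passing a different `distance` function is one way to answer the slide's question of how to compute similarity. The distance computed against every training point is the per-query cost of N operations that the slide warns about.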
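The logistic regression slides lost their equations in extraction. What follows is a hedged sketch under standard assumptions, not necessarily the exact formulation the slides used. The model is p(y = 1 | x) = σ(b + wᵀx). Regularized learning minimizes the mean log-loss plus (λ/2)‖w‖², here by plain batch gradient descent. `fit_logistic_l2`, its hyperparameters, and the data are all illustrative.

```python
import math

# Illustrative sketch; function names, hyperparameters and data are hypothetical.


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """p(y = 1 | x) = sigmoid(b + w . x)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Averaged data gradient plus the L2 penalty gradient lam * w.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical 1-D toy data: a single risk score, binary outcome.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

w_weak, b_weak = fit_logistic_l2(X, y, lam=0.1)
w_strong, b_strong = fit_logistic_l2(X, y, lam=1.0)
print(w_weak, w_strong)  # stronger regularization gives a smaller weight
print(predict_proba(w_weak, b_weak, [0.0]), predict_proba(w_weak, b_weak, [5.0]))
```

This toy data is perfectly separable, so without the penalty the weight would grow without bound. A larger `lam` pulls the weights toward zero, which is how regularization controls overfitting.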
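The decision tree slides name “data partitioning” and “decision rules”. Below is a minimal sketch of a classification tree that greedily picks axis-aligned splits by Gini impurity. Gini is a CART-style criterion; the slides do not say which criterion they use. All names and the two-feature toy data (age, systolic blood pressure) are hypothetical.

```python
from collections import Counter

# Illustrative sketch; function names and data are hypothetical.


def gini(labels):
    """Gini impurity: 0 when all labels agree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (impurity, feature, threshold) of the split with the lowest weighted Gini."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, max_depth=3):
    """Recursively partition the data; each leaf predicts its majority label."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:  # all rows identical: nothing left to partition
        return majority
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }


def predict(tree, x):
    """Follow the decision rules from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical patients: (age, systolic blood pressure) -> readmitted?
X = [[45, 120], [50, 130], [62, 150], [70, 160], [38, 110], [66, 145]]
y = ["no", "no", "yes", "yes", "no", "yes"]

tree = build_tree(X, y)
print(tree)  # {'feature': 0, 'threshold': 50, 'left': 'no', 'right': 'yes'}
print(predict(tree, [55, 140]))  # yes
```

Each internal node is one decision rule (“age <= 50?”), and the path from the root to a leaf is the partition of the data that the patient falls into.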
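The kernel idea from the SVM slides can be checked numerically without an SVM solver. The sketch below assumes 2-D inputs. It shows that the degree-2 polynomial kernel (x·z)² equals an ordinary dot product after the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). It also shows that a circular, nonlinear boundary in the input space becomes a linear rule in that feature space. The sketch does not train an SVM, and the helper names are hypothetical.

```python
import math

# Illustrative sketch; helper names and points are hypothetical.


def phi(x):
    """Explicit degree-2 feature map for 2-D input: phi(x) . phi(z) == (x . z) ** 2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the input space without building phi."""
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # equal up to floating-point rounding

# The circle x1^2 + x2^2 = 1 is a nonlinear boundary in the input space,
# but in feature space it is the linear rule w . phi(x) = 1 with w = (1, 0, 1).
w = (1.0, 0.0, 1.0)
inside, outside = (0.3, 0.4), (1.5, 0.5)
print(dot(w, phi(inside)) < 1.0, dot(w, phi(outside)) > 1.0)  # True True
```

An SVM only ever needs the kernel values k(x, z). This is why a kernel such as the RBF can be used even though its feature space is never built explicitly.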
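Here is a minimal Bernoulli naïve Bayes sketch for the slide's example, using Laplace smoothing. It assumes every feature is binary, so “age = 50” is recoded here as a hypothetical `age_50_plus` indicator. The training records are made up purely to exercise the code and are not clinical data.

```python
import math
from collections import Counter

# Illustrative sketch; names and records are hypothetical.
# Binary features, in this order (age_50_plus is an assumed recoding of "age = 50").
FEATURES = ["chest_pain", "hypertension", "age_50_plus", "male"]


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(class) and P(feature = 1 | class) with Laplace smoothing alpha."""
    class_counts = Counter(y)
    ones = {c: [0] * len(FEATURES) for c in class_counts}
    for x, c in zip(X, y):
        for j, v in enumerate(x):
            ones[c][j] += v
    priors = {c: k / len(y) for c, k in class_counts.items()}
    probs = {
        c: [(ones[c][j] + alpha) / (class_counts[c] + 2 * alpha) for j in range(len(FEATURES))]
        for c in class_counts
    }
    return priors, probs


def posterior(priors, probs, x):
    """P(class | x), assuming the features are independent given the class."""
    log_scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for p, v in zip(probs[c], x):
            s += math.log(p if v else 1.0 - p)
        log_scores[c] = s
    # Normalize in log space to avoid underflow when there are many features.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {c: math.exp(s - top) / z for c, s in log_scores.items()}


# Made-up training records: (chest_pain, hypertension, age_50_plus, male).
X = [
    (1, 1, 1, 1), (1, 1, 1, 0), (1, 0, 1, 1),  # Heart Attack
    (1, 0, 0, 0), (1, 0, 0, 1), (1, 0, 1, 0),  # Acid Reflux
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 0, 0, 1),  # Healthy
]
y = ["Heart Attack"] * 3 + ["Acid Reflux"] * 3 + ["Healthy"] * 3

priors, probs = fit_bernoulli_nb(X, y)
post = posterior(priors, probs, (1, 1, 1, 1))  # the slide's patient
print(max(post, key=post.get), round(post["Heart Attack"], 3))
```

The “naïve” part is the product over features inside `posterior`. Each symptom contributes independently given the class, which keeps the model cheap to fit even with many features.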
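Forward selection is a greedy search. It starts with no variables and repeatedly adds whichever one most improves a validation score, stopping when no remaining variable helps. The sketch below assumes the score is the held-out accuracy of a 1-nearest-neighbour classifier, which also mirrors the training/testing split on the Setting slide. `forward_selection`, `holdout_accuracy`, and the toy data are hypothetical. The second feature is deliberately noisy so that selection leaves it out.

```python
# Illustrative sketch; function names and data are hypothetical.


def holdout_accuracy(features, train_X, train_y, test_X, test_y):
    """Test-set accuracy of a 1-nearest-neighbour classifier that uses only `features`."""

    def dist2(a, b):
        return sum((a[j] - b[j]) ** 2 for j in features)

    correct = 0
    for x, label in zip(test_X, test_y):
        nearest = min(range(len(train_X)), key=lambda i: dist2(train_X[i], x))
        correct += int(train_y[nearest] == label)
    return correct / len(test_y)


def forward_selection(candidates, score):
    """Greedily add the feature that most improves score(selected); stop when none helps."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = {f: score(selected + [f]) for f in remaining}
        f_best = max(trials, key=trials.get)
        if trials[f_best] <= best:
            break  # no remaining feature improves the validation score
        selected.append(f_best)
        best = trials[f_best]
        remaining.remove(f_best)
    return selected, best


# Hypothetical data: feature 0 carries the signal, feature 1 is large-scale noise.
train_X = [(0.1, 9.0), (0.2, 1.0), (0.9, 8.0), (0.8, 2.0)]
train_y = [0, 0, 1, 1]
test_X = [(0.12, 2.5), (0.88, 8.4)]
test_y = [0, 1]

selected, acc = forward_selection(
    [0, 1], lambda feats: holdout_accuracy(feats, train_X, train_y, test_X, test_y)
)
print(selected, acc)  # [0] 1.0
```

Backward elimination is the mirror image: start from all variables and repeatedly drop the one whose removal hurts the score least. Scoring on held-out data rather than training data is what guards against overfitting here.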
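Why the Lasso performs variable selection while Ridge does not is easiest to see for an orthonormal design (XᵀX = I). That is an assumption made only for this illustration. Let b be the least-squares coefficients. Minimizing ½‖y − Xw‖² + λ‖w‖₁ then gives the soft threshold w_j = sign(b_j)·max(|b_j| − λ, 0). Minimizing ½‖y − Xw‖² + (λ/2)‖w‖² gives w_j = b_j / (1 + λ). The coefficients below are made up.

```python
# Illustrative sketch under the orthonormal-design assumption; values are hypothetical.


def soft_threshold(b, lam):
    """Lasso (L1) solution for one coefficient when X^T X = I."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Ridge (L2) solution for one coefficient when X^T X = I."""
    return b / (1.0 + lam)


ols = [3.0, -0.4, 0.05, 1.2]  # hypothetical least-squares coefficients
lam = 0.5
lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
print(lasso)
print(ridge)
```

The Lasso sets the two small coefficients exactly to zero, which drops those variables. Ridge only shrinks every coefficient toward zero. The same contrast corresponds to the Laplace (L1) versus Gaussian (L2) labels on the slide.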
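The measures on the Performance Measures slides come from the confusion-matrix counts:

- precision = TP/(TP+FP)
- recall = sensitivity = TP/(TP+FN)
- specificity = TN/(TN+FP)
- F-score (F1) = the harmonic mean of precision and recall

Below is a minimal sketch. Returning 0.0 when a denominator is zero is a convention assumed here. The function names and labels are hypothetical.

```python
# Illustrative sketch; function names and labels are hypothetical.


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the given positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(a, b):
    """Ratio a / b, or 0.0 when b is zero (an assumed convention)."""
    return a / b if b else 0.0


def performance_measures(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # identical to sensitivity
    specificity = safe_div(tn, tn + fp)
    f_score = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


# Made-up labels: 1 = event (e.g. readmission), 0 = no event.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 4)
print(performance_measures(y_true, y_pred))
```

In clinical risk prediction, sensitivity and specificity are usually reported as a pair. Precision and recall are more informative when the event is rare and true negatives dominate.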
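The ROC curve plots the true-positive rate against the false-positive rate as the decision threshold sweeps over the classifier's scores. AUC is the area under that curve. The sketch below computes the curve and its trapezoidal area. It then checks the result against the equivalent rank interpretation: the probability that a random positive scores higher than a random negative, with ties counted as ½. It assumes 0/1 labels with both classes present. Function names and scores are hypothetical.

```python
# Illustrative sketch; function names and scores are hypothetical.


def roc_points(y_true, scores):
    """(FPR, TPR) points from sweeping the threshold down through the scores.

    Assumes labels are 0/1 and both classes are present.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(y_true, scores):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up risk scores for six patients (1 = event happened).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(y_true, scores))
print(auc_trapezoid(roc_points(y_true, scores)), auc_rank(y_true, scores))  # both about 0.889
```

AUC summarizes ranking quality across all thresholds. A score of 0.5 corresponds to random ordering, and 1.0 means every positive is ranked above every negative.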
Attached file:
- healthcare_analytics_unstructured_classifiers.pptx