Course Intro

Machine Learning Theory and Practice 2022. 5. 4. 13:43

In this course we will cover the following topics and their mathematical principles.

Supervised Learning
- Linear Regression
- Logistic Regression
- Decision Tree
- Ensemble methods
- Support Vector Machines
Unsupervised Learning
- Clustering: Hierarchical Clustering, DBSCAN, K-Means, GMM
- Dimensionality Reduction: PCA

Definition of AI, ML, DL

AI: "smart computer", or "intelligent computer" / Rule-based approaches vs Learning-based approaches

ML: "a machine learns" / Supervised learning vs Unsupervised learning vs Reinforcement learning

Supervised Learning: Regression problem vs Classification problem

Purpose:

To find the best relationship between the IVs (independent variables) and the DV (dependent variable) =

finding the optimal values of the parameters =

minimizing the errors of the function =

minimizing the value of the cost function (MSE for regression, cross-entropy for classification)

Two ways to minimize MSE: 1) normal equation 2) gradient descent (assumption: the cost function is convex)
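A minimal sketch of the two ways to minimize MSE, using linear regression on synthetic data. The data, the learning rate and the number of iterations are assumptions made for illustration and do not come from the course material. Because MSE is convex for a linear model, both approaches should arrive at nearly the same parameters.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=(100, 1))
y = 4 + 3 * x[:, 0] + rng.normal(0, 0.1, size=100)

# Add a column of ones so the intercept is learned as a parameter
X = np.c_[np.ones(len(x)), x]


def mse(theta):
    """Mean squared error of the linear model X @ theta."""
    return np.mean((X @ theta - y) ** 2)


# 1) Normal equation: solve (X^T X) theta = X^T y in one step
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Gradient descent: repeatedly step against the gradient of the MSE
theta_gd = np.zeros(2)
learning_rate = 0.1  # hypothetical value, chosen by hand
for _ in range(5000):
    gradient = 2 / len(y) * X.T @ (X @ theta_gd - y)
    theta_gd -= learning_rate * gradient

print("normal equation :", theta_ne)
print("gradient descent:", theta_gd)
print("MSE at solution :", mse(theta_ne))
```

The normal equation gives the answer directly but needs to solve a linear system whose size grows with the number of features. Gradient descent only needs the gradient, so it also applies to cost functions that have no closed-form solution.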

Overall Procedure:

1. Data preparation: put the data into a numpy array or a pandas DataFrame

* to use string-valued (categorical) variables, encode them with "OrdinalEncoder" or "pd.get_dummies" here

* to handle an imbalanced classification problem, use a "SMOTE"-related method (resample the training data only, so that synthetic samples do not leak into the test data)

2. Splitting data: split the data into training data and test data

3. Normalization: apply feature scaling if necessary

4. Learning: fit the model on the training data to get the optimal parameter values

* to prevent overfitting, use the "Lasso" or "Ridge" regularization method here

* to handle an imbalanced classification problem, use a class "weight"-related method here

* for hyperparameter tuning, use "cross validation" or "grid search" here

5. Evaluating: check the performance of the model on the test data

6. Improving the model: revise the model to increase its performance
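Steps 1 to 5 above can be sketched with pandas and scikit-learn roughly as follows. This is only an illustration under several assumptions: the data is synthetic, the "region" column is a hypothetical string-valued variable, logistic regression stands in for whichever model is being trained, and class weights are used for the imbalance instead of SMOTE (which lives in the separate imbalanced-learn package).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data preparation: synthetic, imbalanced data (roughly 90% / 10%)
X_num, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)
df = pd.DataFrame(X_num, columns=[f"x{i}" for i in range(5)])
# Hypothetical string-valued column, one-hot encoded with pd.get_dummies
df["region"] = ["north", "south", "east", "west"] * 250
X = pd.get_dummies(df, columns=["region"], dtype=float)

# 2. Splitting data: stratify keeps the class ratio equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 3. Normalization: fit the scaler on the training data only
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# 4. Learning: LogisticRegression applies an L2 (Ridge-style) penalty by
#    default; class_weight="balanced" reweights the rare class; the grid
#    search tries each C (inverse regularization strength) with 5-fold CV
model = LogisticRegression(class_weight="balanced", max_iter=1000)
search = GridSearchCV(
    model, param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5, scoring="f1"
)
search.fit(X_train_s, y_train)

# 5. Evaluating: check performance on the held-out test data
y_pred = search.predict(X_test_s)
print("best C:", search.best_params_["C"])
print(classification_report(y_test, y_pred))
```

Step 6 then means going back to the earlier steps (for example a different encoding, a wider grid or another model) and comparing the test results again.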

동산