Course Intro (Machine Learning Theory and Practice, 2022. 5. 4. 13:43)
In this course we will learn the following topics and their mathematical principles:
- Supervised Learning
  - Linear Regression
  - Logistic Regression
  - Decision Tree
  - Ensemble methods
  - Support Vector Machines
- Unsupervised Learning
  - Clustering: Hierarchical Clustering, DBSCAN, K-Means, GMM
  - Dimension Reduction: PCA
Definition of AI, ML, DL
AI: a "smart" or "intelligent" computer; approaches are either rule-based or learning-based
ML: "a machine learns"; divided into supervised learning, unsupervised learning, and reinforcement learning
Supervised Learning: regression problems (continuous target) vs classification problems (categorical target)
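To make the regression vs classification distinction concrete, here is a minimal sketch using scikit-learn's LinearRegression and LogisticRegression on a made-up toy dataset (the data values are hypothetical, chosen only for illustration). The inputs are identical; only the type of the dependent variable changes, from a continuous number to a class label.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up toy data: one independent variable x.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Regression: the dependent variable is a continuous number.
y_continuous = np.array([1.9, 4.1, 6.0, 8.1, 9.9, 12.0])
reg = LinearRegression().fit(X, y_continuous)
reg_pred = reg.predict(np.array([[7.0]]))[0]

# Classification: the dependent variable is a class label (0 or 1).
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_class)
clf_pred = clf.predict(np.array([[7.0]]))[0]

print(reg_pred)  # a real number (about 14.0 for this data)
print(clf_pred)  # a class label (1 for this data)
```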
Purpose:
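To find the best relationship between the independent variables (IVs) and the dependent variable (DV) =
finding the optimal parameter values =
minimizing the errors of the model =
minimizing the value of the cost function (MSE for regression, cross-entropy for classification)
Two ways to minimize MSE: 1) the normal equation (closed-form solution) 2) gradient descent (assumes the cost function is convex, so the minimum it converges to is the global one; MSE for linear regression is convex)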
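The two ways of minimizing MSE can be compared in a short NumPy sketch, assuming a linear model with an intercept and randomly generated data (the true parameters 3 and 2, the learning rate, and the iteration count are arbitrary choices for illustration). The normal equation solves theta = (X^T X)^(-1) X^T y directly, while gradient descent repeatedly steps against the gradient (2/n) X^T (X theta - y).

```python
import numpy as np

# Hypothetical data: y = 3 + 2x plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(0, 0.5, size=100)

# Design matrix: a column of ones (intercept) next to x.
X = np.column_stack([np.ones_like(x), x])

def mse(theta):
    """Mean squared error of the linear model X @ theta."""
    return np.mean((X @ theta - y) ** 2)

# 1) Normal equation: solve (X^T X) theta = X^T y directly
#    (np.linalg.solve avoids forming the inverse explicitly).
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Gradient descent: step against the MSE gradient (2/n) X^T (X theta - y).
theta_gd = np.zeros(2)
learning_rate = 0.01
for _ in range(20_000):
    gradient = (2 / len(y)) * X.T @ (X @ theta_gd - y)
    theta_gd -= learning_rate * gradient

print("normal equation :", theta_normal, mse(theta_normal))
print("gradient descent:", theta_gd, mse(theta_gd))
```

Because the MSE of a linear model is convex, both approaches land on numerically the same parameters. The normal equation is exact but requires solving a d x d system (d = number of parameters), while gradient descent scales better to many features and carries over to models without a closed-form solution, such as logistic regression.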
Overall Procedure:
1. Data preparation: load the data into a NumPy array or a pandas DataFrame
* to use categorical (string) variables, encode them here with "OrdinalEncoder" or "pd.get_dummies"
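* to handle an imbalanced classification problem, a "SMOTE"-related oversampling method can be used (apply it to the training data only, after step 2, so synthetic samples do not leak into the test data)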
2. Splitting data: split the data into training data and test data
3. Normalization: scale the features if necessary (fit the scaler on the training data only, then apply it to both sets)
4. Learning: fit the model to obtain the optimal parameter values
* to prevent overfitting, use "Lasso" (L1) or "Ridge" (L2) regularization here
* to handle an imbalanced classification problem, use a class "weight" method here (weighting the minority class more heavily in the cost function)
* for hyperparameter tuning, use "cross validation", e.g. combined with "grid search", here
5. Evaluating: check the model's performance on the test data
6. Improving the model: if the performance is not good enough, revise the model (e.g. features, hyperparameters, or algorithm) and repeat the steps above
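The procedure above can be sketched end to end with pandas and scikit-learn. This is a minimal sketch on a randomly generated, hypothetical dataset (the column names "age", "income", "city" and the labeling rule are made up). Because SMOTE lives in the separate imbalanced-learn package, the sketch handles imbalance with class_weight="balanced" instead, and it puts the scaler inside a Pipeline so that cross validation fits it on the training folds only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data preparation (hypothetical, randomly generated data):
#    two numeric features, one categorical feature, and a binary target
#    produced by a made-up rule so that label 1 is the minority class.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "age": rng.integers(20, 70, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
    "city": rng.choice(["Seoul", "Busan", "Daegu"], size=n),
})
score = 0.05 * (df["age"] - 45) + (df["income"] - 50_000) / 10_000
df["target"] = (score + rng.normal(0, 1, size=n) > 1.5).astype(int)

# Encode the categorical (string) column as one-hot dummy columns.
X = pd.get_dummies(df.drop(columns="target"), columns=["city"], dtype=float)
y = df["target"]

# 2. Splitting: stratify so train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 3. + 4. Normalization and learning in one Pipeline, so the scaler is
#    fit only on the data each model is trained on.
#    class_weight="balanced" is a "weight" method for imbalance;
#    C is the inverse strength of the default L2 (Ridge-style) penalty.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Hyperparameter tuning: grid search over C, scored by 5-fold cross validation.
search = GridSearchCV(
    pipe, param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5, scoring="f1"
)
search.fit(X_train, y_train)

# 5. Evaluating on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)
print("best C:", search.best_params_["clf__C"])
print(f"test accuracy = {test_accuracy:.3f}, test F1 = {test_f1:.3f}")

# 6. If these scores are not good enough, revise the features, the model,
#    or the hyperparameter grid, and repeat steps 1-5.
```

On imbalanced data, accuracy alone can look good even for a model that always predicts the majority class, which is why the sketch also reports F1. For a regression target, the same layout works with "Ridge" or "Lasso" from sklearn.linear_model in place of LogisticRegression, tuning their "alpha" parameter instead of C.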