동산

Machine Learning Theory and Practice 2022. 5. 4. 15:33

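A Decision Tree is an algorithm that uses the values of the independent variables to split the observations in a dataset into several groups whose dependent-variable values are similar. It then predicts the same dependent-variable value for every observation in a group.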
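The following is a minimal sketch of this idea using scikit-learn's `DecisionTreeRegressor`. The six-point dataset and `max_depth=1` are made-up assumptions chosen for illustration. `apply` returns the leaf (group) that each observation lands in, and every observation in the same leaf receives the same prediction, which is the group mean.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: one independent variable x and a dependent variable y
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([5.0, 6.0, 5.0, 20.0, 21.0, 19.0])

# A single split, so the observations are divided into two groups
reg = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

leaf_ids = reg.apply(X)       # id of the leaf (group) each observation falls into
predictions = reg.predict(X)  # predicted y for each observation

for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    # Same group -> same prediction (the mean of y within that group)
    print(leaf, y[in_leaf], predictions[in_leaf])
```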

Components

* root node

* internal node = decision node

-> splits the observations using a cut-off (cut point) value

-> hyper-parameter: depth (how many levels of decision nodes the tree may have)

* leaf node = terminal node
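One way to see these components in a fitted tree is scikit-learn's `export_text`. This sketch uses the Iris dataset and `max_depth=2`, both chosen arbitrarily for illustration. In the printout, the first condition is the root node's split, nested conditions are internal (decision) nodes with their cut-off values, and lines ending in `class: ...` are leaf (terminal) nodes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Text view of the tree: each "feature <= value" line is a split at a cut-off,
# and each "class: ..." line is a leaf
report = export_text(clf, feature_names=list(iris.feature_names))
print(report)

print("depth:", clf.get_depth())      # maximum distance from the root to any leaf
print("leaves:", clf.get_n_leaves())  # number of terminal nodes
```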

Decision Tree Regressor
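- Apply the cut-off that minimizes the RSS (Residual Sum of Squares)
  RSS = \sum_{i \in \text{Group } 1} (y_i - \bar{y}_1)^2 + \sum_{i \in \text{Group } 2} (y_i - \bar{y}_2)^2, where \bar{y}_j is the mean of y in Group j
- The predicted value is the mean of the group
: sort the values of the variable in ascending order ->
compute the RSS when the average of each pair of consecutive values is used as the cut-off ->
repeat this for every candidate cut-off (and every variable) and apply the cut-off that minimizes the RSS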
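The cut-off search above can be sketched as a brute-force loop over one variable. This is a minimal NumPy illustration under that single-variable assumption, not scikit-learn's actual implementation. The function names `rss` and `best_regression_cutoff` and the toy data are hypothetical.

```python
import numpy as np

def rss(y):
    """Residual sum of squares around the group mean (the group's prediction)."""
    if len(y) == 0:
        return 0.0
    return float(np.sum((y - y.mean()) ** 2))

def best_regression_cutoff(x, y):
    """Try the midpoint of every pair of consecutive sorted x values as a cut-off."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_cut, best_rss = None, np.inf
    for i in range(len(x_sorted) - 1):
        if x_sorted[i] == x_sorted[i + 1]:
            continue  # identical values cannot be separated by a cut-off
        cut = (x_sorted[i] + x_sorted[i + 1]) / 2
        left = y_sorted[x_sorted <= cut]
        right = y_sorted[x_sorted > cut]
        total = rss(left) + rss(right)  # RSS of the two resulting groups
        if total < best_rss:
            best_cut, best_rss = cut, total
    return best_cut, best_rss

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.0, 20.0, 21.0, 19.0])
cut, total = best_regression_cutoff(x, y)
print(cut, total)  # cut-off 6.5, between the two clusters
```

If scikit-learn is available, `DecisionTreeRegressor(max_depth=1)` fitted on the same data should choose the same root threshold, since it also places thresholds halfway between consecutive values.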
Decision Tree Classifier
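- Apply the cut-off that minimizes the Entropy E or the Gini Index G
- The predicted value is the mode (most frequent class) of the group
* G_j = \sum_{k=1}^{K} p_{j,k} (1 - p_{j,k}), where p_{j,k} = m_{j,k} / m_j is the proportion of Class k in Group j (m_{j,k}: number of Class k data points in Group j, m_j: number of data points in Group j, K: number of classes)
* E_j = -\sum_{k=1}^{K} p_{j,k} \log_2 p_{j,k} (same notation)
* Each group's impurity is sometimes weighted by its number of data points
: apply the cut-off that minimizes G_1 + G_2, or the weighted sum (m_1 G_1 + m_2 G_2) / (m_1 + m_2) when weights are used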
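Below is a minimal NumPy sketch of the classifier version, using the same brute-force search with G_j and E_j as defined above and the mode as the group prediction. The helper names and the toy data are hypothetical, and `weighted=True` corresponds to the weighted sum (m_1 G_1 + m_2 G_2) / (m_1 + m_2).

```python
import numpy as np

def gini(labels):
    """G_j = sum_k p_jk * (1 - p_jk) for one group."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()  # p_jk = m_jk / m_j
    return float(np.sum(p * (1 - p)))

def entropy(labels):
    """E_j = -sum_k p_jk * log2(p_jk) for one group."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()  # every p > 0, so log2 is defined
    return float(-np.sum(p * np.log2(p)))

def group_prediction(labels):
    """Prediction for a group: its most frequent class (mode)."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def best_classification_cutoff(x, y, impurity=gini, weighted=True):
    """Try every midpoint of consecutive sorted x values; keep the lowest impurity."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    m = len(ys)
    best_cut, best_score = None, np.inf
    for i in range(m - 1):
        if xs[i] == xs[i + 1]:
            continue  # identical values cannot be separated
        cut = (xs[i] + xs[i + 1]) / 2
        left, right = ys[xs <= cut], ys[xs > cut]
        if weighted:
            score = (len(left) * impurity(left) + len(right) * impurity(right)) / m
        else:
            score = impurity(left) + impurity(right)  # plain G_1 + G_2
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_classification_cutoff(x, y))           # gini: cut-off 3.5, both groups pure
print(best_classification_cutoff(x, y, entropy))  # entropy picks the same cut-off here
```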

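* Main hyper-parameters (refer to sklearn.tree.DecisionTreeClassifier or sklearn.tree.DecisionTreeRegressor)

- criterion: "gini" or "entropy" for the classifier (the regressor uses "squared_error" by default)
- max_depth: maximum depth of the tree
- min_samples_split: minimum number of samples a node must contain before it can be split
- min_samples_leaf: minimum number of samples each leaf must contain
- max_leaf_nodes: maximum number of leaf nodes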
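Here is a sketch of setting these hyper-parameters in scikit-learn. The specific values and the Iris dataset are arbitrary choices for illustration. The printed numbers let you check that the fitted tree respects each limit.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="entropy",   # or "gini" (the default)
    max_depth=3,           # at most 3 levels of splits
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    max_leaf_nodes=4,      # at most 4 leaves (the tree is then grown best-first)
    random_state=0,
).fit(X, y)

tree = clf.tree_
is_leaf = tree.children_left == -1  # leaf nodes have no children (marked -1)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print("smallest leaf size:", tree.n_node_samples[is_leaf].min())
```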

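Be aware that overfitting or underfitting can occur depending on these settings: a very deep tree with tiny leaves tends to overfit, while a very shallow tree tends to underfit.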
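A rough way to observe this is to compare training and test accuracy as `max_depth` grows. This sketch uses synthetic data from `make_classification` as an assumption (any labeled dataset would do), and the exact scores will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.1 assigns random labels to 10% of samples (label noise)
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in [1, 3, 5, None]:  # None = keep splitting until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(depth, results[depth])  # (train accuracy, test accuracy)
```

Typically the depth-1 tree scores relatively low on both sets (underfitting). The unrestricted tree (`max_depth=None`) reaches 100% training accuracy here because it memorizes the noisy labels, and a large gap between its training and test scores is the usual sign of overfitting.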
