동산

Week 9: How to Evaluate Embeddings

NAVER AI TECH 2023. 5. 3. 11:24

How can we check whether an embedding has been learned well? There are two methods.

1. Use human-annotated similarity scores.

Datasets of human-annotated similarity scores exist, such as WordSim-353, SimLex-999, and MEN.
The following is an excerpt from the WordSim-353 data.
# i = identical tokens
# s = synonym (at least in one meaning of each)
# a = antonyms (at least in one meaning of each)
# h = first is hyponym of second (at least in one meaning of each)
# H = first is hyperonym of second (at least in one meaning of each)
# S = sibling terms (terms with a common hyperonymy)
# m = first is part of the second one (at least in one meaning of each)
# M = second is part of the first one (at least in one meaning of each)
# t = topically related, but none of the above

Type	Word1	Word2	Human(mean)
t	love	sex	6.77
h	tiger	cat	7.35
i	tiger	tiger	10.00
t	book	paper	7.46
M	computer	keyboard	7.62
t	computer	internet	7.58

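For each word pair, compute the cosine similarity between the embedded vectors of the two words
(e.g., the cosine similarity between the embedded vector of "love" and the embedded vector of "sex").
Then compute Spearman's correlation between the cosine similarity scores and the human-annotated similarity scores. The higher the correlation, the more closely the embedding's ranking of word pairs agrees with human judgment.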
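The two steps above can be sketched in plain Python. This is only a minimal illustration: the 3-dimensional vectors in `embeddings` are made-up placeholders (real embeddings would come from a trained model), the helper names are hypothetical, and only the human scores are taken from the WordSim-353 excerpt above. Spearman's correlation is computed here as the Pearson correlation of the ranks, with ties given their average rank.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman's correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up toy vectors (assumption): stand-ins for real trained embeddings.
embeddings = {
    "love": [0.9, 0.1, 0.2], "sex": [0.7, 0.3, 0.1],
    "tiger": [0.1, 0.9, 0.3], "cat": [0.2, 0.8, 0.4],
    "book": [0.3, 0.2, 0.9], "paper": [0.4, 0.1, 0.8],
    "computer": [0.5, 0.5, 0.5], "keyboard": [0.6, 0.4, 0.5],
    "internet": [0.4, 0.7, 0.4],
}

# (word1, word2, human mean score) from the WordSim-353 excerpt.
pairs = [
    ("love", "sex", 6.77), ("tiger", "cat", 7.35), ("tiger", "tiger", 10.00),
    ("book", "paper", 7.46), ("computer", "keyboard", 7.62),
    ("computer", "internet", 7.58),
]

model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b, _ in pairs]
human_scores = [score for _, _, score in pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman's correlation: {rho:.3f}")
```

Because Spearman's correlation only compares rankings, the cosine scores and the 0 to 10 human scores do not need to be on the same scale.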

2. Use an analogy test.

An example of an analogy test is as follows.
"man is to woman as king is to ___"
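The task is to fill in the blank with the appropriate word.
The higher the accuracy, that is, the fraction of analogy questions answered correctly, the better the Language Model or Word Embedding can be said to perform.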
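As a rough sketch of how such a test could be scored for word embeddings, the example below fills the blank using the common vector-offset approach: take the candidate whose vector is closest, by cosine similarity, to `b - a + c`. The post itself does not specify how the blank is predicted, so this method is an assumption, as are the hand-made vectors in `toy_embeddings` and the helper names.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ___' using the offset vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

def analogy_accuracy(embeddings, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(embeddings, a, b, c) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)

# Hand-made toy vectors (assumption); the axes loosely mean [gender, royalty, other].
toy_embeddings = {
    "man": [1.0, 0.0, 0.1],
    "woman": [-1.0, 0.0, 0.1],
    "king": [1.0, 1.0, 0.1],
    "queen": [-1.0, 1.0, 0.1],
    "prince": [1.0, 0.6, 0.1],
    "apple": [0.0, 0.0, 1.0],
}

questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
]

answer = solve_analogy(toy_embeddings, "man", "woman", "king")
accuracy = analogy_accuracy(toy_embeddings, questions)
print(answer, accuracy)
```

With these toy vectors the blank in "man is to woman as king is to ___" resolves to "queen". A real evaluation would run many such questions against trained embeddings and report the resulting accuracy.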
