동산

Week 9: Embedding Evaluation Methods

NAVER AI TECH 2023. 5. 3. 11:24

How can we check whether an embedding has been learned well? There are two methods.

1. Use human-annotated similarity scores

Datasets of human-annotated similarity scores include WordSim-353, SimLex-999, and MEN.
The following is an excerpt from the WordSim-353 data.
# i = identical tokens
# s = synonym (at least in one meaning of each)
# a = antonyms (at least in one meaning of each)
# h = first is hyponym of second (at least in one meaning of each)
# H = first is hyperonym of second (at least in one meaning of each)
# S = sibling terms (terms with a common hyperonymy)
# m = first is part of the second one (at least in one meaning of each)
# M = second is part of the first one (at least in one meaning of each)
# t = topically related, but none of the above

	Word1	Word2	Human(mean)
t	love	sex	6.77
h	tiger	cat	7.35
i	tiger	tiger	10.00
t	book	paper	7.46
M	computer	keyboard	7.62
t	computer	internet	7.58

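First, compute the cosine similarity between the embedded vectors of the two words in each pair.
(e.g., the cosine similarity between the embedded vector of "love" and the embedded vector of "sex")
Then compute Spearman's correlation between the cosine similarity scores and the human-annotated similarity scores.
The higher the correlation, the more closely the embedding agrees with human judgments.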
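
The two steps above can be sketched in plain Python. This is only a minimal illustration: the 3-dimensional vectors in `embeddings` are made up for the example (they do not come from a trained model), and the helper names `cosine_similarity`, `average_ranks`, `pearson`, and `spearman` are hypothetical. The human scores are the ones from the WordSim-353 excerpt above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "love": [0.9, 0.1, 0.3],
    "sex": [0.7, 0.3, 0.2],
    "tiger": [0.1, 0.9, 0.2],
    "cat": [0.2, 0.8, 0.3],
    "book": [0.3, 0.2, 0.9],
    "paper": [0.4, 0.1, 0.8],
    "computer": [0.8, 0.2, 0.9],
    "keyboard": [0.7, 0.1, 0.7],
    "internet": [0.9, 0.4, 0.8],
}

# (word1, word2, human mean score) taken from the excerpt above.
pairs = [
    ("love", "sex", 6.77),
    ("tiger", "cat", 7.35),
    ("tiger", "tiger", 10.00),
    ("book", "paper", 7.46),
    ("computer", "keyboard", 7.62),
    ("computer", "internet", 7.58),
]

model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b, _ in pairs]
human_scores = [h for _, _, h in pairs]

rho = spearman(model_scores, human_scores)
print(f"Spearman's correlation: {rho:.3f}")
```

Spearman's correlation compares rankings rather than raw values, so it does not matter that cosine similarity lies in [-1, 1] while the human scores lie in [0, 10]. In practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written helpers.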

2. Use an analogy test

An example of an analogy test is as follows.
"man is to woman as king is to ___"
The task is to fill in the blank with the appropriate word.
The higher the accuracy, the better the language model or word embedding can be said to perform.
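
One common way to answer such a question with word embeddings is vector arithmetic: compute vec(king) - vec(man) + vec(woman) and pick the nearest remaining word by cosine similarity. The post does not say which method is used, so treat this as an assumption. The sketch below uses hypothetical 2-dimensional vectors, invented so that one axis loosely stands for "royalty" and the other for "gender"; `solve_analogy` and `analogy_accuracy` are illustrative names, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(embeddings, a, b, c):
    """Answer "a is to b as c is to ___" via vec(c) - vec(a) + vec(b).

    The three query words are excluded from the candidates, as is
    usual for this kind of test.
    """
    target = [
        vc - va + vb
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))


def analogy_accuracy(embeddings, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        solve_analogy(embeddings, a, b, c) == expected
        for a, b, c, expected in questions
    )
    return correct / len(questions)


# Hypothetical toy embeddings: [royalty, gender], invented for illustration.
embeddings = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.0],
}

questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
]

print(solve_analogy(embeddings, "man", "woman", "king"))  # queen
print(analogy_accuracy(embeddings, questions))            # 1.0
```

With real embeddings the candidate set is the whole vocabulary, and accuracy is reported over a benchmark of many such questions rather than two hand-picked ones.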
