10 transformer

January 11, 2026

attention

In an RNN, information from the input sequence x is passed along through the hidden layer.
So why not remove the RNN altogether?
Cross attention: to generate $y_t$, we pay attention to the input x.

Self attention: to generate $y_t$, we need to pay attention to $y_{<t}$.

Unlike an RNN, generating $y_t$ looks directly at the previous outputs $y_{<t}$!
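
The difference between the two kinds of attention can be sketched in a few lines. This is a minimal, dependency-free illustration, not a full transformer layer: it assumes the standard scaled dot-product form (the $1/\sqrt{d}$ scaling is not stated in these notes), it skips the learned query/key/value projections, and the vectors `x`, `y_prev` and the helper `attend` are hypothetical names chosen for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, context) for one query attending over keys/values."""
    d = len(query)
    # Dot product of the query with every key, scaled by sqrt(d) (assumed form).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context vector: weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context

# Toy data (hypothetical): 3 input positions x, 2 already generated outputs y_{<t}.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_prev = [[0.5, 0.5], [1.0, 0.0]]
query = y_prev[-1]  # the state used to generate y_t

# Cross attention: keys and values come from the input x.
cross_weights, cross_context = attend(query, x, x)
# Self attention: keys and values come from the previous outputs y_{<t}.
self_weights, self_context = attend(query, y_prev, y_prev)
```

The only thing that changes between the two calls is where the keys and values come from, which is the distinction the notes draw.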

position embedding

sinusoids

from scratch

Embeddings learned from scratch cannot represent positions outside the fixed index range.
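
To make the contrast concrete, here is a small sketch of both options. It assumes the commonly used sinusoidal formulation (sine on even dimensions, cosine on odd dimensions, base 10000), which the notes do not spell out. The table size `max_len`, the zero-filled `learned_table`, and both function names are hypothetical placeholders.

```python
import math

def sinusoidal_encoding(pos, d_model, base=10000.0):
    """Position vector of length d_model (assumed even).

    Even dimensions hold sin, odd dimensions hold cos of the same angle.
    """
    vec = []
    for i in range(0, d_model, 2):
        angle = pos / (base ** (i / d_model))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec

# A table learned from scratch has one row per index, up to a fixed maximum.
max_len, d_model = 4, 6
learned_table = [[0.0] * d_model for _ in range(max_len)]  # stand-in for trained rows

def learned_encoding(pos):
    """Look up a row; positions at or beyond max_len have no row at all."""
    if pos >= max_len:
        raise IndexError(f"position {pos} is outside the table of {max_len} rows")
    return learned_table[pos]

# The sinusoid is a formula, so it is still defined far beyond max_len.
far = sinusoidal_encoding(1000, d_model)
```

Calling `learned_encoding(4)` raises `IndexError` here, while `sinusoidal_encoding` returns a vector for any position.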

Decoder

Information from future positions is set to 0 by masking.
It is built by stacking multiple blocks.
The output is the distribution over the next token.
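
The masking step can be sketched as follows. This is one common way to do it, assumed here rather than taken from the notes: scores for future positions are replaced by negative infinity before the softmax, so their attention weights come out as exactly 0. The function name and the uniform `scores` matrix are hypothetical.

```python
import math

def causal_attention_weights(scores):
    """Apply a causal mask and a row-wise softmax.

    Row t of `scores` holds the raw scores of position t against every
    position. Columns j > t are the future and are masked out.
    """
    weights = []
    for t, row in enumerate(scores):
        masked = [s if j <= t else float("-inf") for j, s in enumerate(row)]
        m = max(masked)  # finite, because column t itself is never masked
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw scores: every position scores every other position equally.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
W = causal_attention_weights(scores)
# Position 0 can only see itself, position 1 sees positions 0 and 1, and so on.
```

An encoder would run the same softmax without the mask, so every position can attend in both directions.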

Encoder

No masking is applied, so attention is bidirectional.

cross attention
