>I can't understand the reason for paying attention to oneself.
Certain parts of a sentence strongly inform the meaning of other parts and so it is important to encode them together. If you see the word "bank" in a sentence, is it referring to the financial institution or the land next to a body of water? We know by what came before or what comes after. Attention allows relevant context to inform token processing without being distracted by irrelevant context.
Certain parts of a sentence strongly inform the meaning of other parts and so it is important to encode them together. If you see the word "bank" in a sentence, is it referring to the financial institution or the land next to a body of water? We know by what came before or what comes after. Attention allows relevant context to inform token processing without being distracted by irrelevant context.