Transformer Block Outputs

First Layer Attention Matrices

Last Layer Attention Matrices