HashMap Attention


Build a lookup table of attention scores during prefill and keep adding to it during decode. Attention then scales subquadratically: each new query does a fuzzy lookup and computes attention scores only against keys that previous instances of the same token (or of token sequences within a small edit distance) attended to above some threshold. Could also try using this for adaptive KV cache quantization.

The systems side of getting this fast enough to be viable at the frontier would be a huge headache, though. Unless you’re only doing this for adaptive quantization, you’ll want the dictionary in accelerator memory, which isn’t really built for this type of operation. Maybe some cleverness with sparse matrix ops would help.
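Below is a minimal, single-head, pure-Python sketch of the lookup idea. It is not the author's design, and it makes several assumptions the note does not state:

* The table is keyed by token bigrams.
* "Small edit distance" is approximated by Hamming distance ≤ 1, using wildcard keys.
* A short window of recent keys is always scored as a fallback.
* The names `HashMapAttention`, `ngram`, `threshold` and `window` are hypothetical.

Prefill here scores every earlier key, so it stays quadratic. Only decode steps use the table.

```python
import math
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class HashMapAttention:
    """Hypothetical sketch of one attention head with a hashmap of "hot" keys.

    table maps a token n-gram (or a wildcarded variant of it) to the key
    positions that earlier queries with that n-gram attended to with
    weight >= threshold.
    """

    def __init__(self, ngram=2, threshold=0.1, window=4):
        self.ngram = ngram
        self.threshold = threshold
        self.window = window  # the most recent keys are always scored
        self.table = defaultdict(set)  # lookup key -> set of key positions
        self.tokens, self.keys, self.values = [], [], []

    def _context(self, pos):
        # Token n-gram ending at pos (shorter near the start of the sequence).
        return tuple(self.tokens[max(0, pos - self.ngram + 1): pos + 1])

    def _lookup_keys(self, context):
        # Exact n-gram plus one variant per slot with that slot set to None.
        # Two equal-length n-grams share a key iff their Hamming distance
        # is <= 1, a substitution-only stand-in for "small edit distance".
        yield context
        for i in range(len(context)):
            yield context[:i] + (None,) + context[i + 1:]

    def _attend(self, query, positions):
        positions = sorted(positions)
        scale = 1.0 / math.sqrt(len(query))
        weights = softmax([dot(query, self.keys[p]) * scale for p in positions])
        dim = len(self.values[0])
        out = [sum(w * self.values[p][d] for w, p in zip(weights, positions))
               for d in range(dim)]
        return out, dict(zip(positions, weights))

    def _record(self, pos, weights):
        # Remember which keys this query attended to strongly.
        hot = {p for p, w in weights.items() if w >= self.threshold}
        for k in self._lookup_keys(self._context(pos)):
            self.table[k].update(hot)

    def step(self, token, query, key, value, dense=False):
        """Append one token causally. dense=True scores every earlier key
        (prefill); otherwise only the recent window plus table hits."""
        pos = len(self.tokens)
        self.tokens.append(token)
        self.keys.append(key)
        self.values.append(value)
        if dense:
            candidates = set(range(pos + 1))
        else:
            candidates = set(range(max(0, pos - self.window + 1), pos + 1))
            for k in self._lookup_keys(self._context(pos)):
                candidates |= self.table.get(k, set())
        out, weights = self._attend(query, candidates)
        self._record(pos, weights)  # keep adding to the table during decode
        return out, len(candidates)


# Usage with random vectors standing in for projected queries/keys/values.
random.seed(0)
DIM = 8


def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]


attn = HashMapAttention(ngram=2, threshold=0.1, window=4)
for t in [5, 7, 5, 7, 9, 5, 7, 3] * 4:  # prefill: 32 tokens, full attention
    attn.step(t, rand_vec(), rand_vec(), rand_vec(), dense=True)
results = [attn.step(t, rand_vec(), rand_vec(), rand_vec()) for t in [5, 7, 9]]
for i, (out, n) in enumerate(results):
    print(f"decode pos {32 + i}: scored {n} of {33 + i} keys")
```

Why wildcard keys: each fuzzy lookup costs only n + 1 hash probes, instead of a scan over every stored n-gram.

Limitations of this sketch:

* Table entries only ever grow, so candidate sets can drift toward the full sequence. A real system would need caps or eviction.
* The dict of sets lives in host memory. That is exactly the accelerator-memory problem raised above.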

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{muchane-hashmap-attention-2026,
  author = {Muchane, Mark},
  title = {HashMap Attention},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/UYtNyuXwKF3BsFthG9Vx}
}
