model weights (params), KV cache (keys + values stored for autoregressive generation), activation/intermediate buffers, and a configurable framework overhead. The estimator intentionally uses ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results