The inference point is spot on. Every time someone asks Claude a question, that is an inference call. Multiply that by hundreds of millions of users and you are talking about enormous compute costs where even a modest per-query efficiency gain adds up to hundreds of millions of dollars annually.
