I’d encourage devs to use MiniMax, Kimi, etc. for
real world tasks that require intelligence. The
downsides emerge pretty fast: much higher reasoning
token use, slower outputs, and degradation that is
palpable. Sadly, you do get what you pay for right
now. However, that doesn’t prevent you from saving
tons through smart model routing, being smart about
reasoning budgets, and using max output tokens
wisely. And optimize your apps and prompts to reduce
output tokens.
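
To make the routing idea concrete, here is a rough sketch of what I mean. Everything in it is an assumption for illustration: the model names are placeholders, the token numbers are made up, and the keyword-based classifier is a stand-in for whatever signal you actually trust (a small classifier model, task type from your app, etc.). The point is only that each tier gets its own model, reasoning budget, and output cap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str              # placeholder name, not a real model ID
    reasoning_budget: int   # max reasoning tokens to allow (illustrative)
    max_output_tokens: int  # hard cap on visible output (illustrative)


# Hypothetical tiers: cheaper models and tighter budgets for easier work.
ROUTES = {
    "simple": Route("cheap-small-model", 0, 256),
    "moderate": Route("mid-tier-model", 1024, 1024),
    "hard": Route("frontier-model", 8192, 4096),
}

# Crude heuristic: words that tend to signal a task needing real reasoning.
HARD_HINTS = ("prove", "refactor", "debug", "architecture")


def classify(prompt: str) -> str:
    """Bucket a prompt into a difficulty tier using simple heuristics."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return "hard"
    if len(text.split()) > 50:
        return "moderate"
    return "simple"


def pick_route(prompt: str) -> Route:
    """Return the model and token limits to use for this prompt."""
    return ROUTES[classify(prompt)]


if __name__ == "__main__":
    for p in ("What is the capital of France?", "Debug this race condition"):
        print(p, "->", pick_route(p))
```

You would then pass the chosen model, reasoning budget, and output cap to whatever provider API you use; the parameter names differ per provider, so I have left that call out.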