Debugging is where this tool still feels rough. The AI autofix feature catches common errors, but when something breaks in a subtle way, the back-and-forth needed to diagnose it can consume more tokens than building the feature did in the first place.
