What happens when you give three developer teams just nine hours to build a web application from scratch using AI? Each team approached the task differently, and what we learned revealed both the promise and the limits of putting AI at the center of the development process.
The goal wasn’t to see who “won.” The experiment was designed to give everyone on the teams hands-on experience with AI and to find out where it could help our development processes.
The clearest advantage of using AI was speed. AI had no problem taking care of simple boilerplate code that normally eats up time. We weren’t stuck writing repetitive setup code, which gave us room to focus on higher-level decisions.
AI’s value wasn’t limited to code. We used it to write user stories, create GitHub issues, and even draft commit messages. One team leaned on it for Django work, and it showed a strong understanding of the framework and even recommended best practices when there were multiple ways to solve a problem.
With the right context, AI can act less like an autocomplete tool and more like a senior dev nudging you toward a cleaner path.
And when it came to getting started, AI was a rocket booster. It was great for quickly generating a working prototype. Teams could experiment, learn from mistakes, refine specs, and then do a rebuild later once requirements were clearer. That kind of acceleration added real value.
Not everything went smoothly. One team ran into version trouble right away. They were working in Tailwind 4, but the AI kept falling back to Tailwind 3 since that version is better documented. The mismatch cost them time before they could even start building.
Debugging created its own headaches. AI often suggested fixes that didn’t actually solve the problem, which meant getting stuck in loops: ask for a fix, test it, watch it fail, repeat. Stability was shaky too. Features that had been working previously would suddenly break after new code was added. And once the apps grew more complex, the AI struggled to build on top of existing logic.
Collaboration also suffered. On one team, four developers divided up features and each used AI to generate their piece. The result was overlapping work, Git conflicts, and more time spent resolving issues. Since no one had authored the code themselves, they felt less ownership and were slower to untangle what had gone wrong.
Finally, the AI faltered when overloaded with instructions: it skipped steps or left tasks incomplete. Some blind spots also showed up consistently. A few models repeatedly stumbled on basic Python, such as mixing up positional and keyword arguments.
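To make that concrete, here is a hypothetical illustration of the kind of mix-up described above; the create_user function is invented for this example and isn’t from any team’s codebase:

```python
# Hypothetical illustration: a Python function with a keyword-only parameter
# being called as if every argument were positional.
def create_user(username, *, email, is_admin=False):
    return {"username": username, "email": email, "is_admin": is_admin}

# The kind of call a model would sometimes generate; it raises a TypeError
# because `email` is keyword-only:
# create_user("sam", "sam@example.com")

# The correct call:
create_user("sam", email="sam@example.com")
```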
The most critical factor for success was investing in a detailed, upfront plan. The better your spec is going in, the more control you’ll have over the final product. One team skipped upfront planning and ended up with an app so broken it was faster to start over than fix it.
Other teams pulled AI into the planning process itself. One group spent its entire first session with Claude.ai, generating detailed tickets and user stories before writing a single line of code. Another went a step further, feeding the AI a screenshot of a planning board. From that, it produced a full technical spec, which the team then refined.
In both cases, planning first paid off.
AI can get you to 90% completion quickly, but it needs boundaries. The best results came from treating it like a junior developer: give it one task at a time, with clear context and limits. One team even built “Goose Hints,” pre-defined prompt files that laid out project rules and scope. With those guardrails, the AI stayed focused and produced code that was easier to review and QA.
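The exact contents varied by team, but assuming Goose’s convention of a hints file in the project root, such a file might look roughly like the sketch below. The project name and rules are illustrative, not taken from a team’s actual file:

```
# .goosehints (illustrative sketch)
Project: event-signup web app (Django, PostgreSQL)

Rules:
- Work on one GitHub issue at a time; do not touch unrelated files.
- Follow the existing app layout; never create new top-level packages.
- Use Django's built-in auth; do not add new dependencies without asking.
- Every new view needs a matching test in the app's tests/ directory.
```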
Teams also saw the next big opportunity: AI testing its own work. Manual QA slowed progress, and everyone agreed a quality feedback loop would change the game. Pairing AI with tools like Playwright could allow it to generate and run end-to-end tests automatically, catching errors earlier and reducing the burden on developers.
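As a rough sketch of what that feedback loop could check, here is a minimal Playwright end-to-end test in Python, runnable with pytest and the pytest-playwright plugin. The URL and the “Welcome” heading are placeholders for illustration, not details from the actual projects:

```python
# Minimal Playwright end-to-end test (pytest + pytest-playwright).
# The local URL and page content below are placeholders.
from playwright.sync_api import Page, expect


def test_homepage_renders(page: Page):
    # Load the app's landing page in a real browser.
    page.goto("http://localhost:8000/")
    # Fail fast if the main heading stops rendering after new code lands.
    expect(page.get_by_role("heading", name="Welcome")).to_be_visible()
```

A test like this, generated and run automatically after each change, would catch the “working feature suddenly breaks” regressions described earlier without waiting for manual QA.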
Deployment turned out to be either a non-event or a total roadblock, and the difference came down to preparation. The teams that deployed to a sandbox immediately after bootstrapping didn’t have to think about it again. Their pipelines worked from day one.
Others weren’t so lucky. One group chose an untested toolchain and spent an entire session fighting deployment issues. Meanwhile, another followed a standard, well-proven process, ran a single command, and moved on. The lesson was clear: stick to proven paths if you want deployment to be a non-event.
Calvin Hendryx-Parker, Six Feet Up CTO and AWS Hero, shares takeaways in this video:
The future of software development won’t be measured by how quickly we can write boilerplate code. That part is already shifting to AI. What will matter is how well teams define the business problem, envision the solution, and plan the work.
AI can be a powerful teammate, but it cannot replace the leadership, creativity, and discipline that humans bring. The real value lies in blending AI’s speed with human expertise to guide, refine, and deliver.
Looking for help with a custom software development project? Let’s talk.