
By Justin Slack

April 1, 2026 · 3 min read

How We're Using AI at NML: Tools, Habits, and Honest Opinions

Earlier this year, our CTO challenged the team to put together a knowledge-sharing session on AI. Rather than standing up and telling everyone what to think, we decided to ask the team first. We sent out a survey company-wide, and the results were genuinely interesting. Consistent in some places, surprisingly honest in others. We followed it up with a live session that included two developer demos. This post captures what we found and what we learned.


The numbers

Every person who responded is already using AI in some form. The question for us has moved on from whether to use it, to how to use it better.

The satisfaction score from the survey was 2.3 out of 5. Not terrible, but not great either. It tells us that people are finding value, but still running into plenty of friction along the way.

Nine different tools are in active use across the team. ChatGPT leads, followed by Claude and GitHub Copilot, then Cursor, with a spread of others including Gemini, OpenAI Codex, Perplexity, DeepSeek, and T3 - a service that bundles multiple AI models into one interface for a flat fee. A few people found T3 a practical way to experiment without juggling multiple subscriptions.

What we're using AI for

The dominant use case is research and problem-solving. Debugging and writing came in close behind. These are the low-friction, low-risk entry points, and where most people start.

A more interesting observation is what sits lower on the list. Use cases such as code completion, AI-generated pull requests and automated testing may appear high value in theory, but in practice they still come with significant limitations and risk. In our environment - particularly given the nature of the work we do in financial services - these areas require a high degree of caution, strong context, and careful human review.

What's working

When we asked about the biggest strengths, a few themes came up consistently.

Repetitive work. Boilerplate, commit messages, ticket descriptions, documentation - tasks that take time but don't require deep thinking. AI handles these well, and almost everyone has found value here.

There's also a subtler benefit that came through in the responses. John-Keith put it well: AI is great at bridging the "I don't know what I don't know" gap. It's not just about going faster - it's about opening doors you didn't know existed.

Dean highlighted that AI is particularly good at solving your specific problem, rather than making you wade through dozens of similar-but-not-quite-right answers online. And Moolman noted its ability to filter large datasets against very specific, multi-layered constraints - the kind of precise, tedious task AI handles well.

What's not working

The frustrations were the most consistent part of the survey. Nobody needed prompting.

Confidently wrong. Almost every respondent mentioned this. The model doesn't say "I'm not sure" - it gives you an authoritative-sounding answer that is sometimes simply incorrect. Or it hallucinates a method that doesn't exist. Or references code it wrote three messages ago as though it's still there.

As Nic put it: rather than saying it doesn't have the answer, it guesses and often goes into loops. If you've experienced that, you know how much time you can sink into following a trail that was wrong from the start.

No context about your codebase. AI works with whatever you give it in the chat. It has no knowledge of your architecture, your coding standards, or your business logic. The more context you provide, the better the output - which means effective prompting is itself a skill.

John-Keith's practical workaround: when a conversation runs long and starts losing coherence, start a fresh one with a narrower focus. Simple, but it works - and it came up independently from multiple people.

The pattern: a powerful tool that needs a skilled operator

The through-line across all responses is this: AI doesn't replace expertise. It amplifies it.

Lwandile said it well: "The biggest strength is the accelerated problem-solving that happens when I actively monitor and correct the AI in real time. It's about using my own expertise to steer it."

The people getting the most from these tools aren't the ones handing over the wheel. They're the ones who know their domain well enough to catch when the model goes off course and correct it quickly.

Which also means the floor matters. If you don't have enough context to know when AI is wrong, you're in a genuinely risky position. Raising the floor - building shared literacy about how these tools work, where they fail, and how to prompt them well - is as important as finding the most powerful tool.

Mahina's demo: how I use AI daily as a developer

Mahina walked us through how she uses AI across three areas of her work - writing, thinking, and coding - with real examples rather than theory.

For writing, her biggest time saver is commit messages. After a long task, writing a clear, structured summary is often the last thing you have the mental energy for. Give AI a rough description or a diff and it produces something clean and consistent immediately. The same applies to tickets: she starts with something like "fix a validation bug" and expands it through AI into a properly structured ticket with description, expected behaviour, actual behaviour, and testing notes. It doesn't just save time; it improves how work is communicated to QA and the next developer who picks it up.

For debugging, Mahina's framing was revealing: she doesn't use AI for answers, she uses it for direction. She provides context - what she expects, what's actually happening, relevant logs - and AI suggests possible causes. She then investigates and confirms herself. She also made a practical distinction between tools: ChatGPT she uses as a thinking partner for initial exploration when the picture is still fuzzy; Claude she uses when she has more context to share and needs deeper analysis of a larger piece of logic.

For coding, her threshold is clear: if she doesn't understand what AI produced, she doesn't ship it. She uses it for smaller bounded tasks - UI tweaks, validation logic, repetitive patterns - rather than building features end to end. Her mental checklist before using any AI-generated code: Do I understand it? Does it fit our architecture? Have I tested it? Could it break something else?

She also showed a side-by-side of a gauge chart component she's building - the design she's working to, and the version AI produced from her prompt. Technically functional, visually different. Her conclusion: spending hours perfecting a prompt to get an exact design match is often less efficient than using AI as a rough starting point and doing the fine-tuning yourself.

Her closing line landed well: use AI to move faster, not to think less.

Justin's demo: scaffolding a real project with Cursor

Justin took a different angle - showing what a properly set-up AI-assisted development workflow looks like from the ground up. He scaffolded a Next.js marketing site live, in real time, using Cursor. The value wasn't in the finished product; it was in watching the process.

The most important thing he demonstrated happened before any prompting started. He brought his own rules files - markdown documents covering project structure, naming conventions, component patterns, and how Cursor should behave when generating code. These were added to the project at the start and referenced automatically on every prompt. Without them, AI makes its own decisions about naming, structure, and patterns. Those decisions are inconsistent and won't match your existing codebase.
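To make the idea concrete, here is a minimal sketch of what one such rules file can look like. Cursor reads project rules from markdown files (in recent versions, files under `.cursor/rules/`); the frontmatter keys and the specific conventions below are illustrative, not Justin's actual files, and the exact format may vary between Cursor versions.

```markdown
---
description: Component conventions for this project
globs: ["src/components/**/*.tsx"]
alwaysApply: false
---

# Component rules

- Use function components with typed props; no default exports.
- File names are PascalCase and match the component name.
- Keep components presentational; data fetching stays in route files.
- Ask before adding any new dependency.
```

The point is less the specific rules than that they exist in the repo and travel with every prompt, so generated code lands in your conventions rather than the model's defaults.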

He also pre-wrote prompt files rather than typing on the fly. The quality of what comes out is almost entirely determined by the quality of what goes in - and writing a good prompt requires knowing your domain well enough to specify it precisely.

A few practical points he flagged explicitly: always set Cursor to ask permission before running anything external - it should never install packages, run builds, or call services without your approval. Match the model to the task - simple tasks like converting a JSON object to a TypeScript type don't need your most powerful (and expensive) model. And like Mahina, he opens a new chat for each prompt rather than letting long conversations degrade.
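His "simple task" example is worth making concrete. The payload below is invented purely for illustration, but it shows why converting a JSON object to a TypeScript type is mechanical work that doesn't need an expensive model:

```typescript
// A sample API payload (hypothetical shape, for illustration only).
const gaugeResponse = {
  id: 42,
  label: "CPU load",
  value: 0.73,
  thresholds: [0.5, 0.9],
};

// The type a model would be asked to derive from that object.
type GaugeResponse = {
  id: number;
  label: string;
  value: number;
  thresholds: number[];
};

// Sanity check: the derived type accepts the original object unchanged.
const checked: GaugeResponse = gaugeResponse;
```

A transform like this has one right answer and is trivially verifiable, which is exactly the profile of task where a small, cheap model is the sensible choice.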

His honest summary at the end: everything the AI produced would need proper review before going anywhere near production. AI accelerated the work. The judgment, the standards, and the responsibility stayed with the developer.

What comes next

A few things came out of the session as clear next steps for the team:

Prompt engineering is the most valuable skill you can build right now. Not which tool you use - how you use it.

Sharing what works matters. If you've found a prompt, a workflow, or a genuinely useful combination of tools, putting it in a shared channel or doc compounds across the whole team.

We'll do more of these sessions. The plan is to go narrower and more workflow-specific next time - less survey overview, more "here's exactly how I do this thing."