I had mixed feelings watching the OpenAI DevDay opening keynote yesterday. Sam Altman's enthusiasm was contagious as he unveiled the new features. When the GPTs demo texted him live on stage, the whole room erupted in cheers and applause. Surprising lucky audience members with $500 in credits, handed out by an agent running live from the stage, was a great move. You could feel the energy and excitement around the rapid progress of AI.

Toward the end of the keynote, I started thinking about how the new features would change my team's work. Among other things, we're building complex tools like agent-powered chatbots. The magic wore off quickly. Don't get me wrong: the new capabilities are useful, especially for less technical users. But for those of us knee-deep in development, they don't fundamentally change the game.

Take GPTs, for example. Predefined instructions are nice for narrow use cases. But a sophisticated assistant handling a wide range of tasks on the fly needs more flexibility than a fixed set of instructions can give it. Or knowledge retrieval: being able to upload documents is great, but finding the right data to use as context at scale is still more art than science. Unless developers can control exactly how documents are retrieved, it's of limited use.

At the end of the day, while the features enable some flashy demos, they don't eliminate the good old-fashioned engineering work needed to build truly capable assistants. The one announcement with real value for engineers is the 128K-token context window, which will be a game-changer. The rest of the features are aimed at building single-purpose AI assistants with minimal coding.

At first, I was excited about the GPTs and GPT Store features. Packaging predefined instructions, documents, and functions together and publishing the result as an assistant seemed like a total game-changer. I'm not gonna lie: I pictured whipping up a simple personal assistant bot in a day.

But after looking closely under the hood, I realized they're not quite the magic bullet I hoped for. For basic stuff like very task-specific chatbots, predefined GPTs are great. They take a lot of grunt work out of the equation.

The problem is that sophisticated assistants need to be way more flexible. They have to handle many user intents on the fly and choose the right tools. In the real world, users don't like switching to a different chatbot for each task. You could augment a GPT with function calls and manage the tools on your own, but then the question arises: why not build the whole agent yourself with LangChain or LlamaIndex?

A sophisticated agent needs to dynamically figure out intent, dig up relevant info using tools, and iterate until it gets satisfactory results. For advanced assistants operating in the open world, we still gotta architect solutions combining different techniques.
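To make "manage the tools on your own" concrete, here's a minimal sketch of that kind of loop using plain function calling. It assumes the openai Python SDK v1, the gpt-4-1106-preview model announced at DevDay, and a made-up get_order_status() helper standing in for a real backend; a production agent would need far more around intent detection, tool selection, and error handling.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_order_status(order_id: str) -> str:
    """Hypothetical backend lookup; replace with your real system."""
    return json.dumps({"order_id": order_id, "status": "shipped"})

AVAILABLE_TOOLS = {"get_order_status": get_order_status}

TOOL_SCHEMAS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order by its id.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def run_agent(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):  # iterate until the model stops requesting tools
        resp = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=messages,
            tools=TOOL_SCHEMAS,
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:       # no tool request means a final answer
            return msg.content
        messages.append(msg)         # keep the assistant's tool request in history
        for call in msg.tool_calls:  # we execute each requested tool ourselves
            fn = AVAILABLE_TOOLS[call.function.name]
            result = fn(**json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    return "Giving up after too many tool-calling steps."

print(run_agent("Where is order 42?"))
```

Even in this toy version, the routing, the iteration limit, and the tool registry are all yours to design and maintain, which is exactly the engineering work the keynote glossed over.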

Knowledge retrieval remains a challenge

Another feature that I wanted so badly was knowledge retrieval. Hooking up an LLM to proprietary data retrieved in real time is one of the biggest challenges in building these assistants.

To use the retrieval API, you have to upload all your documents to OpenAI, either manually or automatically. OpenAI then generates the embeddings and uses vector search to find matching records.

It might work on a small document set, but uploading every document when you have thousands of them stored elsewhere isn't very practical or secure. Plus, you will inevitably run into a search problem: you'll need to apply additional rules to which documents are retrieved, and that's not something OpenAI currently supports. Having built search engines and AI assistants, I know it gets hairy quickly. It will likely work okay for small datasets but will start to crack as content and queries grow more complex.
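For comparison, here's roughly what keeping retrieval in your own hands can look like. This is a minimal sketch, assuming the openai SDK v1, numpy, a tiny in-memory corpus, and a hypothetical per-department access rule of the kind the hosted retrieval tool doesn't let you express.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# A tiny in-memory "document store"; in reality this lives in your own systems.
DOCS = [
    {"text": "Q3 revenue grew 12% quarter over quarter.", "department": "finance"},
    {"text": "The API gateway now supports per-client rate limits.", "department": "engineering"},
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

DOC_VECTORS = embed([d["text"] for d in DOCS])

def retrieve(query: str, department: str, top_k: int = 3):
    # The rule you control: only consider documents the caller is allowed to see.
    allowed = [i for i, d in enumerate(DOCS) if d["department"] == department]
    if not allowed:
        return []
    q = embed([query])[0]
    vecs = DOC_VECTORS[allowed]
    scores = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    ranked = sorted(zip(allowed, scores), key=lambda p: p[1], reverse=True)
    return [DOCS[i]["text"] for i, _ in ranked[:top_k]]

# The retrieved snippets then go into the prompt as context.
print(retrieve("How did revenue do last quarter?", department="finance"))
```

The point isn't that this is hard to write; it's that the filtering rules, the chunking, and the ranking are where real assistants live or die, and those stay on your plate.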

I’ll give OpenAI credit — their tools make retrieval look effortless. But behind the curtain of every useful AI assistant is meticulous engineering work. The magical demos don’t capture the blood, sweat, and tears required to build serious solutions.

Core problems persist

I’ll admit — those demos were slick. If I put my engineering hat aside for a moment, it was easy to get caught up in the wow factor, just like before. The crowd oohed and aahed for a reason.

But in the light of day, I see there are still plenty of head-scratching problems we have to wrestle with. Tasks like detecting intent and working with knowledge and real-time data remain squarely on the shoulders of engineers.

For example, the demos made it seem like the models just know which function to call. But that requires meticulous engineering too, and the more functions you have, the bigger the problem becomes. It's not clear how you're supposed to solve this with what OpenAI gives you so far.
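One common workaround (my sketch, not an OpenAI feature) is to shortlist candidate functions by embedding similarity before the chat request, so the model only ever sees a handful of relevant tools. The snippet below assumes the openai SDK v1 and a made-up catalogue of function descriptions.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Hypothetical catalogue of function descriptions; a real assistant might have dozens.
FUNCTION_DESCRIPTIONS = {
    "get_order_status": "Look up the shipping status of an order.",
    "create_refund": "Issue a refund for a purchased item.",
    "search_products": "Search the product catalogue by keyword.",
    "update_address": "Change the delivery address on an open order.",
}

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

NAMES = list(FUNCTION_DESCRIPTIONS)
DESC_VECTORS = embed(list(FUNCTION_DESCRIPTIONS.values()))

def shortlist(query: str, top_k: int = 3):
    # Rank functions by cosine similarity between the query and their descriptions.
    q = embed([query])[0]
    scores = DESC_VECTORS @ q / (
        np.linalg.norm(DESC_VECTORS, axis=1) * np.linalg.norm(q)
    )
    ranked = sorted(zip(NAMES, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Only the shortlisted functions get turned into tool schemas for the chat request.
print(shortlist("I never received my package"))
```

It works, but notice who's doing the routing: you are, not the model.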

I'm not trying to throw shade at OpenAI here; what they're doing is extremely valuable and visionary. Their platform lets people build amazing tools with minimal coding experience. But it doesn't eliminate the need for hardcore development work in the many real-world cases where a handful of external functions and a few uploaded documents won't cut it.

Keeping expectations in check

Listening to the presentations was an emotional rollercoaster. One minute, I was amped about what's new, like the 128K-token context. The next, I was skeptical about the hype. My inner engineer was doing battle with my inner AI enthusiast.

There’s no doubt these tools make building custom chatbots much easier, and I’m sure many people will be successful in doing so. But at the end of the day, we still have to put in the long hours combining different retrieval techniques, maintaining data warehouses, and designing custom logic.

For those of us committed to solving complex problems, it’s an exciting time! But expectations need to be kept in check. AI is the future, but it’s not magic.


Originally published on Medium.com