You Are What You Feed Your AI
I had another blog planned for this week, but I couldn’t not talk about what’s happening right now. If it feels like generative AI (think ChatGPT) has been a little off lately, you're not imagining things. Responses are getting weirder. Confidence is getting bolder. Sycophancy was so thick it was sickening. (Read more about how this happened.) Some of the sharpest tools are suddenly fuzzy on the basics. Starting over with everything I had in memory? That’s more than frustrating; it’s telling. A more serious issue is taking shape: AI is being trained on the outputs of other AI, and that loop is starting to show its cracks.
Many of today’s large language models are now being trained on data that includes AI-generated content. It’s faster. It’s cheaper. It scales. But like most incestuous endeavors, the more it happens, the more things degrade. Hallucinations aren’t just bugs anymore; they’re becoming the Habsburg jaw of AI: a visible side effect of overbreeding what was already a replica of the real thing.
It’s not just about data sources either. Some companies are training smaller models on the outputs of their larger models to get similar performance from lighter-weight tools. The problem is that fidelity becomes the metric. The smaller model isn’t rewarded for being right; it’s rewarded for matching whatever the larger model said, mistakes included. The system learns to imitate the form, not the function, and those small errors get locked in.
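For the technically inclined, here’s a minimal sketch of why that happens, assuming a standard knowledge-distillation setup (the function names and toy numbers below are mine, purely for illustration). Notice that the training signal only measures how closely the smaller “student” model matches the larger “teacher” model; nothing in it ever asks whether the teacher was right.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's output distributions.
    The target is whatever the teacher says -- including its mistakes.
    Ground truth never appears in this objective."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()

# Toy case: the teacher is confidently wrong (the true answer is class 1).
teacher       = np.array([[4.0, 0.5, 0.1]])  # prefers class 0
student_right = np.array([[0.1, 4.0, 0.1]])  # prefers class 1, the correct answer
student_copy  = np.array([[4.0, 0.5, 0.1]])  # mirrors the teacher exactly

print(distillation_loss(student_right, teacher))  # high loss: the correct answer scores worse
print(distillation_loss(student_copy, teacher))   # zero loss: imitation wins
```

Under this objective, copying the teacher’s error is the optimal move, which is exactly how small mistakes get locked in.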
Over time, the flaws aren’t just repeated. They’re reinforced. Errors in early training sets become structural assumptions. Those assumptions then become the base for everything that gets built on top of them. What should have been corrected becomes a feature, and the whole thing starts to drift.
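Here’s a toy illustration of that drift (a statistical caricature, not how any production model is trained): fit a simple model to data, generate the next training set from the model itself, and repeat. Each generation inherits the previous generation’s estimation error as its ground truth, and with small samples the fitted statistics typically wander away from the true values within a couple dozen generations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data drawn from the true distribution (mean 0, std 1).
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(20):
    # "Train" a model on whatever data this generation inherited.
    # Here the model is nothing more than a fitted mean and standard deviation.
    mu, sigma = data.mean(), data.std()
    print(f"gen {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")

    # The next generation never sees real data -- only samples produced by
    # the current model, so its errors become the next ground truth.
    data = rng.normal(loc=mu, scale=sigma, size=200)
```

Real language models are vastly more complicated, but the feedback loop has the same shape: once the training data is downstream of the model, there’s nothing left to pull it back toward reality.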
If the models you’re using feel like they’re losing their edge, they probably are. Not because the tech is getting worse, but because the foundation it’s built on is getting thinner and nobody’s hitting pause to fix it.
So what do we do with that?
We slow down. We ask better questions. We stop confusing speed with progress.
If you’re building with AI, whether it’s an internal tool, a member-facing assistant, or a fully integrated platform, this matters. You need to know where your data comes from. You need to know what your model is actually learning, and you need to be sure you’re building something useful, not just layering polish on top of educated guesses. Relying on synthetic or AI-generated data can lead to a cascade of issues, from degraded model performance to ethical and legal complications. Ensuring that AI systems are trained on accurate, diverse, and representative datasets is essential for building trustworthy and effective AI.
This is exactly why we built Voiceflip the way we did.
Ardi, Zip, and our next assistant (vote now if you haven’t already!) are each trained in a closed environment using real client documentation. No regurgitated AI. No synthetic shortcuts. Just facts: our clients’ facts. We’ve never had a hallucination, and that’s by design. The smartest thing we can do right now is protect the integrity of what we build before the industry forgets what right even looks like.
I’ve overcome much of my own skepticism, but what’s happening in the broader landscape cannot be ignored. Oversight matters, especially when working with generative AI models built on massive training sets and enormous parameter counts. As much as I believe in AI’s potential, we have to stay honest about what it is and what it isn’t. These recent shifts are a reminder that humans still have a critical role to play. We’re not just building tools; we’re setting standards.