The Hallucination Panic: Before Blaming AI, Learn How to Drive It

Recent AI hallucination incidents at leading consulting firms have raised widespread concern about whether Artificial Intelligence can be trusted. But are we asking the wrong question? In this article, Roshan argues that many AI failures are not simply model failures; they are often failures of prompting, validation, governance, and user training. Drawing a parallel between learning to drive a car and learning to use AI, he explains why explicit prompting criteria, few-shot prompting, human-in-the-loop review, and validation workflows are becoming essential skills in the AI era. The article examines real-world examples from the consulting industry and explains why reducing hallucinations requires more than better AI models. It requires better drivers. As AI becomes embedded in business, compliance, audit, cybersecurity, legal, and decision-making processes, organizations that invest in AI literacy, governance, and risk management will gain a significant advantage over those that simply deploy the latest technology. The future belongs not to those with the most advanced AI, but to those who know how to use it responsibly and effectively.

Roshan Yacob George CISA CISSP C|CISO CFE

5/30/2026 · 6 min read

Imagine a world where someone buys a Formula One car, drives it at 250 km/h without training, crashes into a wall, and then declares:

"Cars don't work."

Absurd?

Yet that is exactly how much of the world is reacting to Artificial Intelligence today.

Every time an AI system generates an inaccurate citation, a flawed report, an incorrect recommendation, or a fabricated reference, the immediate reaction is predictable:

"AI is hallucinating."

"AI cannot be trusted."

"AI is the problem."

The headlines spread rapidly.

The technology is blamed.

The model is scrutinized.

The vendor is questioned.

But very few people stop to ask a far more important question:

Was the AI being used correctly in the first place?

The Headlines That Triggered the Panic

Recently, global consulting giant EY withdrew a published report after researchers discovered apparent AI-generated hallucinations, including fabricated citations, fake references, and links pointing to sources that did not exist. The report, which was used to promote cybersecurity services, was eventually removed after independent researchers highlighted multiple inconsistencies and unverifiable claims.

Around the same time, discussion intensified over Deloitte's own AI-related controversies, in which AI-generated inaccuracies raised broader questions about accountability, governance, validation, and liability within professional services firms. The argument offered as a warning to every business leader was simple: if organizations with extensive expertise, mature controls, and enterprise-grade AI capabilities can publish inaccurate outputs, then every organization should assume it faces similar risks.

The public reaction was immediate.

Many concluded:

"AI cannot be trusted."

But that conclusion may be far too simplistic.

Because these incidents reveal something much bigger than model limitations.

They reveal a growing skills gap in how organizations use Artificial Intelligence.

The Skill Gap No One Is Talking About

Organizations across the world are rapidly integrating AI into:

Research
Consulting
Software Development
Legal Services
Internal Audit
Compliance
Cybersecurity
Customer Service
Decision Support Systems

Yet many users still interact with AI using instructions such as:

"Write a report."
"Review this document."
"Analyze these findings."
"Check whether this is accurate."

Then they are surprised when the output contains inaccuracies, omissions, inconsistencies, or hallucinations.

The problem is often not the model.

The problem is often the prompt.

The problem is often the process.

The problem is often the lack of training provided to the person behind the keyboard.

Just as driving requires education, practice, discipline, and adherence to rules, effective use of AI requires a new form of digital literacy.

And that literacy is becoming one of the most important competitive advantages of the modern enterprise.

Prompting Is Becoming the New Digital Literacy

Many people believe prompting simply means asking a question.

It does not.

Prompting is the process of:

Defining objectives
Providing context
Establishing constraints
Specifying evaluation criteria
Demonstrating expected outcomes

A prompt is not a question.

A prompt is an instruction set.

The difference between an average AI outcome and an exceptional AI outcome often comes down to the quality of that instruction set.
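
One way to picture a prompt as an instruction set is to treat the five parts listed above as fields that get assembled into a single text. The sketch below is only an illustration: the `PromptSpec` class, its field names, and the sample audit task are hypothetical and not tied to any particular AI product.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the five parts of a prompt."""
    objective: str
    context: str
    constraints: list[str] = field(default_factory=list)
    evaluation_criteria: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Assemble the parts into one labeled instruction set."""
        sections = [
            ("Objective", [self.objective]),
            ("Context", [self.context]),
            ("Constraints", self.constraints),
            ("Evaluation criteria", self.evaluation_criteria),
            ("Examples", self.examples),
        ]
        lines = []
        for title, items in sections:
            if not items:
                continue  # skip empty sections rather than emit a bare heading
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
            lines.append("")
        return "\n".join(lines).strip()


# Sample task invented for illustration.
spec = PromptSpec(
    objective="Summarize the attached audit findings for the board.",
    context="Readers are non-technical directors.",
    constraints=["Maximum 300 words", "Cite the finding ID for every claim"],
    evaluation_criteria=["Every claim must map to a finding in the attachment"],
)
print(spec.render())
```

Writing the parts down as separate fields makes a missing piece visible: here no examples were supplied, so that section is simply absent from the rendered prompt.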

Most AI failures begin long before the output is generated.

They begin with poorly designed prompts.

Why Vague Instructions Create Bad AI Outcomes

One of the most important lessons emerging from enterprise AI deployments is that vague instructions rarely produce precise outcomes.

Consider the difference between these two prompts:

Prompt 1

"Check whether the comments are accurate."

Prompt 2

"Flag comments only when the documented behavior contradicts the actual code behavior."

The second prompt is dramatically more effective because it defines exactly what should be reported.

The first prompt leaves the AI to determine what "accurate" means.

This distinction is crucial.
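
To make the distinction concrete, consider the hypothetical pair of functions below (both invented for this illustration). Under Prompt 2, only the first comment should be flagged, because it contradicts what the code does. The second comment is vague, but it does not contradict the code, so it stays outside the report.

```python
def retry_delay(attempt: int) -> int:
    # Returns the delay in seconds, doubling on each attempt.
    return 5  # FLAG: the delay is constant, so the comment contradicts the code


def total(prices: list[float]) -> float:
    # Adds stuff up.
    return sum(prices)  # NO FLAG: the comment is vague but not contradictory


print(retry_delay(1), retry_delay(3))
print(total([1.5, 2.5]))
```

Under Prompt 1, a model could reasonably flag both comments, or neither, depending on how it interprets "accurate".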

Many users attempt to improve AI performance by adding instructions such as:

Be conservative.
Use your best judgment.
Only report high-confidence findings.
Be careful.

These instructions sound useful.

In reality, they often fail to improve precision because they do not establish measurable criteria.

The AI is still forced to guess.

And when AI has to guess, inconsistency follows.

Precision Comes from Explicit Criteria

The most effective AI users understand that precision does not come from asking the AI to be more careful.

Precision comes from defining exactly what should be reported and what should be ignored.

For example:

Instead of:

"Review this code for issues."

Use:

"Report only:

Security vulnerabilities
Authentication bypasses
SQL injection risks
Sensitive data exposure

Do not report:

Coding style preferences
Naming conventions
Formatting differences
Team-specific implementation patterns"

Notice what changed.

The AI now understands both sides of the decision.

It knows what belongs inside the review.

And it knows what belongs outside the review.

This dramatically reduces false positives.
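
Teams that reuse the same review criteria often keep the two lists as data and generate the prompt text from them. The following is a minimal sketch of that idea; the function name `build_review_prompt` is an assumption made for this example, and the list contents are taken from the prompt above.

```python
def build_review_prompt(report: list[str], ignore: list[str]) -> str:
    """Render both sides of the decision: what to report and what to ignore."""
    if not report:
        raise ValueError("at least one reportable category is required")
    parts = ["Review the code below.", "", "Report only:"]
    parts += [f"- {item}" for item in report]
    if ignore:
        parts += ["", "Do not report:"]
        parts += [f"- {item}" for item in ignore]
    return "\n".join(parts)


REPORT = [
    "Security vulnerabilities",
    "Authentication bypasses",
    "SQL injection risks",
    "Sensitive data exposure",
]
IGNORE = [
    "Coding style preferences",
    "Naming conventions",
    "Formatting differences",
    "Team-specific implementation patterns",
]

prompt = build_review_prompt(REPORT, IGNORE)
print(prompt)
```

Keeping the criteria in lists also means they can be reviewed and changed like any other controlled document, rather than living in one person's chat history.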

The Hidden Cost of False Positives

Most discussions about AI focus on hallucinations and missed findings.

However, false positives are often equally damaging.

A system that repeatedly reports issues that are not actually issues eventually loses credibility.

This is a lesson cybersecurity professionals learned years ago.

A Security Operations Center flooded with false alerts eventually develops alert fatigue.

Analysts stop paying attention.

The same phenomenon occurs with AI.

If users repeatedly encounter inaccurate findings, they begin to distrust all findings.

Eventually, even accurate outputs are questioned.

Trust is difficult to build.

Easy to lose.

And once lost, difficult to recover.

This is why reducing false positives is not simply a technical objective.

It is a trust objective.
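
A simple way to quantify this is precision: the share of reported findings that turn out to be real. The counts below are hypothetical and chosen only to show the arithmetic.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that are genuine."""
    reported = true_positives + false_positives
    if reported == 0:
        raise ValueError("no findings were reported")
    return true_positives / reported


# Hypothetical counts, for illustration only.
noisy = precision(true_positives=20, false_positives=80)
focused = precision(true_positives=18, false_positives=2)

print(f"noisy reviewer: {noisy:.0%} of findings are real")
print(f"focused reviewer: {focused:.0%} of findings are real")
```

In this made-up comparison the noisy reviewer surfaces slightly more real issues, but four out of five of its findings waste the reader's time, which is the condition under which people stop reading the findings at all.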

Few-Shot Prompting: The Missing Driver Training Program

One of the most powerful techniques for improving AI reliability is few-shot prompting.

Most people try to improve AI by adding more instructions.

Experienced practitioners often do the opposite.

They provide examples.

Instead of telling the AI what to do, they show it.

Consider a citation review task.

Without examples:

"Verify these citations."

Results may vary significantly.

Now consider the same task with examples.

Example 1

Citation:
Smith, 2023

Result:
PASS

Reason:
Publication exists and supports the claim.

Example 2

Citation:
Johnson, 2024

Result:
FAIL

Reason:
Publication cannot be verified.

Example 3

Citation:
Federal Court Judgment

Result:
FAIL

Reason:
Quoted text does not appear in the judgment.

The model now understands the decision standard.

It is no longer guessing.

It is learning from examples.

Few-shot prompting improves:

Consistency
Precision
Output quality
Structured responses
Ambiguous-case handling

Most importantly, it teaches the AI the decision boundary.
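
In practice, a few-shot prompt is just the worked examples followed by the new case in the same format. Here is a minimal sketch that assembles one from the three citation examples above. The `Example` class, the `build_few_shot_prompt` function, and the new citation "Lee, 2022" are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    citation: str
    result: str  # "PASS" or "FAIL"
    reason: str


EXAMPLES = [
    Example("Smith, 2023", "PASS", "Publication exists and supports the claim."),
    Example("Johnson, 2024", "FAIL", "Publication cannot be verified."),
    Example("Federal Court Judgment", "FAIL",
            "Quoted text does not appear in the judgment."),
]


def build_few_shot_prompt(examples: list[Example], citation: str) -> str:
    """Place the worked examples first, then the new case in the same layout."""
    blocks = ["Verify the citation. Answer in the same format as the examples."]
    for number, ex in enumerate(examples, start=1):
        blocks.append(
            f"Example {number}\n"
            f"Citation: {ex.citation}\n"
            f"Result: {ex.result}\n"
            f"Reason: {ex.reason}"
        )
    # The prompt ends where the model is expected to continue.
    blocks.append(f"Now verify\nCitation: {citation}\nResult:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(EXAMPLES, "Lee, 2022")
print(prompt)
```

Note what the examples do and do not provide. They fix the output format and the decision standard, but they do not give the model access to the sources themselves. The source text still has to be supplied to the model, or the result checked by a person.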

The Difference Between Teaching and Telling

The best examples do more than demonstrate what should be reported.

They demonstrate what should not be reported.

Consider a security review.

Example 1

Hardcoded database password.

Result:
Report.

Reason:
Credential exposure creates security risk.

Example 2

Prepared SQL statement.

Result:
Do Not Report.

Reason:
Input is safely parameterized.

Example 3

Minor variable naming inconsistency.

Result:
Do Not Report.

Reason:
Not a security issue.

Example 4

SQL query constructed using string concatenation.

Result:
Report.

Reason:
Potential SQL injection vulnerability.

The AI now understands where the boundary exists between acceptable patterns and genuine issues.

This is how few-shot prompting reduces false positives while enabling generalization.

The Hallucination Problem Is Often a Validation Problem

The EY incident should not be viewed merely as a story about AI hallucinations.

It should be viewed as a story about validation failure.

An AI generating an inaccurate citation is a known risk.

Publishing that citation without verification is a governance failure.

The question should not simply be:

"Why did the model generate a bad citation?"

The more important question is:

"Why did the review process allow it to pass?"

Professional services firms, auditors, legal teams, compliance professionals, and researchers have always relied on validation.

AI does not remove that responsibility.

It increases it.

The issue is no longer purely technical.

It is operational.

It is procedural.

It is managerial.

It is a risk management issue.
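
A validation control can be as simple as a gate that refuses to release a document while any citation lacks a named reviewer. The sketch below assumes a hypothetical workflow in which a human records their identifier against each citation they have checked at the source. The class and function names are illustrative, not a description of any firm's actual process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    text: str
    verified_by: Optional[str] = None  # reviewer who checked the source, if any


def unverified(citations: list[Citation]) -> list[Citation]:
    """Return every citation that no human reviewer has signed off."""
    return [c for c in citations if not c.verified_by]


def can_publish(citations: list[Citation]) -> bool:
    """Allow release only when nothing is left unverified."""
    return not unverified(citations)


# Hypothetical draft: one citation checked, one not.
draft = [
    Citation("Smith, 2023", verified_by="reviewer-a"),
    Citation("Johnson, 2024"),
]

blocked = unverified(draft)
print(can_publish(draft))
print([c.text for c in blocked])
```

The point of the gate is accountability rather than automation: the model can draft, but the record shows which person confirmed each source before anything left the building.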

The Driver Matters More Than the Vehicle

When a road accident occurs, investigators do not begin by asking whether the car manufacturer should be blamed.

They ask:

Was the driver trained?
Were traffic rules followed?
Was the vehicle operated correctly?
Were safety controls in place?

Yet when an AI system produces an inaccurate output, many organizations immediately blame the model.

This reaction reveals a fundamental misunderstanding of how Artificial Intelligence works.

A skilled driver can safely operate a powerful vehicle.

An untrained driver can crash even the safest car.

The same principle applies to AI.

A poorly trained user can generate poor outcomes from a state-of-the-art AI model.

A well-trained user can achieve exceptional results using a less sophisticated model.

The difference is rarely the technology alone.

The difference is often the person behind the keyboard.

Organizations must invest in teaching employees how to:

Design Effective Prompts

The quality of an AI response is heavily influenced by the quality of the instruction.

Define Explicit Criteria

Specific criteria reduce ambiguity and improve precision.

Use Few-Shot Prompting

Examples teach the model how to make decisions consistently.

Implement Validation Workflows

Generation is only the first step.

Verification must follow.

Maintain Human-in-the-Loop Oversight

AI should augment human judgment, not replace it.

Apply Governance and Risk Management

AI systems require accountability, controls, monitoring, and oversight.

These capabilities are rapidly becoming the new digital literacy.

The future will not belong to organizations that simply have access to AI.

AI will eventually become available to everyone.

The future will belong to organizations that know how to use it better than everyone else.

A Call to Action

We must move beyond the belief that AI is either miraculous or dangerous.

It is neither.

It is a tool.

A powerful tool.

A transformative tool.

But still a tool.

And like every powerful tool in human history, its effectiveness depends on the skill of the person using it.

Before blaming AI for hallucinations, organizations should ask:

Were prompts properly designed?
Were explicit criteria provided?
Were few-shot examples included?
Were outputs validated?
Were review controls implemented?
Were users trained?

If the answer is no, then the problem may not be the model.

The problem may be the driver.

Because learning to use AI is not very different from learning to drive a car.

The technology matters.

The model matters.

But in the age of Artificial Intelligence, the driver still matters more than the vehicle.

Contacts

ygroshan@gmail.com
0091-9886574088
