Thursday, February 5, 2026

Managing Hallucinations & Trust in Generative AI

When AI Speaks Too Confidently

Generative AI dazzles with its fluency, but sometimes it invents facts with absolute conviction. These “hallucinations” aren’t just technical quirks - they can mislead regulators, confuse customers, or even disrupt infrastructure. I’ve seen this firsthand: in one of my own projects, we had to scrap an entire sprint because the model kept producing fabricated compliance rules. It was frustrating, but it taught us something important - trust in AI is earned through discipline, not hype.

Case Studies That Show the Numbers

Microsoft: Grounding AI Outputs

Evidence: In Microsoft’s 2024 internal trials of retrieval‑augmented generation (RAG), hallucinations in enterprise chat scenarios dropped by 37% when answers were tethered to verified sources.
Timeline: Azure AI Content Safety’s “Correction” feature was rolled out in late 2024, specifically to catch hallucinations in document‑based Q&A.
Failure Before Success: Early pilots showed that simply adding more training data didn’t help - hallucinations persisted until grounding was introduced. 👉 Lesson: Trust requires engineering discipline, not just bigger models.

DataRobot: Governance in Banking

Evidence: A North American bank using DataRobot avoided a $2.5M regulatory citation when governance dashboards flagged that a generative model was misclassifying loan risk categories.
Timeline: The incident occurred in 2023, and the governance framework was updated within six weeks to include hallucination detection.
Failure Before Success: The bank initially trusted the model’s outputs blindly, until auditors caught inconsistencies. Only after governance tools were embedded did reliability improve. 👉 Lesson: Monitoring isn’t optional - it’s the safety net.

Huawei: Reliability in Telecom

Evidence: In a 2024 deployment across Asian telecom networks, Huawei reported that validation layers prevented three major outages, each of which could have impacted over 1.2M users.
Timeline: These safeguards were introduced after a 2022 incident where an AI‑driven traffic optimizer hallucinated congestion patterns, leading to misrouted data.
Failure Before Success: The outage forced Huawei to redesign its AI validation stack, proving that hallucinations can have real‑world consequences. 👉 Lesson: In mission‑critical systems, redundancy is survival.

Academic Foundations

Calibrated Trust in Dealing with LLM Hallucinations

Venue: arXiv preprint, 2024 (Ryser, Allwein, Schlippe).
Methodology: Experimental study with 120 participants interacting with hallucinating LLMs. Researchers measured how trust levels shifted depending on transparency and prior expertise.
Findings: Users didn’t abandon AI after hallucinations; instead, they recalibrated trust. Transparency reduced negative impacts by 22%. 👉 Integration: Supports Microsoft’s grounding approach - transparency helps users manage expectations.

AI Governance: A Systematic Literature Review

Venue: AI and Ethics (Springer Nature), 2025 (Batool, Zowghi, Bano).
Methodology: Reviewed 85 governance frameworks across governments and enterprises.
Findings: Identified gaps in risk management, especially hallucinations and accountability. Proposed a layered governance model combining technical safeguards with organizational oversight. 👉 Integration: Mirrors DataRobot’s governance dashboards and Huawei’s validation layers.

My Project Management Playbook (Messy but Real)

In my own AI projects, hallucinations weren’t abstract - they were painful.

Scope Control: We once had a sprint derailed because the model started inventing compliance rules. Lesson: define boundaries early.
Iterative Validation: In healthcare AI, we caught fabricated lab values during sprint reviews. It was embarrassing, but better than letting it reach production.
Stakeholder Alignment: Compliance officers pushed back hard when hallucinations slipped through. Their skepticism forced us to tighten validation.
Risk Registers: We logged hallucinations as risks, tracked frequency, and treated them like bugs.
Human-in-the-Loop: In one project, outputs weren’t trusted until a domain expert signed off. Slowed us down, but saved reputational damage.
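
To make the risk-register habit concrete, here is a minimal sketch, assuming a simple in-memory log. The class names, fields, category labels, and sample entries are all illustrative assumptions, not the tooling used on those projects.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HallucinationEntry:
    """One logged hallucination, treated like a bug report."""
    found_on: date
    category: str      # illustrative labels, e.g. "fabricated_rule"
    severity: str      # "low", "medium" or "high"
    description: str


@dataclass
class RiskRegister:
    entries: list = field(default_factory=list)

    def log(self, entry: HallucinationEntry) -> None:
        self.entries.append(entry)

    def frequency_by_category(self) -> Counter:
        """Count how often each category of hallucination has been seen."""
        return Counter(e.category for e in self.entries)

    def high_severity(self) -> list:
        """Entries that should block a release until reviewed."""
        return [e for e in self.entries if e.severity == "high"]


# Made-up sample entries, only to show the shape of the data.
register = RiskRegister()
register.log(HallucinationEntry(date(2025, 3, 3), "fabricated_rule", "high",
                                "Model cited a compliance rule that does not exist"))
register.log(HallucinationEntry(date(2025, 3, 10), "fabricated_rule", "medium",
                                "Model invented a reporting deadline"))
register.log(HallucinationEntry(date(2025, 3, 12), "fabricated_value", "high",
                                "Model produced a lab value not present in the input"))

print(register.frequency_by_category())
print(len(register.high_severity()), "high-severity entries")
```

Tracking frequency per category is what turns individual incidents into a trend that can be reviewed sprint over sprint.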

👉 Lesson: Project management isn’t just about delivery - it’s about building trust through discipline, iteration, and sometimes admitting failure.

Closing Thought

Hallucinations remind us that AI is powerful but imperfect. Microsoft reduced them by 37%, DataRobot helped a bank avoid a $2.5M regulatory citation, Huawei prevented three outages that could each have affected over 1.2M users - but none of these wins came without prior failures. Academic research confirms that trust is calibrated, not absolute, and governance must be layered.

For CXOs, the path forward is not about eliminating hallucinations entirely - it’s about building systems that earn trust even when mistakes happen. And that requires not just technology, but project managers willing to say: “This failed before it worked.”

Friday, January 30, 2026

AI Cost Management & FinOps: Navigating the LLM Explosion

When I first began experimenting with Large Language Models (LLMs) during my AI research, I was struck by how quickly they evolved. From GPT‑4 in 2023 to Claude 3 and LLaMA 3 in 2024, each model brought new possibilities and new challenges. Over the years, I’ve used all three extensively, not only in academic research but also in real projects across banking, telecom, healthcare, and R&D. What I’ve learned is simple: the promise of AI is immense, but the costs can spiral if not managed with discipline.

The Rising Cost of Intelligence

LLMs have moved from pilot projects to enterprise‑critical tools. Yet, inference costs are exploding. What once looked like a small R&D expense is now a line item that can shake margins. For CXOs, this isn’t just a technical detail; it’s a strategic risk.

In banking, for example, deploying GPT‑4 for fraud detection or compliance reporting delivers unmatched accuracy, but the bills can climb very fast. In telecom, Claude 3’s long context window, which I previously used in one of Huawei’s regional projects, makes it well suited to analyzing customer interactions across different countries at scale, but without governance, usage can balloon. In healthcare, my own MedicLabs project showed that while LLaMA offered flexibility, GPT‑4 consistently outperformed it in medical accuracy - a domain where precision is non‑negotiable. There, a cost of about 1 USD per 10 customers seemed quite acceptable for very accurate results, given that thousands of tokens are used per customer. And in R&D, open‑source models like LLaMA 3 empower experimentation, but they demand infrastructure investment.

Choosing the Right Model: A Strategic Trade‑off

Not all LLMs are created equal.

GPT‑4 (OpenAI, 2023): The “Ferrari” >> powerful, precise, but expensive.
Claude 3 (Anthropic, 2024): The “Lexus” >> balanced, safe, and increasingly popular for enterprise deployments.
LLaMA 3 (Meta, 2024): The “DIY Tesla kit” >> cost‑efficient and flexible, but requires engineering effort.

The key insight: the “best” model is not always the most expensive one. A FinOps mindset demands aligning model choice with business value, not just technical capability.

The Shift in Adoption

Industry surveys confirm what I’ve seen firsthand:

GPT‑4 still leads with ~42% of enterprise usage, especially in regulated industries like banking and healthcare.
Claude 3 has surged to ~32%, as organizations embrace its cost efficiency and safer outputs.
LLaMA holds ~18%, favored by research labs and startups for open‑source flexibility.

>> Source: Stanford HAI & industry adoption reports, 2025 [https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf]

This shift tells a story: while GPT‑4 remains the gold standard for accuracy, many enterprises are migrating to Claude 3 to balance performance with cost. LLaMA continues to fuel innovation where budgets are tight but technical talent is strong.

Making FinOps Work for AI

The lesson across industries is clear: AI spend must be treated like any other strategic investment. FinOps principles apply here too:

Visibility: Track usage at the team and product level.
Optimization: Match model size to task complexity.
Governance: Separate experimentation from production workloads.
Accountability: Tie AI spend to business outcomes, not just technical metrics.
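
As a rough illustration of the visibility and accountability principles above, the sketch below accumulates spend per team and product. The model names, the per-token prices, and the `UsageLedger` class are hypothetical placeholders; real prices differ by provider and change often.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens (placeholders, not real list prices).
PRICE_PER_1K = {
    "large-model": {"input": 0.01, "output": 0.03},
    "small-model": {"input": 0.001, "output": 0.002},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call, given token counts and the price table above."""
    price = PRICE_PER_1K[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


class UsageLedger:
    """Accumulates spend per (team, product) so every dollar has an owner."""

    def __init__(self) -> None:
        self.spend = defaultdict(float)

    def record(self, team, product, model, input_tokens, output_tokens) -> float:
        cost = call_cost(model, input_tokens, output_tokens)
        self.spend[(team, product)] += cost
        return cost

    def report(self):
        """Owners sorted by spend, highest first."""
        return sorted(self.spend.items(), key=lambda item: item[1], reverse=True)


ledger = UsageLedger()
ledger.record("fraud", "alerts", "large-model", input_tokens=2000, output_tokens=500)
ledger.record("support", "chatbot", "small-model", input_tokens=1000, output_tokens=1000)

for (team, product), usd in ledger.report():
    print(f"{team}/{product}: ${usd:.4f}")
```

In a real deployment the token counts would come from the usage data each provider returns with a response, and the ledger would live in a database rather than in memory.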

Putting It Into Practice

For technical leaders, benchmarking is essential. Here’s a simple Python snippet I’ve used to compare inference across GPT‑4, Claude 3, and LLaMA 3. It measures latency for the same prompt side‑by‑side; accuracy and cost have to be scored separately, from each model’s output and token usage:

python

import time

prompt = "Summarize the impact of AI cost management for CXOs in 3 bullet points."

# GPT-4 (OpenAI); uses the openai>=1.0 client, which reads OPENAI_API_KEY from the environment
from openai import OpenAI
openai_client = OpenAI()
start = time.time()
response_gpt4 = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
print("GPT-4:", response_gpt4.choices[0].message.content)
print("Latency GPT-4:", time.time() - start, "seconds")

# Claude 3 (Anthropic)
from anthropic import Anthropic
client = Anthropic()
start = time.time()
response_claude = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}]
)
print("Claude 3:", response_claude.content[0].text)
print("Latency Claude 3:", time.time() - start, "seconds")

# LLaMA 3 (Meta via HuggingFace)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

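# Gated repository: requires accepting Meta's license on Hugging Face and being logged in
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")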

inputs = tokenizer(prompt, return_tensors="pt")
start = time.time()
outputs = model.generate(**inputs, max_new_tokens=200)
print("LLaMA 3:", tokenizer.decode(outputs[0], skip_special_tokens=True))
print("Latency LLaMA 3:", time.time() - start, "seconds")

The Takeaway

From banking to telecom, healthcare to R&D, the story is the same: AI is no longer a playground; it’s a P&L item. CXOs must treat LLM cost management as a strategic discipline. The winners will be those who balance innovation with financial rigor, ensuring AI delivers measurable ROI without eroding margins.

Sunday, January 18, 2026

The Future of Banking in KSA: Agentic AI as the Brain Behind Open Banking

“The views I share here are my personal vision, formed through direct experience implementing AI initiatives in multiple regions.”

🌍 Perspective Shaped Across Markets

Over nearly two decades, I’ve worked directly on digital transformation initiatives in more than 20 countries across Saudi Arabia, Europe, and Africa, with organizations including AlRajhi Bank, Huawei, Orange, IBM, and Microsoft. These experiences consistently reinforced one lesson: progress in banking is driven not by technology alone, but by the right architecture, strong regulation, and embedded intelligence.

That perspective becomes especially clear when examining Open Banking in Saudi Arabia >> where connectivity is largely solved, but intelligence is not.

🔑 Open Banking in KSA: Strong Rails, Limited Intelligence

SAMA’s Open Banking initiative has successfully laid the rails for digital finance:

Account Information Services (AIS) enabling secure, consent-based data sharing
Payment Initiation Services (PIS) enabling seamless transactions across banks and platforms

These rails are powerful. They unlock new customer journeys and ecosystem models that were not previously possible. But infrastructure alone does not create differentiation.

Rails without intelligence remain infrastructure.

To unlock real value, Open Banking requires a brain >>> one that can orchestrate decisions, personalize journeys, and enforce security dynamically. That brain is Agentic AI.

🛡️ SAMA: Regulation as an Enabler, Not a Constraint

SAMA’s regulatory framework has proven to be an enabler rather than a limitation. By enforcing strong standards around cybersecurity, data privacy, and operational readiness, it has created the trust layer necessary for digital banking to scale.

This balance between governance and innovation is what sets Saudi Arabia apart globally. However, timing matters. If intelligence is not embedded quickly, the gap between regulation-ready infrastructure and intelligence-ready platforms will widen >>> and closing that gap later will be costly.

Innovation delayed is not neutral.
It is opportunity lost.

⚙️ Gulf Banking Reality: APIs Everywhere, Intelligence Nowhere

Over the past decade, Gulf banks have invested heavily in APIs, microservices, middleware, and modular platforms. On the surface, the architecture appears modern and well connected.

In reality, a harder truth is emerging.

Many banking platforms today are:

Well integrated
Operationally stable
Strategically unintelligent

Systems exchange data but do not reason. Complexity is managed rather than eliminated. Every new product, regulation, or partner adds layers of mappings, rules, and exception handling.

APIs were never designed to think >>> yet they are increasingly used as a substitute for intelligence.

This model does not scale.
From my point of view, it accumulates fragility.

At some point, the uncomfortable question must be asked:
Are banks building digital institutions >>> or simply better-connected legacy systems?

🤖 Agentic AI: Changing the Operating Model

Agentic AI changes the operating model entirely.

Unlike traditional architectures, Agentic AI systems:

Act autonomously
Adapt in real time
Collaborate with humans instead of waiting for instructions

In Open Banking, this enables intelligent agents that continuously:

Monitor Third-Party Provider behavior
Enforce compliance dynamically
Orchestrate secure, frictionless customer journeys

Trust remains the currency of Open Banking — but with Agentic AI, trust can be continuously enforced, not manually maintained.

Below is a very simple Python illustration of how an Agentic AI agent could autonomously monitor transactions for potential fraud:


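import requests
import numpy as np

# Fetch transactions from the Open Banking API (placeholder URL and token)
response = requests.get(
    "https://api.bank.sa/openbanking/v1/transactions",
    headers={"Authorization": "Bearer <ACCESS_TOKEN>"},
    timeout=10,
)
response.raise_for_status()  # stop early on HTTP errors
transactions = response.json()  # assumed to be a list of {"amount": ...} records

def detect_anomalies(transactions, threshold=3.0):
    """Return transactions whose amount lies more than `threshold` standard deviations from the mean."""
    amounts = np.array([t["amount"] for t in transactions], dtype=float)
    if amounts.size == 0:
        return []
    mean, std = amounts.mean(), amounts.std()
    if std == 0:
        return []  # all amounts identical, so nothing stands out
    z_scores = (amounts - mean) / std
    return [t for t, z in zip(transactions, z_scores) if abs(z) > threshold]

anomalous_tx = detect_anomalies(transactions)

if anomalous_tx:
    print("⚠️ Potential fraud detected:", anomalous_tx)
    # Autonomous compliance workflows could be triggered here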

This demonstrates how, in the new Agentic AI era, even very simple automated checks can help SAMA’s guardrails scale across millions of transactions without increasing operational overhead.

🚀 Vision 2030: From Digital Banking to Intelligent Banking

Vision 2030 will not be realized by adding more platforms, more integrations, or more vendors.

It will be realized when:

Intelligence is embedded at the core
Complexity is reduced, not managed
Systems carry cognitive load, not people

Institutions that act early will simplify architecture, reduce long-term costs, and deliver truly customer-centric services. Those that wait will continue paying more to maintain complexity — with diminishing returns.

Saudi Arabia is not simply adopting Open Banking.
It has the opportunity to define a global model where Agentic AI becomes the true brain of finance.

Wednesday, December 31, 2025

Hyper‑Personalized Financial Advice with Agentic AI in Banking (POC)

Introduction

This is a proof of concept (POC) I am exploring for potential application in the banking sector. The concept integrates Generative AI for natural‑language financial guidance with Agentic AI (autonomous systems) that continuously optimize recommendations based on customer behavior, market signals, and compliance rules.

Technical Architecture

1. Data Layer

Customer Profile: demographics, income, product holdings, risk scores.
Behavioral Signals: transaction sequences, spending categories, channel usage.
Market Context: interest rates, inflation, asset performance.
Feedback Loop: customer acceptance/rejection, click‑through rates, portfolio outcomes.

2. Representation Layer

Embeddings:
- Customer vector $e_{c}$
- Product vector $e_{p}$
- Context vector $e_{x}$
Fusion: Concatenate or use attention mechanisms to form the state representation $s_{t} = f(e_{c}, e_{p}, e_{x})$.

3. Policy Layer (Agentic AI)

Contextual Bandits / Reinforcement Learning:
- Action space: {recommend saving, recommend investment, recommend debt repayment}.
- Reward function: balances personalization, risk alignment, and compliance.
- Agent loop: observe state $s_{t}$, choose action $a_{t}$, receive reward $r_{t}$, update policy.

4. Generative Advisor Layer

LLM Integration: Converts structured recommendation into human‑like, compliant advice.
Example: Action = “invest 10% in low‑risk fund” → Output = “Based on your current savings and spending, we recommend allocating 10% of your monthly income into a low‑risk investment fund.”

5. Governance Layer

Rule Engine: Filters actions by KYC/AML, product eligibility, and risk suitability.
Audit Trail: Logs every recommendation and its rationale for compliance review.
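
The governance layer can be sketched as a list of rule functions plus an append-only audit log. This is a minimal illustration under stated assumptions: the two rules, the `risk_tolerance` field, the 0.5 threshold, and the customer record are hypothetical, and real KYC/AML and suitability checks are far richer.

```python
import json
from datetime import datetime, timezone


def kyc_verified(customer, action):
    """Hypothetical rule: no advice at all unless KYC is complete."""
    return customer["kyc_verified"]


def risk_suitable(customer, action):
    """Hypothetical rule: block investment advice below a risk-tolerance threshold."""
    return not (action == "invest" and customer["risk_tolerance"] < 0.5)


RULES = [("kyc_verified", kyc_verified), ("risk_suitable", risk_suitable)]

audit_log = []


def review_action(customer, action):
    """Return True if every rule passes; always append an audit record."""
    failed = [name for name, rule in RULES if not rule(customer, action)]
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "customer_id": customer["id"],
        "action": action,
        "approved": not failed,
        "failed_rules": failed,  # the rationale kept for compliance review
    })
    return not failed


customer = {"id": "C-001", "kyc_verified": True, "risk_tolerance": 0.2}
print(review_action(customer, "invest"))     # blocked by risk_suitable
print(review_action(customer, "save_more"))  # passes both rules
print(json.dumps(audit_log, indent=2))
```

Keeping the rules as plain named functions means compliance staff can review each one in isolation, and every decision, approved or not, leaves a record.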

Python Example (POC Algorithm)

import numpy as np
import random

# Simplified state: [risk_score, savings_balance, spending_pattern]
states = [
    (0.2, 5000, "high"),
    (0.7, 20000, "moderate"),
    (0.9, 100000, "low")
]

actions = ["save_more", "invest", "repay_debt"]

def reward(state, action):
    risk, balance, spending = state
    if action == "invest" and risk < 0.5:
        return -1  # too risky
    if action == "save_more" and balance > 50000:
        return -0.5  # diminishing returns
    return 1  # acceptable advice

# Agentic loop (POC)
for state in states:
    action = random.choice(actions)
    r = reward(state, action)
    print(f"State={state}, Action={action}, Reward={r}")

👉 In production, this would be replaced with a reinforcement learning agent (e.g., Deep Q‑Learning or contextual bandits) connected to a Generative AI layer for natural‑language output.
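
As one possible step in that direction, the sketch below swaps the random choice for a tabular epsilon-greedy bandit, one of the simplest contextual-bandit forms. The context labels, the `simulated_reward` rule, and the hyperparameters are assumptions made for illustration; a real system would learn from actual customer feedback and a much richer state.

```python
import random
from collections import defaultdict

ACTIONS = ["save_more", "invest", "repay_debt"]


class EpsilonGreedyBandit:
    """Tabular epsilon-greedy bandit: one running-average reward per (context, action)."""

    def __init__(self, actions, epsilon=0.2, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        # Incremental mean: Q <- Q + (r - Q) / n
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulated_reward(context, action):
    """Toy feedback signal: 1.0 if the customer 'accepts' the advice, else 0.0."""
    preferred = {"low_risk": "save_more", "high_risk": "invest"}
    return 1.0 if action == preferred[context] else 0.0


bandit = EpsilonGreedyBandit(ACTIONS)
for step in range(1000):
    context = "low_risk" if step % 2 == 0 else "high_risk"
    action = bandit.choose(context)
    bandit.update(context, action, simulated_reward(context, action))

for context in ("low_risk", "high_risk"):
    best = max(ACTIONS, key=lambda a: bandit.values[(context, a)])
    print(context, "->", best)
```

The agent loop is the same observe, act, reward, update cycle described in the Policy Layer; only the way the action is chosen and the estimates are updated has changed.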

Technical Project Management Advice

If this POC were developed into a full banking solution:

Stakeholder Alignment: Involve compliance, risk, IT security, and CX teams early.
Hybrid Methodology: Agile sprints for model iteration + governance checkpoints for regulatory validation.
Vendor Coordination: Banking AI often requires multi‑vendor integration (cloud, AI platforms, compliance tools).
Data Privacy: Embed GDPR, local banking regulations, and ethical AI guidelines into requirements.
Pilot Strategy: Start with a narrow use case (e.g., savings advice for young professionals), measure KPIs, then scale.
Fallback Mechanisms: If AI advice fails compliance checks, default to human advisor review.
Monitoring: Continuous model drift detection, fairness audits, and explainability dashboards.

Conclusion

This POC demonstrates how Generative AI + Agentic AI could reshape financial advice in banking. The technical foundation—embeddings, reinforcement learning, generative language models, and governance layers—is feasible. Success depends on project management discipline, regulatory guardrails, and customer trust.

Wednesday, December 24, 2025

Accelerating GenAI in Healthcare: Model Inference Caching at MedicLabs

In my earlier post, Building a GenAI Medical Assistant/Advisor, I shared how generative AI can reshape patient interactions and medical decision support. While designing that system at MedicLabs, one challenge quickly became clear: raw AI power is impressive, but without speed and responsiveness, users lose trust.

The solution I implemented was Model Inference Caching — a methodology that transformed performance, reduced costs, and made the GenAI assistant practical for real-world healthcare.

Why Model Inference Caching Matters

Generative AI models are resource-intensive. Each inference — whether explaining lab results, summarizing patient records, or advising on treatment protocols — can consume significant compute power. Without optimization, response times lag, frustrating both patients and doctors.

Model inference caching solved this by:

Reducing latency: Common queries (like routine lab interpretations) were cached, delivering instant answers.
Lowering costs: Avoiding repeated inference runs saved GPU cycles and infrastructure expenses.
Improving reliability: Cached outputs acted as a fallback when compute resources were under strain.

How I Applied Caching in MedicLabs

I designed caching to operate primarily on the admin side, ensuring that end-users always experienced top speed while administrators handled the heavier lifting.

Key strategies included:

JSON Caching for AI outputs: Structured responses from the GenAI model were stored and reused when identical or similar queries appeared.
Inference Embedding Caching: Embeddings for common medical terms and queries were cached, accelerating similarity searches.
Hybrid Admin/User Strategy: Admin-side caching handled heavy lifting, while lightweight HTTP caching improved delivery of static assets.

This layered approach meant that doctors and patients interacting with the MedicLabs assistant received answers in seconds, not minutes.

Best Practices I Learned

Through implementation, I discovered several best practices for model inference caching:

Cache intelligently, not blindly: Medical queries can be sensitive. I avoided caching patient-specific data, focusing instead on general medical knowledge and frequently repeated queries.
Set expiration policies: Medical guidelines evolve. Cached outputs were given TTLs (time-to-live) to ensure accuracy.
Balance freshness and speed: Inference caching worked best when paired with monitoring to detect outdated or stale responses.
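
These practices can be sketched as a small TTL cache that refuses to store patient-specific queries. This is a minimal in-memory illustration, not the MedicLabs implementation: the class, the `patient_specific` flag, and the `fake_model` stand-in are assumptions, and it only matches queries that are identical after normalisation, not semantically similar ones.

```python
import hashlib
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    @staticmethod
    def key_for(query):
        # Normalise case and whitespace so trivially different phrasings share a key.
        normalised = " ".join(query.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get(self, query):
        key = self.key_for(query)
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, query, value):
        self.store[self.key_for(query)] = (value, self.clock() + self.ttl)


def answer(query, cache, run_model, patient_specific=False):
    """Serve from cache when allowed; never cache patient-specific queries."""
    if patient_specific:
        return run_model(query)
    cached = cache.get(query)
    if cached is not None:
        return cached
    result = run_model(query)
    cache.put(query, result)
    return result


calls = []

def fake_model(query):
    """Stand-in for the real inference call."""
    calls.append(query)
    return {"answer": f"explanation for: {query}"}


cache = TTLCache(ttl_seconds=3600)
answer("What does a high ferritin value mean?", cache, fake_model)
answer("what does a  high ferritin value mean?", cache, fake_model)  # served from cache
print("model calls:", len(calls))
```

Passing the clock in as a parameter keeps expiry testable without waiting for real time to pass.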

When to Avoid Caching

Caching isn’t always appropriate. I avoided it in cases such as:

Highly personalized queries (unique patient cases) that required fresh inference.
Rapidly changing data (like real-time vitals) where accuracy was critical.
Sensitive outputs (confidential patient information) that should never be cached.

Caching Meets GenAI

What excites me most is how caching and GenAI complement each other:

Predictive Pre-Caching: AI models can anticipate likely queries and pre-cache them.
Adaptive Caching: GenAI can decide dynamically whether to serve cached results or trigger fresh inference.
Scalable Healthcare AI: With caching, GenAI assistants can serve thousands of patients simultaneously without bottlenecks.

Final Thoughts

Model inference caching was a cornerstone of making the MedicLabs GenAI Medical Assistant practical and scalable. It turned a powerful but resource-intensive system into a responsive, cost-efficient, and trustworthy tool for healthcare.

As I continue exploring AI in healthcare, I see caching not just as a performance hack, but as a strategic enabler of real-world GenAI adoption. By combining caching methodologies with intelligent inference design, we can build systems that are both smart and fast — exactly what modern healthcare demands.

Monday, December 15, 2025

Building a GenAI Medical Assistant/Advisor for Our Family Lab: Lessons from Taking AI into the Real World

Working with Generative AI in enterprise environments is one thing.
Building and shipping it for a real family business, where real people rely on the output, is something else entirely.

At MedicLabs, our medical analysis platform, I wasn’t just managing delivery.
I was also responsible for designing, developing, and integrating the GenAI feature myself.

That changed how I think about GenAI more than any corporate project ever did.

The Problem We Were Trying to Solve

Patients receive blood test results every day.

Most of them:

Don’t understand the medical terms
Misinterpret values
Panic unnecessarily
Or, worse, ignore important signals

Doctors are busy.
Lab reports are technical.
And patients are left somewhere in between.

The goal was not to replace doctors.
It was to translate medical data into understandable, responsible guidance — safely.

Designing for Responsibility First

From the beginning, I treated this as a medical-support system, not a generic AI chatbot.

Key principles guided the design:

No diagnosis claims
No absolute medical decisions
Clear explanations of what values mean
Suggestions framed as guidance, not conclusions
Strong emphasis on consulting a physician when needed

Trust mattered more than intelligence.

Doing the Development Work Myself

I personally handled:

Data structure design for lab results
Prompt engineering for medical context
Multilingual output handling (Arabic, English, French)
Safety constraints and phrasing control
Integration into the existing web platform
UI flow to avoid overwhelming the patient

This was not a plug-and-play solution.
It required iteration, testing, and constant refinement.

Every small wording change mattered.

What the System Actually Does

When a patient uploads or views blood analysis results, the system:

Explains each value in simple language
Highlights what is within normal range and what isn’t
Provides general lifestyle and health advice
Suggests potential follow-up tests based on patterns
Clearly states limitations and when to consult a doctor

No fear-based messaging.
No false certainty.

How This Helped Patients

The most immediate impact was clarity.

Patients:

Understood their results without panic
Felt more confident discussing results with doctors
Became more proactive about their health
Returned to the platform to track changes over time

Instead of raw numbers, they received context.

How This Helped the Business

From a business perspective, the impact was equally clear.

The AI-driven insights:

Increased engagement time on the platform
Reduced repetitive inquiries to lab staff
Encouraged patients to perform recommended follow-up tests
Improved trust in the MedicLabs brand
Turned a static report into a living health tool

This was not aggressive upselling.
It was relevant, medically justified guidance.

When done responsibly, value creation benefits both sides.

The Hardest Part Was Not the Technology

The hardest part was balancing:

Helpfulness vs safety
Clarity vs overconfidence
Automation vs responsibility

Every GenAI decision had ethical weight.

This is where being both the developer and the Technical Project Manager mattered.
I could control not just what was built, but how and why.

What This Project Changed for Me

This project reinforced something I strongly believe now:

GenAI is not impressive because it can generate text.
It’s impressive when it:

Reduces confusion
Improves decision quality
Respects domain boundaries
And creates sustainable value

Especially in healthcare, responsibility is not optional.

Closing Thought

Building GenAI features for real users — especially in sensitive domains — forces a different level of discipline.

This wasn’t a demo.
It wasn’t a slide deck.
It was a system people actually use.

And doing both the technical implementation and the delivery ownership made one thing very clear to me:

The future of GenAI belongs to teams who can build responsibly, not just quickly.

Friday, September 5, 2025

Managing Hallucinations & Trust in GenAI: What Building Real Systems Taught Me

One of the earliest lessons I learned working with Generative AI is that correctness is not binary.

Unlike traditional systems, GenAI doesn’t simply fail or succeed. It responds — sometimes confidently — even when it’s wrong. And that behavior fundamentally changes how trust must be designed, not assumed.

Hallucinations are not edge cases. They are a structural characteristic of how these models work.

The Problem Is Not That Models Hallucinate

At first, hallucinations are often treated as a model quality issue.

Improve the prompt.
Switch the model.
Add more data.

Those steps help, but they don’t eliminate the problem.

The deeper issue is how hallucinations interact with users and workflows. A wrong answer that looks plausible is more dangerous than a visible failure. Once users stop trusting the system, no accuracy metric can bring that trust back.

Trust Is a Delivery Concern, Not a Model Feature

I’ve seen projects where technically strong models failed in production because trust was never explicitly managed.

Trust is shaped by:

How confident responses sound
Whether uncertainty is communicated
How errors are handled
What happens when the model doesn’t know

None of these are purely data science problems.
They are design and delivery decisions.

As a Technical Project Manager, ignoring trust means risking adoption — even if the model is statistically strong.

How Hallucinations Actually Create Cost and Risk

Hallucinations don’t just affect quality. They create downstream consequences:

Incorrect decisions
Manual verification work
Repeated prompts and retries
Escalations and overrides
Loss of confidence in AI-assisted workflows

In regulated environments, the impact is even larger:

Compliance exposure
Audit challenges
Reputation damage

Trust issues compound silently until someone decides the system is “not reliable” and stops using it.

The Shift: From Preventing Hallucinations to Managing Them

At some point, the mindset has to change.

The goal is not zero hallucinations — that’s unrealistic.
The goal is controlled behavior when hallucinations occur.

That means designing systems that:

Know when confidence is low
Surface uncertainty instead of hiding it
Fall back to safer paths
Involve humans when risk crosses a threshold

This is an architectural choice, not a last-minute fix.
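
One way to picture that architectural choice is a routing step that sits between the model and the user. The sketch below is only an illustration: it assumes the system already has some confidence score in [0, 1] and a risk label supplied by the calling workflow, and the 0.7 threshold and route names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelOutput:
    text: str
    confidence: float  # assumed to be a score in [0, 1]; how it is produced is out of scope
    risk: str          # "low" or "high", assigned by the calling workflow


def route(output: ModelOutput, min_confidence: float = 0.7) -> str:
    """Decide what happens to a model output before a user sees it."""
    if output.risk == "high":
        return "human_review"          # risk crosses the threshold: involve a person
    if output.confidence < min_confidence:
        return "fallback"              # safer path, e.g. "I don't know" plus a verified source
    return "show_with_confidence"      # surface the answer together with its confidence


samples = [
    ModelOutput("Routine summary", confidence=0.92, risk="low"),
    ModelOutput("Unclear policy question", confidence=0.40, risk="low"),
    ModelOutput("Compliance ruling", confidence=0.95, risk="high"),
]
for sample in samples:
    print(sample.text, "->", route(sample))
```

The point of the sketch is the ordering: risk is checked before confidence, so a confident answer in a high-risk workflow still goes to a human.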

What Has Worked in Practice

In real projects, trust improved when we focused on a few principles:

Constraining models to verified sources when accuracy mattered
Separating creative use cases from factual ones
Using confidence signals to trigger review
Designing “I don’t know” as an acceptable outcome
Measuring user trust, not just model accuracy

Interestingly, users were more forgiving of systems that admitted uncertainty than systems that sounded confident and wrong.

Why Hallucinations Change the Role of the Project Manager

GenAI projects blur the line between engineering, product, and risk management.

Managing hallucinations means:

Aligning stakeholders on acceptable risk
Defining where automation ends
Setting expectations early
Making trade-offs explicit

This requires active ownership throughout the lifecycle, not just during delivery.

A Different Definition of Success

Success in GenAI is not about eliminating errors.

It’s about creating systems that:

Fail safely
Protect decision quality
Preserve user confidence
Improve over time

Trust is not a feature you add at the end.
It’s something you design from the first architecture discussion.

Closing Thought

The most dangerous AI systems are not the inaccurate ones.

They are the ones that sound certain when they shouldn’t.

Managing hallucinations is ultimately about managing trust — and trust, once lost, is extremely hard to regain. For me, that has become one of the most important lessons in delivering GenAI responsibly.

Monday, June 2, 2025

AI Cost Management & FinOps for LLMs: A Reality Check from the Field

When I first started working on Generative AI initiatives, cost was not the primary concern.

Like many teams, we were focused on proving capability:
Can the model understand our data?
Can it generate usable outputs?
Can we integrate it into real workflows?

The early demos were impressive. The costs looked manageable. Everything worked — at least on the surface.

Things changed the moment these systems moved closer to production.

When LLMs Meet Real Usage

LLMs behave very differently from traditional software systems.

Costs don’t scale linearly with users or infrastructure. They scale with:

Prompt length
Context size
Model choice
Retry behavior
Error handling
User behavior

A small architectural decision — an extra paragraph of context, an unnecessary retry, a default model choice — can multiply costs without anyone noticing immediately.

By the time it becomes visible, the budget damage is already done.
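
A rough back-of-the-envelope calculation shows how these drivers compound. The sketch below is an illustration under stated assumptions: the model names and per-token prices are invented placeholders, not real vendor pricing, and `request_cost` is a hypothetical helper.

```python
# Hypothetical prices in USD per 1,000 tokens; not real vendor pricing.
PRICE_PER_1K = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.01, "output": 0.03},
}


def request_cost(model, prompt_tokens, context_tokens, output_tokens, attempts=1):
    """Estimate the cost of one logical request, including retries."""
    price = PRICE_PER_1K[model]
    input_cost = (prompt_tokens + context_tokens) / 1000 * price["input"]
    output_cost = output_tokens / 1000 * price["output"]
    # Each retry resends the full prompt and context.
    return (input_cost + output_cost) * attempts


baseline = request_cost("small-model", 500, 1000, 300)
inflated = request_cost("large-model", 500, 3000, 300, attempts=2)
print(f"baseline: ${baseline:.5f}  inflated: ${inflated:.5f}")
```

With these placeholder prices, a default to the larger model, extra context, and a single retry together make the same request dozens of times more expensive, even though nothing changed from the user's point of view.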

Why This Is Not a Finance Problem

One of the biggest misconceptions I see is treating GenAI cost as a finance or procurement issue.

In reality, cost is driven almost entirely by technical and delivery decisions:

Which model is used for which task
How prompts are designed
How often calls are made
How errors and hallucinations are handled
How results are cached or reused

As a Technical Project Manager, ignoring this means losing control of delivery.
Owning it means enabling sustainable scale.

The Cost Traps I’ve Seen Repeatedly

Across different teams and projects, the same patterns show up:

Overpowered defaults
High-end models are used everywhere “just in case”, even when simpler models would deliver acceptable results.

Prompt inflation
Prompts grow over time as edge cases are added, without revisiting their cost impact.

Invisible retries
Failures and hallucinations trigger silent retries, compounding cost with every error.

No guardrails
No budgets per feature, no alerts, no clear ownership of usage.

None of these are malicious.
They are natural outcomes of moving fast without cost being a first-class concern.
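
The missing guardrail is usually simple to build. Here is a minimal sketch of a per-feature budget with an alert threshold; the class name `FeatureBudget`, the 80% alert ratio, and the three status strings are all hypothetical choices for illustration.

```python
class FeatureBudget:
    """Tracks spend for one feature against a monthly budget (hypothetical design)."""

    def __init__(self, feature, monthly_budget_usd, alert_ratio=0.8):
        self.feature = feature
        self.monthly_budget_usd = monthly_budget_usd
        self.alert_ratio = alert_ratio
        self.spent_usd = 0.0

    def record(self, cost_usd):
        """Add a call's cost and return the resulting status."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.monthly_budget_usd:
            return "blocked"   # stop or degrade the feature
        if self.spent_usd >= self.alert_ratio * self.monthly_budget_usd:
            return "alert"     # notify the owning team
        return "ok"


budget = FeatureBudget("document-summary", monthly_budget_usd=100.0)
for cost in (50.0, 35.0, 20.0):
    print(budget.record(cost))
```

Because each budget is attached to a named feature, the alert also answers the ownership question: someone is responsible for that feature's spend.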

What FinOps for LLMs Actually Means in Practice

For me, FinOps in AI is not about cutting costs aggressively.

It’s about making trade-offs explicit.

Every GenAI system makes implicit decisions about:

Accuracy vs cost
Speed vs depth
Automation vs human review

FinOps forces those decisions into the open and ties them to real business outcomes.

Once cost is visible, better engineering decisions naturally follow.

What Has Worked in Real Projects

In practice, sustainable AI delivery comes from a few disciplined habits:

Comparing multiple models for the same use case
Matching model capability to business criticality
Monitoring cost per request and cost per outcome
Treating hallucinations as both quality and cost issues
Designing graceful degradation instead of all-or-nothing behavior

The key shift is simple but powerful:
Cost becomes a delivery metric, not a surprise.
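
Two of these habits translate directly into code. The sketch below is illustrative: the tier names in `MODEL_BY_CRITICALITY` are placeholders rather than real models, and `cost_metrics` assumes that a "successful outcome" is already defined and counted somewhere upstream.

```python
# Hypothetical mapping from business criticality to a model tier.
MODEL_BY_CRITICALITY = {
    "low": "small-model",
    "medium": "mid-model",
    "high": "large-model",
}


def pick_model(criticality: str) -> str:
    """Match model capability to business criticality; default to the cheapest tier."""
    return MODEL_BY_CRITICALITY.get(criticality, "small-model")


def cost_metrics(request_costs, successful_outcomes):
    """Return (cost per request, cost per successful outcome)."""
    total = sum(request_costs)
    per_request = total / len(request_costs) if request_costs else 0.0
    per_outcome = total / successful_outcomes if successful_outcomes else float("inf")
    return per_request, per_outcome


per_request, per_outcome = cost_metrics([0.02, 0.02, 0.02, 0.02], successful_outcomes=2)
print(pick_model("high"), per_request, per_outcome)
```

Cost per outcome is the more honest number: if only half of the requests produce something usable, each useful result costs twice what the per-request figure suggests.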

A Shift in How I Think About AI Projects

Traditional projects end when features are delivered.

AI projects don’t.
They continue to generate value, or cost, every day they run.

That changes the role of a Technical Project Manager:

From delivery-focused to lifecycle-focused
From feature tracking to outcome tracking
From technical coordination to economic stewardship

This is not theoretical. It’s operational reality.

Closing Thought

The most impressive AI systems are not the most expensive ones.

They are the ones that:

Deliver consistent value
Earn trust
Scale responsibly
And remain economically viable over time

AI Cost Management and FinOps for LLMs are not optional add-ons anymore.
They are part of what it means to run AI projects professionally.

Tuesday, April 8, 2025

AI-Enabled Risk Scoring for TPPs in Open Banking: A Game Changer for Ecosystem Trust

As Open Banking ecosystems mature globally, traditional banks, fintech startups, and regulators face a mounting challenge: how to trust the growing number of Third Party Providers (TPPs) gaining access to customer data and payment rails via APIs. In this fast-evolving environment, AI-enabled risk scoring is emerging as a powerful tool for continuously evaluating the trustworthiness of TPPs based on dynamic behavior, financial performance, and regulatory compliance signals.

This article explores what AI-enabled TPP risk scoring is, why it matters, and how it can be practically implemented—both from a strategic and technical lens.

Why TPP Risk Scoring Is Critical in Open Banking

Open Banking regulations and standards, such as PSD2 in Europe, the UK Open Banking Standard (developed by OBIE), and similar frameworks across Africa and Asia, require banks to open up their customer data (with consent) to licensed TPPs. These TPPs include:

Payment Initiation Service Providers (PISPs)
Account Information Service Providers (AISPs)
Aggregators, budgeting apps, and more

While this enhances competition and innovation, it increases the risk landscape:

API misuse or abuse
Poor data handling practices
Operational failures or security gaps
TPPs going out of business unexpectedly

Banks must assess “Who are we opening our APIs to?” on an ongoing basis, not just at onboarding. Static checklists aren’t enough.

Enter AI-Powered Dynamic Risk Scoring

An AI-enabled TPP risk scoring system continuously evaluates TPPs based on:

Usage patterns: API request frequency, data volume, error rates.
Behavioral anomalies: Sudden surges, unusual endpoints, suspicious transaction types.
External signals: Regulatory fines, media sentiment, funding rounds, business health.
Peer benchmarking: Comparing a TPP’s activity against industry norms.
Historical incident data: Security breaches, customer complaints, etc.

AI models can assign dynamic risk scores that are updated in real time or on scheduled cycles, making it easier for banks to:

Auto-trigger red flags and throttling
Prioritize monitoring
Optimize SLAs or commercial terms
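
How a score turns into action can be sketched in a few lines. This example assumes a risk score on a 0 to 100 scale; the tier boundaries, rate limits, and the function name `action_for_score` are hypothetical and would be set by each bank's risk policy.

```python
def action_for_score(risk_score: float) -> dict:
    """Translate a 0-100 risk score into an access decision (illustrative tiers)."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    if risk_score >= 80:
        # High risk: raise a red flag and throttle hard.
        return {"tier": "high", "rate_limit_per_min": 10, "red_flag": True}
    if risk_score >= 50:
        # Medium risk: throttle moderately and prioritise monitoring.
        return {"tier": "medium", "rate_limit_per_min": 100, "red_flag": False}
    return {"tier": "low", "rate_limit_per_min": 1000, "red_flag": False}


for score in (12, 55, 91):
    print(score, action_for_score(score))
```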

A Simple Python Example: Behavior-Based TPP Risk Score

Below is a basic Python sketch that uses a decision tree classifier to predict a TPP's risk level (low, medium, or high) from its recent API activity profile:

#python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Sample dataset
data = pd.DataFrame({
    'api_calls_per_day': [1000, 5000, 100, 8000, 300],
    'error_rate': [0.01, 0.15, 0.03, 0.25, 0.02],
    'sudden_spike': [0, 1, 0, 1, 0],
    'past_incidents': [0, 2, 0, 3, 1],
    'risk_label': ['low', 'high', 'low', 'high', 'medium']
})

# Convert labels to numeric
data['risk_label'] = data['risk_label'].map({'low': 0, 'medium': 1, 'high': 2})

# Train model
features = data[['api_calls_per_day', 'error_rate', 'sudden_spike', 'past_incidents']]
labels = data['risk_label']
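# Fix the seed: with so few samples several splits tie, so results can vary between runs
clf = DecisionTreeClassifier(max_depth=3, random_state=42)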
clf.fit(features, labels)

# Predict a new TPP
new_tpp = pd.DataFrame({
    'api_calls_per_day': [4500],
    'error_rate': [0.18],
    'sudden_spike': [1],
    'past_incidents': [1]
})

predicted_risk = clf.predict(new_tpp)[0]
risk_level_map = {0: "Low", 1: "Medium", 2: "High"}
print("Predicted Risk Level:", risk_level_map[predicted_risk])

This is a simplified model. Real implementations might involve:

Gradient Boosting models or Neural Networks
Integration with anomaly detection (e.g. Isolation Forests)
Stream processing for real-time scoring (using Kafka or Spark)
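
To give a feel for the real-time part, here is a minimal in-process stand-in for what a stream consumer would do: update a rolling error rate per TPP as each API event arrives. It uses only the standard library, so it is not Kafka or Spark code; the window size and the names `on_api_event` and `recent_calls` are assumptions made for this sketch.

```python
from collections import defaultdict, deque

WINDOW = 100  # hypothetical: number of most recent calls considered per TPP

# For each TPP, keep only the outcomes of its most recent calls.
recent_calls = defaultdict(lambda: deque(maxlen=WINDOW))


def on_api_event(tpp_id: str, is_error: bool) -> float:
    """Consume one API event and return the TPP's rolling error rate."""
    window = recent_calls[tpp_id]
    window.append(1 if is_error else 0)
    return sum(window) / len(window)


# Simulated event stream: (tpp_id, is_error)
events = [("tpp-a", False), ("tpp-a", True), ("tpp-b", False), ("tpp-a", True)]
for tpp_id, is_error in events:
    rate = on_api_event(tpp_id, is_error)
print("tpp-a rolling error rate:", round(rate, 2))
```

The rolling rate could then be fed to the classifier as the `error_rate` feature, so the score follows current behavior rather than a daily batch.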

Challenges and Considerations

Area | Challenge | Recommendation
Data Privacy | Risk of scoring based on sensitive info | Use anonymized or non-PII behavioral features
Fairness | Biased models could unfairly penalize new TPPs | Ensure transparency and use Explainable AI (XAI)
Regulatory Risk | Compliance with GDPR, PSD2, local rules | Document models, consent sources, scoring logic
Interpretability | Business teams need understandable outputs | Use SHAP values or decision paths

Business Value

Enhanced Security: Detect rogue or compromised TPPs early
Regulatory Compliance: Proactive risk posture appreciated by auditors
Better SLAs: Dynamic risk-based access tiers
Ecosystem Trust: Encourage responsible innovation by rewarding good actors

Conclusion

In the Open Banking world, where trust is the currency, AI-enabled TPP risk scoring is no longer optional—it's foundational. Banks and regulators must embrace AI not just for fraud detection, but for proactive partner risk management across the API economy. The path forward involves blending behavioral science, machine learning, and regulatory foresight to make Open Banking safer and smarter for everyone.

Monday, April 7, 2025

Emergency Passwords: A Simple Yet Powerful Shield for Open Banking Security

In a world where open banking is reshaping how we interact with financial institutions, cybersecurity has never been more critical. While the benefits of open banking are clear—seamless integrations, smarter financial management, and personalized experiences—it also opens up a Pandora’s box of cyber threats. One of the most innovative ideas emerging to counter this is the “Emergency Password.”

This concept, although simple, can be a game-changer in protecting user accounts during high-risk situations, especially under duress or when facing social engineering attacks.

What is an Emergency Password?

Imagine you're coerced—either digitally or physically—into logging into your banking app. You can't say no. You can't alert anyone. That’s where the Emergency Password comes in.

An Emergency Password is a secondary, pre-defined credential that looks and feels like a valid login, but when entered:

It gives limited access to dummy or decoy data.
It silently triggers an alert to the security team.
It can optionally freeze high-risk operations like transfers or withdrawals.
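
The three behaviors above can be sketched as a single login check. This is a simplified illustration, not a production authentication design: the passwords, the fixed salt, and the shape of the returned session dictionary are all invented for the example.

```python
import hashlib
import hmac


def _hash(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


SALT = b"demo-salt"  # hypothetical; use a random per-user salt in practice
USER = {
    "normal_hash": _hash("correct horse", SALT),
    "emergency_hash": _hash("battery staple", SALT),
}


def login(password: str) -> dict:
    """Return the session mode for a login attempt."""
    candidate = _hash(password, SALT)
    # Evaluate both comparisons so the two paths take similar time.
    is_normal = hmac.compare_digest(candidate, USER["normal_hash"])
    is_emergency = hmac.compare_digest(candidate, USER["emergency_hash"])
    if is_emergency:
        # Looks like a valid login, but serves decoy data and alerts security.
        return {"authenticated": True, "mode": "decoy",
                "silent_alert": True, "transfers_frozen": True}
    if is_normal:
        return {"authenticated": True, "mode": "full",
                "silent_alert": False, "transfers_frozen": False}
    return {"authenticated": False, "mode": None,
            "silent_alert": False, "transfers_frozen": False}


print(login("battery staple"))
```

Note that the emergency password itself is detected by an exact credential match; no model is needed for that step.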

How AI Can Detect Emergency Password Usage

AI can play a role in differentiating between a regular login and a duress login based on several features like:

Password pattern
Device behavior
Typing speed
Login context (time, location, IP)

Here's a simple Python example using scikit-learn to classify a login attempt as normal or under duress:

#python
from sklearn.tree import DecisionTreeClassifier

# Sample data: [typing_speed (avg. ms between keystrokes; higher = slower), is_known_device (0/1), is_emergency_password (0/1)]
# 0 = normal login, 1 = under duress
X = [
    [150, 1, 0],  # Normal login
    [140, 1, 0],
    [400, 1, 1],  # Duress (emergency password entered)
    [380, 0, 1],
    [160, 0, 0],
    [390, 1, 1],
]

y = [0, 0, 1, 1, 0, 1]

clf = DecisionTreeClassifier()
clf.fit(X, y)

# Incoming login attempt (typing speed: 410ms, known device: yes, emergency password used: yes)
login_input = [[410, 1, 1]]
prediction = clf.predict(login_input)

if prediction[0] == 1:
    print("Emergency login detected! Triggering silent alert...")
else:
    print("✅ Normal login.")

What This Code Does:

It trains a simple AI model using a few login attributes.
When a login is attempted using the emergency password, it flags it as a duress scenario.
In a real system, this would trigger silent alerts, activate safe-mode dashboards, or freeze sensitive actions.

Conclusion

In the evolving battlefield of digital finance, traditional passwords are no longer enough. Innovations like the Emergency Password empower users in moments when they are most vulnerable. As open banking continues to grow, so must our creative approaches to security.

Adding this layer of protection isn’t just smart—it’s humane. Because real people, under real pressure, deserve real safety.

Saturday, April 5, 2025

How Cyber-Crimes Threaten Open Banking and How to Prevent Them

Open banking is transforming the way businesses and individuals interact with financial institutions, offering increased convenience, faster transactions, and broader access to banking services. However, with this innovation comes the risk of cyber-crimes that can exploit vulnerabilities in these systems. One striking example of how these risks could unfold is a scenario where a criminal, threatening a wealthy individual with a weapon, forces them to transfer funds from their open banking app.

In such a case, the attacker could have full access to the victim's financial information via their mobile device, putting both the individual and financial institutions at risk. As the open banking ecosystem becomes more integrated into everyday life, understanding the technological and business implications of these threats is crucial. Let’s explore how cyber-crimes can negatively affect open banking technologies and the strategies that can be employed to minimize such risks.

Introducing the "Emergency Password"

One innovative way to address physical and psychological threats is by implementing an "Emergency Password". This concept involves allowing users to set a unique password they can use during a high-pressure situation, such as when they are being coerced into transferring money.

Here’s how it works:

If a user is under duress and is forced to make a transaction (e.g., at gunpoint), they can enter the "Emergency Password" when prompted for authentication. This password would appear like a normal login but would alert the bank to the situation.
Upon entering the emergency password, the transaction would proceed normally, allowing the attacker to think everything is fine. However, the transaction is flagged in the backend as "suspicious," and it is rolled back after a specified period (e.g., 24 hours) unless verified by the user.
Simultaneously, the bank would notify local authorities and share the user's GPS location with law enforcement, enabling rapid intervention.

This emergency response could save lives, offering a discreet way for individuals to alert authorities while still complying with the attacker’s demands in the moment. It provides an innovative safeguard for users under physical threat, allowing the system to reverse potentially dangerous transactions and notify the authorities.
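
The hold-and-reverse logic described above can be sketched as a small state check. This is an illustration under assumptions: the class name `DuressTransaction` and the status strings are hypothetical, and the 24-hour window is simply the example figure used in this post.

```python
from datetime import datetime, timedelta

HOLD_PERIOD = timedelta(hours=24)  # the example window mentioned above


class DuressTransaction:
    """A transfer that looks completed to the attacker but is held for reversal."""

    def __init__(self, amount, created_at):
        self.amount = amount
        self.created_at = created_at
        self.verified_by_user = False

    def status(self, now):
        if self.verified_by_user:
            return "settled"
        if now - self.created_at >= HOLD_PERIOD:
            return "rolled_back"
        return "held"   # shown as successful in the app, flagged in the backend


created = datetime(2025, 4, 5, 12, 0)
tx = DuressTransaction(10000, created)
print(tx.status(created + timedelta(hours=1)))
print(tx.status(created + timedelta(hours=25)))
```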

1. Vulnerability to Social Engineering and Physical Threats

Open banking technologies heavily rely on the secure exchange of sensitive financial data between banks, third-party providers (TPPs), and consumers. While encryption and authentication protocols help protect this data during transmission, attackers can still exploit human psychology. The scenario where a criminal forces a person to transfer funds at gunpoint is a physical manifestation of social engineering, where an attacker manipulates an individual into giving up access to their banking services.

Phishing and SIM-swapping: Cybercriminals may also use less dramatic methods to gain access to accounts, such as phishing attacks or SIM-swapping. These techniques can compromise account credentials, enabling criminals to access banking apps.

2. Mobile Security Risks

As mobile devices become central to open banking, they also become attractive targets for cybercriminals. Open banking apps installed on smartphones often store sensitive credentials, and attackers who gain access to the phone can potentially authorize transactions.

Mobile malware, unauthorized access, or even physical theft of the device can lead to significant financial losses if no additional security measures are in place.

3. Technological Solutions to Reduce Risks

From a technological standpoint, the first line of defense is to encrypt all data exchanges in transit (for example, TLS with a strong cipher such as AES-256). But beyond encryption, here are some essential steps to protect against cyber-crimes:

Multi-factor Authentication (MFA): Require MFA for every transaction or access request. This adds an extra layer of security beyond the password alone. For example, integrating biometric features such as fingerprint or facial recognition helps ensure that only authorized individuals can access their banking services.
Device Management: Financial institutions should implement device management systems that can identify and control the devices accessing their banking apps. This can help mitigate risks from stolen or compromised devices.
Transaction Limits and Alerts: For high-value transactions, implement daily or per-transaction limits. Additionally, send instant notifications for every transaction to the user's registered mobile number or email address.

Here’s a simple Python script that demonstrates the concept of alerting users when a high-value transaction attempt is made:

#python
HIGH_VALUE_THRESHOLD = 5000  # USD; transactions above this amount trigger an alert


class Transaction:
    def __init__(self, amount, transaction_type):
        self.amount = amount
        self.transaction_type = transaction_type

    def alert_user(self):
        """Print an alert for a high-value transaction, or a confirmation otherwise."""
        if self.amount > HIGH_VALUE_THRESHOLD:
            print(f"ALERT: A transaction of {self.amount} USD is being processed.")
            print("A notification will be sent to your registered contact.")
        else:
            print(f"Transaction of {self.amount} USD processed successfully.")


# Simulate a transaction
transaction1 = Transaction(10000, "Transfer")
transaction1.alert_user()

This simple script checks if a transaction exceeds a predefined threshold and alerts the user about the high-value transaction. By integrating such systems, open banking platforms can add an additional layer of vigilance to prevent unauthorized transfers.
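
The limits mentioned earlier (daily or per-transaction) could be enforced along similar lines. The sketch below is illustrative only: both limit values are invented, and the in-memory dictionary stands in for whatever ledger a real platform would use.

```python
from collections import defaultdict
from datetime import date

PER_TRANSACTION_LIMIT = 5000   # hypothetical limits, in USD
DAILY_LIMIT = 8000

_daily_totals = defaultdict(float)  # (user_id, day) -> amount already approved


def check_transaction(user_id: str, amount: float, day: date) -> str:
    """Approve or reject a transfer against per-transaction and daily limits."""
    if amount > PER_TRANSACTION_LIMIT:
        return "rejected: exceeds per-transaction limit"
    if _daily_totals[(user_id, day)] + amount > DAILY_LIMIT:
        return "rejected: exceeds daily limit"
    _daily_totals[(user_id, day)] += amount
    return "approved"


today = date(2025, 4, 5)
for amount in (4000, 4000, 100, 6000):
    print(amount, check_transaction("user-1", amount, today))
```

Rejected transfers do not count toward the daily total, so a blocked attempt never uses up the user's remaining allowance.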

4. Business Solutions to Combat Cyber-crime

From a business perspective, financial institutions should adopt a proactive stance toward cybersecurity. Here are several steps businesses can take:

Employee Training: Regularly train employees to recognize potential threats and understand the security procedures for handling sensitive financial data.
Collaborate with Law Enforcement: In the event of a physical or online attack, it's essential to collaborate with law enforcement agencies to ensure quick action can be taken to track and apprehend perpetrators.
Invest in Cybersecurity Infrastructure: Businesses should invest in robust cybersecurity systems, including firewalls, intrusion detection systems (IDS), and regular security audits.
Client Education: Educating clients about safe practices, such as using strong passwords, avoiding suspicious links, and keeping their devices secure, can reduce the likelihood of successful attacks.

5. The Future of Open Banking Security

As the world of open banking evolves, so too will the threats posed by cyber-crimes. Financial institutions and third-party providers must remain vigilant, continually assessing new risks and adapting their security protocols. Innovations such as blockchain, AI-based fraud detection systems, and quantum cryptography are expected to play key roles in securing open banking transactions.

However, while technology plays a crucial role, it's equally important for businesses to maintain a customer-centric approach that places cybersecurity at the forefront of their operations. Only through a combination of advanced technology, business strategy, and user education can open banking systems be protected from the increasing threat of cyber-crimes.

Conclusion

Open banking technologies have revolutionized the financial industry, but they also come with increased risks due to the fast-moving world of cyber-crime. While a scenario of physical coercion remains extreme, the rapid evolution of hacking techniques calls for stronger, more adaptive security measures. Both technological innovations and proactive business strategies are essential to safeguard the integrity of open banking systems. Implementing tools like the "Emergency Password" could be a game-changer, offering a safety net for users under duress.