How AI's Horseshoe Curve Causes Predictions to Fail Where We Least Expect - with Greg Paskal

Categories: Podcasts, The Value of Software Testing

June 4, 2026

The podcast examines the transformative impact of technology on testing practices since the 1990s, highlighting AI testing challenges, the hype-driven “gold rush” in AI, and the need for education, collaboration, and humility amid superficial adoption. It emphasizes security risks in AI systems, the critical role of QA validation, and the irreplaceable importance of human oversight and rigorous testing to counter AI limitations and ethical pitfalls.

The Value of Software Testing

Randy Rice has a video software testing podcast featuring solo shows and interviews. YouTube only.

Episode Details

Show Notes: https://www.youtube.com/watch?v=ybrxoNiAhJw
Published: 2026-06-04T14:24:30Z
Duration: 00:55:04
Author: Rice Consulting Services, Inc.

Overview

The podcast reflects on the evolution of technology since the 1990s, emphasizing how tools like the internet, fax machines, and early computing hardware have transformed testing and development practices. It introduces Greg Paskal, an expert in test automation, and explores challenges in AI testing, such as unpredictable failures and the distinction between deterministic and non-deterministic scenarios. The discussion describes the current AI landscape as a “gold rush,” marked by abundant jargon and a lack of true expertise, and urges humility and transparency in understanding AI's limitations. It critiques superficial adoption of AI terminology and emphasizes the need for education on concepts like agentic AI, while acknowledging widespread impostor syndrome among newcomers. It encourages collaborative, jargon-free approaches to problem-solving and simplifying complex ideas so that a broader audience can follow them.
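The deterministic/non-deterministic distinction can be made concrete with a test oracle: the wording of an AI reply may vary from run to run, but a fact it reports, such as a computed total, can still be checked against an ordinary deterministic calculation. The sketch below assumes the reply is plain text stating a single number; `reply_matches_oracle` and its regular expression are illustrative assumptions, not something described in the episode.

```python
import re
from decimal import Decimal

# Matches numbers such as 42, -3.5 or 1,250.50 (comma thousands separators).
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def reply_matches_oracle(reply: str, expected: Decimal) -> bool:
    """Accept any phrasing of a non-deterministic reply, as long as it states
    exactly one number and that number equals the deterministically computed value."""
    found = NUMBER.findall(reply)
    if len(found) != 1:
        # Zero or several numbers: too ambiguous to pass automatically.
        return False
    return Decimal(found[0].replace(",", "")) == expected


# Hypothetical expected value, computed by plain deterministic code.
expected_total = sum([Decimal("1000.25"), Decimal("250.25")])
print(reply_matches_oracle("The total is 1,250.50.", expected_total))                # True
print(reply_matches_oracle("Your balance comes to 1250.5 dollars", expected_total))  # True
print(reply_matches_oracle("Roughly 1,200 to 1,300", expected_total))                # False
```

A reply that mentions several numbers (for example, an item count as well as the total) fails this check and would go to a human reviewer, which errs on the side of caution.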

Security risks in AI systems are addressed, including vulnerabilities exposed by chatbots that unintentionally reveal backend data, the importance of restricting AI agents' access to critical systems, and the dangers of relying on AI-generated outputs without verification. Testing strategies for AI are explored, such as addressing the “horseshoe curve problem,” where AI focuses on data extremes and overlooks mid-range values, and the risks of type mismatches in languages like Python, which can silently produce errors in safety-critical contexts. The podcast stresses the role of QA professionals in validating AI outputs, ensuring numerical accuracy in critical systems, and prioritizing human oversight over automated processes. It warns against “vibe coding” and superficial automation, advocating for rigorous testing, modular design, and formal proofs to mitigate risks in AI-driven applications.
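One way to spot the horseshoe pattern described above is to check where a set of test inputs (for example, inputs an AI tool proposed) actually lands within a valid range. The sketch below splits the range into equal-width bands and reports the bands no input falls into; the function name, the default of four bands, and the 1 to 10,000 range are assumptions for illustration.

```python
def empty_bands(inputs: list[float], low: float, high: float, bands: int = 4) -> list[int]:
    """Split [low, high] into equal-width bands and return the indices of the
    bands that none of the in-range test inputs fall into."""
    if low >= high or bands < 1:
        raise ValueError("need low < high and at least one band")
    width = (high - low) / bands
    covered = set()
    for x in inputs:
        if low <= x <= high:  # out-of-range (invalid) inputs are ignored here
            # The top boundary itself is counted in the last band.
            covered.add(min(int((x - low) / width), bands - 1))
    return [i for i in range(bands) if i not in covered]


# Inputs clustered at the extremes: the horseshoe shape.
extremes_only = [0, 1, 2, 9_999, 10_000, 10_001]
print(empty_bands(extremes_only, 1, 10_000))  # [1, 2]: the middle two bands are untested
```

An empty middle band does not prove a defect; it marks where the test set offers no evidence either way.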

The discussion underscores the need for QA advocacy to protect product quality and brand integrity, even as AI reshapes testing and development workflows. It critiques the overestimation of AI capabilities and the risks of AI-generated content, citing examples like fabricated legal citations. The podcast concludes by emphasizing the irreplaceable value of human judgment in AI adoption, urging caution against replacing expertise with unverified tools and advocating for education, collaboration, and critical evaluation of AI's role in complex systems.

What If

What if you focused AI testing on mid-range data values instead of extremes
- Move: Implement a testing strategy that prioritizes equivalence partitioning, sampling midpoints and quarter points in ranges (e.g., 110,000), rather than relying on AI to handle edge cases.
- Why Now?: AI systems often overlook mid-range values, increasing risks in financial, medical, or safety-critical systems where defects here can cause real harm.
- Expected Upside: Catch hidden bugs early, align with traditional QA practices, and reduce reliance on AI for tasks it's inherently bad at (e.g., boundary value analysis).
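A minimal sketch of the sampling idea, assuming a single valid integer partition: it returns values just outside the range, the two boundaries, and the quarter points and midpoint inside it. The helper name and the example range of 1 to 10,000 are hypothetical.

```python
def sample_partition(low: int, high: int) -> dict[str, list[int]]:
    """Representative test inputs for one valid integer partition [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    span = high - low
    # Quarter points and midpoint: the mid-range values that are easy to skip.
    interior = sorted({low + span // 4, low + span // 2, low + (3 * span) // 4})
    return {
        "invalid": [low - 1, high + 1],  # just outside the partition
        "boundary": [low, high],         # classic boundary value analysis
        "interior": interior,
    }


print(sample_partition(1, 10_000))
# {'invalid': [0, 10001], 'boundary': [1, 10000], 'interior': [2500, 5000, 7500]}
```

Collecting the interior points in a set means very small ranges simply yield fewer interior values rather than duplicates.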
What if you restricted AI agents to non-critical systems and consulted architects upfront
- Move: Limit AI agent access to non-sensitive components (e.g., UI elements, non-financial data) and involve senior architects to define safe operational boundaries.
- Why Now?: Rapid AI adoption without safeguards is exposing systems to vulnerabilities (e.g., chatbots revealing backend details). Early architectural review mitigates this.
- Expected Upside: Minimize exposure to unknown risks, ensure compliance with security protocols, and avoid costly rework later in development.
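Restricting an agent can start with a deny-by-default allowlist that architects define up front: the agent may call only the tools named in the policy, and everything else is refused. The `ToolPolicy` class and the tool names below are hypothetical; in practice the same restriction would likely also need to be enforced by the systems being called, not only inside the agent's wrapper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolPolicy:
    """Deny-by-default list of tools an AI agent is permitted to call."""
    allowed: frozenset[str]

    def call(self, tools: dict[str, Callable[[str], str]], name: str, arg: str) -> str:
        if name not in self.allowed:
            # Anything not explicitly approved is refused, even if the tool exists.
            raise PermissionError(f"agent is not allowed to call {name!r}")
        return tools[name](arg)


# Hypothetical tools: one harmless, one touching financial data.
tools = {
    "lookup_faq": lambda q: f"FAQ answer for {q}",
    "query_payments_db": lambda q: "sensitive rows...",
}
policy = ToolPolicy(allowed=frozenset({"lookup_faq"}))

print(policy.call(tools, "lookup_faq", "opening hours"))  # FAQ answer for opening hours
try:
    policy.call(tools, "query_payments_db", "SELECT *")
except PermissionError as err:
    print(err)  # agent is not allowed to call 'query_payments_db'
```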
What if you adopted a “silent confidence framework” with strongly typed languages
- Move: Use strongly typed languages like Java or C++ for critical backend processing, avoiding Python's silent type mismatches (e.g., integers vs. floats).
- Why Now?: Python's “silent confidence” can mask critical defects (e.g., truncating decimal values), which is risky for safety-critical systems.
- Expected Upside: Reduce subtle, hard-to-detect errors, improve code reliability, and align with QA practices that demand explicit error handling.
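The truncation risk is easy to reproduce. Strictly speaking, Python is strongly but dynamically typed: type annotations are not enforced at runtime, and `int()` drops a fractional part without complaint. The sketch contrasts that with an explicit guard; the function names are hypothetical. Teams that stay on Python can also run a static checker such as mypy, which would flag the float passed to the `int`-annotated function below.

```python
from decimal import Decimal


def to_whole_units_naive(amount: int) -> int:
    # The annotation says int, but nothing checks it at runtime: a float gets
    # through, and int() truncates toward zero, so 19.99 silently becomes 19.
    return int(amount)


def to_whole_units_strict(amount: Decimal) -> int:
    # Reject the wrong type and refuse to discard a fractional part.
    if not isinstance(amount, Decimal):
        raise TypeError(f"expected Decimal, got {type(amount).__name__}")
    if amount != amount.to_integral_value():
        raise ValueError(f"{amount} has a fractional part; refusing to truncate")
    return int(amount)


print(to_whole_units_naive(19.99))           # 19, no error raised
print(to_whole_units_strict(Decimal("20")))  # 20
try:
    to_whole_units_strict(Decimal("19.99"))
except ValueError as err:
    print(err)  # 19.99 has a fractional part; refusing to truncate
```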

Takeaway

Implement structured testing for AI outputs, focusing on the cases AI tends to miss: Prioritize testing numeric data, critical systems (e.g., finance, safety), and mid-range and decimal values (e.g., 0.1, 100.01) to catch defects like AI “hallucinations” or misinterpreted data types. Avoid relying solely on AI-generated test cases for validation.
Use strongly typed languages for safety-critical code: When developing backend systems, explicitly enforce type constraints using languages like Java or C++ to prevent silent errors (e.g., truncating decimal values) that Python's duck typing might mask.
Validate AI-generated code manually and with unit tests: Avoid vibe coding by thoroughly reviewing AI-produced code for complexity, modularity, and correctness. Use unit tests to catch regressions or subtle bugs introduced by iterative AI prompt corrections.
Educate stakeholders on AI limitations and risks: Explain AI's unreliability (e.g., the analogy of a five-year-old with ADD) to non-technical decision-makers, emphasizing the need for human oversight and warning against adopting AI jargon without understanding it.
Engage with QA communities and avoid AI jargon: Join QA testing groups to collaborate on AI-specific challenges (e.g., equivalence partitioning, oracles). Refrain from using unclear terms like agentic AI unless you can simplify them for non-experts (e.g., AI communicating with smaller entities).
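Pinning reviewed behavior in unit tests makes regressions visible when an AI assistant regenerates code after a follow-up prompt. The sketch below uses the standard `unittest` module against a hypothetical AI-generated helper, `add_tax`, and exercises small and decimal values like those listed in the takeaways; the rounding rule (half up, to cents) is an assumption.

```python
import unittest
from decimal import Decimal, ROUND_HALF_UP


def add_tax(amount: Decimal, rate: Decimal) -> Decimal:
    """Hypothetical AI-generated helper under review: amount plus tax, rounded to cents."""
    return (amount * (1 + rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


class AddTaxTests(unittest.TestCase):
    def test_small_decimal(self):
        self.assertEqual(add_tax(Decimal("0.10"), Decimal("0.10")), Decimal("0.11"))

    def test_mid_range_decimal(self):
        self.assertEqual(add_tax(Decimal("100.01"), Decimal("0.10")), Decimal("110.01"))

    def test_rejects_float_input(self):
        # Python's decimal module refuses to mix float and Decimal arithmetic.
        with self.assertRaises(TypeError):
            add_tax(100.01, Decimal("0.10"))
```

Run it with `python -m unittest` pointed at the file; each time the AI rewrites `add_tax`, the same tests show whether the reviewed behavior has changed.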

For a PDF of longer Software Testing Podcast Episode Summaries with Briefing Notes and more detailed summary notes, visit EvilTester Patreon Podcast Summaries.