

Quality Assurance When Machines Write Code

Automated Testing in the Age of AI

When I wrote about automated testing in “Pro .NET Best Practices,” the challenge was convincing teams to write tests at all. Today, the landscape has shifted dramatically. AI coding assistants can generate tests faster than most developers can write them manually. But this raises a critical question: if AI writes our code and AI writes our tests, who’s actually ensuring quality?

This isn’t a theoretical concern. I’m working with teams right now who are struggling with this exact problem. They’ve adopted AI coding assistants, seen impressive productivity gains, and then discovered that their AI-generated tests pass perfectly while their production systems fail in unexpected ways.

The challenge isn’t whether to use AI for testing; that ship has sailed. The challenge is adapting our testing strategies to maintain quality assurance when both code and tests might come from machine learning models.

The New Testing Reality

Let’s be clear about what’s changed and what hasn’t. The fundamental purpose of automated testing remains the same: gain confidence that code works as intended, catch regressions early, and document expected behavior. What’s changed is the economics and psychology of test creation.

What AI Makes Easy

AI coding assistants excel at several testing tasks:

Boilerplate Test Generation: Creating basic unit tests for simple methods, constructors, and data validation logic. These tests are often tedious to write manually, and AI can generate them consistently and quickly.

Test Data Creation: Generating realistic test data, edge cases, and boundary conditions. AI can often identify scenarios that developers might overlook.

Test Coverage Completion: Analyzing code and identifying untested paths or branches. AI can suggest tests that bring coverage percentages up systematically.

Repetitive Test Patterns: Creating similar tests for related functionality, like testing multiple API endpoints with similar structure.

For these scenarios, AI assistance is genuinely ruthlessly helpful. It’s practical (works with existing test frameworks), generally accepted (becoming standard practice), valuable (saves significant time), and archetypal (provides clear patterns).

What AI Makes Dangerous

But there are critical areas where AI-assisted testing introduces new risks:

Assumption Misalignment: AI generates tests based on code structure, not business requirements. The tests might perfectly validate the code’s implementation while missing the fact that the implementation itself is wrong.

Test Quality Decay: When tests are easy to generate, teams stop thinking critically about test design. You end up with hundreds of tests that all validate the same happy path while missing critical failure modes.

False Confidence: High test coverage numbers from AI-generated tests can create an illusion of safety. Teams see 90% coverage and assume quality, when those tests might be superficial.

Maintenance Burden: AI can create tests faster than you can maintain them. Teams accumulate thousands of tests without considering the long-term maintenance cost.

This is where we need new strategies. The old testing approaches from my book still apply, but they need adaptation for AI-assisted development.

A Modern Testing Strategy: Layered Assurance

Here’s the framework I’m recommending to teams adopting AI coding assistants. It’s based on the principle that different types of tests serve different purposes, and AI is better at some than others.

Layer 1: AI-Generated Unit Tests (Speed and Coverage)

Let AI generate basic unit tests, but with constraints:

What to Generate:

  • Pure function tests (deterministic input/output)
  • Data validation and edge case tests
  • Constructor and property tests
  • Simple calculation and transformation logic

Quality Gates:

  • Each AI-generated test must have a clear assertion about expected behavior
  • Tests should validate one behavior per test method
  • Generated tests must include descriptive names that explain what’s being tested
  • Code review should focus on whether tests actually validate meaningful behavior

Implementation Example:

// AI excels at generating tests like this
[Theory]
[InlineData(0, 0)]
[InlineData(100, 100)]
[InlineData(-50, 50)]
public void Test_CalculateAbsoluteValue_ReturnsCorrectResult(int input, int expected)
{
    // Arrange + Act
    var result = MathUtilities.CalculateAbsoluteValue(input);

    // Assert
    Assert.Equal(expected, result);
}

The AI can generate these quickly and comprehensively. Your job is ensuring they test the right things.

Layer 2: Human-Designed Integration Tests (Confidence in Behavior)

This is where human judgment becomes critical. Integration tests verify that components work together correctly, and AI often struggles to understand these relationships.

What Humans Should Design:

  • Tests that verify business rules and workflows
  • Tests that validate interactions between components
  • Tests that ensure data flows correctly through the system
  • Tests that verify security and authorization boundaries

Why Humans, Not AI: AI generates tests based on code structure. Humans design tests based on business requirements and on failure modes they’ve experienced. Integration tests require understanding what the system should do, not just what it currently does.

Implementation Approach:

  1. Write integration test outlines describing the scenario and expected outcome
  2. Use AI to help fill in test setup and data creation
  3. Keep assertion logic explicit and human-reviewed
  4. Document the business rule or requirement each test validates
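
To make the approach concrete, here’s a minimal sketch of a human-designed integration test. OrderService, InMemoryOrderRepository, Order, and the credit-limit rule are hypothetical stand-ins, not from any real codebase; the setup and data could be AI-assisted, but the assertions encode a business rule a person chose to verify.

// Hypothetical example: OrderService, InMemoryOrderRepository, and Order
// are illustrative stand-ins for your own components.
public class OrderWorkflowTests
{
    [Fact]
    public void Test_SubmitOrder_OverCreditLimit_IsRejected()
    {
        // Arrange (AI can help generate this setup and test data)
        var repository = new InMemoryOrderRepository();
        var service = new OrderService(repository, creditLimit: 500m);
        var order = new Order(customerId: 42, total: 750m);

        // Act
        var result = service.Submit(order);

        // Assert (human-designed: verifies the business rule, not the implementation)
        Assert.False(result.Accepted);
        Assert.Empty(repository.SavedOrders);
    }
}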

Layer 3: Property-Based and Exploratory Testing (Finding the Unexpected)

This layer compensates for both human and AI blind spots.

Property-Based Testing: Instead of testing specific inputs, test properties that should always be true. AI can help generate the properties, but humans must define what properties matter. For more info, see: Property-based testing in C#

Example:

// Property: serializing then deserializing should return an equivalent object
[Fact]
public void Test_SerializationRoundTrip_PreservesData()
{
    // Arrange
    var user = TestHelper.GenerateTestUser();
    var serialized = JsonSerializer.Serialize(user);

    // Act
    var deserialized = JsonSerializer.Deserialize<User>(serialized);

    // Assert (assumes User implements value equality)
    Assert.Equal(user, deserialized);
}
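
The example above still exercises a single generated object. A property-based framework such as FsCheck generalizes it by generating many inputs and shrinking any counterexample it finds; here’s a minimal sketch, assuming the FsCheck.Xunit package:

using System.Text.Json;
using FsCheck.Xunit;

public class RoundTripProperties
{
    // Property: any integer survives a JSON serialize/deserialize round trip.
    // FsCheck supplies the inputs; the test only states the invariant.
    [Property]
    public bool Test_JsonRoundTrip_PreservesIntegers(int value)
    {
        var json = JsonSerializer.Serialize(value);
        return JsonSerializer.Deserialize<int>(json) == value;
    }
}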

Exploratory Testing: Use AI to generate random test scenarios and edge cases that humans might not consider. Tools like fuzzing can be enhanced with AI to generate more realistic test inputs.

Layer 4: Production Monitoring and Observability (Reality Check)

The ultimate test of quality is production behavior. Modern testing strategies must include:

Synthetic Monitoring: Automated tests running against production systems to validate real-world behavior

Canary Deployments: Gradual rollout with automated rollback on quality metrics degradation

Feature Flags with Metrics: A/B testing new functionality with automated quality gates

Error Budget Tracking: Quantifying acceptable failure rates and automatically alerting when exceeded

This layer catches what all other layers miss. It’s particularly critical when AI is generating code, because AI might create perfectly valid code that behaves unexpectedly under production load or data.
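
As a starting point, synthetic monitoring can be as simple as a scheduled check against a critical endpoint. The sketch below assumes a hypothetical health URL and latency budget; in practice it would run on a schedule and feed your alerting pipeline.

using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

public static class SyntheticCheck
{
    // Hypothetical endpoint and threshold; substitute your own critical workflow.
    private const string CheckoutHealthUrl = "https://example.com/health/checkout";
    private const long LatencyBudgetMs = 2000;

    public static async Task<bool> CheckoutPathIsHealthyAsync(HttpClient client)
    {
        var stopwatch = Stopwatch.StartNew();
        var response = await client.GetAsync(CheckoutHealthUrl);
        stopwatch.Stop();

        // Healthy only if the endpoint succeeds within the latency budget.
        return response.IsSuccessStatusCode
            && stopwatch.ElapsedMilliseconds < LatencyBudgetMs;
    }
}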

Practical Implementation: What to Do Monday Morning

Here’s how to adapt your testing practices for AI-assisted development, starting immediately.

Step 1: Audit Your Current Tests

Before generating more tests, understand what you have:

Coverage Analysis:

  • What percentage of your code has tests?
  • More importantly: which critical paths and boundaries lack tests?
  • Which tests actually caught bugs in the last six months?

Test Quality Assessment:

  • How many tests validate business logic vs. implementation details?
  • Which tests would break if you refactored code without changing behavior?
  • How long do your tests take to run, and is that getting worse?

Step 2: Define Test Generation Policies

Create clear guidelines for AI-assisted test creation:

When to Use AI:

  • Generating basic unit tests for new code
  • Creating test data and fixtures
  • Filling coverage gaps in stable code
  • Adding edge case tests to existing test suites

When to Write Manually:

  • Integration tests for critical business workflows
  • Security and authorization tests
  • Performance and scalability tests
  • Tests for known production failure modes

Quality Standards:

  • All AI-generated tests must be reviewed like production code
  • Tests must include names or comments explaining what behavior they validate
  • Test coverage metrics must be balanced with test quality metrics

Step 3: Implement Layered Testing

Don’t try to implement all layers at once. Start where you’ll get the most value:

Week 1-2: Implement Layer 1 (AI-generated unit tests)

  • Choose one module or service as a pilot
  • Generate comprehensive unit tests using AI
  • Review and refine to ensure quality
  • Measure time savings and coverage improvements

Week 3-4: Strengthen Layer 2 (Human-designed integration tests)

  • Identify critical user workflows that lack integration tests
  • Write test outlines describing expected behavior
  • Use AI to help with test setup, but keep assertions human-designed
  • Document business rules and logic each test validates

Week 5-6: Add Layer 4 (Production monitoring)

  • Implement basic synthetic monitoring for critical paths
  • Set up error tracking and alerting
  • Create dashboards showing production quality metrics
  • Establish error budgets for key services

Later: Add Layer 3 (Property-based testing)

  • This is most valuable for mature codebases
  • Start with core domain logic and data transformations
  • Use property-based testing for scenarios with many possible inputs

Step 4: Measure and Adjust

Track both leading and lagging indicators of test effectiveness:

Leading Indicators:

  • Test creation time (should decrease with AI)
  • Test coverage percentage (should increase)
  • Time spent reviewing AI-generated tests
  • Number of tests created per developer per week

Lagging Indicators:

  • Defects caught in testing vs. production
  • Production incident frequency and severity
  • Time to identify root cause of failures (should decrease with AI-generated tests)
  • Developer confidence in making changes

The goal isn’t maximum test coverage; it’s maximum confidence in quality at minimum cost.

Common Obstacles and Solutions

Obstacle 1: “AI-Generated Tests All Look the Same”

This is actually a feature, not a bug. Consistent test structure makes tests easier to maintain. The problem is when all tests validate the same thing.

Solution: Focus review effort on test assertions. Do the tests validate different behaviors, or just different inputs to the same behavior? Use code review to catch redundant tests before they accumulate.

Obstacle 2: “Our Test Suite Is Too Slow”

AI makes it easy to generate tests, which can lead to rapid growth in test count and execution time.

Solution: Implement test categorization and selective execution. Use tags to distinguish:

  • Fast unit tests (run on every commit)
  • Slower integration tests (run on pull requests)
  • Full end-to-end tests (run nightly or on release)

Don’t let AI generate slow tests. If a test needs database access or external services, it should be human-designed and tagged appropriately.
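
In xUnit, one common way to wire up this categorization is with traits and test filters; the category names below are just a convention, not anything built in:

public class CheckoutTests
{
    [Fact]
    [Trait("Category", "Unit")]          // fast: run on every commit
    public void Test_CartTotal_SumsLineItems() { /* ... */ }

    [Fact]
    [Trait("Category", "Integration")]   // slower: run on pull requests
    public void Test_Checkout_PersistsOrder() { /* ... */ }
}

// Selective execution from the command line:
//   dotnet test --filter "Category=Unit"
//   dotnet test --filter "Category=Integration"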

Obstacle 3: “Tests Pass But Production Fails”

This is the fundamental risk of AI-assisted development. Tests validate what the code does, not what it should do.

Solution: Implement Layer 4 (production monitoring) as early as possible. No amount of testing replaces real-world validation. Use production metrics to identify gaps in test coverage and generate new test scenarios.

Obstacle 4: “Developers Don’t Review AI Tests Carefully”

When tests are auto-generated, they feel less important than production code. Reviews become rubber stamps.

Solution: Make test quality a team value. Track metrics like:

  • Percentage of AI-generated tests that get modified during review
  • Bugs found in production that existing tests should have caught
  • Test maintenance cost (time spent fixing broken tests)

Publicly recognize good test reviews and test design. Make it clear that test quality matters as much as code quality.

Quantifying the Benefits

Organizations implementing modern testing strategies with AI assistance report impressive benefits, but take those numbers with a grain of salt: sources are biased, maturity levels vary, and not all “test coverage” is equally valuable.

Calculate your team’s current testing economics:

  • Hours per week spent writing basic unit tests
  • Percentage of code with meaningful test coverage
  • Bugs caught in testing vs. production
  • Time spent debugging production issues

Then try to quantify the impact of:

  • AI generating routine unit tests (did you save 40% of test writing time?)
  • Investing saved time in better integration and property-based tests
  • Earlier defect detection (remember: production bugs cost 10-100x more to fix)
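
As a purely hypothetical illustration: a team of five developers each spending six hours a week on routine unit tests invests about 30 hours weekly; a 40% reduction frees roughly 12 hours, and redirecting even half of that into integration and property-based tests is a meaningful shift in where quality effort goes. Your numbers will differ; the point is to run the calculation with your own data.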

Next Steps

For Individual Developers

This Week:

  • Try using AI to generate unit tests for your next feature
  • Review the generated tests critically; do they test behavior or just implementation?
  • Write one integration test manually for a critical workflow

This Month:

  • Establish personal standards for AI test generation
  • Track time saved vs. time spent reviewing
  • Identify one area where AI testing doesn’t work well for you

For Teams

This Week:

  • Discuss team standards for AI-assisted test creation
  • Identify one critical workflow that needs better integration testing
  • Review recent production incidents; would better tests have caught them?

This Month:

  • Implement one layer of the testing strategy
  • Establish test quality metrics beyond just coverage percentage
  • Create guidelines for when to use AI vs. manual test creation

For Organizations

This Quarter:

  • Assess current testing practices across teams
  • Identify teams with effective AI-assisted testing approaches
  • Create shared guidelines and best practices
  • Invest in testing infrastructure (fast test execution, better tooling)

This Year:

  • Implement comprehensive production monitoring
  • Measure testing ROI (cost of testing vs. cost of production defects)
  • Build testing capability through training and tool investment
  • Create culture where test quality is valued as much as code quality

Commentary

When I wrote about automated testing in 2011, the biggest challenge was convincing developers to write tests at all. The objections were always about time: “We don’t have time to write tests, we need to ship features.” I spent considerable effort building the business case for testing, showing how tests save time by catching bugs early.

Today’s challenge is almost the inverse. AI makes test creation so easy that teams can generate thousands of tests without thinking carefully about what they’re testing. The bottleneck has shifted from test creation to test design and maintenance.

This is actually a much better problem to have. Instead of debating whether to test, we’re debating how to test effectively. The ruthlessly helpful framework applies perfectly: automated testing is clearly valuable, widely accepted, and provides clear examples. The question is how to be practical about it.

My recommendation is to embrace AI for what it does well (generating routine, repetitive tests) while keeping humans focused on what we do well:

  • understanding business requirements,
  • anticipating failure modes, and
  • designing tests that verify real-world behavior.

The teams that thrive won’t be those that generate the most tests or achieve the highest coverage percentages. They’ll be the teams that achieve the highest confidence with the most maintainable test suites. That requires strategic thinking about testing, not just tactical application of AI tools.

One prediction I’m comfortable making: in five years, we’ll look back at current test coverage metrics with the same skepticism we now have for lines-of-code metrics. The question won’t be “how many tests do you have?” but “how confident are you that your system works correctly?” AI-assisted testing can help us answer that question, but only if we’re thoughtful about implementation.

The future of testing isn’t AI vs. humans. It’s AI and humans working together, each doing what they do best, to build more reliable software faster.

Fakes, Stubs and Mocks

I’m frequently asked about the difference between automated testing terms like fakes, stubs and mocks.

The term fake is a general term for an object that stands in for another object; both stubs and mocks are types of fakes. The purpose of a fake is to create an object that allows the method-under-test to be tested in isolation from its dependencies, meeting one of two objectives:

1. Stub — Prevent the dependency from obstructing the code-under-test and to respond in a way that helps it proceed through its logical steps.

2. Mock — Allow the test code to verify that the code-under-test’s interaction with the dependency is proper, valid, and expected.

Since a fake is any object that stands in for the dependency, it is how the fake is used that determines whether it is a stub or a mock. Mocks are used only for interaction testing. If the test’s expectation is not about verifying interaction with the fake, then the fake is a stub.
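
A small sketch using the Moq library shows the distinction in practice: the same fake type acts as a stub when we only configure a canned response, and as a mock when the test’s assertion is a Verify call on the interaction. NotificationService is a hypothetical class used for illustration.

using Moq;
using Xunit;

public interface IEmailSender
{
    bool Send(string address, string message);
}

public class NotificationTests
{
    [Fact]
    public void Stub_SuppliesCannedResponse_SoCodeUnderTestCanProceed()
    {
        // Stub: keeps the dependency from obstructing the code-under-test.
        var sender = new Mock<IEmailSender>();
        sender.Setup(s => s.Send(It.IsAny<string>(), It.IsAny<string>()))
              .Returns(true);

        var service = new NotificationService(sender.Object);
        var result = service.NotifyUser("user@example.com");

        // The assertion is about the code-under-test, not the fake.
        Assert.True(result);
    }

    [Fact]
    public void Mock_VerifiesInteraction_WithTheDependency()
    {
        // Mock: the test's expectation is the interaction itself.
        var sender = new Mock<IEmailSender>();

        var service = new NotificationService(sender.Object);
        service.NotifyUser("user@example.com");

        sender.Verify(s => s.Send("user@example.com", It.IsAny<string>()), Times.Once);
    }
}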

BTW, these terms are explained very well in The Art of Unit Testing by Roy Osherove.

There’s more to learn on this topic on Stack Overflow: https://stackoverflow.com/questions/24413184/difference-between-mock-stub-spy-in-spock-test-framework