Is My Software Quality Bad – in Three Metrics

Three simple metrics that software engineering leaders can use to measure software quality. 

How do your customers see your software?

If you are wondering whether your team’s software quality could be better, you may already have some data.

Are you seeing any of these concerning signs?

  • QA is always falling behind or being a bottleneck.
  • New projects are started without any test strategy or plan.
  • There is no automation or mainly UI automation.
  • You have stabilization sprints before releases.
  • You are losing engineers.
  • You do not measure quality and do not have a good quality dashboard.

These are all quality-related frustrations, and they should be fixed, but they are not enough on their own to say the software has quality problems. Quality is defined by your customer experience.

You can measure your customer satisfaction or net promoter score, but a poor score may just be an indication that you don’t have the right features, or that your support and customer success departments are not doing their jobs. What you want to know is “is my engineering team producing high-quality software?”

Three metrics to measure your software quality

To check your quality quickly, try the following three metrics:

The first metric is the amount of time your team spends fixing customer-found defects

Take a look at the figure Engineering Time Spent by Area below, which describes what teams can spend their time on. The left-right axis describes work that is past-facing or future-facing. The top-down axis describes work that is done for external customers versus internal technology to support those customers. You want your teams to spend time building new features. However, that can’t be all the team does. They must also spend time updating the architecture to support future features. Your teams need to spend time paying down technical debt to prevent it from building up and slowing down development. Technical debt includes software upgrades, replacing out-of-date components, refactoring to improve code that was not ideal when it was first written, and so on.

Figure: Engineering Time Spent by Area with Typical Values – engineering time broken down by past/future and externally facing/internally facing axes.

The thing you least want engineers doing is spending time correcting customer-found defects. For this metric, don’t worry about the time it takes to fix internally found bugs; that’s just part of your software development process. The concern is the time it takes to fix customer-found bugs, including bugs that block new deployments. Time spent fixing customer bugs slows down new features and makes customers unhappy.

So how do you measure the time spent on defect correction?

If you have a dedicated team that is fixing customer bugs, then just divide the size of that team by the whole engineering team size (Dev and QA, but not managers) and you’ll get the percentage of effort spent fixing bugs. By the way, having a dedicated bug-fix team is a bad sign for your quality. In general, it is not a good idea to have such a team except as a short-term workaround to get through an unexpected bug backlog – say six months. If you have to keep the team longer or repeatedly build such a team, there’s a quality problem.

If you are using planning poker for estimation, you can compare the number of story points for bugs vs. other work. Story points are not comparable across teams, but you can look at the number of points each team spends on fixing customer bugs vs. their other work.

If you don’t have story points, you can use the number of bugs and the number of stories. For example, say your average story takes 3 days and your average bug takes 1.5 days – include both development and QA time. Then take the number of customer bugs / (bugs + (stories * 3/1.5)) and you have your percentage.
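As a sketch, using the 3-day story and 1.5-day bug averages from the example (both numbers are assumptions you should replace with your own measurements):

```python
def bug_time_percentage(customer_bugs, stories,
                        avg_story_days=3.0, avg_bug_days=1.5):
    """Estimate the share of engineering effort spent on customer bugs.

    Stories are weighted by how much longer they take than bugs, so the
    result approximates a time-based percentage, not just a count ratio.
    """
    weighted_stories = stories * (avg_story_days / avg_bug_days)
    return customer_bugs / (customer_bugs + weighted_stories)

# Example: 20 customer bugs and 100 stories over the same period
print(f"{bug_time_percentage(20, 100):.0%}")  # → 9%
```

At 9%, this hypothetical team would land in the acceptable range.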

So, what does good quality look like?

Based on the over 100 companies I’ve looked at, 5% or less of engineering time spent fixing customer bugs is ideal. 5% to 10% is OK. 10% to 15% isn’t the end of the world, but it’s begun to smell bad. Spending more than 15% of the team’s time fixing customer bugs indicates a significant quality problem that is wasting development time you could be spending on new features.

The second metric is the Change Failure Rate, also called the Hotfix Rate

What percentage of your releases had a problem you needed to fix as soon as possible (without waiting for the next release)?

Count your last 6 to 10 releases (or 1 to 2 years’ worth if releases are very infrequent). The number of releases with issues divided by the total number of releases is your hotfix rate. If you are having trouble matching hotfixes to releases, you can approximate the number by adding up all the hotfixes over 6 months or a year and dividing by the number of releases in the same period.
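As a quick sketch with made-up release data:

```python
# Hypothetical release history for the last six months
releases = [
    {"version": "2.1", "hotfixes": 0},
    {"version": "2.2", "hotfixes": 1},
    {"version": "2.3", "hotfixes": 0},
    {"version": "2.4", "hotfixes": 0},
    {"version": "2.5", "hotfixes": 2},
    {"version": "2.6", "hotfixes": 0},
    {"version": "2.7", "hotfixes": 0},
    {"version": "2.8", "hotfixes": 0},
]

# A release counts as failed if it needed at least one hotfix
failed = sum(1 for r in releases if r["hotfixes"] > 0)
print(f"Change failure rate: {failed / len(releases):.0%}")  # → 25%
```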

The Change Failure Rate is one of the Google DevOps Research and Assessment (DORA) metrics. Google measures these every year across many companies. In 2022, companies with a failure rate of:

  • 46% to 60% or higher were rated Low Quality
  • 16% to 30% were rated Medium Quality
  • Up to 15% were rated High Quality

The change failure rate is not a perfect metric. It is highly correlated with your release frequency. If you release more often, you’ll probably have more releases that don’t have failures. However, it’s still a good way to see if you have a problem from a customer’s point of view.

The third metric is the bug production rate or incoming bug rate

You can think of software engineering teams as producing features, but engineering teams also produce bugs. Engineering teams tend to produce bugs at a relatively steady rate. We can measure that rate.

Measure the customer-found bugs over some time and divide them by the size of the engineering team. Here’s the formula:

Bug Production Rate = (number of customer bugs) / (groups of 8 engineers) / (the number of 2-week sprints in a time period)

You can get the number of customer-found defects by looking at ones escalated by customer support. If you have a service, you may also want to add in the operational failures that cause service issues like downtime.

I normalize by groups of 8 engineers because I want to emphasize that we are not measuring the bug production of an individual developer. We’re not trying to find blame. We want to measure the quality of the software. The number of engineers includes developers and QA, but not managers or people who don’t code (Product Owners, non-coding leads, non-coding Scrum Masters, etc.).
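Putting the formula and the normalization together (the team size and bug count here are made up):

```python
def bug_production_rate(customer_bugs, engineers, weeks):
    """Customer-found bugs per group of 8 engineers per 2-week sprint.

    `engineers` counts developers and QA only, not managers or
    non-coding roles, per the definition above.
    """
    groups_of_8 = engineers / 8
    sprints = weeks / 2
    return customer_bugs / groups_of_8 / sprints

# Example: 18 customer bugs from a 24-engineer team over a 13-week quarter
print(f"{bug_production_rate(18, 24, 13):.1f} bugs / team of 8 / sprint")  # → 0.9
```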

This is a slowly changing metric, so measure it across a minimum of 3 months. You want several releases to be covered. It’s not a very good metric to look at every sprint to see if you are doing better, as it will fluctuate based on release schedules and when customers use the features.

Based on the 100+ companies I’ve looked at, companies with 2 or fewer bugs/team of 8/sprint have good enough quality, and it’s not worth trying to improve it. Companies with 4+ bugs/team/sprint have a quality problem that needs fixing. Between 2 and 4 is the “it’s starting to smell; you don’t have to fix it right now, but a little quality love wouldn’t hurt” zone.

So, now what?

Let’s say you’ve confirmed your suspicion, and your software quality could be improved. Can you use these metrics to help you make improvements? Unfortunately, no. These metrics are great for showing if you have a quality problem. They are not diagnostic enough to tell you what the problem is, or how to fix it. Also, because each metric is slow to change, they should be used only every 3 or 4 releases to track improvement, but not more frequently. In the short-term there is a lot of noise in the data that will obscure the trend.

If you have a quality problem, you’re going to need to diagnose the root causes of the problem to come up with a transformation plan to fix the causes. Root causes vary with every company! I’ve seen companies where they had a disconnect between requirements and engineering, others that had integration problems between teams, others that lacked deployment automation leading to release issues, and so on. This is where I can help you!

My Experience

If you need help improving your software quality, I have been transforming the software quality of companies as a consultant for over 8 years. I have

  • over 30 years of experience leading teams to improve quality, service resilience, and availability.
  • acted as an interim leader guiding quality engineering for several companies.
  • led multiple Quality, DevOps, and Scaled Agile transformations across various industries.

I can help you find the root causes of your quality problems, create a plan to fix them, and coach you through the transformation.

Copyright © 2023, All Rights Reserved by Bill Hodghead, shared under creative commons license 4.0


How to Improve Your QA Productivity

Help! My QA Team Can’t Keep Up with Development!

We hear a lot of variations on this cry for help:

  • “QA is always falling behind.”
  • “QA doesn’t have enough time for the full test passes.”
  • “There isn’t enough time for automation or performance work.”

Does any of this sound familiar?

There are many ways to improve your QA productivity. Let’s look at some of the best practices and anti-patterns.

It’s beyond the scope of this post to show every bad practice, but we’ll try to hit the most common ones we see and describe what to do about them.

5 ways to improve your QA Productivity

Here are our top five methods to improve QA productivity. For each one, we talk about when to use it, best practices, and anti-patterns.

  1. Test Automation
  2. Architecture Changes to Make Testing Easier
  3. Write Better Manual Tests
  4. Put Developers and QA Close Together
  5. Make Quality Everyone’s Job

1. Test Automation

Automation is usually the first thing people think of to improve QA productivity. Yes, it can help, when done right, but it may not be the most important thing you can do.

The decision to automate is a simple return on investment (ROI) decision. Consider:

  • How much time do I spend manually doing this test?
  • How much time will it take to automate?
  • How much time will it take to maintain, run, and debug the automation once it’s built?

The time to create and maintain the automation had better be less than the time you spend doing manual testing, or you haven’t increased productivity. When adding automation to a regular build pipeline, you may plan on running the test 10x to 1000x more often, so the automation makes much more sense.

The following are good rules to live by when considering automation:

  • Automate the test if you are going to run it at least 6 more times. Your cutoff may vary, but this rule has worked for us.
  • Automate tests where you would fix a bug the automation found. Start automating the tests that would find your worst bugs with the least automation time and work downward. At some point, you get to diminishing returns.
  • Don’t automate a UI-only feature like a picture or the color of the dialog. When developers change a UI, these changes should be manually tested. A human will find usability issues much better than a machine. In one product measured over a year, over 50% of the UI bugs would not have been found by UI automation. Only a human would have caught them, and the same was not true in reverse.
  • Your first automated tests will be slow – that’s OK. Your first few tests may take a lot more time to automate, because you need to build up common test libraries. The more tests you automate, the more shared code you have, and automation times should drop. However, tests that are going to take several days to automate should drop to the bottom of your list.

Best Practices

  • Test one thing. Most of your automated tests should check one important behavior. You will have a few end-to-end tests that check a lot of things exercising a key business scenario, but most automated tests will be much simpler. A simple test is easier to maintain and debug.
  • Log test failures and automation failures differently. A test failing because the product is broken is not the same as one that fails because it couldn’t run properly. Track different failures for these cases. You’ll save debugging and reporting time later.
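One way to keep the two failure kinds separate is to bucket results at the harness level. This is a minimal sketch with made-up test functions; the exception names are ours, not from any particular framework:

```python
class AutomationError(Exception):
    """The harness itself failed (environment, setup, flake), not the product."""

def run_test(test_fn):
    """Run one test and bucket its outcome for separate tracking."""
    try:
        test_fn()
        return "pass"
    except AutomationError:
        return "automation-failure"  # fix the harness, don't file a product bug
    except AssertionError:
        return "product-failure"     # the product is broken: file a bug

# Hypothetical tests illustrating each failure kind
def broken_env():
    raise AutomationError("service unreachable")

def broken_product():
    assert 1 + 1 == 3, "wrong total"

print(run_test(broken_env))      # → automation-failure
print(run_test(broken_product))  # → product-failure
```

Reporting the two buckets separately tells you whether to spend the next sprint on product bugs or on harness stability.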


Anti-patterns

  • Excessive UI automation. UI automation ROI tends to be great for the first 10 tests. It’s OK up to 100 tests and degrades severely as you head toward 1000 tests. This is the nature of the beast. See my blog: Less UI Automation, More Quality

2.  Architecture Changes to Make Testing Easier

Want the biggest ROI for quality? Write easily maintainable code.

The companies with the lowest incoming bug rates are the ones that have the best architectures – irrespective of any other development practice.

They have a core set of code that rarely changes but is configurable and extensible. Components are modular and easily isolated. When your components can be easily tested in isolation or mocked, you can have lots of simple tests that don’t depend on each other or a lot of underlying functionality to work.

As a developer, if you want to help your test team, make your code more maintainable.

Best Practices

  • Use well-defined interfaces using a machine-readable specification like Open API. Have a limited number of endpoints for any component and specify them fully. A clear contract describes exactly what to test.
  • Separate your presentation from your business functionality using a pattern like MVVM or CQRS. That way you can run tests against your business logic without the UI and vice versa.
  • Measure cyclomatic complexity and the number of dependencies for your functions. Complexity describes the number of unit tests you are going to need. Dependencies describe the number of stubs or mocks that you’ll need to write those unit tests. Make your life easy and keep these small. We like complexity to be less than 10 and dependencies to be less than 7 for any function. If you are using Visual Studio, check out Tools/Analyze; if not, use a tool like SonarQube.


Anti-patterns

  • The dreaded monolithic architecture. If you change one thing in your code, how many tests do you have to run? In a simple modular architecture, the answer is one test. If you have a large inter-connected code base, it’s “all of them, every time”. That’s what’s slowing down your test team.
  • Large stored procedures (SPs). Back in the ’90s, I worked on the Microsoft SQL team. Like other database vendors, we told developers to put business logic in stored procedures to improve performance. I’m sorry! Yes, if you need to use a stored procedure for performance, do it, but look everywhere else first. If you must do it, keep them small and use ANSI SQL. SPs are hard to test, maintain, upgrade, or port to another DB product. It can be done, but it’s not easy.

3. Write Better Manual Tests

What does the classic test look like in a tool like Zephyr or TFS? It’s a set of steps and checks. “Press this button, look for that window.”

These kinds of tests take a long time to write. They can also have a lot of duplication of steps, and if the navigation changes, you have lots of tests to change.

Want to go faster? Write tests with less text that still help you understand what is being tested.

All those steps are not that important if you are optimizing for the obvious – the stuff that a new user would figure out quickly. Instead, you want to describe the important things like:

  • Prerequisites of the test. What needs to be set up in advance? What are the setup steps at the beginning of the test trying to accomplish?
  • What is the test trying to do? Instead of steps, describe the intent. What are you trying to accomplish?
  • What is the test measuring? What is being measured in the software? Ex: “check that the new config entry is created”. If it’s obvious how to do this, don’t bother spelling it out. When we automate, we could implement this in the UI, the API, or as a call to the DB.
  • What is the customer impact if it fails? Imagine you see a test report. If it says 90% pass, what does that mean? Not very useful. What if it says: “major breaking issues: users can’t change dates for their calendar events”? That’s useful.
  • The priority. The test should have a priority based on how bad the result could be and how likely it is to happen.
  • When you automate, make sure it logs that priority and the scenario that is impacted so you know how to report the severity of the failure.

Best Practices

  • Use “functions”. Just like with good code, you want to avoid repeating steps in your tests. It’s helpful to identify repeated or complex setup and verification operations like “create event” or “create user”. Write the steps for these in a separate document and link to them in your test.
  • Use data-driven tests for complex logic so that one test can do the work of many (e.g., a test to add two numbers may have one function to perform the test action and sets of data covering all the equivalence classes).
  • Use a chain of responsibility pattern as an oracle when there are many possible outputs from your inputs. This pattern is easy to use, code, and maintain. A chain of responsibility pattern is just a series of IF/THEN statements, where each statement calls out to a result and stops the series. The first statements should be the worst cases, like “if (A or B) then error #1”. Statements get progressively less negative till you have “else success”. This way you guarantee that you cover the negative cases and new cases can be added easily without affecting existing ones.
  • Use pairwise testing to reduce the number of tests when two or more inputs depend on each other. Also use them to reduce the number of tests when you have to run the test under multiple configurations like “Spanish, android, large data set”. By combining tests that test pairs of inputs or configurations, you can reduce a test matrix of thousands of tests into tens. See my blog on pairwise testing for more.
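A minimal sketch of a chain-of-responsibility oracle, paired with a data-driven loop. The discount feature and its rules here are hypothetical, as is the `system_under_test` call:

```python
def discount_oracle(age, is_member):
    """Expected-result oracle as a chain of responsibility: worst
    cases first, and each matching clause returns, stopping the chain."""
    if age < 0 or age > 150:
        return "error: invalid age"   # worst case first
    if age >= 65 and is_member:
        return "30% discount"
    if age >= 65 or is_member:
        return "15% discount"
    return "no discount"              # else success

# Data-driven: one loop covers the equivalence classes
cases = [(-1, False), (70, True), (70, False), (30, True), (30, False)]
for age, member in cases:
    expected = discount_oracle(age, member)
    # assert system_under_test(age, member) == expected  # hypothetical SUT
    print(f"age={age}, member={member} -> {expected}")
```

New cases slot in as new clauses without touching the existing ones, which is the maintainability win the pattern buys you.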


As we see from the Best Practices, the anti-patterns are:

  • Repeated steps – writing the same thing over and over
  • Duplicate tests – testing the same functionality in several tests
  • A very complex test oracle – the point of an oracle is to be simpler than the code.

4. Put Development and QA Close Together

Development and QA engineers should share the same sprint processes, share the same code branch, sit in the same office, and ideally sit next to each other.

Best Practices

  • Test code is in the same repo. Test code is in the same place as development code.
  • Test code is real code. Test coding practices are the same as development.
  • Testers aren’t separated. QA and development sit near each other or are in constant communication.
  • Work is completed together. QA and Dev work on the same stories in the same sprint and complete that work together.
  • Keep sprints short. 2 weeks or fewer. Longer sprints tempt dev and QA to get out of sync.
  • Provide API stubs before coding. If QA has an API definition, they can be coding automation in parallel with the developer writing the API.

Separating the QA work from the dev work lowers development productivity and hurts quality. It may seem like dev can do more if they don’t wait for QA, but it’s not the case if QA is involved early with defining the requirements.

Here’s a typical sprint schedule:

  • Pre-sprint: QA and dev work with the Product Owner or Business Analyst to define acceptance criteria for stories. You should be having story refinement meetings at least weekly. These break down larger stories and add acceptance criteria. Sometimes stories in-flight need to be split further so that one piece can be shipped while another part requires additional work.
  • Sprint week 1: QA writes tests based on acceptance criteria, runs a part of the ongoing regression and performance test matrix that is not affected by the work in the sprint, writes library automation functions, and does exploratory testing. Dev writes code and shows pieces to QA as they go. Dev writes unit tests as code is written.
  • Sprint week 2: QA does final tests on new code and automates tests that make sense to repeat. Dev completes code, fixes issues, adds integration tests, and works on ongoing refactoring work. Dev and QA can add logging/monitoring calls to the code to help track it in production.
  • End of sprint: The team demos working code. The demo is typically run by QA to prove it can be run by someone other than the dev that wrote it.

Unless Dev provides API stubs, QA must wait a little for the dev to have something to share. In the plan above, we pad the beginning of the sprint with ongoing QA work like test passes and perf testing. These must get done but can happen anytime.

Dev will typically be done before QA has finished automation, so we pad the end of the sprint with ongoing dev work like refactoring and integration tests.


Anti-patterns

  • Testing after dev has moved on.

If you find a bug when the developer is working on a feature, then it is still fresh in their mind and will likely be fixed correctly.

If you find it two weeks later, you’ll interrupt their current work, and the dev may not remember what they were thinking and may fix it wrong.

It’s not just expensive to find bugs later, it adds to your technical debt and slows productivity. We see a drop of 20% to 40% productivity for the whole team when QA lags dev by one sprint.

5. Make Quality Everyone’s Job

Is QA responsible for quality? What about development? The product owner?

Each is responsible for their part in the quality of the product:

  • The product manager or owner (PM/PO) typically owns the vision for the story – how is it going to help the customer? They should be checking that the features match their vision before the final demo.
  • The developer should be able to prove their code works the way they intended, for example with unit and integration tests.
  • QA owns measurement of the quality – are we done? To what extent is it working? And sometimes, is it working for the customer?

Measurement of quality is important because it gives the organization the data needed to make decisions, but it’s not the same as owning all the quality. QA falls behind when the job becomes bigger than they can accomplish.

The dev and PM/PO should take on quality tasks that they can do most efficiently: unit testing, integration tests, and acceptance review.

Best Practices

  • A good RACI (roles/responsibilities) model. Make clear what tasks each role is responsible for doing.
  • Multiple team members should contribute to your epics and stories. Everyone brings their own skills to the process and if they write the part of it they are responsible for, they will understand it. Co-authorship eliminates a lot of the story review process. This works especially well for epics, where you might have more detail, but also makes sense for stories.
    • Typically, the Product Owner (PO) or Product Manager writes the “why” part of the story: who is the customer, what’s the problem from their point of view, and what are the outcomes. They DON’T describe how the technology will work.
    • The Developer writes the “how” part of the story. These are notes for the developers, so they only need enough detail for those working on it to understand how to build it.
    • QA writes the acceptance criteria. How will you know when the outcomes are achieved and you can measure done?
    • The operations engineer (OPS or DevOps) writes how success is measured in production (if needed).
    • The user experience (UX) designer comes up with any UI and UX guidelines.


Anti-patterns

  • Hand-offs. Development writes code and then hands it off to QA without checking the requirements or existing tests. Instead, development should be proving their code works as they intended. QA can put it in a bigger context – does it integrate well? Does the experience make sense? What’s the performance like? We also see hand-off problems when development is done in one location and QA in another. That rarely works well. QA must be in constant communication with the developer and, ideally, sit side-by-side.
  • The attitude that QA is responsible for quality. QA is a contributor to quality, not responsible for it – at least not all of it. All roles have quality responsibilities. At the end of an epic, everyone signs off that they did their part, and all of those add up to quality. Think of the role of QA as “measuring product quality and customer behavior to give the team the data necessary to make decisions”. If you think of yourself as a manual tester or an automator (i.e., back-end loaded), you aren’t providing a lot of value to the company. If you are helping the company make decisions and assure quality as early as possible, that’s a different story. Those people get paid more.
  • The product owner (PO) writes all the stories alone in a room. Product owners are great at understanding the vision behind the story, but not so much with all the details. POs can fall way behind on story creation if they have to research and write everything, especially if you need enough detail in the story for an offshore development team.


Quality assurance is everyone’s job in today’s faster, agile environment. While there are many engineering best practices that teams can implement, we’ve had tremendous success with those described above. Implementing these best practices will help organizations improve QA productivity while maintaining a healthy return on their overall investment in testing.

Copyright © 2019, All Rights Reserved by Bill Hodghead, shared under creative commons license 4.0


Improving your Root Cause Analysis

I’m often asked to help people improve the value of their root cause analysis (RCA). Here’s my most common advice, including an RCA process you can use to improve your software development and some tips to make your analysis more effective.


  1. When determining the problem, ask why till you hit a people issue.
  2. Ask what’s the earliest process that could have prevented this. This can be applied to many bugs at once using a drop-down in your bug tracking system.
  3. Implement more than your first solution.


Root Cause Analysis is a process you use after a production issue to get to the root of it, fix that root cause, and make sure it never happens again. The process can be formal or quick and dirty. It can be done for one issue or many at once. It should never be about finding blame. The ultimate point is identifying, fixing, and preventing.

Identifying the Root Cause

My favorite technique for determining root cause is the 5 whys technique. You ask why something happened and then you ask why that happened, and so on like a 5-year-old. It would be annoying if it wasn’t so effective.

The number five is arbitrary – it’s meant to be high enough to keep people asking why and not stopping at the first things. Stopping with the Whys too soon is the biggest mistake people make with this technique.

My advice is to keep going till you get past the technical causes to a people cause. No, I don’t mean find someone to blame. While the technical causes are important, there’s often an underlying people issue.

Let me give the most recent example from a client. They had an outage and did a very nice investigation, which showed that, while they had been trying to implement Infrastructure as Code (IaC), they had also made changes in production outside of the code. When the deployment happened, the production state was not in a place the code could deal with, and the deployment broke things.

Their RCA found this issue – great – but they stopped. My advice is to take it one more step. Why were people not using IaC in production?

  • Is it a training issue where the operational changes are made by people who don’t know the code?
  • Are the people making different changes under different reporting structures that have different processes?
  • Is there an incentive for making changes quickly rather than going through the code path?

There’s a categorization system called the 6 boxes approach which can help with identifying people issues.



The Lippitt-Knoster change-management model is another nice framework for identifying the people issues.



Both systems provide a framework to think about what is missing in the way people are working together that could lead to the problem you are seeing.

Fixing the Root Cause

The biggest problem I see coming out of RCAs is not implementing enough of the fixes! Either the team implements only the first or easiest fix, or none at all. I very commonly see the same root cause suggestions appear over and over across multiple RCAs, and that’s just a waste of time. If you’ve found it, fix it!

For each potential fix ask:

  • Is the fix a good idea, or will it just disguise a bad architecture or practice? The latter fixes are not root cause and should be avoided.
  • Are the fixes independent? If two fixes are for the same problem, you may want to just do the best fix and wait to see if that is enough.

Otherwise, my advice is to do as many of the fixes as you can that would have prevented the issue. It seems silly to have to say this, but fix it all! There’s a big tendency to grab the first or easiest solution. Take the time to do several.

Preventing the Issue

Let’s look at a method to evaluate a whole lot of issues at once, to find a common problem – and prevention – in your organization.

My favorite question for this phase of the RCA is “What’s the earliest process that could have prevented this?”. It might be the most important question in the RCA. Do you really need to know the exact cause if you can figure out a way to prevent it?

There’s a simple way to identify that root cause that gives you real data to fix your team or company’s software development.

  1. Add a dropdown on your bug tracking system – Jira, TFS, Salesforce, whatever. In the dropdown put a bunch of preventions like:
  • Requirements from PM
  • Acceptance demo and test
  • Unit test
  • Component/API test
  • Integration or Resource use test
  • A/B test with real customer data
  • Deployment automation
  • Deployment risk prevention (ex: canary, blue/green, etc.)

You can change these or add your own but try to start with fewer than 10 choices.

2. Review about 100 to 200 high-priority bugs in your system. Definitely look at issues found in production and escalated by customers, but also consider ones found internally after the merge into main that didn’t make it to production.

Tag the bugs with that drop-down. This is an ugly manual job and will take a couple of hours, but it’s worth doing once every 6 months to a year. Don’t farm it out to the dev teams. Just get one to three people to go through them all quickly.

3. Look for patterns. Most companies have one or two areas where they are having problems. This exercise gives you the data to prove that those areas need to be fixed.
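The pattern-finding pass can be as simple as tallying the drop-down values from a bug export. A sketch against hypothetical data (the field names are assumptions, not any tracker’s real schema):

```python
from collections import Counter

# Hypothetical export of reviewed bugs with the prevention drop-down filled in
bugs = [
    {"id": "BUG-101", "prevention": "Requirements from PM"},
    {"id": "BUG-102", "prevention": "Component/API test"},
    {"id": "BUG-103", "prevention": "Requirements from PM"},
    {"id": "BUG-104", "prevention": "Deployment automation"},
    {"id": "BUG-105", "prevention": "Requirements from PM"},
]

counts = Counter(b["prevention"] for b in bugs)
for prevention, n in counts.most_common():
    print(f"{prevention}: {n}")
# "Requirements from PM" leads here, so that's the area for the next pass
```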

I’ve seen problems in most of the areas mentioned in the list, except unit testing. Unit testing wasn’t shown to be a great bug preventative for P1 bugs in the few companies I looked at. I think that’s because it’s better at documenting the code and making it maintainable than preventing bugs.

4. Find sub categories and do another pass. Most companies seem to have 1 or 2 major areas where they need help. Unfortunately, the exercise above will have narrowed down the problem but not provided a root cause. It just tells you where to look.

For example, were requirements from PM not translated from epics to stories? Were they not in the acceptance criteria? Were they not validated with customers? Did they not include the customer, the problem from the customer’s point of view, and the outcome the customer would get?

You’ll need a new set of categories and another pass of just the bugs in that category to narrow down your root cause among the available issues. In the spirit of 5 whys, you may need several passes to get enough information to go all the way to root cause. This will sound like a lot of work, and it is, but you are doing a massive RCA for a lot of issues to get real data on where to fix your software development lifecycle.

Copyright © 2021, All Rights Reserved by Bill Hodghead, shared under creative commons license 4.0
