Agile Defect Prevention

September 23, 2011

I recall a day in the late ’90s, assessing deployment readiness for an application at Kodak after a several-month release cycle, and finding 3,000 deferred defects. WOW! I can’t believe that was acceptable, but in long waterfall release cycles it was the norm at the time. How can you manage defects like this? Today it is unacceptable. The idea of “deferred defects” has always bothered me in software development. So, what can we do about this?

Along come agile software development cycles, where a defect backlog is an anti-pattern (No Bugs). The idea is that through continuous integration, unit tests, early inspection, and regression tests, your team finds problems as they are introduced. This is great in theory, but how do we manage the inevitable defects we can’t get to (an acceptable risk to meeting business requirements), and the defects that come in from the field as support requests? My approach is to focus on “defect prevention” as opposed to “defect tracking.”

Defect prevention, really? Is that possible? Imagine constraining escaping defects to what fits on your two hands. How can we accomplish this? Efficient agile organizations focus on defect prevention rather than downstream defect discovery. A culture of defect prevention includes separating “work in process” defects (WIP Defects) from “escaping” defects (Defects), to minimize the defects that escape beyond the sprint in which features are developed. This results in a much smaller defect backlog to manage and dramatically increased customer satisfaction. Agile is not just about releasing more often, but also about releasing complete and tested features. So, we need to treat defects found during development as actionable sub-tasks of the feature work item. If we treat these WIP Defects as sub-tasks and acceptance criteria of the development tasks, then we are not introducing them to the field and not adding them to the project team backlog as technical debt.

Escaping defects should then be treated as ranked backlog work items, along with other project work items. They should be ranked high enough to be resolved within the next sprint or two, so a growing backlog does not accumulate. Watch the defect backlog as part of the project metrics. A growing defect backlog is a key indicator that the team is taking on more new work than it can handle. It may also be a key indicator that the team is operating as a “mini-waterfall” project rather than an agile project, requiring more collaboration between Dev and Quality Engineers and earlier testing. Reduce the number of new items the team works on until the escaping defects are well managed or eliminated.

When a WIP Defect must exist past the completion of the parent development task, promote it to a Defect and place it in the backlog in rank order with other work. However, the team should heavily scrutinize this practice and, whenever possible, opt to hold delivery of the feature until the WIP Defects are complete. Conversely, a Defect in the backlog can be demoted and attached to an active development task to include it in the acceptance criteria for that task.

At Constant Contact, we now manage our defects in the same tool (Jira/GreenHopper) as new feature work, so defects live in the same project and iteration backlogs. This provides greater visibility to the product owner ranking the work and to the team implementing it.
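As a rough illustration of the sub-task practice, the sketch below creates a WIP Defect as a Jira sub-task of its parent feature work item. It assumes a Jira version that exposes the REST API v2; the host, project key, parent issue, and credentials are placeholders, not our actual configuration.

```ruby
require 'net/https'
require 'uri'
require 'json'

# Hypothetical Jira host, project, parent issue, and credentials -- illustration only.
uri = URI('https://jira.example.com/rest/api/2/issue')

payload = {
  'fields' => {
    'project'   => { 'key' => 'WEB' },
    'parent'    => { 'key' => 'WEB-123' },  # the feature work item this WIP Defect belongs to
    'summary'   => 'WIP Defect: save button stays disabled after validation error',
    'issuetype' => { 'name' => 'Sub-task' }
  }
}

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
request.basic_auth('builduser', 'secret')
request.body = payload.to_json

response = http.request(request)
puts response.code  # expect 201 when the sub-task is created
```

The point is that the defect stays attached to the feature it belongs to, rather than landing in a separate defect backlog.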

Continued in part 2: https://davidjellison.wordpress.com/2012/06/24/agile-defect-prevention-part-2/


KANBAN @CTCT

November 20, 2010

Scrum works well when there is a distinct backlog of prioritized work and distinct time-boxed release cycles. Scrum scales well with a Scrum-of-Scrums group, where scrum masters scrum across teams, provided you introduce program management to oversee progress across teams. As an organization grows the number of agile product teams aligned with a single delivery cycle, agile service teams emerge (e.g. Engineering Services, Database, Application Operations, automation infrastructure, etc.). Many of these teams have both planned backlog work and on-demand unplanned work (work request tickets).

Scrum is a push model where work is pushed into the team in prioritized order. Many times work on an item is blocked by a dependency on another team, so the team picks another work item, and so on. This leads to many items being started without being completed and accepted in a consistent flow throughout the sprint. The result is a lot of task switching, inefficiencies near the end of the sprint to get the work done, and technical debt (deferred defects, refactoring for reuse, etc.). This is especially difficult when you have many interdependencies between scrum teams.

There are three key challenges that emerge as you scale these needs:

  • minimizing work in process (WIP) with smaller work items
  • balancing the planned and unplanned work and their priorities
  • minimizing bottlenecks in the flow of work

Kanban is another agile concept borrowed from the lean principles of the manufacturing industry in Japan. Kanban (or kamban in Hepburn romanization–kanji 看板, katakana カンバン, meaning “signboard” or “billboard”) is a concept related to lean and just-in-time (JIT) production. According to Taiichi Ohno, the man credited with developing JIT, kanban is a means through which JIT is achieved (en.wikipedia.org/wiki/Kanban). Kanban is a pull model, rather than the push model of Scrum. It uses the same idea of a user story broken into work items with tasks attached. It uses the card (sticky note) concept as in Scrum, but adds distinct process steps (columns) and swim lanes (rows), where each step has a distinct WIP limit. You still have a prioritized backlog to pull work from. Each swim lane has a purpose for processing work, and each step can only pull work from the previous step once the downstream step has pulled a work item or task from it. If the downstream step does not have capacity, you can’t push the work to it. This leads to tuning the efficiencies and capacities of each step to accommodate a cadence of pulling work through the swim lanes.
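To make the pull mechanics concrete, here is a minimal sketch of a board with per-column WIP limits; the column names and limits are illustrative, not our actual board policy.

```ruby
# A column on the board: holds cards up to its WIP limit.
class Column
  attr_reader :name, :limit, :cards

  def initialize(name, limit)
    @name, @limit, @cards = name, limit, []
  end

  def full?
    cards.size >= limit
  end
end

class Board
  def initialize(columns)
    @columns = columns
  end

  # A downstream column pulls the next card from the column to its left,
  # but only when it has spare capacity -- work is never pushed ahead.
  def pull(into_name)
    index = @columns.index { |c| c.name == into_name }
    return nil if index.nil? || index.zero?
    into, from = @columns[index], @columns[index - 1]
    return nil if into.full? || from.cards.empty?
    into.cards << from.cards.shift
  end
end

board = Board.new([
  Column.new('Backlog', 100),
  Column.new('Develop',   3),
  Column.new('Test',      2),
  Column.new('Deploy',    1)
])
board.pull('Develop')  # succeeds only while Develop has fewer than 3 cards
```

When a pull fails because a column is full, the bottleneck is visible immediately, and the team can swarm on the blocked step instead of starting more work.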

Constant Contact (CTCT) is moving from Scrum to Kanban to address these needs. Mike Fitterman (Development Manager) and Rick Simmons (Agile Coach) illustrate the learnings of the pilot team (website) at Constant Contact, as presented at the Agile2010 conference: AgileConference2010-Upstream_Kanban_at_CTCT. At Constant Contact, nearly all agile teams (both product delivery and service) are using this approach at the time of this writing. You can see Kanban boards on many of our conference-room and work-area walls as you walk through the engineering space. The Scrum-of-Scrums still occurs daily; however, it is more focused on larger project progress, interdependencies, and organizational impediments. We are still working out our inefficiencies with this new process, but it seems to be working well for us so far.

[Image: Constant Contact Website Team Kanban Board]

Kanban addresses the need to minimize WIP by constraining each step in each swim lane. It allows for an express lane for high-priority, on-demand unplanned work (defects and footprint tickets). It makes inefficiencies and bottlenecks apparent, so the team can self-correct and tune how the work is done at each step (colored sticky notes and visible policies). Overall, it smooths out the cadence of work, keeping work flowing through the team with visible status of all work items for the whole team.

Visit agilemanagement.net to explore more about Kanban agile practices, references, and training with David Anderson, founder of Kanban as a software development practice.

Continuous Integration on our Highways

November 14, 2010

What if we could apply the zero-defects vision of highly efficient Continuous Integration to our highways? We could then travel our highways at the full speed limit, at a sustained pace, during rush hour. We would not have to expand highway capacity, nor extend our travel time with travel debt that eats into our private lives. Well, it appears Google is taking a crack at it (http://bit.ly/90RF3Q).

Functional Continuous Integration

Driving agile practices over the last 4 years in 3 SaaS companies, it is quite apparent to me that continuous integration (CI) requires both unit-test verification of every build and regular functional regression testing of deployed builds to be really agile. Yes, you need all the SDL (software development lifecycle) practices to manage building the right product and completing work items, but quality of work cannot be compromised in the name of speed. Agile is all about completing small amounts of working (deliverable) software and iterating on continuous feedback. It also includes confidence that you are delivering tested software without regression defects (not breaking what already worked) and confidence that future work will not break what was just delivered. Any tests not completed within the scope of work items are technical debt. This technical debt is postponed work that results in missed defects.

Quality confidence is achieved by routinely running automated tests at both the code and system levels. Regardless of the agile practices used, design, development, and test are interwoven and require collaboration of development and test resources in the delivery team. I believe this is the secret sauce that differentiates a waterfall-ish team from an agile-ish team.

  • The waterfall-ish team has the mind-set of developing application code first and automation test code later, frequently not including testing in the work item scope.
  • The agile-ish team has the mind-set of developing both unit test and functional test code along with application code, either before the application code (Test Driven Development) or just after it, but within the scope of the work item. This includes meeting the work item (e.g. user story) acceptance criteria.

Functional Continuous Integration (FCI) is continuously creating and updating automated regression tests, and it must be the expectation for POs when planning work commitments, for Executives when assessing progress reports, for Developers when estimating (including collaboration with QE), and for Quality Engineers when planning and completing test work. Infrastructure for FCI needs to include integration of the automated tests with a test management and reporting database, and needs to be capable of running unattended. I’ve used Selenium RC with both CruiseControl (Rails test scripts) and Hudson (Java test scripts) to run build-time deploys and unattended test runs. These CI applications can run with multiple client machines as slaves, which allows CI jobs to run each test suite on a different client machine simultaneously to reduce test duration.
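As a rough sketch of what one of these unattended tests looks like with the Ruby selenium-client gem, the example below points at a Selenium RC server running on a slave machine; the host name, application URL, and page content are placeholders, not our actual environment.

```ruby
require 'test/unit'
require 'selenium/client'

class LoginSmokeTest < Test::Unit::TestCase
  def setup
    # The CI job starts selenium-server.jar on the slave machine ahead of the run.
    @browser = Selenium::Client::Driver.new(
      :host    => 'ci-selenium-slave',
      :port    => 4444,
      :browser => '*firefox',
      :url     => 'http://staging.example.com',
      :timeout_in_second => 60
    )
    @browser.start_new_browser_session
  end

  def teardown
    @browser.close_current_browser_session
  end

  def test_login_page_renders
    @browser.open '/login'
    assert @browser.is_text_present('Sign In'), 'expected the login page to render'
  end
end
```

The same suite can be run in a Developer’s sandbox before check-in, which is the point made below.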

The result is that test failures caused by changes or new application code breaking existing code are caught very early and corrected. Further, if these tests are run in the Developer’s sandbox and failures are corrected prior to check-in, no defect is created, which significantly reduces defect counts for the agile team.

Continuous Integration: Selenium RC vs. XUnit tests

December 26, 2008

Since April 2008 I’ve been a Consultant/QE Architect at Sermo in Cambridge, MA, USA on Ruby on Rails agile teams. The first team started as an experiment to prove that rapid development of Rails applications, composited with the JBoss-based Java core community, could work seamlessly. We have continued to successfully add several more Rails applications with this approach. We are now undergoing a major rewrite of the core community and its applications entirely in Ruby on Rails. The new design includes formal SOA interfaces. We are continuously refining our scrum lifecycle as well as our test automation approaches.

We have been focusing on continuous integration with CruiseControl and comprehensive regression testing, including Test::Unit and Selenium tests, an automated test plan generator, and ci_reporter test reports, all run with every SVN commit. I also added nightly and weekly batch runs for runtime tests and the longer-running tests. The big issue we have been wrestling with is… at what level {unit, functional, integration, runtime, browser DOM, load} should acceptance/regression tests be created? This question led to some interesting and healthy debate between development and test staff.
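For a sense of how the commit and nightly runs are wired together, here is a minimal Rakefile sketch using ci_reporter’s standard Test::Unit task; the task groupings are illustrative, not our exact build configuration.

```ruby
# Rakefile (excerpt) -- loads ci_reporter so test results are written as XML
# reports that CruiseControl can pick up and publish.
require 'ci/reporter/rake/test_unit'

namespace :ci do
  desc 'Commit build: fast suites, run on every SVN commit'
  task :commit => ['ci:setup:testunit', 'test:units', 'test:functionals']

  desc 'Nightly build: adds the slower integration suite'
  task :nightly => ['ci:setup:testunit', 'test:units', 'test:functionals', 'test:integration']
end
```

The Selenium suites are kept out of the commit build and scheduled in the nightly or weekly runs, for the performance reasons discussed below.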

To summarize the testing levels (a short sketch of the lower levels follows the list)…

  • Rails Test::Unit test levels:
    • unit test coverage is important to verify the methods and their paths
    • functional tests validate that controllers operate as intended, including environment and database configuration
    • integration tests validate systemic operations that cross controllers and render pages properly, irrespective of the browser, including AJAX responses for page load
  • Runtime tests are run on a deployed fleet, either headless or in a simulated browser DOM
  • Selenium tests exercise interactive AJAX and JavaScript in the page, requiring a separate client machine running selenium-server.jar (Selenium RC for Rails) or the webrat gem (which bundles selenium-server.jar and the webrat DSL)
  • JMeter for load tests (including performance counters)
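Here is a rough sketch of what the first two Rails levels look like in practice; the model, controller, and data are hypothetical, just to show where each kind of test sits.

```ruby
# Unit test: exercises a model method and its paths directly, no HTTP involved.
class PostTest < ActiveSupport::TestCase
  def test_published_scope_excludes_drafts
    draft = Post.create!(:title => 'Draft post', :published => false)
    assert !Post.published.include?(draft)
  end
end

# Functional test: exercises a single controller action, routing, and templates.
class PostsControllerTest < ActionController::TestCase
  def test_index_renders_published_posts
    get :index
    assert_response :success
    assert_template 'index'
  end
end
```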

We found that the most important part of a story is the list of acceptance tests. This list shapes and clearly defines the expectations of the story and what makes it complete. Many times we stub out tests for the sprint’s stories in test suites at the beginning of the sprint, and the stubs can be implemented by either a test or development engineer. The biggest problem we found was that adding too many Selenium tests made the automated build validation time increase dramatically (4 to 10 times that of integration tests), made it difficult for developers to run regression tests prior to source code commit, and made the tests more fragile as the GUI implementation changed. We had to re-factor many tests from Selenium to Test::Unit functional or integration tests to improve test performance and reliability.
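The refactor typically looked something like the sketch below: a signup click-through that had been a Selenium test, rewritten as a Rails integration test that runs in the build without a browser. The paths, parameters, and assertions are illustrative.

```ruby
# Integration test: drives the full stack (routing, controllers, views) in-process,
# so it is far faster and less fragile than the Selenium version it replaces.
class SignupFlowTest < ActionController::IntegrationTest
  def test_visitor_can_sign_up_and_land_on_dashboard
    get '/signup'
    assert_response :success

    post '/users', :user => { :login => 'newbie', :password => 'secret' }
    follow_redirect!

    assert_equal '/dashboard', path
    assert_select 'div#welcome', /newbie/
  end
end
```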

The key to successful continuous integration is to test continuously, either with TDD practices or TIA (test immediately afterward), as part of accepting stories. This includes all code implemented for the story and any additional tests to cover the acceptance criteria. Testing at the lowest level feasible for code coverage is important for test efficiency. This may require the creation of test fixtures, mocking response expectations, and data factories.
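As a small sketch of those supporting pieces, the example below mixes a hand-rolled data factory with a mocked mailer expectation (using the mocha gem); the class and method names are hypothetical.

```ruby
require 'mocha'

class InvitationTest < ActiveSupport::TestCase
  # Data factory: keeps test record creation in one place with sensible defaults.
  def build_member(attrs = {})
    Member.new({ :name => 'Dr. Example', :specialty => 'Cardiology' }.merge(attrs))
  end

  def test_creating_an_invitation_sends_one_email
    # Mock the response expectation so the test never touches a real mail server
    # (assumes Invitation.create! calls Mailer.deliver_invitation).
    Mailer.expects(:deliver_invitation).once
    Invitation.create!(:member => build_member, :email => 'peer@example.com')
  end
end
```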

Testing should include both happy-path and negative tests (exception handling). Development Engineers need to have a sense of ownership for regression tests. Quality Engineers need to have a sense of test coverage completeness. Together, the scrum team needs to hold themselves and each other accountable for not leaving test coverage technical debt beyond story acceptance. Plan this test engineering time into the story. It may mean your velocity is a little lower than if you skipped it, but the overall sustainable stride is greater, and you really do catch problems prior to (or at the time of) committing changes.

There is a place for automating at the GUI with Selenium or a comparable HTML-element or application-control automation tool. Limit the use of these tools to testing AJAX or behavior that requires interactive JavaScript to render the page, workflows between systems, data-driven use cases, and cross-browser testing.

What we found is that ideally…

1. Test Engineers embedded in a development scrum team should have the ability to:

  • read and exercise application code
  • author unit test cases
  • create and work with test fixtures, test mocks and test data factories
  • assess adequate test coverage for development stories

2. Test Engineers chartered with testing external to the development teams should be able to:

  • deploy to fleets (fully automated is preferred)
  • read and understand mocked interfaces (to exercise actual interfaces)
  • author and exercise run-time tests (cover GUI and API workflows across the system)
  • author and exercise performance/load tests

Continuous Integration assures a solid application code base with full test coverage. It engages all engineers in responsibility for application testing. It allows dedicated Test Engineers to focus on system-level functionality, deployment, load, and user experience.

Nokia Test: Are you really agile?

December 19, 2007

I have recently had discussions with several management colleagues about agile and find that most are not really getting far enough to realize the full value of agile. Jeff Sutherland and many other champions of the Scrum model prescribe the Nokia Agile Test as a litmus test to determine if a team really is agile. This test was developed internally by Nokia to assess development teams at Nokia and their partners. Nokia has the largest number of certified ScrumMasters of any company in the world today.

Nokia first determines if the team is able to adopt Scrum by determining if they are doing iterative development.

  • Iterations must be timeboxed to less than six weeks
  • Software must be tested and working at the end of an iteration
  • Iteration must start before specification is complete

Next, the Nokia Scrum Test…

  • You know who the product owner is
  • There is a product backlog prioritized by business value
  • The product backlog has estimates created by the team
  • The team generates burndown charts and knows their velocity
  • There are no project managers (or anyone else) disrupting the work of the team

I would also add some items to the list…

  • The story includes clearly defined acceptance test(s) [validates requirement complete]
  • The acceptance test(s) are automated, part of the code base, and run as a regression suite on a regular basis
  • The story is not fully complete (implemented) until the acceptance test(s) are automated

The bottom line is that there are many companies that think they are doing Scrum, but aren’t really and are plagued by legacy processes and measures. It is fine to take small steps to migrate your organization to agile, but keep going until you really get there.

Categories: Agile, Scrum

Agile Lesson Learned: Iterate in the Marketplace

December 16, 2007

Convoq (a.k.a. Applied Messaging, Zingdom Communications) closed its doors on Nov 30, 2007. I feel very fortunate to have worked with a very talented and skilled group of professionals over these 5 years. I had a tremendous experience in leadership and management roles that fostered career growth for me. For a timeline of this business and a summary of what happened at Convoq, read the blog entry “Convoq and Zingdom – Five Years” by Chris Herot, CTO and co-founder: http://herot.typepad.com/cherot/2007/12/convoq-and-zing.html

I have to comment on Chris Herot’s list of lessons learned…

  • I can’t say enough about the first bullet in Chris’ list of lessons learned: “Iterate in the marketplace and not in the conference room. Agile is the only way to go.” Especially in startup companies or divisions, where the need for a product or service is identified but it is not yet clear how to meet that need, you have to get working product into the marketplace to fail fast. You need a culture that can work with the learnings and let them drive the iteration priorities, along with the larger strategic goals. This doesn’t mean you have to be reactionary to what people are asking for, but rather attentive to what people will use and how they use it.
  • The second bullet in Chris’ list of lessons learned is also important to highlight: “Just because you are using agile methods doesn’t mean you don’t have to plan. Write your stories before you begin an iteration, but don’t waste a lot of time on the details that aren’t needed until later.” This plays well into my blog entry “Agile peak performance in early startups” https://davidjellison.wordpress.com/2007/10/24/agile-peak-performance-in-early-startups. Not having stories ready and prioritized by the product owner (voice of business and customer) before the beginning of an iteration breaks the cadence that is so critical to consistent agile velocity. Keep the stories written at the requirements level, include the user statement for each use case (As an ‘actor’, I need to ‘task to perform’, such that ‘goal to accomplish’), include known acceptance tests (key tests that the requirement is built right), and include the adjusted priority (1 to 10).

Iterating in the marketplace, and rapidly acting on the findings to adjust the stories and their relative order of priority for the subsequent iteration, allows you to “build the right product.”
