
Estimation XPeriment

Lance Walton - Wednesday September 27, 2006

This article reports on an experiment I have been conducting with my current team, investigating the relationship between story estimates and the sum of task estimates in order to determine whether story estimates are sufficient for iteration and release planning.

Introduction

Prediction is difficult, especially of the future. – Niels Bohr

When I started with my current team, I saw the following cycle repeat over several iterations:

1. An iteration planning session would occur during which some stories were chosen
2. The chosen stories would be broken down into tasks and estimated
3. Developers would express concern about the estimated amount of work required for the iteration
4. Work would proceed with each pair of developers working on a story
5. The iteration would finish with many stories unfinished and some not yet started. The leftovers made the next iteration planning session difficult and left us with a general sense of dissatisfaction about the iteration.

This cycle would continue until enough stories were completed to constitute a release, which frequently occurred in the middle of an iteration.

There was no release planning1.

There are several issues that come out of the description above (lack of release planning, lack of control over the iteration structure, the independence of tasks in the breakdown of each story, etc.) but the one I want to focus on here is the lack of essential information in the iteration planning session.

What was missing from the process was estimates for the stories under consideration in iteration planning, and a velocity for the last iteration calculated as the sum of estimates for the planned stories.
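The missing planning step can be sketched in a few lines: take velocity to be the sum of story estimates completed in the last iteration, then fill the next iteration with stories until adding another would exceed it. All numbers and story names below are hypothetical, for illustration only.

```python
# Velocity-based iteration planning sketch (hypothetical data).
# Velocity = sum of story estimates completed in the last iteration.
last_iteration_estimates = [3, 5, 2, 8]
velocity = sum(last_iteration_estimates)  # 18

# Candidate (story, estimate) pairs for the next iteration.
candidate_stories = [("A", 5), ("B", 8), ("C", 3), ("D", 5)]

planned, remaining = [], velocity
for name, estimate in candidate_stories:
    if estimate <= remaining:  # story fits within the remaining capacity
        planned.append(name)
        remaining -= estimate

print(planned)  # stories that fit within last iteration's velocity
```

Greedily filling in priority order is one common convention; teams differ on whether they skip a too-large story, as here, or stop at it.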

When I asked the team why they did not estimate the stories and use these estimates for their planning, they expressed the same concerns that I’ve heard many times before: they felt that story estimates would be too inaccurate for the purposes of planning, and also that stories were frequently unestimable – it was the task breakdown that brought out many of the issues which business analysts then needed to mediate. This frequently caused stories to be pulled from the iteration, invalidating a great deal of the results of the iteration planning session.

With this in mind, I proposed a safe experiment to determine the following:

1. Is there a simple (linear) relationship between story estimates and sum of task estimates and is this relationship strong enough to allow the use of story estimates for release and iteration planning?
2. Is the relationship between story estimates and the sum of task estimates stable over a period of time that is of the scale of a few releases so that longer term planning can also be achieved?

Explicitly excluded from this experiment was anything to do with comparing estimates to actuals. Also, the idea was not to change the process as we were doing the experiment, so apart from sessions to estimate stories, everything else remained the same.

The Experiment

To start the experiment, we needed to produce estimates for each of the stories that were likely to come up over the next few iterations. The business analysts were able to tell us which stories these were, although a few stories did subsequently get dropped. During every iteration, any new stories were also estimated.

We did not want to spend a huge amount of time on this. In particular, we did not want to do task breakdowns for all of the stories in advance. However, we did want to try to bring out more of the issues to allow the business analysts time to work out the missing pieces before putting the story into an iteration.

To achieve this, we used a story estimation technique called Poker Estimation, which you can read about in Mike Cohn’s book, Agile Estimating and Planning (available at amazon.co.uk and amazon.com).

Task breakdowns were done as they always had been – immediately after the iteration planning session.

Collection of Results

The results were collected over 10 iterations (each iteration is one week long). This resulted in poker and task estimates for 41 stories. During this time, the number of developers in the team varied between eight and ten. The team works on three different projects, with developers moving between the projects fairly fluidly. One particular developer was on the project under observation for the duration of the experiment and he was joined by three other developers drawn from the pool at the beginning of each iteration. When poker estimates were required, as many developers as possible were involved, up to a maximum of about six.

The consequence of this is that the poker estimates were produced by different people over the course of the experiment and the task breakdowns and work were not necessarily done by the developers who produced the poker estimates2. This is not an ideal situation. However, it did not work out too badly as we can see from the results and there was a general sense of team ownership of, and commitment to, the estimates.

Results

The results presented here will not make much use of statistical measures of significance, null hypotheses, etc. It is my aim to inform readers about the results of the experiment without losing many along the way because of a lack of statistical background. In any case, I would not expect to get statistically significant results given the amount of data collected. Instead I will make use of the well known method of Proof By Gesticulation; the only requirement for these results was that sufficient evidence was presented to allow my team to make intuitive use of the results.

Is There a Simple Relationship Between Story Estimates and Sum of Task Estimates?

We can get some simple summary information about the relationship between story and sum of task estimates by plotting them on a graph – see below. The points each represent a (story estimate, sum of task estimate) pair. The reason for the vertical clusters of points is the Estimation Poker method used for producing the story estimates. The line of best fit through the data points is also shown, together with its parameters.

It is clear that there is some correlation between the sum of task estimates and the story estimates. Somewhat gratifying is the fact that the line of best fit passes very close to the origin. These two things taken together mean that the story estimates predict the respective sum of task estimates with nothing more than a scale factor of about 1.3.

In addition, the R² value shown on the chart indicates that about 80% of the variation in the sum of task estimates is ‘explained’ (or predicted) by the story estimates.

This is an extremely useful result.
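The slope and R² value reported above can be reproduced with ordinary least squares. The data below is hypothetical, chosen only to illustrate the calculation; the experiment’s actual 41 estimate pairs are not reproduced in this article.

```python
# Hypothetical (story estimate, sum of task estimates) pairs -- illustrative
# only, not the experiment's actual data.
story = [1, 2, 3, 5, 8, 2, 3, 5, 8, 13]
tasks = [1.2, 2.5, 4.1, 6.4, 10.5, 2.8, 3.9, 6.7, 10.2, 17.1]

n = len(story)
mean_x = sum(story) / n
mean_y = sum(tasks) / n

# Ordinary least-squares line of best fit: tasks ~ slope * story + intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(story, tasks))
         / sum((x - mean_x) ** 2 for x in story))
intercept = mean_y - slope * mean_x

# Coefficient of determination R^2: the fraction of the variation in the
# task sums 'explained' by the story estimates.
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(story, tasks))
ss_tot = sum((y - mean_y) ** 2 for y in tasks)
r_squared = 1 - ss_res / ss_tot

print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```

With data that is nearly proportional, the fitted intercept falls close to zero and the slope plays the role of the scale factor described in the article.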

Is The Relationship Between Story Estimates and the Sum of Task Estimates Stable?

By ‘stability’, I mean that the relationship between story estimates and sum of task estimates does not change significantly over time. This is important because business analysts, project managers and other people on the business side of software development frequently need to plan over time scales significantly larger than a single iteration (one week for my team) or release (a few weeks). If the relationship between stories and sum of task estimates does not hold for this significant period of time, then the business cannot use the story estimates for their longer term planning.

To get an idea of the stability of the process, think of the sequence of story estimates. These estimates are produced in small batches and the estimated stories accumulate until some are chosen to be developed in an iteration. At this point, the stories are broken down into tasks, each of which is estimated and we can now ‘measure the accuracy’ of the story estimates (subject to the known scale factor of 1.3). Because we have seen that the two methods of estimation produce values that are approximately proportional over the whole sample, we can divide the sum of task estimates by the story estimate and compare the value to the expected value of 1.3 to determine how well a particular story conforms to the expected relationship.

Plotting the sequence of sum of task estimates to story estimate ratios in the order of production of the story estimates (see the chart below) gives us points on a chart which preserves the time ordering of the story estimates.

What the chart shows is that over the period of the experiment, the relationship between sum of task estimates and story estimates held – there was no significant drift away from the expected value at any time. This is despite the fact that some of the story estimates were produced early in the experiment, and their task breakdowns and estimates were done after a significant elapsed time.

We can also note that all but one of the points on the chart fall ‘close’ to the mean value. This traditionally indicates that the process is under ‘statistical control3’, meaning that the deviations from the mean are due to random errors and are not due to some systemic change in the process that needs to be investigated. The one point that is a long way from the mean should be investigated to determine if there is a ‘special cause of error’ that can be avoided in the future.
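The run-chart check can be sketched as follows. The ratios below are hypothetical, and the two-standard-deviation threshold is a deliberately loose stand-in for properly derived control limits, in the spirit of footnote 3.

```python
import statistics

# Hypothetical ratios of (sum of task estimates) / (story estimate), listed
# in the order the story estimates were produced -- illustrative data only.
ratios = [1.2, 1.4, 1.3, 1.1, 1.5, 1.3, 2.6, 1.2, 1.4, 1.3]

mean = statistics.mean(ratios)
sd = statistics.stdev(ratios)

# Loose run-chart check: flag any ratio more than 2 sample standard
# deviations from the mean as a candidate 'special cause' to investigate.
outliers = [(i, r) for i, r in enumerate(ratios) if abs(r - mean) > 2 * sd]

print(mean, outliers)
```

Here the single large ratio is flagged while the ordinary scatter around the mean is not; a proper individuals control chart would derive its limits from the average moving range rather than from the sample standard deviation.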

Conclusions

The results show that, for my team using the estimation methods described, story estimates correlate well with sum of task estimates and the relationship holds over a significant time. This suggests that our story estimates are as good as the task estimates for iteration and release planning purposes, and this relationship is stable enough to allow longer term planning.

Consequences

As a result of this experiment, my team has started relying on the story estimates for planning of timeboxed releases. Previously, as described above, release planning was superficial and releases generally happened when sufficient scope had been accumulated.

Because of the greater confidence in the planning process, we have significantly increased the frequency of releases. Previous releases tended to happen every month to six weeks. They now happen every two weeks, with the occasional three week release.

Also, because we discuss the meaning of the stories when we attempt to produce story estimates, issues with the stories are brought out early, thus giving business analysts time to respond before development is required to start. This brings greater stability to our iterations, with many fewer stories needing to be reconsidered after iteration planning.

When the larger stories are under development, we tend to have more than one pair working on them in order to make sure they are finished within a single iteration. This is a change in attitude that favours completion and closed iterations. This leads to greater collective code ownership and better focus on the task at hand.

1 What was termed ‘release planning’ was actually just the declaration of a business analyst that some themed, but individually and collectively ill defined set of features must be delivered before a given arbitrary date.

2 Actually, there were a couple of stories for which one developer insisted that their estimate was good, despite it being much lower than that of the other developers. In these cases, that developer was charged with leading the story implementation when it was brought into development.

3 I use this term very loosely here. To truly say that the process is under statistical control, we would have to determine an appropriate model for the distribution (which is not Gaussian, by the way) and place suitable lower and upper control limits on the chart. Nevertheless, we should not let the facts stand in the way of a good story…


  1. Franck    Friday February 23, 2007

    Good article and as it seems that you like proverbs, here are two for you ;o)

    “There are no good project managers – only lucky ones.”

    “The more you plan the luckier you get.”

  2. Strazhce    Monday September 17, 2007

    I have a problem with your conclusion “Is The Relationship Between Story Estimates and the Sum of Task Estimates Stable?”. What is the criterion for your decision? Most of your values are above 1 and, if I understand that relation correctly, that means you quite often go above the original estimate in the process of implementing the story = in the middle of the project. How can you conclude that “the process is in control”?
    Even more interesting information would be – what is the relation between story estimates and actual task effort…

  3. Lance Walton    Sunday March 9, 2008

    Strazhce,

    As I describe in the article, my definition of stability is to do with whether ‘the relationship between story estimates and sum of task estimates does not change significantly over time’.

    The fact that many of the values are above 1 does not mean that the relationship is unstable. It simply means that there is a constant (over the entire period of the experiment) scale factor (of 1.3) involved when transforming from story estimates to sums of task estimates. This scale factor does not in itself mean anything either; many XP teams use arbitrary units for story and task estimates, and the units may even be different when estimating stories and tasks. Hence the scale factor combines the unit conversion factor and the fact that a few more details may be brought out when stories are decomposed into tasks. What is important here is that there is no significant slope visible in that second graph.

    The ‘process’ that I refer to here is the process of producing estimates, not the software development process. The kind of ‘control’ that I’m referring to is statistical control. But as I say in footnote 3, I use that term very loosely in this article.

    I agree that the relationship between story estimates and actual effort taken to implement that story would be interesting. It’s actually quite difficult to gather the data for that for several reasons, but I would be extremely happy to see someone (you?) do that experiment.

    But in any case, my purpose in this experiment was to show the team that their story estimates were as reliable as their task estimates for the purpose of iteration and release planning, given that velocity is an important piece of information for planning in an XP project. In this regard, actual effort that was required to implement a story is totally uninteresting.