Monday, 5 January 2015

Software Cost Estimation

 

Cost estimation can be defined as the approximate judgement of the costs for a project. Cost estimation will never be an exact science, because a cost estimate depends on too many variables: human, technical, environmental, and political. Any process that involves a significant human factor can never be exact, because humans are far too complex to be entirely predictable. Furthermore, software development for any fair-sized project will inevitably include a number of tasks whose complexity is difficult to judge, simply because software systems themselves are complex.
Cost estimation is usually measured in terms of effort. The most common metric is person-months or person-years (also called man-months or man-years): the amount of work one person can do in that period. For example, a ten person-month project could be one person working for ten months or two people working for five. It is important that the specific characteristics of the development environment are taken into account when comparing the effort of two or more projects, because no two development environments are the same.
Cost estimation is an important tool that can affect the planning and budgeting of a project. Because a project has a finite amount of resources, the final product often cannot include every feature in the requirements document. A cost estimate done at the beginning of a project helps determine which features can be included within the resource constraints of the project (e.g., time). Requirements can be prioritized to ensure that the most important features are included in the product. Including the most important features early also reduces risk: the complexity of a project increases with its size, so there is more opportunity for mistakes as development progresses. Thus, cost estimation can have a big impact on the life cycle and schedule of a project.

 Cost Estimation Process
In order to understand the outputs of the software cost estimation process, we must first understand what that process is. By definition, the software cost estimation process is the set of techniques and procedures used to derive a software cost estimate. The process takes a set of inputs and uses them to generate or calculate a set of outputs.
  
Classical View
Most software cost estimation models view the estimation process as a function computed from a set of cost drivers, and in most techniques the primary cost driver is believed to be the software requirements. As illustrated in figure 1, in the classical view of the software estimation process the software requirements are the primary input and form the basis for the cost estimate. The estimate is then adjusted according to a number of other cost drivers to arrive at the final figure. So what is a cost driver? A cost driver is anything that may or will affect the cost of the software, such as the design methodology, personnel skill levels and experience, risk assessment, programming language, or system complexity.

In the classical view, the estimation process generates three outputs: effort, duration, and manpower loading. Briefly:
  • Manpower loading - the number of personnel (including management personnel) allocated to the project as a function of time.
  • Project duration - time that is needed to complete the project.
  • Effort - the amount of work required to complete the project, usually measured in man-months (MM) or person-months (PM).
In the classical view, the outputs (loading, duration, and effort) are usually computed as fixed numbers, with or without a tolerance. In reality, the cost estimation process is more complex than what is shown in figure 1: much of the input data is modified or refined during the estimation itself. A small sketch of the classical view as a function appears after figure 1.
 

Figure 1: Classical view of software estimation process (Vigder and Kark, 1994)
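
To make the classical view concrete, here is a minimal sketch of the estimation process as a function from size and cost drivers to the three outputs. Every constant and driver multiplier below is made up for illustration and comes from no published model; only the overall shape (a nominal estimate adjusted by cost drivers) follows the classical view described above.

# Illustrative sketch of the classical view: a size figure derived from the
# requirements is the primary input, and other cost drivers adjust a nominal
# estimate. All constants and multipliers are hypothetical.

DRIVER_MULTIPLIERS = {
    "high_complexity": 1.3,    # system complexity raises cost
    "experienced_team": 0.85,  # personnel experience lowers cost
    "new_language": 1.1,       # unfamiliar programming language raises cost
}

def classical_estimate(size_kloc, active_drivers):
    """Return (effort in person-months, duration in months, average loading)."""
    effort = 2.8 * size_kloc                 # hypothetical nominal 2.8 PM per KLOC
    for driver in active_drivers:
        effort *= DRIVER_MULTIPLIERS[driver]
    duration = 2.5 * effort ** 0.38          # illustrative power-law schedule
    loading = effort / duration              # average number of people on the project
    return effort, duration, loading

effort, duration, loading = classical_estimate(50, ["high_complexity", "experienced_team"])
print(f"effort={effort:.0f} PM, duration={duration:.1f} months, loading={loading:.1f} people")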

Actual View 
In the actual cost estimation process there are other inputs and constraints that need to be considered besides the cost drivers. One of the primary constraints is financial: the amount of money that can be budgeted or allocated to the project. There are also manpower constraints and date constraints. Another input is the architecture, which defines the components that make up the system and the interrelationships between those components. Some companies have a prescribed software process or an existing architecture already in place; for these companies, the cost estimate must be based on those givens.
There are very few cases where the software requirements stay fixed, so how do we deal with requirement changes, ambiguities, or inconsistencies? During the estimation process, an experienced estimator will detect ambiguities and inconsistencies in the requirements and, as part of the process, try to resolve them by modifying the requirements. Any ambiguities or inconsistencies that remain unresolved will correspondingly reduce the accuracy of the estimate.

Figure 2: Actual view of the software estimation process (WBS - work breakdown structure)

Expert Judgment Method

Expert judgment techniques involve consulting a software cost estimation expert, or a group of experts, who use their experience and their understanding of the proposed project to arrive at an estimate of its cost.
Generally speaking, a group consensus technique such as the Delphi technique works best. Its strengths and weaknesses are complementary to those of the algorithmic methods.
To provide a sufficiently broad communication bandwidth for the experts to exchange the volume of information necessary to calibrate their estimates with those of the other experts, the Wideband Delphi technique was introduced as a refinement of the standard Delphi technique.
The estimating steps of this method are:
  1. The coordinator presents each expert with a specification and an estimation form.
  2. The coordinator calls a group meeting in which the experts discuss estimation issues with the coordinator and each other.
  3. The experts fill out forms anonymously.
  4. The coordinator prepares and distributes a summary of the estimates on an iteration form.
  5. The coordinator calls a group meeting, focusing in particular on having the experts discuss points where their estimates varied widely.
  6. The experts fill out forms, again anonymously, and steps 4 to 6 are iterated for as many rounds as appropriate.
The Wideband Delphi technique has subsequently been used in a number of studies and cost estimation activities. It has been highly successful in combining the free-discussion advantages of the group meeting technique with the anonymous-estimation advantages of the standard Delphi technique. The sketch below shows the kind of bookkeeping the coordinator does between rounds.
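
Between rounds, the coordinator's iteration form amounts to simple bookkeeping over the anonymous estimates. Below is a minimal sketch of that bookkeeping in Python; the round data and the stopping threshold are hypothetical and not part of the technique's definition.

from statistics import median

# Wideband Delphi bookkeeping: summarize each round of anonymous effort
# estimates and check whether the experts have converged.

def summarize_round(estimates_pm):
    """Summarize one round of anonymous effort estimates (person-months)."""
    spread = (max(estimates_pm) - min(estimates_pm)) / median(estimates_pm)
    return {"low": min(estimates_pm),
            "median": median(estimates_pm),
            "high": max(estimates_pm),
            "spread": round(spread, 2)}

rounds = [
    [12, 30, 18, 45, 24],  # round 1: wide disagreement, discuss the outliers
    [20, 26, 22, 28, 24],  # round 2: estimates converge after discussion
]

for i, estimates in enumerate(rounds, start=1):
    summary = summarize_round(estimates)
    print(f"round {i}: {summary}")
    if summary["spread"] < 0.5:  # hypothetical stopping criterion
        print("estimates have converged; stop iterating")
        break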
The advantages of this method are:
  • The experts can factor in differences between past project experience and requirements of the proposed project.
  • The experts can factor in project impacts caused by new technologies, architectures, applications and languages involved in the future project and can also factor in exceptional personnel characteristics and interactions, etc.
The disadvantages include:
  • This method cannot be quantified.
  • It is hard to document the factors used by the experts or the expert group.
  • Experts may be biased, optimistic, or pessimistic, even though group consensus reduces these effects.
  • The expert judgment method usually complements other cost estimating methods, such as the algorithmic method, rather than standing alone.

Estimating by Analogy

Estimating by analogy means comparing the proposed project to previously completed similar projects for which the development information is known. Actual data from the completed projects are extrapolated to estimate the proposed project. This method can be used at either the system level or the component level.
Estimating by analogy is relatively straightforward. Actually in some respects, it is a systematic form of expert judgment since experts often search for analogous situations so as to inform their opinion.
The steps using estimating by analogy are:
  1. Characterizing the proposed project.
  2. Selecting the most similar completed projects, whose characteristics have been stored in a historical database.
  3. Deriving the estimate for the proposed project from the most similar completed projects by analogy.
The main advantages of this method are:
  1. The estimates are based on actual project characteristic data.
  2. The estimator's past experience and knowledge, which are not easy to quantify, can be brought to bear.
  3. The differences between the completed and the proposed project can be identified and impacts estimated.
However, there are also some problems with this method:
  1. Using this method, we have to determine how best to describe projects. The choice of variables must be restricted to information that is available at the point the prediction is required. Possibilities include the type of application domain, the number of inputs, the number of distinct entities referenced, the number of screens, and so forth.
  2. Even once we have characterized the project, we have to determine the similarity and how much confidence we can place in the analogies. Too few analogies might lead to maverick projects being used; too many might dilute the effect of the closest analogies. Martin Shepperd and colleagues introduced a method of finding analogies by measuring Euclidean distance in an n-dimensional space, where each dimension corresponds to one variable. Values are standardized so that each dimension contributes equal weight to the process of finding analogies. Generally speaking, two analogies are the most effective.
  3. Finally, we have to derive an estimate for the new project using the known effort values of the analogous projects. Possibilities include the mean, or a weighted mean that gives more influence to the closer analogies. A small sketch of this procedure follows below.
Estimating by analogy has been found to be superior to estimation via algorithmic models in at least some circumstances. It is also a more intuitive method, so it is easier to understand the reasoning behind a particular prediction.
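
The following is a minimal sketch of the selection-by-distance procedure described above. The historical projects, the three describing variables, and the min-max standardization are all made up for illustration; only the shape of the method (standardize, take the two nearest projects by Euclidean distance, average their known efforts) follows the description above.

import math

# Estimating by analogy: standardize each variable, find the two nearest
# completed projects by Euclidean distance, and average their known efforts.
# The historical data and the chosen variables are hypothetical.

historical = [
    # ((inputs, entities, screens), actual effort in person-months)
    ((40, 12, 20), 120),
    ((10, 4, 6), 35),
    ((60, 20, 35), 210),
    ((15, 5, 8), 48),
]
new_project = (14, 5, 7)

# Min-max standardize each dimension so every variable carries equal weight.
dims = list(zip(*[p for p, _ in historical], new_project))
lo = [min(d) for d in dims]
hi = [max(d) for d in dims]

def scale(point):
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(point, lo, hi)]

def distance(a, b):
    return math.dist(scale(a), scale(b))

# Take the two closest analogies (two is often the most effective number)
# and use the mean of their efforts as the estimate.
nearest = sorted(historical, key=lambda ph: distance(ph[0], new_project))[:2]
estimate = sum(effort for _, effort in nearest) / len(nearest)
print(f"analogies: {nearest}, estimated effort: {estimate:.0f} PM")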

Top-Down and Bottom-Up Methods


  Top-Down Estimating Method

The top-down estimating method is also called the macro model. Using this method, an overall cost estimate for the project is derived from the global properties of the software project, and the project is then partitioned into various low-level components. The leading method using this approach is the Putnam model. Top-down estimation is most applicable to early cost estimation, when only global properties are known; in the early phases of software development it is very useful because no detailed information is yet available.
The advantages of this method are:
  • It focuses on system-level activities such as integration, documentation, configuration management, etc., many of which may be ignored in other estimating methods, so it will not miss the cost of system-level functions.
  • It requires minimal project detail, and it is usually faster, easier to implement.
The disadvantages are:
  • It often does not identify difficult low-level problems that are likely to escalate costs, and it sometimes tends to overlook low-level components.
  • It provides no detailed basis for justifying decisions or estimates.
Because it provides a global view of the software project, it usually embodies some effective features, such as the cost-time trade-off capability found in the Putnam model; a sketch of that trade-off follows.
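
The sketch below illustrates that trade-off using the form of Putnam's software equation, Size = C_k * K**(1/3) * t_d**(4/3), where K is the life-cycle effort in person-years and t_d is the delivery time in years. The size and the technology constant C_k are illustrative values, not calibrated ones.

# Putnam-style cost-time trade-off: solve the software equation for the
# effort K implied by different delivery schedules. Note how compressing
# the schedule inflates effort steeply.

SIZE_SLOC = 100_000
C_K = 5000  # technology/productivity constant; real values are calibrated

def effort_person_years(delivery_years):
    """Life-cycle effort K implied by a chosen delivery time t_d."""
    return (SIZE_SLOC / (C_K * delivery_years ** (4 / 3))) ** 3

for t in (1.5, 2.0, 2.5):
    print(f"deliver in {t} years -> {effort_person_years(t):.0f} person-years")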

Bottom-up Estimating Method

Using the bottom-up estimating method, the cost of each software component is estimated, and the results are then combined to arrive at an estimated cost for the overall project. It aims to construct the estimate of a system from the knowledge accumulated about the small software components and their interactions. The leading method using this approach is COCOMO's detailed model. A sketch of the roll-up appears after the lists below.
The advantages:
  • It permits the software group to handle an estimate in an almost traditional fashion, and to estimate components for which the group has a feel.
  • It is more stable because the estimation errors in the various components have a chance to balance out.
The disadvantages:
  • It may overlook many of the system-level costs (integration, configuration management, quality assurance, etc.) associated with software development.
  • It may be inaccurate because the necessary information may not be available in the early phases.
  • It tends to be more time-consuming.
  • It may not be feasible when either time or personnel are limited.
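
Here is a minimal sketch of the bottom-up roll-up. The component names and estimates are hypothetical, and the explicit system-level allowance at the end is an assumption added to counter the first disadvantage above.

# Bottom-up estimation: estimate each component, then combine the results.

component_estimates_pm = {
    "user interface": 6,
    "business logic": 10,
    "database layer": 5,
    "reporting": 4,
}

component_total = sum(component_estimates_pm.values())

# Bottom-up roll-ups often omit integration, configuration management, and
# QA, so add an explicit system-level allowance (20% here, an assumption).
SYSTEM_LEVEL_FACTOR = 0.20
total_effort = component_total * (1 + SYSTEM_LEVEL_FACTOR)

print(f"components: {component_total} PM, with system-level work: {total_effort:.0f} PM")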

Algorithmic Method


The algorithmic method is designed to provide mathematical equations for performing software estimation. These equations are based on research and historical data and use inputs such as source lines of code (SLOC), the number of functions to perform, and other cost drivers such as language, design methodology, skill levels, and risk assessments. Algorithmic methods have been widely studied, and many models have been developed, such as the COCOMO models, the Putnam model, and function point based models.
General advantages:
  1. It is able to generate repeatable estimations.
  2. It is easy to modify input data, refine and customize formulas.
  3. It is efficient and able to support a family of estimations or a sensitivity analysis.
  4. It is objectively calibrated to previous experience.
General disadvantages:
  1. It is unable to deal with exceptional conditions, such as exceptional personnel, exceptional teamwork, or an exceptional match between skill levels and tasks.
  2. Poor sizing inputs and inaccurate cost driver ratings will result in inaccurate estimates.
  3. Some experience and factors cannot be easily quantified.
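
Most algorithmic models share a multiplicative form, effort = a * size**b * EAF, where EAF (the effort adjustment factor) is the product of the cost-driver multipliers. The sketch below uses placeholder constants, taken from no calibrated model, to show why such models are repeatable and convenient for sensitivity analysis.

# Generic algorithmic model: effort = a * size**b * EAF.
# The constants a and b and the driver multipliers are placeholders.

def algorithmic_effort(size_kloc, a=3.0, b=1.12, driver_multipliers=()):
    eaf = 1.0
    for multiplier in driver_multipliers:
        eaf *= multiplier
    return a * size_kloc ** b * eaf

# Repeatability and sensitivity analysis: the same inputs always give the
# same estimate, and varying one input isolates its effect on the result.
for size in (20, 40, 60):
    pm = algorithmic_effort(size, driver_multipliers=(1.15, 0.9))
    print(f"{size} KLOC -> {pm:.0f} person-months")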
  COCOMO:
COCOMO stands for Constructive Cost Model. It is a software cost estimation model first published in 1981 by Barry Boehm (Boehm, 2001) and is an algorithmic approach to estimating the cost of a software project. Using COCOMO, you can calculate the effort and the time schedule for a project, and from these values determine how much staffing is required to complete the project on time. COCOMO's main metric is lines of code (denoted KLOC for COCOMO II, or KDSI for COCOMO 81, and measured in thousands), function points (FP), or object points (OP).
COCOMO also lets you explore 'what if' scenarios: by adjusting certain factors in COCOMO, you can see how a project's time and effort estimates change. There have been a few different versions of COCOMO; the two discussed in this report are COCOMO 81 and COCOMO II.


COCOMO 81
COCOMO 81 was the first version of COCOMO. It was modeled around the software practices of the 1980's. On average, it has been found to produce estimates within 20% of the actual values 68% of the time. COCOMO 81 has three different models that can be used throughout a project's life cycle (Boehm, 2001):
  • Basic Model - this model would be applied early in a project's development. It provides a rough early estimate that should be refined later with one of the other models.
  • Intermediate Model – this model would be used after you have more detailed requirements for a project. 
  • Advanced Model – when your design for a project is complete you can apply this model to further refine your estimate.

Within each of these models there are also three different modes.  The mode you choose will depend on your work environment, and the size and constraints of the project itself.  The modes are:
  • Organic – this mode is used for “relatively small software teams developing software in a highly familiar, in-house environment”.
  • Embedded – operating within tight constraints, where the product is strongly tied to a “complex of hardware, software, regulations and operational procedures”.
  • Semi-detached – an intermediate stage somewhere between organic and embedded. Projects are usually of moderate size, up to 300,000 lines of code. A worked example of the Basic model across these modes follows.
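
As a concrete illustration of the Basic model across the three modes, the sketch below uses the commonly published Basic COCOMO 81 constants; the 32 KDSI project size is made up.

# Basic COCOMO 81: effort = a * KDSI**b (person-months) and
# duration = 2.5 * effort**c (months), with (a, b, c) depending on the mode.

MODES = {
    "organic":       (2.4, 1.05, 0.38),
    "semi-detached": (3.0, 1.12, 0.35),
    "embedded":      (3.6, 1.20, 0.32),
}

def basic_cocomo(kdsi, mode):
    a, b, c = MODES[mode]
    effort = a * kdsi ** b          # person-months
    duration = 2.5 * effort ** c    # months
    staffing = effort / duration    # average full-time staff
    return effort, duration, staffing

effort, duration, staff = basic_cocomo(32, "organic")
print(f"effort={effort:.0f} PM, schedule={duration:.1f} months, staff={staff:.1f}")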
 COCOMO II
COCOMO II was published in 1997 and is an updated model that addresses the problems with COCOMO 81.  The main objectives of COCOMO II were set out when it was first published.  They are:
  • To develop a software cost and schedule estimation model tuned to the life cycle practices of the 1990's and 2000's (Boehm et al, 1995).
  • To develop software cost database and tool support capabilities for continuous model improvement (Boehm et al, 1995).
  • To provide a quantitative analytic framework, and set of tools and techniques, for evaluating the effects of software technology improvements on software life cycle costs and schedules (Boehm et al, 1995).
For the most part, estimates are obtained in much the same way as in COCOMO 81. The main changes are in the number and type of cost drivers and in the calculation of equation variables rather than the use of constants (for a detailed look at the specific differences between COCOMO 81 and COCOMO II, see (Boehm, 1998)). The equations still use lines of code as their main metric; however, you can also use function points and object points for estimates. The line-of-code metric used is now the logical source line of code. There are standards set out by the SEI for the proper counting of lines; for example, an if/then/else statement is counted as one line (there are automated tools that will do the counting for you when you want to collect data from your own code).
COCOMO II again has three models, but they are different from the ones for COCOMO 81.  They are:
  • Application Composition Model – this would be used for projects built with rapid application development tools. Normally you would use object points for size estimates. It “involves prototyping efforts to resolve potential high-risk issues such as user interfaces, software/system interaction, performance, or technology maturity.” (Boehm et al, 1995).
  • Early Design Model – this model can provide estimates early in a project's design, before the entire architecture has been decided on. Normally you would use function points as a size estimate. It “involves exploration of alternative software/system architectures and concepts of operation. At this stage, not enough is generally known to support fine-grain cost estimation.” (Boehm et al, 1995).
  • Post-Architecture Model – the most detailed of the three, used after the overall architecture for the project has been designed. You could use function points or SLOC for size estimates. It “involves the actual development and maintenance of a software product” (Boehm et al, 1995).
 Cost Drivers
In COCOMO II there are 17 cost drivers, used in the Post-Architecture model. They are used in the same way as in COCOMO 81 to calculate the EAF (effort adjustment factor). The cost drivers are not the same as in COCOMO 81; they are better suited to the software development environment of the 1990's and 2000's. They are grouped as shown in the table below. We will not go into specific details on all of the cost drivers here, as that information can be found in the paper “Cost Models for Future Software Life Cycle Processes: COCOMO 2.0” (Boehm et al, 1995). The cost drivers for COCOMO II are again rated on a scale from Very Low to Extra High, in the same way as in COCOMO 81.

Product Factors
RELY- Required Software Reliability
DATA - Data Base Size
CPLX - Product Complexity
RUSE - Required Reusability
DOCU - Documentation match to life-cycle needs
Platform Factors
TIME - Execution Time Constraint
STOR - Main Storage Constraint
PVOL - Platform Volatility
Personnel Factors
ACAP - Analyst Capability
PCAP - Programmer Capability
AEXP - Applications Experience
PEXP - Platform Experience
LTEX - Language and Tool Experience
PCON - Personnel Continuity
Project Factors
TOOL - Use of Software Tools
SITE - Multisite Development
SCED - Required Development Schedule
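
To show how these drivers feed an estimate, the sketch below multiplies a handful of driver ratings into an EAF and applies it to a nominal Post-Architecture effort equation, effort = A * size**E * EAF. The multiplier values and the constants A and E are illustrative placeholders, not the calibrated COCOMO II numbers.

# Post-Architecture sketch: EAF is the product of the cost-driver
# multipliers; drivers left out are assumed nominal (multiplier 1.0).

driver_ratings = {
    "RELY": 1.10,  # higher than nominal required reliability
    "CPLX": 1.17,  # complex product
    "ACAP": 0.85,  # strong analysts
    "PCAP": 0.88,  # strong programmers
    "SCED": 1.00,  # nominal schedule
}

def cocomo2_effort(size_ksloc, a=2.94, e=1.10, ratings=driver_ratings):
    eaf = 1.0
    for multiplier in ratings.values():
        eaf *= multiplier
    return a * size_ksloc ** e * eaf

print(f"estimated effort: {cocomo2_effort(50):.0f} person-months")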

COCOMO is without doubt the most popular method for software cost estimation. The estimates are relatively easy to do by hand, and there are also tools available that let you calculate more complex estimates. Calibration of COCOMO against your own historical data is one of the most important steps in getting accurate estimates. Even though COCOMO may be the most popular estimation method, it is recommended that you always use another method, one that differs significantly from COCOMO, to verify your results. This way your project is examined from more than one angle, and something you may have overlooked when using COCOMO has a chance to be caught.


References:


1. http://www.computing.dcu.ie/~renaat/ca421/report.html
2. http://www.computing.dcu.ie/~renaat/ca421/LWu1.html