Cost estimation can be defined as the approximate judgement of the costs for a project. It will never be an exact science, because too many variables are involved in calculating an estimate: human, technical, environmental, and political. Any process with a significant human factor can never be exact, because humans are far too complex to be entirely predictable. Moreover, software development for any fair-sized project inevitably includes a number of tasks whose difficulty is hard to judge, owing to the complexity of software systems.
Cost estimation is usually measured in terms of effort. The most common metric is person-months or person-years (also called man-months or man-years): one unit of effort corresponds to one person working for that period of time. When comparing the effort of two or more projects, it is important that the specific characteristics of each development environment are taken into account, because no two development environments are the same.
Cost estimation is an important tool that affects the planning and budgeting of a project. Because a project has a finite number of resources, the final product often cannot include every feature in the requirements document. A cost estimate made at the beginning of a project helps determine which features can be included within the project's resource constraints (e.g., time). Requirements can then be prioritized to ensure that the most important features make it into the product. Implementing the most important features first also reduces the project's risk: the complexity of a project increases with its size, so there is more opportunity for mistakes as development progresses. Thus, cost estimation can have a large impact on a project's life cycle and schedule.
Cost Estimation Process
In order to understand the outputs of the software cost estimation process, we must first understand what that process is. By definition, the software cost estimation process is the set of techniques and procedures used to derive a software cost estimate. There is usually a set of inputs to the process, which the process uses to generate or calculate a set of outputs.
Classical View
Most software cost estimation models view the estimation process as a function computed from a set of cost drivers, and in most techniques the primary cost driver is believed to be the software requirements. As illustrated in Figure 1, in the classical view of the software estimation process the software requirements are the primary input and form the basis for the cost estimate, which is then adjusted according to a number of other cost drivers to arrive at the final estimate. So what is a cost driver? A cost driver is anything that may or will affect the cost of the software: design methodology, skill levels, risk assessment, personnel experience, programming language, system complexity, and so on.
In the classical view, the estimation process generates three outputs: effort, duration, and loading. Briefly:
- Manpower loading - number of personnel (which also includes management personnel) that are allocated to the project as a function of time.
- Project duration - time that is needed to complete the project.
- Effort - amount of effort required to complete the project, usually measured in units such as man-months (MM) or person-months (PM).
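These three outputs are related: effort is, roughly, the area under the manpower-loading curve over the project duration. For example (the numbers here are illustrative, not from the source), an estimate of 24 person-months of effort with a 12-month duration implies an average loading of 24 / 12 = 2 people.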
Figure 1: Classical view of the software estimation process (Vigder and Kark, 1994)
Actual View
In the actual cost estimation process, other inputs and constraints must be considered besides the cost drivers. One of the primary constraints on a software cost estimate is the financial constraint: the amount of money that can be budgeted or allocated to the project. There are also other constraints, such as manpower constraints and date constraints. Another input is the architecture, which defines the components that make up the system and the interrelationships between those components. Some companies have a defined software process or an existing architecture in place; for these companies, the cost estimate must be based on those criteria. There are very few cases where the software requirements stay fixed, so how do we deal with requirement changes, ambiguities, or inconsistencies? During the estimation process, an experienced estimator will detect ambiguities and inconsistencies in the requirements and, as part of the process, will try to resolve them by modifying the requirements. Any ambiguous or inconsistent requirements that remain unresolved will correspondingly reduce the accuracy of the estimate.
Expert Judgment Method
Expert judgment techniques involve consulting one or more software cost estimation experts, who use their experience and their understanding of the proposed project to arrive at an estimate of its cost. Generally speaking, a group consensus technique such as the Delphi technique works best. The strengths and weaknesses of this method are complementary to those of algorithmic methods.
To provide a sufficiently broad communication bandwidth for the experts to exchange the volume of information necessary to calibrate their estimates against those of the other experts, the wideband Delphi technique was introduced as a refinement of the standard Delphi technique.
The estimation steps using this method are:
- The coordinator presents each expert with a specification and an estimation form.
- The coordinator calls a group meeting in which the experts discuss estimation issues with the coordinator and with each other.
- The experts fill out the forms anonymously.
- The coordinator prepares and distributes a summary of the estimates on an iteration form.
- The coordinator calls a group meeting that focuses on the points where the estimates varied widely.
- The experts fill out the forms again, anonymously, and steps 4 through 6 are iterated for as many rounds as appropriate.
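The iteration form in steps 4 through 6 amounts to simple bookkeeping over the anonymous estimates. The following is a minimal Python sketch of that bookkeeping; the choice of summary statistics (low, median, high, spread) is an illustrative assumption rather than something the Delphi technique prescribes:

```python
from statistics import median

def delphi_summary(estimates_pm):
    """Summarize one round of anonymous effort estimates (in person-months)."""
    return {
        "low": min(estimates_pm),
        "median": median(estimates_pm),
        "high": max(estimates_pm),
        "spread": max(estimates_pm) - min(estimates_pm),
    }

# One illustrative round with four experts; the wide spread flags the
# points the coordinator should raise in the follow-up group meeting.
round_one = [18, 24, 30, 60]
print(delphi_summary(round_one))
```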
The advantages of this method are:
- The experts can factor in differences between past project experience and requirements of the proposed project.
- The experts can factor in project impacts caused by the new technologies, architectures, applications, and languages involved in the proposed project, as well as exceptional personnel characteristics and interactions.
The disadvantages of this method are:
- It cannot be quantified.
- It is hard to document the factors used by the experts or the expert group.
- The experts may be biased, optimistic, or pessimistic, even though group consensus reduces these effects.
That said, the expert judgment method complements other cost estimating methods, such as the algorithmic method.
Estimating by Analogy
Estimating by analogy means comparing the proposed project to previously completed, similar projects for which the development information is known. Actual data from the completed projects are extrapolated to estimate the proposed project. This method can be used at either the system level or the component level. Estimating by analogy is relatively straightforward; in some respects it is a systematic form of expert judgment, since experts often search for analogous situations to inform their opinions.
The steps using estimating by analogy are:
- Characterizing the proposed project.
- Selecting the most similar completed projects, whose characteristics have been stored in the historical database.
- Deriving the estimate for the proposed project from the most similar completed projects by analogy.
The main advantages of this method are:
- The estimates are based on actual project characteristic data.
- The estimator's past experience and knowledge, which are not easy to quantify, can be brought to bear.
- The differences between the completed and the proposed project can be identified and impacts estimated.
However, this method raises several problems:
- We have to determine how best to characterize projects. The choice of variables must be restricted to information that is available at the point the prediction is required. Possibilities include the type of application domain, the number of inputs, the number of distinct entities referenced, the number of screens, and so forth.
- Even once we have characterized the project, we have to determine how similar projects are and how much confidence we can place in the analogies. Too few analogies might lead to maverick projects being used; too many might dilute the effect of the closest analogies. Martin Shepperd et al. introduced a method of finding analogies by measuring Euclidean distance in an n-dimensional space, where each dimension corresponds to one variable. Values are standardized so that each dimension contributes equal weight to the process of finding analogies. Generally speaking, two analogies are the most effective (see the sketch after this list).
- Finally, we have to derive the estimate for the new project from the known effort values of the analogous projects. Possibilities include the mean or a weighted mean, which gives more influence to the closer analogies.
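The following is a minimal Python sketch of the procedure described above: standardize each variable, rank the completed projects by Euclidean distance, and average the effort of the two closest analogies. The project variables and historical data are invented for illustration; they are not from the source.

```python
import math

# Historical projects: a feature vector (e.g., inputs, entities, screens)
# and the known actual effort. All figures are illustrative.
history = [
    {"features": [30, 12, 8],  "effort_pm": 20},
    {"features": [55, 20, 15], "effort_pm": 41},
    {"features": [90, 35, 25], "effort_pm": 80},
]
new_project = [50, 18, 14]

def standardize(column):
    """Rescale one variable to [0, 1] so each dimension gets equal weight."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]

all_vectors = [p["features"] for p in history] + [new_project]
columns = [standardize(col) for col in zip(*all_vectors)]
scaled = list(zip(*columns))
scaled_history, scaled_new = scaled[:-1], scaled[-1]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Rank historical projects by distance and average the two closest analogies.
ranked = sorted(zip(scaled_history, history), key=lambda t: euclidean(t[0], scaled_new))
estimate = sum(p["effort_pm"] for _, p in ranked[:2]) / 2
print(f"Estimated effort: {estimate:.1f} person-months")
```

A weighted mean over the analogies, with weights inversely proportional to distance, would give more influence to the closer project.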
Top-Down and Bottom-Up Methods
Top-Down Estimating Method
The top-down estimating method is also called the macro model. Using it, an overall cost estimate for the project is derived from the global properties of the software project, and the project is then partitioned into various low-level components. The leading method using this approach is the Putnam model. Top-down estimating is most applicable to early cost estimation, when only global properties are known and no detailed information is yet available. The advantages of this method are:
- It focuses on system-level activities such as integration, documentation, configuration management, etc., many of which may be ignored in other estimating methods and it will not miss the cost of system-level functions.
- It requires minimal project detail, and it is usually faster, easier to implement.
The disadvantages are:
- It often does not identify difficult low-level problems that are likely to escalate costs, and it sometimes tends to overlook low-level components.
- It provides no detailed basis for justifying decisions or estimates.
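Because the Putnam model is named above as the leading top-down method, a brief sketch may help. Putnam's software equation relates product size to effort and schedule: Size = C x K^(1/3) x td^(4/3), where K is the total effort in person-years, td is the development time in years, and C is a productivity constant calibrated from past projects. A minimal Python sketch, with an assumed, purely illustrative value for C:

```python
def putnam_effort(size_sloc, dev_time_years, c_productivity):
    """Solve Putnam's software equation, size = C * K**(1/3) * td**(4/3),
    for the total effort K in person-years."""
    return (size_sloc / (c_productivity * dev_time_years ** (4 / 3))) ** 3

# Illustrative numbers only: 100,000 SLOC, a 2-year schedule, and an
# assumed productivity constant of 10,000 (C varies widely by organization).
print(f"Effort: {putnam_effort(100_000, 2.0, 10_000):.1f} person-years")
```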
Bottom-up Estimating Method
Using the bottom-up estimating method, the cost of each software component is estimated and the results are combined to arrive at an estimated cost for the overall project. The aim is to construct the estimate of a system from the knowledge accumulated about its small software components and their interactions. The leading method using this approach is COCOMO's detailed model. The advantages are:
- It permits the software group to handle an estimate in an almost traditional fashion and to handle estimate components for which the group has a feel.
- It is more stable because the estimation errors in the various components have a chance to balance out.
The disadvantages are:
- It may overlook many of the system-level costs (integration, configuration management, quality assurance, etc.) associated with software development.
- It may be inaccurate because the necessary information may not be available in the early phases.
- It tends to be more time-consuming.
- It may not be feasible when either time or personnel is limited.
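A minimal Python sketch of a bottom-up roll-up; the component names, the person-month figures, and the 20% system-level allowance are all illustrative assumptions (the allowance is one way to address the first disadvantage above):

```python
# Component-level estimates in person-months, e.g. taken from a
# work breakdown structure (all figures are invented for illustration).
components = {
    "parser": 4.0,
    "database layer": 6.5,
    "user interface": 8.0,
    "reporting": 3.5,
}

component_total = sum(components.values())

# System-level activities (integration, configuration management, QA) are
# easy to overlook bottom-up; add an assumed 20% allowance for them.
total_pm = component_total * 1.20
print(f"Components: {component_total:.1f} PM; with allowance: {total_pm:.1f} PM")
```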
Algorithmic Method
The algorithmic method uses mathematical equations to perform software estimation. These equations are based on research and historical data, and they take inputs such as source lines of code (SLOC), the number of functions to perform, and other cost drivers such as language, design methodology, skill levels, risk assessments, etc. Algorithmic methods have been studied extensively, and many models have been developed, such as the COCOMO models, the Putnam model, and function-point-based models.
General advantages:
- It is able to generate repeatable estimations.
- It is easy to modify input data, refine and customize formulas.
- It is efficient and able to support a family of estimations or a sensitivity analysis.
- It is objectively calibrated to previous experience.
The disadvantages are:
- It is unable to deal with exceptional conditions, such as exceptional personnel, exceptional teamwork, or an exceptional match between skill levels and tasks.
- Poor sizing inputs and inaccurate cost driver ratings will result in inaccurate estimates.
- Some experience and some factors cannot be easily quantified.
COCOMO
COCOMO stands for Constructive Cost Model. It is a software cost estimation model first published in 1981 by Barry Boehm (Boehm, 2001), and it takes an algorithmic approach to estimating the cost of a software project. Using COCOMO you can calculate the amount of effort and the time schedule for a project, and from these calculations you can determine how much staffing is required to complete the project on time. The main metrics COCOMO uses for calculating these values are lines of code (denoted KLOC for COCOMO II, or KDSI for COCOMO 81, and measured in thousands), function points (FP), and object points (OP).
COCOMO also lets you explore 'what if' scenarios: by adjusting certain factors you can see how a project's time and effort estimates change. There have been a few different versions of COCOMO; the two discussed in this report are COCOMO 81 and COCOMO II.
COCOMO 81
COCOMO 81 was the first version of COCOMO. It was modeled around software practices of the 1980s. It has been found that, on average, it produces estimates within 20% of the actual values 68% of the time. COCOMO 81 has three different models that can be used throughout a project's life cycle (Boehm, 2001):
- Basic Model – applied early in a project's development, it provides a rough estimate that should be refined later with one of the other models.
- Intermediate Model – used once you have more detailed requirements for the project.
- Advanced Model – once the design for the project is complete, you can apply this model to further refine the estimate.
Within each of these models there are also three different modes. The mode you choose will depend on your work environment, and on the size and constraints of the project itself. The modes are:
- Organic – this mode is used for “relatively small software teams developing software in a highly familiar, in-house environment”.
- Embedded – operating within tight constraints, where the product is strongly tied to a “complex of hardware, software, regulations and operational procedures”.
- Semi-detached – an intermediate stage between organic and embedded; projects are usually of moderate size, up to 300,000 lines of code.
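To make the Basic model and the three modes concrete, here is a minimal Python sketch using the commonly published Basic COCOMO 81 equations (effort E = a x KDSI^b person-months, schedule TDEV = 2.5 x E^c months); treat it as an illustration rather than a calibrated tool:

```python
# Commonly published Basic COCOMO 81 coefficients (a, b, c) per mode.
MODES = {
    "organic":       (2.4, 1.05, 0.38),
    "semi-detached": (3.0, 1.12, 0.35),
    "embedded":      (3.6, 1.20, 0.32),
}

def basic_cocomo(kdsi, mode):
    """Effort (person-months), schedule (months), and average staffing."""
    a, b, c = MODES[mode]
    effort = a * kdsi ** b
    tdev = 2.5 * effort ** c
    return effort, tdev, effort / tdev

# Example: a 32 KDSI organic-mode project.
effort, tdev, staff = basic_cocomo(32, "organic")
print(f"{effort:.0f} PM over {tdev:.0f} months, about {staff:.1f} people on average")
```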
COCOMO II
COCOMO II was published in 1997 as an updated model that addresses the problems of COCOMO 81. The main objectives of COCOMO II were set out when it was first published. They are:
- To develop a software cost and schedule estimation model tuned to the life cycle practices of the 1990s and 2000s (Boehm et al, 1995).
- To develop a software cost database and tool support capabilities for continuous model improvement (Boehm et al, 1995).
- To provide a quantitative analytic framework, and a set of tools and techniques, for evaluating the effects of software technology improvements on software life cycle costs and schedules (Boehm et al, 1995).
For the most part, estimates are obtained in much the same way as in COCOMO 81. The main changes are in the number and type of cost drivers, and in the calculation of equation variables rather than the use of constants (for a detailed look at the specific differences between COCOMO 81 and COCOMO II, see (Boehm, 1998)). The equations still use lines of code as their main metric; however, you can also use function points and object points for estimates. The line-of-code metric used is now the logical line of code: there are standards set out by the SEI for proper counting of lines, so that, for example, an if/then/else statement counts as one line (there are automated tools that will do the counting for you when you want to collect data from your own code).
COCOMO II again has three models, but they differ from those of COCOMO 81. They are:
- Application Composition Model – used for projects built with rapid application development tools; normally you would use object points for size estimates. It “involves prototyping efforts to resolve potential high-risk issues such as user interfaces, software/system interaction, performance, or technology maturity” (Boehm et al, 1995).
- Early Design Model – provides estimates early in a project's design, before the entire architecture has been decided on; normally you would use function points as the size estimate. It “involves exploration of alternative software/system architectures and concepts of operation. At this stage, not enough is generally known to support fine-grain cost estimation” (Boehm et al, 1995).
- Post-Architecture Model – the most detailed of the three, used after the overall architecture of the project has been designed; you could use function points or LOC for size estimates. It “involves the actual development and maintenance of a software product” (Boehm et al, 1995).
In COCOMO II there are 17 cost drivers, used in the Post-Architecture model. They are used in the same way as in COCOMO 81 to calculate the EAF (effort adjustment factor), but they are not the same drivers as in COCOMO 81: they are better suited to the software development environment of the 1990s and 2000s. They are grouped as shown in Table 3. We will not go into specific details on all of the cost drivers here, as that information can be found in the paper “Cost Models for Future Software Life Cycle Processes: COCOMO 2.0” (Boehm et al, 1995). The cost drivers for COCOMO II are again rated on a scale from Very Low to Extra High, in the same way as in COCOMO 81.
Table 3: COCOMO II Post-Architecture cost drivers
- Product Factors: RELY – Required Software Reliability; DATA – Data Base Size; CPLX – Product Complexity; RUSE – Required Reusability; DOCU – Documentation Match to Life-Cycle Needs
- Platform Factors: TIME – Execution Time Constraint; STOR – Main Storage Constraint; PVOL – Platform Volatility
- Personnel Factors: ACAP – Analyst Capability; PCAP – Programmer Capability; AEXP – Applications Experience; PEXP – Platform Experience; LTEX – Language and Tool Experience; PCON – Personnel Continuity
- Project Factors: TOOL – Use of Software Tools; SITE – Multisite Development; SCED – Required Development Schedule
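Mechanically, the EAF is the product of the selected effort multipliers: each cost driver is rated, the rating maps to a multiplier (Nominal is 1.0), and the product of the multipliers scales the nominal effort. The sketch below illustrates this in Python; the multiplier values are assumptions for illustration, not the calibrated COCOMO II values (those are given in (Boehm et al, 1995)):

```python
from math import prod

# Illustrative effort multipliers for a few of the 17 Post-Architecture
# cost drivers; these numbers are assumed, not the calibrated values.
multipliers = {
    "RELY": 1.10,  # High required reliability
    "CPLX": 1.17,  # High product complexity
    "ACAP": 0.85,  # High analyst capability
    "TOOL": 1.00,  # Nominal use of software tools
}

eaf = prod(multipliers.values())
nominal_effort_pm = 100.0  # assumed nominal estimate before adjustment
print(f"EAF = {eaf:.3f}, adjusted effort = {nominal_effort_pm * eaf:.1f} PM")
```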
COCOMO is no doubt the most popular method for software cost estimation. The estimates are relatively easy to do by hand, and there are also tools available that allow you to calculate more complex estimates. Calibration is one of the most important things that must be done to get accurate estimates from COCOMO. And even though COCOMO may be the most popular estimation method, it is recommended that you always verify your results with another method that differs significantly from COCOMO. That way the project is examined from more than one angle, and something you may have overlooked when using COCOMO is not overlooked again.
References:
1. http://www.computing.dcu.ie/~renaat/ca421/report.html
2. http://www.computing.dcu.ie/~renaat/ca421/LWu1.html