Using Ensembles For Web Effort Estimation: A Replication

Systematic Literature Review

In Chapter 1 we discussed how effective resource management is crucial for successful software development. We also presented the difference between Web development and general software development. Considering the importance of Web development in today’s industry and its difference from general software development, a detailed insight into Web resource estimation would be valuable. To this end, a systematic literature review is essential for establishing the current state of the art and documenting existing gaps in the domain.
In this chapter we present a systematic literature review (SLR) of Web resource estimation that is geared at “identifying, evaluating, and interpreting all available research” [17] relevant to resource estimation for Web development. Research on estimating any factor that has “a bearing on a project’s outcome”, per the definition Mendes provides for resources [24], will be considered. Although our research focuses on development effort estimation, we feel that documenting research on resource estimation in its entirety will provide us with information on the datasets, predictors, and estimation techniques used in research that is closely related, and therefore relevant, to ours.
The remainder of this chapter is organized as follows: Section 2.1 describes the steps involved in the SLR process. Section 2.2 discusses the SLR findings, followed by a discussion of the results and any research gaps in Section 2.3. Section 2.4 concludes the chapter with a presentation of possible avenues for future research identified by the SLR.

Systematic Literature Review Protocol

The purpose of an SLR is to comprehensively identify, evaluate and interpret all research relevant to the research questions the review is to address [17]. The following section details the research questions central to this review, as well as the process followed to identify the studies needed to answer them. This protocol is based on the guidelines published by Kitchenham [17].

Research Questions

Formulating the research questions that an SLR will address is the first step in the review process [17]. The research questions determine which primary studies are selected, the data to be extracted from these selected studies, and how this data is to be analyzed so that the research questions can be answered.
One approach to formulating research questions is to use the PICOC criteria specified by Petticrew and Roberts [33], which structure research questions according to five attributes: population, intervention, comparison, outcome and context. However, since the focus of this literature review is not to compare interventions, the comparison attribute is not used, and hence only the population, intervention, outcome and context (PIOC) attributes of the research questions are shown in Table 2.1.
Context: Within the domain of Web development, with a focus on empirical studies.
Therefore, in order to identify and evaluate all the research done on Web resource estimation, the research questions addressed by our SLR are as follows:
Question 1
What methods and techniques have been used for Web resource estimation?
Question 1a
What metrics have been used to measure estimation accuracy?
Question 1b
What (numerical) accuracy did these various methods/techniques achieve?

Question 2
What resource facets (e.g. effort, quality, size) have been investigated in research on Web resource estimation?
Question 2a
What resource predictors have been used in the estimation process?
Question 2b
At what stage are these resource predictors gathered?
Question 3
What are the characteristics (single/cross-company, student/industry projects) of the datasets used for Web resource estimation?

Search Strategy

The process of identifying primary studies needs to be rigorous and unbiased. To minimize researcher bias, a predefined search strategy was required; it involved the following steps:
1. Identifying search terms to be used in the search process. These were identified using the PIOC attributes detailed in Table 2.1, and from subject headings/keywords used by related articles and journals. Synonyms, alternate spellings, and abbreviations of any search terms identified were also considered.
2. Once the search terms were identified, they were compiled into a search string that would be used in the search process. This was done using the Boolean operators OR and AND. The OR operator was used to group the various forms (e.g. synonyms and alternate spellings) of individual search terms. The AND operator was then used to link the different search terms into a single search string.
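The two steps above can be sketched in Python. This is an illustration only: the term groups are hypothetical placeholders, not the search terms actually used in this review, and `build_search_string` is a name introduced here for the example.

```python
def build_search_string(term_groups):
    """Join the forms of each term with OR, then link the groups with AND."""
    clauses = []
    for group in term_groups:
        # Quote multi-word terms so that they are searched as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in group]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical term groups, for illustration only.
example_groups = [
    ["web", "internet", "hypermedia"],
    ["effort", "cost", "resource"],
    ["estimation", "prediction", "estimating"],
]
search_string = build_search_string(example_groups)
print(search_string)
```

Each inner list holds the synonyms and alternate spellings of one search term, so adding a new spelling only widens one OR group and leaves the AND structure unchanged.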

Search Process

With the search string compiled we began our search process, which was split into a primary and secondary search phase.
Primary search phase
This phase involved identifying and searching through primary sources of relevant literature using our search string. These sources include online databases, search engines, and grey literature (e.g. PhD theses and technical reports). Given that resource estimation for Web development is the focus of this literature review, and that the World Wide Web started as a CERN project in 1989, with the first Web browser Mosaic appearing in 1993 [2], the primary search phase only considered literature published from 1990 (inclusive) to February 2012. The list of primary sources is given in Table 2.2 along with the number of search results and number of relevant papers (see subsection 2.1.4). These resources were recommended by the University of Auckland Library website as resources relevant to Computer Science.
It is important to note that each primary source has its own procedure for entering a search query, with different databases using different keywords for data fields and operators. Therefore our search string in Figure 2.1 had to be tailored to each particular primary source. Initially we applied our search string to the full text. This, however, returned thousands of results. We eventually restricted our search to titles and abstracts (and, depending on the search engine, keywords).
Secondary search phase
The purpose of the secondary search phase is to ensure that the primary search phase has not missed any relevant literature. Our secondary search phase entailed reviewing the references for selected primary studies in order to identify any additional relevant articles. The secondary search phase and the study selection process (discussed in subsection 2.1.4) are iterative in nature, and were repeated until no new literature was found.
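The iterative nature of this phase can be sketched as a loop that stops once a pass over the references yields nothing new. This is a minimal sketch under stated assumptions: `snowball`, `references_of` and `is_relevant` are hypothetical names, and `is_relevant` merely stands in for the manual study selection process of subsection 2.1.4.

```python
def snowball(selected, references_of, is_relevant):
    """Repeat reference screening until a pass adds no new study.

    selected      -- ids of the studies chosen in the primary search phase
    references_of -- maps a study id to the ids of the studies it cites
    is_relevant   -- placeholder for the study selection step
    """
    library = set(selected)
    frontier = set(selected)
    while frontier:
        newly_found = set()
        for study in frontier:
            for cited in references_of.get(study, ()):
                if cited not in library and is_relevant(cited):
                    newly_found.add(cited)
        library |= newly_found
        # Only newly added studies still need their references checked.
        frontier = newly_found
    return library


# Hypothetical citation data, for illustration only.
refs = {"S1": ["A", "B"], "A": ["C"], "C": ["S1"]}
print(sorted(snowball({"S1"}, refs, lambda study: study != "B")))
```

Because studies already in the library are never added again, the loop terminates even when studies cite one another.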

Study Selection

Study selection involved assessing the primary studies identified in order to select those that best addressed our research questions.
Inclusion and exclusion criteria for study selection
Studies were selected for the SLR if they met the following inclusion criteria:
1. The study looks at resource estimation within the domain of Web development. Studies can consider any facet of resource estimation, for example, effort estimation.
2. The study describes the methodology, metrics, and datasets used for resource estimation.
3. The study provides an empirical basis for its findings.
In terms of exclusion criteria, studies were excluded if they:
1. Did not focus on estimating a resource factor that is relevant to Web development.
2. Did not provide an empirical basis for their findings.
Selection process
Using the inclusion and exclusion criteria, the primary studies identified by the search phase were screened. Their titles and abstracts were extracted and compiled into a list, and for those that were found relevant, a hardcopy was retrieved. Where the title and abstract were not sufficiently detailed to determine a study’s relevance, a hardcopy was retrieved and used to make a decision. At this stage of the selection process 98 studies were deemed relevant (see Table 2.2 for further details). Each of the 98 studies was assigned a study id, beginning with the letter “S” followed by a numeral between 1 and 98.
In the final selection process, the hardcopies retrieved previously were analyzed in detail, and if a study was still found to be relevant at this stage, it was added to the final reference library for the SLR. After completing the final selection process, a further 21 studies were excluded:
7 studies did not focus on estimating a resource factor relevant to Web development (exclusion criterion 1).
7 studies did not provide an empirical basis for their findings (exclusion criterion 2).
2 studies met both exclusion criteria.
4 studies were duplicates of other studies in the reference library, in which case only the most comprehensive study was selected.
1 study was not published in English despite what was indicated when it was retrieved during the primary search phase.
The remaining 77 selected studies were used in the secondary search process, which led to the inclusion of a further 7 studies. To distinguish studies identified in the secondary search phase from those identified in the primary search phase, they were assigned a study id consisting of the letter “E” followed by a numeral between 1 and 7, bringing the total number of studies in the final reference library for the SLR to 84. A list of all 84 studies is provided in Appendix A.
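The study counts reported above can be checked with a short tally. The figures are taken from the text; the variable names are introduced here for illustration only.

```python
primary_relevant = 98  # studies deemed relevant after title/abstract screening

# Exclusions made during the final selection process, as listed above.
excluded = {
    "not a Web resource factor (criterion 1)": 7,
    "no empirical basis (criterion 2)": 7,
    "met both exclusion criteria": 2,
    "duplicate of a more comprehensive study": 4,
    "not published in English": 1,
}
secondary_added = 7  # studies found through the secondary search phase

total_excluded = sum(excluded.values())
remaining = primary_relevant - total_excluded
final_library = remaining + secondary_added

print(total_excluded, remaining, final_library)  # 21 77 84
```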

Study Quality Assessment

A quality assessment checklist was defined to provide a means to quantitatively assess the quality of the evidence presented by these studies. The conclusions drawn from an SLR are only as strong as the evidence they are based on, so compiling an appropriate checklist to assess study “quality” is important [17]. The checklist was not meant to be a form of criticism of any researchers’ work.
Table 2.3 details the quality assessment checklist used to evaluate the primary studies. This checklist was adapted from those compiled by Kitchenham [17], with each question using the same three-point answer scale: “Yes” is worth 1 point, “Partially” is worth 0.5 points, and “No” is worth 0 points. A primary study could thus score between 0 and 12; the higher a study’s overall score, the greater the degree to which it addresses our research questions. We selected the first quartile (i.e. 3) to act as a cutoff point, with any study scoring 3 or below being excluded from our final reference library. None of the 84 primary studies selected fell into this category.
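The scoring scheme can be sketched as follows. This is a minimal sketch, assuming a checklist of 12 questions (inferred from the 0 to 12 score range, since each question is worth at most 1 point); the function names and the example answer sheet are hypothetical.

```python
ANSWER_POINTS = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}
NUM_QUESTIONS = 12  # assumption: inferred from the 0 to 12 score range
CUTOFF = 3.0        # studies scoring 3 or below are excluded


def quality_score(answers):
    """Sum the points for one study's checklist answers."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    return sum(ANSWER_POINTS[answer] for answer in answers)


def is_retained(answers):
    """A study is kept only if its score is strictly above the cutoff."""
    return quality_score(answers) > CUTOFF


# Hypothetical answer sheet, not taken from any reviewed study.
example = ["Yes"] * 6 + ["Partially"] * 4 + ["No"] * 2
print(quality_score(example), is_retained(example))
```

Note that the cutoff is exclusive: a study scoring exactly 3 is excluded, matching the “3 or below” rule stated above.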

1 Introduction
1.1 Motivation
1.2 Scientific Contributions
1.3 Organization
2 Systematic Literature Review
2.1 Systematic Literature Review Protocol
2.2 Systematic Literature Review Findings
2.3 Discussion
2.4 Conclusion
3 Background
3.1 Tukutuku Dataset
3.2 Ensembles
3.3 Conclusion
4 Using Ensembles For Web Effort Estimation: A Replication
4.1 The Original Study
4.2 Our Replication
4.3 Results
4.4 What Next?
4.5 Conclusion
5 Using Bagging With Ensembles For Web Effort Estimation
5.1 Methodology
5.2 Results
5.3 Discussion
5.4 Conclusion
6 Ensemble Diversity
6.1 The Accuracy-Diversity Trade-Off
6.2 Results
6.3 Discussion
6.4 Conclusion
7 Conclusions
7.1 Summary
7.2 Threats To Validity
7.3 Future Directions
7.4 Conclusion
Bibliography