Revisiting the Impact of Rapid Releases on Software Development Activities

Bug triaging

Saha et al. [110] extracted code change metrics, such as the number of changed files, to identify the reasons for delays in bug fixes, and to improve the overall bug fixing process in four Eclipse Core projects: JDT, CDT, Plug-in Development Environment (PDE), and Platform. Their results showed that a significant number of long-lived bugs could be reduced through careful triaging and prioritization if developers could predict their severity, change effort, and change impact in advance.
Zhang et al. [140] studied factors affecting the delays incurred by developers when fixing bugs. They analyzed three Eclipse projects: Mylyn, Platform and PDE. They found that metrics such as severity, operating system, description of the issue and comments are likely to impact the delay in starting to address and resolve the issue. Hooimeijer and Weimer [55] analyzed the correlation between bug triaging time and the reputation of a bug reporter. They designed a model that uses bug report fields, such as bug severity and submitter’s reputation, to predict whether a bug report will be triaged within a given amount of time in the Mozilla Firefox project.

Bug Resolution

Panjer [99] carried out a case study on Eclipse projects and showed that the most influential factors affecting bug fixing time are found in initial bug report fields (e.g., severity, product, component and version) and post-submission information (e.g., comments). Giger et al. [44] found that the assigned developer, the bug reporter and the month in which the bug was reported have the strongest influence on the bug fixing time in Eclipse, Mozilla and Gnome. Marks et al. [81] studied different features of a bug report in relation to bug fixing time using bug repository data from Mozilla and Eclipse. The most influential factors on bug fixing time were bug location and bug reporting time. Zou et al. [143] investigated the characteristics of bug fixing rate and studied the impact of a reporter’s different contribution behaviors on the bug fixing rate in Eclipse and Mozilla. Among others, they observed an increase in fixing rate over the years for both projects. On the other hand, the observed rates were not high, especially for Mozilla. Rwemalika et al. [108] studied the characteristics and differences between pre-release bugs and post-release bugs in 37 industrial Java projects. They found that post-release bugs are more complex to fix since they require modification of several source code files, written in different programming languages, and of configuration files. Lamkanfi et al. [73] proposed a dataset that provides a comprehensive information bundle on the historical evolution of the most relevant attributes from the Eclipse and Mozilla project bug reports. Such a dataset supports bug analysis as well as the reproducibility and comparison of bug detection models in Eclipse and Mozilla. Lamkanfi and Demeyer [72] observed that open source data on bug resolution times could be heavily distorted and include unrealistic data with resolution times of less than a minute. They found that such outliers may confuse data mining techniques and produce distorted results. Thus, removing such data would improve the result of the model. Athanasiou et al. [9] proposed a model using source code metrics to assess test code quality. They calibrated that model with 86 open source and commercial Java systems so that the rating of a system’s test code reflects its quality relative to those systems. They showed that there is a high correlation between test code quality and the throughput and productivity of issue handling.
All these studies are valuable in understanding the overall bug fixing process, factors affecting bug fixing time, bug fixing time estimation and triaging automation. In our study in Chapter 4, we focus on the bug triaging and fixing time and on the bug resolution and fixing rate. We are not aware of any related study that compares these metrics before and after a release is delivered, or that examines how they evolve over successive releases while taking the bug severity level into account. Moreover, we study the impact of feature freeze periods on the bug handling process.

Benefits and Challenges of Adopting Rapid Releases

According to a recent literature review by Mäntylä et al. [80], rapid releases are followed in multiple domains including finance, automotive, telecom, space and energy. Prior research studied the benefits and challenges of adopting rapid releases. Such releases are claimed to offer reduced time-to-market and faster user feedback [63]. End-users may benefit from this because they get faster access to functionality improvements and security updates [80]. Moreover, Zimmermann [142] has shown that the adoption of a shorter release cycle has successfully managed to provide more stable versions, with fewer breaking changes, that are easier to upgrade. In our study in Chapters 4 and 5, we will analyze the impact of rapid releases on the bug handling process. Joshi et al. [57] introduced a publicly available dataset consisting of 994 open source projects on GitHub featuring rapid releases. This dataset and its documentation and scripts aim to facilitate future empirical research in release engineering and agile software development.
Kerzazi and Khomh [60] examined over 14 months of release data from 246 rapid Firefox releases to determine the types of factors that affect release time. They identified three factors: technical factors (e.g., code merging and integration and automated tests), organizational factors (e.g., design and management of branches and release planning) and interactional factors (e.g., coordination policies amongst teams). Their analysis reveals that testing is the most time-consuming activity in the release process (86%), whereas code merging, stabilizing and packaging account for only 6%, 6% and 2% of the cycle time, respectively. Moreover, they note that a lack of socio-technical congruence among teams can delay releases. Socio-technical congruence refers to the alignment between the technical dimension of work and the social relationship between team members [23].
Maleknaz et al. [96] analyzed (among others) the release cycle times of 6,003 mobile apps on Google Play, treating them as a treatment variable to predict, as an outcome, the customer satisfaction expressed through an app rating. To do so, they introduced a generic analytical approach called the Gandhi-Washington Method (GWM). For the specific scenario of mobile app rating, the method consists of encoding and summarising the sequence of release cycle times of each app using regular expressions over the alphabet S (short release cycle), M (medium release cycle), L (long release cycle), followed by statistical tests over those generated expressions to determine causal effects on the outcome variable. They found that apps with sequences of long releases followed by sequences of short releases have the highest median app rating. Apps with sequences of long releases followed by sequences of medium releases get a lower median rating. Finally, apps with sequences of long releases exclusively get the lowest median rating.
Castelluccio et al. [22] investigated the reliability of the Mozilla uplift process. Patch uplift is the practice where patches that fix critical issues or implement high-value features are promoted directly from the development channel to a stabilization channel because they cannot wait for the next release. This practice is risky because the time allowed for the stabilization of the uplifted patches is short. The authors examined patch uplift operations in rapid release pipelines and formulated recommendations to improve their reliability. They investigated the decision making process of patch uplift at Mozilla and observed that release managers are more inclined to accept patch uplift requests that concern certain specific components and/or that are submitted by certain specific developers.
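The S/M/L encoding used in the GWM scenario described above can be illustrated with a minimal sketch. It is not the authors' implementation: the day thresholds, the function names and the pattern labels are assumptions chosen for illustration, and the statistical tests that follow the encoding are omitted.

```python
import re

# Hypothetical thresholds in days; the cut-offs between short, medium
# and long release cycles are NOT given in the text above.
SHORT_MAX_DAYS = 30
MEDIUM_MAX_DAYS = 90


def encode_cycles(cycle_days):
    """Encode a list of release cycle lengths as a string over {S, M, L}."""
    symbols = []
    for days in cycle_days:
        if days <= SHORT_MAX_DAYS:
            symbols.append("S")
        elif days <= MEDIUM_MAX_DAYS:
            symbols.append("M")
        else:
            symbols.append("L")
    return "".join(symbols)


# Regular expressions for the three sequence shapes mentioned in the text.
PATTERNS = {
    "long-then-short": re.compile(r"L+S+"),
    "long-then-medium": re.compile(r"L+M+"),
    "long-only": re.compile(r"L+"),
}


def classify(sequence):
    """Return the label of the first pattern matching the whole sequence."""
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(sequence):
            return label
    return "other"


if __name__ == "__main__":
    app_cycles = [120, 200, 20, 10]  # made-up cycle lengths in days
    sequence = encode_cycles(app_cycles)
    print(sequence, classify(sequence))
```

Grouping apps by the returned label would then allow the median rating per group to be compared, which is the kind of comparison reported above.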

Rapid Releases and Software Quality

Khomh et al. [63] empirically studied the effect of rapid releases on Mozilla Firefox software quality. They quantified quality in terms of runtime failures, the presence of bugs and outdatedness of used releases. Related to our work, they compared the number of reported, fixed and unconfirmed bugs and the fixing time during both the testing period, i.e., the time between the first alpha version and the release, and the post-release period. They found that fewer bugs are fixed during the testing period and that bugs are fixed faster under a rapid release model. In a follow-up work [64], the authors reported that, although post-release bugs are fixed faster in a shorter release cycle, a smaller proportion of bugs is fixed compared to the traditional release model. Interviews conducted with six Mozilla employees revealed that they could be “less effective at triaging bugs with the rapid release” and that more beta testers using the rapid releases can generate more bugs. Da Costa et al. [28] studied the impact of Mozilla’s rapid release cycles on the integration delay of addressed issues. They showed that the rapid release model does not integrate addressed issues more quickly into consumer-visible releases compared to the traditional release model. They also found that issues are triaged and fixed faster in rapid releases. In a follow-up work [27] they reported that triaging time is not significantly different between the traditional and rapid releases. Baysal et al. [12] found that bugs were fixed faster in versions of Firefox using a traditional release model than in Chrome rapid releases, but the difference was not statistically significant. Clark et al. [24] empirically analyzed the security vulnerabilities in Firefox and found that a rapid release cycle does not result in higher vulnerability rates. The authors also found that frequent releases increase the time needed by attackers to learn the software code, contrary to the popular belief that frequent code changes result in less secure software. Mäntylä et al. [80] analyzed the impact on software testing efforts when switching to rapid releases in a case study for Firefox. They found that tests have a smaller scope and are performed more continuously with less coverage under rapid releases. They also found that the number of testers decreased in rapid releases, which increased the test workload.
In the literature, several benefits of rapid releases were mentioned: rapid feedback, improved quality focus of the developers and testers, easier monitoring of progress and quality, customer satisfaction, shorter time-to-market, and increased efficiency due to increased time-pressure. Despite these benefits, previous research has shown that rapid releases often come at the expense of reduced software reliability, longer integration delays, accumulated technical debt and increased time pressure. Most of the studies on rapid releases were performed on the Firefox project, which switched its release cycle in 2012 from 12-18 months to 6 weeks. In our study in Chapter 5, we revisit some of the previous work, such as [64] and [80], to study the effect of rapid releases on software quality and testing. Unlike these studies, we examine these activities before and after the recent changes in the release policies (i.e., the removal of Aurora and the switch to a 4-week cycle).

Extracting and Processing Bug Report Data

Our empirical analysis is based on bug report data extracted from Bugzilla for each release of each Eclipse Core project. A typical bug report contains a wide variety of fields. A description of those fields that are relevant in the context of our empirical analysis is summarised in Table 2.2. The Severity of bugs is reported by their owners based on their personal perspective. The Eclipse community considers seven levels: enhancement, trivial, minor, normal, major, critical and blocker. We excluded 28,579 bugs marked as enhancement from our analysis since feature enhancements are considered to be new functionality requests rather than bugs [109].
To extract the bug histories of all reported Eclipse bugs we used the Bugzilla API. Since we focus on Eclipse Core projects only, we only considered bug reports for which the Product field was tagged with Platform, JDT, Equinox or PDE. We extracted 215,591 bug histories corresponding to these projects. Our dataset was fetched on 27 July 2020, and the earliest and latest dates of reported bugs in our dataset correspond to 11 October 2001 and 27 July 2020, respectively. In the next step, we filtered bugs based on their Version field. As our goal is to study the bug resolution process in relation to each Eclipse release, we only considered those bug reports whose version corresponds to an actual Eclipse release ranging between 3.0 and 4.15. To this end, we excluded 3,296 bugs with an unspecified Version field and 33,701 bugs corresponding to versions outside of the specified version range. We restricted ourselves to values that actually correspond to valid Eclipse releases; e.g., the valid version values of the 4.7 release found in our dataset were 4.7, 4.7.1, 4.7.1a, 4.7.2, 4.7.3 and 4.7.0 Oxygen. From the remaining bugs, we excluded 2,569 bugs that corresponded to versions that are not listed in the official releases of Eclipse, i.e., 4.0 and 4.1. Our final dataset consists of 143,606 bug reports, of which 107,397 (i.e., 74.8%) belong to the 3.x version range and the remaining 36,209 to the 4.x version range. While the analysis in Section 4.7 focuses on all these bugs, the research questions in Section 4.8 and Section 4.9 focus only on the 4.x release range during which the transition to a faster release cycle took place. Within this range, there are 29,831 bug reports for annual releases (4.2→4.8) and 6,378 bug reports for quarterly releases (4.9→4.15).
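The filtering steps described above can be sketched as a simple predicate over bug reports. This is an illustrative simplification, not the scripts used in the study: the dictionary keys (product, version, severity) and the helper names are assumptions, and the version check relies on a numeric prefix rather than on an explicit list of valid version values.

```python
import re

# Eclipse Core products retained in the study.
CORE_PRODUCTS = {"Platform", "JDT", "Equinox", "PDE"}
# Release range considered, as (major, minor) pairs.
MIN_RELEASE, MAX_RELEASE = (3, 0), (4, 15)
# Versions excluded because they are not listed as official releases.
EXCLUDED_RELEASES = {(4, 0), (4, 1)}

VERSION_PREFIX = re.compile(r"^(\d+)\.(\d+)")


def major_release(version):
    """Return (major, minor) for values such as '4.7.1a', or None if absent."""
    match = VERSION_PREFIX.match(version or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def keep(bug):
    """Decide whether a bug report (a dict with assumed keys) is retained."""
    if bug.get("product") not in CORE_PRODUCTS:
        return False
    if bug.get("severity") == "enhancement":
        return False
    release = major_release(bug.get("version"))
    if release is None or release in EXCLUDED_RELEASES:
        return False
    return MIN_RELEASE <= release <= MAX_RELEASE


if __name__ == "__main__":
    sample = [
        {"product": "JDT", "version": "4.7.0 Oxygen", "severity": "major"},
        {"product": "JDT", "version": "unspecified", "severity": "normal"},
        {"product": "PDE", "version": "4.1", "severity": "minor"},
    ]
    print([keep(bug) for bug in sample])
```

Comparing releases as integer pairs avoids the pitfall of string comparison, where "4.15" would sort before "4.2".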
For the remainder of the analysis, we partitioned all reported bugs into groups according to their major release number. For example, group 4.2 contains all reported bugs whose Version field prefix is 4.2.
The aforementioned processing steps mitigate several threats that could bias the results of our study. As pointed out by Tu et al. [127], incorrect use of bug tracking data may threaten the validity of research findings because the values of bug report fields (e.g., Version, Status, Severity) may change over time. They recommend researchers who rely on such data to mitigate data leakage risks by fully understanding their application scenarios, the origin and change of the data, and the influential bug report fields. We therefore assessed this threat for the bug report fields Version and Status in Eclipse. Examining the bugs that changed their Version field during the bug fixing cycle, we found 1,437 bugs that were reassigned to different releases throughout their history, of which 1,291 were reassigned to different major releases. We handle such bugs by considering them only for the last major release they affected, as including the same bugs in multiple releases would bias the results for our pre-release analysis. Among these bugs, only 21 out of 1,233 RESOLVED bugs were resolved in multiple major Eclipse releases, so the impact on our analysis is minimal.
In all our research questions, we consider the impact of bug severity on prioritization of bugs. We used the categorization strategy of Gomes et al. [47] to aggregate bugs into two groups: severe (including blocker, critical, and major severity) and non-severe (including normal, minor and trivial severity). We also examined the threat related to changes in the bug severity field during the bug history, and found 2,290 bug reports that changed their severity over time, of which 1,503 were reassigned to a different severity category. In those cases, we used the latest severity category assigned to each bug, as changes in the severity level indicate that prior severity levels were not accurate.
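As a rough illustration of the two rules above (assign each bug to the last major release it affected, and use its latest severity category), consider the following sketch. The field names version_history and severity_history are hypothetical; they stand for the chronologically ordered values of the Version and Severity fields of a bug.

```python
import re
from collections import Counter

SEVERE = {"blocker", "critical", "major"}
NON_SEVERE = {"normal", "minor", "trivial"}

MAJOR_PREFIX = re.compile(r"^(\d+\.\d+)")


def severity_category(severity_history):
    """Map the latest severity value of a bug to 'severe' or 'non-severe'."""
    latest = severity_history[-1]
    if latest in SEVERE:
        return "severe"
    if latest in NON_SEVERE:
        return "non-severe"
    raise ValueError(f"unexpected severity: {latest!r}")


def release_group(version_history):
    """Return the major release prefix (e.g., '4.2') of the last version."""
    match = MAJOR_PREFIX.match(version_history[-1])
    if match is None:
        raise ValueError(f"unexpected version: {version_history[-1]!r}")
    return match.group(1)


def partition(bugs):
    """Count bugs per (major release, severity category) pair."""
    counts = Counter()
    for bug in bugs:
        key = (
            release_group(bug["version_history"]),
            severity_category(bug["severity_history"]),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    bugs = [
        {"version_history": ["4.2", "4.3.1"], "severity_history": ["normal", "critical"]},
        {"version_history": ["4.3"], "severity_history": ["minor"]},
    ]
    print(partition(bugs))
```

Because only the last entry of each history is used, a bug that moved between releases or severity levels is counted exactly once.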

Table of contents :

Acknowledgements
Abstract
1 Introduction
1.1 Thesis Context
1.2 Thesis Statement
1.3 Contribution
1.4 Dissertation Structure
2 Background
2.1 Software Development Process
2.2 Release Management
2.2.1 Software Release Basics
2.2.2 Release Channels
2.2.3 Release Management Strategies
2.3 Bug Handling
2.3.1 Definitions
2.3.2 Bug Report
2.3.3 Bug Handling Process
2.3.4 Bugzilla Issue Tracking System
2.4 Summary
3 Related Work
3.1 Bug Handling Process
3.1.1 Bug triaging
3.1.2 Bug Resolution
3.2 Rapid Releases
3.2.1 Benefits and Challenges of Adopting Rapid Releases
3.2.2 Rapid Releases and Software Quality
3.3 Summary
4 Impact of Release Policies on Bug Handling Activity
4.1 Introduction
4.2 Methodology
4.2.1 Selected Case Study
4.2.2 Extracting and Processing Bug Report Data
4.2.3 Proposed Bug Handling Metrics
4.2.4 Applying the Metrics to Specific Eclipse Releases
4.3 Statistical Methods
4.4 Feedback from Eclipse Maintainers
4.5 Bug Handling Process Discovery
4.6 Applying Process Mining on Bug Handling Process
4.7 Quantitative Analysis of the Evolution of Eclipse Bug Handling Activity
4.8 Impact of Rapid Releases on the Bug Handling Process of Eclipse
4.8.1 RQ1.1 How does the bug handling rate evolve across releases?
4.8.2 RQ1.2 How does the bug handling time differ before and after each release?
4.9 Impact of Feature Freezes on Bug Handling in Eclipse
4.9.1 RQ2.1 How does the feature freeze period impact bug handling rate?
4.9.2 RQ2.2 How does the feature freeze period impact bug handling time?
4.10 Discussion
4.11 Threats to Validity
4.12 Conclusion
5 Revisiting the Impact of Rapid Releases on Software Development Activities
5.1 Introduction
5.2 Methodology
5.2.1 Selected Case Study: Mozilla Firefox
5.2.1.1 Mozilla’s Development Process
5.2.1.2 Firefox Testing & Quality Assurance
5.2.1.3 Mozilla’s Patch Uplifting Process
5.2.2 Data Processing
5.2.3 Statistical Methods
5.3 Impact of Rapid Releases on Quality Assurance
5.3.1 RQ1.1: How does switching to more rapid releases impact the number of post-release bugs?
5.3.2 RQ1.2: How does switching to more rapid releases affect the number of manually performed tests?
5.3.3 RQ1.3: How does switching to more rapid releases affect the number of testers working on a project?
5.3.4 RQ1.4: How does the frequency of intermittent test failures change after switching to more rapid releases?
5.4 Impact of Rapid Releases on Patch Uplifts
5.4.1 RQ2.1: How does the number of accepted and rejected uplifts evolve over time?
5.4.2 RQ2.2: How effective are the patch uplifts and how does this change with more rapid releases?
5.5 Discussion
5.6 Threats to Validity
5.7 Conclusion
6 Conclusion
6.1 Contributions
6.2 Limitations
6.3 Possible Research Extensions
6.4 Future Work
A Online Form
Bibliography