Chapter 2 Literature Review
Shape-restricted Density Estimation
Density estimation under shape restrictions has been studied and gradually improved upon over the last few decades. The allure is the prospect of obtaining fully automatic nonparametric estimators that require no tuning parameter, whose value is often difficult to choose. Furthermore, shape-restricted density estimators allow the modelling assumptions to be tailored to match reality more closely, and they ensure that estimates have the desired shape characteristics for every sample, not just on average or asymptotically (Wolters, 2012).
Shape-restricted density estimation essentially assumes that $f_0 \in \mathcal{F}$, where $f_0$ is the true density and $\mathcal{F}$ contains all functions that satisfy certain shape constraints. Choosing a suitable family $\mathcal{F}$ is important. It should be "large" enough to include the true $f_0$, or at least to ensure that $f_0$ can be well approximated by elements of $\mathcal{F}$, and "small" enough that the LSE or the MLE is well-defined and consistent with respect to certain norms (Chen, 2013). The shape restrictions imposed on $\mathcal{F}$ should correspond to real-world problems, and their use in practice should be easily justifiable.
Once a family $\mathcal{F}$ is chosen, one can use either the least squares estimator (LSE) or the maximum likelihood estimator (MLE) to estimate $f_0$. Given a suitable family $\mathcal{F}$, the LSE is defined as
$$\hat{f} = \operatorname*{argmin}_{f \in \mathcal{F}} \left\{ \int f^2(x)\,\mathrm{d}x - 2 \int f(x)\,\mathrm{d}F_n(x) \right\},$$
where $F_n$ denotes the empirical distribution obtained from the sample. The MLE, on the other hand, maximizes the log-likelihood:
$$\hat{f} = \operatorname*{argmax}_{f \in \mathcal{F}} \int \log f(x)\,\mathrm{d}F_n(x).$$
The qualitative constraints that have received the most attention in the literature are monotonicity, unimodality and convexity.
We note that, throughout the thesis and for simplicity of notation, we typically write $x_1, x_2, \ldots, x_n$ for the ordered copy of a random sample, i.e., the order statistics, owing to the nature of shape-restricted estimation. For methods under no shape restrictions, a random sample need not be ordered, and the formulae of such methods typically do not depend on the ordering of the sample. Distinctions will be made if any ambiguity arises.
Monotonicity
Monotonicity is the most basic shape constraint for a real-valued function on $\mathbb{R}$. In many situations, this constraint can be imposed on the data in a straightforward way. Grenander (1956) studied nonparametric maximum likelihood estimation of a non-increasing density function on $[0, \infty)$ and showed that the estimator must be a step function with jumps only at some of the observations. The pool-adjacent-violators algorithm (PAVA) (Ayer et al., 1955; Robertson et al., 1988) can be applied to compute the estimate; more details about PAVA will be described later. However, this estimator is inconsistent at the mode (zero here); see Balabdaoui et al. (2011).
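To fix ideas, here is a minimal sketch (an illustration, not taken from the thesis) of weighted PAVA and of how it yields the Grenander estimator: the estimate is the step function whose heights are the non-increasing PAVA fit of the naive heights $1/(n\Delta_i)$ on the spacings $\Delta_i$ between consecutive order statistics, weighted by those spacings. The function names are illustrative, and distinct positive observations are assumed.

```python
import numpy as np

def pava_decreasing(y, w):
    """Pool-adjacent-violators for a non-increasing fit.

    y: candidate block values, w: positive weights. Adjacent blocks
    that violate the non-increasing order are pooled into their
    weighted average until the fit is monotone.
    """
    vals, wts, sizes = list(y), list(w), [1] * len(y)
    i = 0
    while i < len(vals) - 1:
        if vals[i] < vals[i + 1]:          # violation of non-increasing order
            tot = wts[i] + wts[i + 1]
            vals[i] = (wts[i] * vals[i] + wts[i + 1] * vals[i + 1]) / tot
            wts[i] = tot
            sizes[i] += sizes[i + 1]
            del vals[i + 1], wts[i + 1], sizes[i + 1]
            i = max(i - 1, 0)              # re-check the previous block
        else:
            i += 1
    return np.repeat(vals, sizes)

def grenander(x):
    """Grenander estimator of a non-increasing density on [0, inf).

    x: sample of distinct positive observations. Returns the step
    heights on the intervals between consecutive order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    gaps = np.diff(np.concatenate(([0.0], x)))   # spacings, starting from 0
    naive = 1.0 / (n * gaps)                     # naive histogram-type heights
    return pava_decreasing(naive, gaps)
```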
Unimodality
Unimodality is cited as a reasonable assumption in many problems. Note that the class of unimodal densities includes monotone densities as a special case. The Grenander (1956) estimator can be straightforwardly extended to the case of a unimodal density with a known mode, but it does not directly adapt to the case of an unknown mode (Rao, 1969; Woodroofe and Sun, 1993). Woodroofe and Sun (1993) proposed a consistent maximum likelihood estimator by introducing a penalty term on the value at the mode. Bickel and Fan (1996), who also used the PAVA, discussed some problems in unimodal density estimation and plugged in a consistent point estimate of the mode location. Meyer and Woodroofe (2004) used linear splines to develop a consistent decreasing density estimator that is forced to be concave on an interval containing the mode. Meyer (2012) proposed a quadratic spline estimator for a decreasing density function; a unimodal density estimator is obtained by piecing together two isotonic density estimators at a known mode. A smooth log-concave density estimator was proposed by Dümbgen and Rufibach (2009).
Anderson-Bergman (2014) introduced a new, more flexible shape constraint, “inverse convex”, for survival analysis and other types of heavy-tailed data.
Convexity
Groeneboom et al. (2001) considered a piecewise linear estimator for decreasing and convex density estimation, but this estimator has a tendency to spike at the mode. A support reduction algorithm was proposed by Groeneboom et al. (2008) to compute this estimator. A cubic spline estimator for decreasing and convex density functions was developed by Meyer (2012).
Of course, many other shape restrictions have also been studied; see Bartoszynski et al. (1981), who used maximum likelihood to estimate the intensity function of a nonstationary Poisson process, and Aït-Sahalia and Duarte (2003), Yatchew and Härdle (2006), Birke and Dette (2007) and Horowitz and Lee (2015) for convexity constraints based on least squares.
Nonparametric Estimation of a Unimodal Density Function
A density $f$ on the real line is said to be unimodal if there exists a point $M$ such that $f$ is nondecreasing on $(-\infty, M)$ and nonincreasing on $(M, \infty)$; $M$ is then known as the mode of the density. When the true density is unimodal, there are two good reasons to enforce unimodality. First, making use of the shape information helps improve estimation accuracy (Wolters, 2012). Second, incorporating the constraint eliminates spurious modes that may reduce the effectiveness of the density estimate as an exploratory tool and communication aid (Wolters, 2012). Maximum likelihood estimation of a unimodal density with a known mode can be accomplished by using two decreasing estimators on either side of the mode. However, if the mode is unknown and must be estimated as well, the maximum likelihood estimator does not exist, because the likelihood is unbounded when the mode is allowed to vary (Birgé, 1997). Several smoothed unimodal estimators have been proposed using kernel and spline methods.
Kernel-based estimation
The kernel method is the most popular and conceptually simplest nonparametric approach. It is widely applicable, particularly in the univariate case, and is probably the method whose properties are best understood (Parzen, 1962; Fryer, 1977; Silverman, 1986). It is defined as a weighted average of kernel functions centred at the observed values. Given $x_i$, $i = 1, \ldots, n$, the kernel density estimator (KDE) is
$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$
where $K(\cdot)$ is the kernel function, satisfying $\int K(x)\,\mathrm{d}x = 1$, and $h$ is a positive number known as the bandwidth. The behaviour of a KDE relies strongly on the choice of the smoothing parameter $h$.
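For concreteness, a minimal sketch of this estimator with a Gaussian kernel; the function name `kde` and the bandwidth value below are illustrative choices:

```python
import numpy as np

def kde(x_grid, data, h):
    """Gaussian kernel density estimate evaluated on x_grid.

    Implements f_hat_h(x) = (1/(n*h)) * sum_i K((x - x_i)/h) with a
    standard Gaussian kernel K.
    """
    u = (x_grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / (len(data) * h)

# Example: estimate a density from a small simulated sample.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)
grid = np.linspace(-4.0, 4.0, 201)
f_hat = kde(grid, sample, h=0.4)
```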
Many research efforts have been devoted to smooth estimation of a unimodal density function using kernels. Silverman (1981) proposed a bandwidth test for unimodality based on nonparametric density estimation; however, this test cannot form the basis for a unimodal density estimator. For the case of a known mode, Fougères (1997) proposed a unimodal estimator based on a unimodal rearrangement of the kernel estimator. Cheng et al. (1999) treated a general unimodal density as a transformation of a known unimodal template and introduced a recursive method for estimating the transformation, constructing a smooth estimator within the algorithm via the kernel technique. A kernel estimator was also obtained as the derivative of the least concave majorant of the distribution by Eggermont and LaRiccia (2000). Hall and Huang (2002) proposed a method for rendering a standard kernel density estimator unimodal by minimizing the integrated squared distance between a conventional density estimator and a reweighted version of it. Removing spurious wiggles in the tails of the conventional estimator, however, can cause a detrimental increase in the density estimate elsewhere, leading to poor mean squared error performance; the method also commonly runs into difficulty with heavy-tailed distributions. A kernel method for estimating monotone, convex and log-concave densities was proposed by Birke (2009). Dümbgen and Rufibach (2009) proposed a smooth log-concave density estimate by convolving their nonparametric maximum likelihood density estimate with a Gaussian density, which preserves the log-concavity shape constraint. This estimate was further studied by Chen and Samworth (2013), who developed a new test of log-concavity, and by Rufibach (2012), who developed a new smooth estimator of the ROC curve based on the log-concavity assumption for the constituent distributions.
It would be preferable if unimodality could be achieved by adding a conceptually simple modification to a standard nonparametric estimator. Data sharpening, as advanced by Braun and Hall (2001), is one approach that operates in this way and can improve the performance of numerous estimators. Data sharpening refers to methods for preprocessing data. Since their introduction by Choi and Hall (1999) and Choi et al. (2000), data sharpening methods have become an attractive way to achieve unimodality by perturbing the data, improving the performance of the standard kernel density estimator through a simple modification. Data sharpening alters the positions of the data values so as to minimize a measure of the total distance the data are moved, subject to unimodality of the estimator. Given $\mathbf{x} = (x_1, \ldots, x_n)^T$, and letting $\mathbf{y} = (y_1, \ldots, y_n)^T$ be the new, sharpened data vector, the original problem of unimodal density estimation can be set up as a sharpened KDE problem that is algebraically the same as the standard kernel density estimator, with a subscript $\mathbf{y}$ added to $\hat{f}$ to indicate which data vector produces the estimate; the usual KDE is $\hat{f}_{\mathbf{x}}$, i.e., the case $\mathbf{y} = \mathbf{x}$. The best sharpened data vector $\mathbf{y}$ can be defined as a solution to the constrained minimization problem
$$\hat{\mathbf{y}} = \operatorname*{argmin}_{\mathbf{y}} D(\mathbf{y}, \mathbf{x}),$$
subject to the unimodality constraint that (with $h$ fixed) there exists $m$ such that $\hat{f}_{\mathbf{y}}'(x) \ge 0$ when $x \le m$ and $\hat{f}_{\mathbf{y}}'(x) \le 0$ when $x \ge m$. Here $D$ is a nonnegative and symmetric distance function, e.g., the Euclidean distance.
A data-sharpening method requires the choice of a distance function $D$. A natural choice, used by Braun and Hall (2001) and Hall and Kang (2005), is a norm of the difference $\mathbf{y} - \mathbf{x}$, defined as
$$D(\mathbf{y}, \mathbf{x}) = \|\mathbf{y} - \mathbf{x}\|_p = \left( \sum_{i=1}^{n} |y_i - x_i|^p \right)^{1/p}$$
for some $p \ge 1$.
Braun and Hall (2001) successfully applied data sharpening to obtain unimodal estimates, without providing any theoretical support or clear guidance on the choice of $p$. Hall and Kang (2005) established both theoretical and numerical properties of data sharpening based on the $L_1$ distance function, producing a smooth unimodal estimator with very good mean squared error performance. Wolters (2009) proposed a greedy algorithm for unimodal kernel density estimation, following Braun and Hall (2001) in applying data sharpening to a KDE.
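As a rough illustration of this constrained problem, the following minimal sketch (not any of the cited authors' algorithms) replaces the hard unimodality constraint with a soft penalty on derivative-sign violations of a Gaussian KDE, treats the mode as known, and uses the Euclidean distance; `sharpen`, `kde_deriv` and the penalty weight `lam` are illustrative names and choices.

```python
import numpy as np
from scipy.optimize import minimize

def kde_deriv(grid, y, h):
    """Derivative of a Gaussian KDE built from the data vector y."""
    u = (grid[:, None] - y[None, :]) / h
    # d/dx [phi((x - y_i)/h) / h] = -u * phi(u) / h^2
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return (-u * phi).sum(axis=1) / (len(y) * h**2)

def sharpen(x, h, mode, grid, lam=1e4):
    """Data sharpening toward a unimodal KDE (soft-penalty sketch).

    Minimizes ||y - x||^2 plus lam times the squared violations of
    unimodality: the KDE derivative should be >= 0 left of `mode`
    and <= 0 right of it. The hard constraint in the text is replaced
    by a penalty here, and the mode is treated as known.
    """
    x = np.asarray(x, dtype=float)
    left, right = grid <= mode, grid >= mode

    def objective(y):
        d = kde_deriv(grid, y, h)
        pen = (np.minimum(d[left], 0.0) ** 2).sum() \
            + (np.maximum(d[right], 0.0) ** 2).sum()
        return ((y - x) ** 2).sum() + lam * pen

    res = minimize(objective, x0=x, method="BFGS")
    return res.x  # the sharpened data vector
```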
A common feature of kernel-based methods is that they introduce tuning parameters, such as the order of the kernel or the bandwidth. Choosing these parameters properly is far from trivial and has attracted considerable research effort; see Fryer (1977), Cao et al. (1994) and Jones et al. (1996). An inappropriate bandwidth brings the danger of under- or over-smoothing. Bandwidth selection also usually involves minimizing a measure of the global accuracy of a curve estimate, such as the Integrated Squared Error (ISE), the Mean ISE (MISE) or other performance measures; a sketch of one such criterion is given below. However, these criteria are not good at capturing the unimodality of a density function.
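As a concrete illustration of such criteria, the following is a minimal sketch (an illustration, not from the thesis) of least-squares cross-validation for a Gaussian KDE; it exploits the closed form of $\int \hat{f}_h^2$ available for the Gaussian kernel.

```python
import numpy as np

def gauss(u, s):
    """Gaussian density with standard deviation s evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv(h, x):
    """Least-squares cross-validation score for a Gaussian KDE.

    LSCV(h) = int f_hat_h^2 dx - (2/n) sum_i f_hat_{h,-i}(x_i), where
    the first term equals (1/n^2) sum_{i,j} phi_{sqrt(2)h}(x_i - x_j)
    for the Gaussian kernel.
    """
    n = len(x)
    d = x[:, None] - x[None, :]
    term1 = gauss(d, np.sqrt(2.0) * h).sum() / n**2
    K = gauss(d, h)
    np.fill_diagonal(K, 0.0)                 # leave-one-out: drop i = j terms
    term2 = 2.0 * K.sum() / (n * (n - 1))
    return term1 - term2

# Choose h by minimizing LSCV over a grid of candidate bandwidths:
# candidates = np.linspace(0.05, 1.0, 40)
# h_best = min(candidates, key=lambda h: lscv(h, sample))
```

With the shape of unimodality known, spline estimators based on maximum likelihood are an alternative approach to the estimation of a unimodal density.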
Spline-based estimation
A spline is a piecewise polynomial constructed in such a way that it is continuous, or even smooth, at the points, called knots, at which two polynomials are pieced together. Splines can approximate virtually any smooth function, at least if a sufficiently large number of knots is used. They extend the advantages of polynomials to include greater flexibility, local effects of parameter changes and the possibility of imposing shape constraints on the estimate.
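As a small illustration of this flexibility, the sketch below evaluates a cubic B-spline basis with SciPy; the knot placement and coefficients are arbitrary choices for illustration.

```python
import numpy as np
from scipy.interpolate import BSpline

# Cubic splines (k = 3) on [0, 1] with three interior knots. Repeated
# ("clamped") boundary knots make the basis span all cubic splines on
# these knots with continuous first and second derivatives.
k = 3
interior = np.array([0.25, 0.5, 0.75])
t = np.concatenate((np.zeros(k + 1), interior, np.ones(k + 1)))
n_basis = len(t) - k - 1                      # here: 7 basis functions

# Evaluate every basis function on a grid; any cubic spline on these
# knots is B @ coefs for some coefficient vector coefs.
x = np.linspace(0.0, 1.0, 101)
B = BSpline(t, np.eye(n_basis), k)(x)         # shape (101, 7)
coefs = np.array([0.0, 0.5, 1.0, 2.0, 1.0, 0.5, 0.0])
spline_values = B @ coefs                     # a smooth, unimodal-shaped curve
```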
Bickel and Fan (1996) proposed several methods for estimating a unimodal density based on maximum likelihood. They applied a pregrouping technique to the maximum likelihood method to reduce peaking problems and save computational cost. A plug-in MLE was introduced first, but it is discontinuous. They then introduced a smoother estimate by finding the MLE satisfying the monotonicity restrictions among linear splines. However, this linear spline MLE gives zero mass outside the range of the observed values and does not produce a qualitatively different curve from the plug-in MLE itself. A smoothed curve was further obtained by solving an isotone cubic spline regression problem. Let $\hat{f}$ be the plug-in MLE of a unimodal density of Bickel and Fan (1996). Denote by $z_1, \ldots, z_N$ the midpoints of the $\hat{f}$ histogram bins, and by $y_1, \ldots, y_N$ the corresponding heights. As shown in Bickel and Fan (1996), the number of bins should be 25–50, depending on the number of data points. Setting $z_2, z_6, \ldots, z_{4k+2}$ ($k = \lfloor (N-2)/4 \rfloor$) as initial knots, an isotone cubic spline is then fitted to the points $(z_i, y_i)$.

Meyer and Woodroofe (2004) introduced a consistent version using a decreasing linear spline estimator with a concave interval at the mode and determined its rate of convergence in the Hellinger metric. If the concavity assumption is valid over a given interval, this estimator needs no user-defined penalty or smoothing parameter; otherwise, the length of the concavity interval can be used as a penalty device and allowed to go to zero as the sample size increases. Meyer (2008) proposed an algorithm for the cubic monotone case and extended it to a convexity constraint, as well as some variants such as increasing concavity.

Later, Meyer (2012) also obtained a smooth unimodal estimator by introducing quadratic splines with knots spaced over the support. The mode can be given, or estimated sufficiently quickly using polynomial kernel density estimation; see Eddy (1980). With a known mode, and assuming without loss of generality that the mode is 0, the method places $k_1$ interior knots to the left of the mode and $k_2$ interior knots to the right. There are $k_1 + k_2 + 3$ knots in all, after adding the mode itself and the exterior knots encompassing the domain $(d_l, d_u)$ of the function. The construction starts with the basis functions for the decreasing part (to the right of the mode). Let $\delta_1, \ldots, \delta_{k_2+2}$ be the right-hand basis functions defined on the knots $0 = t_0 < t_1 < \cdots < t_{k_2} < t_{k_2+1} = d_u$, and let $\delta_{k_2+3}, \ldots, \delta_{k_1+k_2+4}$ be the increasing basis functions defined on the $k_1$ interior knots to the left of the mode. With $m = k_1 + k_2 + 4$, a smooth unimodal density estimator can be written as
$$\hat{f}(x) = \sum_{j=1}^{m} s_j \delta_j(x),$$
subject to $s_j \ge 0$ for $j = 1, \ldots, m$ and $\sum_{j=1}^{k_2+2} s_j = \sum_{j=k_2+3}^{k_1+k_2+4} s_j$. A method for selecting the number of weights ($m$) is provided, and the weights are chosen through quadratic programming techniques subject to linear inequality constraints.
Sometimes, a measure of inaccuracy such as the integrated squared error does not reliably reflect qualitative fidelity (see, e.g., Kooperberg and Stone, 1991). To make good use of the available shape information, one can apply the logspline model to smooth unimodal estimation. In logspline density estimation, the logarithm of the unknown density function is approximated by a polynomial spline, whose unknown coefficients are estimated by maximum likelihood. Logspline density estimation for univariate data can be found in Kooperberg and Stone (1992), Stone et al. (1997), Koo et al. (1999) and Koo and Kooperberg (2000).
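A toy sketch of the idea (a simplified stand-in, not the published logspline algorithm) might look as follows: the log-density is a linear combination of a truncated-power cubic basis, and the coefficients are fitted by maximizing the likelihood with a numerically computed normalizing constant. The function name, basis and knot choices are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def logspline_density(x, knots, grid):
    """Toy logspline sketch: log f(z) = B(z) @ c - log-normalizer.

    B is a truncated-power cubic spline basis on `knots`; the
    coefficients c are chosen by maximum likelihood, with the
    normalizing constant computed numerically on `grid`.
    """
    def basis(z):
        z = np.asarray(z, dtype=float)
        cols = [z, z**2, z**3]
        cols += [np.maximum(z - t, 0.0) ** 3 for t in knots]
        return np.column_stack(cols)

    Bx, Bg = basis(x), basis(grid)

    def neg_loglik(c):
        eta = Bg @ c
        m = eta.max()                                   # numerical stability
        log_norm = m + np.log(np.trapz(np.exp(eta - m), grid))
        return -(Bx @ c).sum() + len(x) * log_norm

    res = minimize(neg_loglik, np.zeros(Bx.shape[1]), method="BFGS")
    eta = Bg @ res.x
    f = np.exp(eta - eta.max())
    return f / np.trapz(f, grid)                        # density values on `grid`

# Example: knots at the sample quartiles, density returned on a grid.
# f_hat = logspline_density(sample, np.quantile(sample, [0.25, 0.5, 0.75]),
#                           np.linspace(sample.min(), sample.max(), 400))
```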
1. Introduction
1.1 Nonparametric Density Estimation
1.2 Nonparametric Density Estimation under Shape Restrictions
1.3 Motivation
1.4 Contributions
1.5 Outline of the Thesis
2. Literature Review
2.1 Shape-restricted Density Estimation
2.2 Nonparametric Estimation of a Unimodal Density Function
2.3 Nonparametric Estimation of a Log-concave Density Function
2.4 Nonparametric Estimation of a Unimodal and Heavy-tailed Distribution
2.5 Nonparametric Mixtures
3. A Fast Algorithm for Log-concave Density Estimation
3.1 Introduction
3.2 Characterization of the Nonparametric Maximum Likelihood Estimate
3.3 Computation
3.4 Convergence
3.5 Numerical Studies
3.6 Summary
4. Smooth Log-Concave Density Estimation
4.1 Introduction
4.2 Smoothness Assumption
4.3 Computation
4.4 Assessing Log-concavity
4.5 Simulation Studies
4.6 Real-world Data
4.6.3 Timings
4.7 Summary
5. An Application of Log-concave Density Estimation: ROC Curve Estimation
5.1 Introduction
5.2 ROC Curve Estimation Based on Log-concave Density Estimates
5.3 Simulation Studies
5.4 An Example
5.5 Summary
6. Nonparametric Estimation for Heavy-tailed Distributions under Shape Restrictions
6.1 Empirical Motivation
6.2 Introduction
6.3 Maximum Likelihood Estimation for Unimodal Heavy-tailed Distributions
6.4 Computation
6.5 Bootstrap Test
6.6 Simulation Studies
6.7 Financial Data
6.8 Summary
7. Heavy Tails and Value at Risk Estimation
7.1 Introduction
7.2 Estimators for Comparison
7.3 Heavy Tails Analysis
7.4 VaR estimation
7.5 Summary
8. Summary and Future Works
8.1 Summary
8.2 Future Works