
## A distance-dependent test for localization

The test proposed in the previous section is a test for significant divergence over the whole range of distances, [0, 1110[ km. It is not a test for localization. In a continuous-space framework, localization can only be defined with respect to a specific threshold distance $d$. Broadly speaking, an industry is said to be localized if its plants are relatively more numerous than overall active sites within this threshold distance. As a consequence, $f_{obs}$ lies above the reference distribution ($f_{ref}$) for some distances smaller than this threshold. This implies that any positive value of the divergence $D_{obs}$ is mainly driven by a positive gap between the observed and reference distributions between 0 and $d$. On the contrary, dispersion occurs when the divergence is mainly driven by a positive gap beyond $d$. We thus introduce in this section a distance-dependent test for divergence that disentangles localization from dispersion. Let us define:

$$D_{obs}(d) = \int_0^d \left(f_{obs}(x) - f_{ref}(x)\right)^{+} \, dx \tag{1.4}$$

$D_{obs}(d)$ is the area between the two densities $f_{obs}$ and $f_{ref}$ where the former is above the latter, on the range of distances $[0, d]$ km. As previously, we can carry out a 5%-level significance test for the divergence on the $[0, d]$ km range.

An industry is said to be localized at distance $d$ if its divergence on $[0, d]$ km is greater than the 95th percentile of the distribution of divergences on $[0, d]$ km across the 1000 simulations. More generally, for any given threshold distance $d$, an industry is said to be localized at distance $d$ if the null hypothesis of the corresponding distance-dependent test is rejected. On the contrary, an industry is said to be dispersed beyond $d$ if the null hypothesis of the distance-dependent test is not rejected (i.e. the observed divergence on the $[0, d]$ km range is below the 95th percentile). Our methodology is easy to implement and allows testing for localization at many different threshold distances $d$. In the empirical part, we are able to sort industries according to the specific distance at which they appear localized.
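The distance-dependent test can be illustrated with a minimal sketch. It assumes that both densities have already been evaluated on a common increasing grid of distances and that the simulated divergences are available as a plain list; the function names, the trapezoid rule and the empirical-percentile rule are illustrative choices, not the implementation used in the chapter.

```python
import math


def positive_gap_area(f_obs, f_ref, grid, d):
    """Approximate D(d), the integral over [0, d] of max(f_obs - f_ref, 0).

    f_obs and f_ref hold density values on the increasing distance grid.
    Trapezoid rule; if d is not a grid point, the integral stops at the
    last grid point below d (assumption made for simplicity).
    """
    area = 0.0
    for k in range(len(grid) - 1):
        x0, x1 = grid[k], grid[k + 1]
        if x1 > d:
            break
        g0 = max(f_obs[k] - f_ref[k], 0.0)
        g1 = max(f_obs[k + 1] - f_ref[k + 1], 0.0)
        area += 0.5 * (g0 + g1) * (x1 - x0)
    return area


def is_localized(d_obs, d_sims, pct=95):
    """True if the observed divergence on [0, d] exceeds the pct-th
    empirical percentile of the simulated divergences on [0, d]."""
    ranked = sorted(d_sims)
    rank = math.ceil(pct * len(ranked) / 100)  # 950th value out of 1000
    return d_obs > ranked[rank - 1]
```

Calling `positive_gap_area` for several values of `d` and comparing each result with the simulated divergences on the same range gives the set of threshold distances at which an industry would be flagged as localized.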

**Employment-weighted test and index**

So far, we have only considered the number of plants in the industry and not their employment sizes. However, Ellison and Glaeser (1997) argue that both the number of plants and the employment-size distribution have to be taken into account to properly control for industrial concentration.

In this part, we develop a test for divergence and an index of divergence that take the employment-size distribution into account. In this new setup, the observed distribution no longer describes the distribution of bilateral distances between pairs of plants, but between pairs of employees in different plants (hereafter, the employment-weighted observed distribution). In this case, large plants mainly drive the shape of the observed density distribution. Indeed, consider two large plants: the bilateral distance between them has a frequency count equal to the product of their employment sizes. When we turn to the density [...] active sites of plants for a hypothetical industry with the same number of plants and the same distribution of employment across plants. Then, $\delta_W$ is an index of divergence comparable across industries with different numbers of plants and different employment distributions across plants.
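The employment weighting can be made concrete with a small sketch. It assumes planar coordinates in kilometres, Euclidean distances and a simple histogram in place of the kernel density; all names and these simplifications are ours, introduced for illustration only.

```python
import math
from itertools import combinations


def weighted_distance_shares(plants, bin_width, n_bins):
    """Employment-weighted distribution of bilateral distances.

    plants: list of (x_km, y_km, employment) tuples (hypothetical layout).
    Each pair of plants contributes the product of the two employment
    sizes, i.e. the number of employee pairs located in the two plants.
    Returns the weighted share of each distance bin.
    """
    counts = [0.0] * n_bins
    for (x1, y1, e1), (x2, y2, e2) in combinations(plants, 2):
        dist = math.hypot(x1 - x2, y1 - y2)
        k = int(dist // bin_width)
        if k < n_bins:  # pairs beyond the last bin are ignored
            counts[k] += e1 * e2
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts
```

With two large plants 5 km apart and one small plant far away, almost all of the weight falls in the first bin, whereas an unweighted count would give each of the three pairs the same importance.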

**An analogy with the methodology suggested by Ellison and Glaeser (1997)**

To confirm that our index of divergence is fully comparable across industries, let us develop an analogy with Ellison and Glaeser’s (1997) approach.

These authors emphasize that even if plants were randomly distributed over space, the cross-regional distribution of sectoral employment cannot exactly match the cross-regional distribution of overall employment, due to the limited number of plants in that industry.

In their notation, under the null hypothesis of random distribution, the expected value of their raw concentration index $G$ is not equal to zero but to a strictly positive value $(1 - S)H$, where $H$ is the Herfindahl index of the industry and $S = \sum_{i=1}^{R} x_i^2$, with $(x_i)_{i=1..R}$ the vector of overall employment shares across the $R$ regions. In our setting, $D_{rand}$ plays the same role as $(1 - S)H$ in the EG approach: it accounts for the fact that, even under randomness, the density distribution of bilateral distances cannot perfectly match the reference distribution.

Furthermore, in the EG approach, the maximum value of the expected measure of concentration $E(G)$ is equal to $1 - S$. $E(G)$ is maximum for a spatial configuration where all the plants of a given industry are located in the same region. This corresponds to the case where spillovers or natural advantages are so strong that all the plants choose to locate in the same area. In this configuration, their index of concentration (hereafter, $\gamma$) is equal to 1. In our setting, we prove that for sufficiently strong agglomeration forces, the measure of divergence between the observed and reference distributions can be made very close to 1, the upper bound of the metric.
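For readers less familiar with the EG quantities used in this analogy, the following sketch computes the raw index $G$, its expected value under randomness $(1 - S)H$, and the resulting index $\gamma$ with the standard Ellison and Glaeser (1997) formula. The function and argument names are ours, and the inputs are assumed to be vectors of shares that each sum to one.

```python
def ellison_glaeser(industry_shares, overall_shares, plant_shares):
    """Return (G, expected G under randomness, gamma).

    industry_shares: share of the industry's employment in each region.
    overall_shares:  share of overall employment in each region (x_i).
    plant_shares:    share of each plant in the industry's employment.
    """
    G = sum((s - x) ** 2 for s, x in zip(industry_shares, overall_shares))
    S = sum(x ** 2 for x in overall_shares)
    H = sum(z ** 2 for z in plant_shares)  # Herfindahl index
    expected_G = (1 - S) * H
    gamma = (G - expected_G) / ((1 - S) * (1 - H))
    return G, expected_G, gamma
```

An industry with four equally sized plants has $H = 0.25$, so even a regional distribution that matches overall employment exactly ($G = 0$) lies below the random benchmark, and $\gamma$ is negative.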

**Comparing localization and dispersion in manufacturing and service industries**

Table 1.4 outlines the sorting of manufacturing and service industries in each of the four previously mentioned cases. It clearly stands out that service industries mostly belong to the first case. This means that a majority of service industries are localized at very short distances (below 4 km). On the contrary, the patterns of localization in manufacturing industries are less distinctive. A large share of manufacturing industries register localization at longer distances (between 40 and 180 km, or even beyond).

Not only do services diverge more often from randomness, but they also mostly diverge at very short distances. The spatial scope of localization for services is significantly smaller than for manufacturing industries.

**A comparison with the EG index**

Let us compare our results (in the weighted approach) with what is found with an EG index. Table 1.7 in the introduction provides moments for the distribution of the EG index computed for both manufacturing and service industries at the employment-area level. We have already noted that the extent of spatial concentration is much smaller for services than for manufacturing industries when considering an EG index. Table 1.2 shows that the reverse result holds true with our methodology in the weighted approach. We thus argue that the EG index underestimates the extent of spatial concentration in business services.

Surprisingly enough, the test for the significance of spatial concentration originally proposed by Ellison and Glaeser (1997) is rarely computed in the literature. Table 1.7 provides the results from such a comparison. 76% of industries are found spatially concentrated (at the 5% level) with an EG index computed at the employment-area level: 74% among manufacturing industries and 85% among service industries. These figures are slightly larger than our results. As noted by Duranton and Overman (2005), the EG index overestimates the number of industries which depart from a random distribution. The gap is more pronounced for manufacturing than for service industries. Hence, we believe that our distance-based approach, which is not affected by the MAUP, is better suited than the EG index to extract meaningful information about the extent and scope of localization of service industries.

### Large plants in service industries are the main drivers of localization

A pervasive question in the literature concerns the localization patterns of the largest plants in comparison with the localization patterns of the smallest ones (see for instance Holmes and Stevens, 2002; Duranton and Overman, 2008). There are several reasons why the largest plants may be more localized. Okubo, Picard, and Thisse (2008) embed a Melitz-type model of monopolistic competition with heterogeneous firms in a new economic geography framework. Given an exogenous distribution of plant productivity, they show that, for some intermediate values of transport costs, the most productive plants, and thus the largest ones, are concentrated in one region: a sorting effect. Holmes and Stevens (2002) argue that, in a dynamic setting, if plants benefit from any kind of localization economies, they could be more productive, and hence grow faster, in areas where an industry is concentrated than outside such areas.

We reconsider this question for both manufacturing and service industries. A first approach is to compare our weighted and unweighted indices in Table 1.2. This comparison suggests that the largest plants are more unevenly located than the other plants within most service industries. The same is not observed for manufacturing industries. Indeed, the weighted index strongly depends on the location choices of the largest plants. It stands out in Table 1.2 that the distribution of the weighted index ($\delta_W$) for services is shifted to the right in comparison with the distribution of the unweighted index ($\delta_U$). Such a shift in the distribution of $\delta$ is not observed for manufacturing industries. For services, the mean and median values of $\delta_W$ are 1.5 times larger than the mean and median values of $\delta_U$.

The previous comparison is a first piece of evidence that the largest plants in services display uneven patterns of location. However, the ultimate question is whether or not these plants are more localized than plants overall within those industries. To further investigate this question, we consider a slightly different approach, inspired by Duranton and Overman (2008). For each industry with more than 100 plants, we select the 10% largest plants. We first compute the distribution of bilateral distances between these largest plants in each industry and assess its divergence against an industry-specific reference distribution. In this case, the reference distribution is the (previously defined) observed distribution of bilateral distances between all pairs of plants in the industry under scrutiny. In order to assess whether this divergence is significant or not, we randomly allocate these largest plants across the whole set of active sites for this industry. In other words, we consider that, for each industry, the potential location choices of the largest plants are the sites where a plant from the industry is located, whatever its size. This is the simplest way to handle a within-industry comparison of the location choices of the largest plants.
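The within-industry counterfactual can be sketched as follows. The sketch assumes planar coordinates and Euclidean distances, rounds the 10% cut-off to the nearest integer, and uses the share of plant pairs within a distance `d` as a simple stand-in for the divergence measure; these choices and all names are illustrative assumptions rather than the procedure actually coded for the chapter.

```python
import math
import random
from itertools import combinations


def largest_plants(plants, share=0.10):
    """Select the `share` largest plants by employment (at least one).

    plants: list of (x_km, y_km, employment) tuples (hypothetical layout).
    """
    n_top = max(1, round(len(plants) * share))
    return sorted(plants, key=lambda p: p[2], reverse=True)[:n_top]


def share_of_pairs_within(sites, d):
    """Share of site pairs at most d km apart (needs at least two sites)."""
    pairs = list(combinations(sites, 2))
    close = sum(1 for (x1, y1), (x2, y2) in pairs
                if math.hypot(x1 - x2, y1 - y2) <= d)
    return close / len(pairs)


def simulated_shares(plants, n_top, d, n_sims, rng):
    """Reallocate n_top plants at random, without replacement, across all
    sites occupied by the industry, and recompute the statistic each time."""
    all_sites = [(x, y) for x, y, _ in plants]
    return [share_of_pairs_within(rng.sample(all_sites, n_top), d)
            for _ in range(n_sims)]
```

The observed statistic for the largest plants can then be compared with the 95th percentile of the simulated values, in the same way as for the industry-level test.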

#### First step: estimating individual firm productivity

In the first step, productivity is computed for each firm, while controlling for the quality of its labor force and sector-specific (unobserved) determinants of productivity. Firm total factor productivity is predicted as the residual of a Cobb-Douglas production function estimation:

$$\log(VA_{it}) = c_{st} + \theta_{multi} + \alpha_s \log(L_{it}) + \beta_s \log(K_{it}) + \sum_{q=2}^{3} \delta_{qt}\, sh_{iqt} + u_{it} \tag{2.1}$$

• $i$ indexes the firm, $s$ the sector of main activity of the firm and $t$ the time.

• $L_{it}$ is a measure of employment. In our data, employment is measured as the number of working hours.

• $sh_{iqt}$ is the share of hours worked by employees of skill group $q$. Workers are divided into groups according to their qualifications. This aims at controlling for the quality of the labor force. Using the French occupation classification, we set up three categories of skills: (Q3) for highly skilled workers (engineers, technicians and managers), (Q2) for skilled workers (skilled blue and white collars), and (Q1) for unskilled workers, interns and part-time workers.

• $K_{it}$ is a measure of the capital stock. In our dataset, this measure consists of the book value of tangible and intangible non-financial assets.

• $\theta_{multi}$ is a dummy equal to 1 if the firm controls more than one plant.

• $c_{st}$ is a sector- and time-specific fixed effect. It captures any sector- and time-specific determinants of productivity such as: 1/ a sector-specific price index for the value-added, 2/ a sector-specific age and depreciation rate of the capital stock, 3/ any sector-specific macroeconomic shocks likely to affect value-added and input choices.

Labor and capital elasticities are assumed to be sector-specific. Hence, we assume that the production technology is the same for all firms in the same sector. We do not constrain the production function to have constant returns to scale. In other words, any increasing returns to scale internal to the firm are controlled for and do not bias our measure of productivity.
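As an illustration of this first step, the sketch below estimates the production function by ordinary least squares within a single sector-year cell, so that the fixed effect reduces to an intercept, and returns the residuals as firm productivity. The dictionary keys, the function names and the hand-written solver are assumptions made to keep the example self-contained; the estimator actually used on the full panel may differ.

```python
import math


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination
    with partial pivoting. X is a list of rows, y a list of values."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    b = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(A[r][c] * b[c] for c in range(r + 1, k))
        b[r] = (A[r][k] - tail) / A[r][r]
    return b


def tfp_residuals(firms):
    """firms: list of dicts with hypothetical keys va, L, K, multi, sh2, sh3,
    all belonging to ONE sector-year cell.

    Returns (coefficients, residuals); coefficients are ordered as
    [intercept, multi, log L, log K, sh2, sh3] and each residual is the
    estimated log productivity of the corresponding firm.
    """
    X = [[1.0, f["multi"], math.log(f["L"]), math.log(f["K"]),
          f["sh2"], f["sh3"]] for f in firms]
    y = [math.log(f["va"]) for f in firms]
    b = ols(X, y)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row))
             for row, yi in zip(X, y)]
    return b, resid
```

Because the labor and capital coefficients are left unconstrained, their sum is free to differ from one, which is how the specification allows for non-constant returns to scale.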

A sector is defined as an item of the 3-digit French industrial classification (NAF220). More details about the data are provided in appendix A.

**Table of contents:**

**1 Location patterns of services in France: A distance-based approach**

1.1 Introduction

1.2 Methodology

1.2.1 Testing whether industrial location patterns significantly diverge from randomness

1.2.2 A distance-dependent test for localization

1.2.3 An index of divergence

1.2.4 Employment-weighted test and index

1.2.5 Comparisons with the previous literature

1.3 Data

1.4 Cross-industry results

1.4.1 Result #1: Uneven patterns of location are more pervasive for business services than for manufacturing industries

1.4.2 Result #2: Services are localized at shorter distances than manufacturing industries

1.4.3 Comparisons with alternative measures of localization

1.4.4 Robustness checks

1.5 Within-industry results

1.5.1 Result #3: Large plants in service industries are the main drivers of localization

1.5.2 Result #4: For most service industries, new plants reduce localization whereas exiters reinforce it

1.6 Conclusion

1.7 Appendix to chapter 1: On the consistency of the test for divergence

1.8 Complementary tables

**2 Agglomeration economies and firm productivity: Estimation from French individual data**

2.1 Introduction

2.2 Related Literature

2.3 Estimation strategy and econometric issues

2.3.1 First step: estimating individual firm productivity

2.3.2 Computing average productivity per cluster

2.3.3 Second step: explaining disparities in average firm productivity across clusters

2.4 Proxies for urbanization and localization economies

2.4.1 Urbanization economies

2.4.2 Localization economies

2.5 Main results

2.5.1 The magnitude of urbanization economies

2.5.2 The magnitude of localization economies

2.5.3 Sectoral heterogeneity

2.6 Robustness tests

2.7 Conclusion

2.8 Appendix to chapter 2: Data

2.9 Complementary tables

**3 Marshall’s scale economies: A quantile regression approach**

3.1 Introduction

3.2 Firm TFP estimation: Model and data

3.2.1 Firm and establishment data

3.2.2 Production function estimation

3.3 Agglomeration economies: the traditional linear-in-mean regression model

3.3.1 The traditional linear-in-mean regression model

3.3.2 Proxies for agglomeration economies

3.3.3 Results for the traditional linear-in-mean regression model

3.4 Agglomeration economies: a quantile regression approach

3.4.1 Limitations of the traditional linear-in-mean regression model

3.4.2 The quantile regression model

3.5 Results for the quantile regression model

3.5.1 Model A

3.5.2 Model B

3.5.3 Model C

3.5.4 Results by industry

3.6 Conclusion

3.7 Appendix to chapter 3: Complementary tables

**4 Product complexity, quality of institutions and the pro-trade effect of immigrants**

4.1 Introduction

4.2 Model specification, econometrics and data

4.3 The pro-trade effect of immigrants

4.3.1 Benchmark results

4.3.2 An instrumental variable approach

4.4 Product complexity, quality of institutions and immigration

4.5 Conclusion

4.6 Appendix A to chapter 4: Data on trade and immigration

4.7 Appendix B to chapter 4: Matching the NST/R and Rauch’s classifications

4.8 Complementary tables

**5 Dots to boxes: Do the size and shape of spatial units jeopardize economic geography estimations?**

5.1 Introduction

5.2 The Modifiable Areal Unit Problem: A Quick Tour

5.2.1 A simple illustration of the MAUP

5.2.2 Mean and variance distortions: a first illustration with simulated data

5.2.3 Correlations distortions

5.3 Zoning systems and data

5.3.1 Administrative zoning systems

5.3.2 Grid zoning systems

5.3.3 Partly random zoning systems

5.3.4 Characteristics of zoning systems

5.4 Spatial concentration

5.4.1 Gini indices

5.4.2 Ellison and Glaeser indices

5.4.3 Comparison between the Gini and the EG

5.5 Agglomeration economies

5.5.1 A wage-density simple correlation

5.5.2 Controlling for skills and experience

5.5.3 Market potential as a new control

5.5.4 An alternative definition of market potential

5.6 Gravity equations

5.6.1 Basic gravity

5.6.2 Augmented Gravity

5.7 Conclusion

5.8 Appendix to chapter 5: Data

**Bibliography**