Chapter 2: Literature Review
During the course of this study, other research on assembling or predicting inventories of building presence and dimensions was sought out. The search turned up a moderate quantity of relevant information. The two oldest studies examined are in many ways conceptually similar to the author’s study. Another, newer source is a complex piece of software, of which only a portion is of interest to this study. The final item detailed, also the most recent, is merely a source of building information, though it shares a very similar objective with this study. Two supplementary items of significance are also noted.
Prediction/estimation of Building Presence and/or Dimensions
Two papers proved particularly relevant in addressing relationships between building characteristics and other factors. Both of these were published by the Cornell Institute for Social and Economic Research. The first paper, by Jones et al. (1987), refers to earlier work by some of the authors proposing that certain “empirical regularities…. would make it possible to build models which would describe the characteristics of building stocks” (Jones et al. 1987, 2). These models would be valid in various geographic locations, even though the particulars of a model might need to be adjusted for each location. For the work detailed in the paper, an inventory was assembled of all buildings in Sedgwick County, Kansas, including the city of Wichita, in order to define a “model, which could be applied to other areas for other uses” (Jones et al. 1987, 6). This model proposes “that relationships between buildings and people are relatively stable” (Jones et al. 1987, 6). Wichita is shown to exhibit values for floor area of primary buildings per person similar to those of two other cities from earlier studies.
This study also partitioned the county into “annules from the Center of Population” (Jones et al. 1987, 8). For each annule, “values per square mile were calculated… outward from the Center of Population for resident population, and number, area and replacement cost for total, residential and non-residential buildings” (Jones et al. 1987, 11). The majority of these density-adjusted variables display “monotonic declines outward from the center [of Population]” (Jones et al. 1987, 11). Moreover, plotting both the log of population density and the log of building density against distance reveals a striking correlation between the two logged variables (Jones et al. 1987). The near linearity visible in the plots of these variables over distance, and to an even greater extent in the plot of the log of building area per square mile, demonstrates the applicability of a “negative exponential function” (Jones et al. 1987, 11).
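The link between near-linear log plots and a negative exponential function can be illustrated with a short sketch. It assumes the common form density = d0 * exp(-b * distance), so that the log of density is a straight line in distance; the function name, the annule midpoints, and the density values are hypothetical and are not taken from Jones et al.

```python
import numpy as np


def fit_negative_exponential(distance_mi, density_per_sq_mi):
    """Fit density = d0 * exp(-b * distance) by least squares on log(density).

    Taking logs gives log(density) = log(d0) - b * distance, a straight line,
    which is why near-linear log plots point to a negative exponential form.
    """
    slope, intercept = np.polyfit(distance_mi, np.log(density_per_sq_mi), 1)
    return float(np.exp(intercept)), float(-slope)


# Hypothetical annule midpoints (miles) and building densities per square mile.
# These are illustrative values only, not data from the Sedgwick County study.
distance = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5])
density = 2000.0 * np.exp(-0.4 * distance)

d0, b = fit_negative_exponential(distance, density)
print(f"central density ~ {d0:.1f} per sq. mi., decay rate ~ {b:.3f} per mile")
```

With real inventory data the points would scatter around the fitted line, and the residuals on the log scale would indicate how well the negative exponential form holds.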
While the study by Jones et al. focuses primarily on Sedgwick County, and to a lesser degree, cities from earlier, related studies, mention is made of other work looking at estimates of “building losses” (Jones et al. 1987, 2). Thought is also given to the wider applicability of the study’s results. Jones et al., speaking of buildings in “urban regions” (Jones et al. 1987, 17), state the following:
Preliminary analysis indicates empirical regularities characterize their distributions over categories and by categories over space. If these relationships are found to be general ones, it will facilitate the task of making indirect estimates of existing buildings in areas of interest and determining their attributes (Jones et al. 1987, 17-19).
It is not known whether subsequent studies found relationships in other areas similar to those reported by Jones et al.
The inventory of buildings described by Jones et al. was acquired via two different methods. One of these involved scrutiny of tax assessor’s data for Sedgwick County (Jones et al. 1987). The second method is mentioned in Jones et al., but is expounded upon in a separate, related study by Johnson (1986). In Johnson’s work, buildings were inventoried through study, where available, of large-scale aerial photography. Where no aerial photography was available, primarily in lesser-developed portions of the county, buildings were inventoried from USGS 7.5’ Quadrangle maps. The final building count for Sedgwick County using Johnson’s method was compared with the building count from tax assessor’s data; less than 0.5% difference was found between the two counts (Johnson 1986).
Sampling proved useful in Johnson’s study (1986), as it did in the author’s, though for different reasons. She aimed to ascertain “whether sampling procedures, when applied judiciously, could be useful in calibrating the parameters of a general model for estimating building stocks with a reasonable degree of accuracy” (Johnson 1986, 5). Johnson’s work “consider[ed] estimation of only the number of buildings in an area”; “other characteristics of the building population,” such as those discussed in the work of Jones et al. (1987), were not included (Johnson 1986, 5). A grid of ¼-square-mile cells was imposed on Sedgwick County, and the cells were sampled for their building counts, using “various sampling strategies for numerous replications,” to generate estimates of the county’s total number of buildings (Johnson 1986, 6).
Johnson describes the results of these different sampling strategies in great depth (1986). In a positive light, she states that “all of these sampling methods, when large numbers of samples are taken, provide average estimates very close to the true population value” (Johnson 1986, 119-121). Lest the reader perceive sampling as a panacea, though, Johnson announces that “the estimates produced by individual samples can be extremely different from the true population values” (Johnson 1986, 121). The study notes “the development of models of building stocks that indicate the relationship between the number of buildings in a geographic area and the people and activities that are present there,” and concludes that sampling holds possible worth in the “calibrat[ion]” of such a model (Johnson 1986, 128). Unfortunately, “it [sampling] is not likely to permit estimating parameters with any great precision” (Johnson 1986, 128). Johnson does see potential merit, however, in the use of sampling for estimation in non-standard cities, “such as resort cities… special purpose cities… cities of drastically different ages” (Johnson 1986, 128), etc., and in situations where there is no “alternative” (Johnson 1986, 129).
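The contrast Johnson draws between averaged and individual estimates can be sketched as follows. This is a minimal illustration that assumes simple random sampling of grid cells, which is only one possible strategy and not necessarily one Johnson used; the cell counts, sample size, and number of replications are all hypothetical.

```python
import random
import statistics


def estimate_total(cell_counts, sample_size, rng):
    """Expand the mean count of a simple random sample of cells to the whole grid."""
    sample = rng.sample(cell_counts, sample_size)
    return statistics.mean(sample) * len(cell_counts)


rng = random.Random(42)

# Hypothetical grid of cells: most are nearly empty, a few are densely built.
cell_counts = [rng.choice([0, 0, 1, 2, 5, 40, 150]) for _ in range(4000)]
true_total = sum(cell_counts)

# Repeat the sampling many times, as in a replication experiment.
estimates = [estimate_total(cell_counts, 100, rng) for _ in range(2000)]
mean_estimate = statistics.mean(estimates)
worst_error = max(abs(e - true_total) for e in estimates)

print(f"true total:           {true_total}")
print(f"mean of estimates:    {mean_estimate:.0f}")
print(f"worst single error:   {worst_error:.0f}")
```

Under these assumptions the average over many replications sits close to the true total, while a single sample can miss it by a wide margin because a handful of dense cells dominate the count.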
The two publications above appeared in the references of HAZUS, a GIS add-on software product that can produce loss assessments “from scenario earthquakes” for a given area in the U.S. (FEMA, User’s Manual, 1999, 1:1). The National Institute of Building Sciences (NIBS) created the methodology used in HAZUS for the Federal Emergency Management Agency (FEMA). To assess losses related to the “general building stock” (FEMA, User’s Manual, 1999, 4:3) in the area under consideration, “the structural system must be known or inferred for all of the buildings in the inventory” (FEMA, User’s Manual, 1999, 4:10). “Model building types,” such as Wood and Unreinforced Masonry Bearing Walls, “are defined to represent the average characteristics of buildings in a class” (FEMA, User’s Manual, 1999, 4:4). Most of these types are divided into sub-types for different building heights (Appendix 1).
For each census tract within the area being studied, HAZUS estimates the floor area of buildings of each model building type from the approximate areas of each “occupancy class” (FEMA, Technical Manual, 1999, 3:2). HAZUS takes into account that “the relationship between structural type and occupancy class will vary on a regional basis” (FEMA, User’s Manual, 1999, 4:10). Each of the three regions (East, Midwest, and West) has its own “mapping schemes for [all] specific occupancy classes (except for RES1 [Single Family Dwelling]) to model building types by floor area percentage” (FEMA, Technical Manual, 1999, 3:4). For the East and Midwest, these schemes “are based on proprietary insurance data, opinions of a limited number of experts, and inferences drawn from tax assessors records” (FEMA, Technical Manual, 1999, 3:5). Elsewhere, “for the Western U.S.,” such schemes “are based on information provided in ATC-13,” the Applied Technology Council’s 1985 publication “Earthquake Damage Evaluation Data for California” (FEMA, Technical Manual, 1999, 3:4). For occupancy class RES1, a different mapping scheme was created for each state in the U.S.
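A mapping scheme of this kind can be pictured as a table of floor-area shares applied to each occupancy class. The sketch below is an illustration of that idea only, not the HAZUS implementation; the function name, the tract areas, and every percentage are hypothetical, and the building type labels are shortened from the examples named above.

```python
def floor_area_by_building_type(occupancy_area_sqft, mapping_scheme):
    """Distribute each occupancy class's floor area across model building types.

    mapping_scheme[occupancy][building_type] is the share of that occupancy
    class's floor area assigned to the building type; shares must sum to 1.
    """
    totals = {}
    for occupancy, area in occupancy_area_sqft.items():
        shares = mapping_scheme[occupancy]
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {occupancy} do not sum to 1")
        for building_type, share in shares.items():
            totals[building_type] = totals.get(building_type, 0.0) + area * share
    return totals


# Hypothetical floor areas (square feet) for two occupancy classes in one tract.
tract_area = {"RES3": 200_000.0, "RES5": 100_000.0}

# Hypothetical regional mapping scheme; the percentages are made up.
scheme = {
    "RES3": {"Wood": 0.75, "Unreinforced Masonry": 0.25},
    "RES5": {"Wood": 0.50, "Unreinforced Masonry": 0.50},
}

result = floor_area_by_building_type(tract_area, scheme)
print(result)
```

A regional variation would then amount to swapping in a different scheme table for the same tract areas.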
For each residential occupancy class except one, raw input data were transformed via “conversion factors” to derive the square footage values necessary to generate subsequent estimates of model building type areas (FEMA, Technical Manual, 1999, 3:29). For these occupancy classes, the requisite input data (UD) come from Census variables, and the “conversion factors are obtained from expert opinion and modifications to ATC-13 values” (FEMA, Technical Manual, 1999, 3:29). Figure 2.1 demonstrates how these square footage values were derived. Data on building square footage for non-residential and temporary residential occupancy classes came from a private corporation (FEMA, Technical Manual, 1999).
Figure 2.1: Derivation of Building “Square Footage Estimates for Occupancy Classes RES1, 2, 3 and 5” (FEMA, Technical Manual, 1999, 3:29).
SFI = building square footage for an occupancy class
UD = unit of data for that occupancy class
CF = conversion factor for that occupancy class
Source for Figure: (FEMA, Technical Manual, 1999, 3:29)
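Read together, the three definitions suggest that the square footage is the product of the unit of data and the conversion factor, that is, SFI = UD × CF. The sketch below assumes that product form; the function name and the input values are hypothetical and do not come from the HAZUS manuals.

```python
def square_footage(units_of_data, conversion_factor):
    """Return SFI = UD * CF for one occupancy class (assumed product form)."""
    return units_of_data * conversion_factor


# Hypothetical inputs: 1,200 Census-derived units for an occupancy class and
# an assumed conversion factor of 1,500 square feet per unit.
sfi = square_footage(1200, 1500.0)
print(f"estimated building square footage: {sfi:,.0f}")
```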
Digital Building Inventories
While this study focused on prediction of building attributes, high-quality sources of actual building data, especially for use within GIS, were also of interest. In recent years, building inventories have become available in digital form. Several companies, such as I-Cubed, LLC (2001), ISTAR (2001), Urban Data Solutions (2000), and Marconi (2001), produce digital vector and/or raster “buildings” that can be quite faithful to their real-world counterparts. One of the primary customers for such inventories is the wireless communications industry (I-Cubed, LLC 2001, ISTAR 2001, Urban Data Solutions 2000, Marconi 2001). I-Cubed offers a “3-D Digital City Model” with a “3-D buildings” layer to assist in LMDS implementation (I-Cubed, LLC 2001, http://www.i3.com/wireless/lmds.htm). I-Cubed’s buildings, created “typically” by “traditional softcopy photogrammetric techniques”, are often accompanied by additional data, such as one or more digital elevation models, and “actual building addresses” (I-Cubed, LLC 2001, http://www.i3.com/wireless/lmds.htm). I-Cubed has made its 3-D Digital City Models available for several U.S. cities on an “off-the-shelf” basis, and such models are “being derived” for additional cities at the time of this writing (I-Cubed, LLC 2001, http://www.i3.com/products/CityModels/3d_models.htm).
Data Scale
In the literature examined here, the desired data on building count and dimensions have been aggregated to units of varying geographical scale. The initial inventories of Jones et al. (1987) and Johnson (1986) employed a small-scale focus, seeking totals for number of buildings, building square footage, etc., for “an entire…metropolitan area” (Jones et al. 1987, 3). Johnson’s sampling also produced results aggregated to the entire county in question (1986). Jones et al.’s aggregation of building attributes into several annules represented a step to a larger scale (1987). HAZUS provided final estimates of building data at the Census Tract level (FEMA, User’s Manual, 1999), which is also larger-scale than countywide aggregations (U.S. Census Bureau (date unknown)). When dealing with digital building data provided by vendors at the level of individual buildings, no aggregation or “filtering” has taken place at all (I-Cubed, LLC 2001, ISTAR 2001, Urban Data Solutions 2000, Marconi 2001). The reader may note that the author’s study uses the Census Block as the unit of aggregation; this falls between the digital sources and HAZUS in terms of data scale.
Other Related Work
Two additional pieces of work were considered to be of interest to this inquiry. The first of these is the Sanborn fire insurance atlas, an excellent, but often out-of-date and unwieldy, source of building information. Sanborn atlases proved useful for certain parts of this study and are described in greater detail in Chapter 3. Also examined briefly was the fascinating “Geo-typical Terrain Generator (GTG)”, a “software demonstrator system” that “takes in the topography, hydrography, soils and vegetation of a region and places geo-typical human cultural features on the underlying data” (Chapman 1999, 1). It is likely that other items unknown to the author could have provided additional, though minor, insights for this study.
Table of Contents
Chapter 1: Introduction
1.1. LMDS and Viewshed Analysis
1.2. Building Data
1.3. Study Objective
1.4. Further Uses for Study Results
Chapter 2: Literature Review
2.1. Prediction/estimation of Building Presence and/or Dimensions
2.1.1. Cornell Studies
2.2. Digital Building Inventories
2.3. Data Scale
2.4. Other Related Work
Chapter 3: Data Collection and Preparation
3.1. Why Not Non-Residential Blocks?
3.2. Data Acquisition Part One— Orthophotography and The Philadelphia Case
3.3. The Search for Building Height Data
3.3.1. Sanborn Fire Insurance Atlases
3.4. Data Acquisition Part Two— Sanborn Atlases and The Richmond Case
3.5. Data Acquisition Part Three— Digital Building Data
3.6. Overview of the Combined Sample
3.7. Necessary Modifications to Sample Data
Chapter 4: Building and Selection of Regression Models
4.1.1. Locations of Deleted Outliers
4.2. A Brief Overview of Multiple Regression Assumptions and Terms
4.3. Variable Selection Procedures
4.4. Assessing Model Predictability
4.5. Use of Transformations on Y
4.5.1. Verifying Homoscedasticity and Linearity using Scatter Plots
4.5.2. (Lack of) Autocorrelation (or Serial Correlation)
4.5.3. (Lack of) Multicollinearity
4.5.4. Normality and Significance Tests
4.6. Final Model Selection
Chapter 5: Final Results, Conclusions and Suggestions for Additional Study
5.1. Methods Used to Evaluate Results for Chosen Models
5.2. Overall Model Results
5.4. Recommendations for Further Study
Prediction of Building Count and Dimensions from U.S. Census Data Using Multiple Regression