**Chapter 3: Theory and Methods**

**Theory**

International trade in agricultural products is vital to the prosperity of U.S. agriculture and food industries and likewise to the well-being of U.S. food consumers. Producers gain access to foreign markets and, if there are economies of scale, enlarged markets allow firms to move down their long-run average cost curves and expand their sales. Consumers benefit from lower prices, greater variety, and more consistent supplies of goods throughout the year.

Based on theoretical models of trade going back to Ricardo, it is reasonable for a nation to seek trade with its neighbors. Further, it is logical for firms in trading countries to seek out marketing opportunities beyond their own borders. Along with theoretical trade benefits, this also provides an incentive to maintain relative world peace. At the same time, ensuring a safe and nutritious food supply has created incentives for some nations to become self-sufficient in key industries, including agriculture.

The model I use is the gravity model. Although it is modeled on Newton’s law of universal gravitation in physics, it has also been shown to describe an importing country’s demand for exports from a foreign country (Head and Mayer 2014; Grant and Lambert 2008; Peterson et al. 2013). The original equation from physics is below:

$$F = G \frac{m_1 m_2}{r^2} \qquad [1]$$

The left side is the force of gravity, $F$. The right side is the gravitational constant, $G$, multiplied by the product of the two masses ($m_1$ and $m_2$) and divided by the squared distance ($r$) between the two bodies. The equation was first used in an economic context by Tinbergen (1964). This model equated GDP (gross domestic product) with economic “mass” and used this and distance to predict the value of trade between two nations. While physical gravity is a force exerted in both directions, the gravity model of trade estimates the value of expenditures in one direction. Anderson (2010) laid out the traditional model as follows (see also Anderson and van Wincoop 2003):

$$X_{ij} = \frac{Y_i E_j}{d_{ij}^{2}} \qquad [2]$$

where $X_{ij}$ is the value of bilateral trade from $i$ to $j$, $Y_i$ is the total production value of the exporting country, $E_j$ is the total expenditure of the importing country, and $d_{ij}$ is the distance between them. When looking at total aggregate trade, GDP is a reasonable value for both $Y$ and $E$ since it can be measured as either output-based or expenditure-based GDP. Geographical distance accounts for frictions that raise trade costs as countries lie farther apart (e.g., transportation costs), and can also proxy for factors such as language barriers, cultural differences in institutions, governance, and tastes and preferences, and lack of common historical experience (Head and Mayer 2014).

This model has performed well empirically (Anderson and van Wincoop, 2003; Bergstrand and Egger, 2009); it is generally stated that this basic model can explain 60% or more of the variation in trade flows. While many theoretical justifications have been suggested (they are briefly described in “The Gravity Model”), this study adopts Anderson (2010) as its basis. The model is theoretically anchored in an expenditure equation, or the inverse of indirect utility. In gravity, expenditure depends on the availability (production) and accessibility (distance) of a product given a “budget” (GDP). Anderson’s demand-side structural gravity equation is as follows:

$$X_{ij} = \frac{E_j Y_i}{Y} \left( \frac{t_{ij}}{P_i \Pi_j} \right)^{1-\sigma} \qquad [3]$$

where $X_{ij}$ is bilateral trade from country $i$ to country $j$, $E_j$ is the total expenditure on tradeable goods by the importing country (capacity to spend), $Y_i$ is the total production of the exporting country (capacity to produce), and $Y$ is total global expenditure. In this way, the first term is the share of total global expenditures that is composed of $j$’s expenditures on $i$’s goods. $\sigma$ is the elasticity of substitution between varieties from different countries, and $P_i$ and $\Pi_j$ represent inward and outward multilateral resistance. These explain a country’s preference for one partner over all others, as trade does not occur in a bilateral vacuum. The greater the multilateral resistance for a pair, the greater is the propensity for them to trade with each other. The trade costs associated with the country pair are indicated by $t_{ij}$. These may include language barriers, cultural differences, and historical relationships as well as physical distance. Because the model is multiplicative, it can be estimated in log-linearized form with panel data as follows:
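A log-linearized panel specification consistent with this description might be written as in the sketch below. The coefficient symbols $\beta$, the pair and year fixed effects $\alpha_{ij}$ and $\alpha_t$, and the error term $\varepsilon_{ijt}$ are notation assumed for illustration and may differ from equation [4] as originally typeset. In this form the pair effects absorb time-invariant trade costs such as distance, along with multilateral resistance to the extent that it is stable over the sample, while the year effects absorb world output $Y$.

```latex
\ln X_{ijt} = \beta_0 + \beta_1 \ln E_{jt} + \beta_2 \ln Y_{it}
            + \alpha_{ij} + \alpha_t + \varepsilon_{ijt}
```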

where $\varepsilon_{ijt}$ is the error term, and $P_i$ and $\Pi_j$ are controlled by pair fixed effects. Equation [4] also includes year fixed effects to control for changes in world prices over the sample period and other period-specific shocks that are common across all trading pairs. The above equations describe aggregate trade of a generic good. The model can also allow for disaggregation by product with the subscript $k$:
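A product-level version consistent with the surrounding definitions could look like the sketch below. The symbol $Y_{ikt}$ for exporter production of good $k$, the coefficients $\beta$, and the fixed-effect symbols $\alpha_{ij}$ and $\alpha_t$ are assumptions made for illustration; only the commodity effect $u_k$ is named in the text, so the original equation [5] may be arranged differently.

```latex
\ln X_{ijkt} = \beta_0 + \beta_1 \ln E_{jt} + \beta_2 \ln Y_{ikt}
             + \alpha_{ij} + \alpha_t + u_k + \varepsilon_{ijkt}
```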

where $Y_{ikt}$ is exporter production, representing the exporting country $i$’s capacity to produce good $k$ in year $t$. Commodity fixed effects are represented by $u_k$.

In this case, the $k$ subscript refers to corn (HS 1005) and soybeans, whole or broken (HS 1201), at the HS 4-digit level. The exporter’s annual production of the good reflects the capacity to produce, while the importer’s GDP, as before, represents the demand capacity in the importing country. If both countries in a pair produce similar bundles of goods, they probably have little incentive to trade unless one has a clear comparative advantage in the production of one good. In this study, I use regressions based on equation [5] where $i$ = US, such that regressions are performed on the sub-sample in which the U.S. is the only exporter.

**Estimation Methods**

The general framework of my methods is as follows: the first step is to estimate a gravity model using a state-of-the-art panel data set. The next step is to infer the impacts of policy from the variation between predicted and actual trade values.

I estimated many iterations of the gravity model with a panel data set. A panel data set observes the same panel of subjects repeatedly over time. In this case, the panel ID contains country pair and commodity. Therefore the index on the dependent variable, *v*, is *tijk*, where *t* denotes the year and *ijk* is the panel ID, representing exporter, importer, and commodity respectively. When looking at only U.S. exports, there is no variation in *i*, so the panel ID becomes *jk*. In the model specifications looking at an individual commodity, *k* does not vary, so the major source of variation to identify a policy effect comes from variation across destination markets, *j*.

Iterations of the model in the applied research literature now include variables, often dummies, which aim to capture the effects of other factors relevant to bilateral trade relationships. These include colonial history, common language, shared border, and shared membership in regional trade agreements. For example, the U.S. has entered into free trade agreements with 20 other countries. Thus, an FTA binary variable is included in the regressions to control for potentially higher trade flows with partners where trade has been liberalized due to the FTA. However, inclusion of importer fixed effects rules out variables that vary only by the importer and not over time, such as colonial history, shared border, and shared language. For this study, I also created a dummy variable for GMO policies. It equals one for any importer and year in which a policy was in place to prevent or reduce the production or use of GMO crops and foods, whether it was an import ban or a moratorium on cultivation. Cultivation bans don’t directly impact trade, but they reflect tastes and preferences that do not favor GMO products. In some cases, the importing of GMO crops is allowed, but the cultivation is prohibited. There are also cases where imports are allowed “with authorization,” but the actual difficulty of the authorization process is unclear. Table 3.1 shows those countries which have had anti-GMO policies at any point during the sample period. The countries included in the table are abbreviated as follows: Azerbaijan (AZE), Belize (BLZ), Bhutan (BTN), Switzerland (CHE), Colombia (COL), Algeria (DZA), Ecuador (ECU), EU-15^{1} (EUR), Indonesia (IDN), Kyrgyzstan (KGZ), Madagascar (MDG), Peru (PER), Russia (RUS), Saudi Arabia (SAU), Thailand (THA), Turkey (TUR), Ukraine (UKR), and Venezuela (VEN).
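The construction of an importer-year policy dummy can be sketched in a few lines of Python. This is only an illustration: the importer codes `AAA`, `BBB`, and `CCC` and the policy years are hypothetical placeholders, not values from Table 3.1, and the function name `gmo_dummy` is invented for this sketch.

```python
# Hypothetical policy windows: importer code -> (first year, last year) in force.
# These placeholder codes and dates are invented and do NOT come from Table 3.1.
policy_windows = {"AAA": (2011, 2016), "BBB": (2014, 2016)}


def gmo_dummy(importer, year, windows=policy_windows):
    """Return 1 if the importer had an anti-GMO policy in force that year, else 0."""
    if importer not in windows:
        return 0
    start, end = windows[importer]
    return int(start <= year <= end)


# Build the dummy for a small importer-year panel.
panel = [(imp, yr) for imp in ("AAA", "BBB", "CCC") for yr in (2010, 2012, 2015)]
dummies = {(imp, yr): gmo_dummy(imp, yr) for imp, yr in panel}
```

Because the dummy varies over time within an importer, it is not absorbed by importer fixed effects, unlike time-invariant controls such as shared language.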

To see the impact of GMO policies on U.S. exports of corn and soybeans, I ran models both with and without the GMO dummy variable. When the GMO dummy was left out, I assumed that its effects would appear in the error term and used residual analysis to identify shocks. However, the error term also reflects shocks due to other omitted factors. Residual analysis involves calculating the difference between actual and predicted trade for each observation and ranking the observations based on this value. The largest negative residuals mark the observations where actual trade falls furthest below the model’s prediction. In this case, I assume the predicted values are a reasonable expectation of what trade should be conditional on all covariates included, and deviations of actual trade flows from these predictions represent trade inefficiency due to policy factors.
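The ranking step of residual analysis can be illustrated with a minimal Python sketch. The observations below are made-up numbers for placeholder importers, and `rank_by_residual` is a hypothetical helper name; in practice the predicted values would come from the fitted gravity model.

```python
def rank_by_residual(observations):
    """Attach residual = actual - predicted and sort from most negative upward."""
    ranked = [dict(obs, residual=obs["actual"] - obs["predicted"]) for obs in observations]
    return sorted(ranked, key=lambda obs: obs["residual"])


# Hypothetical actual and predicted export values (in $1,000) for three importer-years.
sample = [
    {"importer": "AAA", "year": 2005, "actual": 120.0, "predicted": 100.0},
    {"importer": "BBB", "year": 2005, "actual": 10.0, "predicted": 90.0},
    {"importer": "CCC", "year": 2005, "actual": 70.0, "predicted": 75.0},
]
ranked = rank_by_residual(sample)
# ranked[0] is the observation where actual trade falls furthest below prediction.
```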

The estimation methods I use are ordinary least squares (OLS), as used throughout the gravity literature (Rose, 2004; Cassey & Zhao, 2017), and Poisson pseudo-maximum-likelihood (PPML) estimation, as recommended by Santos Silva and Tenreyro (2006). As explained in “The Log of Gravity,” estimating a log-linearized gravity model with OLS presents a few issues. The first and most basic is that trade data often include many zero values in the dependent variable. These may be a result of measurement error or of trade policy factors. Either way, the omission of zero trade flows due to logarithmic transformation of the dependent variable is likely to create sample selection bias (Grant and Boys 2012).

In addition, the OLS model predicts trade in logs rather than levels. To support practical inference, the predictions must be transformed back to levels. Jensen’s inequality (below) states that the log of the expected value is not equal to the expected value of the log, making the estimates obtained with OLS biased.

$$E[\ln y] \neq \ln E[y] \qquad [6]$$
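A small numerical check makes the inequality concrete. The three values below are arbitrary illustrative numbers, not trade data.

```python
import math

# Arbitrary illustrative values (not trade data).
values = [1.0, 10.0, 100.0]

mean_of_logs = sum(math.log(v) for v in values) / len(values)  # E[ln y]
log_of_mean = math.log(sum(values) / len(values))              # ln E[y]

# Exponentiating the mean of the logs gives the geometric mean (10 here),
# which lies below the arithmetic mean (37 here).
back_transformed = math.exp(mean_of_logs)
arithmetic_mean = sum(values) / len(values)
```

Exponentiating a prediction made in logs therefore recovers something closer to a geometric mean than to the expected level of trade, which is the retransformation problem described above.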

While the OLS estimator is statistically problematic for this reason, it is still widely used in the trade literature to estimate the gravity model, often alongside alternatives such as PPML, the new workhorse of gravity model estimation, introduced in “The Log of Gravity” in 2006. Although Poisson regression is usually associated with count data, the dependent variable need not be a count for the PPML estimator to be consistent. I estimate PPML both with and without zero values included. While it is an added benefit, keeping all the zero values of the dependent variable isn’t the main purpose of using PPML. To run the regressions in Stata, I used OLS regression and PPML commands created by Sergio Correia (2014), which allow the model to absorb multiple levels of fixed effects, such as importer and year.
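To show what a PPML fit does mechanically, here is a minimal pure-Python sketch for a model with one regressor, $y = \exp(b_0 + b_1 x)$, solved by Newton's method on the Poisson score equations. It is not the Stata routine used in this study and has no fixed effects; the function name `ppml_fit` and the data are hypothetical. Note that the dependent variable enters in levels, so the zero observation stays in the sample.

```python
import math


def ppml_fit(x, y, iterations=50):
    """Fit y = exp(b0 + b1 * x) by Poisson pseudo-maximum likelihood (Newton's method)."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only solution
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Negative Hessian (a symmetric 2x2 matrix).
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by Cramer's rule.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1


# Hypothetical data in levels, including one zero trade flow.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_with_zero = [0.0, 3.0, 5.0, 9.0, 20.0]
b0, b1 = ppml_fit(x, y_with_zero)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
```

With an intercept in the model, the first-order conditions force the fitted values to sum to the observed values, a property that a log-linear OLS fit does not share.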

**Data**

Global trade data is readily available in the UN Comtrade database. However, this database includes trade values as reported by both importers and exporters. Not every individual trade observation has information reported by both sides and they often are imperfectly aligned. The bulk of my data is sourced from the BACI data set, distributed by CEPII. CEPII is a French center for research and expertise on international trade, migration, finance, and macroeconomics. In the BACI set, the pairs of reported values are reconciled based on reliability indicators for exporter and importer. The set includes trade in both quantity and value in $1,000 units. The values are observed on a yearly basis at the country level. BACI reports the values by product at the HS-6 level, so I aggregated the values to the HS-4 digit level. The original BACI data includes only positive trade values. However, zero trade flows are just as important (Grant and Boys, 2012). For example, if a strict GMO policy leads to zero export values for U.S. corn exports, omitting zero trade flows through logarithmic transformation of the dependent variable will lead to underestimation of the GMO impact. Thus, I added zeros to the data set. To do this, I reshaped the data from long to wide format, and replaced missing trade values with zeros. Stata code for these steps and the aggregation from HS-6 to HS-4 can be found in the appendix.
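The two data-preparation steps, aggregating HS-6 lines to HS-4 and adding explicit zeros, can be sketched in Python as follows. This is an illustration rather than the Stata code in the appendix: it fills a full importer-by-year-by-product grid instead of reshaping from long to wide, and the records and function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical HS-6 records: (importer, year, HS-6 code, value in $1,000).
records = [
    ("JPN", 2000, "100510", 50.0),
    ("JPN", 2000, "100590", 950.0),
    ("JPN", 2001, "100590", 800.0),
    ("MEX", 2001, "100590", 400.0),
]


def aggregate_to_hs4(rows):
    """Sum values over HS-6 lines sharing the same first four digits."""
    totals = defaultdict(float)
    for importer, year, hs6, value in rows:
        totals[(importer, year, hs6[:4])] += value
    return dict(totals)


def fill_zeros(totals, importers, years, products):
    """Return the full importer x year x product grid, with 0.0 where no trade is recorded."""
    return {key: totals.get(key, 0.0) for key in product(importers, years, products)}


hs4 = aggregate_to_hs4(records)
full = fill_zeros(hs4, ["JPN", "MEX"], [2000, 2001], ["1005"])
```

Here the missing Mexico record for 2000 becomes an explicit zero, so the observation remains available to an estimator that works in levels.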

While in most trade literature, indeed most research literature in general, it seems that more data is always better, that isn’t necessarily the case here. Large data sets may contain many zeros and very small observations that aren’t relevant or influential in the market of interest and that pull down the sample mean. Because I am looking specifically at U.S. exports of corn (1005) and soybeans (1201), I sought to narrow the data set to those countries that imported significant amounts of the products from the United States. Because the USDA considers a trade relationship significant when it is valued at over $100 million in a given year, I identified countries that imported over $100 million of either product from the United States in at least 10 of the 19 years in the sample. The table below describes the groups. The graph following it shows the share of total U.S. corn and soybean exports represented by these groups. Limiting the importer group to the largest importers of the U.S. products results in higher mean GDP and trade values, as well as all positive values. The 9 countries in the soy group hold a consistently larger portion of that market than the 8 countries in the corn group hold in theirs. Both groups represent significant amounts of the U.S. export market. One downside of these selected groups is the dramatically reduced sample size.
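The selection rule can be expressed as a short Python sketch. The importer codes and annual values are hypothetical, and the helper name `significant_importers` is invented for this illustration. Values are assumed to be in units of $1,000, so the $100 million threshold is written as 100,000.

```python
def significant_importers(trade, threshold=100_000, min_years=10):
    """Return importers whose annual value exceeds the threshold in at least min_years years.

    trade maps importer code -> list of annual import values in $1,000.
    """
    return sorted(
        importer
        for importer, values in trade.items()
        if sum(value > threshold for value in values) >= min_years
    )


# Hypothetical 19-year series for four placeholder importers.
trade = {
    "AAA": [150_000.0] * 12 + [50_000.0] * 7,   # 12 qualifying years
    "BBB": [150_000.0] * 9 + [50_000.0] * 10,   # 9 qualifying years
    "CCC": [20_000.0] * 19,                     # never qualifies
    "DDD": [100_001.0] * 10 + [0.0] * 9,        # exactly 10 qualifying years
}
selected = significant_importers(trade)
```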

For soybeans, the selected group of countries has a much higher average GDP than that of the corn group countries, but both have a mean GDP roughly ten times that of the whole group. The average values of trade are similar. The soy group’s share of the U.S. export market was both larger and more constant than that of the corn group. The following graphs show the shares of U.S. exports for each individual country. For corn, Japan is the top destination for most of the period, but its share decreases while Mexico’s share increases. South Korea’s share seems the least stable. For soybeans, China’s share soars from under 10% to over 60% while the other countries in the group stay stable at or decline to less than 10%. A third graph shows the soybean group without China, rescaled to make changes over time clearer. In this graph, the shares of Europe, Japan, and Mexico steadily decline. Europe imported over 35% of U.S. soybean exports in 1998, but its share fell to about 5% in 2009 and stayed below 10% for the rest of the period.

As mentioned in Chapter 3, the data also include an RTA dummy variable. The variable equals 1 if the importer and exporter are part of the same trade agreement in that year, or in this case, if the importer is part of an RTA with the United States in that year. The U.S. has trade agreements with 20 different countries, shown in Table 4.2. The countries included in the table are abbreviated as follows: Australia (AUS), Bahrain (BHR), Canada (CAN), Chile (CHL), Colombia (COL), Costa Rica (CRI), Dominican Republic (DOM), Guatemala (GTM), Honduras (HND), Israel (ISR), Jordan (JOR), Korea (KOR), Morocco (MAR), Mexico (MEX), Nicaragua (NIC), Oman (OMN), Panama (PAN), Peru (PER), Singapore (SGP), and El Salvador (SLV).

**Estimation and Results**

Many variations of equation [5] in Chapter 3 were estimated, varying the sample size, the fixed effects, the estimation method, and the inclusion of the GMO variable. I ran regressions for corn alone, soybeans alone, and both commodities together. For each group, I ran variations with the large or small sample (small groups from Table 4.1 in Chapter 4), with OLS or PPML, with importer fixed effects or importer and year fixed effects, and with or without the GMO dummy variable. When using the PPML estimator, the dependent variable is not logged. The large samples contain zero values, so I ran the PPML estimation both with and without the zero values. For the small groups I used only importer fixed effects: the U.S. is the only exporter, the sample size is small, and using both importer and year fixed effects would consume too many degrees of freedom. Commodity fixed effects were used for all of the combined sample variations. With all of these possible combinations, there are 44 alternative models. A few of them are shown below, and the breakdown is illustrated in Figure 5.1. The Stata commands for all of the PPML regressions can be found in the appendix. The following are some of the variations on the model, shown for a single commodity group and labeled alphabetically.

**TABLE OF CONTENTS**

**ACKNOWLEDGEMENTS**

**LIST OF FIGURES**

**LIST OF TABLES**

**Chapter 1: Introduction**

Objectives and Approach

Organization

**Chapter 2: Background**

Grain and Oilseed Crops and Markets

Policy Setting

Brief Historical Context

**Chapter 3: Theory and Methods**

Theory

Estimation Methods

**Chapter 4: Data**

**Chapter 5: Estimation and Results**

**Chapter 6: Conclusions and Discussion**

Future Work

References

AGRICULTURAL TRADE PERFORMANCE AND POTENTIAL: A RETROSPECTIVE PANEL DATA ANALYSIS OF U.S. EXPORTS OF CORN AND SOYBEANS