Estimates of live-tree carbon stores in the Pacific Northwest are sensitive to model selection
© Melson et al; licensee BioMed Central Ltd. 2011
Received: 16 December 2010
Accepted: 10 April 2011
Published: 10 April 2011
Estimates of live-tree carbon stores are influenced by numerous uncertainties. One of them is model-selection uncertainty: one has to choose among multiple empirical equations and conversion factors that can be plausibly justified as locally applicable to calculate the carbon store from inventory measurements such as tree height and diameter at breast height (DBH). Here we quantify the model-selection uncertainty for the five most numerous tree species in six counties of northwest Oregon, USA.
The results of our study demonstrate that model-selection error may introduce 20 to 40% uncertainty into a live-tree carbon estimate, possibly making this form of error the largest source of uncertainty in estimation of live-tree carbon stores. The effect of model selection could be even greater if models are applied beyond the height and DBH ranges for which they were developed.
Model-selection uncertainty is potentially large enough that it could limit the ability to track forest carbon with the precision and accuracy required by carbon accounting protocols. Without local validation based on detailed measurements of usually destructively sampled trees, it is very difficult to choose the best model when there are several available. Our analysis suggests that considering tree form in equation selection may better match trees to existing equations and that substantial gaps exist, in terms of both species and diameter ranges, that are ripe for new model-building effort.
The rapid increase in atmospheric carbon dioxide (CO2) concentration is a major contributor to primarily anthropogenic global warming. International agreements such as the Kyoto Protocol require participating nations to reduce CO2 and other greenhouse gas emissions. To implement such commitments, countries must produce nationwide inventories of carbon (C) sources and sinks. Forests can be both C sources and sinks, so there is interest in exploring forest C sequestration to offset anthropogenic CO2 emissions. However, before sequestration potential can be assessed, the magnitude of forest C sources and sinks must first be determined.
Live trees are a significant C storage pool in forests of the United States of America (US), ranking second behind soil C [3, 4]. Live-tree C is often estimated from regression equations that relate biomass (or volume subsequently expressed as biomass using density conversion factors) to some easily measured tree dimension obtained from inventory data, such as DBH (diameter at breast height, usually 1.37 m above ground level) or height. Estimated biomass is then converted to C with a C:biomass ratio.
Estimates of C in live trees are influenced by numerous uncertainties: sampling error associated with the inventory (affects precision but not bias as long as the sampling is well designed); measurement uncertainty (can affect both precision and bias); regression uncertainty inherent in any estimated regression relationship (usually affects only precision unless fit is poor); and model-selection uncertainty (can affect both precision and bias) introduced by having to choose among multiple, potentially equally applicable regression relationships and conversion factors. Each source contributes to uncertainty about the live-tree C estimate (uncertainty in the sense of Harmon et al.). Sampling error and measurement uncertainty are typically studied and addressed by those taking inventories (e.g., [5, 7]); regression uncertainty is often assessed by those who publish regression equations (typically R² and mean standard error are reported). Model-selection uncertainty is rarely considered, although some authors have noted large differences between prediction equations, e.g., [9–11]. The first three types of uncertainty are routinely assumed to be independent for each sampling unit or individual tree in an inventory, a convention that assists in their estimation and results in minimal aggregate uncertainty when large numbers of trees are inventoried. However, both regression and measurement uncertainty can have a substantial bias component. Model-selection uncertainty, when it occurs, is systematic error, and cannot be modeled independently for individual trees.
To quantify and better understand the uncertainty that selection among models could contribute to regional live-tree C estimates, we conducted a sensitivity analysis on the model-selection component involved in estimating live-tree C for a subset of tree species in northwest Oregon (NWOR), USA. While this is an example from a single region, the findings have relevance to forests globally. For each species we calculated the range of live-tree total C estimates for each size class. These ranges were then applied to inventory data to estimate the range of model-selection uncertainty of live-tree C stores for the study area. Finally, we examined several strategies to reduce this uncertainty in live-tree C estimates.
Procedures for this sensitivity analysis were iterative, required a number of assumptions, and, as this was a novel approach, necessitated the introduction of nonstandard terms. Distilled to their essence, we: (1) selected the most common tree species in the study area as the population of interest with respect to live-tree carbon estimation; (2) obtained candidate equations, then set and applied criteria to select equations for inclusion; (3) estimated height for each DBH class for each species for DBH-and-height equations; (4) created a calculation "road map" for combining tree parts to generate total tree estimates; (5) computed a range of predictions, which we call prediction envelopes; (6) devised three calculation approaches to test sensitivity to alternative assumptions about the acceptability of extrapolating equations beyond the DBH range used in their development (hereafter termed the developmental range); (7) selected a biomass-to-C conversion factor; (8) incorporated alternative assumptions about "correlations" among different tree components; (9) created total live-tree C estimates for each DBH class for each species and applied the resulting ranges to inventory data to produce live-tree C estimates for the study area; and (10) explored alternatives for reducing model-selection uncertainty. Detailed methods appear after the Conclusions.
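The prediction-envelope idea in steps (5) and (6) can be illustrated with a minimal Python sketch. The three equations, their coefficients, and their developmental DBH ranges are hypothetical placeholders, not equations used in this study, and the names `BiomassEquation` and `prediction_envelope` are ours for illustration only. Approach 1 is represented by using every equation at every DBH, and approach 2 by using only equations whose developmental range contains the DBH; approach 3 (corrected extrapolation) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class BiomassEquation:
    name: str
    predict: Callable[[float], float]  # DBH (cm) -> biomass (kg)
    dbh_min: float  # lower end of developmental DBH range (cm)
    dbh_max: float  # upper end of developmental DBH range (cm)


# Hypothetical power-law equations; all coefficients are made up.
EQUATIONS: List[BiomassEquation] = [
    BiomassEquation("A", lambda d: 0.10 * d ** 2.40, 5.0, 50.0),
    BiomassEquation("B", lambda d: 0.08 * d ** 2.50, 10.0, 80.0),
    BiomassEquation("C", lambda d: 0.15 * d ** 2.30, 2.0, 30.0),
]


def prediction_envelope(
    equations: List[BiomassEquation], dbh: float, within_range_only: bool
) -> Optional[Tuple[float, float]]:
    """Return (lowest, highest) prediction at this DBH, or None if no equation applies."""
    preds = [
        eq.predict(dbh)
        for eq in equations
        if not within_range_only or eq.dbh_min <= dbh <= eq.dbh_max
    ]
    if not preds:
        return None
    return min(preds), max(preds)


for dbh in (20.0, 40.0, 100.0):
    all_equations = prediction_envelope(EQUATIONS, dbh, within_range_only=False)
    in_range = prediction_envelope(EQUATIONS, dbh, within_range_only=True)
    print(dbh, all_equations, in_range)
```

In this toy setup the in-range envelope can never be wider than the all-equations envelope, and it disappears entirely at DBH values that no developmental range covers.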
As expected, assumption of positive correlation between tree components produced wider total live-tree C prediction envelopes than did negative correlation assumptions. For Pseudotsuga menziesii, using approach 2, average percent uncertainty over the 3 to 66 cm DBH range was 38% for positive correlation, but this was reduced to 23% with negative correlation. In general, using negative correlation assumptions halved the average percent uncertainty.
NWOR live-tree uncertainty
[Table: Live-tree carbon (C) ranges and uncertainty for northwest Oregon (NWOR). Trees 3 to 66 cm diameter at breast height (DBH). Columns: Approach 1 (no corrections); Approach 2 (developmental DBH range); Approach 3 (with corrections); uncertainty for each approach expressed as % of midpoint.]
Species contribution to NWOR uncertainty was closely correlated with the estimated number of trees of the species (as calculated in the Forest Inventory and Analysis (FIA) Integrated Database), and even more closely with the NWOR live-tree C midpoint of each species. Prediction envelopes did not differ greatly among species, so species contribution to NWOR uncertainty followed species prevalence in the inventory. Under approach 2, Pseudotsuga menziesii accounted for 62%, Tsuga heterophylla 24%, Alnus rubra 8%, Picea sitchensis 4%, and Acer macrophyllum 1% of NWOR live-tree C uncertainty, whereas they accounted for 58, 22, 15, 3, and 3%, respectively, of the estimated live-tree C at the midpoint of the range (percentages may not sum to 100 due to rounding). Pseudotsuga menziesii, therefore, contributed slightly more uncertainty to NWOR live-tree C than expected, and Alnus rubra contributed less.
Comparison with other Pacific Northwest (PNW) regional estimates
The first comparison with single-source total live-tree C estimates for NWOR Pseudotsuga menziesii 26 to 60 cm DBH (see complete Methods for details) demonstrated that these particular single-source estimates clustered around the midrange of our positive correlation approach 2 output (25 to 55 Tg C). Equations from Gholz et al. predicted 38 Tg C, Jenkins et al. 39 Tg C, Harmon et al. 40 Tg C, Grier and Logan 41 Tg C, and Shaw 42 Tg C, and the FIA-based estimate predicted 37 to 42 Tg C (positive correlation). The range spanned by single-source estimates covered 15% of our prediction envelope range. However, the more single-source estimates that were included, the wider the range of single-source estimates became. Comparison of estimated aboveground total live-tree C (8 single-source predictions) for the same trees produced a range that spanned 43% of our output range, whereas stem wood plus bark live-tree C (11 single-source estimates) yielded a range that covered 53% of ours.
The second comparison using all species produced an FIA-based estimate of 81 to 99 Tg C and a Jenkins et al. estimate of 83 Tg C. Our approach 2 positive correlation assumption generated a range of 56 to 119 Tg C.
Comparison with other estimates of error
The estimated 95% confidence interval for sampling error from the FIA inventory was roughly +/-6% of the C estimate for NWOR. Measurement error in DBH, treated as a normally distributed error, introduced 0.03% uncertainty into the FIA stem wood volume estimate. The range created from the 80%-of-residuals bounds reported by Jenkins et al. was 63 to 105 Tg C, corresponding to 25% uncertainty.
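The 25% figure above is consistent with expressing uncertainty as the half-width of a range taken as a percentage of its midpoint. The short Python sketch below applies that inferred definition (the function name is ours, and the definition is an assumption drawn from the reported numbers rather than a quoted formula) to two ranges reported in this section.

```python
def percent_uncertainty(low: float, high: float) -> float:
    """Half-width of a range expressed as a percentage of its midpoint."""
    midpoint = (low + high) / 2
    half_width = (high - low) / 2
    return 100 * half_width / midpoint


# Range from the Jenkins et al. 80% residual bounds (Tg C)
print(round(percent_uncertainty(63, 105)))  # 25

# Approach 2, positive correlation, all target species (Tg C)
print(round(percent_uncertainty(56, 119)))  # 36
```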
Strategies to reduce model-selection uncertainty
Comparison of Equation Forms
Our comparison of height-diameter-based equations with diameter-based equations for Pseudotsuga menziesii suggested that incorporation of height did not produce greater agreement among predictions. By this test, DBH-height equations appeared no more universally applicable than DBH-only forms, nor did they appreciably decrease uncertainty related to model selection. This is in agreement with the findings of others [10, 11, 17].
Assigning Equations to Subpopulations
When prediction envelopes were subdivided and trees assigned among them, uncertainty was reduced proportionally to the number of divisions; i.e., dividing the envelope in two halved the uncertainty and dividing the envelope into 10 sections resulted in one-tenth the uncertainty obtained when using the full-width prediction envelope. This suggests that if one could correctly assign biomass equations within species, one could greatly reduce this form of uncertainty.
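The proportional reduction described above follows directly from the arithmetic of splitting a range. The Python sketch below assumes equal-width sub-envelopes and that every tree is assigned to the correct one; the endpoints 56 and 119 Tg C are borrowed from the approach 2 range only to make the example concrete, and the function name is ours.

```python
from typing import List, Tuple


def subdivide_envelope(low: float, high: float, n_divisions: int) -> List[Tuple[float, float]]:
    """Split a prediction envelope into n equal-width sub-envelopes."""
    if n_divisions < 1:
        raise ValueError("n_divisions must be at least 1")
    width = (high - low) / n_divisions
    return [(low + i * width, low + (i + 1) * width) for i in range(n_divisions)]


full_width = 119 - 56
for n in (1, 2, 10):
    first_low, first_high = subdivide_envelope(56, 119, n)[0]
    # Width of each sub-envelope is the full width divided by n
    print(n, first_high - first_low, full_width / n)
```

The absolute width shrinks in proportion to the number of divisions; the width relative to each sub-envelope's own midpoint would differ between the lower and upper sub-envelopes.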
We examined the sensitivity of live-tree carbon estimates to model selection. Rather than use a single model to estimate total tree carbon, we used multiple total tree models as well as tree component models, the results of which were summed via multiple pathway permutations to estimate total tree biomass and ultimately, carbon. Although some might regard estimating tree-level carbon by adding up multiple, modeled components as an unlikely way to attempt total tree C estimation, in practice, analysts such as those at PNW FIA who rely on BIOPAK and other equation compilations frequently do assemble estimates by combining component estimates, sometimes because the developmental DBH range of available total tree equations is more limited than the ranges for component equations or because the sample size for some component equations (e.g., bole volume) is much greater than for other components. Even then, there are unavoidably some trees in the sample that are larger than the developmental ranges of the equations used for a given species and location.
We found that the range of estimates was quite large at the level of tree components, total trees, and NWOR. The uncertainty introduced by selecting different models was high regardless of species or how tree components were combined (either in terms of subcomponents or the type of correlation of tree components).
Given that extrapolation is often required, we considered several approaches, each with advantages and disadvantages. Approach 1 made no assumptions while retaining many predictions for every DBH class; however, this created difficulties by incorporating extreme equation behavior into prediction envelopes. Approach 2 largely removed such problematic equation behavior; however, at small and large DBHs there was often just one applicable equation, which probably resulted in an artificial uncertainty reduction caused by the narrow prediction envelope. Furthermore, component equations were only available to predict tree total C for a small range of NWOR DBHs, and this resulted in our being able to only compare approaches between 3 and 66 cm DBH. Approach 3 generated what appeared to be more realistic prediction envelopes than approach 1, but relied on an extrapolation approach based on modelers' assumptions of acceptable equation behavior.
Comparison of NWOR live-tree C estimates from approaches 1 and 2 (Table 1) reinforces the too-infrequently-heeded warning against equation extrapolation. Uncertainties of 90% for approach 1 over the 3- to 66-cm DBH range, where 81% of the target species trees in NWOR occur, are unacceptable when attempting to balance the global C budget or calculate C credits. Short of conducting studies to create more biomass prediction equations, however, some extrapolation is inevitable. Realistically, it is unlikely that uncertainties of this magnitude exist in current biomass or C estimates because approach 1 included equations so obviously unsuited to estimation at DBHs outside their developmental DBH ranges that they would be discarded by researchers during analysis. Note that although equations that predicted negative values were not excluded from approach 1 unless they were lacking developmental DBH range metadata, very few equations produced negative predictions between 3 and 66 cm.
It seems reasonable to suppose that equations with developmental DBH ranges that lie far from a target DBH class will be worse predictors than those with developmental DBH ranges that span the target DBH class or classes of interest. We explored this by selecting three DBH classes (20, 60, and 100 cm), then finding equations with developmental DBH ranges that (1) spanned the DBH class, (2) ended at half the DBH class, or (3) started at twice the DBH class. We then predicted biomass at the selected DBH class using equations from each available category and determined that equations with developmental DBH ranges distant from the target DBH classes produced wider ranges of estimates, with midpoints shifted from those produced by the equations that spanned the given DBH class. This further illustrates that equation extrapolation generates additional uncertainty.
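The three categories used in this comparison can be written as a simple classification rule. The Python sketch below is one reading of the text, not the procedure actually coded for the study: it assumes "ended at half the DBH class" means the developmental range ends at or below half the target DBH, and "started at twice the DBH class" means it begins at or above twice the target DBH. The function name and example ranges are hypothetical.

```python
def classify_developmental_range(dbh_min: float, dbh_max: float, target_dbh: float) -> str:
    """Classify an equation's developmental DBH range relative to a target DBH class."""
    if dbh_min <= target_dbh <= dbh_max:
        return "spans target"
    if dbh_max <= target_dbh / 2:
        return "ends at or below half the target"
    if dbh_min >= 2 * target_dbh:
        return "starts at or above twice the target"
    return "other"


# Hypothetical developmental ranges (cm), evaluated against the 20-cm DBH class
for dbh_range in [(10, 30), (2, 10), (40, 90), (12, 18)]:
    print(dbh_range, classify_developmental_range(*dbh_range, target_dbh=20))
```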
Although prediction envelopes indicated wide C ranges for large-DBH trees, large-tree percent uncertainty was not necessarily higher; in many cases it was less than for very small trees. Even though it initially appears that creating large tree equations might be the most useful way to reduce uncertainty, that may not be the case. When considering how best to reduce uncertainty from model selection, the underlying NWOR DBH distribution should also be considered. Currently most NWOR C is found in trees between 20 and 70 cm DBH, and large trees are rare. Therefore, a more practical way to reduce uncertainty would be to better identify how to assign these mid-range trees to an appropriate equation. However, for areas where the DBH distribution is shifted toward larger DBHs, extrapolation would introduce more uncertainty in aggregated totals, and investment in determining better-predicting equations would be more worthwhile.
Comparison with other PNW regional estimates and estimates of error
To evaluate various regional estimates of live C stores, one would ideally compare not only the mean estimate, but also the uncertainty bounds. Unfortunately, few studies have produced the latter, and even when they have, some key components contributing to uncertainty have not been considered. We previously presented two alternative estimates to provide context and points of comparison for our estimates: FIA-predicted biomass and the Jenkins et al. general biomass equations.
Both of these estimates were consistent with outputs from this study. Our estimate included only model-selection uncertainty, and the FIA estimate included sampling error that contributed approximately 6% uncertainty (plus limited model-selection uncertainty introduced by our C:biomass conversion factor range used on our foliage, dead branch, and coarse root prediction envelopes). The fact that our midpoint estimates are similar reflects that FIA-selected equations for many species were near the midpoint of target species component prediction envelopes.
The Jenkins et al. 80% bands derived from pseudodata residuals predicted a similar range to our approach 2 positive correlation range. This is hardly surprising given that our approach 2 bears great resemblance to their procedure, except they determined a central tendency whereas we retained the bounds. Applying their 80% regression-residual bounds to their estimate is essentially re-building the bounds of the equations they incorporated. The Jenkins et al. equations are simple to apply and are national in scope; consequently they may be widely used for estimation. For four of the NWOR target species as well as the NWOR total, these equations produced midpoints and ranges similar to those in our study. Estimates for Picea sitchensis, however, were considerably higher in all our approaches. This highlights that care must be taken to determine how well national biomass estimators predict at a regional level.
Relative uncertainty of error components
When estimating uncertainty in biomass and C estimates, at least four types of errors/uncertainties need to be considered: measurement, sampling, regression, and model selection. Of the four, the first three are best understood. Because they are usually modeled as random errors, region-wide estimates of error are very low. Phillips et al. considered sampling, regression, and measurement errors in FIA volume estimates. Given that they considered only one equation for softwoods and another for hardwoods, they did not address what we term model-selection uncertainty. They determined that measurement error was the smallest error component, accounting for only 0.1% of the overall variance (from the three factors). Our quick estimate of measurement error in NWOR volume indicated that it was also quite a small contributor to overall live-tree C uncertainty. Phillips et al. found that sampling error was the largest error component, accounting for 98.7% of overall variance. Sampling error calculated for NWOR was similarly much larger than measurement error. The overall standard error from the five southeastern states they studied only amounted to about 0.6% of the total volume estimate. We made no calculations of regression error, but had we calculated standard error for NWOR in the manner of Phillips et al., we believe it would be quite low. This calculation method assumed independence between sampling units and/or trees in all cases. However, if even a small amount of systematic error were present, it could yield a large uncertainty when tree volumes were aggregated. Were the large potential systematic errors arising from model selection choice incorporated, we suspect overall uncertainty would increase by at least an order of magnitude.
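The contrast between independent and systematic errors under aggregation can be made concrete with a simplified calculation. The Python sketch below assumes n trees of equal expected size, each with an independent relative error (coefficient of variation), plus a relative bias shared by all trees; under those assumptions the random part of the total's relative error shrinks as 1/sqrt(n) while the shared bias does not shrink at all. The input values are hypothetical and the function name is ours.

```python
import math
from typing import Tuple


def relative_error_of_total(n_trees: int, random_cv: float, shared_bias: float) -> Tuple[float, float]:
    """Return (random, systematic) relative error of a total summed over n equal-sized trees.

    random_cv   : independent relative error per tree (e.g. 0.30 for 30%)
    shared_bias : relative bias common to every tree (e.g. 0.05 for 5%)
    """
    random_component = random_cv / math.sqrt(n_trees)  # averages out with n
    systematic_component = abs(shared_bias)            # unchanged by n
    return random_component, systematic_component


for n in (100, 10_000, 1_000_000):
    print(n, relative_error_of_total(n, random_cv=0.30, shared_bias=0.05))
```

With these made-up inputs, the random component falls below the shared bias well before 100 trees are aggregated, which is the sense in which a systematic error can dominate a regional total.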
In estimating live C stores for the US, Heath and Smith [20, 21] subjected the FORCARB model to an uncertainty analysis and concluded that uncertainty for total forest C (i.e., live, dead, soil) in US private forests was +/-9% of their 20 petagram (Pg; 1 Pg = 1 × 10^15 g) C estimate for the year 2000. Of nine model parameters examined, the tree volume-to-C conversion factor was second only to soil C in its contribution to the overall uncertainty. Our analysis seems relevant to two of their model parameters: volume and the volume-to-C conversion. FORCARB relies on FIA inventory tree volumes, and Heath and Smith used reported FIA estimates of sampling error to arrive at a +/-5% sampling uncertainty estimate for volume (their uncertainty is expressed as a percentage of the median and represents +/-2 standard errors). This is similar to the +/-6% sampling error for volume estimated from our FIA dataset for NWOR. Their volume-to-C conversion factor was assigned +/-15% uncertainty. This approximated the uncertainty of our two-step volume-to-carbon conversion, which had an estimated range of +/-10% of the midpoint (for stem wood averaged across species). Heath and Smith apparently did not include what we term model-selection uncertainty. Our analysis indicated model-selection uncertainty in NWOR for stem wood volume was 12% (from 22 stem wood volume equations, using approach 2 procedures over DBH classes 10 to 40 cm to allow inclusion of all species). The NWOR model-selection uncertainty for stem wood biomass was 22% (calculated from 44 stem wood biomass equations over the same DBH range). Inclusion of this level of model-selection uncertainty into the FORCARB uncertainty analysis would have likely increased tree C uncertainty and total forest C store uncertainty above the +/-9% they reported for their base model, perhaps to the point where it would exceed soil uncertainty.
Representing model-selection uncertainty
Although sensitivity to model selection has rarely been considered when estimating uncertainty in live-tree volume, biomass, or C stores, our analysis indicated it could be the most significant contributor to uncertainty. In our study, we chose to develop prediction envelopes to represent this facet of uncertainty. The advantage of this approach is that no assumptions about the form or weighting of equations need to be made. Given that the input equations were not part of an overall experimental design and that a variety of equation forms were used, prediction envelopes allow one to use the maximum amount of information. A disadvantage of this approach is that information about central tendencies of the calculation pathways is essentially discarded, so the characterization of model-selection sensitivity is greater than it would be otherwise. Such approaches are also not amenable to statistical analysis. Furthermore, there are the issues of nonadditivity and back-transformation of log-log equations. Nonadditivity occurs when predictions from component equations do not sum to the prediction from an aggregated component equation developed from the same trees, and until recently developers of biomass equations did not pay it much heed, although it was remarked on by biometricians for years [23, 24]. Using a sample of four sets of equations developed for Pseudotsuga menziesii that did not appear to have been constrained to ensure additivity, we found that additivity error for aboveground total biomass (over the developmental DBH range) was nowhere greater than 5%, and averaged -0.04, 1.18, -1.46 and 2.06% overall (equations from [16, 25, 26], and , respectively). Correcting for bias when back-transforming log-log equation predictions is a much-debated issue, with some pointing out that without such a correction, estimates are biased downward.
This was true for a subset of natural-logarithm-transformed volume and biomass equations that we examined, where bias as a percentage of the uncorrected values ranged from 0.7% for stem wood to an astounding 153% for dead branches. Mean biases were 3.19% for stem wood and 8.7% for stem bark (16 equations each). Other researchers contend that correcting the back-transformation introduces its own set of biases. Our largely uncorrected equations (in some cases corrections may have been applied by authors, but this was not clear) may therefore have introduced bias. The ranges of our prediction envelopes, however, were such that we deemed possible additivity and back-transformation biases unremarkable (assuming possible bias of 153% was quite uncommon), and they are in any case unavoidable for anyone using these sets of equations.
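To show how biases of this size can arise, the Python sketch below uses the widely used lognormal correction factor exp(s²/2), where s is the residual standard error of the regression in natural-log units. This is an assumption for illustration: the text does not state which correction was used to compute the reported biases, and the function names are ours. Under this assumption, bias as a percentage of the uncorrected value is 100 × (exp(s²/2) − 1), which can also be inverted to find the s implied by a given bias.

```python
import math


def correction_factor(residual_se_ln: float) -> float:
    """Multiplier applied to a back-transformed ln-ln prediction (assumes lognormal errors)."""
    return math.exp(residual_se_ln ** 2 / 2)


def percent_bias_uncorrected(residual_se_ln: float) -> float:
    """Bias of the uncorrected prediction, as a percentage of the uncorrected value."""
    return 100 * (correction_factor(residual_se_ln) - 1)


def residual_se_for_bias(percent_bias: float) -> float:
    """Residual standard error (ln units) implied by a given percent bias."""
    return math.sqrt(2 * math.log(1 + percent_bias / 100))


for bias in (0.7, 3.19, 8.7, 153.0):
    s = residual_se_for_bias(bias)
    print(f"bias {bias}% -> implied residual SE {s:.3f} (ln units)")
```

Because the factor grows with the square of s, components with very scattered data, such as dead branches, can carry a far larger correction than stem wood.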
Jenkins et al. pursued an alternative approach to dealing with model-selection sensitivity by developing general equations. They presented a set of national biomass equations, grouped by species similarity, that were based on a library of previously published regional and local equations. Lacking the original tree-level data, they created their new equations from pseudodata generated from the equation library. This approach utilizes the central tendency information inherent in the equation library but essentially discards the outer bounds and introduces various problems related to using pseudodata to generate equations. If the general equations truly represent the central tendency, they should consistently predict total live C stores for geographic areas comparable to those on which the library of equations was based. However, the use of the general equation may increase uncertainty, particularly when analysis is aimed at specific species or subregions dominated by particular species. Case and Hall, working with boreal forest data from west-central Canada, determined that local and generalized regional biomass equations provided acceptable site-level estimates but that generalized national equations produced considerably higher average prediction errors at the site level. Mean prediction biases from national equation predictions were also statistically different from local and regional ones for 5 of 10 species. It is currently impossible to determine if the Jenkins et al. equations produce unbiased estimates at regional/national levels (as we have little truth against which to compare estimates); however, when estimating for some regions, such as ponderosa pine forests in the interior West, using the all-pine equation that is constructed from equations developed for not only Pinus ponderosa (ponderosa pine) but also for the comparatively faster-growing Pinus taeda (loblolly pine) and Pinus elliottii (slash pine), bias is likely.
The difference in C estimates for Picea sitchensis between this study and the Jenkins et al.-based estimate indicates potential bias, possibly arising from their grouping of Picea sitchensis with other Picea that have shorter growth habits and the relative scarcity of Picea sitchensis sources compared with those of other Picea species (i.e., 2 for Picea sitchensis versus 25 sources for 5 other Picea species). Bias would be unlikely if the equations were included in proportion to the abundance of tree species and area represented in the area to be analyzed. Inclusion of too many equations over a part of the DBH range or from a particular type of site or species could weight the overall equation in that direction, even if that type were rare.
Our approach and the Jenkins et al. method both relied on existing biomass equations. However, there are major problems with existing equations [9, 13, 30]. These include inappropriate or nonrepresentative selection of trees in the development of equations, limited sample sizes (especially when large trees or difficult-to-measure components such as roots are involved), and limited sample DBH ranges. Equations (especially for volume) come in a variety of forms, and there is no consistent partitioning of trees into components. Even for a major component such as stem wood, equations differ in assumptions of stump heights and top diameter, complicating comparisons among models. Crowns are notorious for the variability in the approaches to their division into components, with branches classified or grouped at varying diameter breakpoints, foliage either included with the smallest branch class or not, and branches and foliage split into live and dead classes or not. Furthermore, component equations relying on nonlinear transformations of data are nonadditive. Use of different equation forms for different components (unless special procedures are observed during equation development) also contributes to the nonadditivity of component equations. Statistical information necessary to compare equations is rarely presented, and few publications include the necessary information to create regression prediction intervals, so generation of pseudodata representing the true level of variation in predictions is not possible. Data describing site and sample characteristics lack consistency as well, making comparisons among equations based on these characteristics problematic. Raw data are rarely presented, but as Jenkins et al. observed, this would be helpful to researchers developing new generalized equations.
Authors of some recent North American equations have borne this in mind and provide, if not data, then at least more complete regression statistics and component equations that are additive.
Reducing uncertainty due to model selection
The considerable expense of developing new biomass equations, and the urgency of arriving at a system that can accurately characterize forest C stores and flux in support of C management, argue for using, to the maximum extent practicable, the biomass equations and data that have already been developed. Unfortunately, these equations and data were typically developed to represent specific geographical areas, ranges of tree sizes, or tree components, and there is no practical way to objectively assess bias of the existing systems of equations. Although it would be unrealistic to set aside all existing equations and begin anew, an effort to more systematically capture the variability present within and between tree species would contribute to understanding the scope of the potential bias and uncertainty introduced by model selection. Such efforts have been undertaken in some regions (e.g., manipulations of the Canada ENFOR data) and are a logical starting point.
There are several ways to reduce uncertainty owing to model selection. Our analysis indicates that subdividing biomass equations would reduce uncertainty, but to succeed, development of a consistent and robust method for choosing the best equation for each tree is needed. To some degree, equations can be selected based on geography (e.g., equations for Douglas-fir in coastal versus interior British Columbia). For example, PNW FIA already applies different equations for a quarter of the conifer species in their database depending on whether the tree is located east or west of the Cascades. However, equations developed from stands growing in proximity and apparently similar physiographic situations can also yield differing predictions, although closely matching the DBH range of the target to the developmental population may help in choosing an equation with a good fit. Understanding the degree to which local-scale factors control tree form, and the possible influence of genetics, would contribute to better model selection.
Another approach would be to use a biomass equation that is truly general. Inclusion of height in biomass equations is sometimes thought to create a more widely applicable equation, but our examination of equation predictions for Pseudotsuga menziesii, the most-sampled species in the PNW, indicated that there was no more agreement among height-and-DBH-based equations than among DBH-only ones (use of height-and-DBH-based equations might be preferable in limited circumstances, such as managed stands that have not arrived at crown closure). This is probably due to the fact that tree form varies greatly. In the case of excurrent forms (those with a strong central leader), tree forms can range from paraboloids to cones to neiloids. Although few species span this entire range of forms, such differences in tree taper patterns could lead to differences of approximately 50% in volume and biomass between two trees, even when their DBH and height are identical. To some degree, these differences can be accounted for by knowing the species. Inclusion of a form factor into biomass equations may reduce model-selection uncertainties but create other problems. As with height, it would be difficult to determine the form of each tree; therefore, some prediction of form would be required. Moreover, development of efficient ways to quantify form and taper would also be needed, possibly via subsampling mid-height diameters in stands (for excurrent forms) and height to the first major branch (for decurrent forms, those with weak central leaders). The degree to which the uncertainty introduced by such prediction offsets that introduced by model selection would require further investigation.
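The size of the form effect can be checked with idealized geometry. For solids of revolution with the same base area and height, the volume is the base area times the height times a form factor of 1/2 for a paraboloid, 1/3 for a cone, and 1/4 for a neiloid. The Python sketch below is a simplification: it treats the stem as a single solid measured at its base rather than at breast height, real stems combine several forms along their length, and the tree dimensions are hypothetical.

```python
import math

# Geometric form factors for solids of revolution with equal base area and height
FORM_FACTORS = {"paraboloid": 1 / 2, "cone": 1 / 3, "neiloid": 1 / 4}


def stem_volume_m3(base_diameter_cm: float, height_m: float, form: str) -> float:
    """Volume of an idealized stem of the given geometric form."""
    radius_m = base_diameter_cm / 200  # cm diameter -> m radius
    base_area_m2 = math.pi * radius_m ** 2
    return FORM_FACTORS[form] * base_area_m2 * height_m


# Hypothetical tree: 40 cm base diameter, 30 m tall
paraboloid = stem_volume_m3(40, 30, "paraboloid")
for form in FORM_FACTORS:
    volume = stem_volume_m3(40, 30, form)
    print(f"{form}: {volume:.2f} m3, {100 * volume / paraboloid:.0f}% of paraboloid")
```

Under these assumptions a neiloid holds half the volume of a paraboloid with the same diameter and height, which is in line with the roughly 50% difference mentioned above.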
Sensitivity of NWOR live-tree estimates to model selection was substantial at every level examined and varied with the degree of correlation assumed between tree components. Especially considering the potential for this form of uncertainty to introduce bias, it is likely more important than the combined uncertainty introduced via measurement, sampling, and regression. This facet of uncertainty has not been generally appreciated because the full range of available biomass equations has not been factored into estimates of uncertainty; however, it should be considered by those interpreting estimates of live carbon stores and fluxes generated from national or local inventory-based accounting protocols, especially in applications, such as valuing carbon credits, where unbiased estimates are critical. Model-selection uncertainty is not an easily remedied error and may call into question the premise of being able to track forest carbon with the precision and accuracy required to support contemplated offset protocols. Our analysis indicates that the only way to truly reduce uncertainty from model selection is to subdivide the existing biomass equations or to develop an equation form that can predict the range present in existing equations. Our analysis suggests that for the latter solution to succeed, addition of tree height will not work unless some information on tree form is also included.
We considered only the five most commonly occurring tree species in NWOR (the "target species" set) to avoid confounding effects from the high degree of equation substitution employed for less common (and less frequently studied) species. The target species include three conifers, Picea sitchensis (Sitka spruce), Pseudotsuga menziesii (Douglas-fir), and Tsuga heterophylla (western hemlock), and two hardwoods, Acer macrophyllum (bigleaf maple) and Alnus rubra (red alder). They collectively account for 90% of all live trees estimated by the forest inventory to exist in NWOR. What we refer to hereafter as estimates of total live-tree C are, in fact, estimates of C in live trees of these five species.
Sources and criteria for equation selection
We obtained relevant equations for volume and dry biomass from BIOPAK, the Jenkins et al. Comprehensive Database, and other available literature (see Additional files 1 and 2). Equations were deemed relevant if data originated, at least in part, from western British Columbia, Canada, southern coastal Alaska, or from the area west of the Cascade crest in OR and Washington (WA). However, some root and stump equations from the eastern U.S., Canada, and parts of Europe were included owing to the limited number of appropriate local equations for these components. Equations were excluded if they (1) relied on variables other than DBH and height (although equations relying on components we could calculate from DBH and height were allowed), (2) were not accompanied by the range of DBHs used to develop the equation (the developmental DBH range; excepting the Weyerhaeuser stem wood volume equation because it is used by FIA), (3) used stump heights other than 10, 15, or 30 centimeters (cm; the most common values, corresponding roughly to 4, 6, and 12 inches), or (4) did not extend to the top of the stem (excepting equations from ).
h = total tree height in meters,
d = DBH outside bark in cm,
e = the base of natural logarithms, 2.71828...,
b0 = maximum height,
b1 = steepness parameter, and
b2 = curvature parameter.
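The variable definitions above are consistent with a Chapman-Richards-type asymptotic height-diameter curve. The sketch below assumes that form with a 1.37 m breast-height offset; the functional form, the offset, the function name, and the coefficient values are all illustrative assumptions and are not the fitted NWOR values.

```python
import math


def predicted_height_m(dbh_cm: float, b0: float, b1: float, b2: float,
                       breast_height_m: float = 1.37) -> float:
    """Assumed Chapman-Richards-type curve: h = 1.37 + b0 * (1 - e^(b1*d))^b2.

    b0 sets the asymptote above breast height, b1 (negative) controls how
    steeply height approaches it, and b2 controls curvature at small DBH.
    """
    return breast_height_m + b0 * (1.0 - math.exp(b1 * dbh_cm)) ** b2


# Hypothetical coefficients chosen only to show the shape of the curve.
h_small = predicted_height_m(10.0, b0=60.0, b1=-0.02, b2=1.1)
h_large = predicted_height_m(200.0, b0=60.0, b1=-0.02, b2=1.1)
```

With a negative b1, predicted height rises monotonically with DBH and levels off below the asymptote of b0 plus breast height.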
[Table: Northwest Oregon height equations. Surviving entry: 1.0711 (0.0338).]
We used the collected volume and biomass equations to create a total tree C "prediction envelope" for each species. This envelope encompassed the range of possible C values between the uppermost and lowermost predictions given by all possible combinations of the equations. To convert volume to biomass, we used density values from the literature (see Additional files 2 and 3) and retained the lowest and highest values for each species and component combination to create two biomass estimates based on each volume equation. Prediction envelopes were stored as lookup tables containing biomass ranges for each species and component by each 1-cm DBH class.
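A minimal sketch of how such a lookup table could be assembled is given below. The two biomass equations, the volume equation, and the density values are hypothetical placeholders; only the logic (take the lowest and highest prediction in each 1-cm DBH class, and expand each volume equation into a low-density and a high-density biomass estimate) follows the description above.

```python
from typing import Callable, Dict, List, Tuple

Equation = Callable[[float], float]  # maps DBH (cm) to biomass (kg)


def volume_to_biomass(volume_eq: Equation, density_low: float,
                      density_high: float) -> List[Equation]:
    """Turn one volume equation into two biomass equations (low, high density)."""
    return [lambda d: volume_eq(d) * density_low,
            lambda d: volume_eq(d) * density_high]


def build_envelope(equations: List[Equation],
                   dbh_classes: range) -> Dict[int, Tuple[float, float]]:
    """Lookup table: DBH class -> (lowest, highest) prediction among equations."""
    envelope = {}
    for dbh in dbh_classes:
        predictions = [eq(float(dbh)) for eq in equations]
        envelope[dbh] = (min(predictions), max(predictions))
    return envelope


# Hypothetical biomass equations for one species and component.
eq_a = lambda d: 0.10 * d ** 2.4
eq_b = lambda d: 0.05 * d ** 2.6
envelope = build_envelope([eq_a, eq_b], range(1, 101))

# Hypothetical volume equation (m^3) and wood densities (kg per m^3).
vol_eq = lambda d: 0.0002 * d ** 2.5
biomass_low_eq, biomass_high_eq = volume_to_biomass(vol_eq, 400.0, 500.0)
```

In this toy case the two biomass curves cross near 32 cm DBH, so the equation that forms the upper bound at small diameters forms the lower bound at large ones.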
Few equations have been developed using data that encompass the full range of DBHs present in NWOR. Extrapolation of equations beyond the developmental DBH range, although statistically invalid, has been unavoidable for anyone needing to obtain estimates for large trees; the FIA Program, for example, has many large trees in their sample, and in some areas, these account for much of the live tree C. We therefore examined three contrasting approaches: approach 1 used each equation over the entire species DBH range with no corrections; approach 2 used each equation only over its developmental DBH range; approach 3 used a combination of extrapolated and modified equations when required to produce "reasonable" estimates at all DBH classes.
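The difference between approaches 1 and 2 can be sketched as a filter on each equation's developmental DBH range. The class name, the equations, and the ranges below are hypothetical; approach 3 is not sketched because its modifications are specified only in Additional file 4.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class BiomassEquation:
    predict: Callable[[float], float]  # DBH (cm) -> biomass (kg)
    dbh_min_cm: float                  # lower end of developmental range
    dbh_max_cm: float                  # upper end of developmental range


def envelope_at(equations: List[BiomassEquation], dbh_cm: float,
                restrict_to_range: bool) -> Optional[Tuple[float, float]]:
    """Envelope bounds at one DBH.

    restrict_to_range=False mimics approach 1 (every equation everywhere);
    restrict_to_range=True mimics approach 2 (developmental range only).
    Returns None when no equation is usable, i.e. a gap in the envelope.
    """
    usable = [eq for eq in equations
              if not restrict_to_range
              or eq.dbh_min_cm <= dbh_cm <= eq.dbh_max_cm]
    if not usable:
        return None
    predictions = [eq.predict(dbh_cm) for eq in usable]
    return min(predictions), max(predictions)


# Hypothetical equations with different developmental ranges.
equations = [
    BiomassEquation(lambda d: 0.10 * d ** 2.4, 5.0, 60.0),
    BiomassEquation(lambda d: 0.05 * d ** 2.6, 10.0, 120.0),
]
approach1_at_100 = envelope_at(equations, 100.0, restrict_to_range=False)
approach2_at_100 = envelope_at(equations, 100.0, restrict_to_range=True)
approach2_at_150 = envelope_at(equations, 150.0, restrict_to_range=True)
```

Under approach 2 the envelope can collapse to a single equation, or vanish entirely, at diameters that few studies sampled, which is one way gaps in coverage become visible.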
Crown components, especially foliage, were not expected to increase significantly after a tree reached maturity, as assumed by Turner and Long. Therefore we truncated crown component predictions for approach 3 in the middle of the species' NWOR DBH range (as given in ) and applied predicted values at those points to all larger DBHs. However, the only species so modified for the purposes of this analysis was Alnus rubra, starting at the 54 cm DBH class; all other modifications began above 66 cm. Further details of approach 3 methods may be found in Additional file 4.
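The truncation described above amounts to holding a crown prediction constant beyond a chosen DBH. A minimal sketch follows, assuming a hypothetical foliage equation and using the 54 cm class reported for Alnus rubra as the cap; the function name is illustrative.

```python
from typing import Callable


def truncated_crown_biomass(predict: Callable[[float], float],
                            dbh_cm: float, cap_dbh_cm: float) -> float:
    """Crown-component prediction held constant above cap_dbh_cm."""
    return predict(min(dbh_cm, cap_dbh_cm))


# Hypothetical foliage equation (kg); not one of the equations in the study.
foliage_eq = lambda d: 0.02 * d ** 1.8

foliage_at_30 = truncated_crown_biomass(foliage_eq, 30.0, cap_dbh_cm=54.0)
foliage_at_54 = truncated_crown_biomass(foliage_eq, 54.0, cap_dbh_cm=54.0)
foliage_at_80 = truncated_crown_biomass(foliage_eq, 80.0, cap_dbh_cm=54.0)
```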
Conversion of biomass to C
The C content of wood for the target species set ranged from 47.7 to 50.6% of dry biomass, although C content of other components might be significantly different for some species. However, following Gifford's suggestion of using 50 ± 2% for Australian national C estimates, biomass lookup table minima for all components were multiplied by 48% and the maxima by 52% for all species to account for the uncertainty in this conversion factor.
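As a worked illustration of this conversion, the sketch below applies the 48% and 52% factors to one biomass envelope entry. The biomass values are hypothetical.

```python
C_FRACTION_LOW = 0.48   # applied to envelope minima
C_FRACTION_HIGH = 0.52  # applied to envelope maxima


def biomass_envelope_to_carbon(biomass_low_kg: float,
                               biomass_high_kg: float) -> tuple:
    """Convert a (low, high) biomass pair to a (low, high) carbon pair."""
    return biomass_low_kg * C_FRACTION_LOW, biomass_high_kg * C_FRACTION_HIGH


# Hypothetical envelope entry of 100 to 150 kg dry biomass.
carbon_low_kg, carbon_high_kg = biomass_envelope_to_carbon(100.0, 150.0)
```

Because the low factor is paired with the low bound and the high factor with the high bound, the carbon envelope is always relatively wider than the biomass envelope it came from.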
Incorporation of possible correlation between tree components
Species-specific total tree C prediction envelopes were generated via addition and comparison of envelopes for all lesser tree components, following the sequences depicted in Figure 6. Because we summed ranges rather than point estimates, "correlation" between components could differentially affect the width of the envelopes at each addition step. We refer here not to statistically calculated correlations, but to patterns that might occur as trees partition resources. Consider a hypothetical tree of a given diameter, which may have grown taller than others in its DBH class and so has a higher stem wood biomass. Being a larger tree, it might have more branch biomass and root biomass (positive correlation). On the other hand, trees are also known to allocate resources to one component at the expense of the others, so a taller stem might indicate less biomass in the branches and roots (negative correlation). Correlation between all pairs of tree components is unknown, and likely varies, so we devised a method to examine sensitivity using two extreme examples of correlation. In the first, we assumed completely positive correlation at each addition step for all approaches. In the second, we assumed negative correlation at each addition step. We do not consider either option to be particularly realistic; however, we sought to bracket the possibilities, not find a most likely value. (Further methods and an example may be found in Additional file 4.)
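One plausible reading of the two extremes is sketched below. Pairing minimum with minimum under positive correlation, and minimum with maximum under negative correlation, is an assumption made here for illustration; the exact procedure is the one given in Additional file 4, which may differ. The component values are hypothetical.

```python
from typing import Tuple

Envelope = Tuple[float, float]  # (low, high) bounds for one component


def add_envelopes(a: Envelope, b: Envelope, correlation: str) -> Envelope:
    """Add two component envelopes under an assumed extreme correlation.

    positive: a tree high in one component is high in the other,
              so lows add to lows and highs add to highs.
    negative: a tree high in one component is low in the other,
              so each low is paired with the other component's high.
    """
    if correlation == "positive":
        return a[0] + b[0], a[1] + b[1]
    if correlation == "negative":
        first, second = a[0] + b[1], a[1] + b[0]
        return min(first, second), max(first, second)
    raise ValueError("correlation must be 'positive' or 'negative'")


# Hypothetical stem and branch carbon envelopes (kg) at one DBH class.
stem, branch = (10.0, 20.0), (4.0, 6.0)
positive_sum = add_envelopes(stem, branch, "positive")
negative_sum = add_envelopes(stem, branch, "negative")
```

Under this reading the positive case widens the envelope to the sum of the component widths, whereas the negative case narrows it to their difference, so the two results bracket the intermediate possibilities.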
Applying prediction envelopes to inventory data
Total live-tree C prediction envelope values were linked with FIA inventory data to produce a potential live-tree total tree C range for NWOR. Inventory data include tree measurements as well as necessary expansion factors for scaling plot data to county and statewide levels. Appropriate prediction envelope bounds were then multiplied by expansion factors for each tree in the database. We summed the resulting values by species, then summed species totals to produce total live tree C storage bounds for NWOR. Total live tree C storage was calculated for both positive and negative correlation assumptions to assess how sensitive estimates were to correlation of tree components. Our reported uncertainty values represent half the output range and were also expressed as a percentage of the midpoint C estimate. Basing our uncertainty output on the midpoint, or using the midpoint as a point of comparison between approaches, is not meant to imply that it is the most likely value, as this study was designed to examine the possible range of estimates caused by model selection.
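The scaling and uncertainty calculation can be sketched as follows. The record layout, species labels, envelope values, and expansion factors are hypothetical and do not reflect the FIA database schema; summing directly over trees gives the same totals as summing by species first.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TreeRecord:
    species: str             # illustrative label, not an FIA code
    dbh_class_cm: int
    expansion_factor: float  # number of trees this record represents


def regional_bounds(trees: List[TreeRecord],
                    envelopes: Dict[Tuple[str, int], Tuple[float, float]]
                    ) -> Tuple[float, float]:
    """Sum expanded lower and upper envelope bounds over all tree records."""
    low = high = 0.0
    for tree in trees:
        tree_low, tree_high = envelopes[(tree.species, tree.dbh_class_cm)]
        low += tree_low * tree.expansion_factor
        high += tree_high * tree.expansion_factor
    return low, high


def uncertainty(low: float, high: float) -> Tuple[float, float]:
    """Half the output range, and that half-range as a percent of the midpoint."""
    half_range = (high - low) / 2.0
    midpoint = (high + low) / 2.0
    return half_range, 100.0 * half_range / midpoint


# Hypothetical per-tree carbon envelopes (kg) and inventory records.
envelopes = {("PSME", 30): (100.0, 150.0), ("ALRU", 20): (40.0, 60.0)}
trees = [TreeRecord("PSME", 30, 10.0), TreeRecord("ALRU", 20, 5.0)]
low, high = regional_bounds(trees, envelopes)
half_range, percent_of_midpoint = uncertainty(low, high)
```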
Comparison with other Pacific Northwest regional estimates
We compared our NWOR live-tree C ranges from approach 2 with single-point NWOR C estimates produced using biomass equations presented in several separate articles (which we label "single-source" estimates, even if the author(s) incorporated equations developed outside of their own study (e.g., )). Our prediction envelope approach, in contrast, produced what might be called "multiple-source" estimates. For each component at a given DBH, one equation became the lower, and one the upper, bound of our prediction envelope. However, owing to differing equation forms and coefficients, the same equations often were not the bounds over the entire DBH range of a prediction envelope; thus equations from multiple sources could contribute to the bounds of our final total tree C prediction envelope. Single-source estimates for tree total biomass were rare in the literature, but were abundant for aboveground total and stem wood plus bark. Such single-source estimates were not necessarily local equations; some were regional or even national, and a few were national multispecies equations (e.g., ). We undertook two comparisons: the first was limited to total tree, aboveground total, and stem wood plus bark of Pseudotsuga menziesii 26 to 60 cm DBH so as to include as many single-source estimates as possible; the second was limited to a comparison between the FIA and Jenkins et al. estimates but included all target species 3 to 66 cm DBH for tree total C.
Published and Web-posted FIA volume and biomass estimates are often relied upon as a basis for estimating biomass and C (e.g., [3, 4, 40]). However, PNW FIA biomass estimates lack foliage, dead branches, and any trees under 2.5 cm DBH. To compare FIA single-source estimates with our total tree and aboveground C, we added C from our prediction envelopes for missing tree components as a range at each DBH. No correction was made for small trees because our final DBH comparison range was constrained by the limitations imposed by approach 2 and did not extend to such low DBHs. All single-source biomass estimates used a 50% C-to-biomass conversion factor for this comparison only, excepting the components added to the FIA estimate.
To compare our NWOR totals with FIA-based estimates for the 3 to 66 cm DBH range for all target species, it was necessary to fill in some gaps in our prediction envelopes for branch dead, foliage total, and roots coarse with output from approach 3. The Jenkins et al. tree totals also required limited extrapolation of their root equations over a few DBH classes for three species.
Comparison with other estimates of error
Other estimates of error generally include only sampling error (as in FIA reports, e.g., ) and, more rarely, errors generated by measurement and regression. To estimate FIA sampling error for NWOR live-tree C, we examined the most recent FIA report for the western OR periodic inventory and obtained one standard error (SE) for a range of volume estimates. Volume was multiplied by the average density (weighted based on stem wood volume in NWOR, using FIA densities) of the target species set to convert to biomass, and by a C:biomass conversion factor of 50% to obtain sampling error in C. An approximate 95% confidence interval for the appropriate C value was obtained by doubling the associated SE.
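The arithmetic of this conversion is sketched below. The species labels, volumes, densities, and the volume SE are hypothetical values chosen so that the result can be checked by hand; only the sequence of steps follows the text.

```python
from typing import Dict


def weighted_mean_density(stem_volume_m3: Dict[str, float],
                          density_kg_m3: Dict[str, float]) -> float:
    """Average wood density weighted by each species' stem wood volume."""
    total_volume = sum(stem_volume_m3.values())
    weighted_sum = sum(stem_volume_m3[sp] * density_kg_m3[sp]
                       for sp in stem_volume_m3)
    return weighted_sum / total_volume


def carbon_ci95_half_width_kg(volume_se_m3: float, density_kg_m3: float,
                              c_fraction: float = 0.5) -> float:
    """Volume SE -> biomass SE -> carbon SE, doubled for an approximate 95% CI."""
    carbon_se_kg = volume_se_m3 * density_kg_m3 * c_fraction
    return 2.0 * carbon_se_kg


# Hypothetical inputs: two species, volumes in m^3, densities in kg per m^3.
volumes = {"species_a": 3.0, "species_b": 1.0}
densities = {"species_a": 450.0, "species_b": 370.0}
mean_density = weighted_mean_density(volumes, densities)
ci_half_width = carbon_ci95_half_width_kg(1000.0, mean_density)
```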
Diameter measurement variation is generally <2% of diameter (expressed as a 95% confidence interval; ). To obtain an approximation of the magnitude of diameter measurement error for FIA-reported volume, we followed the example of Phillips et al. and took a simple equation form, calculated standard error from a 2% DBH measurement error for each tree in the database, applied expansion factors, and summed to the NWOR level. This assumed that a simple equation form was used for each tree, that there was no measurement error in height, and that errors were independent.
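A rough sketch of this kind of propagation follows. It assumes a power-law form, y = a * DBH^b, as the "simple equation form", converts the 2% (95% interval) diameter error to a standard error of about 1% of DBH, uses a first-order (delta-method) approximation for each tree, and combines independent errors in quadrature. The equation form, coefficients, and function names are assumptions for illustration; the study's actual form is not stated here.

```python
import math
from typing import List, Tuple


def tree_se(dbh_cm: float, a: float, b: float,
            relative_dbh_se: float = 0.01) -> float:
    """First-order SE of y = a * DBH^b caused by DBH measurement error.

    dy/dDBH = a * b * DBH^(b - 1), so SE_y is approximately
    b * y * (SE_DBH / DBH).
    """
    predicted = a * dbh_cm ** b
    return b * predicted * relative_dbh_se


def regional_se(trees: List[Tuple[float, float]], a: float, b: float) -> float:
    """Combine expanded per-tree SEs, assuming independent errors.

    Each tuple is (dbh_cm, expansion_factor); variances add, SEs do not.
    """
    variance = sum((ef * tree_se(dbh, a, b)) ** 2 for dbh, ef in trees)
    return math.sqrt(variance)


# Hypothetical coefficients and two sample trees with expansion factors 3 and 4.
single_tree_se = tree_se(10.0, a=1.0, b=2.0)
total_se = regional_se([(10.0, 3.0), (10.0, 4.0)], a=1.0, b=2.0)
```

Because independent errors add in quadrature, the regional SE grows more slowly than the regional total, which is one reason measurement error tends to be small relative to the total when many trees are summed.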
Jenkins et al. reported bounds that contained 80% of their residuals for each multi-species equation. These residuals were from pseudodata, so they do not represent exactly what traditional regression residuals do, but we wished to see how such bounds would compare to the output of our analysis. As a quick test, we used the Jenkins et al. NWOR live-tree C from our second comparison in the previous section and calculated a simple range for each species using their data, then summed the bounds for each species to the NWOR level. Their reported values only apply to their aboveground equations, but we applied them to the sum of the aboveground and root biomass equations.
Strategies to reduce model-selection uncertainty
Comparison of equation forms
One way to reduce uncertainty would be to determine which, if any, equations were more accurate predictors. It is sometimes assumed that by accounting for height variation, so-called standard equations (those that incorporate both DBH and height as independent variables) are more widely applicable than local (DBH-only) ones. We tested this by plotting Pseudotsuga menziesii stem wood biomass predicted by several standard equations (equation numbers 157, 204, 1536, 1932, and 2692 from [16, 26, 18, 31], and ; see equations in Additional file 1) against the product of DBH² and height. We expected that if incorporation of height into the regression reduced uncertainty, standard equation predictions would converge more when plotted against the product of DBH² and height (a common variable in standard equations) than when plotted against only DBH or height.
Assigning equations to subpopulations
If volume and biomass equations could be accurately assigned to individual trees, uncertainty of the live-tree C estimate should decrease. We tested our knowledge about regression equation assignment by creating a scenario in which hypothetical equations were represented by dividing the total tree C envelope into sub-envelopes. Each hypothetical equation accounted for an equal proportion of the total tree C envelope and was assigned to an equal number of trees. For each species and DBH class, the total tree C envelope was divided into 2, 3, 4, 5, or 10 smaller envelopes of identical width, and trees were partitioned into 2, 3, 4, 5, or 10 equal-sized groups. Then we applied the appropriate number of trees to the new sets of C bounds for that species to obtain NWOR totals for each hypothetical equation, then summed the resulting minima and maxima to achieve the new NWOR live-tree C range.
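The subdivision scenario can be sketched for a single species and DBH class as below. The envelope bounds and tree count are hypothetical; the function splits the envelope into k sub-envelopes of equal width, assigns an equal share of trees to each, and sums the resulting minima and maxima.

```python
from typing import Tuple


def subdivided_bounds(low: float, high: float, n_trees: float,
                      k: int) -> Tuple[float, float]:
    """Total C bounds when the envelope is split into k equal sub-envelopes.

    Sub-envelope i spans [low + i*width, low + (i+1)*width] and is applied
    to n_trees / k trees.
    """
    width = (high - low) / k
    trees_per_group = n_trees / k
    total_low = sum((low + i * width) * trees_per_group for i in range(k))
    total_high = sum((low + (i + 1) * width) * trees_per_group
                     for i in range(k))
    return total_low, total_high


# Hypothetical class: envelope of 100 to 200 kg C per tree, 1000 trees.
bounds_k1 = subdivided_bounds(100.0, 200.0, 1000.0, 1)
bounds_k2 = subdivided_bounds(100.0, 200.0, 1000.0, 2)
bounds_k4 = subdivided_bounds(100.0, 200.0, 1000.0, 4)
```

In this idealized setup the width of the total range shrinks in proportion to 1/k while the midpoint is unchanged, which shows why accurate assignment of equations to subpopulations would reduce the uncertainty.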
Funding and support for this research was provided by the Forest Inventory and Analysis Program of the Pacific Northwest Research Station and the Richardson Endowment of Oregon State University. The PNW FIA Program also loaned computer equipment and software.
The authors thank L. Ganio and V.J. Monleon for advice, J.T. Melson for supplemental computer programming, K. Waddell for providing initial tutoring on the FIA inventory database, all the PNW FIA analysts for their continuing assistance, and the FIA and Pacific Northwest Region inventory crews for data collection. The authors are also grateful to three anonymous reviewers whose suggestions improved this manuscript.
- IPCC: Climate Change 2007: the Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge and New York: Cambridge University Press; 2007.
- Schlesinger WH: Carbon trading. Science 2006, 314: 1217. 10.1126/science.1137177
- Birdsey RA: Carbon Storage and Accumulation in United States Forest Ecosystems. Radnor: USDA Forest Service Northeastern Forest Experiment Station; 1992. Gen Tech Rep WO-59
- Turner DP, Koerper GJ, Harmon ME, Lee JJ: A carbon budget for forests of the conterminous United States. Ecol Appl 1995, 5: 421–436. 10.2307/1942033
- Phillips DL, Brown SL, Schroeder PE, Birdsey RA: Toward error analysis of large-scale forest carbon budgets. Global Ecol Biogeogr 2000, 9: 305–313. 10.1046/j.1365-2699.2000.00197.x
- Harmon ME, Phillips DL, Battles JJ, Rassweiler A, Hall ROJ, Lauenroth WK: Quantifying uncertainty in net primary production measurements. In Principles and Standards for Measuring Net Primary Production in Long-Term Ecological Studies. Edited by: Fahey TJ, Knapp AK. Oxford: Oxford University Press; 2007:238–260.
- Azuma DL, Bednar LF, Hiserote BA, Veneklase CF: Timber Resource Statistics for Western Oregon, 1997. Portland: USDA Forest Service Pacific Northwest Research Station; 2002. Resour Bull PNW-RB-237
- Gholz HL, Grier CC, Campbell AG, Brown AT: Equations for Estimating Biomass and Leaf Area of Plants in the Pacific Northwest. Corvallis: Oregon State University Forest Research Laboratory; 1979. Res Pap 41
- Tritton LM, Hornbeck JW: Biomass Equations for Major Tree Species of the Northeast. Durham: USDA Forest Service Northeastern Forest Experiment Station; 1982. Gen Tech Rep NE-69
- St.Clair JB: Family difference in equations for predicting biomass and leaf area in Douglas-fir (Pseudotsuga menziesii var. menziesii). For Sci 1993, 39: 743–755.
- Grigal DF, Kernik LK: Generality of black spruce biomass estimation equations. Can J For Res 1984, 14: 468–470. 10.1139/x84-085
- Hiserote B, Waddell K: PNWFIA IDB: the integrated database 1.4. Portland: USDA Forest Service Pacific Northwest Research Station; 2004. [MS Access database]
- Jenkins JC, Chojnacky DC, Heath LS, Birdsey RA: National-scale biomass estimators for United States tree species. For Sci 2003, 49: 12–35.
- Harmon ME, Garman SL, Ferrell WK: Modeling historical patterns of tree utilization in the Pacific Northwest: carbon sequestration implications. Ecol Appl 1996, 6: 641–652. 10.2307/2269398
- Grier CC, Logan RS: Old-growth Pseudotsuga menziesii communities of a western Oregon watershed: biomass distribution and production budgets. Ecol Monogr 1977, 47: 373–400. 10.2307/1942174
- Shaw DL: Biomass equations for Douglas-fir, western hemlock, and red cedar in Washington and Oregon. In Forest Resource Inventories. Edited by: Frayer WE. Fort Collins: Colorado State University; 1979:763–781.
- Bormann BT: Diameter-based biomass regression models ignore large sapwood-related variation in Sitka spruce. Can J For Res 1990, 20: 1098–1104. 10.1139/x90-145
- Means JE, Hansen HA, Koerper GJ, Alaback PB, Klopsch MW: Software for Computing Plant Biomass-BIOPAK Users Guide. Portland: USDA Forest Service Pacific Northwest Research Station; 1994. Gen Tech Rep PNW-GTR-340
- Gertner GZ: The sensitivity of measurement error in stand volume estimation. Can J For Res 1990, 20: 800–804. 10.1139/x90-105
- Heath LS, Smith JE: An assessment of uncertainty in forest carbon budget projections. Environ Sci Policy 2000, 3: 73–82. 10.1016/S1462-9011(00)00075-7
- Smith JE, Heath LS: Identifying influences on model uncertainty: an application using a forest carbon budget model. Environ Manage 2001, 27: 253–267. 10.1007/s002670010147
- Lambert MC, Ung CH, Raulier F: Canadian national tree aboveground biomass equations. Can J For Res 2005, 35: 1996–2018. 10.1139/x05-112
- Cunia T: Use of dummy variable techniques in the estimation of biomass regressions. In Estimating Tree Biomass Regressions and Their Error: Proceedings of the Workshop on Tree Biomass and Regression Error of Forest Inventory Estimates; May 26–30, 1986; Syracuse, NY. Volume 87. Edited by: Wharton EH, Cunia T. Broomall: USDA Forest Service; 37–48. NE-GTR-117
- Parresol BR: Additivity of nonlinear biomass equations. Can J For Res 2001, 31: 865–878. 10.1139/cjfr-31-5-865
- Barclay HJ, Pang PC, Pollard DFW: Aboveground biomass distribution within trees and stands in thinned and fertilized Douglas-fir. Can J For Res 1986, 16: 438–442. 10.1139/x86-080
- Standish JT, Manning GH, Demaerschalk JP: Development of Biomass Equations for British Columbia Tree Species. Vancouver: Canadian Forestry Service Pacific Forest Research Centre; 1985. Rep BC-X-264
- Espinosa Bancalari MA, Perry DA: Distribution and increment of biomass in adjacent young Douglas-fir stands with different early growth rates. Can J For Res 1987, 17: 722–730. 10.1139/x87-115
- Baskerville GL: Use of logarithmic regression in the estimation of plant biomass. Can J For Res 1972, 2: 49–53. 10.1139/x72-009
- Case BS, Hall RJ: Assessing prediction errors of generalized tree biomass and volume equations for the boreal forest region of west-central Canada. Can J For Res 2008, 38: 878–889. 10.1139/X07-212
- Cunia T: On tree biomass tables and regression: Some statistical comments. In Forest Resource Inventories. Edited by: Frayer WE. Fort Collins, CO: Colorado State University; 1979:629–642.
- Kurucz J: Component weights of Douglas-fir, western hemlock, and western red cedar biomass for simulation of amount and distribution of forest fuels. In MS thesis. University of British Columbia, Forestry Department; 1969.
- Pitt DG, Bell FW: Effects of stand tending on the estimation of aboveground biomass of planted juvenile white spruce. Can J For Res 2004, 34: 649–658. 10.1139/x03-234
- Franklin JF, Dyrness CT: Natural Vegetation of Oregon and Washington. Corvallis: Oregon State University Press; 1988.
- Jenkins JC, Chojnacky DC, Heath LS, Birdsey RA: Comprehensive Database of Diameter-Based Biomass Regressions for North American Tree Species. Newtown Square: USDA Forest Service Northeastern Research Station; 2004. Gen Tech Rep GTR-NE-319
- Brackett M: Notes on Tarif Tree Volume Computation. Olympia: State of Washington Department of Natural Resources; 1977. Resour Manage Rep No. 24 (Eqn. No. 4)
- Garman SL, Acker SA, Ohmann JL, Spies TA: Asymptotic Height-Diameter Equations for Twenty-Four Tree Species in Western Oregon. Corvallis: Oregon State University Forest Research Laboratory; 1995. Res Contrib 10
- SAS Institute: The SAS system 8.1. Cary: The SAS Institute, Inc; 2000.
- Lamlom SH, Savidge RA: A reassessment of carbon content in wood: variation within and between 41 North American species. Biomass and Bioenergy 2003, 25: 381–388. 10.1016/S0961-9534(03)00033-3
- Gifford RM: Carbon Contents of Above-Ground Tissues of Forest and Woodland Trees. Canberra: Australian Greenhouse Office; 2000. National Carbon Accounting System Tech Rep No. 22
- Cost ND, Howard JO, Mead B, McWilliams WH, Smith WB, Van Hooser DD, Wharton EH: The Forest Biomass Resource of the United States. Washington, DC: USDA Forest Service; 1990. Gen Tech Rep WO-57
- Kloppel B, Harmon ME, Fahey TJ: Estimating ANPP in forest dominated ecosystems. In Principles and Standards for Measuring Net Primary Production in Long-Term Ecological Studies. Edited by: Fahey TJ, Knapp AK. Oxford: Oxford University Press; 2007:63–81.
- Ung CH, Bernier P, Guo XJ: Canadian national biomass equations: new parameter estimates that include British Columbia data. Can J For Res 2008, 38: 1123–1132. 10.1139/X07-224
- U.S. Environmental Protection Agency: Level III Ecoregions of Oregon. U.S. EPA Office of Research and Development National Health and Environmental Effects Research Laboratory; [ftp://ftp.epa.gov/wed/ecoregions/or/or_eco_l3.zip] [spatial data file]
- Oregon Gap Analysis Program: Forestland. [http://navigator.state.or.us/sdl/data/shapefile/k250/forestland.zip] [spatial data file]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.