
Forecasts using expressions like those in equations (1) or (2) can be produced using an iterated or a direct approach. For the 1-step iterated approach, the next value of the time series (T=1 in the equations) is forecast. This is then used as a "known" value at time step t+1, and the model is re-applied to forecast the next time step (t+2). This process is repeated T times until the desired lead time for the forecast is reached. Only the existing data are used for fitting the 1-step-ahead forecasting function f(.); the estimated values of x(t+1), x(t+2), etc., are used only to compute new iterates and not to re-fit the function f(.). The iterated approach is consequently similar to the traditional autoregressive modeling approach. In the direct method, separate 1-step, 2-step, ..., T-step-ahead models are fit to the data and are directly applied to generate 1-, 2-, ..., T-step-ahead forecasts. The two forecasting methods can provide different results depending on the relative signal-to-noise ratio (the magnitude of the variance of the error term e(t) relative to the variance of x(t)), and on local variations in predictability that depend on the nonlinearity of the underlying f(.). Both methods of forecasting were evaluated in a cross-validated testing mode with the Devils Lake data: the models were fit to selected portions of the data and tested on the remainder.
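
The difference between the two schemes can be sketched in a few lines of Python. Here a simple least-squares linear map stands in for the fitted function f(.); the function names and the linear form are illustrative only, not the MARS models used in the study.

```python
import numpy as np

def fit_linear_ar1(x):
    """Fit a 1-step-ahead linear map x[t+1] ~ a*x[t] + b by least squares.
    A stand-in for the (generally nonlinear) forecasting function f(.)."""
    a, b = np.polyfit(x[:-1], x[1:], 1)
    return lambda v: a * v + b

def iterated_forecast(x, T):
    """Iterated approach: fit one 1-step model, then feed its own output
    back in T times. f(.) is never re-fit on the estimated values."""
    f = fit_linear_ar1(x)
    v = x[-1]
    for _ in range(T):
        v = f(v)
    return v

def direct_forecast(x, T):
    """Direct approach: fit a separate T-step-ahead model
    x[t+T] ~ a*x[t] + b and apply it once."""
    a, b = np.polyfit(x[:-T], x[T:], 1)
    return a * x[-1] + b
```

On a noise-free series the two coincide; with noise and nonlinearity they can diverge, which is why both were evaluated against the Devils Lake record.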

For any candidate set of predictors in a particular fitting exercise, the predictors retained in the model, as well as the complexity of the model (e.g., the number and placement of knots for the regression spline), are selected using traditional statistical criteria (e.g., Generalized Cross Validation, GCV, and the Schwarz Criterion, SC). The use of logarithmic and square root transforms of the Devils Lake volume was also explored in the model building process. Suitably chosen predictors with different intrinsic time scales of fluctuation (e.g., interannual to decadal for ocean temperatures and seasonal for local precipitation) can potentially be used to reconstruct the short- and long-run dynamics of Devils Lake. A variety of numerical algorithms (e.g., spline regression, locally weighted polynomial regression, and neural networks) were explored for estimating f(.). Multivariate adaptive regression splines (Friedman, 1991), encoded in a Windows 95 application (Ames, 1998) that focuses on time series model building and forecasting, were used in the work reported here.
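
As a rough illustration of how a criterion like GCV trades fit against complexity, the sketch below scores polynomial fits of increasing degree on a noisy quadratic. The penalty is the standard GCV expression, with the raw parameter count standing in for MARS's effective number of parameters; this is illustrative, not the exact MARS/GCV implementation used in the study.

```python
import numpy as np

def gcv_score(y, y_hat, n_params):
    """Generalized Cross Validation score: mean squared error inflated by
    a complexity penalty. Lower is better."""
    n = len(y)
    mse = np.mean((y - y_hat) ** 2)
    return mse / (1.0 - n_params / n) ** 2

# Score fits of increasing complexity on a noisy quadratic signal.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
y = 1.0 + 2.0 * t - 3.0 * t**2 + rng.normal(0.0, 0.05, t.size)
scores = {deg: gcv_score(y, np.polyval(np.polyfit(t, y, deg), t), deg + 1)
          for deg in (1, 2, 6)}
```

The underfit line (degree 1) is penalized by its large residuals, while an overfit polynomial gains little in fit and pays the complexity penalty.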

Pre-screening of potential predictors:

One interesting implication of Takens' theorem is that if multiple, lagged variables are used to reconstruct the state space of a dynamical system, there is a potential for significant coordinate redundancy, since it is conceptually possible to reconstruct the state space by lagging any one of the state variables. The statistical criteria we used to select predictors seek to limit such redundancies and their effects on the forecast scheme. However, as the number of potential predictors, and hence the number of choices facing the statistical criteria, increases, there is increasing potential for model mis-specification. Consequently, it is important to pre-screen the potential predictor set before a model such as (2) is fit to the data. We used the correlative analyses described in the preceding section and fit a set of candidate models with different subsets of potential predictors. Efforts were made to include potential predictors in each candidate set that span a range of intrinsic time scales of variability.
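
A lagged predictor matrix of the kind implied by a Takens-style reconstruction can be assembled as below; the helper name and interface are hypothetical, for illustration only.

```python
import numpy as np

def lag_embed(series, lags):
    """Build a predictor matrix from lagged copies of one or more series.
    `series` maps a name to a 1-D sequence; `lags` maps the same name to
    the list of lags (in time steps) to include as columns."""
    n = min(len(s) for s in series.values())
    max_lag = max(l for ls in lags.values() for l in ls)
    cols, names = [], []
    for name, ls in lags.items():
        s = np.asarray(series[name])
        for l in ls:
            # row i corresponds to target time t = max_lag + i,
            # so this column holds s[t - l]
            cols.append(s[max_lag - l : n - l])
            names.append(f"{name}(t-{l})")
    return np.column_stack(cols), names
```

Lagging several variables this way quickly produces highly redundant columns, which is exactly the coordinate redundancy that the pre-screening and the GCV/SC criteria are meant to control.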

After the appropriate transformations are applied, a series of trial forecasts is performed to identify the most important predictors. The predictors considered were the PDO, NAO, and NINO3 indices, the five SST areas indicated in Figure 1, and the monthly precipitation anomalies for climate division 3 of North Dakota (PCP). The cumulative sum of each predictor was also considered as a predictor (Corradi, 1995). Several MARS models were fit using different combinations of these predictors at several different lead times and from different starting dates in the historical record. The most important predictors identified were the PDO index and the SEC area of SST. The PCP predictor is important in certain cases. Often, models that used just the time history of Devils Lake performed as well as those that used climate predictors.
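
The cumulative-sum predictor is simply the running sum of a series' anomalies about its long-term mean, which turns month-to-month noise into a smoother record of persistent wet or dry regimes. A minimal sketch:

```python
import numpy as np

def cumulative_anomaly(series):
    """Cumulative sum of anomalies from the long-term mean, used as an
    additional low-frequency predictor (cf. Corradi, 1995)."""
    x = np.asarray(series, dtype=float)
    return np.cumsum(x - x.mean())
```

By construction the final value is zero and intermediate values track sustained departures from the mean, which is why cumulative sums of climate indices can carry interannual-to-decadal information that the raw monthly values do not.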

Validation forecasts and outlook for future Devils Lake levels

Once these predictors were identified, iterated and direct forecasts at four different lead times were made to explore the predictability of the DL volume series in the historical record. Comparisons for direct forecasts are shown in Figure 19 for: (a) the period of relatively steady lake volume beginning in 1981, (b) the 1987 transition to decreasing lake volume, (c) the increasing lake volume from 1997-1999, and (d) a blind outlook for future lake levels. We consider four different lead times: 12, 18, 24, and 36 months. The potential predictors provided to the model for these forecasts were the past volumes of Devils Lake (with a square root transform), PDO, SEC, and North Dakota climate division 3 precipitation. Cumulative sums of anomalies from the long-term average were used for all predictors except Devils Lake.

Figure 19. Forecasts of Devils Lake volume, converted to levels, starting at various times and for the different lead times given in the legend. These are all direct forecasts, based on models fit using only data prior to the start date of the forecast. For a 36-month-lead forecast starting in January 1981, only data up to January 1978 (i.e., 36 months prior) is used for model fitting; the subsequent months' data are used to generate the forecast with the fitted model, so that at the end of the 36-month forecast, data from December 1980 would be used. MARS chooses different predictors from the candidate set, and there was considerable variation in the predictors used for models fit at different times and for different leads. For example, the 36-month-ahead forecast model fit to data up to May 1999 uses Devils Lake (t-36), Cum. Sum SEC (t-36, t-72, t-84), Cum. Sum PDO (t-84), and North Dakota climate division precipitation (t-84) as predictors.

Considerable variation was noted in the fitted models for different lead times and for different starting times. One finding was that using precipitation (PCP) as a predictor can sometimes improve the longer lead-time forecasts. The 24- and 36-month-lead forecasts mentioned above used PCP as a predictor; the 24-month forecast without PCP failed to predict the decrease after July 1997 (Figure 19c). Using PCP as a predictor also improved the 36-month-lead forecast starting from January 1997. The forecast made without PCP was too extreme, over-predicting the volume for January 1999 by 207,000 acre-feet, whereas with PCP included the predicted volume was within 12,000 acre-feet (0.6%) of the correct value.

The outlook for future Devils Lake levels (up to mid-2002) indicates decreasing lake levels. This is a rather surprising forecast given the recent increases. However, the outlook agrees with long-term forecasts of Upper Mississippi River (UMR) streamflow, which indicate that 2000 will be near-normal but that 2001 will be below-normal (Baldwin, 1998, also available at: http://pub.uwrl.usu.edu/~cbald/dlake/cmon-new2-fcast-updated.gif). We compare these forecasts with those of the UMR since the UMR forecasts have been demonstrated to be more accurate than direct forecasts of Devils Lake, and they provide a useful validation.

Bayesian time series forecasting methodology

Wiche and Vecchia (1995) presented an effective implementation of established, stationary, hydrologic time series analysis methods for the analysis of Devils Lake volumes. Unfortunately, such methods can have a hard time reproducing features such as the recent rise of Devils Lake, even when parameter uncertainty is considered. In addressing our second objective for forecasting, the determination of long-run lake volume probabilities conditional on the current state, we considered several alternatives to the Wiche and Vecchia work. These included Fractionally Integrated Autoregressive Moving Average (ARFIMA) models, and a Bayesian autoregressive modeling approach (ARCOMP) that considers uncertainty in both the model order and the model coefficients. ARCOMP is a relatively new approach, due to Huerta and West (1999), that admits certain long memory and quasi-periodic sub-processes. A direct application of the ARFIMA model to the monthly Devils Lake volume (log transform) time series leads to the selection by AIC of an (AR=5, d=0.32, MA=1) model, with model coefficients (AR: 1.83, -0.97, 0.08, -0.004, 0.068; MA: 0.73). The forecasts from this model tended to the mean of the series, and were unsuccessful for the 1990s. The ARCOMP procedure is described below.
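
The "FI" in ARFIMA is the fractional-differencing operator (1 - B)^d, whose binomial expansion has slowly (hyperbolically) decaying weights; this is the source of the model's long memory. A sketch of the expansion, using the d = 0.32 selected by AIC above:

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n weights of the binomial expansion of (1 - B)^d,
    via the recursion w[k] = w[k-1] * (k - 1 - d) / k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

# d = 0.32, as selected by AIC for the Devils Lake series
w = frac_diff_weights(0.32, 6)
```

For 0 < d < 0.5 the weights decay roughly like k^(-1-d), far slower than the geometric decay of a finite AR model, which is what lets ARFIMA represent long-range dependence.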

Consider a univariate autoregressive process of order p, AR(p):

x_t = \sum_{i=1}^{p} \phi_i \, x_{t-i} + e_t \qquad (3)

where the φ_i are the autoregressive coefficients for lag i, and e_t is an independent noise process.

This process can also be written as:

x_t = \sum_{j=1}^{C} z_{tj} + \sum_{j=2C+1}^{p} a_{tj}, \qquad
z_{tj} = \frac{d_j + e_j B}{1 - 2 r_j \cos(\omega_j) B + r_j^2 B^2} \, e_t, \qquad
a_{tj} = \frac{b_j}{1 - r_j B} \, e_t \qquad (4)

where B is the backshift operator; the α_j are the roots of the characteristic polynomial associated with the AR(p) process, comprising the R = p - 2C real roots r_j, for j = 2C+1, ..., p, and the 2C complex roots r_j e^{±iω_j}, for j = 1, ..., 2C, corresponding to quasi-periodic processes with frequency ω_j; z_tj and a_tj are latent processes corresponding to the complex and real roots, respectively; and b_j, d_j, and e_j are real constants.

Huerta and West note that (a) state space models can be written as ARMA(p', q') models, which can in turn be approximated as high-order AR(p) models, and (b) as per (4), one can include a certain number of quasi-periodic components, determined by the number of complex root pairs admitted (C). These observations are interesting because they allow one to investigate low-frequency trends and quasi-periodic behavior in univariate time series such as the Devils Lake volumes, where these features are of interest.

In classical linear autoregressive modeling, the order p of the model is selected and fixed at some level p*, using a criterion of best fit such as the Akaike Information Criterion (AIC). Uncertainty in the AR coefficients and its impacts on simulations are then assessed within this framework. Huerta and West take a rather different approach. They assume user-specified upper bounds on C and R. For instance, we could assume that the upper bound on C is 5, based on the assumption that an annual cycle, two quasi-periodic components with periodicities in the 3 to 5 year range related to ENSO, a quasi-periodic component with a decadal time scale related to the NAO, and a quasi-periodic component at inter-decadal scales related to the PDO are the main components of the climate system likely to be seen in the Devils Lake fluctuations. The actual frequencies ω_j and their amplitudes r_j for these components are not specified. Upper and lower bounds on the frequencies are specified (typically 2 ≤ λ ≤ n/2, where λ = 2π/ω is the wavelength), and the amplitudes are bounded in absolute value by 1. The number of real roots R could be chosen as a suitably large number, say 10. This would imply an upper bound on p of 20 (2C + R). In the absence of any information as to the number of admissible C and R components, one may consider larger values for the upper bounds. Instead of seeking the "optimal" values for the model order and the associated model coefficients, Huerta and West use a Bayesian approach in which a prior probability distribution (typically approximately uniform, admitting mass on unit and zero roots) is specified on the values of the model parameters, and the data are then used to develop a posterior distribution for these parameters. Admitting zero roots allows for consideration of model order uncertainty, while admitting unit roots allows for nonstationary components. A Markov Chain Monte Carlo (MCMC) approach is used to sample from the posterior distribution of the model coefficients.
At the end of the MCMC simulations, contingent on the upper bounds for C and R, we have posterior probability distributions for each of the amplitudes r_j of both the complex and the real roots. Thus, one can diagnose the roots for which the posterior probability mass lies significantly away from 0, and hence carries useful information. The corresponding frequencies ω_j for such complex roots can then be highlighted. Probabilistic model forecasts for x(t+T) consequently encode the posterior probability distributions of all the admissible parameter values.
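
For a single AR fit (rather than the full posterior that ARCOMP works with), the amplitudes and periods appearing in (4) can be read off the reciprocal roots of the characteristic polynomial. A sketch, with an AR(2) constructed to have a quasi-periodic pair of period 12; the helper is illustrative, not part of ARCOMP:

```python
import numpy as np

def ar_root_diagnostics(phi):
    """Amplitude r and period 2*pi/omega of each reciprocal root of the
    AR characteristic polynomial 1 - phi_1*B - ... - phi_p*B^p.
    Amplitudes near 1 signal near-nonstationarity; complex pairs
    correspond to quasi-periodic components."""
    phi = np.asarray(phi, dtype=float)
    alpha = np.roots(np.concatenate(([1.0], -phi)))  # reciprocal roots
    out = []
    for a in alpha:
        r, w_freq = abs(a), abs(np.angle(a))
        period = 2.0 * np.pi / w_freq if w_freq > 1e-8 else np.inf
        out.append((r, period))
    return out

# AR(2) with reciprocal roots 0.95 * exp(+/- i * 2*pi/12):
# phi_1 = 2 r cos(omega), phi_2 = -r^2  ->  a damped 12-step cycle
roots = ar_root_diagnostics([2 * 0.95 * np.cos(2 * np.pi / 12), -0.95**2])
```

ARCOMP performs this diagnosis probabilistically: each MCMC draw yields a set of (r_j, ω_j), so every amplitude and frequency receives a posterior distribution rather than a point estimate.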

The ARCOMP algorithm was applied to the Devils Lake monthly volume data, and up to 10,000 simulations were generated from different starting points using the posterior probability density functions of the AR components. The Devils Lake data were transformed using a square root transform (as also done by Wiche and Vecchia, 1995) prior to the ARCOMP analysis. Exceedence probabilities of different DL levels were computed from the simulations from the end of the model-fitting period, to enable comparisons with the USGS model forecasts for 2000 and beyond. Split-sample analyses were used to assess the performance of ARCOMP on historical lake fluctuations using only prior-period data. Various specifications of C and R, for the same maximal order p, were considered. The forecasts from these combinations were generally quite comparable; the main difference lies in the interpretation of the different models. When complex roots are allowed, the modes of the posterior probability distributions are predominantly at periodicities of 80, 36, 26, 18, and 12 months. Typically, the posterior distribution for the number of real processes (R) has a strong mode at 2, with some mass on 3 and 4. This suggests a fairly low-order autoregressive model with some quasi-periodic components. However, the posterior modes for the dominant real roots indicate a highly nonstationary model, particularly if data from the 1990s are included.
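
Given an ensemble of simulated trajectories, the exceedence probabilities are just the fraction of simulations above a threshold at each forecast step. A minimal sketch (the helper is hypothetical, not the ARCOMP code):

```python
import numpy as np

def exceedance_prob(simulations, threshold):
    """Fraction of simulated trajectories exceeding `threshold` at each
    forecast step; `simulations` has shape (n_sims, n_steps)."""
    sims = np.asarray(simulations, dtype=float)
    return (sims > threshold).mean(axis=0)
```

Evaluating this over a range of lake-level thresholds, step by step, is the kind of ensemble summary that underlies exceedence curves like those in Figure 20.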

Split-sample results are presented here (Figure 20a) for forecasts from the model calibrated to data prior to mid-1994. One notes that within 2 years (i.e., by 1996) of the start of the forecast, the Devils Lake trajectory had already gone above the 0.2% exceedence level indicated by the ARCOMP simulations. Note also that the median forecast from ARCOMP is the maintenance of the 1994 volume. This feature persists for any forecast started after 1994, including one from 1999 conditions. Somewhat different forecasts are realized for the case where only real roots are admitted. The lake now tends to its average state in the long run, as with the Wiche and Vecchia forecasts, but the extreme exceedence probabilities still indicate that the lake will spill over for the entire future projection. Given the posterior density estimates of the AR model structure and coefficients, and the forecasting results, we have concluded that the primary information that can be gleaned from this analysis is that a linear autoregressive structure (even allowing for high-order models to approximate quasi-periodic and low-frequency trends, as exhibited by some state space and ARMA models) is likely inappropriate for describing the Devils Lake monthly volume process. The indicated uncertainty bands are very wide, and perturbations in the assumptions translate into quite different median forecasts. The nonstationarity of the system is emphasized by the near-unit-root solutions for the parameter vectors. The Wiche and Vecchia decomposition of the lake volume data into an annual mean and a difference process is a clever device that may deal with some aspects of the nonstationary data better than the direct application. The ARCOMP procedure can be thought of as essentially generalizing the Wiche and Vecchia procedure by considering model and parameter uncertainty in an ARMA modeling framework.
Given the limited success of the Wiche and Vecchia procedure, and the indications from the direct application of ARCOMP to the raw DL data, we conclude that there is a need to develop new modeling tools that can represent such apparently nonstationary processes from relatively short data sets. Interestingly, split-sample experiments with ARCOMP using data prior to the recent rise (e.g., up to 1980) were better able to forecast the next several years. This suggests that the 1990s regime shift from prior conditions is not adequately modeled by a linear model.

Figure 20. ARCOMP probabilistic forecasts of lake level exceedence based on 1,000 to 10,000 simulations from models using monthly Devils Lake volume data up to (a) 1994, (b) 1999, and (c) 1999 considering only real roots.

Summary

The main results from our investigations are:

1) The recent trend in the Devils Lake volume is likely a consequence of changes in the seasonality of annual rainfall, and may be determined to a great degree by increases in summer and fall precipitation that are associated with corresponding changes in the atmospheric circulation for those seasons, manifested as decreases in the regional atmospheric pressure. The seasonality of the factors at play in the recent period is different from that for the Great Salt Lake and the Devils Lake in the 1980s, when winter/spring/fall precipitation changes were a more dominant influence, though for a shorter duration.

2) These changes in atmospheric circulation and regional precipitation have large spatial structure and are not likely due to increases in local convection and moisture recycling related to local conditions. There is some evidence that a combination of factors related to Pacific and Atlantic ocean-atmosphere oscillations is important. The summer-fall precipitation in the larger region that exhibits the consistent precipitation anomaly structure for the Devils Lake wet and dry periods is influenced to some extent by features such as the night-time low-level jet, the Southwest monsoon, and northern frontal systems that bring ocean moisture to the region. These transient features are not directly reflected in the monthly atmospheric data that we analyzed and had access to. Consequently, the correlative analyses using climate indices and atmospheric pressure time series are inferential and diagnostic, rather than causal. It may be useful to pursue more direct investigations to better pin down the climate mechanisms responsible.