A Review of Extreme Value Threshold Estimation and Uncertainty Quantification

The threshr package deals primarily with the choice of thresholds for use in extreme value modelling. The underlying methodology is described in detail in Northrop, Attalides, and Jonathan (2017). Bayesian leave-one-out cross-validation is used to compare the extreme value predictive performance resulting from each of a set of thresholds. This assesses the trade-off between the model mis-specification bias that results from an inappropriately low threshold and the loss of estimation precision that results from an unnecessarily high threshold. There are many other approaches to address this bias-variance trade-off. See Scarrott and MacDonald (2012) for a review.

At the moment only the simplest case, where the data can be treated as independent, identically distributed observations, is considered. In this case the model used is a combination of a binomial distribution for the number of exceedances of a given threshold and a generalized Pareto (GP) distribution for the amounts, the threshold excesses, by which exceedances lie above a threshold. We refer to this as a binomial-GP model. Future releases of threshr will tackle more general situations.

We use the function ithresh to compare the predictive performances of each of a set of user-supplied thresholds. We also perform predictive inferences for future extreme values, using the predict method for objects returned from ithresh. These inferences can be based either on a single threshold or on a weighted average of inferences from multiple thresholds. The weighting reflects an estimated measure of the predictive performance of the threshold and can also incorporate user-supplied prior probabilities for each threshold.

A traditional simple graphical method to inform threshold choice is to plot estimates of, and confidence intervals for, the GP shape parameter \(\xi\) over a range of thresholds. This plot is used to choose a threshold above which the underlying GP shape parameter may be approximately constant. See Chapter 4 of Coles (2001) for details. Identifying a single threshold using this method is usually unrealistic but the plot can point to a range of thresholds that merit more sophisticated analysis. The threshr function stability produces this type of plot.
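For illustration, a shape-parameter stability plot can be produced as follows. This is a sketch that assumes stability accepts data and u_vec arguments analogous to those of ithresh; check ?stability for the exact interface.

```r
library(threshr)

# Training thresholds at a grid of sample quantiles of the Gulf of Mexico data
u_vec_gom <- quantile(gom, probs = seq(0.1, 0.8, by = 0.05))

# MLEs and confidence intervals for the GP shape parameter at each threshold
gom_stab <- stability(data = gom, u_vec = u_vec_gom)
plot(gom_stab)
```

The resulting plot is inspected for the lowest threshold above which the estimated shape parameter appears stable.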

Cross-validatory predictive performance for i.i.d. data

We provide a brief outline of the methodology underlying ithresh. For full details see Northrop, Attalides, and Jonathan (2017). Consider a set of training thresholds \(u_1, \ldots, u_k\). The validation threshold \(v = u_k\) defines validation data: indicators of whether or not an observation exceeds \(v\) and, if it does, the amount by which \(v\) is exceeded. For a given training threshold, leave-one-out cross-validation estimates the quality of predictive inference for each of the individual omitted observations based on Bayesian inferences from a binomial-GP model. Importance sampling is used to reduce computation time: only two posterior samples are required for each training threshold. Simulation from the posterior distributions of the binomial-GP parameters is performed using the revdbayes package (Northrop 2017).

In the first release of threshr the binomial probability is assumed to be independent of the parameters of the GP distribution a priori. This will be relaxed in a later release. The user can choose from a selection of in-built prior distributions and may specify their own prior for the GP parameters. By default the Beta(1/2, 1/2) Jeffreys' prior is used for the threshold exceedance probability of the binomial distribution and a generalization of the Maximal Data Information (MDI) prior is used for the GP parameters. See the documentation of ithresh and Northrop, Attalides, and Jonathan (2017) for details of the latter.
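As a hypothetical sketch of supplying a user-defined GP prior, one could construct a prior object with revdbayes and pass it to ithresh. This assumes that ithresh forwards a prior argument to revdbayes; the argument name and mechanism should be checked against ?ithresh.

```r
library(threshr)
library(revdbayes)

# Sketch (assumed interface): a bivariate normal prior for the GP parameters
# (sigma_u, xi), built with revdbayes::set_prior
gp_prior <- set_prior(prior = "norm", model = "gp",
                      mean = c(0, 0), cov = diag(c(10, 1)))

# Assumed: ithresh accepts this prior in place of the default MDI prior
u_vec_gom <- quantile(gom, probs = seq(0.1, 0.8, by = 0.05))
gom_cv_norm <- ithresh(data = gom, u_vec = u_vec_gom, prior = gp_prior)
```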

We use the storm peak significant wave height datasets analysed in Northrop, Attalides, and Jonathan (2017) from the Gulf of Mexico (gom, with 315 observations) and the northern North Sea (ns, with 628 observations) to illustrate the code. There should be enough exceedances of the validation threshold \(v = u_k\) to enable the predictive performances of the training thresholds to be compared. Jonathan and Ewans (2013) recommend that when making inferences about a GP distribution there should be no fewer than 50 exceedances. We bear this rule-of-thumb in mind when setting the vectors of training thresholds below.

    library(threshr)
    # Set the size of the posterior sample simulated at each threshold
    n <- 10000

    ## North Sea significant wave heights
    # Set a vector of training thresholds
    u_vec_ns <- quantile(ns, probs = seq(0.1, 0.85, by = 0.05))
    # Compare the predictive performances of the training thresholds
    ns_cv <- ithresh(data = ns, u_vec = u_vec_ns, n = n)

    ## Gulf of Mexico significant wave heights
    # Set a vector of training thresholds
    u_vec_gom <- quantile(gom, probs = seq(0.1, 0.8, by = 0.05))
    # Compare the predictive performances of the training thresholds
    gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n = n)

The default plot method for objects returned by ithresh is of the estimated measures of predictive performance, normalized to sum to 1, against training threshold. See equations (7) and (14) of Northrop, Attalides, and Jonathan (2017).

    plot(ns_cv, lwd = 2, cex.axis = 0.8)
    mtext("North Sea : significant wave height / m", side = 3, line = 2.5)
    plot(gom_cv, lwd = 2, cex.axis = 0.8)
    mtext("Gulf of Mexico: significant wave height / m", side = 3, line = 2.5)

The summary method identifies which training threshold is estimated to perform best.

    summary(ns_cv)
    #>        v v quantile best u best u quantile index of u_vec
    #> 1 5.6972         85  2.204              25              4
    summary(gom_cv)
    #>       v v quantile best u best u quantile index of u_vec
    #> 1 4.607         80 3.3878              60             11

The plot method can also produce a plot of the posterior sample of the GP parameters generated using a training threshold chosen by the user, e.g. the argument which_u = 5 specifies the fifth element of the vector of training thresholds, or using the best threshold, as below.

    # Plot of generalized Pareto posterior sample at the best threshold
    # (based on the lowest validation threshold)
    plot(ns_cv, which_u = "best")
    plot(gom_cv, which_u = "best")

Predictive inference for future extremes

Let \(M_N\) denote the largest value to be observed in a time period of length \(N\) years. The predict method for objects returned from ithresh performs predictive inference for \(M_N\) based either on a single training threshold or on a weighted average of inferences from multiple training thresholds.

Single training threshold

By default the threshold that is estimated to perform best is used. A different threshold can be selected using the argument which_u. Using type = "d" produces the predictive density function. The values of \(N\) can be set using n_years. The default is \(N = 100\).

    # Predictive density function
    best_p <- predict(gom_cv, n_years = c(100, 1000), type = "d")
    plot(best_p)

Inferences averaged over multiple thresholds

This option is selected using which_u = "all". The user can specify a prior probability for each threshold using u_prior. The default is that all thresholds receive equal prior probability, in which case the weights applied to individual training thresholds are those displayed in the threshold diagnostic plot above. The default, type = "p", produces the predictive distribution function. If which_u = "all" then n_years must have length one. The default is \(N = 100\).

    ### All thresholds plus weighted average of inferences over all thresholds
    all_p <- predict(gom_cv, which_u = "all")
    plot(all_p)

As we expect, the estimated distribution function obtained by the weighted average over all thresholds lies within the pointwise envelope of the curves for the individual thresholds.
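Numerical summaries can in principle be extracted from the same machinery. The following is a sketch that assumes predict.ithresh, like revdbayes's predict method, supports type = "q" with quantile levels supplied via x; this should be verified against ?predict.ithresh.

```r
library(threshr)

u_vec_gom <- quantile(gom, probs = seq(0.1, 0.8, by = 0.05))
gom_cv <- ithresh(data = gom, u_vec = u_vec_gom)

# Sketch (assumed interface): estimated predictive quantiles of M_100,
# averaged over all training thresholds
all_q <- predict(gom_cv, which_u = "all", type = "q",
                 x = c(0.025, 0.5, 0.975))
```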

References

Coles, S. G. 2001. An Introduction to Statistical Modeling of Extreme Values. London: Springer.

Jonathan, P., and K. Ewans. 2013. "Statistical Modelling of Extreme Ocean Environments for Marine Design: A Review." Ocean Engineering 62: 91–109. https://doi.org/10.1016/j.oceaneng.2013.01.004.

Northrop, P. J. 2017. revdbayes: Ratio-of-Uniforms Sampling for Bayesian Extreme Value Analysis. R package. https://CRAN.R-project.org/package=revdbayes.

Northrop, P. J., N. Attalides, and P. Jonathan. 2017. "Cross-Validatory Extreme Value Threshold Selection and Uncertainty with Application to Ocean Storm Severity." Journal of the Royal Statistical Society: Series C (Applied Statistics) 66 (1): 93–120. https://doi.org/10.1111/rssc.12159.

Scarrott, C., and A. MacDonald. 2012. "A Review of Extreme Value Threshold Estimation and Uncertainty Quantification." REVSTAT - Statistical Journal 10 (1): 33–60. https://www.ine.pt/revstat/pdf/rs120102.pdf.

Source: https://cran.r-project.org/web/packages/threshr/vignettes/threshr-vignette.html