
## Forecasting photovoltaic power production with meteorological forecasts

**Context**

This thesis aims at improving forecasts of photovoltaic (PV) power production. To be more specific, the company “Electricité de France” (EDF) is particularly interested in forecasting PV power for several geographical areas, in metropolitan France or for insular electric systems. For example, we model an estimation of the total production of Metropolitan France, the total production of Corsica Island and the total production of Réunion.

The installed capacity of renewable electricity is growing in France, and the need for forecasts of renewable energy production grows with the amount of renewable energy delivered to the electrical grid. Over the past decade, the installed capacities of wind and solar power have increased drastically, with a current annual growth rate close to 10%, see Table 1.1. Indeed, the installed capacity of solar power in France increased by 899 MW in 2015 and 576 MW in 2016, reaching 6.8 GW. Over the same period, the installed capacity of wind power increased by 1011 MW in 2015 and 1345 MW in 2016, reaching 11.67 GW. This strong growth is significant compared to the total installed capacity of 129 GW at the beginning of 2016, of which above 63 GW of nuclear capacity and above 25 GW of hydropower capacity. These numbers need to be put into perspective, because wind and solar are intermittent energy sources. One gigawatt of PV power is not equivalent to one gigawatt of nuclear power, at least in terms of load factor. The load factor is the ratio of the energy actually produced to the energy that would have been produced if the production level had been maximal. In fact, the load factors of wind and solar power are quite low compared to the load factor of nuclear power plants (around 75%). The average load factor of solar power is 15%, with a strong seasonality (6% in December, 20% in summer), while the load factor of wind power is around 24% (30-33% in winter, 15-20% in summer). We refer the interested reader to the periodic report “Overview of Renewable Electricity” ∗ (in French “Panorama de l’électricité renouvelable”) and to the “Forecast assessment of electricity supply-demand balance” † (in French “Bilan prévisionnel de l’équilibre offre-demande d’électricité en France”), both published by the French Transmission System Operator “Réseau de Transport d’électricité” (RTE).
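As a quick illustration of the load-factor arithmetic above, the following sketch computes the ratio from an installed capacity and an annual production; the production figure is hypothetical, chosen only to land near the reported solar average.

```python
# Load factor: energy actually produced divided by the energy that would
# have been produced at maximal (installed) capacity over the same period.
# The production value below is an illustrative assumption, not a measurement.
installed_capacity_mw = 6800.0      # ~6.8 GW of solar installed (end of 2016)
produced_gwh = 8900.0               # hypothetical annual solar production
hours_per_year = 8760.0

max_possible_gwh = installed_capacity_mw * hours_per_year / 1000.0
load_factor = produced_gwh / max_possible_gwh
print(f"load factor: {load_factor:.1%}")
```

With these numbers, the load factor comes out near the 15% average quoted above, far below the value of a nuclear plant.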

**What is at stake in renewable energy forecasting?**

The production forecasts are achieved at various geographical and temporal scales, depending on the production target. For example, the production can be local at the scale of a production unit, or at a larger geographical scale, i.e. a sum of local productions.

For operational purposes, Transmission System Operators and Distribution System Operators need production forecasts to anticipate constraints on the electrical grid, generated by supply-demand imbalances. For financial purposes, Balance Responsible Entities (BR) also need production forecasts. A BR must declare its injections and extractions of power on the grid, with constraints on its so-called balance perimeter.

These constraints ensure that the whole system is balanced. When injections and extractions do not match the BR declaration, the BR suffers financial penalties. A BR can trade electricity on the market to ensure the balance of its perimeter, or to benefit from market prices. Therefore, improved production forecasts on their perimeters mean better market positioning and avoided penalties for BRs. Historically in France, EDF was the main BR with solar and wind production inside its balance perimeter, due to the mechanism of feed-in tariffs. The market trend is that wind and solar energy producers will introduce their own products on the market, or sell them directly to a BR.

**Weather forecasting**

Wind and solar power productions are strongly dependent on the weather, hence wind and solar forecasts resort to numerical weather predictions (NWPs) for lead times from a few hours to several days.

How are numerical weather predictions generated? NWPs are produced with data assimilation schemes, the most famous being the Kalman filter [Kal60]. Data assimilation integrates several elements: a state vector, a model, a priori knowledge, observations, an observation function and control parameters, as represented in Figure 1.1.

In practice, several meteorological forecasting systems are used in this thesis. Their current characteristics are summarized in Table 1.2. Besides the model and the data assimilation scheme, the forecasts may differ in their covered geographical areas, their lead times, and their spatial and temporal resolutions. Ensemble forecasts are traditionally generated with the same model as a deterministic forecast, but at a coarser resolution, with slight changes in the model physics or in the parameterizations, or with different perturbations of the initial conditions. Therefore, ensemble forecasts integrate different sources of errors and quantify the forecast uncertainty. Throughout this thesis, we show the benefits of using several forecasts. This approach begins with the variety of weather forecasts that we use as inputs.

Météo France generates the global model Arpège, from which the 34 ensemble forecasts PEARP are derived. Météo France also provides, mainly for France, the local model AROME, designed to forecast severe weather events such as heavy rains in the south of France, at the cost of a limited geographical extension and a limited lead time of two days. We also use atmospheric forecasts from ECMWF: the deterministic forecast HRES and the ensemble of forecasts ENS. For solar forecasting with a 6-h temporal resolution, TIGGE (THORPEX Interactive Grand Global Ensemble) ensembles from several meteorological centers were used. The detailed description of the TIGGE ensembles is postponed to Chapter 2.

**Solar radiation forecasting**

For PV applications, the main meteorological fields are irradiance (or solar radiation) and 2-m temperature. Irradiance is directly related to the energy produced by the solar panels, and high temperatures at ground level decrease the panel efficiency.

The position of the Sun relative to the Earth directly controls the top-of-atmosphere irradiance. Over the year, the evolution of these positions generates the diurnal and seasonal cycles. The diurnal cycle facilitates solar forecasting, since we know that every day begins with a null level of radiation. For a clear-sky day, the main challenge is to estimate the bell shape of solar radiation and the maximal value reached. The seasonal cycle is clearly depicted by the monthly maximal values of solar radiation, see Figure 1.2. The ratio between the maximal values of January and June almost reaches a factor of 4. We highlight the fact that solar radiation data are time-averaged, and usually indicate the average flux over the past period. For example, solar radiation at 12:00 with a 3-h temporal resolution is the average flux between 9:00 and 12:00.
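The time-averaging convention can be made concrete with a short sketch. Many NWP systems deliver radiation accumulated since the forecast start (in J/m²), from which the backward averages described above are recovered by differencing; all numbers here are synthetic.

```python
import numpy as np

# Accumulated downward solar radiation (J/m^2) at 3-h lead times, a common
# NWP output format (synthetic values, purely illustrative).
lead_time_h = np.array([3, 6, 9, 12])
accumulated_j_m2 = np.array([0.2e6, 2.0e6, 6.5e6, 1.20e7])

step_s = 3 * 3600.0
# The flux stamped at time t is the mean over the *preceding* period:
# e.g. the 12:00 value is the average flux between 09:00 and 12:00.
avg_flux_w_m2 = np.diff(accumulated_j_m2, prepend=0.0) / step_s
print(avg_flux_w_m2)
```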

From the top of the atmosphere to ground level, solar radiation encounters many processes (absorption, scattering, reflection), mainly occurring in clouds. Hence irradiance maps are closely related to nebulosity maps, and cloud cover and nebulosity are often used as proxies for solar radiation. Solar radiation maps of HelioClim (real-time estimations from satellite data) and of the AROME and HRES forecasts (12-h lead time) are shown in Figure 1.3, for 2013-06-13 at 12:00 (UTC). The maps differ in their temporal averaging; for instance, the HRES map is a 3-h average. The spatial variations of irradiance are quite high, since very cloudy areas may receive less than one third of the energy received by clear-sky areas. These variations are seen by the forecasts at the scale of a few tens of kilometers, or even less. The tremendous impact of orographic effects in mountainous areas with many scattered clouds can be seen in Figure 1.3.

We see that spatial resolution is a key issue in solar forecasting. At the scale of a grid point, one may ask whether the level of irradiance is correctly forecast. At the scale of a regional map, one may instead ask whether the clouds are modeled at the correct time and location. Indeed, a cloud may be “seen” by the model, but located with an error of 50 km. In conclusion, we note that:

— Observations may be local (ground-station measurements) or estimated with satellite data. Satellite data provide a better spatial representation, but may be more prone to calibration errors than local measurements.

— We need a validation criterion to assess the quality of a forecast. This validation criterion may depend on the spatial scale of the validation.

— The forecasted average over several grid points may be the best estimation for comparisons at the grid-point scale.

Using high-resolution forecasts such as AROME is a delicate task, see our contribution in Chapter 6.

**PV power data and statistical modeling**

As opposed to irradiance maps, the production data of a PV power plant are localized. For lead times of several hours to several days, we usually work at a resolution of 30 min or 1 h. Averaging production data in time smooths the variability of the data, as shown in Figure 1.4, and facilitates the work of the forecaster: the local production peaks are barely seen after time-averaging. The forecaster faces a dilemma when the production data show a large uncertainty. Should they attempt to forecast the average, or should they attempt to forecast local peaks, although the prediction may not be at the correct time?

Weather forecasts are converted to PV forecasts with statistical or physical models. Besides weather variables, PV power production relies on the technology of the solar panels and on the incident angle of solar radiation on the panels. Hence physical models include the panel orientation and inclination. Solar radiation data are a good starting point to build a statistical model. Using solar radiation estimates from HelioClim (and not forecasts), we see in Figure 1.5 that linear models are a good approximation of the relationship between solar radiation and PV production, for a period of nearly 20 days. Here we simply applied a multiplicative factor to the solar radiation estimates. Advanced models often use clear-sky production profiles and clear-sky radiation profiles computed for each day of the year. In this setting, weather forecasts provide indications of possible production decreases along the day. The production P and the solar forecasts I are converted respectively to the clear-sky indices τP and τI:

τP = P / Pcc ,   τI = I / Icc ,

where the clear-sky production Pcc and the clear-sky solar radiation Icc are the production and solar radiation in clear-sky conditions. A statistical regression estimates the parameters ai between the weather forecasts and the forecast τ̂P of the clear-sky index, for example:

τ̂P = a0 + a1 τI + a2 τI² + a3 Tcc .

In this example, the non-linear term τI² and the total cloud cover Tcc are added as explanatory variables. The statistical models of our PV forecasts are detailed in Chapter 5.
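A minimal sketch of such a regression on synthetic data: the clear-sky index of production is fitted on the irradiance index τI, its square, and the total cloud cover Tcc. The coefficients, noise level and clear-sky production value are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Synthetic clear-sky indices (assumptions throughout): tau_I is the forecast
# irradiance index, Tcc the total cloud cover in [0, 1].
rng = np.random.default_rng(0)
n = 500
tau_I = rng.uniform(0.2, 1.0, n)
Tcc = rng.uniform(0.0, 1.0, n)
# Synthetic "true" clear-sky index of production, with small noise.
tau_P = 0.05 + 0.7 * tau_I + 0.3 * tau_I**2 - 0.1 * Tcc + rng.normal(0, 0.02, n)

# Least-squares estimation of the parameters a_i on (1, tau_I, tau_I^2, Tcc).
X = np.column_stack([np.ones(n), tau_I, tau_I**2, Tcc])
a, *_ = np.linalg.lstsq(X, tau_P, rcond=None)

# The production forecast multiplies the predicted index by the clear-sky
# production profile (here a single hypothetical value in MW).
P_clear = 8.0
P_forecast = P_clear * (X @ a)
```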

In practice, the forecaster encounters major diﬃculties:

— The forecaster cannot determine the best set of parameters for the current period, but only for a past period. This may result in drifting effects or inaccurate parameter estimates due to inter-year variability.

Figure 1.5 – Time-series of Helioclim (red line) and production data (black dots), with simple scaling of the data (for one plant in April 2013).

— Solar radiation forecasts may be quite far from reality. Systematic biases, such as seasonal biases, may be corrected with statistical models taking into account the hour of the day, the time of year and the lead time of the forecasts. However, specializing the statistical model increases the number of model parameters, or reduces the amount of data available to fit each model.

— PV power observations are not always available, due to data corruption or lack of metering.

Interestingly, the forecaster can use statistical methods as downscaling methods. For example, statistical models are built between forecasts at a 3-h resolution and observations at a 30-min resolution, in order to provide forecasts at the 30-min resolution. Thanks to the fine resolution of the observations, it is possible to go below the resolution of the forecasts. The same principle is applied by using HelioClim as solar radiation observations to improve the spatial resolution of the forecasts in Chapter 2.

**Probabilistic forecasts of PV power with meteorological forecasts**

Using multiple grid points gives a good insight into the spatial variability of the irradiance maps, see Figure 1.6. We show production data and solar forecasts from AROME, normalized by the daily maximum value. In a clear-sky situation (1.6(a)), the forecasts of nearby grid points are almost identical. But in a situation with large uncertainties on the cloud cover (1.6(b)), the spatial variability of the solar forecasts is very high. For this particular day, the information of cloud cover variability is correct and noticeable in the production data. Our point here is to illustrate that using a single forecast may not be appropriate, and that solar forecasts integrate complex spatio-temporal information.

Probabilistic forecasts acknowledge the inability of the forecaster to provide a priori a perfect estimate of the observation, which we consider here to be noiseless. Probabilistic forecasts are conveniently described by a Cumulative Distribution Function (CDF), which gives the probability (according to the forecaster) that the observation is below a threshold. Confidence intervals are derived from the CDF of the forecaster. For example, the forecaster says that there is a 5% chance that the production is below the number a and a 95% chance that the observation is below the number b. The interval [a, b] is then a 90% confidence interval.
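A short sketch of how such a 90% confidence interval is read off a forecast CDF, here an empirical CDF built from a synthetic ensemble (all values are illustrative assumptions):

```python
import numpy as np

# A probabilistic forecast represented by an ensemble of synthetic values,
# e.g. PV production in MW; its empirical CDF defines the quantiles.
rng = np.random.default_rng(1)
ensemble = rng.normal(5.0, 1.0, size=1000)

# The forecaster's statement: 5% chance production is below a, 95% below b,
# hence [a, b] is a 90% confidence interval.
a, b = np.quantile(ensemble, [0.05, 0.95])
print(f"90% confidence interval: [{a:.2f}, {b:.2f}] MW")
```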

We illustrate probabilistic forecasts for PV applications with a 30-min temporal resolution in Figure 1.7. For two consecutive days, the forecaster delivers a probabilistic forecast and a deterministic forecast. The median of the probabilistic forecasts mainly indicates a change in the level of production between the two days. With the median only, one cannot know whether a large intraday variability is expected. This information is clearly given by the confidence intervals, which are much larger for the first day than for the second day. In this example, the observed production indeed has a larger intraday variability during the first day. The large confidence intervals reflect the inability of the forecaster to describe precisely the observation variations. Besides, the deterministic forecast provides additional information, such as production decreases during the first day.

**Sequential aggregation**

**Context**

In this section, we introduce online learning in the expert setting. Just before time t > 0, a forecaster aims to deliver the best possible prediction of the observation yt ∈ Y, while the advice of M experts xm,t ∈ X is given to the forecaster. In other words, the forecaster receives several forecasts (or experts) and wishes to make an optimal use of this information. In real-world applications, the forecaster may also build experts and not only receive them. The key point of sequential aggregation is nevertheless to ensure performance guarantees, given a set of arbitrary forecasts.

We work in the deterministic setting of individual sequences: no assumptions are made on the observations yt or on the experts xm,t. This setting is particularly interesting, because the performance guarantees described below hold no matter the received data (observations and forecasts). Methods providing performance guarantees on individual sequences are therefore intrinsically robust, see Cesa-Bianchi and Lugosi [CL06] for an in-depth analysis.

In practice, the forecaster gives the weight um,t to the expert xm,t, and provides the forecast st(ut, xt), where the function st can be arbitrarily chosen by the forecaster.
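As an illustration of this setting, here is a minimal sketch of one classical choice of st: convex weights computed by the exponentially weighted average forecaster with the square loss. The experts, observations and learning rate are synthetic assumptions, not the algorithms studied later in the thesis.

```python
import numpy as np

# Synthetic observations and M = 3 experts of unequal quality.
rng = np.random.default_rng(2)
T, M = 200, 3
y = np.sin(np.linspace(0, 8, T))                        # observations y_t
experts = y[:, None] + rng.normal(0, [0.1, 0.5, 1.0], (T, M))

eta = 2.0                       # learning rate (a tuning assumption)
cum_loss = np.zeros(M)          # cumulative square loss of each expert
preds = np.empty(T)
for t in range(T):
    # Convex weights u_t from past losses (shifted for numerical stability).
    u = np.exp(-eta * (cum_loss - cum_loss.min()))
    u /= u.sum()
    preds[t] = u @ experts[t]                           # s_t(u_t, x_t)
    cum_loss += (experts[t] - y[t]) ** 2                # reveal y_t, update

agg_mse = np.mean((preds - y) ** 2)
best_expert_mse = np.mean((experts - y[:, None]) ** 2, axis=0).min()
```

The aggregated forecast quickly concentrates its weight on the best expert, which is the behavior the regret bounds of this setting guarantee.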

Strict propriety. We are interested in strictly proper scoring rules in the context of probabilistic forecasting, see Gneiting and Raftery [GR07]. Say a forecaster provides a probabilistic forecast to predict y. We consider that y is a random variable described by the CDF F. The quality of the forecast is measured by a score S(G, y), where G is a CDF describing the probabilistic forecast. The scoring rule is said to be proper if Ey[S(F, y)] ≤ Ey[S(G, y)], and strictly proper if the inequality is strict whenever G ≠ F. In other words, with a strictly proper scoring rule, the average score is minimized only by the correct distribution F.

Locality. Let {A1, . . . , AK } be a partition of K possible events. By definition, two events Ai and Aj cannot occur simultaneously. The forecaster delivers p = (p1, . . . , pK ) reflecting an opinion on the probability of occurrence of each event. If the event Ai occurred, a local scoring rule of p depends only on pi . In other words, a scoring rule is said to be local if the score only depends on the predicted probability of the event that actually occurred. Locality of the scoring rule is not always a desired property [BS07b]. The logarithm score was shown to be a natural choice of strictly proper local scoring rule in the non-binary case (K > 2), see Bernardo [Ber79] and Benedetti [Ben10] for example.

Why do we use non-local strictly proper scoring rules? Our objective is to provide probabilistic forecasts. To do so, we work with an ensemble of forecasts, which is conveniently described by a CDF. The corresponding probability density function is a combination of Dirac distributions, vanishing almost everywhere. The events “y is between θk and θk + dθk” are therefore poorly predicted, and local scoring rules are hardly applicable in our case.

We introduce the case of binary events in Section 1.3.1, and probabilistic scores for continuous variables in Section 1.3.2. We follow here Shuford et al. [SAE66], Schervish [Sch89], and Buja et al. [BSS05] to give a simple explanation of the construction (but not the choice) of non-local strictly proper scoring rules. A discussion on the choice of a strictly proper scoring rule may be found in Merkle and Steyvers [MS13] and Lerch et al. [Ler+15]. A thorough analysis of non-local strictly proper scoring rules and of their decomposition with quantiles and thresholds is given by Ehm et al. [Ehm+16]. A simple example is given in Section 1.3.3, to provide a better understanding of the evaluation of a probabilistic forecast by the celebrated CRPS.

**Binary case**

We start from the binary case, where the forecaster wishes to predict the output of a single event and delivers the probability p of occurrence of the event according to their opinion. A scoring rule ℓ for a binary event is local and of the form ℓ(p) = 1ev S1(p) + (1 − 1ev) S0(1 − p), where 1ev = 1 if the event occurs and 0 otherwise. The forecaster suffers S1(p) when the event occurs and S0(1 − p) when it does not.
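For instance, the Brier score, with S1(p) = (1 − p)² and S0(1 − p) = p², is strictly proper: a quick numerical check shows that the expected score is minimized only at the true probability of occurrence.

```python
import numpy as np

# If the event occurs with true probability q, the expected Brier score is
# q * (1 - p)^2 + (1 - q) * p^2. Strict propriety means the minimizer over
# the announced probability p is p = q (q below is an illustrative choice).
q = 0.3
p_grid = np.linspace(0.0, 1.0, 1001)
expected_score = q * (1 - p_grid) ** 2 + (1 - q) * p_grid ** 2
p_star = p_grid[np.argmin(expected_score)]
print(f"expected score minimized at p = {p_star:.3f} (true q = {q})")
```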

**Ranked and continuous case**

In the ranked case, the forecaster predicts the probabilities pk that the observation y lies below the thresholds θk, and suffers the loss

ℓ(p, y) = Σk φk (pk − H(θk − y))² ,

where p is the list of predicted probabilities pk for the events, the weights φk > 0 define the importance of the thresholds θk, and H is the Heaviside step function. We adopt the bold notation p to describe the list or vector with coordinates pk. The events “y is below the threshold θk” are not a partition of events, consequently this scoring rule is not local. If the dimension of the weights φk and the dimension of the observation y coincide, the loss ℓ(p, y) has the same dimension as y.

In the continuous case, for each threshold θ, the weight φk is related to φ(θ)dθ.
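In this continuous limit with φ(θ) = 1, the score is the CRPS. As a small numerical check, the following sketch compares a Riemann sum of ∫ (G(θ) − H(θ − y))² dθ with the closed form for a Gaussian forecast given by Gneiting and Raftery [GR07]; the forecast parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm

# CRPS(G, y) = integral of (G(theta) - H(theta - y))^2 over theta,
# evaluated here for a Gaussian forecast G = CDF of N(mu, sigma^2).
mu, sigma, y = 1.0, 1.0, 0.0
theta = np.linspace(-30.0, 30.0, 600001)
dtheta = theta[1] - theta[0]
G = norm.cdf(theta, loc=mu, scale=sigma)
H = (theta >= y).astype(float)          # Heaviside step at the observation
crps_numeric = np.sum((G - H) ** 2) * dtheta

# Closed form for a Gaussian forecast (Gneiting and Raftery [GR07]).
z = (y - mu) / sigma
crps_closed = sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                       + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))
```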

**Examples with the CRPS**

In this section, we provide simple examples with the CRPS, as an evaluation tool. The idea is that for a fixed value of the observation y, several CDFs may reach the same score. We illustrate how these CDFs are related to each other.

In Figure 1.9(a), we show the map of the CRPS for the CDF of the normal distribution N(µ, σ²) with mean µ and variance σ², for the observation y = 0. For a fixed value of the standard deviation σ, we see that the lowest value of the CRPS is reached for µ = y = 0. Besides, for a fixed value of the mean µ, the lowest value of the CRPS is not reached for the lowest spread σ = 0. Instead, the lowest CRPS value is reached when the spread σ is slightly higher than the error |µ − y| on the mean. The idea is that, for an incorrect forecast with an error on the mean, the spread of the distribution lowers the effect of the error |µ − y| > 0.

We compare CRPS values with the case N(1, 1) in Figure 1.9(b). The same CRPS value is obtained for ga = N(0.6022, 0), gb = N(1, 1) and gc = N(0, (2.577)²), with respective CDFs Ga, Gb and Gc. The distribution ga has no spread and a small error of 0.6 on the mean, while the distribution gc has a high spread but no error on the mean. We give an illustration of the CDFs Ga, Gb and Gc, the PDFs ga, gb and gc, and the errors (Ga − Hy)², (Gb − Hy)², (Gc − Hy)² in Figure 1.10. The strong effect of the square in the CRPS is quite noticeable in the representation of (Gb − Hy)², and more generally on the domains where |G(θ) − H(θ − y)| ≪ 1.

Figure 1.10 – Representation of the PDFs ga = N(0.6022, 0), gb = N(1, 1) and gc = N(0, (2.577)²) (top left), and their respective CDFs Ga, Gb and Gc (top right), reaching the same CRPS of 0.602 at 0.001 precision (for y = 0). The squared differences between Hy, the Heaviside function centered on y, and respectively Ga, Gb and Gc are shown at the bottom.
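The equality of the three scores can be checked numerically with the Gaussian closed form of the CRPS, where the degenerate case σ = 0 reduces to the absolute error (this sketch only verifies the figure's claim, nothing more):

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast (Gneiting & Raftery)."""
    if sigma == 0.0:                     # degenerate (point) forecast
        return abs(y - mu)
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

y = 0.0
scores = [crps_gaussian(0.6022, 0.0, y),   # g_a: no spread, small mean error
          crps_gaussian(1.0, 1.0, y),      # g_b: unit spread, unit mean error
          crps_gaussian(0.0, 2.577, y)]    # g_c: large spread, no mean error
print([f"{s:.4f}" for s in scores])
```

All three values agree with the 0.602 of the figure to within 0.001.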

Because of this squaring, the CRPS is not very sensitive to the distribution tails. On this subject, we address in Chapter 4 the question of scoring rules other than the CRPS.

**Table of contents :**

**1 Introduction **

1.1 Forecasting photovoltaic power production with meteorological forecasts

1.1.1 Context

1.1.2 Weather forecasting

1.1.3 Solar radiation forecasting

1.1.4 PV power data and statistical modeling

1.1.5 Probabilistic forecasts of PV power with meteorological forecasts

1.2 Sequential aggregation

1.2.1 Context

1.2.2 Algorithm evaluation with regret bounds

1.2.3 Online learning algorithms

1.2.4 Examples of loss functions

1.3 Probabilistic forecasting with non-local strictly proper scoring rules

1.3.1 Binary case

1.3.2 Ranked and continuous case

1.3.3 Examples with the CRPS

**2 Ensemble forecast of solar radiation using TIGGE weather forecasts and HelioClim database **

2.1 Introduction

2.2 Analysis of TIGGE solar radiation and HelioClim database

2.2.1 Description of TIGGE data

2.2.2 Analysis of the TIGGE ensembles of forecasts

2.2.3 Reference performance measures

2.2.4 Comparison with HelioClim

2.3 Ensemble forecast strategy: sequential aggregation

2.3.1 Notation

2.3.2 Sequential aggregation: method

2.3.3 Algorithm

2.4 Application

2.4.1 Experiment setup

2.4.2 Results

2.5 Conclusion

2.A.1 Methods

2.A.2 Numerical results

**3 Online learning with the CRPS for ensemble forecasting**

3.1 Mathematical background

3.1.1 Bibliographical remarks

3.1.2 The Continuous Ranked Probability Score (CRPS)

3.1.3 The ensemble CRPS

3.1.4 Bias of the ensemble CRPS with underlying mixture model

3.1.5 Mixture model described by classes of members

3.2 Online learning methods

3.2.1 Theoretical background

3.2.2 Ridge regression

3.2.3 Exponentiated gradient

3.3 Numerical example

3.3.1 Simple model

3.3.2 Experiments without online learning

3.3.3 Experiments with weight updates

**4 Scoring and learning forecasts densities **

4.1 Extension to threshold-weighted and quantile-weighted scoring rules

4.1.1 Effect of threshold-weighting

4.1.2 Effect of quantile-weighting

4.2 Probabilistic forecasting with observational noise

4.2.1 Generalized least square with the CRPS

4.2.2 Discussion and further work

**5 Application of online CRPS learning to probabilistic PV power forecasting**

5.1 Methods

5.1.1 Production and meteorological data

5.1.2 Conversion of meteorological forecasts to production forecasts

5.1.3 Quantile forecasts

5.1.4 Linear opinion pools

5.2 Evaluation

5.2.1 The CRPS

5.2.2 Other diagnostic tools

5.3 Online learning with the CRPS

5.3.1 Background

5.3.2 Example of general algorithm

5.3.3 ML-Poly

5.4 Application

5.4.1 Experiment setup

5.4.2 Results

**6 PV probabilistic forecasts with the AROME high resolution forecasts**

6.1 Building an ensemble of forecasts from AROME forecasts

6.1.1 Leveraging the high spatio-temporal resolution

6.1.2 First sequential aggregation results with AROME meteorological experts

6.1.3 Adding rolling quantiles experts

6.2 Sequential aggregation results with AROME statistically calibrated experts

6.2.1 Improvements with rolling quantile experts

6.2.2 Comparison of AROME with other forecasts from Météo France and ECMWF

6.3 Discussion and perspectives

**7 PV probabilistic forecasts with intraday updates for insular systems**

7.1 Intraday PV updates experimental setup

7.1.1 Operational forecasts

7.1.2 Building new forecasts with intraday updates

7.1.3 Online learning experiment

7.2 Results

7.2.1 Time-series, spread and weights

7.2.2 Probabilistic forecasts performance and calibration

Appendix 7.A Empirical results of quantile-weighted scoring rules with real-world data

**8 Thesis conclusions**