Paper 1 Title:
Deterministic and Probabilistic Wind Power Forecasts by Considering Various Atmospheric Models and Feature Engineering Approaches
Executive summary: The authors use three kinds of numerical weather prediction (NWP) wind speeds, rather than the single wind speed from the anemometer, as inputs for feature engineering. The kernel of this paper is constructing features from these specific data sources. What is useful for me is that we can consider similar feature engineering in our work and cite this paper to alleviate the otherwise tedious description. For reference, I think the following reference is of value:
- A New Fuzzy-Based Combined Prediction Interval for Wind Power Forecasting
Technical description:
PI coverage probability (PICP): the percentage of targets that fall inside the upper and lower bounds of the prediction interval (PI),
\[ \operatorname{PICP}=\frac{1}{N} \sum_{t=1}^{N} c_t, \]
where \(N\) is the number of samples and \(c_t\) is a Boolean value that is evaluated as follows:
\[ c_t= \begin{cases}1, & L_t \leq y_t \leq U_t, \\ 0, & \text{otherwise,}\end{cases} \]
where \(y_t\) is the forecast target and \(U_t\) and \(L_t\) are the upper and lower bounds of the interval, respectively.
PI normalized average width (PINAW): penalizes excessively wide intervals,
\[ \operatorname{PINAW}=\frac{1}{N R} \sum_{t=1}^{N}\left(U_t-L_t\right), \]
where \(R\) is the range of the underlying targets, used for normalizing the PIs.
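A minimal NumPy sketch of the two metrics above (function and array names are mine, for illustration):

```python
import numpy as np

def picp_pinaw(y, lower, upper):
    """PICP and PINAW of a prediction interval, vectorized over N samples."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    covered = (lower <= y) & (y <= upper)   # the Boolean c_t
    picp = covered.mean()                   # coverage probability
    r = y.max() - y.min()                   # range R of the targets
    pinaw = (upper - lower).mean() / r      # normalized average width
    return float(picp), float(pinaw)
```
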
The LUBE method can be regarded as a constrained nonlinear optimization problem with conflicting objectives:
Objectives:
Maximize \(\operatorname{PICP}(w)\); Minimize \(\operatorname{PINAW}(w)\)
Constraint:
\(\operatorname{PINAW}(w)>0\)
This is resolved by a fuzzy satisficing (min-max) method:
\[ \min _{X \in \Omega} \max _{i=1, \ldots, n}\left(\mu_{\mathrm{ref}, i}-\mu_{f, i}(X)\right), \]
where \(n\) is the number of objectives (here \(n=2\)), \(\mu_{f, i}\) is the membership function value of the \(i\)-th objective, \(\Omega\) is the problem search space, \(X\) is the control vector including the \(\mathrm{NN}\) weighting factors, and \(\mu_{\mathrm{ref}, i}\) is the reference membership value for the \(i\)-th objective.
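To make the min-max criterion concrete, here is a toy sketch; the linear membership shapes and the worst/best anchor values (0.80/1.00 for PICP, 0.50/0.05 for PINAW) are my assumptions, not the paper's design:

```python
import numpy as np

def linear_membership(value, worst, best):
    """Linear membership: 0 at the worst acceptable value, 1 at the best."""
    return float(np.clip((value - worst) / (best - worst), 0.0, 1.0))

def satisficing_gap(picp, pinaw, mu_ref=(1.0, 1.0)):
    """Worst-case gap to the reference memberships; smaller is better."""
    mu_picp = linear_membership(picp, worst=0.80, best=1.00)    # maximize PICP
    mu_pinaw = linear_membership(pinaw, worst=0.50, best=0.05)  # minimize PINAW
    return max(mu_ref[0] - mu_picp, mu_ref[1] - mu_pinaw)
```

A candidate with high coverage and a narrow interval yields a small gap, so ranking candidates by `satisficing_gap` implements the min-max criterion.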
Paper 2 Title:
Privacy-preserving Spatiotemporal Scenario Generation of Renewable Energies: A Federated Deep Generative Learning Approach
Executive summary: The authors use federated learning with a central server to generate scenarios for wind power. They combine federated learning with least squares generative adversarial networks (LSGANs) for renewable scenario generation. What I think is useful for me is the concept of scenario generation and the application of federated learning. There are also some references in it that I find interesting:
Technical description:
Generative adversarial networks: a GAN contains a discriminator and a generator; the generator is used to generate samples, and the discriminator tries to judge, as well as possible, whether its input is historical data or generated data.

For a noise sample \(z \sim P_Z\), the generator output is \(G(z)\); then the output of the discriminator network is \(D(x)\) for real data \(x\) and \(D(G(z))\) for generated data, and the loss functions of the generator and discriminator are
\[ \mathcal{L}_G=\mathbb{E}_{z \sim P_Z}\left[\log \left(1-D(G(z))\right)\right], \qquad \mathcal{L}_D=-\mathbb{E}_{x \sim P_{\mathrm{data}}}\left[\log D(x)\right]-\mathbb{E}_{z \sim P_Z}\left[\log \left(1-D(G(z))\right)\right], \]
where \(P_Z\) is a known distribution that is easy to sample. Then the mini-max game model with value function \(V_{\mathrm{GANs}}(G, D)\) is given by
\[ \min _G \max _D V_{\mathrm{GANs}}(G, D)=\mathbb{E}_{x \sim P_{\mathrm{data}}}\left[\log D(x)\right]+\mathbb{E}_{z \sim P_Z}\left[\log \left(1-D(G(z))\right)\right]. \]
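The two losses can be checked numerically; a small sketch assuming the discriminator outputs probabilities (names are illustrative):

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-12):
    """Standard GAN losses: d_real = D(x) on real data, d_fake = D(G(z))."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    loss_d = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    loss_g = np.mean(np.log(1.0 - d_fake + eps))  # minimized by the generator
    return float(loss_d), float(loss_g)
```

With a perfect discriminator (`d_real` near 1, `d_fake` near 0), `loss_d` is near zero while the generator's loss is maximal in magnitude when `d_fake` moves toward 1.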
Federated learning: suppose there are \(N\) clients, i.e., participating edge devices \(\left\{\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_N\right\}\), each holding its own dataset \(\left\{\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_N\right\}\).
- Traditional way: gather all the data in one place and train a single big model \(\mathcal{M}_{\mathrm{SUM}}\).
- Federated learning: coordinate the clients to train a global model \(\mathcal{M}_{\mathrm{FED}}\) deployed on a central server, without collecting the raw data.
\(\delta\)-accuracy loss: assuming that \(\mathcal{V}_{\mathrm{SUM}}\) and \(\mathcal{V}_{\mathrm{FED}}\) are the performance metrics of the centralized model \(\mathcal{M}_{\mathrm{SUM}}\) and the federated model \(\mathcal{M}_{\mathrm{FED}}\), the federated model has \(\delta\)-accuracy loss if \(\left|\mathcal{V}_{\mathrm{SUM}}-\mathcal{V}_{\mathrm{FED}}\right|<\delta\).

Global LSGANs Model:
- For a fixed generator, the optimal discriminator of the original GAN is
\[ D^{*}(x)=\frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x)+p_g(x)}. \]
- Substituting the above equation into the value function \(V_{\mathrm{GANs}}(G, D)\), we get
\[ V_{\mathrm{GANs}}\left(G, D^{*}\right)=2\, \mathrm{JSD}\left(P_{\mathrm{data}} \,\|\, P_g\right)-\log 4, \]
where \(\mathrm{JSD}\) denotes the Jensen-Shannon divergence.
There are some drawbacks of the original GAN (e.g., vanishing gradients from the sigmoid cross-entropy loss), so the least squares GAN (LSGAN) is proposed. Using \(a\)-\(b\) encoding for the fake and real labels and the least squares loss function, the objective functions of LSGAN are
\[ \min _D V_{\mathrm{LSGAN}}(D)=\frac{1}{2} \mathbb{E}_{x \sim P_{\mathrm{data}}}\left[(D(x)-b)^{2}\right]+\frac{1}{2} \mathbb{E}_{z \sim P_Z}\left[(D(G(z))-a)^{2}\right], \]
\[ \min _G V_{\mathrm{LSGAN}}(G)=\frac{1}{2} \mathbb{E}_{z \sim P_Z}\left[(D(G(z))-c)^{2}\right], \]
where \(c\) is the value that \(G\) wants \(D\) to output for generated data. Then, for a fixed generator \(G\), the optimal discriminator \(D\) is
\[ D^{*}(x)=\frac{b\, p_{\mathrm{data}}(x)+a\, p_g(x)}{p_{\mathrm{data}}(x)+p_g(x)}. \]
If we choose \(b-c=1\) and \(b-a=2\), then we get
\[ 2\, V_{\mathrm{LSGAN}}(G)=\chi_{\text{Pearson}}^{2}\left(P_{\mathrm{data}}+P_g \,\|\, 2 P_g\right), \]
where \(\chi_{\text{Pearson}}^{2}\) is the Pearson \(\chi^{2}\) divergence. Thus, when \(b-c=1\) and \(b-a=2\) are satisfied, minimizing the LSGAN generator objective is equivalent to minimizing the Pearson \(\chi^{2}\) divergence.
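A numerical sketch of the LSGAN losses with the coding \(a=-1\), \(b=1\), \(c=0\), which satisfies \(b-c=1\) and \(b-a=2\) (illustrative, not the paper's implementation):

```python
import numpy as np

A, B, C = -1.0, 1.0, 0.0  # fake label a, real label b, generator target c

def lsgan_losses(d_real, d_fake):
    """Least squares GAN losses; d_real = D(x), d_fake = D(G(z))."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    loss_d = 0.5 * np.mean((d_real - B) ** 2) + 0.5 * np.mean((d_fake - A) ** 2)
    loss_g = 0.5 * np.mean((d_fake - C) ** 2)
    return float(loss_d), float(loss_g)
```

Unlike the log loss, the quadratic penalty still produces gradients for generated samples that already fool the discriminator but lie far from the target value \(c\).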

Network configuration: the networks use ReLU and LeakyReLU activation functions.
Then we consider the FederatedAveraging (FedAvg) algorithm, proposed in the paper "Communication-Efficient Learning of Deep Networks from Decentralized Data".
The major differences between federated optimization and distributed optimization:
- Non-IID: any particular user's local dataset will not be representative of the population distribution.
- Unbalanced: some users will make much heavier use of the service, so local dataset sizes vary widely.
- Massively distributed: the number of participating clients is much larger than the average number of examples per client.
- Limited communication: mobile devices are frequently offline or on slow or expensive connections.
For a machine learning problem, we typically take the objective
\[ \min _{w \in \mathbb{R}^d} f(w), \qquad f(w)=\frac{1}{n} \sum_{i=1}^{n} f_i(w), \]
where \(f_i(w)=\ell\left(x_i, y_i ; w\right)\) is the loss of the prediction on example \(\left(x_i, y_i\right)\). We assume that there are \(K\) clients, with \(\mathcal{P}_k\) the set of indexes of data points on client \(k\) and \(n_k = \lvert \mathcal{P}_k \rvert\); then
\[ f(w)=\sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \qquad F_k(w)=\frac{1}{n_k} \sum_{i \in \mathcal{P}_k} f_i(w). \]
There are three key parameters: \(C\), the fraction of clients that perform computation on each round; \(E\), the number of training passes each client makes over its local dataset on each round; and \(B\), the local minibatch size used for the client updates. We write \(B=\infty\) to indicate that the full local dataset is used as a single batch.
The resulting algorithm is then standard FedAvg (there is no new contribution here, except the GAN-specific optimization part).
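A minimal FedAvg sketch under the notation above; the gradient function and in-memory client datasets are generic placeholders, not the paper's GAN setup:

```python
import numpy as np

def client_update(w, data, grad_fn, epochs, batch_size, lr):
    """E local epochs of minibatch SGD on one client's data."""
    for _ in range(epochs):
        np.random.shuffle(data)
        for start in range(0, len(data), batch_size):
            w = w - lr * grad_fn(w, data[start:start + batch_size])
    return w

def fedavg_round(w_global, client_datasets, grad_fn, frac=1.0,
                 epochs=1, batch_size=10, lr=0.1):
    """One round: sample a C-fraction of clients, run local SGD, average by n_k."""
    m = max(1, int(frac * len(client_datasets)))
    chosen = np.random.choice(len(client_datasets), size=m, replace=False)
    sizes = [float(len(client_datasets[k])) for k in chosen]
    updates = [client_update(w_global.copy(), client_datasets[k].copy(),
                             grad_fn, epochs, batch_size, lr) for k in chosen]
    return np.average(updates, axis=0, weights=sizes)  # weighted by n_k
```

Here `frac`, `epochs`, and `batch_size` play the roles of \(C\), \(E\), and \(B\); the server never sees client data, only the locally updated weights.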
Correlation Analysis: the autocorrelation of a scenario time series at lag \(\tau\) is
\[ R(\tau)=\frac{\mathbb{E}\left[\left(S_t-\mu\right)\left(S_{t+\tau}-\mu\right)\right]}{\sigma^{2}}, \]
where \(S\) is a random time series; \(\mu\) and \(\sigma^{2}\) denote the mean and variance of \(S\), respectively; and \(\tau\) is the time lag.
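The empirical counterpart of \(R(\tau)\) can be computed as follows (a sketch; the full-series mean and variance are used):

```python
import numpy as np

def autocorr(s, tau):
    """Empirical autocorrelation of a 1-D time series at lag tau >= 1."""
    s = np.asarray(s, dtype=float)
    mu, var = s.mean(), s.var()
    return float(((s[:-tau] - mu) * (s[tau:] - mu)).mean() / var)
```
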
We use the continuous ranked probability score (CRPS), which measures the dissimilarity between the cumulative distributions of generated scenarios and historical observations. The score at lead time \(\ell\) is defined as
\[ \operatorname{CRPS}(\ell)=\int_{-\infty}^{\infty}\left(\widehat{F}_{t+\ell \mid t}(\xi)-\mathbf{1}\left(\xi \geq \xi_{t+\ell}\right)\right)^{2} \mathrm{d} \xi, \]
where \(M\) is the total number of scenarios, \(\widehat{F}_{t+\ell \mid t}(\xi)\) denotes the empirical cumulative distribution function of the \(M\) normalized scenarios, and \(\mathbf{1}\left(\xi \geq \xi_{t+\ell}\right)\) is the indicator function for comparing scenarios and observation.
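For an empirical ensemble of \(M\) scenarios, the CRPS integral reduces to the energy form \(\mathbb{E}|X-y|-\frac{1}{2} \mathbb{E}\left|X-X^{\prime}\right|\); a minimal sketch (names are illustrative):

```python
import numpy as np

def crps_empirical(scenarios, obs):
    """CRPS of an M-member ensemble against one scalar observation."""
    x = np.asarray(scenarios, dtype=float)
    term1 = np.abs(x - obs).mean()                        # E|X - y|
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()  # 0.5 E|X - X'|
    return float(term1 - term2)
```
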
Fréchet Inception Distance (FID):
\[ \operatorname{FID}=\left\|\mu_d-\mu_g\right\|_2^{2}+\operatorname{Tr}\left(\Sigma_d+\Sigma_g-2\left(\Sigma_d \Sigma_g\right)^{1 / 2}\right), \]
where \(\mu_d\) and \(\mu_g\) represent the empirical means and \(\Sigma_d\) and \(\Sigma_g\) the empirical covariances of the real and generated data features, respectively.
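A sketch of the FID computation from the empirical moments, using the symmetric form \(\operatorname{Tr}\left(\left(\Sigma_g^{1/2} \Sigma_d \Sigma_g^{1/2}\right)^{1/2}\right)\) so the matrix square root is always taken of a symmetric PSD matrix:

```python
import numpy as np

def sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def fid(mu_d, sigma_d, mu_g, sigma_g):
    """Frechet distance between N(mu_d, Sigma_d) and N(mu_g, Sigma_g)."""
    s_g_half = sqrtm_psd(sigma_g)
    covmean = sqrtm_psd(s_g_half @ sigma_d @ s_g_half)  # (Sigma_d Sigma_g)^(1/2)
    diff = mu_d - mu_g
    return float(diff @ diff + np.trace(sigma_d) + np.trace(sigma_g)
                 - 2.0 * np.trace(covmean))
```
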
Kernel Maximum Mean Discrepancy (MMD): measures the difference between \(P_d\) and \(P_g\) for some fixed kernel function \(k\), which is defined as
\[ \operatorname{MMD}^{2}\left(P_d, P_g\right)=\mathbb{E}_{x, x^{\prime} \sim P_d}\left[k\left(x, x^{\prime}\right)\right]-2\, \mathbb{E}_{x \sim P_d,\, y \sim P_g}\left[k(x, y)\right]+\mathbb{E}_{y, y^{\prime} \sim P_g}\left[k\left(y, y^{\prime}\right)\right]. \]
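A biased empirical estimate of \(\operatorname{MMD}^2\) with an RBF kernel (kernel choice and bandwidth are illustrative):

```python
import numpy as np

def mmd2_rbf(x, y, gamma=1.0):
    """Biased MMD^2 estimate with k(a, b) = exp(-gamma * ||a - b||^2)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)
    return float(k(x, x).mean() - 2.0 * k(x, y).mean() + k(y, y).mean())
```
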
The 1-nearest neighbor (1-NN) classifier: a two-sample test that trains a 1-NN classifier to distinguish real from generated samples; a leave-one-out accuracy close to 50% suggests the generated distribution matches the real one.
Energy Score (ES):
\[ \operatorname{ES}=\frac{1}{M} \sum_{i=1}^{M}\left\|\varsigma-\xi_i\right\|-\frac{1}{2 M^{2}} \sum_{i=1}^{M} \sum_{j=1}^{M}\left\|\xi_i-\xi_j\right\|, \]
where \(\varsigma\) is the real renewable power output, \(\xi_i\) is the \(i\)-th generated time series scenario, and \(M\) denotes the number of scenarios.
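A direct empirical sketch of the energy score for multivariate scenarios (the multivariate generalization of CRPS; names are illustrative):

```python
import numpy as np

def energy_score(obs, scenarios):
    """ES for M scenarios (rows of `scenarios`) against one observed trajectory."""
    obs = np.asarray(obs, dtype=float)
    xi = np.asarray(scenarios, dtype=float)
    term1 = np.linalg.norm(xi - obs, axis=1).mean()                  # mean ||s - xi_i||
    term2 = np.linalg.norm(xi[:, None, :] - xi[None, :, :], axis=2).mean()
    return float(term1 - 0.5 * term2)
```
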
The Pearson correlation coefficient \(\rho\) of two time series \(S_i\) and \(S_j\) is
\[ \rho\left(S_i, S_j\right)=\frac{\operatorname{cov}\left(S_i, S_j\right)}{\sigma_{S_i} \sigma_{S_j}}, \]
where \(\operatorname{cov}\) denotes the covariance and \(\sigma_{S_i}\), \(\sigma_{S_j}\) the standard deviations of the two series.
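As a quick empirical check of \(\rho\) (a one-liner around NumPy's correlation matrix):

```python
import numpy as np

def pearson(s_i, s_j):
    """Pearson correlation coefficient of two equal-length series."""
    return float(np.corrcoef(s_i, s_j)[0, 1])
```
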