
In this work we describe the implementation and analysis of different optimization algorithms used to find the best set of parameters for a Fuzzy Inference System intended to classify solar flares. The parameters are identified within a universe of possible solutions by the algorithms, and the system is tested on the particular problem of classifying solar flares.


The Sun is the main driver of the varying conditions of the interplanetary medium, particularly in the space surrounding our planet, in what is commonly known as space weather. Multiple solar phenomena show up at many spatial and temporal scales, and are studied through observations, theoretical models and simulations. Among the most energetic phenomena in the solar system are solar flares. These are transient events associated with the activity of the star, in which certain regions of the solar atmosphere can emit a vast amount of energy, up to 10^{25} Joules.

These zones in the solar atmosphere are associated with the presence of dark spots on the solar surface (photosphere) called sunspots. Sunspots are the manifestation of intense magnetic fields emerging from the solar interior and crossing the photosphere, inhibiting the normal convection of solar plasma and thus reducing the radiation emission.

For this reason the temperature in sunspots drops approximately 2000 K compared to the temperature of the non-active photosphere, known as the quiet Sun. Sunspots are proxies of solar activity: their number on the solar disk was used to discover the solar cycle in 1843 [1], and they are the main constituents of the so-called solar active regions.

Solar activity has become a very important research topic due to its connection with space weather and the possible impact of energetic phenomena on the normal functioning of the current technological society, based on satellites, which could be affected by intense solar emissions [2].

Depending on the amount of energy released (flux in W m^{-2}) during the intensity peak of a flaring event, solar flares are classified as A, B, C, M or X, as listed in [3].

The main goal of this work is to choose the best Fuzzy Inference System (FIS) from among several FIS tuning methods, through a validation index. Starting from the characteristics of the solar flares and their quantity on the solar disk (as inputs of the FIS), each FIS allows obtaining a classification of the solar flares (as output of the FIS).

The parameters of each system were tuned using five methods: Manual Tuning, Adaptive Neuro-Fuzzy Inference System (ANFIS) with random initialization [4], Compact Genetic Algorithm (CGA) [5], Differential Evolution (DE) [6] and Stochastic Hill Climbing (SHC) with random initialization [7].

The flow chart that describes the problem is shown in

The sunspot features and their associated flares were obtained by generating a database according to [2], through a cross search in the sunspot and solar flare catalogs from the National Geophysical Data Center (NGDC). The parameters for the cross search yielded a total of 1391 individual values, using a time span of 6 hours, in the records from 1999 to 2002, to cover the activity peak of Solar Cycle 23.

The quantities for each class with these parameters are recorded in

Note that the generated data presents an imbalance: the number of type C (common) flares is large compared to that of the M (moderate) class. Similarly, the M class has more data than the X (extreme) class, as expected from the activity displayed by the Sun during its cycle of approximately 11 years.

For brevity, the inputs of the database were numbered as follows:

Modified Zurich Class

Penumbra: Largest Spot

Sunspot Distribution

Normalized number of Sunspots

Creating scatter plots from pairs of inputs like in

Also, it is quite clear from the

A FIS consists of five components: a base of fuzzy rules, a database that defines the membership functions of the fuzzy sets used in the fuzzy rules, the fuzzy inference engine, the fuzzifier and the defuzzifier [4].

The FIS can be represented with a fuzzy basis function expansion in which an input vector x is related to a scalar output y, such that y = f(x). Thus, it is possible to represent the inference process of a FIS in a compact manner, and the resulting function is a universal approximator [5].

The FIS represented by (1) has the following characteristics:

Fuzzification: singleton.

Membership functions: Gaussian.

Implication: product.

Defuzzification: average of centers.

The index l refers to the l-th rule, M being the total number of rules. In turn, the antecedent fuzzy set A_i^l is unique for each input in every rule. Similarly, the center of the consequent set y_l is unique in every rule [5].

The membership functions μ_{A_i^l}(x_i) are of Gaussian type, and can be written as in (2).

Every MF in (2) has a mean value c and a standard deviation σ.

The total quantity of parameters that defines a FIS of the form (1) is given by (3), bearing in mind that, for each input and every rule, there are two parameters due to the antecedent set (c and σ), plus an additional parameter, the center of the consequent.
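As an illustration, the inference of a FIS of the form (1) and the parameter count of (3) can be sketched as follows. This is a minimal Python sketch (the original work used Octave/MATLAB); the function names are illustrative, and the Gaussian MF is assumed to take the form exp(-((x-c)/σ)^2), following (2).

```python
import math

def fis_output(x, centers, sigmas, y_centers):
    """Evaluate a FIS of the form (1): singleton fuzzification, Gaussian MFs,
    product implication and center-average defuzzification.
    centers[l][i], sigmas[l][i]: antecedent parameters of rule l, input i.
    y_centers[l]: center of the consequent set of rule l."""
    num, den = 0.0, 0.0
    for c_l, s_l, y_l in zip(centers, sigmas, y_centers):
        # Product implication: multiply membership degrees over all inputs
        w = 1.0
        for xi, ci, si in zip(x, c_l, s_l):
            w *= math.exp(-((xi - ci) / si) ** 2)  # assumed Gaussian form of (2)
        num += y_l * w
        den += w
    return num / den

def parameter_count(M, n):
    # For M rules and n inputs: two antecedent parameters (c, sigma) per
    # input and rule, plus one consequent center per rule, as in (3)
    return M * (2 * n + 1)
```

For instance, with the M = 8 rules and n = 4 inputs used later in this work, `parameter_count(8, 4)` yields 72 parameters to be tuned.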

Starting from the authors' perception of the data and the possible relations present in it, it is possible to create an initial FIS with fuzzy sets for each of the inputs, scalar output values, and a rule base linking the fuzzy sets of the inputs to the scalar outputs.

The purpose of this method is to gain insight into the problem, recognizing possible relationships among features as well as revealing preliminary classification rules.

Although a valid solution can be found, the most important result of this method is the knowledge derived from approaching the problem.

Initially the software used was GNU Octave, loading the packages "io" and "fuzzy-logic-toolkit". The first allows Octave to read the generated CSV dataset, and the second to design, test and verify the manually tuned FIS.

Although the following algorithms were implemented in MATLAB, the final FIS created with Octave was migrated to MATLAB through the Fuzzy Logic Designer, a graphical tool that is part of the Fuzzy Logic Toolbox, with the sole purpose of using the same software tool at the final validation stage.

ANFIS, a FIS based on adaptive networks, is a supervised learning method: given a set of input/output pairs (x, y) related by an unknown function f, there is an apprentice and a supervisor of the learning process of f, with a validation metric to evaluate the results of the apprentice and correct it. The algorithm uses a hybrid model that combines the least squares method and the gradient descent (back-propagation) method.

In this case the apprentice is a fuzzy system that can be written as the fuzzy basis function expansion for a Sugeno-type system shown in (1). The parameters to be determined correspond to y_l, c_i^l and σ_i^l [4]. The validation metric is the root mean square error (RMSE) between the output value of the fuzzy apprentice system and the output value y of the data pairs [5]. The process aims at minimizing the error for the input values in a set comprising part of the complete available data, generally about 70% of it. To check the generalization of the apprentice, it is validated with the remaining 30% of the database.
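The 70%/30% train/validation split described above can be sketched as follows (a hedged Python sketch; the function name and fixed seed are illustrative, not the paper's implementation):

```python
import random

def split_data(pairs, train_frac=0.7, seed=0):
    """Shuffle the (x, y) pairs and split ~70% for training the apprentice
    and ~30% for validating its generalization."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Applied to the 1391 records of the generated database, this would give 973 training and 418 validation samples.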

In addition to the individual (apprentice system with its parameters and rules) to be adjusted, ANFIS requires initial conditions such as the number of rules, the number of inputs and the initial learning rate. In the case mentioned above, the inputs stay constant and the other two parameters are tuned. Because ANFIS fits the parameters of an existing individual, implying a local search, it is executed several times, each time generating an individual with randomly initialized parameters, aiming (depending on the randomization) to perform a global search over the whole universe of possible solutions.

This belongs to a family of algorithms known as Probabilistic Model Building Genetic Algorithms (PMBGA) [8], which are characterized by discriminating the attributes that contribute significantly to the construction of an optimal individual. The validation index for determining the performance of an individual is the "fitness" function, which in turn depends on the problem to be solved. The implementation considers the individual with the best performance to be the one that minimizes this function.

Because in this work we are dealing with a classification problem, besides using the RMSE, we decided to also consider the classification error and the correlation. With that in mind, we can assemble an initial draft of a fitness function in (4).

And

Where:

Every classification error E_{Cx} has its respective weight w_x. As the database is inherently imbalanced, every weight w_x is inversely related to the amount of data in its class.

Therefore, the weight associated with the class X of solar flares, for which the amount of data is lowest, has the highest value. By doing this, a misclassified datum belonging to this class produces a more significant increase in the first factor of (4) than one misclassified in class C, in the final fitness function (6).
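The weighted classification error described above can be sketched as follows. This Python sketch assumes inverse-class-frequency weights; the exact weight definition used in the paper is the one given by its equations (4)-(6), and all names here are illustrative.

```python
from collections import Counter

def class_weights(labels):
    """Weights inversely proportional to class frequency, so the scarce
    X class receives the highest weight (assumed form of the paper's weights)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / n for c, n in counts.items()}

def weighted_classification_error(expected, predicted, weights):
    """Total weight of the misclassified samples, normalized by the total
    weight, so a missed X flare costs more than a missed C flare."""
    total_w = sum(weights[c] for c in expected)
    err_w = sum(weights[e] for e, p in zip(expected, predicted) if e != p)
    return err_w / total_w
```

With this choice, one misclassified X datum raises the error far more than one misclassified C datum, matching the intent described above.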

To explain the root mean square error E_{Rmse} in (4), suppose that the problem is not a classification problem but a prediction problem instead. Conceptually, E_{Rmse} gives an idea of how far the individual is from "following" the expected sequence of the training data (5).

Thus, a bad predictor will have a greater E_{Rmse} value than another that gets closer to the output values of the database, considering also that the data may depend on some time unit. The root mean square error is mathematically described as:

Where

v_{0} is the value obtained

v_{e} is the expected value
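Following this definition, the RMSE can be sketched directly (Python sketch; the function name is illustrative):

```python
import math

def rmse(obtained, expected):
    """Root mean square error between the obtained values (v_o) and the
    expected values (v_e) of the database."""
    return math.sqrt(
        sum((vo - ve) ** 2 for vo, ve in zip(obtained, expected)) / len(expected)
    )
```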

The number of rules was taken from the result obtained with the ANFIS algorithm, R = 8 rules. To run the algorithm, the parameter n, which adjusts the convergence speed of the probability vector, is tuned. Since its optimal value is unknown, it is randomly assigned based on [5], and implemented in MATLAB. The process of randomly varying n and running the algorithm is repeated several times (w = number of experiments). Finally, among the best solutions, the one generating the lowest value of (4) with (6) is chosen.

The parameters describing every FIS (individual) are then converted from real to binary representation, since the method adjusts every bit.
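The general scheme of a Compact GA over such a bit string can be sketched as follows. This is a hedged Python sketch of the standard cGA of [5], not the paper's MATLAB implementation: a probability vector replaces the population, two individuals are sampled and compete, and the vector is shifted toward the winner by steps of 1/n (the tuned convergence-speed parameter). Names and the fitness used in the example are illustrative.

```python
import random

def compact_ga(fitness, n_bits, n, generations, seed=0):
    """Compact GA sketch: 'fitness' is minimized, matching the paper's
    convention; n controls the probability-vector update step 1/n."""
    rng = random.Random(seed)
    p = [0.5] * n_bits  # probability that each bit equals 1

    def sample():
        return [1 if rng.random() < pi else 0 for pi in p]

    for _ in range(generations):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) <= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:  # shift p toward the winner
                p[i] += (1.0 / n) if winner[i] == 1 else (-1.0 / n)
                p[i] = min(1.0, max(0.0, p[i]))
    # Decode the converged probability vector into a final individual
    return [1 if pi >= 0.5 else 0 for pi in p]
```

In the paper, the decoded bit string would then be mapped back to the real-valued FIS parameters before evaluating (4).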

This is an algorithm based on the evolution of a population of vectors (individuals) with real parameters, which represent solutions in the search space.

The differential evolution algorithm is basically composed of 4 steps, as follows:

Initialization: Every vector (individual) of the population is randomly initialized.

Mutation: A mutant vector is created for every target vector by adding the scaled difference of two population vectors to a third one.

Crossover: Every mutant vector is mixed with its target vector to produce a trial vector.

Selection: The trial vector obtained in the crossover step competes with the target vector through the evaluation of the fitness function [6].
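The four steps above can be sketched as follows (a hedged Python sketch of standard DE/rand/1/bin per [6]; the paper's MATLAB implementation, population size and control parameters F and CR may differ, and the example fitness is illustrative):

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """DE sketch: 'fitness' is minimized. bounds is a list of (lo, hi)
    pairs, one per parameter of the individual."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: random population within the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: scaled difference of two vectors added to a third
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            mutant = [a[d] + F * (b[d] - c[d]) for d in range(dim)]
            # Crossover: mix mutant and target into a trial vector
            j_rand = rng.randrange(dim)  # guarantee at least one mutant gene
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Selection: trial replaces target only if it is not worse
            if fitness(trial) <= fitness(pop[i]):
                pop[i] = trial
    return min(pop, key=fitness)
```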


The Stochastic Hill Climbing consists in taking a FIS (1) and repeatedly evaluating the solutions in its vicinity [7] [9], up to a maximum number of iterations. The parameters of the input FIS are randomly initialized.

The pseudocode is shown in Algorithm 3 [10].

Here:

Imax: Maximum number of iterations

S: Some particular solution (like Current or Candidate)

Cost(S): Fitness function, obeys (4)

RandomNeighbor(Current) in Algorithm 3 also requires the center and deviation variations, which refer to the allowed absolute-value variations of the related parameters when searching for a neighbor. For example, if one of the parameters has the value 0.6, and the specified variation of this parameter is 0.1, then the neighbor will have a uniformly distributed random value between 0.5 and 0.7.
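The neighbor generation and the climbing loop can be sketched as follows. This is a hedged Python sketch, assuming a simple accept-if-not-worse rule over the cost (fitness) and parameters normalized to [0, 1]; the paper's Algorithm 3 may differ in its acceptance details, and all names are illustrative.

```python
import random

def random_neighbor(current, variations, rng):
    """Perturb each parameter uniformly within +/- its allowed variation,
    e.g. 0.6 with variation 0.1 yields a value in [0.5, 0.7]."""
    return [p + rng.uniform(-v, v) for p, v in zip(current, variations)]

def stochastic_hill_climbing(cost, n_params, variations, i_max, seed=0):
    """SHC sketch: start from a randomly initialized solution and keep a
    candidate neighbor only when it does not worsen the cost."""
    rng = random.Random(seed)
    current = [rng.uniform(0.0, 1.0) for _ in range(n_params)]
    for _ in range(i_max):
        candidate = random_neighbor(current, variations, rng)
        if cost(candidate) <= cost(current):
            current = candidate
    return current
```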

Every separate experiment consists of a single run of a program that implements Algorithm 4 to obtain a single final individual, but n individuals can be obtained by running n experiments. Afterwards, the individuals can be evaluated with (4) and the validation base, in order to choose the best of the n individuals.

The classifier output consists of C values, corresponding to the ω_1, ω_2, …, ω_C classes. Because erroneous classifications occasionally occur, the multiclass sorter is evaluated through a (C x C)-dimensional confusion rate matrix showing the respective classification errors between classes (off-diagonal elements) and the correct classifications (diagonal elements) [11].

Its (i, j) elements correspond to the quantity of data from the ω_i class that was classified as elements of the ω_j class.
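Filling such a matrix can be sketched as follows (Python sketch; names are illustrative):

```python
def confusion_matrix(expected, predicted, classes):
    """C x C matrix: entry (i, j) counts data from class classes[i] that
    was classified as classes[j]; the diagonal holds the correct ones."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for e, p in zip(expected, predicted):
        m[index[e]][index[p]] += 1
    return m
```

For example, with classes ['C', 'M', 'X'], an X flare classified as C increments the entry in row X, column C.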

Excluding the manually tuned FIS, and in order to allow the replication of similar results, we briefly present the parameters used for the algorithms. For the CGA, DE and SHC algorithms, the number of rules was taken from the best ANFIS result, as shown in

As the parameters of this method obey human perceptions of the problem, only the main features are shown in

In this section we first show the best results for every method, together with their analysis, which includes a comparison of their performance.

The best FIS obtained by each algorithm was evaluated using the whole database. With the evaluated output values and the expected output values a confusion matrix can be filled as shown in

In the case of ANFIS, the individual with the lowest validation error was selected for each of the different combinations of the number of rules and the initial learning rate (LR), as shown in

From

Rules = 8

Learning Rate (LR) = 1

The best FIS obtained by the CGA occurred on experiment 𝑤 = 175 and for a value 𝑛 = 41 of the probability adjustment parameter.

To perform a statistical analysis of the algorithms implemented, the two-sample Welch's t-test, assuming unequal variances, was used to confirm or reject the null hypothesis that both methods provide similar analytical results [12].
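The Welch statistic and its approximate degrees of freedom can be sketched as follows (Python sketch using the Welch-Satterthwaite approximation; the paper's computation was presumably done with a statistics package, and the names here are illustrative):

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Two-sample Welch's t-test statistic and degrees of freedom,
    assuming unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se2 = va / na + vb / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

The p-value for one or two tails is then obtained from the t distribution with df degrees of freedom.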

Comparing the results of the test between DE and the CGA and ANFIS algorithms, as shown in

On the other hand, this agrees with [13], which states that optimization methods perform similarly on average over the entire set of possible optimization problems.

The result of Welch's t-test shows that the null hypothesis should not be rejected, because in the two-tailed case the confidence level to reject is less than 20%, and in the one-tailed case it is less than 60%. Therefore, both methods provide the same average results, and the observed differences are purely due to random errors.

In this section we summarize the obtained results and discuss different aspects of their performance.

Due to the imbalance in the database, the systems and algorithms used in the present work have limited opportunities to learn from class M, and far fewer from class X.

Additionally, for ANFIS, because of the fact mentioned above, the RMSE validation metric is not adequate for solving the problem, since it ignores the classification error; this is evidenced by the best individual obtained with this method being an optimal class C classifier, but not so for the rest of the classes.

Although the Compact Genetic Algorithm has a simple description and requires little memory, it considerably restricts the space of solutions, since it works with parameters represented in fixed point, a more reduced universe compared to the floating-point representation.

From the items listed above, and from