NEW COMPLEX VALUED ACTIVATION FUNCTIONS: COMPLEX MODIFIED SWISH, COMPLEX E-SWISH AND COMPLEX FLATTEN-T SWISH

Complex valued neural networks (CVNNs) have been developed to process complex valued data directly. In a CVNN, one of the most important factors is the choice of activation function at each node. Choosing the right activation function for each layer is crucial and may have a significant impact on metric scores and on the training speed of the model. This paper introduces three new activation functions for CVNNs that are closely related to the complex swish activation function: complex modified swish, complex E-swish and complex Flatten-T swish. To verify their validity and practicability, the three new activation functions are tested and compared with the complex swish activation function on the complex valued four bit XOR problem, the three input symmetry detection problem and the fading equalization problem. We show that complex E-swish (β = 1.4) has the best overall performance when compared to networks using the complex swish, complex modified swish and complex Flatten-T swish activation functions on the considered tasks.


INTRODUCTION
As an extension of real valued artificial neural networks (RVANNs), complex valued artificial neural networks (CVANNs) have been developed to process data with complex numbers directly, without requiring any pretreatment. A CVANN is a type of neural network whose parameters, such as weights, threshold values, inputs and outputs, are complex numbers [1].
CVANNs are suitable for areas that deal with complex valued data, such as image processing via the Fourier transform, radar, telecommunications and speech recognition. Applications of CVANNs in areas such as telecommunications and image processing can be found in the literature [2,3,4]. For example, the fading equalization problem has been successfully solved with a single complex valued neuron with high generalization ability [5]. The fading equalization, symmetry detection and exclusive-or (XOR) problems can likewise be solved by a single complex valued neuron [5,6].
The threshold values and initial weights of a CVANN, the normalization of the data and the activation function of the hidden nodes all affect the CVANN's convergence to the target [7]. For CVANNs, the main task is finding an appropriate complex activation function. Although a real valued activation function is usually chosen to be bounded and smooth, like the logarithmic sigmoid function, these properties cannot all be carried over to the complex domain [8].
All parameters of a CVANN, such as inputs, outputs and weights, lie in the complex plane, so the activation function of a CVANN must be extended into the complex domain. The complex activation function φ(z) should satisfy the following conditions:
-The function φ(z) should not be linear in both the real and imaginary parts of z, zR and zI. Otherwise the multi-layer perceptron has no advantage: it would be equivalent to a single-layer perceptron [1].
-The function φ(z) must be bounded. The formulas describing the forward pass of the multi-layer perceptron require boundedness; otherwise there will be interruptions during training [1].
-The partial derivatives of φ(z) should exist and be bounded. Since complex back-propagation is used, the partial derivatives of φ(z) need to be bounded [1].
-The function φ(z) should be defined as a complex function that is analytic over the complex plane [1].
There are many studies in the field of complex activation functions; some of these are given below.
Leung and Haykin [9] used the sigmoid function on the complex domain, f(z) = 1/(1 + e^(−z)). Because this function has singular points at z = (2n + 1)πi, n ∈ ℤ, Leung and Haykin scaled the input data to a restricted region of the complex domain.
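As a sketch of why this rescaling is necessary, the snippet below evaluates the complex sigmoid near one of its poles; the magnitude blows up as the argument approaches a singular point. The function and pole locations follow the standard complex sigmoid; the specific test point is chosen for illustration.

```python
import numpy as np

def complex_sigmoid(z):
    """Sigmoid 1 / (1 + exp(-z)) evaluated on a complex argument."""
    return 1.0 / (1.0 + np.exp(-z))

# The denominator vanishes where exp(-z) = -1, i.e. at the singular
# points z = (2n + 1) * pi * 1j, n an integer; near a pole the
# magnitude blows up, which is why the inputs must be rescaled.
z_near_pole = 1j * (np.pi - 1e-6)
print(abs(complex_sigmoid(z_near_pole)))  # very large magnitude
```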

A. Complex-Valued Neuron Model
Complex-valued neurons (CVNs) are as natural as complex numbers, and they are more functional than real-valued neurons (RVNs): they learn faster and generalize better. CVNs can be used to simulate biological neurons and they treat phase information properly. A single RVN can learn only linearly separable input/output mappings, whereas a CVN can learn both linearly and nonlinearly separable mappings [15].
The complex valued neuron model is given in Eq. 5, y = f(w1x1 + … + wnxn + b), where f is the activation function, x1, …, xn are the inputs, w1, …, wn are the weights and b is the threshold value [15]. The activation function can be a real function f: ℂ → ℝ or a complex function f: ℂ → ℂ, but it always acts on a complex variable [16].
One of the main advantages of CVNs is their ability to work with phase, which is very important for the analysis of signals and for solving various pattern recognition and classification problems. For the analysis of real-valued signals, one of the most efficient approaches is frequency domain analysis, which immediately involves complex numbers. By analysing signal properties in the frequency domain, we see that each signal is characterized by a magnitude and a phase that carry different information about the signal. A CVN treats this phase information properly [15].
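The neuron model of Eq. 5 can be sketched as a short forward pass. The split (real-imaginary) activation and the example inputs, weights, and bias below are illustrative choices, not values from the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def split_activation(z):
    # Real-imaginary (split) type: act on each part separately.
    return sigmoid(z.real) + 1j * sigmoid(z.imag)

def cvn_forward(x, w, b, f=split_activation):
    """Single complex valued neuron: y = f(w1*x1 + ... + wn*xn + b)."""
    return f(np.dot(w, x) + b)

x = np.array([1 + 1j, -1 + 0j])          # complex inputs (illustrative)
w = np.array([0.5 - 0.2j, 0.3 + 0.1j])   # complex weights (illustrative)
y = cvn_forward(x, w, b=0.1 + 0.1j)
```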

B. The New Activation Functions
The swish activation function was proposed by researchers at Google (2017). This activation function and its derivative are formulated in Eqs. 6 and 7 [17].

Complex Swish
where σ is the sigmoid activation function. In our study, the swish activation function is studied in the complex plane with complex valued input z. Swish has been adapted to CVANNs as a real-imaginary type function, given by Eq. 8 [7].
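The real-imaginary adaptation described above can be sketched as follows: swish is applied to the real and imaginary parts of z separately (a minimal sketch of the split-type construction, with β = 1).

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def swish(t):
    # Real valued swish: t * sigmoid(t).
    return t * sigmoid(t)

def complex_swish(z):
    """Real-imaginary (split) type adaptation of swish to CVANNs:
    apply swish to the real and imaginary parts separately."""
    return swish(z.real) + 1j * swish(z.imag)
```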
The modified swish activation function was proposed by Ramachandran et al. [17]. The mod-swish function is defined as:

Complex Modified Swish
where the value α is either a constant or a trainable parameter. Since the proposed CVANN uses back-propagation, the derivative of mod-swish is needed; it is given by the following equations. In our study, the modified swish activation function is studied in the complex plane with complex valued input z. Modified swish has been adapted to CVANNs as a real-imaginary type function as follows.

Alcaide introduced the E-swish activation function (2018) [18]. E-swish and its derivative are formulated below. E-swish is very similar to swish; in fact, when the constant β is taken as 1, E-swish becomes identical to swish. The value β is either a constant or a trainable parameter [18].

Complex E-Swish
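The two parameterised variants can be sketched as below. E-swish follows Alcaide's definition β·x·σ(x); the mod-swish form x·σ(αx) is an assumption based on the β-parameterised swish in [17], since the paper's equation is not reproduced here.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def e_swish(t, beta=1.4):
    # Alcaide's E-swish: beta * t * sigmoid(t); beta = 1 recovers swish.
    return beta * t * sigmoid(t)

def mod_swish(t, alpha=1.4):
    # Assumed mod-swish form: t * sigmoid(alpha * t) with constant alpha.
    return t * sigmoid(alpha * t)

def complex_e_swish(z, beta=1.4):
    """Split type: E-swish applied to the real and imaginary parts."""
    return e_swish(z.real, beta) + 1j * e_swish(z.imag, beta)
```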
In our study, the E-swish activation function is studied in the complex plane with complex valued input z. E-swish has been adapted to CVANNs as a real-imaginary type function given below. A further activation function, called FTSwish or Flatten-T Swish (FTS), was proposed by Chieng et al. [19]. Flatten-T swish combines the swish and Rectified Linear Unit (ReLU) activation functions into a new one. FTSwish is formulated as follows:

Complex Flatten-T Swish
T is a parameter, called the threshold value, that enables the negative part of the equation to produce negative values [19].
The proposed CVANN uses the back-propagation algorithm, so the derivative of the formula is needed. T is a constant, so its derivative is simply 0 (and likewise the derivative of FTS(x) is 0 for x < 0). Therefore the only term to differentiate is f(x) = x·σ(x) + T for x ≥ 0. The derivation steps for FTS are given below, and the complete derivative of FTS is then formulated accordingly. In our study, the FTSwish activation function is studied in the complex plane with complex valued input z. FTSwish has been adapted to CVANNs as a real-imaginary type function given by:
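The FTSwish definition and the derivative argument above can be sketched as follows (T = −0.2 is an illustrative default; the derivative of the constant T vanishes, so only x·σ(x) contributes for x ≥ 0).

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def ftswish(t, T=-0.2):
    """Flatten-T Swish: x*sigmoid(x) + T for x >= 0, T otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, t * sigmoid(t) + T, T)

def ftswish_grad(t):
    """Derivative: T is constant, so only x*sigmoid(x) contributes
    for x >= 0; the flat negative part has derivative 0."""
    t = np.asarray(t, dtype=float)
    s = sigmoid(t)
    return np.where(t >= 0, s + t * s * (1.0 - s), 0.0)
```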

C. Complex Valued Data
Since all parameters in a CVANN are complex numbers, the input data must also be in the complex domain; real valued input data must therefore be moved to the complex domain. In this study, the real valued four bit XOR and three input, one output symmetry detection problems were converted to the complex domain.
In this study, to verify the validity and applicability of the CVANN using the new activation functions, we applied it to three problems: the four bit XOR, the three input, one output symmetry detection and the fading equalization problems.
The complex valued exclusive-or (XOR) problem with four patterns is given in Table 1. The complex valued XOR problem is defined according to the following two rules:

Complex valued XOR problem with four patterns
-The real part of the output is the XOR of the real and the imaginary parts of the input.
-The imaginary part of the output is taken as the real part of the input [20].
The CVANN has many advantages over the RVANN; for example, the XOR problem can be solved with a two layered CVANN [21].
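Under one reading of the two rules above (Re(output) = Re(input) XOR Im(input), Im(output) = Re(input)), the four-pattern dataset can be constructed as follows; the garbled first rule in the source makes this reconstruction an assumption.

```python
import numpy as np

# Four-pattern complex XOR, assuming the rules read as:
#   Re(output) = Re(input) XOR Im(input),  Im(output) = Re(input).
patterns = [(a, b) for a in (0, 1) for b in (0, 1)]
inputs = np.array([complex(a, b) for a, b in patterns])
targets = np.array([complex(a ^ b, a) for a, b in patterns])
```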
The symmetry detection problem aims to determine whether the binary activity levels of a one-dimensional input array are symmetric about the central point. The probability of a pattern being symmetric decreases as the number of bits increases, which makes the symmetry detection problem well suited for studying unbalanced data. The three input, one output symmetry detection problem is shown in Table 2 [21].

Symmetry detection problem
Since all parameters in a CVANN are complex numbers, the real valued input data must be converted to the complex domain. This conversion can be done with angle-based coding using the equations given below [22].
where x ∈ [a, b] and θ is the mapping angle. The angle value is first obtained by the linear transformation in Eq. 19; then, via the Euler formula in Eq. 20, the complex valued data is obtained on the unit circle with unit magnitude.
In this study, a and b were taken as 0 and 1, respectively, and the data was moved to the complex plane with a phase angle θ = π/4. The three input, one output symmetry detection problem in the complex plane is given in Table 3. It has also been shown that the fading equalization problem can be successfully solved by a two layered CVANN with high generalization ability [23].
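The angle-based coding can be sketched as below, using the values from this study (a = 0, b = 1, θ = π/4). Since Eqs. 19-20 are not reproduced in the source, the exact linear map (phi = θ·(x − a)/(b − a)) is an assumed reading.

```python
import numpy as np

def angle_encode(x, a=0.0, b=1.0, theta=np.pi / 4):
    """Angle-based coding (one reading of Eqs. 19-20): a linear map of
    x in [a, b] to an angle phi, then Euler's formula z = exp(j*phi)."""
    phi = theta * (np.asarray(x, dtype=float) - a) / (b - a)
    return np.exp(1j * phi)

bits = np.array([0, 1, 1])     # one symmetry-detection pattern
z = angle_encode(bits)         # complex inputs on the unit circle
```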

Fading equalization problem
The channel equalization problem in a digital communication system can be seen as a pattern classification problem. The digital communication system receives a signal sequence transmitted with additive noise and tries to estimate the actual transmitted sequence from these signals. A transmitted signal takes one of four possible complex values: −1 − i, −1 + i, 1 − i and 1 + i (i = √−1). The received signal therefore takes values around −1 − i, −1 + i, 1 − i and 1 + i because noise is added. We need to estimate the true complex values from such noisy complex values, so a method with excellent generalization ability is needed. The input-output mapping of the problem is shown in Table 4 [24]. In order to solve the problem with the complex valued neural network, the input-output mapping in Table 4 is encoded as shown in Table 5 [25]. Rumelhart et al. (1986a, b) showed that increasing the number of layers raises the computational power of neural networks [26,27].
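The classification view of the problem can be sketched as follows: noisy samples around the four constellation points are assigned to the nearest symbol. This nearest-symbol baseline is an illustrative stand-in for the mapping a CVANN learns, with noise level and sample count chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
symbols = np.array([-1 - 1j, -1 + 1j, 1 - 1j, 1 + 1j])

# Received signals: transmitted symbols plus complex Gaussian noise
# (noise scale 0.2 is an illustrative assumption).
sent = rng.choice(symbols, size=200)
noise = 0.2 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))
received = sent + noise

# Nearest-symbol decision: pick the closest of the four points.
estimated = symbols[np.argmin(np.abs(received[:, None] - symbols[None, :]), axis=1)]
accuracy = np.mean(estimated == sent)
```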

A. Complex-Valued XOR Problem with Four Patterns
A CVANN using each of the new activation functions (E-swish, Flatten-T swish, modified swish) was tested on the complex valued XOR problem with four patterns to find the best constant value (β, T and α). We use a 1-2-1 network with learning rate 0.5, as in the literature [5,28-32]. Training was stopped when the target error rate was reached. As the error measure we use the mean squared error (MSE), MSE = (1/N) Σ |dn − yn|², where dn is the target and yn the network output. To find the best β, T and α values, the CVANN using the E-swish, Flatten-T swish and modified swish activation functions was tested with 18 different (randomly chosen) values. The iteration numbers at which the error (MSE) reached 0.001 are shown in Tables 6, 7 and 8. The numbers in the "score" column report the number of times the E-swish, Flatten-T swish and modified swish activation functions give the best result across the four tests (Test-I, Test-II, Test-III and Test-IV). The results show that the CVANN using E-swish with β = 1.4, Flatten-T swish with T = −0.3 and modified swish with α = 1.4 converges to the target earlier than CVANNs using these activation functions with other β, T and α values on the complex valued XOR problem with four patterns.
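The MSE stopping criterion for complex targets can be sketched as the mean of the squared moduli of the errors:

```python
import numpy as np

def complex_mse(targets, outputs):
    """Mean squared error for complex signals: average of |d - y|^2."""
    d = np.asarray(targets)
    y = np.asarray(outputs)
    return float(np.mean(np.abs(d - y) ** 2))
```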

B. Complex-Valued Symmetry Detection Problem
To find the best constant values for the new activation functions (E-swish, Flatten-T swish, modified swish), the CVANN was tested on the complex valued symmetry detection problem with 18 different (randomly chosen) β, T and α values. We use a 3-1-1 network (three inputs, one hidden node and one output) with learning rate 0.5, as in the literature [5,25]. We use the MSE given by Eq. 21 as the stopping criterion.
The iteration numbers at which the proposed CVANNs' error reached 0.001 are given in Tables 9, 10 and 11. The CVANN using E-swish with β = 1.4, Flatten-T swish with T = −0.2 and modified swish with α = 1.5 converges to the target earlier than CVANNs using these activation functions with other β, T and α values on the complex valued symmetry detection problem.

C. The Fading Equalization Problem
In the following, it is shown that the fading equalization problem, which cannot be solved with a single real valued neuron, can be successfully solved by a single CVN. First we found the best constant values for the new activation functions (E-swish, Flatten-T swish, modified swish) on the fading equalization problem. We use a 1-2-1 CVANN with learning constant 0.5. The iteration numbers at which the error (MSE) reached 0.001 are shown in Tables 12, 13 and 14. As the results show, the CVANN using E-swish with β = 1.4, Flatten-T swish with T = 1 and modified swish with α = 4 gives the best results compared with the other β, T and α values on the fading equalization problem.

D. Comparison of Four Activation Functions
To demonstrate the validity of the proposed CVANNs, the three new activation functions (complex modified swish with constant α, complex E-swish with constant β and complex Flatten-T swish with constant T) are tested and compared with the complex swish activation function on the complex valued four bit XOR, three input symmetry detection and fading equalization problems. The average learning epochs at which the error (MSE) reached 0.001 are shown in Tables 15, 16 and 17. The experiments show that the proposed CVANN using the complex E-swish activation function with β = 1.4 has more stable convergence performance than the other complex activation functions on the complex valued XOR, symmetry and fading equalization problems. The average learning epochs, targets and outputs of Test-V are given in Tables 18-20 for the stopping criterion MSE = 0.001.

IV. DISCUSSION AND CONCLUSION
In this paper, three new activation functions, called complex modified swish with a constant value α, complex E-swish with a constant value β and complex Flatten-T swish with a constant value T, have been presented. It has also been shown that the parameters β, T and α determine the convergence of CVANNs to the target and the training speed of the model.
Our experiments have shown that complex E-swish with β = 1.4, complex modified swish with α = 1.4 and complex Flatten-T swish with T = −0.3 converge to the target earliest on the XOR problem. The symmetry detection tests showed that complex E-swish with β = 1.4, complex modified swish with α = 1.5 and complex Flatten-T swish with T = −0.2 converge to the target earlier than the other values. Finally, the CVANN was tested on the fading equalization problem, where complex E-swish with β = 1.4, complex modified swish with α = 4 and complex Flatten-T swish with T = 1 converge to the target earliest.
The performance of the three new activation functions with their best constant values was compared with the complex swish activation function on the complex XOR, symmetry and fading equalization problems. Across these benchmark tests, all results show that E-swish with β = 1.4 (and β = 1.5) achieves the best result (noted with an asterisk *) among the tested activation functions in terms of the number of iterations needed to reach the target error rate. The mean squared error was used as the performance index.
From the presented experiments, we conclude that the CVANN using the complex E-swish activation function with β = 1.4 has the best overall performance compared to networks using the complex swish, complex modified swish and complex Flatten-T swish activation functions on the complex valued four bit XOR, three input symmetry detection and fading equalization problems.