In this study, an effective OBCSA-OSAE technique is developed for the detection and classification of PPH. The proposed OBCSA-OSAE technique comprises two major stages: feature selection and classification. First, the OBCSA technique is applied to select an optimal subset of features. Next, the EO algorithm is employed to derive the optimal values of the parameters involved in the SAE model. Finally, the SAE model is used for PPH classification. Figure 1 illustrates the overall process of the OBCSA-OSAE model.
Algorithmic design of OBCSA-FS technique
In the initial stage, the OBCSA-FS technique is designed to choose an optimal subset of features. The OBCSA algorithm integrates the oppositional based learning (OBL) concept with BCSA. The Crow Search Algorithm (CSA) is one of the more recent evolutionary algorithms, developed by Askarzadeh [20] and inspired by the social habits of crows, with a search procedure that mimics their behavior in the wild. The concept of CSA is based on the way these birds hide food in a given place and retrieve it at a later time. Mathematically, a flock of crows is represented as nc, and the position of crow i at iteration t in the search space is \(x_{i}^{t} \). In CSA, crow i remembers the hidden location of its food. The position of a crow thief (a crow that wants to steal another crow's food) is updated by
$$\begin{array}{@{}rcl@{}} x_{i}^{t+1}=x_{i}^{t}+\tau\times fl\times\left(M_{j}^{t}-x_{i}^{t}\right),\;i=1,2,\;\dots n_{c}, \end{array} $$
(1)
where fl denotes the flight length, τ signifies an arbitrary value in the range of zero and one, and \(M_{j}^{t} \) denotes the memory (hidden food location) of crow j at iteration t. A fitness function is used to evaluate each crow, and its value is stored as the initial memory value for the update.
In the next state of the problem, crow j, the owner of the food, knows that crow i is observing and following it; hence the owner deceives crow i by moving to some other location in the search space. In CSA, the location of crow i is then updated to an arbitrary location, and the complete update rule can be defined as:
$$ x_{i}^{t+1}= \left\{\begin{array}{ll} x_{i}^{t}+\tau\times fl\times\left(M_{j}^{t}-x_{i}^{t}\right),&if\;\theta\geq AP\\ random\;position,&otherwise \end{array}\right. $$
(2)
where θ represents an arbitrary value in the range [0, 1] and AP denotes the awareness probability. CSA has been adapted for FS through a binary search model [21]. In BCSA, the search space is modeled as an n-dimensional Boolean lattice and the solution is updated along the corners of a hypercube, unlike the standard CSA, where the solution is updated in a continuous space. A binary vector is used for FS, where a one indicates that the corresponding feature is chosen for creating the new dataset, and a zero otherwise. The OBL concept is applied to the fl variable of BCSA to prevent trapping in local optima and to improve the quality of the resulting solution by attaining a balance between exploration and exploitation. Instead of an arbitrary initialization, which might be far from the global optimum, the variable fl is initialized in BCSA according to OBL, where the opposite number is given by:
$$ \overline{x}=a+b-x $$
(3)
where \(\overline x \) represents the opposite number and x∈R denotes a real number defined on the range x∈[a,b]. When a=0 and b=1, Eq. (3) becomes
$$ \overline{x}=1-x $$
(4)
For a point P(x1, x2, …, xn) in an n-dimensional coordinate space with x1, x2, …, xn∈R, the opposite point \(\overline P \) is defined by its coordinates \(\overline {x_{1}},\overline {x_{2}},\dots,\overline {x_{n}} \):
$$ \overline{x_{i}}=a_{i}+b_{i}-x_{i},\quad i=1,\dots,n $$
(5)
In this case there are two values: x represents the initial arbitrary value in [a, b] and \(\overline x \) denotes its opposite. In each iteration of OBCSA, f(x) and \(f(\overline x) \) are calculated and compared through the evaluation function g: if \(g(f(x))\geq g(f(\overline x)) \), x is selected; otherwise \(\overline x \) is selected. Accordingly, fl lies in the range fl∈[flmin,flmax], and the opposite number \(\overline {fl} \) is determined by:
$$ \overline{fl}=fl_{min}+fl_{max}-fl, $$
(6)
Then, in every iteration, the fitness of the initial fl value and the fitness of \(\overline {fl} \) are evaluated. If \(fitness(fl)\geq fitness(\overline {fl}) \), fl is selected; otherwise \(\overline {fl} \) is selected. The stages of the presented method are given in the following.
Step 1: The number of crows is nc=25, flmin=0.1, flmax=1.8, AP=0.3, and the maximal number of iterations is tmax=100.
Step 2: The positions that represent the features are initialized from U(0, 1).
Step 3: The fitness function (FF) is determined by
$$ Fitness=C+W\times\left(1-\frac{F_{all}}{F_{sub}}\right), $$
(7)
where C represents the classification performance, W represents a weighting factor in the range of zero and one, Fall represents the overall number of features, and Fsub signifies the length of the selected feature subset.
Step 4: The positions of the crows are updated according to Eq. (2).
Step5: Steps 3 & 4 are repetitive till a tmax is attained.
Process involved in OSAE based classification model
During the classification process, the chosen features are passed to the OSAE model. AEs are unsupervised ANNs used for representation learning. The AE structure imposes a bottleneck in the network, which forces a compressed representation of the input; in this way, the correlations among the input features are learned and the input is reconstructed. AEs are encoding-decoding frameworks. The encoder maps the original input x to a hidden layer that is regarded as the latent space representation. The decoder then reconstructs this latent representation as \(\widehat x \). The encoder and decoder models are defined in Eqs. (8) and (9), respectively:
$$ h=\sigma \left(Wx+b\right), $$
(8)
$$ \widehat{x}=\sigma \left(W^{'}h+b^{'}\right), $$
(9)
where x=(x1, x2, …xn) signifies the input data vector, h=(h1, h2, …hm) denotes the low-dimensional vector obtained in the hidden layer, and \(\widehat x=(\widehat {x}_{1},\;\widehat {x}_{2},\;\dots \widehat {x}_{n}) \) represents the reconstructed input. W and \(\phantom {\dot {i}\!}W^{'}\) denote the weight matrices, b and \(\phantom {\dot {i}\!}b^{'}\) denote the bias vectors, and σ denotes the sigmoid activation function, i.e., \(\sigma(x) =\frac {1}{1+e^{-x}} \). The MSE is used as the reconstruction error between x and \(\widehat x \):
$$ E=\frac{1}{N}\sum\limits_{i=1}^{N}\left\Vert\widehat{x_{i}}-x_{i}\right\Vert^{2} $$
(10)
Overfitting is a common challenge that occurs when training an AE network. An effective way of addressing this issue is to add a weight penalty to the cost function:
$$ E=\frac{1}{N} \sum\limits^{N}_{i=1}\frac{1}{2}\left\Vert\widehat{x_{i}}-x_{i}\right\Vert^{2}+\frac{\lambda}{2} \left(\left\Vert W \right\Vert^{2}+\left\Vert W^{'}\right\Vert^{2}\right) $$
(11)
where λ denotes the weight attenuation coefficient. Moreover, a sparsity penalty term is imposed on the AE hidden layer to achieve better feature learning under a sparse constraint and to avoid the situation in which the AE simply copies the input data to the output [22]. Let \({\widehat \rho }_{j} \) represent the average activation of hidden layer neuron j, determined as \({\widehat \rho }_{j}=\frac {1}{N}\Sigma _{i=1}^{N}h_{j}(x_{i}) \), and let ρ denote the sparsity proportion, usually a small positive value near 0. To achieve sparsity, the constraint \({\widehat \rho }_{j}=\rho \) is imposed, and the Kullback-Leibler (KL) divergence is added to the loss function as a regularization term:
$$ KL(\widehat{\rho}\parallel\rho)={\sum\nolimits}_{j=1}^{K}\rho \log\left(\frac{\rho}{\widehat{\rho}_{j}}\right)+\left(1-\rho\right)\log\left(\frac{1-\rho}{1-\widehat{\rho}_{j}}\right), $$
(12)
where K stands for the number of hidden neurons. Therefore, the loss function of sparse AE now has 3 parts: the MSE, weight attenuation, and sparsity regularization parts:
$$ \begin{aligned} E&=\frac{1}{N}{\sum\nolimits}_{i=1}^{N}\frac{1}{2}\left\| \widehat{x}_{i}-x_{i}\right\|^{2}+\frac{\lambda}{2}\left(\left\| W\right\|^{2}+\left\| W^{'}\right\|^{2}\right)\\ &\quad+\beta KL(\widehat\rho\parallel\rho), \end{aligned} $$
(13)
where β denotes the sparsity regularization parameter. Several sparse AEs can also be stacked to achieve improved feature learning. In this framework, the encoded (hidden) layer of one sparse AE is connected to the input layer of the next sparse AE, ensuring that the network gains better representation learning. Figure 2 depicts the framework of the SAE.
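As an illustration, the sparse AE forward pass and composite loss of Eqs. (8)-(13) can be written as the following NumPy sketch; the weight/bias variable names and the default values of λ, ρ, and β are illustrative assumptions rather than settings reported in this work.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_loss(X, W, b, W2, b2, lam=1e-4, rho=0.05, beta=3.0):
    """Sketch of the sparse AE cost of Eq. (13): reconstruction error plus
    weight decay plus KL sparsity penalty. X has shape (N, n_inputs);
    W maps the input to the hidden layer and W2 maps it back."""
    H = sigmoid(X @ W + b)                                         # encoder, Eq. (8)
    X_hat = sigmoid(H @ W2 + b2)                                   # decoder, Eq. (9)

    mse = np.mean(0.5 * np.sum((X_hat - X) ** 2, axis=1))          # first term of Eq. (13)
    decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W2 ** 2))         # weight attenuation, Eq. (11)

    rho_hat = np.mean(H, axis=0)                                   # average activation per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))   # Eq. (12)

    return mse + decay + beta * kl
```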
In this analysis, the SSAE network is presented. In the SSAE network, the hidden layer of the preceding sparse AE serves as the input to the next sparse AE. The last hidden layer is then connected to the Softmax classifier that carries out the classification. Thus, the presented SSAE network consists of the trained sparse AEs and a Softmax classifier. BP is applied to fine-tune the parameters of the whole network using the training samples and their labels. The fine-tuning stage treats the layers of the network as one model. Let {y1, y2, …, ym} denote the target variables of the training data; the cost function of the whole network is determined as:
$$ E=-\frac1m\left[{\sum\nolimits}_{i=1}^{m}{\sum\nolimits}_{j=1}^{N}1\left\{y^{i}=j\right\}\log\frac{e^{\theta_{j}^{T}x^{i}}}{\Sigma_{l=1}^{N}e^{\theta_{l}^{T}x^{i}}}\right], $$
(14)
where 1{∙} denotes the indicator function, i.e., 1{yi=j}=1 if yi=j and 1{yi=j}=0 if yi≠j, N denotes the number of classes, and θj represents the weight vector associated with the jth output unit. In addition, the EO technique is employed for optimizing the SSAE parameters, namely the optimal weight and bias values. The selection of weights and biases is vital for training robust NNs.
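For reference, the Softmax fine-tuning cost of Eq. (14) over the final hidden representation can be sketched as follows; the argument names are hypothetical and labels are assumed to be integer class indices.

```python
import numpy as np

def softmax_cost(H, y, theta):
    """Sketch of the fine-tuning cost of Eq. (14). H holds the last hidden
    representations (m, d), y holds integer class labels (m,), and theta is
    the (N_classes, d) Softmax weight matrix."""
    logits = H @ theta.T                                       # theta_j^T x^i for every class j
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(y)), y]))       # indicator selects the true-class term
```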
The parameter optimization of the SAE model takes place using the EO algorithm [23]. It follows a dynamic mass-balance system applied to a control volume. A mathematical expression is used to represent the mass balance, which determines the concentration of a non-reactive constituent under the dynamic conditions of the control volume. This expression is a function of its several processes under the different kinds of sources and sinks. The complete description of the EO approach is as follows. The random population (initial concentrations) is generated by a uniform distribution, depending on the number of particles and dimensions in the given search region:
$$ C_{i}^{initial}=C_{min}+rand_{i}\left(C_{max}-C_{min}\right),\quad i=1,2,\dots,n $$
(15)
where \(C_{i}^{initial} \) represents the initial concentration vector of the ith particle, Cmin and Cmax represent the lower and upper bounds, randi indicates a uniform random number in the range of zero and one, and n denotes the population size.
In order to define the equilibrium state (global optimum), a pool of the four best-so-far candidates is maintained, together with an additional particle whose concentration is the arithmetic mean of these four particles. These particles together form the equilibrium pool vector:
$$ {\overrightarrow C}_{eq.pool}=\left\{{\overrightarrow C}_{eq(1)},{\overrightarrow C}_{eq(2)},{\overrightarrow C}_{eq(3)},{\overrightarrow C}_{eq(4)},{\overrightarrow C}_{eq(ave)}\right\} $$
(16)
In the evolution phase, a particle may update its concentration with respect to \({\overrightarrow C}_{eq(1)} \) in one generation and with respect to \({\overrightarrow C}_{eq(ave)} \) in the next; the pool candidate is chosen at random, and all particles are updated in this way until the completion of the evolution process.
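A minimal sketch of the EO initialization and equilibrium-pool construction of Eqs. (15)-(16) is shown below; `fitness` is a hypothetical placeholder for the objective being minimized (consistent with Eq. (26)), and the function names are illustrative.

```python
import numpy as np

def init_population(n, dim, c_min, c_max, rng):
    """Eq. (15): uniform random initial concentrations."""
    return c_min + rng.random((n, dim)) * (c_max - c_min)

def equilibrium_pool(C, fitness):
    """Eq. (16): the four best-so-far particles plus their arithmetic mean.
    `fitness` is assumed to be minimized, matching Eq. (26)."""
    order = np.argsort([fitness(c) for c in C])       # best (lowest fitness) first
    best4 = C[order[:4]]
    return np.vstack([best4, best4.mean(axis=0)])     # pool of five candidates
```

During each generation, every particle then draws one of the five pool members at random for its update, as described above.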
The exponential term F given in Eq. (17) helps the EO method attain an appropriate balance between intensification and diversification; λ represents a random vector in the range of zero and one that controls the turn-over rate in the actual control volume.
$$ \overrightarrow F=e^{-\overrightarrow\lambda(t-t_{0})} $$
(17)
where t decreases as a function of the iteration count (Iter):
$$ t=\left(1-\frac{Iter}{Max\_iter}\right)^{\left(a_{2}\frac{Iter}{Max\_iter}\right)}, $$
(18)
in which Iter is the current iteration, Max_iter is the maximum number of iterations, and the parameter a2 controls the exploitation capability of EO [24]. To ensure convergence while improving the local and global search capability of the method, t0 is defined as:
$$ \overset{\rightharpoonup}{t_{0}}=\frac{1}{\overset{\rightharpoonup}{\lambda}}\ln\left(-a_{1}sign\left(\overset{\rightharpoonup}{r}-0.5\right)\left(1-e^{-\overset{\rightharpoonup}{\lambda}t}\right)\right)+t, $$
(19)
In which a1 & a2 is utilized for controlling global as well as local search capability of EO method. The \(sign(\vec {r}-0.5),\) is accountable for the way of exploitation and exploration. In EO, the a1 & a2 values are selected to be 2 and 1 correspondingly.
By substituting Eq. (19) into Eq. (17), the equation becomes:
$$ \overset{\rightharpoonup}{F}=a_{1}sign\left(\overset{\rightharpoonup}{r}-0.5\right)\left[e^{-\overset{\rightharpoonup}{\lambda} t}-1\right], $$
(20)
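The exponential term of Eqs. (17)-(20) can be computed as in the following sketch (a1 = 2 and a2 = 1, as stated above); the function name and the decision to return λ alongside F are illustrative conveniences.

```python
import numpy as np

def exploration_term(dim, it, max_it, rng, a1=2.0, a2=1.0):
    """Sketch of the exponential term F of Eq. (20), using the time variable
    t of Eq. (18). Returns the random turnover rate lambda together with F,
    since both reappear in the concentration update of Eq. (25)."""
    t = (1 - it / max_it) ** (a2 * it / max_it)                # Eq. (18)
    lam = rng.random(dim)                                      # turnover rate, lambda in [0, 1]
    r = rng.random(dim)
    F = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)       # Eq. (20)
    return lam, F
```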
The generation rate in EO is employed for improving exploitation as a function of time and is modeled as a first-order exponential decay process:
$$ \overset{\rightharpoonup}{G}={\overset{\rightharpoonup}G}_{0}e^{-\overset{\rightharpoonup}{k}(t-t_{0})}\;\;, $$
(21)
where G0 denotes the initial value and k denotes the decay constant.
Lastly, the generation rate expression assumes k=λ:
$$ \overset{\rightharpoonup}{G}={\overset{\rightharpoonup}{G}}_{0}e^{-\overset{\rightharpoonup}{\lambda}(t-t_{0})}={\overset{\rightharpoonup}{G}}_{0}\overset{\rightharpoonup}{F}\;, $$
(22)
$$ {\overset{\rightharpoonup}{G}}_{0}=\overset{\rightharpoonup}{GCP}\left({\overset{\rightharpoonup}{C}}_{eq}-\overset{\rightharpoonup}{\lambda}\overset{\rightharpoonup}{C}\right), $$
(23)
$$ \overset{\rightharpoonup}{GCP}=\left\{\begin{array}{ll} 0.5r_{1},&r_{2}\geq GP\\ 0,&r_{2}<GP\end{array}\right., $$
(24)
where r1 and r2 are two random numbers in the range [0, 1], GCP is the generation rate control parameter, and GP denotes the generation probability that determines how many particles apply the generation term.
Based on the above equations, the final update equation for the concentration (particles) is determined by:
$$ \overset{\rightharpoonup}{C}={\overset{\rightharpoonup}{C}}_{eq}+\left(\overset{\rightharpoonup}{C}- {\overset{\rightharpoonup}{C}}_{eq}\right)\overset{\rightharpoonup}{F}+\frac{\overset{\rightharpoonup}{G}}{\overset{\rightharpoonup}{\lambda} V}\left(1-\overset{\rightharpoonup}{F}\right), $$
(25)
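Putting the pieces together, one EO concentration update combining Eqs. (21)-(25) can be sketched as follows; the unit volume V and GP = 0.5 follow the original EO formulation [23] and are assumptions here, as are the helper and argument names.

```python
import numpy as np

def eo_update(C_i, pool, lam, F, rng, GP=0.5, V=1.0):
    """One particle update per Eq. (25), with the generation rate of
    Eqs. (21)-(24). GP = 0.5 and unit volume V are taken from the original
    EO formulation [23], not from values stated in this paper."""
    C_eq = pool[rng.integers(len(pool))]                  # random equilibrium candidate from the pool
    r1, r2 = rng.random(), rng.random()
    GCP = 0.5 * r1 if r2 >= GP else 0.0                   # Eq. (24): generation rate control parameter
    G0 = GCP * (C_eq - lam * C_i)                         # Eq. (23)
    G = G0 * F                                            # Eq. (22) with k = lambda
    return C_eq + (C_i - C_eq) * F + (G / (lam * V)) * (1.0 - F)   # Eq. (25)
```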
The update equation has three terms: the first term is the equilibrium concentration; the second term is responsible for the global search; and the last term is responsible for the local search, allowing the solution to be attained more precisely. In order to optimally adjust the parameters of the SAE algorithm, the EO model is utilized, and its detailed operation is as follows. The training of the SAE algorithm is guided by a FF, and a ten-fold cross-validation procedure is used for evaluating the FF. In ten-fold CV, the training database is randomly partitioned into ten mutually exclusive subsets of almost equal size; nine subsets are used for training and the remaining one is used for testing. This process is repeated for ten iterations so that each subset is used once for testing. The FF is defined as 1−CAvalidation of the ten-fold CV on the training data, as given in Eq. (26); thus, a solution with maximal CAvalidation yields the minimum fitness value.
$$ Fitness\;=1-CA_{validation}, $$
(26)
$$ CA_{validation}=\frac1{10}{\sum\nolimits}_{i=1}^{10}{\frac{y_{c}}{y_{c}+y_{f}}\times100}\;, $$
(27)
Whereas, yc & yf indicates the amount of true and false classification. Lastly, the hyperparameter included in the SAE algorithm is optimally picked up by the EO method in which the classification performances get enhanced.