Feature Extraction the Heart UCI

A feature extraction and classification system has been create to predict heart disease. Addition, the results of this algorithm compared . The Chi-Square Test technique used to introduce information gain, Chi-Square Test, Mean Absolute Difference, and Dispersion ratio from feature extraction methods. In classification algorithms, the MLP and KNN algorithms, also utilized. The dataset was taken from the Kaggle platform and name as Heart Disease UCI. Also, the data set’s features are given below.

Database Information:

This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The “goal” field refers to the presence of heart disease in the patient. It is integer value from 0 (no presence) to 4.

Attribute Information:

age

sex

chest pain type (4 values)

resting blood pressure

serum cholestoral in mg/dl

fasting blood sugar > 120 mg/dl

resting electrocardiographic results (values 0,1,2)

maximum heart rate achieved

exercise induced angina

oldpeak = ST depression induced by exercise relative to rest

the slope of the peak exercise ST segment

number of major vessels (0-3) colored by flourosopy

thal: 3 = normal; 6 = fixed defect; 7 = reversable defect

Algorithm Processing Steps

Step 1: Import Library

Step 2: Read CSV Files

Step 3 : Data Separated as Attribute and Target

Step 4 : Graphics of Categories

Feature Extraction

Filter Methods

Step 5 : Feature extraction using information gain

Step 6 : Feature extraction using Chi-Square Test

Step 7 : Feature extraction using Mean Absolute Difference

Step 8 : Feature extraction using Dispersion ratio

Chi-Square Test used for classification.

Step 9 : The dataset divided into training and test datasets

Classification Using the MLP

Step 10 : Creating a Model

Step 11 : Creating Input Layer

The number of input neurons is also given as five, as can be seen here, because five attributes remain from the feature extraction methods.

Step 12: Creating Hidden Layer

The number of hidden layers is set to 16 in this example.

Step 13: Creating Output Layer

Step 14: Model Training

Step 15: Model Accuracy for Test Data

Step 16: Model Accuracy for Training Data

Classification Using the KNN

The K coefficient in KNN determined to also be 3.

Step 17: Creating a Model

Step 18: Model Training

Step 19: Predict Data

Step 20: Model Accuracy for Test Data

Step 21: Model Accuracy for Training Data

RESULTS

The results obtained by using the Keras Sequential model, which one of the data augmantation algorithm models, are given below. Then for the first training, the parameter values, the number of neurons for the input layer, 5 activation parameters selected. In this training, an intermediate layer used and consists of 16 neurons, and similarly the activation parameter is tanh. Finally, in the output layer, which has 1 output, the activation parameter sigmoid selected.

In response to these obtained input and output values, the adam algorithm chosen for optimization. This is because it is a good algorithm for first-orderalgorithm for first-order gradient-based optimization and the most widely used method for optimizing neural networks. Additionally, number of iterations of 10 used. In addition, the test data success rate for this model found to be 0.5714. Similarly the success rate for the training data found to be 0.5896.

The KNN algorithm used for the second model. The K coefficient determined as 3 and the model trained. Then, predict data found according to the entered values. In addition,The test data success rate for this model found to be 0.8352. Similarly, the success rate for the training data found to be 0.8302.

Comparatively, if a general evaluation is made, the training success rate for the first model is higher, while the test success rate for the second model is higher.

You can find a lot of information on the subject here.
Click here to access the codes related to this article.