A Deep Learning Model to Perform Keras Binary Classification (2024)


  • Introduction
  • Importing Data
  • Splitting Dataset into Train and Test Feature Matrix and Dependent Vector
  • Model Creation, Compilation, Fitting, and Evaluation
  • Conclusion
  • Appendix

Introduction

Binary classification is one of the most common and frequently tackled problems in machine learning. In its simplest form, the user tries to classify an entity into one of two possible categories. For example, given the attributes of a fruit such as weight, color, and peel texture, classify it as either a peach or an apple. Through the effective use of neural networks (deep learning models), binary classification problems can be solved to a fairly high degree of accuracy.

In this guide, we will classify molecules as either active or inactive based on physical properties such as the mass of the molecule, the radius of gyration, electronegativity, etc. The data set has been created just for the sake of this tutorial and is only indicative. To avoid confusion, the properties are listed simply as prop_1, prop_2, and so on, instead of mass, radius of gyration, etc.

The Keras library, which ships with the TensorFlow library, will be used to build the deep learning model.
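
Since Keras is bundled with TensorFlow, the imports used throughout this guide look roughly like the sketch below (a minimal check, assuming a TensorFlow installation that exposes the bundled tf.keras):

import tensorflow as tf
from tensorflow import keras

# Verify the installation; Keras ships inside the TensorFlow package
print(tf.__version__)

python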

Importing Data

Let us have a look at a sample of the dataset we will be working with:

import pandas as pd

df = pd.read_csv('molecular_activity.csv')
print(df.head())

python

Output

   prop_1  prop_2  prop_3  prop_4  Activity
0    4.06   71.01   57.20    5.82         1
1    3.63   65.62   52.68    5.44         1
2    3.63   68.90   58.29    6.06         1
3    4.11   75.59   62.81    6.44         1
4    4.00   70.86   58.05    6.06         1

As mentioned before, prop_1, prop_2, prop_3, and prop_4 are the properties associated with the molecules, and Activity can be thought of as antibiotic or anti-inflammatory activity. If Activity is 1, the molecule is active; otherwise it is not. The whole data set is provided in the appendix for anyone who wants to replicate the example.
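
Before modelling, it can be worth checking how the two classes are balanced. A minimal sketch, assuming the molecular_activity.csv file has already been loaded into df as above:

# Count how many molecules are active (1) versus inactive (0)
print(df['Activity'].value_counts())

# Fraction of active molecules in the data set
print(df['Activity'].mean())

python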

Splitting Dataset into Train and Test Feature Matrix and Dependent Vector

The dataset we imported needs pre-processing before it can be fed into the neural network. The first step is to split it into an independent feature matrix and a dependent vector. For our molecular activity dataset, prop_1, prop_2, prop_3, and prop_4 are the independent features, while Activity is the dependent variable.

properties = list(df.columns.values)
properties.remove('Activity')
print(properties)
X = df[properties]
y = df['Activity']

python

The above code first creates a list of the column names in the dataset and assigns it to the variable properties. The dependent variable name (Activity) is then removed from properties. The X matrix is defined by taking all the columns in the data frame (df) apart from Activity. Similarly, the y vector is created by taking the Activity column from df.

print(X.head())

python

Output

   prop_1  prop_2  prop_3  prop_4
0    4.06   71.01   57.20    5.82
1    3.63   65.62   52.68    5.44
2    3.63   68.90   58.29    6.06
3    4.11   75.59   62.81    6.44
4    4.00   70.86   58.05    6.06
print(y.head())

python

Output

0    1
1    1
2    1
3    1
4    1

The next step is to divide the data into train and test sets. This is achieved using the train_test_split function provided in the model_selection module of the sklearn library.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

python

The above code splits the data set such that seventy percent of the randomly selected rows are put into the train set, while the remaining thirty percent are kept aside as the test set that will be used for validation purposes.
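
To confirm the 70/30 proportions, the shapes of the resulting sets can be printed; a quick sketch (with this example data set the split works out to 378 training and 162 test rows, as also seen in the training log later):

# Roughly 70% of the rows should end up in the training set
print(X_train.shape, X_test.shape)   # e.g. (378, 4) (162, 4)
print(y_train.shape, y_test.shape)

python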

Model Creation, Compilation, Fitting, and Evaluation

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(4,)),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid),
])

python

The above code creates a neural network with three layers: two hidden layers of 16 nodes each and one output node. The output node uses the sigmoid activation function, which squeezes any value into the range between 0 and 1 along a sigmoid curve. The two hidden layers use ReLU (Rectified Linear Unit) as the activation function. ReLU is a half-rectified function; that is, for all inputs less than or equal to 0 (e.g. -120, -6.7, -0.0344, 0) the output is 0, while anything positive (e.g. 10, 15, 34) is passed through unchanged. A single output unit is used because, for each record in X, one probability is predicted. If it is high (> 0.9), the molecule is very likely active; if it is low (< 0.2), it is very likely not active.
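
Both activation functions are simple to state in plain Python; a minimal sketch of what ReLU and sigmoid compute element-wise:

import math

def relu(x):
    # ReLU: negative inputs are clipped to 0, positive inputs pass through
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squeezes any real value into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-6.7), relu(34))         # 0.0 34
print(sigmoid(-2.0), sigmoid(2.0))  # ~0.12 ~0.88

python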

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=50, batch_size=1)
test_loss, test_acc = model.evaluate(X_test, y_test)

python

The above code compiles the network. It uses Adam, a momentum-based optimizer. The loss function used is binary_crossentropy. For binary classification problems that output a probability, binary_crossentropy is usually the loss function of choice; mean_squared_error may also be used instead. The metric tracked is accuracy. The model is trained for 50 epochs with a batch size of 1. Finally, the trained model is evaluated on the test set to check its accuracy.
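
For intuition, binary cross-entropy penalises a predicted probability according to how far it is from the true 0/1 label; a small sketch of the per-sample loss (this is only an illustration of the formula, not the library's internal implementation):

import math

def binary_crossentropy(y_true, y_pred):
    # Per-sample loss: -[y*log(p) + (1-y)*log(1-p)]
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

print(binary_crossentropy(1, 0.9))  # confident and correct -> small loss (~0.105)
print(binary_crossentropy(1, 0.1))  # confident and wrong  -> large loss (~2.303)

python

The complete script, putting all the steps together, is shown below.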

import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
import numpy as np

df = pd.read_csv('molecular_activity.csv')
properties = list(df.columns.values)
properties.remove('Activity')
X = df[properties]
y = df['Activity']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(4,)),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid),
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=50, batch_size=1)

test_loss, test_acc = model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)

python

Output

Epoch 1/50
378/378 [==============================] - 1s 2ms/sample - loss: 0.6704 - acc: 0.6958
Epoch 2/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.5604 - acc: 0.7672
Epoch 3/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.5554 - acc: 0.7725
Epoch 4/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.5536 - acc: 0.7751
Epoch 5/50
...
Epoch 44/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.4138 - acc: 0.8360
Epoch 45/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.4214 - acc: 0.8280
Epoch 46/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.4268 - acc: 0.8333
Epoch 47/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.4130 - acc: 0.8280
Epoch 48/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.4146 - acc: 0.8307
Epoch 49/50
378/378 [==============================] - 0s 1ms/sample - loss: 0.4161 - acc: 0.8333
Epoch 50/50
378/378 [==============================] - 1s 1ms/sample - loss: 0.4111 - acc: 0.8254
162/162 [==============================] - 0s 421us/sample - loss: 0.3955 - acc: 0.8333
Test accuracy: 0.8333333

python

The test accuracy achieved by the model is just over 83%. It can be increased further by tuning the number of epochs, the number of layers, or the number of nodes per layer.
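
As one possible direction, a wider and deeper variant could be tried; a sketch of such an experiment (the layer sizes and epoch count here are illustrative assumptions, not tuned values):

# Illustrative variant: more nodes per layer and an extra hidden layer
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(4,)),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=1)

python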

Now, let us use the trained model to predict probability values for new data. The code below passes two feature rows to the trained model and prints the predicted probabilities.

a = np.array([[4.02, 70.86, 62.05, 7.0], [2.99, 60.30, 57.46, 6.06]])
print(model.predict(a))

python

Output

[[0.8603756 ]
 [0.05907778]]

python
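
To turn these probabilities into hard 0/1 labels, a cutoff can be applied; a minimal sketch using the common 0.5 default (the discussion above suggests wider margins such as > 0.9 or < 0.2 for confident calls):

probs = model.predict(a)
labels = (probs > 0.5).astype(int)  # 1 = active, 0 = inactive
print(labels)                       # [[1]
                                    #  [0]]

python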

Conclusion

In this example, we developed a working neural network for a binary classification problem. The same problem can also be solved using other algorithms such as logistic regression, naive Bayes, or k-nearest neighbours. The choice of algorithm should be driven by the problem at hand and factors such as the size of the available data, computational power, etc. Deep networks (neural networks) are generally recommended when the available data set is large.
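
For comparison, a classical baseline such as logistic regression can be fitted on the same split in a few lines; a sketch using scikit-learn (its accuracy is not reported here and will generally differ from the neural network's):

from sklearn.linear_model import LogisticRegression

# Fit a simple logistic regression baseline on the same train/test split
clf = LogisticRegression()
clf.fit(X_train, y_train)
print('Baseline accuracy:', clf.score(X_test, y_test))

python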

Appendix

I have compiled the complete data set, which can be found on my GitHub.
