Vijai Gandikota's Home Page

Breast Cancer class prediction using Machine Learning

Contents
1. Goal
2. Data Examination
3. Algorithm Choice, development and code
4. Decision Tree Generated
5. Output
6. Final Results

Predicting the class of a patient's breast tumor as 'Benign' or 'Malignant': Univ. of Wisconsin & Univ. of California Irvine dataset

This uses a decision tree algorithm for prediction.

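Goal: Given 569 records of patient breast tumor test data, each labeled with the class outcome "Benign" or "Malignant", build a model that predicts the class of a new patient's tumor from the recorded features.
Data Examination: The features or attributes listed for the original Wisconsin breast cancer data are shown below. Note that the code in this article loads scikit-learn's load_breast_cancer dataset, the diagnostic version of the Wisconsin data, which has 569 records with 30 real-valued features and codes the class as 0 (malignant) or 1 (benign) rather than 2 and 4.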

   1. Sample code number            id number

   2. Clump Thickness               1 - 10

   3. Uniformity of Cell Size       1 - 10

   4. Uniformity of Cell Shape      1 - 10

   5. Marginal Adhesion             1 - 10

   6. Single Epithelial Cell Size   1 - 10

   7. Bare Nuclei                   1 - 10

   8. Bland Chromatin               1 - 10

   9. Normal Nucleoli               1 - 10

  10. Mitoses                       1 - 10

  11. Class:   (2 for benign, 4 for malignant)
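As a quick check of what the code in the next section actually loads, the following minimal sketch inspects the dataset bundled with scikit-learn. It assumes scikit-learn is installed; the variable names n_records, n_features, class_names and class_counts are introduced here for illustration and are not part of the original script.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

bc_data = load_breast_cancer()

# Shape of the feature matrix: (number of records, number of features)
n_records, n_features = bc_data.data.shape
print("Records:", n_records, "Features:", n_features)

# Class labels are integer codes that index into target_names
class_names = [str(name) for name in bc_data.target_names]
print("Class names:", class_names)

# Count how many records fall into each class
class_counts = Counter(class_names[code] for code in bc_data.target)
print("Class counts:", dict(class_counts))

# Show the first few feature names as loaded by scikit-learn
print("First features:", [str(name) for name in bc_data.feature_names[:5]])
```

Printing the shape and class names before training makes it easier to read the 0 and 1 values that appear later in the test output.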

Algorithm Choice, development and code: Since this is a multivariate feature set and the aim is to predict a class label, this is a classification problem, so we will use the Decision Tree algorithm.
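To give a feel for how a decision tree chooses its splits, here is a simplified sketch in plain Python. scikit-learn's DecisionTreeClassifier uses the Gini impurity criterion by default; the code below is not scikit-learn's implementation, only an illustration of the idea on a single feature. The function names gini_impurity and best_threshold and the toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k^2) over the class proportions p_k in labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_threshold(values, labels):
    """Find the threshold on one feature with the lowest weighted impurity."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy data: one feature value per tumor, with its class label
toy_values = [1, 2, 3, 7, 8, 9]
toy_labels = ["Benign", "Benign", "Benign",
              "Malignant", "Malignant", "Malignant"]

threshold, impurity = best_threshold(toy_values, toy_labels)
print("Best threshold:", threshold, "weighted impurity:", impurity)
```

A full tree repeats this search over every feature at every node and keeps splitting until the leaves are pure or a stopping rule applies.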

Code

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn import tree
# pydotplus (with Graphviz installed) is needed only to render the tree
import pydotplus
from IPython.display import Image

print("Loading data")
bc_data = load_breast_cancer()

print("Selecting 3 records [0, 50, 100] as test samples")
test_idx = [0, 50, 100]

#training data
print("Creating train target data set by removing the 3 test samples")
train_target = np.delete(bc_data.target, test_idx)
print("Creating train data data set by removing the 3 test samples")
train_data = np.delete(bc_data.data, test_idx, axis=0)

#testing data
print("Creating test target data set with the three samples") 
test_target = bc_data.target[test_idx]
print("Creating test data dataset with the three samples")
test_data = bc_data.data[test_idx]

print("Creating Decision Tree Classifier")
my_decision_tree = tree.DecisionTreeClassifier()
print("Training the decision tree classifier")
my_decision_tree.fit(train_data, train_target)

predicted_values = my_decision_tree.predict(test_data)
print("\nTESTING:\n")
print("TEST TARGETS: %s" % (test_target))
print("PREDICTED VALUES: %s\n" % (predicted_values))

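# Count how many test predictions match their known classes
total_count = len(test_idx)
correct_count = 0
for i in range(total_count):
    if test_target[i] == predicted_values[i]:
        correct_count += 1

# Accuracy as a percentage of correctly classified test samples
accuracy = (correct_count / total_count) * 100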

print("Accuracy = %.5f %%" % (accuracy))

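# Export the tree in Graphviz DOT format and save it as a PDF
dot_data = tree.export_graphviz(my_decision_tree, out_file=None)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("bc_tree.pdf")

# Export again with feature and class names for a more readable rendering
dot_data = tree.export_graphviz(my_decision_tree, out_file=None,
                         feature_names=bc_data.feature_names,
                         class_names=bc_data.target_names,
                         filled=True, rounded=True,
                         special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
# Image(...) displays inline only inside an IPython/Jupyter notebook
Image(graph.create_png())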

The Decision Tree Generated

Here is the output. Three samples were set aside for testing, and the accuracy was computed by comparing their predictions to their known classes.
Here the accuracy is the number of test samples that were correctly classified divided by the total number of test samples, expressed as a percentage.

Output bc_tree.pdf: The Decision Tree Model Generated

$ python3.6 ./treeclassify_bc.py 
Loading data
Selecting 3 records [0, 50, 100] as test samples
Creating train target data set by removing the 3 test samples
Creating train data data set by removing the 3 test samples
Creating test target data set with the three samples
Creating test data dataset with the three samples
Creating Decision Tree Classifier
Training the decision tree classifier

TESTING:

TEST TARGETS: [0 1 0]
PREDICTED VALUES: [0 1 0]

Accuracy = 100.00000 %
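The accuracy figure above can be checked by hand. The sketch below restates the calculation as a small standalone function in plain Python; the name accuracy_percent is introduced here for illustration, and the second call uses made-up predictions to show a case with one misclassification.

```python
def accuracy_percent(targets, predictions):
    """Return the percentage of predictions that match their targets."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(targets, predictions) if t == p)
    return correct / len(targets) * 100


# Values taken from the run shown above
test_targets = [0, 1, 0]
predicted_values = [0, 1, 0]
print("Accuracy = %.5f %%" % accuracy_percent(test_targets, predicted_values))

# Hypothetical case with one wrong prediction out of three
print("Accuracy = %.5f %%" % accuracy_percent([0, 1, 0], [0, 1, 1]))
```

With only three test samples, each misclassification moves the accuracy by a third, so the figure is a coarse check, not a precise estimate.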

Vijai Gandikota
