Prostate_Cancer_2.jpg

MATLAB Biopsy Analysis

This MATLAB program is an automatic cancer detection tool that reached 73.54% accuracy on a test set of 5,000 images after being trained on 50,000 images. It analyzes microscopic biopsy images of tissue already known to be cancerous or non-cancerous, trains a machine learning model on data extracted from those images, and then uses that model to predict whether new images are cancerous. The program flow and each processing step are described below.

Program Flow

Matlab Flowchart New.PNG

Stepwise Execution

00000000000000000000000001.jpg

Image read-in

Each image in the training dataset is read in and analyzed individually. The image to the left is a real biopsy image of a prostate cancer tumor, obtained from an online database. Note that this image is for demonstration only and was not part of the dataset used for training and testing.
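The page does not say how the images and their labels are stored. The Python sketch below assumes the layout of the Kaggle dataset linked in the sources (a labels CSV with `id` and `label` columns, label 1 meaning cancerous, and one `.tif` file per `id`) and yields each image path with its label so images can be processed one at a time. The function name `iter_labeled_images` is hypothetical, and the MATLAB program may read its images differently.

```python
import csv
from pathlib import Path


def iter_labeled_images(labels_csv, image_dir, ext=".tif"):
    """Yield (image_path, label) pairs one at a time.

    Assumes a CSV with a header row 'id,label', where label is
    0 (non-cancerous) or 1 (cancerous), and image files named '<id><ext>'.
    """
    image_dir = Path(image_dir)
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            # Build the image path from the id column; the file is not opened here.
            yield image_dir / f"{row['id']}{ext}", int(row["label"])
```

A loop such as `for path, label in iter_labeled_images("train_labels.csv", "train"):` would then hand each image to the enhancement step.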

Enhanced_ProstateCancer.jpg

Image enhancement

Each image is first enhanced so that data can be collected more reliably later on. After the image is converted to grayscale, it is processed with contrast-limited adaptive histogram equalization (CLAHE), which improves contrast relative to nearby image regions while limiting how much noise is amplified.
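CLAHE splits the image into tiles, equalizes each tile's histogram with a cap (the clip limit) on how many pixels any single intensity bin may hold, and blends neighboring tiles by bilinear interpolation. The Python sketch below shows only the grayscale conversion and the clip-limited equalization of a single tile. Tiling and interpolation are omitted, the clipped excess is redistributed in a single pass, and the function names are hypothetical. In MATLAB the full operation is available as `adapthisteq`.

```python
def to_gray(rgb_pixels):
    """Convert (R, G, B) tuples in 0-255 to grayscale using BT.601 luma weights
    (the coefficients documented for MATLAB's rgb2gray)."""
    return [round(0.2989 * r + 0.5870 * g + 0.1140 * b) for r, g, b in rgb_pixels]


def clip_limited_equalize(gray, clip_limit, levels=256):
    """Contrast-limited histogram equalization of one tile of gray levels."""
    hist = [0] * levels
    for v in gray:
        hist[v] += 1
    # Clip tall bins so large uniform regions (and their noise) are not over-stretched.
    excess = 0
    for i, count in enumerate(hist):
        if count > clip_limit:
            excess += count - clip_limit
            hist[i] = clip_limit
    # Spread the clipped counts across all bins (remainder goes to the lowest bins).
    share, remainder = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < remainder else 0)
    # Map each gray level through the cumulative distribution of the clipped histogram.
    total = len(gray)
    mapping, running = [], 0
    for count in hist:
        running += count
        mapping.append(round((levels - 1) * running / total))
    return [mapping[v] for v in gray]
```

A lower `clip_limit` gives a flatter clipped histogram and therefore a gentler contrast stretch; a very large limit reduces this to ordinary histogram equalization.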

Isolated_Nuclei_PC.jpg

Image segmentation

Next, the program isolates the nuclei in the image. Color-based k-means clustering first separates the nuclei from the surrounding tissue, and a watershed transform based on extended local minima then splits apart nuclei that touch or overlap.
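A minimal Python sketch of the k-means step on a flat list of RGB pixels follows. It uses a deterministic initialization (evenly spaced pixels) instead of random seeding, requires at least `k` pixels, and assumes the nuclei fall in the darkest color cluster, which is typical of stained nuclei but is an assumption here. The watershed step is not shown, and the function names are hypothetical.

```python
import math


def kmeans_colors(pixels, k, iterations=20):
    """Cluster (R, G, B) pixels into k groups with Lloyd's algorithm.

    Returns (labels, centers). Requires len(pixels) >= k.
    """
    step = max(1, len(pixels) // k)
    centers = [tuple(map(float, pixels[i * step])) for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance in RGB space.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in pixels]
        # Update step: move each center to the mean color of its pixels.
        new_centers = []
        for c in range(k):
            members = [p for p, lbl in zip(pixels, labels) if lbl == c]
            if members:
                new_centers.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


def nuclei_mask(labels, centers):
    """Mark pixels in the darkest cluster (assumed here to be the nuclei)."""
    darkest = min(range(len(centers)), key=lambda c: sum(centers[c]))
    return [lbl == darkest for lbl in labels]
```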

dataTable.PNG

Data collection

Next, the program measures six characteristics of each identified nucleus: radius, texture, perimeter, area, smoothness, and compactness. It then computes summary statistics of each characteristic across all nuclei in the image, so that every image is condensed into a single row of data the machine learning model can interpret: mean, median, maximum, minimum, first quartile (Q1), third quartile (Q3), standard deviation, and the number of nuclei lying more than two standard deviations from the mean. Pictured left is a snippet of the data gathered from 100 images.
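The step that turns a variable number of nuclei into one fixed-length row can be sketched in Python as follows. Each per-nucleus measurement is reduced to the eight statistics listed above. The quartile method (the default of `statistics.quantiles`) and the use of the sample standard deviation are assumptions and may not match what the MATLAB program computes. Each image needs at least two nuclei for these statistics to be defined. The function names are hypothetical.

```python
import statistics


def summarize(values):
    """Collapse one per-nucleus measurement into eight image-level statistics."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)
    beyond_2sd = sum(1 for v in values if abs(v - mean) > 2 * sd)
    return {
        "mean": mean, "median": statistics.median(values),
        "max": max(values), "min": min(values),
        "q1": q1, "q3": q3, "std": sd, "n_beyond_2sd": beyond_2sd,
    }


def image_row(nuclei):
    """Build one flat feature row from a list of per-nucleus dicts."""
    row = {}
    for feature in sorted(nuclei[0]):
        for stat, value in summarize([n[feature] for n in nuclei]).items():
            row[f"{feature}_{stat}"] = value
    return row
```

With the six characteristics listed above, each image row would hold 6 × 8 = 48 values.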

KNN_edited_edited.jpg

Machine learning model fit

A k-nearest neighbor (KNN) classifier is fit using the data gathered from the training images together with each image's known classification. The trained model is saved for future use.
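KNN simply stores the labeled training rows and classifies a new row by majority vote among the k closest ones. The pure-Python sketch below uses Euclidean distance; the value of k and the distance metric used by the MATLAB program are not stated, so both are assumptions here. The class name `KNNClassifier` is hypothetical. In MATLAB, this kind of model is provided by `fitcknn`.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-nearest-neighbor classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, rows, labels):
        # "Fitting" a KNN model only stores the labeled examples.
        self.rows = [[float(x) for x in r] for r in rows]
        self.labels = list(labels)
        return self

    def predict_one(self, row):
        # Indices of the k training rows closest to the query row.
        nearest = sorted(range(len(self.rows)),
                         key=lambda i: math.dist(row, self.rows[i]))[: self.k]
        # Majority vote among their labels.
        return Counter(self.labels[i] for i in nearest).most_common(1)[0][0]

    def predict(self, rows):
        return [self.predict_one(r) for r in rows]
```

Because the feature columns span very different scales (area versus smoothness, for instance), standardizing each column before computing distances is usually advisable; that step is left out here. An odd k avoids tied votes between two classes, and the fitted object could be saved with Python's `pickle` module.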

Classification.PNG

Novel image classification

Each novel image is run through the same analysis steps described above, and the trained model then predicts whether it is cancerous or non-cancerous. The program is currently set up to classify a large batch of images and report the overall accuracy rather than to classify single images, but simple adjustments would let it handle one image at a time.
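Checking accuracy over a batch comes down to comparing predicted labels with known labels. The Python sketch below also counts true and false positives and negatives, which show whether errors are missed cancers or false alarms; those counts are an illustrative addition and are not reported on this page. The function names are hypothetical. For scale, 73.54% accuracy on 5,000 test images corresponds to 3,677 correctly classified images.

```python
def accuracy(predicted, actual):
    """Fraction of images whose predicted label matches the known label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def confusion_counts(predicted, actual, positive=1):
    """Count true/false positives and negatives, treating `positive` as cancerous."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts
```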

More Information

Sources:

Research article that inspired this project: https://www.hindawi.com/journals/jme/2015/457906/

Helpful in improving segmentation: https://www.nature.com/articles/s41598-019-38813-2

Dataset source: https://www.kaggle.com/c/histopathologic-cancer-detection/data
