AI for Breast Cancer Detection using whole-slide images
Step-by-step approach from data selection, cleaning and augmentation to training of a deep neural network for segmenting whole-slide images and counting nuclei.
With computers becoming widely available to the general public in the twenty-first century, digital microscopes have made their way into pathology laboratories. The addition of a camera to a light microscope now allows scientists to visualize its output directly on a monitor. While this technology brings a lot of comfort and ergonomics to laboratory scientists, it still requires manually moving the sample, as the researcher's area of interest might not be directly underneath the lens of the microscope. More recently, however, a breakthrough was made in modern microscopy: whole slide imaging. Instead of observing a subset of the slide through the lens of a microscope or its retransmission on a monitor, the slide is passed in its entirety through a whole slide scanner that produces a very high-resolution image of the whole specimen, called a Whole Slide Image (WSI).
A revolution in modern optical microscopy
Whole Slide Images have the potential to revolutionize optical microscopy in the medical sector. Not only do they facilitate the current work of pathologists by reducing the time they need to spend on each slide, they also open the door to long-distance pathology. This can greatly help less developed countries, where there is fewer than one pathologist per 500,000 people. Importing scanners into laboratories in areas with few pathologists and transferring patients' slide data to practitioners from other regions and all over the world could give these countries access to better health care. Nevertheless, the increase in patient data to be analyzed also means there is an urgent need to automate as much of this novel workflow as possible. This is where Artificial Intelligence comes into play.
Machine Learning to the rescue
A subset of Machine Learning called “Supervised Learning” works by training an algorithm to make a prediction (e.g. “tell me which disease you can observe”) based on data (e.g. “in this whole slide image”), where the ground truth is used as feedback (“now compare the algorithm's prediction with an annotation from an expert pathologist, and update the prediction accordingly”) until the algorithm learns the underlying patterns that result in correct predictions. To do this training, machine learning algorithms require data, and often a lot of it.
Luckily, large databases of anonymized WSIs have been made available to the world, enabling researchers and companies to train Machine Learning models aimed at assisting diagnosis for a large variety of diseases and from numerous sample types such as histological sections, blood smears, bone marrow or seminal fluids, to name a few.
One common prediction asked of machine learning is to create a “segmentation map”: an image in which every single pixel is given a meaning, such as “foreground” versus “background” or “diseased tissue” versus “healthy tissue”. Automatically generating these segmentation maps would empower experts by immediately highlighting and quantifying features and patterns of interest, such as biomarkers indicative of certain diseases. Most importantly, segmentation maps can dramatically reduce the workload and increase the quality and reproducibility of the work done by pathologists and researchers.
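Concretely, a segmentation map is nothing more than a per-pixel class assignment. The tiny sketch below (with made-up class labels, not the actual TCGA classes) shows how quantifying a feature of interest becomes a simple counting operation once such a map exists:

```python
import numpy as np

# A segmentation map assigns a class to every pixel.
# Labels here are illustrative: 0 = background, 1 = healthy, 2 = diseased.
segmentation_map = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
    [0, 0, 0, 0],
])

# Quantification reduces to counting pixels of a given class.
diseased_fraction = float(np.mean(segmentation_map == 2))  # 4 of 16 pixels
```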
A challenging setup
To reach sufficient results, image segmentation algorithms need to mimic the capacity of a pathologist to precisely identify the position and shape of objects in biological samples. It is not surprising that this task is among the most complex in computer vision. To the untrained eye, some microscopic objects can be extremely difficult to identify: these tasks require highly skilled practitioners with up to five years of full-time training on top of the time it takes to become a doctor. Furthermore, labeling entire sets of microscopic images, each of extremely high resolution and holding countless objects of interest, is a tedious task that also turns out to be extremely expensive given the level of expertise required. Identifying scalable Machine Learning solutions and scoping down the data collection and labeling process while ensuring feasibility and quality of the predictions is the main challenge faced today, and one on which we can help.
The rest of this article will take you through the whole journey of implementing a cancer diagnosis tool capable of segmenting tumors, extendable to tumor grading through nuclei density evaluation.
Improving Diagnosis of Breast Cancers
One particular application of machine-learning-generated segmentation maps in the world of Histology is the detection of breast cancer. According to the World Health Organisation, breast cancer is the most widespread cancer worldwide, with more than 2.3 million cases and 685,000 deaths in 2020. The disease arises in the lining cells of the glandular tissue of the breast. In its early stages, the tumor is symptomless and has limited power to spread, which makes it hard to detect without regular examinations.
Our motivations behind the case
The high prevalence of the disease and its devastating consequences are of course the main driver for us to try to improve the current state of its detection, grading, and treatment. This endeavor animates much of the scientific community, which has created precious public databases of breast cancer WSIs that we leveraged in the context of this study.
The second reason why we chose this use case lies in the complexity of the task. When exams such as mammograms or ultrasounds detect abnormal changes in the breast, tissue samples are taken from the suspicious area. These samples, also called biopsies, are then sent to histology labs to be prepared on glass slides from which histopathologists make their diagnosis. The objects that need to be segmented have very diverse shapes, colors, sizes and contexts. We perceived this complexity as an opportunity to design complex yet highly transferable Machine Learning pipelines, as this use case comes with most of the challenges that similar segmentation tasks will present.
Working with Whole-slide Images
For the purpose of our research we used a set of 151 WSIs from The Cancer Genome Atlas (TCGA). Each of them had medium-sized regions selected and labeled by experts. In total, 20 types of objects were segmented, and unidentified objects were grouped together in an “other” class.
A first step in training a segmentation algorithm using the TCGA data was to deal with some of the challenges related to this particular dataset. The predominant problems were that:
- A third of the images contained a large number of unlabeled pixels
- Four of the twenty-one classes together represented close to 90% of all pixels
- The WSIs had very different shapes
- The WSIs were still too large to fit in the memory of commercially available Graphics Processing Units (GPUs), the main type of computer hardware used to train machine learning algorithms
- The number of annotated whole slide images was limited (only 151 WSIs)
Removing unlabeled pixels
The first problem we treated was the large number of unlabeled pixels. Some of the labels provided by experts were not aligned with the original WSI, resulting in large patches of pixels for which we did not know the ground truth. Because supervised learning algorithms only work when the ground truth is known, and because large amounts of unlabeled data take up unnecessary computing power, we needed to remove the unlabeled pixels. To do so, we first rotated the whole slide image (so that the labeled area is axis-aligned) and then cropped the parts of the image that were unlabeled, resulting in a perfectly rectangular image where every pixel was labeled. When the labels provided by experts cover a perfectly rectangular area, figuring out the exact rotation and crop is easy: look at the most extreme points of the area, deduce the angle from those points, and crop away the unlabeled region. However, labels often don't cover a full rectangle, which calls for more advanced methods to figure out how to rotate the map so as to keep the fewest possible unlabeled pixels while limiting how much cropping happens.
One solution is to measure the number of unlabeled pixels for each candidate rotation, and to search for the rotation and crop that simultaneously minimize the number of unlabeled pixels remaining and the number of labeled pixels removed. To achieve this, we used an evolutionary algorithm. In such algorithms, a population of individuals (here, rotation angles and crops) is evaluated according to its fitness with respect to the objective (here, a minimal number of remaining unlabeled pixels and a minimal number of removed labeled pixels). At each iteration, the fitness function is used to select a subset of the population to breed better individuals for the next generation.
We used the state-of-the-art Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm to achieve our purpose. Within a few iterations, it was able to converge and return the best rotation angle and crop, therefore purging more than 99.9% of unlabeled pixels and maintaining 99.8% of labeled pixels.
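To make the idea concrete, here is a toy (1+λ) evolution strategy, a much simpler relative of CMA-ES, evolving a single rotation angle. In the real pipeline the fitness function counts the unlabeled pixels kept and the labeled pixels lost after rotating and cropping; the `fitness` below is a hypothetical stand-in with a known optimum at 30 degrees:

```python
import random

def fitness(angle_deg):
    # Hypothetical surrogate objective: lower is better, optimum at 30°.
    # In practice this would rotate and crop the label mask and count pixels.
    return (angle_deg - 30.0) ** 2

def evolve(fitness, generations=200, offspring=10, sigma=5.0, seed=0):
    rng = random.Random(seed)
    parent = 0.0  # initial angle guess, in degrees
    for _ in range(generations):
        # Breed offspring by Gaussian mutation; elitism keeps the parent.
        candidates = [parent] + [parent + rng.gauss(0, sigma)
                                 for _ in range(offspring)]
        parent = min(candidates, key=fitness)  # survival of the fittest
    return parent

best_angle = evolve(fitness)
```

CMA-ES improves on this sketch by also adapting the mutation distribution (its covariance matrix) as the search progresses, which is why it converges in few iterations.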
From a twenty-one-class to a three-class problem
The second problem we dealt with was that four of the twenty-one classes together represented close to 90% of all pixels, a problem known as “class imbalance”. Dealing with class imbalance is never an easy task, especially in such an extreme case where more than half of the classes constitute less than 1% of the data. Given this distribution, and the fact that the problem is mostly about breast cancer, the easiest way to deal with it is to keep the two majority classes (tumor and stroma) and regroup the rest into a third class (other).
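The regrouping itself is a straightforward label remapping. In the sketch below the class indices are hypothetical (the actual TCGA label encoding may differ); only tumor and stroma keep their identity, everything else collapses into “other”:

```python
import numpy as np

# Hypothetical label encoding: 1 = tumor, 2 = stroma; all remaining
# classes of the original twenty-one are merged into 0 = other.
TUMOR, STROMA, OTHER = 1, 2, 0

def regroup_classes(label_map):
    out = np.full_like(label_map, OTHER)  # default: everything is "other"
    out[label_map == 1] = TUMOR
    out[label_map == 2] = STROMA
    return out

labels = np.array([[1, 5, 2],
                   [20, 1, 7]])
grouped = regroup_classes(labels)  # -> [[1, 0, 2], [0, 1, 0]]
```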
Cutting images into patches
The third and fourth problems we dealt with were that the images came in different shapes and were generally too large to fit into the memory of a commercially available GPU. To address both, the medium-sized images were cut into small 500×500-pixel patches. The patch size was chosen so that several patches could fit in the GPU at the same time while being big enough for most of them to contain decently sized patterns. More than 15,000 patches were created out of the 143 images used for training the machine learning model. The 8 remaining images were kept whole for evaluating the performance of the algorithm.
From now on, we use the term “image” for the preprocessed medium-sized images and “patch” for the smaller images cut out of them.
To deal with our final problem, namely that we had limited data to train on, we used a technique called “data augmentation”. Data augmentation involves randomly flipping, rotating, cropping, zooming in or out, and brightening or darkening the image, to name a few techniques. Because the label of the augmented image is still known, we effectively increase the number of samples the algorithm can train on. However, augmentations need to be chosen carefully, as some are better suited than others for a given task: for example, some biopsies are stained with specialized chemicals, so swapping the blue and red channels might degrade the performance of the algorithm in real-life scenarios.
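The key constraint for segmentation is that every augmentation must be applied identically to the image and its segmentation map, so the pair stays consistent. A minimal sketch using only geometry-preserving transforms (random flips and 90-degree rotations):

```python
import random
import numpy as np

def augment(image, mask, rng):
    """Apply the same random flips/rotations to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    k = rng.randrange(4)  # rotate by 0, 90, 180 or 270 degrees
    return np.rot90(image, k), np.rot90(mask, k)

rng = random.Random(0)
img = np.arange(16).reshape(4, 4)
msk = (img > 7).astype(np.uint8)   # toy mask derived from the image
aug_img, aug_msk = augment(img, msk, rng)
```

Whatever transforms the random generator picks, the augmented mask still matches the augmented image pixel for pixel, which is exactly what supervised training requires.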
Training the segmentation model
Training a segmentation model can be expensive. The machine learning algorithms involved have large computational requirements, which implies long training times and high GPU memory usage. It is therefore important to get things right quickly. We rely on state-of-the-art techniques to speed up and improve the quality of the learning. We also use transfer learning, which consists in reusing the weights of a machine learning model trained on a generic problem and fine-tuning them for our specific use case. This significantly decreases the training time, since the model is not trained from scratch.
It is also necessary to select an appropriate metric to evaluate how well the model performs. We used the dice coefficient, which measures how much overlap there is between the objects in the prediction and the ground truth.
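For binary masks the dice coefficient is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect overlap). A minimal implementation:

```python
import numpy as np

def dice(pred, truth, eps=1e-7):
    """Dice coefficient between two binary masks.

    eps avoids division by zero when both masks are empty.
    """
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

truth = np.array([[1, 1, 0],
                  [0, 1, 0]])
pred  = np.array([[1, 0, 0],
                  [0, 1, 1]])
score = dice(pred, truth)  # 2*2 / (3+3) ≈ 0.667
```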
Making predictions on whole images
The trained model is only capable of predicting segmentation maps for patch-sized images. However, the algorithm needs to work with production-sized images, which are expected to be far larger.
The most straightforward way of making predictions for a whole slide image is to cut the image into patches, ask the machine learning algorithm to make a prediction for each patch, retrieve the patch-sized predictions and stitch them back together to obtain the final result. Yet this method yields segmentation maps with many artefacts: the model lacks context near the edges of the patches, which results in unreliable predictions around those edges.
Smarter Inference Method - Context-based majority vote
In order to improve on the naive method, we added additional post processing steps. These steps involved:
- Cutting more diversified patches from the original image and creating overlap between them.
- Giving more weight to predictions made at the center of patches where the model has the most context.
- Stitching the patches back together and using a majority vote strategy to determine the pixel class between overlaps.
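The steps above can be sketched as a weighted voting scheme: each overlapping patch prediction is accumulated with a weight that peaks at the patch center (where the model has the most context), and each pixel finally takes the class with the highest accumulated weight. The triangular weight profile and the example below are illustrative choices, not the exact production implementation:

```python
import numpy as np

def center_weights(patch_size):
    # Weight peaks at the patch center and falls off toward the edges;
    # a small floor keeps edge pixels from being ignored entirely.
    ax = 1.0 - np.abs(np.linspace(-1, 1, patch_size))
    return np.outer(ax, ax) + 1e-3

def stitch(patch_probs, positions, image_shape, n_classes):
    """patch_probs: list of (p, p, n_classes) per-class probability maps;
    positions: top-left (y, x) of each patch in the full image."""
    votes = np.zeros(image_shape + (n_classes,))
    for probs, (y, x) in zip(patch_probs, positions):
        p = probs.shape[0]
        votes[y:y + p, x:x + p] += probs * center_weights(p)[..., None]
    return votes.argmax(axis=-1)  # weighted majority vote per pixel

# Hypothetical example: two overlapping 4x4 patches on a 4x6 image, one
# predicting class 0 everywhere, the other class 1 everywhere.
p0 = np.zeros((4, 4, 2)); p0[..., 0] = 1.0
p1 = np.zeros((4, 4, 2)); p1[..., 1] = 1.0
result = stitch([p0, p1], [(0, 0), (0, 2)], (4, 6), n_classes=2)
```

In the overlap region, each pixel is decided by whichever patch sees it closer to its own center, which is exactly the "most context wins" behavior described above.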
This new pipeline ensures that low-confidence predictions are overlapped by higher-confidence predictions, completely removing the artefacts and smoothing the overall prediction. The shapes are more consistent, with only a small amount of noise remaining on certain curves.
Additional smoothing using mirror/rotated images
To completely remove the noise from the edges of the objects, duplicate images are created from the original using mirroring and rotations (a technique we call “duplicate overlapping”). The duplicates are passed through the post-processing pipeline, and the reverse transformations are applied to the output segmentation maps. These copies are then merged together to obtain a high-quality prediction with very smooth shapes.
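This is a form of test-time augmentation: predict on transformed copies, undo each transformation on the outputs, and average. A minimal sketch, where `predict` stands in for the full patch-based inference pipeline described above and is assumed to map an image to a same-shaped probability map:

```python
import numpy as np

def tta_predict(image, predict):
    """Average predictions over mirrored/rotated copies of the image."""
    transforms = [
        (lambda x: x,           lambda x: x),              # identity
        (np.fliplr,             np.fliplr),                # horizontal mirror
        (np.flipud,             np.flipud),                # vertical mirror
        (lambda x: np.rot90(x), lambda x: np.rot90(x, -1)) # 90° rotation
    ]
    # Apply each transform, predict, then undo the transform on the output.
    probs = [undo(predict(apply(image))) for apply, undo in transforms]
    return np.mean(probs, axis=0)

# With a perfectly symmetry-equivariant "model" the merged map equals the
# plain prediction; with a real model, averaging smooths out edge noise.
img = np.arange(16, dtype=float).reshape(4, 4)
merged = tta_predict(img, lambda x: x)
```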
To quantify the effects of these three methods, the dice coefficient between the ground truth and the prediction was computed for each image from the test set for each method. The results are listed in the table below.
Using weighted overlapping has a noticeable effect on the dice score, while adding duplicate overlapping mainly has a corrective effect by removing predictive noise. It is interesting to note that the model performs very poorly for one of the test images compared to the others. The insights of an expert could be very valuable to understand this discrepancy.
Transferring the pipeline to nuclei segmentation
By tackling the challenging breast cancer use case, we were able to design an elegant solution for generating segmentation maps from microscopic data. We then decided to apply transfer learning, i.e. reusing a trained neural network on a brand-new task, to the nuclei segmentation case.
Why nuclei segmentation
Nuclei study is an important step in any histopathological image analysis. It involves several tasks (nuclei counting, cell type identification, nuclei shape analysis, etc.) which can provide crucial information to pathologists for their diagnosis, especially for cancer-related diseases. Because of how time-consuming they are, these tasks are often performed on small subsets of the image. Conventional algorithms are still used today to automate this process, but they usually fail to generalize to images with unseen features.
Making a fast, accurate and adaptable algorithm to automatically segment the nuclei from a region of a WSI would be extremely valuable to histopathologists. Most of the metrics that contribute to their diagnosis can easily be computed from the generated segmentation maps.
Impressive results despite little data
The model was re-trained using only 30 images, totaling 30,000,000 labeled pixels, of cancerous tissues from 7 different organs: breast, kidney, liver, prostate, bladder, colon and stomach. The same post-processing pipeline described above was used to produce and evaluate the maps for 14 test images.
The bottom-right image visualizes the comparison between the expected and obtained results. On the one hand, the AI seems to predict more nuclei than the trained medical experts. On the other hand, it does not miss a single nucleus among those the experts annotated. This is an interesting case, as the AI did not contradict the decisions of the medical experts, but proposed additional nuclei to be taken into consideration or subjected to additional examination.
The dice coefficient, which in this case measures the amount of overlap between the nuclei (and not the background), is listed for each test image in the table below.
The results are very promising given that the model was trained on very few images. Furthermore, the AI was able to get very good results on brain tissue, an organ that was not used during training.
Let’s get in touch!
Our technology was capable of yielding convincing results on complex segmentation tasks involving few high resolution optical microscopy images. Thanks to transfer learning, our algorithms are scalable to many other disease applications and could bring valuable assistance to histopathologists by removing most of the repetitiveness of their work. If you are interested in collaboration or have a specific diagnostics case in mind, leave us a message!