PathMNIST Histology Classification

A comparative study of Random Forest, multilayer perceptron (MLP), and convolutional neural network (CNN) models on nine classes of colorectal tissue patches. The results show how much spatial structure matters for low-resolution histology images.


Local PathMNIST samples exported from the included training split

Best model: CNN

Spatial filters captured the local texture signal.

Test accuracy: 86.25%

Measured on the held-out test set of 8,000 images.

Macro F1: 0.8607

Class-balanced score: the unweighted mean of the per-class F1 scores across all nine categories.
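As an illustration of how the two headline metrics differ, the sketch below computes accuracy and macro F1 from a confusion matrix. It is a minimal example, not the project's evaluation code: the function names and the 3-class toy matrix are hypothetical, and rows are assumed to hold true classes while columns hold predictions.

```python
import numpy as np

def accuracy(confusion: np.ndarray) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    return float(np.trace(confusion) / confusion.sum())

def macro_f1(confusion: np.ndarray) -> float:
    """Unweighted mean of per-class F1 (assumes rows = true, cols = predicted)."""
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp   # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp   # belonged to the class, but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); classes with no samples or predictions score 0
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())

# Hypothetical 3-class toy matrix, for illustration only (not project data)
toy = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])

print(f"accuracy = {accuracy(toy):.4f}")
print(f"macro F1 = {macro_f1(toy):.4f}")
```

Because every class contributes equally to the mean, a weak class lowers macro F1 even when it holds few samples, which is why it is reported alongside accuracy.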

Model Comparison

Final models were trained with the selected hyperparameters and evaluated on the same test split.

Accuracy and Macro F1

The CNN clearly separates itself from the Random Forest and MLP on both metrics.

Training Cost

The CNN's higher accuracy came at the cost of a longer CPU training run.

Interactive Results

Switch between confusion matrices, tuning searches, and training histories without losing the comparison context.

Confusion matrices for the convolutional neural network

Class Explorer

Review sample patches, class balance, and per-class F1 scores for each tissue category.

Method Notes

The modelling choices are kept visible so the results page remains audit-friendly for a technical reader.

Dataset

The assignment subset contains 32,000 training images and 8,000 test images. Every patch is a 28 by 28 RGB crop from PathMNIST.

Training and test class distribution chart

Preprocessing

The Random Forest used flattened pixels reduced with PCA, the MLP used flattened, standardized vectors, and the CNN kept the original image tensor shape.
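The three input representations can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the notebook's pipeline: the arrays are random stand-ins for PathMNIST patches, the channels-last layout, the scaling to [0, 1], the helper names, and the choice of 32 PCA components are all hypothetical, and PCA is written with a plain NumPy SVD rather than whichever library the project used.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for 28x28 RGB patches (assumed channels-last layout)
train = rng.integers(0, 256, size=(200, 28, 28, 3), dtype=np.uint8)
test = rng.integers(0, 256, size=(50, 28, 28, 3), dtype=np.uint8)

def flatten(images):
    """Flatten each patch to a 28*28*3 = 2352-dimensional vector in [0, 1]."""
    return images.reshape(len(images), -1).astype(np.float32) / 255.0

def standardize(train_x, test_x):
    """Zero-mean, unit-variance features using training statistics only."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0) + 1e-8  # guard against constant pixels
    return (train_x - mean) / std, (test_x - mean) / std

def pca_project(train_x, test_x, n_components):
    """Project onto the top principal components fitted on the training set."""
    mean = train_x.mean(axis=0)
    _, _, vt = np.linalg.svd(train_x - mean, full_matrices=False)
    components = vt[:n_components]
    return (train_x - mean) @ components.T, (test_x - mean) @ components.T

flat_train, flat_test = flatten(train), flatten(test)

# Random Forest input: flattened pixels reduced with PCA (32 is a placeholder)
rf_train, rf_test = pca_project(flat_train, flat_test, n_components=32)
# MLP input: flattened, standardized vectors
mlp_train, mlp_test = standardize(flat_train, flat_test)
# CNN input: the image tensor shape is kept, so spatial neighbours stay adjacent
cnn_train = train.astype(np.float32) / 255.0

print(rf_train.shape, mlp_train.shape, cnn_train.shape)
```

In both flat variants the statistics are fitted on the training split and then applied to the test split, which keeps the test set held out. Only the CNN input preserves which pixels are neighbours.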

Interpretation

Flat-pixel models struggled with glandular and stromal texture. The CNN improved the clinically important normal mucosa and adenocarcinoma classes, but this is coursework, not a diagnostic system.
