Enterprise AI Analysis
Automatic Diagnosis of Colorectal Cancer Based on Histopathological Images Using Artificial Intelligence Models
This research develops and evaluates a comprehensive Artificial Intelligence-based framework for multi-class classification of colorectal cancer histopathological images. To date, it is the first systematic comparison of three analytical approaches on the EBHI dataset: traditional machine learning with handcrafted features, a hybrid method combining automatic feature extraction with conventional classifiers, and end-to-end deep learning. The study also examines the influence of magnification level (40x, 100x, 200x, and 400x) to identify the optimal setting for capturing diagnostically relevant features, thereby assisting pathologists in selecting the most effective magnification during microscopic examination. In contrast to previous studies that primarily addressed binary classification or limited configurations, this work tackles a five-class classification problem on a large, consistent dataset, offering a more representative analysis of colorectal tissue heterogeneity. The research answers the following questions: (1) How do machine learning, hybrid, and deep learning approaches compare in multi-class colorectal cancer classification? (2) What is the effect of magnification level on classification performance? (3) Which combination of analytical method and magnification level provides the most reliable and clinically valuable results?
Executive Impact
Key performance indicators from the study highlight the potential for significant advancements in medical diagnostics.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Machine Learning with Manual Features
This section presents the experimental results of the methodology for three scenarios in detail: (1) machine learning with manual feature extraction, (2) machine learning with automatic feature extraction, and (3) deep learning. Minority classes were first oversampled via image augmentation (rotation range of 0.5, shear range of 0.1, zoom range of 0.1, and horizontal flips), and preprocessing included normalization, standardization, and color conversion as required. Color and texture features were then extracted. Specifically, the grey-level co-occurrence matrix (GLCM) was computed at a distance of 1 for angles of 0°, 45°, 90°, and 135°, and the grey-level difference method (GLDM) at a distance of 10 pixels in four directions: horizontal, vertical, diagonal (top-right), and diagonal (bottom-left). Frequency-domain features from wavelet and Fourier transforms were also extracted, yielding a total of 252 features. Stratified five-fold cross-validation was then applied, and classification was performed using ten established machine learning classifiers.
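The GLCM step above can be sketched in plain NumPy. This is a minimal illustration, not the study's implementation: it computes a co-occurrence matrix for each of the four angle offsets at distance 1 and derives three standard texture statistics (contrast, energy, homogeneity) from each; the paper's full pipeline extracts many more descriptors.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Normalized grey-level co-occurrence matrix for one (dx, dy) offset."""
    m = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y, x], img[y2, x2]] += 1
    return m / m.sum()

def glcm_features(img, levels=8):
    # Distance 1 at angles 0°, 45°, 90°, 135°, expressed as (dx, dy) offsets.
    offsets = [(1, 0), (1, -1), (0, -1), (-1, -1)]
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    feats = []
    for dx, dy in offsets:
        p = glcm(img, dx, dy, levels)
        feats += [
            (p * (i - j) ** 2).sum(),           # contrast
            (p ** 2).sum(),                     # energy
            (p / (1.0 + (i - j) ** 2)).sum(),   # homogeneity
        ]
    return np.array(feats)

demo = np.arange(64).reshape(8, 8) % 8   # toy 8-level "image"
features = glcm_features(demo)
print(features.shape)  # (12,): 4 offsets x 3 statistics
```

In practice a library routine (e.g. scikit-image's `graycomatrix`) would replace the explicit loops; the sketch keeps the arithmetic visible.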
Machine Learning with Automatic Features
To improve the performance of the machine learning algorithms, automatic features, also referred to as deep features, were extracted using convolutional neural networks (CNNs). The data were prepared and preprocessed before being fed into several CNN architectures: ResNet-18, DenseNet-121, VGG-16, AlexNet, and EfficientNet-B1. The number of extracted features differed across models. Stratified k-fold cross-validation was applied to the training data, and classifier performance was evaluated on the test data.
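The hybrid idea, dropping a CNN's classification head and using the pooled activations as feature vectors for a conventional classifier, can be sketched as below. The miniature backbone is a stand-in of my own; in practice a pretrained network such as ResNet-18 would be used, as in the study.

```python
import torch
import torch.nn as nn

# Miniature stand-in for a pretrained backbone (ResNet-18 etc. in practice).
# The classification head is simply omitted, so a forward pass yields a
# pooled deep-feature vector per image instead of class scores.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # global average pooling
    nn.Flatten(),              # -> (batch, 32) feature vectors
)
backbone.eval()

with torch.no_grad():
    images = torch.rand(4, 3, 224, 224)   # four dummy RGB tiles
    deep_features = backbone(images)      # input for SVM, XGBoost, etc.
print(deep_features.shape)  # torch.Size([4, 32])
```

The resulting matrix of deep features is then handed to the same classical classifiers used in the manual-feature scenario.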
Deep Learning Models
Deep learning forms the third scenario of our study, alongside machine learning. The images were first resized and divided into training (80%) and testing (20%) sets. During training, 5-fold stratified cross-validation was applied to the training set to tune hyperparameters and optimize model performance, so the model learns effectively while a separate test set is preserved for unbiased evaluation. Using the Adam optimizer, we trained several CNN architectures: ResNet-50, DenseNet-121, VGG-16, InceptionV3, and EfficientNet-B1. Batch size, learning rate, optimizer settings, number of hidden layers, activation function, dropout rate, and number of epochs were adjusted repeatedly, treating each magnification level separately; over numerous trials, accuracy on the 100x magnification set improved from approximately 91% to around 97%. Table 6 shows the parameter values that achieved the best performance, and Table 7 presents the performance of the CNN models across all magnifications on the test data.
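The evaluation protocol described above (an 80/20 stratified split followed by 5-fold stratified cross-validation on the training portion) can be sketched with scikit-learn. The arrays here are random stand-ins: `X` would hold the image data and `y` the five class labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical stand-ins for the real data.
rng = np.random.default_rng(0)
X = rng.random((200, 16))
y = rng.integers(0, 5, size=200)   # five tissue classes

# 80/20 split, stratified so the test set mirrors the class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5-fold stratified CV on the training portion for hyperparameter tuning.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for train_idx, val_idx in skf.split(X_train, y_train):
    # A model would be trained on X_train[train_idx] and validated on
    # X_train[val_idx] here; every fold preserves the class ratios.
    fold_sizes.append(len(val_idx))
print(len(X_train), len(X_test), fold_sizes)
```

The held-out `X_test` is touched only once, after tuning, which is what keeps the final accuracy figures unbiased.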
Enterprise Process Flow
| Approach | Key Characteristics | Performance Highlights |
|---|---|---|
| Manual ML | Handcrafted color, texture (GLCM, GLDM), and frequency-domain features fed to ten classical classifiers | Achieved 79% accuracy with XGBoost at 200x magnification, but lower overall compared to AI methods. |
| Hybrid ML (Automatic Features) | Deep features extracted by CNNs (ResNet-18, DenseNet-121, VGG-16, AlexNet, EfficientNet-B1) combined with conventional classifiers | Achieved 89% accuracy with SVM at 100x/200x magnification, balancing detail and context. |
| Deep Learning (End-to-End) | CNNs trained directly on the images, learning features and classification jointly | Outperformed all other methods, achieving 97% accuracy with ResNet-50 at 100x magnification, highlighting superior feature learning. |
Impact of Magnification Levels on Diagnosis
The study revealed that magnification level significantly influences classification performance. 100x magnification emerged as optimal for deep learning models, providing the best balance between overall tissue structure and cellular detail and yielding the highest accuracy (97%). Lower magnification (40x) lacked sufficient detail, while higher magnification (400x) introduced excessive granularity that hindered generalization. This finding helps pathologists select the most effective view for accurate diagnosis, enhancing both the efficiency and the reliability of AI-assisted systems.
Calculate Your Potential AI ROI
Estimate the financial and operational benefits of integrating advanced AI solutions into your enterprise workflow.
Your AI Implementation Roadmap
A typical phased approach to integrating AI into your enterprise, ensuring a smooth transition and maximum benefit.
Phase 01: Discovery & Strategy
Comprehensive assessment of your current infrastructure, data landscape, and business objectives. We collaborate to define clear AI use cases and a tailored strategy.
Phase 02: Data Engineering & Model Development
Cleaning, preparing, and structuring your data for AI. Development or fine-tuning of models, rigorously tested against benchmarks and domain expertise.
Phase 03: Integration & Deployment
Seamless integration of AI solutions into your existing systems. Deployment with robust monitoring and security protocols, ensuring operational readiness.
Phase 04: Performance Monitoring & Optimization
Continuous monitoring of AI model performance in live environments. Iterative refinement and optimization to adapt to new data and evolving business needs.
Ready to Transform Your Enterprise with AI?
Our experts are ready to help you navigate the complexities of AI implementation and unlock significant value. Schedule a free consultation today.