Enterprise AI Analysis
DTF-STCANet: A Dual Time-Frequency Swin Transformer and ConvNeXt Attention Network for Heart Sound Classification
Cardiovascular diseases (CVDs) are the leading global cause of mortality, necessitating early and accurate diagnosis. Traditional stethoscope auscultation is subjective and prone to human error. This study introduces DTF-STCANet, a novel AI-driven framework for heart sound classification that addresses the limitations of single-domain representations and monolithic feature extraction. DTF-STCANet integrates spectrogram (STFT) and continuous wavelet transform (CWT) inputs, processed by parallel Swin Transformer and ConvNeXt attention networks, respectively. This dual time-frequency fusion captures both stationary and transient phonocardiogram (PCG) signal characteristics, and a channel attention mechanism refines the fused features, emphasizing discriminative regions. The model achieves a remarkable 99.29% accuracy on the PhysioNet/CinC 2016 dataset, significantly outperforming state-of-the-art methods. A Weighted KNN classifier applied to the learned embedding manifold further enhances class separation and reduces misclassification rates, particularly false negatives, which is crucial for clinical screening.
Executive Impact at a Glance
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
- **Dual Time-Frequency Fusion**: The DTF-STCANet model leverages both Spectrograms (STFT) and Continuous Wavelet Transforms (CWT) of Phonocardiogram (PCG) signals. This dual-input approach is crucial for capturing both stationary (S1, S2 sounds) and transient (murmurs) cardiac events, overcoming the limitations of single-domain representations.
- **Heterogeneous Backbone**: The architecture integrates a Swin Transformer (for global contextual dependencies and long-range relationships in Spectrograms) and a ConvNeXt (for local texture patterns and fine-grained acoustic features in CWT images). This parallel processing enhances representational diversity and robustness.
- **Attention Mechanism**: A channel-wise attention mechanism is applied post-fusion to refine the concatenated features. This mechanism selectively prioritizes discriminative features and suppresses noise, allowing adaptive feature recalibration crucial for sparse pathological acoustic signatures.
- **Weighted KNN Classifier**: Instead of a traditional Softmax layer, a Weighted K-Nearest Neighbor (KNN) algorithm is used for final classification. This geometry-aware decision mechanism operates directly on the learned embedding manifold, creating stronger decision boundaries and improving classification consistency, especially for overlapping or imbalanced feature distributions.
- **High Classification Accuracy**: The DTF-STCANet + KNN model achieved an outstanding 99.29% overall accuracy on the PhysioNet/CinC 2016 Challenge dataset, which represents a significant improvement over existing state-of-the-art methods.
- **Robustness Across Folds**: Performance was validated using a stratified 10-fold cross-validation, demonstrating stable and consistent results (99.29% ± 0.35 accuracy, 95% CI: [98.67–99.51]), confirming the reliability and generalizability of the model.
- **Reduced Misclassification Rates**: The total number of incorrectly classified samples drastically dropped to 35, an 84.2% error reduction compared to the backbone-only model. Crucially, it achieved a very low false-negative rate (22 instances), vital for pathological screening.
- **Superior Sensitivity & Specificity**: The model achieved balanced sensitivity (0.9901) and specificity (0.9894), and an F1-score of 0.9901, indicating strong discriminative capacity for both healthy and unhealthy heart sounds.
- **Early Diagnosis of CVDs**: The high accuracy and low false-negative rate make DTF-STCANet a powerful tool for early and critical diagnosis of cardiovascular diseases, potentially leading to improved patient outcomes and reduced long-term complications.
- **Automated Decision Support**: By integrating AI, the framework overcomes limitations of traditional auscultation, such as reliance on physician expertise, auditory sensitivity, and susceptibility to environmental noise, offering a reliable automated decision support system.
- **Interpretability and Stability**: The geometry-aware Weighted KNN classifier provides stable and interpretable classification boundaries, enhancing trust in the AI system's diagnostic recommendations in a clinical setting.
- **Addressing Data Heterogeneity**: The dual time-frequency fusion and heterogeneous backbone effectively capture complex, non-stationary PCG signal characteristics from varied clinical and non-clinical conditions, making it adaptable to real-world data.
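As a concrete illustration of the dual time-frequency inputs described in the first module, the sketch below builds an STFT spectrogram and a Morlet scalogram from a toy PCG-like signal using plain NumPy. The window length, hop, scales, and wavelet are illustrative assumptions, not the paper's published settings; only the 2000 Hz sampling rate matches the PhysioNet/CinC 2016 recordings.

```python
import numpy as np

def stft_spectrogram(x, fs, win_len=256, hop=64):
    """Magnitude spectrogram via framed FFT with a Hann window."""
    win = np.hanning(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (freq_bins, time_frames)

def morlet_scalogram(x, scales, w0=6.0):
    """CWT magnitude using a real Morlet wavelet at each scale."""
    rows = []
    for s in scales:
        t = np.arange(-4 * s, 4 * s + 1)
        wav = np.exp(-0.5 * (t / s) ** 2) * np.cos(w0 * t / s)
        rows.append(np.abs(np.convolve(x, wav / np.sqrt(s), mode="same")))
    return np.array(rows)                           # (n_scales, time)

fs = 2000                                           # PhysioNet PCG sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
pcg = np.sin(2 * np.pi * 30 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)  # toy tone mix

spec = stft_spectrogram(pcg, fs)                    # input for the Swin Transformer branch
scal = morlet_scalogram(pcg, scales=np.arange(2, 34))  # input for the ConvNeXt branch
print(spec.shape, scal.shape)
```

The two representations are complementary: the fixed-window spectrogram resolves the quasi-stationary S1/S2 bands, while the multi-scale scalogram localizes short transients such as murmurs.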
Source: Model 4, including Weighted KNN
DTF-STCANet Core Process
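One step of this core process, the post-fusion channel-wise attention described above, can be sketched in squeeze-and-excitation style. The layer sizes, random weights, and reduction ratio below are illustrative assumptions, not the model's actual parameters.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Channel-wise recalibration of a fused feature map (NumPy sketch).

    feat: (C, H, W) concatenated feature map; w1, w2: the two FC weights.
    """
    z = feat.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    h = np.maximum(0.0, w1 @ z)                # excitation: FC + ReLU bottleneck
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))        # FC + sigmoid -> per-channel weights
    return feat * s[:, None, None]             # reweight each channel in place

rng = np.random.default_rng(0)
C, r = 8, 2                                    # channels and reduction ratio (assumed)
fused = rng.normal(size=(C, 4, 4))             # stand-in for Swin+ConvNeXt concat
w1 = rng.normal(size=(C // r, C))              # reduction FC weights (random here)
w2 = rng.normal(size=(C, C // r))              # expansion FC weights (random here)
out = channel_attention(fused, w1, w2)
print(out.shape)                               # same shape, channels rescaled
```

Because each sigmoid weight lies in (0, 1), uninformative channels are attenuated while discriminative ones pass through largely unchanged, which is the adaptive recalibration the module description refers to.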
| Method | Accuracy (%) | Specificity (%) | Sensitivity (%) | F1-Score (%) |
|---|---|---|---|---|
| Ours (DTF-STCANet + KNN) | 99.29 | 98.94 | 99.01 | 99.01 |
| RAMM + NRBMI + SVM [14] | 98.80 | 98.30 | 98.90 | 99.20 |
| Params [13] | 96.40 | 99.10 | 86.50 | 89.30 |
| HS-based Vectors [31] | 95.62 | 97.72 | 87.61 | 89.20 |
| Res2Net-CNN [12] | 91.03 | 95.01 | 74.51 | 77.52 |
Clinical Efficacy: Reducing False Negatives in Heart Sound Screening
In early cardiovascular disease screening, minimizing false negatives is paramount: a missed pathology means delayed diagnosis and potentially severe patient outcomes. The DTF-STCANet model, especially with its Weighted KNN classifier, performed exceptionally on this measure. During cross-validation on the PhysioNet/CinC 2016 dataset (3541 samples), it produced only 22 false negatives out of 816 unhealthy recordings, corresponding to a sensitivity of 97.30% for the unhealthy class, fewer than the alternative classifiers tested (SVM: 25 false negatives; EBT: 112). This low false-negative rate, combined with high overall accuracy, establishes DTF-STCANet as a clinically reliable tool for identifying abnormal heart sounds and a robust decision support system that helps healthcare professionals make timely, accurate referrals.
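The distance-weighted KNN vote on the embedding manifold can be sketched minimally in NumPy. The toy 2-D "embeddings", the 1/distance weighting, and k=5 are illustrative choices, not the model's actual feature space or hyperparameters.

```python
import numpy as np

def weighted_knn_predict(train_X, train_y, query, k=5, eps=1e-9):
    """Distance-weighted KNN vote in an embedding space (NumPy sketch)."""
    d = np.linalg.norm(train_X - query, axis=1)    # Euclidean distance to all points
    idx = np.argsort(d)[:k]                        # k nearest neighbours
    w = 1.0 / (d[idx] + eps)                       # closer neighbours vote harder
    classes = np.unique(train_y)
    scores = np.array([w[train_y[idx] == c].sum() for c in classes])
    return classes[np.argmax(scores)]

rng = np.random.default_rng(0)
# Toy embeddings: class 0 (healthy) near the origin, class 1 (unhealthy) offset
healthy = rng.normal(0.0, 0.5, size=(50, 2))
unhealthy = rng.normal(3.0, 0.5, size=(50, 2))
X = np.vstack([healthy, unhealthy])
y = np.array([0] * 50 + [1] * 50)

print(weighted_knn_predict(X, y, np.array([2.8, 3.1])))  # falls in the unhealthy cluster
```

Because the decision depends only on local geometry around the query, boundary behaviour stays stable for overlapping or imbalanced clusters, which is the property the text credits for the reduced false-negative count.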
Calculate Your Potential ROI
Estimate the efficiency gains and cost savings your enterprise could achieve by implementing DTF-STCANet.
Your DTF-STCANet Implementation Roadmap
A phased approach to integrate advanced AI into your diagnostic workflow, ensuring seamless adoption and measurable impact.
Phase 1: Data Integration & Preprocessing
Securely integrate existing PCG datasets and establish a robust pipeline for Spectrogram and CWT generation. Define normalization and augmentation strategies tailored to your clinical data.
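A Phase 1 normalization-and-segmentation step might look like the following sketch. The z-score normalization and 5-second segment length are illustrative assumptions, not a prescribed configuration; only the 2000 Hz rate matches the PhysioNet/CinC 2016 recordings.

```python
import numpy as np

def preprocess_pcg(x, fs, seg_seconds=5.0):
    """Normalise a PCG recording and split it into fixed-length segments.

    Sketch only: the segment length and z-score normalisation are
    illustrative choices for a Phase 1 pipeline, not a published recipe.
    """
    x = (x - x.mean()) / (x.std() + 1e-9)      # z-score amplitude normalisation
    seg_len = int(seg_seconds * fs)
    n = len(x) // seg_len                      # drop the trailing partial segment
    return x[: n * seg_len].reshape(n, seg_len)

fs = 2000                                      # PhysioNet/CinC 2016 rate (Hz)
rec = np.random.default_rng(1).normal(size=fs * 12)  # 12 s toy recording
segs = preprocess_pcg(rec, fs)
print(segs.shape)                              # two 5 s segments of 10000 samples
```

Each fixed-length segment would then feed the spectrogram and CWT generators, so segment length should be chosen jointly with the time-frequency window settings.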
Phase 2: Model Adaptation & Fine-tuning
Adapt the pre-trained Swin Transformer and ConvNeXt backbones to your specific data characteristics. Fine-tune the DTF-STCANet architecture to optimize performance on your internal validation sets.
Phase 3: Integration with Clinical Workflow
Deploy the DTF-STCANet model into your diagnostic systems, ensuring seamless integration with existing EHRs and PACS. Develop user interfaces for clinicians to interpret AI-driven insights.
Phase 4: Continuous Monitoring & Improvement
Implement real-time performance monitoring and feedback loops for continuous model improvement. Establish a framework for regular updates and retraining with new clinical data to maintain diagnostic accuracy.
Unlock Advanced Diagnostic Capabilities
Ready to transform your healthcare diagnostics? Schedule a personalized strategy session to explore how DTF-STCANet can integrate with your existing systems and improve patient outcomes.