
Enterprise AI Analysis: Software Engineering Education

Can Code Evaluation Metrics Detect Code Plagiarism?

A comparative empirical study on source code plagiarism detection

Fahad Ebrahim, Mike Joy • University of Warwick • April 28, 2026

Executive Impact Summary

Our research highlights critical performance benchmarks for plagiarism detection using Code Evaluation Metrics (CEMs) in software engineering. Key metrics demonstrate the potential of CEMs, especially with preprocessing, to rival or exceed traditional tools.

0.882 Pooled AUROC (FusionTop3, Preprocessed)
0.865 Pooled AP (CrystalBLEU, Preprocessed)
0.864 Pooled AUROC (Dolos, Raw)
0.845 Pooled AP (FusionTop3, Raw)

Deep Analysis & Enterprise Applications

The analysis below is organised into three parts drawn from the research: methodology, key findings, and implications for software engineering education.

Our Approach to Evaluating Code Evaluation Metrics

We performed a comparative empirical study using two open-source labelled datasets, ConPlag and IRPlag, to evaluate five CEMs: CodeBLEU, CrystalBLEU, RUBY, TSED, and CodeBERTScore. Performance was assessed with threshold-free, ranking-based measures (AUROC and AUPRC) overall, per dataset, and per plagiarism level. Results were compared against state-of-the-art source code plagiarism detection tools (SCPDTs), JPlag and Dolos, and the impact of preprocessing was also examined.
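
To make the evaluation protocol concrete, the sketch below computes both ranking measures from per-pair similarity scores using scikit-learn. The labels, scores, and library choice are illustrative assumptions, not the study's exact pipeline.

from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical example data: 1 = plagiarised pair, 0 = non-plagiarised pair.
labels = [1, 1, 0, 1, 0, 0, 0, 1]
# Similarity scores produced by a CEM (e.g. CrystalBLEU) for the same pairs.
scores = [0.91, 0.74, 0.33, 0.68, 0.41, 0.22, 0.57, 0.88]

# AUROC: probability that a plagiarised pair outranks a non-plagiarised one.
auroc = roc_auc_score(labels, scores)
# AP (AUPRC): area under the precision-recall curve.
ap = average_precision_score(labels, scores)
print(f"AUROC={auroc:.3f}  AP={ap:.3f}")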

Core Discoveries in Plagiarism Detection

Our analysis reveals that CEMs can achieve comparable ranking performance to dedicated SCPDTs, especially with preprocessing. CrystalBLEU and FusionTop3 demonstrated strong performance, often surpassing Dolos and JPlag on pooled preprocessed datasets. However, all methods struggled with complex plagiarism levels (L4-L6), indicating a need for metrics that capture deeper semantic changes.

Strategic Implications for Software Engineering Education

CEMs are valuable for screening and filtering potentially plagiarised pairs in educational settings, but final judgments still require instructor review due to false positives and limitations in complex cases. A recommended workflow involves preprocessing, using strong ranking methods like CrystalBLEU to flag suspicious pairs, and then confirming with dedicated plagiarism tools and manual inspection. Future efforts should focus on combining complementary metrics and addressing higher-level semantic plagiarism.

Enterprise Process Flow

Code Pair Extraction
Similarity Estimation (CEMs & Tools)
Ranking Performance Evaluation
Comparative Analysis
Preprocessing Impact Study
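
As a rough illustration of the preprocessing step above, the Python sketch below normalises a Java submission before similarity scoring. The specific transformations (naive comment stripping and whitespace collapsing) are assumptions chosen for illustration; the study's actual preprocessing may differ.

import re

def preprocess_java(source: str) -> str:
    """Illustrative normalisation pass applied before similarity scoring."""
    # Remove block comments, then line comments (naive: ignores string literals).
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    # Collapse whitespace so formatting changes do not inflate or deflate similarity.
    return " ".join(source.split())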

CEMs vs. SOTA Plagiarism Detectors (Preprocessed)

A head-to-head comparison highlighting the improved performance of CEMs with preprocessing.

Metric/Tool AUROC (Pooled) AP (Pooled)
FusionTop3 0.882 0.862
CrystalBLEU 0.879 0.865
Dolos 0.864 0.842
CodeBLEU 0.843 0.822
RUBY 0.839 0.826
JPlag 0.777 0.762
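
The fusion rule behind FusionTop3 is not spelled out in this summary, so the sketch below shows one plausible baseline: averaging min-max-normalised scores from the three strongest metrics for each code pair. Both the fusion rule and the example scores are assumptions.

from typing import Dict, List

def fuse_top3(scores_by_metric: Dict[str, List[float]]) -> List[float]:
    """Average min-max-normalised similarity scores across metrics, per pair."""
    def normalise(xs: List[float]) -> List[float]:
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    normalised = [normalise(scores) for scores in scores_by_metric.values()]
    # Each column holds one code pair's normalised scores across the metrics.
    return [sum(col) / len(col) for col in zip(*normalised)]

# Hypothetical per-pair scores from three metrics (values illustrative only).
fused = fuse_top3({
    "CrystalBLEU": [0.91, 0.40, 0.75],
    "CodeBLEU":    [0.85, 0.35, 0.70],
    "RUBY":        [0.88, 0.42, 0.66],
})
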
L4+ Performance Drop at Higher Plagiarism Levels

All evaluated methods, including both CEMs and dedicated SCPDTs, experienced a significant drop in detection performance from plagiarism level L4 onwards. This highlights the inherent difficulty in identifying structural and semantic modifications in source code.

0.99+ CodeBERTScore Similarity Bias

CodeBERTScore consistently generated very high similarity scores (above 0.99) for both plagiarised and non-plagiarised pairs, making differentiation extremely difficult. This indicates a strong surface bias and limited ability to capture semantic nuances.

Recommended Workflow for Academic Integrity

For educational institutions, a practical workflow for using CEMs involves three steps: (1) preprocess code to normalise it; (2) employ a strong ranking method such as CrystalBLEU to flag suspicious pairs; (3) confirm flagged pairs with dedicated plagiarism tools and manual instructor review, especially in complex cases. CEMs serve as an effective initial screening layer; a minimal sketch of this loop follows.
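
The sketch below illustrates that screening loop, assuming a hypothetical similarity callable (for example, a CrystalBLEU scorer) and a preprocess callable such as the normalisation pass sketched earlier; it illustrates the workflow rather than the authors' implementation.

def screen_pairs(pairs, similarity, preprocess, review_budget=20):
    """Rank code pairs by similarity and return the top candidates for review.

    `pairs` is an iterable of (id_a, id_b, code_a, code_b) tuples.
    """
    scored = []
    for id_a, id_b, code_a, code_b in pairs:
        # Step 1: preprocess both submissions before scoring.
        score = similarity(preprocess(code_a), preprocess(code_b))
        scored.append((score, id_a, id_b))
    # Step 2: rank by similarity; the most similar pairs are the most suspicious.
    scored.sort(key=lambda item: item[0], reverse=True)
    # Step 3: confirm the flagged pairs with a dedicated tool (e.g. Dolos or JPlag)
    # and instructor review; CEMs only screen, they never make the final call.
    return scored[:review_budget]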

Impact: Enhance fairness and academic integrity by effectively identifying potential plagiarism, especially at lower modification levels, while acknowledging the need for human oversight for semantic changes.


Your AI Implementation Roadmap

Our phased approach ensures a smooth and effective integration of AI, maximizing your return on investment with minimal disruption.

Phase 1: Discovery & Strategy

In-depth analysis of your current workflows, identification of AI opportunities, and development of a tailored implementation strategy.

Phase 2: Pilot Program Development

Building and testing a small-scale AI pilot to validate functionality, gather feedback, and demonstrate initial value.

Phase 3: Full-Scale Integration

Seamless deployment of AI solutions across your enterprise, including data migration, system integration, and user training.

Phase 4: Optimization & Scaling

Continuous monitoring, performance optimization, and strategic scaling of AI capabilities to new areas of your business.

Ready to Transform Your Enterprise with AI?

Book a complimentary 30-minute consultation with our AI experts to discuss your specific challenges and how our tailored solutions can drive your business forward.
