Enterprise AI Analysis: PyCaret-Based Text Feature Identification for Projects of the China International College Students' Innovation Competition (CICSIC 2025)

This study applies the PyCaret automated machine learning framework and natural language processing techniques to 758 award-winning project titles from the vocational education track of CICSIC 2025. It develops an automated text feature recognition model and identifies key textual elements, such as 'precision', 'laser', 'industrial transformation', and 'industrial robots', that distinguish gold-award projects. The final RandomForestClassifier achieved 73.03% accuracy on the test set with a precision of 1.0. SHAP interpretability analysis further showed that features indicating industrial transformation and robotics are among the top predictors, offering a novel, objective, and interpretable method for evaluating innovation and entrepreneurship projects.

Quantifiable Impact

The study's model extracts discriminative features from project titles and predicts award outcomes with measurable accuracy, supporting data-informed evaluation decisions.

758 Total Projects Analyzed
73.03% Model Accuracy
1.0 Test-Set Precision
224 Gold & Silver Awards Identified

Deep Analysis & Enterprise Applications


Introduction & Problem

This study addresses the need for objective and interpretable evaluation of innovation and entrepreneurship projects, particularly in college student competitions. Machine learning algorithms offer a way to move beyond purely subjective assessments. The research applies these techniques to project title texts from the China International College Students' Innovation Competition (CICSIC) to identify features that distinguish high-quality projects, providing a new perspective on project evaluation.

Research Methodology

The research built a text analysis framework based on PyCaret that integrates natural language processing (NLP) and automated machine learning (AutoML). Key steps included: Chinese text cleaning; new word (neologism) discovery using Pointwise Mutual Information (PMI); a voting-based tokenization fusion algorithm combining Jieba and Pkuseg; and feature extraction with CountVectorizer and TfidfVectorizer, followed by Chi-square feature selection. The framework then used AutoML to compare and tune candidate models, selecting RandomForestClassifier for its best overall performance. SHAP (SHapley Additive exPlanations) was used to interpret the model.
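The PMI step can be illustrated with a minimal sketch. It assumes that candidate words are adjacent character bigrams and uses a hypothetical `min_count` cutoff; the study's own candidate lengths, thresholds, and any additional filters are not specified here. PMI is computed as log2(p(xy) / (p(x) p(y))), so character pairs that co-occur more often than chance would predict score higher.

```python
import math
from collections import Counter

def pmi_bigrams(texts, min_count=2):
    """Rank adjacent character pairs by pointwise mutual information.

    Candidates are adjacent character bigrams (an illustrative assumption);
    min_count is a hypothetical frequency cutoff.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for text in texts:
        char_counts.update(text)  # single-character frequencies
        pair_counts.update(text[i:i + 2] for i in range(len(text) - 1))
    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for pair, count in pair_counts.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_xy = count / n_pairs
        p_x = char_counts[pair[0]] / n_chars
        p_y = char_counts[pair[1]] / n_chars
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical titles, not drawn from the competition dataset.
titles = ["激光焊接机器人", "精密激光切割", "激光精密测量"]
for word, score in pmi_bigrams(titles):
    print(word, round(score, 3))
```

Practical new-word discovery tools often pair this cohesion score with left/right neighbor entropy to check that a candidate also has free boundaries; this sketch scores cohesion only.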

Key Findings & Analysis

Analysis of the 758 project titles revealed textual features that distinguish gold-award projects. High-frequency unigram (1-gram) terms included 'precision', 'laser', and 'China's original innovation', while discriminative bigram (2-gram) combinations such as 'industrial transformation' and 'industrial robots' also emerged as key features. The RandomForestClassifier achieved 73.03% accuracy on the test set with a precision of 1.0. SHAP analysis identified features such as 'has_industrial_transformation' and 'has_industrial_robot' as the strongest positive contributors to predicting higher award levels, suggesting that evaluators emphasize practical application and technological advancement.
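The unigram and bigram features above come from TF-IDF-style vectorization. The sketch below re-implements, in plain Python, the weighting that scikit-learn's `TfidfVectorizer` applies by default (raw term counts, smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalization) over pre-tokenized titles. The function names and example tokens are hypothetical, and the study's exact vectorizer settings are not given here.

```python
import math
from collections import Counter

def ngrams(tokens, n_min=1, n_max=2):
    """Space-joined n-grams of a token list, as scikit-learn joins word n-grams."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams

def tfidf(docs, n_min=1, n_max=2):
    """TF-IDF vectors (as dicts) with smoothed IDF and L2 normalization."""
    counts = [Counter(ngrams(doc, n_min, n_max)) for doc in docs]
    n_docs = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: one count per document
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}
    vectors = []
    for c in counts:
        weights = {g: tf * idf[g] for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})
    return vectors

# Hypothetical pre-tokenized titles.
titles = [["工业", "机器人"], ["工业", "转型"]]
vecs = tfidf(titles)
print(vecs[0])
```

Here '工业' appears in every title, so its IDF falls to the minimum of 1 and it receives less weight than the title-specific terms '机器人' and '工业 机器人'.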

Conclusion & Future Work

The study demonstrated that text features from project titles can help predict award levels, highlighting the importance of technological innovation and industrial application potential. While it offers actionable insights for improving projects, the study acknowledges limitations: limited data dimensionality (only project title text is used) and shallow semantic extraction (relying on TF-IDF rather than deep contextual models such as BERT). Future work will explore multimodal features, richer text representations, online learning frameworks, and personalized recommendation systems to improve the approach's generalizability and utility.

73.03% Predictive Model Accuracy

The RandomForestClassifier achieved 73.03% accuracy on the test set, indicating that project titles alone carry useful signal for identifying high-potential projects.
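Accuracy and precision measure different things, which is how 73.03% accuracy can coexist with a precision of 1.0. Assuming precision is reported for the positive (higher-award) class in a binary setup, a precision of 1.0 means the model made no false-positive predictions, so all of its errors were missed positives (false negatives). The hypothetical helper below works through that arithmetic with invented confusion-matrix counts; the study's actual counts are not reproduced here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Precision is undefined when the model predicts no positives at all.
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical counts: no false positives, but some positives are missed.
print(binary_metrics(tp=3, fp=0, tn=5, fn=2))
# → {'accuracy': 0.8, 'precision': 1.0, 'recall': 0.6}
```

A perfect precision paired with lower recall usually points to a conservative classifier: it flags relatively few projects as high-award, but the ones it flags are correct.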

Enterprise Process Flow

Text Cleaning
Tokenization Fusion
New Word Discovery
Feature Extraction (TF-IDF)
AutoML Modeling (PyCaret)
SHAP Interpretability
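The Tokenization Fusion step combines Jieba and Pkuseg by voting. The exact voting rule is not described here, so the following is a hypothetical sketch: each segmenter's output is converted to a set of boundary positions, and a boundary is kept when at least `min_votes` segmenters propose it. The function names are illustrative, and the segmentations in the example are hand-written stand-ins, not real Jieba or Pkuseg output.

```python
from collections import Counter

def cut_positions(tokens):
    """Return the character offsets where a segmentation places word boundaries."""
    cuts, pos = set(), 0
    for tok in tokens[:-1]:  # the end of the last token is the end of the text
        pos += len(tok)
        cuts.add(pos)
    return cuts

def vote_fuse(text, segmentations, min_votes=None):
    """Keep a boundary only if at least `min_votes` segmenters propose it."""
    if min_votes is None:
        min_votes = len(segmentations) // 2 + 1  # strict majority by default
    for seg in segmentations:
        if "".join(seg) != text:
            raise ValueError("each segmentation must reproduce the text exactly")
    votes = Counter()
    for seg in segmentations:
        votes.update(cut_positions(seg))
    cuts = sorted(c for c, v in votes.items() if v >= min_votes)
    tokens, start = [], 0
    for cut in cuts + [len(text)]:
        tokens.append(text[start:cut])
        start = cut
    return tokens

# Hand-written stand-ins for two segmenters' output.
seg_a = ["工业", "机器人", "焊接"]
seg_b = ["工业", "机器", "人", "焊接"]
print(vote_fuse("工业机器人焊接", [seg_a, seg_b]))
# → ['工业', '机器人', '焊接']
```

With two segmenters, a strict majority means both must agree, so disputed boundaries are merged into longer tokens. In practice the inputs might come from `jieba.lcut(text)` and `pkuseg.pkuseg().cut(text)`.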

Top Discriminative Unigram Features

These high-frequency terms had the highest Chi-square scores for distinguishing gold-award projects; a higher score indicates a stronger statistical association between the term and the award label.

Term (Chinese) | Term (English) | Chi-square Score
精密 | Precision | 12.904
激光 | Laser | 9.850
国内首创 | China's original innovation | 9.850
破局 | Market breakthrough | 9.536
新一代 | Next-generation | 8.343
焊接 | Welding | 7.859
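Chi-square scores like those in the table measure how unevenly a term's occurrences are spread across award classes. The plain-Python sketch below follows the computation used by scikit-learn's `sklearn.feature_selection.chi2` on count features: observed per-class term totals are compared with the totals expected if the term were independent of the class. The function name, tokenized titles, and labels are hypothetical toy data, not the study's dataset, so the scores are not comparable to the table.

```python
def chi2_scores(docs, labels, vocab):
    """Chi-square score per term from per-document term counts and class labels."""
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_prob = {c: labels.count(c) / n_docs for c in classes}
    scores = {}
    for term in vocab:
        counts = [doc.count(term) for doc in docs]  # count features, as CountVectorizer would produce
        total = sum(counts)
        if total == 0:
            continue  # the score is undefined for a term that never occurs
        score = 0.0
        for c in classes:
            observed = sum(k for k, y in zip(counts, labels) if y == c)
            expected = class_prob[c] * total
            score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores

# Toy tokenized titles; label 1 = gold award, 0 = other (hypothetical).
docs = [["精密", "激光"], ["激光", "焊接"], ["智慧", "农业"], ["乡村", "农业"]]
labels = [1, 1, 0, 0]
scores = chi2_scores(docs, labels, ["精密", "激光", "农业"])
top_k = sorted(scores, key=scores.get, reverse=True)[:2]  # keep the k best terms
print(scores, top_k)
```

Note that '农业' scores as high as '激光' even though it appears only in non-gold titles: chi-square is non-directional, so whether a high-scoring term favors or disfavors gold awards has to be checked separately.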

Impactful Features from SHAP Analysis

SHAP-based interpretability showed that features indicating industrial transformation and the use of industrial robots were among the strongest positive predictors of higher awards, suggesting that evaluators focus on practical application and technological advancement.
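SHAP values are Shapley values from cooperative game theory: each feature's contribution is its weighted average marginal effect over all coalitions of the other features. The sketch below computes them exactly by enumeration for a tiny hypothetical scoring function, filling absent features from a single baseline vector. This is an illustrative simplification: the `shap` library uses TreeSHAP for tree ensembles such as a random forest and typically compares against background data. The toy weights are invented, not the study's model, and the third feature, `has_precision`, is likewise hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance, enumerating every feature coalition.

    Features outside a coalition take their baseline value (a single reference
    point, an illustrative simplification).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi

# Hypothetical additive-plus-interaction score over binary title features.
FEATURES = ["has_industrial_transformation", "has_industrial_robot", "has_precision"]

def toy_score(z):
    return 0.3 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[1]

phi = shapley_values(toy_score, x=[1, 1, 0], baseline=[0, 0, 0])
print(dict(zip(FEATURES, (round(v, 4) for v in phi))))
# → {'has_industrial_transformation': 0.35, 'has_industrial_robot': 0.25, 'has_precision': 0.0}
```

The interaction term is split evenly between the two active features, and the values sum to the difference between the model's output for the instance and for the baseline (0.6 here). This additivity is what lets SHAP attribute a single prediction across features.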

Your AI Implementation Roadmap

A typical enterprise AI journey with us is structured into clear, manageable phases, ensuring a smooth transition and measurable success.

Phase 1: Discovery & Strategy

Initial consultations, data assessment, and AI strategy alignment with your business objectives.

Phase 2: Custom Model Development

Building and training AI models tailored to your specific data and evaluation criteria.

Phase 3: Integration & Pilot

Seamless integration into existing workflows and a pilot program to test and refine the system.

Phase 4: Deployment & Scaling

Full-scale deployment across your organization, with ongoing monitoring and optimization.

Ready to Transform Your Project Evaluation?

Unlock the power of AI to make objective, data-driven decisions and foster innovation within your enterprise.
