Enterprise AI Analysis: Prompting and Fine-Tuning Open Source Large Language Models for Stance Classification

Stance classification, the task of predicting an author's viewpoint on a subject of interest, has long been a focal point of research in domains ranging from social science to machine learning. Current stance detection methods rely predominantly on manual annotation of sentences, followed by training a supervised machine learning model. This annotation process, however, is laborious and hampers these methods' ability to generalize across contexts. In this work, we investigate the use of Large Language Models (LLMs) as a stance detection methodology that can reduce or even eliminate the need for manual annotation. We evaluate 10 open-source models and 7 prompting schemes, finding that LLMs are competitive with in-domain supervised models but are not necessarily consistent in their performance. We also fine-tuned the LLMs, but found that fine-tuning does not necessarily lead to better performance. In general, we discover that LLMs do not routinely outperform smaller supervised machine learning models, and we therefore call for stance detection to become a benchmark on which LLMs are also optimized.

Authors: Iain J. Cruickshank, Lynnette Hui Xian Ng
Publication: ACM Transactions on Intelligent Systems and Technology (April 2026)

Executive Impact: Key Findings & Strategic Implications

This research offers critical insights for enterprises leveraging LLMs for advanced text analysis, particularly in understanding public opinion and social media sentiment. It highlights both the potential and current limitations of open-source LLMs in complex NLP tasks like stance classification.

10 LLMs Evaluated
7 Prompting Schemes Tested
0.69 Max F1-Score Achieved
59% Correctness for Valid Outputs

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Overall LLM Performance
Prompt Engineering Insights
Fine-Tuning Outcomes
Challenges & Ethics

LLMs for Stance Classification: Capabilities & Consistency

Large Language Models offer a promising new approach to stance classification, capable of reducing manual annotation efforts and generalizing across contexts. This research evaluated 10 open-source LLMs and 7 prompting schemes across 6 social media datasets to assess their suitability and identify performance trends. While LLMs can be competitive with traditional supervised models, their performance is not consistently superior and varies significantly based on model architecture and prompting strategy.

0.69 Highest F1-Score Achieved by an LLM (SemEval2016 Dataset)
Feature | LLMs (Zero-shot) | Supervised Models (Traditional)
Annotation Effort | Minimal to none for new tasks | High, labor-intensive manual annotation
Generalizability | Potentially high across varied contexts | Strong in-domain, struggles out-of-domain
Performance | Competitive but inconsistent; sensitive to prompting | Highly consistent for specific tasks; requires extensive retraining for new contexts
Adaptation | Primarily via prompt engineering | Extensive model retraining or architecture changes

The Art of Prompting: Unlocking LLM Potential

The effectiveness of LLMs in stance classification is highly sensitive to the way inputs are formatted and instructions are given. This study explores seven distinct prompting schemes, revealing that carefully crafted prompts, especially those involving few-shot examples or chain-of-thought reasoning, can significantly improve performance and lead to more valid and accurate outputs.

Enterprise Process Flow: Prompting Schemes

Task-Only
Task Definition
Context Analyze
Context Question
Few-Shot Prompt (FSP)
Zero-shot Chain-of-Thought (CoT)
CoDA
FSP & CoT: Top-Performing Prompting Schemes for Stance Detection
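The schemes above differ mainly in how the input text and target are wrapped in instructions. A minimal sketch of three of them follows; the wording is assumed for illustration and is not the paper's exact templates.

```python
def task_only(text, target):
    """Task-Only: state the task and the label set, nothing else."""
    return (f"What is the stance of the following text toward '{target}'? "
            f"Answer with exactly one of: favor, against, neutral.\n"
            f"Text: {text}\nStance:")

def few_shot(text, target, examples):
    """Few-Shot Prompt (FSP): prepend labeled (text, target, stance) examples."""
    shots = "\n\n".join(
        f"Text: {t}\nStance toward '{g}': {s}" for t, g, s in examples
    )
    return (f"Classify the stance as favor, against, or neutral.\n\n"
            f"{shots}\n\nText: {text}\nStance toward '{target}':")

def zero_shot_cot(text, target):
    """Zero-shot Chain-of-Thought: elicit reasoning before the label."""
    return (f"What is the stance of the following text toward '{target}'? "
            f"Let's think step by step, then answer with favor, against, "
            f"or neutral.\nText: {text}\nReasoning:")

print(few_shot("Vaccines save lives.", "vaccination",
               [("Masks are useless.", "mask mandates", "against")]))
```

Because FSP and CoT were the top performers in the study, a reasonable enterprise default is a few-shot template with a handful of in-domain examples, falling back to zero-shot CoT when no labeled examples exist.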

Fine-Tuning: Specialization vs. Generalization Trade-offs

While fine-tuning is often seen as a way to specialize LLMs for specific tasks, this research indicates that it does not consistently lead to better out-of-domain performance for stance classification. In many cases, fine-tuned models performed worse than their zero-shot counterparts, suggesting that specialization might hinder generalization, especially with limited fine-tuning data and varying definitions of stance across datasets.

Inconsistent Improvement: Observed Effect of Fine-Tuning on Out-of-Domain Performance

Aspect | Fine-Tuned LLMs | Zero-Shot LLMs
Generalization | Often reduced by over-specialization; struggles with out-of-domain data | Generally higher, leveraging broad pre-trained knowledge; more adaptable to new, unseen contexts
Performance | Mixed results; can worsen out-of-domain F1 scores; best for narrow, in-domain tasks | Competitive with baselines but inconsistent; can achieve high F1 with optimal prompting
Data Requirement | Requires a small, task-specific fine-tuning set; performance sensitive to data quantity relative to model size | No task-specific training data needed; relies entirely on pre-trained knowledge
Flexibility | Less flexible across diverse new contexts | More adaptable to varied tasks and new targets
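One reason data quantity relative to model size matters is that parameter-efficient methods such as LoRA, a common choice for fine-tuning open-source LLMs (the paper's exact setup is not assumed here), train only a tiny fraction of the weights. A back-of-the-envelope sketch with hypothetical dimensions:

```python
def lora_trainable_params(d_in, d_out, rank):
    """A rank-r LoRA adapter on a frozen d_in x d_out weight adds two
    factors A (d_in x r) and B (r x d_out): r * (d_in + d_out) params."""
    return rank * (d_in + d_out)

# Hypothetical example: adapting the q and v projections (4096 x 4096)
# in each of 32 layers of a 7B-parameter model at rank 8.
d = 4096
per_matrix = lora_trainable_params(d, d, rank=8)  # 65,536 params per matrix
total = per_matrix * 2 * 32                        # q and v, 32 layers
print(total, f"{total / 7e9:.4%} of model weights")
# → 4194304 0.0599% of model weights
```

With only a few million trainable parameters and a few thousand stance-labeled tweets, the adapter can easily overfit one dataset's particular definition of "stance," which is consistent with the out-of-domain degradation reported above.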

Nuances of Evaluation and Ethical Considerations

Evaluating LLMs for stance detection presents unique challenges, including handling ambiguous or invalid outputs and addressing inherent biases from pre-training data. Despite explicit instructions, models frequently return extraneous text, and the quality of output is strongly linked to prediction correctness. Furthermore, the energy consumption of LLMs and potential for misuse (e.g., censorship) are critical ethical considerations.

59% Accuracy in Predicting if LLM Stance Output is Correct (Overall)
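Since models frequently return extraneous text despite explicit instructions, a practical pipeline needs a validation step that maps free-form replies onto the label set and flags invalid outputs. A minimal sketch; the regex and label set here are illustrative assumptions, not the paper's parsing procedure.

```python
import re

VALID_LABELS = ("favor", "against", "neutral")

def parse_stance(raw):
    """Map a free-form LLM reply to a stance label, or None if invalid.
    Replies often carry extra text ('Stance: FAVOR, because...'), so we
    search for the first valid label rather than require an exact match."""
    match = re.search(r"\b(favor|against|neutral)\b", raw.lower())
    return match.group(1) if match else None

print(parse_stance("Stance: FAVOR, because the author praises it."))  # → favor
print(parse_stance("As an AI, I cannot determine this."))             # → None
```

Invalid outputs (the `None` cases) should be counted explicitly rather than silently dropped, since the study found output validity is strongly linked to prediction correctness.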

Case Study: Ambiguity in Labels (the Michael Essien Ebola Example)

The study highlights challenges in stance annotation, citing the example of a tweet denying a claim that Michael Essien had Ebola ('@xx no he hasn't. The man himself confirmed not true @MichaelEssien'), which was annotated as neutral but arguably should have been labeled 'against' the claim. This illustrates how varied sentence interpretations and inconsistent manual annotations complicate the task for both human labelers and LLMs, underscoring the need for robust evaluation practices.

Advanced ROI Calculator

Estimate the potential return on investment for implementing AI-driven stance classification within your organization.


Your AI Implementation Roadmap

A phased approach to integrating LLM-driven stance classification for maximum impact and minimal disruption.

Phase 01: Strategy & Discovery

Identify key targets and domains for stance classification, align with business objectives, and assess current data infrastructure.

Phase 02: Model Selection & Prompt Engineering

Select optimal open-source LLMs based on performance characteristics and fine-tune prompting schemes for your specific use cases.

Phase 03: Pilot & Iteration

Conduct a pilot program on a representative dataset, gather feedback, and iterate on prompting strategies and model configurations.

Phase 04: Integration & Scaling

Integrate the LLM-driven solution into existing workflows, ensuring robust data pipelines and scalable inference capabilities.

Phase 05: Monitoring & Optimization

Continuously monitor model performance, refine prompts, and explore further fine-tuning opportunities to maintain accuracy and relevance.

Ready to Transform Your Enterprise with AI?

Unlock the power of advanced language models for deeper insights into public opinion and sentiment. Our experts are ready to guide you.

Ready to Get Started?

Book Your Free Consultation.

Let's Discuss Your AI Strategy!


