LEGALMIDM: USE-CASE-DRIVEN LEGAL DOMAIN SPECIALIZATION FOR KOREAN LARGE LANGUAGE MODEL

Revolutionizing Legal AI with LEGALMIDM

Discover how LEGALMIDM, a specialized Korean legal-domain LLM, sets new benchmarks in precision and utility for AI-assisted legal workflows, grounded in real-world use cases.

Unlock Legal AI Insights

Quantifiable Impact of Domain Specialization

LEGALMIDM's innovative use-case-driven approach delivers superior performance across critical legal tasks and maintains robust general domain capabilities.

0 Performance Boost (Legal Tasks)

0 Human-Curated Datasets

0 Open LLMs Adapted

See the Full Report

Deep Analysis & Enterprise Applications

Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.

Domain Adaptation: Strategies for specializing general LLMs for the legal domain.

Strategies for specializing general LLMs for the legal domain.

LEGALMIDM A specialized Korean legal-domain LLM built on a proprietary base model

LEGALMIDM vs. General LLMs on Legal Tasks (AVG Score)
Model	AVG (R-L)
LEGALMIDM-11B	41.06
Qwen2.5-32B	30.64
Llama3.3-70B	25.61
Gemma-2-27b	27.75

The Challenge of Practical Utility in Legal AI

Existing domain-specialized LLMs often lack alignment with nuanced real-world legal application requirements, especially where precision and reliability are essential. This limits their practical utility and necessitates a use-case-driven framework, which LEGALMIDM addresses directly.

"precision and reliability are essential, this lack of consideration limits practical utility."

Data Curation: Methods for constructing high-quality, use-case-driven legal datasets.

Methods for constructing high-quality, use-case-driven legal datasets.

Human-Curated Data Composition Process

Law Majors (B.S. degrees)

→

Legal Industry Professionals

→

Practicing Attorneys

→

Final Dataset (6 tasks, 100 samples each for test)

6 High-Demand Legal Tasks Identified & Curated

Leveraging Written Law for Automatic QA Generation

A distinctive feature of the legal domain is the presence of clear statutory references. LEGALMIDM leverages GPT-4o to generate questions, answers, and specific references from Korean legal statutes, ensuring factual grounding through a verification step.

"Each generated QA pair is factually grounded in the provided legal text."

Training Pipeline: Optimized training protocols including CPT, IT, and prompt optimization.

Optimized training protocols including CPT, IT, and prompt optimization.

LEGALMIDM Training Pipeline Stages

Automatic Law QA Generation

→

Continual Pre-Training (CPT)

→

Instruction-Tuning (IT)

→

System Prompt Optimization

Impact of General Domain Data Integration
Training Stage	Data Composition	Legal Task Performance Impact
CPT	Legal Only	Lower Adaptability (Catastrophic Forgetting Risk)
CPT	Legal + General	Superior Performance & Generalization
IT	Legal Only	Lower Average Results
IT	Legal + General	Better Average Results On Average

Optimal Synthetic Data Format for Legal Training
Variation	Doc-based (R-L)	Open QA (R-L)	MC (Acc)
Q ⇒ A (No Ref)	45.83	14.58	0.64
Q ⇒ A + Ref (Ref in Output)	47.53	16.80	0.56
Q + Ref ⇒ A (Ref in Input)	46.89	17.74	0.65

Legal Advisor System Prompt Persona for Inference

Base Model: Mi:dm-2.0-Base Foundation

LEGALMIDM leverages Mi:dm-2.0-Base, a proprietary Korean-English bilingual 11.5B language model from KT, which is pre-trained on high-quality Korean and English data, ensuring strong foundational understanding of cultural contexts and a substantial 32K context length.

"Korea-centric LLM, trained on high-quality Korean and English data to understand Korean cultural contexts, and features a 32K context length."

Evaluation & Results: Benchmarking LEGALMIDM against state-of-the-art LLMs.

Benchmarking LEGALMIDM against state-of-the-art LLMs.

41.06 LEGALMIDM Achieves Highest Average ROUGE-L on Legal Tasks

Comprehensive Performance Across Key Legal Tasks (AVG R-L)
Model	Complaint	Summary	Petition	QA	MRC	MC	AVG
LEGALMIDM-11B	67.67	47.94	14.46	17.74	57.50	0.65	41.06
Qwen2.5-32B	58.81	30.76	14.08	15.70	33.86	0.26	30.64
Llama3.3-70B	53.40	30.30	9.33	12.23	22.77	0.45	25.61
Gemma-2-27b	51.61	32.37	11.17	13.51	30.09	0.40	27.75
EXAONE-3.5-32B	54.29	25.47	11.28	14.98	30.60	0.27	27.32

Validation of Use-Case-Driven Methodology

Ablation studies robustly confirm the effectiveness of each component of LEGALMIDM's training strategy: integrating general domain data, formatting references in the input, and using system prompts only during inference. This approach leads to superior performance in legal tasks while maintaining strong general domain capabilities.

"confirming the effectiveness of our methodology."

Advanced AI ROI Calculator

Estimate the potential annual savings and hours reclaimed by implementing enterprise AI solutions tailored to your business.

Your Industry

Number of Employees

Hours per Week on Manual Tasks

Average Hourly Rate ($)

Potential Annual Savings $0

Hours Reclaimed Annually 0

Your AI Implementation Roadmap

A structured approach to integrate LEGALMIDM into your legal operations, from initial assessment to full-scale deployment.

Phase 1: Use-Case Definition & Data Curation

Collaborate with legal professionals to identify high-demand tasks and construct human-curated, use-case-driven datasets.

Phase 2: Automated Data Generation & Pre-training

Leverage written law for synthetic data creation and perform continual pre-training on a mix of legal and general domain data.

Phase 3: Instruction-Tuning & Prompt Optimization

Refine the model with instruction-tuning using mixed datasets and optimize system prompts for inference.

Phase 4: Comprehensive Evaluation & Deployment

Rigorously benchmark LEGALMIDM against state-of-the-art LLMs on both legal and general tasks, then prepare for deployment.

Ready to Transform Your Legal Operations?

Discuss how LEGALMIDM can be tailored to your specific enterprise needs and start building a more efficient, precise legal workflow.

Schedule a Legal AI Consultation

LEGALMIDM: USE-CASE-DRIVEN LEGAL DOMAIN SPECIALIZATION FOR KOREAN LARGE LANGUAGE MODEL

Revolutionizing Legal AI with LEGALMIDM

Quantifiable Impact of Domain Specialization

Deep Analysis & Enterprise Applications

Domain Adaptation: Strategies for specializing general LLMs for the legal domain.

The Challenge of Practical Utility in Legal AI

Data Curation: Methods for constructing high-quality, use-case-driven legal datasets.

Human-Curated Data Composition Process

Leveraging Written Law for Automatic QA Generation

Training Pipeline: Optimized training protocols including CPT, IT, and prompt optimization.

LEGALMIDM Training Pipeline Stages

Base Model: Mi:dm-2.0-Base Foundation

Evaluation & Results: Benchmarking LEGALMIDM against state-of-the-art LLMs.

Validation of Use-Case-Driven Methodology

Advanced AI ROI Calculator

Your AI Implementation Roadmap

Phase 1: Use-Case Definition & Data Curation

Phase 2: Automated Data Generation & Pre-training

Phase 3: Instruction-Tuning & Prompt Optimization

Phase 4: Comprehensive Evaluation & Deployment

Ready to Transform Your Legal Operations?

Ready to Get Started?

Book Your Free Consultation.

Let's Discuss Your AI Strategy!

Lets Discuss Your Needs

Select Time Zone

Big Competitive Advantage With Ai

Learn More

Our Demos

Research Center

Jobs

Contact Us

1 888 985 3025

Solutions@OwnYourAi.com

Get Your Ai