Enterprise AI Analysis
Integrating Program Analysis with Observability for Root-Cause Analysis
PRAXIS is an agentic approach to cloud incident diagnosis that combines structured graph reasoning with LLM-driven analysis to improve both the accuracy and the efficiency of root-cause analysis.
Transforming Cloud Incident Resolution
PRAXIS improves the accuracy and efficiency of root-cause analysis, reducing the high cost of unresolved cloud incidents.
Deep Analysis & Enterprise Applications
Unresolved cloud incidents cost over $2M per hour, and 24% are not resolved by traditional operational mitigations. Root-cause analysis (RCA) is critical but often ineffective, which leads to prolonged and disruptive outages. Traditional LLM-based RCA struggles with unstructured data, context overflow, and a lack of dependency awareness.
PRAXIS introduces an orchestrator managing an agentic workflow for diagnosing cloud incidents. It employs an LLM-driven structured traversal over two types of graphs: a Service Dependency Graph (SDG) for microservice-level dependencies and a Hammock-Block Program Dependence Graph (PDG) for code-level dependencies. This integration enables precise root-cause identification down to code or configurations.
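The two-level traversal described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the PRAXIS implementation: the service names, the block names, the dictionary layout, and the `is_suspicious` callback (a stand-in for the LLM's judgment) are all hypothetical.

```python
from collections import deque

# Hypothetical Service Dependency Graph: service -> services it calls.
SDG = {
    "frontend": ["recommendation", "cart"],
    "recommendation": ["external-db"],
    "cart": [],
    "external-db": [],
}

# Hypothetical code-level graph per service: block -> blocks it depends on.
# A real Hammock-Block PDG carries far more structure than this.
PDG = {
    "recommendation": {"handle_request": ["fetch_items"], "fetch_items": []},
}


def code_blocks(service):
    """Depth-first walk of a service's code graph, starting at its first block."""
    graph = PDG.get(service, {})
    if not graph:
        return []
    order, seen, stack = [], set(), [next(iter(graph))]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        order.append(f"{service}.{block}")
        stack.extend(reversed(graph.get(block, [])))
    return order


def traverse(start, is_suspicious):
    """Breadth-first walk of the SDG that descends into code for each kept service.

    `is_suspicious` stands in for the LLM decision; services it rejects are
    pruned, so their dependencies are never explored.
    """
    seen, visited = set(), []
    queue = deque([start])
    while queue:
        service = queue.popleft()
        if service in seen:
            continue
        seen.add(service)
        if not is_suspicious(service):
            continue
        visited.append(service)
        visited.extend(code_blocks(service))
        queue.extend(SDG.get(service, []))
    return visited
```

Calling `traverse("frontend", lambda s: s != "cart")` visits the frontend, the recommendation service and its two code blocks, and then the external database, while skipping the cart branch entirely. Pruning in this way is one plausible reading of how a graph-guided walk keeps the context small.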
The PRAXIS workflow involves four phases: Data Gathering (collecting alerts, logs, traces, metrics, events), Graph Construction (building SDG and Hammock-Block PDG), RCA Decision-Making (LLM-driven PDG traversal, context construction, entity role judgment), and Final RCA Summary. This graph-aware traversal constrains the LLM's reasoning, reducing context space and improving focus.
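The four phases can be read as a simple pipeline. The sketch below is only a schematic under assumptions: the `Incident` container, the function names, and the stubbed data are hypothetical, and the decision phase uses a trivial rule where PRAXIS would use LLM-driven traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    alerts: list
    telemetry: dict = field(default_factory=dict)
    graphs: dict = field(default_factory=dict)
    findings: list = field(default_factory=list)


def gather_data(incident):
    # Phase 1: collect observability data (stubbed with empty collections).
    incident.telemetry = {"logs": [], "traces": [], "metrics": [], "events": []}
    return incident


def build_graphs(incident):
    # Phase 2: build the service-level and code-level graphs (hard-coded here).
    incident.graphs = {"sdg": {"recommendation": ["external-db"]}, "pdg": {}}
    return incident


def decide(incident):
    # Phase 3: placeholder rule that flags each alerting service and its
    # direct dependencies; the real system delegates this to an LLM.
    sdg = incident.graphs["sdg"]
    for service in incident.alerts:
        incident.findings.append(service)
        incident.findings.extend(sdg.get(service, []))
    return incident


def summarize(incident):
    # Phase 4: produce the final summary.
    return "Candidate root-cause entities: " + ", ".join(incident.findings)


def run_rca(alerts):
    incident = Incident(alerts=alerts)
    for phase in (gather_data, build_graphs, decide):
        incident = phase(incident)
    return summarize(incident)
```

Keeping each phase as a function over a shared record makes the hand-off between phases explicit: the decision step sees only what the graph step exposed, which mirrors the idea of constraining the reasoning space.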
PRAXIS significantly outperforms state-of-the-art ReAct baselines. It achieves 61.5% root-cause reasoning accuracy and 73.9% root-cause identification accuracy, improvements of 6.3x and 3.4x, respectively. PRAXIS also reduces token consumption by 5.3x, making RCA more efficient and scalable for large cloud applications.
The ablation study confirms that program context, supplied by PDG construction and LLM-guided traversal, is the primary driver of PRAXIS's effectiveness. Variants without PDG-guided analysis showed significantly lower accuracy, demonstrating the critical role of structured dependency graphs in anchoring LLM reasoning and preventing premature termination of the diagnosis.
Enterprise Process Flow
PRAXIS significantly outperforms state-of-the-art baselines in accurately identifying and explaining cloud incident root causes.
PRAXIS's graph-aware reasoning and focused context lead to a substantial reduction in LLM token consumption, enhancing efficiency.
| Feature | PRAXIS Advantage |
|---|---|
| Reasoning Approach | |
| Code Context Handling | |
| Problem Space Coverage | |
| Efficiency & Scalability | |
Real-World Incident: Cross-SDG-PDG Traversal
In a challenging incident (Figure 1 in the paper), a degraded external database returned empty responses, triggering a silent retry loop in the Recommendation service. Traditional ReAct agents failed to pinpoint the precise root cause due to missing explicit error logs. PRAXIS successfully isolated the underlying storage issue by performing cross-SDG-PDG traversal, demonstrating its ability to diagnose complex, multi-hop failures that span microservice and code layers.
This capability is critical for incidents where observability data alone is insufficient, allowing PRAXIS to infer hidden dependencies and uncover the true root cause, even without direct error traces.
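To make the failure mode concrete, here is a minimal Python sketch of a silent retry loop of the kind described in this incident. It is a hypothetical illustration, not code from the Recommendation service: the function names, the retry limit, and the stubbed database call are assumptions.

```python
def fetch_recommendations(query_db, max_attempts=5):
    """Retry while the database returns an empty result.

    No exception is raised and nothing is logged, so the retries leave
    no explicit error for log-based diagnosis to find.
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        rows = query_db()
        if rows:
            return rows, attempts
    return [], attempts


def degraded_db():
    # A degraded store that answers successfully but returns no data.
    return []
```

With `degraded_db` as the backend, the function exhausts every attempt and returns an empty list without emitting a single error. The only evidence is in the code path itself, which is why a traversal that reaches the code-level graph can surface a cause that logs alone do not show.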
Calculate Your Potential Savings
Estimate the financial and operational benefits of implementing an advanced AI RCA solution like PRAXIS.
Your Implementation Roadmap
A structured approach to integrating PRAXIS into your enterprise for maximum impact and seamless adoption.
Phase 1: Discovery & Strategy
Conduct an initial assessment of your current RCA processes, identify key pain points, and define strategic objectives for AI-driven incident resolution.
Phase 2: Data Integration & Modeling
Integrate your observability platforms (logs, metrics, traces) and microservice codebases. PRAXIS constructs your unique SDG and PDG models.
Phase 3: Pilot Deployment & Validation
Deploy PRAXIS in a controlled pilot environment, validate its RCA accuracy against real and synthetic incidents, and fine-tune agentic workflows.
Phase 4: Full Scale-Out & Continuous Optimization
Roll out PRAXIS across your entire cloud environment. Establish continuous feedback loops for model retraining and performance optimization.
Ready to Transform Your RCA?
Connect with our AI specialists to explore how PRAXIS can empower your SRE teams and dramatically reduce incident resolution times.