Reinforcement Learning-based Fine-tuning of Traversal Agents (RL-FTA) significantly reduces the 'steps-to-diagnosis' metric compared to zero-shot prompt-based traversal in large-scale microservice graphs.
Motivation
The original paper relies on LLM prompting for graph traversal, which may result in inefficient random walks or loops in highly complex graphs with hundreds of nodes. By treating traversal as a sequential decision-making problem, an agent can learn search heuristics tailored to specific system topologies rather than relying solely on generalized semantic knowledge.
Proposed Method
Develop an OpenAI Gym-compatible environment representing the Service Dependency Graph (SDG), in which the agent's actions are 'move to neighbor' and 'analyze node'. Train a lightweight LLM (e.g., Llama-3-8B) as the policy using Proximal Policy Optimization (PPO). Define the reward function so that reaching the ground-truth root-cause node in few hops is rewarded while revisiting nodes is penalized (a sketch of the environment and reward shaping follows below). Compare the fine-tuned agent's path efficiency against the baseline prompt-based agent on a held-out set of historical incidents.
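A minimal sketch of what such an environment could look like, assuming the Gymnasium API (the maintained successor to OpenAI Gym), a networkx graph for the SDG, and a discrete action space where action 0 means 'analyze current node' and actions 1..k select a neighbor. The class name SDGTraversalEnv, the MAX_NEIGHBORS cap, and all reward constants (step cost, revisit penalty, diagnosis bonus) are illustrative assumptions, not specifications from the source paper:

```python
import networkx as nx
import gymnasium as gym
from gymnasium import spaces

MAX_NEIGHBORS = 8   # cap on the branching factor exposed to the policy (assumption)
ANALYZE = 0         # action 0 = analyze current node; actions 1..MAX_NEIGHBORS = move

class SDGTraversalEnv(gym.Env):
    """Agent walks a Service Dependency Graph and must 'analyze' the root-cause node."""

    def __init__(self, sdg: nx.Graph, root_cause, max_steps: int = 50):
        super().__init__()
        self.sdg = sdg
        self.root_cause = root_cause
        self.max_steps = max_steps
        self.nodes = list(sdg.nodes)
        # Observation: index of the current node (a stand-in for the textual
        # node context that would actually be fed to the LLM policy).
        self.observation_space = spaces.Discrete(len(self.nodes))
        self.action_space = spaces.Discrete(MAX_NEIGHBORS + 1)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Episodes start at a random node, e.g., the service that raised the alert.
        self.current = self.nodes[self.np_random.integers(len(self.nodes))]
        self.visited = {self.current}
        self.steps = 0
        return self.nodes.index(self.current), {}

    def step(self, action):
        self.steps += 1
        reward, terminated = -1.0, False  # per-hop cost encourages short paths

        if action == ANALYZE:
            # Correct diagnosis ends the episode with a bonus; analyzing the
            # wrong node is penalized and the search continues.
            if self.current == self.root_cause:
                reward, terminated = 20.0, True
            else:
                reward = -2.0
        else:
            neighbors = list(self.sdg.neighbors(self.current))
            idx = action - 1
            if idx < len(neighbors):
                self.current = neighbors[idx]
                if self.current in self.visited:
                    reward -= 1.0   # revisit penalty discourages loops
                self.visited.add(self.current)
            else:
                reward = -2.0       # invalid move: no neighbor at this index

        truncated = self.steps >= self.max_steps
        return self.nodes.index(self.current), reward, terminated, truncated, {}
```

In actual training, the discrete node index would be replaced by the textual node context (logs, metrics, neighbor summaries) serialized into the LLM policy's prompt, PPO fine-tuning would be run with a library such as TRL, and the reward constants above would be tuned empirically.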
Expected Contribution
A trained 'RCA-Navigator' model that demonstrates faster convergence to root causes in complex topologies, establishing a methodology for specializing LLMs in graph navigation tasks.
Required Resources
Historical incident datasets with ground truth (e.g., Train Ticket or firm-internal logs), GPU resources for RL fine-tuning (e.g., 4x A100s), and a graph simulation environment.
Source Paper
Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications