I just open-sourced my Proof of Concept for Agentic Test Explorer, a product-agnostic, AI-driven exploratory testing framework built with LangGraph.
Project repository: https://github.com/srbarrios/agentic-test-explorer
What started as a private project for a specific application has been completely refactored to be adaptable to any web stack.
Configure it for your stack via a small config.yaml, point it at your app, and let specialized agents drive a real browser to find bugs, render anomalies, and unscripted edge cases.
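As a rough illustration of what that configuration might contain (the keys below are assumptions for the sketch, not the project's actual schema — check the repository for the real one):

```yaml
# config.yaml -- illustrative sketch only; key names are assumptions.
app:
  base_url: https://my-app.example.com   # hypothetical target app
llm:
  provider: claude                        # or "gemini"
browser:
  headless: true
selectors:
  priority: [data-test-subj, aria-label, visible-text]
```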
It gathers test context directly from Pull Requests, documentation, or API specifications via MCP tools.
The framework is built on a Supervisor-Worker Swarm pattern powered by LangGraph, Playwright, and your choice of Claude (default) or Google Gemini.
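To make the Supervisor-Worker pattern concrete, here is a minimal, framework-free sketch of the routing idea (node names and state fields are invented for illustration; the real project expresses this as a LangGraph graph):

```python
# Minimal supervisor-worker loop; a stand-in for a LangGraph routing graph.
# Node names and state fields are illustrative, not the project's actual ones.
from typing import Callable

def navigator(state: dict) -> dict:
    """Worker: explore the app and mark the page as visited."""
    state["visited"] = True
    return state

def bug_reporter(state: dict) -> dict:
    """Worker: write up any bug found during exploration."""
    state["report_written"] = True
    return state

def supervisor(state: dict) -> str:
    """Inspect workspace state and pick the next worker, or finish."""
    if not state.get("visited"):
        return "navigator"
    if state.get("bug_found") and not state.get("report_written"):
        return "bug_reporter"
    return "END"

WORKERS: dict[str, Callable[[dict], dict]] = {
    "navigator": navigator,
    "bug_reporter": bug_reporter,
}

def run_mission(state: dict) -> dict:
    # The supervisor re-evaluates state after every worker step.
    while (nxt := supervisor(state)) != "END":
        state = WORKERS[nxt](state)
    return state

print(run_mission({"bug_found": True}))
```

The key design choice is that only the supervisor decides control flow; workers just mutate shared state and return it.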
Based on the mission type, the system spins up either a Standard or an Advanced routing graph.
A Supervisor node dynamically evaluates the workspace state and dispatches control to specialized worker nodes. Agents never touch the browser directly. Instead they emit strict JSON intents to a Record-and-Translate Browser Engine, which validates selectors, executes commands with Playwright, and captures an Accessibility Tree / DOM snapshot.
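The intent contract can be imagined like this (the schema and field names below are assumptions for illustration; the real engine defines its own fields and validation):

```python
import json

# Hypothetical JSON intent contract: agents emit JSON, never Playwright calls.
ALLOWED_ACTIONS = {"click", "fill", "goto"}

def validate_intent(raw: str) -> dict:
    """Parse a strict-JSON intent and reject anything outside the contract."""
    intent = json.loads(raw)
    if intent.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {intent.get('action')!r}")
    if intent["action"] in {"click", "fill"} and "selector" not in intent:
        raise ValueError("click/fill intents require a selector")
    return intent

raw = '{"action": "fill", "selector": "[data-test-subj=login-user]", "value": "alice"}'
intent = validate_intent(raw)
# A translate step would then map the validated intent onto Playwright,
# e.g. page.fill(intent["selector"], intent["value"]), and record it.
print(intent["action"], intent["selector"])
```

Keeping the LLM on the JSON side of this boundary is what lets the engine validate selectors and replay every action deterministically.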
- A small `config.yaml` adapts the framework to any web app.
- Every detected bug is captured as a `reproduction_*.spec.ts` Playwright script.
- Selectors are resolved with a `data-test-subj` > `aria-label` > visible text priority.
- Domain knowledge is plugged in through `mcp_servers.json`.
- Skills placed under `AGENT_SKILLS_ROOT` are exposed automatically.
- Each run emits logs, `.spec.ts` files, and an executive Markdown report.

```shell
# Install
pip install -e .
playwright install chromium

# Authenticate against your app
agent-auth

# Run a standard QA mission
agent-explorer --missions missions/new_user_agent.yaml

# Run with visible browser
agent-explorer --missions missions/explorer_agent.yaml --headed

# Generate tests from a PR
agent-explorer --pr-url https://github.com/org/repo/pull/123 --execute --headed
```
For every mission, the framework generates a report_<thread_id>/ directory containing:
- `traces.log` – Full audit trail of every thought, plan, and tool invocation.
- `test_report.md` – Concise executive summary (objective, actions, bugs, PASS/FAIL).
- `action_tape.jsonl` – Line-delimited JSON log of every browser command.
- `reproduction_*.spec.ts` – Auto-generated Playwright tests, one per bug detected.
- `screenshots/` – Image evidence captured for every detected bug.

My goal is to evolve this autonomous QA approach, and community feedback is critical to shaping its direction. If you are an SDET, a QE Architect, or simply interested in the future of AI in software testing, I would appreciate it if you could try it out and share your thoughts. Contributions are welcome.