Analytics, Predictive Flakiness Scoring & Onboarding 2.0

Hey team! This week we're bringing intelligence to your testing workflow. We launched comprehensive analytics dashboards for tracking quality metrics over time, Predictive Flakiness Scoring (PFS) for test suites that uses Bayesian statistics on your run history to predict which tests will fail before they break your pipeline, a completely redesigned onboarding that gets new users from signup to first test in minutes, and AI-powered test grading that reviews your tests and suggests quality improvements. These updates give you visibility, foresight, and guidance—the three pillars of confidence when shipping at speed. At desplega.ai, we believe data-driven testing means you stop trading quality for velocity. Let's dive in! 🚀
📊 Time Series Analytics: Track Your Quality Metrics Over Time
We've built a comprehensive analytics dashboard that tracks your testing metrics over time. See how many tests you've created, how test runs trend week-over-week, what your pass/fail rates look like historically, and where your coverage stands—all visualized with interactive time series charts. Filter by time period (7 days, 30 days, 90 days, all time), granularity (daily, weekly, monthly), and app configuration to drill down exactly where you need insights.
The analytics system surfaces DORA metrics that matter for engineering velocity: deployment frequency (correlated with test run frequency), change failure rate (tracked via test failures), and time to restore service (time from failure detection to fix). These industry-standard metrics help CTOs and engineering leaders measure delivery performance objectively, showing whether testing is accelerating or blocking releases.
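To make the computation concrete, here is a minimal sketch of how those three metrics could be derived from raw run records. The TestRun shape, field names, and windowing are illustrative assumptions, not the actual desplega.ai data model:

```typescript
// Minimal sketch: deriving DORA-style metrics from test run records.
// Shapes and field names are hypothetical, for illustration only.
interface TestRun {
  startedAt: Date;
  passed: boolean;
  fixedAt?: Date; // when a failing run was resolved, if known
}

function doraSnapshot(runs: TestRun[], windowDays: number) {
  const windowStart = Date.now() - windowDays * 24 * 60 * 60 * 1000;
  const recent = runs.filter((r) => r.startedAt.getTime() >= windowStart);
  const failures = recent.filter((r) => !r.passed);

  // Deployment frequency proxy: runs per day over the window.
  const deploymentFrequency = recent.length / windowDays;

  // Change failure rate: share of runs that failed.
  const changeFailureRate = recent.length ? failures.length / recent.length : 0;

  // Time to restore: mean hours from failure detection to fix, where known.
  const restored = failures.filter((r) => r.fixedAt);
  const timeToRestoreHours = restored.length
    ? restored.reduce((sum, r) => sum + (r.fixedAt!.getTime() - r.startedAt.getTime()), 0) /
      restored.length / 3_600_000
    : null;

  return { deploymentFrequency, changeFailureRate, timeToRestoreHours };
}
```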
For teams practicing continuous deployment, these analytics answer critical questions: Is test coverage increasing as we ship features? Are we catching more bugs before production? How does our test reliability trend over time? For scaling QA operations, time series data shows whether quality improves or degrades as the team grows—crucial visibility when deciding whether to hire more QA engineers or invest in automation. The charts make it easy to spot patterns: if test failures spike every Friday, maybe your CI/CD pipeline needs retry logic. If new tests plateau, maybe test discovery needs a boost. Data-driven decisions beat gut feelings when managing quality at scale.
🔮 Predictive Flakiness Scoring for Test Suites: Predict Failures Before They Happen
Flaky tests are the silent productivity killer in every CI/CD pipeline. We've extended our Predictive Flakiness Scoring (PFS) system to cover entire test suites, not just individual tests. PFS uses Bayesian statistics and historical run data to calculate the probability that a test or suite will fail due to flakiness rather than a real bug. Now you can see PFS scores for all your test suites in a dedicated Flakiness Tab in the analytics dashboard.
Here's how it works: PFS analyzes the last 100 runs of each test suite, looking at pass/fail patterns, retry behavior, and temporal correlation (does it fail at specific times or randomly?). It then assigns a flakiness score from 0 to 1, where 0 means "never flaky, always deterministic" and 1 means "constantly flaky, unreliable". Suites with high PFS scores get flagged automatically, so you can prioritize fixing them before they waste more developer time. The system distinguishes between flaky tests (high variance, passes on retry) and genuine regressions (consistent failures), giving your team clearer signals about what needs immediate attention.
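To make the idea concrete, here is a minimal sketch of a Bayesian-style flakiness score over the last 100 runs: flips between pass and fail, or failures that pass on retry, count as flaky evidence, and a Beta prior keeps the score conservative until there is enough data. The SuiteRun shape, prior values, and scoring rule are illustrative assumptions, not the actual PFS model:

```typescript
// Minimal sketch of a flakiness score over the last 100 runs.
// This is an illustrative approximation, not desplega.ai's actual PFS model.
interface SuiteRun {
  passed: boolean;
  passedOnRetry?: boolean; // failed first, then passed when retried
}

function flakinessScore(runs: SuiteRun[], priorAlpha = 1, priorBeta = 9): number {
  const recent = runs.slice(-100);
  if (recent.length < 2) return 0;

  // "Flaky evidence": the outcome flips between consecutive runs,
  // or a failure turns into a pass on retry.
  let flaky = 0;
  let total = 0;
  for (let i = 1; i < recent.length; i++) {
    total++;
    const flipped = recent[i].passed !== recent[i - 1].passed;
    if (flipped || recent[i].passedOnRetry) flaky++;
  }

  // Beta(priorAlpha, priorBeta) prior -> posterior mean of the flip probability.
  // The prior assumes suites are stable until the data says otherwise.
  return (priorAlpha + flaky) / (priorAlpha + priorBeta + total);
}

// A suite that fails every run scores low (consistent regression, not flakiness);
// a suite that alternates between pass and fail scores high.
```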
For teams managing large end-to-end testing suites, PFS is transformative. Instead of playing whack-a-mole with random test failures, you get a prioritized list of the flakiest suites ranked by impact. Focus your test maintenance budget on high-PFS suites first, and watch overall reliability improve. For startup QA teams, where engineering time is precious, PFS prevents wasting hours investigating false positives—the system tells you "this failure is probably flakiness, auto-retry worked" versus "this failure is consistent, investigate immediately." This improves developer confidence by reducing alert fatigue: when the pipeline fails, you know it's real, not noise. Combined with our self-healing tests, PFS creates a testing workflow that's both intelligent and resilient—tests adapt to changes automatically, and you know which tests need manual attention.
🎯 Onboarding 2.0: From Signup to First Test in Minutes
We've completely redesigned the new user onboarding experience to get you from "I just signed up" to "I have my first automated test running" in under 5 minutes. The new onboarding is a guided, step-by-step journey that walks you through: (1) setting up your app configuration with URL validation, (2) configuring authentication if needed, (3) creating your first test with AI assistance, and (4) running an automated discovery to find more test opportunities. Each step has inline help, validation, and progress indicators, so you always know where you are and what's next.
The onboarding system uses a split-panel layout: on the left, clear instructions and form inputs; on the right, live preview of what you're building. As you configure your app, you see a live checklist of requirements (base URL valid? login credentials set? first test created?). When you create your first test, you can watch it execute in real-time, building confidence that the system understands your app. When discovery runs, you see suggestions appear live, with a Tinder-style swipe UI to approve or skip—making building test coverage interactive and fun.
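For a sense of how such a guided flow can be modeled, here is a rough sketch of the four steps as data with completion checks driving the live checklist. All names and the AppConfig shape are hypothetical, not the real onboarding implementation:

```typescript
// Hypothetical model of the guided onboarding steps and live checklist.
interface AppConfig {
  baseUrl?: string;
  auth?: { username: string; password: string };
  firstTestId?: string;
  discoveryStarted?: boolean;
}

interface OnboardingStep {
  id: string;
  title: string;
  isComplete: (config: AppConfig) => boolean;
}

const steps: OnboardingStep[] = [
  { id: "app-config", title: "Set up your app", isComplete: (c) => Boolean(c.baseUrl) },
  { id: "auth", title: "Configure authentication", isComplete: (c) => Boolean(c.auth) },
  { id: "first-test", title: "Create your first test", isComplete: (c) => Boolean(c.firstTestId) },
  { id: "discovery", title: "Run discovery", isComplete: (c) => Boolean(c.discoveryStarted) },
];

// The progress indicator highlights the first incomplete step.
function activeStep(config: AppConfig): OnboardingStep | undefined {
  return steps.find((s) => !s.isComplete(config));
}
```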
For startups evaluating testing solutions, the streamlined onboarding drastically reduces time-to-value. No more reading 20-page setup guides or waiting for sales demos—just sign up, follow the guided flow, and see results immediately. For scaling QA teams onboarding new engineers, this self-service experience means developers can start writing tests on day one without QA team handholding. The onboarding also includes automatic organization creation via Clerk integration: when you sign up, we create your organization automatically (sanitizing the name to avoid URL-like strings Clerk rejects, as sketched below), select it for you, and drop you straight into the guided setup. No manual org creation, no navigation confusion—just a smooth path from "create account" to "running tests." This embodies shift-left testing: when onboarding is fast, developers test earlier and more often, catching issues before they reach production.
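Here is a rough sketch of that name sanitization idea, stripping protocol prefixes and domain-like punctuation before the organization is created. The exact rules desplega.ai applies, and what Clerk rejects, may differ:

```typescript
// Illustrative sketch of sanitizing an org name so it no longer looks like a URL.
// The actual rules (and Clerk's validation) may differ.
function sanitizeOrgName(raw: string): string {
  return raw
    .replace(/https?:\/\//gi, "") // strip protocol prefixes
    .replace(/www\./gi, "")       // drop www. fragments
    .replace(/[./]+/g, " ")       // break up domain-like dot/slash sequences
    .replace(/\s+/g, " ")
    .trim() || "My Organization"; // never send an empty name
}

// A user signing up from "https://acme.io" gets the organization name "acme io".
console.log(sanitizeOrgName("https://acme.io"));
```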
🧠 AI Test Grading: Get Quality Feedback on Your Tests
We've built an AI test grading system that reviews your tests and suggests improvements. The grader analyzes each test's steps, looking for issues like redundant actions, missing assertions, poorly ordered steps, or opportunities to merge similar checks. It then suggests specific changes: add a wait here (page loads slowly), merge these steps (they test the same thing), move this assertion earlier (validate sooner), or delete this step (it's redundant). Each suggestion includes a clear explanation of why the change improves test quality.
The grader uses structured outputs via LangChain tools, giving you actionable mutations: TestStepAdd to insert new steps, TestStepDelete to remove unnecessary ones, TestStepsMerge to combine redundant checks, TestStepMove to reorder for better logic flow, and TestStepDisable to temporarily turn off flaky steps while investigating. This makes test improvement a conversation: the AI suggests, you review, you apply or reject. Over time, you build higher-quality tests that are easier to maintain and more reliable.
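As a rough illustration of what those structured outputs might look like, here is a sketch of the mutation schemas as Zod types that a LangChain tool or structured-output call could bind to. Field names and shapes are assumptions, not the grader's actual tool definitions:

```typescript
import { z } from "zod";

// Illustrative schemas for the grader's mutation types; fields are assumptions.
const TestStepAdd = z.object({
  kind: z.literal("TestStepAdd"),
  index: z.number().int(),  // where to insert the new step
  instruction: z.string(),  // e.g. "wait for the cart total to render"
  reason: z.string(),       // why the grader suggests it
});

const TestStepDelete = z.object({
  kind: z.literal("TestStepDelete"),
  index: z.number().int(),
  reason: z.string(),
});

const TestStepsMerge = z.object({
  kind: z.literal("TestStepsMerge"),
  indices: z.array(z.number().int()).min(2),
  mergedInstruction: z.string(),
  reason: z.string(),
});

const TestStepMove = z.object({
  kind: z.literal("TestStepMove"),
  from: z.number().int(),
  to: z.number().int(),
  reason: z.string(),
});

const TestStepDisable = z.object({
  kind: z.literal("TestStepDisable"),
  index: z.number().int(),
  reason: z.string(),
});

// The grader returns a list of suggestions; each one is reviewed before applying.
export const GraderSuggestions = z.object({
  suggestions: z.array(
    z.discriminatedUnion("kind", [
      TestStepAdd,
      TestStepDelete,
      TestStepsMerge,
      TestStepMove,
      TestStepDisable,
    ]),
  ),
});
```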
For teams paying down QA technical debt, where old tests need cleanup, the grader automates code review for tests. Instead of manually auditing hundreds of test steps, run the grader and get a prioritized list of improvements. For teams practicing continuous deployment, the grader ensures new tests meet quality standards before they enter your CI/CD pipeline—preventing low-quality tests from becoming maintenance burdens later. The grader also helps with knowledge transfer: junior developers learn testing best practices by seeing AI suggestions on their tests. This is AI testing used correctly—not replacing human judgment, but augmenting it with intelligent suggestions that save time and improve quality.
🎨 Block Grouping & Visual Improvements: Better Test Readability
We've added block grouping to the test editor, letting you organize test steps into collapsible groups for better readability. Long tests with dozens of steps can now be structured into logical sections: "Setup," "Main Flow," "Assertions," "Cleanup." Collapse groups you're not actively editing, expand the section you're working on, and navigate large tests effortlessly. Each group shows a summary (how many steps, pass/fail status), so you can quickly spot which section failed without expanding everything.
The new BlockEditor component and grouping utilities make test authoring feel more like writing structured documents than scripting. Groups are contiguous (all steps in a group must be adjacent), automatically detected from metadata, and visually distinct (indented, collapsible headers). This improves collaboration: when a teammate opens your test, they immediately see the structure and intent, rather than a flat list of 50 cryptic steps.
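Here is a rough sketch of the contiguous-grouping idea: adjacent steps sharing a group label fold into one collapsible block with a summary header. The Step shape and field names are illustrative, not the actual BlockEditor model:

```typescript
// Rough sketch of grouping contiguous steps by a metadata label.
interface Step {
  id: string;
  groupLabel?: string; // e.g. "Setup", "Main Flow", "Assertions", "Cleanup"
  passed?: boolean;
}

interface StepGroup {
  label: string | null;
  steps: Step[];
}

function detectGroups(steps: Step[]): StepGroup[] {
  const groups: StepGroup[] = [];
  for (const step of steps) {
    const label = step.groupLabel ?? null;
    const last = groups[groups.length - 1];
    // Only adjacent steps with the same label share a group (groups are contiguous).
    if (last && last.label === label) {
      last.steps.push(step);
    } else {
      groups.push({ label, steps: [step] });
    }
  }
  return groups;
}

// Group summary for the collapsed header: step count and pass/fail status.
const summary = (g: StepGroup) =>
  `${g.label ?? "Ungrouped"}: ${g.steps.length} steps, ` +
  (g.steps.every((s) => s.passed !== false) ? "passing" : "failing");
```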
For teams with complex end-to-end testing scenarios (multi-step checkout flows, intricate user journeys), grouping dramatically improves maintainability. When a test fails, you see which group failed (e.g., "Payment Processing"), immediately narrowing your debugging scope. Combined with enhanced test context details (showing screenshots, logs, and data for each step), this gives you a complete picture of test execution. For scaling QA teams where multiple people touch the same tests, visual structure reduces cognitive load—everyone understands the test faster, making collaboration smoother.
✅ URL Validation & HTTP Health Checks: Catch Config Errors Early
We've built intelligent URL validation that checks URLs as you type them into app configuration. The system performs DNS resolution to verify the domain exists, makes an HTTP request to check the server responds, measures latency, and validates the status code. If a URL is unreachable or returns an error, you get immediate feedback with actionable suggestions: "DNS lookup failed—check the domain" or "Server returned 404—verify the path."
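As a rough sketch of that validation flow in Node.js: parse the URL, resolve DNS, then time an HTTP request and map the result to actionable feedback. The function name, types, and timeout are assumptions; desplega.ai's actual checks and messages may differ:

```typescript
import { promises as dns } from "node:dns";

// Illustrative URL health check: DNS resolution, HTTP probe, latency, status code.
interface UrlCheck {
  ok: boolean;
  latencyMs?: number;
  status?: number;
  suggestion?: string;
}

async function checkUrl(raw: string): Promise<UrlCheck> {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, suggestion: "Not a valid URL (include the protocol, e.g. https://)" };
  }

  try {
    await dns.lookup(url.hostname); // does the domain resolve at all?
  } catch {
    return { ok: false, suggestion: "DNS lookup failed—check the domain" };
  }

  const start = Date.now();
  try {
    const res = await fetch(url, { redirect: "follow", signal: AbortSignal.timeout(5000) });
    const latencyMs = Date.now() - start;
    if (!res.ok) {
      return { ok: false, latencyMs, status: res.status, suggestion: `Server returned ${res.status}—verify the path` };
    }
    return { ok: true, latencyMs, status: res.status };
  } catch {
    return { ok: false, suggestion: "Server did not respond; check that it is reachable" };
  }
}
```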
The validation includes a validated URL input component with real-time feedback: green checkmark when valid, red error with details when invalid, and loading spinner while checking. This prevents the classic mistake of configuring tests with typos in the base URL, then spending 20 minutes debugging why tests fail. Catch configuration errors at input time, not at test execution time, and spare yourself the debugging frustration.
For startups where every engineer sets up their own test environments, URL validation reduces support burden—fewer tickets saying "tests don't work" when the real issue is a misconfigured URL. For teams practicing continuous deployment with multiple staging environments, validation ensures you're testing the right environment before you waste CI minutes. The HTTP health checks also enable smarter test scheduling: if the target server is down, delay tests instead of marking them as failed, reducing false negatives that erode developer confidence.
📈 Marketing Metrics & UX Polish: Smoother Experience All Around
Beyond the major features, we've shipped a ton of UX polish this week. A new marketing metrics API helps us track user acquisition and engagement, so we can understand what drives value and iterate faster. Full-screen loading states prevent UI jank during data fetching, making the app feel snappier. User preferences now have a reset option, so you can clear onboarding state or revert customizations if needed.
We've improved auto-scroll behavior in the sidebar (scroll to active item automatically, no more hunting for your current test), refined the onboarding content and layout based on user feedback, and fixed edge cases in test agent prompts. These "small" changes (one commit message literally said "some 'small' changes I guess, praying now") add up to a significantly smoother experience across the platform.
For teams focused on developer velocity, polish matters. When the UI responds instantly, when the sidebar scrolls to the right place, when forms validate immediately, developers stay in flow instead of getting frustrated. These micro-optimizations compound: a developer who spends 30 seconds less per test creation saves hours per month, accelerating the entire team's output. This is how we think about quality at desplega.ai—big features get headlines, but polish delivers daily value.
🚀 Intelligence-Driven Testing: The Future is Data + AI
This week's releases represent a shift toward intelligence-driven testing. Analytics give you visibility into trends and patterns. Predictive flakiness scoring gives you foresight about which tests will cause problems. AI test grading gives you guidance on improving quality. URL validation and health checks give you early warnings about configuration issues. Together, these capabilities create a testing workflow that's proactive rather than reactive—you spot problems before they break production, improve quality before tests become technical debt, and scale testing without proportionally scaling headcount.
For CTOs and engineering leaders managing DORA metrics, this week's updates provide the data infrastructure needed to measure and improve engineering effectiveness objectively. Track how test coverage correlates with deployment frequency. See if fixing flaky tests reduces change failure rate. Measure whether faster onboarding increases developer productivity. Data-driven decisions beat gut feelings when optimizing for both quality and velocity.
For teams navigating product-market fit, these features remove testing as a velocity bottleneck. Ship features faster because you have confidence in your test suite (analytics show improving coverage, PFS shows low flakiness, grader ensures quality). Onboard new engineers faster because the guided flow gets them productive immediately. Reduce customer churn caused by quality issues because you catch bugs before they reach customers. At desplega.ai, we believe intelligence-driven testing is how you stop trading quality for speed—you get both, sustainably.
🔮 What's Next: Smarter Predictions, Deeper Integrations
With analytics and PFS infrastructure in place, we're exploring advanced predictions: which tests are most likely to catch regressions for specific code changes (smart test selection), when to automatically disable flaky tests (quality gates), and how to predict future test coverage needs based on feature roadmap. We're also building deeper integrations: Slack notifications for PFS alerts (get warned when a suite becomes flaky), GitHub App for automatic test suggestions on pull requests, and dashboard widgets showing real-time quality metrics.
We'd love feedback on this week's intelligence features! Do the analytics dashboards surface actionable insights? Is PFS accurately predicting flaky tests? Does the new onboarding get you to value faster? How's the test grading helping your team? Reach out at contact@desplega.ai or book a demo to see these features in action—we can't wait to hear how data-driven testing transforms your workflow!
Ready for Intelligence-Driven Testing?
Track metrics over time, predict flaky tests before they slow you down, streamline onboarding, and improve quality with AI grading. Testing that gets smarter as you use it.