A More Efficient Way to Investigate with Datadog

Role
Senior Product Designer — Datadog

Focus
Strategy, UX Research, Visual Design, Platform Design


Introduction

Datadog Case Management helps teams track, manage, and resolve application issues in one place. Users can create cases from any signal and escalate them to formal incidents without switching apps.

The Problem

Handling an urgent software issue is like being a first responder for the internet: intense and a bit extreme. Today, the typical incident management journey is fragmented across many applications.


The Many Components of Incident Response

During an incident, the on-call engineer is alerted, often in the middle of the night, and must investigate and resolve customer issues across several different apps.


Customer Quote

“We use New Relic for application monitoring, Datadog for logs, BigPanda for correlation, and ServiceNow for tickets... It's a mess.”

— DevOps Engineer at Citizens Bank


The Vision

Fix the fragmented incident investigation experience by bringing it into one platform, so users can move fluidly from raw alerting data to case investigation to formal incident response.


One Team View for Prioritization and Ownership

With a centralized ticketing system for tracking, triaging, and troubleshooting issues across the Datadog platform, users can prioritize cases and assign ownership from a single view.


Making Cases a Single Source of Truth

By integrating ticketing with raw observability data, Case Management supports better root cause analysis and collaboration without context switching. Users can build a source of truth for each issue by attaching Datadog signals, graphs, and telemetry data to the case, where they render as linkable cards.


Building Out a True Platform Experience

To bring the vision to life, we built an escalation toolkit so other product teams could integrate investigation workflows into their own product areas.


Impact

After launch, Case Management saw strong adoption. In its first year, the product achieved substantial growth and customer retention, helping establish Datadog as a top player in the AIOps industry.


15% Reduction

In MTTR (mean time to resolve) across our top orgs


40 New Orgs

Using Case Management at the GA launch


6 Products

Integrated investigation workflows into their views: Monitors, Security Signals, Error Tracking, Cloud Costs, Watchdog, and Incidents


2 SKUs

Launched and monetized as part of this product effort: Event Correlation and Security Case Management


Datadog named a Leader in AIOps by independent research firm Forrester