Why we are working on reliability

Software failures are a costly burden on productivity and customer trust

Troubleshooting requires engineers to embark on an urgent hunt across multiple data sources - sometimes after getting paged at 2am. In our former roles at growing startups and large‑scale organisations, we experienced the frustration firsthand. We’ve seen how critical time-to-resolution is to maintaining customer trust. We’ve also seen that the most experienced engineers - the scarcest strategic resource - are often pulled into troubleshooting tasks.

System data still requires time-consuming manual work

Modern software stacks emit tremendous volumes of observability data, code changes, infrastructure events and other data that can help anticipate and diagnose problems. But using all this data still requires time-consuming manual work. Why? Because engineers have contextual knowledge that system data lacks.

Engineers remember how parts of the system were built and how things have broken in the past. When troubleshooting incidents, these mental models help them hypothesize about problems and decide what data will help prove or disprove those hypotheses. When building new features, the same contextual knowledge helps engineers anticipate what could go wrong and build fault tolerance and testing accordingly.  

Existing AI products struggle with missing contextual knowledge

We frequently hear of the limitations of AI features from incumbent software companies: anomaly detection full of false positives and cause analysis that can’t handle the ambiguous problems it is most needed for. These limitations are driven by a lack of contextual understanding of each customer’s unique system.

System context is the most important challenge we are solving at Phoebe. Phoebe's underlying contextual knowledge enables AI agents to better understand disparate, high-volume data sources. As a result, engineering teams are already using Phoebe's Search Agent to triage alerts and diagnose ambiguous incidents - sometimes up to 90% faster.

Companies should retain control of AI reliability intelligence

We see a common trap emerging: vendors are building black-box AI features, often unnecessarily tied to expensive observability pipelines and incident workflow software. As engineering teams explore AI tools, they therefore face a dilemma: take on the cost and risk of building internally, or accept the loss of control as their investments of time, data and model improvements create vendor lock-in.

Phoebe customers retain control. Phoebe is agnostic to data sources and workflow software, so customers can change and combine solutions as the needs of different parts of their company evolve. Uniquely, the contextual knowledge Phoebe builds from modelling system attributes and causal relationships becomes an AI data asset that customers retain ownership of and can use in any other application. One user is already experimenting with Phoebe data in their Cursor IDE to anticipate potential reliability vulnerabilities.

Follow our progress

Over the coming months we will improve and extend Phoebe agents’ ability to help detect, resolve and prevent bugs and incidents. We're excited to see how they help engineers build super-resilient systems!