Milliseconds matter: How Stack Overflow uses Grafana to optimize its systems
Founded in 2008, Stack Overflow is the largest, most trusted online community for developers to learn, share their knowledge, and build their careers. More than 50 million professional and aspiring programmers visit Stack Overflow each month to help solve coding problems, develop new skills, and find job opportunities. Grafana Labs spoke with Kyle Brandt, Director, Site Reliability Engineering at Stack Overflow, to learn about the various data sources and monitoring systems they rely on, the metric visualization problems they were having, and how not all dashboards are created equal.
Stack Overflow’s custom monitoring tool Opserver provided purpose-built dashboards, but lacked a way to easily create custom self-service dashboards. They needed a tool to allow their developers and SRE teams the freedom to quickly create custom-tailored dashboards to visualize data from OpenTSDB, Elasticsearch, and their alerting tools, all in the same experience. They wanted to empower the teams that knew the data the best.
Stack Overflow’s Ad-Server team was the first to discover Grafana. They were searching for a tool to create custom server latency dashboards. Displaying ads on a website is latency sensitive; a millisecond delay can have a huge impact on revenue. Server latency also affects which ads are displayed to whom on the site: the quicker the ad is served, the more targeted it can be for the user. Grafana’s real-time dashboards were critical in showing the Ad-Server team where they could optimize server performance. Grafana quickly spread from the Ad-Server team to other teams at Stack Overflow, since it can visualize data from many different data sources, both open source and commercial. For Stack Overflow, this meant OpenTSDB data could be visualized alongside Elasticsearch data, which could in turn be viewed alongside their custom alerting data from Bosun.
The Bosun alerting system and a new Grafana plugin
Bosun is an open source alerting system Stack Overflow created. It has an expressive domain-specific language for evaluating alerts and creating detailed notifications. It also tests alerts against history for a faster development experience. Bosun is robust, but comes with a complex user interface and an often steep learning curve. Its visualization options are also limited. In a recent GrafanaCon talk, Kyle described monitoring as “a medium for humans to communicate with other humans through machines.” The Bosun project reflects a deep understanding of the impact of alerting on culture, which Kyle is extremely sensitive to. An intuitive UI and consistent user experience are key to making complex systems easier to understand, something Grafana has always prioritized. So the team decided to build a plugin to bring the power of Bosun into Grafana’s user-friendly interface.
Grafana’s plugin architecture allowed the Stack Overflow team to create a data source plugin for Bosun to visualize Bosun alerting data directly in Grafana.
Since Grafana is more user-friendly, people tend to pick it up more naturally. We have turned Grafana users into Bosun consumers, and soon hope to turn Bosun consumers into Bosun authors.
– Kyle Brandt, Director, Site Reliability, Stack Overflow
The plugin allows teams to use the Bosun expression language inside Grafana to achieve visualizations not previously possible. Grafana can also display annotations created from Bosun, adding valuable context to various metrics behavior. Showing relevant alerts directly on Grafana dashboards also provides the benefit of having to look in fewer places for information. This consolidation provides actionable insight at a critical moment – right when they’re viewing the data. The Bosun team didn’t keep this new plugin for themselves; it is freely available for download at Grafana Labs’ Plugin Repository.
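To make the idea of alert annotations adding context to a metric concrete, here is a minimal Python sketch. It does not use the Bosun or Grafana APIs; the `Annotation` class, the `annotate_points` function, and the sample data are all hypothetical and only illustrate how alert time windows might be lined up against a latency series.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    """A hypothetical alert window; not an actual Bosun or Grafana type."""
    start: int  # epoch seconds when the alert fired
    end: int    # epoch seconds when the alert cleared
    text: str   # short description shown next to the data


def annotate_points(
    points: List[Tuple[int, float]],
    annotations: List[Annotation],
) -> List[Tuple[int, float, Optional[str]]]:
    """Attach to each (timestamp, value) the text of the first alert covering it."""
    result = []
    for ts, value in points:
        label = next(
            (a.text for a in annotations if a.start <= ts <= a.end),
            None,  # no alert was active at this timestamp
        )
        result.append((ts, value, label))
    return result


# Made-up latency samples (timestamp, milliseconds) and one made-up alert.
latency_ms = [(100, 12.0), (160, 48.5), (220, 51.2), (280, 13.1)]
alerts = [Annotation(start=150, end=230, text="ad-server latency high")]

annotated = annotate_points(latency_ms, alerts)
for ts, value, label in annotated:
    print(ts, value, label or "")
```

The point of the sketch is that the reader of the series sees the alert text next to the samples it explains, rather than having to look it up in a separate tool.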
Stack Overflow is focused on the audience and placing yourself in their shoes. Think of what they know and what they might not know. Don’t tell them what they need to know; show them what they need to know on a Grafana dashboard.
– Kyle Brandt, Director, Site Reliability, Stack Overflow
Grafana allows teams across Stack Overflow to quickly and easily build custom self-service dashboards for what’s important to them, no matter where the data lives or which database it’s stored in. Because Grafana is open source and has a robust plugin architecture, the Bosun team was able to create a plugin to leverage its powerful alerting system, and can now visualize the data in new ways. The new plugin empowers users new to Bosun to write queries and set alerts directly from Grafana’s UI, and offers the flexibility to use Bosun’s native expression language. With the popularity of the Bosun plugin internally, the team shared the plugin with the entire community, and it has been installed thousands of times by users of both projects.