How Spreadsheets are created and updated

Each time Opsgrid receives metrics, it will create or update a top-level folder called "opsgrid" and a Spreadsheet named after the Telegraf host. You can delete the folder or spreadsheets at any time, but they will be recreated as long as you send metrics to Opsgrid.

Before writing metrics to a sheet, Opsgrid resizes the grid to maximize retention. Currently, it uses a static value of 4 million of the 5 million limit imposed by Google. For example, with 10 columns the sheet will be sized to 4 million / 10 == 400,000 rows.

Rows are written circularly once the sheet is full (to limit slow row additions/removals). This means the most recent row is not always at the top of the sheet. For an easy view into your most recent data, use the demo sheet as an example; it has notes on cells A1 and A2 explaining how to recreate it.

Adding metrics will automatically append new columns and resize the sheet to maintain the cell limit. Rows are removed oldest-first if you are already at the cell limit.

Safely changing existing sheets

Opsgrid only operates on the first sheet in a spreadsheet (called "opsgrid"). Feel free to create other sheets to process your data.

If you need to change the opsgrid sheet, most changes that don't affect the number of rows are safe. For example, you can reorder or delete columns without affecting Opsgrid's behavior.

Column name generation

Telegraf often uses input tags to signify what a metric applies to. To fit these into a 2d format, Opsgrid combines tags to form a column header. This is done by ordering the tags and joining their values with ":", then appending the result to the input and field name.

For example, the disk input has four tags: device, fstype, mode, and path. With a typical root mount this might generate a column header like "disk:free:sda1:ext4:rw:/".

You're encouraged to use settings like taginclude in your Telegraf config to control what tags end up in your headers.

Alerting on historical data

Opsgrid only considers a single row of metrics when processing alerts. Still, it's possible to alert on values like "average cpu over the past 10 minutes" by using aggregator plugins such as basicstats. Each of the aggregated values will be treated as a normal metric by Opsgrid, getting its own column and showing up when configuring alerts.

Making dashboards in Google Sheets

You can make live graphs of your metrics right in Google Sheets. Here's the recommended way to do this:

  • make a new sheet within your spreadsheet
  • set up that sheet to track the most recent x rows from the opsgrid sheet
  • make a column that parses timestamps to a Google Sheets datetime
  • use timeline charts to track metrics you're interested in

There's a live example of this setup available here. Click on cells B1, A2, and B2 to see notes explaining how to replicate it.