Add automated monitoring checks

Last modified on 10-Jul-24

This feature is not supported in Soda Core OSS.
Migrate to Soda Library in minutes to start using this feature for free with a 45-day trial.

Use automated monitoring checks to instruct Soda to automatically check for row count anomalies and schema changes in a dataset.

automated monitoring:
  datasets:
    - include %
    - exclude test%

✔️    Requires Soda Core Scientific (included in a Soda Agent)
✖️    Supported in Soda Core
✔️    Supported in Soda Library + Soda Cloud
✔️    Supported in Soda Cloud Agreements + Soda Agent

✖️ Available as no-code checks

About automated monitoring checks
Add automated monitoring checks
Add quotes to all datasets
Go further

About automated monitoring checks

When you add automated monitoring checks to a data source connected to your Soda Cloud account via a self-hosted agent, Soda prepares and executes two checks on every dataset that your configuration includes.

Anomaly score check on row count: This check counts the number of rows in a dataset during a scan and flags counts that are anomalous relative to previous measurements of the row count metric. Refer to Anomaly score checks for details.
Anomaly score checks require a minimum of four data points (four scans at stable intervals) to establish a baseline against which to gauge anomalies. If you do not see check results immediately, allow Soda Library to accumulate the necessary data points for relative comparison.

Schema evolution check: This check monitors schema changes in datasets, including column additions, column deletions, data type changes, and index changes. By default, this automated check fails if a column is deleted, its type changes, or its index changes; it warns if a column is added. Refer to Schema checks for details.
Schema checks require a minimum of one data point to use as a baseline against which to gauge schema changes. If you do not see check results immediately, wait until you have scanned the dataset twice.
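
For comparison, the two automated checks correspond roughly to the following hand-written SodaCL checks. This is a sketch only: the dataset name orders is a placeholder, and whether Soda generates exactly this configuration internally is an assumption. The warn and fail lists mirror the default behavior described above.

```yaml
# Sketch: hand-written checks resembling what automated monitoring adds.
# "orders" is a placeholder dataset name, not one from this page's setup.
checks for orders:
  # Flags row counts that are anomalous relative to earlier scans
  - anomaly score for row_count < default
  # Mirrors the default schema evolution behavior: warn on added columns,
  # fail on deleted columns, type changes, and index changes
  - schema:
      warn:
        when schema changes:
          - column add
      fail:
        when schema changes:
          - column delete
          - column type change
          - column index change
```

Writing the checks by hand like this may be useful if you want different warn and fail behavior than the automated defaults.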

Add automated monitoring checks

You can add automated monitoring checks as part of the guided workflow to create a new data source, but only in deployment models that use a self-hosted Soda Agent, not a Soda-hosted agent. For a Soda-hosted agent, consider using the automated anomaly dashboard for observability into basic data quality in your datasets.

If you are using a self-operated deployment model that leverages Soda Library, add the automated monitoring configuration outlined below to your checks YAML file.

In Soda Cloud, navigate to your avatar > Data Sources > New Data Source to begin.

In step 5, Check, of the guided workflow, you can list the datasets to which you wish to automatically add anomaly score and schema evolution checks. (Note that if you have signed up for early access to anomaly dashboards for datasets, this Check tab is unavailable because Soda performs all automated monitoring in the dashboards.)

The example below uses a wildcard character (%) to instruct Soda Library to execute automated monitoring checks against all datasets with names that begin with prod, and to skip any dataset with a name that begins with test.

automated monitoring:
  datasets:
    - include prod%
    - exclude test%

You can also specify individual datasets to include or exclude, as in the following example.

automated monitoring:
  datasets:
    - include orders
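
Wildcard patterns and individual dataset names can presumably be combined in one list. The sketch below assumes that multiple include and exclude entries are accepted together; the dataset names are placeholders.

```yaml
automated monitoring:
  datasets:
    # All datasets whose names begin with prod
    - include prod%
    # One specific dataset outside the prod prefix
    - include orders
    # Skip any dataset whose name begins with test
    - exclude test%
```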

Scan results in Soda Cloud

To review the check results for automated monitoring checks in Soda Cloud, you can:

navigate to the Checks dashboard to see the check results
navigate to the Datasets dashboard to see the check results for an individual dataset

Add quotes to all datasets

If your dataset names contain spaces or special characters, you must wrap those dataset names in quotes whenever you identify them to Soda, such as in a checks YAML file.

To add those necessary quotes to dataset names that Soda acts upon automatically – discovering, profiling, or sampling datasets, or creating automated monitoring checks – you can add a quote_tables configuration to your data source, as in the following example.

data_source soda_demo:
  type: sqlserver
  host: localhost
  username: ${SQL_USERNAME}
  password: ${SQL_PASSWORD}
  quote_tables: true

Go further

Need help? Join the Soda community on Slack.
Learn more about the anomaly dashboard for datasets.
Reference tips and best practices for SodaCL.
Use a freshness check to gauge how recently your data was captured.
Use reference checks to compare the values of one column to another.

Documentation always applies to the latest version of Soda products