MOLT Verify transformations and filter rules #22700

Open
taroface wants to merge 1 commit into main from molt-verify-transformations

netlify bot commented Feb 18, 2026

Deploy Preview for cockroachdb-api-docs canceled.

Name Link
🔨 Latest commit fd4dd4c
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-api-docs/deploys/699639d37ad2c6000856f16e

netlify bot commented Feb 18, 2026

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name Link
🔨 Latest commit fd4dd4c
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/699639d39345d2000864abe8

netlify bot commented Feb 18, 2026

Netlify Preview

Name Link
🔨 Latest commit fd4dd4c
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/699639d3d7396d00082a9ca7
😎 Deploy Preview https://deploy-preview-22700--cockroachdb-docs.netlify.app

@KeithCh KeithCh left a comment

LGTM!

@bsanchez-the-roach bsanchez-the-roach left a comment

Some comments are minor nits about phrasing that you can take or leave, others are slightly more substantial.

`--row-batch-size` | Number of rows to get from a table at a time. <br>**Default:** 20000
`--schema-filter` | Verify schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).<br><br>**Default:** `'.*'`
`--table-filter` | Verify tables that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).<br><br>**Default:** `'.*'`
`--transformations-file` | Path to a JSON file that defines transformation rules applied during comparison to verify data that was transformed during [fetch]({% link molt/molt-fetch.md %}#transformations). Use the same transformation file from `molt fetch`. Refer to [Verify transformed data](#verify-transformed-data).

Suggested change
`--transformations-file` | Path to a JSON file that defines transformation rules applied during comparison to verify data that was transformed during [fetch]({% link molt/molt-fetch.md %}#transformations). Use the same transformation file from `molt fetch`. Refer to [Verify transformed data](#verify-transformed-data).
`--transformations-file` | Path to a JSON file that defines transformation rules to be applied during comparison. If verifying data that was [transformed during a bulk load with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), use the same transformation file from that `molt fetch` run. Refer to [Verify transformed data](#verify-transformed-data).

Filter rules apply `WHERE` clauses to specified tables during verification. Columns referenced in filter expressions **must** be indexed.

{{site.data.alerts.callout_info}}
Only PostgreSQL and MySQL sources are supported for selective data verification.

Suggested change
Only PostgreSQL and MySQL sources are supported for selective data verification.
Selective data verification is only supported for PostgreSQL and MySQL sources.

- `resource_specifier`: Identifies which schemas and tables to filter. Schema and table names are case-insensitive.
    - `schema`: Schema name containing the table.
    - `table`: Table name to apply the filter to.
- `expr`: SQL expression that applies to both source and target databases. The expression must be valid for both database dialects.

Does "for both database dialects" mean both the source and the target database dialects? I assume so, but might be good to say that explicitly to avoid ambiguity. (Especially because this comes not long after the callout about both Postgres and MySQL so that's where my head jumped to when considering "two dialects").


#### Step 1. Create a filter rules file

Create a JSON file that defines the filter rules. The following example defines filter rules on two tables, `public.filtertbl` and `public.filtertbl2`:

Are those the names of the schemas/tables on the source or on the target? Or must they match? (Maybe that latter question relates to the transformation content that I haven't yet read).
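
For orientation, the two rules that Step 1 describes could be assembled as in the following Python sketch. The field names (`resource_specifier`, `schema`, `table`, `expr`, `source_expr`, `target_expr`) come from the quoted docs. The column names, the expressions, and the bare-list layout are assumptions made for illustration, since the excerpt does not show the actual file contents or any enclosing top-level key.

```python
import json

# Field names are taken from the quoted docs. Everything else here
# (column names, expressions, bare-list layout) is hypothetical.
rules = [
    {
        # One expression shared by the source and the target.
        "resource_specifier": {"schema": "public", "table": "filtertbl"},
        "expr": "id < 1000",
    },
    {
        # Separate expressions, one per side, defined together.
        "resource_specifier": {"schema": "public", "table": "filtertbl2"},
        "source_expr": "created_at >= '2025-01-01'",
        "target_expr": "created_at >= '2025-01-01'",
    },
]

# Serialize the rules so they can be inspected or written to a file.
rendered = json.dumps(rules, indent=2)
print(rendered)
```

The sketch does not settle the question above about whether the names refer to the source or the target; it only shows the two rule shapes side by side.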

- `schema`: Schema name containing the table.
- `table`: Table name to apply the filter to.
- `expr`: SQL expression that applies to both source and target databases. The expression must be valid for both database dialects.
- `source_expr` and `target_expr`: SQL expressions that apply to the source and target databases, respectively. These must be defined together, and cannot be used with `expr`.

I think I might want a bit more understanding about how this works. So are there three total options: `source_expr`, `target_expr`, and `expr`? Which are optional, and which are mutually exclusive? Are these filter expressions applied sequentially, like first the `source_expr` does one round of filtering then the `target_expr` does another? Does the `expr` filter get applied in between?

I also wonder if elaborating on all of that info is best done in a conceptual doc as opposed to a how-to (though I don't have a strong opinion about that at the moment).
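
One reading of the quoted rule ("These must be defined together, and cannot be used with `expr`") can be sketched as a small check in Python. This is an illustration of the stated constraint, not MOLT's implementation. The function name `check_filter_rule` is hypothetical, and rejecting a rule with no expression at all is an assumption that the quoted text does not confirm.

```python
def check_filter_rule(rule: dict) -> str:
    """Classify a filter rule as 'shared' or 'per-side'.

    Raises ValueError when the rule violates the constraint quoted
    in the docs. Hypothetical helper, not part of MOLT.
    """
    has_expr = "expr" in rule
    has_source = "source_expr" in rule
    has_target = "target_expr" in rule

    # source_expr and target_expr must be defined together.
    if has_source != has_target:
        raise ValueError("source_expr and target_expr must be defined together")

    # expr cannot be combined with the per-side expressions.
    if has_expr and has_source:
        raise ValueError("expr cannot be used with source_expr/target_expr")

    # Assumption: a rule needs at least one expression to be useful.
    if not has_expr and not has_source:
        raise ValueError("rule defines no filter expression")

    return "shared" if has_expr else "per-side"
```

Under this reading there are two mutually exclusive forms, one shared expression or one pair of per-side expressions, which is how the sketch interprets the "three options" asked about above. Whether and in what order MOLT applies them is not covered by the quoted text.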


- `resource_specifier`: Identifies which schemas and tables to filter. Schema and table names are case-insensitive.
    - `schema`: Schema name containing the table.
    - `table`: Table name to apply the filter to.

Suggested change
- `table`: Table name to apply the filter to.
- `table`: Name of the table to apply the filter to.

~~~

- `resource_specifier`: Identifies which schemas and tables to transform. Schema and table names are case-insensitive.
    - `schema`: Schema name containing the table.

Suggested change
- `schema`: Schema name containing the table.
- `schema`: Name of the schema containing the table.

}
~~~

- `resource_specifier`: Identifies which schemas and tables to transform. Schema and table names are case-insensitive.

For both schema and table: these are the names on the source, I assume? Should probably state that for clarity.

- `table_rename_opts`: Rename the table on the target database.
    - `value`: The target table name to compare against.
- `schema_rename_opts`: Rename the schema on the target database.
    - `value`: The target schema name to compare against.

Similar question as above re: which are optional, which are mutually exclusive. If I only wanted to rename the schema, would I even need to include the "table" item?
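
To make the rename fields concrete, the following Python sketch resolves the target name for a source table from a list of transformation rules. It is an illustration of the quoted field descriptions, not MOLT's behavior. The function `target_name`, the sample table names, the exact-name (non-pattern) matching, the first-match-wins order, and treating both rename blocks as optional are all assumptions.

```python
def target_name(schema: str, table: str, rules: list) -> tuple:
    """Return the (schema, table) pair to compare against on the target.

    Hypothetical helper. Matching is case-insensitive, as the quoted
    docs state for schema and table names.
    """
    for rule in rules:
        spec = rule["resource_specifier"]
        same_schema = spec["schema"].lower() == schema.lower()
        same_table = spec["table"].lower() == table.lower()
        if same_schema and same_table:
            # Assumption: a missing rename block leaves that name unchanged.
            new_schema = rule.get("schema_rename_opts", {}).get("value", schema)
            new_table = rule.get("table_rename_opts", {}).get("value", table)
            return new_schema, new_table
    # No rule matched, so compare against the same names on the target.
    return schema, table


# Hypothetical rules: rename one table, and move another to a new schema.
example_rules = [
    {
        "resource_specifier": {"schema": "public", "table": "orders"},
        "table_rename_opts": {"value": "orders_v2"},
    },
    {
        "resource_specifier": {"schema": "public", "table": "users"},
        "schema_rename_opts": {"value": "app"},
    },
]
```

The sketch treats the names in `resource_specifier` as source-side names, which is the reading raised earlier in this review. It does not answer whether `table` can be omitted when only the schema is renamed.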

## Known limitations

- MOLT Verify compares 20,000 rows at a time by default, and row values can change between batches, potentially resulting in temporary inconsistencies in data. To configure the row batch size, use the `--row-batch-size` [flag](#flags).
- MOLT Verify only supports comparing one MySQL database to a whole CockroachDB schema (which is assumed to be `public`).

I'm confused by this. Does this not contradict the fact that MySQL sources are supported for selective data verification?
