datavault4dbt is highly customizable by using many global variables. Since they are applied on multiple levels, a high rate of standardization across your data vault 2.0 solution is guaranteed.
Prerequisites
The default values of those variables are set inside the packages dbt_project.yml under <your_dbt_project>/dbt_packages/datavault4dbt/dbt_project.yml and should be copied to your own dbt_project.yml. Copy all variables defined under the keyword vars, and paste them under the keywordvarsin your file.
Variables and their usages
All the following variables are prefixed with datavault4dbt.
General Configuration
| Name | Usage | Explanation |
|---|---|---|
| include_business_objects_ before_appearance | Ref Table | If a Ref_Hub entry should appear in the ref_table (snapshot based), even if the snapshot date is before the first appearance of that business object. |
| enable_static_ analysis_overwrite | All macros | Relevant for Fusion compatibility. For more info, see here. |
| multi_source_models__ execution_aware_loading | Multi source entities | Whether multi source entities should respect the dbt command to reduce runtimes. |
Column aliases
| Name | Usage | Explanation |
|---|---|---|
| ldts_alias | Stage, DV entities | The name of the load-date column in all DV entities. Is generated in the staging area. |
| rsrc_alias | Stage, DV entities | The name of the record-source column in all DV entities. Is generated in the staging area. |
| ledts_alias | Version 1 Satellites | The name of the load-end date column in version 1 Satellites and MA-Satellites. |
| sdts_alias | Snapshot Table, PITs | The name of the snapshot-date column in the snapshot table (+view) and all PITs. |
| snapshot_trigger_ column | Snapshot Table, PITs, Post-Hook | The name of the column that shows the activation state of single snapshots. |
| stg_alias | Record Tracking Satellite | The name of the column, that holds info about the staging model of each record. |
| is_current_col_alias | Version 1 Satellites | The name of the column that indicates the current row per hashkey. |
| is_active_alias | Effectivity Satellite v0 | The name of the column that marks activity. Generated by the macro. |
Hash Configuration
| Name | Usage | Explanation |
|---|---|---|
| hash | Stage | What hash algorithm should be used for generating hash values. MD5, SHA1 or SHA2. |
| hash_datatype | Stage | The datatype that hash columns should have. Needs to fit the output of the used hash algorithm. |
| hashkey_input_ case_sensititve | Stage | Whether the input business keys for hashkey calculation should be case sensitive or not. |
| hashdiff_input_ case_sensititve | Stage | Whether the input descriptive attributes for hashdiff calculation should be case sensitive or not. |
| hashdiff_use_trim | Stage | Define the global default if hashdiff input columns should be wrapped with TRIM() or not. Defaults to true. Available from v1.16.0 |
Stage Configuration
| Name | Usage | Explanation |
|---|---|---|
| copy_rsrc_ldts_ input_columns | Stage | Whether the columns that are used for ldts and rsrc should also be inside the stage, or not. If true, the stage would hold the ldts- & rsrc-alias columns, and the original columns. If false, only the aliased columns are kept. |
Satellite Configuration
| Name | Usage | Explanation |
|---|---|---|
| is_active_datatype | Effectivity Satellite v0 | Controls the datatype which is used for the is_active-column. Defaults to Bit for Fabric & Synapse, Number on Oracle and Boolean on the remaining adapters. Available from v1.10.0 |
Ghost Record and Zero Key Configuration
| Name | Usage | Explanation |
|---|---|---|
| beginning_of_all_times | Stage, Satellites, PIT | The timestamp that represents your earliest technical timestamp. |
| end_of_all_times | Stage, Version 1 Satellites, PIT | The timestamp that represents your latest technical timestamp. We recommend to not use the maximum possible timestamp of your database. |
| timestamp_format | Stage, Version 1 Satellites, PIT | The timestamp format of the two previous variables. |
| beginning_of_all_times_date | Stage | The date that represents your earliest technical date. Used for ghost-record creation of columns with the date-datatype |
| end_of_all_times_date | Stage | The date that represents your latest technical date. Used for ghost-record creation of columns with the date-datatype |
| date_format | Stage | The date format of the two previous variables. |
| default_unknown_rsrc | Stage | The default unknown value for the record source column you want to use. |
| default_error_rsrc | Stage | The default error value for the record source column you want to use. |
| rsrc_default_dtype | Stage | The default datatype that should be used for the two variables above. |
| stg_default_dtype | Record Tracking Satellite | The default datatype for the ‘stg_alias’ column inside a record tracking satellite . |
| derived_columns_ default_dtype | Stage | The default datatype for derived columns, if no other datatype can be detected automatically. |
Datatype specific default values
For each datatype there is a default unknown and error value defined. Additionally, an alternative, usually much shorter value is defined. See the applied default values in the table below.
| Datatype | Error Value | Error Value alt. | Unknown Value | Unknown Value alt. |
|---|---|---|---|---|
| STRING | (error) | e | (unknown) | u |
| Numeric | -2 | -1 |
Those values are best changed by adding a global variable inside your dbt project, that is called following this pattern: `datavault4dbt._value_<(alt)>__`. If you want to change the default alternative error value for datatype STRING, you would need to set the global variable `datavault4dbt.error_value_alt__STRING` to your desired value.