Data Quality Report: customer_transactions

May 20, 2025
production.sales
Transaction: 0000012abd56e4
Overview
Data Profiling
Anomalies
Schema Changes
Data Lineage
Rule Validation

92%

Quality Score

3

Anomalies Detected

2

Schema Changes

1.2M

Records Processed

Data Quality Score

92%

Quality Assessment

Completeness

98%

Consistency

95%

Accuracy

87%

Timeliness

93%

Critical Issues

Anomaly Detected: transaction_amount

Z-score threshold exceeded (4.2)

Unusual spike in transaction amounts detected in the last 24 hours. Values exceed 3 standard deviations from the mean.

Rule Violation: customer_id

Uniqueness constraint failed

3 duplicate customer_id values found in recent transactions. This violates the primary key constraint.

Recent Activity

Schema Change

Added field: customer_segment (STRING)

May 18, 2025 00000128ab42f9

Anomaly Detected

Unusual pattern in transaction_fee column

May 17, 2025 Automated Detection

Rule Validation

All 24 data quality rules passed successfully

May 16, 2025 Daily Check

Column Statistics

Distribution Histogram

$0
$250
$500
$750
$1000+
Frequency

Statistical Summary

  • Min Value $0.99
  • Max Value $1,249.99
  • Mean $142.87
  • Median $118.50
  • Standard Deviation $87.32
  • 25th Percentile $79.99
  • 75th Percentile $189.99
  • IQR $110.00
  • Skewness 1.87 (Right-skewed)
  • Null Count 0 (0%)

Correlation Analysis

transaction_amount transaction_fee customer_age items_count
transaction_amount 1.00 0.78 0.12 0.92
transaction_fee 0.78 1.00 0.08 0.65
customer_age 0.12 0.08 1.00 0.21
items_count 0.92 0.65 0.21 1.00
Strong (>0.7)
Medium (0.4-0.7)
Weak (<0.4)

Unique Value Analysis

customer_id

Distinct Values 845,231
Uniqueness 99.7%
Top Value CID-78342 (12 occurrences)

product_id

Distinct Values 5,842
Uniqueness 0.5%
Top Value PID-1001 (8,742 occurrences)

store_id

Distinct Values 128
Uniqueness 0.01%
Top Value STORE-042 (24,531 occurrences)

Detected Anomalies

High

Z-Score Anomaly: transaction_amount

Values detected that exceed 3 standard deviations from the mean.

May 14 May 15 May 16 May 17 May 18 May 19 +3σ Mean -3σ
Detection Method Z-Score (threshold: 3.0)
Affected Records 42 (0.0035%)
Max Z-Score 4.2
Medium

IQR Anomaly: transaction_fee

Values detected that fall outside 1.5 times the interquartile range.

$0.50 $2.50 Q1: $5.00 Median: $7.50 Q3: $10.00 $12.50 $15.00 Transaction Fee Distribution
Detection Method IQR (threshold: 1.5)
Affected Records 128 (0.0107%)
Outlier Range Below $0.50, Above $12.50
Low

Sudden Change: daily_transactions

Sudden change detected in the number of daily transactions.

Anomaly May 14 May 15 May 16 May 17 May 18 May 19 50K 40K 30K 20K 10K
Detection Method Sudden Change (threshold: 25%)
Change Magnitude -18.2%
Date Detected May 19, 2025

Anomaly Detection Settings

Z-Score Detection

Enabled Yes
Threshold 3.0
Applied To All numeric columns

IQR Detection

Enabled Yes
Threshold 1.5
Applied To Selected columns only

Sudden Change Detection

Enabled Yes
Threshold 25%
Window Size 7 days

Oscillation Detection

Enabled Yes
Sensitivity Medium
Min Oscillations 3

Schema Evolution

Field Type Change Date Transaction
customer_segment STRING Added May 18, 2025 00000128ab42f9
transaction_fee DECIMAL(10,2) Modified May 15, 2025 00000127de31c8
promotion_code STRING Removed May 10, 2025 00000126fc29b7
payment_method STRING Added May 5, 2025 00000125ba18a6
customer_location STRUCT Modified May 1, 2025 00000124a907b5

Change Details: transaction_fee

Previous Type DECIMAL(8,2)
New Type DECIMAL(10,2)
Reason Increased precision for international transactions
Changed By data_pipeline_job

Current Schema

Field Type Nullable Description
transaction_id STRING No Unique identifier for the transaction
customer_id STRING No Customer identifier
transaction_date TIMESTAMP No Date and time of the transaction
transaction_amount DECIMAL(10,2) No Amount of the transaction
transaction_fee DECIMAL(10,2) Yes Fee associated with the transaction
product_id STRING No Product identifier
store_id STRING No Store identifier
payment_method STRING Yes Method of payment
customer_segment STRING Yes Customer segment classification
customer_location STRUCT Yes Customer location information

Data Lineage Visualization

customer_dim Dimension Table transaction_metrics Derived Table customer_transactions Delta Table sales_summary Aggregate Table customer_segmentation ML Model Output
customer_transactions Delta Table
raw_transactions Source Table
customer_data Dimension Table
transaction_metrics Derived Table
customer_insights Derived Table

Selected Node: customer_transactions

Full Path production.sales.customer_transactions
Type Delta Lake Table
Last Updated May 20, 2025 08:45 AM
Upstream Dependencies 2 tables
Downstream Dependencies 2 tables

Transformation Details

raw_transactions → customer_transactions

-- SQL Transformation
SELECT
  rt.transaction_id,
  rt.customer_id,
  rt.transaction_date,
  rt.transaction_amount,
  rt.transaction_fee,
  rt.product_id,
  rt.store_id,
  rt.payment_method,
  cd.customer_segment,
  cd.customer_location
FROM
  raw_transactions rt
JOIN
  customer_data cd ON rt.customer_id = cd.customer_id
WHERE
  rt.transaction_date >= current_date() - INTERVAL 30 DAYS
Last Run: May 20, 2025 Duration: 4m 12s Status: Success

customer_transactions → transaction_metrics

-- SQL Transformation
SELECT
  store_id,
  DATE(transaction_date) as transaction_day,
  COUNT(*) as daily_transactions,
  SUM(transaction_amount) as daily_amount,
  AVG(transaction_amount) as avg_transaction_value,
  MIN(transaction_amount) as min_transaction_value,
  MAX(transaction_amount) as max_transaction_value
FROM
  customer_transactions
GROUP BY
  store_id, DATE(transaction_date)
ORDER BY
  store_id, transaction_day
Last Run: May 20, 2025 Duration: 2m 38s Status: Success

Impact Analysis

Schema Change Impact: customer_segment (Added)

The following downstream tables and processes are affected by this schema change:

customer_insights

This table references the new column and requires updates to its transformation logic.

transaction_metrics

This table does not reference the new column and requires no changes.

customer_segmentation_report

This report depends on the new column and requires updates to its visualization logic.

Rule Validation Results

22

Passed Rules

1

Warning Rules

1

Failed Rules

24

Total Rules

Status Rule Name Type Description Last Run Actions
unique_customer_id SQL Customer ID must be unique across all transactions May 20, 2025
transaction_amount_range Python Transaction amount should be between $0.01 and $10,000.00 May 20, 2025
not_null_transaction_id YAML Transaction ID must not be null May 20, 2025
valid_transaction_date SQL Transaction date must not be in the future May 20, 2025

Rule Definition Example

Rule: unique_customer_id

-- SQL Rule Definition
SELECT
  customer_id,
  COUNT(*) as occurrence_count
FROM
  customer_transactions
GROUP BY
  customer_id
HAVING
  COUNT(*) > 1

Failure Details

3 records found that violate this rule:

customer_id occurrence_count
CID-78342 12
CID-65219 3
CID-91087 2

Rule Execution History

May 14 May 15 May 16 May 17 May 18 May 19 May 20 100% 75% 50% 25%
Passed
Warning
Failed