Data Refresh

There may be cases when data in the consumer database needs to be refreshed. This guide covers the process and considerations for performing a data refresh.

When to Perform a Data Refresh

Common scenarios requiring a data refresh:

  • Data corruption - Issues with the consumer database
  • Schema changes - Significant upstream schema modifications
  • Testing - Setting up a fresh test environment
  • Compliance - Data retention policy requirements
  • Upgrades - Major consumer version upgrades

Data Refresh Process

Step 1: Stop the Consumer

Ensure the consumer is stopped before making changes.

# Docker
docker compose down

# Kubernetes
kubectl scale deployment solifi-consumer --replicas=0

# Standalone
# Stop the running Java process (e.g. kill the PID of solifi-consumer-<version>.jar)
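
Before cleaning the database, it is worth confirming that no instances are still attached to the consumer group. A minimal sketch using the standard Kafka CLI tools; the bootstrap server and group name below are placeholders for your own values:

# Describe the current group; expect "has no active members" once all
# instances are stopped.
kafka-consumer-groups.sh \
  --bootstrap-server kafka.example.com:9092 \
  --describe --group mycompany-consumer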

Step 2: Clean the Database

You have two options:

Option A: Delete the entire database (Full refresh)

-- Drop and recreate the database
DROP DATABASE solifi_data;
CREATE DATABASE solifi_data;

Option B: Delete specific tables (Partial refresh)

-- Delete data tables
DROP TABLE IF EXISTS table_name;
DROP TABLE IF EXISTS table_name_audit;

-- Or truncate (faster but keeps structure)
TRUNCATE TABLE table_name;
TRUNCATE TABLE table_name_audit;
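
If you want a safety net before dropping anything (the checklist below also suggests a backup), you can export the affected tables first. A rough sketch assuming a SQL Server backend and the bcp utility; the server, credentials, and table name are placeholders:

# Export one table to a native-format file before dropping it.
bcp solifi_data.dbo.table_name out table_name.bcp -n \
  -S db.example.com -U solifi -P "$DB_PASSWORD"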

Step 3: Change the Consumer Group ID

Update the group-id in application.yml to force the consumer to read from the beginning:

spring:
  kafka:
    consumer:
      group-id: mycompany-consumer-refresh-20241221 # New unique ID

Why Change group-id?

Kafka tracks committed offsets per consumer group. A group with no committed offsets falls back to the consumer's auto-offset-reset policy, so with auto-offset-reset set to earliest a new group ID forces the consumer to start reading from the earliest available offset.
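
Before restarting, you can check that the new group ID is genuinely unused. A quick sketch with the Kafka CLI tools; the broker address is a placeholder:

# List existing consumer groups and confirm the new refresh ID is not among them.
kafka-consumer-groups.sh \
  --bootstrap-server kafka.example.com:9092 \
  --list | grep refresh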

Step 4: Restart the Consumer

# Docker
docker compose up -d

# Kubernetes
kubectl scale deployment solifi-consumer --replicas=1

# Standalone
java -jar solifi-consumer-<version>.jar
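
Once the consumer is running, tail the logs to confirm it joins the new group and starts reading from the earliest offsets:

# Docker
docker compose logs -f

# Kubernetes
kubectl logs -f deployment/solifi-consumer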

Preserving Audit History

If you want to preserve audit history during a data refresh, do not delete the _audit tables.

Solifi compresses data after 7 days, so a refresh may lose audit history for older records. If you want to keep historical audit records:

  1. Keep the existing _audit tables intact
  2. Only drop/truncate the main data tables
  3. Change the group-id
  4. Restart the consumer

The consumer will append new audit events to the existing audit tables. You might see some duplicate records, but the value in column lp_db_insert_user will be different (reflecting the new group-id).

Example: Preserving Audit Tables

-- Only drop main tables, keep audit tables
DROP TABLE IF EXISTS cs_master_nf;
DROP TABLE IF EXISTS ls_master_nf;
-- DO NOT drop: cs_master_nf_audit, ls_master_nf_audit
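
To see how much audit history was written under the old versus the new group ID, you can group by the lp_db_insert_user column mentioned above. A sketch assuming a SQL Server backend and the sqlcmd utility; the server and table names are placeholders:

# Count audit rows per inserting user to separate pre- and post-refresh records.
sqlcmd -S db.example.com -d solifi_data \
  -Q "SELECT lp_db_insert_user, COUNT(*) AS row_count FROM cs_master_nf_audit GROUP BY lp_db_insert_user;"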

Using Initial Load Mode for Large Refreshes

For large data volumes (>5 million messages), consider using Initial Load Mode instead of standard streaming:

  1. Clean the database completely
  2. Configure initial-load mode
  3. Run the dry run
  4. Execute the initial load
  5. Switch back to streaming mode

This is significantly faster than a standard data refresh.

Post-Refresh Verification

After the refresh completes:

1. Check Table Counts

SELECT
    'cs_master_nf' as table_name,
    COUNT(*) as row_count
FROM cs_master_nf
UNION ALL
SELECT
    'ls_master_nf' as table_name,
    COUNT(*) as row_count
FROM ls_master_nf;

2. Verify Consumer Health

curl http://localhost:8080/actuator/health
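
If jq is installed, you can extract just the status field; a healthy consumer reports UP:

curl -s http://localhost:8080/actuator/health | jq -r .status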

3. Check Consumer Lag

Monitor that consumer lag is decreasing and eventually reaches zero.
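
One way to watch the lag drain is to poll the group with the Kafka CLI tools; the broker address and group name are placeholders:

# Re-run every 10 seconds and watch the LAG column trend toward zero.
watch -n 10 "kafka-consumer-groups.sh --bootstrap-server kafka.example.com:9092 --describe --group mycompany-consumer-refresh-20241221"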

4. Review Error Log

SELECT COUNT(*) FROM error_log
WHERE created_at > DATEADD(hour, -1, GETDATE());

Partial Topic Refresh

To refresh only specific topics without affecting others:

  1. Stop the consumer
  2. Drop only the tables for the target topics (see the sketch after this list)
  3. Update application.yml to consume only those topics temporarily
  4. Use a new group-id specific to this refresh
  5. Run the consumer until caught up
  6. Restore original configuration
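
For step 2, a rough sketch of dropping just the tables behind the target topics, assuming a SQL Server backend and sqlcmd; the table names are illustrative and must be mapped from your own topics:

# Drop only the tables (and, if also refreshing history, the audit tables)
# for the topics being refreshed.
sqlcmd -S db.example.com -d solifi_data \
  -Q "DROP TABLE IF EXISTS cs_master_nf; DROP TABLE IF EXISTS cs_master_nf_audit;"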

Refresh Checklist

  • Stop all consumer instances
  • Backup database (if needed)
  • Document current group-id
  • Drop/truncate required tables
  • Decide on audit table handling
  • Update group-id in configuration
  • Start consumer
  • Monitor progress
  • Verify data completeness
  • Check error log
