Hybrid 3-connector fuzz test + VM_HOST env var #19
Replace all hardcoded 192.168.122.x IPs with ${VM_HOST} env var.
VM IP is auto-detected by vm-env.sh via virsh — no manual update
needed when VM IP changes (e.g. new VM deployment).
- .properties files: use Quarkus ${VM_HOST} substitution
- docker-compose: use ${VM_HOST} in commands
- rac.sh: fail-fast with ${VM_HOST:?} instead of silent default
- db-check.py: require ORACLE_HOST, no fallback IP
- .env: remove DB_CONN (set by rac driver)
- vm-env.sh: remove stale IP mismatch validation
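The "require ORACLE_HOST, no fallback IP" item can be illustrated with a minimal sketch. The helper name `require_env` and the error wording are assumptions for illustration, not code taken from db-check.py:

```python
import os


def require_env(name, environ=os.environ):
    """Return a required environment variable or exit with a clear message.

    Hypothetical helper: there is deliberately no default value, so a
    missing variable stops the script instead of falling back to a stale IP.
    """
    value = environ.get(name, "").strip()
    if not value:
        # SystemExit with a string prints the message and exits non-zero.
        raise SystemExit(f"{name} is required; no fallback IP is used")
    return value
```

A script would call `require_env("ORACLE_HOST")` once at startup, which mirrors the fail-fast behaviour that `${VM_HOST:?}` gives in the shell driver.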
Switch to rophy/debezium-server:3.5.0-2a7978c0af which includes:
- debezium-config JAR fix (missing in stock 3.5.0.Final)
- mergeLogsByPrecedence RAC fix (thread-aware dedup)
Implement hybrid architecture: OLR for non-LOB tables + LogMiner for
LOB tables on the "actual" side.
- Add dbz-lob-logminer service (LOB-only LogMiner connector)
- OLR config: skip-lob-tables=1
- Consumer: subscribe to 3 topics, route olr-lob-events to OLR side
- Validator: remove KNOWN_LOB_TABLES exemption, LOB mismatches are
real failures with hybrid setup
- fuzz-test.sh: 3-connector offset seeding, wait, logs
- Remove restart: unless-stopped, remove Beta1 JAR mounts
📝 Walkthrough
Added a dedicated Debezium LogMiner connector for LOB tables and integrated it into the RAC fuzz-test stack; replaced hardcoded IPs with the ${VM_HOST} env var.
Sequence Diagram(s)
sequenceDiagram
participant Oracle as Oracle DB (LogMiner)
participant DebeziumLOB as Debezium LogMiner (lob-logminer)
participant Kafka as Kafka
participant Consumer as kafka-consumer.py
participant Validator as validator.py
Oracle->>DebeziumLOB: Stream LOB redo records
DebeziumLOB->>Kafka: Publish to `olr-lob-events` topic
Kafka->>Consumer: Deliver messages from topics (lm, olr, olr-lob)
Consumer->>Validator: Forward events grouped by adapter ('logminer'/'olr')
Validator->>Validator: Compare LM vs OLR events, treat LOB diffs as mismatches
Validator-->>Consumer: Emit validation results / metrics
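The routing step in the diagram (three topics in, two adapter groups out) could look roughly like the sketch below. Only `olr-lob-events` is named in this PR; the other two topic names and the function `group_by_adapter` are placeholders assumed for illustration:

```python
from collections import defaultdict

# Only OLR_LOB_TOPIC's value comes from the PR text; the other two names
# are assumed stand-ins for whatever LM_TOPIC and OLR_TOPIC are set to.
LM_TOPIC = "lm-events"        # assumed name
OLR_TOPIC = "olr-events"      # assumed name
OLR_LOB_TOPIC = "olr-lob-events"

# LOB events come from a LogMiner connector but are counted on the OLR
# side, so the "actual" side is OLR (non-LOB) plus LogMiner (LOB).
TOPIC_TO_ADAPTER = {
    LM_TOPIC: "logminer",
    OLR_TOPIC: "olr",
    OLR_LOB_TOPIC: "olr",
}


def group_by_adapter(messages):
    """Group (topic, event) pairs by the adapter side they belong to."""
    grouped = defaultdict(list)
    for topic, event in messages:
        grouped[TOPIC_TO_ADAPTER[topic]].append(event)
    return dict(grouped)
```

An unknown topic raises `KeyError` here, a deliberate choice in this sketch so that a misconfigured subscription is noticed rather than silently dropped.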
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/dbz-twin/rac/fuzz-test.sh`:
- Around line 431-445: Update the script header printed by action_help() to
include the new lob-logminer component and to reflect the correct workflow
ordering and RAC cleanup instruction: locate the action_help() function (or the
top-of-file help comment block it prints) and add "lob-logminer" to the
Components list, reorder the example workflow to show "down" before "up", and
add the note "Always run ./fuzz-test.sh down before ./fuzz-test.sh up in RAC
fuzz tests to ensure a clean environment" so the output of ./fuzz-test.sh help
matches the usage text updated in the logs case.
In `@tests/dbz-twin/rac/kafka-consumer.py`:
- Around line 147-159: The loop that checks Kafka topics should wait for all
entries in all_topics (LM_TOPIC, OLR_TOPIC, OLR_LOB_TOPIC) before proceeding;
currently it breaks as soon as LM_TOPIC or OLR_TOPIC exists which can mask an
implicitly created OLR_LOB_TOPIC on subscribe. Change the condition in the
polling loop that examines consumer.topics() (the block using topics =
consumer.topics()) to require every topic in all_topics to be present (e.g.,
check that set(all_topics).issubset(topics) or iterate and confirm each), only
then break and call consumer.subscribe(all_topics) and consumer.poll().
In `@tests/dbz-twin/rac/perf/docker-compose.yaml`:
- Line 47: The JDBC URL string "//${VM_HOST}:1521/ORCLPDB" silently becomes
malformed when VM_HOST is unset; update the interpolation to the required
expansion form so Compose fails fast—replace "${VM_HOST}" with a required
expansion like "${VM_HOST:?VM_HOST is not set}" in that JDBC URL entry to force
an error if VM_HOST is missing.
In `@tests/dbz-twin/rac/validator.py`:
- Line 248: The current LOB detection uses direct membership check ("is_lob =
event_table in LOB_TABLES") which fails for schema-qualified or case-variant
names; update the check to normalize event_table by stripping any schema
qualifier and surrounding quotes/backticks and then compare in a normalized case
(e.g., upper() or lower()) against a similarly normalized LOB_TABLES set. Modify
the logic around the is_lob assignment in validator.py so it derives a plain
table name (split on '.' and remove quotes) and performs case-normalized
membership against LOB_TABLES to correctly detect LOB tables regardless of
schema qualification or case differences.
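The normalization described above might be sketched as follows. The table names in `LOB_TABLES` and both function names are hypothetical, chosen only to make the example self-contained:

```python
# Hypothetical table names for illustration; the real LOB_TABLES set
# lives in validator.py.
LOB_TABLES = {"FUZZ_LOB", "FUZZ_CLOB"}


def normalize_table_name(name):
    """Drop any schema qualifier and quoting, then uppercase.

    Limitation of this sketch: a quoted identifier that itself contains
    a dot is not handled.
    """
    bare = name.rsplit(".", 1)[-1]          # keep the part after the last dot
    return bare.strip().strip("\"'`").upper()


def is_lob_table(event_table, lob_tables=LOB_TABLES):
    """Case-insensitive, schema-agnostic membership test."""
    normalized = {normalize_table_name(t) for t in lob_tables}
    return normalize_table_name(event_table) in normalized
```

Normalizing both sides of the comparison means the check still works if `LOB_TABLES` itself is later written with schema prefixes or in lower case.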
In `@tests/environments/rac/.env`:
- Around line 1-2: Update the comment for VM_HOST in the rac environment file to
explicitly state that VM_HOST must be initialized by sourcing the vm-env.sh
script (i.e., callers must run or source vm-env.sh before invoking RAC flows),
and mention that rac.sh expects VM_HOST to be set or it will exit (see rac.sh
behavior around its VM_HOST check). Keep the note next to the existing DB_CONN
comment and use the exact symbol VM_HOST and filenames vm-env.sh and rac.sh so
callers know which script to run and which script relies on the variable.
In `@tests/sql/scripts/drivers/rac.sh`:
- Line 39: The strict fail-fast assignment VM_HOST="${VM_HOST:?VM_HOST is
required — source tests/environments/rac/vm-env.sh}" causes the RAC driver to
hard-fail because generate.sh doesn't source tests/environments/rac/vm-env.sh
first; change the driver to not use the "?" expansion (e.g.,
VM_HOST="${VM_HOST:-}") and instead add a guarded attempt to source vm-env.sh
(if present) or bail gracefully only when RAC execution truly requires VM_HOST
later; update the VM_HOST handling in tests/sql/scripts/drivers/rac.sh so it
either sources vm-env.sh when VM_HOST is empty or defers the fatal error to a
later, explicit check that runs inside the RAC execution path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a4006b05-3c83-423c-8e5f-cf9f55add7d2
📒 Files selected for processing (18)
- tests/dbz-twin/rac/config/application-lob-logminer-kafka.properties
- tests/dbz-twin/rac/config/application-logminer-kafka.properties
- tests/dbz-twin/rac/config/application-logminer.properties
- tests/dbz-twin/rac/config/application-olr-kafka.properties
- tests/dbz-twin/rac/config/application-olr.properties
- tests/dbz-twin/rac/config/olr-config.json
- tests/dbz-twin/rac/db-check.py
- tests/dbz-twin/rac/docker-compose-fuzz.yaml
- tests/dbz-twin/rac/fuzz-test.sh
- tests/dbz-twin/rac/kafka-consumer.py
- tests/dbz-twin/rac/perf/config/application-logminer.properties
- tests/dbz-twin/rac/perf/config/application-olr.properties
- tests/dbz-twin/rac/perf/config/prometheus.yml
- tests/dbz-twin/rac/perf/docker-compose.yaml
- tests/dbz-twin/rac/validator.py
- tests/environments/rac/.env
- tests/environments/rac/vm-env.sh
- tests/sql/scripts/drivers/rac.sh
💤 Files with no reviewable changes (1)
- tests/environments/rac/vm-env.sh
- fuzz-test.sh: add lob-logminer to help text, show down-before-up workflow
- kafka-consumer.py: wait for all 3 topics before subscribing, fail on missing
- perf/docker-compose.yaml: use ${VM_HOST:?} for fail-fast on missing env var
- validator.py: normalize table name (strip schema, uppercase) for LOB detection
- .env: clarify VM_HOST must be sourced from vm-env.sh
- rac.sh: auto-source vm-env.sh when VM_HOST is not set
🧹 Nitpick comments (1)
tests/dbz-twin/rac/fuzz-test.sh (1)
216-225: Bound Debezium readiness log scans to avoid expensive full-log polling.
Line 218-220 reads entire container logs on every poll iteration. As logs grow, this gets slower and noisier. Use --tail (or --since) for bounded checks.
♻️ Suggested refinement
- docker logs fuzz-dbz-logminer 2>&1 | grep -q "Starting streaming" && LM_OK=true
- docker logs fuzz-dbz-olr 2>&1 | grep -q "streaming client started\|Starting streaming" && OLR_OK=true
- docker logs fuzz-dbz-lob-logminer 2>&1 | grep -q "Starting streaming" && LOB_LM_OK=true
+ docker logs --tail 200 fuzz-dbz-logminer 2>&1 | grep -q "Starting streaming" && LM_OK=true
+ docker logs --tail 200 fuzz-dbz-olr 2>&1 | grep -Eq "streaming client started|Starting streaming" && OLR_OK=true
+ docker logs --tail 200 fuzz-dbz-lob-logminer 2>&1 | grep -q "Starting streaming" && LOB_LM_OK=true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/fuzz-test.sh` around lines 216 - 225, The readiness loop repeatedly calls docker logs for each container which rereads the full logs on every iteration; modify the three docker logs invocations that set LM_OK, OLR_OK, and LOB_LM_OK to include a bounded tail (e.g., --tail N) or since (e.g., --since "10s") flag so each grep only scans recent output instead of the entire log; update the commands that check "Starting streaming" / "streaming client started" in the loop (the lines that set LM_OK, OLR_OK, LOB_LM_OK) to use --tail or --since to limit work and improve performance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1e972a6d-df7b-44f1-9413-68d2b3a0c9fa
📒 Files selected for processing (6)
- tests/dbz-twin/rac/fuzz-test.sh
- tests/dbz-twin/rac/kafka-consumer.py
- tests/dbz-twin/rac/perf/docker-compose.yaml
- tests/dbz-twin/rac/validator.py
- tests/environments/rac/.env
- tests/sql/scripts/drivers/rac.sh
✅ Files skipped from review due to trivial changes (1)
- tests/environments/rac/.env
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/sql/scripts/drivers/rac.sh
- tests/dbz-twin/rac/perf/docker-compose.yaml
- tests/dbz-twin/rac/kafka-consumer.py
- tests/dbz-twin/rac/validator.py
Summary
- ${VM_HOST} env var (auto-detected by vm-env.sh via virsh)
- rophy/debezium-server:3.5.0-2a7978c0af which includes mergeLogsByPrecedence RAC fix and debezium-config JAR fix
Test plan
- fuzz-test.sh up: all 3 connectors start with ${VM_HOST} env var substitution
- fuzz-test.sh run 10: 10-minute workload, 0 errors, all connectors survive log switches
- fuzz-test.sh validate: 55,228 events, 0 mismatches (PASS)
- fuzz-test.sh db-check: 22 LOB extras + 4 LOB diffs (known DBZ-9615, identical on both sides)