Skip to content

fix(query): clarify condition resolution semantics for label queries#2994

Open
contrueCT wants to merge 2 commits intoapache:masterfrom
contrueCT:task/improve-condition-query-semantics
Open

fix(query): clarify condition resolution semantics for label queries#2994
contrueCT wants to merge 2 commits intoapache:masterfrom
contrueCT:task/improve-condition-query-semantics

Conversation

@contrueCT
Copy link
Copy Markdown
Contributor

Purpose of the PR

ConditionQuery.condition() currently mixes several different meanings in one API, including:

  • no condition
  • conflicting conditions resolved to empty
  • a unique resolved value
  • a raw multi-value result
  • an exception for ambiguous resolved values

This PR keeps the legacy condition() behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-risk LABEL call sites to use the clearer semantics.

Main Changes

  • Add explicit condition-resolution APIs to ConditionQuery
    • containsCondition(Object key)
    • conditionValues(Object key)
    • conditionValue(Object key)
  • Keep the legacy condition() method backward-compatible
  • Document the semantic differences between the legacy API and the new explicit APIs
  • Migrate LABEL-related high-risk callers to the new APIs in:
    • graph/index transactions
    • serializers
    • traversers
    • in-memory / hstore paths
  • Preserve the old behavior for non-LABEL legacy usages in this first step

Verifying these changes

Added and extended regression coverage for the new semantics:

  • QueryTest#testConditionWithoutLabel
  • QueryTest#testConditionWithEqAndIn
  • QueryTest#testConditionWithSingleInValues
  • QueryTest#testConditionWithConflictingEqAndIn
  • QueryTest#testConditionWithMultipleMatchedInValues

Added a targeted regression for the label-index fallback path:

  • VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedProperties

This test verifies:

  • a multi-label query can conservatively fall back and still match the indexed label
  • conflicting label conditions produce no matched indexes

Existing label-query regressions were also rechecked to ensure no behavior regression:

  • EdgeCoreTest#testQueryInEdgesOfVertexByLabels
  • EdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabels
  • EdgeCoreTest#testQueryInEdgesOfVertexBySortkey
  • VertexCoreTest#testQueryByJointLabels
  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='QueryTest#testConditionWithoutLabel+testConditionWithEqAndIn+testConditionWithSingleInValues+testConditionWithConflictingEqAndIn+testConditionWithMultipleMatchedInValues' test
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='EdgeCoreTest#testQueryInEdgesOfVertexByLabels+testQueryInEdgesOfVertexByConflictingLabels+testQueryInEdgesOfVertexBySortkey' test
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='VertexCoreTest#testQueryByJointLabels+testCollectMatchedIndexesByJointLabelsWithIndexedProperties' test

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics.

Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 13, 2026
return null;
}
return (Id) labels.iterator().next();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ Code Duplication: uniqueLabel() duplicated in 4 files — consider consolidating

The uniqueLabel(ConditionQuery) helper is copy-pasted into 4 files with identical bodies:

  • GraphTransaction.java (private static)
  • GraphIndexTransaction.java (private static)
  • RamTable.java (private static)
  • HstoreStore.java (private instance method — inconsistent with the other 3)

I understand these callers need "return null when multiple labels exist" semantics (unlike conditionValue() which throws). Two suggestions:

Option A — Add this as a first-class method on ConditionQuery (e.g., uniqueConditionValue(Object key) or singleConditionValueOrNull(Object key)) to eliminate all 4 copies:

public <T> T uniqueConditionValue(Object key) {
    Set<Object> values = this.conditionValues(key);
    if (values.size() != 1) {
        return null;
    }
    @SuppressWarnings("unchecked")
    T value = (T) values.iterator().next();
    return value;
}

Option B — If you prefer not to expand the ConditionQuery API surface, at minimum consolidate to a static utility in one shared location (e.g., a package-visible helper in ConditionQuery or a QueryUtil).

Also note: the HstoreStore version is an instance method while the other 3 are static — this inconsistency should be fixed either way.

direction = Directions.OUT;
}
Id label = cq.condition(HugeKeys.LABEL);
Id label = cq.conditionValue(HugeKeys.LABEL);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Inconsistent migration: mixed conditionValue() vs uniqueLabel() for LABEL key

In this PR, some callers migrate to conditionValue(HugeKeys.LABEL) (serializers, traverser), while others use the new private uniqueLabel() helper (transactions, RamTable, HstoreStore). The two have different failure behavior for multi-label queries:

  • conditionValue()throws IllegalStateException if multiple values remain
  • uniqueLabel() → returns null silently

Is this difference intentional per call site? If so, a brief comment at each call site explaining why that particular semantic was chosen would help future readers. Otherwise, consider unifying to a single approach — either all callers that need "single or nothing" use one API, and those that need "single or fail" use the other, with the distinction documented.

for (HugeKeys key : EdgeId.KEYS) {
Object value = cq.condition(key);
Object value = key == HugeKeys.LABEL ?
cq.conditionValue(key) : cq.condition(key);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Ternary for LABEL-only routing is fragile

This introduces a special-case ternary:

Object value = key == HugeKeys.LABEL ?
               cq.conditionValue(key) : cq.condition(key);

The iteration is over EdgeId.KEYS (OWNER_VERTEX, DIRECTION, LABEL, SORT_VALUES, OTHER_VERTEX), and only LABEL gets the new API — a fact that's not self-documenting. If another key eventually needs the same treatment, a reader must remember to add it here too.

Consider either:

  1. Adding a brief inline comment explaining why only LABEL needs conditionValue() (e.g., "LABEL may have multiple IN conditions; other EdgeId keys are always single-valued"), or
  2. Applying conditionValue() uniformly for all keys in this loop if the semantics are safe (since the other keys are guaranteed single-valued, conditionValue would behave identically to condition for them)


private Set<Object> resolveConditionValues(List<Object> valuesEQ,
List<Object> valuesIN) {
boolean initialized = false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ resolveConditionValues duplicates logic already in condition() — consider extracting the shared intersection

The new resolveConditionValues() method contains the same intersection logic as the existing condition() method. The only difference is condition() has extra fast-path returns and the final single-value check.

Since condition() now delegates collectConditionValues() to the new private method (good!), it could also delegate the intersection to resolveConditionValues() to fully eliminate the duplicated loop:

public <T> T condition(Object key) {
    List<Object> valuesEQ = InsertionOrderUtil.newList();
    List<Object> valuesIN = InsertionOrderUtil.newList();
    this.collectConditionValues(key, valuesEQ, valuesIN);
    if (valuesEQ.isEmpty() && valuesIN.isEmpty()) {
        return null;
    }
    // Keep legacy fast paths for backward compatibility
    if (valuesEQ.size() == 1 && valuesIN.isEmpty()) { ... }
    if (valuesEQ.isEmpty() && valuesIN.size() == 1) { ... }
    // Delegate to shared intersection instead of duplicating
    Set<Object> intersectValues = this.resolveConditionValues(valuesEQ, valuesIN);
    // ... rest of legacy checks
}

This would make condition() and conditionValues() share the same intersection code path, reducing divergence risk.

}

private static Id uniqueLabel(ConditionQuery query) {
java.util.Set<Object> labels = query.conditionValues(HugeKeys.LABEL);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Minor: RamTable.uniqueLabel uses fully-qualified java.util.Set instead of import

private static Id uniqueLabel(ConditionQuery query) {
    java.util.Set<Object> labels = query.conditionValues(HugeKeys.LABEL);

The other 3 copies of uniqueLabel() use the imported Set. This one uses java.util.Set — likely a missing import. Trivial, but worth cleaning up for consistency (or moot if the duplication is resolved per the earlier comment).

@contrueCT contrueCT changed the title improve(query): clarify condition resolution semantics for label queries fix(query): clarify condition resolution semantics for label queries Apr 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improve]: clarify ConditionQuery.condition() semantics for missing, conflicting, and multi-value conditions

2 participants