API Reference

Interactive documentation for the Commodus similarity and data quality verification APIs.

Base URL
Version: 1.0.0
Content-Type: application/x-www-form-urlencoded

POST /api/compare

Compare two documents and compute their semantic similarity score using an ensemble of algorithms.

Request Body

Parameter     Type     Required  Description
doc_a         string   required  First document (JSON, XML, or HTML)
doc_b         string   required  Second document (JSON, XML, or HTML)
use_synonyms  boolean  optional  Enable dental EDI synonym resolution (default: false)
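
As a rough illustration of the request body above, the following sketch builds a form-encoded POST for /api/compare without sending it. The base URL is a placeholder, since this reference does not state one, and sending booleans as the strings "true"/"false" is an assumption about how the server parses form fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; this reference does not state the real one.
BASE_URL = "http://localhost:8080"


def build_compare_request(doc_a: str, doc_b: str, use_synonyms: bool = False) -> Request:
    """Build (but do not send) a form-encoded POST to /api/compare."""
    form = {
        "doc_a": doc_a,
        "doc_b": doc_b,
        # Assumption: booleans travel as the strings "true" / "false".
        "use_synonyms": "true" if use_synonyms else "false",
    }
    return Request(
        BASE_URL + "/api/compare",
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Cross-format comparison: a JSON document against an XML document.
req = build_compare_request(
    json.dumps({"member_id": "A123"}),
    "<patient><subscriber_id>A123</subscriber_id></patient>",
    use_synonyms=True,
)
```

Passing `req` to `urllib.request.urlopen` and decoding the body with `json.load` would then yield a response shaped like the one documented below.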

Response

{
  "score": 0.856,
  "simhash_score": 0.891,
  "minhash_score": 0.823,
  "structural_score": 0.854,
  "is_similar": true,
  "confidence": "high",
  "confidence_reason": "All algorithms strongly agree...",
  "format_a": "json",
  "format_b": "json",
  "use_synonyms": false,
  "field_matches": [...],
  "weights": {
    "simhash": 0.4,
    "minhash": 0.4,
    "structural": 0.2
  }
}

Response Fields

Field             Type     Description
score             number   Weighted ensemble similarity score (0.0 - 1.0)
simhash_score     number   SimHash content similarity (0.0 - 1.0)
minhash_score     number   MinHash token overlap (0.0 - 1.0)
structural_score  number   Field schema similarity (0.0 - 1.0)
is_similar        boolean  Whether score exceeds threshold (0.5)
confidence        string   Confidence level: "high", "medium", or "low"
format_a          string   Detected format of doc_a: "json", "xml", or "html"
format_b          string   Detected format of doc_b: "json", "xml", or "html"
field_matches     array    Field-by-field match details with match types
weights           object   Algorithm weights used for ensemble scoring
error             string?  Error message if parsing failed (null on success)
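
The sample response is consistent with `score` being a plain weighted sum of the three component scores. The sketch below checks that arithmetic; treating the ensemble as a simple weighted sum is an assumption drawn from the sample values, not a statement about the service's internals.

```python
def ensemble_score(scores: dict, weights: dict) -> float:
    """Weighted sum of per-algorithm scores (weights are expected to sum to 1)."""
    return sum(weights[name] * scores[name] for name in weights)


# Values copied from the sample /api/compare response.
weights = {"simhash": 0.4, "minhash": 0.4, "structural": 0.2}
scores = {"simhash": 0.891, "minhash": 0.823, "structural": 0.854}

score = ensemble_score(scores, weights)
is_similar = score > 0.5  # threshold taken from the is_similar description

print(round(score, 3), is_similar)  # 0.856 True
```

Worked out by hand: 0.4 × 0.891 + 0.4 × 0.823 + 0.2 × 0.854 = 0.3564 + 0.3292 + 0.1708 = 0.8564, which rounds to the 0.856 shown in the sample.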

POST /api/verify

Run a 4-stage data quality verification pipeline on two PatientCoverage JSON documents: Flatten → Diff → Drift Analysis → Semantic Validation.

Request Body

Parameter  Type    Required  Description
baseline   string  required  Baseline PatientCoverage JSON document
current    string  required  Current PatientCoverage JSON document to verify against baseline
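
Because both parameters are strings inside a form-encoded body, each JSON document has to be serialized before it is sent. A minimal sketch follows, assuming a placeholder base URL and a toy PatientCoverage shape that uses only the paths mentioned in this reference.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Hypothetical base URL; this reference does not state the real one.
BASE_URL = "http://localhost:8080"


def build_verify_request(baseline: dict, current: dict) -> Request:
    """Serialize both documents to JSON strings and form-encode them."""
    form = {"baseline": json.dumps(baseline), "current": json.dumps(current)}
    return Request(
        BASE_URL + "/api/verify",
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Toy documents; real PatientCoverage documents will carry more fields.
baseline = {"plan": {"deductibles": [{"totalAmount": 2500, "remainingAmount": 2500}]}}
current = {"plan": {"deductibles": [{"totalAmount": 2500, "remainingAmount": 0}]}}

verify_req = build_verify_request(baseline, current)

# Sanity check: the form body decodes back to the original documents.
decoded = parse_qs(verify_req.data.decode("utf-8"))
```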

Pipeline Stages

Stage        Module              Description
1. Flatten   FlatDocument        Converts nested JSON into dot-notation paths (e.g., plan.deductibles[0].totalAmount)
2. Diff      DiffReport          Value-aware field-by-field comparison: unchanged, changed, added, removed
3. Drift     TransformerDrift    Detects {raw, transformed} pairs and classifies drift: Stable, RawDriftOnly, TransformedDriftOnly, BothDrifted, Structural
4. Validate  validate_semantics  Runs 7 semantic invariant rules against the current document
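
The first two stages can be sketched in a few lines. This is an illustrative approximation, not the FlatDocument or DiffReport implementation: the function names are hypothetical, and the similarity ratio follows the "unchanged paths to total paths" definition given for the response.

```python
def flatten(value, prefix=""):
    """Map each leaf of a nested JSON value to a dot-notation path."""
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return out
    if isinstance(value, list):
        out = {}
        for index, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{index}]"))
        return out
    return {prefix: value}  # scalar leaf


def diff(baseline: dict, current: dict) -> dict:
    """Classify every path as unchanged, changed, added, or removed."""
    report = {}
    for path in sorted(baseline.keys() | current.keys()):
        if path not in baseline:
            report[path] = "added"
        elif path not in current:
            report[path] = "removed"
        elif baseline[path] == current[path]:
            report[path] = "unchanged"
        else:
            report[path] = "changed"
    return report


flat_base = flatten(
    {"memberId": "A1",
     "plan": {"deductibles": [{"totalAmount": 2500, "remainingAmount": 2500}]}}
)
flat_cur = flatten(
    {"groupId": "G7",
     "plan": {"deductibles": [{"totalAmount": 2500, "remainingAmount": 0}]}}
)

report = diff(flat_base, flat_cur)
unchanged = sum(1 for status in report.values() if status == "unchanged")
similarity = unchanged / len(report)  # 1 unchanged path out of 4
```

Note that in this sketch empty objects and arrays produce no paths at all; whether the service records them is not stated here.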

Response

{
  "similarity": 0.794,
  "total_paths": 34,
  "unchanged_count": 27,
  "changed_count": 7,
  "added_count": 0,
  "removed_count": 0,
  "diff_entries": [
    { "path": "plan.deductibles[0].remainingAmount",
      "status": "changed",
      "baseline_value": "2500",
      "current_value": "0" }
  ],
  "transformer_pairs": [
    { "path": "plan.coinsurances[0].percentage",
      "drift_type": "raw_drift",
      "flagged": false,
      "baseline_raw": "\"80%\"",
      "current_raw": "\"80 percent\"",
      "baseline_transformed": "80.0",
      "current_transformed": "80.0" }
  ],
  "flagged_drift_count": 1,
  "violations": [
    { "rule_id": "DeductibleRemainingExceedsTotal",
      "message": "Deductible remaining amount exceeds total...",
      "paths": ["plan.deductibles[0].remainingAmount", "plan.deductibles[0].totalAmount"] }
  ]
}

Response Fields

Field                Type     Description
similarity           number   Ratio of unchanged paths to total paths (0.0 - 1.0)
total_paths          number   Total flattened leaf paths across both documents
unchanged_count      number   Paths with identical values in both documents
changed_count        number   Paths present in both documents with different values
added_count          number   Paths in current but not in baseline
removed_count        number   Paths in baseline but not in current
diff_entries         array    Per-path diff with status, baseline_value, current_value
transformer_pairs    array    Detected {raw, transformed} pairs with drift classification
flagged_drift_count  number   Number of pairs with TransformedDriftOnly or BothDrifted
violations           array    Semantic rule violations with rule_id, message, and affected paths
error                string?  Error message if JSON parsing failed (null on success)
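
One plausible reading of the drift classification is sketched below. The function name is hypothetical, and treating a missing or malformed pair as Structural is an assumption; the flagged set follows the flagged_drift_count description above.

```python
# Drift types that count toward flagged_drift_count, per the field description.
FLAGGED_TYPES = {"TransformedDriftOnly", "BothDrifted"}


def _is_pair(value) -> bool:
    """True if value looks like a {raw, transformed} pair."""
    return isinstance(value, dict) and set(value) >= {"raw", "transformed"}


def classify_drift(baseline, current) -> str:
    """Classify how a {raw, transformed} pair moved between two documents."""
    if not (_is_pair(baseline) and _is_pair(current)):
        return "Structural"  # assumption: the pair shape itself changed
    raw_changed = baseline["raw"] != current["raw"]
    transformed_changed = baseline["transformed"] != current["transformed"]
    if raw_changed and transformed_changed:
        return "BothDrifted"
    if transformed_changed:
        return "TransformedDriftOnly"
    if raw_changed:
        return "RawDriftOnly"
    return "Stable"


# The coinsurance pair from the sample response: raw text changed, value did not.
drift = classify_drift(
    {"raw": "80%", "transformed": 80.0},
    {"raw": "80 percent", "transformed": 80.0},
)
flagged = drift in FLAGGED_TYPES
```

Under this reading, a change in raw formatting alone is not flagged, which matches `"flagged": false` on the sample pair.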

Semantic Rules

Rule ID                               Description
DeductibleRemainingExceedsTotal       Deductible remaining amount must not exceed total amount
MaximumRemainingExceedsTotal          Maximum coverage remaining must not exceed total
CoinsuranceOutOfRange                 Coinsurance percentage must be between 0 and 100
IndividualDeductibleExceedsLimit      Individual deductible must not exceed $1,000
FamilyDeductibleExceedsLimit          Family deductible must not exceed $1,000
MaximumTotalInForbiddenRange          Maximum coverage total must be ≤$10,000 or ≥$50,001
ProcedureCodeRequiresHighCoinsurance  Codes D0120, D0210, D0274 require coinsurance ≥50%
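
To make the rule semantics concrete, here is a sketch of two of the seven rules. It is not the validate_semantics implementation: the function name is hypothetical, and it assumes the amounts and percentages are plain numbers at the paths shown in this reference, whereas real documents may wrap them in {raw, transformed} pairs.

```python
def validate(doc: dict) -> list:
    """Check two illustrative invariants and return violation records."""
    violations = []
    plan = doc.get("plan", {})

    # DeductibleRemainingExceedsTotal: remaining must not exceed total.
    for i, deductible in enumerate(plan.get("deductibles", [])):
        total = deductible.get("totalAmount")
        remaining = deductible.get("remainingAmount")
        if total is not None and remaining is not None and remaining > total:
            violations.append({
                "rule_id": "DeductibleRemainingExceedsTotal",
                "message": f"Deductible remaining {remaining} exceeds total {total}",
                "paths": [
                    f"plan.deductibles[{i}].remainingAmount",
                    f"plan.deductibles[{i}].totalAmount",
                ],
            })

    # CoinsuranceOutOfRange: percentage must lie between 0 and 100.
    for i, coinsurance in enumerate(plan.get("coinsurances", [])):
        pct = coinsurance.get("percentage")
        if pct is not None and not 0 <= pct <= 100:
            violations.append({
                "rule_id": "CoinsuranceOutOfRange",
                "message": f"Coinsurance percentage {pct} is outside 0-100",
                "paths": [f"plan.coinsurances[{i}].percentage"],
            })
    return violations


found = validate({
    "plan": {
        "deductibles": [{"totalAmount": 2500, "remainingAmount": 3000}],
        "coinsurances": [{"percentage": 120}],
    }
})
```

Each record mirrors the rule_id, message, and paths layout of the violations array in the sample response.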

Usage Notes

  • Format Detection: /api/compare auto-detects JSON, XML, or HTML. /api/verify requires JSON objects.
  • Cross-Format: You can compare documents across different formats with /api/compare (e.g., JSON to XML).
  • Synonyms: Enable use_synonyms on /api/compare for dental EDI field matching (member_id = subscriber_id = patient_id).
  • Verification Pipeline: /api/verify runs a 4-stage pipeline: Flatten → Diff → Drift → Validate. It detects TransformerResult {raw, transformed} drift and enforces 7 semantic invariant rules.
  • Performance: Typical response time is under 20 µs for a comparison and under 5 ms for the full verification pipeline.
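
The effect of use_synonyms on field matching can be pictured with a small sketch. The synonym group comes from the note above; the canonicalization strategy and the function names are assumptions for illustration and say nothing about how the service actually resolves synonyms.

```python
# Synonym group taken from the usage note; real deployments likely have more.
SYNONYM_GROUPS = [{"member_id", "subscriber_id", "patient_id"}]


def canonical(name: str) -> str:
    """Map a field name to one representative of its synonym group."""
    for group in SYNONYM_GROUPS:
        if name in group:
            return min(group)  # deterministic pick: alphabetically first
    return name


def shared_fields(fields_a, fields_b, use_synonyms: bool = False) -> set:
    """Field names common to both documents, optionally after synonym resolution."""
    normalize = canonical if use_synonyms else (lambda name: name)
    return {normalize(f) for f in fields_a} & {normalize(f) for f in fields_b}


doc_a_fields = {"member_id", "dob"}
doc_b_fields = {"subscriber_id", "dob"}

plain = shared_fields(doc_a_fields, doc_b_fields)
resolved = shared_fields(doc_a_fields, doc_b_fields, use_synonyms=True)
```

Without synonym resolution only `dob` matches; with it, `member_id` and `subscriber_id` collapse to the same name and match as well.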