API - Similarity Engine

POST /api/compare

Compares two documents and returns a similarity score computed by a weighted ensemble of three algorithms: SimHash, MinHash, and a structural (field schema) comparison.

Request Body

Parameter	Type	Required	Description
`doc_a`	string	required	First document (JSON, XML, or HTML)
`doc_b`	string	required	Second document (JSON, XML, or HTML)
`use_synonyms`	boolean	optional	Enable dental EDI synonym resolution (default: false)

Response

{
  "score": 0.856,
  "simhash_score": 0.891,
  "minhash_score": 0.823,
  "structural_score": 0.854,
  "is_similar": true,
  "confidence": "high",
  "confidence_reason": "All algorithms strongly agree...",
  "format_a": "json",
  "format_b": "json",
  "use_synonyms": false,
  "field_matches": [...],
  "weights": {
    "simhash": 0.4,
    "minhash": 0.4,
    "structural": 0.2
  }
}

Response Fields

Field	Type	Description
`score`	number	Weighted ensemble similarity score (0.0 - 1.0)
`simhash_score`	number	SimHash content similarity (0.0 - 1.0)
`minhash_score`	number	MinHash token overlap (0.0 - 1.0)
`structural_score`	number	Field schema similarity (0.0 - 1.0)
`is_similar`	boolean	Whether score exceeds threshold (0.5)
`confidence`	string	Confidence level: "high", "medium", or "low"
`confidence_reason`	string	Human-readable explanation of the confidence level
`format_a`	string	Detected format of doc_a: "json", "xml", or "html"
`format_b`	string	Detected format of doc_b: "json", "xml", or "html"
`use_synonyms`	boolean	Whether synonym resolution was enabled for this comparison
`field_matches`	array	Field-by-field match details with match types
`weights`	object	Algorithm weights used for ensemble scoring
`error`	string?	Error message if parsing failed (null on success)
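
The relationship between `score`, the three per-algorithm scores, and `weights` can be checked by hand. The sketch below assumes the ensemble is a plain weighted sum; the sample response above is consistent with that, but the engine's exact formula is not stated here. `ensembleScore` is a hypothetical helper name.

```javascript
// Recompute the ensemble score from the per-algorithm scores and weights.
// Assumption: the ensemble is a plain weighted sum of the three scores.
function ensembleScore(result) {
  const { weights } = result;
  return (
    weights.simhash * result.simhash_score +
    weights.minhash * result.minhash_score +
    weights.structural * result.structural_score
  );
}

// Values copied from the sample response above.
const sample = {
  score: 0.856,
  simhash_score: 0.891,
  minhash_score: 0.823,
  structural_score: 0.854,
  weights: { simhash: 0.4, minhash: 0.4, structural: 0.2 },
};

const recomputed = ensembleScore(sample);
console.log(recomputed.toFixed(3)); // "0.856"
```

Here 0.4 × 0.891 + 0.4 × 0.823 + 0.2 × 0.854 = 0.8564, which rounds to the reported `score` of 0.856.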

Example Requests

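# Prefix the path with your server's base URL.
# --data-urlencode percent-encodes each value, so documents containing
# characters such as & or + are transmitted intact.
curl -X POST '/api/compare' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'doc_a={"example": "value"}' \
  --data-urlencode 'doc_b={"example": "value"}' \
  --data-urlencode 'use_synonyms=false'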

const response = await fetch('/api/compare', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/x-www-form-urlencoded',
  },
  body: new URLSearchParams({
    doc_a: '{"example": "value"}',
    doc_b: '{"example": "value"}',
    use_synonyms: 'false',
  }),
});

const result = await response.json();
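
Once the response has been parsed, the `error` field distinguishes a failed comparison from a successful one. The sketch below shows one way to handle both cases. `summarizeComparison` and the shape of its return value are hypothetical, chosen for illustration; they are not part of the API.

```javascript
// Hypothetical helper: reduce a parsed /api/compare response to a short summary.
function summarizeComparison(result) {
  // `error` is null on success and holds a message when parsing failed.
  // `!= null` also covers responses that omit the field entirely.
  if (result.error != null) {
    throw new Error(`Comparison failed: ${result.error}`);
  }
  return {
    similar: result.is_similar,
    score: result.score,
    confidence: result.confidence,
    // True when the two documents were detected as different formats.
    crossFormat: result.format_a !== result.format_b,
  };
}

const summary = summarizeComparison({
  score: 0.856,
  is_similar: true,
  confidence: 'high',
  format_a: 'json',
  format_b: 'json',
  error: null,
});
console.log(summary.similar, summary.crossFormat); // true false
```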

Usage Notes

Format Detection: The format of each document (JSON, XML, or HTML) is detected automatically from its content and reported in format_a and format_b.
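
To give a feel for content-based detection, here is a minimal client-side sketch. The heuristic and the `guessFormat` name are assumptions made for illustration; the engine's actual detection rules are not documented here and may differ.

```javascript
// Illustrative heuristic only; not the engine's actual detection rules.
function guessFormat(doc) {
  const text = doc.trim();
  if (text.startsWith('{') || text.startsWith('[')) {
    try {
      JSON.parse(text);
      return 'json';
    } catch {
      // Not valid JSON; fall through to the markup checks.
    }
  }
  // An HTML doctype or an <html> tag marks the document as HTML.
  if (/^<!doctype html/i.test(text) || /<html[\s>]/i.test(text)) {
    return 'html';
  }
  // Any other document that opens with a tag is treated as XML.
  if (text.startsWith('<')) {
    return 'xml';
  }
  return 'unknown';
}

console.log(guessFormat('{"example": "value"}')); // "json"
```

A check like this can be compared against the `format_a` and `format_b` values in the response to spot documents that were interpreted differently than expected.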

Cross-Format: You can compare documents across different formats (e.g., JSON to XML).
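
A cross-format request uses the same form-encoded body as any other. The sketch below builds one with a JSON document and an XML document; `buildCompareBody` is a hypothetical helper, and both sample documents are invented for illustration.

```javascript
// Hypothetical helper: build the form-encoded body for POST /api/compare.
function buildCompareBody(docA, docB, useSynonyms = false) {
  return new URLSearchParams({
    doc_a: docA,
    doc_b: docB,
    use_synonyms: String(useSynonyms),
  });
}

// Invented sample documents: the same record expressed as JSON and as XML.
const jsonDoc = JSON.stringify({ member_id: 'A123', name: 'Jane Doe' });
const xmlDoc =
  '<patient><member_id>A123</member_id><name>Jane Doe</name></patient>';

const body = buildCompareBody(jsonDoc, xmlDoc);
// URLSearchParams percent-encodes the documents when serialized.
console.log(body.toString());
```

The resulting `body` can be passed directly as the `body` option of `fetch`, as in the request example above.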

Synonyms: Enable use_synonyms to treat equivalent dental EDI field names as matches. For example, member_id, subscriber_id, and patient_id are resolved to the same field.
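
Conceptually, synonym resolution maps each field name to a canonical name before fields are compared. The sketch below illustrates the idea using only the one synonym group named above; the `SYNONYMS` table, the choice of member_id as the canonical name, and both function names are assumptions, not the engine's implementation.

```javascript
// Hypothetical synonym table built from the single example in this document.
// The canonical name (member_id) is an arbitrary choice for illustration.
const SYNONYMS = new Map([
  ['subscriber_id', 'member_id'],
  ['patient_id', 'member_id'],
]);

// Map a field name to its canonical form; unknown names pass through unchanged.
function canonicalField(name) {
  return SYNONYMS.get(name) ?? name;
}

// Two field names match when they share a canonical form.
function fieldsMatch(a, b) {
  return canonicalField(a) === canonicalField(b);
}

console.log(fieldsMatch('subscriber_id', 'patient_id')); // true
console.log(fieldsMatch('member_id', 'name')); // false
```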

Performance: The comparison logic itself typically completes in under 20 µs.