Near Duplicate
Identical claim data: ~100%
Paste JSON or XML documents to analyze their semantic similarity. Format is auto-detected.
Load sample documents or paste your own to get started.
Explore how the similarity algorithm responds to different document pairs. The algorithm measures structural and token similarity.
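As a rough illustration of the structural half of such a measure, the sketch below compares two JSON documents by the set of key paths they contain and ignores every value. The Jaccard formula and the helper names `field_paths` and `structural_similarity` are assumptions made for this example; the tool's actual scoring is not documented here and also includes a token component that this sketch leaves out.

```python
import json

def field_paths(node, prefix=""):
    """Collect dotted key paths from parsed JSON, ignoring all values."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(value, path)
    elif isinstance(node, list):
        # List items share the path of the list itself.
        for item in node:
            paths |= field_paths(item, prefix)
    return paths

def structural_similarity(doc_a, doc_b):
    """Jaccard overlap of the two documents' key-path sets (0.0 to 1.0)."""
    a = field_paths(json.loads(doc_a))
    b = field_paths(json.loads(doc_b))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sample claims: same fields, different values.
claim_a = '{"claim_id": "A1", "patient": {"name": "Ann", "dob": "1980-02-01"}}'
claim_b = '{"claim_id": "B7", "patient": {"name": "Raj", "dob": "1975-11-30"}}'
print(structural_similarity(claim_a, claim_b))  # 1.0
```

Because only key paths are compared, two claims with the same fields and entirely different values score 1.0 in this sketch.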
Documents with identical field names score high, even with different values.
Identical claim data: ~100%
Same fields, different procedure/dates: ~100%
Same structure, different patient data: ~98%
"1200.00" vs 1200.00 - same structure: ~95%
2024-01-15 vs 01/15/2024 - same fields: ~98%
Original vs corrected - some new fields: ~85%
Documents sharing some field names but with structural differences.
270/271 pair - shared subscriber fields: ~60%
Different claim types, some field overlap: ~63%
camelCase, PascalCase, kebab-case are automatically normalized to snake_case.
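The normalization described here could look something like the following. This is a minimal sketch using two regular expressions; the function name `to_snake_case` and the handling of acronyms are assumptions for illustration, not the tool's actual rules.

```python
import re

def to_snake_case(name):
    """Normalize camelCase, PascalCase, and kebab-case names to snake_case."""
    name = name.replace("-", "_")
    # Split an acronym from a following capitalized word: "HTTPCode" -> "HTTP_Code".
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital: "claimId" -> "claim_Id".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.lower()

for raw in ["claimId", "ClaimId", "claim-id", "claim_id"]:
    print(raw, "->", to_snake_case(raw))  # each prints "... -> claim_id"
```

Once both documents pass through the same normalizer, naming-convention differences no longer affect the field-name comparison.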
claimId → claim_id (normalized): ~99%
Different words for the same concept (member_id vs subscriber_id). Enable "Synonym resolution" above to see high scores.
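One way to picture synonym resolution is as a lookup that maps each field name to a canonical form before names are compared. The `SYNONYMS` table, the `name_overlap` function, and the Jaccard scoring below are hypothetical; the numbers this sketch produces are not meant to reproduce the tool's scores.

```python
# Hypothetical synonym table: each variant maps to one canonical name.
SYNONYMS = {
    "subscriber_id": "member_id",
    "dob": "date_of_birth",
}

def canonical(name):
    """Return the canonical form of a field name, or the name itself."""
    return SYNONYMS.get(name, name)

def name_overlap(fields_a, fields_b, use_synonyms=False):
    """Jaccard overlap of two field-name collections, optionally after synonym mapping."""
    if use_synonyms:
        fields_a = [canonical(f) for f in fields_a]
        fields_b = [canonical(f) for f in fields_b]
    a, b = set(fields_a), set(fields_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc_a = ["member_id", "dob", "plan"]
doc_b = ["subscriber_id", "date_of_birth", "plan"]
print(name_overlap(doc_a, doc_b))                     # 0.2
print(name_overlap(doc_a, doc_b, use_synonyms=True))  # 1.0
```

Without the table only `plan` matches; with it, all three fields line up.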
member_id vs subscriber_id: ~39% (~95% with synonyms)
dob vs date_of_birth: ~39% (~95% with synonyms)
Different JSON structure or completely different document types.
Different structure depth: ~39%
837D vs 835 - different transaction types: ~35%
X12 format vs normalized JSON: ~31%
Completely different entity types: ~39%
Compare documents in different formats. Format is auto-detected.
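A cross-format comparison needs both documents reduced to a common set of field names first. The sketch below does that for JSON and XML with the Python standard library, treating XML attributes as fields and skipping the root element's tag because a JSON object has no wrapper name. The detection heuristic (a leading `<` means XML) and all function names are assumptions for illustration only.

```python
import json
import xml.etree.ElementTree as ET

def json_fields(text):
    """Return the set of key names found anywhere in a JSON document."""
    names = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                names.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(text))
    return names

def xml_fields(text):
    """Return child element tags plus all attribute names; the root tag is skipped."""
    root = ET.fromstring(text)
    names = set()
    for element in root.iter():  # iter() yields the root first, then descendants
        if element is not root:
            names.add(element.tag)
        names.update(element.attrib)  # adds the attribute names
    return names

def extract_fields(text):
    """Naive format detection: a leading '<' is treated as XML, anything else as JSON."""
    if text.lstrip().startswith("<"):
        return xml_fields(text)
    return json_fields(text)

as_json = '{"claim_id": "A1", "member_id": "M9", "amount": 1200.0}'
as_xml = '<claim claim_id="A1"><member_id>M9</member_id><amount>1200.00</amount></claim>'
print(extract_fields(as_json) == extract_fields(as_xml))  # True
```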
Identical field names across formats: ~88%
subscriber_id ↔ member_id: ~88% (with synonyms)
XML attributes extracted as fields: ~83%
Compare HTML forms/tables with JSON. Field names are extracted from inputs, headers, and labels.
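Field-name extraction from HTML can be sketched with the standard library's `html.parser`. The example below collects `name` attributes from form controls, the text of `<th>` and `<label>` elements, and `data-*` attribute names. The class name, the choice of tags, and the way label text is converted to snake_case are all assumptions for this example, not a description of the tool's extractor.

```python
from html.parser import HTMLParser

class FieldNameExtractor(HTMLParser):
    """Collect candidate field names from form controls, headers, labels, data-* attributes."""

    def __init__(self):
        super().__init__()
        self.fields = set()
        self._in_text_tag = False  # True while inside <th> or <label>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("input", "select", "textarea") and attrs.get("name"):
            self.fields.add(attrs["name"])
        elif tag in ("th", "label"):
            self._in_text_tag = True
        for key in attrs:
            if key.startswith("data-"):
                # "data-plan-code" -> "plan_code"
                self.fields.add(key[len("data-"):].replace("-", "_"))

    def handle_endtag(self, tag):
        if tag in ("th", "label"):
            self._in_text_tag = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_text_tag and text:
            # "Date of Birth" -> "date_of_birth"
            self.fields.add(text.lower().replace(" ", "_"))

def html_fields(html):
    parser = FieldNameExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields

page = """
<form>
  <label>Claim ID</label><input name="claim_id">
  <input name="member_id" data-plan-code="P1">
</form>
<table><tr><th>Date of Birth</th></tr></table>
"""
print(sorted(html_fields(page)))
# ['claim_id', 'date_of_birth', 'member_id', 'plan_code']
```

The resulting set can then be compared against a JSON document's keys in the same way as any other pair of field-name sets.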
Form inputs → JSON fields: ~85%
Table headers → JSON fields: ~85%
data-* attributes as fields: ~83%