
Cleansing and Conflict Resolution: The Filter of Truth
Resolve contradictions in your Knowledge Graph. Learn how to handle conflicting facts from shared sources and build a consensus-based retrieval system for high-integrity AI.
In a perfect world, all your data sources agree. In the real world, Source A says "Sudeep is in London," Source B says "Sudeep is in Tokyo," and Source C says "Sudeep left the company." If you put all three in your graph without a strategy, your AI agent will be hopelessly confused—and a confused agent is an agent that hallucinates.
In this lesson, we will learn how to build a Consensus Engine for your Knowledge Graph. We will explore Fact Probabilities, Source Authority, and Conflict Management. We will see how to handle the "He Said/She Said" problem in data and why "The Most Recent Fact" isn't always the "True Fact."
1. The Conflict Types
- Direct Contradiction: (Node)-[:STAYS_IN]->(London) vs (Node)-[:STAYS_IN]->(Tokyo).
- Attribute Staleness: Source A has 2023 salary data; Source B has 2024 salary data.
- Entity Confusion: Source A thinks JS-101 is "JavaScript," Source B thinks it is "Job Sheet 101."
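To make the three types concrete, here is a minimal sketch of how the same entities might arrive from different connectors. The field names (entity, attribute, value, source, timestamp) are illustrative assumptions, not part of any specific ingestion API.

import datetime  # not needed yet, but these facts will be compared by timestamp later

incoming_facts = [
    # Direct contradiction: two sources disagree on the same attribute.
    {"entity": "Sudeep", "attribute": "location", "value": "London", "source": "HR",    "timestamp": "2024-05-01"},
    {"entity": "Sudeep", "attribute": "location", "value": "Tokyo",  "source": "Slack", "timestamp": "2024-05-03"},

    # Attribute staleness: same attribute, snapshots from different years.
    {"entity": "Sudeep", "attribute": "salary_year", "value": 2023, "source": "A", "timestamp": "2023-12-31"},
    {"entity": "Sudeep", "attribute": "salary_year", "value": 2024, "source": "B", "timestamp": "2024-12-31"},

    # Entity confusion: the same identifier resolves to two different things.
    {"entity": "JS-101", "attribute": "label", "value": "JavaScript",    "source": "A", "timestamp": "2024-01-10"},
    {"entity": "JS-101", "attribute": "label", "value": "Job Sheet 101", "source": "B", "timestamp": "2024-01-11"},
]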
2. Strategy 1: Authority Ranking (Source Weighting)
Not all sources are equal. You should assign an Authority Score to your connectors.
- Source: HR Database -> Authority: 1.0 (The ultimate truth).
- Source: Slack Channel -> Authority: 0.4 (Maybe rumors).
- Source: Web Scraping -> Authority: 0.1 (Unverified).
Resolution Rule: If HR says "London" and Slack says "Tokyo," the HR fact Overwrites the Slack fact.
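Here is a minimal sketch of that resolution rule in isolation. The authority scores and the fact shape are assumptions for illustration; Section 5 turns the same idea into a graph ingester.

# Hypothetical authority table; tune the scores for your own deployment.
SOURCE_AUTHORITY = {"HR": 1.0, "Jira": 0.8, "Slack": 0.4, "Web": 0.1}

def pick_by_authority(fact_a, fact_b):
    """Return the fact whose source has the higher authority score.

    Each fact is a dict like {"value": "London", "source": "HR"}.
    Unknown sources default to a low score of 0.1.
    """
    score_a = SOURCE_AUTHORITY.get(fact_a["source"], 0.1)
    score_b = SOURCE_AUTHORITY.get(fact_b["source"], 0.1)
    return fact_a if score_a >= score_b else fact_b

winner = pick_by_authority(
    {"value": "London", "source": "HR"},
    {"value": "Tokyo", "source": "Slack"},
)
print(winner["value"])  # -> London: the HR fact overwrites the Slack fact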
3. Strategy 2: Temporal Priority (The "Last Update" Wins)
This is the simplest resolution logic.
- "Whatever fact has the newest
timestampproperty is the truth."
Danger: What if the Newest fact is a typo or a malicious injection? This is why Temporal Priority should usually be used within the same Authority level.
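A minimal sketch of that combined rule, assuming each fact carries a source name and an ISO-8601 timestamp (both field names are illustrative): compare authority first, and only fall back to recency when the authority tier is the same.

from datetime import datetime

SOURCE_AUTHORITY = {"HR": 1.0, "Jira": 0.8, "Slack": 0.4}

def pick_fact(fact_a, fact_b):
    """Authority wins first; within the same authority level, the newest timestamp wins."""
    auth_a = SOURCE_AUTHORITY.get(fact_a["source"], 0.1)
    auth_b = SOURCE_AUTHORITY.get(fact_b["source"], 0.1)
    if auth_a != auth_b:
        return fact_a if auth_a > auth_b else fact_b
    time_a = datetime.fromisoformat(fact_a["timestamp"])
    time_b = datetime.fromisoformat(fact_b["timestamp"])
    return fact_a if time_a >= time_b else fact_b

# Same authority level (both Jira): the newer fact wins.
print(pick_fact(
    {"value": "Tokyo",  "source": "Jira", "timestamp": "2024-06-01"},
    {"value": "London", "source": "Jira", "timestamp": "2024-03-15"},
)["value"])  # -> Tokyo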
4. Strategy 3: Multi-Truth Representation
Sometimes, there is no "True" fact.
- Query: "Is Sudeep working on the Graph project?"
- Graph: Source A says YES, Source B says NO.
Solution: Don't resolve. Store both.
(Sudeep) -[:MENTIONED_AS_MEMBER {source: 'Slack', confidence: 0.6}]-> (Graph)
(Sudeep) -[:NOT_IN_ROSTER {source: 'HR', confidence: 1.0}]-> (Graph)
AI Outcome: The agent can now tell the user: "According to the HR records, Sudeep is not on the team, but there are discussions in Slack indicating he might be contributing." This is Transparent AI.
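A minimal sketch of the multi-truth pattern, using the same plain-dict style as the ingester in Section 5 (a property-graph database would store these as two separate relationships); the formatting helper is a hypothetical name for whatever builds the context passed to the LLM.

# Store both claims as separate edges instead of resolving them.
edges = [
    {"from": "Sudeep", "rel": "MENTIONED_AS_MEMBER", "to": "Graph",
     "source": "Slack", "confidence": 0.6},
    {"from": "Sudeep", "rel": "NOT_IN_ROSTER", "to": "Graph",
     "source": "HR", "confidence": 1.0},
]

def facts_for_prompt(entity, target):
    """Format every conflicting claim so the LLM can present them transparently."""
    lines = []
    for e in edges:
        if e["from"] == entity and e["to"] == target:
            lines.append(f"- {e['source']} (confidence {e['confidence']}): "
                         f"{entity} {e['rel']} {target}")
    return "\n".join(lines)

print(facts_for_prompt("Sudeep", "Graph"))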
graph TD
A[Source A: HR] -->|Fact: London| CR[Conflict Resolver]
B[Source B: Slack] -->|Fact: Tokyo| CR
CR -->|Auth Check| KG[(Knowledge Graph)]
KG -->|Result| F[Final Fact: London]
style A fill:#34A853,color:#fff
style B fill:#f4b400,color:#fff
style F fill:#4285F4,color:#fff
5. Implementation: A Conflict-Aware Ingester
Let's write a Python function that implements Source Authority.
# Authority scores per connector (higher = more trusted).
source_authority = {
    "HR": 1.0,
    "Jira": 0.8,
    "Slack": 0.4
}

# Current state of the graph
current_graph = {
    "Sudeep": {"location": "London", "weight": 1.0}
}

def update_property(entity, key, val, source):
    # Unknown sources get a default authority of 0.1.
    new_weight = source_authority.get(source, 0.1)
    # Entities we have never seen start at weight 0, so any source can create them.
    node = current_graph.setdefault(entity, {"weight": 0})
    current_weight = node.get("weight", 0)
    if new_weight >= current_weight:
        node[key] = val
        node["weight"] = new_weight
        print(f"Updated {entity} {key} to {val} (via {source})")
    else:
        print(f"Rejected {val} from {source}. Current {current_weight} > {new_weight}")

# TEST
update_property("Sudeep", "location", "Tokyo", "Slack")   # REJECTED
update_property("Sudeep", "location", "Remote", "HR")     # UPDATED
6. Summary and Exercises
Conflict resolution is the "Immune System" of your Knowledge Graph.
- Authority Ranking ensures high-fidelity sources win.
- Temporal Priority handles sequential updates.
- Multi-Truth stores conflicting perspectives for the LLM to analyze.
- Transparency is better than False Certainty.
Exercises
- Authority Ranking: You have three sources: Wikipedia, a Peer-Reviewed Journal, and X (Twitter). Rank them from 1 to 10 for a graph about "Medical Statistics."
- Conflict Narrative: How would you prompt an LLM to explain that two departments disagree on a project's deadline?
- The Overwrite Risk: If the CEO makes a typo in an email ("Project starts in 1999" instead of "2009"), and the CEO source has a high authority, how does your system recover from high-authority errors?
In the next lesson, we will look at how to scale this entire process: Scaling Ingestion with Distributed Systems.