ValidationResult Schema Reference

Overview

The ValidationResult schema represents the output of the Judge arm after it validates outputs against schemas, acceptance criteria, facts, and quality standards. This multi-layer validation ensures that outputs are structurally correct and factually accurate, and that they meet quality thresholds.

Used By: Judge Arm (output), Orchestrator (for decision-making)
Primary Endpoint: POST /validate
Format: JSON

Structure

ValidationResult

Complete validation output with issues, confidence, and quality metrics.

interface ValidationResult {
  valid: boolean;                   // Required: No errors (warnings/info OK)
  confidence: number;               // Required: 0.0-1.0 confidence score
  issues: ValidationIssue[];        // Required: List of issues found
  passed_criteria?: string[];       // Optional: Criteria that passed
  failed_criteria?: string[];       // Optional: Criteria that failed
  quality_score: number;            // Required: 0.0-1.0 overall quality
  metadata?: ValidationMetadata;    // Optional: Additional info
}

interface ValidationIssue {
  severity: 'error' | 'warning' | 'info';  // Required: Issue severity
  type: string;                            // Required: Issue type
  message: string;                         // Required: Human-readable description
  location?: string;                       // Optional: Where the issue was found
  suggestion?: string;                     // Optional: How to fix it
}

interface ValidationMetadata {
  validation_types_run: string[];   // Types executed (schema, facts, etc.)
  total_issues: number;             // Total issue count
  error_count: number;              // Number of errors
  warning_count: number;            // Number of warnings
  info_count: number;               // Number of info messages
  duration_ms: number;              // Validation execution time
  model?: string;                   // LLM model used (if applicable)
}
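
Consumers typically receive this structure as parsed JSON, so a lightweight runtime check can catch malformed results before use. The following type guard is an illustrative sketch, not part of the Judge API; it verifies only the required fields:

function isValidationResult(value: any): value is ValidationResult {
  // Check the four required fields and their basic constraints
  return (
    typeof value === 'object' && value !== null &&
    typeof value.valid === 'boolean' &&
    typeof value.confidence === 'number' &&
    value.confidence >= 0 && value.confidence <= 1 &&
    Array.isArray(value.issues) &&
    typeof value.quality_score === 'number'
  );
}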

Field Definitions

valid (required)

Type: boolean
Description: Whether the output is considered valid (no errors)

Validation Logic:

  • true: No issues with severity: 'error' (warnings and info are acceptable)
  • false: At least one issue with severity: 'error'

Examples:

// Valid output (warnings OK)
{
  "valid": true,
  "issues": [
    {"severity": "warning", "message": "Consider adding docstring"},
    {"severity": "info", "message": "Code style follows PEP 8"}
  ]
}

// Invalid output (errors present)
{
  "valid": false,
  "issues": [
    {"severity": "error", "message": "Missing required field 'tests'"},
    {"severity": "warning", "message": "Function name could be more descriptive"}
  ]
}
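
In other words, valid is derivable from the issue list itself. A one-line sketch in TypeScript, assuming the ValidationIssue shape defined above:

// valid is true exactly when no issue has severity 'error'
const valid = result.issues.every(issue => issue.severity !== 'error');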

confidence (required)

Type: number
Constraints: 0.0-1.0
Description: Confidence in the validation result (higher = more certain)

Confidence Levels:

Range     Interpretation  Meaning
0.9-1.0   Very High       Extremely confident in validation
0.7-0.89  High            Confident, minor ambiguities
0.5-0.69  Medium          Moderate confidence, some uncertainty
0.3-0.49  Low             Significant uncertainty
0.0-0.29  Very Low        Highly uncertain, review manually

Factors Affecting Confidence:

  • Clear vs ambiguous acceptance criteria
  • Availability of trusted sources for fact-checking
  • Complexity of schema validation
  • Presence of hallucination indicators
  • Quality of LLM reasoning (if used)

Examples:

// High confidence - clear violations
{
  "valid": false,
  "confidence": 0.95,
  "issues": [
    {"severity": "error", "message": "Missing required field 'email'"}
  ]
}

// Low confidence - ambiguous criteria
{
  "valid": true,
  "confidence": 0.45,
  "issues": [
    {"severity": "warning", "message": "Criterion 'code is good' is subjective"}
  ]
}
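
When reporting results, the numeric score can be mapped onto the levels in the table above. A minimal sketch (the level labels mirror the table; the function itself is illustrative, not part of the Judge API):

function confidenceLevel(confidence: number): string {
  if (confidence >= 0.9) return 'Very High';
  if (confidence >= 0.7) return 'High';
  if (confidence >= 0.5) return 'Medium';
  if (confidence >= 0.3) return 'Low';
  return 'Very Low';
}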

issues (required)

Type: array of ValidationIssue objects
Description: List of all issues found during validation

ValidationIssue Structure

severity (required)

Type: enum - 'error' | 'warning' | 'info'
Description: Severity level of the issue

Severity Definitions:

error - Blocking issue, prevents output acceptance

  • Missing required fields
  • Schema violations
  • Failed acceptance criteria
  • Factual hallucinations
  • Critical quality issues

warning - Non-blocking issue, should be addressed but not critical

  • Suboptimal implementations
  • Style inconsistencies
  • Minor quality concerns
  • Deprecated patterns

info - Informational, no action required

  • Best practice suggestions
  • Optimization opportunities
  • Context notes

Example:

{
  "issues": [
    {
      "severity": "error",
      "type": "schema_violation",
      "message": "Missing required field 'tests'"
    },
    {
      "severity": "warning",
      "type": "style_issue",
      "message": "Function name uses camelCase instead of snake_case"
    },
    {
      "severity": "info",
      "type": "optimization",
      "message": "Consider using list comprehension for better performance"
    }
  ]
}

type (required)

Type: string
Description: Categorizes the issue for filtering and tracking

Common Issue Types:

Schema Validation:

  • schema_violation - Output doesn't match expected schema
  • missing_field - Required field is absent
  • invalid_type - Field has wrong data type
  • constraint_violation - Field violates constraints (min/max, regex, etc.)

Criteria Validation:

  • criteria_not_met - Acceptance criterion failed
  • criteria_ambiguous - Criterion is unclear or subjective

Fact Checking:

  • fact_mismatch - Stated fact contradicts trusted sources
  • unsupported_claim - Claim not found in sources
  • source_missing - Citation lacks source

Hallucination Detection:

  • hallucination - LLM fabricated information
  • confidence_mismatch - High confidence on uncertain facts
  • detail_inconsistency - Details contradict each other

Quality Assessment:

  • readability_issue - Code/text is hard to understand
  • complexity_issue - Unnecessarily complex solution
  • performance_issue - Inefficient implementation
  • security_issue - Potential security vulnerability
  • style_issue - Code style inconsistencies

Example:

{
  "issues": [
    {"type": "schema_violation", "message": "..."},
    {"type": "hallucination", "message": "..."},
    {"type": "security_issue", "message": "..."}
  ]
}
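
Because type values are plain strings, consumers often group them back into the validation categories above for filtering and reporting. A sketch of one way to do this (the category keys are assumptions for illustration, not part of the schema):

const ISSUE_CATEGORIES: Record<string, string[]> = {
  schema: ['schema_violation', 'missing_field', 'invalid_type', 'constraint_violation'],
  criteria: ['criteria_not_met', 'criteria_ambiguous'],
  facts: ['fact_mismatch', 'unsupported_claim', 'source_missing'],
  hallucination: ['hallucination', 'confidence_mismatch', 'detail_inconsistency'],
  quality: ['readability_issue', 'complexity_issue', 'performance_issue', 'security_issue', 'style_issue'],
};

function categoryOf(issueType: string): string {
  for (const [category, types] of Object.entries(ISSUE_CATEGORIES)) {
    if (types.includes(issueType)) return category;
  }
  return 'other';  // unknown types fall through to a catch-all bucket
}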

message (required)

Type: string
Constraints: 10-500 characters
Description: Human-readable description of the issue

Best Practices:

  • Be specific and actionable
  • Include relevant details (field names, expected vs actual values)
  • Use clear, non-technical language when possible
  • Avoid jargon unless necessary

Examples:

// Good messages
"Missing required field 'email' in user object"
"CVSS score stated as 9.8 but actual score is 7.5 according to NVD"
"Function 'calc_avg' has cyclomatic complexity of 15 (max recommended: 10)"

// Bad messages
"Schema error"  // Too vague
"The code doesn't follow best practices"  // Not specific

location (optional)

Type: string
Description: Where the issue was found (field path, line number, function name)

Format Examples:

// Field paths (dot notation)
"user.profile.email"
"tasks[2].status"

// Code locations
"function:calculate_average"
"line:42"
"file:auth.py:line:87"

// General locations
"root"
"N/A"

suggestion (optional)

Type: string
Constraints: 10-500 characters
Description: Actionable advice on how to fix the issue

Examples:

{
  "issue": "Missing required field 'tests'",
  "suggestion": "Add a 'tests' field containing unit tests for the code"
},
{
  "issue": "Function has no docstring",
  "suggestion": "Add a docstring explaining parameters, return value, and example usage"
},
{
  "issue": "CVSS score mismatch",
  "suggestion": "Update CVSS score to 7.5 based on https://nvd.nist.gov/vuln/detail/CVE-2024-12345"
}

passed_criteria (optional)

Type: array of strings
Description: Acceptance criteria that were successfully met

Example:

{
  "passed_criteria": [
    "Code implements sorting functionality",
    "Function has proper naming",
    "Edge cases are handled"
  ]
}

failed_criteria (optional)

Type: array of strings
Description: Acceptance criteria that were not met

Example:

{
  "failed_criteria": [
    "Tests are included",
    "Performance is O(n log n) or better"
  ]
}
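
Together, passed_criteria and failed_criteria give a quick pass rate for reporting. A minimal sketch, assuming both fields may be absent:

function criteriaPassRate(result: ValidationResult): number {
  const passed = result.passed_criteria?.length ?? 0;
  const failed = result.failed_criteria?.length ?? 0;
  const total = passed + failed;
  // Assumption: if no criteria were evaluated, report a pass rate of 1.0
  return total === 0 ? 1.0 : passed / total;
}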

quality_score (required)

Type: number
Constraints: 0.0-1.0
Description: Overall quality assessment of the output

Quality Scoring Rubric:

Score Range  Grade      Interpretation
0.9-1.0      Excellent  Production-ready, minimal issues
0.7-0.89     Good       Minor improvements needed
0.5-0.69     Fair       Moderate issues, rework suggested
0.3-0.49     Poor       Significant issues, major rework required
0.0-0.29     Very Poor  Unacceptable quality, restart recommended

Factors Considered:

  • Correctness (does it work?)
  • Completeness (meets all requirements?)
  • Readability (easy to understand?)
  • Maintainability (easy to modify?)
  • Performance (efficient?)
  • Security (safe from vulnerabilities?)
  • Style (consistent formatting?)

Example:

{
  "quality_score": 0.85,
  "issues": [
    {"severity": "warning", "type": "style_issue", "message": "Minor style inconsistency"},
    {"severity": "info", "type": "optimization", "message": "Could use list comprehension"}
  ]
}
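
The rubric can be mirrored in code when presenting scores to users. An illustrative sketch whose grade labels come from the table above:

function qualityGrade(score: number): string {
  if (score >= 0.9) return 'Excellent';
  if (score >= 0.7) return 'Good';
  if (score >= 0.5) return 'Fair';
  if (score >= 0.3) return 'Poor';
  return 'Very Poor';
}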

metadata (optional)

Type: object
Description: Additional information about the validation process

Common Metadata Fields:

  • validation_types_run: Types of validation performed
  • total_issues: Total number of issues found
  • error_count: Number of errors
  • warning_count: Number of warnings
  • info_count: Number of info messages
  • duration_ms: Validation execution time
  • model: LLM model used (if applicable)

Example:

{
  "metadata": {
    "validation_types_run": ["schema", "criteria", "quality"],
    "total_issues": 3,
    "error_count": 1,
    "warning_count": 1,
    "info_count": 1,
    "duration_ms": 1250,
    "model": "gpt-3.5-turbo"
  }
}
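
The count fields are redundant with issues, so consumers can cross-check them. A minimal sketch that recomputes the counts from the issue list:

function issueCounts(issues: ValidationIssue[]) {
  const counts = { error_count: 0, warning_count: 0, info_count: 0 };
  for (const issue of issues) {
    if (issue.severity === 'error') counts.error_count++;
    else if (issue.severity === 'warning') counts.warning_count++;
    else counts.info_count++;
  }
  return { total_issues: issues.length, ...counts };
}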

Complete Examples

Example 1: Valid Output with Warnings

{
  "valid": true,
  "confidence": 0.88,
  "issues": [
    {
      "severity": "warning",
      "type": "style_issue",
      "message": "Function name uses camelCase instead of snake_case",
      "location": "function:sortList",
      "suggestion": "Rename to 'sort_list' to follow Python naming conventions"
    },
    {
      "severity": "info",
      "type": "optimization",
      "message": "Consider adding type hints for better code clarity",
      "location": "function:sortList",
      "suggestion": "Add type hints like 'def sort_list(lst: List[int]) -> List[int]:'"
    }
  ],
  "passed_criteria": [
    "Code implements sorting functionality",
    "Tests are included",
    "Edge cases are handled"
  ],
  "failed_criteria": [],
  "quality_score": 0.82,
  "metadata": {
    "validation_types_run": ["schema", "criteria", "quality"],
    "total_issues": 2,
    "error_count": 0,
    "warning_count": 1,
    "info_count": 1,
    "duration_ms": 950,
    "model": "gpt-3.5-turbo"
  }
}

Example 2: Invalid Output (Schema Violation)

{
  "valid": false,
  "confidence": 0.95,
  "issues": [
    {
      "severity": "error",
      "type": "missing_field",
      "message": "Missing required field 'tests'",
      "location": "root",
      "suggestion": "Add a 'tests' field containing unit tests for the code"
    },
    {
      "severity": "error",
      "type": "criteria_not_met",
      "message": "Acceptance criterion not met: Tests are included",
      "location": "N/A",
      "suggestion": "Review output and ensure tests are included"
    },
    {
      "severity": "warning",
      "type": "style_issue",
      "message": "Function lacks docstring",
      "location": "function:sort_list",
      "suggestion": "Add docstring explaining parameters and return value"
    }
  ],
  "passed_criteria": [
    "Code implements sorting functionality"
  ],
  "failed_criteria": [
    "Tests are included"
  ],
  "quality_score": 0.55,
  "metadata": {
    "validation_types_run": ["schema", "criteria", "quality"],
    "total_issues": 3,
    "error_count": 2,
    "warning_count": 1,
    "info_count": 0,
    "duration_ms": 1150
  }
}

Example 3: Hallucination Detection

{
  "valid": false,
  "confidence": 0.72,
  "issues": [
    {
      "severity": "error",
      "type": "hallucination",
      "message": "CVSS score stated as 9.8 but actual score is 7.5 according to NVD",
      "location": "summary:cvss_score",
      "suggestion": "Update CVSS score to 7.5 based on https://nvd.nist.gov/vuln/detail/CVE-2024-12345"
    },
    {
      "severity": "error",
      "type": "hallucination",
      "message": "Affected versions claim 'prior to 1.24.0' but actually 'prior to 1.24.1'",
      "location": "summary:affected_versions",
      "suggestion": "Correct affected versions to 'prior to 1.24.1'"
    },
    {
      "severity": "error",
      "type": "unsupported_claim",
      "message": "Discoverer 'Alice Smith' not found in sources",
      "location": "summary:discoverer",
      "suggestion": "Remove unsupported claim or provide valid source"
    },
    {
      "severity": "warning",
      "type": "fact_mismatch",
      "message": "Discovery date stated as March but actual date is February",
      "location": "summary:discovery_date",
      "suggestion": "Correct discovery date to February 2024"
    }
  ],
  "passed_criteria": [],
  "failed_criteria": [
    "All facts are supported by trusted sources",
    "No hallucinations present"
  ],
  "quality_score": 0.35,
  "metadata": {
    "validation_types_run": ["facts", "hallucination"],
    "total_issues": 4,
    "error_count": 3,
    "warning_count": 1,
    "info_count": 0,
    "duration_ms": 2800,
    "model": "gpt-3.5-turbo"
  }
}

Example 4: Quality Assessment (Low Score)

{
  "valid": true,
  "confidence": 0.68,
  "issues": [
    {
      "severity": "warning",
      "type": "complexity_issue",
      "message": "Function has cyclomatic complexity of 15 (recommended max: 10)",
      "location": "function:calculate_statistics",
      "suggestion": "Refactor into smaller helper functions"
    },
    {
      "severity": "warning",
      "type": "performance_issue",
      "message": "Nested loops result in O(n²) complexity",
      "location": "function:find_duplicates",
      "suggestion": "Use a set-based approach for O(n) complexity"
    },
    {
      "severity": "warning",
      "type": "security_issue",
      "message": "User input not sanitized before use in shell command",
      "location": "line:87",
      "suggestion": "Use subprocess with parameterized commands instead of shell=True"
    },
    {
      "severity": "warning",
      "type": "readability_issue",
      "message": "Variable name 'x' is not descriptive",
      "location": "function:process_data",
      "suggestion": "Rename to descriptive name like 'user_count' or 'total_items'"
    },
    {
      "severity": "info",
      "type": "style_issue",
      "message": "Line length exceeds 88 characters (PEP 8 recommendation)",
      "location": "line:42",
      "suggestion": "Break line into multiple lines"
    }
  ],
  "passed_criteria": [
    "Code is functional",
    "Tests pass"
  ],
  "failed_criteria": [],
  "quality_score": 0.52,
  "metadata": {
    "validation_types_run": ["quality"],
    "total_issues": 5,
    "error_count": 0,
    "warning_count": 4,
    "info_count": 1,
    "duration_ms": 3500,
    "model": "gpt-4"
  }
}

Usage Patterns

Pattern 1: Interpreting Validation Results

function interpretValidationResult(result: ValidationResult): string {
  if (result.valid && result.quality_score >= 0.8) {
    return '✅ Output is excellent and ready to use';
  }

  if (result.valid && result.quality_score >= 0.6) {
    return '⚠️ Output is acceptable but could be improved';
  }

  if (result.valid && result.quality_score < 0.6) {
    return '⚠️ Output is valid but quality is below threshold';
  }

  if (!result.valid && result.confidence > 0.8) {
    return '❌ Output is invalid (high confidence)';
  }

  if (!result.valid && result.confidence < 0.5) {
    return '❓ Output may be invalid (low confidence, manual review needed)';
  }

  return '❌ Output is invalid';
}

Pattern 2: Filtering Issues by Severity

from typing import List

def get_blocking_issues(result: ValidationResult) -> List[ValidationIssue]:
    """Get only error-level issues that block acceptance."""
    return [issue for issue in result.issues if issue.severity == "error"]

def has_security_issues(result: ValidationResult) -> bool:
    """Check if any security issues were found."""
    return any(issue.type == "security_issue" for issue in result.issues)

# Example usage
result = await judge_client.validate(output)

blocking = get_blocking_issues(result)
if blocking:
    print(f"❌ {len(blocking)} blocking issues found:")
    for issue in blocking:
        print(f"  - {issue.message}")

if has_security_issues(result):
    print("🔒 Security issues detected, review required")

Pattern 3: Automatic Retry with Lower Quality Threshold

async function validateWithRetry(
  output: any,
  minQualityScore: number = 0.8,
  maxRetries: number = 3
): Promise<ValidationResult> {
  let currentQuality = minQualityScore;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = await judgeClient.validate({
      output,
      validationTypes: ['schema', 'criteria', 'quality']
    });

    // If valid and meets quality threshold, return
    if (result.valid && result.quality_score >= currentQuality) {
      console.log(`✅ Validation passed (attempt ${attempt})`);
      return result;
    }

    // Lower quality threshold for subsequent attempts
    currentQuality = Math.max(0.5, currentQuality - 0.1);

    console.log(`❌ Attempt ${attempt} failed (quality: ${result.quality_score.toFixed(2)})`);

    if (attempt < maxRetries) {
      console.log(`Retrying with lower threshold: ${currentQuality.toFixed(2)}...`);
    }
  }

  throw new Error('Validation failed after maximum retries');
}

Pattern 4: Issue Aggregation and Reporting

from collections import defaultdict

def generate_validation_report(result: ValidationResult) -> str:
    """Generate human-readable validation report."""

    report = []
    report.append(f"Validation Result: {'✅ PASS' if result.valid else '❌ FAIL'}")
    report.append(f"Confidence: {result.confidence:.2f}")
    report.append(f"Quality Score: {result.quality_score:.2f}")
    report.append("")

    # Group issues by severity
    issues_by_severity = defaultdict(list)
    for issue in result.issues:
        issues_by_severity[issue.severity].append(issue)

    # Report errors
    if "error" in issues_by_severity:
        report.append(f"🔴 ERRORS ({len(issues_by_severity['error'])})")
        for issue in issues_by_severity["error"]:
            report.append(f"  • [{issue.type}] {issue.message}")
            if issue.suggestion:
                report.append(f"    → {issue.suggestion}")
        report.append("")

    # Report warnings
    if "warning" in issues_by_severity:
        report.append(f"🟡 WARNINGS ({len(issues_by_severity['warning'])})")
        for issue in issues_by_severity["warning"]:
            report.append(f"  • [{issue.type}] {issue.message}")
        report.append("")

    # Report criteria results
    if result.passed_criteria:
        report.append(f"✅ PASSED CRITERIA ({len(result.passed_criteria)})")
        for criterion in result.passed_criteria:
            report.append(f"  • {criterion}")
        report.append("")

    if result.failed_criteria:
        report.append(f"❌ FAILED CRITERIA ({len(result.failed_criteria)})")
        for criterion in result.failed_criteria:
            report.append(f"  • {criterion}")
        report.append("")

    return "\n".join(report)

# Example usage
result = await judge_client.validate(output)
print(generate_validation_report(result))

Best Practices

1. Always Check Both valid and quality_score

Why: An output can be valid but still low quality.
How: Set minimum thresholds for both.

if result.valid and result.quality_score >= 0.7:
    accept_output(output)
else:
    reject_output(output)

2. Filter Issues by Severity for Decision-Making

Why: Not all issues are blocking.
How: Only treat errors as blocking, warnings as advisory.

const errors = result.issues.filter(i => i.severity === 'error');
if (errors.length === 0) {
  // Accept with warnings
  acceptWithWarnings(output, result);
} else {
  // Reject due to errors
  reject(output, errors);
}

3. Use Confidence Scores for Manual Review Triggers

Why: Low confidence indicates uncertainty.
How: Trigger manual review for low confidence.

if result.confidence < 0.6:
    send_for_manual_review(output, result)
elif result.valid:
    accept_automatically(output)
else:
    reject_automatically(output)

4. Track Issue Types Over Time

Why: Identify patterns and improve prompts.
How: Log issue types for analysis.

// Track issue types in metrics
for (const issue of result.issues) {
  metrics.recordIssue(issue.type, issue.severity);
}

// Analyze trends
const commonIssues = metrics.getTopIssues({ limit: 10 });
console.log('Most common issues:', commonIssues);


JSON Schema

Complete JSON Schema for validation:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "ValidationResult",
  "type": "object",
  "required": ["valid", "confidence", "issues", "quality_score"],
  "properties": {
    "valid": {
      "type": "boolean",
      "description": "Whether output is valid (no errors)"
    },
    "confidence": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0,
      "description": "Confidence in validation result"
    },
    "issues": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/ValidationIssue"
      },
      "description": "List of issues found"
    },
    "passed_criteria": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Acceptance criteria that passed"
    },
    "failed_criteria": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Acceptance criteria that failed"
    },
    "quality_score": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0,
      "description": "Overall quality score"
    },
    "metadata": {
      "type": "object",
      "properties": {
        "validation_types_run": {
          "type": "array",
          "items": {"type": "string"}
        },
        "total_issues": {"type": "integer"},
        "error_count": {"type": "integer"},
        "warning_count": {"type": "integer"},
        "info_count": {"type": "integer"},
        "duration_ms": {"type": "number"},
        "model": {"type": "string"}
      }
    }
  },
  "definitions": {
    "ValidationIssue": {
      "type": "object",
      "required": ["severity", "type", "message"],
      "properties": {
        "severity": {
          "type": "string",
          "enum": ["error", "warning", "info"],
          "description": "Issue severity level"
        },
        "type": {
          "type": "string",
          "description": "Issue type (e.g., schema_violation, hallucination)"
        },
        "message": {
          "type": "string",
          "minLength": 10,
          "maxLength": 500,
          "description": "Human-readable issue description"
        },
        "location": {
          "type": "string",
          "description": "Where the issue was found"
        },
        "suggestion": {
          "type": "string",
          "minLength": 10,
          "maxLength": 500,
          "description": "How to fix the issue"
        }
      }
    }
  }
}
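
The schema can be used directly with a standard JSON Schema validator. A minimal sketch using Ajv (an assumed dependency; the variable validationResultSchema stands for the schema object above):

import Ajv from 'ajv';

const ajv = new Ajv();
const validateResult = ajv.compile(validationResultSchema);

if (!validateResult(candidate)) {
  // validate.errors lists each violation with its JSON path
  console.error('Not a ValidationResult:', validateResult.errors);
}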