Error Handling
kube-ingress-dash implements comprehensive error handling to provide a reliable and resilient user experience.
Architecture Overview
The error handling system consists of multiple layers:
- Error Classification - Categorizes errors as transient or permanent
- Retry Logic - Automatically retries transient errors with exponential backoff (implemented)
- Circuit Breaker - Prevents cascading failures during outages (implemented)
- Error Boundaries - Catches and displays UI errors gracefully
- User Feedback - Provides clear, actionable error messages
The core error handling infrastructure (classification, retry logic, and circuit breaker) has been implemented. Integration with all API routes is ongoing.
For detailed architecture diagrams including error handling flow, circuit breaker state machine, and retry logic, see the Production Features Architecture documentation.
Error Classification
Errors are automatically classified into two categories:
Transient Errors
Temporary failures that may succeed on retry:
- Network timeouts (ETIMEDOUT, ECONNRESET)
- Connection refused (ECONNREFUSED)
- DNS resolution failures (ENOTFOUND)
- HTTP 503 Service Unavailable
- HTTP 429 Too Many Requests
- Temporary Kubernetes API unavailability
Handling: Automatically retried with exponential backoff
Permanent Errors
Failures that won't succeed on retry:
- Authentication failures (401 Unauthorized)
- Permission denied (403 Forbidden)
- Resource not found (404 Not Found)
- Invalid requests (400 Bad Request)
- RBAC permission errors
Handling: Returned immediately to user with helpful message
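The transient/permanent split above can be expressed as a small predicate. The sketch below is illustrative, not the actual kube-ingress-dash code: it assumes Node.js-style error codes and an optional HTTP status on the error object, and the names `isTransient` and `ApiError` are hypothetical.

```typescript
// Illustrative classifier for the transient/permanent split described above.
// The ApiError shape and isTransient name are assumptions, not the real API.

const TRANSIENT_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "ECONNREFUSED", "ENOTFOUND"]);
const TRANSIENT_STATUS = new Set([429, 503]);

interface ApiError {
  code?: string;        // Node.js network error code, if any
  statusCode?: number;  // HTTP status, if the request reached the server
}

function isTransient(err: ApiError): boolean {
  if (err.code && TRANSIENT_CODES.has(err.code)) return true;
  if (err.statusCode && TRANSIENT_STATUS.has(err.statusCode)) return true;
  return false; // 400/401/403/404 and anything unrecognized: fail fast
}
```

Anything not positively identified as transient is treated as permanent, so unknown failures surface to the user immediately rather than being retried blindly.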
Retry Logic
Transient errors are automatically retried using exponential backoff:
// Retry configuration
{
  maxAttempts: 3,
  initialDelay: 1000,    // 1 second
  maxDelay: 30000,       // 30 seconds
  backoffMultiplier: 2   // Double delay each retry
}
Retry Sequence
- First attempt - Immediate
- First retry - After 1 second
- Second retry - After 2 seconds
- Third retry - After 4 seconds
- Give up - Return error to user
Example
Attempt 1: Failed (ETIMEDOUT)
Wait 1s...
Attempt 2: Failed (ETIMEDOUT)
Wait 2s...
Attempt 3: Success ✓
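The backoff loop can be sketched as follows. This is a hypothetical implementation, not the project's actual code: it interprets `maxAttempts` as the number of retries after the initial attempt (matching the retry sequence above) and takes a pluggable `isTransient` check so permanent errors surface immediately.

```typescript
// Hypothetical exponential-backoff helper; not the actual kube-ingress-dash code.
interface RetryConfig {
  maxAttempts: number;       // retries after the initial attempt (assumption)
  initialDelay: number;      // ms
  maxDelay: number;          // ms
  backoffMultiplier: number;
}

const defaultRetryConfig: RetryConfig = {
  maxAttempts: 3,
  initialDelay: 1000,
  maxDelay: 30000,
  backoffMultiplier: 2,
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  cfg: RetryConfig = defaultRetryConfig,
): Promise<T> {
  let delay = cfg.initialDelay;
  for (let retries = 0; ; retries++) {
    try {
      return await fn(); // first attempt is immediate
    } catch (err) {
      // Permanent errors and an exhausted retry budget surface to the caller.
      if (!isTransient(err) || retries >= cfg.maxAttempts) throw err;
      await sleep(delay);
      delay = Math.min(delay * cfg.backoffMultiplier, cfg.maxDelay); // 1s, 2s, 4s, ...
    }
  }
}
```

Capping the delay at `maxDelay` keeps the wait bounded even if the attempt budget were raised.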
Circuit Breaker
Protects the application and Kubernetes API from cascading failures.
States
Closed (Normal Operation)
- All requests pass through
- Failures are tracked
- Transitions to Open if failure threshold exceeded
Open (Failing Fast)
- Requests fail immediately without calling Kubernetes API
- Prevents overload during outages
- Transitions to Half-Open after timeout
Half-Open (Testing Recovery)
- Limited requests allowed through
- Tests if service has recovered
- Transitions to Closed if successful, back to Open if failed
Configuration
{
  failureThreshold: 0.5,  // Open at 50% failure rate
  windowSize: 30000,      // 30-second window
  openTimeout: 60000      // Wait 60s before testing recovery
}
Behavior Example
Time 0s: Circuit Closed - Normal operation
Time 10s: 15 failures out of 30 requests (50%)
Time 10s: Circuit Opens - Fail fast mode
Time 70s: Circuit Half-Open - Testing recovery
Time 71s: Test request succeeds
Time 71s: Circuit Closed - Back to normal
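The three states and transitions above can be sketched as a small class. This is a minimal illustration, not the dashboard's actual breaker: it assumes an in-memory sliding window and a single half-open trial request, and the injectable clock and minimum-sample guard are additions for testability rather than part of the documented configuration.

```typescript
// Minimal circuit-breaker sketch matching the documented states and thresholds.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private outcomes: { at: number; ok: boolean }[] = [];
  private openedAt = 0;

  constructor(
    private failureThreshold = 0.5,        // open at 50% failure rate
    private windowSize = 30_000,           // 30-second sliding window (ms)
    private openTimeout = 60_000,          // wait 60s before testing recovery (ms)
    private minSamples = 10,               // assumption: don't trip on tiny samples
    private now: () => number = Date.now,  // injectable clock for testing
  ) {}

  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.openTimeout) {
      this.state = "half-open"; // allow a trial request through
    }
    return this.state;
  }

  allowRequest(): boolean {
    return this.getState() !== "open"; // Open = fail fast, no Kubernetes API call
  }

  record(ok: boolean): void {
    const t = this.now();
    if (this.state === "half-open") {
      // A single trial decides: success closes the circuit, failure reopens it.
      this.state = ok ? "closed" : "open";
      this.openedAt = t;
      this.outcomes = [];
      return;
    }
    this.outcomes = this.outcomes.filter((o) => t - o.at < this.windowSize);
    this.outcomes.push({ at: t, ok });
    const failures = this.outcomes.filter((o) => !o.ok).length;
    if (
      this.outcomes.length >= this.minSamples &&
      failures / this.outcomes.length >= this.failureThreshold // e.g. 15/30 = 50%
    ) {
      this.state = "open";
      this.openedAt = t;
    }
  }
}
```

Callers would check `allowRequest()` before hitting the Kubernetes API and call `record()` with the outcome afterward.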
Error Boundaries
React Error Boundaries catch and handle UI errors gracefully.
Dashboard Error Boundary
Catches errors in the entire dashboard:
<DashboardErrorBoundary>
  <Dashboard />
</DashboardErrorBoundary>
Fallback: Full-page error screen with retry option
Component Error Boundaries
Catch errors in specific components:
- IngressListErrorBoundary - Ingress list rendering errors
- FiltersErrorBoundary - Filter component errors
Fallback: Component-level error message, rest of UI continues working
Benefits
- Prevents entire app from crashing
- Isolates errors to affected components
- Provides recovery options
- Maintains user context
User-Facing Error Messages
Clear, actionable error messages help users resolve issues:
Permission Errors
Permission Error
You don't have sufficient permissions to access Kubernetes resources.
Check your RBAC configuration.
[View RBAC Setup Documentation] [Retry]
API Errors
API Error
There was an issue connecting to the Kubernetes API.
Please check your cluster configuration.
[Retry]
Generic Errors
Something went wrong
An unexpected error occurred. Please try again.
[Retry]
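The three message categories above suggest a simple mapping from a classified error to user-facing copy. The sketch below is illustrative; the error shape and the `userMessage` helper are hypothetical names, not the dashboard's actual components.

```typescript
// Hypothetical mapping from a classified error to user-facing message copy.
interface UserMessage {
  title: string;
  body: string;
  actions: string[];
}

function userMessage(err: { statusCode?: number; code?: string }): UserMessage {
  // Auth/RBAC failures get actionable guidance plus a documentation link.
  if (err.statusCode === 401 || err.statusCode === 403) {
    return {
      title: "Permission Error",
      body: "You don't have sufficient permissions to access Kubernetes resources. Check your RBAC configuration.",
      actions: ["View RBAC Setup Documentation", "Retry"],
    };
  }
  // Network errors and server-side failures point at cluster connectivity.
  if (err.code || (err.statusCode && err.statusCode >= 500) || err.statusCode === 429) {
    return {
      title: "API Error",
      body: "There was an issue connecting to the Kubernetes API. Please check your cluster configuration.",
      actions: ["Retry"],
    };
  }
  // Everything else falls back to the generic message.
  return {
    title: "Something went wrong",
    body: "An unexpected error occurred. Please try again.",
    actions: ["Retry"],
  };
}
```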
Error Logging
Errors are logged with context for debugging:
{
  "level": "error",
  "message": "Kubernetes API request failed",
  "error": {
    "code": "ETIMEDOUT",
    "message": "Request timeout",
    "statusCode": null
  },
  "context": {
    "operation": "listIngresses",
    "namespace": "default",
    "attempt": 2,
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
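A logger producing entries in that shape could look like the following sketch; `buildErrorEntry`, `logError`, and the field types are illustrative assumptions, not the project's actual logger.

```typescript
// Hypothetical helper producing log entries shaped like the example above.
interface ErrorDetails {
  code?: string;
  message: string;
  statusCode: number | null;
}

interface ErrorContext {
  operation: string;
  namespace?: string;
  attempt?: number;
}

function buildErrorEntry(message: string, error: ErrorDetails, context: ErrorContext) {
  return {
    level: "error" as const,
    message,
    error,
    // Timestamp is attached at log time so every entry is self-describing.
    context: { ...context, timestamp: new Date().toISOString() },
  };
}

// Emit one JSON object per line, e.g. for collection by the cluster's log pipeline.
function logError(message: string, error: ErrorDetails, context: ErrorContext): void {
  console.error(JSON.stringify(buildErrorEntry(message, error, context)));
}
```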
Monitoring and Alerts
Metrics to Monitor
- Error rate - Percentage of failed requests
- Circuit breaker state - Open/Closed/Half-Open
- Retry attempts - Number of retries per request
- Error types - Distribution of error categories
Recommended Alerts
- High error rate - Alert if error rate > 10% for 5 minutes
- Circuit breaker open - Alert when circuit opens
- Repeated failures - Alert if same error occurs > 10 times
- Permission errors - Alert on RBAC errors (may indicate misconfiguration)
Troubleshooting
High Error Rate
Symptoms: Many requests failing
Possible Causes:
- Kubernetes API unavailable or slow
- Network connectivity issues
- RBAC permissions incorrect
- Resource limits too low
Solutions:
- Check Kubernetes API health
- Verify network connectivity
- Review RBAC configuration
- Increase resource limits
Circuit Breaker Frequently Opening
Symptoms: Circuit breaker opens repeatedly
Possible Causes:
- Kubernetes API instability
- Application resource constraints
- Network issues
Solutions:
- Investigate Kubernetes API health
- Increase application resources
- Adjust circuit breaker thresholds
- Check network latency
Permission Errors
Symptoms: 403 Forbidden errors
Possible Causes:
- Missing RBAC permissions
- Incorrect service account
- Namespace restrictions
Solutions:
- Verify ClusterRole includes all required permissions
- Check ClusterRoleBinding references correct ServiceAccount
- Ensure ServiceAccount is mounted in pod
- Review RBAC Setup Guide
Best Practices
- Monitor error rates - Set up alerts for unusual error patterns
- Review logs regularly - Look for recurring errors
- Test error scenarios - Verify error handling works as expected
- Update RBAC - Ensure permissions match application needs
- Tune circuit breaker - Adjust thresholds based on your environment
Configuration
Error handling configuration is currently hardcoded in the implementation. Environment variable configuration is planned for a future release.
Current default configuration:
- Retry attempts: 3
- Initial delay: 1000ms
- Max delay: 30000ms
- Circuit breaker failure threshold: 50%
- Circuit breaker window: 30 seconds
- Circuit breaker timeout: 60 seconds