2. Authentication and Authorization Architecture¶

Date: 2025-12-30

Status¶

Accepted

Context¶

Imbi v2 requires a comprehensive authentication and authorization system to: - Secure API endpoints (all except /status and /docs) - Support multiple authentication methods for different use cases - Enable flexible permission management for users and service accounts - Provide audit logging for compliance and security monitoring - Support both UI users and inter-service communication

Requirements¶

Authentication Methods:
Local username/password for internal users
OAuth2 (GitHub, Google Workspace, Generic OIDC) for SSO
JWT tokens for stateless API access
API keys for service-to-service communication
Authorization Model:
Hybrid permission system combining role-based and resource-level access control
Support for permission inheritance through role and group hierarchies
Fine-grained permissions (e.g., project:read, blueprint:write)
Resource-specific access grants (e.g., user X can write to project Y)
Audit Requirements:
Log all authentication events (login, logout, token issuance)
Log all authorization decisions (permission checks, access grants/denials)
Retain logs for 2 years for compliance
Support analytics queries on auth patterns
Service Accounts:
Support for imbi-automations, imbi-webhooks, and future services
Long-lived authentication credentials
Scoped permissions per service

Decision¶

1. Data Storage Strategy¶

Neo4j for User and Permission Data

We will store users, groups, roles, permissions, API keys, and OAuth links in Neo4j alongside existing service data.

Rationale: - Natural fit for permission inheritance modeling (role hierarchies, group membership) - Efficient graph traversal for "who can access what" queries - Single source of truth for all Imbi entities and their relationships - Existing infrastructure and operational knowledge

Trade-offs: - Neo4j is less mature for authentication than PostgreSQL - Must implement custom auth logic rather than using off-the-shelf solutions - Acceptable given the graph-centric nature of Imbi and existing Neo4j investment

ClickHouse for Audit Logs

We will store authentication and authorization audit events in ClickHouse.

Rationale: - Already in stack for analytics - Excellent performance for time-series data - Built-in TTL for automatic retention management (2 years) - Efficient compression for large log volumes - SQL-like interface for audit queries

Trade-offs: - Less flexible than Elasticsearch for full-text search - Sufficient for structured audit event queries and compliance reporting

2. Token Strategy¶

JWT Tokens with Revocation List

We will use JWT (JSON Web Tokens) for authentication with a revocation mechanism.

Token Types: - Access tokens: Short-lived (1 hour), used for API requests - Refresh tokens: Long-lived (30 days), used to obtain new access tokens

Token Storage: - JWTs are stateless (not stored server-side for validation) - Token metadata stored in Neo4j for revocation and tracking - Each token has a unique JTI (JWT ID) for identification

Rationale: - Stateless validation enables horizontal scaling - Self-contained tokens reduce database lookups - Well-suited for inter-service communication - Revocation list handles logout and security events - Industry standard with mature libraries

Trade-offs: - Larger token size than opaque tokens - Permissions are not in the JWT (must be loaded per request) - Revocation requires database check, but only on explicit logout/revocation - Acceptable given scaling benefits and inter-service requirements

3. Password Hashing¶

Argon2id Algorithm

We will use Argon2id for password hashing via the argon2-cffi library.

Rationale: - Winner of Password Hashing Competition (2015) - Memory-hard function resistant to GPU/ASIC attacks - Configurable parameters (memory cost, time cost, parallelism) - Automatic rehashing when parameters are upgraded - Industry best practice for new applications

Trade-offs: - Newer than bcrypt (but well-established since 2015) - Slightly slower than bcrypt (intentional security feature) - Preferred for modern applications

4. OAuth2 Integration¶

Authlib Library with Multiple Providers

We will use Authlib for OAuth2/OIDC integration supporting: - GitHub OAuth - Google Workspace OAuth - Generic OIDC (Okta, Auth0, Keycloak, etc.)

Rationale: - Modern, actively maintained library - Native Starlette/FastAPI integration - Full OIDC support out-of-the-box - Handles OAuth2 security concerns (state parameter, redirect validation) - Extensible for future providers

OAuth User Linking: - OAuth identities stored as relationships to User nodes - Multiple OAuth providers can link to same user - Email matching for automatic account linking (configurable)

Trade-offs: - Additional dependency with moderate complexity - Requires provider-specific configuration - Necessary for enterprise SSO requirements

5. Permission Model¶

Hybrid: Role-Based + Resource-Level Access Control

We will implement a hybrid permission model combining RBAC and resource-specific grants.

Global Permissions (RBAC): - Permissions named as resource:action (e.g., project:read, blueprint:write) - Roles contain multiple permissions - Users assigned roles directly or through group membership - Role inheritance (e.g., admin role inherits from developer role)

Resource-Level Permissions: - CAN_ACCESS relationships from users/groups to specific resources - Relationship properties specify allowed actions (read, write, delete) - Override model: resource-level grants supplement global permissions

Permission Resolution: 1. Check global permission (user → roles → permissions) 2. If not found, check resource-level permission (user → CAN_ACCESS → resource) 3. Permissions collected through group membership and role inheritance 4. Graph traversal in single Cypher query for efficiency

Rationale: - Balances simplicity (RBAC) with flexibility (resource-level) - Natural fit for graph database - Supports both broad roles (admin, developer) and specific grants - Enables delegation (project owner grants access to their projects)

Trade-offs: - More complex than pure RBAC - Permission checks require graph traversal - Acceptable given requirement for fine-grained access control

6. Authorization Pattern¶

FastAPI Dependency Injection (Not Middleware)

We will use FastAPI's dependency injection for authentication and authorization.

Pattern:

@router.get('/resource')
async def get_resource(
    auth: Annotated[AuthContext, Depends(get_current_user)],
    # ... or ...
    auth: Annotated[AuthContext, Depends(require_permission('resource:read'))],
):
    # Endpoint code with authenticated user context

Rationale: - FastAPI's idiomatic approach - Granular control per endpoint - Clear intent (explicit permission requirements in function signature) - Easy to test (mock dependencies) - Supports optional authentication (public endpoints don't use dependency) - Composable (can layer multiple dependencies)

Trade-offs: - More boilerplate than global middleware - Must remember to add dependencies to protected endpoints - Benefits outweigh drawbacks: clarity, testability, flexibility

7. Inter-Service Authentication¶

Dual Approach: JWTs and API Keys

We will support both JWT tokens and API keys for service accounts.

Service Accounts: - Special users with is_service_account=True - Can authenticate with JWTs or API keys - Assigned roles like regular users

JWT for Services: - Services can login and receive JWTs - Suitable for temporary credentials - Supports token refresh

API Keys for Services: - Long-lived credentials (up to 1 year) - Format: imbi_key_{id}_{secret} for easy identification - Hashed storage (SHA-256) - Optional scope restrictions - Tracked usage (last_used timestamp)

Rationale: - JWTs: Better for services that can manage token rotation - API Keys: Simpler for services needing stable credentials - Flexibility accommodates different service integration patterns - Both methods use same permission system

Trade-offs: - Maintaining two authentication methods adds complexity - Necessary to support diverse service requirements

8. Security Measures¶

Password Policy: - Minimum length: 12 characters (configurable) - Required character types: uppercase, lowercase, digit, special - Automatic password hash upgrades - No password reuse checking in v1 (can add in future)

Token Security: - Short-lived access tokens (1 hour) limit exposure window - Refresh tokens can be revoked - JTI tracking enables individual token revocation - Tokens never logged or exposed in error messages

API Key Security: - Prefix imbi_key_ enables detection in logs/code (like GitHub tokens) - Hashed storage prevents exposure if database compromised - Expiry enforcement - Full key shown only once at creation - Per-key scope restrictions

OAuth2 Security: - State parameter prevents CSRF attacks - Redirect URI validation - Domain whitelist for email-based access (optional) - OAuth tokens stored encrypted (future enhancement)

Audit Logging: - All auth events logged with timestamp, IP, user agent - Failed attempts tracked for security monitoring - 2-year retention via ClickHouse TTL - Structured logging enables automated alerting

Rate Limiting (Phase 8): - Login attempts: 5 per minute per IP - API key creation: 10 per hour per user - Token refresh: 60 per hour per user - Prevents brute force and abuse

Consequences¶

Positive¶

Comprehensive Authentication: Supports multiple methods (password, OAuth2, JWT, API keys) for different use cases
Flexible Authorization: Hybrid permission model supports both role-based and fine-grained access control
Scalable Architecture: Stateless JWTs enable horizontal scaling without session management
Audit Compliance: All auth events logged to ClickHouse with 2-year retention
Graph-Native: Leverages Neo4j for natural permission inheritance and relationship modeling
Service-Friendly: Both JWTs and API keys support inter-service authentication
Security Best Practices: Argon2id hashing, token revocation, rate limiting, audit logging
Testable Design: Dependency injection enables easy mocking and testing
Maintainable: Clear separation of concerns (auth/core, auth/permissions, auth/oauth2, auth/audit)
Extensible: Can add new OAuth providers, permission types, or auth methods without refactoring

Negative¶

Implementation Complexity: Building custom auth system requires significant development effort
Neo4j for Auth: Less common than PostgreSQL for user data, fewer reference implementations
Permission Check Overhead: Graph traversal for permission resolution adds latency
Dual Auth Methods: Supporting both JWTs and API keys increases maintenance burden
Revocation Overhead: Token revocation requires database lookup, losing full stateless benefit
Testing Burden: More test scenarios with multiple auth methods and permission combinations

Mitigation Strategies¶

Phased Implementation: 8 phases spread over 9+ weeks reduces risk and enables early feedback
Comprehensive Testing: 90% coverage requirement ensures reliability
Permission Caching: Load permissions once per request, cache in AuthContext
Index Optimization: Neo4j indexes on username, email, API key IDs for fast lookups
Documentation: ADR, API docs, and developer guides reduce onboarding friction
Reference Patterns: Following FastAPI best practices makes code familiar to Python developers

Risks¶

Performance at Scale: Permission checks via graph traversal may not scale to millions of users
Mitigation: Index optimization, permission caching, consider caching layer if needed
OAuth Provider Changes: External OAuth APIs may change or deprecate
Mitigation: Authlib library abstracts provider specifics, version pinning
Token Compromise: If JWT secret leaked, all tokens compromised
Mitigation: Secret rotation capability, monitor for suspicious activity, short token lifetime
Complexity for Simple Cases: Hybrid permission model may be overkill for basic use cases
Mitigation: Simple roles (admin, viewer) provide easy starting point, complexity is opt-in

Future Enhancements¶

Not in scope for initial implementation, but architecturally supported:

Multi-Factor Authentication (MFA): Can add TOTP/WebAuthn with minimal changes
Magic Link Authentication: Passwordless auth via email links
SAML Support: Enterprise SSO via SAML in addition to OAuth2/OIDC
Passwordless Service Auth: mTLS or certificate-based authentication
Permission Caching Layer: Redis cache for permission resolution at scale
Attribute-Based Access Control (ABAC): Context-aware permissions (time, location, resource properties)
OAuth Token Encryption: Encrypt stored OAuth tokens at rest
Password History: Prevent password reuse
Session Management: Track and limit concurrent sessions per user
Anomaly Detection: ML-based detection of unusual auth patterns