ClickHouse Client¶
The ClickHouse client provides async access to the ClickHouse analytics database with GDPR-compliant privacy utilities.
Overview¶
The ClickHouse client manages connections to ClickHouse and provides query execution, data insertion, and schema management capabilities. It includes utilities for GDPR-compliant data handling.
Basic Usage¶
from imbi_common import clickhouse
# Initialize the client
await clickhouse.initialize()
# Set up database schema from schemata.toml
await clickhouse.setup_schema()
# Insert data (list of Pydantic models)
await clickhouse.insert("table_name", [model_instance])
# Query data
results = await clickhouse.query(
"SELECT user_id, COUNT(*) as count "
"FROM session_activity "
"GROUP BY user_id"
)
for row in results:
print(f"{row['user_id']}: {row['count']}")
Privacy Utilities¶
from imbi_common.clickhouse import privacy
# Truncate IP addresses for GDPR compliance
ipv4 = privacy.truncate_ip_to_subnet("192.168.1.100") # "192.168.1.0"
ipv6 = privacy.truncate_ip_to_subnet("2001:0db8::1") # "2001:0db8::"
# Hash IP addresses
hashed = privacy.hash_ip_address("192.168.1.100")
API Reference¶
Initialization¶
initialize
async
¶
setup_schema
async
¶
Query Operations¶
query
async
¶
Query the Clickhouse database and return results as a list of dicts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sql
|
str
|
SQL query string, possibly with parameter placeholders |
required |
parameters
|
dict[str, Any] | None
|
Optional dictionary of parameter values |
None
|
Returns:
| Type | Description |
|---|---|
list[dict[str, Any]]
|
List of dictionaries mapping column names to values |
Raises:
| Type | Description |
|---|---|
DatabaseError
|
If there's an error executing the query |
Source code in src/imbi_common/clickhouse/__init__.py
insert
async
¶
Insert data into Clickhouse.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table
|
str
|
The name of the table to insert into |
required |
data
|
list[BaseModel]
|
List of Pydantic models to insert (all must be the same type) |
required |
Returns:
| Type | Description |
|---|---|
QuerySummary
|
QuerySummary containing information about the insert operation |
Raises:
| Type | Description |
|---|---|
ValueError
|
If data list is empty or models are not all the same type |
Source code in src/imbi_common/clickhouse/__init__.py
Client¶
Clickhouse ¶
Source code in src/imbi_common/clickhouse/client.py
aclose
async
¶
Close any open connections to Clickhouse.
Source code in src/imbi_common/clickhouse/client.py
get_instance
classmethod
¶
initialize
async
¶
Create a new async client and test the connection.
Source code in src/imbi_common/clickhouse/client.py
insert
async
¶
Insert data into Clickhouse
Source code in src/imbi_common/clickhouse/client.py
query
async
¶
Query the Clickhouse database and return results as a list of dicts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
statement
|
str
|
SQL query string, possibly with parameter placeholders |
required |
parameters
|
dict[str, Any] | None
|
Optional dictionary of parameter values |
None
|
Returns:
| Type | Description |
|---|---|
list[dict[str, Any]]
|
List of dictionaries mapping column names to values |
Raises:
| Type | Description |
|---|---|
DatabaseError
|
If there's an error executing the query |
Source code in src/imbi_common/clickhouse/client.py
setup_schema
async
¶
Execute DDL queries from schemata.toml to set up database schema.
This should be called explicitly during initial setup, not on every startup. Loads and executes enabled queries from schemata.toml.
Source code in src/imbi_common/clickhouse/client.py
Privacy Utilities¶
hash_ip_address ¶
Hash IP address using SHA256 for privacy-preserving storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ip
|
str
|
IP address string (IPv4 or IPv6) |
required |
Returns:
| Type | Description |
|---|---|
str
|
SHA256 hex digest of the IP address |
Example
hash_ip_address('192.168.1.1') 'c775e7b757ede630cd0aa1113bd102661ab38829ca52a6422ab782862f268646'
Source code in src/imbi_common/clickhouse/privacy.py
truncate_ip_to_subnet ¶
Truncate IP address to subnet for privacy-preserving storage.
IPv4 addresses are truncated to /24 subnet (last octet zeroed). IPv6 addresses are truncated to /48 subnet.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ip
|
str
|
IP address string (IPv4 or IPv6) |
required |
Returns:
| Type | Description |
|---|---|
str
|
Truncated IP subnet string |
Raises:
| Type | Description |
|---|---|
ValueError
|
If IP address is invalid |
Examples:
>>> truncate_ip_to_subnet('192.168.1.1')
'192.168.1.0'
>>> truncate_ip_to_subnet('2001:0db8:85a3:0000:0000:8a2e:0370:7334')
'2001:db8:85a3::'
Source code in src/imbi_common/clickhouse/privacy.py
parse_user_agent ¶
Parse user agent string to extract browser family and version.
This function performs basic parsing without external dependencies. For production use, consider using the 'user-agents' package for more accurate parsing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
user_agent
|
str | None
|
User agent string from HTTP headers |
required |
Returns:
| Type | Description |
|---|---|
str
|
Tuple of (browser_family, version) where version is 'major.minor' |
str
|
Returns ('unknown', 'unknown') if parsing fails |
Examples:
>>> parse_user_agent(
... 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
... 'AppleWebKit/537.36 (KHTML, like Gecko) '
... 'Chrome/91.0.4472.124 Safari/537.36'
... )
('Chrome', '91.0')
Note
This is a simplified parser. For better accuracy, integrate a dedicated user agent parsing library like 'user-agents' or 'ua-parser'.
Source code in src/imbi_common/clickhouse/privacy.py
sanitize_metadata ¶
Sanitize metadata dictionary to ensure no PII is included.
Removes or redacts fields that commonly contain PII: - email, password, token, api_key, secret - Any field containing 'email', 'password', 'token', 'secret'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metadata
|
dict[str, Any]
|
Dictionary of metadata to sanitize |
required |
Returns:
| Type | Description |
|---|---|
str
|
JSON string with PII fields redacted |
Example
sanitize_metadata({ ... 'endpoint': '/api/users', ... 'email': 'user@example.com', ... 'status': 200 ... }) '{"endpoint": "/api/users", "email": "[REDACTED]", "status": 200}'