AI API Proxy Usage Limits
Implement robust usage limits, rate limiting, and quota management for AI API proxies. Protect your infrastructure while ensuring fair access.
Rate Limiting
Control request rates per user, API key, or IP address. Prevent abuse and ensure fair resource allocation.
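Limits can apply to several scopes at once. The sketch below lists every scope a request is counted against. It assumes a hypothetical `ProxyRequest` shape; in a real proxy, these fields would come from authentication middleware and the connection's peer address.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProxyRequest:
    # Hypothetical request shape used only for this sketch.
    client_ip: str
    api_key: Optional[str] = None
    user_id: Optional[str] = None


def limit_scopes(req: ProxyRequest) -> List[Tuple[str, str]]:
    """Return every (scope, identifier) pair the request is counted against."""
    scopes = [("ip", req.client_ip)]  # every request has a source address
    if req.api_key is not None:
        scopes.append(("api_key", req.api_key))
    if req.user_id is not None:
        scopes.append(("user", req.user_id))
    return scopes


print(limit_scopes(ProxyRequest("203.0.113.7", api_key="k-123", user_id="alice")))
# [('ip', '203.0.113.7'), ('api_key', 'k-123'), ('user', 'alice')]
```

A request should be rejected if any of its scopes is over its limit. Calling `allow_request` on one limiter per scope in sequence spends a token from the earlier scopes even when a later scope rejects the request, so a stricter design checks all scopes before deducting from any of them.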
Quota Management
Set daily, weekly, or monthly usage quotas. Track consumption and notify users approaching limits.
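The following is a minimal sketch of period-based quotas. It assumes calendar periods in UTC (ISO weeks for weekly quotas), an in-memory store, and a hypothetical `notify` callback. The callback fires once per period, when usage first crosses a warning threshold. The 80% threshold used here is an illustrative choice.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Tuple


def period_key(period: str, now: datetime) -> Tuple[int, ...]:
    """Identify the calendar period that contains `now`."""
    if period == "daily":
        return (now.year, now.month, now.day)
    if period == "weekly":
        iso = now.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if period == "monthly":
        return (now.year, now.month)
    raise ValueError(f"unknown period: {period}")


class QuotaTracker:
    def __init__(self, limit: int, period: str = "monthly", warn_at: float = 0.8,
                 notify: Optional[Callable[[str, int, int], None]] = None):
        self.limit = limit
        self.period = period
        self.warn_at = warn_at  # fraction of the quota that triggers a warning
        self.notify = notify or (lambda key, used, limit: None)
        self.usage: Dict[str, Tuple[Tuple[int, ...], int]] = {}  # key -> (period, used)

    def consume(self, key: str, amount: int = 1,
                now: Optional[datetime] = None) -> bool:
        if now is None:
            now = datetime.now(timezone.utc)
        current = period_key(self.period, now)
        stored_period, used = self.usage.get(key, (current, 0))
        if stored_period != current:
            used = 0  # a new period has started: reset the counter
        if used + amount > self.limit:
            self.usage[key] = (current, used)
            return False  # over quota; nothing is charged
        threshold = self.warn_at * self.limit
        if used < threshold <= used + amount:
            self.notify(key, used + amount, self.limit)  # crossed once per period
        self.usage[key] = (current, used + amount)
        return True
```

Because the counter resets when the period key changes, no background job is needed to clear quotas. A production proxy would keep `usage` in shared storage rather than process memory.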
Rate Limiter Implementation
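rate_limiter.py
```python
# Token Bucket Rate Limiter
import threading
import time
from collections import defaultdict


class TokenBucketLimiter:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        # Each new key starts with a full bucket. monotonic() is used because,
        # unlike time.time(), it cannot jump backwards when the system clock changes.
        self.buckets = defaultdict(
            lambda: {"tokens": float(capacity), "last": time.monotonic()}
        )
        self.lock = threading.Lock()  # guards buckets across request threads

    def _refill(self, key):
        bucket = self.buckets[key]
        now = time.monotonic()
        # Refill tokens based on time elapsed, capped at capacity
        elapsed = now - bucket["last"]
        bucket["tokens"] = min(self.capacity, bucket["tokens"] + elapsed * self.rate)
        bucket["last"] = now
        return bucket

    def allow_request(self, key):
        with self.lock:
            bucket = self._refill(key)
            if bucket["tokens"] >= 1:
                bucket["tokens"] -= 1
                return True
            return False

    def get_remaining(self, key):
        with self.lock:
            # Refill first so the count reflects time elapsed since the last request
            return int(self._refill(key)["tokens"])
```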
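When `allow_request` returns `False`, a proxy typically answers with HTTP 429 Too Many Requests and a `Retry-After` header. At a refill rate of `rate` tokens per second, a bucket holding `tokens < 1` needs `(1 - tokens) / rate` seconds to earn its next token. The sketch below shows that calculation. The function names are illustrative and are not part of any library.

```python
import math
from typing import Dict, Tuple


def retry_after_seconds(tokens: float, rate: float) -> int:
    """Whole seconds until a bucket holding `tokens` refills to one token."""
    if tokens >= 1:
        return 0
    # Round up: Retry-After is expressed as a whole number of seconds.
    return math.ceil((1 - tokens) / rate)


def rejection(tokens: float, rate: float) -> Tuple[int, Dict[str, str]]:
    """Status code and headers for a rate-limited request."""
    return 429, {"Retry-After": str(retry_after_seconds(tokens, rate))}


print(rejection(tokens=0.25, rate=0.5))  # (429, {'Retry-After': '2'})
```

The fractional token count is available as `limiter.buckets[key]["tokens"]`. `get_remaining` truncates it to an integer, so it cannot be used directly for this calculation.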
Common Rate Limit Strategies
rate_limits.yaml
```yaml
rate_limits:
  # Per-user rate limits
  user:
    requests_per_minute: 60
    requests_per_hour: 1000
    requests_per_day: 10000
  # Per-API-key limits
  api_key:
    requests_per_minute: 100
    tokens_per_minute: 150000  # model (LLM) tokens, not requests
  # Per-IP limits
  ip:
    requests_per_second: 10
    requests_per_minute: 300
  # Burst handling
  burst:
    max_burst_size: 20
    burst_window: 5  # seconds
```
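Each scope above combines several windows, so a request must pass every window at once. The sketch below enforces that rule with a sliding log. It is one possible technique among several, and the config loader is omitted: the limits are passed as a plain dict of window length in seconds to maximum total cost. The optional `cost` argument lets the same class enforce `tokens_per_minute` by charging a token count instead of 1.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Dict


class SlidingWindowLimiter:
    """Sliding-log limiter that enforces several windows at once.

    `limits` maps a window length in seconds to the maximum total cost
    allowed inside that window. With the default cost of 1 it counts
    requests; passing a model-token count as `cost` turns it into a
    tokens-per-window limit.
    """

    def __init__(self, limits: Dict[float, int],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = dict(limits)
        self.clock = clock
        # key -> window -> log of (timestamp, cost) plus its running total
        self.state = defaultdict(
            lambda: {w: {"log": deque(), "total": 0} for w in self.limits}
        )

    def allow(self, key: str, cost: int = 1) -> bool:
        now = self.clock()
        windows = self.state[key]
        # For each window, evict expired entries and check the limit.
        for window, max_cost in self.limits.items():
            entry = windows[window]
            while entry["log"] and now - entry["log"][0][0] >= window:
                entry["total"] -= entry["log"].popleft()[1]
            if entry["total"] + cost > max_cost:
                return False  # rejected requests are not recorded
        # Every window passed: record the request in all of them.
        for entry in windows.values():
            entry["log"].append((now, cost))
            entry["total"] += cost
        return True


# Mirrors rate_limits.user and rate_limits.api_key.tokens_per_minute above
per_user = SlidingWindowLimiter({60: 60, 3600: 1000, 86400: 10000})
api_key_tokens = SlidingWindowLimiter({60: 150_000})
```

If `max_burst_size: 20` with `burst_window: 5` is read as "at most 20 requests in any 5-second span" (an interpretation the config does not spell out), the burst rule becomes `SlidingWindowLimiter({5: 20})`. A sliding log stores one entry per accepted request, so the daily window can hold up to 10,000 entries per user. Fixed-window counters use constant memory, but they can admit up to twice the limit in a short span around a window boundary. Output tokens are only known once a response completes, so a token limit may need to charge an estimate up front and reconcile it afterwards.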