Mobile-Specific Considerations
Mobile apps face unique challenges when integrating AI API gateways: battery life, network reliability, data limits, and app size constraints.
🔋 Battery Optimization
Minimize background API calls and batch requests to reduce battery drain from network radio usage.
📶 Offline Support
Implement caching and local fallbacks for AI features when network connectivity is unavailable.
💾 Data Efficiency
Compress requests and use delta updates to minimize cellular data usage for users on limited plans.
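On Android, for instance, request compression can be added with an OkHttp interceptor that gzips outgoing bodies. The sketch below is adapted from OkHttp's documented interceptor recipe; whether your gateway accepts Content-Encoding: gzip on requests, and how the SDK exposes its HTTP client, are assumptions to verify.

import okhttp3.*
import okio.BufferedSink
import okio.GzipSink
import okio.buffer

// Gzip outgoing request bodies (adapted from OkHttp's interceptor recipe).
// Assumes the gateway accepts Content-Encoding: gzip on requests.
class GzipRequestInterceptor : Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response {
        val original = chain.request()
        // Skip requests without a body or that are already encoded
        val body = original.body
        if (body == null || original.header("Content-Encoding") != null) {
            return chain.proceed(original)
        }
        val compressed = original.newBuilder()
            .header("Content-Encoding", "gzip")
            .method(original.method, gzip(body))
            .build()
        return chain.proceed(compressed)
    }

    private fun gzip(body: RequestBody): RequestBody = object : RequestBody() {
        override fun contentType() = body.contentType()
        override fun contentLength() = -1L  // length unknown until compressed
        override fun writeTo(sink: BufferedSink) {
            val gzipSink = GzipSink(sink).buffer()
            body.writeTo(gzipSink)
            gzipSink.close()  // flush the gzip trailer
        }
    }
}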
iOS Integration
Swift SDK Setup
// Package.swift
dependencies: [
    .package(url: "https://github.com/example/ai-gateway-swift", from: "2.0.0")
]

// Initialize SDK
import Foundation
import AIGateway

let client = AIGatewayClient(
    apiKey: "your-api-key",
    configuration: .init(
        baseURL: URL(string: "https://api.yourgateway.com/v1")!,
        timeout: 60,
        retryPolicy: .exponentialBackoff(maxRetries: 3)
    )
)

// Make a request
let response = try await client.chatCompletion(
    messages: [.user("Hello, AI!")],
    model: "gpt-4"
)
Background Processing
// Handle background API calls efficiently with BGTaskScheduler
import BackgroundTasks

class BackgroundAIService {
    static let taskIdentifier = "com.example.ai-processing"

    // Submit a processing request so the system can launch the work later
    func scheduleProcessing() {
        let request = BGProcessingTaskRequest(identifier: Self.taskIdentifier)
        request.requiresNetworkConnectivity = true
        try? BGTaskScheduler.shared.submit(request)
    }

    // Called from the launch handler registered for taskIdentifier
    func processInBackground(_ request: AIRequest, task: BGProcessingTask) {
        let work = Task {
            do {
                let result = try await client.process(request)  // process AI request
                await CacheManager.shared.store(result)         // store result locally
                task.setTaskCompleted(success: true)
            } catch {
                task.setTaskCompleted(success: false)
            }
        }
        task.expirationHandler = { work.cancel() }  // cancel if the system expires the task
    }
}
iOS Best Practice
Use URLSessionConfiguration.background for AI requests that must complete even if the app is suspended. Background session events are delivered to the AppDelegate via application(_:handleEventsForBackgroundURLSession:completionHandler:), so handle completion there.
Android Integration
Kotlin SDK Setup
// build.gradle
implementation 'com.example:ai-gateway-android:2.0.0'

// Initialize SDK
import kotlin.time.Duration.Companion.seconds

val client = AIGatewayClient(
    apiKey = "your-api-key",
    config = ClientConfig(
        baseUrl = "https://api.yourgateway.com/v1",
        timeout = 60.seconds,
        retryPolicy = RetryPolicy.ExponentialBackoff(maxRetries = 3)
    )
)

// Make a request from a coroutine
lifecycleScope.launch {
    val response = client.chatCompletion(
        messages = listOf(Message.User("Hello, AI!")),
        model = "gpt-4"
    )
}
WorkManager Integration
// Background AI processing with WorkManager
import android.content.Context
import androidx.work.*

class AIProcessingWorker(
    context: Context,
    params: WorkerParameters
) : CoroutineWorker(context, params) {
    override suspend fun doWork(): Result {
        // Fail permanently if the payload is missing; retrying won't help
        val request = inputData.getString("request") ?: return Result.failure()
        return try {
            val result = client.process(request)
            CacheManager.store(result)  // store result for the UI layer
            Result.success()
        } catch (e: Exception) {
            // Back off and retry transient failures, but give up eventually
            if (runAttemptCount < 3) Result.retry() else Result.failure()
        }
    }
}

// Schedule work only when online and battery is not low
val workRequest = OneTimeWorkRequestBuilder<AIProcessingWorker>()
    .setInputData(workDataOf("request" to requestJson))
    .setConstraints(
        Constraints.Builder()
            .setRequiredNetworkType(NetworkType.CONNECTED)
            .setRequiresBatteryNotLow(true)
            .build()
    )
    .build()
WorkManager.getInstance(context).enqueue(workRequest)
Performance Optimization
Request Batching
Combine multiple AI requests into single API calls to reduce network overhead:
// Batch multiple prompts into a single API call
val batchRequest = BatchRequest(
    requests = listOf(
        AIRequest(prompt = "Analyze sentiment 1"),
        AIRequest(prompt = "Analyze sentiment 2"),
        AIRequest(prompt = "Analyze sentiment 3")
    )
)
val responses = client.batchProcess(batchRequest)
Response Caching
- Memory cache: Fast access for frequently used responses (LRU eviction)
- Disk cache: Persistent storage for offline access (the first two tiers are sketched after this list)
- Semantic cache: Match similar queries to cached responses
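A minimal sketch of the first two tiers follows; CachedResponse and DiskStore are hypothetical placeholders, and a semantic tier would additionally key lookups on prompt embeddings rather than exact strings.

import android.util.LruCache

// Illustrative two-tier cache; CachedResponse and DiskStore are placeholders.
data class CachedResponse(val text: String, val createdAt: Long)

interface DiskStore {
    fun read(key: String): CachedResponse?
    fun write(key: String, value: CachedResponse)
}

class TieredResponseCache(private val disk: DiskStore) {
    // Memory tier: fast lookups with LRU eviction after 100 entries
    private val memory = LruCache<String, CachedResponse>(100)

    fun get(prompt: String): CachedResponse? {
        memory.get(prompt)?.let { return it }  // tier 1: memory hit
        // Tier 2: disk hit; promote the entry back into memory
        return disk.read(prompt)?.also { memory.put(prompt, it) }
    }

    fun put(prompt: String, response: CachedResponse) {
        // Write through both tiers so results survive process restarts
        memory.put(prompt, response)
        disk.write(prompt, response)
    }
}

Writing through both tiers keeps the disk copy authoritative while the LRU bound caps memory use.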
Offline Capabilities
Offline-First Architecture
- Local model fallback: Use on-device models for basic AI tasks when offline
- Request queuing: Queue API requests and process them when connectivity returns (see the sketch after this list)
- Cached responses: Return cached results for repeated queries
- Graceful degradation: Communicate limited functionality to users
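A minimal in-memory sketch of the request-queuing item: QueuedRequest is hypothetical, client.process is the same illustrative SDK call used above, and a production queue would be persisted (for example with Room) so it survives process death.

import java.util.concurrent.ConcurrentLinkedQueue

// Buffer requests while offline and replay them when connectivity returns
class OfflineRequestQueue(private val client: AIGatewayClient) {
    private val pending = ConcurrentLinkedQueue<QueuedRequest>()

    fun enqueue(request: QueuedRequest) {
        pending.add(request)
    }

    // Call from ConnectivityManager.NetworkCallback.onAvailable()
    suspend fun flush() {
        while (true) {
            val next = pending.poll() ?: break
            try {
                val result = client.process(next.payload)
                CacheManager.store(result)
            } catch (e: Exception) {
                pending.add(next)  // re-queue and stop; likely still offline
                break
            }
        }
    }
}

data class QueuedRequest(val payload: String)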
Offline Strategy
Implement a tiered approach: online-first with cached fallback, then on-device model, and finally user notification of limited functionality.
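A compact sketch of that tiering, assuming the illustrative SDK from earlier plus placeholder cache, localModel, and isOnline() helpers:

// Tiered fallback: gateway -> cached response -> on-device model -> degrade.
// Everything below except the tier order is a hypothetical placeholder.
sealed class AIResult {
    data class Answer(val text: String, val source: String) : AIResult()
    object Unavailable : AIResult()
}

interface OnDeviceModel { fun generate(prompt: String): String? }

val cache = mutableMapOf<String, String>()  // prompt -> cached answer
val localModel: OnDeviceModel? = null       // wire up an on-device model if bundled
fun isOnline(): Boolean = TODO("check ConnectivityManager for an active network")

suspend fun answer(prompt: String): AIResult {
    if (isOnline()) {
        try {
            // Tier 1: the gateway, via the illustrative SDK; response.text is assumed
            val response = client.chatCompletion(
                messages = listOf(Message.User(prompt)),
                model = "gpt-4"
            )
            cache[prompt] = response.text  // keep fresh results for offline reuse
            return AIResult.Answer(response.text, source = "online")
        } catch (e: Exception) {
            // Network failure: fall through to the offline tiers
        }
    }
    cache[prompt]?.let { return AIResult.Answer(it, source = "cache") }  // Tier 2
    localModel?.generate(prompt)?.let { return AIResult.Answer(it, source = "on-device") }  // Tier 3
    return AIResult.Unavailable  // Tier 4: surface limited functionality to the user
}

Falling through in tier order keeps the happy path fast while guaranteeing the user always gets an explicit outcome, even if it is only a notice of limited functionality.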