Configuring and switching between multiple LLM providers in Spring AI
Introduction: The Need for Multi-Provider LLM Support in Production AI Systems
As large language models (LLMs) become integral to enterprise applications, depending on a single LLM provider introduces significant operational risks, including cost surges, outages, performance issues, and compliance challenges. Supporting multiple LLM providers is vital for production AI systems, as it enables cost optimization by routing requests to the most economical provider, ensures business continuity through seamless failover, and helps meet regulatory requirements by leveraging region-specific or certified models. In this comprehensive tutorial, you’ll discover how to configure and efficiently switch between multiple LLM providers using Spring AI within a Spring Boot application, focusing on practical, production-ready strategies.
Understanding Spring AI’s Provider Abstraction Model
Spring AI offers a robust provider abstraction layer that decouples your application logic from any specific LLM vendor. This is accomplished through interfaces like ChatModel and EmbeddingModel, which are implemented by provider-specific classes such as OpenAiChatModel and AzureOpenAiChatModel. By developing against these abstractions, you can seamlessly swap providers with minimal code changes, inject different models based on configuration, and implement dynamic provider selection. This architecture supports advanced patterns such as chaining, fallback, and multi-model orchestration, all leveraging Spring’s powerful dependency injection and configuration features. This approach is essential for building flexible, maintainable, and easily testable AI-powered systems.
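To make the idea concrete, here is a minimal plain-Java sketch of coding against the abstraction. The `ChatModel` interface below is a simplified stand-in for Spring AI's real interface (which also exposes `call(Prompt)` and streaming variants); `SummaryService` is a hypothetical example class:

```java
// Simplified stand-in for Spring AI's ChatModel abstraction (illustrative only).
interface ChatModel {
    String call(String prompt);
}

// Application logic depends only on the interface, never on a vendor class,
// so swapping OpenAI for Azure OpenAI is a wiring change, not a code change.
class SummaryService {
    private final ChatModel chatModel;

    SummaryService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    String summarize(String text) {
        return chatModel.call("Summarize: " + text);
    }
}
```

Because `SummaryService` never names a concrete provider class, the same service can be wired to any implementation in tests or production.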
Setting Up Multiple LLM Providers: OpenAI, Azure OpenAI, and Others via application.properties
To enable support for multiple LLM providers, define provider-specific configurations in your application.properties or application.yaml files. Spring AI provides native integrations with leading providers like OpenAI and Azure OpenAI. For example:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-3.5-turbo
    azure:
      openai:
        api-key: ${AZURE_OPENAI_API_KEY}
        endpoint: https://your-azure-endpoint.openai.azure.com/
        chat:
          options:
            deployment-name: my-azure-gpt
For additional providers such as Hugging Face or Cohere, leverage their respective Spring AI modules. Always inject secrets through environment variables or a secrets manager—never hardcode them.
Next, define beans for each provider in your Spring configuration:
// Note: exact constructor signatures vary across Spring AI versions; recent
// releases favor builder APIs. Adjust to match the version on your classpath.
@Bean
@Qualifier("openai")
public ChatModel openAiChatModel(OpenAiApi openAiApi) {
    return new OpenAiChatModel(openAiApi);
}

@Bean
@Qualifier("azure-openai")
public ChatModel azureOpenAiChatModel(OpenAIClient openAiClient) {
    return new AzureOpenAiChatModel(openAiClient);
}
This setup allows you to inject the desired ChatModel using the @Qualifier annotation, ensuring a clean separation and effortless switching between providers.
Environment-Specific Configuration Using Spring Profiles and Feature Flags
Different environments—such as production, staging, and development—often require distinct LLM providers or configurations. Spring profiles enable you to define environment-specific properties and beans. For example, you might use OpenAI in development to reduce costs, but Azure OpenAI in production to meet compliance requirements:
application-dev.yaml:
spring:
  ai:
    provider: openai  # custom key read by your own provider-selection logic
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-3.5-turbo
application-prod.yaml:
spring:
  ai:
    provider: azure-openai  # custom key read by your own provider-selection logic
    azure:
      openai:
        api-key: ${AZURE_OPENAI_API_KEY}
        endpoint: https://your-azure-endpoint.openai.azure.com/
        chat:
          options:
            deployment-name: prod-gpt
Activate specific profiles using the SPRING_PROFILES_ACTIVE environment variable. For runtime flexibility, implement feature flags (using tools like Togglz or LaunchDarkly) to switch providers or enable LLM features without redeployment. This approach supports gradual rollouts, A/B testing, and rapid rollback if a provider experiences issues.
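As a sketch of the feature-flag approach (the Togglz and LaunchDarkly client APIs are not shown; a plain `Supplier<Boolean>` stands in for the flag check, and `FlaggedProviderResolver` is a hypothetical class name):

```java
import java.util.function.Supplier;

// Resolves which provider key to use, consulting a feature flag at request time.
// In practice the supplier would be backed by Togglz, LaunchDarkly, etc.
class FlaggedProviderResolver {
    private final Supplier<Boolean> useAzureFlag;

    FlaggedProviderResolver(Supplier<Boolean> useAzureFlag) {
        this.useAzureFlag = useAzureFlag;
    }

    String resolveProviderKey() {
        // Flipping the flag re-routes traffic without a redeploy.
        return useAzureFlag.get() ? "azure-openai" : "openai";
    }
}
```

The returned key can then be used to look up the matching `ChatModel` bean, so a flag flip in the flag service changes routing immediately.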
Dynamically Selecting and Switching LLM Providers at Runtime
Dynamic provider selection is essential for routing requests based on user, context, or system health. Implement a provider router or factory to select the appropriate ChatModel or EmbeddingModel bean at runtime. For example:
@Component
public class LlmProviderRouter {

    private final Map<String, ChatModel> chatModels;

    public LlmProviderRouter(Map<String, ChatModel> chatModels) {
        this.chatModels = chatModels;
    }

    public ChatModel getChatModel(String provider) {
        ChatModel model = chatModels.get(provider);
        if (model == null) {
            throw new IllegalArgumentException("Unknown LLM provider: " + provider);
        }
        return model;
    }
}
Inject all available ChatModel beans as a map, which Spring keys by bean name. At runtime, select the provider based on a request header, feature flag, or system metric:
@RestController
public class ChatController {

    private final LlmProviderRouter router;

    public ChatController(LlmProviderRouter router) {
        this.router = router;
    }

    @PostMapping("/chat")
    public ResponseEntity<String> chat(@RequestBody ChatRequest request,
                                       @RequestHeader("X-LLM-Provider") String provider) {
        ChatModel model = router.getChatModel(provider);
        String reply = model.call(request.getPrompt());
        return ResponseEntity.ok(reply);
    }
}
This pattern enables multi-tenancy, user-level customization, and dynamic failover. For more advanced routing, encapsulate business logic in a ProviderSelector service that evaluates cost, latency, or business rules.
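One possible shape for such a `ProviderSelector` is sketched below. The class name, cost figures, and latency numbers are illustrative placeholders, not real provider pricing; in production the stats would come from live metrics:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Picks the cheapest provider whose recent p95 latency is within budget.
class ProviderSelector {
    record ProviderStats(double costPer1kTokens, long p95LatencyMillis) {}

    private final Map<String, ProviderStats> stats;

    ProviderSelector(Map<String, ProviderStats> stats) {
        this.stats = stats;
    }

    String select(long latencyBudgetMillis) {
        return stats.entrySet().stream()
                .filter(e -> e.getValue().p95LatencyMillis() <= latencyBudgetMillis)
                .min((a, b) -> Double.compare(a.getValue().costPer1kTokens(),
                                              b.getValue().costPer1kTokens()))
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("No provider within latency budget"));
    }
}
```

The selector returns a provider key that feeds directly into the `LlmProviderRouter` shown above.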
Implementing Fallback and Failover Strategies for High Availability
No LLM provider is immune to downtime or degraded performance. Fallback strategies are crucial to maintaining application availability. One effective approach is to chain providers with a fallback mechanism:
@Component
public class FallbackChatModel implements ChatModel {

    private final ChatModel primary;
    private final ChatModel secondary;

    public FallbackChatModel(@Qualifier("openai") ChatModel primary,
                             @Qualifier("azure-openai") ChatModel secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    public ChatResponse call(Prompt prompt) {
        try {
            return primary.call(prompt);
        } catch (Exception ex) {
            // Log the failure, then fall back to the secondary provider
            return secondary.call(prompt);
        }
    }
}
Register FallbackChatModel as the default ChatModel bean to provide automatic failover. For greater resilience, integrate Spring Retry for retry policies, use circuit breakers (such as Resilience4j) to prevent cascading failures, and implement health checks to monitor provider status. These patterns are essential for building highly available, production-grade AI systems.
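To show the control flow behind retry-then-fallback, here is a hand-rolled plain-Java sketch. In production, prefer Spring Retry or Resilience4j rather than this hypothetical `RetryWithFallback` helper; the sketch only illustrates the ordering of attempts:

```java
import java.util.function.Supplier;

// Tries the primary supplier up to maxAttempts times, then falls back.
class RetryWithFallback {
    static <T> T execute(Supplier<T> primary, Supplier<T> fallback, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return primary.get();
            } catch (RuntimeException ex) {
                last = ex; // in real code: log and optionally back off before retrying
            }
        }
        try {
            return fallback.get();
        } catch (RuntimeException ex) {
            if (last != null) {
                ex.addSuppressed(last); // preserve the primary's failure for diagnostics
            }
            throw ex;
        }
    }
}
```

Wrapping the primary `ChatModel` call in such a helper (or the equivalent Resilience4j decorators) keeps the failover policy in one place.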
Best Practices: Secrets Management, Rate Limiting, Observability, and Logging
To ensure a production-ready multi-provider AI setup, adhere to operational best practices. Manage secrets using solutions like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault, and inject them at runtime via environment variables or Spring Cloud Config. Never hardcode sensitive information:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
Implement rate limiting at the service layer to prevent provider throttling or unexpected costs. Use libraries such as Bucket4j or Resilience4j RateLimiter:
@Bean
public RateLimiter openAiRateLimiter() {
    return RateLimiter.of("openai", RateLimiterConfig.custom()
            .limitForPeriod(60)
            .limitRefreshPeriod(Duration.ofMinutes(1))
            .timeoutDuration(Duration.ofSeconds(2))
            .build());
}
Ensure observability with distributed tracing (OpenTelemetry), structured logging (SLF4J, Logback), and metrics (Micrometer) to monitor provider latency, error rates, and usage. For example:
logging:
  level:
    com.example.ai: DEBUG
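In production you would record per-provider latency through Micrometer (via `MeterRegistry` and `Timer`); the plain-Java sketch below, with the hypothetical class name `ProviderMetrics`, shows the bookkeeping idea without the Micrometer dependency:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Tracks call counts and cumulative latency per provider, Micrometer-style.
class ProviderMetrics {
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    <T> T record(String provider, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            // Count the call and accumulate its duration even on failure.
            calls.computeIfAbsent(provider, k -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(provider, k -> new LongAdder())
                      .add(System.nanoTime() - start);
        }
    }

    long callCount(String provider) {
        return calls.getOrDefault(provider, new LongAdder()).sum();
    }
}
```

Tagging each measurement with the provider key makes it straightforward to compare provider latency and error rates on a dashboard.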
Maintain auditability and compliance by logging provider selection and user prompts, while redacting sensitive information as necessary. Regularly review provider SLAs and update failover logic to address evolving operational risks.
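A minimal redaction pass before prompts reach the audit log might look like the sketch below. The patterns and the `PromptRedactor` class name are illustrative examples only; extend the patterns for your own PII categories:

```java
import java.util.regex.Pattern;

// Masks e-mail addresses and API-key-like tokens before logging a prompt.
class PromptRedactor {
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
    private static final Pattern API_KEY = Pattern.compile("sk-[A-Za-z0-9]{8,}");

    static String redact(String prompt) {
        String cleaned = EMAIL.matcher(prompt).replaceAll("[EMAIL]");
        return API_KEY.matcher(cleaned).replaceAll("[API_KEY]");
    }
}
```

Running every prompt through such a redactor before it hits structured logs keeps audit trails useful without persisting sensitive data.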
Conclusion: Building Resilient Multi-Provider AI Architectures with Spring AI
Configuring and switching between multiple LLM providers in Spring AI equips your production applications with the flexibility, resilience, and cost efficiency required for modern AI workloads. By leveraging Spring’s provider abstraction, environment-specific configurations, dynamic selection, and robust failover strategies, you can deliver dependable AI-powered features at scale. Combine these techniques with best practices in secrets management, rate limiting, and observability to build a future-proof, production-ready multi-provider AI architecture. Begin implementing these patterns today to fully realize the potential of large language models in your Spring Boot applications.
