Configuring and switching between multiple LLM providers in Spring AI
Introduction: The Need for Multi-Provider LLM Support in Production AI Systems
As large language models (LLMs) become integral to enterprise applications, depending on a single LLM provider introduces significant operational risks, including cost surges, outages, performance issues, and compliance challenges. Supporting multiple LLM providers is vital for production AI systems, as it enables cost optimization by routing requests to the most economical provider, ensures business continuity through seamless failover, and helps meet regulatory requirements by leveraging region-specific or certified models. In this comprehensive tutorial, you’ll discover how to configure and efficiently switch between multiple LLM providers using Spring AI within a Spring Boot application, focusing on practical, production-ready strategies.
Understanding Spring AI’s Provider Abstraction Model
Spring AI offers a robust provider abstraction layer that decouples your application logic from any specific LLM vendor. This is accomplished through interfaces like ChatModel and EmbeddingModel, which are implemented by provider-specific classes such as OpenAiChatModel and AzureOpenAiChatModel. By developing against these abstractions, you can seamlessly swap providers with minimal code changes, inject different models based on configuration, and implement dynamic provider selection. This architecture supports advanced patterns such as chaining, fallback, and multi-model orchestration, all leveraging Spring’s powerful dependency injection and configuration features. This approach is essential for building flexible, maintainable, and easily testable AI-powered systems.
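To make the idea concrete, here is a minimal plain-Java sketch of coding against the abstraction. The `ChatModel` interface below is a simplified stand-in for Spring AI's real interface (which also exposes `call(Prompt)` and streaming variants); `SummaryService` is a hypothetical example class:

```java
// Simplified stand-in for Spring AI's ChatModel abstraction (illustrative only).
interface ChatModel {
    String call(String prompt);
}

// Application logic depends only on the interface, never on a vendor class,
// so swapping OpenAI for Azure OpenAI is a wiring change, not a code change.
class SummaryService {
    private final ChatModel chatModel;

    SummaryService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    String summarize(String text) {
        return chatModel.call("Summarize: " + text);
    }
}
```

Because `SummaryService` never names a concrete provider class, the same service can be wired to any implementation in tests or production.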
Setting Up Multiple LLM Providers: OpenAI, Azure OpenAI, and Others via application.properties
To enable support for multiple LLM providers, define provider-specific configurations in your application.properties or application.yaml files. Spring AI provides native integrations with leading providers like OpenAI and Azure OpenAI. For example:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-3.5-turbo
    azure:
      openai:
        api-key: ${AZURE_OPENAI_API_KEY}
        endpoint: https://your-azure-endpoint.openai.azure.com/
        chat:
          options:
            deployment-name: my-azure-gpt
For additional providers such as Hugging Face or Cohere, leverage their respective Spring AI modules. Always inject secrets through environment variables or a secrets manager—never hardcode them.
Next, define beans for each provider in your Spring configuration:
// Note: exact constructor signatures vary across Spring AI versions; recent
// releases favor builder APIs. Adjust to match the version on your classpath.
@Bean
@Qualifier("openai")
public ChatModel openAiChatModel(OpenAiApi openAiApi) {
    return new OpenAiChatModel(openAiApi);
}

@Bean
@Qualifier("azure-openai")
public ChatModel azureOpenAiChatModel(OpenAIClient openAiClient) {
    return new AzureOpenAiChatModel(openAiClient);
}
This setup allows you to inject the desired ChatModel using the @Qualifier annotation, ensuring a clean separation and effortless switching between providers.
Environment-Specific Configuration Using Spring Profiles and Feature Flags
Different environments—such as production, staging, and development—often require distinct LLM providers or configurations. Spring profiles enable you to define environment-specific properties and beans. For example, you might use OpenAI in development to reduce costs, but Azure OpenAI in production to meet compliance requirements:
application-dev.yaml:
spring:
  ai:
    provider: openai  # custom key read by your own provider-selection logic
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-3.5-turbo
application-prod.yaml:
spring:
  ai:
    provider: azure-openai  # custom key read by your own provider-selection logic
    azure:
      openai:
        api-key: ${AZURE_OPENAI_API_KEY}
        endpoint: https://your-azure-endpoint.openai.azure.com/
        chat:
          options:
            deployment-name: prod-gpt
Activate specific profiles using the SPRING_PROFILES_ACTIVE environment variable. For runtime flexibility, implement feature flags (using tools like Togglz or LaunchDarkly) to switch providers or enable LLM features without redeployment. This approach supports gradual rollouts, A/B testing, and rapid rollback if a provider experiences issues.
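As a sketch of the feature-flag approach (the Togglz and LaunchDarkly client APIs are not shown; a plain `Supplier<Boolean>` stands in for the flag check, and `FlaggedProviderResolver` is a hypothetical class name):

```java
import java.util.function.Supplier;

// Resolves which provider key to use, consulting a feature flag at request time.
// In practice the supplier would be backed by Togglz, LaunchDarkly, etc.
class FlaggedProviderResolver {
    private final Supplier<Boolean> useAzureFlag;

    FlaggedProviderResolver(Supplier<Boolean> useAzureFlag) {
        this.useAzureFlag = useAzureFlag;
    }

    String resolveProviderKey() {
        // Flipping the flag re-routes traffic without a redeploy.
        return useAzureFlag.get() ? "azure-openai" : "openai";
    }
}
```

The returned key can then be used to look up the matching `ChatModel` bean, so a flag flip in the flag service changes routing immediately.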
Dynamically Selecting and Switching LLM Providers at Runtime
Dynamic provider selection is essential for routing requests based on user, context, or system health. Implement a provider router or factory to select the appropriate ChatModel or EmbeddingModel bean at runtime. For example:
@Component
public class LlmProviderRouter {

    private final Map<String, ChatModel> chatModels;

    public LlmProviderRouter(Map<String, ChatModel> chatModels) {
        this.chatModels = chatModels;
    }

    public ChatModel getChatModel(String provider) {
        ChatModel model = chatModels.get(provider);
        if (model == null) {
            throw new IllegalArgumentException("Unknown LLM provider: " + provider);
        }
        return model;
    }
}
Inject all available ChatModel beans as a map, which Spring keys by bean name. At runtime, select the provider based on a request header, feature flag, or system metric:
@RestController
public class ChatController {

    private final LlmProviderRouter router;

    public ChatController(LlmProviderRouter router) {
        this.router = router;
    }

    @PostMapping("/chat")
    public ResponseEntity<String> chat(@RequestBody ChatRequest request,
                                       @RequestHeader("X-LLM-Provider") String provider) {
        ChatModel model = router.getChatModel(provider);
        String reply = model.call(request.getPrompt());
        return ResponseEntity.ok(reply);
    }
}
This pattern enables multi-tenancy, user-level customization, and dynamic failover. For more advanced routing, encapsulate business logic in a ProviderSelector service that evaluates cost, latency, or business rules.
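One possible shape for such a `ProviderSelector` is sketched below. The class name, cost figures, and latency numbers are illustrative placeholders, not real provider pricing; in production the stats would come from live metrics:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Picks the cheapest provider whose recent p95 latency is within budget.
class ProviderSelector {
    record ProviderStats(double costPer1kTokens, long p95LatencyMillis) {}

    private final Map<String, ProviderStats> stats;

    ProviderSelector(Map<String, ProviderStats> stats) {
        this.stats = stats;
    }

    String select(long latencyBudgetMillis) {
        return stats.entrySet().stream()
                .filter(e -> e.getValue().p95LatencyMillis() <= latencyBudgetMillis)
                .min((a, b) -> Double.compare(a.getValue().costPer1kTokens(),
                                              b.getValue().costPer1kTokens()))
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("No provider within latency budget"));
    }
}
```

The selector returns a provider key that feeds directly into the `LlmProviderRouter` shown above.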
Implementing Fallback and Failover Strategies for High Availability
No LLM provider is immune to downtime or degraded performance. Fallback strategies are crucial to maintaining application availability. One effective approach is to chain providers with a fallback mechanism:
@Component
public class FallbackChatModel implements ChatModel {

    private final ChatModel primary;
    private final ChatModel secondary;

    public FallbackChatModel(@Qualifier("openai") ChatModel primary,
                             @Qualifier("azure-openai") ChatModel secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    public ChatResponse call(Prompt prompt) {
        try {
            return primary.call(prompt);
        } catch (Exception ex) {
            // Log the failure, then fall back to the secondary provider
            return secondary.call(prompt);
        }
    }
}
Register FallbackChatModel as the default ChatModel bean to provide automatic failover. For greater resilience, integrate Spring Retry for retry policies, use circuit breakers (such as Resilience4j) to prevent cascading failures, and implement health checks to monitor provider status. These patterns are essential for building highly available, production-grade AI systems.
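To show the control flow behind retry-then-fallback, here is a hand-rolled plain-Java sketch. In production, prefer Spring Retry or Resilience4j rather than this hypothetical `RetryWithFallback` helper; the sketch only illustrates the ordering of attempts:

```java
import java.util.function.Supplier;

// Tries the primary supplier up to maxAttempts times, then falls back.
class RetryWithFallback {
    static <T> T execute(Supplier<T> primary, Supplier<T> fallback, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return primary.get();
            } catch (RuntimeException ex) {
                last = ex; // in real code: log and optionally back off before retrying
            }
        }
        try {
            return fallback.get();
        } catch (RuntimeException ex) {
            if (last != null) {
                ex.addSuppressed(last); // preserve the primary's failure for diagnostics
            }
            throw ex;
        }
    }
}
```

Wrapping the primary `ChatModel` call in such a helper (or the equivalent Resilience4j decorators) keeps the failover policy in one place.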
Best Practices: Secrets Management, Rate Limiting, Observability, and Logging
To ensure a production-ready multi-provider AI setup, adhere to operational best practices. Manage secrets using solutions like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault, and inject them at runtime via environment variables or Spring Cloud Config. Never hardcode sensitive information:
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
Implement rate limiting at the service layer to prevent provider throttling or unexpected costs. Use libraries such as Bucket4j or Resilience4j RateLimiter:
@Bean
public RateLimiter openAiRateLimiter() {
    return RateLimiter.of("openai", RateLimiterConfig.custom()
            .limitForPeriod(60)
            .limitRefreshPeriod(Duration.ofMinutes(1))
            .timeoutDuration(Duration.ofSeconds(2))
            .build());
}
Ensure observability with distributed tracing (OpenTelemetry), structured logging (SLF4J, Logback), and metrics (Micrometer) to monitor provider latency, error rates, and usage. For example:
logging:
  level:
    com.example.ai: DEBUG
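In production you would record per-provider latency through Micrometer (via `MeterRegistry` and `Timer`); the plain-Java sketch below, with the hypothetical class name `ProviderMetrics`, shows the bookkeeping idea without the Micrometer dependency:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Tracks call counts and cumulative latency per provider, Micrometer-style.
class ProviderMetrics {
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    <T> T record(String provider, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            // Count the call and accumulate its duration even on failure.
            calls.computeIfAbsent(provider, k -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(provider, k -> new LongAdder())
                      .add(System.nanoTime() - start);
        }
    }

    long callCount(String provider) {
        return calls.getOrDefault(provider, new LongAdder()).sum();
    }
}
```

Tagging each measurement with the provider key makes it straightforward to compare provider latency and error rates on a dashboard.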
Maintain auditability and compliance by logging provider selection and user prompts, while redacting sensitive information as necessary. Regularly review provider SLAs and update failover logic to address evolving operational risks.
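A minimal redaction pass before prompts reach the audit log might look like the sketch below. The patterns and the `PromptRedactor` class name are illustrative examples only; extend the patterns for your own PII categories:

```java
import java.util.regex.Pattern;

// Masks e-mail addresses and API-key-like tokens before logging a prompt.
class PromptRedactor {
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
    private static final Pattern API_KEY = Pattern.compile("sk-[A-Za-z0-9]{8,}");

    static String redact(String prompt) {
        String cleaned = EMAIL.matcher(prompt).replaceAll("[EMAIL]");
        return API_KEY.matcher(cleaned).replaceAll("[API_KEY]");
    }
}
```

Running every prompt through such a redactor before it hits structured logs keeps audit trails useful without persisting sensitive data.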
Conclusion: Building Resilient Multi-Provider AI Architectures with Spring AI
Configuring and switching between multiple LLM providers in Spring AI equips your production applications with the flexibility, resilience, and cost efficiency required for modern AI workloads. By leveraging Spring’s provider abstraction, environment-specific configurations, dynamic selection, and robust failover strategies, you can deliver dependable AI-powered features at scale. Combine these techniques with best practices in secrets management, rate limiting, and observability to build a future-proof, production-ready multi-provider AI architecture. Begin implementing these patterns today to fully realize the potential of large language models in your Spring Boot applications.
