Building semantic search applications using Spring AI Embeddings and vector stores
Introduction: Understanding Semantic Search and Embeddings in Modern Applications
Semantic search is transforming how modern applications retrieve information. Unlike traditional keyword search, which depends on exact word matches, semantic search utilizes AI-driven embeddings to grasp the contextual meaning behind queries and documents. This approach delivers more relevant and accurate results, especially in scenarios involving synonyms, paraphrasing, or complex language. At the heart of semantic search are embeddings—dense vector representations of text that capture semantic relationships. In this practical guide, you’ll discover how to build a semantic search application with Spring AI, integrate it with a vector store, and implement features suited for real-world use cases.
How Embeddings Enable Semantic Similarity Search vs. Keyword Search
Embeddings are high-dimensional numerical vectors generated by large language models to represent text. Each sentence, paragraph, or document is mapped to a point in vector space, ensuring that semantically similar texts are positioned close together, regardless of their exact wording. This is a significant advancement over keyword-based search, which only retrieves documents containing specific terms. For instance, a keyword search for ‘car’ might overlook documents that mention ‘automobile’, whereas semantic search recognizes their equivalence. By storing embeddings in a vector store and applying similarity metrics like cosine similarity, you can efficiently retrieve the most relevant documents for a query—even when different vocabulary is used.
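The similarity metric itself is simple arithmetic. Here is a minimal sketch of cosine similarity in plain Java; the class and method names are illustrative, not Spring AI types, since the vector store computes this internally:

```java
// Illustrative cosine similarity between two embedding vectors.
public class CosineSimilarity {

    // cosine(a, b) = (a · b) / (|a| * |b|); ranges from -1 to 1,
    // where values near 1 mean the vectors point the same way,
    // i.e. the underlying texts are semantically similar.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Identical directions score 1.0 and orthogonal vectors score 0.0, which is why "car" and "automobile" embeddings, pointing in nearly the same direction, rank each other highly.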
Setting Up a Spring Boot Project with Spring AI and Vector Store Dependencies
Begin by creating a new Spring Boot project using Spring Initializr. Select Java 17 or newer, include the ‘Spring Web’ starter, and add ‘Spring AI’ dependencies. For vector storage, choose a provider such as PGVector (a PostgreSQL extension), Pinecone, or Redis. This guide uses PGVector, but the setup is similar for other stores.
Add the following dependencies to your build.gradle:
dependencies {
implementation 'org.springframework.boot:spring-boot-starter-web'
// Spring AI artifact IDs have changed across releases; these are the 1.0 starter names
implementation 'org.springframework.ai:spring-ai-starter-model-openai'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
implementation 'org.postgresql:postgresql:42.6.0'
}
Or, if you use Maven:
<!-- Spring AI artifact IDs have changed across releases; these are the 1.0 starter names -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.6.0</version>
</dependency>
Ensure your PostgreSQL instance has the PGVector extension enabled:
CREATE EXTENSION IF NOT EXISTS vector;
Configuring the Embedding Model and Connecting to a Vector Store (PGVector, Pinecone, or Redis)
Configure your embedding provider and vector store in src/main/resources/application.properties. PGVector connects through the standard Spring datasource, and the OpenAI API key is best supplied via an environment variable rather than hardcoded:
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.datasource.url=jdbc:postgresql://localhost:5432/yourdb
spring.datasource.username=youruser
spring.datasource.password=yourpassword
spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.table-name=embeddings
With the Spring AI dependencies on the classpath, Spring Boot auto-configures the embedding client and vector store beans, so no manual @Configuration class is required. Inject them wherever they are needed:
@Autowired
private EmbeddingClient embeddingClient;
@Autowired
private VectorStore vectorStore;
Note that newer Spring AI releases rename the EmbeddingClient interface to EmbeddingModel; this guide uses the older name throughout.
With this in place, your application can generate embeddings and interact with the vector store. If you choose Pinecone or Redis instead, swap the dependencies and properties accordingly.
Ingesting Documents: Chunking Strategies, Generating Embeddings, and Storing Vectors with Metadata
For effective semantic search, break large documents into manageable chunks so retrieval can match precise passages rather than whole files. You can chunk by paragraphs or by fixed-length blocks (for example, roughly 500 tokens, often approximated by character count).
Here's a simple character-based chunking utility:
// Splits text into fixed-size character chunks (a rough stand-in for token-based chunking)
public List<String> chunkText(String text, int chunkSize) {
List<String> chunks = new ArrayList<>();
for (int start = 0; start < text.length(); start += chunkSize) {
int end = Math.min(text.length(), start + chunkSize);
chunks.add(text.substring(start, end));
}
return chunks;
}
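The utility above splits on raw character offsets, which can cut a sentence in half. Since chunking by paragraphs was mentioned as an alternative, here is a sketch of that variant (the ParagraphChunker class is illustrative): it splits on blank lines, then greedily merges consecutive paragraphs until a size budget is reached.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphChunker {

    // Split on blank lines, then merge consecutive paragraphs into one
    // chunk until adding another would exceed maxChars.
    public static List<String> chunkByParagraph(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String para : text.split("\\n\\s*\\n")) {
            para = para.trim();
            if (para.isEmpty()) continue;
            // Flush the current chunk if this paragraph would not fit
            if (current.length() > 0 && current.length() + para.length() + 2 > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append("\n\n");
            current.append(para);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }
}
```

Paragraph boundaries usually align with topic shifts, so each chunk tends to embed as one coherent idea rather than a fragment spanning two.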
For each chunk, create a Document carrying metadata such as the document ID and chunk index, then add it to the vector store. The store generates and persists the embedding internally through its configured embedding client:
@Autowired
private VectorStore vectorStore;
public void ingestDocument(String docId, String text) {
List<String> chunks = chunkText(text, 500);
List<Document> documents = new ArrayList<>();
for (int i = 0; i < chunks.size(); i++) {
Map<String, Object> metadata = Map.of(
"docId", docId,
"chunkIndex", i
);
documents.add(new Document(chunks.get(i), metadata));
}
// add() embeds each document and stores text, vector, and metadata together
vectorStore.add(documents);
}
Expose this ingestion process through a REST endpoint to enable document uploads and seamless integration.
Implementing Semantic Search: Service and Controller Layer Code Walkthrough with Example REST Endpoints
Next, implement the core semantic search logic by creating a dedicated service for search operations and a REST controller to expose endpoints.
@Service
public class SemanticSearchService {
@Autowired
private VectorStore vectorStore;
public List<SearchResult> search(String query, int topK) {
// similaritySearch embeds the query and runs a nearest-neighbour lookup
List<Document> results = vectorStore.similaritySearch(
SearchRequest.query(query).withTopK(topK));
return results.stream()
.map(doc -> new SearchResult(
(String) doc.getMetadata().get("docId"),
doc.getContent(),
// how the similarity score is exposed varies by store and version;
// PGVector reports the query distance in the result metadata
(Double) doc.getMetadata().getOrDefault("distance", 0.0)))
.collect(Collectors.toList());
}
public void ingestDocument(String docId, String text) {
// Use the chunking and embedding logic from the previous section
}
}
@RestController
@RequestMapping("/api/semantic-search")
public class SemanticSearchController {
@Autowired
private SemanticSearchService searchService;
@PostMapping("/ingest")
public ResponseEntity<String> ingest(@RequestBody DocumentUploadRequest request) {
searchService.ingestDocument(request.getDocId(), request.getText());
return ResponseEntity.ok("Document ingested successfully");
}
@GetMapping("/search")
public ResponseEntity<List<SearchResult>> search(
@RequestParam String query,
@RequestParam(defaultValue = "5") int topK) {
return ResponseEntity.ok(searchService.search(query, topK));
}
}
These endpoints allow you to upload documents for ingestion and perform semantic searches. The search endpoint returns the top K most semantically similar chunks, including document ID, text, and similarity score for each result.
Performance Considerations: Chunking, Indexing, Filtering with Metadata, and Scaling Vector Storage
Optimizing performance is crucial as your semantic search data grows. Chunking strategy directly affects retrieval accuracy and latency: smaller chunks offer finer granularity but increase storage and computational overhead. Ensure your vector store supports efficient indexing. For PGVector, leverage ivfflat or hnsw indexes. For example, in PostgreSQL:
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Utilize metadata filtering to narrow search results by document type, user, or tags. PGVector and other vector stores support metadata-based queries, enabling you to restrict searches to specific document types or date ranges.
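Conceptually, the store applies a metadata predicate before or during the nearest-neighbour ranking (in Spring AI this is attached to the search request as a filter expression). A self-contained sketch of the idea in plain Java, where the Candidate record is illustrative and not a Spring AI type:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual sketch of metadata filtering: restrict candidates by a
// metadata predicate first, then rank the survivors by similarity score.
// Real vector stores push this filter down into the index query.
public class MetadataFilterDemo {

    public record Candidate(String text, Map<String, Object> metadata, double score) {}

    public static List<Candidate> search(List<Candidate> candidates,
                                         String metadataKey,
                                         Object requiredValue,
                                         int topK) {
        return candidates.stream()
                .filter(c -> requiredValue.equals(c.metadata().get(metadataKey)))
                .sorted(Comparator.comparingDouble(Candidate::score).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }
}
```

Filtering before ranking shrinks the candidate set, which both sharpens relevance and reduces the work the similarity search must do.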
As your dataset expands, consider sharding your vector database or adopting managed vector services like Pinecone for horizontal scaling. Continuously monitor query latency and adjust chunk size, embedding models, and batch operations to optimize large-scale ingestion and search performance.
Production Best Practices: Monitoring Embedding Costs, Security, and Scaling Strategies
In production environments, embedding generation can incur significant costs, especially when using commercial APIs like OpenAI. To manage expenses, batch embedding requests, cache results, and monitor API usage closely. Implement robust error handling and use exponential backoff for transient failures to ensure reliability. Secure your REST endpoints with proper authentication and authorization, particularly for ingestion and search APIs.
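The backoff pattern itself can be sketched in a few lines; the RetryWithBackoff class and its parameters are illustrative, and in a real service the Supplier would wrap the remote embedding call:

```java
import java.util.function.Supplier;

// Retry with exponential backoff for transient remote-API failures:
// wait base, 2x base, 4x base, ... between attempts, then give up.
public class RetryWithBackoff {

    public static <T> T call(Supplier<T> operation, int maxAttempts, long baseDelayMillis) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    try {
                        // Double the delay after each failed attempt
                        Thread.sleep(baseDelayMillis << attempt);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last;
    }
}
```

In production you would also cap the maximum delay and add jitter so that many clients retrying at once do not hit the API in lockstep.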
For scalability, opt for managed vector stores or deploy clusters with replication and failover capabilities. Regularly vacuum and reindex your vector tables to maintain optimal performance. Set up logging and monitoring for embedding throughput, vector store latency, and storage utilization. Apply data retention policies to control storage costs, and keep your embedding models updated, periodically re-embedding your corpus to benefit from model improvements.
Conclusion: Key Takeaways and Next Steps for Building Robust Semantic Search with Spring AI
Semantic search powered by embeddings and vector stores empowers applications to deliver contextually relevant results far beyond traditional keyword matching. With Spring AI, you can quickly prototype and deploy robust semantic search solutions integrated with vector databases like PGVector, Pinecone, or Redis. This guide covered setting up your Spring Boot project, configuring models, ingesting and chunking documents, and implementing RESTful search endpoints. As you move toward production, prioritize performance optimization, cost monitoring, and security. Explore advanced features such as hybrid search (combining keyword and semantic search), metadata filtering, and integration with Retrieval-Augmented Generation (RAG) pipelines to unlock even more powerful AI-driven search experiences.
