May 2026 – Michael Clayton

Tuning Your AI App: The “Sweet Spot” Strategy

Tools Used: Ollama + llama3, Spring AI

Building an AI-driven booking system sounds straightforward until your Kenyan safari guide starts trying to sell trips to Hawaii. After some deep-dive troubleshooting with Spring AI and Vector Stores, I’ve found that the secret to a perfect RAG (Retrieval-Augmented Generation) system isn’t just better data—it’s finding the right Similarity Threshold.

The Problem: The “Over-Helpful” AI

When using a vector store (like SimpleVectorStore), the system looks for the “closest” match to a user’s question. If a user asks for “Hawaii” and your database only contains “Kenya,” the math will still find the closest thing it can—likely a Kenyan beach tour. Without a gatekeeper, your AI sees that “closest” match and assumes it’s relevant, leading to confusing responses.
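To make the "closest match" behavior concrete, here is a minimal sketch in plain Java. It assumes cosine similarity as the scoring function and uses made-up three-dimensional vectors in place of real embeddings; the class name, document names, and numbers are all hypothetical. The point is only that a nearest-neighbor lookup always returns something, even when the best score is poor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: real embeddings come from a model and have far more dimensions.
class ClosestMatchDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the name of the highest-scoring document. Note there is no "no match" outcome.
    static String closest(double[] query, Map<String, double[]> docs) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : docs.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    // Hypothetical Kenya-only "database"; made-up axes are [safari, beach, mountain].
    static Map<String, double[]> kenyaDocs() {
        Map<String, double[]> docs = new LinkedHashMap<>();
        docs.put("Maasai Mara safari", new double[] {0.9, 0.1, 0.2});
        docs.put("Diani beach tour", new double[] {0.1, 0.9, 0.1});
        return docs;
    }

    public static void main(String[] args) {
        double[] hawaiiQuery = {0.0, 0.6, 0.8}; // a "beach plus volcano" style question
        // The Kenyan beach tour wins simply because it is the least bad option.
        System.out.println(closest(hawaiiQuery, kenyaDocs()));
    }
}
```

Nothing in this lookup knows that Hawaii is out of scope, which is why a separate cutoff on the score is needed.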

The Solution: The Similarity Threshold

The Similarity Threshold acts as a filter. It tells the system: “Unless the match is at least this relevant, ignore it.”

Setting it too high (e.g., 0.8): The “Nothing works” zone. Even valid searches for “Safari” or “Elephants” get blocked because the mathematical match isn’t “perfect” enough.
Setting it too low (e.g., 0.0): The “Hawaii” zone. Everything gets through, including irrelevant data, making your AI hallucinate or provide wrong locations.
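The two failure zones can be sketched with a few lines of plain Java. The scores below are invented for illustration (they are not measurements from my setup), and the sketch assumes the cutoff is inclusive, meaning a score equal to the threshold passes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ThresholdFilterDemo {

    // Hypothetical best-match score for each user query against a Kenya-only store.
    static Map<String, Double> bestScores() {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("Safari", 0.31);
        scores.put("Elephants", 0.24);
        scores.put("Hawaii", 0.06);
        return scores;
    }

    // Queries whose best match clears the threshold, i.e. those that would get documents back.
    static List<String> passing(Map<String, Double> scores, double threshold) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() >= threshold) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        for (double threshold : new double[] {0.0, 0.1, 0.8}) {
            System.out.println(threshold + " -> " + passing(bestScores(), threshold));
        }
    }
}
```

With these invented numbers, 0.0 lets all three queries through, 0.8 blocks all three, and 0.1 keeps the two valid queries while dropping Hawaii.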

Finding the “Sweet Spot”

Through testing with Llama 3, I discovered that commonly quoted thresholds (like 0.7) don’t work for every local setup, because the similarity scores you get depend on the embedding model that produces the vectors. In my case, 0.1 was the magic number.

How to Implement It in Spring AI

In the ConciergeChatService, you can enforce this by using a constant and applying it to your SearchRequest. This ensures that if the similarity doesn’t hit your target, the list of documents comes back empty, and you can handle it with a polite “Jambo! I only specialize in Kenya” message.

Java

			
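import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;

// Inside ConciergeChatService
private static final double MIN_SIMILARITY = 0.1; // The magic number

public ConciergeChatResponse getConversationalResponse(String userQuestion) {
    List<Document> docs = vectorStore.similaritySearch(
            SearchRequest.builder()
                    .query(userQuestion)                 // the text to search for (missing in the first draft)
                    .similarityThreshold(MIN_SIMILARITY) // drop matches scoring below the threshold
                    .topK(5)                             // return at most 5 documents
                    .build()
    );
    // Nothing cleared the threshold: answer directly instead of letting the model guess.
    if (docs.isEmpty()) {
        return new ConciergeChatResponse("I only specialize in Kenyan tours.", docs);
    }
    // Proceed to generate AI response...
}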

		

Key Takeaway

If your AI is leaking irrelevant results or staying silent when it should speak, don’t assume your code is broken. Instead:

Set your threshold to 0.0 to see what scores your valid terms are getting.
Slowly nudge the number up until the “noise” (like Hawaii) disappears.
Let your Java logic handle the empty results rather than letting the AI guess.

Tuning these numbers is the difference between an AI that guesses and an AI that knows.
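The calibration steps above can be sketched as a small helper. This is one possible way to automate the "nudge it up" loop, not something taken from my service: the class name, method name, and sample scores are hypothetical, and it assumes you have already collected scores for known-good and known-bad queries with the threshold set to 0.0.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

class ThresholdCalibrator {

    // Walks the threshold up from 0.0 in fixed steps and returns the first value
    // that blocks every noise score while still keeping every valid score.
    // Returns empty if the two groups overlap, so no single cutoff separates them.
    static OptionalDouble findThreshold(double[] validScores, double[] noiseScores, double step) {
        int steps = (int) Math.round(1.0 / step);
        for (int i = 0; i <= steps; i++) {
            double t = i * step;
            boolean noiseBlocked = Arrays.stream(noiseScores).allMatch(s -> s < t);
            boolean validKept = Arrays.stream(validScores).allMatch(s -> s >= t);
            if (noiseBlocked && validKept) {
                return OptionalDouble.of(t);
            }
            if (!validKept) {
                break; // raising the threshold further would only block more valid results
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        double[] valid = {0.35, 0.22, 0.18}; // hypothetical scores for "Safari", "Elephants", ...
        double[] noise = {0.041, 0.074};     // hypothetical scores for "Hawaii"-style queries
        System.out.println(findThreshold(valid, noise, 0.01));
    }
}
```

If the helper returns empty, the valid and irrelevant scores overlap, and no threshold alone will fix the problem; at that point the data or the embedding model is worth a second look.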


Fine-tuning AI to get the response you want and prevent the response you don’t want!