EXPERIMENTAL CONVERSATIONAL AI AGENT BUILDS

Conversational AI & Spatial Awareness

An experimental Gemini build exploring the newest audio-to-audio AI models, real-time map rendering, and the eternal human question of where they moved the olive oil.

Unexpected Item.

Drop me into Target and I feel like a chicken released into Antarctica. Where am I? What is this? Why?

If I go to Target for a USB drive, I somehow leave with two bottles of ibuprofen, a ceramic penguin, and a dozen neon pool noodles.

I usually forget my USB drive, and I don’t own a pool.

While I can’t explain my behavior, I do have a theory: Target, Costco, and pretty much every major retailer on earth design their stores so you cannot find anything, but you can find everything.

Getting lost is the product, and you think you’re the smart one because you know why the milk is in the back. You’ll show them!

Next thing you know, you own a kayak.

So I built a thing retailers would never build themselves. GeminiMart is a voice-activated AI kiosk that knows where everything is in a retail environment, and will tell you, in plain English, exactly how to get there.

Head left until you hit Aisle 4. Take a right into the aisle, and you’ll find olive oil halfway down on your right.

An animated path draws itself across the store map in real time. Ask for three items, and three paths map out simultaneously. Can we print that for you? How about a QR code for $1.00 off?

Will Costco ever deploy this? Absolutely not. They need you wandering past bins of trail mix and shark-shaped soap dispensers.

But what about places that actually want to help you find things? An airport. A hospital campus. A convention center that needs to herd 40,000 people but can only hire 4 to do it. These kinds of places don’t benefit from confusion; they just offer it.

I didn’t build this to sell it. I simply wanted to see if I could do it. Or more importantly, how I could do it, and how I might incorporate what I learned into future conversational AI projects.

Now if I could only convince Target.

GEMINIMART IN ACTION

One question, two outputs, real time.

Each user request streams in real time to a Python backend over a WebSocket connection.
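To make the streaming step concrete, here is a minimal sketch of how a Python backend might sort incoming WebSocket frames. It is not GeminiMart's actual code. It assumes a hypothetical convention in which binary frames carry raw audio chunks and text frames carry small JSON control messages with a "type" field, and it assumes the WebSocket library hands binary frames over as bytes and text frames as str. The function name route_frame and the message types are made up for illustration.

```python
import json


def route_frame(frame):
    """Classify one incoming WebSocket frame (hypothetical protocol).

    Returns a (kind, payload) tuple where kind is "audio", "control",
    or "error".
    """
    # Assumption: binary frames are audio chunks to forward to the model.
    if isinstance(frame, (bytes, bytearray)):
        return ("audio", bytes(frame))

    # Assumption: text frames are JSON control messages such as
    # {"type": "end_of_turn"}.
    try:
        message = json.loads(frame)
    except json.JSONDecodeError:
        return ("error", "invalid JSON")

    if not isinstance(message, dict) or "type" not in message:
        return ("error", "missing type")
    return ("control", message)
```

Keeping this logic separate from the socket itself means it can be exercised without a live connection, whichever WebSocket library ends up carrying the frames.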

Gemini processes the audio natively, skipping the traditional speech-to-text and text-to-speech pipeline entirely. One API connection replaces the usual three: speech-to-text, the language model, and text-to-speech.

A location tool queries the store inventory, returns map coordinates as a JSON result, and drives two simultaneous outputs: a spoken response guiding the user to the product, and an animated path drawing itself across the SVG map in real time. Voice and map stay in sync because both are driven by that same JSON result.
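The single-source idea can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the real implementation: the inventory entries, the kiosk position, the coordinate values, and the function names locate_product, spoken_hint, and svg_path are all hypothetical. The point is only that one JSON result feeds both the text a voice model could speak and the path data an SVG map could animate.

```python
import json

# Hypothetical inventory: product name -> aisle, side, and map coordinates.
INVENTORY = {
    "olive oil": {"aisle": 4, "side": "right", "x": 420, "y": 260},
    "usb drive": {"aisle": 9, "side": "left", "x": 880, "y": 140},
}

# Assumed kiosk position on the same coordinate grid as the SVG map.
KIOSK = {"x": 60, "y": 500}


def locate_product(name: str) -> str:
    """Look up a product and return its location as a JSON string."""
    key = name.strip().lower()
    item = INVENTORY.get(key)
    if item is None:
        return json.dumps({"found": False, "product": key})
    return json.dumps({"found": True, "product": key, "start": KIOSK, **item})


def spoken_hint(result: dict) -> str:
    """Build a text hint that a voice model could turn into speech."""
    if not result["found"]:
        return f"Sorry, I could not find {result['product']}."
    return (
        f"You will find {result['product']} in Aisle {result['aisle']}, "
        f"on your {result['side']} side."
    )


def svg_path(result: dict) -> str:
    """Build SVG path data: along the front corridor, then up the aisle."""
    start = result["start"]
    return (
        f"M {start['x']} {start['y']} "
        f"L {result['x']} {start['y']} "
        f"L {result['x']} {result['y']}"
    )


# One lookup, two outputs derived from the same result.
result = json.loads(locate_product("Olive Oil"))
hint = spoken_hint(result)
path = svg_path(result)
```

Because the hint and the path are both computed from the same parsed result, they cannot disagree about which aisle the product is in. A multi-item request could call locate_product once per item and draw one path for each result.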

This example covers a single store, but the same architecture should extend to multi-floor mapping, live inventory APIs, and other large physical spaces.

GEMINIMART TOOLKIT