EXPERIMENTAL CONVERSATIONAL AI AGENT BUILDS

Conversational AI &
Spatial Awareness

An experimental Gemini build exploring the newest audio-to-audio AI models, real-time map rendering, and the eternal human question of where they moved the olive oil.

Unexpected Item.

A couple weeks ago I saw a man near the meatloaf at Costco who looked like he knew what a budget was.

He moved with purpose. He had a list. No emotional detours into inflatable furniture. The kind of person who buys what his family needs and leaves.

I was holding a ceramic seagull timepiece at the time.

I had come for garbage bags. I left with 108 AA batteries, the ceramic seagull, and two dozen neon pool noodles.

I forgot the garbage bags. I don’t own a pool. 

While I can’t explain my shopping behavior, I do have a theory: Target, Costco, and pretty much every major retailer on earth design their stores so you cannot easily find anything, but you can easily find everything.

Getting lost is the product, and you think you’re the smart one because you know why they put the milk in the back. Haha, you’re not gonna fall for that one!

Next thing you know, you own a canoe.

But in the middle of wrestling pool noodles into my car, I started thinking about that man near the meatloaf. How could I be more like him? 

What if there were a kiosk up front that knew exactly where that meatloaf was and told me, in plain English, exactly how to get there? Fast.

“Head left past the registers until you hit Aisle 4. Take a right. Meatloaf is in the cooler halfway down on your right.”

A path draws itself across the store map in real time. Ask for multiple items and the routes stack on top of each other. Can we print that for you? QR code for $1.00 off meatloaf?

Would Costco ever deploy this? Absolutely not. They need you wandering past aisles of dog trampolines and shark-shaped soap dispensers.

But what about places that actually want to help you find things? A hospital campus. An airport. A convention center that needs to herd 40,000 people but only hires 4 to do it. These kinds of places don’t benefit from confusion; they just offer it for free.

I didn’t build this idea to sell it; I simply wanted to see if I could do it. Or more importantly, how I could do it, and how I might incorporate what I learned into future conversational AI projects.

Now if I could only convince Target.

GEMINIMART IN ACTION

One question, two outputs, real time.

Each user request streams in real time to a Python backend over a WebSocket connection.
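The wire format itself is simple. As a sketch of how audio might travel over that WebSocket, here is a minimal JSON envelope for raw PCM chunks; the field names (`type`, `audio`, `sample_rate`) are illustrative, not the actual protocol from this build.

```python
import base64
import json

# Hypothetical envelope for streaming mic audio to the Python backend.
# Binary PCM is base64-encoded so it can ride inside a JSON text frame.

def encode_audio_chunk(pcm_bytes: bytes, sample_rate: int = 16000) -> str:
    """Wrap a raw PCM chunk in a JSON envelope for the WebSocket."""
    return json.dumps({
        "type": "audio_chunk",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
        "sample_rate": sample_rate,
    })

def decode_audio_chunk(message: str) -> tuple[bytes, int]:
    """Unpack an envelope back into raw PCM on the backend side."""
    payload = json.loads(message)
    if payload["type"] != "audio_chunk":
        raise ValueError(f"unexpected message type: {payload['type']}")
    return base64.b64decode(payload["audio"]), payload["sample_rate"]
```

In the real build the backend would relay each decoded chunk straight into the Gemini session rather than buffering a whole utterance.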

Gemini processes our audio natively, skipping the traditional speech-to-text and text-to-speech pipeline entirely. One API connection now replaces three.
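A session along these lines can be configured through the Gemini Live API in the google-genai Python SDK. The config below is a sketch: the tool name, schema, and model string are assumptions, not the exact values from this build.

```python
# Sketch of a Gemini Live API session config (google-genai Python SDK).
# "find_product" and its schema are hypothetical stand-ins.

LOCATION_TOOL = {
    "function_declarations": [{
        "name": "find_product",  # hypothetical tool name
        "description": "Look up a product's aisle and map coordinates.",
        "parameters": {
            "type": "OBJECT",
            "properties": {"product": {"type": "STRING"}},
            "required": ["product"],
        },
    }]
}

LIVE_CONFIG = {
    "response_modalities": ["AUDIO"],  # audio in, audio out -- no STT/TTS hop
    "tools": [LOCATION_TOOL],
    "system_instruction": "You are a store kiosk. Guide shoppers to products.",
}

# The session itself would open roughly like this (requires an API key):
#
#   from google import genai
#   client = genai.Client()
#   async with client.aio.live.connect(
#       model="gemini-2.0-flash-live-001", config=LIVE_CONFIG
#   ) as session:
#       ...stream mic audio in, play model audio out...
```

That one session replaces the separate speech-to-text, LLM, and text-to-speech services a traditional voice pipeline would need.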

A location tool queries the store inventory, returns map coordinates as a JSON result, and drives two simultaneous outputs: a spoken voice response guiding the user to the product, and an animated path drawing itself across the SVG map in real time. Voice and map stay in sync because both read from the same tool result.
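The location tool can be sketched in a few lines: a hardcoded inventory stands in for the real store database, the JSON result is what Gemini narrates from, and the same waypoints become the SVG path the frontend animates. All names and coordinates here are illustrative.

```python
# Illustrative inventory: the real build would query a store database.
INVENTORY = {
    "meatloaf": {"aisle": 4,
                 "waypoints": [(40, 300), (120, 300), (120, 180), (210, 180)]},
    "olive oil": {"aisle": 7,
                  "waypoints": [(40, 300), (260, 300), (260, 90)]},
}

def find_product(product: str) -> dict:
    """Tool handler: returns the JSON result that drives both outputs."""
    item = INVENTORY.get(product.lower())
    if item is None:
        return {"found": False, "product": product}
    return {"found": True, "product": product,
            "aisle": item["aisle"], "waypoints": item["waypoints"]}

def svg_path(waypoints: list[tuple[int, int]]) -> str:
    """Turn the same waypoints into an SVG path 'd' string to animate."""
    first, *rest = waypoints
    return f"M {first[0]} {first[1]} " + " ".join(f"L {x} {y}" for x, y in rest)
```

Because the voice response and the map both derive from this one return value, there is nothing extra to keep in sync.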

While this example is built around a single store environment, the architecture scales readily to multi-floor mapping, live inventory APIs, and other large physical spaces.

GEMINIMART TOOLKIT