Trace of Ink Storyteller: Orchestrating Gemini 2.5 and Imagen 3 into a Serverless Narrative Platform

Forging Infinite Tales: Building VertexAI Storyteller with Google Cloud

Disclaimer: I created this piece of content for the purposes of entering this hackathon.

Interactive storytelling has traditionally been constrained by predefined narrative paths. VertexAI Storyteller is a dynamic, multimodal “choose-your-own-adventure” engine built entirely on Google Cloud and Google AI. Users craft their own magical characters, drop them into unscripted worlds, and follow a narrative that is generated in real time as they read.

Here is a breakdown of how we built this multimodal platform on Google Cloud’s scalable infrastructure and Vertex AI models.

1. The Core Narrative Engine: Gemini 2.5 on Vertex AI

At the heart of our story lies a fleet of independent Python microservices that orchestrate their workflows through the LangChain Google GenAI client (langchain-google-genai), connected to Vertex AI.

Instead of a giant monolithic app, we isolated the intelligence:

Our story-api controls the flow of the book. Every time a user makes a choice, the backend calls Gemini 2.5 Flash on Vertex AI.
We give Gemini 2.5 a rich, persistent “Session Object” (including character traits like Brave, accessories like a Top Hat, and the current environmental mood), so every generative prompt is heavily structured.
The low latency of Gemini 2.5 Flash allowed us to implement speculative N+1 pre-rendering: while you are reading Page 1, background tasks are already querying Gemini to build the two possible next pages, one for Choice A and one for Choice B.
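The speculative N+1 pre-rendering described above can be sketched with plain asyncio tasks. This is a minimal illustration, not the project's actual code: generate_page is a hypothetical stand-in for the real Gemini call, and the Prefetcher class and session fields are assumed names.

```python
import asyncio


async def generate_page(session: dict, choice: str) -> dict:
    """Hypothetical stand-in for the Gemini call; returns a fake next page."""
    await asyncio.sleep(0)  # placeholder for the model's network latency
    return {
        "page": session["page"] + 1,
        "after_choice": choice,
        "text": f"{session['name']} chose {choice}.",
    }


class Prefetcher:
    """Starts generating every possible next page while the reader is busy."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def start(self, session: dict, choices: list[str]) -> None:
        # One background task per pending choice (A and B in the article).
        for choice in choices:
            self._tasks[choice] = asyncio.create_task(generate_page(session, choice))

    async def resolve(self, chosen: str) -> dict:
        # Drop the branches the reader did not take, then await the winner.
        for choice, task in self._tasks.items():
            if choice != chosen:
                task.cancel()
        page = await self._tasks[chosen]
        self._tasks.clear()
        return page


async def demo() -> dict:
    session = {"name": "Mira", "page": 1}  # illustrative session data
    prefetcher = Prefetcher()
    prefetcher.start(session, ["A", "B"])
    # ... the reader is still on Page 1 here ...
    return await prefetcher.resolve("B")
```

Calling asyncio.run(demo()) returns the pre-rendered page for Choice B. The trade-off of this design is that one of the two generated pages is always discarded, which only pays off when generation is fast and cheap.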

(Because we needed structured output, we configured our Gemini calls to enforce a strict JSON schema covering the left-page text, right-page text, choice blurbs, and detailed image prompts.)
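To make the idea of a strict page schema concrete, here is a small sketch that validates a model response before it reaches the frontend. The field names (left_page, right_page, choices, image_prompt) are assumptions based on the description above, not the project's real schema, and the validation uses only the standard library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryPage:
    left_page: str
    right_page: str
    choices: tuple[str, str]  # blurbs for Choice A and Choice B
    image_prompt: str


def parse_story_page(raw: str) -> StoryPage:
    """Parse a JSON response and reject anything that breaks the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    for key in ("left_page", "right_page", "image_prompt"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
    choices = data.get("choices")
    if (
        not isinstance(choices, list)
        or len(choices) != 2
        or not all(isinstance(c, str) and c.strip() for c in choices)
    ):
        raise ValueError("choices must be a list of exactly two non-empty strings")
    return StoryPage(
        left_page=data["left_page"],
        right_page=data["right_page"],
        choices=(choices[0], choices[1]),
        image_prompt=data["image_prompt"],
    )
```

A response that fails validation can then be retried or replaced with a fallback page instead of breaking the book layout.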

2. Visualizing the World: Imagen 3 on Vertex AI

A great storybook demands breathtaking illustrations.

Our image-api (Art Agent) microservice connects directly to Imagen 3 (imagen-3.0-generate-002) through the vertexai.preview.vision_models SDK.

Because we explicitly decoupled the Text Generation (Gemini) from the Art Generation (Imagen), we could enforce deep visual consistency:

When generating characters at the start of the book, Imagen 3 builds personalized, vibrant portraits in styles from “Disney” to “Cyberpunk”.
On every single narrative page turn, the story-api (using Gemini) intelligently composes a dense visual prompt summarizing the character’s exact action and attire.
This prompt is handed over to the image-api. We use Imagen 3’s fast-generation options to paint full-page chapter illustrations in parallel. By passing previous reference images to Imagen 3’s edit_image functionality, characters remain visually coherent from the front cover to the finale.
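One simple way to picture the prompt hand-off is a function that always repeats the same character description and only varies the action. The sketch below is an assumption about how such a prompt could be assembled; the Character fields and the build_image_prompt helper are hypothetical, and in the article this text is composed by Gemini rather than by a template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    trait: str
    accessory: str
    art_style: str


def character_fragment(character: Character) -> str:
    """The part of the prompt that stays identical on every page."""
    return (
        f"Character: {character.name} "
        f"(trait: {character.trait}; always wears: {character.accessory})"
    )


def build_image_prompt(character: Character, action: str, mood: str) -> str:
    """Combine the fixed character description with the page-specific scene."""
    return (
        f"Style: {character.art_style} storybook illustration. "
        f"{character_fragment(character)}. "
        f"Action: {action}. Mood: {mood}."
    )


hero = Character(name="Mira", trait="Brave", accessory="Top Hat", art_style="Cyberpunk")
page_1 = build_image_prompt(hero, "stepping through a glowing door", "mysterious")
page_2 = build_image_prompt(hero, "riding a dragon over the city", "triumphant")
```

Because the character fragment is identical in page_1 and page_2, the image model receives the same attire and style cues on every page turn, which complements the reference-image approach described above.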

3. High-Performance Infrastructure: Cloud Run & Docker Compose

Deploying an application with five concurrent AI agents isn’t simple. We needed production-level orchestration natively woven into Google Cloud.

FastAPI + Docker: Every agent (story-api, image-api, character-agent) runs on FastAPI and its own dedicated Docker container.
Docker Compose (IaC): The entire orchestration network (bridging the Next.js React frontend to the deep backend APIs) is automated locally and on Virtual Machines via docker-compose.yml.
Google Cloud Run: We automated our cloud deployment with a script built around gcloud run deploy. Cloud Run handles memory allocation, ingress, and auto-scaling, so the services can scale with the number of concurrent readers and we have no servers to maintain.
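A deployment script of this kind can be as small as a loop that builds one gcloud run deploy command per service. The sketch below only constructs the commands; the image paths, region, memory size, and instance limit are placeholder assumptions, not the project's real settings.

```python
import shlex


def cloud_run_deploy_cmd(
    service: str,
    image: str,
    region: str = "us-central1",  # assumed region
    memory: str = "1Gi",          # assumed memory limit
    max_instances: int = 10,      # assumed scaling cap
    public: bool = False,
) -> list[str]:
    """Build the argument list for one `gcloud run deploy` invocation."""
    cmd = [
        "gcloud", "run", "deploy", service,
        "--image", image,
        "--region", region,
        "--memory", memory,
        "--max-instances", str(max_instances),
    ]
    # Only services that face the browser should accept anonymous traffic.
    cmd.append("--allow-unauthenticated" if public else "--no-allow-unauthenticated")
    return cmd


# Service names come from the article; the registry path is a placeholder.
SERVICES = ["story-api", "image-api", "character-agent"]
commands = [
    cloud_run_deploy_cmd(name, f"REGION-docker.pkg.dev/PROJECT/REPO/{name}:latest")
    for name in SERVICES
]

for cmd in commands:
    print(shlex.join(cmd))  # swap print for subprocess.run(cmd, check=True) to deploy
```

Keeping the command construction separate from its execution makes the script easy to dry-run before anything is deployed.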

Conclusion

Combining the sheer creative depth of Gemini 2.5 Flash, the stunning visual artistry of Imagen 3, and the robust reliability of Google Cloud Run, VertexAI Storyteller is our vision for the next generation of generative media.

We are proud to submit this platform, built entirely on Vertex AI. I created this piece of content for the purposes of entering this hackathon, and we hope you enjoy exploring the worlds generated by our agents.