A comprehensive AI-powered creative platform that combines advanced photo editing with professional video generation. Create, edit, and enhance images using Gemini 2.5 Flash's multi-image capabilities, then generate stunning videos with Veo 3 - all in one seamless interface.
Note: This application features three distinct creative modes with a modern Material 3 Expressive design system, providing professional-grade tools for content creators, designers, and storytellers.
- Multi-Image Chat Interface: Upload up to 50 reference images for context
- Iterative Editing: Conversational workflow for step-by-step image refinement
- Character Consistency: Maintain subjects and styles across generations
- Advanced Prompting: Leverage Gemini 2.5 Flash's API support for up to 3,600 images per request
- Download Management: Save all generated variations with timestamps
- Text-to-Video: Create videos from detailed text descriptions
- Image-to-Video: Transform static images into dynamic video content
- Custom Aspect Ratios: Support for 16:9, 9:16, 1:1, 4:5, and more
- Timeline Editor: Trim videos with precision using browser-based tools
- Multi-Scene Projects: Create complex video narratives with multiple scenes
- Drag & Drop Organization: Reorder scenes with intuitive interface
- Batch Generation: Process multiple scenes with progress tracking
- Scene Management: Individual prompts and settings per scene
- Node.js (version 18 or higher) and npm
- Gemini API Key (Paid tier required) from AI Studio
Warning: Paid tier required. Veo 3 video generation and Gemini 2.5 Flash image editing require the Gemini API Paid tier.
- Clone the repository
  git clone https://github.com/rdfitted/storycomposer.git
  cd storycomposer
- Install dependencies
  npm install
- Set up environment variables: create a .env file in the project root containing
  GEMINI_API_KEY="your-gemini-api-key-here"
- Start the development server
  npm run dev
- Open your browser and navigate to http://localhost:3000
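On the server side, the key from `.env` would typically be read once and checked before any model call. The following is a minimal sketch, not code from this repository: the helper name `requireApiKey`, the `Env` type, and the error message are all assumptions made for illustration.

```typescript
// Hypothetical helper: name, type, and error text are illustrative only.
type Env = Record<string, string | undefined>;

// Return the trimmed API key, or fail fast with a message that points at .env.
function requireApiKey(env: Env): string {
  const key = env["GEMINI_API_KEY"]?.trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set; add it to .env in the project root");
  }
  return key;
}
```

In a Next.js API route this could be called as `requireApiKey(process.env)`, so a missing key surfaces as a clear server error rather than a failed model request.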
- Upload Reference Images: Click the "Images" button to add up to 50 context images
- Start Chatting: Describe what you want to create or edit
- Iterate & Refine: Continue the conversation to refine your images
- Download Results: Click download buttons on generated images
- Add Image (Optional): Click "Image" to upload a reference image
- Write Prompt: Describe your video in the text area
- Configure Settings: Choose aspect ratio and model
- Generate: Click the arrow button to start video generation
- Edit & Download: Use timeline controls to trim and download
- Create Scenes: Click "Add Scene" to start building your storyboard
- Upload Images: Add reference images for each scene
- Write Prompts: Describe each scene's action
- Generate All: Click "Generate All Scenes" for batch processing
- Organize: Drag and drop to reorder your final storyboard
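Reordering after a drag and drop comes down to moving one element of the scene list to a new index. A minimal sketch, assuming a hypothetical `Scene` shape and a helper named `moveScene` (neither is taken from the project source):

```typescript
// Hypothetical scene shape; the real project may track more fields.
interface Scene {
  id: string;
  prompt: string;
}

// Return a new array with the scene at index `from` moved to index `to`.
// The input array is left untouched, which suits React state updates.
function moveScene(scenes: readonly Scene[], from: number, to: number): Scene[] {
  const inRange = (i: number) => Math.floor(i) === i && i >= 0 && i < scenes.length;
  if (!inRange(from) || !inRange(to)) {
    throw new RangeError(`index out of range: from=${from}, to=${to}`);
  }
  const next = scenes.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}
```

For example, `moveScene(scenes, 0, 2)` on a three-scene storyboard moves the first scene to the end and shifts the other two forward.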
Built with Next.js 15 and React 19, featuring:
- Server-Side API Routes for secure AI model integration
- Real-time Polling for operation status updates
- Client-Side State Management for complex UI interactions
- Material 3 Design System with custom Tailwind CSS implementation
app/api/
├── photo-editor/generate/ # Multi-image chat generation
├── veo/generate/ # Video generation initiation
├── veo/operation/ # Operation status polling
├── veo/download/ # Secure video download
└── imagen/generate/ # Single image generation
components/ui/
├── PhotoEditor.tsx # Main photo editing interface
├── PhotoEditorComposer.tsx # Multi-image upload & chat input
├── ChatMessage.tsx # Conversation message display
├── VideoPlayer.tsx # Custom video player with timeline
├── ModeSelector.tsx # Tab navigation component
└── StoryboardComposer.tsx # Multi-scene project manager
- User Input → FormData with images/prompts
- API Processing → Gemini/Veo model generation
- Status Polling → Real-time progress updates
- Content Delivery → Base64/blob URL responses
- User Download → Direct file save capabilities
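The status-polling step in this flow can be sketched as a loop that repeatedly asks for the operation state until it finishes. This is an illustration under stated assumptions, not the project's implementation: the `OperationStatus` shape, the name `pollOperation`, the 5-second interval, and the 60-attempt budget are all hypothetical.

```typescript
// Hypothetical status shape; the real response of the operation route may differ.
interface OperationStatus {
  done: boolean;
  videoUri?: string;
  error?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `check` until the operation is done, reports an error,
// or the attempt budget is exhausted.
async function pollOperation(
  check: () => Promise<OperationStatus>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<OperationStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await check();
    if (status.error) throw new Error(status.error);
    if (status.done) return status;
    // Wait between attempts, but not after the final one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`operation still pending after ${maxAttempts} attempts`);
}
```

Passing the status check in as a function keeps the loop independent of how the request is made; in the browser, `check` would wrap a request to the operation status route.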
- Next.js 15 - Full-stack React framework
- React 19 - Modern UI library with latest features
- TypeScript - Type-safe development
- Tailwind CSS - Utility-first styling
- Google Gemini API - Advanced AI model access
- Veo 3 - State-of-the-art video generation
- Gemini 2.5 Flash - Multi-modal image processing
- Lucide React - Beautiful icon library
- React Player - Video playback components
- RC Slider - Timeline controls
- React Dropzone - File upload handling
- Lazy Loading - Components and images load on demand
- Memory Management - Automatic cleanup of blob URLs and file references
- Progressive Enhancement - Works across all modern browsers
- Responsive Design - Mobile-first approach with desktop optimization
- Real-time Updates - Live status polling with optimized intervals
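Blob URL cleanup, as mentioned in the list above, usually means pairing every `URL.createObjectURL` call with a later `URL.revokeObjectURL`. A minimal sketch of one way to do that; the class name `ObjectUrlRegistry` is hypothetical and not taken from the project:

```typescript
// Tracks object URLs so they can all be revoked together,
// for example when a component unmounts.
class ObjectUrlRegistry {
  private urls: string[] = [];

  // Create and remember an object URL for a generated image or video blob.
  create(blob: Blob): string {
    const url = URL.createObjectURL(blob);
    this.urls.push(url);
    return url;
  }

  // Revoke every tracked URL so the browser can release the underlying data.
  revokeAll(): void {
    this.urls.forEach((url) => URL.revokeObjectURL(url));
    this.urls = [];
  }

  // Number of URLs currently tracked.
  get size(): number {
    return this.urls.length;
  }
}
```

In a React component, `revokeAll()` would typically run in the cleanup function returned from `useEffect`.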
# Development
npm run dev # Start development server (http://localhost:3000)
npm run build # Create production build
npm run start # Start production server
npm run lint # Run ESLint for code quality
# Environment Setup
echo 'GEMINI_API_KEY="your-api-key-here"' > .env
- Input Validation - Comprehensive server-side validation
- File Type Restrictions - Only allows safe image formats
- Size Limits - 10MB per file, 50 files maximum
- Environment Variables - Secure API key management
- Error Handling - Graceful error responses without data exposure
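The file type and size rules above could be enforced with a check along these lines. This is a sketch under assumptions: the 10MB and 50-file limits come from the list, but the allowed MIME types, the reading of 10MB as 10 × 1024 × 1024 bytes, and the names `UploadLike` and `validateUploads` are hypothetical.

```typescript
// Limits from the list above; the byte interpretation and MIME types are assumptions.
const MAX_FILE_BYTES = 10 * 1024 * 1024;
const MAX_FILES = 50;
const ALLOWED_TYPES = ["image/png", "image/jpeg", "image/webp"];

// Minimal structural type; browser File objects satisfy it.
interface UploadLike {
  name: string;
  type: string;
  size: number;
}

// Return human-readable problems; an empty list means the upload is acceptable.
function validateUploads(files: readonly UploadLike[]): string[] {
  const problems: string[] = [];
  if (files.length > MAX_FILES) {
    problems.push(`too many files: ${files.length} (maximum ${MAX_FILES})`);
  }
  files.forEach((file) => {
    if (ALLOWED_TYPES.indexOf(file.type) === -1) {
      problems.push(`${file.name}: unsupported type ${file.type || "(none)"}`);
    }
    if (file.size > MAX_FILE_BYTES) {
      problems.push(`${file.name}: larger than ${MAX_FILE_BYTES} bytes`);
    }
  });
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets an API route report every rejected file in a single response, without echoing file contents back to the client.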
This project builds upon the Google Gemini Veo 3 API Quickstart and includes substantial enhancements for photo editing capabilities.
For more information, visit fitted-automation.com.
This project is licensed under the Apache License 2.0.