Skip to content

Combining the Google image and video editors in order to create consistent characters and scenes

License

Notifications You must be signed in to change notification settings

rdfitted/storycomposer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

25 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Image & Video Editor

A comprehensive AI-powered creative platform that combines advanced photo editing with professional video generation. Create, edit, and enhance images using Gemini 2.5 Flash's multi-image capabilities, then generate stunning videos with Veo 3 - all in one seamless interface.

Note

This application features three distinct creative modes with a modern Material 3 Expressive design system, providing professional-grade tools for content creators, designers, and storytellers.

🎨 Three Creative Modes

1. Photo Editor (Primary Mode)

  • Multi-Image Chat Interface: Upload up to 50 reference images for context
  • Iterative Editing: Conversational workflow for step-by-step image refinement
  • Character Consistency: Maintain subjects and styles across generations
  • Advanced Prompting: Leverage Gemini 2.5 Flash's full 3,600 image API capability
  • Download Management: Save all generated variations with timestamps

2. Single Video Generation

  • Text-to-Video: Create videos from detailed text descriptions
  • Image-to-Video: Transform static images into dynamic video content
  • Custom Aspect Ratios: Support for 16:9, 9:16, 1:1, 4:5, and more
  • Timeline Editor: Trim videos with precision using browser-based tools

3. Storyboard Mode

  • Multi-Scene Projects: Create complex video narratives with multiple scenes
  • Drag & Drop Organization: Reorder scenes with intuitive interface
  • Batch Generation: Process multiple scenes with progress tracking
  • Scene Management: Individual prompts and settings per scene

πŸš€ Quick Start Guide

Prerequisites

  • Node.js (version 18 or higher) and npm
  • Gemini API Key (Paid tier required) from AI Studio

Warning

Paid Tier Required: Veo 3 video generation and Gemini 2.5 Flash image editing require the Gemini API Paid tier.

Installation

  1. Clone the repository

    git clone https://github.com/rdfitted/storycomposer.git
    cd storycomposer
  2. Install dependencies

    npm install
  3. Set up environment variables Create a .env file in the project root:

    GEMINI_API_KEY="your-gemini-api-key-here"
  4. Start the development server

    npm run dev
  5. Open your browser Navigate to http://localhost:3000

🎯 Getting Started with Each Mode

Photo Editor Mode

  1. Upload Reference Images: Click "Images" button to add up to 50 context images
  2. Start Chatting: Describe what you want to create or edit
  3. Iterate & Refine: Continue the conversation to refine your images
  4. Download Results: Click download buttons on generated images

Single Video Mode

  1. Add Image (Optional): Click "Image" to upload a reference image
  2. Write Prompt: Describe your video in the text area
  3. Configure Settings: Choose aspect ratio and model
  4. Generate: Click the arrow button to start video generation
  5. Edit & Download: Use timeline controls to trim and download

Storyboard Mode

  1. Create Scenes: Click "Add Scene" to start building your storyboard
  2. Upload Images: Add reference images for each scene
  3. Write Prompts: Describe each scene's action
  4. Generate All: Click "Generate All Scenes" for batch processing
  5. Organize: Drag and drop to reorder your final storyboard

πŸ› οΈ Technical Overview

Architecture

Built with Next.js 15 and React 19, featuring:

  • Server-Side API Routes for secure AI model integration
  • Real-time Polling for operation status updates
  • Client-Side State Management for complex UI interactions
  • Material 3 Design System with custom Tailwind CSS implementation

API Routes

app/api/
β”œβ”€β”€ photo-editor/generate/     # Multi-image chat generation
β”œβ”€β”€ veo/generate/             # Video generation initiation  
β”œβ”€β”€ veo/operation/            # Operation status polling
β”œβ”€β”€ veo/download/             # Secure video download
└── imagen/generate/          # Single image generation

Key Components

components/ui/
β”œβ”€β”€ PhotoEditor.tsx           # Main photo editing interface
β”œβ”€β”€ PhotoEditorComposer.tsx   # Multi-image upload & chat input
β”œβ”€β”€ ChatMessage.tsx           # Conversation message display
β”œβ”€β”€ VideoPlayer.tsx           # Custom video player with timeline
β”œβ”€β”€ ModeSelector.tsx          # Tab navigation component
└── StoryboardComposer.tsx    # Multi-scene project manager

Data Flow

  1. User Input β†’ FormData with images/prompts
  2. API Processing β†’ Gemini/Veo model generation
  3. Status Polling β†’ Real-time progress updates
  4. Content Delivery β†’ Base64/blob URL responses
  5. User Download β†’ Direct file save capabilities

πŸ”§ Technologies & Dependencies

Core Stack

AI Integration

  • Google Gemini API - Advanced AI model access
  • Veo 3 - State-of-the-art video generation
  • Gemini 2.5 Flash - Multi-modal image processing

UI Components & Interactions

⚑ Performance Features

  • Lazy Loading - Components and images load on demand
  • Memory Management - Automatic cleanup of blob URLs and file references
  • Progressive Enhancement - Works across all modern browsers
  • Responsive Design - Mobile-first approach with desktop optimization
  • Real-time Updates - Live status polling with optimized intervals

πŸš€ Development Commands

# Development
npm run dev      # Start development server (http://localhost:3000)
npm run build    # Create production build
npm run start    # Start production server
npm run lint     # Run ESLint for code quality

# Environment Setup
echo 'GEMINI_API_KEY="your-api-key-here"' > .env

πŸ›‘οΈ Security & Best Practices

  • Input Validation - Comprehensive server-side validation
  • File Type Restrictions - Only allows safe image formats
  • Size Limits - 10MB per file, 50 files maximum
  • Environment Variables - Secure API key management
  • Error Handling - Graceful error responses without data exposure

πŸ“ Attribution

This project builds upon the Google Gemini Veo 3 API Quickstart and includes substantial enhancements for photo editing capabilities.

For more information, visit fitted-automation.com.

πŸ“„ License

This project is licensed under the Apache License 2.0.

About

Combining the Google image and video editors in order to create consistent characters and scenes

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •