
ProductWebGen: Benchmarking Multimodal Webpage Generation


📖 Abstract

Crafting a product display webpage from a source product image, together with layout and visual content instructions, holds significant practical value for domains such as marketing, advertising, and e-commerce. This task demands strict visual consistency across product displays and high-fidelity instruction following to jointly generate renderable HTML code. These requirements on controllability and instruction following align closely with the core capabilities of advanced multimodal generative models, such as image editing models and unified models (UMs). To this end, this paper introduces ProductWebGen to systematically benchmark the product webpage generation capacities of these models. ProductWebGen comprises 500 test samples covering 13 product categories; each sample consists of a source image, a visual content instruction, and a webpage instruction. The task is to generate a product showcase webpage containing multiple images consistent with the source image and instructions. Given the mixed-modality input-output nature of the task, we design and systematically compare two evaluation workflows: one uses large language models (LLMs) and image editing models to separately generate HTML code and images (editing-based), while the other relies on a single UM to generate both, with image generation conditioned on the preceding multimodal context (UM-based). Empirical results show that editing-based approaches achieve leading results in webpage instruction following and content appeal, while UM-based approaches may show more advantages in fulfilling visual content instructions. We also construct a supervised fine-tuning (SFT) dataset, ProductWebGen-1k, with 1,000 groups of real product images and LLM-generated HTML code, and verify its effectiveness on the open-source UM BAGEL.

💡 Core Methodology

We evaluate two primary workflows for multimodal webpage generation: the Editing-based approach and the UM-based approach. The key difference lies in how the multiple images are generated.

  • Editing-based: An LLM first generates the complete HTML code along with textual descriptions for the images (often in alt tags). These descriptions are then fed, along with the source image, into a specialized image editing model to produce the final images.
  • UM-based: A Unified Model (UM) generates the images by conditioning on a multimodal context, which can include the source image, previously generated images, textual descriptions and user instructions. This allows the model to maintain better consistency across images.
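In the editing-based workflow, the image descriptions that drive the editing model live inside the LLM-generated HTML, often as alt attributes. As an illustration only (the repository's actual parsing logic may differ), a minimal sketch of collecting those descriptions with Python's standard html.parser:

```python
from html.parser import HTMLParser

class AltTextCollector(HTMLParser):
    """Collect the alt text of every <img> tag in an HTML document."""
    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt", "")
            if alt:
                self.descriptions.append(alt)

def extract_image_descriptions(html: str) -> list[str]:
    """Return the alt-text descriptions of all images, in document order."""
    parser = AltTextCollector()
    parser.feed(html)
    return parser.descriptions

# Hypothetical example of LLM-generated markup with editing prompts in alt text:
html = '''
<div class="gallery">
  <img src="1_edit_1.jpg" alt="the product on a marble kitchen counter">
  <img src="1_edit_2.jpg" alt="close-up of the product label">
</div>
'''
print(extract_image_descriptions(html))
```

Each extracted description, paired with the source image, then forms one editing request to the image editing model.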

(Figure: overview of the ProductWebGen approaches)

🛠️ Environment Setup

  1. Set up environment
git clone https://github.com/SJTU-DENG-Lab/ProductWebGen.git
cd ProductWebGen
conda create -n ProductWebGen python=3.10 -y
conda activate ProductWebGen
pip install -r requirements.txt
pip install ninja
pip install flash-attn==2.8.3 --no-build-isolation

The ProductWebGen environment is used for evaluation and inference with editing-based methods.

  2. Model-Specific Setup
    • API-based Models (Gemini, etc.): All API-based models are accessed via the OpenRouter API. Please obtain your API key from OpenRouter and pass it via the --api_key argument in the run commands.
    • Open-Source UMs (BAGEL, Ovis-U1, OmniGen2): For inference with BAGEL, Ovis-U1, and OmniGen2, please refer to their respective official projects for detailed instructions on setting up the environment and downloading pre-trained model weights.

🚀 Running the Benchmark

The following commands detail how to run inference for all baseline models and how to evaluate the results.

Benchmark download: ProductWebGen on Hugging Face

Fine-tuning dataset download: ProductWebGen-1K on Hugging Face

Note: Please replace ProductWebGen_benchmark.json with the correct benchmark JSON file path, "your_model_path" with the actual path to your downloaded model weights, and "xxxx" with your OpenRouter API key.
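The inference scripts process a slice of the 500-sample benchmark selected by --start and --end. Assuming these flags denote a half-open [start, end) slice (as --start 0 --end 1 processing one sample suggests), a small helper like the following can compute chunk boundaries for splitting a run across several processes:

```python
def split_range(total: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total) into (start, end) slices for --start/--end flags."""
    return [(s, min(s + chunk_size, total)) for s in range(0, total, chunk_size)]

# 500 benchmark samples in chunks of 200:
print(split_range(500, 200))  # [(0, 200), (200, 400), (400, 500)]
```

Each (start, end) pair can then be passed to a separate invocation of the same inference script.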

1. Inference

Editing-based Approaches

Step 1: Generate HTML (using an LLM)

python inference/editing-based/generate_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_name "x-ai/grok-4" --start 0 --end 1 --api_key "xxxx" --output_path "editing-result"

Step 2: Generate Images (using an Image Editor)

python inference/editing-based/edit_source_image_qwen.py --benchmark_path "ProductWebGen_benchmark.json" --model_path your_model_path --start 0 --end 1 --output_path "editing-result"
python inference/editing-based/edit_source_image_flux.py --benchmark_path "ProductWebGen_benchmark.json" --model_path your_model_path --start 0 --end 1 --output_path "editing-result"

UM-based Approaches

# Gemini-2.5-Flash-Image
python inference/um-based/Gemini-2.5-flash-image/nano_banana_generate_html.py --benchmark_path "ProductWebGen_benchmark.json" --start 0 --end 1 --api_key "xxxx" --output_path "um-result"
python inference/um-based/Gemini-2.5-flash-image/nano_banana_generate_image_without_html.py --benchmark_path "ProductWebGen_benchmark.json" --start 0 --end 1 --api_key "xxxx" --output_path "um-result"
python inference/um-based/Gemini-2.5-flash-image/nano_banana_generate_image_with_html.py --benchmark_path "ProductWebGen_benchmark.json" --start 0 --end 1 --api_key "xxxx" --output_path "um-result"

# BAGEL
python inference/um-based/Bagel/bagel_generate_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path "inference/um-based/Bagel/models/BAGEL-7B-MoT" --start 0 --end 1 --output_path "bagel-result"
python inference/um-based/Bagel/bagel_generate_image_without_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path "inference/um-based/Bagel/models/BAGEL-7B-MoT" --start 0 --end 1 --output_path "bagel-result"
python inference/um-based/Bagel/bagel_generate_image_with_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path "inference/um-based/Bagel/models/BAGEL-7B-MoT" --start 0 --end 1 --output_path "bagel-result"

# Ovis-U1
python inference/um-based/Ovis-U1/ovis_generate_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path "inference/um-based/Ovis-U1/Ovis-U1-3B" --start 0 --end 1 --output_path "ovis-result"
python inference/um-based/Ovis-U1/ovis_generate_image_without_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path "inference/um-based/Ovis-U1/Ovis-U1-3B" --start 0 --end 1 --output_path "ovis-result"
python inference/um-based/Ovis-U1/ovis_generate_image_with_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path "inference/um-based/Ovis-U1/Ovis-U1-3B" --start 0 --end 1 --output_path "ovis-result"

# OmniGen2
python inference/um-based/OmniGen2/omnigen2_generate_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path your_model_path --start 0 --end 1 --output_path "omnigen2-result"
python inference/um-based/OmniGen2/omnigen2_generate_image_without_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path your_model_path --start 0 --end 1 --output_path "omnigen2-result"
python inference/um-based/OmniGen2/omnigen2_generate_image_with_html.py --benchmark_path "ProductWebGen_benchmark.json" --model_path your_model_path --start 0 --end 1 --output_path "omnigen2-result"

2. Evaluation

After running inference, your output directory should be structured as follows:

example-result/
└── 1
    ├── 1.html
    ├── 1_edit_1_qwen.jpg
    ├── 1_edit_2_qwen.jpg
    ├── 1_edit_3_qwen.jpg
    └── 1_edit_4_qwen.jpg
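Before running evaluation, it may help to confirm each sample directory is complete. The following is a hypothetical checker (not part of the repository) assuming the layout shown above: one HTML file plus four edited images per sample, with the editor suffix (`qwen` here) in each filename:

```python
import os

def missing_files(result_dir: str, sample_id: int,
                  editor: str = "qwen", n_images: int = 4) -> list[str]:
    """Return expected-but-absent files for one sample directory."""
    sample_dir = os.path.join(result_dir, str(sample_id))
    expected = [f"{sample_id}.html"] + [
        f"{sample_id}_edit_{i}_{editor}.jpg" for i in range(1, n_images + 1)
    ]
    return [f for f in expected
            if not os.path.exists(os.path.join(sample_dir, f))]

# Report any incomplete sample before invoking the evaluation scripts:
print(missing_files("example-result", 1))
```

An empty list means the sample is ready for screenshotting and metric computation.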

First render the generated HTML files and capture screenshots for visual evaluation, then compute the metrics:

python evaluate/screenshot.py --benchmark_path "ProductWebGen_benchmark.json" --inference_result_path "example-result" --start 0 --end 1
python evaluate/metric.py --benchmark_path "ProductWebGen_benchmark.json" --inference_result_path "example-result" --start 0 --end 1 --api_key "xxxx" --output_path "evaluate-result"

🙏 Acknowledgements

We sincerely thank the developers of the open-source models BAGEL, Ovis-U1, OmniGen2, Qwen-Image-Edit, and FLUX.1 Kontext; our work builds heavily upon these resources.

📜 Citation
