## 🚀 Quick Start

### 🪟 Windows All-in-One Package (Recommended for Windows Users)

**No need to install Python, uv, or ffmpeg. It is ready to use out of the box!**

👉 **[Download Windows All-in-One Package](https://github.com/AIDC-AI/Pixelle-Video/releases/latest)**

1. Download the latest Windows All-in-One Package and extract it
2. Double-click `start.bat` to launch the Web interface
3. The browser automatically opens http://localhost:8501
4. Configure the LLM API and image generation service in "⚙️ System Configuration"
5. Start generating videos!

> 💡 **Tip**: The package includes all dependencies, so you do not need to set up any environment manually. On first use, you only need to configure API keys.

### Install from Source (For macOS / Linux Users or Users Who Need Customization)

#### Prerequisites

Before starting, install the Python package manager `uv` and the video processing tool `ffmpeg`:

##### Install uv

See the official uv documentation for the installation method for your system:

👉 **[uv Installation Guide](https://docs.astral.sh/uv/getting-started/installation/)**

After installation, run `uv --version` in the terminal to verify the installation.

##### Install ffmpeg

**macOS**

```bash
brew install ffmpeg
```

**Ubuntu / Debian**

```bash
sudo apt update
sudo apt install ffmpeg
```

**Windows**

- Download URL: https://ffmpeg.org/download.html
- After downloading, extract the archive and add its `bin` directory to the system `PATH` environment variable

After installation, run `ffmpeg -version` in the terminal to verify the installation.

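Both verification commands above can be combined into one check. The following is a minimal sketch, assuming only that `uv` and `ffmpeg` are expected on `PATH`; `missing_tools` is a hypothetical helper written for illustration and is not part of Pixelle-Video.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    tools: Iterable[str] = ("uv", "ffmpeg"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH.

    `which` defaults to shutil.which and can be swapped out for testing.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("uv and ffmpeg are both available")
```

If a tool is reported as missing right after installation, opening a new terminal usually picks up the updated `PATH`.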
#### Step 1: Clone Project

```bash
git clone https://github.com/AIDC-AI/Pixelle-Video.git
cd Pixelle-Video
```

#### Step 2: Launch Web Interface

```bash
# Run with uv (recommended, will automatically install dependencies)
uv run streamlit run web/app.py
```

Browser will automatically open http://localhost:8501

#### Step 3: Configure in Web Interface

On first use, expand the "⚙️ System Configuration" panel and fill in:

- **LLM Configuration**: Select AI model (such as Qwen, GPT, etc.) and enter API Key
- **ComfyUI / RunningHub Configuration**: Configure local ComfyUI or RunningHub API Key if you want to use workflow-based image, video, or voice generation
- **API Media Model Configuration**: Configure API Key, Base URL, and proxy options for direct image/video model providers such as DashScope, OpenAI, ARK, and Kling

After configuration, click "Save Configuration", and you can start generating videos!
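
If the browser does not open on its own after Step 2, it can help to confirm that something is listening on the port. The sketch below assumes the default address from Step 2 (`localhost:8501`); `is_listening` is a hypothetical helper for illustration, not part of Pixelle-Video.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8501, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if is_listening():
        print("Web interface is reachable at http://localhost:8501")
    else:
        print("Nothing is listening on port 8501; check the terminal running streamlit")
```

A successful connection only shows that the port is open, not that the page has finished loading.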

## 💻 Usage

After opening the Web interface, you will see a three-column layout. The sections below explain each part in detail.

### ⚙️ System Configuration (Required on First Use)

Before generating your first video, click to expand the "⚙️ System Configuration" panel:

#### 1. LLM Configuration (Large Language Model)

Used for generating video scripts.

**Quick Select Preset**

- Select a preset model from the dropdown menu (Qwen, GPT-4o, DeepSeek, etc.)
- After selection, `base_url` and `model` are filled in automatically
- Click the "🔑 Get API Key" link to register and obtain a key

**Manual Configuration**

- API Key: Enter your key
- Base URL: API address
- Model: Model name

#### 2. ComfyUI / RunningHub Configuration

Used for generating the video's images, video clips, or voices through ComfyUI workflows.

**Local Deployment (Recommended)**

- ComfyUI URL: Local ComfyUI service address (default http://127.0.0.1:8188)
- Click "Test Connection" to confirm the service is available

**Cloud Deployment**

- RunningHub API Key: Cloud image generation service key

#### 3. API Media Model Configuration

Used to call image, video, or asset-analysis model providers directly, without relying on ComfyUI/RunningHub.

**Supported Providers**

- OpenAI / GPT Image: for GPT image generation models
- DashScope / Wan / HappyHorse: for Alibaba Tongyi Wan image and video generation
- Volcengine ARK / Seedream / Seedance: for Seedream image generation and Seedance video generation
- Kling AI: for Kling video generation

**Configurable Items**

- API Key / Access Key / Secret Key: provider credentials
- Base URL: model service endpoint, with official defaults prefilled in the WebUI
- Local proxy: for example `http://127.0.0.1:9090`
- Use proxy: each provider can independently choose whether to route requests through the local proxy
- Print model request parameters: debug option that prints prompts, model names, and input file paths to the terminal

> 💡 If you only use ComfyUI or RunningHub, you can leave API Media Model Configuration empty. If you choose an `api/...` workflow, configure the corresponding provider credentials first.

After configuration, click "Save Configuration".

### 📝 Content Input (Left Column)

#### Generation Mode

- **AI Generated Content**: Enter a topic and the AI writes the script automatically
  - Suitable for: generating a video quickly and letting the AI write the script
  - Example: "Why develop a reading habit"
- **Fixed Script Content**: Enter the complete script directly and skip AI writing
  - Suitable for: generating a video from a script you already have

#### Background Music (BGM)

- **No BGM**: Pure voice narration
- **Built-in Music**: Select preset background music (such as `default.mp3`)
- **Custom Music**: Put your music files (MP3/WAV, etc.) in the `bgm/` folder
- Click "Preview BGM" to preview the music

### 🎤 Voice Settings (Middle Column)

#### TTS Workflow

- Select a TTS workflow from the dropdown menu (supports Edge-TTS, Index-TTS, etc.)
- The system automatically scans for TTS workflows in the `workflows/` folder
- If you know ComfyUI, you can customize TTS workflows

#### Reference Audio (Optional)

- Upload a reference audio file for voice cloning (supports MP3/WAV/FLAC and other formats)
- Suitable for TTS workflows that support voice cloning (such as Index-TTS)
- You can listen to the file directly after upload

#### Preview Function

- Enter test text and click "Preview Voice" to listen to the effect
- Supports using reference audio for the preview

### 🎨 Visual Settings (Middle Column)

#### Image Generation

Determines the style of the images the AI generates.

**ComfyUI Workflow**

- Select an image generation workflow from the dropdown menu
- Supports local deployment (selfhost) and cloud (RunningHub) workflows
- Also supports `api/...` direct image model workflows after configuring the corresponding provider credentials
- Uses `image_flux.json` by default
- If you know ComfyUI, you can put your own workflows in the `workflows/` folder

**Image Dimensions**

- Set the width and height of generated images (unit: pixels)
- Default is 1024x1024; adjust as needed
- Note: Different models have different dimension limitations

**Prompt Prefix**

- Controls the overall image style (the prefix must be written in English)
- Example: Minimalist black-and-white matchstick figure style illustration, clean lines, simple sketch style
- Click "Preview Style" to test the effect

#### Video Template

Determines video layout and design.

**Template Naming Convention**

- `static_*.html`: Static templates (no AI-generated media, text-only styles)
- `image_*.html`: Image templates (use AI-generated images as background)
- `video_*.html`: Video templates (use AI-generated videos as background)

**Usage**

- Select a template from the dropdown menu; templates are grouped by dimension (portrait/landscape/square)
- Click "Preview Template" to test the effect with custom parameters
- If you know HTML, you can create your own templates in the `templates/` folder
- 🔗 [View All Template Previews](https://aidc-ai.github.io/Pixelle-Video/user-guide/templates/#built-in-template-preview)

#### API Video Generation

When using dynamic video templates or extension workflows, you can generate clips through direct API video models.

- Supports DashScope Wan / HappyHorse, Kling, Seedance, and other video models
- Displays model-aware options such as resolution, aspect ratio, duration, watermark, and native audio
- Supports network/download retries, plus a retry with LLM-based prompt neutralization when a request fails content inspection
- In the Custom Media workflow, API video segments try to follow the narration audio duration and use neighboring segment information to improve continuity

### 🎬 Generate Video (Right Column)

#### Generate Button

- After configuring all parameters, click "🎬 Generate Video"
- Shows real-time progress (generating script → generating images → synthesizing voice → composing video)
- Automatically shows a video preview after completion

#### Progress Display

- Shows the current step in real time
- Example: "Frame 3/5 - Generating Image"

#### Video Preview

- Automatically plays after generation
- Shows video duration, file size, number of frames, etc.
- Video files are saved in the `output/` folder

### ❓ FAQ

**Q: How long does generation take the first time?**

A: Generation time depends on the number of video frames, network conditions, and AI inference speed. It typically completes within a few minutes.

**Q: What if I'm not satisfied with the video?**

A: You can try:

1. Change the LLM model (different models have different script styles)
2. Adjust the image dimensions and prompt prefix (changes the image style)
3. Change the TTS workflow or upload reference audio (changes the voice effect)
4. Try different video templates and dimensions

**Q: What about the cost?**

A: **This project fully supports free operation!**

- **Completely Free Solution**: LLM via Ollama (local) + local ComfyUI deployment = zero cost
- **Recommended Solution**: LLM via Qwen (extremely low cost, highly cost-effective) + local ComfyUI deployment
- **Cloud Solution**: LLM via OpenAI + images via RunningHub (higher cost, but no local environment needed)

**Selection Suggestion**: If you have a local GPU, the completely free solution is recommended; otherwise, Qwen is recommended for its cost-effectiveness.

## 🤝 Referenced Projects

The design of Pixelle-Video is inspired by the following excellent open-source projects:

- [Pixelle-MCP](https://github.com/AIDC-AI/Pixelle-MCP) - ComfyUI MCP server that allows AI assistants to call ComfyUI directly
- [MoneyPrinterTurbo](https://github.com/harry0703/MoneyPrinterTurbo) - Excellent video generation tool
- [NarratoAI](https://github.com/linyqh/NarratoAI) - Film commentary automation tool
- [MoneyPrinterPlus](https://github.com/ddean2009/MoneyPrinterPlus) - Video creation platform
- [ComfyKit](https://github.com/puke3615/ComfyKit) - ComfyUI workflow wrapper library

Thanks to these projects for their open-source spirit! 🙏

## 💬 Community

Scan the QR codes below to join our communities for the latest updates and technical support:

| Discord Community | WeChat Group |
| ---- | ---- |
| *(QR code image)* | *(QR code image)* |

## 📢 Feedback and Support

- 🐛 **Encountered Issues**: Submit an [Issue](https://github.com/AIDC-AI/Pixelle-Video/issues)
- 💡 **Feature Suggestions**: Submit a [Feature Request](https://github.com/AIDC-AI/Pixelle-Video/issues)
- ⭐ **Give a Star**: If this project helps you, feel free to give it a Star for support!

## 📝 License

This project is released under the Apache License 2.0. For details, please see the [LICENSE](LICENSE) file.

## 📚 Research Series

| Framework | Paper |
|:---:|---|
| FilmAgent framework | **[SIGGRAPH Asia 2024] FilmAgent: Automating Virtual Film Production Through a Multi-Agent Collaborative Framework**<br>*Zhenran Xu, Longyue Wang, Jifang Wang, Zhouyi Li, Senbao Shi, Xue Yang, Yiyu Wang, Baotian Hu, Jun Yu, Min Zhang*<br>[[Paper](https://arxiv.org/pdf/2501.12909)] [[GitHub](https://github.com/HITsz-TMG/VideoClaw/blob/main/FilmAgent)] |
| Anim-Director result | **[ACL 2025] ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development**<br>*Zhenran Xu, Xue Yang, Yiyu Wang, Qingli Hu, Zijiao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang*<br>[[Paper](https://aclanthology.org/2025.acl-demo.61/)] [[GitHub](https://github.com/AIDC-AI/ComfyUI-Copilot)] |
| AniMaker pipeline | **[SIGGRAPH Asia 2025] AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation**<br>*Haoyuan Shi, Yunxin Li, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang*<br>[[Paper](https://doi.org/10.1145/3757377.3764009)] [[GitHub](https://github.com/HITsz-TMG/Anim-Director/tree/main/AniMaker)] |

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=AIDC-AI/Pixelle-Video&type=Date)](https://star-history.com/#AIDC-AI/Pixelle-Video&Date)

Demo video titles (media not included):

- 🎬 Pixelle-Video: AI Fully Automated Short Video Engine
- 👤 AI Digital Avatar
- 🖼️ Image-to-Video
- 💃 Motion Transfer
- 🌄 Documentary & Lifestyle – Default Template
- 🔍 Cultural Deconstruction – Default Template
- 🔭 Scientific Inquiry – Default Template
- 🌱 Personal Growth – Cloned Voice
- 🧠 Deep Thinking – Default Template
- 🏯 History & Culture – Static Frame
- ☀️ Emotional Storytelling – Cloned Voice
- 📜 Novel Adaptation – Custom Script
- 🧬 Knowledge Explainer – Qwen Image Generation
- 💰 Side Hustle Money Making – Movie Template
- 🏛️ Historical Commentary – Custom Template