A.I. – Page 7 – pIXELsHAM

A.I.

Netflix Eyeline-Research Go-with-the-Flow – An easy and efficient way to control the motion patterns of video diffusion models

pIXELsHAM.com

Feb 1, 2025

https://github.com/Eyeline-Research/Go-with-the-Flow

https://huggingface.co/Eyeline-Research/Go-with-the-Flow/tree/main

https://eyeline-research.github.io/Go-with-the-Flow

https://github.com/Pablerdo/hexaframe-dark

A.I.

Hashem Al-Ghaili – Historical Icons Brought Back to Life using AI

pIXELsHAM.com

Feb 1, 2025

A.I.

Heather Cooper – 9 Video Models Comparison: Text to video

pIXELsHAM.com

Feb 1, 2025

https://www.linkedin.com/posts/heatherbcooper_video-model-comparison-text-to-video-activity-7290822319407550464-QzUY

🔹 Google DeepMind Veo 2
🔹 OpenAI Sora
🔹 Hunyuan Video
🔹 Pika 2.1
🔹 Alibaba Cloud Wanx 2.1
🔹 Runway Gen-3
🔹 Kling AI 1.6
🔹 Luma AI Ray2
🔹 Hailuo T2V-01

Uncompressed video under the post

A.I.

DimensionX – Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

pIXELsHAM.com

Jan 28, 2025

https://chenshuo20.github.io/DimensionX

https://github.com/wenqsun/DimensionX

https://huggingface.co/spaces/fffiloni/DimensionX

https://huggingface.co/wenqsun/DimensionX/tree/main

A.I., ves

Brian Gallagher – Why Almost Everybody Is Wrong About DeepSeek vs. All the Other AI Companies

pIXELsHAM.com

Jan 28, 2025

https://lemalogic.com/post/why-almost-everybody-is-wrong-about-deepseek-vs-all-the-other-ai-companies

Benchmarks don’t capture real-world complexity such as latency, domain-specific tasks, or edge cases. Enterprises often need more than raw performance: they also need reliability, ease of integration, and robust vendor support. Enterprise money will support the industries providing these services.

… it is also reasonable to assume that anything you put into the app or their website will go to the Chinese government, so factor that in as well.

A.I.

ComfyUI-CogVideoXWrapper – Control motion paths in ComfyUI

pIXELsHAM.com

Jan 27, 2025

https://github.com/kijai/ComfyUI-CogVideoXWrapper

A.I.

One-Prompt-One-Story – Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

pIXELsHAM.com

Jan 27, 2025

https://byliutao.github.io/1Prompt1Story.github.io

Text-to-image generation models can create high-quality images from input prompts. However, they struggle to support the consistent generation of identity-preserving requirements for storytelling.

Our approach 1Prompt1Story concatenates all prompts into a single input for T2I diffusion models, initially preserving character identities.
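
As a rough illustration of the concatenation step only (this is not the project's code), the sketch below joins an identity description and several per-frame descriptions into one prompt string. The function name, the separator, and the example text are hypothetical.

```python
def build_single_prompt(identity: str, frame_prompts: list[str]) -> str:
    """Join an identity description and per-frame descriptions into one prompt.

    Hypothetical helper: the separator and ordering are assumptions made
    for illustration, not taken from the 1Prompt1Story implementation.
    """
    parts = [identity.strip()] + [p.strip() for p in frame_prompts]
    # Drop empty pieces so stray separators do not end up in the prompt.
    return ", ".join(p for p in parts if p)


prompt = build_single_prompt(
    "a watercolor of a red fox",
    ["sleeping under a tree", "crossing a river", "watching the sunset"],
)
print(prompt)
```

The single string would then be passed to a T2I diffusion model as one input, which is the part the excerpt credits with preserving character identity.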

A.I.

What did DeepSeek figure out about reasoning with DeepSeek-R1?

pIXELsHAM.com

Jan 27, 2025

https://www.seangoedecke.com/deepseek-r1

The Chinese AI lab DeepSeek recently released their new reasoning model R1, which supposedly (a) is better than the current best reasoning models (OpenAI’s o1 series), and (b) was trained on a GPU cluster a fraction of the size of those at any of the big Western AI labs.

DeepSeek uses a reinforcement learning approach, not a fine-tuning approach. There’s no need to generate a huge body of chain-of-thought data ahead of time, and there’s no need to run an expensive answer-checking model. Instead, the model generates its own chains-of-thought as it goes.

https://medium.com/@ShankarsPayana/how-deepseek-r1-using-fp8-instead-of-fp32-beat-openai-meta-gemini-and-claude-c105d94d0c39

The secret behind their success? A bold move to train their models using FP8 (8-bit floating-point precision) instead of the standard FP32 (32-bit floating-point precision).
…
By using a clever system that applies high precision only when absolutely necessary, they achieved incredible efficiency without losing accuracy.
…
The impressive part? These multi-token predictions are about 85–90% accurate, meaning DeepSeek R1 can deliver high-quality answers at double the speed of its competitors.
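
The arithmetic behind the quoted claims can be sketched as follows. FP8 stores one byte per value and FP32 stores four, so weight storage shrinks by a factor of four. The parameter count used here is a placeholder, and the speed-up model (one extra predicted token per step, kept with the quoted accuracy) is a simplifying assumption, not DeepSeek's published method.

```python
BYTES_PER_VALUE = {"fp32": 4, "fp8": 1}  # 32 bits vs. 8 bits per stored value


def weight_memory_gib(num_params: int, fmt: str) -> float:
    """Memory needed to store `num_params` weights in the given format, in GiB."""
    return num_params * BYTES_PER_VALUE[fmt] / 2**30


def expected_tokens_per_step(acceptance_rate: float) -> float:
    """Assumed model: every step yields 1 token, plus 1 extra predicted
    token that is kept with probability `acceptance_rate`."""
    return 1.0 + acceptance_rate


params = 2**30  # placeholder parameter count, not a DeepSeek figure
print(weight_memory_gib(params, "fp32"), weight_memory_gib(params, "fp8"))
for rate in (0.85, 0.90):
    print(rate, expected_tokens_per_step(rate))
```

Under that assumption, an 85–90% acceptance rate gives roughly 1.85 to 1.9 tokens per step, which is where a figure close to "double the speed" could come from.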

https://www.tweaktown.com/news/102798/chinese-ai-firm-deepseek-has-50-000-nvidia-h100-gpus-says-ceo-even-with-us-restrictions/index.html

Chinese AI firm DeepSeek has 50,000 NVIDIA H100 AI GPUs

A.I., software

Raphael AI – World’s First Unlimited Free AI Image Generator powered by FLUX.1-Dev model

pIXELsHAM.com

Jan 26, 2025

https://raphael.app

A.I.

Texture Copilot – AI Copilot for 3D Texturing

pIXELsHAM.com

Jan 26, 2025

https://ncsoft.github.io/ncresearch/3f0ba4889e331ddbed68c9dd48d845fa18d874de

A.I., modeling

CaPa – Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation

pIXELsHAM.com

Jan 26, 2025

https://ncsoft.github.io/CaPa

https://github.com/ncsoft/CaPa

A novel method for generating hyper-quality 4K textured meshes in under 30 seconds, providing 3D assets ready for commercial applications such as games, movies, and VR/AR.

A.I.

LumaLabs Ray2 – A large-scale video generative model

pIXELsHAM.com

Jan 26, 2025

https://lumalabs.ai/ray

A.I.

Spell.Spline – 2D-to-3D generate entire 3D scenes or “Worlds” from an image

pIXELsHAM.com

Jan 26, 2025

https://blog.spline.design/introducing-spell

https://spell.spline.design/explore/featured

A.I.

The Best AI Animation Tool in 2025? (Prompt Battle)

pIXELsHAM.com

Jan 26, 2025

A.I., software

Fal Video Studio – The first open-source AI toolkit for video editing

pIXELsHAM.com

Jan 25, 2025

https://github.com/fal-ai-community/video-starter-kit

https://fal-video-studio.vercel.app

🎬 Browser-Native Video Processing: Seamless video handling and composition in the browser
🤖 AI Model Integration: Direct access to state-of-the-art video models through fal.ai
- Minimax for video generation
- Hunyuan for visual synthesis
- LTX for video manipulation
🎵 Advanced Media Capabilities:
- Multi-clip video composition
- Audio track integration
- Voiceover support
- Extended video duration handling
🛠️ Developer Utilities:
- Metadata encoding
- Video processing pipeline
- Ready-to-use UI components
- TypeScript support

A.I.

Tencent Hunyuan3D – an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets

pIXELsHAM.com

Jan 25, 2025

https://github.com/tencent/Hunyuan3D-2

Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model – Hunyuan3D-DiT, and a large-scale texture synthesis model – Hunyuan3D-Paint.

The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio – a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets.

It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, both open-source and closed-source, in geometry details, condition alignment, texture quality, etc.

A.I., software

Invoke.com – The Gen AI Platform for Pro Studios

pIXELsHAM.com

Jan 25, 2025

https://www.invoke.com

Invoke is a powerful, secure, and easy-to-deploy generative AI platform for professional studios to create visual media. Train models on your intellectual property, control every aspect of the production process, and maintain complete ownership of your data, in perpetuity.

A.I., Featured

How does Stable Diffusion work?

pIXELsHAM.com

Jan 24, 2025

https://stable-diffusion-art.com/how-stable-diffusion-work/

Stable Diffusion is a latent diffusion model that generates AI images from text. Instead of operating in the high-dimensional image space, it first compresses the image into the latent space.

Stable Diffusion belongs to a class of deep learning models called diffusion models. They are generative models, meaning they are designed to generate new data similar to what they have seen in training. In the case of Stable Diffusion, the data are images.

Why is it called the diffusion model? Because its math looks very much like diffusion in physics. Let’s go through the idea.
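
A minimal sketch of the two ideas above, under stated assumptions: the shapes are the ones commonly cited for Stable Diffusion v1 at 512×512 (they are not given in the excerpt), and the noising formula is the standard forward-diffusion step from the DDPM literature rather than anything specific to the linked article.

```python
import math
import random


def num_values(shape: tuple[int, ...]) -> int:
    """Number of scalar values in a tensor of the given shape."""
    return math.prod(shape)


# Assumed shapes: RGB image in pixel space vs. its compressed latent.
image_shape = (3, 512, 512)
latent_shape = (4, 64, 64)
compression = num_values(image_shape) // num_values(latent_shape)
print(f"latent holds {compression}x fewer values than the image")


def add_noise(x0: list[float], alpha_bar: float, rng: random.Random) -> list[float]:
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise.

    alpha_bar close to 1 keeps the signal; close to 0 leaves almost pure noise.
    """
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
latent = [0.5, -1.0, 2.0]  # toy stand-in for a flattened latent
print(add_noise(latent, alpha_bar=0.9, rng=rng))
```

Working on the smaller latent is what makes each denoising step cheaper than it would be in pixel space.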

A.I.

Meta DINOv2 – A Self-supervised Vision Transformer Model

pIXELsHAM.com

Jan 24, 2025

https://ai.meta.com/blog/dino-v2-computer-vision-self-supervised-learning

https://dinov2.metademolab.com

A.I.

Hunyuan video-to-video re-styling

pIXELsHAM.com

Jan 20, 2025

The open-source community has figured out how to run Hunyuan V2V using LoRAs.

You’ll need to install ComfyUI-HunyuanLoom (the logtd repository linked below) and LoRAs, which you can either train yourself or find on Civitai.

https://www.linkedin.com/posts/leokadieff_ai-generativeai-filmmaking-activity-7286521455448608769-abMg

1) You’ll need HunyuanLoom; after installing it, the workflow can be found in the repo.
https://github.com/logtd/ComfyUI-HunyuanLoom

2) The John Wick LoRA can be found here.
https://civitai.com/models/1131159/john-wick-hunyuan-video-lora

A.I., software

KlingAI – Kolors and Elements

pIXELsHAM.com

Jan 20, 2025

https://klingai.com

A.I., lighting

SynthLight – Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

pIXELsHAM.com

Jan 20, 2025

https://vrroom.github.io/synthlight

A.I., modeling

Shapen – Pixels to polygons text-to-model

pIXELsHAM.com

Jan 20, 2025

https://shapen.com

A.I.

Seaweed APT – Diffusion Adversarial Post-Training for One-Step Video Generation

pIXELsHAM.com

Jan 20, 2025

https://seaweed-apt.com

https://cdn.seaweed-apt.com/assets/showreel/seaweed-apt.mp4

We demonstrate large-scale text-to-video generation with a single neural function evaluation (1NFE) by using our proposed adversarial post-training technique. Our model generates 2 seconds of 1280×720, 24fps video in real time.

A.I.

IPAdapter – Text Compatible Image Prompt Adapter for Text-to-Image Image-to-Image Diffusion Models and ComfyUI implementation

pIXELsHAM.com

Jan 17, 2025

github.com/tencent-ailab/IP-Adapter

ip-adapter.github.io/

IPAdapter models are very powerful for image-to-image conditioning. The subject, or even just the style, of the reference image(s) can be easily transferred to a generation. Think of it as a one-image LoRA. It is an effective and lightweight adapter that adds image prompt capability to pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve comparable or even better performance than a fine-tuned image prompt model.

Once the IP-Adapter is trained, it can be directly reused on custom models fine-tuned from the same base model.

The IP-Adapter is fully compatible with existing controllable tools, e.g., ControlNet and T2I-Adapter.

A.I.

Sony – Diffusion Training from Scratch on a Micro-Budget

pIXELsHAM.com

Jan 17, 2025

stability.ai/news/stable-point-aware-3d

huggingface.co/VSehwag24/MicroDiT

A.I.

GAGA – Group Any Gaussians via 3D-aware Memory Bank segmentation

pIXELsHAM.com

Jan 17, 2025

www.gaga.gallery/

https://github.com/weijielyu/Gaga

A.I.

ComfyUI – Zero to hero with Cubiq (Matteo)

pIXELsHAM.com

Jan 15, 2025

A.I.

Kinetix.tech – Character motion control

pIXELsHAM.com

Jan 15, 2025

www.kinetix.tech/

www.kinetix.tech/character-motion-control-for-video-generation-models

A.I., modeling, photogrammetry

SPAR3D – Stable Point-Aware Reconstruction of 3D Objects from Single Images

pIXELsHAM.com

Jan 15, 2025

SPAR3D is a fast single-image 3D reconstructor with intermediate point cloud generation, which allows for interactive user edits and achieves state-of-the-art performance.

https://github.com/Stability-AI/stable-point-aware-3d

https://static1.squarespace.com/static/6213c340453c3f502425776e/t/677e3bc1b9e5df16b60ed4fe/1736326093956/SPAR3D+Research+Paper.pdf

https://stability.ai/news/stable-point-aware-3d?utm_source=x&utm_medium=social&utm_campaign=SPAR3D

A.I.

MiniMax-01 goes open source

pIXELsHAM.com

Jan 15, 2025

MiniMax is thrilled to announce the release of the MiniMax-01 series, featuring two groundbreaking models:

MiniMax-Text-01: A foundational language model.
MiniMax-VL-01: A visual multi-modal model.

Both models are now open-source, paving the way for innovation and accessibility in AI development!

🔑 Key Innovations
1. Lightning Attention Architecture: Combines 7/8 Lightning Attention with 1/8 Softmax Attention, delivering unparalleled performance.
2. Massive Scale with MoE (Mixture of Experts): 456B parameters with 32 experts and 45.9B activated parameters.
3. 4M-Token Context Window: Processes up to 4 million tokens, 20–32x the capacity of leading models, redefining what’s possible in long-context AI applications.
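
The figures in the list above can be checked with a little arithmetic. The sketch below assumes the 7/8 to 1/8 split means one softmax-attention layer after every seven lightning-attention layers; that layout and the 16-layer example are assumptions for illustration, not details stated in the announcement.

```python
def layer_pattern(num_layers: int, period: int = 8) -> list[str]:
    """Assumed layout: every `period`-th layer uses softmax attention,
    all other layers use lightning attention."""
    return [
        "softmax" if (i + 1) % period == 0 else "lightning"
        for i in range(num_layers)
    ]


pattern = layer_pattern(16)  # 16 layers is an arbitrary example size
print(pattern.count("lightning"), pattern.count("softmax"))

# Share of the 456B parameters that are activated per token (45.9B).
active_fraction = 45.9 / 456
print(f"active fraction: {active_fraction:.1%}")

# Context sizes implied for "leading models" by the 20-32x comparison.
context_tokens = 4_000_000
implied_range = (context_tokens // 32, context_tokens // 20)
print(implied_range)
```

So roughly a tenth of the parameters are active for any given token, and the 20–32x comparison implies competitor context windows of about 125,000 to 200,000 tokens.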

💡 Why MiniMax-01 Matters
1. Innovative Architecture for Top-Tier Performance
The MiniMax-01 series introduces the Lightning Attention mechanism, a bold alternative to traditional Transformer architectures, delivering unmatched efficiency and scalability.

2. 4M Ultra-Long Context: Ushering in the AI Agent Era
With the ability to handle 4 million tokens, MiniMax-01 is designed to lead the next wave of agent-based applications, where extended context handling and sustained memory are critical.

3. Unbeatable Cost-Effectiveness
Through proprietary architectural innovations and infrastructure optimization, we’re offering the most competitive pricing in the industry:
$0.2 per million input tokens
$1.1 per million output tokens
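
For a sense of scale, the listed prices translate into request costs as sketched below. The function name and the example token counts are hypothetical; only the two per-million prices come from the announcement.

```python
from decimal import Decimal

# Prices quoted in the announcement, in USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.2")
OUTPUT_USD_PER_MILLION = Decimal("1.1")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts."""
    million = Decimal(1_000_000)
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / million


# Example: a prompt that fills the 4M-token window, with a 2,000-token reply.
print(estimate_cost_usd(4_000_000, 2_000))
```

Decimal is used instead of float so that sums of prices such as 0.2 and 1.1 stay exact.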

🌟 Experience the Future of AI Today
We believe MiniMax-01 is poised to transform AI applications across industries. Whether you’re building next-gen AI agents, tackling ultra-long context tasks, or exploring new frontiers in AI, MiniMax-01 is here to empower your vision.

✅ Try it now for free: hailuo.ai

📄 Read the technical paper: filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf

🌐 Learn more: minimaxi.com/en/news/minimax-01-series-2

💡API Platform: intl.minimaxi.com/