FoundationVision repositories

VAR

Public

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Gen…

transformers generative-model image-generation auto-regressive-model gpt neurips gpt-2 diffusion-models autoregressive-models vision-transformer

Jupyter Notebook • MIT License • 566 forks • 8.7k stars • 57 issues • 3 pull requests • Updated Nov 10, 2025

Liquid

Public

(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators

generative text-to-image image-gen autoregressive-models large-language-models text-to-image-generation llms generative-ai multimodal-large-language-models

Python • MIT License • 35 forks • 642 stars • 12 issues • 0 pull requests • Updated Nov 10, 2025

Infinity

Public

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

transformers generative-model image-generation auto-regressive-model gpt text-to-image gpt-2 autoregressive-models text-to-image-generation

Python • MIT License • 93 forks • 1.6k stars • 61 issues • 5 pull requests • Updated Nov 10, 2025

Waver

Public

Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

115 forks • 925 stars • 12 issues • 2 pull requests • Updated Aug 27, 2025

BitVAE

Public

Official training and inference code for the bitwise tokenizer

vae image-generation vqvae autoregressive-models image-tokenizer

Python • MIT License • 2 forks • 71 stars • 4 issues • 0 pull requests • Updated May 18, 2025

GenerateU

Public

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

open-world object-detection multimodality open-vocabulary mllm open-vocabulary-detection

Python • MIT License • 8 forks • 194 stars • 15 issues • 0 pull requests • Updated Mar 29, 2025

FlashVideo

Public

[AAAI-2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

generative-models video-generation diffusion-models text-to-video efficient-generative-model

Python • Apache License 2.0 • 26 forks • 458 stars • 13 issues • 1 pull request • Updated Mar 5, 2025

UniRef

Public

[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces

object-segmentation unified-model

Python • MIT License • 15 forks • 238 stars • 4 issues • 0 pull requests • Updated Feb 14, 2025

flashvideo-page

Public

HTML • 0 forks • 0 stars • 0 issues • 0 pull requests • Updated Feb 10, 2025

infinity.project

Public

HTML • 0 forks • 0 stars • 0 issues • 0 pull requests • Updated Dec 24, 2024

GLEE

Public

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

tracking open-world object-detection interactive-segmentation video-object-segmentation referring-expression-segmentation referring-expression-comprehension video-instance-segmentation zero-shot-object-detection referring-video-object-segmentation

Python • MIT License • 76 forks • 1.2k stars • 45 issues • 2 pull requests • Updated Oct 21, 2024

LlamaGen

Public

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

image-generation llama auto-regressive-model diffusion text2image diffusion-models llm

Python • MIT License • 93 forks • 1.9k stars • 71 issues • 1 pull request • Updated Aug 15, 2024

OmniTokenizer

Public

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.

vae image-generation auto-regressive-model tokenization video-generation vqvae

Python • MIT License • 8 forks • 323 stars • 9 issues • 0 pull requests • Updated Jul 9, 2024

vaex

Public

🔥 Stable, simple, state-of-the-art VQVAE toolkit & cookbook

Python • MIT License • 8 forks • 106 stars • 4 issues • 0 pull requests • Updated Jun 23, 2024

ByteTrack

Public

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

real-time deployment pytorch multi-object-tracking

Python • MIT License • 1.1k forks • 6.2k stars • 300 issues • 30 pull requests • Updated Jun 19, 2024

Groma

Public

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

llama multimodal grounding foundation-models large-language-models llm mllm vision-language-model llama2

Python • Apache License 2.0 • 45 forks • 586 stars • 16 issues • 1 pull request • Updated Jun 7, 2024

VNext

Public archive

Next-generation video instance recognition framework built on Detectron2, supporting InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral)

tracking motion transformer object-detection instance-segmentation video-instance-segmentation

Python • Apache License 2.0 • 56 forks • 616 stars • 41 issues • 1 pull request • Updated Feb 21, 2024

21 repositories

Alive

InfinityStar

.github

UniTok

VAR

Liquid

Infinity

Waver

BitVAE

GenerateU

FlashVideo

UniRef

flashvideo-page

infinity.project

GLEE

LlamaGen

OmniTokenizer

vaex

ByteTrack

Groma

VNext
