LocalAI - Models

qwen3-vl-30b-a3b-instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on-demand deployment. #### Key Enhancements: * **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks. * **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos. * **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI. * **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing. * **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers. * **Upgraded Visual Recognition**: Broader, higher-quality pretraining is able to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc. * **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing. * **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension. #### Model Architecture Updates: 1. **Interleaved-MRoPE**: Full‑frequency allocation over time, width, and height via robust positional embeddings, enhancing long‑horizon video reasoning. 2. **DeepStack**: Fuses multi‑level ViT features to capture fine-grained details and sharpen image–text alignment. 3. **Text–Timestamp Alignment:** Moves beyond T‑RoPE to precise, timestamp‑grounded event localization for stronger video temporal modeling. This is the weight repository for Qwen3-VL-30B-A3B-Instruct.

Links

https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF

Tags

qwen3-vl-30b-a3b-thinking

Qwen3-VL-30B-A3B-Thinking is a 30B parameter model that is thinking.

Links

https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Thinking-GGUF

Tags

qwen3-vl-4b-instruct

Qwen3-VL-4B-Instruct is the 4B parameter model of the Qwen3-VL series.

Links

https://huggingface.co/unsloth/Qwen3-VL-4B-Instruct-GGUF

Tags

qwen3-vl-32b-instruct

Qwen3-VL-32B-Instruct is the 32B parameter model of the Qwen3-VL series.

Links

https://huggingface.co/unsloth/Qwen3-VL-32B-Instruct-GGUF

Tags

qwen3-vl-4b-thinking

Qwen3-VL-4B-Thinking is the 4B parameter model of the Qwen3-VL series that is thinking.

Links

https://huggingface.co/unsloth/Qwen3-VL-4B-Thinking-GGUF

Tags

qwen3-vl-2b-thinking

Qwen3-VL-2B-Thinking is the 2B parameter model of the Qwen3-VL series that is thinking.

Links

https://huggingface.co/unsloth/Qwen3-VL-2B-Thinking-GGUF

Tags

qwen3-vl-2b-instruct

Qwen3-VL-2B-Instruct is the 2B parameter model of the Qwen3-VL series.

Links

https://huggingface.co/unsloth/Qwen3-VL-2B-Instruct-GGUF

Tags

huihui-qwen3-vl-30b-a3b-instruct-abliterated

These are quantizations of the model Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF

Links

https://huggingface.co/noctrex/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF

Tags

qwen3-vl-8b-instruct

Qwen3-VL-8B-Instruct is the 8B parameter model of the Qwen3-VL series. Uses recommended default parameters according to Unsloth documentation for Qwen 3 VL.

Links

https://huggingface.co/unsloth/Qwen3-VL-8B-Instruct-GGUF

Tags

qwen3-vl-8b-thinking

Qwen3-VL-8B-Thinking is the 8B parameter model of the Qwen3-VL series that is thinking. Uses recommended default parameters according to Unsloth documentation for Qwen 3 VL.

Links

https://huggingface.co/unsloth/Qwen3-VL-8B-Thinking-GGUF

Tags

lfm2-vl-450m

LFM2‑VL is Liquid AI's first series of multimodal models, designed to process text and images with variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications. We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters. 2× faster inference speed on GPUs compared to existing VLMs while maintaining competitive accuracy Flexible architecture with user-tunable speed-quality tradeoffs at inference time Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion

Links

Tags

lfm2-vl-1.6b

LFM2‑VL is Liquid AI's first series of multimodal models, designed to process text and images with variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications. We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters. 2× faster inference speed on GPUs compared to existing VLMs while maintaining competitive accuracy Flexible architecture with user-tunable speed-quality tradeoffs at inference time Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion

Links

Tags

kokoro

Kokoro is an open-weight TTS model with 82 million parametrs. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

Links

https://github.com/hexgrad/kokoro

Tags

kitten-tts

Kitten TTS is an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.

Links

https://github.com/KittenML/KittenTTS

Tags

qwen-image

We are thrilled to release Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.

Links

https://huggingface.co/Qwen/Qwen-Image

Tags

qwen-image-edit

Qwen-Image-Edit is a model for image editing, which is based on Qwen-Image.

Links

https://huggingface.co/Qwen/Qwen-Image-Edit

Tags

qwen-image-edit-2509

Qwen-Image-Edit is a model for image editing, which is based on Qwen-Image.

Links

https://huggingface.co/Qwen/Qwen-Image-Edit-2509

Tags

ltx-2

**LTX-2** is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution. **Key Features:** - **Joint Audio-Video Generation**: Generates synchronized video and audio in a single model - **Image-to-Video**: Converts static images into dynamic videos with matching audio - **High Quality**: Produces realistic video with natural motion and synchronized audio - **Open Weights**: Available under the LTX-2 Community License Agreement **Model Details:** - **Model Type**: Diffusion-based audio-video foundation model - **Architecture**: DiT (Diffusion Transformer) based - **Developed by**: Lightricks - **Paper**: [LTX-2: Efficient Joint Audio-Visual Foundation Model](https://arxiv.org/abs/2601.03233) **Usage Tips:** - Width & height settings must be divisible by 32 - Frame count must be divisible by 8 + 1 (e.g., 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121) - Recommended settings: width=768, height=512, num_frames=121, frame_rate=24.0 - For best results, use detailed prompts describing motion and scene dynamics **Limitations:** - This model is not intended or able to provide factual information - Prompt following is heavily influenced by the prompting-style - When generating audio without speech, the audio may be of lower quality **Citation:** ```bibtex @article{hacohen2025ltx2, title={LTX-2: Efficient Joint Audio-Visual Foundation Model}, author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and others}, journal={arXiv preprint arXiv:2601.03233}, year={2025} } ```

Links

https://huggingface.co/Lightricks/LTX-2

Tags

gpt-oss-20b

Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of the open models: gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters) gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model. Highlights Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory.

Links

Tags

gpt-oss-120b

Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of the open models: gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters) gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model. Highlights Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory.

Links

Tags

openai_gpt-oss-20b-neo

These are NEO Imatrix GGUFs, NEO dataset by DavidAU. NEO dataset improves overall performance, and is for all use cases. Example output below (creative), using settings below. Model also passed "hard" coding test too (6 experts); no issues (IQ4_NL). (Forcing the model to create code with no dependencies and limits of coding short cuts, with multiple loops, and in real time with no blocking in a language that does not support it normally.) Due to quanting issues with this model (which result in oddball quant sizes / mixtures), only TESTED quants will be uploaded (at the moment).

Links

https://huggingface.co/DavidAU/Openai_gpt-oss-20b-NEO-GGUF

Tags

Model Gallery

Find Your Perfect Model

Filter by Model Type

Browse by Tags

qwen3-vl-30b-a3b-instruct

qwen3-vl-30b-a3b-thinking

qwen3-vl-4b-instruct

qwen3-vl-32b-instruct

qwen3-vl-4b-thinking

qwen3-vl-2b-thinking

qwen3-vl-2b-instruct

huihui-qwen3-vl-30b-a3b-instruct-abliterated

qwen3-vl-8b-instruct

qwen3-vl-8b-thinking

lfm2-vl-450m

lfm2-vl-1.6b

kokoro

kitten-tts

qwen-image

qwen-image-edit

qwen-image-edit-2509

ltx-2

gpt-oss-20b

gpt-oss-120b

openai_gpt-oss-20b-neo