ollama-gradio-chat is a lightweight, local AI chat interface for experimenting with multiple Ollama models.
This project provides a simple web interface that allows you to interact with local LLMs, compare responses from different models, and display answers in structured Markdown format.
The interface is designed as a local AI laboratory for testing models and prompts.
- Runs completely locally
- Uses Ollama as the model backend
- Web interface built with Gradio
- Supports multiple models simultaneously
- Model A / Model B comparison
- Streaming responses
- Markdown rendering
- Numbered questions
- Model-labelled answers
- Optional Deep Thinking mode
- Multi-language interface: English, Swedish, Russian
Example answers: https://github.com/Soviet9773Red/ollama-gradio-chat/blob/main/Answers.md
You must install:
- Ollama (local LLM runtime)
- Python (Gradio and other libraries will be installed later using pip)
Download Ollama from: https://ollama.ai
Minimum system requirements for local models:
- CPU: performance comparable to an Intel Core i5-8400 or better (PassMark score 9000+)
- RAM: 16 GB minimum (32-128 GB recommended)
- GPU: NVIDIA GTX 1660 6 GB or better (e.g. RTX 3060 12 GB)
- OS: Windows, Linux or macOS
Note:
- The application can run on CPU only, but performance may be slow.
- GPU significantly improves response time (if supported by Ollama).
- Larger models (14B+) require more RAM and VRAM.
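As a rough rule of thumb (not from this project, just a common estimate), the memory needed for a model's weights is the parameter count times the bits per weight under the chosen quantization. A minimal sketch:

```python
def estimate_model_memory_gb(params_billion: float, quant_bits: int = 4) -> float:
    """Rough memory footprint of model weights: parameters x bits per weight.

    Ignores KV-cache and runtime overhead, so treat the result as a
    lower bound on required RAM/VRAM.
    """
    return params_billion * quant_bits / 8

# A 14B model at 4-bit quantization needs roughly 7 GB for the weights alone.
```

This is why 14B+ models push past common 8 GB GPUs and benefit from the recommended 32+ GB of system RAM.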
Example models:
ollama pull cogito:14b
ollama pull qwen3.5:9b
ollama pull gemma3n:e4b-it-fp16
ollama pull ministral-3:14b
The example models above are only suggestions. You can use any models available in the Ollama library: https://ollama.com/search?o=newest
After installation, verify the installed models in a terminal:
ollama list
For testing and comparison it is recommended to install at least two models.
The interface supports switching between models, so having multiple models installed lets you compare responses and behavior directly. Installed models will also appear automatically in the interface.
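The console output shown later prepends a 'None' entry to the detected model list, presumably so a model slot (e.g. Model B) can be left empty. A hypothetical helper illustrating that dropdown-building step (not the project's actual code):

```python
def dropdown_choices(installed_models: list[str]) -> list[str]:
    """Build the model-selector choices for the UI.

    'None' is a sentinel that disables a slot, so single-model use
    works without picking a second model.
    """
    return ["None"] + list(installed_models)
```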
Install dependencies:
pip install gradio ollama
Or using requirements file:
pip install -r requirements.txt
Start the server:
python chat.py
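A minimal requirements.txt matching the pip command above would contain (version pins are not specified by the project, so none are shown):

```
gradio
ollama
```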
Console output will show something like:
STEP 1: requesting model list from Ollama
STEP 2: models detected: ['None', 'qwen3.5:9b', 'ministral-3:14b', 'qwen3.5:27b', 'gemma3n:e4b-it-fp16', 'cogito:14b']
STEP 3: detecting local IP
STEP 4: local IP detected: 192.168.10.15
Local server: http://127.0.0.1:7860/?__theme=dark
LAN access: http://192.168.10.15:7860/?__theme=dark
STEP 5: starting Gradio server. ⬆ Use local or LAN IP ⬆
AI Chat version: v2.2.0 build 20260317
* Running on local URL: http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
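STEP 3/4 detect the machine's LAN IP so the server can be reached from other devices. A common way to do this in Python (a sketch, not necessarily how chat.py implements it) is to open a UDP socket toward a public address and read back the chosen local address:

```python
import socket

def detect_local_ip() -> str:
    """Find the LAN IP by opening a UDP socket toward a public address.

    No packets are actually sent; connect() on a UDP socket only selects
    the outgoing interface. Falls back to loopback when there is no route.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
```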
Open the link in your browser (Ctrl + left-click in most terminals).
The interface allows:
- selecting language
- selecting Model A
- selecting Model B
- enabling Deep Thinking
- asking questions
Each question is numbered:
#1
#2
#3
Each answer is labelled with the model name. Example:
### cogito:14b
response text
### qwen3.5:27b
response text
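The numbering and labelling above can be sketched as a small Markdown formatter. The exact layout chat.py produces may differ; this hypothetical helper only mirrors the example shown:

```python
def format_answer(question_number: int, model: str, response: str) -> str:
    """Render one numbered question/answer pair as Markdown.

    The question gets a '#N' marker and the answer a '### model'
    heading, matching the layout shown above.
    """
    return f"#{question_number}\n\n### {model}\n{response}"
```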
Some models support an optional Deep Thinking mode.
When enabled the system prompt includes an additional instruction that encourages deeper reasoning.
Currently enabled for:
cogito:14b
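The mechanism described above (a per-model flag that extends the system prompt) could look like the following sketch. The base prompt and the exact extra instruction are assumptions; only the per-model gating and the cogito:14b entry come from this README:

```python
# Models with Deep Thinking enabled (per the README, currently only cogito:14b).
DEEP_THINKING_MODELS = {"cogito:14b"}

def build_system_prompt(model: str, deep_thinking: bool,
                        base: str = "You are a helpful assistant.") -> str:
    """Append an extra reasoning instruction when Deep Thinking applies."""
    if deep_thinking and model in DEEP_THINKING_MODELS:
        return base + " Think step by step and reason deeply before answering."
    return base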
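The mechanism described above (a per-model flag that extends the system prompt) could look like this sketch. The base prompt and the exact extra instruction are assumptions; only the per-model gating and the cogito:14b entry come from this README:

```python
# Models with Deep Thinking enabled (per the README, currently only cogito:14b).
DEEP_THINKING_MODELS = {"cogito:14b"}

def build_system_prompt(model: str, deep_thinking: bool,
                        base: str = "You are a helpful assistant.") -> str:
    """Append an extra reasoning instruction when Deep Thinking applies.

    The wording of the extra instruction is hypothetical.
    """
    if deep_thinking and model in DEEP_THINKING_MODELS:
        return base + " Think step by step and reason deeply before answering."
    return base
```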
Main components:
chat.py
Core parts:
- Ollama streaming API
- Markdown rendering
- Gradio UI
- multi-model response comparison
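Streaming works by accumulating partial messages as they arrive and re-rendering the Markdown output each time. With the ollama Python client, `ollama.chat(..., stream=True)` yields chunks whose partial text sits under `chunk["message"]["content"]` (an assumption about the client's chunk shape; verify against your installed version). A minimal consumer:

```python
from typing import Iterable, Iterator

def stream_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the growing answer text as streamed chunks arrive.

    Accumulating partial content lets a UI like Gradio refresh the
    rendered Markdown incrementally instead of waiting for the full reply.
    """
    text = ""
    for chunk in chunks:
        text += chunk["message"]["content"]
        yield text
```

Usage would then be roughly `for partial in stream_text(ollama.chat(model="cogito:14b", messages=msgs, stream=True)): ...`, updating the chat panel with each `partial`.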
This interface can be used for:
- local LLM experimentation
- prompt testing
- model comparison
- multilingual interaction
- code generation testing
The interface uses a GIF animation displayed at the top.
The file ai.gif must be placed in the same directory as chat.py
You can replace ai.gif with your own animation.
If the file is missing, the image will not be displayed.
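The missing-file behavior described above amounts to a simple existence check before showing the image. A sketch (hypothetical helper, not the project's actual code):

```python
from pathlib import Path
from typing import Optional

def gif_path(directory: str = ".") -> Optional[Path]:
    """Return the path to ai.gif next to chat.py, or None if it is missing.

    Returning None lets the UI simply skip the image instead of erroring.
    """
    candidate = Path(directory) / "ai.gif"
    return candidate if candidate.is_file() else None
```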
Current version: v2.2.0 build 20260317
Chat history is currently stored only in the browser session.
See Changelog.md for version history.
Planned improvements include:
- Export chat history to Markdown (MD)
- Export chat history to JSON
- Persistent conversation storage
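The planned export features could take a shape like the following sketch. The Markdown layout reuses the answer format shown earlier; both the function names and the history record structure (`question`/`model`/`answer` keys) are assumptions, since the features are not implemented yet:

```python
import json

def export_markdown(history: list[dict]) -> str:
    """Serialize chat history to Markdown, one numbered Q/A block per turn."""
    blocks = []
    for i, turn in enumerate(history, start=1):
        blocks.append(
            f"#{i}\n{turn['question']}\n\n### {turn['model']}\n{turn['answer']}\n"
        )
    return "\n".join(blocks)

def export_json(history: list[dict]) -> str:
    """Serialize chat history to pretty-printed JSON."""
    return json.dumps(history, ensure_ascii=False, indent=2)
```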
@license (c) Alexander Soviet9773Red - https://github.com/Soviet9773Red/ollama-gradio-chat
