Skip to content

Soviet9773Red/ollama-gradio-chat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Local AI Chat (Ollama + Gradio)

ollama-gradio-chat is lightweight local AI chat interface for experimenting with multiple Ollama models.

This project provides a simple web interface that allows you to interact with local LLMs, compare responses from different models, and display answers in structured Markdown format.
The interface is designed as a local AI laboratory for testing models and prompts.

Features

  • Runs completely locally
  • Uses Ollama as the model backend
  • Web interface built with Gradio
  • Supports multiple models simultaneously
  • Model A / Model B comparison
  • Streaming responses
  • Markdown rendering
  • Numbered questions
  • Model-labelled answers
  • Optional Deep Thinking mode
  • Multi-language interface: English, Swedish, Russian

Screenshot

Example interface:

and example answers: https://github.com/Soviet9773Red/ollama-gradio-chat/blob/main/Answers.md

Requirements

You must install:

  • Ollama (local LLM runtime)
  • Python (Gradio and other libraries will be installed later using pip)

1 Install Ollama

Download Ollama from: https://ollama.ai

Minimum system requirements for local models:

  • CPU: performance comparable to Intel i5-8400 or higher 9000+ PassMark
  • RAM: 16 GB min (32-128 GB recommended)
  • GPU: NVIDIA GTX1660 6 GB or better (e.g. RTX 3060 12 GB)
  • OS: Windows, Linux or macOS

Note:

  • The application can run on CPU only, but performance may be slow.
  • GPU significantly improves response time (if supported by Ollama).
  • Larger models (14B+) require more RAM and VRAM.

2 Install models

Example models:

ollama pull cogito:14b
ollama pull qwen3.5:9b
ollama pull gemma3n:e4b-it-fp16
ollama pull ministral-3:14b

The example models above are only suggestions. You can use any models available in the Ollama library: https://ollama.com/search?o=newest

After installation verify in terminal:

ollama list

For testing and comparison it is recommended to install at least two models.
The interface supports switching between models, so having multiple models allows you to compare responses and behavior directly. Installed models also will automatically appear in the interface.

3 Install Python packages

Install dependencies:

pip install gradio ollama

Or using requirements file:

pip install -r requirements.txt

4 Run the application

Start the server: python chat.py

Console output will show something like:

STEP 1: requesting model list from Ollama
STEP 2: models detected: ['None', 'qwen3.5:9b', 'ministral-3:14b', 'qwen3.5:27b', 'gemma3n:e4b-it-fp16', 'cogito:14b']
STEP 3: detecting local IP
STEP 4: local IP detected: 192.168.10.15
Local server: http://127.0.0.1:7860/?__theme=dark
LAN access:  http://192.168.10.15:7860/?__theme=dark
STEP 5: starting Gradio server. ⬆ Use local or LAN IP ⬆
AI Chat version: v2.2.0 build 20260317
* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.

Open the link in your browser with Ctrl+ LMB

Interface overview

The interface allows:

  • selecting language
  • selecting Model A
  • selecting Model B
  • enabling Deep Thinking
  • asking questions

Each question is numbered:

 #1
 #2
 #3

Each answer is labelled with the model name. Example:

### cogito:14b
response text

### qwen3.5:27b
response text

Deep Thinking mode

Some models support an optional Deep Thinking mode.

When enabled the system prompt includes an additional instruction that encourages deeper reasoning.

Currently enabled for:

cogito:14b

Architecture

Main components:

chat.py

Core parts:

  • Ollama streaming API
  • Markdown rendering
  • Gradio UI
  • multi-model response comparison

Use cases

This interface can be used for:

  • local LLM experimentation
  • prompt testing
  • model comparison
  • multilingual interaction
  • code generation testing

Custom UI animation

The interface uses a GIF animation displayed at the top.

The file ai.gif must be placed in the same directory as chat.py
You can replace ai.gif with your own animation.
If the file is missing, the image will not be displayed.

Current state and version

v2.2.0 build 20260317
Chat history is currently stored only in the browser session.
Changelog.md

Roadmap

Planned improvements include:

  • Export chat history to Markdown (MD)
  • Export chat history to JSON
  • Persistent conversation storage

MIT License

@license (c) Alexander Soviet9773Red - https://github.com/Soviet9773Red/ollama-gradio-chat

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages