LLM Hub

An open-source Android app that brings the power of Large Language Models directly to your mobile device. Chat with Gemma, Llama, and Phi models - all running locally for maximum privacy and full offline access.

Key Features

Everything you need for private, powerful AI conversations on your Android device

Multiple LLM Models
Support for Gemma-3, Llama-3.2, Phi-4, and Gemma-3n multimodal models
Privacy First
Complete privacy - your conversations never leave your device
Vision Support
Multimodal models that understand text, images, and audio input
Writing Aid
AI-powered writing assistance: summarize, expand, rewrite, improve text, or generate code
Translator
Translate text, images (OCR), and audio across 50+ languages - works offline
Audio Transcription
Convert speech to text with on-device processing using Gemma-3n models
Text-to-Speech
TTS with auto-readout for AI responses during conversations
Image Generator
Create images from text prompts using Stable Diffusion 1.5 with swipeable gallery for variations
Scam Detector
AI-powered fraud detection for messages, emails, and images with risk assessment
GPU Acceleration
Optimized performance on supported devices (8GB+ RAM recommended)
Offline Usage
Chat without an internet connection after downloading a model - complete offline functionality
Direct Downloads
Download models directly from HuggingFace or import custom MediaPipe models

Supported Models

Choose from a variety of state-of-the-art LLM models optimized for mobile devices

Gemma-3 1B
Google
Optimized text models for mobile with GPU acceleration. Multiple quantizations and context windows (1.2k-4k)
INT4 - 529MB • INT8 - 1005MB-1024MB
Text generation • GPU acceleration
Gemma-3 GGUF Multimodal
Google
Vision & Audio
GGUF multimodal models with vision projectors for CPU/GPU/NPU acceleration
4B (3.0GB) • 12B (7.7GB)
Text generation • Image understanding • CPU/GPU/NPU acceleration
Gemma-3n Multimodal
Google
Vision & Audio
Multimodal models with text, vision, and audio capabilities. Selective parameter activation
E2B (3.15GB) • E4B (4.33GB)
Text generation • Image understanding • Audio transcription • GPU acceleration
Llama-3.2 (LiteRT)
Meta
Meta's INT8 models for on-device inference with 4k context
1B (2.01GB) • 3B (5.11GB)
Text generation • CPU only • 4k context
Llama-3.2 GGUF
Meta
GGUF format Llama models with 128k context and CPU/GPU/NPU support
1B & 3B - Multiple quantizations
Text generation • 128k context • CPU/GPU/NPU acceleration
IBM Granite 4.0 H-Tiny
IBM
Compact enterprise model with 128k context window and multiple quantizations
7B - Q2_K to f16 (2.59GB-13.9GB)
Text generation • 128k context • CPU/GPU acceleration
IBM Granite 4.0 H-Small
IBM
High-quality enterprise model with 128k context window
32B - Q2_K to f16 (11.8GB-64.4GB)
Text generation • 128k context • CPU/GPU acceleration
Phi-4 Mini
Microsoft
Microsoft's efficient model for advanced reasoning with GPU support on 8GB+ devices
INT8 (3.91GB)
Text generation • 4k context • GPU acceleration (8GB+ RAM)
LFM-2.5 1.2B Instruct
LiquidAI
LiquidAI's efficient instruction model with 128k context. Available in GGUF and ONNX formats
GGUF & ONNX variants
Text generation • 128k context • CPU/GPU acceleration
LFM-2.5 1.2B Thinking
LiquidAI
Reasoning model with 'thinking' mode support and 128k context. Available in GGUF and ONNX
GGUF & ONNX variants
Text generation • Thinking mode • 128k context • CPU/GPU acceleration
LFM-2.5 VL 1.6B
LiquidAI
Vision & Audio
Vision-language model supporting both text and image input with 128k context
GGUF with vision projectors
Text generation • Image understanding • 128k context • CPU/GPU acceleration
Ministral-3 3B
MistralAI
Vision & Audio
Multimodal instruction model with vision support and 262k context. Available in GGUF and ONNX
GGUF & ONNX variants
Text generation • Image understanding • 262k context • CPU/GPU acceleration
Absolute Reality SD1.5
Stable Diffusion
Image Generation
Image generation model for creating images from text prompts
MNN (CPU) • QNN (NPU)
Image generation • CPU/NPU acceleration (8gen1/2/3/4)
Gecko-110M
Google
Embeddings (RAG)
Compact embedding model for RAG memory system with multiple dimension options
64D-1024D embeddings
Text embeddings • RAG support
EmbeddingGemma-300M
Google
Embeddings (RAG)
High-quality text embeddings for enhanced RAG retrieval
High-quality embeddings
Text embeddings • RAG support

Advanced Capabilities

Powerful features that enhance your AI experience while maintaining complete privacy

RAG Memory System

On-device Retrieval-Augmented Generation with local embeddings and semantic search

Global context memory • Document chunking • Persistent embeddings • No external endpoints
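The retrieval step of the RAG memory system can be sketched in plain Kotlin. In the app, embeddings would come from a model such as Gecko-110M or EmbeddingGemma-300M; the toy 2-D vectors and function names below are illustrative assumptions, not LLM Hub's actual code.

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

// Return indices of the k stored chunk embeddings most similar to the query.
fun topK(query: FloatArray, chunkEmbeddings: List<FloatArray>, k: Int): List<Int> =
    chunkEmbeddings.indices
        .sortedByDescending { cosine(query, chunkEmbeddings[it]) }
        .take(k)

fun main() {
    // Toy 2-D embeddings; real embeddings would use 64-1024 dimensions.
    val query = floatArrayOf(1f, 0f)
    val chunks = listOf(
        floatArrayOf(0.9f, 0.1f),  // close to the query direction
        floatArrayOf(0f, 1f)       // orthogonal to the query
    )
    println(topK(query, chunks, 1))  // prints [0]
}
```

The returned chunk indices select which document passages get prepended to the prompt, which is what keeps retrieval entirely on-device.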
Web Search Integration

Built-in DuckDuckGo search for fact-checking and real-time information

Content-aware searches • Instant Answer API • Optional augmentation • Privacy-focused
Custom Model Import

Import your own MediaPipe-compatible models

.task format • .litertlm format • MediaPipe Model Maker • AI Edge Converter
Smart AI Tools

Comprehensive suite of AI-powered productivity tools

Writing assistance • Multi-language translation • Audio transcription • Scam detection

Technology Stack

Built with modern Android development tools and cutting-edge AI technology

Kotlin
Modern Android development language
Jetpack Compose
Modern UI toolkit for Android
MediaPipe & LiteRT
AI runtime (formerly TensorFlow Lite)
Nexa SDK
GGUF model inference on CPU/GPU/NPU
ONNX Runtime
Cross-platform AI inference engine
INT4/INT8 Quantization
Optimized model compression for mobile
GPU Acceleration
LiteRT XNNPACK & Qualcomm GPU delegates
NPU Acceleration
Snapdragon NPU support: 8gen4 for GGUF, 8gen1/2/3/4 for image gen
HuggingFace
Model source and hosting
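As a concrete illustration of the INT4/INT8 quantization in the stack above, here is a minimal sketch of symmetric INT8 weight quantization. The function names are hypothetical, and real runtimes such as LiteRT use per-channel scales and calibrated ranges rather than this single-tensor scheme.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric INT8 quantization: map floats in [-maxAbs, maxAbs] onto [-127, 127].
fun quantizeInt8(weights: FloatArray): Pair<ByteArray, Float> {
    val maxAbs = weights.maxOf { abs(it) }
    val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
    val quantized = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return quantized to scale
}

// Recover approximate float weights from the INT8 representation.
fun dequantizeInt8(quantized: ByteArray, scale: Float): FloatArray =
    FloatArray(quantized.size) { i -> quantized[i] * scale }

fun main() {
    val weights = floatArrayOf(0.12f, -0.5f, 0.33f, 0f)
    val (q, scale) = quantizeInt8(weights)
    val restored = dequantizeInt8(q, scale)
    // Each restored value differs from the original by at most scale/2,
    // while storage drops from 4 bytes to 1 byte per weight.
    println(restored.toList())
}
```

This 4x size reduction (8x for INT4) is what makes multi-billion-parameter models fit in the storage and RAM budgets listed in the Requirements section.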

Requirements

Android 8.0+
API level 26 or higher
2GB+ RAM
6GB+ recommended, 8GB+ for Phi-4 GPU
1GB - 5GB Storage
Depending on selected models
Internet
Required only for model downloads

How It Works

LLM Hub uses Google's MediaPipe framework with LiteRT to run quantized AI models directly on your Android device

1
Download
Pre-optimized .task files from HuggingFace
2
Load
Models into MediaPipe's LLM Inference API
3
Process
Your input locally using CPU or GPU
4
Generate
Responses without sending data to external servers
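The four steps above can be sketched with MediaPipe's LLM Inference API. This is an illustrative sketch, not LLM Hub's actual source: the model path and token limit are assumptions, and the code needs an Android Context plus the com.google.mediapipe:tasks-genai dependency, so it only runs inside an Android app.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

fun runLocalInference(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        // Step 1: point at a downloaded .task model (path is illustrative).
        .setModelPath("/data/local/tmp/gemma-3-1b-it-int4.task")
        .setMaxTokens(512)
        .build()
    // Step 2: load the model into the LLM Inference API.
    val llm = LlmInference.createFromOptions(context, options)
    // Steps 3-4: process the prompt and generate a response on-device;
    // nothing is sent to an external server.
    return llm.generateResponse(prompt)
}
```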

⭐ GitHub Star History

Join thousands of developers who have starred LLM Hub on GitHub

Ready to Experience Private AI?

Download LLM Hub today and start having AI conversations that stay completely private on your device.