ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
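For orientation, the sketch below follows the HuggingFace-style loading API documented by the project (a drop-in `AutoModelForCausalLM` from `ipex_llm.transformers` with low-bit quantization, moved to the `xpu` device). The model id is an illustrative placeholder, and exact flags may vary between ipex-llm versions.

```python
# Minimal sketch of HuggingFace-style inference with ipex-llm on an Intel GPU.
# Assumes an XPU-enabled PyTorch build and `pip install ipex-llm[xpu]`; the model
# id below is a placeholder, not a recommendation.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm's drop-in class

model_path = "Qwen/Qwen2-7B-Instruct"  # placeholder; any supported HF model id

# load_in_4bit=True applies ipex-llm's low-bit (INT4) weight quantization at load time
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")  # run on the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is an Intel XPU?", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```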
💡 Why It Matters
The ipex-llm repository accelerates local inference and fine-tuning of large language models (LLMs) across Intel XPU configurations, from integrated GPUs and NPUs to discrete Arc, Flex, and Max cards. This is particularly useful for ML/AI teams that want to run LLM workloads on hardware they already own rather than relying on cloud services. With 8,677 stars, the project shows broad community interest and a level of maturity that makes it a credible candidate for production use. However, it may not suit teams that need extensive support for non-Intel hardware or that prefer a fully managed service.
🎯 When to Use
This tool is a strong choice when teams need to speed up LLM inference or fine-tuning on local hardware, especially in environments built around Intel GPUs, NPUs, or integrated graphics. Teams should consider alternatives if they need support for a wider range of hardware vendors or prefer a fully cloud-hosted solution.
👥 Team Fit & Use Cases
This open-source tool is primarily used by ML engineers and data scientists who need to integrate LLMs into their applications. It is commonly embedded in products that require advanced natural language processing capabilities, such as chatbots, recommendation systems, and AI-driven analytics platforms; a minimal chat-loop sketch follows below.
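To illustrate that kind of integration, here is a hedged sketch of a simple interactive chat loop wrapped around the quantized model from the loading example above. The conversation handling and helper function are assumptions for illustration, not ipex-llm code; `apply_chat_template` is the standard transformers tokenizer API and requires a model whose tokenizer ships a chat template.

```python
# Hypothetical chat loop around an ipex-llm quantized model on XPU (see the loading
# sketch above); the conversation handling here is illustrative, not part of ipex-llm.
import torch

def chat(model, tokenizer, max_new_tokens=128):
    history = []
    while True:
        user_msg = input("user> ")
        if user_msg.strip().lower() in {"exit", "quit"}:
            break
        history.append({"role": "user", "content": user_msg})
        # Standard transformers chat-template API; builds the prompt from the history.
        input_ids = tokenizer.apply_chat_template(history,
                                                  add_generation_prompt=True,
                                                  return_tensors="pt").to("xpu")
        with torch.inference_mode():
            output = model.generate(input_ids, max_new_tokens=max_new_tokens)
        # Decode only the newly generated tokens, not the echoed prompt.
        reply = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
        history.append({"role": "assistant", "content": reply})
        print(f"assistant> {reply}")
```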
📊 Activity
Latest commit: 2026-01-28. Over the past 96 days, this repository gained 229 stars (+2.7% growth). Activity data is based on daily RepoPi snapshots of the GitHub repository.