TeaRAG is a token‑efficient, agentic Retrieval‑Augmented Generation (RAG) framework that answers complex queries with fewer tokens and faster reasoning. By compressing both the retrieved content and the reasoning steps, TeaRAG delivers +4% and +2% EM gains on Llama3‑8B‑Instruct and Qwen2.5‑14B‑Instruct, respectively, while cutting token usage by ~60%. Built on FlashRAG, it integrates graph‑based knowledge retrieval with a novel Iterative Process‑aware DPO to achieve better results and higher efficiency in agentic RAG.
Our models and the Wiki corpus–based knowledge graph are available on Hugging Face: 🤗 zclfe/tearag
# Clone TeaRAG (built on top of FlashRAG)
git clone https://github.com/Applied-Machine-Learning-Lab/TeaRAG.git
cd TeaRAG
pip install -e .
# FlashRAG dependencies
pip install "vllm>=0.4.1"
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
# Training dependencies
pip3 install flash-attn --no-build-isolation
pip install accelerate==0.34.2
pip install trl==0.17.0
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
--index-url https://download.pytorch.org/whl/cu126
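After installing the pinned training dependencies, it can be useful to confirm they resolved to the expected versions before launching a run. A minimal sketch (the pins mirror the pip commands above; `check_pins` is a hypothetical helper, not part of TeaRAG):

```python
# Sanity-check pinned training dependencies (a sketch, not a TeaRAG utility).
from importlib.metadata import version, PackageNotFoundError

# Version pins copied from the pip commands above.
PINNED = {
    "accelerate": "0.34.2",
    "trl": "0.17.0",
    "torch": "2.6.0",
}

def check_pins(pins):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    issues = {}
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            issues[pkg] = (installed, expected)
    return issues

if __name__ == "__main__":
    for pkg, (got, want) in check_pins(PINNED).items():
        print(f"{pkg}: installed={got!r}, expected {want}")
```

An empty result means every pinned package matches; a `None` entry means the package is missing entirely.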
# Redis setup
pip install redis
sudo apt install redis-server -y
redis-server --dir path/redis \
--appendonly yes \
--appendfilename appendonly.aof \
--daemonize yes \
--port 6379
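The same connectivity check can also be run from Python, which is convenient inside scripts. A minimal sketch, assuming the `t:<id>` key pattern used by the `redis-cli` test below; `kg_key` and `smoke_test` are hypothetical helpers, not TeaRAG APIs:

```python
# Python-side Redis smoke test (a sketch; assumes the `t:<id>` key pattern
# from the redis-cli example and requires `pip install redis`).
def kg_key(doc_id):
    """Build a `t:<id>`-style key, matching the redis-cli smoke test."""
    return f"t:{doc_id}"

def smoke_test(host="localhost", port=6379, doc_id="77899428"):
    """Ping Redis and fetch one KG entry; return None if Redis is unreachable."""
    try:
        import redis
        r = redis.Redis(host=host, port=port, decode_responses=True)
        r.ping()
        return r.get(kg_key(doc_id))
    except Exception:
        # Covers both a missing `redis` package and an unreachable server.
        return None

if __name__ == "__main__":
    print(smoke_test())
```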
# Test Redis (note: dataset loading may take some time)
redis-cli GET t:77899428

TeaRAG's file organization should be constructed as follows to minimize modifications to the code:
├── Root Path
├── TeaRAG # Source code for TeaRAG
├── model # Saved pre-trained models
├── data # Datasets and corpora
├── train_log # Intermediate outputs and trained models
├── log # Inference logs (config, intermediate results, final results)
├── redis # Redis-based knowledge graph storage
├── index # Pre-built retrieval indexes
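The layout above can be created in one step. A sketch: `ROOT` is a placeholder for your actual root path, and the `TeaRAG` directory itself comes from the `git clone` step:

```shell
# Create the expected directory layout (a sketch; ROOT is a placeholder).
ROOT=${ROOT:-$PWD}
mkdir -p "$ROOT"/model "$ROOT"/data "$ROOT"/train_log \
         "$ROOT"/log "$ROOT"/redis "$ROOT"/index
```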
This repository builds on FlashRAG and follows a similar structure, except for the `alg` directory, which contains our customized scripts:
├── alg
├── config # Fixed inference hyper-parameter configs
├── data # Dataset preparation scripts
├── download # Model + dataset download scripts
├── ds_config # DeepSpeed config
├── index # Index construction scripts
├── infer_script # Inference scripts
├── method # Method entry points
├── train # Training code
├── train_script # Training scripts
├── prepare.sh # Full preparation pipeline
├── run_pipeline_medium.sh # Full pipeline for Llama3-8B-Instruct
├── run_pipeline_qwen.sh # Full pipeline for Qwen2.5-14B-Instruct
Prepare (download data, build KG + index, prepare training set)
cd alg
bash prepare.sh

Train & Evaluate (Llama-3-8B-Instruct)
cd alg
bash run_pipeline_medium.sh

Train & Evaluate (Qwen2.5-14B-Instruct)
cd alg
bash run_pipeline_qwen.sh

Run inference for all datasets/models/baselines
cd alg/infer_script
bash run_all.sh

For questions, suggestions, or bug reports, please reach out:
📧 zclfe00@gmail.com
We welcome contributions and feedback to make TeaRAG even better!
- FlashRAG – TeaRAG is built upon the overall framework of FlashRAG.
- Xiaohongshu – This work was supported by computational resources from Xiaohongshu's Search group, which greatly facilitated the research.
If TeaRAG is helpful in your research or applications, please consider citing our work:
@article{zhang2025tearag,
  title={TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework},
  author={Zhang, Chao and Wang, Yuhao and Xu, Derong and Zhang, Haoxin and Lyu, Yuanjie and Chen, Yuhao and Liu, Shuochen and Xu, Tong and Zhao, Xiangyu and Gao, Yan and others},
  journal={arXiv preprint arXiv:2511.05385},
  year={2025}
}

