Conversational AI for genomic science: ask questions in plain English, get expert-level bioinformatics answers.
Bioinformatics pipelines are powerful, but they've never been easy to talk to. Researchers spend hours navigating documentation, chasing pipeline parameters, and decoding opaque error messages.
We built GenePattern Copilot to change that. It wraps the full GenePattern analysis ecosystem in a multi-turn conversational AI layer, so you can:
- Ask "Run a GSEA analysis on my expression data" and get a step-by-step guided answer.
- Query the GenePattern module library using natural language instead of clicking through menus.
- Automate job submission, result retrieval, and pipeline orchestration via an AI agent with live tool use.
- Audit every reasoning step: the model's chain-of-thought, tool calls, and retrieved context are all stored and queryable.
This is not a generic chatbot bolted onto a bioinformatics portal. GenePattern Copilot is a purpose-built agentic AI that understands GenePattern's job model, module catalog, file system, and user permissions, and acts on them in real time.
| Feature | Description |
|---|---|
| Genomic Integration | Deep integration with GenePattern modules, tasks, and pipelines. The agent can submit jobs, fetch results, and interpret outputs. |
| Multi-Model AI | Switch between supported LLMs (for example OpenAI GPT, Google Gemini, and AWS Bedrock models) with a single menu click. |
| Agentic Tool Use | Powered by Pydantic AI + MCP (Model Context Protocol), the assistant has 30+ live tools covering job management, file I/O, user stats, and server introspection. |
| RAG Pipeline | A ChromaDB vector store indexes GenePattern documentation and forum posts so every answer is grounded in verified sources. |
| Cloud Scalability | Containerized with Docker and deployed behind a Django ASGI server, ready to scale horizontally on GCP, AWS, or any Kubernetes cluster. |
| Reproducible Science | Every conversation, query, reasoning step, token count, and user rating is persisted to a relational database, so your interactions become a first-class research artifact. |
| Auth & Multi-User | Session + token authentication, per-user GenePattern API key storage, and admin-only analytics endpoints. |
| Human Feedback Loop | Thumbs-up / thumbs-down ratings on every response feed directly into a queryable evaluation dataset. |
git clone https://github.com/genepattern/GPCopilot.git
cd GPCopilot
# Using Conda (recommended)
conda create -n gp_copilot python=3.11
conda activate gp_copilot
# Or using venv
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

Create a .env file in the project root:
# --- Django ---
SECRET_KEY='your-strong-django-secret-key' # python -c "import secrets; print(secrets.token_hex(50))"
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1
CSRF_TRUSTED_ORIGINS=http://localhost:3000,http://127.0.0.1:8000
# --- LLM API Keys (add only the providers you use) ---
OPENAI_API_KEY='sk-...'
GOOGLE_GEMINI_API_KEY='AIza...'
AWS_ACCESS_KEY_ID='AKIA...'
AWS_SECRET_ACCESS_KEY='...'
# --- CORS ---
CORS_ALLOWED_ORIGINS=http://localhost:3000

Tip: You only need the API keys for the LLM providers you intend to use. The server will gracefully skip unconfigured providers.
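To see which providers the variables above would enable before starting the server, a minimal sketch follows. Only the variable names come from the `.env` example; the `PROVIDER_KEYS` mapping, its labels, and the `configured_providers` helper are illustrative assumptions and not part of GenePattern Copilot.

```python
import os

# ASSUMPTION: hypothetical mapping from a provider label to the environment
# variables it needs. The variable names come from the .env example above.
PROVIDER_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GOOGLE_GEMINI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def configured_providers(env=None):
    """Return the provider labels whose variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, keys in PROVIDER_KEYS.items()
        if all(env.get(key, "").strip() for key in keys)
    ]


if __name__ == "__main__":
    print("Configured providers:", configured_providers() or "none")
```

A provider that needs two variables, such as the AWS pair, is reported only when both are present.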
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser  # optional: enables the /admin dashboard
python manage.py runserver
# -> http://127.0.0.1:8000/

GenePattern Copilot can also be embedded in other apps via a simple RESTful API. For example, to ask a question directly:
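import httpx

response = httpx.post("http://127.0.0.1:8000/api/chat/", json={
    "query": "What GenePattern modules are available for RNA-seq differential expression?",
    "model_id": "gpt-4o",
}, timeout=60.0)  # LLM replies can exceed httpx's 5-second default timeout
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json()["response"])

Start a pipeline job via the agent: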
your_session_cookie = "..."  # placeholder: the sessionid cookie of a logged-in session
response = httpx.post("http://127.0.0.1:8000/api/chat/", json={
    "query": "Run GSEA on my expression file at /uploads/expr.gct using the Hallmarks gene set.",
    "model_id": "gpt-4o",
    "conversation_id": "existing-uuid-or-omit-for-new",
}, cookies={"sessionid": your_session_cookie}, timeout=60.0)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json()["response"])

The agent will automatically invoke the appropriate GenePattern MCP tools, submit the job, and report back, all in one conversational turn.
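Continuing a conversation across turns could be wrapped as below. This is a sketch using only the standard library: the `query`, `model_id`, and `conversation_id` request fields come from the examples above, while the helper names are hypothetical, authentication is not handled, and the idea that the reply carries a `conversation_id` to reuse is an unverified assumption.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local dev server from the setup steps


def build_chat_payload(query, model_id="gpt-4o", conversation_id=None):
    """Build the JSON body for POST /api/chat/.

    conversation_id is left out for a new conversation.
    """
    payload = {"query": query, "model_id": model_id}
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    return payload


def send_chat(payload, base_url=BASE_URL, timeout=120):
    """POST the payload and return the decoded JSON reply (needs a running server)."""
    request = urllib.request.Request(
        f"{base_url}/api/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)


if __name__ == "__main__":
    first = build_chat_payload("Which modules support RNA-seq differential expression?")
    print(json.dumps(first, indent=2))
    # With a server running, a follow-up turn might look like this.
    # ASSUMPTION: the reply includes a "conversation_id" field.
    # reply = send_chat(first)
    # follow_up = build_chat_payload(
    #     "Run the first one.", conversation_id=reply.get("conversation_id")
    # )
```

Keeping payload construction separate from the network call makes the request shape easy to inspect or unit-test without a live server.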
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/chat/` | Send a message; starts or continues a conversation |
| `GET` | `/api/conversations/` | List the authenticated user's conversations |
| `GET` | `/api/conversations/<id>/` | Retrieve full turn-by-turn history for a conversation |
| `POST` | `/api/rate/<query_id>/` | Submit a thumbs-up / thumbs-down rating |
| `GET` | `/api/models/` | List all configured and enabled LLM models |
| `GET` | `/api/token-summary/` | Admin: aggregate token usage statistics |
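The endpoints in the table can be exercised from a script. The sketch below builds (but does not send) requests with the standard library. The paths come from the table, whereas the helper names and the `{"rating": ...}` body for the rating endpoint are assumptions, since the request schema is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local dev server; adjust for your deployment


def conversation_request(conversation_id, base_url=BASE_URL):
    """Build a GET request for one conversation's turn-by-turn history."""
    return urllib.request.Request(
        f"{base_url}/api/conversations/{conversation_id}/", method="GET"
    )


def rating_request(query_id, thumbs_up, base_url=BASE_URL):
    """Build a POST request that rates a response.

    ASSUMPTION: the body shape {"rating": "up" | "down"} is hypothetical;
    check the browsable API for the real field names.
    """
    body = json.dumps({"rating": "up" if thumbs_up else "down"}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/rate/{query_id}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = rating_request("1234", thumbs_up=True)
    print(req.get_method(), req.full_url)
```

Either request can be sent with `urllib.request.urlopen(req)` once the server is running and the session is authenticated.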
GenePattern Copilot is designed to fit in, not to be a silo:
- GenePattern Notebook: Run the backend locally alongside a GenePattern Notebook instance to combine programmatic LLM access with interactive notebook workflows.
- MCP Tool Protocol: Any MCP-compatible tool server (local or remote) can be plugged in, making it trivial to extend the agent with new capabilities.
- ChromaDB / RAG: Swap in your own document corpus (publications, SOPs, lab wikis) by pointing the vector store builder at a new source directory.
- Django REST Framework: The browsable API (`/api/`) makes it easy to explore all endpoints interactively without a frontend.
- Docker / Kubernetes: The included `Dockerfile` makes cloud deployment a single command away.
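For the ChromaDB / RAG item above, preparing a custom corpus usually means walking a source directory and splitting each file into overlapping chunks before embedding. The sketch below shows only that preparation step with the standard library. `iter_chunks`, `load_corpus`, the chunk sizes, and the file extensions are illustrative assumptions, not the project's actual vector store builder.

```python
from pathlib import Path


def iter_chunks(text, size=800, overlap=100):
    """Yield overlapping character windows of `text`, skipping blank ones."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():
            yield chunk
        if start + size >= len(text):
            break  # the current window already reached the end of the text


def load_corpus(source_dir, extensions=(".md", ".txt")):
    """Return (chunk_id, chunk_text) pairs for matching files under source_dir."""
    records = []
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            text = path.read_text(encoding="utf-8", errors="replace")
            rel = path.relative_to(source_dir).as_posix()
            for index, chunk in enumerate(iter_chunks(text)):
                records.append((f"{rel}-{index}", chunk))
    return records
```

The resulting pairs could then be handed to whatever embedding and vector store step the deployment uses.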
# Build
docker build -t genepattern/copilot .
# Run (pass your .env at runtime)
docker run -p 8000:8000 --env-file .env genepattern/copilot

We built this for the bioinformatics community: researchers, analysts, and developers alike. Contributions are warmly welcomed.
- Found a bug? Open an issue; the more detail, the better.
- Have a feature idea? Start a GitHub Discussion; we read everything.
- Want to contribute code? Fork the repo, make your changes, and open a PR. Please include tests.
- GenePattern community support: groups.google.com/g/genepattern-help
If GenePattern Copilot contributes to published research, please cite:
Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. Nature Genetics 38, no. 5 (2006): 500-501.
| Variable | Required | Default | Description |
|---|---|---|---|
| `SECRET_KEY` | Yes | None | Django secret key. Generate with `secrets.token_hex(50)`. |
| `DEBUG` | No | `True` | Set `False` in production. |
| `ALLOWED_HOSTS` | No | `127.0.0.1,localhost` | Comma-separated list of allowed hostnames. |
| `OPENAI_API_KEY` | No | None | OpenAI API key for GPT models. |
| `GOOGLE_GEMINI_API_KEY` | No | None | Google API key for Gemini models. |
| `AWS_ACCESS_KEY_ID` | No | None | AWS credentials for Bedrock models. |
| `AWS_SECRET_ACCESS_KEY` | No | None | AWS credentials for Bedrock models. |
| `CORS_ALLOWED_ORIGINS` | No | `http://localhost:3000` | Frontend origins permitted for cross-site requests. |
| `DATABASE_URL` | No | SQLite | Any dj-database-url-compatible connection string. |
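The table implies a few parsing rules: `DEBUG` is a boolean, `ALLOWED_HOSTS` and `CORS_ALLOWED_ORIGINS` are comma-separated, and several variables have defaults. A sketch of how a settings module might read them follows. The `load_settings` helper and the set of accepted truthy strings are assumptions, not the project's actual `settings.py`, and `DATABASE_URL` is passed through unparsed.

```python
import os


def _as_bool(value):
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _as_list(value):
    """Split a comma-separated string, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env=None):
    """Read the documented variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    if not env.get("SECRET_KEY"):
        raise RuntimeError("SECRET_KEY is required")
    return {
        "SECRET_KEY": env["SECRET_KEY"],
        "DEBUG": _as_bool(env.get("DEBUG", "True")),
        "ALLOWED_HOSTS": _as_list(env.get("ALLOWED_HOSTS", "127.0.0.1,localhost")),
        "CORS_ALLOWED_ORIGINS": _as_list(
            env.get("CORS_ALLOWED_ORIGINS", "http://localhost:3000")
        ),
        # None here stands for the SQLite default listed in the table.
        "DATABASE_URL": env.get("DATABASE_URL"),
    }
```

Failing fast on a missing `SECRET_KEY` mirrors the table's only required entry; everything else falls back to its documented default.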