🧬 GenePattern Copilot

Conversational AI for genomic science: ask questions in plain English, get expert-level bioinformatics answers.

Python 3.11+ · Django 5.2 · License: BSD 3-Clause · Powered by GenePattern


Why GenePattern Copilot?

Bioinformatics pipelines are powerful, but they've never been easy to talk to. Researchers spend hours navigating documentation, chasing pipeline parameters, and decoding opaque error messages.

We built GenePattern Copilot to change that. It wraps the full GenePattern analysis ecosystem in a multi-turn conversational AI layer, so you can:

  • 🗣️ Ask "Run a GSEA analysis on my expression data" and get a step-by-step guided answer.
  • 🔍 Query the GenePattern module library using natural language instead of clicking through menus.
  • 🤖 Automate job submission, result retrieval, and pipeline orchestration via an AI agent with live tool use.
  • 📜 Audit every reasoning step: the model's chain-of-thought, tool calls, and retrieved context are all stored and queryable.

This is not a generic chatbot bolted onto a bioinformatics portal. GenePattern Copilot is a purpose-built agentic AI that understands GenePattern's job model, module catalog, file system, and user permissions, and acts on them in real time.


✨ Features

| Feature | Description |
| --- | --- |
| 🧬 Genomic Integration | Deep integration with GenePattern modules, tasks, and pipelines. The agent can submit jobs, fetch results, and interpret outputs. |
| 🤖 Multi-Model AI | Switch between configured LLM providers (OpenAI, Google Gemini, AWS Bedrock) with a single menu click. |
| 🔧 Agentic Tool Use | Powered by Pydantic AI + MCP (Model Context Protocol), the assistant has 30+ live tools covering job management, file I/O, user stats, and server introspection. |
| 🗂️ RAG Pipeline | A ChromaDB vector store indexes GenePattern documentation and forum posts so answers are grounded in retrieved source material. |
| ☁️ Cloud Scalability | Containerized with Docker and deployed behind a Django ASGI server, ready to scale horizontally on GCP, AWS, or any Kubernetes cluster. |
| 📊 Reproducible Science | Every conversation, query, reasoning step, token count, and user rating is persisted to a relational database, so your interactions become a first-class research artifact. |
| 🔐 Auth & Multi-User | Session + token authentication, per-user GenePattern API key storage, and admin-only analytics endpoints. |
| 👍 Human Feedback Loop | Thumbs-up / thumbs-down ratings on every response feed directly into a queryable evaluation dataset. |

🚀 Quick Start

1 · Clone & create environment

git clone https://github.com/genepattern/GPCopilot.git
cd GPCopilot

# Using Conda (recommended)
conda create -n gp_copilot python=3.11
conda activate gp_copilot

# Or using venv
python -m venv venv && source venv/bin/activate

2 · Install dependencies

pip install -r requirements.txt

3 · Configure environment variables

Create a .env file in the project root:

# ── Django ────────────────────────────────────────────────────────────────────
SECRET_KEY='your-strong-django-secret-key'   # python -c "import secrets; print(secrets.token_hex(50))"
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1
CSRF_TRUSTED_ORIGINS=http://localhost:3000,http://127.0.0.1:8000

# ── LLM API Keys (add only the providers you use) ─────────────────────────────
OPENAI_API_KEY='sk-...'
GOOGLE_GEMINI_API_KEY='AIza...'
AWS_ACCESS_KEY_ID='AKIA...'
AWS_SECRET_ACCESS_KEY='...'

# ── CORS ──────────────────────────────────────────────────────────────────────
CORS_ALLOWED_ORIGINS=http://localhost:3000

Tip: You only need the API keys for the LLM providers you intend to use. The server will gracefully skip unconfigured providers.
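
The "skip unconfigured providers" behaviour can be pictured with a small sketch. This is an illustration only, not the project's actual startup code: the `PROVIDER_KEYS` mapping, its provider labels, and the `configured_providers` helper are hypothetical names. Only the environment variable names come from the `.env` example above.

```python
import os

# Hypothetical mapping from a provider label to the variables it needs.
# The variable names come from the .env example; the labels are illustrative.
PROVIDER_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GOOGLE_GEMINI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def configured_providers(env=None):
    """Return the providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, keys in PROVIDER_KEYS.items()
        if all(env.get(key, "").strip() for key in keys)
    ]


if __name__ == "__main__":
    print("Configured providers:", configured_providers())
```

Running this in the shell where your `.env` values are exported is a quick way to confirm which keys are visible before starting the server.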

4 · Initialize the database

python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser   # optional: enables the /admin dashboard

5 · Start the server

python manage.py runserver
# → http://127.0.0.1:8000/

💬 API at a Glance

GenePattern Copilot can also be embedded in other apps via a simple RESTful API. For example, to ask a question directly:

import httpx

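response = httpx.post("http://127.0.0.1:8000/api/chat/", json={
    "query": "What GenePattern modules are available for RNA-seq differential expression?",
    "model_id": "gpt-4o",
}, timeout=120)  # httpx defaults to a 5 s timeout, which an LLM reply can exceed
response.raise_for_status()  # surface 4xx/5xx errors instead of a KeyError below
print(response.json()["response"])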

Start a pipeline job via the agent:

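import httpx

your_session_cookie = "..."  # placeholder: sessionid value from an authenticated session

response = httpx.post("http://127.0.0.1:8000/api/chat/", json={
    "query": "Run GSEA on my expression file at /uploads/expr.gct using the Hallmarks gene set.",
    "model_id": "gpt-4o",
    "conversation_id": "existing-uuid-or-omit-for-new",
}, cookies={"sessionid": your_session_cookie}, timeout=120)
response.raise_for_status()  # surface 4xx/5xx errors instead of a KeyError below
print(response.json()["response"])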

The agent will automatically invoke the appropriate GenePattern MCP tools, submit the job, and report back, all in one conversational turn.
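
For multi-turn use, a client has to send the same `conversation_id` on each follow-up. The sketch below shows one way to do that, using only the standard library so it runs without extra dependencies. It rests on an assumption: that the reply from `/api/chat/` includes a `conversation_id` field. The README only documents the `response` field, so treat that name as hypothetical. `ChatSession`, `build_chat_payload`, and `post_json` are illustrative names, not part of the project, and authentication is left out.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/api/chat/"  # local dev server from the Quick Start


def build_chat_payload(query, model_id="gpt-4o", conversation_id=None):
    """Build the JSON body; omit conversation_id to start a new conversation."""
    payload = {"query": query, "model_id": model_id}
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    return payload


def post_json(url, payload, timeout=120):
    """POST a JSON body with the standard library and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


class ChatSession:
    """Carries the conversation id from one turn to the next."""

    def __init__(self, send=post_json, url=API_URL):
        self.send = send  # injectable so the class can be exercised offline
        self.url = url
        self.conversation_id = None

    def ask(self, query, model_id="gpt-4o"):
        payload = build_chat_payload(query, model_id, self.conversation_id)
        body = self.send(self.url, payload)
        # ASSUMPTION: the reply carries "conversation_id"; keep the old id if not.
        self.conversation_id = body.get("conversation_id", self.conversation_id)
        return body["response"]
```

With the server running, `session = ChatSession()` followed by repeated `session.ask(...)` calls would keep every question in one conversation, provided the assumed field exists.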

Key Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | `/api/chat/` | Send a message; starts or continues a conversation |
| GET | `/api/conversations/` | List the authenticated user's conversations |
| GET | `/api/conversations/<id>/` | Retrieve full turn-by-turn history for a conversation |
| POST | `/api/rate/<query_id>/` | Submit a thumbs-up / thumbs-down rating |
| GET | `/api/models/` | List all configured and enabled LLM models |
| GET | `/api/token-summary/` | Admin: aggregate token usage statistics |
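
The parameterized paths above (`<id>`, `<query_id>`) can be filled in with a small helper. This is a sketch under stated assumptions: `endpoint` and `rating_request` are hypothetical helpers, and the rating body `{"rating": "up" | "down"}` is a guess at the payload shape, which the README does not document. Check the browsable API for the real field names.

```python
from urllib.parse import quote, urljoin

BASE_URL = "http://127.0.0.1:8000"  # local dev server from the Quick Start


def endpoint(path_template, **params):
    """Fill a template such as '/api/rate/{query_id}/' and join it to BASE_URL."""
    # Percent-encode each value so it cannot introduce extra path segments.
    safe_params = {key: quote(str(value), safe="") for key, value in params.items()}
    return urljoin(BASE_URL, path_template.format(**safe_params))


def rating_request(query_id, thumbs_up):
    """Return (method, url, json_body) describing a rating call.

    ASSUMPTION: the body shape {"rating": "up" | "down"} is hypothetical.
    """
    body = {"rating": "up" if thumbs_up else "down"}
    return "POST", endpoint("/api/rate/{query_id}/", query_id=query_id), body
```

The returned tuple can be handed to whichever HTTP client you already use, for example `httpx.request(method, url, json=body)`.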

🧩 Ecosystem & Interoperability

GenePattern Copilot is designed to fit in, not to be a silo:

  • GenePattern Notebook: Run the backend locally alongside a GenePattern Notebook instance to combine programmatic LLM access with interactive notebook workflows.
  • MCP Tool Protocol: Any MCP-compatible tool server (local or remote) can be plugged in, making it trivial to extend the agent with new capabilities.
  • ChromaDB / RAG: Swap in your own document corpus (publications, SOPs, lab wikis) by pointing the vector store builder at a new source directory.
  • Django REST Framework: The browsable API (/api/) makes it easy to explore all endpoints interactively without a frontend.
  • Docker / Kubernetes: The included Dockerfile makes cloud deployment a single command away.
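
To make the "swap in your own corpus" idea more concrete, here is a rough sketch of the preparation step that usually precedes indexing: walking a source directory and splitting each document into overlapping chunks. It is not the project's vector store builder and does not call ChromaDB. `chunk_text`, `load_corpus`, the `*.txt` pattern, and the chunk sizes are all illustrative assumptions.

```python
from pathlib import Path


def chunk_text(text, size=500, overlap=50):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = (text[start:start + size] for start in range(0, len(text), step))
    return [window for window in windows if window.strip()]  # drop blank windows


def load_corpus(source_dir, pattern="*.txt", size=500, overlap=50):
    """Yield (chunk_id, chunk) pairs for every matching file under source_dir."""
    root = Path(source_dir)
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        relative = path.relative_to(root).as_posix()
        for index, chunk in enumerate(chunk_text(text, size, overlap)):
            # The id combines the relative path and the chunk index, so it stays
            # unique across subdirectories.
            yield f"{relative}:{index}", chunk
```

The resulting ids and chunks are the kind of input a vector store collection typically accepts as parallel `ids` and `documents` lists.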

🐳 Docker Deployment

# Build
docker build -t genepattern/copilot .

# Run (pass your .env at runtime)
docker run -p 8000:8000 --env-file .env genepattern/copilot

🤝 Join the Conversation

We built this for the bioinformatics community: researchers, analysts, and developers alike. Contributions are warmly welcomed.

  • 🐛 Found a bug? Open an issue; the more detail, the better.
  • 💡 Have a feature idea? Start a GitHub Discussion; we read everything.
  • 🔧 Want to contribute code? Fork the repo, make your changes, and open a PR. Please include tests.
  • 💬 GenePattern community support: groups.google.com/g/genepattern-help

📖 Citing This Work

If GenePattern Copilot contributes to published research, please cite:

Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. Nature Genetics 38, no. 5 (2006): 500-501.


⚙️ Configuration Reference

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| SECRET_KEY | ✅ | (none) | Django secret key. Generate with secrets.token_hex(50). |
| DEBUG | | True | Set False in production. |
| ALLOWED_HOSTS | | 127.0.0.1,localhost | Comma-separated list of allowed hostnames. |
| OPENAI_API_KEY | | (none) | OpenAI API key for GPT models. |
| GOOGLE_GEMINI_API_KEY | | (none) | Google API key for Gemini models. |
| AWS_ACCESS_KEY_ID | | (none) | AWS credentials for Bedrock models. |
| AWS_SECRET_ACCESS_KEY | | (none) | AWS credentials for Bedrock models. |
| CORS_ALLOWED_ORIGINS | | http://localhost:3000 | Frontend origins permitted for cross-site requests. |
| DATABASE_URL | | SQLite | Any dj-database-url-compatible connection string. |
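
As a rough illustration of how these variables and defaults might be read, the sketch below parses a few of them from the environment. It is a minimal sketch, not the project's actual settings module: `env_bool`, `env_list`, and `load_settings` are hypothetical helpers, and the set of strings accepted as "true" is an assumption.

```python
import os


def env_bool(name, default, env=None):
    """Read a boolean flag; unset means `default`."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    # ASSUMPTION: these spellings count as true; anything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default, env=None):
    """Read a comma-separated list, trimming whitespace and empty items."""
    env = os.environ if env is None else env
    return [item.strip() for item in env.get(name, default).split(",") if item.strip()]


def load_settings(env=None):
    """Collect the settings, applying the defaults from the table above."""
    env = os.environ if env is None else env
    if not env.get("SECRET_KEY"):
        raise RuntimeError("SECRET_KEY is required")
    return {
        "SECRET_KEY": env["SECRET_KEY"],
        "DEBUG": env_bool("DEBUG", True, env),
        "ALLOWED_HOSTS": env_list("ALLOWED_HOSTS", "127.0.0.1,localhost", env),
        "CORS_ALLOWED_ORIGINS": env_list(
            "CORS_ALLOWED_ORIGINS", "http://localhost:3000", env
        ),
    }
```

Because `DEBUG` defaults to True, a production deployment has to set `DEBUG=False` explicitly, as the table notes.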
