ProtSearch is a full-stack web application that helps researchers discover and summarize scientific literature about proteins. It searches EuropePMC for relevant papers, validates protein names, and generates AI-powered summaries of the research findings.
- Protein Search: Search for scientific papers related to one or more proteins
- Protein Validation: Automatic validation and suggestions for protein names using gene alias services
- Flexible Search Modes:
  - Search for papers containing all specified proteins together (AND mode)
  - Search for papers for each protein individually (OR mode)
- Additional Search Terms: Add custom search terms with AND/OR operators to refine results
- AI-Powered Summaries: Generate comprehensive summaries of research findings using AI (OpenAI or Google Gemini)
- Paper Management: View abstracts, access PubMed links, and copy content for further analysis
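The AND/OR search modes above amount to two different boolean queries. As a rough sketch (the query builder below, including the quoting style, is an illustrative assumption — the backend's actual EuropePMC query construction lives in `pubmedhelper.py`):

```python
def build_query(proteins, mode="AND", extra_terms=None):
    """Join protein names into a boolean query string.

    mode="AND" targets papers mentioning all proteins together;
    mode="OR" targets papers mentioning any of them.
    NOTE: illustrative only -- not the backend's real query syntax.
    """
    joiner = f" {mode} "
    query = joiner.join(f'"{p.strip()}"' for p in proteins)
    if extra_terms:
        # Additional search terms narrow the result set further.
        query = f"({query}) AND ({' OR '.join(extra_terms)})"
    return query

print(build_query(["ACE", "APP", "BACE1"], mode="AND"))
# "ACE" AND "APP" AND "BACE1"
```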
**Frontend:**

- Framework: Next.js 16, React 19, TypeScript
- Styling: Tailwind CSS
- Icons: Heroicons
- State Management: React Hooks, LocalStorage
**Backend:**

- Framework: Flask (Python)
- API: RESTful API with Server-Sent Events for streaming
- Services:
  - EuropePMC integration for paper search
  - Gene alias validation
  - OpenAI/Gemini integration for AI summaries
  - UniProt integration
- Deployment: Docker-ready with Gunicorn
```
protsearchself/
├── protsearch/                # Frontend (Next.js)
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx       # Main search interface
│   │   │   ├── results/
│   │   │   │   └── page.tsx   # Results display page
│   │   │   └── layout.tsx     # Root layout
│   │   ├── env.js             # Environment variable validation
│   │   └── styles/
│   │       └── globals.css    # Global styles
│   ├── public/                # Static assets
│   └── package.json
├── backend/                   # Backend API (Flask)
│   ├── app.py                 # Flask app entry point
│   ├── api/
│   │   └── src/
│   │       ├── index.py       # Main API routes
│   │       ├── services/      # Backend services
│   │       │   ├── pubmedhelper.py
│   │       │   ├── genealias.py
│   │       │   ├── llmhelper.py
│   │       │   ├── summarizationwrapper.py
│   │       │   └── uniprothelper.py
│   │       └── config.yaml
│   ├── requirements.txt
│   └── Dockerfile
├── requirements.txt           # Root-level Python dependencies
└── pyproject.toml             # Python project configuration
```
- Node.js 18+ and npm
- Python 3.8+ (for backend)
- API Keys (optional but recommended):
  - OpenAI API key OR Google Gemini API key for AI summaries
  - Without API keys, the system will use abstracts only
```bash
git clone <repository-url>
cd protsearch
```

**Backend Setup:**

```bash
cd backend

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
# Create a .env file in the backend directory:
# OPENAI_API_KEY=your_key_here (optional)
# GOOGLE_API_KEY=your_key_here (optional)
```

**Frontend Setup:**

```bash
cd ../protsearch

# Install dependencies
npm install

# Set up environment variables (optional)
# Create a .env.local file:
# NEXT_PUBLIC_API_BASE=http://localhost:8080
```

**Terminal 1 - Backend:**

```bash
cd backend
python app.py
# Backend will run on http://localhost:8080
```

**Terminal 2 - Frontend:**

```bash
cd protsearch
npm run dev
# Frontend will run on http://localhost:3000
```

Open http://localhost:3000 in your browser.

**Backend:**

```bash
cd backend
gunicorn --bind 0.0.0.0:8080 --workers 1 --threads 8 app:app
```

**Frontend:**

```bash
cd protsearch
npm run build
npm start
```

The backend includes a Dockerfile for containerized deployment:

```bash
cd backend
docker build -t protsearch-api .
docker run -p 8080:8080 -e PORT=8080 protsearch-api
```
1. Enter Proteins: Input one or more protein names (comma-separated), e.g., `ACE, APP, BACE1`
2. Choose Search Mode:
   - Toggle to search for papers with ALL proteins together
   - Or search for each protein individually
3. Add Search Terms (Optional): Add additional terms to narrow your search with AND/OR operators
4. Configure AI Summary (Optional): Add specific questions or focus areas for the AI summary
5. Provide API Key (Optional): Enter your OpenAI or Google Gemini API key for enhanced summaries
6. Start Search: Click "Start Research" to begin searching
7. Review Results:
   - View papers in the "Papers" tab as they stream in
   - Read AI-generated summaries in the "AI Summary" tab
   - Copy abstracts or summaries for your research
The backend provides the following endpoints:
- `POST /api/search_start` - Start a new search session
- `GET /api/search_events?session_id=<id>` - Stream search results via SSE
- `POST /api/suggest` - Get protein name suggestions/validation
- `POST /api/summarize` - Generate AI summary for a session
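A search session might be started from a script along these lines. The request body fields (`proteins`, `mode`) and the `session_id` response field are assumptions about the payload, not a documented schema:

```python
import json
import urllib.request

# Hypothetical payload for POST /api/search_start -- field names are
# illustrative assumptions, not a documented API schema.
payload = json.dumps({"proteins": ["ACE", "BACE1"], "mode": "AND"}).encode()

req = urllib.request.Request(
    "http://localhost:8080/api/search_start",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending the request requires a running backend:
# with urllib.request.urlopen(req) as resp:
#     session_id = json.load(resp)["session_id"]  # response field name assumed
```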
**Backend:**

- `OPENAI_API_KEY` - OpenAI API key for summaries (optional)
- `GOOGLE_API_KEY` - Google Gemini API key for summaries (optional)
- `PORT` - Server port (default: 8080)
- `LOGLEVEL` - Logging level (default: INFO)
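These variables can be read with stdlib fallbacks to the documented defaults; how `app.py` actually reads them is an assumption, so treat this as a generic sketch:

```python
import logging
import os

# Fall back to the documented defaults when the variables are unset.
port = int(os.environ.get("PORT", "8080"))
loglevel = os.environ.get("LOGLEVEL", "INFO")
openai_key = os.environ.get("OPENAI_API_KEY")  # None -> abstract-only mode

logging.basicConfig(level=getattr(logging, loglevel.upper(), logging.INFO))
```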
**Frontend:**

- `NEXT_PUBLIC_API_BASE` - Backend API URL (default: production API URL)
- `npm run dev` - Start development server
- `npm run build` - Build for production
- `npm run start` - Start production server
- `npm run lint` - Run ESLint
- `npm run lint:fix` - Fix ESLint errors
- `npm run typecheck` - Run TypeScript type checking
- `npm run format:check` - Check code formatting
- `npm run format:write` - Format code
The backend uses Flask with threading for concurrent request handling. Key services:
- `pubmedhelper.py`: PubMed API integration
- `genealias.py`: Gene name validation and alias resolution
- `llmhelper.py`: OpenAI/Gemini integration
- `summarizationwrapper.py`: Summary generation orchestration
- `uniprothelper.py`: UniProt database integration
API keys are optional but enhance functionality:

- With API Key:
  - Uses full paper content when available
  - Better quality AI summaries
  - Access to more comprehensive results
- Without API Key:
  - Uses abstracts only
  - Limited summary capabilities
  - Still fully functional for paper discovery
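In essence, this is a simple branch when choosing what text to summarize. A sketch (the helper and the `fulltext`/`abstract` field names below are hypothetical; the real logic lives in `summarizationwrapper.py`):

```python
def select_summary_input(paper, api_key=None):
    """Prefer full text when an API key is present; otherwise use the abstract.

    Hypothetical helper -- the field names ('fulltext', 'abstract') are assumed.
    """
    if api_key and paper.get("fulltext"):
        return paper["fulltext"]
    return paper.get("abstract", "")

paper = {"abstract": "Short abstract.", "fulltext": "Full body text."}
print(select_summary_input(paper, api_key="sk-..."))  # -> Full body text.
print(select_summary_input(paper))                    # -> Short abstract.
```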
API keys can be provided:
- In the frontend UI (stored in browser cookies)
- As environment variables in the backend
- Per-request in API calls
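With three possible sources, the backend needs a precedence order when more than one key is present. One plausible resolution, shown here as a sketch (the ordering — per-request over cookie over environment — is an assumption, not documented behavior):

```python
import os

def resolve_api_key(request_key=None, cookie_key=None):
    """Pick the first available key: per-request, then cookie, then env vars.

    The precedence order here is an illustrative assumption.
    """
    return (
        request_key
        or cookie_key
        or os.environ.get("OPENAI_API_KEY")
        or os.environ.get("GOOGLE_API_KEY")
    )
```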
Contributions are welcome! Please feel free to submit a Pull Request.
- The backend uses Server-Sent Events (SSE) for real-time result streaming
- Session management is handled in-memory (consider Redis for production scaling)
- The frontend stores results in localStorage for persistence across page refreshes
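Consuming the SSE stream boils down to splitting on blank lines and reading `data:` fields. A stdlib-only sketch of that parsing (the JSON payload shape in the example is an assumption):

```python
import json

def parse_sse(stream_text):
    """Extract JSON payloads from the 'data:' lines of a raw SSE stream."""
    events = []
    # SSE frames are separated by a blank line.
    for frame in stream_text.split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in frame.splitlines()
            if line.startswith("data:")
        ]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events

raw = 'data: {"type": "paper", "title": "BACE1 review"}\n\ndata: {"type": "done"}\n\n'
events = parse_sse(raw)
print(events[0]["title"])  # BACE1 review
```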