Skip to content

Serve all client API requests from DB instead of ScrapingBee #37

@Jing-yilin

Description

@Jing-yilin

Problem

Currently, all client requests for trending/newest/ending sorts call ScrapingBee API, consuming 1 credit per request. This is wasteful since:

  • DB already has 5,614+ campaigns updated daily via cron
  • Multiple users requesting the same data = duplicate API calls
  • Unnecessary cost (~/mo plan being exhausted quickly)

Current Flow

User → API → ScrapingBee (2-5s, 1 credit) → Response

Current Credit Usage

  • Every trending request: 1 credit
  • Every newest request: 1 credit
  • Every ending request: 1 credit
  • Hot sort: already uses DB ✓

Solution

Serve all client queries from database:

User → API → PostgreSQL (<5ms, 0 credit) → Response

DB Mapping

  • trending → ORDER BY velocity_24h DESC
  • newest → ORDER BY first_seen_at DESC
  • ending → ORDER BY deadline ASC
  • hot → ORDER BY velocity_24h DESC (already implemented)

Required Changes

  1. campaigns.go: Replace ScrapingBee calls with DB queries
  2. model.go: Add indexes for performance
  3. db.go: Add composite indexes migration

Benefits

  • Cost: ~90% reduction in ScrapingBee usage
  • Speed: 2-5s → <5ms response time
  • Scalability: Support unlimited concurrent users
  • Data freshness: Still updated daily via cron (2:00 UTC)

Implementation

Add DB Indexes

CREATE INDEX idx_campaigns_trending ON campaigns(state, velocity_24h DESC, percent_funded DESC);
CREATE INDEX idx_campaigns_newest ON campaigns(state, first_seen_at DESC);
CREATE INDEX idx_campaigns_ending ON campaigns(state, deadline ASC);
CREATE INDEX idx_campaigns_category_trending ON campaigns(state, category_id, velocity_24h DESC);

Update Handler

All sort modes served from DB with offset-based pagination.

Related

  • Nightly cron already populates DB with fresh data
  • ScrapingBee only used for:
    • Nightly crawl (cron job)
    • User search queries (manual /search endpoint)

Acceptance Criteria

  • All trending/newest/ending requests use DB
  • Response time < 10ms for first page
  • Pagination works correctly
  • Category filtering works
  • ScrapingBee fallback only if DB unavailable
  • Add indexes to optimize queries
  • Verify credit usage drops to near-zero for client requests

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions