# Adding a Supported Service

This guide explains how to add a new supported service type to the control plane.
MCP is the reference implementation throughout. All touch points are summarized in
the [checklist](#checklist-adding-a-new-service-type) at the end.

> [!NOTE]
> **Last updated after PR #280** (MCP YAML config) and PostgREST design work.
>
> Configuration delivery is now **service-type-specific**:
> - **MCP**: bind-mounted YAML files (`config.yaml`, `tokens.yaml`, `users.yaml`)
>   at `/app/data/`, managed by `MCPConfigResource`.
> - **PostgREST** (planned): environment variables (`PGRST_*` + libpq `PG*`),
>   injected directly into the container spec.
>
> Service user provisioning is also per-type: MCP gets fine-grained public-schema
> grants; PostgREST will get `NOINHERIT` + anonymous role grant.

## Overview

A supported service is a containerized application deployed alongside a pgEdge
database. Services are declared in `DatabaseSpec.Services`, provisioned after
the database is available, and deleted when the database is deleted.

One `ServiceSpec` produces N `ServiceInstance` records — one per `host_id` in
the spec. Each instance runs as its own Docker Swarm service on the database's
overlay network, with its own dedicated Postgres credentials.

## Design Constraint: Services Depend on the Database Only

The current architecture constrains every service to have exactly one
dependency: the parent database. There is no concept of service provisioning
ordering or dependencies between services. All services declared in a database
spec are provisioned independently and in parallel once the database is
available. 
+ +This means: + +- A service cannot declare a dependency on another service +- There is no ordering guarantee between services (service A may start before + or after service B) +- A service cannot discover another service's hostname or connection info at + provisioning time + +This constraint keeps the model simple and parallelizable, but it will likely +need to be relaxed in the future. Scaling out the AI workbench will require +multi-container service types where one component depends on another (e.g., a +web client that proxies to an API server). Supporting this will require +introducing service-to-service dependencies, provisioning ordering, and +health-gated startup — effectively a dependency graph within the service layer. + +The data flow from API request to running container looks like this: + +```text +API Request (spec.services) + → Validation (API layer) + → Store spec in etcd + → UpdateDatabase workflow + → PlanUpdate sub-workflow + → For each (service, host_id): + → Resolve service image from registry + → Validate Postgres/Spock version compatibility + → Generate resource objects (network, user role, spec, instance, monitor) + → Merge service resources into EndState + → Compute plan (diff current state vs. desired state) + → Apply plan (execute resource Create/Update/Delete) + → Service container running +``` + +## API Layer + +The `ServiceSpec` Goa type is defined in `api/apiv1/design/database.go`. It +lives inside the `DatabaseSpec` as an array: + +```go +g.Attribute("services", g.ArrayOf(ServiceSpec), func() { ... 
}) +``` + +The `ServiceSpec` type has these attributes: + +| Attribute | Type | Description | +|-----------|------|-------------| +| `service_id` | `Identifier` | Unique ID for this service within the database | +| `service_type` | `String` (enum) | The type of service (`"mcp"`, `"postgrest"`) | +| `version` | `String` | Semver (e.g., `"1.0.0"`) or `"latest"` | +| `host_ids` | `[]Identifier` | Which hosts should run this service (one instance per host) | +| `port` | `Int` (optional) | Host port to publish; `0` = random; omitted = not published | +| `config` | `MapOf(String, Any)` | Service-specific configuration | +| `cpus` | `String` (optional) | CPU limit; accepts SI suffix `m` (e.g., `"500m"`, `"1"`) | +| `memory` | `String` (optional) | Memory limit in SI or IEC notation (e.g., `"512M"`, `"1GiB"`) | + +To add a new service type, add its string value to the enum on `service_type`: + +```go +g.Attribute("service_type", g.String, func() { + g.Enum("mcp", "postgrest", "my-new-service") +}) +``` + +The `config` attribute is intentionally `MapOf(String, Any)` so the API schema +doesn't change when new service types are added. Config structure is validated +at the application layer (see [Validation](#validation)). + +After editing the design file, regenerate the API code: + +```sh +make -C api generate +``` + +## Validation + +There are two validation layers that catch different classes of errors. + +### API validation (HTTP 400) + +`validateServiceSpec()` in `server/internal/api/apiv1/validate.go` runs on +every API request that includes services. It performs fast, syntactic checks: + +1. **service_id**: must be a valid identifier (the shared `validateIdentifier` + function) +2. **service_type**: must be in the allowlist. Currently this is a direct check: + ```go + if svc.ServiceType != "mcp" { + // error: unsupported service type + } + ``` + Add your new type here. +3. **version**: must match semver format or be the literal `"latest"` +4. 
**host_ids**: must be unique within the service (duplicate host IDs are + rejected) +5. **config**: dispatches to a per-type config validator: + ```go + if svc.ServiceType == "mcp" { + errs = append(errs, validateMCPServiceConfig(svc.Config, ...)...) + } + ``` + +The per-type config validator is where you enforce required fields, type +correctness, and service-specific constraints. + +**MCP** (`validateMCPServiceConfig()`): Delegates to +`database.ParseMCPServiceConfig()` which validates all MCP config fields +(LLM provider, model, API keys, embedding, pool, tool toggles, bootstrap auth). +Accepts an `isUpdate` parameter to reject bootstrap-only fields on updates. + +**PostgREST** (`validatePostgRESTServiceConfig()`): Validates `db_schemas` +(required, non-empty string), `db_anon_role` (required, non-empty string), +`db_pool` (optional, integer 1–30), `max_rows` (optional, integer 1–10000). + +Write a parallel `validateMyServiceConfig()` function for your service type and +add a dispatch branch in `validateServiceSpec()`. + +### Workflow-time validation + +`GetServiceImage()` in `server/internal/orchestrator/swarm/service_images.go` is +called during workflow execution. If the `service_type`/`version` combination is +not registered in the image registry, it returns an error that fails the +workflow task and sets the database to `"failed"` state. + +This catches cases where the API validation passes (valid semver, known type) +but the specific version hasn't been registered. The E2E test +`TestProvisionMCPServiceUnsupportedVersion` in `e2e/service_provisioning_test.go` +demonstrates this: version `"99.99.99"` passes API validation but fails at +workflow time. + +## Service Image Registry + +The image registry maps `(serviceType, version)` pairs to container image +references. It lives in `server/internal/orchestrator/swarm/service_images.go`. 
+ +### Data structures + +```go +type ServiceImage struct { + Tag string // Full image:tag reference + PostgresConstraint *host.VersionConstraint // Optional: restrict PG versions + SpockConstraint *host.VersionConstraint // Optional: restrict Spock versions +} + +type ServiceVersions struct { + images map[string]map[string]*ServiceImage // serviceType → version → image +} +``` + +### Registering a new service type + +Add your image in `NewServiceVersions()`: + +```go +func NewServiceVersions(cfg config.Config) *ServiceVersions { + versions := &ServiceVersions{...} + + // Existing MCP registration + versions.addServiceImage("mcp", "latest", &ServiceImage{ + Tag: serviceImageTag(cfg, "postgres-mcp:latest"), + }) + + // Your new service + versions.addServiceImage("my-service", "1.0.0", &ServiceImage{ + Tag: serviceImageTag(cfg, "my-service:1.0.0"), + }) + + return versions +} +``` + +`serviceImageTag()` prepends the configured registry host +(`cfg.DockerSwarm.ImageRepositoryHost`) unless the image reference already +contains a registry prefix (detected by checking if the first path component +contains a `.`, `:`, or is `localhost`). + +### Version constraints + +If your service is only compatible with specific Postgres or Spock versions, set +the constraint fields: + +```go +versions.addServiceImage("my-service", "1.0.0", &ServiceImage{ + Tag: serviceImageTag(cfg, "my-service:1.0.0"), + PostgresConstraint: &host.VersionConstraint{ + Min: host.MustParseVersion("15"), + Max: host.MustParseVersion("17"), + }, + SpockConstraint: &host.VersionConstraint{ + Min: host.MustParseVersion("4.0.0"), + }, +}) +``` + +`ValidateCompatibility()` checks these constraints against the database's +running versions during `GenerateServiceInstanceResources()` in the workflow. +Constraint failures produce errors like `"postgres version 14 does not satisfy +constraint >=15"`. 
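
The constraint check can be sketched as follows. This is a self-contained illustration of the semantics described above; `Version` and `VersionConstraint` here are simplified stand-ins for the `host` package types, and the field and method names are assumptions, not the real API:

```go
package main

import "fmt"

// Version and VersionConstraint are simplified stand-ins for the host
// package types; names and fields are illustrative assumptions.
type Version struct{ Major, Minor, Patch int }

func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Patch < o.Patch
}

type VersionConstraint struct {
	Min, Max *Version // nil means unbounded on that side
}

// Check returns an error in the style the document quotes when the running
// version falls outside the registered constraint.
func (c *VersionConstraint) Check(component string, running Version) error {
	if c == nil {
		return nil // no constraint registered for this image
	}
	if c.Min != nil && running.Less(*c.Min) {
		return fmt.Errorf("%s version %d does not satisfy constraint >=%d",
			component, running.Major, c.Min.Major)
	}
	if c.Max != nil && c.Max.Less(running) {
		return fmt.Errorf("%s version %d does not satisfy constraint <=%d",
			component, running.Major, c.Max.Major)
	}
	return nil
}

func main() {
	pg := &VersionConstraint{Min: &Version{Major: 15}, Max: &Version{Major: 17}}
	fmt.Println(pg.Check("postgres", Version{Major: 14}))
	// postgres version 14 does not satisfy constraint >=15
	fmt.Println(pg.Check("postgres", Version{Major: 16}) == nil)
	// true
}
```

A nil constraint pointer means "no restriction", which is why `ServiceImage` leaves both constraint fields optional.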
+ +## Resource Lifecycle + +Every service instance is represented by five resources that participate in the +standard plan/apply reconciliation cycle. These resource types are generic — +adding a new service type does **not** require new resource types. + +### Dependency chain + +The resource chain varies by service type. MCP has additional config file +resources; PostgREST uses the base 4-resource chain. + +**MCP (post-PR-280):** +```text +Phase 1: Network (swarm.network) — no dependencies + ServiceUserRole (swarm.service_user_role) — no dependencies +Phase 2: DirResource (filesystem.dir) — no dependencies +Phase 3: MCPConfigResource (swarm.mcp_config) — depends on DirResource + ServiceUserRole +Phase 4: ServiceInstanceSpec (swarm.service_instance_spec) — depends on Network + ServiceUserRole + MCPConfigResource +Phase 5: ServiceInstance (swarm.service_instance) — depends on ServiceUserRole + ServiceInstanceSpec +Phase 6: ServiceInstanceMonitor (monitor.service_instance) — depends on ServiceInstance +``` + +**PostgREST (and base pattern for new services):** +```text +Phase 1: Network (swarm.network) — no dependencies + ServiceUserRole (swarm.service_user_role) — no dependencies +Phase 2: ServiceInstanceSpec (swarm.service_instance_spec) — depends on Network + ServiceUserRole +Phase 3: ServiceInstance (swarm.service_instance) — depends on ServiceUserRole + ServiceInstanceSpec +Phase 4: ServiceInstanceMonitor (monitor.service_instance) — depends on ServiceInstance +``` + +On deletion, the order reverses: monitor first, then instance, spec, and +finally the network and user role. + +### What each resource does + +**Network** (`server/internal/orchestrator/swarm/network.go`): Creates a Docker +Swarm overlay network for the database. The network is shared between Postgres +instances and service instances of the same database, so it deduplicates +naturally via identifier matching (both generate the same +`"{databaseID}-database"` name). 
Uses the IPAM service to allocate subnets. +Runs on `ManagerExecutor`. + +**ServiceUserRole** (`server/internal/orchestrator/swarm/service_user_role.go`): +Manages the Postgres user lifecycle for a service instance. On `Create`, it +generates a deterministic username via `database.GenerateServiceUsername()` (format: +`svc_{serviceID}_{hostID}`), generates a random 32-byte password, and creates a +Postgres role with `LOGIN` attribute. It then runs **service-type-specific grants**: + +- **MCP**: `GRANT CONNECT`, `GRANT USAGE/SELECT ON SCHEMA public`, + `ALTER DEFAULT PRIVILEGES`, `GRANT pg_read_all_settings` +- **PostgREST**: `ALTER ROLE ... NOINHERIT`, `GRANT web_anon TO svc_*`, + `GRANT CONNECT` + +On `Delete`, it drops the role. Runs on `HostExecutor(postgresHostID)` because it +needs Docker access to the Postgres container for direct database connectivity. See +`docs/development/service-credentials.md` for full details on credential generation. + +**ServiceInstanceSpec** (`server/internal/orchestrator/swarm/service_instance_spec.go`): +A virtual resource that generates the Docker Swarm `ServiceSpec`. Its +`Refresh()`, `Create()`, and `Update()` methods all call `ServiceContainerSpec()` +to compute the spec from the current inputs. The computed spec is stored in the +`Spec` field and consumed by the `ServiceInstance` resource during deployment. +`Delete()` is a no-op. Runs on `HostExecutor(hostID)`. + +**ServiceInstance** (`server/internal/orchestrator/swarm/service_instance.go`): +The resource that actually deploys the Docker Swarm service. On `Create`, it +stores an initial etcd record with `state="creating"`, then calls +`client.ServiceDeploy()` with the spec from `ServiceInstanceSpec`, waits up to +5 minutes for the service to start, and transitions the state to `"running"`. On +`Delete`, it scales the service to 0 replicas (waiting for containers to stop), +removes the Docker service, and deletes the etcd record. Runs on +`ManagerExecutor`. 
+ +**ServiceInstanceMonitor** (`server/internal/monitor/service_instance_monitor_resource.go`): +Registers (or deregisters) a health monitor for the service instance. The +monitor periodically checks the service's `/health` endpoint and updates status +in etcd. Runs on `HostExecutor(hostID)`. + +### How resources are generated + +`GenerateServiceInstanceResources()` in +`server/internal/orchestrator/swarm/orchestrator.go` is the entry point. It: + +1. Resolves the `ServiceImage` via `GetServiceImage(serviceType, version)` +2. Validates Postgres/Spock version compatibility if constraints exist +3. Creates base resources (`Network`, `ServiceUserRole`) shared by all service types +4. **Branches by service type** to build the resource chain: + - **MCP**: Parses config via `database.ParseMCPServiceConfig()`, creates + `DirResource` (host directory chowned to UID 1001), creates `MCPConfigResource` + (writes `config.yaml`, `tokens.yaml`, `users.yaml`), then `ServiceInstanceSpec` + (with `DataDirID`) and `ServiceInstance` + - **PostgREST**: Creates `ServiceInstanceSpec` (no `DataDirID`) and + `ServiceInstance` directly — no config file resources needed +5. Serializes resources to `[]*resource.ResourceData` via `resource.ToResourceData()` +6. Returns a `*database.ServiceInstanceResources` wrapper + +The monitor resource is added separately in the workflow layer (see +[Workflow Integration](#workflow-integration)). + +All resource types are registered in +`server/internal/orchestrator/swarm/resources.go` via +`resource.RegisterResourceType[*T](registry, ResourceTypeConstant)`, which +enables the resource framework to deserialize stored state back into typed +structs. + +## Container Spec + +`ServiceContainerSpec()` in `server/internal/orchestrator/swarm/service_spec.go` +builds the Docker Swarm `ServiceSpec` from a `ServiceContainerSpecOptions` +struct. 
The options contain everything needed to build the spec: the
`ServiceSpec`, credentials, image, network IDs, database connection info, and
placement constraints.

The generated spec configures:

- **Placement**: pins the container to a specific Swarm node via a
  `node.id==<node-id>` constraint
- **Networks**: attaches to both the default bridge network (for control plane
  access and external connectivity) and the database overlay network (for
  Postgres connectivity)
- **Port publication**: `buildServicePortConfig()` publishes the container's
  port 8080 in host mode. If the `port` field in the spec is nil, no port is
  published. If it's 0, Docker assigns a random port. If it's a specific value,
  that port is used.
- **Health check**: currently configured as `curl -f http://localhost:8080/health`
  with a 30s start period, 10s interval, 5s timeout, and 3 retries
- **Resource limits**: CPU and memory limits from the spec, if provided

Configuration delivery is **service-type-specific**. `ServiceContainerSpec()`
branches on `ServiceType` to configure each container differently:

| Service Type | Config Delivery | Command | User | Mounts |
|-------------|----------------|---------|------|--------|
| MCP | YAML files at `/app/data/` (managed by `MCPConfigResource`) | `/app/pgedge-postgres-mcp -config /app/data/config.yaml` | UID 1001 | Bind mount: `{DataPath}` → `/app/data` |
| PostgREST | Env vars (`PGRST_*` + libpq `PG*`) via `buildPostgRESTEnvVars()` | Image default | Image default | None |

For a new service type, you will need to:

- Add a `case` branch in the `switch opts.ServiceSpec.ServiceType` block within
  `ServiceContainerSpec()`
- Extend `ServiceContainerSpecOptions` if your service needs additional fields
  (e.g., `DataPath` for bind mounts)
- If your service uses config files, create a config resource (like
  `MCPConfigResource`) and wire it into the resource chain in
  `GenerateServiceInstanceResources()`

## Workflow Integration

The workflow layer is generic. 
No per-service-type changes are needed here. + +### PlanUpdate + +`PlanUpdate` in `server/internal/workflows/plan_update.go` is the sub-workflow +that computes the reconciliation plan. It: + +1. Computes `NodeInstances` from the spec +2. Generates node resources (same as before services existed) +3. Determines a `postgresHostID` for `ServiceUserRole` executor routing — + `ServiceUserRole.Create()` needs local Docker access to the Postgres + container, so it picks the first available Postgres host +4. Iterates `spec.Services` and for each `(service, hostID)` pair, calls + `getServiceResources()` +5. Passes both node and service resources to `operations.UpdateDatabase()` + +### getServiceResources + +`getServiceResources()` in the same file builds an `operations.ServiceResources` +for a single service instance: + +1. Generates the `ServiceInstanceID` via + `database.GenerateServiceInstanceID(databaseID, serviceID, hostID)` +2. Resolves the Postgres hostname via `findPostgresInstance()`, which prefers a + co-located instance (same host as the service) for lower latency but falls + back to any instance in the database +3. Constructs a `database.ServiceInstanceSpec` with all the inputs +4. Fires the `GenerateServiceInstanceResources` activity (executes on the + manager queue) +5. Wraps the result in `operations.ServiceResources`, adding the + `ServiceInstanceMonitorResource` + +### EndState + +`EndState()` in `server/internal/database/operations/end.go` merges service +resources into the desired end state: + +```go +for _, svc := range services { + state, err := svc.State() + end.Merge(state) +} +``` + +Service resources always land in the final plan, after all node operations. This +is because intermediate states (from `UpdateNodes`, `AddNodes`, +`PopulateNodes`) only contain node diffs. `PlanAll` produces one plan per state +transition, so services end up in the last plan (the diff from the last +intermediate state to EndState). 
+ +Resources that exist in the current state but are absent from the end state are +automatically marked `PendingDeletion` by the plan engine, which generates +delete events in reverse dependency order. + +## Domain Model + +### ServiceSpec + +`server/internal/database/spec.go`: + +```go +type ServiceSpec struct { + ServiceID string `json:"service_id"` + ServiceType string `json:"service_type"` + Version string `json:"version"` + HostIDs []string `json:"host_ids"` + Config map[string]any `json:"config"` + Port *int `json:"port,omitempty"` + CPUs *float64 `json:"cpus,omitempty"` + MemoryBytes *uint64 `json:"memory,omitempty"` +} +``` + +This is the spec-level declaration that lives inside `Spec.Services`. It's +service-type-agnostic — no fields are MCP-specific. The `Config` map holds all +service-specific settings. + +### ServiceInstance + +`server/internal/database/service_instance.go`: + +The runtime artifact that tracks an individual container's state. Key fields: +`ServiceInstanceID`, `ServiceID`, `DatabaseID`, `HostID`, `State` +(`creating`/`running`/`failed`/`deleting`), `Status` (container ID, hostname, +IP, port mappings, health), and `Credentials` (`ServiceUser` with username and +password). + +### ID generation + +| Function | Format | Example | +|----------|--------|---------| +| `GenerateServiceInstanceID(dbID, svcID, hostID)` | `"{dbID}-{svcID}-{hostID}"` | `"mydb-mcp-host1"` | +| `GenerateServiceUsername(svcID, hostID)` | `"svc_{svcID}_{hostID}"` | `"svc_mcp_host1"` | +| `GenerateDatabaseNetworkID(dbID)` | `"{dbID}"` | `"mydb"` | + +Usernames longer than 63 characters are truncated with a deterministic hash +suffix. See `docs/development/service-credentials.md` for details. 
+ +### ServiceResources + +`server/internal/database/operations/common.go`: + +```go +type ServiceResources struct { + ServiceInstanceID string + Resources []*resource.ResourceData + MonitorResource resource.Resource +} +``` + +The operations-layer wrapper that bridges the orchestrator output and the +planning system. `Resources` holds the serialized orchestrator resources (from +`GenerateServiceInstanceResources`). `MonitorResource` is the +`ServiceInstanceMonitorResource`. The `State()` method merges both into a +`resource.State` for use in `EndState()`. + +## Testing + +### Unit tests + +**Container spec tests** in +`server/internal/orchestrator/swarm/service_spec_test.go`: + +Table-driven tests for `ServiceContainerSpec()` and its helpers. The test +pattern uses per-case check functions: + +```go +{ + name: "basic MCP service", + opts: &ServiceContainerSpecOptions{...}, + checks: []checkFunc{ + checkLabels(expectedLabels), + checkNetworks("bridge", "my-db-database"), + checkEnv("PGHOST=...", "PGPORT=5432", ...), + checkPlacement("node.id==swarm-node-1"), + checkHealthcheck("/health", 8080), + checkPorts(8080, 5434), + }, +} +``` + +Add test cases for your new service type here, particularly if it has different +health check endpoints, ports, or other spec differences. + +**Image registry tests** in +`server/internal/orchestrator/swarm/service_images_test.go`: + +Tests `GetServiceImage()`, `SupportedServiceVersions()`, and +`ValidateCompatibility()`. Covers both happy path (valid type + version) and +error cases (unsupported type, unregistered version, constraint violations). Add +test cases for your new service type's image registration and any version +constraints. + +### Golden plan tests + +The golden plan tests in +`server/internal/database/operations/update_database_test.go` validate that +service resources are correctly integrated into the plan/apply reconciliation +cycle. + +**How they work:** + +1. 
Build a start `*resource.State` representing the current state of the world +2. Build `[]*operations.NodeResources` and `[]*operations.ServiceResources` + representing the desired state +3. Call `operations.UpdateDatabase()` to compute the plan +4. Summarize the plans via `resource.SummarizePlans()` +5. Compare against a committed JSON golden file via `testutils.GoldenTest[T]` + +**Test helpers** in `server/internal/database/operations/helpers_test.go` define +stub resource types that mirror the real swarm types' `Identifier()`, +`Dependencies()`, `DiffIgnore()`, and `Executor()` without importing the swarm +package. This avoids pulling in the Docker SDK and keeps tests self-contained. +The stubs use the `orchestratorResource` embedding pattern already established +in the file. + +> [!NOTE] +> The service-specific test stubs (`makeServiceResources()` and its companion +> types) are being added as part of PLAT-412. Once merged, `makeServiceResources()` +> will construct a complete set of stub resources for a single service instance, +> serialize them to `[]*resource.ResourceData`, create the real +> `monitor.ServiceInstanceMonitorResource`, and return the +> `operations.ServiceResources` wrapper. 
+ +**Five standard test cases** will cover the full lifecycle: + +| Test case | Start state | Services | Verifies | +|-----------|-------------|----------|----------| +| `single node with service from empty` | empty | 1 service | Service resources created in correct phase order alongside database resources | +| `single node with service no-op` | node + service | same service | Unchanged services produce an empty plan (the core regression test) | +| `add service to existing database` | node only | 1 new service | Only service create events, no database changes | +| `remove service from existing database` | node + service | nil | Service delete events in reverse dependency order, database unchanged | +| `update database node with unchanged service` | node + service | same service | Only database update events, service resources untouched | + +These test cases are generic and apply regardless of service type. To regenerate +golden files after changes: + +```sh +go test ./server/internal/database/operations/... -run TestUpdateDatabase -update +``` + +Always review the generated JSON files in +`golden_test/TestUpdateDatabase/` before committing. + +### E2E tests + +E2E tests in `e2e/service_provisioning_test.go` validate service provisioning +against a real control plane cluster. + +Build tag: `//go:build e2e_test` + +The tests use `fixture.NewDatabaseFixture()` for auto-cleanup and poll the API +for state transitions. 
Key patterns to replicate for a new service type: + +| Pattern | Example test | What it validates | +|---------|-------------|-------------------| +| Single-host provision | `TestProvisionMCPService` | Service reaches `"running"` state | +| Multi-host provision | `TestProvisionMultiHostMCPService` | One instance per host, all reach `"running"` | +| Add to existing DB | `TestUpdateDatabaseAddService` | Service added without affecting database | +| Remove from DB | `TestUpdateDatabaseRemoveService` | Empty `Services` array removes the service | +| Stability | `TestUpdateDatabaseServiceStable` | Unrelated DB update doesn't recreate service (checks `created_at` and `container_id` unchanged) | +| Bad version | `TestProvisionMCPServiceUnsupportedVersion` | Unregistered version fails task, DB goes to `"failed"` | +| Recovery | `TestProvisionMCPServiceRecovery` | Failed DB recovered by updating with valid version | + +Run service E2E tests: + +```sh +make test-e2e E2E_RUN=TestProvisionMCPService +``` + +## Checklist: Adding a New Service Type + +| Step | File | Change | +|------|------|--------| +| 1. API enum | `api/apiv1/design/database.go` | Add to `g.Enum(...)` on `service_type` | +| 2. Regenerate | — | `make -C api generate` | +| 3. Validation | `server/internal/api/apiv1/validate.go` | Add type to allowlist in `validateServiceSpec()`; add `validateMyServiceConfig()` function | +| 4. Image registry | `server/internal/orchestrator/swarm/service_images.go` | Call `versions.addServiceImage()` in `NewServiceVersions()` | +| 5. Container spec | `server/internal/orchestrator/swarm/service_spec.go` | Add `case` branch in `ServiceContainerSpec()` for env vars, command, user, mounts | +| 6. Service user grants | `server/internal/orchestrator/swarm/service_user_role.go` | Add `case` branch in `Create()` for service-type-specific Postgres grants | +| 7. 
Resource chain | `server/internal/orchestrator/swarm/orchestrator.go` | Add `case` branch in `GenerateServiceInstanceResources()` if service needs extra resources (config files, directories) | +| 8. Unit tests | `swarm/service_spec_test.go`, `swarm/service_images_test.go` | Add cases for new type | +| 9. Golden plan tests | `operations/update_database_test.go` | Already covered generically; regenerate with `-update` if resource shape changes | +| 10. E2E tests | `e2e/service_provisioning_test.go` | Add provision, lifecycle, stability, and failure/recovery tests | + +For a concrete example, see `docs/development/Postgrest-design-docs/postgrest-control-plane-changes.md` +which walks through every change needed to add PostgREST as the second service type. + +## What Doesn't Change + +The following are service-type-agnostic and require no modification when adding +a new service type: + +- `ServiceSpec` struct — `server/internal/database/spec.go` +- `ServiceInstance` domain model — `server/internal/database/service_instance.go` +- Workflow code — `server/internal/workflows/plan_update.go` +- Base resource types — `Network`, `ServiceInstanceSpec`, `ServiceInstance`, + `ServiceInstanceMonitor` (all generic) +- Operations layer — `server/internal/database/operations/` (`UpdateDatabase`, + `EndState`) +- Store/etcd layer + +> [!NOTE] +> `orchestrator.go`, `service_user_role.go`, and `service_spec.go` **do** require +> per-service-type branches (for resource chain selection, grant logic, and +> container configuration respectively). These are the three files where +> service-type-specific logic lives. + +## Future Work + +- ~~**Persistent bind mounts**~~: **Done** (PR #280). MCP uses `DirResource` + + bind mount at `/app/data/`. The pattern is available for new service types + that need persistent config or state. +- ~~**Read/write service user accounts**~~: **Partially done** (PR #280 + PostgREST + design). 
`ServiceUserRole.Create()` now branches on `ServiceType` with + per-type grant logic. MCP gets public-schema read + `pg_read_all_settings`. + PostgREST will get `NOINHERIT` + anonymous role grant. Still needed: a + general-purpose `allow_writes` mechanism for MCP (requires the primary-awareness + fix below). +- **Primary-aware database connection routing**: Services currently connect to a + Postgres instance resolved at provisioning time by `findPostgresInstance()`, + which prefers a co-located instance but does not distinguish between primary + and replica. This blocks MCP `allow_writes` and affects PostgREST after + failover. Recommended approach: change `DatabaseHost string` to + `DatabaseHosts []string` in `ServiceContainerSpecOptions`, set `PGHOST` to a + comma-separated list of all node hostnames, and add + `PGTARGETSESSIONATTRS=read-write`. libpq handles primary discovery + automatically. Alternative: route through a Patroni-aware proxy (HAProxy or + pgBouncer) if connection pooling is planned. See + `docs/development/Postgrest-design-docs/postgrest-service-qa.md` Gap 2 for + the full analysis and options. +- **PostgREST schema setup validation**: PostgREST requires `api` schema and + `web_anon` role to exist before deploy. A preflight SQL check would catch + missing prerequisites at API validation time rather than failing silently + (PostgREST starts but returns 404 for all endpoints). + +## Appendix: MCP Reference Implementation for AI-Assisted Development + +> [!CAUTION] +> The code snippets below reflect the **pre-PR-280** state of `main`. 
After PR #280 +> merges, the following will be stale: +> - **A.2** (`validateMCPServiceConfig`): rewritten to delegate to `database.ParseMCPServiceConfig()` +> - **A.4** (`buildServiceEnvVars`, `ServiceContainerSpec`): env vars removed, replaced with YAML config + bind mounts +> - Service user role grants changed from `pgedge_application_read_only` to fine-grained SQL +> +> For PostgREST-specific code changes, see +> `docs/development/Postgrest-design-docs/postgrest-control-plane-changes.md`. + +> [!NOTE] +> This section is designed for consumption by Claude Code (or similar AI +> assistants). When a developer asks Claude to help add a new service type, point +> it at this document. The code below provides complete, copy-editable reference +> implementations from MCP at each touch point, so the assistant can work from +> concrete examples rather than having to read every source file independently. + +### A.1 API Enum (`api/apiv1/design/database.go`) + +The `ServiceSpec` Goa type. To add a new service type, add it to the `g.Enum()` +call on `service_type`: + +```go +// lines 125–191 +var ServiceSpec = g.Type("ServiceSpec", func() { + g.Attribute("service_id", Identifier, func() { + g.Description("The unique identifier for this service.") + g.Example("mcp-server") + g.Example("analytics-service") + g.Meta("struct:tag:json", "service_id") + }) + g.Attribute("service_type", g.String, func() { + g.Description("The type of service to run.") + g.Enum("mcp", "postgrest") // ← add new type here, e.g. 
g.Enum("mcp", "postgrest", "my-service") + g.Example("mcp") + g.Meta("struct:tag:json", "service_type") + }) + g.Attribute("version", g.String, func() { + g.Description("The version of the service in semver format (e.g., '1.0.0') or the literal 'latest'.") + g.Pattern(serviceVersionPattern) // `^\d+\.\d+\.\d+|latest$` + g.Example("1.0.0") + g.Example("latest") + g.Meta("struct:tag:json", "version") + }) + g.Attribute("host_ids", HostIDs, func() { + g.Description("The IDs of the hosts that should run this service.") + g.MinLength(1) + g.Meta("struct:tag:json", "host_ids") + }) + g.Attribute("port", g.Int, func() { + g.Description("The port to publish the service on the host.") + g.Minimum(0) + g.Maximum(65535) + g.Meta("struct:tag:json", "port,omitempty") + }) + g.Attribute("config", g.MapOf(g.String, g.Any), func() { + g.Description("Service-specific configuration.") + g.Meta("struct:tag:json", "config") + }) + g.Attribute("cpus", g.String, func() { + g.Description("CPU limit. Accepts SI suffix 'm', e.g. '500m'.") + g.Pattern(cpuPattern) + g.Meta("struct:tag:json", "cpus,omitempty") + }) + g.Attribute("memory", g.String, func() { + g.Description("Memory limit in SI or IEC notation, e.g. '512M', '1GiB'.") + g.MaxLength(16) + g.Meta("struct:tag:json", "memory,omitempty") + }) + + g.Required("service_id", "service_type", "version", "host_ids", "config") +}) +``` + +After editing, run `make -C api generate`. + +### A.2 Validation (`server/internal/api/apiv1/validate.go`) + +**`validateServiceSpec()`** — the dispatcher. 
Add your type to the allowlist and +config dispatch: + +```go +// lines 229–280 +func validateServiceSpec(svc *api.ServiceSpec, path []string) []error { + var errs []error + + serviceIDPath := appendPath(path, "service_id") + errs = append(errs, validateIdentifier(string(svc.ServiceID), serviceIDPath)) + + // ← add your type to this check + if svc.ServiceType != "mcp" { + err := fmt.Errorf("unsupported service type '%s' (only 'mcp' is currently supported)", svc.ServiceType) + errs = append(errs, newValidationError(err, appendPath(path, "service_type"))) + } + + if svc.Version != "latest" && !semverPattern.MatchString(svc.Version) { + err := errors.New("version must be in semver format (e.g., '1.0.0') or 'latest'") + errs = append(errs, newValidationError(err, appendPath(path, "version"))) + } + + seenHostIDs := make(ds.Set[string], len(svc.HostIds)) + for i, hostID := range svc.HostIds { + hostIDStr := string(hostID) + hostIDPath := appendPath(path, "host_ids", arrayIndexPath(i)) + errs = append(errs, validateIdentifier(hostIDStr, hostIDPath)) + if seenHostIDs.Has(hostIDStr) { + err := errors.New("host IDs must be unique within a service") + errs = append(errs, newValidationError(err, hostIDPath)) + } + seenHostIDs.Add(hostIDStr) + } + + // ← add dispatch for your type here + if svc.ServiceType == "mcp" { + errs = append(errs, validateMCPServiceConfig(svc.Config, appendPath(path, "config"))...) + } + + if svc.Cpus != nil { + errs = append(errs, validateCPUs(svc.Cpus, appendPath(path, "cpus"))...) + } + if svc.Memory != nil { + errs = append(errs, validateMemory(svc.Memory, appendPath(path, "memory"))...) 
+ } + + return errs +} +``` + +**`validateMCPServiceConfig()`** — the per-type config validator to use as a +template: + +```go +// lines 283–330 +func validateMCPServiceConfig(config map[string]any, path []string) []error { + var errs []error + + requiredFields := []string{"llm_provider", "llm_model"} + for _, field := range requiredFields { + if _, ok := config[field]; !ok { + err := fmt.Errorf("missing required field '%s'", field) + errs = append(errs, newValidationError(err, path)) + } + } + + if val, exists := config["llm_provider"]; exists { + provider, ok := val.(string) + if !ok { + err := errors.New("llm_provider must be a string") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("llm_provider")))) + } else { + validProviders := []string{"anthropic", "openai", "ollama"} + if !slices.Contains(validProviders, provider) { + err := fmt.Errorf("unsupported llm_provider '%s' (must be one of: %s)", + provider, strings.Join(validProviders, ", ")) + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("llm_provider")))) + } + + switch provider { + case "anthropic": + if _, ok := config["anthropic_api_key"]; !ok { + err := errors.New("missing required field 'anthropic_api_key' for anthropic provider") + errs = append(errs, newValidationError(err, path)) + } + case "openai": + if _, ok := config["openai_api_key"]; !ok { + err := errors.New("missing required field 'openai_api_key' for openai provider") + errs = append(errs, newValidationError(err, path)) + } + case "ollama": + if _, ok := config["ollama_url"]; !ok { + err := errors.New("missing required field 'ollama_url' for ollama provider") + errs = append(errs, newValidationError(err, path)) + } + } + } + } + + return errs +} +``` + +### A.3 Image Registry (`server/internal/orchestrator/swarm/service_images.go`) + +Register the image in `NewServiceVersions()`: + +```go +// lines 39–68 +func NewServiceVersions(cfg config.Config) *ServiceVersions { + versions := 
&ServiceVersions{ + cfg: cfg, + images: make(map[string]map[string]*ServiceImage), + } + + // MCP service versions + versions.addServiceImage("mcp", "latest", &ServiceImage{ + Tag: serviceImageTag(cfg, "postgres-mcp:latest"), + }) + + // ← add your service here: + // versions.addServiceImage("my-service", "1.0.0", &ServiceImage{ + // Tag: serviceImageTag(cfg, "my-service:1.0.0"), + // }) + + return versions +} +``` + +Supporting types: + +```go +type ServiceImage struct { + Tag string `json:"tag"` + PostgresConstraint *host.VersionConstraint `json:"postgres_constraint,omitempty"` + SpockConstraint *host.VersionConstraint `json:"spock_constraint,omitempty"` +} + +// GetServiceImage resolves (serviceType, version) → *ServiceImage. +// Returns an error if the type or version is unregistered. +func (sv *ServiceVersions) GetServiceImage(serviceType string, version string) (*ServiceImage, error) { + versionMap, ok := sv.images[serviceType] + if !ok { + return nil, fmt.Errorf("unsupported service type %q", serviceType) + } + image, ok := versionMap[version] + if !ok { + return nil, fmt.Errorf("unsupported version %q for service type %q", version, serviceType) + } + return image, nil +} + +// serviceImageTag prepends the configured registry host unless the image +// reference already contains a registry prefix. 
+func serviceImageTag(cfg config.Config, imageRef string) string { + if strings.Contains(imageRef, "/") { + parts := strings.Split(imageRef, "/") + firstPart := parts[0] + if strings.Contains(firstPart, ".") || strings.Contains(firstPart, ":") || firstPart == "localhost" { + return imageRef + } + } + if cfg.DockerSwarm.ImageRepositoryHost == "" { + return imageRef + } + return fmt.Sprintf("%s/%s", cfg.DockerSwarm.ImageRepositoryHost, imageRef) +} +``` + +### A.4 Container Spec (`server/internal/orchestrator/swarm/service_spec.go`) + +**`ServiceContainerSpecOptions`** — the input struct: + +```go +// lines 15–32 +type ServiceContainerSpecOptions struct { + ServiceSpec *database.ServiceSpec + ServiceInstanceID string + DatabaseID string + DatabaseName string + HostID string + ServiceName string + Hostname string + CohortMemberID string + ServiceImage *ServiceImage + Credentials *database.ServiceUser + DatabaseNetworkID string + DatabaseHost string + DatabasePort int + Port *int +} +``` + +**`ServiceContainerSpec()`** — builds the Docker Swarm `ServiceSpec`: + +```go +// lines 35–117 +func ServiceContainerSpec(opts *ServiceContainerSpecOptions) (swarm.ServiceSpec, error) { + labels := map[string]string{ + "pgedge.component": "service", + "pgedge.service.instance.id": opts.ServiceInstanceID, + "pgedge.service.id": opts.ServiceSpec.ServiceID, + "pgedge.database.id": opts.DatabaseID, + "pgedge.host.id": opts.HostID, + } + + networks := []swarm.NetworkAttachmentConfig{ + {Target: "bridge"}, + {Target: opts.DatabaseNetworkID}, + } + + env := buildServiceEnvVars(opts) + image := opts.ServiceImage.Tag + ports := buildServicePortConfig(opts.Port) + + var resources *swarm.ResourceRequirements + if opts.ServiceSpec.CPUs != nil || opts.ServiceSpec.MemoryBytes != nil { + resources = &swarm.ResourceRequirements{ + Limits: &swarm.Limit{}, + } + if opts.ServiceSpec.CPUs != nil { + resources.Limits.NanoCPUs = int64(*opts.ServiceSpec.CPUs * 1e9) + } + if opts.ServiceSpec.MemoryBytes 
!= nil { + resources.Limits.MemoryBytes = int64(*opts.ServiceSpec.MemoryBytes) + } + } + + return swarm.ServiceSpec{ + TaskTemplate: swarm.TaskSpec{ + ContainerSpec: &swarm.ContainerSpec{ + Image: image, + Labels: labels, + Hostname: opts.Hostname, + Env: env, + Healthcheck: &container.HealthConfig{ + Test: []string{"CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"}, + StartPeriod: time.Second * 30, + Interval: time.Second * 10, + Timeout: time.Second * 5, + Retries: 3, + }, + Mounts: []mount.Mount{}, + }, + Networks: networks, + Placement: &swarm.Placement{ + Constraints: []string{ + "node.id==" + opts.CohortMemberID, + }, + }, + Resources: resources, + }, + EndpointSpec: &swarm.EndpointSpec{ + Mode: swarm.ResolutionModeVIP, + Ports: ports, + }, + Annotations: swarm.Annotations{ + Name: opts.ServiceName, + Labels: labels, + }, + }, nil +} +``` + +**`buildServiceEnvVars()`** — MCP-specific env var injection (this is expected to +change; see the caution at the top of this document): + +```go +// lines 120–168 +func buildServiceEnvVars(opts *ServiceContainerSpecOptions) []string { + env := []string{ + fmt.Sprintf("PGHOST=%s", opts.DatabaseHost), + fmt.Sprintf("PGPORT=%d", opts.DatabasePort), + fmt.Sprintf("PGDATABASE=%s", opts.DatabaseName), + "PGSSLMODE=prefer", + fmt.Sprintf("PGEDGE_SERVICE_ID=%s", opts.ServiceSpec.ServiceID), + fmt.Sprintf("PGEDGE_DATABASE_ID=%s", opts.DatabaseID), + } + + if opts.Credentials != nil { + env = append(env, + fmt.Sprintf("PGUSER=%s", opts.Credentials.Username), + fmt.Sprintf("PGPASSWORD=%s", opts.Credentials.Password), + ) + } + + // MCP-specific config → env var mapping + if provider, ok := opts.ServiceSpec.Config["llm_provider"].(string); ok { + env = append(env, fmt.Sprintf("PGEDGE_LLM_PROVIDER=%s", provider)) + } + if model, ok := opts.ServiceSpec.Config["llm_model"].(string); ok { + env = append(env, fmt.Sprintf("PGEDGE_LLM_MODEL=%s", model)) + } + + if provider, ok := 
opts.ServiceSpec.Config["llm_provider"].(string); ok { + switch provider { + case "anthropic": + if key, ok := opts.ServiceSpec.Config["anthropic_api_key"].(string); ok { + env = append(env, fmt.Sprintf("PGEDGE_ANTHROPIC_API_KEY=%s", key)) + } + case "openai": + if key, ok := opts.ServiceSpec.Config["openai_api_key"].(string); ok { + env = append(env, fmt.Sprintf("PGEDGE_OPENAI_API_KEY=%s", key)) + } + case "ollama": + if url, ok := opts.ServiceSpec.Config["ollama_url"].(string); ok { + env = append(env, fmt.Sprintf("PGEDGE_OLLAMA_URL=%s", url)) + } + } + } + + return env +} +``` + +**`buildServicePortConfig()`** — port publication: + +```go +// lines 175–196 +func buildServicePortConfig(port *int) []swarm.PortConfig { + if port == nil { + return nil + } + config := swarm.PortConfig{ + PublishMode: swarm.PortConfigPublishModeHost, + TargetPort: 8080, + Name: "http", + Protocol: swarm.PortConfigProtocolTCP, + } + if *port > 0 { + config.PublishedPort = uint32(*port) + } else if *port == 0 { + config.PublishedPort = 0 + } + return []swarm.PortConfig{config} +} +``` + +### A.5 Domain Model (`server/internal/database/spec.go`) + +```go +// lines 116–125 +type ServiceSpec struct { + ServiceID string `json:"service_id"` + ServiceType string `json:"service_type"` + Version string `json:"version"` + HostIDs []string `json:"host_ids"` + Config map[string]any `json:"config"` + Port *int `json:"port,omitempty"` + CPUs *float64 `json:"cpus,omitempty"` + MemoryBytes *uint64 `json:"memory,omitempty"` +} +``` + +This struct is service-type-agnostic. No changes needed when adding a new type. 
+ +### A.6 E2E Test Pattern (`e2e/service_provisioning_test.go`) + +Complete example of provisioning a service and verifying it reaches `"running"` +state: + +```go +// lines 18–147 +func TestProvisionMCPService(t *testing.T) { + t.Parallel() + + host1 := fixture.HostIDs()[0] + + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute) + defer cancel() + + t.Log("Creating database with MCP service") + + db := fixture.NewDatabaseFixture(ctx, t, &controlplane.CreateDatabaseRequest{ + Spec: &controlplane.DatabaseSpec{ + DatabaseName: "test_mcp_service", + DatabaseUsers: []*controlplane.DatabaseUserSpec{ + { + Username: "admin", + Password: pointerTo("testpassword"), + DbOwner: pointerTo(true), + Attributes: []string{"LOGIN", "SUPERUSER"}, + }, + }, + Port: pointerTo(0), + Nodes: []*controlplane.DatabaseNodeSpec{ + { + Name: "n1", + HostIds: []controlplane.Identifier{controlplane.Identifier(host1)}, + }, + }, + Services: []*controlplane.ServiceSpec{ + { + ServiceID: "mcp-server", + ServiceType: "mcp", + Version: "latest", + HostIds: []controlplane.Identifier{controlplane.Identifier(host1)}, + Config: map[string]any{ + "llm_provider": "anthropic", + "llm_model": "claude-sonnet-4-5", + "anthropic_api_key": "sk-ant-test-key-12345", + }, + }, + }, + }, + }) + + t.Log("Database created, verifying service instances") + + require.NotNil(t, db.ServiceInstances, "ServiceInstances should not be nil") + require.Len(t, db.ServiceInstances, 1, "Expected 1 service instance") + + serviceInstance := db.ServiceInstances[0] + assert.Equal(t, "mcp-server", serviceInstance.ServiceID) + assert.Equal(t, string(host1), serviceInstance.HostID) + assert.NotEmpty(t, serviceInstance.ServiceInstanceID) + + validStates := []string{"creating", "running"} + assert.Contains(t, validStates, serviceInstance.State) + + // Poll until running + if serviceInstance.State == "creating" { + t.Log("Service is still creating, waiting for running...") + maxWait := 5 * time.Minute + pollInterval 
:= 5 * time.Second + deadline := time.Now().Add(maxWait) + + for time.Now().Before(deadline) { + err := db.Refresh(ctx) + require.NoError(t, err) + if len(db.ServiceInstances) > 0 && db.ServiceInstances[0].State == "running" { + break + } + time.Sleep(pollInterval) + } + + require.Len(t, db.ServiceInstances, 1) + assert.Equal(t, "running", db.ServiceInstances[0].State) + } + + // Verify status/connection info + serviceInstance = db.ServiceInstances[0] + if serviceInstance.Status != nil { + assert.NotNil(t, serviceInstance.Status.Hostname) + assert.NotNil(t, serviceInstance.Status.Ipv4Address) + + foundHTTPPort := false + for _, port := range serviceInstance.Status.Ports { + if port.Name == "http" && port.ContainerPort != nil && *port.ContainerPort == 8080 { + foundHTTPPort = true + break + } + } + assert.True(t, foundHTTPPort, "HTTP port (8080) should be configured") + } +} +``` + +### A.7 Step-by-Step Instructions for Adding a New Service + +When a developer asks you to add a new service type (e.g., `"my-service"`), +follow these steps in order: + +1. **`api/apiv1/design/database.go`**: Add `"my-service"` to the `g.Enum()` + call on `service_type` (see [A.1](#a1-api-enum-apiapiv1designdatabasego)). + Run `make -C api generate`. + +2. **`server/internal/api/apiv1/validate.go`**: Change the allowlist check in + `validateServiceSpec()` to accept `"my-service"`. Write a + `validateMyServiceConfig()` function following the pattern in + `validateMCPServiceConfig()` (see [A.2](#a2-validation-serverinternalapiapiv1validatego)). + Add a dispatch branch in `validateServiceSpec()`. + +3. **`server/internal/orchestrator/swarm/service_images.go`**: Add a + `versions.addServiceImage("my-service", ...)` call in `NewServiceVersions()` + (see [A.3](#a3-image-registry-serverinternalorchestratorswarmservice_imagesgo)). + +4. 
**`server/internal/orchestrator/swarm/service_spec.go`**: If your service + needs different environment variables, a different health check endpoint, + different port, or bind mounts, modify `ServiceContainerSpec()` and/or its + helpers to branch on `ServiceType` (see + [A.4](#a4-container-spec-serverinternalorchestratorswarmservice_specgo)). + +5. **`server/internal/orchestrator/swarm/service_spec_test.go`**: Add table-driven + test cases for the new service type's container spec, env vars, and port config. + +6. **`server/internal/orchestrator/swarm/service_images_test.go`**: Add test + cases for `GetServiceImage()` with the new type and version(s). + +7. **`e2e/service_provisioning_test.go`**: Add E2E tests following the patterns + in [A.6](#a6-e2e-test-pattern-e2eservice_provisioning_testgo): single-host + provision, multi-host, add/remove from existing database, stability (unrelated + update doesn't recreate), bad version failure + recovery. + +8. **`server/internal/orchestrator/swarm/service_user_role.go`** (post-PR-280): + Add a `case` branch in `Create()` for your service type's Postgres grants. + If your service needs `NOINHERIT` or specific role grants, add them here. + +9. **`server/internal/orchestrator/swarm/orchestrator.go`** (post-PR-280): + If your service needs config file resources (like MCP's `MCPConfigResource`), + add them in a `case` branch in `GenerateServiceInstanceResources()`. If your + service only needs env vars (like PostgREST), use the base 4-resource chain. + +**Files that do NOT need changes**: `spec.go`, `service_instance.go`, +`plan_update.go`, `resources.go`, `end.go`, any store/etcd code. 
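Steps 8 and 9 both reduce to a `case` branch keyed on the service type. The
helper below is an illustrative sketch of the per-type grant dispatch described
in step 8 — the function name, signature, and exact SQL are assumptions for
illustration, not the actual post-PR-280 `service_user_role.go` code:

```go
package main

import "fmt"

// grantsForServiceType is a hypothetical helper sketching the per-type grant
// logic: MCP gets fine-grained public-schema read access, while PostgREST gets
// the NOINHERIT + anonymous-role pattern so it can SET ROLE per request.
func grantsForServiceType(serviceType, username, anonRole string) []string {
	switch serviceType {
	case "mcp":
		return []string{
			fmt.Sprintf("GRANT USAGE ON SCHEMA public TO %s", username),
			fmt.Sprintf("GRANT SELECT ON ALL TABLES IN SCHEMA public TO %s", username),
			fmt.Sprintf("GRANT pg_read_all_settings TO %s", username),
		}
	case "postgrest":
		return []string{
			fmt.Sprintf("ALTER ROLE %s NOINHERIT", username),
			fmt.Sprintf("GRANT %s TO %s", anonRole, username),
		}
	}
	return nil
}

func main() {
	for _, stmt := range grantsForServiceType("postgrest", "svc_pgrst", "web_anon") {
		fmt.Println(stmt)
	}
}
```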
diff --git a/docs/development/Postgrest-design-docs/postgrest-control-plane-changes.md b/docs/development/Postgrest-design-docs/postgrest-control-plane-changes.md new file mode 100644 index 00000000..1eff00be --- /dev/null +++ b/docs/development/Postgrest-design-docs/postgrest-control-plane-changes.md @@ -0,0 +1,1345 @@ +# PostgREST Integration: Complete Code Changes + +Every file that needs changing, what to change, and the code to write. This targets +the **post-PR-#280 codebase** (MCP YAML config merged). Pre-PR-280 alternatives are +noted where they differ. + +> **Prerequisite:** PR #280 (feat: replace MCP env-var config with bind-mounted YAML +> config files) must be merged first. That PR restructures `service_spec.go`, +> `service_user_role.go`, `orchestrator.go`, `resources.go`, `service_instance_spec.go`, +> and `plan_update.go`. All PostgREST changes below are written against the post-merge +> state. +> +> **Current main vs this doc:** As of writing, PR #280 is open and not merged. If you +> open these files on `main` today, you will see the **pre-PR-280** code. Every +> "Current (post-PR-280)" block below shows the state *after* PR #280 merges, not what +> is on `main` right now. 
Key differences on current `main`: +> +> | File | Current `main` | After PR #280 (this doc) | +> |------|---------------|--------------------------| +> | `validate.go` | `validateMCPServiceConfig(config, path)` — 2 args | `validateMCPServiceConfig(config, path, isUpdate)` — 3 args | +> | `service_spec.go` | `buildServiceEnvVars()` exists, delivers config via env vars | `buildServiceEnvVars()` deleted, MCP uses YAML files | +> | `service_user_role.go` | `Roles: []string{"pgedge_application_read_only"}` | `Attributes: []string{"LOGIN"}` + fine-grained SQL grants | +> | `orchestrator.go` | No MCP-specific resources | Adds `DirResource` + `MCPConfigResource` | +> | `service_instance_spec.go` | No `DataDirID` field | Adds `DataDirID` + MCPConfigResource dependency | +> +> See [Section 14](#14-dependency-on-pr-280) for what to change if PostgREST work +> starts before PR #280 merges. + +--- + +## Table of Contents + +1. [API Enum](#1-api-enum) +2. [Validation](#2-validation) +3. [Service Image Registry](#3-service-image-registry) +4. [Container Spec (Env Vars)](#4-container-spec) +5. [Service User Role (NOINHERIT + Grants)](#5-service-user-role) +6. [Orchestrator (Resource Generation)](#6-orchestrator) +7. [Resource Registration](#7-resource-registration) +8. [Service Instance Spec](#8-service-instance-spec) +9. [Workflow (plan_update.go)](#9-workflow) +10. [Config Redaction (API Responses)](#10-config-redaction) +11. [Unit Tests](#11-unit-tests) +12. [E2E Tests](#12-e2e-tests) +13. [Files That Do NOT Change](#13-files-that-do-not-change) +14. [Dependency on PR #280](#14-dependency-on-pr-280) + +--- + +## 1. 
API Enum + +**File:** `api/apiv1/design/database.go` +**Lines:** ~159 (the `g.Enum(...)` call on `service_type`) + +### Current (post-PR-280) + +```go +g.Attribute("service_type", g.String, func() { + g.Description("The type of service to run.") + g.Enum("mcp") + g.Example("mcp") + g.Meta("struct:tag:json", "service_type") +}) +``` + +### Change + +```go +g.Attribute("service_type", g.String, func() { + g.Description("The type of service to run.") + g.Enum("mcp", "postgrest") + g.Example("mcp") + g.Example("postgrest") + g.Meta("struct:tag:json", "service_type") +}) +``` + +### After editing + +```sh +make -C api generate +``` + +This regenerates `api/apiv1/gen/` types. The generated code validates `service_type` +against the enum at the Goa framework level (before reaching application validation). + +--- + +## 2. Validation + +**File:** `server/internal/api/apiv1/validate.go` + +### 2a. Service type allowlist + +**Location:** `validateServiceSpec()` — the `if svc.ServiceType != "mcp"` check. + +#### Current (post-PR-280) + +```go +if svc.ServiceType != "mcp" { + err := fmt.Errorf("unsupported service type '%s' (only 'mcp' is currently supported)", svc.ServiceType) + errs = append(errs, newValidationError(err, appendPath(path, "service_type"))) +} +``` + +#### Change + +```go +supportedServiceTypes := []string{"mcp", "postgrest"} +if !slices.Contains(supportedServiceTypes, svc.ServiceType) { + err := fmt.Errorf("unsupported service type '%s' (supported: %s)", + svc.ServiceType, strings.Join(supportedServiceTypes, ", ")) + errs = append(errs, newValidationError(err, appendPath(path, "service_type"))) +} +``` + +**Import needed:** `"slices"` (Go 1.21+, already used elsewhere in the codebase). + +### 2b. Config validation dispatch + +**Location:** `validateServiceSpec()` — the `if svc.ServiceType == "mcp"` dispatch. + +> On current `main`, `validateMCPServiceConfig` takes 2 args (config, path). +> Post-PR-280, it gains a third `isUpdate bool` parameter. 
See Section 14 for +> the pre-PR-280 alternative. + +#### Current (post-PR-280) + +```go +if svc.ServiceType == "mcp" { + errs = append(errs, validateMCPServiceConfig(svc.Config, appendPath(path, "config"), isUpdate)...) +} +``` + +#### Change + +```go +switch svc.ServiceType { +case "mcp": + errs = append(errs, validateMCPServiceConfig(svc.Config, appendPath(path, "config"), isUpdate)...) +case "postgrest": + errs = append(errs, validatePostgRESTServiceConfig(svc.Config, appendPath(path, "config"))...) +} +``` + +### 2c. New function: `validatePostgRESTServiceConfig()` + +Add to the same file, after `validateMCPServiceConfig()`: + +```go +// validatePostgRESTServiceConfig validates the config map for PostgREST services. +func validatePostgRESTServiceConfig(config map[string]any, path []string) []error { + var errs []error + + // Required fields + requiredFields := []string{"db_schemas", "db_anon_role"} + for _, field := range requiredFields { + if _, ok := config[field]; !ok { + err := fmt.Errorf("%s is required", field) + errs = append(errs, newValidationError(err, path)) + } + } + + // db_schemas: must be a non-empty string + if val, exists := config["db_schemas"]; exists { + s, ok := val.(string) + if !ok { + err := errors.New("db_schemas must be a string") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("db_schemas")))) + } else if strings.TrimSpace(s) == "" { + err := errors.New("db_schemas must not be empty") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("db_schemas")))) + } + } + + // db_anon_role: must be a non-empty string + if val, exists := config["db_anon_role"]; exists { + s, ok := val.(string) + if !ok { + err := errors.New("db_anon_role must be a string") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("db_anon_role")))) + } else if strings.TrimSpace(s) == "" { + err := errors.New("db_anon_role must not be empty") + errs = append(errs, newValidationError(err, 
appendPath(path, mapKeyPath("db_anon_role")))) + } + } + + // db_pool: optional, integer in range [1, 30] + if val, exists := config["db_pool"]; exists { + switch v := val.(type) { + case float64: + if v != float64(int(v)) || v < 1 || v > 30 { + err := errors.New("db_pool must be an integer between 1 and 30") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("db_pool")))) + } + case json.Number: + n, nErr := v.Int64() + if nErr != nil || n < 1 || n > 30 { + err := errors.New("db_pool must be an integer between 1 and 30") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("db_pool")))) + } + default: + err := errors.New("db_pool must be a number") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("db_pool")))) + } + } + + // max_rows: optional, integer in range [1, 10000] + if val, exists := config["max_rows"]; exists { + switch v := val.(type) { + case float64: + if v != float64(int(v)) || v < 1 || v > 10000 { + err := errors.New("max_rows must be an integer between 1 and 10000") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("max_rows")))) + } + case json.Number: + n, nErr := v.Int64() + if nErr != nil || n < 1 || n > 10000 { + err := errors.New("max_rows must be an integer between 1 and 10000") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("max_rows")))) + } + default: + err := errors.New("max_rows must be a number") + errs = append(errs, newValidationError(err, appendPath(path, mapKeyPath("max_rows")))) + } + } + + return errs +} +``` + +**Import needed:** `"encoding/json"` (for `json.Number` handling, matches PR #280 pattern). 
+ +### Validation test file + +**File:** `server/internal/api/apiv1/validate_test.go` + +Add test cases for `validatePostgRESTServiceConfig()`: + +```go +func TestValidatePostgRESTServiceConfig(t *testing.T) { + tests := []struct { + name string + config map[string]any + wantErrs int + errSubstr string + }{ + { + name: "valid minimal config", + config: map[string]any{"db_schemas": "api", "db_anon_role": "web_anon"}, + wantErrs: 0, + }, + { + name: "valid full config", + config: map[string]any{"db_schemas": "api", "db_anon_role": "web_anon", "db_pool": float64(15), "max_rows": float64(500)}, + wantErrs: 0, + }, + { + name: "missing db_schemas", + config: map[string]any{"db_anon_role": "web_anon"}, + wantErrs: 1, + errSubstr: "db_schemas is required", + }, + { + name: "missing db_anon_role", + config: map[string]any{"db_schemas": "api"}, + wantErrs: 1, + errSubstr: "db_anon_role is required", + }, + { + name: "missing both required fields", + config: map[string]any{}, + wantErrs: 2, + }, + { + name: "empty db_schemas", + config: map[string]any{"db_schemas": "", "db_anon_role": "web_anon"}, + wantErrs: 1, + errSubstr: "must not be empty", + }, + { + name: "db_pool too high", + config: map[string]any{"db_schemas": "api", "db_anon_role": "web_anon", "db_pool": float64(50)}, + wantErrs: 1, + errSubstr: "between 1 and 30", + }, + { + name: "db_pool zero", + config: map[string]any{"db_schemas": "api", "db_anon_role": "web_anon", "db_pool": float64(0)}, + wantErrs: 1, + errSubstr: "between 1 and 30", + }, + { + name: "max_rows too high", + config: map[string]any{"db_schemas": "api", "db_anon_role": "web_anon", "max_rows": float64(99999)}, + wantErrs: 1, + errSubstr: "between 1 and 10000", + }, + { + name: "unknown keys are ignored", + config: map[string]any{"db_schemas": "api", "db_anon_role": "web_anon", "unknown_key": "value"}, + wantErrs: 0, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + errs := validatePostgRESTServiceConfig(tt.config, 
[]string{"config"}) + assert.Len(t, errs, tt.wantErrs) + if tt.errSubstr != "" && len(errs) > 0 { + assert.Contains(t, errs[0].Error(), tt.errSubstr) + } + }) + } +} +``` + +--- + +## 3. Service Image Registry + +**File:** `server/internal/orchestrator/swarm/service_images.go` +**Location:** `NewServiceVersions()` function + +### Current (post-PR-280) + +```go +func NewServiceVersions(cfg config.Config) *ServiceVersions { + versions := &ServiceVersions{ + cfg: cfg, + images: make(map[string]map[string]*ServiceImage), + } + + // MCP service versions + versions.addServiceImage("mcp", "latest", &ServiceImage{ + Tag: serviceImageTag(cfg, "postgres-mcp:latest"), + }) + + return versions +} +``` + +### Change + +```go +func NewServiceVersions(cfg config.Config) *ServiceVersions { + versions := &ServiceVersions{ + cfg: cfg, + images: make(map[string]map[string]*ServiceImage), + } + + // MCP service versions + versions.addServiceImage("mcp", "latest", &ServiceImage{ + Tag: serviceImageTag(cfg, "postgres-mcp:latest"), + }) + + // PostgREST service versions + versions.addServiceImage("postgrest", "14.5", &ServiceImage{ + Tag: serviceImageTag(cfg, "postgrest/postgrest:v14.5"), + PostgresConstraint: &host.VersionConstraint{ + Min: host.MustParseVersion("15"), + }, + }) + versions.addServiceImage("postgrest", "latest", &ServiceImage{ + Tag: serviceImageTag(cfg, "postgrest/postgrest:latest"), + }) + + return versions +} +``` + +**Note on version:** PostgREST v14.5 is the latest stable release (Feb 2026). v14 +dropped support for PostgreSQL 12 (EOL). The Postgres >= 15 constraint in the +implementation guide is stricter than PostgREST's own minimum — it reflects a pgEdge +platform decision, not an upstream limitation. + +**Note on image registry:** `serviceImageTag()` will prepend +`cfg.DockerSwarm.ImageRepositoryHost` to `postgrest/postgrest:v14.5` if a registry host +is configured. 
The function only bypasses the prefix when the first path component +contains `.`, `:`, or equals `localhost` — `"postgrest"` matches none of these. This +means PostgREST images must be mirrored to the private registry if one is configured. +If the intent is to always pull from Docker Hub, bypass `serviceImageTag()` and use the +image reference directly: + +```go +// Option A: Go through configured registry (image must be mirrored) +Tag: serviceImageTag(cfg, "postgrest/postgrest:v14.5"), + +// Option B: Always pull from Docker Hub (bypass registry) +Tag: "postgrest/postgrest:v14.5", +``` + +Choose based on whether the deployment environment mirrors images to a private registry. + +### Image registry tests + +**File:** `server/internal/orchestrator/swarm/service_images_test.go` + +Add test cases: + +```go +{ + name: "postgrest latest resolves", + serviceType: "postgrest", + version: "latest", + wantTag: "postgrest/postgrest:latest", +}, +{ + name: "postgrest 14.5 resolves", + serviceType: "postgrest", + version: "14.5", + wantTag: "postgrest/postgrest:v14.5", +}, +{ + name: "postgrest unsupported version", + serviceType: "postgrest", + version: "99.99.99", + wantErr: true, +}, +``` + +--- + +## 4. Container Spec + +**File:** `server/internal/orchestrator/swarm/service_spec.go` + +PostgREST uses environment variables for configuration (unlike MCP post-PR-280 which +uses bind-mounted YAML files). PostgREST's official Docker image reads `PGRST_*` env +vars and libpq `PG*` env vars — no config file needed. + +### 4a. Config delivery: env vars vs YAML files + +PR #280 removed `buildServiceEnvVars()` entirely and replaced it with YAML config +files for MCP. PostgREST needs env vars back, but only for PostgREST containers. +On current `main`, `buildServiceEnvVars()` still exists — see +[Section 14](#14-dependency-on-pr-280) for how to add PostgREST there instead. 
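The `PGRST_*` names used throughout this section follow PostgREST's documented
convention: each config file option maps to an environment variable by
uppercasing it, replacing dashes with underscores, and prefixing `PGRST_`. A
throwaway helper (illustrative only, not part of the codebase) makes the
mapping explicit:

```go
package main

import (
	"fmt"
	"strings"
)

// pgrstEnvName derives the environment-variable form of a PostgREST config
// option name (e.g. db-anon-role → PGRST_DB_ANON_ROLE).
func pgrstEnvName(option string) string {
	return "PGRST_" + strings.ToUpper(strings.ReplaceAll(option, "-", "_"))
}

func main() {
	for _, opt := range []string{"db-schemas", "db-anon-role", "db-pool", "db-max-rows"} {
		fmt.Printf("%s → %s\n", opt, pgrstEnvName(opt))
	}
}
```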
+ +#### Post-PR-280 state of `ServiceContainerSpec()` + +After PR #280, the container spec has: +- No `Env` field (env vars removed for MCP) +- `Command` and `Args` set for MCP +- `User: "1001"` for MCP +- Bind mount for `/app/data` + +#### Change: Add service-type branching + +The `ServiceContainerSpec()` function needs to branch on service type for: +- **Env vars**: PostgREST needs them, MCP doesn't (post-PR-280) +- **Command/Args**: MCP sets custom command, PostgREST uses the image default +- **User**: MCP runs as UID 1001, PostgREST uses the image default +- **Mounts**: MCP has bind mount, PostgREST has none + +```go +func ServiceContainerSpec(opts *ServiceContainerSpecOptions) (swarm.ServiceSpec, error) { + labels := map[string]string{ + "pgedge.component": "service", + "pgedge.service.instance.id": opts.ServiceInstanceID, + "pgedge.service.id": opts.ServiceSpec.ServiceID, + "pgedge.database.id": opts.DatabaseID, + "pgedge.host.id": opts.HostID, + } + + networks := []swarm.NetworkAttachmentConfig{ + {Target: "bridge"}, + {Target: opts.DatabaseNetworkID}, + } + + ports := buildServicePortConfig(opts.Port) + + var resources *swarm.ResourceRequirements + if opts.ServiceSpec.CPUs != nil || opts.ServiceSpec.MemoryBytes != nil { + resources = &swarm.ResourceRequirements{ + Limits: &swarm.Limit{}, + } + if opts.ServiceSpec.CPUs != nil { + resources.Limits.NanoCPUs = int64(*opts.ServiceSpec.CPUs * 1e9) + } + if opts.ServiceSpec.MemoryBytes != nil { + resources.Limits.MemoryBytes = int64(*opts.ServiceSpec.MemoryBytes) + } + } + + containerSpec := &swarm.ContainerSpec{ + Image: opts.ServiceImage.Tag, + Labels: labels, + Hostname: opts.Hostname, + Healthcheck: &container.HealthConfig{ + Test: buildServiceHealthCheckCmd(opts), + StartPeriod: time.Second * 30, + Interval: time.Second * 10, + Timeout: time.Second * 5, + Retries: 3, + }, + } + + // Service-type-specific container configuration + switch opts.ServiceSpec.ServiceType { + case "mcp": + // MCP: bind-mounted YAML 
config, custom command, runs as UID 1001 + containerSpec.User = fmt.Sprintf("%d", mcpContainerUID) + containerSpec.Command = []string{"/app/pgedge-postgres-mcp"} + containerSpec.Args = []string{"-config", "/app/data/config.yaml"} + containerSpec.Mounts = []mount.Mount{ + docker.BuildMount(opts.DataPath, "/app/data", false), + } + + case "postgrest": + // PostgREST: env-var-based config, default image entrypoint, no mounts + containerSpec.Env = buildPostgRESTEnvVars(opts) + containerSpec.Mounts = []mount.Mount{} + } + + return swarm.ServiceSpec{ + TaskTemplate: swarm.TaskSpec{ + ContainerSpec: containerSpec, + Networks: networks, + Placement: &swarm.Placement{ + Constraints: []string{ + "node.id==" + opts.CohortMemberID, + }, + }, + Resources: resources, + }, + EndpointSpec: &swarm.EndpointSpec{ + Mode: swarm.ResolutionModeVIP, + Ports: ports, + }, + Annotations: swarm.Annotations{ + Name: opts.ServiceName, + Labels: labels, + }, + }, nil +} +``` + +### 4b. New function: `buildPostgRESTEnvVars()` + +Add to `service_spec.go`: + +```go +// buildPostgRESTEnvVars constructs the environment variables for a PostgREST container. +// PostgREST reads PGRST_* variables for its own config and PG* variables (via libpq) +// for the database connection. +func buildPostgRESTEnvVars(opts *ServiceContainerSpecOptions) []string { + env := []string{ + // libpq connection — PGRST_DB_URI is set to a bare "postgresql://" so that + // libpq fills in host/port/dbname/user/password from the PG* env vars. + // This keeps credentials out of the connection string. 
+ "PGRST_DB_URI=postgresql://", + fmt.Sprintf("PGHOST=%s", opts.DatabaseHost), + fmt.Sprintf("PGPORT=%d", opts.DatabasePort), + fmt.Sprintf("PGDATABASE=%s", opts.DatabaseName), + "PGSSLMODE=prefer", + "PGTARGETSESSIONATTRS=read-write", + } + + // Credentials (injected by ServiceUserRole) + if opts.Credentials != nil { + env = append(env, + fmt.Sprintf("PGUSER=%s", opts.Credentials.Username), + fmt.Sprintf("PGPASSWORD=%s", opts.Credentials.Password), + ) + } + + // PostgREST-specific config from ServiceSpec.Config + if schemas, ok := opts.ServiceSpec.Config["db_schemas"].(string); ok { + env = append(env, fmt.Sprintf("PGRST_DB_SCHEMAS=%s", schemas)) + } + if role, ok := opts.ServiceSpec.Config["db_anon_role"].(string); ok { + env = append(env, fmt.Sprintf("PGRST_DB_ANON_ROLE=%s", role)) + } + if pool, ok := opts.ServiceSpec.Config["db_pool"].(float64); ok { + env = append(env, fmt.Sprintf("PGRST_DB_POOL=%d", int(pool))) + } + if maxRows, ok := opts.ServiceSpec.Config["max_rows"].(float64); ok { + env = append(env, fmt.Sprintf("PGRST_DB_MAX_ROWS=%d", int(maxRows))) + } + + // Hardcoded PostgREST defaults + env = append(env, + "PGRST_DB_POOL_ACQUISITION_TIMEOUT=10", + "PGRST_SERVER_PORT=8080", + "PGRST_ADMIN_SERVER_PORT=3001", + "PGRST_LOG_LEVEL=warn", + "PGRST_DB_CHANNEL_ENABLED=true", + ) + + return env +} +``` + +--- + +## 5. Service User Role + +**File:** `server/internal/orchestrator/swarm/service_user_role.go` + +PostgREST requires two changes to the service user that MCP does not: + +1. **NOINHERIT** — PostgREST's authenticator pattern requires `SET ROLE`, which only + works correctly when the service user does not automatically inherit granted role + privileges. +2. **GRANT web_anon TO service_user** — PostgREST needs the anonymous role granted to + the service user so it can `SET ROLE web_anon` on each request. + +### 5a. 
Current state + +**On current `main`:** `Create()` generates username/password, then calls +`postgres.CreateUserRole()` with `Roles: []string{"pgedge_application_read_only"}`. +No per-service-type branching. + +**After PR #280:** `Create()` generates username/password, creates the role with +`Attributes: []string{"LOGIN"}` (no inherited roles), then runs fine-grained SQL grants +for public schema access (`GRANT CONNECT`, `GRANT USAGE ON SCHEMA public`, etc.). + +### 5b. New field on ServiceUserRole + +Add `ServiceType` to the struct so `Create()` can branch: + +```go +type ServiceUserRole struct { + ServiceInstanceID string `json:"service_instance_id"` + DatabaseID string `json:"database_id"` + DatabaseName string `json:"database_name"` + Username string `json:"username"` + HostID string `json:"host_id"` + PostgresHostID string `json:"postgres_host_id"` + ServiceID string `json:"service_id"` + ServiceType string `json:"service_type"` // NEW: "mcp" or "postgrest" + AnonRole string `json:"anon_role"` // NEW: PostgREST only, from config["db_anon_role"] + Password string `json:"password"` +} +``` + +**DiffIgnore** must include the new fields that are set at creation time: +```go +func (r *ServiceUserRole) DiffIgnore() []string { + return []string{"/postgres_host_id", "/username", "/password", "/anon_role"} +} +``` + +### 5c. Changes to `Create()` + +After the existing role creation and grant logic, add PostgREST-specific handling: + +```go +func (r *ServiceUserRole) Create(ctx context.Context, rc resource.ResourceContext) error { + // ... existing: generate username, password, connect to primary ... 
+ + // Create the Postgres role (same for all service types post-PR-280) + err = postgres.CreateUserRole(ctx, conn, postgres.CreateUserRoleOptions{ + Name: r.Username, + Password: r.Password, + DBName: r.DatabaseName, + DBOwner: false, + Attributes: []string{"LOGIN"}, + }) + if err != nil { + return fmt.Errorf("creating user role: %w", err) + } + + // Service-type-specific grants. + // Default to "mcp" for backward compatibility: existing ServiceUserRole resources + // in etcd were created before the ServiceType field existed and will deserialize + // with ServiceType == "". Treating "" as "mcp" ensures re-creation of existing + // MCP service users still runs the correct grants. + serviceType := r.ServiceType + if serviceType == "" { + serviceType = "mcp" + } + + switch serviceType { + case "mcp": + // Post-PR-280 MCP grants: public schema read-only + pg_read_all_settings + grants := []string{ + fmt.Sprintf("GRANT CONNECT ON DATABASE %s TO %s", sanitizeIdentifier(r.DatabaseName), sanitizeIdentifier(r.Username)), + fmt.Sprintf("GRANT USAGE ON SCHEMA public TO %s", sanitizeIdentifier(r.Username)), + fmt.Sprintf("GRANT SELECT ON ALL TABLES IN SCHEMA public TO %s", sanitizeIdentifier(r.Username)), + fmt.Sprintf("ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO %s", sanitizeIdentifier(r.Username)), + fmt.Sprintf("GRANT pg_read_all_settings TO %s", sanitizeIdentifier(r.Username)), + } + for _, grant := range grants { + if _, err := conn.Exec(ctx, grant); err != nil { + return fmt.Errorf("granting MCP privileges: %w", err) + } + } + + case "postgrest": + // PostgREST: NOINHERIT + grant anonymous role + // + // NOINHERIT is required because PostgREST uses SET ROLE to switch to the + // anonymous role on each request. Without NOINHERIT, the service user would + // automatically inherit web_anon privileges, defeating role isolation. 
+ _, err = conn.Exec(ctx, fmt.Sprintf( + "ALTER ROLE %s NOINHERIT", sanitizeIdentifier(r.Username))) + if err != nil { + return fmt.Errorf("setting NOINHERIT on postgrest role: %w", err) + } + + // Grant the anonymous role so SET ROLE works. + // The role name comes from config["db_anon_role"] (e.g., "web_anon"). + if r.AnonRole != "" { + _, err = conn.Exec(ctx, fmt.Sprintf( + "GRANT %s TO %s", sanitizeIdentifier(r.AnonRole), sanitizeIdentifier(r.Username))) + if err != nil { + return fmt.Errorf("granting anonymous role %q to postgrest user: %w", r.AnonRole, err) + } + } + + // Grant CONNECT so the service user can connect to the database. + _, err = conn.Exec(ctx, fmt.Sprintf( + "GRANT CONNECT ON DATABASE %s TO %s", sanitizeIdentifier(r.DatabaseName), sanitizeIdentifier(r.Username))) + if err != nil { + return fmt.Errorf("granting CONNECT to postgrest user: %w", err) + } + } + + return nil +} +``` + +### 5d. Update `populateCredentials()` in `service_instance_spec.go` + +Post-PR-280, `populateCredentials()` sets `Role: "..."` on the `ServiceUser`. For +PostgREST, the role field is not used (PostgREST's role switching is handled by +`SET ROLE` at the Postgres level, not by the Control Plane). Set it to a descriptive +value: + +```go +func (s *ServiceInstanceSpecResource) populateCredentials(rc resource.ResourceContext) error { + userRole, err := resource.GetFromState[*ServiceUserRole](rc, ServiceUserRoleIdentifier(s.ServiceInstanceID)) + if err != nil { + return fmt.Errorf("getting service user role: %w", err) + } + + role := "public_read_only" // default for MCP post-PR-280 + if s.ServiceSpec.ServiceType == "postgrest" { + role = "postgrest_authenticator" + } + + s.Credentials = &database.ServiceUser{ + Username: userRole.Username, + Password: userRole.Password, + Role: role, + } + return nil +} +``` + +--- + +## 6. 
Orchestrator + +**File:** `server/internal/orchestrator/swarm/orchestrator.go` +**Location:** `GenerateServiceInstanceResources()` + +### Current (post-PR-280) + +After PR #280, this function: +1. Resolves service image +2. Validates Postgres/Spock compatibility +3. **For MCP:** Parses config via `database.ParseMCPServiceConfig()`, creates + `DirResource`, creates `MCPConfigResource` +4. Creates Network, ServiceUserRole, ServiceInstanceSpec, ServiceInstance +5. Returns resource chain + +### Change: Add PostgREST branch + +The PostgREST branch is simpler than MCP — no config files, no DirResource, no +MCPConfigResource. PostgREST only needs the four base resources. + +```go +func (o *Orchestrator) GenerateServiceInstanceResources(spec *database.ServiceInstanceSpec) ( + *database.ServiceInstanceResources, error) { + + // 1. Resolve service image (unchanged) + serviceImage, err := o.serviceVersions.GetServiceImage( + spec.ServiceSpec.ServiceType, spec.ServiceSpec.Version) + if err != nil { + return nil, err + } + + // 2. Validate compatibility (unchanged) + if err := serviceImage.ValidateCompatibility(spec.PgEdgeVersion); err != nil { + return nil, err + } + + // 3. Create base resources (shared across all service types) + networkResource := &Network{ + // ... unchanged ... 
+ } + + serviceUserRole := &ServiceUserRole{ + ServiceInstanceID: spec.ServiceInstanceID, + DatabaseID: spec.DatabaseID, + DatabaseName: spec.DatabaseName, + HostID: spec.HostID, + PostgresHostID: spec.PostgresHostID, + ServiceID: spec.ServiceSpec.ServiceID, + ServiceType: spec.ServiceSpec.ServiceType, // NEW + Password: "", // Generated on Create + } + + // PostgREST: set AnonRole from config + if spec.ServiceSpec.ServiceType == "postgrest" { + if anonRole, ok := spec.ServiceSpec.Config["db_anon_role"].(string); ok { + serviceUserRole.AnonRole = anonRole + } + } + + // Copy persisted credentials if they exist (backward compatibility, unchanged) + if spec.Credentials != nil { + serviceUserRole.Username = spec.Credentials.Username + serviceUserRole.Password = spec.Credentials.Password + } + + // 4. Service-type-specific resource chain + var resources []resource.Resource + + switch spec.ServiceSpec.ServiceType { + case "mcp": + // MCP: DirResource → MCPConfigResource → ServiceInstanceSpec → ServiceInstance + // (post-PR-280 logic, unchanged) + mcpConfig, err := database.ParseMCPServiceConfig(spec.ServiceSpec.Config, false) + if err != nil { + return nil, fmt.Errorf("parsing MCP config: %w", err) + } + + dirResource := &filesystem.DirResource{/* ... unchanged ... */} + mcpConfigResource := &MCPConfigResource{/* ... unchanged ... */} + + serviceInstanceSpec := &ServiceInstanceSpecResource{ + // ... standard fields ... + DataDirID: dirResource.Identifier().ID, + } + + serviceInstance := &ServiceInstanceResource{/* ... unchanged ... */} + + resources = []resource.Resource{ + networkResource, serviceUserRole, + dirResource, mcpConfigResource, + serviceInstanceSpec, serviceInstance, + } + + case "postgrest": + // PostgREST: Network → ServiceUserRole → ServiceInstanceSpec → ServiceInstance + // No config files, no DirResource. Config delivered via env vars. 
+ serviceInstanceSpec := &ServiceInstanceSpecResource{ + ServiceInstanceID: spec.ServiceInstanceID, + ServiceSpec: spec.ServiceSpec, + DatabaseID: spec.DatabaseID, + DatabaseName: spec.DatabaseName, + HostID: spec.HostID, + ServiceName: ServiceInstanceName( + spec.ServiceSpec.ServiceType, spec.DatabaseID, + spec.ServiceSpec.ServiceID, spec.HostID), + Hostname: ServiceInstanceHostname(spec.ServiceSpec.ServiceID, spec.HostID), + CohortMemberID: spec.CohortMemberID, + ServiceImage: serviceImage, + DatabaseNetworkID: database.GenerateDatabaseNetworkID(spec.DatabaseID), + DatabaseHost: spec.DatabaseHost, + DatabasePort: spec.DatabasePort, + Port: spec.Port, + // No DataDirID — PostgREST has no bind mounts + } + + serviceInstance := &ServiceInstanceResource{ + ServiceInstanceID: spec.ServiceInstanceID, + DatabaseID: spec.DatabaseID, + HostID: spec.HostID, + } + + resources = []resource.Resource{ + networkResource, serviceUserRole, + serviceInstanceSpec, serviceInstance, + } + } + + // 5. Convert to ResourceData (unchanged) + data, err := resource.ToResourceData(resources...) + if err != nil { + return nil, err + } + + return &database.ServiceInstanceResources{ + ServiceInstance: &database.ServiceInstance{ + ServiceInstanceID: spec.ServiceInstanceID, + ServiceID: spec.ServiceSpec.ServiceID, + DatabaseID: spec.DatabaseID, + HostID: spec.HostID, + State: database.ServiceInstanceStateCreating, + }, + Resources: data, + }, nil +} +``` + +--- + +## 7. Resource Registration + +**File:** `server/internal/orchestrator/swarm/resources.go` + +### No change needed for PostgREST + +PostgREST uses the same four resource types as the base service model: +- `Network` (already registered) +- `ServiceUserRole` (already registered) +- `ServiceInstanceSpecResource` (already registered) +- `ServiceInstanceResource` (already registered) + +MCP's `MCPConfigResource` (registered by PR #280) is MCP-specific and not used by +PostgREST. + +--- + +## 8. 
Service Instance Spec + +**File:** `server/internal/orchestrator/swarm/service_instance_spec.go` + +### Change: Handle missing DataDirID for PostgREST + +Post-PR-280, `Refresh()` resolves `DataDirID` to get the host-side data path for the +bind mount. PostgREST doesn't have a `DataDirID` (no bind mount), so this must be +conditional: + +```go +func (s *ServiceInstanceSpecResource) Refresh(ctx context.Context, rc resource.ResourceContext) error { + // Validate network exists (unchanged) + _, err := resource.GetFromState[*Network](rc, NetworkResourceIdentifier(s.DatabaseNetworkID)) + if err != nil { + return fmt.Errorf("database network not found: %w", err) + } + + // Populate credentials from ServiceUserRole (unchanged) + if err := s.populateCredentials(rc); err != nil { + return err + } + + // Resolve data path (MCP only — PostgREST has no bind mount) + var dataPath string + if s.DataDirID != "" { + dataPath, err = filesystem.DirResourceFullPath(rc, s.DataDirID) + if err != nil { + return fmt.Errorf("resolving data directory: %w", err) + } + } + + // Generate Docker service spec + spec, err := ServiceContainerSpec(&ServiceContainerSpecOptions{ + ServiceSpec: s.ServiceSpec, + ServiceInstanceID: s.ServiceInstanceID, + DatabaseID: s.DatabaseID, + DatabaseName: s.DatabaseName, + HostID: s.HostID, + ServiceName: s.ServiceName, + Hostname: s.Hostname, + CohortMemberID: s.CohortMemberID, + ServiceImage: s.ServiceImage, + Credentials: s.Credentials, + DatabaseNetworkID: s.DatabaseNetworkID, + DatabaseHost: s.DatabaseHost, + DatabasePort: s.DatabasePort, + Port: s.Port, + DataPath: dataPath, // empty string for PostgREST + }) + if err != nil { + return err + } + + s.Spec = spec + return nil +} +``` + +### Change: Dependencies + +Post-PR-280, `Dependencies()` includes `MCPConfigResourceIdentifier`. 
For PostgREST, +this dependency doesn't apply: + +```go +func (s *ServiceInstanceSpecResource) Dependencies() []resource.ResourceIdentifier { + deps := []resource.ResourceIdentifier{ + NetworkResourceIdentifier(s.DatabaseNetworkID), + ServiceUserRoleIdentifier(s.ServiceInstanceID), + } + // MCP has additional config resource dependency + if s.DataDirID != "" { + deps = append(deps, MCPConfigResourceIdentifier(s.ServiceInstanceID)) + } + return deps +} +``` + +--- + +## 9. Workflow + +**File:** `server/internal/workflows/plan_update.go` +**Location:** `getServiceResources()` + +### No PostgREST-specific changes needed + +Post-PR-280, `getServiceResources()` is service-type-agnostic. It: +1. Generates the service instance ID +2. Finds a Postgres instance (co-located or fallback) +3. Builds a `ServiceInstanceSpec` +4. Fires the `GenerateServiceInstanceResources` activity + +The PostgREST-specific logic lives in `GenerateServiceInstanceResources()` (section 6) +and `ServiceContainerSpec()` (section 4), not here. + +### Port fix already done by PR #280 + +PR #280 changed `findPostgresInstance()` to always return internal port 5432 (not the +host-published port). This is correct for PostgREST, which connects over the overlay +network. + +--- + +## 10. Config Redaction + +**File:** `server/internal/api/apiv1/convert.go` + +PR #280 adds config key redaction for MCP (hides `anthropic_api_key`, `openai_api_key`, +`init_users` from API responses). PostgREST has no sensitive config keys — `db_schemas`, +`db_anon_role`, `db_pool`, and `max_rows` are all safe to return. + +### No change needed + +--- + +## 11. Unit Tests + +### 11a. 
Container spec tests

**File:** `server/internal/orchestrator/swarm/service_spec_test.go`

Add test cases for PostgREST container spec:

```go
{
    name: "postgrest service basic",
    opts: &ServiceContainerSpecOptions{
        ServiceSpec: &database.ServiceSpec{
            ServiceID:   "postgrest",
            ServiceType: "postgrest",
            Version:     "14.5",
            Config: map[string]any{
                "db_schemas":   "api",
                "db_anon_role": "web_anon",
                "db_pool":      float64(10),
                "max_rows":     float64(1000),
            },
        },
        ServiceInstanceID: "mydb-postgrest-host1",
        DatabaseID:        "mydb",
        DatabaseName:      "storefront",
        HostID:            "host1",
        ServiceName:       "postgrest-mydb-postgrest-host1",
        Hostname:          "postgrest-host1",
        CohortMemberID:    "swarm-node-1",
        ServiceImage:      &ServiceImage{Tag: "postgrest/postgrest:v14.5"},
        Credentials:       &database.ServiceUser{Username: "svc_postgrest_host1", Password: "secret"},
        DatabaseNetworkID: "mydb-database",
        DatabaseHost:      "postgres-mydb-n1",
        DatabasePort:      5432,
        Port:              intPtr(3100),
    },
    checks: []checkFunc{
        checkLabels(map[string]string{
            "pgedge.component":           "service",
            "pgedge.service.instance.id": "mydb-postgrest-host1",
            "pgedge.service.id":          "postgrest",
            "pgedge.database.id":         "mydb",
            "pgedge.host.id":             "host1",
        }),
        checkNetworks("bridge", "mydb-database"),
        checkPlacement("node.id==swarm-node-1"),
        checkPorts(8080, 3100),
        checkEnv(
            "PGRST_DB_URI=postgresql://",
            "PGHOST=postgres-mydb-n1",
            "PGPORT=5432",
            "PGDATABASE=storefront",
            "PGSSLMODE=prefer",
            "PGTARGETSESSIONATTRS=read-write",
            "PGUSER=svc_postgrest_host1",
            "PGPASSWORD=secret",
            "PGRST_DB_SCHEMAS=api",
            "PGRST_DB_ANON_ROLE=web_anon",
            "PGRST_DB_POOL=10",
            "PGRST_DB_MAX_ROWS=1000",
            "PGRST_DB_POOL_ACQUISITION_TIMEOUT=10",
            "PGRST_SERVER_PORT=8080",
            "PGRST_ADMIN_SERVER_PORT=3001",
            "PGRST_LOG_LEVEL=warn",
            "PGRST_DB_CHANNEL_ENABLED=true",
        ),
        // PostgREST: no mounts, no custom command, no custom user, and the Docker
        // health check is disabled (buildServiceHealthCheckCmd returns ["NONE"]),
        // so no checkHealthcheck assertion here
        checkNoMounts(),
+ checkNoCommand(), + }, +}, +{ + name: "postgrest service optional fields omitted", + opts: &ServiceContainerSpecOptions{ + ServiceSpec: &database.ServiceSpec{ + ServiceID: "postgrest", + ServiceType: "postgrest", + Version: "latest", + Config: map[string]any{ + "db_schemas": "api", + "db_anon_role": "web_anon", + // db_pool and max_rows omitted + }, + }, + ServiceInstanceID: "mydb-postgrest-host1", + DatabaseID: "mydb", + DatabaseName: "storefront", + HostID: "host1", + ServiceName: "postgrest-mydb-postgrest-host1", + Hostname: "postgrest-host1", + CohortMemberID: "swarm-node-1", + ServiceImage: &ServiceImage{Tag: "postgrest/postgrest:latest"}, + Credentials: &database.ServiceUser{Username: "svc_postgrest_host1", Password: "secret"}, + DatabaseNetworkID: "mydb-database", + DatabaseHost: "postgres-mydb-n1", + DatabasePort: 5432, + }, + checks: []checkFunc{ + checkEnvAbsent("PGRST_DB_POOL", "PGRST_DB_MAX_ROWS"), + checkEnv( + "PGRST_DB_POOL_ACQUISITION_TIMEOUT=10", + "PGRST_SERVER_PORT=8080", + ), + }, +}, +``` + +**New helper function for health check (in `service_spec.go`):** + +```go +// buildServiceHealthCheckCmd returns the health check command for the service. +// PostgREST's Docker image is a static binary with no shell utilities (no curl, no wget). +// We disable the Docker health check for PostgREST and rely on the CP's ServiceInstanceMonitor. 
+func buildServiceHealthCheckCmd(opts *ServiceContainerSpecOptions) []string { + if opts.ServiceSpec.ServiceType == "postgrest" { + return []string{"NONE"} + } + return []string{"CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"} +} +``` + +**New check functions needed (in `service_spec_test.go`):** + +```go +func checkNoMounts() checkFunc { + return func(t *testing.T, spec swarm.ServiceSpec) { + t.Helper() + assert.Empty(t, spec.TaskTemplate.ContainerSpec.Mounts) + } +} + +func checkNoCommand() checkFunc { + return func(t *testing.T, spec swarm.ServiceSpec) { + t.Helper() + assert.Empty(t, spec.TaskTemplate.ContainerSpec.Command) + assert.Empty(t, spec.TaskTemplate.ContainerSpec.Args) + } +} + +func checkEnvAbsent(keys ...string) checkFunc { + return func(t *testing.T, spec swarm.ServiceSpec) { + t.Helper() + env := spec.TaskTemplate.ContainerSpec.Env + for _, key := range keys { + for _, e := range env { + assert.False(t, strings.HasPrefix(e, key+"="), + "env var %s should not be present", key) + } + } + } +} +``` + +--- + +## 12. E2E Tests + +**File:** `e2e/service_provisioning_test.go` + +### 12a. 
Basic provisioning test + +```go +func TestProvisionPostgRESTService(t *testing.T) { + t.Parallel() + + host1 := fixture.HostIDs()[0] + + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute) + defer cancel() + + t.Log("Creating database with PostgREST service") + + db := fixture.NewDatabaseFixture(ctx, t, &controlplane.CreateDatabaseRequest{ + Spec: &controlplane.DatabaseSpec{ + DatabaseName: "test_postgrest_service", + DatabaseUsers: []*controlplane.DatabaseUserSpec{ + { + Username: "admin", + Password: pointerTo("testpassword"), + DbOwner: pointerTo(true), + Attributes: []string{"LOGIN", "SUPERUSER"}, + }, + }, + Port: pointerTo(0), + Nodes: []*controlplane.DatabaseNodeSpec{ + { + Name: "n1", + HostIds: []controlplane.Identifier{controlplane.Identifier(host1)}, + }, + }, + Services: []*controlplane.ServiceSpec{ + { + ServiceID: "postgrest", + ServiceType: "postgrest", + Version: "latest", + HostIds: []controlplane.Identifier{controlplane.Identifier(host1)}, + Config: map[string]any{ + "db_schemas": "api", + "db_anon_role": "web_anon", + }, + }, + }, + }, + }) + + t.Log("Database created, verifying service instances") + + require.NotNil(t, db.ServiceInstances) + require.Len(t, db.ServiceInstances, 1) + + serviceInstance := db.ServiceInstances[0] + assert.Equal(t, "postgrest", serviceInstance.ServiceID) + assert.Equal(t, string(host1), serviceInstance.HostID) + + // Poll until running + if serviceInstance.State != "running" { + t.Log("Service is still creating, waiting for running...") + maxWait := 5 * time.Minute + pollInterval := 5 * time.Second + deadline := time.Now().Add(maxWait) + + for time.Now().Before(deadline) { + err := db.Refresh(ctx) + require.NoError(t, err) + if len(db.ServiceInstances) > 0 && db.ServiceInstances[0].State == "running" { + break + } + time.Sleep(pollInterval) + } + + require.Len(t, db.ServiceInstances, 1) + assert.Equal(t, "running", db.ServiceInstances[0].State) + } + + // Verify HTTP port is configured + 
serviceInstance = db.ServiceInstances[0] + if serviceInstance.Status != nil { + foundHTTPPort := false + for _, port := range serviceInstance.Status.Ports { + if port.Name == "http" && port.ContainerPort != nil && *port.ContainerPort == 8080 { + foundHTTPPort = true + break + } + } + assert.True(t, foundHTTPPort, "HTTP port (8080) should be configured") + } +} +``` + +### 12b. Additional E2E tests to add + +| Test | What it validates | +|------|-------------------| +| `TestProvisionPostgRESTServiceUnsupportedVersion` | Version `"99.99.99"` fails workflow, DB goes to `"failed"` | +| `TestUpdateDatabaseAddPostgRESTService` | Adding PostgREST to existing DB works without affecting DB | +| `TestUpdateDatabaseRemovePostgRESTService` | Empty `Services` array removes PostgREST cleanly | +| `TestPostgRESTServiceStable` | Unrelated DB update doesn't recreate PostgREST (checks `container_id` unchanged) | +| `TestProvisionPostgRESTServiceConfigUpdate` | Changing `max_rows` triggers redeploy, service reaches `"running"` | + +--- + +## 13. Files That Do NOT Change + +| File | Why | +|------|-----| +| `server/internal/database/spec.go` | `ServiceSpec` is service-type-agnostic | +| `server/internal/database/service_instance.go` | `ServiceInstance` is service-type-agnostic | +| `server/internal/database/operations/` | Plan/apply/EndState are generic | +| `server/internal/orchestrator/swarm/resources.go` | No new resource types needed | +| `server/internal/orchestrator/swarm/network.go` | Network is shared, unchanged | +| `server/internal/orchestrator/swarm/service_instance.go` | Docker service deploy is generic | +| `server/internal/monitor/service_instance_monitor_resource.go` | Health monitor is generic | +| `server/internal/workflows/plan_update.go` | Service resource generation is generic | +| `server/internal/database/mcp_service_config.go` | MCP-specific, not touched | +| `server/internal/orchestrator/swarm/mcp_config*.go` | MCP-specific, not touched | + +--- + +## 14. 
Dependency on PR #280 + +This implementation doc assumes PR #280 is merged. The following table shows what +changes if PostgREST work starts before the merge: + +| Touch point | Post-PR-280 (this doc) | Pre-PR-280 | +|-------------|----------------------|------------| +| `buildServiceEnvVars()` | Deleted. Add `buildPostgRESTEnvVars()` separately. | Still exists. Add PostgREST branch inside existing function. | +| `ServiceContainerSpec()` | Has MCP-specific `Command`, `Args`, `User`, `Mounts`. Add `case "postgrest"` to the switch. | Has generic `Env: buildServiceEnvVars(opts)`. Add PostgREST env vars to the existing function. | +| `ServiceUserRole.Create()` | Has fine-grained grants (public schema, `pg_read_all_settings`). Add PostgREST branch. | Has `Roles: []string{"pgedge_application_read_only"}`. Add PostgREST-specific `ALTER ROLE NOINHERIT` and `GRANT web_anon` after role creation. | +| `service_instance_spec.go` | Has `DataDirID`, `MCPConfigResource` dependency. Make conditional for PostgREST. | No `DataDirID`. Dependencies are only Network + ServiceUserRole. PostgREST needs no changes here. | +| `orchestrator.go` | Has MCP config parsing, DirResource, MCPConfigResource. Add PostgREST branch that skips these. | No MCP-specific resources. PostgREST uses the four base resources directly (no changes needed). | + +--- + +## Change Summary + +| # | File | Change | Lines (est.) 
| +|---|------|--------|-------------| +| 1 | `api/apiv1/design/database.go` | Add `"postgrest"` to enum | 2 | +| 2 | `server/internal/api/apiv1/validate.go` | Add allowlist entry + dispatch + `validatePostgRESTServiceConfig()` | 70 | +| 3 | `server/internal/api/apiv1/validate_test.go` | Add PostgREST validation tests | 80 | +| 4 | `server/internal/orchestrator/swarm/service_images.go` | Register PostgREST images | 8 | +| 5 | `server/internal/orchestrator/swarm/service_images_test.go` | Add PostgREST image tests | 20 | +| 6 | `server/internal/orchestrator/swarm/service_spec.go` | Add `buildPostgRESTEnvVars()` + service-type branching in `ServiceContainerSpec()` | 60 | +| 7 | `server/internal/orchestrator/swarm/service_spec_test.go` | Add PostgREST container spec tests | 80 | +| 8 | `server/internal/orchestrator/swarm/service_user_role.go` | Add `ServiceType`/`AnonRole` fields + PostgREST grants in `Create()` | 30 | +| 9 | `server/internal/orchestrator/swarm/orchestrator.go` | Add PostgREST branch in `GenerateServiceInstanceResources()` | 40 | +| 10 | `server/internal/orchestrator/swarm/service_instance_spec.go` | Conditional `DataDirID`/dependency handling | 10 | +| 11 | `e2e/service_provisioning_test.go` | Add PostgREST E2E tests | 150 | +| | **Total** | | **~550** | diff --git a/docs/development/Postgrest-design-docs/postgrest-design-doc.md b/docs/development/Postgrest-design-docs/postgrest-design-doc.md new file mode 100644 index 00000000..3fe29b83 --- /dev/null +++ b/docs/development/Postgrest-design-docs/postgrest-design-doc.md @@ -0,0 +1,104 @@ +# PostgREST Service: Design Document + +**Status:** Proposed | **Depends on:** PR #280 (MCP YAML config) | **Ticket:** PLAT-458 + +## What + +Add PostgREST as the second supported service type in the Control Plane. PostgREST +turns a Postgres schema into a REST API automatically — no application code. 
It reads +the database schema at startup and generates HTTP endpoints for every table, view, and +function in the configured schema. + +## Why + +Customers need a way to query structured data over HTTP without writing a custom API. +PostgREST fills this gap using the same deployment model we already have for MCP: +declared in `DatabaseSpec.Services`, provisioned as a Docker Swarm service on the +overlay network, with CP-managed credentials and health monitoring. + +## How It Differs from MCP + +| | MCP (post-PR-280) | PostgREST | +|---|---|---| +| Config delivery | YAML files bind-mounted at `/app/data/` | Env vars (`PGRST_*` + libpq `PG*`) | +| Image | Custom (`postgres-mcp`) | Upstream (`postgrest/postgrest`) — static binary, no shell utils | +| DB role | `LOGIN`, public-schema read + `pg_read_all_settings` | `LOGIN NOINHERIT` + `GRANT web_anon TO svc_*` | +| Docker health check | `curl -f http://localhost:8080/health` | Disabled (`NONE`) — image has no curl/wget | +| Admin port | N/A | `PGRST_ADMIN_SERVER_PORT=3001` for `/health`, `/ready`, `/live` | +| Extra resources | DirResource → MCPConfigResource | None — uses the 4 base resources only | +| External access | LLM provider APIs (outbound internet) | None | + +## Config + +```json +{ + "service_type": "postgrest", + "config": { + "db_schemas": "api", + "db_anon_role": "web_anon", + "db_pool": 10, + "max_rows": 1000 + } +} +``` + +| Field | Required | Type | Validation | +|-------|----------|------|------------| +| `db_schemas` | yes | string | Non-empty | +| `db_anon_role` | yes | string | Non-empty | +| `db_pool` | no | int | 1–30, default 10 | +| `max_rows` | no | int | 1–10000, default 1000 | + +The CP translates these into env vars and adds hardcoded defaults (`PGRST_DB_URI=postgresql://`, `PGSSLMODE=prefer`, `PGRST_SERVER_PORT=8080`, `PGRST_LOG_LEVEL=warn`, `PGRST_DB_CHANNEL_ENABLED=true`, `PGRST_DB_POOL_ACQUISITION_TIMEOUT=10`, `PGTARGETSESSIONATTRS=read-write`). 
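The translation can be sketched as a small pure function. This is an illustration only — the function name `envFromConfig` and the subset of hardcoded defaults shown are hypothetical, not the Control Plane code. One detail worth noting: JSON numbers decode into `map[string]any` as `float64`, so the integer fields must be asserted as `float64` and truncated.

```go
package main

import "fmt"

// envFromConfig sketches how user-supplied config plus hardcoded defaults
// become the container environment. Simplified: credentials and the PG*
// connection variables are omitted.
func envFromConfig(config map[string]any) []string {
	env := []string{
		"PGRST_DB_URI=postgresql://", // libpq fills in host/user/etc. from PG* vars
		"PGRST_SERVER_PORT=8080",
		"PGRST_LOG_LEVEL=warn",
	}
	if schemas, ok := config["db_schemas"].(string); ok {
		env = append(env, "PGRST_DB_SCHEMAS="+schemas)
	}
	if role, ok := config["db_anon_role"].(string); ok {
		env = append(env, "PGRST_DB_ANON_ROLE="+role)
	}
	// JSON numbers arrive as float64, never int.
	if pool, ok := config["db_pool"].(float64); ok {
		env = append(env, fmt.Sprintf("PGRST_DB_POOL=%d", int(pool)))
	}
	return env
}

func main() {
	fmt.Println(envFromConfig(map[string]any{
		"db_schemas":   "api",
		"db_anon_role": "web_anon",
		"db_pool":      float64(10),
	}))
}
```

Omitted optional fields simply produce no env var, letting PostgREST fall back to its own defaults.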
+ +## Code Changes + +11 files, ~550 lines. No new resource types. No new packages. + +| File | Change | +|------|--------| +| `api/apiv1/design/database.go` | Add `"postgrest"` to `service_type` enum. Run `make -C api generate`. | +| `api/apiv1/validate.go` | Add to allowlist. Add `validatePostgRESTServiceConfig()` (required field + range checks). Wire dispatch via `switch`. | +| `swarm/service_images.go` | Register `postgrest/postgrest:v14.5` (with PG >= 15 constraint) and `latest` in `NewServiceVersions()`. Note: `serviceImageTag()` will prepend the configured registry host — images must be mirrored to the private registry, or bypass the helper and use Docker Hub refs directly. | +| `swarm/service_spec.go` | Add `buildPostgRESTEnvVars()`. Branch in `ServiceContainerSpec()`: PostgREST gets env vars + no mounts; MCP keeps YAML + bind mount. | +| `swarm/service_user_role.go` | Add `ServiceType` and `AnonRole` fields. In `Create()`, PostgREST branch runs `ALTER ROLE ... NOINHERIT` then `GRANT web_anon TO svc_*`. | +| `swarm/orchestrator.go` | Add PostgREST branch in `GenerateServiceInstanceResources()` — skips DirResource/MCPConfigResource, uses 4 base resources. | +| `swarm/service_instance_spec.go` | Make `DataDirID` and MCPConfigResource dependency conditional (empty for PostgREST). | +| `validate_test.go` | PostgREST config validation tests. | +| `service_spec_test.go` | PostgREST container spec + env var assertions. | +| `service_images_test.go` | PostgREST image resolution tests. | +| `e2e/service_provisioning_test.go` | Provision, bad version, add/remove, stability, config update tests. | + +**Files that do NOT change:** `spec.go`, `service_instance.go`, `plan_update.go`, `operations/`, `resources.go`, `network.go`, `mcp_config*.go`. 
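To make the validation row above concrete, `validatePostgRESTServiceConfig()` could look roughly like the sketch below. The function name comes from the change table; the error wording and the exact wiring into `validate.go` are assumptions. As in the env-var translation, optional integer fields arrive from JSON decoding as `float64`.

```go
package main

import "fmt"

// validatePostgRESTServiceConfig sketches the required-field and range checks
// described in the change table: db_schemas and db_anon_role are required
// non-empty strings; db_pool (1-30) and max_rows (1-10000) are optional ints.
func validatePostgRESTServiceConfig(config map[string]any) error {
	for _, key := range []string{"db_schemas", "db_anon_role"} {
		s, ok := config[key].(string)
		if !ok || s == "" {
			return fmt.Errorf("%s is required and must be a non-empty string", key)
		}
	}
	ranges := map[string][2]int{
		"db_pool":  {1, 30},
		"max_rows": {1, 10000},
	}
	for key, r := range ranges {
		raw, present := config[key]
		if !present {
			continue // optional — PostgREST default applies
		}
		f, ok := raw.(float64) // JSON numbers decode as float64
		if !ok || f != float64(int(f)) {
			return fmt.Errorf("%s must be an integer", key)
		}
		if n := int(f); n < r[0] || n > r[1] {
			return fmt.Errorf("%s must be between %d and %d", key, r[0], r[1])
		}
	}
	return nil
}

func main() {
	ok := map[string]any{"db_schemas": "api", "db_anon_role": "web_anon", "db_pool": float64(10)}
	if err := validatePostgRESTServiceConfig(ok); err != nil {
		panic(err)
	}
	bad := map[string]any{"db_schemas": "api", "db_anon_role": "web_anon", "db_pool": float64(99)}
	if err := validatePostgRESTServiceConfig(bad); err == nil {
		panic("expected range error for db_pool")
	}
	fmt.Println("validation checks pass")
}
```

Rejecting non-integral numbers (e.g. `db_pool: 10.5`) keeps the later `int()` truncation in the env-var translation from silently changing the user's value.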
+ +## Postgres Prerequisites (Manual) + +Before deploying PostgREST, the DBA must run: + +```sql +CREATE ROLE web_anon NOLOGIN; +CREATE SCHEMA api; +GRANT USAGE ON SCHEMA api TO web_anon; +GRANT SELECT ON ALL TABLES IN SCHEMA api TO web_anon; +GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA api TO web_anon; +``` + +The CP handles: service user creation, `NOINHERIT`, `GRANT web_anon TO svc_*`, credential injection. + +## Open Items + +| Item | Severity | Decision | +|------|----------|----------| +| **No primary-awareness** — `PGHOST` is a single hostname baked at provision time. Breaks failover for PostgREST and MCP `allow_writes`. | High | Solve separately (multi-host PGHOST or Patroni routing). Not a blocker for initial PostgREST deploy — read-only workloads work on any node. | +| **Schema setup validation** — if `api` schema or `web_anon` role doesn't exist, PostgREST starts but returns 404 for everything. | Low | Document prerequisites. Consider preflight SQL check in a follow-up. | +| **PostgREST version** — v14.5 is the latest stable release (Feb 2026). v14 dropped support for PostgreSQL 12 (EOL). The implementation guide's Postgres >= 15 constraint is stricter than what PostgREST requires — decide whether to enforce >= 13 (PostgREST minimum) or >= 15 (pgEdge preference). | Low | Register v14.5 with the agreed Postgres constraint. Add newer versions as they ship. 
| +
+## Verification
+
+```bash
+docker service ls --filter name=postgrest
+curl http://localhost:3100/ # 200 if the service is up (/health lives on the container-internal admin port 3001)
+curl http://localhost:3100/ -H "Accept-Profile: api" # OpenAPI spec
+curl "http://localhost:3100/gold_summaries?limit=1" -H "Accept-Profile: api" # JSON rows
+curl -X DELETE "http://localhost:3100/gold_summaries?id=eq.1" -H "Content-Profile: api" # 405 (blocked)
+```
diff --git a/docs/development/Postgrest-design-docs/postgrest-service-qa.md b/docs/development/Postgrest-design-docs/postgrest-service-qa.md
new file mode 100644
index 00000000..8f0b0411
--- /dev/null
+++ b/docs/development/Postgrest-design-docs/postgrest-service-qa.md
@@ -0,0 +1,434 @@
+# PostgREST Service: Technical Q&A
+
+A structured deep-dive into PostgREST as a Control Plane service type — what it is, how it runs, what it needs, and where the current model has gaps.
+
+---
+
+## 1. What Is This Service and Why Does It Live Next to the Database?
+
+PostgREST is a standalone web server (written in Haskell) that reads a PostgreSQL schema and automatically generates a RESTful HTTP API from it. Tables become endpoints, columns become fields, foreign keys become nested resources, and PostgreSQL's own role/grant system becomes the authorization layer. There is no application code — the database schema *is* the API contract.
+
+**Why it lives next to the database:**
+
+- **Zero network hops to Postgres.** PostgREST connects over the Docker overlay network to the Postgres instance on the same host. Latency is sub-millisecond. Deploying it remotely adds a network round-trip to every request.
+- **Credential isolation.** The Control Plane generates a dedicated `svc_postgrest_{host_id}` user per instance and injects it as environment variables. No credentials leave the overlay network.
+- **Failover tracking (planned).** Once multi-host `PGHOST` is implemented (see Gap 2), the container's environment will have all node hostnames.
Combined with `PGTARGETSESSIONATTRS=read-write`, libpq will automatically reconnect to the new primary after a Patroni switchover. **Today**, the Control Plane sets a single `PGHOST` value, so failover requires a manual restart or redeploy. +- **Lifecycle parity with other services.** PostgREST follows the same resource model as MCP: `Network → ServiceUserRole → ServiceInstanceSpec → ServiceInstance`. The orchestrator, workflow engine, and monitor treat it identically. No new framework is needed. + +**Comparison with MCP:** + +| Dimension | MCP | PostgREST | +|-----------|-----|-----------| +| Purpose | AI/LLM-powered queries | Schema-derived REST API | +| Database access | Read-only (public-schema `SELECT` + `pg_read_all_settings`, post-PR-280) | Configurable. Read-only by default (via `web_anon` grants); write possible with RLS + authenticated roles | +| External dependencies | LLM provider API keys | None | +| Config complexity | LLM provider, model, API key | Schema name, anonymous role, pool size, row limit | +| Port | 8080 internal, user-chosen published | 8080 internal, user-chosen published | + +--- + +## 2. What Does Its Config Look Like? + +PostgREST config lives in the `config` map of `ServiceSpec`. The Control Plane translates these into `PGRST_*` and `PG*` environment variables at container creation time. + +### Required Fields + +| Field | Type | Validation | Maps To | +|-------|------|------------|---------| +| `db_schemas` | `string` | Non-empty. Comma-separated list of PostgreSQL schema names. Recommended: `"api"` only. | `PGRST_DB_SCHEMAS` | +| `db_anon_role` | `string` | Non-empty. Must be a valid PostgreSQL role name (no spaces, no special chars). | `PGRST_DB_ANON_ROLE` | + +### Optional Fields + +| Field | Type | Default | Validation | Maps To | +|-------|------|---------|------------|---------| +| `db_pool` | `number` | 10 | Integer, range 1–30. Values above 30 risk exhausting `max_connections`. 
| `PGRST_DB_POOL` | +| `max_rows` | `number` | 1000 | Integer, range 1–10000. Caps rows returned per response. Does not prevent full table scans in Postgres — use `statement_timeout` for that. | `PGRST_DB_MAX_ROWS` | + +### Hardcoded (Not User-Configurable) + +These are set by the Control Plane unconditionally: + +| Variable | Value | Reason | +|----------|-------|--------| +| `PGRST_DB_URI` | `postgresql://` | Minimal base URI. Actual connection details come from `PGHOST`, `PGPORT`, etc. Keeps credentials out of the connection string. | +| `PGSSLMODE` | `prefer` | Use TLS if available, fall back to unencrypted. Matches the Control Plane's default for all service types. | +| `PGTARGETSESSIONATTRS` | `read-write` | Ensures libpq always connects to the current primary. Only effective once multi-host `PGHOST` is implemented (Gap 2). | +| `PGRST_DB_POOL_ACQUISITION_TIMEOUT` | `10` | Seconds to wait for a pool connection. Requests during failover get HTTP 503 rather than hanging. | +| `PGRST_SERVER_PORT` | `8080` | Must match the Control Plane health check target port. | +| `PGRST_LOG_LEVEL` | `warn` | Avoids log noise. Change to `info` or `debug` only for troubleshooting. | +| `PGRST_DB_CHANNEL_ENABLED` | `true` | Allows `NOTIFY pgrst, 'reload schema'` to refresh the schema cache without restarting. | + +### Validation Rules + +``` +validatePostgRESTServiceConfig(config, path): + 1. config["db_schemas"] must exist and be a non-empty string + 2. config["db_anon_role"] must exist and be a non-empty string + 3. If config["db_pool"] exists: + - must be a number (float64 from JSON) + - must be an integer in range [1, 30] + 4. If config["max_rows"] exists: + - must be a number (float64 from JSON) + - must be an integer in range [1, 10000] + 5. 
Unknown keys are silently ignored (forward-compatible)
+```
+
+### Example Spec
+
+```json
+{
+  "service_id": "postgrest",
+  "service_type": "postgrest",
+  "version": "14.5",
+  "host_ids": ["host-abc"],
+  "port": 3100,
+  "config": {
+    "db_schemas": "api",
+    "db_anon_role": "web_anon",
+    "db_pool": 10,
+    "max_rows": 1000
+  }
+}
+```
+
+---
+
+## 3. How Does the User Utilize It After It's Running?
+
+### REST API (Automatic)
+
+Once deployed, PostgREST exposes every table, view, and function in the configured schema as HTTP endpoints:
+
+| Operation | HTTP | Example |
+|-----------|------|---------|
+| List rows | `GET /table` | `GET /gold_summaries?limit=10` |
+| Filter rows | `GET /table?col=op.val` | `GET /gold_summaries?year=eq.2025&order=date.desc` |
+| Get single row | `GET /table?id=eq.1` + `Accept: application/vnd.pgrst.object+json` | Single JSON object |
+| Insert rows | `POST /table` | `POST /gold_summaries` with JSON body |
+| Update rows | `PATCH /table?filter` | `PATCH /gold_summaries?id=eq.1` |
+| Upsert | `POST /table` | With `Prefer: resolution=merge-duplicates` (or `PUT /table?id=eq.1` for a single row) |
+| Delete rows | `DELETE /table?filter` | `DELETE /gold_summaries?id=eq.1` |
+| Call function | `POST /rpc/fn_name` | `POST /rpc/search_gold` with JSON args |
+| OpenAPI spec | `GET /` | Full schema introspection |
+| Health check | `GET /health` (admin port 3001) | Returns 200 if connected to Postgres |
+| Readiness | `GET /ready` (admin port 3001) | Returns 200 if schema cache is loaded |
+
+### Filtering Operators
+
+`eq`, `neq`, `gt`, `gte`, `lt`, `lte`, `like`, `ilike`, `in`, `is`, `cs` (contains), `cd` (contained by), `ov` (overlaps), `fts` (full-text search).
+
+Boolean logic: `?and=(col1.eq.a,col2.gt.5)`, `?or=(...)`.
+ +### Embedding (Relationships) + +PostgREST auto-detects foreign keys and allows nested resource fetching: + +``` +GET /posts?select=id,title,comments(id,text) +``` + +### Pagination + +``` +GET /table?limit=25&offset=50 +``` + +Response headers include `Content-Range` for total count when `Prefer: count=exact` is set. + +### Runtime Settings That Can Change Without Redeploying + +| Setting | How to Change | Effect | +|---------|---------------|--------| +| Schema cache | `NOTIFY pgrst, 'reload schema'` (SQL) | PostgREST re-introspects the database. Use after DDL changes (new tables, columns, functions). | +| Row-level security policies | `ALTER POLICY` / `CREATE POLICY` (SQL) | Takes effect immediately on next request. No PostgREST restart needed. | +| Grant/revoke permissions | `GRANT SELECT ON api.new_table TO web_anon` (SQL) | Takes effect after schema cache reload. | +| Statement timeout | `ALTER ROLE svc_postgrest_host1 SET statement_timeout = '30s'` (SQL) | Takes effect on new connections. Kills long-running queries. | +| JWT secret | Requires container restart (env var change) | Must redeploy via Control Plane spec update. | + +### What the User Should NOT Do + +- Do not expose PostgREST directly to the internet without a reverse proxy (rate limiting, TLS, auth). +- Do not grant `INSERT`/`UPDATE`/`DELETE` to `web_anon` without RLS policies on every affected table. +- Do not add `SECURITY DEFINER` functions to the `api` schema — place them in a `private` schema. + +--- + +## 4. How Does It Run? + +### Container Image + +| Version | Image Tag | Postgres Constraint | +|---------|-----------|---------------------| +| `14.5` | `postgrest/postgrest:v14.5` | Postgres >= 15 (as defined in the implementation guide) | +| `latest` | `postgrest/postgrest:latest` | None | + +PostgREST v14.5 is the latest stable release (Feb 2026). v14 dropped support for PostgreSQL 12 (EOL) but supports PostgreSQL 13+. 
The Postgres >= 15 constraint in the implementation guide is stricter than PostgREST's own minimum — it reflects a pgEdge platform decision. The `latest` entry has no constraints and is intended for development. + +### Health Check + +**Docker health check:** Disabled (`NONE`) for PostgREST. The PostgREST Docker image +is a static binary with no shell utilities (no `curl`, no `wget`), so the standard +`curl -f http://localhost:8080/health` command fails. + +**PostgREST admin endpoints:** PostgREST v14 serves `/health`, `/live`, and `/ready` +on a separate admin port (configured via `PGRST_ADMIN_SERVER_PORT=3001`), not the API +port (8080). These endpoints are: +- `/health` — returns 200 if PostgREST has at least one usable DB connection +- `/ready` — returns 200 if the schema cache is loaded +- `/live` — alias for `/health` + +**CP health monitoring:** The `ServiceInstanceMonitor` checks health from outside the +container via the bridge network, using the published port. This still works. + +**MCP comparison:** MCP uses `curl -f http://localhost:8080/health` because the MCP +image includes curl. PostgREST cannot use this approach. + +### Port + +- **Internal (container):** 8080 — hardcoded via `PGRST_SERVER_PORT=8080`. Matches the health check and all Control Plane assumptions. +- **Published (host):** User-configurable via `spec.port`. Set to `3100` in the design spec example. `nil` = not published, `0` = Docker assigns random port. +- **Publish mode:** Host mode (`swarm.PortConfigPublishModeHost`) — traffic hits the specific Swarm node, not a VIP. + +### Resource Defaults + +| Resource | Default | Notes | +|----------|---------|-------| +| CPUs | Container default (no limit) | User can set via `spec.cpus` (e.g., `"0.5"`, `"1"`, `"500m"`) | +| Memory | Container default (no limit) | User can set via `spec.memory` (e.g., `"512M"`, `"1GiB"`) | +| Volumes | None | PostgREST is stateless. No persistent mounts. 
| +| Replicas | 1 per host_id | One ServiceInstance per host in `host_ids`. | + +### Differences from MCP + +| Aspect | MCP | PostgREST | +|--------|-----|-----------| +| Image source | `postgres-mcp:latest` (custom) | `postgrest/postgrest:v14.5` (upstream) | +| Version constraints | None | Postgres >= 15 for v14.5 | +| Config env vars | `PGEDGE_LLM_PROVIDER`, `PGEDGE_LLM_MODEL`, provider API keys | `PGRST_DB_SCHEMAS`, `PGRST_DB_ANON_ROLE`, `PGRST_DB_POOL`, `PGRST_DB_MAX_ROWS`, plus hardcoded PGRST defaults | +| Extra PG env vars | None | `PGTARGETSESSIONATTRS=read-write` | +| DB role attributes | Default (INHERIT) | NOINHERIT required | +| External dependencies | LLM provider internet access | None | +| Schema requirements | None specific | `api` schema + `web_anon` role + grants must exist before deploy | + +--- + +## 5. What Does It Need from Postgres? + +### Access Level + +**Default: read-only.** The `web_anon` anonymous role is granted `SELECT` and `EXECUTE` only. Write access is possible but requires: +1. Creating authenticated roles (e.g., `app_user`) with `INSERT`/`UPDATE`/`DELETE`. +2. Enabling row-level security (RLS) on every writable table. +3. Configuring JWT authentication so PostgREST can `SET ROLE` to the authenticated role. + +### Users and Roles + +| Role | Created By | Attributes | Purpose | +|------|-----------|------------|---------| +| `svc_postgrest_{host_id}` | Control Plane (ServiceUserRole) | `LOGIN` today; needs `NOINHERIT` (Gap 1) | Authenticator role. PostgREST connects as this user, then `SET ROLE` to the request role. | +| `web_anon` | DBA (manual) | `NOLOGIN` | Anonymous request role. Used when no JWT is provided. Must be granted to the service user (Gap 4 — not yet automated). | +| `app_user` (optional) | DBA (manual) | `NOLOGIN` | Authenticated request role. Used when a valid JWT with `"role": "app_user"` is provided. | + +**Why NOINHERIT matters:** PostgREST uses the PostgreSQL `SET ROLE` mechanism. 
The authenticator role must NOT automatically inherit the anonymous role's privileges — otherwise it would have ambient access to everything `web_anon` can do, defeating role-based isolation. `NOINHERIT` ensures `SET ROLE` is the only way to gain permissions. + +> **Current state:** Today `ServiceUserRole.Create()` creates users with `LOGIN` and default `INHERIT`. Both `NOINHERIT` and `GRANT web_anon TO svc_postgrest_{host_id}` must be added for PostgREST to function (Gaps 1 and 4). + +### Schema Requirements + +| Schema | Purpose | Created By | +|--------|---------|------------| +| `api` | Exposed to HTTP clients. Contains views and functions only. | DBA (manual, before deploy) | +| `private` (recommended) | Business logic, helper functions. Not exposed. | DBA (manual) | +| `internal` (recommended) | Raw tables, internal data. Not exposed. | DBA (manual) | + +### Required Grants (Before Deploy) + +```sql +GRANT USAGE ON SCHEMA api TO web_anon; +GRANT SELECT ON ALL TABLES IN SCHEMA api TO web_anon; +GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA api TO web_anon; +ALTER DEFAULT PRIVILEGES IN SCHEMA api GRANT SELECT ON TABLES TO web_anon; +ALTER DEFAULT PRIVILEGES IN SCHEMA api GRANT EXECUTE ON FUNCTIONS TO web_anon; + +-- After Control Plane creates the service user: +GRANT web_anon TO svc_postgrest_host1; +``` + +### Extensions + +No extensions are strictly required. Common companions: + +| Extension | Purpose | +|-----------|---------| +| `pgvector` | Similarity search via `api` functions | +| `pg_stat_statements` | Query performance monitoring | +| `pg_trgm` | Fuzzy text search support | + +### Row-Level Security + +RLS must be enabled on every table backing an `api` view if write access is granted: + +```sql +ALTER TABLE internal.my_table ENABLE ROW LEVEL SECURITY; +ALTER TABLE internal.my_table FORCE ROW LEVEL SECURITY; +``` + +Without `FORCE`, table owners bypass policies silently. + +--- + +## 6. What Are Its Networking Needs? 
+ +### Network Attachments + +PostgREST attaches to two Docker networks (same as MCP): + +1. **Bridge network** — Local to the host. + - Control Plane reaches the container on port 8080 for health checks. + - End-users reach the published port (e.g., 3100) from outside Docker. + +2. **Database overlay network** (`{database_id}-database`) — Spans the Swarm cluster. + - PostgREST connects to Postgres instances via overlay DNS: `{hostname}.{database_id}-database`. + - Isolated per database. Services for database A cannot reach Postgres for database B. + +### Port Allocation + +| Port | Scope | Purpose | +|------|-------|---------| +| 8080 | Container-internal | HTTP API. Used by health checks and inter-service calls. | +| User-configured (e.g., 3100) | Host-published | External access. Published in host mode, not VIP mode. | + +**Conflict risk:** If two services on the same host both request port 3100, the second deployment fails. Docker Swarm rejects duplicate host-port bindings at deployment time. The Control Plane has a `validatePortAvailable` check in some node-update flows, but it is not confirmed to run for every new service deploy — port conflicts may surface as Swarm errors rather than pre-validated API errors. Users must coordinate published ports across services on the same host. + +### Service Discovery + +- **Intra-overlay:** Other containers on the same database overlay can reach PostgREST at `{service_name}:{8080}` via Docker DNS. +- **From host:** Access via `localhost:{published_port}` or `{host_ip}:{published_port}`. +- **Cross-database:** Not possible by default. Overlay networks are per-database. A service in database A cannot reach PostgREST in database B without explicit network configuration. + +### Interaction with Other Services + +- PostgREST does not conflict with MCP or other services on the overlay network — they share the network but listen on different container names. +- The only collision vector is **published host ports**. 
Two services requesting the same host port will fail. +- PostgREST does not need outbound internet access (unlike MCP, which calls LLM provider APIs). + +--- + +## 7. What Doesn't Fit? + +### Gap 1: NOINHERIT on Service User Role + +**Problem:** `ServiceUserRole.Create()` currently creates all service users with default PostgreSQL attributes (which means `INHERIT`). PostgREST requires `NOINHERIT`. MCP does not. + +**Current workaround in the implementation guide:** +```go +if opts.ServiceType == "postgrest" { + _, err = conn.Exec(ctx, fmt.Sprintf( + "ALTER ROLE %s NOINHERIT", sanitizeIdentifier(r.Username))) +} +``` + +**Concern:** This is a service-type-specific `if` statement in a generic resource. As more service types are added, this becomes a growing switch statement. + +**Options:** +1. **Per-service-type flag:** Add a `NoInherit bool` field to `ServiceUserRole` or `ServiceSpec`, set by the orchestrator based on service type. +2. **Role options struct:** Pass `postgres.UserRoleOptions` with an `Inherit` field, letting each service type declare its needs declaratively. +3. **Accept the if-statement for now.** Two service types don't warrant abstraction. Revisit at three. + +### Gap 2: No Primary-Awareness in Service Provisioning + +**Problem:** The platform has no mechanism for a service container to discover or follow the current Postgres primary. This is not PostgREST-specific — it affects every service type. + +**Root cause:** The provisioning logic in `server/internal/workflows/provision_services.go` (lines ~159–200) picks a single Postgres instance at provisioning time — preferring same-host, falling back to first found — and bakes it into the container's `PGHOST` env var. There is no primary/replica awareness. + +**Impact on PostgREST:** After a Patroni switchover, PostgREST keeps connecting to the old primary (now a replica). Read requests may still work, but the schema cache reload channel (`NOTIFY`) only works on the primary. 
If PostgREST was intended to serve writes via authenticated roles, those fail immediately. + +**Impact on MCP:** MCP's `allow_writes: true` mode needs write access to Postgres. With the current provisioning logic, an MCP container could be connected to a replica that can't accept writes, or could lose write access after failover. `allow_writes` cannot work reliably until this is resolved. + +**Options under consideration:** + +**A) Patroni routing endpoint.** Set `PGHOST` to a Patroni-aware proxy (HAProxy or pgBouncer) that health-checks each node's `/primary` endpoint and routes to the current primary. + +| Pro | Con | +|-----|-----| +| `PGHOST` stays a single string — no `DatabaseHosts` refactor | Adds a proxy component to deploy, monitor, and fail over | +| Solves failover for all services uniformly | Extra network hop on every query (not just during failover) | +| If pgBouncer is planned for connection pooling, failover comes free | Single point of failure if the proxy goes down | + +**B) Multi-host PGHOST.** Set `PGHOST=host1,host2,host3` with `PGTARGETSESSIONATTRS=read-write`. libpq tries each host natively and connects to the one accepting writes. + +| Pro | Con | +|-----|-----| +| No extra infrastructure — uses libpq's built-in multi-host support | Requires `DatabaseHost string` → `DatabaseHosts []string` across the domain model | +| No additional latency — direct connection | Provisioning workflow must populate all node hostnames from the database spec | +| No new failure modes — libpq handles retries internally | Each service reconnects independently (no shared routing knowledge) | + +**C) Validate at provisioning time only.** If `allow_writes: true` (MCP) or service type is `postgrest`, reject unless the preferred instance is currently the primary. 
+ +| Pro | Con | +|-----|-----| +| No code change to PGHOST handling | Fragile — breaks on every failover | +| Simple to implement | Requires re-provisioning after every switchover | +| | Does not actually solve the problem, only defers it | + +**Recommendation:** Option B (multi-host PGHOST) is the right default — it's the simplest path with no new failure modes or infrastructure. If pgBouncer is on the roadmap for connection pooling reasons, Option A becomes attractive because routing comes as a side effect of the pooler. Option C is insufficient on its own. + +**Required changes for Option B:** +- `DatabaseHost string` → `DatabaseHosts []string` in `ServiceContainerSpecOptions` +- `ServiceInstanceSpec.DatabaseHost string` → `DatabaseHosts []string` in the domain model +- Provisioning workflow populates all Postgres node hostnames from the database spec +- `buildServiceEnvVars` joins them: `PGHOST=host1,host2,host3` + +### Gap 3: Schema Setup Automation + +**Problem:** PostgREST requires three SQL prerequisites before it can start: +1. `CREATE ROLE web_anon NOLOGIN` +2. `CREATE SCHEMA api` + views/functions +3. `GRANT USAGE/SELECT/EXECUTE ON SCHEMA api TO web_anon` + +These are currently manual steps. If forgotten, PostgREST starts but returns 404 for all endpoints (no schema to introspect). + +**Options:** +1. **Validation at deploy time:** Before creating the service, run a preflight check that connects to Postgres and verifies the `api` schema and `web_anon` role exist. Fail with a clear error if missing. +2. **Init workflow:** Add a database initialization workflow (or hook) that runs the prerequisite SQL when PostgREST is first added to a database spec. +3. **Document and defer.** The DBA is responsible for schema setup. PostgREST's `/ready` endpoint will return 503 until the schema exists, which is self-diagnosing. 
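Option 1 above (deploy-time validation) could be sketched as follows. Everything here is hypothetical: `preflightPostgREST`, `checkPrereqs`, and the catalog queries are illustrative, not existing Control Plane code.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// checkPrereqs turns the two existence checks into a deploy-time error.
// Kept pure so it can be exercised without a database.
func checkPrereqs(schema, anonRole string, schemaOK, roleOK bool) error {
	if !schemaOK {
		return fmt.Errorf("schema %q does not exist; run the prerequisite SQL first", schema)
	}
	if !roleOK {
		return fmt.Errorf("role %q does not exist; run CREATE ROLE %s NOLOGIN first", anonRole, anonRole)
	}
	return nil
}

// preflightPostgREST queries the system catalogs for the configured schema
// and anonymous role before the service is provisioned, so a missing
// prerequisite fails fast instead of producing a PostgREST instance that
// returns 404 for everything. (Hypothetical; not existing CP code.)
func preflightPostgREST(ctx context.Context, db *sql.DB, schema, anonRole string) error {
	var schemaOK, roleOK bool
	err := db.QueryRowContext(ctx,
		`SELECT
		    EXISTS (SELECT 1 FROM pg_namespace WHERE nspname = $1),
		    EXISTS (SELECT 1 FROM pg_roles WHERE rolname = $2)`,
		schema, anonRole).Scan(&schemaOK, &roleOK)
	if err != nil {
		return fmt.Errorf("preflight query failed: %w", err)
	}
	return checkPrereqs(schema, anonRole, schemaOK, roleOK)
}

func main() {
	// Without a live database, exercise only the pure half.
	fmt.Println(checkPrereqs("api", "web_anon", true, true)) // <nil>
}
```

Running such a check at spec-validation or provisioning time would convert the silent 404 failure mode into an immediate, actionable error.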
+ +### Gap 4: GRANT web_anon TO Authenticator + +**Problem:** After the Control Plane creates `svc_postgrest_{host_id}`, someone must run: +```sql +GRANT web_anon TO svc_postgrest_host1; +``` + +The implementation guide lists this as step 4 of prerequisites and says "The Control Plane handles step 4 credentials automatically" — but `ServiceUserRole.Create()` does not grant the anonymous role to the service user. On current `main` it creates the user with `pgedge_application_read_only`; post-PR-280 it creates with `LOGIN` + fine-grained public-schema grants. Neither version grants the anonymous role. + +**Required fix:** After creating the role and setting NOINHERIT, also: +```sql +GRANT web_anon TO svc_postgrest_{host_id}; +``` + +The anonymous role name comes from `config["db_anon_role"]`. This means `ServiceUserRole` needs access to the service config, or the grant must happen in a separate step. + +### Gap 5: Postgres Version Constraint Validation + +**Problem:** The `postgrest:v14.5` image registers a Postgres >= 15 constraint. The validation runs in `GenerateServiceInstanceResources()` via `serviceImage.ValidateCompatibility()`. If the database is running Postgres 14, the error surfaces during resource generation — which happens inside a workflow, not at API validation time. The user gets an async failure instead of an immediate 400 response. + +**Option:** Add an early compatibility check in `validateServiceSpec()` or the API handler, before the workflow starts. This requires access to the database's Postgres version at validation time. + +### Gap 6: No Config Update Without Redeploy + +**Problem:** Changing `db_pool` or `max_rows` requires a full spec update through the Control Plane, which triggers resource reconciliation and container redeployment. PostgREST v12+ supports `NOTIFY pgrst, 'reload config'` for some settings, but the Control Plane doesn't use NOTIFY — it manages everything through container environment variables. 
+ +**This is by design.** The Control Plane's model is declarative: the spec is the source of truth, and the container is replaced when the spec changes. Runtime config patching via NOTIFY would create a second control path outside the spec, leading to drift. Accept the redeploy cost. + +### Summary of Open Items + +| Gap | Severity | Recommendation | +|-----|----------|----------------| +| NOINHERIT per service type | Medium | Add conditional in `ServiceUserRole.Create()` now; abstract later if a third service needs it | +| No primary-awareness | High | Multi-host PGHOST (Option B) is the default recommendation. Consider Patroni routing (Option A) if pgBouncer is planned. Blocks MCP `allow_writes` and PostgREST failover. | +| Schema setup automation | Low | Document prerequisites. Add preflight validation if time permits. | +| GRANT anon role to authenticator | High | Must be implemented in `ServiceUserRole.Create()` for PostgREST. | +| Early version constraint check | Low | Nice-to-have. Current async error is acceptable for v1. | +| Config update without redeploy | None | By design. No action needed. |