doppa: A Framework for Comparing Traditional & CNG Queries

Setup

Azure Resources

This project utilizes several Azure resources. Some are created and deleted during runtime, whilst others have to be created manually. This section will give a brief walkthrough on the resources that have to be configured and how to do so.

Note

To ensure fair benchmarks set up all resources in Norway East

Resource group

Start by creating a resource group named doppa. Ensure that you can configure Kubernetes and Databricks with your current subscription and roles.

Blob storage

Blob storage is an essential part of this benchmarking framework. Everything from benchmarking results to the actual datasets are stored here. Create a storage account named doppablobstorage. There is no need to create the containers as these are created during runtime. Each container is created with the Container access level. If you wish to make this stricter make the following changes in the ensure_container function in BlobStorageService.

# Public container access
self.__blob_storage_context.create_container(container_name.value, public_access=PublicAccess.CONTAINER)

# Private container access
self.__blob_storage_context.create_container(container_name.value, public_access=PublicAccess.BLOB)

User-Assigned Managed Identity (UAMI)

To provide the correct access to Azure resources when running the script from GitHub Actions a UAMI have to be configured. The Actions will sign in to Azure and executes the scripts using the UAMI. Create a UAMI named github-actions-ci and navigate to the Federated credentials setting. Create two federated credentials with the following setup:

Change the fields according to your setup.

The next step is to give the UAMI a Contributor in the resource group. Navigate to the Azure role assignments setting and press Add role assignment. Select the scope Resource group and then the resource group doppa. Pick the role Contributor and press Save.

Container registry

Create a container registry named doppaacr. The Docker images will be saved here. To ensure that the Actions are able to pull the images give the UAMI created in the last step a AcrPull role. In the doppaacr resource navigate to Access control (IAM) and press Add > Add role assignment. Select the role AcrPull and continue. On the next screen select Managed identity under Assign access to, and select the github-actions-ci UAMI under Members. Navigate to the last step and press create.

PostgreSQL database

Create an Azure database for PostgreSQL with the following configuration:

Under Basics:

Server name: doppa-data
Region: Norway East
Workload type: Production
Compute + Storage: Disable Geo-Redundancy and leave everything else as is
Zonal resiliency: Disabled
Authentication method: PostgreSQL authentication only

Under Networking:

Firewall rules: Check the box Allow public access from any Azure service within Azure to this server.
Add current IP address to Firewall rules

Navigate to Review and create and create the resource.

Web app for containers

Create a web app for containers The process is the same for each of the following API servers:

doppa-vmt

Under Basics:

Resource group: doppa
Name: <name-from-list-above>
Publish: Container
Operating system: Linux
Pricing plan: Premium V4 P0V4

Under Container:

Image source: Azure Container Registry
Registry: doppaacr
Authentication: Managed identity
Identity: github-actions-ci
Image: <select the image that matches with the name>
Tag: latest
Startup command uvicorn src.presentation.endpoints.<API server script>:app --host 0.0.0.0 --port 8000

Navigate to Review + create and create the resource. Repeat this process for each name in the list.

GitHub Actions

In your repository navigate to Secrets and variables under Settings. Add the following secrets:

ACR_NAME
ACR_PASSWORD
ACR_USERNAME
AZURE_BLOB_STORAGE_CONNECTION_STRING
POSTGRES_USERNAME
POSTGRES_PASSWORD

and add the following variables:

ACR_LOGIN_SERVER
AZURE_BLOB_STORAGE_BENCHMARK_CONTAINER
AZURE_BLOB_STORAGE_METADATA_CONTAINER
AZURE_CLIENT_ID
AZURE_RESOURCE_GROUP
AZURE_SUBSCRIPTION_ID
AZURE_TENANT_ID

These values can be found under the Azure resources previously created. The workflows should now work!

Local development

Note

This does not run fully locally, so ensure that all the Azure resources have been configured

Clone the repository from GitHub and navigate to the project root.

git clone https://github.com/kartAI/doppa-data.git
cd doppa-data

Create a virtual environment and install the dependencies in the requirements-file.

python -m venv venv                 # Create virtual environment
./venv/Scripts/activate             # Activate venv
pip install -r requirements.txt     # Install dependencies

Add the following .env file to the project root directory. Swap out the values enclosed by <> with the actual secrets. The containers dev-benchmarks and dev-metadata ensure that results from the test runs do not disrupt results from actual runs.

AZURE_BLOB_STORAGE_CONNECTION_STRING=<azure-blob-storage-connection-string>
AZURE_BLOB_STORAGE_BENCHMARK_CONTAINER=dev-benchmarks
AZURE_BLOB_STORAGE_METADATA_CONTAINER=dev-metadata

ACR_LOGIN_SERVER=<azure-container-registry-login-server>
ACR_USERNAME=<azure-container-registry-username>
ACR_PASSWORD=<azure-container-registry-password>

POSTGRES_USERNAME=<postgres-username>
POSTGRES_PASSWORD=<postgres-password>

To run the entire script simply run python main.py or python -m main and to run a single benchmark run python benchmark_runner.py --script-id <script-id> --benchmark-run <int >= 1> --run-id <run-id>. See the table below for more information about --script-id and --run-id.

Flag	Format / Pattern	Meaning
`--script-id`	`<query-type>-<service>`	Identifies which query is being executed. `<query-type>` examples: `db-scan`, `bbox-filtering`. `<service>` examples: `blob-storage`, `postgis`.
`--benchmark-run`	`int`	Identifier that tells which iteration of the benchmarking is currently running. This is to run the benchmarks on multiple container instances.
`--run-id`	`<current-date>-<random-id>`	Identifies a benchmark run. Shared across all queries in a single orchestrated run. Date format: `yyyy-mm-dd`; random ID: 6-character uppercase alphanumeric.

Name		Name	Last commit message	Last commit date
Latest commit History 514 Commits
.docker		.docker
.github		.github
k8s		k8s
logs		logs
resources		resources
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
benchmark_runner.py		benchmark_runner.py
benchmarks.yml		benchmarks.yml
docker-compose.yml		docker-compose.yml
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

doppa: A Framework for Comparing Traditional & CNG Queries

Table of contents

Setup

Azure Resources

Resource group

Blob storage

User-Assigned Managed Identity (UAMI)

Container registry

PostgreSQL database

Web app for containers

GitHub Actions

Local development

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

doppa: A Framework for Comparing Traditional & CNG Queries

Table of contents

Setup

Azure Resources

Resource group

Blob storage

User-Assigned Managed Identity (UAMI)

Container registry

PostgreSQL database

Web app for containers

GitHub Actions

Local development

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages