This project uses several Azure resources. Some are created and deleted at runtime, whilst others have to be created manually. This section gives a brief walkthrough of the resources that have to be configured manually and how to do so.
Note
To ensure fair benchmarks, set up all resources in Norway East.
Start by creating a resource group named doppa. Ensure that you can configure Kubernetes and Databricks with your
current subscription and roles.
Blob storage is an essential part of this benchmarking framework. Everything from benchmarking results to the actual
datasets is stored here. Create a storage account named doppablobstorage. There is no need to create the containers
as these are created during runtime. Each container is created with the Container access level. If you wish to make
this stricter, change the following call in the ensure_container function in BlobStorageService:

```python
# Public container access
self.__blob_storage_context.create_container(container_name.value, public_access=PublicAccess.CONTAINER)
```

to blob-level access, where individual blobs are readable but the container contents cannot be listed:

```python
# Blob-level container access
self.__blob_storage_context.create_container(container_name.value, public_access=PublicAccess.BLOB)
```

For a fully private container, pass public_access=None instead.

To provide the correct access to Azure resources when running the scripts from GitHub Actions, a user-assigned managed
identity (UAMI) has to be configured. The Actions sign in to Azure and execute the scripts using the UAMI. Create a UAMI named
github-actions-ci and navigate to the Federated credentials setting. Create two federated credentials with the
following setup:
Change the fields according to your setup.
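As an illustration of what such a credential usually contains (the exact values depend on your fork and branch; the repository path here is taken from the clone URL later in this guide, and the main branch is an assumption), a GitHub Actions federated credential uses the GitHub OIDC issuer and a repository-scoped subject:

```json
{
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:kartAI/doppa-data:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}
```

A second credential commonly uses the subject repo:kartAI/doppa-data:pull_request so that workflows triggered by pull requests can also sign in.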
The next step is to give the UAMI the Contributor role on the resource group. Navigate to the Azure role assignments
setting and press Add role assignment. Select the scope Resource group and then the resource group doppa. Pick the
role Contributor and press Save.
Create a container registry named doppaacr. The Docker images will be stored here. To ensure that the Actions are able
to pull the images, give the UAMI created in the previous step the AcrPull role. In the doppaacr resource, navigate to
Access control (IAM) and press Add > Add role assignment. Select the role AcrPull and continue. On the next
screen select Managed identity under Assign access to, and select the github-actions-ci UAMI under Members.
Navigate to the last step and press Create.
Create an Azure Database for PostgreSQL with the following configuration:

Under Basics:
- Server name: doppa-data
- Region: Norway East
- Workload type: Production
- Compute + Storage: Disable Geo-Redundancy and leave everything else as is
- Zonal resiliency: Disabled
- Authentication method: PostgreSQL authentication only

Under Networking:
- Firewall rules: Check the box Allow public access from any Azure service within Azure to this server.
- Add your current IP address to the firewall rules.

Navigate to Review + create and create the resource.
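Application code can later assemble its connection settings from the same POSTGRES_* values configured further down in this guide. A minimal sketch, assuming the standard <server>.postgres.database.azure.com host pattern and TLS (which Azure Database for PostgreSQL enforces by default) — the repository's actual connection code may differ:

```python
import os

def build_postgres_dsn(server: str = "doppa-data", database: str = "postgres") -> str:
    """Build a libpq-style DSN from the POSTGRES_* environment variables."""
    user = os.environ["POSTGRES_USERNAME"]
    password = os.environ["POSTGRES_PASSWORD"]
    # Azure flexible servers are reachable at <server>.postgres.database.azure.com.
    host = f"{server}.postgres.database.azure.com"
    # Azure requires TLS, hence sslmode=require.
    return (
        f"host={host} port=5432 dbname={database} "
        f"user={user} password={password} sslmode=require"
    )
```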
Create a web app for containers. The process is the same for each of the following API servers:
- doppa-vmt

Under Basics:
- Resource group: doppa
- Name: <name-from-list-above>
- Publish: Container
- Operating system: Linux
- Pricing plan: Premium V4 P0V4

Under Container:
- Image source: Azure Container Registry
- Registry: doppaacr
- Authentication: Managed identity
- Identity: github-actions-ci
- Image: <select the image that matches the name>
- Tag: latest
- Startup command: uvicorn src.presentation.endpoints.<API server script>:app --host 0.0.0.0 --port 8000

Navigate to Review + create and create the resource. Repeat this process for each name in the list.
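The startup command above points uvicorn at an ASGI callable named app inside the given module. The repository's real endpoint modules are more involved than this; purely as a hypothetical stand-in, the smallest thing uvicorn can serve looks like:

```python
# Hypothetical stand-in for src/presentation/endpoints/<API server script>.py:
# uvicorn imports the callable named `app` from that module and serves it.
async def app(scope, receive, send):
    """Minimal ASGI application that answers every HTTP request with 200 OK."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"ok"})
```

Any module exposing such a callable works with the same startup command, which is why only the module path changes between the API servers.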
In your repository, navigate to Secrets and variables under Settings. Add the following secrets:
- ACR_NAME
- ACR_PASSWORD
- ACR_USERNAME
- AZURE_BLOB_STORAGE_CONNECTION_STRING
- POSTGRES_USERNAME
- POSTGRES_PASSWORD

and add the following variables:
- ACR_LOGIN_SERVER
- AZURE_BLOB_STORAGE_BENCHMARK_CONTAINER
- AZURE_BLOB_STORAGE_METADATA_CONTAINER
- AZURE_CLIENT_ID
- AZURE_RESOURCE_GROUP
- AZURE_SUBSCRIPTION_ID
- AZURE_TENANT_ID
These values can be found under the Azure resources previously created. The workflows should now work!
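When running the scripts outside of Actions it is easy to miss one of these values. A hypothetical helper (not part of the repository) that checks every name from the two lists above against the local environment:

```python
import os

# Names copied from the secrets and variables lists above.
REQUIRED_SECRETS = [
    "ACR_NAME", "ACR_PASSWORD", "ACR_USERNAME",
    "AZURE_BLOB_STORAGE_CONNECTION_STRING",
    "POSTGRES_USERNAME", "POSTGRES_PASSWORD",
]
REQUIRED_VARIABLES = [
    "ACR_LOGIN_SERVER",
    "AZURE_BLOB_STORAGE_BENCHMARK_CONTAINER",
    "AZURE_BLOB_STORAGE_METADATA_CONTAINER",
    "AZURE_CLIENT_ID", "AZURE_RESOURCE_GROUP",
    "AZURE_SUBSCRIPTION_ID", "AZURE_TENANT_ID",
]

def missing_settings() -> list[str]:
    """Return the names of any required settings absent from the environment."""
    return [n for n in REQUIRED_SECRETS + REQUIRED_VARIABLES if n not in os.environ]
```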
Note
This does not run fully locally, so ensure that all the Azure resources have been configured.
Clone the repository from GitHub and navigate to the project root.
```shell
git clone https://github.com/kartAI/doppa-data.git
cd doppa-data
```

Create a virtual environment and install the dependencies from the requirements file.

```shell
python -m venv venv              # Create virtual environment
./venv/Scripts/activate          # Activate venv (Windows; use source venv/bin/activate on Unix)
pip install -r requirements.txt  # Install dependencies
```

Add the following .env file to the project root directory. Swap out the values enclosed by <> with the actual
secrets. The containers dev-benchmarks and dev-metadata ensure that results from the test runs do not disrupt
results from actual runs.
```
AZURE_BLOB_STORAGE_CONNECTION_STRING=<azure-blob-storage-connection-string>
AZURE_BLOB_STORAGE_BENCHMARK_CONTAINER=dev-benchmarks
AZURE_BLOB_STORAGE_METADATA_CONTAINER=dev-metadata
ACR_LOGIN_SERVER=<azure-container-registry-login-server>
ACR_USERNAME=<azure-container-registry-username>
ACR_PASSWORD=<azure-container-registry-password>
POSTGRES_USERNAME=<postgres-username>
POSTGRES_PASSWORD=<postgres-password>
```

To run the entire script, run python main.py or python -m main. To run a single benchmark, run
python benchmark_runner.py --script-id <script-id> --benchmark-run <int >= 1> --run-id <run-id>. See the table
below for more information about --script-id, --benchmark-run and --run-id.
| Flag | Format / Pattern | Meaning |
|---|---|---|
| --script-id | <query-type>-<service> | Identifies which query is being executed. <query-type> examples: db-scan, bbox-filtering. <service> examples: blob-storage, postgis. |
| --benchmark-run | int | Identifies which iteration of the benchmark is currently running; used to run the benchmarks on multiple container instances. |
| --run-id | <current-date>-<random-id> | Identifies a benchmark run. Shared across all queries in a single orchestrated run. Date format: yyyy-mm-dd; random ID: 6-character uppercase alphanumeric. |

