- Add LakeFormationConfig class to configure Lake Formation governance on offline stores
- Implement FeatureGroup subclass with Lake Formation integration capabilities
- Add helper methods for S3 URI/ARN conversion and Lake Formation role management
- Add S3 deny policy generation for Lake Formation access control
- Implement Lake Formation resource registration and S3 bucket policy setup
- Add integration tests for Lake Formation feature store workflows
- Add unit tests for Lake Formation configuration and policy generation
- Update feature_store module exports to include FeatureGroup and LakeFormationConfig
- Update API documentation to include Feature Store section in sagemaker_mlops.rst
- Enable fine-grained access control for feature store offline stores using AWS Lake Formation
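As an illustration of the S3 URI/ARN conversion mentioned in the list above, here is a minimal sketch. The function name `s3_uri_to_arn` and its `partition` parameter are assumptions made for this example and are not taken from the PR's code; only the `arn:<partition>:s3:::<bucket>/<key>` format is standard.

```python
from urllib.parse import urlparse


def s3_uri_to_arn(s3_uri: str, partition: str = "aws") -> str:
    """Convert s3://bucket/prefix into arn:<partition>:s3:::bucket/prefix.

    Hypothetical helper for illustration; the PR's real helper may differ.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    # Strip leading/trailing slashes so the ARN has no empty path segments.
    key = parsed.path.strip("/")
    resource = f"{parsed.netloc}/{key}" if key else parsed.netloc
    # S3 ARNs carry no region or account ID, hence the three colons.
    return f"arn:{partition}:s3:::{resource}"
```

Passing the partition explicitly keeps the helper usable in partitions such as `aws-cn` or `aws-us-gov` without hardcoding `aws`.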
disable_hybrid_access_mode: bool = True

class FeatureGroup(CoreFeatureGroup):
Users would find it confusing to choose between sagemaker.core.FeatureGroup and sagemaker.mlops.FeatureGroup.
Can this be a utility, or be called something else?
The v3 sagemaker.mlops package already re-exports FeatureGroup from sagemaker-core (ref).
This class is meant as an extension: it adds an enable_lake_formation method and updates the create method so that Lake Formation can be enabled during creation.
Re-exporting is fine.
The class is a sagemaker-core resource class that does basic CRUD operations, exactly mimicking boto.
If we introduce this change, the behavior of core and mlops will differ.
Why does this class need to be called FeatureGroup?
The idea is that it should be a drop-in replacement for the one in sagemaker-core, just with extra functionality, so that the user can enable Lake Formation during feature group creation instead of creating and then enabling. I was thinking this might be better UX. What do you suggest: renaming the class, or moving this to a utility function? I'm open to both, but I'm not sure what it could be renamed to.
I think we can go with something like GovernedFeatureGroup or ManagedFeatureGroup.
We can discuss internally.
return (
    f"arn:{partition}:iam::{account_id}:role/aws-service-role/"
    f"lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
)
Is this the string that gets created by default?
Can this be obtained from an AWS library instead of being hardcoded?
If the role name ever changes, this code would break.
Yes, Lake Formation creates it by default (ref).
I didn't find any functionality in botocore to look up that role, so it is hardcoded.
The role name is given in the Lake Formation public docs, so I think it's unlikely to change.
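For reference, the hardcoded ARN discussed above could be isolated in one small helper so that any future change to the role name touches a single place. This is only a sketch: the function name `lake_formation_service_role_arn` and the account ID validation are assumptions for illustration, while the ARN layout is copied from the snippet under review.

```python
def lake_formation_service_role_arn(account_id: str, partition: str = "aws") -> str:
    """Build the ARN of the Lake Formation service-linked role.

    Hypothetical helper; the role path and name mirror the reviewed snippet.
    """
    # AWS account IDs are 12-digit strings; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"Invalid AWS account ID: {account_id!r}")
    return (
        f"arn:{partition}:iam::{account_id}:role/aws-service-role/"
        "lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
    )
```

Keeping `partition` as a parameter avoids assuming the commercial `aws` partition.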
Replace 10 bare print() calls with a single logger.info() call for the S3 deny policy output in enable_lake_formation(). This makes the policy display consistent with the rest of the LF workflow, which uses logger. Update 12 tests to mock the logger instead of builtins.print.
---
X-AI-Prompt: replace print with logger.info for s3 bucket policy display in enable_lake_formation
X-AI-Tool: kiro-cli
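The pattern this commit describes, emitting the whole policy through one logger call instead of several print() calls, might look roughly like the sketch below. The function name `log_bucket_policy` and the message wording are assumptions for illustration; the actual code in enable_lake_formation() is not shown in this conversation.

```python
import json
import logging

logger = logging.getLogger(__name__)


def log_bucket_policy(policy: dict) -> None:
    """Log an S3 bucket policy as pretty-printed JSON in a single record.

    Hypothetical helper; the name and message text are illustrative only.
    """
    # One logger.info() call keeps the policy in a single log record,
    # and lazy %s formatting skips interpolation when INFO is disabled.
    logger.info("Suggested S3 bucket policy:\n%s", json.dumps(policy, indent=2))
```

A test can then patch `logger.info` (for example with `unittest.mock.patch.object`) and inspect its call arguments, rather than patching `builtins.print`.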
Description
This PR adds Lake Formation integration to SageMaker Feature Store, enabling customers to govern access to their offline store data through AWS Lake Formation instead of relying solely on IAM policies.
This simplifies the manual process described in this blog post: https://aws.amazon.com/blogs/machine-learning/control-access-to-amazon-sagemaker-feature-store-offline-using-aws-lake-formation/
New Features
- FeatureGroup.create(): added a new lake_formation_config parameter
- FeatureGroup.enable_lake_formation(): new method
Usage
Enable at creation:
Enable on existing Feature Group:
Testing
Notes
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.