[ENH] Simplified Publish API with Automatic Type Recognition #1554
Omswastik-11 wants to merge 26 commits into openml:main
fkiraly left a comment:
I get that this is still a draft, so here are some early comments.
- This works for flows only. I would recommend trying at least two different object types to see the dispatching challenge there.
- Do the extension checking inside publish and not in the usage example.
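A minimal sketch of the dispatching concern raised above, assuming a hypothetical converter registry. None of the names below (`register_converter`, `DemoFlow`, `DemoModel`, `_CONVERTERS`) come from openml or from this PR; they only illustrate keeping the extension lookup inside `publish` while handling more than one object type.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical registry of (predicate, converter) pairs; not an openml API.
_CONVERTERS: List[Tuple[Callable[[Any], bool], Callable[[Any], Any]]] = []


def register_converter(can_handle: Callable[[Any], bool],
                       convert: Callable[[Any], Any]) -> None:
    """Register a converter that turns a foreign object into a publishable one."""
    _CONVERTERS.append((can_handle, convert))


def publish(obj: Any) -> Any:
    """Publish obj, doing the extension lookup here rather than at the call site."""
    # Case 1: the object already knows how to publish itself.
    if callable(getattr(obj, "publish", None)):
        return obj.publish()
    # Case 2: ask each registered converter whether it can handle the object.
    for can_handle, convert in _CONVERTERS:
        if can_handle(obj):
            return convert(obj).publish()
    raise ValueError(
        f"No registered extension can publish objects of type {type(obj).__name__}"
    )


class DemoFlow:
    """Stand-in for an object that has its own publish() method."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.published = False

    def publish(self) -> "DemoFlow":
        self.published = True
        return self


class DemoModel:
    """Stand-in for a third-party estimator without a publish() method."""


# Illustrative registration: wrap DemoModel instances in a DemoFlow.
register_converter(lambda o: isinstance(o, DemoModel),
                   lambda o: DemoFlow(type(o).__name__))
```

With this shape, the caller only ever writes `publish(obj)`; supporting a second object type means registering another converter rather than changing the usage example.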
Thanks @fkiraly!!
The PR description is not entirely correct. This is how the interface looks currently:

    from openml_sklearn.extension import SklearnExtension
    from sklearn.neighbors import KNeighborsClassifier

    clf = KNeighborsClassifier(n_neighbors=3)
    extension = SklearnExtension()  # User instantiates the extension object
    knn_flow = extension.model_to_flow(clf)  # User manually converts the model (estimator instance) to an OpenMLFlow object
    knn_flow.publish()

But I like the idea of a unified
Thanks for the correction. I used the syntax from the example script. This unified publish was Franz's idea: https://github.com/gc-os-ai/openml-project-dev/issues/8
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1554 +/- ##
==========================================
- Coverage 52.70% 52.63% -0.07%
==========================================
Files 37 38 +1
Lines 4385 4404 +19
==========================================
+ Hits 2311 2318 +7
- Misses 2074 2086 +12
I have added some comments. I also feel we should not populate
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
openml/__init__.py:33
openml.publish(...) is used by the new tests/examples, but the package top-level does not expose a publish attribute (only the publishing submodule is imported). This will raise AttributeError for openml.publish(...). Re-export the function from openml.publishing (and add it to __all__), or update the call sites to use openml.publishing.publish(...) consistently.
    from . import (
        _api_calls,
        config,
        datasets,
        evaluations,
        exceptions,
        extensions,
        flows,
        publishing,
        runs,
        setups,
        study,
        tasks,
Co-authored-by: Armaghan Shakir <raoarmaghanshakir040@gmail.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
    # ## Upload the machine learning experiments to OpenML
    # First, create a flow and fill it with metadata about the machine learning model.
    #
    # ### Option A: Automatic publishing (simplified)
    # The publish function automatically detects the model type and creates the flow:

    # %%
    knn_flow = openml.flows.OpenMLFlow(
        # Metadata
        model=clf,  # or None, if you do not want to upload the model object.
        name="CustomKNeighborsClassifier",
        description="A custom KNeighborsClassifier flow for OpenML.",
        external_version=f"{sklearn.__version__}",
        language="English",
        tags=["openml_tutorial_knn"],
        dependencies=f"{sklearn.__version__}",
        # Hyperparameters
        parameters={k: str(v) for k, v in knn_parameters.items()},
        parameters_meta_info={
            "n_neighbors": {"description": "number of neighbors to use", "data_type": "int"}
        },
        # If you have a pipeline with subcomponents, such as preprocessing, add them here.
        components={},
    )
    knn_flow.publish()
    print(f"knn_flow was published with the ID {knn_flow.flow_id}")
    knn_flow = openml.publish(clf, tags=["openml_tutorial_knn"])
    print(f"Flow was auto-published with ID {knn_flow.flow_id}")
This tutorial now uses openml.publish(clf, ...) which requires an installed/registered scikit-learn extension (typically openml-sklearn). Since the script doesn’t import openml_sklearn or mention the dependency, users running the example without that extra will get a ValueError. Consider adding a short note (or an explicit import openml_sklearn # noqa: F401) near the top so the example is self-contained and the requirement is clear.
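One way the suggested note could look in the tutorial, as a hedged sketch: check whether the extension module named in the comment (`openml_sklearn`) is importable and tell the user up front. The helper name `extension_available` is hypothetical and only uses the standard library.

```python
import importlib.util


def extension_available(module_name: str = "openml_sklearn") -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if not extension_available():
    # Surface the requirement early instead of failing later inside publish.
    print(
        "Note: install the scikit-learn extension (openml-sklearn) "
        "before calling openml.publish(clf, ...)."
    )
```

An explicit `import openml_sklearn  # noqa: F401` at the top of the script, as the comment proposes, is the simpler option when a hard failure on a missing dependency is acceptable.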
openml/publishing.py (outdated)

    if tags and hasattr(obj, "tags"):
        existing = obj.tags or []
        if all(isinstance(tag, str) for tag in existing):
            obj.tags = list(dict.fromkeys([*existing, *tags]))
    if name is not None and hasattr(obj, "name"):
        obj.name = name
    return obj.publish()
tags is typed as Sequence[str], but at runtime passing a single string (e.g., tags="foo") will be treated as an iterable of characters and will silently add "f", "o", "o". It would be safer to validate that tags is not a str (and that all provided tags are strings) and raise a clear TypeError/ValueError when the input is invalid.
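A minimal sketch of the validation described in this comment, assuming a hypothetical helper `merge_tags` that is not part of the PR. It rejects a bare string, rejects non-string entries, and keeps the order-preserving de-duplication via `dict.fromkeys` used in the snippet under review.

```python
from typing import Iterable, List, Optional


def merge_tags(existing: Optional[Iterable[str]],
               tags: Optional[Iterable[str]]) -> List[str]:
    """Merge new tags into existing ones, preserving order and dropping duplicates."""
    # A bare string is iterable, so reject it before it is split into characters.
    if isinstance(tags, str):
        raise TypeError("tags must be a sequence of strings, not a single string")
    new_tags = list(tags or [])
    bad = [t for t in new_tags if not isinstance(t, str)]
    if bad:
        raise TypeError(f"all tags must be strings, got: {bad!r}")
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys([*(existing or []), *new_tags]))
```

With this helper, `merge_tags(["a"], "foo")` raises a `TypeError` instead of silently adding `"f"` and `"o"`.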