
chore(deps): update dependency datasets to v4.6.1 #82

Open
red-hat-konflux-kflux-prd-rh02[bot] wants to merge 1 commit into main from
konflux/mintmaker/main/datasets-4.x

@red-hat-konflux-kflux-prd-rh02 red-hat-konflux-kflux-prd-rh02 bot commented Feb 7, 2026

This PR contains the following updates:

Package    Change
datasets   ==4.4.1 -> ==4.6.1

Release Notes

huggingface/datasets (datasets)

v4.6.1

Compare Source

Bug fix

Full Changelog: huggingface/datasets@4.6.0...4.6.1

v4.6.0

Compare Source

Dataset Features

  • Support Image, Video and Audio types in Lance datasets

    >>> from datasets import load_dataset
    >>> ds = load_dataset("lance-format/Openvid-1M", streaming=True, split="train")
    >>> ds.features
    {'video_blob': Video(),
     'video_path': Value('string'),
     'caption': Value('string'),
     'aesthetic_score': Value('float64'),
     'motion_score': Value('float64'),
     'temporal_consistency_score': Value('float64'),
     'camera_motion': Value('string'),
     'frame': Value('int64'),
     'fps': Value('float64'),
     'seconds': Value('float64'),
     'embedding': List(Value('float32'), length=1024)}
  • Push to hub now supports Video types

    >>> from datasets import Dataset, Video
    >>> ds = Dataset.from_dict({"video": ["path/to/video.mp4"]})
    >>> ds = ds.cast_column("video", Video())
    >>> ds.push_to_hub("username/my-video-dataset")
  • Write image/audio/video blobs as is in parquet (PLAIN) in push_to_hub() by @​lhoestq in #​7976

    • This enables cross-format Xet deduplication for image/audio/video data, e.g. deduplicating videos between Lance, WebDataset, Parquet files and plain video files, which makes uploads to and downloads from Hugging Face faster
    • E.g. if you convert a Lance video dataset to a Parquet video dataset on Hugging Face, the upload is much faster since the videos don't need to be re-uploaded. Under the hood, Xet storage reuses the binary chunks of the videos stored in Lance format for the videos in Parquet format
    • See more info here: https://huggingface.co/docs/hub/en/xet/deduplication
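
To illustrate why writing media blobs as-is matters for deduplication, here is a minimal toy sketch of content-defined chunking. It is not Xet's actual algorithm: the window size, the CRC-based boundary rule, and the two "containers" (random bytes wrapped in made-up headers and footers) are all assumptions chosen for illustration, and they are not real Lance or Parquet layouts.

```python
import hashlib
import random
import zlib

WINDOW = 16  # bytes of context used to decide a chunk boundary (assumed value)
MASK = 63    # cut when the low 6 bits are zero: ~64-byte chunks on average


def chunk_hashes(data: bytes) -> set:
    """Split data at content-defined boundaries and return the chunk digests."""
    hashes = set()
    start = 0
    for end in range(WINDOW, len(data) + 1):
        # The decision depends only on the last WINDOW bytes, so identical
        # byte runs get identical boundaries wherever they sit in a file.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            hashes.add(hashlib.sha256(data[start:end]).hexdigest())
            start = end
    if start < len(data):
        hashes.add(hashlib.sha256(data[start:]).hexdigest())
    return hashes


# Fake "video" bytes, stored unmodified inside two hypothetical containers.
rng = random.Random(0)
video = bytes(rng.getrandbits(8) for _ in range(20_000))
container_a = b"FORMAT-A-HEADER" + video + b"format-a-footer"
container_b = b"B" * 41 + video + b"format-b-footer"

shared = chunk_hashes(container_a) & chunk_hashes(container_b)
print(f"{len(shared)} chunks are shared between the two containers")
```

Because the blob bytes are identical in both containers, almost every chunk inside the blob hashes to the same digest, so a chunk-level store would only need to upload the few chunks that touch the differing headers and footers. If the blob were re-encoded or compressed differently per format, the byte runs would differ and this sharing would be lost.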


  • Add IterableDataset.reshard() by @​lhoestq in #​7992

    Reshard the dataset if possible, i.e. split the current shards further into more shards.
    This increases the number of shards and the resulting dataset has num_shards >= previous_num_shards.
    Equality may happen if no shard can be split further.

    The resharding mechanism depends on the dataset file format:

    • Parquet: shard per row group instead of per file
    • Other: not implemented yet (contributions are welcome!)

    >>> from datasets import load_dataset
    >>> ds = load_dataset("fancyzhx/amazon_polarity", split="train", streaming=True)
    >>> ds
    IterableDataset({
        features: ['label', 'title', 'content'],
        num_shards: 4
    })
    >>> ds.reshard()
    IterableDataset({
        features: ['label', 'title', 'content'],
        num_shards: 3600
    })
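
The jump from 4 to 3600 shards follows from sharding per row group instead of per file. The sketch below is a toy model of that idea only, not the library's implementation: `ParquetFile`, `shards_per_file`, and `shards_per_row_group` are hypothetical names, and the row group counts are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParquetFile:
    """Hypothetical stand-in for a Parquet file: a name and a row group count."""
    name: str
    num_row_groups: int


def shards_per_file(files):
    # Default layout in this toy model: one shard per file.
    return [(f.name, None) for f in files]


def shards_per_row_group(files):
    # Resharded layout: one shard per (file, row group) pair.
    # A file with a single row group cannot be split any further.
    return [(f.name, rg) for f in files for rg in range(f.num_row_groups)]


files = [
    ParquetFile("train-00000.parquet", 3),
    ParquetFile("train-00001.parquet", 1),
    ParquetFile("train-00002.parquet", 4),
]

before = shards_per_file(files)
after = shards_per_row_group(files)
print(len(before), len(after))  # 3 8

# Equality case: every file already has exactly one row group.
single = [ParquetFile("a.parquet", 1), ParquetFile("b.parquet", 1)]
print(len(shards_per_file(single)), len(shards_per_row_group(single)))  # 2 2
```

In this model the shard count can only grow or stay the same, which matches the num_shards >= previous_num_shards guarantee described above. More shards generally give more units of work to spread across parallel readers.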

What's Changed

New Contributors

Full Changelog: huggingface/datasets@4.5.0...4.6.0

v4.5.0

Compare Source

Dataset Features

  • Add lance format support by @​eddyxu in #​7913

    • Support for both Lance dataset (including metadata / manifests) and standalone .lance files
    • e.g. with lance-format/fineweb-edu
    from datasets import load_dataset
    
    ds = load_dataset("lance-format/fineweb-edu", streaming=True)
    for example in ds["train"]:
        ...

What's Changed

New Contributors

Full Changelog: huggingface/datasets@4.4.2...4.5.0

v4.4.2

Compare Source

Bug fixes

Minor additions

New Contributors

Full Changelog: huggingface/datasets@4.4.1...4.4.2


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

To execute skipped test pipelines write comment /ok-to-test.


Documentation

Find out how to configure dependency updates in MintMaker documentation or see all available configuration options in Renovate documentation.

@red-hat-konflux-kflux-prd-rh02 red-hat-konflux-kflux-prd-rh02 bot force-pushed the konflux/mintmaker/main/datasets-4.x branch from 4fc4035 to 3065cb0 Compare February 10, 2026 12:04
@red-hat-konflux-kflux-prd-rh02 red-hat-konflux-kflux-prd-rh02 bot force-pushed the konflux/mintmaker/main/datasets-4.x branch from 3065cb0 to c17f397 Compare February 25, 2026 12:05
@red-hat-konflux-kflux-prd-rh02 red-hat-konflux-kflux-prd-rh02 bot changed the title chore(deps): update dependency datasets to v4.5.0 chore(deps): update dependency datasets to v4.6.0 Feb 25, 2026
Signed-off-by: red-hat-konflux-kflux-prd-rh02 <190377777+red-hat-konflux-kflux-prd-rh02[bot]@users.noreply.github.com>
@red-hat-konflux-kflux-prd-rh02 red-hat-konflux-kflux-prd-rh02 bot force-pushed the konflux/mintmaker/main/datasets-4.x branch from c17f397 to c445402 Compare February 28, 2026 04:05
@red-hat-konflux-kflux-prd-rh02 red-hat-konflux-kflux-prd-rh02 bot changed the title chore(deps): update dependency datasets to v4.6.0 chore(deps): update dependency datasets to v4.6.1 Feb 28, 2026