docs: Add column overwrite example to batch mapping guide by Sanjaykumar030 · Pull Request #7737 · huggingface/datasets
This PR adds a complementary example showing the column-overwriting pattern, which is both more direct and more flexible for many transformations.
Proposed Change
The original remove_columns example remains untouched. Below it, this PR introduces an alternative approach that overwrites an existing column during batch mapping.
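This teaches users a core .map() capability: when the mapped function returns a key that matches an existing column, that column is replaced in the returned dataset, so no separate remove_columns step is needed.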
New Example:
>>> from datasets import Dataset
>>> dataset = Dataset.from_dict({"a": [0, 1, 2]})
>>> # Overwrite "a" directly to duplicate each value
>>> duplicated_dataset = dataset.map(
...     lambda batch: {"a": [x for x in batch["a"] for _ in range(2)]},
...     batched=True
... )
>>> duplicated_dataset
Dataset({
    features: ['a'],
    num_rows: 6
})
>>> duplicated_dataset["a"]
[0, 0, 1, 1, 2, 2]
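The lambda above changes the row count safely because "a" is the only column. With more columns, every column returned for a batch needs the same length. A minimal sketch of one way to keep them aligned, assuming a hypothetical helper named `duplicate_rows` (not part of this PR or the library) and a plain dict standing in for the batch that a batched `.map()` call passes in:

```python
def duplicate_rows(batch, times=2):
    """Repeat every row of a batch `times` times, column by column.

    Hypothetical helper for illustration only.
    """
    # Each column is a list of values. Repeating each element the same
    # number of times in every column keeps all columns the same length.
    return {
        name: [value for value in values for _ in range(times)]
        for name, values in batch.items()
    }


# A plain dict standing in for one batch of a two-column dataset.
batch = {"a": [0, 1, 2], "b": ["x", "y", "z"]}
result = duplicate_rows(batch)
print(result["a"])  # [0, 0, 1, 1, 2, 2]
print(result["b"])  # ['x', 'x', 'y', 'y', 'z', 'z']
```

With the library installed, the same function could presumably be passed as `dataset.map(duplicate_rows, batched=True)`. That call is not exercised here.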