Fix #5627: SFT example notebook references inaccessible S3 dataset URI by JiwaniZakir · Pull Request #5704 · aws/sagemaker-python-sdk

Closes #5627

Motivation

The SFT finetuning example notebook hardcoded an internal S3 URI (s3://mc-flows-sdk-testing/...) that external users cannot access, causing an immediate 403 Forbidden error when running the dataset registration cell.

Changes

File: v3-examples/model-customization-examples/sft_finetuning_example_notebook_pysdk_prod_v3.ipynb

  • Dataset registration cell (~line 85): Replaced the hardcoded s3://mc-flows-sdk-testing/input_data/sft/sample_data_256_final.jsonl source with a named placeholder variable MY_DATASET_S3_URI = "s3://<your-bucket>/<path-to-your-dataset>.jsonl" marked with a # TODO comment. Added an explanatory comment block describing the required JSONL format (prompt/completion fields per line) and linking to the SageMaker SFT documentation.
  • Training job cell (~line 169): Replaced s3_output_path="s3://mc-flows-sdk-testing/output/" with "s3://<your-bucket>/output/" and a # TODO comment.
  • Second training job cell (~line 384): Same s3_output_path substitution as above.
  • Nova training job cell (~line 445): Replaced s3_output_path="s3://mc-flows-sdk-testing-us-east-1/output/" with the same placeholder pattern.
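The placeholder pattern and the JSONL format described above could be guarded by a small pre-flight check before the registration cell runs. This is a minimal sketch, not code from the notebook: the helper names `has_unfilled_placeholder` and `invalid_jsonl_lines` are hypothetical, and the prompt/completion schema is taken from this description, so check it against the SageMaker SFT documentation.

```python
import json

# TODO: replace with your own dataset location (placeholder value from this PR).
MY_DATASET_S3_URI = "s3://<your-bucket>/<path-to-your-dataset>.jsonl"


def has_unfilled_placeholder(uri: str) -> bool:
    """Return True if the URI still contains an angle-bracket placeholder."""
    return "<" in uri or ">" in uri


def invalid_jsonl_lines(text: str, required=("prompt", "completion")) -> list[int]:
    """Return 1-based line numbers that are not JSON objects with the required fields."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(record, dict) or any(key not in record for key in required):
            bad.append(number)
    return bad


# Hypothetical sample data: the second record lacks a "completion" field.
sample = (
    '{"prompt": "Translate to French: hello", "completion": "bonjour"}\n'
    '{"prompt": "missing completion"}\n'
)

if has_unfilled_placeholder(MY_DATASET_S3_URI):
    print("Set MY_DATASET_S3_URI to a bucket you own before registering the dataset.")
print("Invalid lines in sample:", invalid_jsonl_lines(sample))
```

Running the check locally on the dataset file before uploading it would surface format problems earlier than a failed training job.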

Testing

Manual verification: open the notebook and confirm that no cell references mc-flows-sdk-testing; all four occurrences are replaced with <your-bucket> placeholders. A user following the notebook now sees the TODO markers before executing any cell that requires S3 access, so the placeholders can be filled in before the 403 error would occur. After substituting a valid bucket and a JSONL file with prompt/completion fields, the notebook runs end-to-end.
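The manual check could also be scripted. The sketch below relies only on the fact that an .ipynb file is JSON with a top-level "cells" list whose "source" entries are strings or lists of strings. The function names are hypothetical, and it inspects cell sources only, not stored outputs.

```python
import json


def cells_referencing(notebook: dict, needle: str) -> list[int]:
    """Return 0-based indices of cells whose source contains `needle`."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)  # nbformat may store source as a list of lines
        if needle in source:
            hits.append(index)
    return hits


def check_notebook_file(path: str, needle: str = "mc-flows-sdk-testing") -> list[int]:
    """Load a notebook from disk and return the indices of offending cells."""
    with open(path, encoding="utf-8") as handle:
        return cells_referencing(json.load(handle), needle)


# In-memory example notebook (hypothetical content, for illustration only).
example = {
    "cells": [
        {"cell_type": "code", "source": ['s3_output_path = "s3://<your-bucket>/output/"']},
        {"cell_type": "code", "source": 'uri = "s3://mc-flows-sdk-testing/output/"'},
    ]
}
print(cells_referencing(example, "mc-flows-sdk-testing"))
```

For the notebook changed here, `check_notebook_file` would be called with the path listed under Changes, and an empty list would indicate that no cell source still references the internal bucket.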