Fix #5627: SFT example notebook references inaccessible S3 dataset URI by JiwaniZakir · Pull Request #5704 · aws/sagemaker-python-sdk
Closes #5627
Motivation
The SFT finetuning example notebook hardcoded an internal S3 URI (s3://mc-flows-sdk-testing/...) that external users cannot access, causing an immediate 403 Forbidden error when running the dataset registration cell.
Changes
File: v3-examples/model-customization-examples/sft_finetuning_example_notebook_pysdk_prod_v3.ipynb
- Dataset registration cell (~line 85): Replaced the hardcoded `s3://mc-flows-sdk-testing/input_data/sft/sample_data_256_final.jsonl` source with a named placeholder variable `MY_DATASET_S3_URI = "s3://<your-bucket>/<path-to-your-dataset>.jsonl"` marked with a `# TODO` comment. Added an explanatory comment block describing the required JSONL format (`prompt`/`completion` fields per line) and linking to the SageMaker SFT documentation.
- Training job cell (~line 169): Replaced `s3_output_path="s3://mc-flows-sdk-testing/output/"` with `"s3://<your-bucket>/output/"` and a `# TODO` comment.
- Second training job cell (~line 384): Same `s3_output_path` substitution as above.
- Nova training job cell (~line 445): Replaced `s3_output_path="s3://mc-flows-sdk-testing-us-east-1/output/"` with the same placeholder pattern.
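For readers preparing their own dataset, the following is a minimal sketch of a local pre-flight check, assuming only what the comment block describes: one JSON object per line with `prompt` and `completion` fields. The helper names `find_invalid_records` and `is_placeholder` are hypothetical and are not part of the notebook or the SDK.

```python
import json

# Placeholder value introduced by this PR; replace it with your own bucket and key.
MY_DATASET_S3_URI = "s3://<your-bucket>/<path-to-your-dataset>.jsonl"

# Field names taken from the PR description of the expected JSONL format.
REQUIRED_FIELDS = ("prompt", "completion")


def is_placeholder(uri):
    """Return True if the URI still contains an unfilled <...> placeholder."""
    return "<" in uri or ">" in uri


def find_invalid_records(lines):
    """Return (line_number, reason) for every line that is not a usable record.

    Hypothetical helper: it only checks that each non-blank line parses as a
    JSON object containing the required fields.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((number, "missing " + ", ".join(missing)))
    return problems
```

Running `find_invalid_records(open("my_dataset.jsonl", encoding="utf-8"))` on a local copy before uploading it (the file name here is illustrative) surfaces malformed lines without needing any S3 access.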
Testing
Manual verification: open the notebook and confirm that no cell references mc-flows-sdk-testing; all four occurrences are replaced with <your-bucket> placeholders. A user following the notebook now sees the TODO markers before executing any cell that requires S3 access, which prevents the 403 error. After substituting a valid bucket and a JSONL file with prompt/completion fields, the notebook runs end-to-end.
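The manual check could also be scripted. The sketch below assumes the standard `.ipynb` layout (a top-level `cells` list whose entries carry a `source` string or list of strings); the function names are hypothetical and this script is not part of the PR.

```python
import json

# Substring shared by both internal bucket names mentioned in this PR.
INTERNAL_BUCKET_PREFIX = "mc-flows-sdk-testing"


def cells_referencing(notebook, needle=INTERNAL_BUCKET_PREFIX):
    """Return the indices of cells whose source text contains `needle`."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):
            # Notebook files usually store cell source as a list of lines.
            source = "".join(source)
        if needle in source:
            hits.append(index)
    return hits


def check_notebook_file(path):
    """Load a notebook from disk and return the indices of offending cells."""
    with open(path, encoding="utf-8") as handle:
        return cells_referencing(json.load(handle))
```

An empty list from `check_notebook_file(...)` on the edited notebook would correspond to the manual verification described above.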