Welcome to s3pathlib Documentation
s3pathlib is a Python package that offers an object-oriented programming (OOP) interface to work with AWS S3 objects and directories. Its API is designed to be similar to the standard library pathlib and is user-friendly. The package also supports versioning in AWS S3.
Quick Start
Import the library, declare an S3Path object
# import >>> from s3pathlib import S3Path # construct from string, auto join parts >>> p = S3Path("bucket", "folder", "file.txt") # construct from S3 URI works too >>> p = S3Path("s3://bucket/folder/file.txt") # construct from S3 ARN works too >>> p = S3Path("arn:aws:s3:::bucket/folder/file.txt") >>> p.bucket 'bucket' >>> p.key 'folder/file.txt' >>> p.uri 's3://bucket/folder/file.txt' >>> p.console_url # click to preview it in AWS console 'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt' >>> p.arn 'arn:aws:s3:::bucket/folder/file.txt'
Talk to AWS S3 and get some information
# s3pathlib maintains a "context" object that holds the AWS authentication information # you just need to build your own boto session object and attach to it >>> import boto3 >>> from s3pathlib import context >>> context.attach_boto_session( ... boto3.session.Session( ... region_name="us-east-1", ... profile_name="my_aws_profile", ... ) ... ) >>> p = S3Path("bucket", "folder", "file.txt") >>> p.write_text("a lot of data ...") >>> p.etag '3e20b77868d1a39a587e280b99cec4a8' >>> p.size 56789000 >>> p.size_for_human '51.16 MB' # folder works too, you just need to use a tailing "/" to identify that >>> p = S3Path("bucket", "datalake/") >>> p.count_objects() 7164 # number of files under this prefix >>> p.calculate_total_size() (7164, 236483701963) # 7164 objects, 220.24 GB >>> p.calculate_total_size(for_human=True) (7164, '220.24 GB') # 7164 objects, 220.24 GB
Manipulate Folder in S3
Native S3 Write API (those operation that change the state of S3) only operate on object level. And the list_objects API returns 1000 objects at a time. You need additional effort to manipulate objects recursively. s3pathlib CAN SAVE YOUR LIFE
# create a S3 folder >>> p = S3Path("bucket", "github", "repos", "my-repo/") # upload all python file from /my-github-repo to s3://bucket/github/repos/my-repo/ >>> p.upload_dir("/my-repo", pattern="**/*.py", overwrite=False) # copy entire s3 folder to another s3 folder >>> p2 = S3Path("bucket", "github", "repos", "another-repo/") >>> p1.copy_to(p2, overwrite=True) # delete all objects in the folder, recursively, to clean up your test bucket >>> p.delete() >>> p2.delete()
S3 Path Filter
Ever think of filter S3 object by it's attributes like: dirname, basename, file extension, etag, size, modified time? It is supposed to be simple in Python:
>>> s3bkt = S3Path("bucket") # assume you have a lots of files in this bucket >>> iterproxy = s3bkt.iter_objects().filter( ... S3Path.size >= 10_000_000, S3Path.ext == ".csv" # add filter ... ) >>> iterproxy.one() # fetch one S3Path('s3://bucket/larger-than-10MB-1.csv') >>> iterproxy.many(3) # fetch three [ S3Path('s3://bucket/larger-than-10MB-1.csv'), S3Path('s3://bucket/larger-than-10MB-2.csv'), S3Path('s3://bucket/larger-than-10MB-3.csv'), ] >>> for p in iterproxy: # iter the rest ... print(p)
File Like Object for Simple IO
S3Path is file-like object. It support open and context manager syntax out of the box. Here are only some highlight examples:
# Stream big file by line >>> p = S3Path("bucket", "log.txt") >>> with p.open("r") as f: ... for line in f: ... do what every you want # JSON io >>> import json >>> p = S3Path("bucket", "config.json") >>> with p.open("w") as f: ... json.dump({"password": "mypass"}, f) # pandas IO >>> import pandas as pd >>> p = S3Path("bucket", "dataset.csv") >>> df = pd.DataFrame(...) >>> with p.open("w") as f: ... df.to_csv(f)
Now that you have a basic understanding of s3pathlib, let's read the full document to explore its capabilities in greater depth.
Getting Help
Please use the python-s3pathlib tag on Stack Overflow to get help.
Submit a I want help issue tickets on GitHub Issues
Contributing
Please see the Contribution Guidelines.
Copyright
s3pathlib is an open source project. See the LICENSE file for more information.