API

S3FileSystem(*args, **kwargs)

Access S3 as if it were a file system.

S3FileSystem.cat(path[, recursive, on_error])

Fetch (potentially multiple) paths' contents

S3FileSystem.du(path[, total, maxdepth])

Space used by files within a path

S3FileSystem.exists(path)

Is there a file at the given path

S3FileSystem.find(path[, maxdepth, ...])

List all files below path. Like posix find command without conditions.

S3FileSystem.get(rpath, lpath[, recursive, ...])

Copy file(s) to local.

S3FileSystem.glob(path, **kwargs)

Find files by glob-matching.

S3FileSystem.info(path, **kwargs)

Give details of entry at path

S3FileSystem.ls(path[, detail])

List objects at path.

S3FileSystem.mkdir(path[, acl, create_parents])

Create directory entry at path

S3FileSystem.mv(path1, path2[, recursive, ...])

Move file(s) from one location to another

S3FileSystem.open(path[, mode, block_size, ...])

Return a file-like object from the filesystem

S3FileSystem.put(lpath, rpath[, recursive, ...])

Copy file(s) from local.

S3FileSystem.read_block(fn, offset, length)

Read a block of bytes from a file

S3FileSystem.rm(path[, recursive, maxdepth])

Delete files.

S3FileSystem.tail(path[, size])

Get the last size bytes from file

S3FileSystem.touch(path[, truncate, data])

Create empty file or truncate

S3File(s3, path[, mode, block_size, acl, ...])

Open S3 key as a file.

S3File.close()

Close file

S3File.flush([force])

Write buffered data to backend store.

S3File.info()

File information about this path

S3File.read([length])

Return data from cache, or fetch pieces as necessary

S3File.seek(loc[, whence])

Set current file location

S3File.tell()

Current file location

S3File.write(data)

Write data to buffer.

S3Map(root, s3[, check, create])

Create a key-value (mapping) interface to a location on S3; a thin wrapper around fsspec's FSMap.

class s3fs.core.S3FileSystem(*args, **kwargs)[source]

Access S3 as if it were a file system.

This exposes a filesystem-like API (ls, cp, open, etc.) on top of S3 storage.

Provide credentials either explicitly (key=, secret=) or depend on boto’s credential methods. See botocore documentation for more information. If no credentials are available, use anon=True.

Parameters
anon: bool (False)

Whether to use anonymous connection (public buckets only). If False, uses the key/secret given, or boto's credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order)

key: string (None)

If not anonymous, use this access key ID, if specified

secret: string (None)

If not anonymous, use this secret access key, if specified

token: string (None)

If not anonymous, use this security token, if specified

use_ssl: bool (True)

Whether to use SSL in connections to S3; may be faster without, but insecure. If use_ssl is also set in client_kwargs, the value set in client_kwargs will take priority.

s3_additional_kwargs: dict

Parameters that are used when calling S3 API methods. Typically used for things like "ServerSideEncryption".

client_kwargs: dict

Parameters for the botocore client.

requester_pays: bool (False)

Whether to support RequesterPays buckets.

default_block_size: int (None)

If given, the default block size value used for open(), if no specific value is given at call time. The built-in default is 5MB.

default_fill_cache: bool (True)

Whether to use cache filling with open by default. Refer to S3File.open.

default_cache_type: string ('bytes')

If given, the default cache_type value used for open(). Set to “none” if no caching is desired. See fsspec’s documentation for other available cache_type values. Default cache_type is ‘bytes’.

version_aware: bool (False)

Whether to support bucket versioning. If enabled, this will require the user to have the necessary IAM permissions for dealing with versioned objects. Note that in the event that you only need to work with the latest version of objects in a versioned bucket, and do not need the VersionId for those objects, you should set version_aware to False for performance reasons. When set to True, filesystem instances will use the S3 ListObjectVersions API call to list directory contents, which requires listing all historical object versions.

cache_regions: bool (False)

Whether to cache bucket regions. Whenever a new bucket is used, the filesystem will first find out which region it belongs to and then use the client for that region.

asynchronous: bool (False)

Whether this instance is to be used from inside coroutines.

config_kwargs: dict

Parameters passed to botocore.client.Config.

kwargs: other parameters for the core session

session: aiobotocore AioSession object to be used for all connections

This session will be used in place of creating a new session inside S3FileSystem. For example: aiobotocore.session.AioSession(profile='test_user')

The following parameters are passed on to fsspec:
skip_instance_cache: to control reuse of instances
use_listings_cache, listings_expiry_time, max_paths: to control reuse of directory listings

Examples

>>> s3 = S3FileSystem(anon=False)  
>>> s3.ls('my-bucket/')  
['my-file.txt']
>>> with s3.open('my-bucket/my-file.txt', mode='rb') as f:  
...     print(f.read())  
b'Hello, world!'
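Credentials and a region can also be given explicitly; the values below are placeholders:

>>> s3 = S3FileSystem(key='<access-key-id>', secret='<secret-key>',
...                   client_kwargs={'region_name': 'eu-west-1'})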
Attributes
loop
s3
transaction

A context within which files are committed together upon exit
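A sketch (with placeholder paths) of deferring writes with the transaction context; both files are only committed when the block exits without error:

>>> with s3.transaction:
...     with s3.open('my-bucket/a.txt', 'wb') as f:
...         f.write(b'first')
...     with s3.open('my-bucket/b.txt', 'wb') as f:
...         f.write(b'second')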

Methods

cat(path[, recursive, on_error])

Fetch (potentially multiple) paths' contents

cat_file(path[, start, end])

Get the content of a file

checksum(path[, refresh])

Unique value for current version of file

chmod(path, acl, **kwargs)

Set Access Control on a bucket/key

clear_instance_cache()

Clear the cache of filesystem instances.

clear_multipart_uploads(bucket)

Remove any partial uploads in the bucket

connect([refresh, kwargs])

Establish S3 connection object.

copy(path1, path2[, recursive, on_error])

Copy within two locations in the filesystem

cp(path1, path2, **kwargs)

Alias of AbstractFileSystem.copy.

created(path)

Return the created timestamp of a file as a datetime.datetime

current()

Return the most recently instantiated FileSystem

delete(path[, recursive, maxdepth])

Alias of AbstractFileSystem.rm.

disk_usage(path[, total, maxdepth])

Alias of AbstractFileSystem.du.

download(rpath, lpath[, recursive])

Alias of AbstractFileSystem.get.

du(path[, total, maxdepth])

Space used by files within a path

end_transaction()

Finish write transaction, non-context version

expand_path(path[, recursive, maxdepth])

Turn one or more globs or directories into a list of all matching paths to files or directories.

find(path[, maxdepth, withdirs, detail, prefix])

List all files below path. Like posix find command without conditions.

from_json(blob)

Recreate a filesystem instance from JSON representation

get(rpath, lpath[, recursive, callback])

Copy file(s) to local.

get_delegated_s3pars([exp])

Get temporary credentials from STS, appropriate for sending across a network.

get_file(rpath, lpath[, callback, outfile])

Copy single remote file to local

get_mapper([root, check, create, ...])

Create key/value store based on this file-system

get_tags(path)

Retrieve tag key/values for the given path

getxattr(path, attr_name, **kwargs)

Get an attribute from the metadata.

glob(path, **kwargs)

Find files by glob-matching.

head(path[, size])

Get the first size bytes from file

info(path, **kwargs)

Give details of entry at path

invalidate_cache([path])

Discard any cached directory information

invalidate_region_cache()

Invalidate the region cache (associated with buckets) if cache_regions is turned on.

isfile(path)

Is this entry file-like?

lexists(path, **kwargs)

If there is a file at the given path (including broken links)

listdir(path[, detail])

Alias of AbstractFileSystem.ls.

ls(path[, detail])

List objects at path.

makedir(path[, create_parents])

Alias of AbstractFileSystem.mkdir.

merge(path, filelist, **kwargs)

Create single S3 file from list of S3 files

metadata(path[, refresh])

Return metadata of path.

mkdirs(path[, exist_ok])

Alias of AbstractFileSystem.makedirs.

modified(path[, version_id, refresh])

Return the last modified timestamp of file at path as a datetime

move(path1, path2, **kwargs)

Alias of AbstractFileSystem.mv.

mv(path1, path2[, recursive, maxdepth])

Move file(s) from one location to another

open(path[, mode, block_size, ...])

Return a file-like object from the filesystem

pipe(path[, value])

Put value into path

pipe_file(path, value, **kwargs)

Set the bytes of given file

put(lpath, rpath[, recursive, callback])

Copy file(s) from local.

put_file(lpath, rpath[, callback])

Copy single file to remote

put_tags(path, tags[, mode])

Set tags for given existing key

read_block(fn, offset, length[, delimiter])

Read a block of bytes from a file

read_bytes(path[, start, end])

Alias of AbstractFileSystem.cat_file.

read_text(path[, encoding, errors, newline])

Get the contents of the file as a string.

rename(path1, path2, **kwargs)

Alias of AbstractFileSystem.mv.

rm(path[, recursive, maxdepth])

Delete files.

rm_file(path)

Delete a file

set_session([refresh, kwargs])

Establish S3 connection object.

setxattr(path[, copy_kwargs])

Set metadata.

sign(path[, expiration])

Create a signed URL representing the given path

size(path)

Size in bytes of file

sizes(paths)

Size in bytes of each file in a list of paths

split_path(path)

Normalise S3 path string into bucket and key.

start_transaction()

Begin write transaction for deferring files, non-context version

stat(path, **kwargs)

Alias of AbstractFileSystem.info.

tail(path[, size])

Get the last size bytes from file

to_json()

JSON representation of this filesystem instance

touch(path[, truncate, data])

Create empty file or truncate

ukey(path)

Hash of file properties, to tell if it has changed

unstrip_protocol(name)

Format FS-specific path to generic, including protocol

upload(lpath, rpath[, recursive])

Alias of AbstractFileSystem.put.

url(path[, expires, client_method])

Generate presigned URL to access path by HTTP

walk(path[, maxdepth, topdown])

Return all files below path

write_bytes(path, value, **kwargs)

Alias of AbstractFileSystem.pipe_file.

write_text(path, value[, encoding, errors, ...])

Write the text to the given file.

call_s3

cat_ranges

close_session

cp_file

exists

get_s3

is_bucket_versioned

isdir

list_multipart_uploads

makedirs

mkdir

object_version_info

open_async

rmdir

checksum(path, refresh=False)

Unique value for current version of file

If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.

Parameters
path: string/bytes

path of file to get checksum for

refresh: bool (=False)

if False, look in local cache for file details first
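For illustration, with a placeholder path:

>>> before = s3.checksum('my-bucket/my-file.txt')
>>> after = s3.checksum('my-bucket/my-file.txt', refresh=True)
>>> before == after  # True implies the contents have not changed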

chmod(path, acl, **kwargs)

Set Access Control on a bucket/key

See http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl

Parameters
path: string

the object to set

acl: string

the value of ACL to apply
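For example, applying the canned "public-read" ACL to a placeholder key:

>>> s3.chmod('my-bucket/my-key', acl='public-read')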

clear_multipart_uploads(bucket)

Remove any partial uploads in the bucket

connect(refresh=False, kwargs={})

Establish S3 connection object.

Returns
Session to be closed later with await .close()

exists(path)

Is there a file at the given path

find(path, maxdepth=None, withdirs=None, detail=False, prefix='')

List all files below path. Like posix find command without conditions.

Parameters
path: str
maxdepth: int or None

If not None, the maximum number of levels to descend

withdirs: bool

Whether to include directory paths in the output. This is True when used by glob, but users usually only want files.

prefix: str

Only return files that match ^{path}/{prefix} (if there is an exact match filename == {path}/{prefix}, it also will be included)
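A short sketch with placeholder paths:

>>> s3.find('my-bucket/data', maxdepth=2)                   # files at most two levels down
>>> s3.find('my-bucket/data', prefix='2023-')               # only keys matching data/2023-*
>>> s3.find('my-bucket/data', withdirs=True, detail=True)   # include dirs, return info dicts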

get_delegated_s3pars(exp=3600)

Get temporary credentials from STS, appropriate for sending across a network. Only relevant where the key/secret were explicitly provided.

Parameters
exp: int

Time in seconds that credentials are good for

Returns
dict of parameters
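A sketch of passing the returned parameters to a new instance, e.g. on another machine; this only works if key/secret were given explicitly:

>>> pars = s3.get_delegated_s3pars(exp=900)  # temporary credentials valid for 15 minutes
>>> remote = S3FileSystem(**pars)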
get_tags(path)[source]

Retrieve tag key/values for the given path

Returns
{str: str}
getxattr(path, attr_name, **kwargs)

Get an attribute from the metadata.

Examples

>>> mys3fs.getxattr('mykey', 'attribute_1')  
'value_1'
invalidate_cache(path=None)[source]

Discard any cached directory information

Parameters
path: string or None

If None, clear all listings cached else listings at or under given path.

invalidate_region_cache()

Invalidate the region cache (associated with buckets) if cache_regions is turned on.

isdir(path)

Is this entry directory-like?

makedirs(path, exist_ok=False)

Recursively make directories

Creates directory at path and any intervening required directories. Raises exception if, for instance, the path already exists but is a file.

Parameters
path: str

leaf directory name

exist_ok: bool (False)

If False, will error if the target already exists

merge(path, filelist, **kwargs)

Create single S3 file from list of S3 files

Uses multi-part, no data is downloaded. The original files are not deleted.

Parameters
path: str

The final file to produce

filelist: list of str

The paths, in order, to assemble into the final file.
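A sketch with placeholder paths; note that S3 multipart copies generally require every input except the last to be at least 5 MB:

>>> s3.merge('my-bucket/combined.csv',
...          ['my-bucket/parts/part-0.csv', 'my-bucket/parts/part-1.csv'])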

metadata(path, refresh=False, **kwargs)

Return metadata of path.

Parameters
path: string/bytes

filename to get metadata for

refresh: bool (=False)

(ignored)

mkdir(path, acl='', create_parents=True, **kwargs)

Create directory entry at path

For systems that don't have true directories, may create an entry for this instance only and not touch the real filesystem

Parameters
path: str

location

create_parents: bool

if True, this is equivalent to makedirs

kwargs:

may be permissions, etc.
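For illustration (placeholder names): at the bucket level this creates a real bucket, while deeper paths only record an entry for this instance, as described above:

>>> s3.mkdir('my-new-bucket')              # creates the bucket on S3
>>> s3.mkdir('my-new-bucket/some/prefix')  # no object is written to S3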

modified(path, version_id=None, refresh=False)[source]

Return the last modified timestamp of file at path as a datetime

put_tags(path, tags, mode='o')[source]

Set tags for given existing key

Tags are a str:str mapping that can be attached to any key, see https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/allocation-tag-restrictions.html

This is similar to, but distinct from, key metadata, which is usually set at key creation time.

Parameters
path: str

Existing key to attach tags to

tags: dict str, str

Tags to apply.

mode:

One of 'o' or 'm'. 'o': will overwrite any existing tags. 'm': will merge in new tags with existing tags; incurs two remote calls.
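A sketch with a placeholder key:

>>> s3.put_tags('my-bucket/my-key', {'project': 'demo'}, mode='o')  # replace existing tags
>>> s3.put_tags('my-bucket/my-key', {'owner': 'team-a'}, mode='m')  # merge with existing tags
>>> s3.get_tags('my-bucket/my-key')
{'project': 'demo', 'owner': 'team-a'}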

rmdir(path)

Remove a directory, if empty

async set_session(refresh=False, kwargs={})[source]

Establish S3 connection object.

Returns
Session to be closed later with await .close()

setxattr(path, copy_kwargs=None, **kw_args)

Set metadata.

Attributes have to be of the form documented in the Metadata Reference (see the link below).

Parameters
kw_args: key-value pairs like field="value", where the values must be strings

Does not alter existing fields, unless the field appears here - if the value is None, delete the field.

copy_kwargs: dict, optional

dictionary of additional params to use for the underlying s3.copy_object.

Examples

>>> mys3file.setxattr(attribute_1='value1', attribute_2='value2')  
# Example for use with copy_kwargs
>>> mys3file.setxattr(copy_kwargs={'ContentType': 'application/pdf'},
...     attribute_1='value1')  

http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata

sign(path, expiration=100, **kwargs)[source]

Create a signed URL representing the given path

Some implementations allow temporary URLs to be generated, as a way of delegating credentials.

Parameters
path: str

The path on the filesystem

expiration: int

Number of seconds to enable the URL for (if supported)

Returns
URL: str

The signed URL

Raises
NotImplementedError: if method is not implemented for a filesystem
split_path(path) → Tuple[str, str, Optional[str]][source]

Normalise S3 path string into bucket and key.

Parameters
path: string

Input path, like s3://mybucket/path/to/file

Examples

>>> split_path("s3://mybucket/path/to/file")
['mybucket', 'path/to/file', None]
>>> split_path("s3://mybucket/path/to/versioned_file?versionId=some_version_id")
['mybucket', 'path/to/versioned_file', 'some_version_id']
touch(path, truncate=True, data=None, **kwargs)

Create empty file or truncate

url(path, expires=3600, client_method='get_object', **kwargs)

Generate presigned URL to access path by HTTP

Parameters
path: string

the key path we are interested in

expires: int

the number of seconds this signature will be good for.
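For example, to generate a link to a placeholder key valid for ten minutes:

>>> s3.url('my-bucket/my-file.txt', expires=600)  # time-limited https:// link to the object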

class s3fs.core.S3File(s3, path, mode='rb', block_size=5242880, acl='', version_id=None, fill_cache=True, s3_additional_kwargs=None, autocommit=True, cache_type='bytes', requester_pays=False, cache_options=None)[source]

Open S3 key as a file. Data is only loaded and cached on demand.

Parameters
s3: S3FileSystem

botocore connection

path: string

S3 bucket/key to access

mode: str

One of ‘rb’, ‘wb’, ‘ab’. These have the same meaning as they do for the built-in open function.

block_size: int

read-ahead size for finding delimiters

fill_cache: bool

If seeking to a new part of the file beyond the current buffer, with this True, the buffer will be filled between the sections to best support random access. When reading only a few specific chunks out of a file, performance may be better if False.

acl: str

Canned ACL to apply

version_id: str

Optional version to read the file at. If not specified this will default to the current version of the object. This is only used for reading.

requester_pays: bool (False)

Whether to support RequesterPays buckets.

See also

S3FileSystem.open

used to create S3File objects

Examples

>>> s3 = S3FileSystem()  
>>> with s3.open('my-bucket/my-file.txt', mode='rb') as f:  
...     ...  
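Writing works the same way (placeholder path); data is buffered locally and uploaded when the buffer exceeds block_size or the file is closed:

>>> with s3.open('my-bucket/new-file.txt', mode='wb') as f:  
...     f.write(b'some data')  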
Attributes
closed
details
full_name

Methods

close()

Close file

commit()

Move from temp to final destination

discard()

Throw away temporary file

fileno(/)

Returns underlying file descriptor if one exists.

flush([force])

Write buffered data to backend store.

getxattr(xattr_name, **kwargs)

Get an attribute from the metadata.

info()

File information about this path

isatty(/)

Return whether this is an 'interactive' stream.

metadata([refresh])

Return metadata of file.

read([length])

Return data from cache, or fetch pieces as necessary

readable()

Whether opened for reading

readinto(b)

mirrors builtin file's readinto method

readline()

Read until first occurrence of newline character

readlines()

Return all data, split by the newline character

readuntil([char, blocks])

Return data between current position and first occurrence of char

seek(loc[, whence])

Set current file location

seekable()

Whether is seekable (only in read mode)

setxattr([copy_kwargs])

Set metadata.

tell()

Current file location

truncate

Truncate file to size bytes.

url(**kwargs)

HTTP URL to read this file (if it already exists)

writable()

Whether opened for writing

write(data)

Write data to buffer.

writelines(lines, /)

Write a list of lines to stream.

readinto1

commit()[source]

Move from temp to final destination

discard()[source]

Throw away temporary file

getxattr(xattr_name, **kwargs)[source]

Get an attribute from the metadata. See getxattr().

Examples

>>> mys3file.getxattr('attribute_1')  
'value_1'
metadata(refresh=False, **kwargs)[source]

Return metadata of file. See metadata().

Metadata is cached unless refresh=True.

setxattr(copy_kwargs=None, **kwargs)[source]

Set metadata. See setxattr().

Examples

>>> mys3file.setxattr(attribute_1='value1', attribute_2='value2')  
url(**kwargs)[source]

HTTP URL to read this file (if it already exists)

s3fs.mapping.S3Map(root, s3, check=False, create=False)[source]

Create a key-value (mapping) interface to a location on S3; a thin wrapper around fsspec's FSMap.
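A minimal sketch (placeholder bucket) of using the resulting mapping as a key-value store:

>>> from s3fs import S3FileSystem, S3Map
>>> s3 = S3FileSystem()
>>> store = S3Map('my-bucket/my-store', s3=s3)
>>> store['x'] = b'123'
>>> store['x']
b'123'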

class s3fs.utils.ParamKwargsHelper(s3)[source]

Utility class to help extract the subset of keys that an s3 method is actually using

Parameters
s3: boto S3FileSystem

Methods

filter_dict

class s3fs.utils.SSEParams(server_side_encryption=None, sse_customer_algorithm=None, sse_customer_key=None, sse_kms_key_id=None)[source]

Methods

to_kwargs
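A sketch of combining the helper with s3_additional_kwargs so that writes are encrypted, assuming to_kwargs returns the corresponding S3 API keyword arguments; the KMS key alias is a placeholder:

>>> from s3fs import S3FileSystem
>>> from s3fs.utils import SSEParams
>>> sse = SSEParams(server_side_encryption='aws:kms',
...                 sse_kms_key_id='alias/my-key')
>>> s3 = S3FileSystem(s3_additional_kwargs=sse.to_kwargs())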