Useful methods for working with httplib, completely decoupled from code specific to urllib3.
Timeout configuration.
Timeouts can be defined as a default for a pool:
import urllib3
timeout = urllib3.util.Timeout(connect=2.0, read=7.0)
http = urllib3.PoolManager(timeout=timeout)
resp = http.request("GET", "https://example.com/")
print(resp.status)
Or per-request (which overrides the default for the pool):
response = http.request("GET", "https://example.com/", timeout=Timeout(10))
Timeouts can be disabled by setting all the parameters to None:
no_timeout = Timeout(connect=None, read=None)
response = http.request("GET", "https://example.com/", timeout=no_timeout)
total (int, float, or None) – This combines the connect and read timeouts into one; the read timeout will be set to the time leftover from the connect attempt. In the event that both a connect timeout and a total are specified, or a read timeout and a total are specified, the shorter timeout will be applied.
Defaults to None.
connect (int, float, or None) – The maximum amount of time (in seconds) to wait for a connection attempt to a server to succeed. Omitting the parameter will default the connect timeout to the system default, probably the global default timeout in socket.py. None will set an infinite timeout for connection attempts.
read (int, float, or None) –
The maximum amount of time (in seconds) to wait between consecutive read operations for a response from the server. Omitting the parameter will default the read timeout to the system default, probably the global default timeout in socket.py. None will set an infinite timeout.
Note
Many factors can affect the total amount of time for urllib3 to return an HTTP response.
For example, Python’s DNS resolver does not obey the timeout specified on the socket. Other factors that can affect total request time include high CPU load, high swap, the program running at a low priority level, or other behaviors.
In addition, the read and total timeouts only measure the time between read operations on the socket connecting the client and the server, not the total amount of time for the request to return a complete response. For most requests, the timeout is raised because the server has not sent the first byte in the specified time. This is not always the case; if a server streams one byte every fifteen seconds, a timeout of 20 seconds will not trigger, even though the request will take several minutes to complete.
If your goal is to cut off any request after a set amount of wall clock time, consider having a second “watcher” thread to cut off a slow request.
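The interaction between connect and total can be seen without making any request; a short sketch that only constructs Timeout objects:

```python
from urllib3.util import Timeout

# total caps the whole attempt; the connect phase still gets at most 2 seconds
t = Timeout(connect=2.0, total=5.0)
print(t.connect_timeout)  # 2.0

# with only total set, the same cap applies to the connect phase
print(Timeout(total=3.0).connect_timeout)  # 3.0
```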
A sentinel object representing the default timeout value
Create a copy of the timeout object
Timeout properties are stored per-pool but each request needs a fresh Timeout object to ensure each one has its own start/stop configured.
a copy of the timeout object
Get the value to use when setting a connection timeout.
This will be a positive float or integer, the value None (never timeout), or the default system timeout.
Connect timeout.
int, float, Timeout.DEFAULT_TIMEOUT or None
Create a new Timeout from a legacy timeout value.
The timeout value used by httplib.py sets the same timeout on the connect() and recv() socket requests. This creates a Timeout object that sets the individual timeouts to the timeout value passed to this function.
timeout (integer, float, urllib3.util.Timeout.DEFAULT_TIMEOUT, or None) – The legacy timeout value.
Timeout object
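A short sketch of the conversion (no request involved):

```python
from urllib3.util import Timeout

# One legacy float becomes both individual timeouts
t = Timeout.from_float(4.0)
print(t.connect_timeout, t.read_timeout)  # 4.0 4.0
```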
Gets the time elapsed since the call to start_connect().
Elapsed time in seconds.
float
urllib3.exceptions.TimeoutStateError – if you attempt to get duration for a timer that hasn’t been started.
Get the value for the read timeout.
This assumes some time has elapsed in the connection timeout and computes the read timeout appropriately.
If self.total is set, the read timeout is dependent on the amount of time taken by the connect timeout. If the connection time has not been established, a TimeoutStateError will be raised.
Value to use for the read timeout.
int, float or None
urllib3.exceptions.TimeoutStateError – If start_connect() has not yet been called on this object.
Start the timeout clock, used during a connect() attempt
urllib3.exceptions.TimeoutStateError – if you attempt to start a timer that has been started already.
Alias for field number 2
Alias for field number 0
Alias for field number 4
Alias for field number 3
Alias for field number 1
Retry configuration.
Each retry attempt will create a new Retry object with updated values, so they can be safely reused.
Retries can be defined as a default for a pool:
retries = Retry(connect=5, read=2, redirect=5)
http = PoolManager(retries=retries)
response = http.request("GET", "https://example.com/")
Or per-request (which overrides the default for the pool):
response = http.request("GET", "https://example.com/", retries=Retry(10))
Retries can be disabled by passing False:
response = http.request("GET", "https://example.com/", retries=False)
Errors will be wrapped in MaxRetryError unless retries are disabled, in which case the causing exception will be raised.
total (int) – Total number of retries to allow. Takes precedence over other counts.
Set to None to remove this constraint and fall back on other counts.
Set to 0 to fail on the first retry.
Set to False to disable and imply raise_on_redirect=False.
connect (int) – How many connection-related errors to retry on.
These are errors raised before the request is sent to the remote server, which we assume has not triggered the server to process the request.
Set to 0 to fail on the first retry of this type.
read (int) – How many times to retry on read errors.
These errors are raised after the request was sent to the server, so the request may have side-effects.
Set to 0 to fail on the first retry of this type.
redirect (int) – How many redirects to perform. Limit this to avoid infinite redirect loops.
A redirect is an HTTP response with a status code 301, 302, 303, 307 or 308.
Set to 0 to fail on the first retry of this type.
Set to False to disable and imply raise_on_redirect=False.
status (int) – How many times to retry on bad status codes.
These are retries made on responses, where the status code matches status_forcelist.
Set to 0 to fail on the first retry of this type.
other (int) – How many times to retry on other errors.
Other errors are errors that are not connect, read, redirect or status errors. These errors might be raised after the request was sent to the server, so the request might have side-effects.
Set to 0 to fail on the first retry of this type.
If total is not set, it’s a good idea to set this to 0 to account for unexpected edge cases and avoid infinite retry loops.
allowed_methods (Collection) – Set of uppercased HTTP method verbs that we should retry on.
By default, we only retry on methods which are considered to be idempotent (multiple requests with the same parameters end with the same state). See Retry.DEFAULT_ALLOWED_METHODS.
Set to None to retry on any verb.
status_forcelist (Collection) – A set of integer HTTP status codes that we should force a retry on.
A retry is initiated if the request method is in allowed_methods and the response status code is in status_forcelist.
By default, this is disabled with None.
backoff_factor (float) – A backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay). urllib3 will sleep for:
{backoff factor} * (2 ** ({number of previous retries}))
seconds. If backoff_jitter is non-zero, this sleep is extended by:
random.uniform(0, {backoff jitter})
seconds. For example, if the backoff_factor is 0.1, then Retry.sleep() will sleep for [0.0s, 0.2s, 0.4s, 0.8s, …] between retries. No backoff will ever be longer than backoff_max.
By default, backoff is disabled (factor set to 0).
raise_on_redirect (bool) – Whether, if the number of redirects is exhausted, to raise a MaxRetryError, or to return a response with a response code in the 3xx range.
raise_on_status (bool) – Similar meaning to raise_on_redirect: whether we should raise an exception, or return a response, if the status falls in the status_forcelist range and retries have been exhausted.
history (tuple) – The history of the request encountered during each call to increment(). The list is in the order the requests occurred. Each list item is of class RequestHistory.
respect_retry_after_header (bool) – Whether to respect the Retry-After header on status codes defined as Retry.RETRY_AFTER_STATUS_CODES or not.
remove_headers_on_redirect (Collection) – Sequence of headers to remove from the request when a response indicating a redirect is returned before firing off the redirected request.
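Several of the parameters above can be combined; a sketch that only constructs and inspects a Retry object (the status codes and method set here are illustrative choices, not library defaults):

```python
from urllib3.util import Retry

retry = Retry(
    total=3,
    status_forcelist=[500, 502, 503],   # force retries on these responses
    allowed_methods=["GET", "HEAD"],    # only retry idempotent verbs
    raise_on_status=False,              # hand back the last response instead of raising
)

# A 503 on a GET is retryable under this policy; a POST is not.
print(retry.is_retry("GET", 503))   # True
print(retry.is_retry("POST", 503))  # False
```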
Default methods to be used for allowed_methods
Default maximum backoff time.
Default headers to be used for remove_headers_on_redirect
Default status codes to be used for status_forcelist
Backwards-compatibility for the old retries format.
Get the value of Retry-After in seconds.
Return a new Retry object with incremented retry counters.
response (BaseHTTPResponse) – A response object, or None, if the server did not return a response.
error (Exception) – An error encountered during the request, or None if the response was received successfully.
A new Retry object.
Is this method/status code retryable? (Based on allowlists and control variables such as the number of total retries to allow, whether to respect the Retry-After header, whether this header is present, and whether the returned status code is on the list of status codes to be retried on in the presence of the aforementioned header.)
Sleep between retry attempts.
This method will respect a server’s Retry-After response header and sleep the duration of the time requested. If that is not present, it will use an exponential backoff. By default, the backoff factor is 0 and this method will return immediately.
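The sleep schedule implied by the backoff formula can be sketched in plain arithmetic. This is an illustration, not urllib3’s internal code; the 120-second cap assumed here mirrors the default backoff_max:

```python
def backoff_schedule(factor: float, retries: int, backoff_max: float = 120.0) -> list[float]:
    # No sleep before the first retry; before retry n (n >= 2), sleep
    # factor * 2**(n - 1) seconds, capped at backoff_max.
    return [
        0.0 if n == 1 else min(backoff_max, factor * (2 ** (n - 1)))
        for n in range(1, retries + 1)
    ]

print(backoff_schedule(0.1, 4))  # [0.0, 0.2, 0.4, 0.8]
```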
Data structure for representing an HTTP URL. Used as a return value for parse_url(). Both the scheme and host are normalized as they are both case-insensitive according to RFC 3986.
Authority component as defined in RFC 3986 3.2. This includes userinfo (auth), host and port.
userinfo@host:port
For backwards-compatibility with urlparse. We’re nice like that.
Network location including host and port.
If you need the equivalent of urllib.parse’s netloc, use the authority property instead.
Absolute path including the query string.
Convert self into a url.
This function should more or less round-trip with parse_url(). The returned url may not be exactly the same as the url inputted to parse_url(), but it should be equivalent by the RFC (e.g., urls with a blank port will have : removed).
Example:
import urllib3
U = urllib3.util.parse_url("https://google.com/mail/")
print(U.url)
# "https://google.com/mail/"
print(
    urllib3.util.Url(
        "https", "username:password", "host.com", 80, "/path", "query", "fragment"
    ).url
)
# "https://username:password@host.com:80/path?query#fragment"
Given a url, return a parsed Url namedtuple. Best-effort is performed to parse incomplete urls. Fields not provided will be None.
This parser is RFC 3986 and RFC 6874 compliant.
The parser logic and helper functions are based heavily on work done in the rfc3986 module.
url (str) – URL to parse into a Url namedtuple.
Partly backwards-compatible with urllib.parse.
Example:
import urllib3
print(urllib3.util.parse_url('http://google.com/mail/'))
# Url(scheme='http', host='google.com', port=None, path='/mail/', ...)
print(urllib3.util.parse_url('google.com:80'))
# Url(scheme=None, host='google.com', port=80, path=None, ...)
print(urllib3.util.parse_url('/foo?bar'))
# Url(scheme=None, host=None, port=None, path='/foo', query='bar', ...)
Our embarrassingly-simple replacement for mimetools.choose_boundary.
Encode a dictionary of fields using the multipart/form-data MIME format.
fields – Dictionary of fields or list of (key, RequestField). Values are processed by urllib3.fields.RequestField.from_tuples().
boundary – If not specified, then a random boundary will be generated using urllib3.filepost.choose_boundary().
Iterate over fields.
Supports list of (k, v) tuples and dicts, and lists of RequestField.
A data container for request body parameters.
name – The name of this request field. Must be unicode.
data – The data/value body.
filename – An optional filename of the request field. Must be unicode.
headers – An optional dict-like object of headers to initially use for the field.
Changed in version 2.0.0: The header_formatter parameter is deprecated and will be removed in urllib3 v2.1.0.
Override this method to change how each multipart header parameter is formatted. By default, this calls format_multipart_header_param().
name – The name of the parameter, an ASCII-only str.
value – The value of the parameter, a str or UTF-8 encoded bytes.
A RequestField factory from old-style tuple parameters.
Supports constructing RequestField from parameter of key/value strings AND key/filetuple. A filetuple is a (filename, data, MIME type) tuple where the MIME type is optional.
For example:
{
    'foo': 'bar',
    'fakefile': ('foofile.txt', 'contents of foofile'),
    'realfile': ('barfile.txt', open('realfile').read()),
    'typedfile': ('bazfile.bin', open('bazfile').read(), 'image/jpeg'),
    'nonamefile': 'contents of nonamefile field',
}
Field names and filenames must be unicode.
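A sketch of the factory in action, using in-memory values instead of real files:

```python
from urllib3.fields import RequestField

# key/filetuple form, with an explicit MIME type
rf = RequestField.from_tuples("typedfile", ("bazfile.bin", b"\x00\x01", "image/jpeg"))
print(rf.headers["Content-Type"])                          # image/jpeg
print("bazfile.bin" in rf.headers["Content-Disposition"])  # True

# key/value form
simple = RequestField.from_tuples("foo", "bar")
print(simple.data)  # bar
```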
Makes this request field into a multipart request field.
This method sets the “Content-Disposition”, “Content-Type” and “Content-Location” headers on the request field, overriding any existing values.
content_disposition – The ‘Content-Disposition’ of the request body. Defaults to ‘form-data’
content_type – The ‘Content-Type’ of the request body.
content_location – The ‘Content-Location’ of the request body.
Deprecated since version 2.0.0: Renamed to format_multipart_header_param(). Will be removed in urllib3 v2.1.0.
Deprecated since version 2.0.0: Renamed to format_multipart_header_param(). Will be removed in urllib3 v2.1.0.
Helper function to format and quote a single header parameter using the strategy defined in RFC 2231.
Particularly useful for header parameters which might contain non-ASCII values, like file names. This follows RFC 2388 Section 4.4.
name – The name of the parameter, a string expected to be ASCII only.
value – The value of the parameter, provided as bytes or str.
An RFC-2231-formatted unicode string.
Deprecated since version 2.0.0: Will be removed in urllib3 v2.1.0. This is not valid for multipart/form-data header parameters.
Format and quote a single multipart header parameter.
This follows the WHATWG HTML Standard as of 2021/06/10, matching the behavior of current browser and curl versions. Values are assumed to be UTF-8. The \n, \r, and " characters are percent encoded.
name – The name of the parameter, an ASCII-only str.
value – The value of the parameter, a str or UTF-8 encoded bytes.
A string name="value" with the escaped value.
Changed in version 2.0.0: Matches the WHATWG HTML Standard as of 2021/06/10. Control characters are no longer percent encoded.
Changed in version 2.0.0: Renamed from format_header_param_html5 and format_header_param. The old names will be removed in urllib3 v2.1.0.
A convenience, top-level request method. It uses a module-global PoolManager instance. Therefore, its side effects could be shared across dependencies relying on it. To avoid side effects, create a new PoolManager instance and use it instead.
The method does not accept low-level **urlopen_kw keyword arguments.
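A sketch of the suggested alternative (no request is made here):

```python
import urllib3

# A dedicated manager keeps pools, timeouts, and retries local to your code
# instead of sharing the module-global instance with other libraries.
http = urllib3.PoolManager(
    timeout=urllib3.util.Timeout(connect=2.0, read=7.0),
    retries=urllib3.util.Retry(total=3),
)
# resp = http.request("GET", "https://example.com/")  # used just like urllib3.request()
```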
Alias for field number 0
Alias for field number 1
Takes the HTTP request method, body, and blocksize and transforms them into an iterable of chunks to pass to socket.sendall() and an optional ‘Content-Length’ header.
A ‘Content-Length’ of ‘None’ indicates that the length of the body can’t be determined, so ‘Transfer-Encoding: chunked’ should be used for framing instead.
Shortcuts for generating request headers.
keep_alive – If True, adds ‘connection: keep-alive’ header.
accept_encoding – Can be a boolean, list, or string.
True translates to ‘gzip,deflate’. If either the brotli or brotlicffi package is installed, ‘gzip,deflate,br’ is used instead.
List will get joined by comma.
String will be used as provided.
user_agent – String representing the user-agent you want, such as “python-urllib3/0.6”
basic_auth – Colon-separated username:password string for ‘authorization: basic …’ auth header.
proxy_basic_auth – Colon-separated username:password string for ‘proxy-authorization: basic …’ auth header.
disable_cache – If True, adds ‘cache-control: no-cache’ header.
Example:
import urllib3
print(urllib3.util.make_headers(keep_alive=True, user_agent="Batman/1.0"))
# {'connection': 'keep-alive', 'user-agent': 'Batman/1.0'}
print(urllib3.util.make_headers(accept_encoding=True))
# {'accept-encoding': 'gzip,deflate'}
Flush and close the IO object.
This method has no effect if the file is already closed.
Should we redirect and where to?
Truthy redirect location string if we got a redirect status code and valid location. None if redirect status and no location. False if not a redirect status code.
Parses the body of the HTTP response as JSON.
To use a custom JSON decoder, pass the result of HTTPResponse.data to the decoder.
This method can raise either UnicodeDecodeError or json.JSONDecodeError.
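A sketch of the custom-decoder pattern; the payload bytes below stand in for HTTPResponse.data from a real response:

```python
import json

payload = b'{"id": 1, "tags": ["a", "b"]}'  # stand-in for resp.data

# Equivalent of resp.json(), but with full control over decoding --
# here parse_int turns every integer into a float.
obj = json.loads(payload.decode("utf-8"), parse_int=float)
print(obj)  # {'id': 1.0, 'tags': ['a', 'b']}
```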
Memory-efficient bytes buffer
To return decoded data in read() and still follow the BufferedIOBase API, we need a buffer to always return the correct amount of bytes.
This buffer should be filled using calls to put()
Our maximum memory usage is determined by the sum of the size of:
self.buffer, which contains the full data
the largest chunk that we will copy in get()
The worst case scenario is a single chunk, in which case we’ll make a full copy of the data inside get().
HTTP Response container.
Backwards-compatible with http.client.HTTPResponse but the response body is loaded and decoded on-demand when the data property is accessed. This class is also compatible with the Python standard library’s io module, and can hence be treated as a readable object in the context of that framework.
Extra parameters for behaviour not present in http.client.HTTPResponse:
preload_content – If True, the response’s body will be preloaded during construction.
decode_content – If True, will attempt to decode the body based on the ‘content-encoding’ header.
original_response – When this HTTPResponse wrapper is generated from an http.client.HTTPResponse object, it’s convenient to include the original for debug purposes. It’s otherwise unused.
retries – The last Retry that was used during the request.
enforce_content_length – Enforce content length checking. Body returned by server must match value of Content-Length header, if present. Otherwise, raise error.
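An offline sketch of the on-demand loading, wrapping an in-memory file object directly where urllib3 would normally construct the response for you:

```python
import io
from urllib3.response import HTTPResponse

# Normally built by urllib3 from a live connection; BytesIO stands in here.
resp = HTTPResponse(body=io.BytesIO(b"hello world"), status=200, preload_content=False)
print(resp.read(5))  # b'hello'
print(resp.read())   # b' world' -- the rest of the body, read on demand
```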
Flush and close the IO object.
This method has no effect if the file is already closed.
Read and discard any remaining HTTP response data in the response connection.
Unread data in the HTTPResponse connection blocks the connection from being released back to the pool.
Returns underlying file descriptor if one exists.
OSError is raised if the IO object does not use a file descriptor.
Flush write buffers, if applicable.
This is not implemented for read-only and non-blocking streams.
Similar to http.client.HTTPResponse.read(), but with two additional parameters: decode_content and cache_content.
amt – How much of the content to read. If specified, caching is skipped because it doesn’t make sense to cache partial content as the full response.
decode_content – If True, will attempt to decode the body based on the ‘content-encoding’ header.
cache_content – If True, will save the returned data such that the same result is returned regardless of the state of the underlying file object. This is useful if you want the .data property to continue working after having .read() the file object. (Overridden if amt is set.)
Similar to HTTPResponse.read(), but with an additional parameter: decode_content.
amt – How much of the content to read. If specified, caching is skipped because it doesn’t make sense to cache partial content as the full response.
decode_content – If True, will attempt to decode the body based on the ‘content-encoding’ header.
Return whether object was opened for reading.
If False, read() will raise OSError.
A generator wrapper for the read() method. A call will block until amt bytes have been read from the connection or until the connection is closed.
amt – How much of the content to read. The generator will return up to this much data per iteration, but may return less. This is particularly likely when using compressed data. However, the empty string will never be returned.
decode_content – If True, will attempt to decode the body based on the ‘content-encoding’ header.
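An offline sketch of streaming in fixed-size chunks; an in-memory body stands in for the live connection that urllib3 would normally provide:

```python
import io
from urllib3.response import HTTPResponse

resp = HTTPResponse(body=io.BytesIO(b"0123456789" * 3), status=200, preload_content=False)

# Pull the body in 10-byte chunks instead of loading it all at once.
chunks = list(resp.stream(10))
print(chunks)  # [b'0123456789', b'0123456789', b'0123456789']
```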
Checks if the underlying file-like object looks like a http.client.HTTPResponse object. We do this by testing for the fp attribute. If it is present, we assume it returns raw chunks as processed by read_chunked().
Obtain the number of bytes pulled over the wire so far. May differ from the amount of content returned by urllib3.response.HTTPResponse.read() if bytes are encoded on the wire (e.g., compressed).
Returns the URL that was the source of this response. If the request that generated this response redirected, this method will return the final redirect location.
Checks if given fingerprint matches the supplied certificate.
cert – Certificate as bytes object.
fingerprint – Fingerprint as string of hexdigits, can be interspersed by colons.
Creates and configures an ssl.SSLContext instance for use with urllib3.
ssl_version – The desired protocol version to use. This will default to PROTOCOL_SSLv23 which will negotiate the highest protocol that both the server and your installation of OpenSSL support.
This parameter is deprecated; use ‘ssl_minimum_version’ instead.
ssl_minimum_version – The minimum version of TLS to be used. Use the ‘ssl.TLSVersion’ enum for specifying the value.
ssl_maximum_version – The maximum version of TLS to be used. Use the ‘ssl.TLSVersion’ enum for specifying the value. Not recommended to set to anything other than ‘ssl.TLSVersion.MAXIMUM_SUPPORTED’ which is the default value.
cert_reqs – Whether to require certificate verification. This defaults to ssl.CERT_REQUIRED.
options – Specific OpenSSL options. These default to ssl.OP_NO_SSLv2, ssl.OP_NO_SSLv3, ssl.OP_NO_COMPRESSION, and ssl.OP_NO_TICKET.
ciphers – Which cipher suites to allow the server to select. Defaults to either system configured ciphers if OpenSSL 1.1.1+, otherwise uses a secure default set of ciphers.
Constructed SSLContext object with specified options
SSLContext
Detects whether the hostname given is an IPv4 or IPv6 address. Also detects IPv6 addresses with Zone IDs.
hostname (str) – Hostname to examine.
True if the hostname is an IP address, False otherwise.
Resolves the argument to a numeric constant, which can be passed to the wrap_socket function/method from the ssl module. Defaults to ssl.CERT_REQUIRED.
If given a string, it is assumed to be the name of the constant in the ssl module or its abbreviation. (So you can specify REQUIRED instead of CERT_REQUIRED.) If it’s neither None nor a string, we assume it is already the numeric constant which can directly be passed to wrap_socket.
Like resolve_cert_reqs().
All arguments except for server_hostname, ssl_context, and ca_cert_dir have the same meaning as they do when using ssl.wrap_socket().
server_hostname – When SNI is supported, the expected hostname of the certificate
ssl_context – A pre-made SSLContext object. If none is provided, one will be created using create_urllib3_context().
ciphers – A string of ciphers we wish the client to support.
ca_cert_dir – A directory containing CA certificates in multiple separate files, as supported by OpenSSL’s -CApath flag or the capath argument to SSLContext.load_verify_locations().
key_password – Optional password if the keyfile is encrypted.
ca_cert_data – Optional string containing CA certificates in PEM format suitable for passing as the cadata parameter to SSLContext.load_verify_locations()
tls_in_tls – Use SSLTransport to wrap the existing socket.