The Python requests module is a well-established open source library for making HTTP requests that is widely used in web scraping and other fields. However, it has some limitations. At the time of writing, requests supports only HTTP/1.1, even though a significant fraction of sites support the more modern and faster HTTP/2 protocol, and it offers no support for asynchronous communication.
HTTPX is a newer, more modern Python module that addresses some of the limitations of the requests module (not to be confused with another httpx, which is a CLI tool for probing HTTP servers). It can be installed as httpx through pip. Furthermore, one can install the following optional dependencies:
- h2 for HTTP/2 support (through httpx[http2])
- socksio for SOCKS proxy support (through httpx[socks])
- click for the optional CLI tool (through httpx[cli])
- rich for making the optional CLI tool prettier
- brotli or brotlicffi for supporting Brotli compression (through httpx[brotli])
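To see which of these optional dependencies are present in the current environment, one could probe for the modules by name. The snippet below is a minimal sketch that uses only the standard library; the OPTIONAL_MODULES mapping and the installed_extras helper are illustrative names, not part of HTTPX.

```python
import importlib.util

# Module names taken from the list above, mapped to the feature they enable.
OPTIONAL_MODULES = {
    "h2": "HTTP/2 support",
    "socksio": "SOCKS proxy support",
    "click": "CLI tool",
    "rich": "prettier CLI output",
    "brotli": "Brotli compression",
    "brotlicffi": "Brotli compression (CFFI variant)",
}


def installed_extras() -> dict:
    """Map each optional module name to True if it can be imported."""
    # find_spec returns None for a top-level module that is not installed.
    return {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL_MODULES
    }


for name, present in installed_extras().items():
    status = "installed" if present else "missing"
    print(f"{name:<12} {status:<10} {OPTIONAL_MODULES[name]}")
```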
HTTPX is largely, but not entirely, a drop-in replacement for the aforementioned requests module. There are some differences, such as:
- It does not follow redirects by default (we need to pass follow_redirects=True).
- .url on the response object is a URL object, not a string.
- Keys of the proxy dictionary are http:// and https://, not http and https.
- All requests have a timeout set by default to avoid things getting stuck.
Let us launch the Python REPL and make a simple HTTPS request.
$ python3
Python 3.10.6 (main, Aug 30 2022, 04:58:14) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import httpx
>>> resp = httpx.get("https://api.github.com/users/facebook-github-bot/events/public")
>>> resp
<Response [200 OK]>
>>> resp.status_code
200
>>> resp.text[:100]
'[{"id":"24020632984","type":"IssueCommentEvent","actor":{"id":6422482,"login":"facebook-github-bot",'
>>> type(resp.json())
<class 'list'>
>>> resp.json()[0]
{'id': '24020632984', 'type': 'IssueCommentEvent', 'actor': {'id': 6422482, 'login': 'facebook-github-bot', 'display_login': 'facebook-github-bot', 'gravatar_id': '', 'url': 'https://api.github.com/users/facebook-github-bot', 'avatar_url': 'https://avatars.githubusercontent.com/u/6422482?'}, 'repo': {'id': 392055467, 'name': 'facebookresearch/fbpcs', 'url': 'https://api.github.com/repos/facebookresearch/fbpcs'}, 'payload': {'action': 'created', 'issue': {'url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/432', 'repository_url': 'https://api.github.com/repos/facebookresearch/fbpcs', 'labels_url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/432/labels{/name}', 'comments_url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/432/comments', 'events_url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/432/events', 'html_url': 'https://github.com/facebookresearch/fbpcs/pull/432', 'id': 1070015172, 'node_id': 'PR_kwDOF15Kq84vVQRm', 'number': 432, 'title': 'update graphapi version', 'user': {'login': 'benliugithub', 'id': 90293689, 'node_id': 'MDQ6VXNlcjkwMjkzNjg5', 'avatar_url': 'https://avatars.githubusercontent.com/u/90293689?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/benliugithub', 'html_url': 'https://github.com/benliugithub', 'followers_url': 'https://api.github.com/users/benliugithub/followers', 'following_url': 'https://api.github.com/users/benliugithub/following{/other_user}', 'gists_url': 'https://api.github.com/users/benliugithub/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/benliugithub/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/benliugithub/subscriptions', 'organizations_url': 'https://api.github.com/users/benliugithub/orgs', 'repos_url': 'https://api.github.com/users/benliugithub/repos', 'events_url': 'https://api.github.com/users/benliugithub/events{/privacy}', 'received_events_url': 
'https://api.github.com/users/benliugithub/received_events', 'type': 'User', 'site_admin': False}, 'labels': [{'id': 3223376167, 'node_id': 'MDU6TGFiZWwzMjIzMzc2MTY3', 'url': 'https://api.github.com/repos/facebookresearch/fbpcs/labels/CLA%20Signed', 'name': 'CLA Signed', 'color': '009900', 'default': False, 'description': 'This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.'}, {'id': 3251191745, 'node_id': 'MDU6TGFiZWwzMjUxMTkxNzQ1', 'url': 'https://api.github.com/repos/facebookresearch/fbpcs/labels/fb-exported', 'name': 'fb-exported', 'color': 'ededed', 'default': False, 'description': None}], 'state': 'open', 'locked': False, 'assignee': None, 'assignees': [], 'milestone': None, 'comments': 1, 'created_at': '2021-12-02T21:54:32Z', 'updated_at': '2022-09-15T07:20:13Z', 'closed_at': None, 'author_association': 'NONE', 'active_lock_reason': None, 'draft': False, 'pull_request': {'url': 'https://api.github.com/repos/facebookresearch/fbpcs/pulls/432', 'html_url': 'https://github.com/facebookresearch/fbpcs/pull/432', 'diff_url': 'https://github.com/facebookresearch/fbpcs/pull/432.diff', 'patch_url': 'https://github.com/facebookresearch/fbpcs/pull/432.patch', 'merged_at': None}, 'body': 'Summary: As title\n\nReviewed By: leegross\n\nDifferential Revision: D32811061\n\n', 'reactions': {'url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/432/reactions', 'total_count': 0, '+1': 0, '-1': 0, 'laugh': 0, 'hooray': 0, 'confused': 0, 'heart': 0, 'rocket': 0, 'eyes': 0}, 'timeline_url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/432/timeline', 'performed_via_github_app': None, 'state_reason': None}, 'comment': {'url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/comments/1247684175', 'html_url': 'https://github.com/facebookresearch/fbpcs/pull/432#issuecomment-1247684175', 'issue_url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/432', 'id': 1247684175, 'node_id': 
'IC_kwDOF15Kq85KXiZP', 'user': {'login': 'facebook-github-bot', 'id': 6422482, 'node_id': 'MDQ6VXNlcjY0MjI0ODI=', 'avatar_url': 'https://avatars.githubusercontent.com/u/6422482?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/facebook-github-bot', 'html_url': 'https://github.com/facebook-github-bot', 'followers_url': 'https://api.github.com/users/facebook-github-bot/followers', 'following_url': 'https://api.github.com/users/facebook-github-bot/following{/other_user}', 'gists_url': 'https://api.github.com/users/facebook-github-bot/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/facebook-github-bot/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/facebook-github-bot/subscriptions', 'organizations_url': 'https://api.github.com/users/facebook-github-bot/orgs', 'repos_url': 'https://api.github.com/users/facebook-github-bot/repos', 'events_url': 'https://api.github.com/users/facebook-github-bot/events{/privacy}', 'received_events_url': 'https://api.github.com/users/facebook-github-bot/received_events', 'type': 'User', 'site_admin': False}, 'created_at': '2022-09-15T07:20:13Z', 'updated_at': '2022-09-15T07:20:13Z', 'author_association': 'CONTRIBUTOR', 'body': 'Hi @benliugithub! \n\nThank you for your pull request. \n\nWe **require** contributors to sign our **Contributor License Agreement**, and yours needs attention.\n\nYou currently have a record in our system, but the CLA is no longer valid, and will need to be **resubmitted**.\n\n# Process\n\nIn order for us to review and merge your suggested changes, please sign at <https://code.facebook.com/cla>. **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.\n\nOnce the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with `CLA signed`. The tagging process may take up to 1 hour after signing. 
Please give it that time before contacting us about it.\n\nIf you have received this in error or have any questions, please contact us at [[email protected]](mailto:[email protected]?subject=CLA%20for%20facebookresearch%2Ffbpcs%20%23432). Thanks!', 'reactions': {'url': 'https://api.github.com/repos/facebookresearch/fbpcs/issues/comments/1247684175/reactions', 'total_count': 0, '+1': 0, '-1': 0, 'laugh': 0, 'hooray': 0, 'confused': 0, 'heart': 0, 'rocket': 0, 'eyes': 0}, 'performed_via_github_app': None}}, 'public': True, 'created_at': '2022-09-15T07:20:14Z', 'org': {'id': 16943930, 'login': 'facebookresearch', 'gravatar_id': '', 'url': 'https://api.github.com/orgs/facebookresearch', 'avatar_url': 'https://avatars.githubusercontent.com/u/16943930?'}}
This was an HTTP/1.1 request. But what if we wanted to make an HTTP/2 request? Assuming httpx[http2] is installed, we would need to instantiate a client object with the http2 keyword argument set to True:
>>> client = httpx.Client(http2=True)
>>> client
<httpx.Client object at 0x105f3a2c0>
>>> resp = client.get("https://api.github.com/users/facebook-github-bot/events/public")
>>> resp
<Response [200 OK]>
>>> resp.text[:100]
'[{"id":"24020972057","type":"PullRequestEvent","actor":{"id":6422482,"login":"facebook-github-bot","'
>>> resp.http_version
'HTTP/2'
The client object in HTTPX is largely equivalent to requests.Session: it can manage cookies between requests and reuse headers, proxy URLs and other settings. Furthermore, the HTTPX client implements HTTP connection pooling for improved performance. This entails keeping TCP connections to remote servers open after a response is received, so that they can be reused for later requests instead of being re-established. The client object can be used as a context manager, but if we want to explicitly close all connections in the pool we can do so by calling the close() method.
If we want to benefit from async I/O, we can use httpx.AsyncClient instead of httpx.Client:
$ python3 -m asyncio
asyncio REPL 3.10.6 (main, Aug 30 2022, 04:58:14) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Use "await" directly instead of "asyncio.run()".
Type "help", "copyright", "credits" or "license" for more information.
>>> import asyncio
>>> import httpx
>>> async with httpx.AsyncClient() as client:
...     resp = await client.get("https://instagram.com")
...
>>> resp
<Response [301 Moved Permanently]>
If you have installed the optional httpx[cli] extra, you can use the CLI tool that is developed with the library:
$ httpx --download fb_bot.json https://api.github.com/users/facebook-github-bot/events/public
HTTP/1.1 200 OK
Server: GitHub.com
Date: Thu, 15 Sep 2022 07:53:21 GMT
Content-Type: application/json; charset=utf-8
Cache-Control: public, max-age=60, s-maxage=60
Vary: Accept, Accept-Encoding, Accept, X-Requested-With
ETag: W/"0d377aecaac2a62f7662fc84bfb6ca33a1e6372d14da1fa4f89a3a39f02e8191"
Last-Modified: Thu, 15 Sep 2022 07:44:01 GMT
X-Poll-Interval: 60
X-GitHub-Media-Type: github.v3; format=json
Link: <https://api.github.com/user/6422482/events/public?page=2>; rel="next", <https://api.github.com/user/6422482/events/public?page=8>; rel="last"
Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes,
X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 0
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Content-Security-Policy: default-src 'none'
Content-Encoding: gzip
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 52
X-RateLimit-Reset: 1663229406
X-RateLimit-Resource: core
X-RateLimit-Used: 8
Accept-Ranges: bytes
Transfer-Encoding: chunked
X-GitHub-Request-Id: EA77:1A4B:39239D:3B3337:6322D9F0
Downloading fb_bot.json 0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16,286/0 bytes ?
$ file fb_bot.json
fb_bot.json: JSON data
HTTPX heeds the following environment variables:
- HTTPX_LOG_LEVEL: enables extra logging for debugging purposes (can be debug or trace).
- SSLKEYLOGFILE: path for saving TLS handshake secrets, so that TLS connections can be decrypted in Wireshark.
- HTTP_PROXY and HTTPS_PROXY: proxy URLs to be used for HTTP and HTTPS requests.
- ALL_PROXY: proxy URL to be used for all requests.
- NO_PROXY: comma-separated list of hostnames and URLs that should be accessed without proxying.
- NETRC: a path to a .netrc file.
- SSL_CERT_FILE: path to a custom CA certificate (.crt file). This may be useful when working with proxies that intercept TLS connections.
- SSL_CERT_DIR: X.509 certificate directory path. This directory is expected to follow the OpenSSL layout.
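Because these variables change behaviour silently, it can help to log which of them are set before a scraper starts. The snippet below is a small standard-library sketch; HTTPX_ENV_VARS and active_settings are illustrative names, and the variable list is simply the one given above.

```python
import os

# Variable names copied from the list above.
HTTPX_ENV_VARS = (
    "HTTPX_LOG_LEVEL",
    "SSLKEYLOGFILE",
    "HTTP_PROXY",
    "HTTPS_PROXY",
    "ALL_PROXY",
    "NO_PROXY",
    "NETRC",
    "SSL_CERT_FILE",
    "SSL_CERT_DIR",
)


def active_settings(environ=None) -> dict:
    """Return the listed variables that are set to a non-empty value."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in HTTPX_ENV_VARS if environ.get(name)}


# Example with an explicit mapping instead of the real process environment.
sample = {"HTTPS_PROXY": "http://127.0.0.1:8080", "PATH": "/usr/bin", "NO_PROXY": ""}
print(active_settings(sample))  # {'HTTPS_PROXY': 'http://127.0.0.1:8080'}
```

If the environment should be ignored altogether, clients accept trust_env=False.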