To make Python script parametrisable with some input data we need to develop
a Command Line Interface. This will enable the end user to run the script with
various inputs and provide a way to choose optional behaviors that the script
supports. For example, running the script with --silent
suppresses the output
that it would otherwise print to the terminal, whereas --verbose
makes the
output extra detailed with fine-grained technical details included. Scripts
with well developed CLIs can be invoked in a reproducible way and thus it is
possible to integrate them into larger automation flows.
The simplest way to make Python script accept input from command line is to
read the sys.argv
list that is populated as the script is launched. Like in
C programs, sys.argv[0]
is name of the program being launched,
sys.argv[1]
is the first argument, sys.argv[2]
is the second one and so on.
Length of sys.argv
list is number of argument plus one - if there’s no CLI
arguments being passed into the script it will be 1.
The following trivial script exemplifies how this can be used to implement a very basic CLI:
#!/usr/bin/python3
import sys
def main():
if len(sys.argv) == 1:
print("Usage:")
print("{} <arg1> <arg2> ...".format(sys.argv[0]))
return
for arg in sys.argv[1:]:
print(arg)
if __name__ == "__main__":
main()
As we can see, the CLI arguments are accessed from within the script and printed to standard output:
$ python3 argv_demo.py
Usage:
argv_demo.py <arg1> <arg2> ...
$ python3 argv_demo.py 1 2 3
1
2
3
However this way of implementing the command line interface is fairly low-level and under-abstracted for more advanced use cases. For example, if we wanted to implement subcommands or multiple options that may or may not be associated with values we would need to do quite a bit of work to parse that list of arguments into an internal form that we would be using further in the code.
There are some Python modules to make it easier. The argparse module ships with vanilla Python installation. Click is an open source project that provides further abstraction layer to enable setting up the CLI with very little code.
For demonstration purposes, let us assume that we want to develop a simple script that takes one or more HTTP(S) URLs, tries fetching them with GET requests and prints the HTTP response status codes to the user. We want to be able to pass either single URL or a list of URLs in a file. Futhermore, we want to optionally set timeout duration and HTTP debug information.
Using argparse
When using argparse
module, we need to instantiate ArgumentParser
object,
set it up with all the CLI arguments and use it to parse sys.argv
(without
the first element). As a starting point, we can use Python REPL for prototyping.
$ python3
Python 3.10.6 (main, Aug 11 2022, 13:36:31) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import argparse
>>> parser = argparse.ArgumentParser(description="Simple HTTP URL Checker")
>>> parser
ArgumentParser(prog='', usage=None, description='Simple HTTP URL Checker', formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)
>>> parser.add_argument("--timeout", type=float, default=1.0, required=False, help="Timeout duration for HTTP GET request in seconds (default: 1.0)")
_StoreAction(option_strings=['--timeout'], dest='timeout', nargs=None, const=None, default=1.0, type=<class 'float'>, choices=None, required=False, help='Timeout duration for HTTP GET request in seconds (default: 1.0)', metavar=None)
>>> parser.add_argument("--verbose", default=False, required=False, action='store_true', help="Enable HTTP debug output (off by default)")
_StoreTrueAction(option_strings=['--verbose'], dest='verbose', nargs=0, const=True, default=False, type=None, choices=None, required=False, help='Enable HTTP debug output (off by default)', metavar=None)
>>> parser.add_argument("--url-list", required=True, nargs='*', help="HTTP URLs or files containing them (one URL per line)")
_StoreAction(option_strings=['--url-list'], dest='url_list', nargs='*', const=None, default=None, type=None, choices=None, required=True, help='HTTP URLs or files containing them (one URL per line)', metavar=None)
We have configured the optional --verbose
and --timeout
options, as well the required
--url-list
argument that will take a list of URLs or files. Note that when setting up
the latter we call add_argument()
with nargs
set to *
- this will tell the parser
that we expect a list of values that are to be parsed into Python list.
Now we can do a little testing on our argument parser.
>>> parser.print_help()
usage: [-h] [--timeout TIMEOUT] [--verbose] --url-list [URL_LIST ...]
Simple HTTP URL Checker
options:
-h, --help show this help message and exit
--timeout TIMEOUT Timeout duration for HTTP GET request in seconds (default: 1.0)
--verbose Enable HTTP debug output (off by default)
--url-list [URL_LIST ...]
HTTP URLs or files containing them (one URL per line)
>>> parser.parse_args('--timeout 2.0 --verbose --url-list http://localhost:1313 http://localhost:1212'.split(' '))
Namespace(timeout=2.0, verbose=True, url_list=['http://localhost:1313', 'http://localhost:1212'])
>>> parser.parse_args('--url-list http://localhost:1313'.split(' '))
Namespace(timeout=1.0, verbose=False, url_list=['http://localhost:1313'])
It seems like it works properly. The complete script with argparse
-based CLI is as following:
#!/usr/bin/python3
import argparse
import http.client
import os
import sys
import logging
import requests
HEADERS = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
"cache-control": "no-cache",
"pragma": "no-cache",
"sec-ch-ua": '"Chromium";v="104", " Not A;Brand";v="99", "Google Chrome";v="104"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"macOS"',
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "cross-site",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
}
DEFAULT_TIMEOUT = 1.0
def check_url(url, timeout):
try:
resp = requests.get(url, headers=HEADERS, timeout=timeout)
return resp.status_code
except Exception as e:
return str(e)
def check_urls(urls, timeout, verbose):
if verbose:
# Based on: https://stackoverflow.com/a/16630836
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
for u in urls:
if u.startswith("http://") or u.startswith("https://"):
print("{}\t{}".format(u, check_url(u, timeout)))
else:
if os.path.isfile(u):
urls_from_file = []
in_f = open(u, "r")
lines = in_f.read().strip().split("\n")
for line in lines:
if line.startswith("http://") or line.startswith("https://"):
urls_from_file.append(line)
in_f.close()
check_urls(urls_from_file, timeout, verbose)
else:
print("File not found: {}".format(u), file=sys.stderr)
def main():
parser = argparse.ArgumentParser(description="Simple HTTP URL Checker")
parser.add_argument(
"--timeout",
type=float,
default=DEFAULT_TIMEOUT,
required=False,
help="Timeout duration for HTTP GET request in seconds (default: {})".format(
DEFAULT_TIMEOUT
),
)
parser.add_argument(
"--verbose",
action="store_true",
required=False,
help="Enable HTTP debug output (off by default)",
)
parser.add_argument(
"--url-list",
required=True,
nargs="*",
help="HTTP URLs or files containing them (one URL per line)",
)
args = parser.parse_args(sys.argv[1:])
check_urls(args.url_list, args.timeout, args.verbose)
if __name__ == "__main__":
main()
We can do some test runs to check that the CLI works correctly:
$ python3 url_scan_argparse.py -h
usage: url_scan_argparse.py [-h] [--timeout TIMEOUT] [--verbose] --url-list [URL_LIST ...]
Simple HTTP URL Checker
options:
-h, --help show this help message and exit
--timeout TIMEOUT Timeout duration for HTTP GET request in seconds (default: 1.0)
--verbose Enable HTTP debug output (off by default)
--url-list [URL_LIST ...]
HTTP URLs or files containing them (one URL per line)
$ python3 url_scan_argparse.py --url-list https://trickster.dev https://trickster.dev/.git
https://trickster.dev 200
https://trickster.dev/.git 404
$ python3 url_scan_argparse.py --timeout 2.0
usage: url_scan_argparse.py [-h] [--timeout TIMEOUT] [--verbose] --url-list [URL_LIST ...]
url_scan_argparse.py: error: the following arguments are required: --url-list
$ python3 url_scan_argparse.py --verbose --url-list https://trickster.dev https://trickster.dev/.git
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): trickster.dev:443
DEBUG:urllib3.connectionpool:https://trickster.dev:443 "GET / HTTP/1.1" 301 41
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.trickster.dev:443
DEBUG:urllib3.connectionpool:https://www.trickster.dev:443 "GET / HTTP/1.1" 200 None
https://trickster.dev 200
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): trickster.dev:443
DEBUG:urllib3.connectionpool:https://trickster.dev:443 "GET /.git HTTP/1.1" 301 45
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.trickster.dev:443
DEBUG:urllib3.connectionpool:https://www.trickster.dev:443 "GET /.git HTTP/1.1" 404 None
https://trickster.dev/.git 404
Since this code is based on requests module it heeds HTTP_PROXY
and HTTPS_PROXY
environment
variables, which means it can be used to run some tests to check proxy pool suitability for a
particular target site.
Using Click
Modifying the code to use Click is not that difficult. First, we import it by putting the following import statement at the top:
import click
Then at them bottom we make check_urls()
function to be an entry point of the script:
if __name__ == "__main__":
check_urls()
Next, we annotate this function with Click-specific annotations that will connect it to the command line interface:
@click.command()
@click.option(
"--timeout",
default=DEFAULT_TIMEOUT,
help="Timeout duration for HTTP GET request in seconds",
)
@click.option(
"--verbose",
default=False,
is_flag=True,
help="Enable HTTP debug output (off by default)",
)
@click.option(
"--url-list",
required=True,
multiple=True,
help="HTTP URLs or files containing them (one URL per line)",
)
def check_urls(url_list, timeout, verbose):
Click module will make sure that value of --timeout
from CLI will be assigned
to timeout
function parameter. The same applies to other two arguments.
However there is one limitation. Although Click can take multiple values
for --url-list
and parse them into Python array, it does not support
the variadic form we used in the previous script. Each entry in the list will
need to have --url-list
before it:
$ python3 url_scan_click.py --url-list https://trickster.dev --url-list https://trickster.dev/.git
https://trickster.dev 200
https://trickster.dev/.git 404