Searching package entity inventories

qp arose from a desire to take a comprehensive inventory of certain data science libraries, to solidify my understanding of their components. Doing so manually would be prohibitively time-consuming. Helpfully, many of the major projects are documented with a tool called Sphinx, which I knew from previous experience produces, as a side effect, an "intersphinx inventory": an objects.inv file colocated with the documentation website.
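To give a sense of what such an inventory contains, here is a minimal sketch of reading one in Python. It assumes the version 2 objects.inv layout (four plain-text header lines followed by a zlib-compressed list of records) and is not necessarily how qp itself does it; the `Entry` type, the `parse_inventory` function and the tiny demo inventory are all hypothetical names made up for illustration.

```python
import re
import zlib
from typing import Iterator, NamedTuple

# One record per line: name, domain:role, priority, uri, display name.
# The name may contain spaces, hence the lazy first group.
RECORD = re.compile(r"(.+?)\s+(\S+)\s+(-?\d+)\s+?(\S*)\s+(.*)")


class Entry(NamedTuple):
    name: str
    domain: str
    role: str
    uri: str


def parse_inventory(raw: bytes) -> Iterator[Entry]:
    """Yield the entries of a version 2 objects.inv file given its raw bytes."""
    # Four plain-text header lines, then a single zlib-compressed payload.
    *header, payload = raw.split(b"\n", 4)
    if len(header) != 4 or not header[0].startswith(b"# Sphinx inventory version 2"):
        raise ValueError("this sketch only handles version 2 inventories")
    for line in zlib.decompress(payload).decode("utf-8").splitlines():
        match = RECORD.match(line.rstrip())
        if match is None:
            continue  # skip anything that is not a well-formed record
        name, kind, _priority, uri, _display = match.groups()
        domain, _, role = kind.partition(":")
        # A trailing "$" in the URI is shorthand for the entry's own name.
        if uri.endswith("$"):
            uri = uri[:-1] + name
        yield Entry(name, domain, role, uri)


# Hypothetical two-entry inventory, built here only to exercise the parser.
HEADER = (
    b"# Sphinx inventory version 2\n"
    b"# Project: demo\n"
    b"# Version: 0.1\n"
    b"# The remainder of this file is compressed using zlib.\n"
)
BODY = (
    "demo.Thing py:class 1 api.html#$ -\n"
    "demo.Thing.run py:method 1 api.html#$ -\n"
)
entries = list(parse_inventory(HEADER + zlib.compress(BODY.encode("utf-8"))))
methods = [e.name for e in entries if e.role == "method"]
print(methods)
```

Filtering the parsed entries by role, as in the last lines, is the kind of query that makes an inventory useful for surveying a library.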

For this project I made a very minimal package to get the functionality I wanted, with a simple CLI built on defopt, which let me iterate quickly (I usually use the more sophisticated argparse).

I soon pivoted from the original intention of targeting a single library (pandas) to targeting multiple, and then extended it to target any library from a given URL.

From the README:

  • To get a list of all the entities in PyTorch (stable version) and their URLs, run:

```sh
qp torch -v stable -q | wc -l
```

```
3366
```

  • To pull out just the torch.Tensor class methods, run:

```sh
qp torch -v stable --role method --names torch.Tensor -q | wc -l
```

```
514
```

  • This has many uses. For example, to create a list of markdown-format links, pipe the output through sed:

```sh
echo "$(qp torch -v stable -r method -n torch.Tensor -q)" | \
  sed -e 's/ /]: /g' -e 's/^torch\.Tensor\./[/g'
```

⇣

```md
...
```
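The same transformation can be sketched in Python for anyone who prefers it to sed. This assumes each line of qp's quiet output is a name and a URL separated by a single space, which is inferred from the sed expressions rather than confirmed; the function name and the example.com sample lines are hypothetical.

```python
def to_link_definitions(lines, prefix="torch.Tensor."):
    """Turn 'name url' lines into markdown reference link definitions."""
    definitions = []
    for line in lines:
        name, _, url = line.strip().partition(" ")
        if not url:
            continue  # skip blank or malformed lines
        # Drop the common prefix so the link label is just the method name.
        # Unlike the sed version, names without the prefix still get a "[".
        label = name[len(prefix):] if name.startswith(prefix) else name
        definitions.append(f"[{label}]: {url}")
    return definitions


# Hypothetical sample input; real URLs would come from the inventory.
sample = [
    "torch.Tensor.abs https://example.com/torch.Tensor.abs",
    "torch.Tensor.add https://example.com/torch.Tensor.add",
]
links = to_link_definitions(sample)
print("\n".join(links))
```

Each resulting line has the form `[abs]: https://...`, which markdown treats as a reference link definition.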

The first use was creating a lookup page for all the methods of PyTorch's Tensor class, which greatly solidified my grasp of this important class and continues to serve as a reference. With a few chained bash commands, I was quickly able to set up the skeleton for this mammoth reference, get a sense of the relative scale of each part, and minimise the manual work involved in creating it.