impscan is a Python CLI tool to 'scan' imports and identify packages. It uses range-streams to efficiently 'peek' at package archive metadata: namely the site-packages paths which, if present, indicate a Python binary is installable via that archive.
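
To illustrate the 'peek' idea, here is a minimal sketch using only the standard library, not the range-streams API. It assumes a zip-format archive, whose file listing sits in a central directory at the end of the file, so listing member paths does not require decompressing the members. The helper names `has_site_packages` and `make_demo_archive` are hypothetical, and an in-memory buffer stands in for a remote stream.

```python
import io
import zipfile


def has_site_packages(archive) -> bool:
    """Return True if any member path in a zip archive contains site-packages.

    `archive` is any seekable binary file-like object (an assumption of this
    sketch; here it is an in-memory buffer rather than a remote stream).
    """
    with zipfile.ZipFile(archive) as zf:
        # namelist() is built from the central directory read when the archive
        # is opened, so member contents are never decompressed here.
        return any("site-packages/" in name for name in zf.namelist())


def make_demo_archive(paths):
    """Build a small in-memory zip with empty files at the given paths (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path in paths:
            zf.writestr(path, b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = make_demo_archive(
        ["lib/python3.9/site-packages/foo/__init__.py", "info/index.json"]
    )
    print(has_site_packages(demo))
```

With a file-like object backed by HTTP range requests in place of the buffer, the same check would in principle only need the bytes at the tail of the archive.
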
After the initial non-streaming approach proved unusably slow, I took a detour to write the range-streams package, with tests so I had peace of mind that it actually worked as intended, and documentation so that I didn't need to keep its workings in my short-term memory. Once the proof of concept was set up, I extended it to archives, which was slightly non-trivial for conda archives, where there can be multiple levels of compression in the case of .conda archives (what's known as solid compression).
- As of August 2021, this project has been sidelined while I look into Dask parallelisation for CSVs (contributing a PR to improve support), in turn for use in wikitransp (whose dataset, WIT, uses an awkward form of CSV not amenable to Dask at this time).
WIP/to be revisited