impscan

Identifying a minimal list of imports and their repository sources by parsing package dependency trees en masse.

impscan is a Python CLI tool to 'scan' imports and identify packages. It uses range-streams to efficiently 'peek' at package archive metadata: namely the site-packages paths which, if present, indicate a Python binary is installable via that archive.
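As a rough illustration of the site-packages check described above, here is a minimal sketch that takes an archive's member paths and returns the top-level names found under site-packages. It is not impscan's actual code: the function name `site_packages_names`, the demo archive and its member paths are all hypothetical, and the listing is read from an in-memory zip with the standard library rather than streamed with range-streams.

```python
import io
import zipfile
from pathlib import PurePosixPath


def site_packages_names(member_paths):
    """Return sorted top-level importable names found under site-packages.

    Hypothetical helper for illustration only. It ignores compiled
    extension modules and other edge cases.
    """
    names = set()
    for path in member_paths:
        parts = PurePosixPath(path).parts
        if "site-packages" not in parts:
            continue
        below = parts[parts.index("site-packages") + 1:]
        if not below:
            continue  # the site-packages directory entry itself
        top = below[0]
        # Packaging metadata directories are not importable
        if top.endswith((".dist-info", ".egg-info")):
            continue
        # A file sitting directly in site-packages only counts if it is
        # a single-file module, e.g. six.py is imported as "six"
        if len(below) == 1 and not path.endswith("/"):
            if not top.endswith(".py"):
                continue
            top = top[:-3]
        names.add(top)
    return sorted(names)


def build_demo_archive():
    """Build a small in-memory zip with made-up member paths."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in [
            "lib/python3.9/site-packages/requests/__init__.py",
            "lib/python3.9/site-packages/requests/api.py",
            "lib/python3.9/site-packages/requests-2.25.1.dist-info/METADATA",
            "lib/python3.9/site-packages/six.py",
            "bin/normalizer",
            "info/index.json",
        ]:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


with zipfile.ZipFile(build_demo_archive()) as zf:
    found = site_packages_names(zf.namelist())

print(found)
```

Only the member listing is needed for this check, not the file contents, which is why peeking at archive metadata is enough.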

After the initial non-streaming approach proved unusably slow, I took a detour to write the range-streams package. I wrote tests so I had peace of mind that it actually worked as intended, and documentation so that I didn't need to keep its workings in my short-term memory. Once the proof of concept was set up, I extended it to archives. This was slightly non-trivial for conda archives, where there are potentially (for .conda archives) multiple levels of compression (what's known as solid compression).

WIP/to be revisited