page-dewarp

Python 3 port of a document image dewarping library using a cubic sheet model.
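A "cubic sheet" here means, roughly, that the page is modelled as a surface whose height varies as a cubic polynomial across the page. The sketch below assumes one such parameterisation, which is how I understand Zucker's original model. The height is zero at both page edges, and the curve is controlled by its slopes `alpha` and `beta` at those edges. The function names are hypothetical, not this library's API, and the step that projects the 3D points into the image through a camera pose is left out.

```python
import numpy as np


def cubic_sheet_z(x, alpha, beta):
    """Height of the page surface at horizontal position(s) x in [0, 1].

    The cubic z(x) = a*x**3 + b*x**2 + c*x + d is constrained so that
    z(0) = z(1) = 0, z'(0) = alpha and z'(1) = beta, which gives
    a = alpha + beta, b = -2*alpha - beta, c = alpha, d = 0.
    """
    coeffs = [alpha + beta, -2 * alpha - beta, alpha, 0.0]  # highest power first
    return np.polyval(coeffs, x)


def lift_to_sheet(xy, alpha, beta):
    """Map flat 2D page coordinates, shape (N, 2), onto the 3D sheet, shape (N, 3)."""
    xy = np.asarray(xy, dtype=float).reshape(-1, 2)
    z = cubic_sheet_z(xy[:, 0], alpha, beta)
    return np.column_stack([xy, z])


if __name__ == "__main__":
    pts = lift_to_sheet([[0.0, 0.5], [0.5, 0.5], [1.0, 0.5]], alpha=0.2, beta=-0.1)
    print(pts)  # both page edges sit at z ~= 0 (up to rounding); the middle point is lifted
```

Fixing the edge heights at zero leaves just two shape parameters per sheet, which keeps the eventual optimisation problem small.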

I renovated this library from a standalone Python 2 script published by Matt Zucker in 2016, partly as an exercise in understanding it better and partly to identify the bottlenecks more clearly.

I gave it a nice modular organisation, rather than what I call a completely "procedural" one, in which the only way state is passed around is through function calls. I tend to associate that style of programming with languages like R, and in Python I find it leads to "arg passing hell": needlessly passing args down through many layers of a function call stack whose intermediate levels never use them.
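To make the "arg passing hell" point concrete, here is a toy contrast. All names (`Config`, `PageProcessor`, the `margin` and `scale` options) are hypothetical and do not reflect this library's actual modules. In the procedural version, `process_pages_procedural` accepts options it never reads, only to forward them; in the modular version the options travel together as one object and each method reads only the fields it needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Hypothetical options, for illustration only
    margin: int = 1
    scale: float = 2.0


# Procedural style: `margin` and `scale` are threaded through
# `process_pages_procedural`, which only passes them on.
def process_pages_procedural(pages, margin, scale):
    return [process_page_procedural(page, margin, scale) for page in pages]


def process_page_procedural(page, margin, scale):
    cropped = page[margin:len(page) - margin]
    return [value * scale for value in cropped]


# Modular style: the options live on the instance, and each method
# reads only the fields it actually uses.
class PageProcessor:
    def __init__(self, config: Config):
        self.config = config

    def process_pages(self, pages):
        return [self.process_page(page) for page in pages]

    def process_page(self, page):
        return self._scale(self._crop(page))

    def _crop(self, page):
        m = self.config.margin
        return page[m:len(page) - m]

    def _scale(self, values):
        return [value * self.config.scale for value in values]
```

With the default `Config`, both versions turn `[[1, 2, 3, 4], [5, 6, 7, 8]]` into `[[4.0, 6.0], [12.0, 14.0]]`; the difference is only in how the options reach the code that uses them.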

In the end, the bottleneck turned out to be the call to scipy.optimize that performs the nonlinear least squares optimisation (as Zucker had noted in his blog post about the software on its initial release).
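For readers unfamiliar with this kind of optimisation, here is a toy version of the problem: recover a cubic sheet's two edge slopes, plus its distance from the camera, from where points on it appear through a simple pinhole projection. This is a sketch under simplifying assumptions (one dimension, unit focal length, synthetic data, and `scipy.optimize.least_squares` rather than whichever scipy.optimize routine the original script calls); the real objective involves far more parameters and residuals.

```python
import numpy as np
from scipy.optimize import least_squares


def project(params, x):
    """Toy pinhole projection (unit focal length) of points on a cubic sheet."""
    alpha, beta, depth = params
    z = np.polyval([alpha + beta, -2 * alpha - beta, alpha, 0.0], x)
    return x / (depth + z)


def residuals(params, x, u_observed):
    # One residual per observed point: predicted minus observed image coordinate
    return project(params, x) - u_observed


rng = np.random.default_rng(0)
x = np.linspace(0.05, 1.0, 50)            # known positions along the page
true_params = np.array([0.3, -0.2, 3.0])  # alpha, beta, depth
u_observed = project(true_params, x) + rng.normal(scale=1e-6, size=x.size)

# All three parameters sit in the denominator, so the residuals are nonlinear
# in them and an iterative nonlinear least squares solver is needed.
result = least_squares(residuals, x0=[0.0, 0.0, 2.0], args=(x, u_observed))
print(result.x)  # should land close to [0.3, -0.2, 3.0]
```

Passing `method='lm'` to `least_squares` selects SciPy's MINPACK-based Levenberg-Marquardt solver instead of the default trust region reflective (`'trf'`) method.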

I investigated a GPU-accelerated alternative, a library called Gpufit. Unfortunately, it was somewhat opaque (a mix of C++, MATLAB, CUDA and Python). While trying to decipher it I fixed a minor bug, but I never got as far as adapting it to the nonlinear least squares procedure used for dewarping: without documentation, it was not clear what data I should pass in to do so.

I also came across a PyTorch optimisation library that one of Yann LeCun's students was submitting for incorporation, but at the time I tested it, it appeared no faster than scipy.optimize (in fact, as I recall, it was far slower).

I came across Gpufit after finding a Disney Research publication (whose code was not released) that described using the Levenberg-Marquardt algorithm; a treatment of the algorithm can be found towards the end of Gilbert Strang's most recent linear algebra text.

Uses Levenberg-Marquadt algo, by assembling the Jacobian https://t.co/UO0WZK0BTj

Another (open source!) library Gpufit [C++ but shipped with Python wheels] https://t.co/E67K6vKMG7 goes via the Hessian
📄 Scientific Reports, 2017 https://t.co/662H2HZ48s
📝 https://t.co/hntqti70z7 pic.twitter.com/itExVu7scx
— Louis Maddox (@permutans) April 18, 2021
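Since Levenberg-Marquardt comes up repeatedly here, a minimal NumPy sketch of the algorithm may help connect the "Jacobian" and "Hessian" in the tweet above. It builds the Jacobian J of the residuals r (by forward differences, an assumption made to keep it short), uses JᵀJ as the Gauss-Newton approximation to the Hessian, and solves the damped system (JᵀJ + λ·diag(JᵀJ))δ = −Jᵀr. After a successful step λ shrinks (towards Gauss-Newton); after a failed one it grows (towards scaled gradient descent). This is illustrative only, not Gpufit's or SciPy's implementation.

```python
import numpy as np


def numerical_jacobian(fun, p, r0, eps=1e-7):
    """Forward-difference Jacobian of the residual function `fun` at `p`."""
    J = np.empty((r0.size, p.size))
    for j in range(p.size):
        step = np.zeros_like(p)
        step[j] = eps * max(1.0, abs(p[j]))
        J[:, j] = (fun(p + step) - r0) / step[j]
    return J


def levenberg_marquardt(fun, p0, lam=1e-3, max_iter=200, tol=1e-12):
    """Minimise sum(fun(p)**2) with a basic Levenberg-Marquardt loop."""
    p = np.asarray(p0, dtype=float)
    r = fun(p)
    cost = r @ r
    J = numerical_jacobian(fun, p, r)
    for _ in range(max_iter):
        JTJ = J.T @ J  # Gauss-Newton approximation to the Hessian
        g = J.T @ r    # gradient of the cost, up to a factor of 2
        # Marquardt's scaling: damp each parameter by its own curvature
        A = JTJ + lam * np.diag(np.diag(JTJ))
        delta = np.linalg.solve(A, -g)
        if np.linalg.norm(delta) <= tol * (np.linalg.norm(p) + tol):
            break  # steps have become negligibly small
        p_new = p + delta
        r_new = fun(p_new)
        cost_new = r_new @ r_new
        if cost_new < cost:
            # Accept the step and trust the quadratic model more
            p, r, cost = p_new, r_new, cost_new
            J = numerical_jacobian(fun, p, r)
            lam = max(lam / 10.0, 1e-15)
        else:
            # Reject the step and move towards scaled gradient descent
            lam *= 10.0
    return p


if __name__ == "__main__":
    # Fit y = a * exp(b * t) to noise-free data generated with a = 2, b = -1.5
    t = np.linspace(0.0, 1.0, 20)
    y = 2.0 * np.exp(-1.5 * t)
    p_hat = levenberg_marquardt(lambda p: p[0] * np.exp(p[1] * t) - y, [1.0, 0.0])
    print(p_hat)  # close to [2.0, -1.5]
```

Because the damping uses diag(JᵀJ), a parameter with no effect on the residuals would make the system singular; a sturdier version would add a small floor to that diagonal.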

Aside: along the way I also documented more clearly what the code was doing, and made notes in the project wiki. I later repurposed some of these notes into a tutorial submitted to the scikit-image project's skimage-tutorials repo.

For now, you can view it on nbviewer; as I understand it, an official website is forthcoming.

While looking into 'page dewarping', I inevitably came across the deep learning version, most memorably a paper by Das et al. in which crumpled-up paper, photographed before and after crumpling, was 'dewarped' back to its uncrumpled state using ConvNets. My personal view is that this is:

a) a misleading name, as it's a completely distinct task from homography estimation
b) a misleading task, as it seems designed to let an architecture the authors already intended to use achieve good results (what I refer to as 'a solution looking for a problem')

I was underwhelmed by its performance on page dewarping tasks, and I am not sure there are any tasks (other than those contrived for such research experiments) where the ConvNet version would be worthwhile.