Hardware acceleration only changes the point where the data is processed. In your case, even a custom ASIC would draw a lot of power because of what the algorithm requires. The same goes for anything high-resolution, such as cameras or gesture processing for 3D: they currently can't be built on a low budget and within a low power envelope (below 10 W). As with scanners, interpolation is harmful when the data is going to be processed manually; in fact, it lowers accuracy if a better computer-side algorithm is available. I'm interested in using the touchpad as a faster keyboard, which could respond up to 25% faster than mechanical keyboards with longer key travel.
Which USB version are you using? At 16 bits, transferring the data continuously would take up an entire USB 2.0 channel; at that resolution it is close to the limit. (Modern scanners are in fact mostly limited by USB, and scanner makers are in no hurry to release new products before they have made as much profit as possible from the old models.)
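To put rough numbers on the USB question, here is a small sketch. The 1024x1024 sample grid and 30 frames per second are made-up figures for illustration, not your device's specs, and the function names are mine. The only fixed fact is the 480 Mbit/s signaling rate of USB 2.0 High Speed; usable throughput is lower once protocol overhead is counted.

```python
# USB 2.0 High Speed signaling rate in bits per second.
# Real payload throughput is lower because of protocol overhead.
USB2_SIGNALING_BITS_PER_S = 480_000_000


def required_bits_per_second(width, height, bits_per_sample, frames_per_second):
    """Raw data rate of an uncompressed, continuous sample stream."""
    return width * height * bits_per_sample * frames_per_second


def usb2_utilization(width, height, bits_per_sample, frames_per_second):
    """Fraction of the USB 2.0 signaling rate the stream would need."""
    needed = required_bits_per_second(width, height, bits_per_sample, frames_per_second)
    return needed / USB2_SIGNALING_BITS_PER_S


if __name__ == "__main__":
    # Hypothetical sensor: 1024x1024 samples, 30 frames per second.
    for bits in (8, 16):
        share = usb2_utilization(1024, 1024, bits, 30)
        print(f"{bits}-bit samples: {share:.2f} of the USB 2.0 signaling rate")
```

With these assumed figures the 16-bit stream needs about 1.05 times the signaling rate, so it would not fit, while the 8-bit stream needs roughly half. Plug in your real dimensions and rate to see where you land.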
Also, do you use shared memory, or some sort of memcpy (a 2x performance penalty), to hand the data to the user application?
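For what I mean by shared memory, here is a minimal sketch using Python's standard `multiprocessing.shared_memory` module. It only stands in for whatever your driver actually does; the frame size and the `demo_shared_frame` name are hypothetical. Both handles map the same block, so the reading side sees the data without it first being copied into a second buffer.

```python
from multiprocessing import shared_memory

FRAME_BYTES = 64  # hypothetical frame size, for illustration only


def demo_shared_frame():
    """Write through one handle and read through another mapping of the same block."""
    # "Driver" side: create the shared block.
    producer = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
    try:
        # "Application" side: attach to the same block by name.
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            # The write lands directly in the shared block.
            producer.buf[0:4] = bytes([1, 2, 3, 4])
            # consumer.buf is a view of the same memory; bytes() copies the
            # four bytes out only so they can be returned after cleanup.
            seen = bytes(consumer.buf[0:4])
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()  # free the block once nobody needs it
    return seen


if __name__ == "__main__":
    print(demo_shared_frame())
```

In a real setup the two handles would live in different processes, and you would need some synchronization so the reader never sees a half-written frame.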