One of the reasons 2.4 console performance is good, especially at low bit depths, is that it processes more than one pixel per iteration and uses mask arrays. I tried to generalize this approach in cfbimgblt.c by incorporating the idea from fbcon-cfb*.c. It's significantly faster, but still not as fast as the 2.4 API.

time cat /usr/src/linux/MAINTAINERS (40K text file)
1024x768-8bpp, y-panning disabled

2.5 old (with offscreen buffers)
real    0m10.708s
user    0m0.001s
sys     0m10.707s

2.5 new
real    0m4.378s
user    0m0.002s
sys     0m4.375s

2.4
real    0m2.098s
user    0m0.000s
sys     0m2.070s

I've only tested the implementation at 8, 16, 24, and 32 bpp. 24bpp is slightly slower than 32 bpp :(

Tony
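
To illustrate the mask-array idea mentioned above, here is a minimal user-space sketch for the 8bpp case. It is not the code from cfbimgblt.c or fbcon-cfb*.c; the names nibble_mask, init_nibble_mask and expand_row_8bpp are hypothetical, and it assumes 8-pixel-wide glyph rows with the most significant bit as the leftmost pixel. Each 4-bit group of the glyph selects a precomputed 32-bit mask, so four pixels are produced per store instead of one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup table: one 32-bit mask per 4-bit glyph nibble. */
uint32_t nibble_mask[16];

/*
 * Fill the table so that byte i (in memory order) of entry n is 0xff
 * when bit (3 - i) of n is set. Building it through a byte array keeps
 * the sketch independent of host endianness.
 */
void init_nibble_mask(void)
{
    for (unsigned n = 0; n < 16; n++) {
        uint8_t bytes[4];

        for (unsigned i = 0; i < 4; i++)
            bytes[i] = (n & (8u >> i)) ? 0xff : 0x00;
        memcpy(&nibble_mask[n], bytes, sizeof(bytes));
    }
}

/*
 * Expand one 8-pixel glyph row (1 bit per pixel) into 8 bytes of 8bpp
 * pixel data, four pixels per 32-bit store.
 */
void expand_row_8bpp(uint8_t *dst, uint8_t bits, uint8_t fg, uint8_t bg)
{
    uint32_t fgx = fg * 0x01010101u;   /* foreground replicated 4 times */
    uint32_t bgx = bg * 0x01010101u;   /* background replicated 4 times */
    uint32_t eorx = fgx ^ bgx;
    uint32_t px;

    assert(dst != NULL);

    /* Where the mask is 0xff: (fg ^ bg) ^ bg = fg; elsewhere: bg. */
    px = (nibble_mask[bits >> 4] & eorx) ^ bgx;
    memcpy(dst, &px, sizeof(px));

    px = (nibble_mask[bits & 0x0f] & eorx) ^ bgx;
    memcpy(dst + 4, &px, sizeof(px));
}
```

Call init_nibble_mask() once before drawing, then expand_row_8bpp() for each glyph row. At 16 and 32 bpp fewer pixels fit in one word, so the same trick needs smaller tables and gains less per store.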