On Tue, Aug 17, 2004 at 12:10:09PM +0400, Andrey Panin wrote:
> On 230, 08 17, 2004 at 09:27:51AM +0200, Arjan van de Ven wrote:
> > On Tue, 2004-08-17 at 08:13, Jens Maurer wrote:
> > > The attached patch (against kernel 2.6.8.1) enables using SSE
> > > instructions for copy_page and clear_page.
> > >
> > > A user-space test on my Pentium III 850 MHz shows a 3x speedup for
> > > clear_page (compared to the default "rep stosl"), and a 50% speedup
> > > for copy_page (compared to the default "rep movsl"). For a Pentium-4,
> > > the speedup is about 50% in both the clear_page and copy_page cases.
> >
> >
> > we used to have code like this in 2.4 but it got removed: the
> > non-temporal store code is faster in a microbenchmark but has the
> > fundamental problem that it evicts the data from the cpu cache; the
> > actual USE of the data thus is a LOT more expensive, result is that the
> > overall system performance goes down ;(
>
> Did SSE clear_page() suffer from this issue too ?

yes, especially clear_page; the kernel only calls clear_page when it's
*just about* to use the page, so it's actually the worst case example ;(
(and clear_page gains the most in the microbenchmark because non-temporal
stores avoid the write allocate, so it like halves the memory bandwidth
needed)
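
To make the trade-off concrete, here is a minimal user-space sketch of the
two ways to zero a page. It is not the patch from this thread and not kernel
code. The names SKETCH_PAGE_SIZE, clear_page_cached, clear_page_nt and
page_is_zero are made up for illustration, the 4096-byte page size is an
assumption, and the non-temporal path assumes an x86 compiler with SSE
(otherwise it falls back to memset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

#define SKETCH_PAGE_SIZE 4096 /* assumed page size, illustration only */

/* Ordinary stores: each line is write-allocated into the cache, so the
 * page is cache-hot for whoever touches it next. */
void clear_page_cached(void *page)
{
    memset(page, 0, SKETCH_PAGE_SIZE);
}

/* Non-temporal stores (movntps): bypass the cache and skip the write
 * allocate, so less memory traffic, but the page is NOT in the cache
 * afterwards. */
void clear_page_nt(void *page)
{
    /* movntps needs a 16-byte aligned destination */
    assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE__)
    float *p = (float *)page;
    const __m128 zero = _mm_setzero_ps();
    for (size_t i = 0; i < SKETCH_PAGE_SIZE / sizeof(float); i += 4)
        _mm_stream_ps(p + i, zero);
    _mm_sfence(); /* make the streamed stores globally visible */
#else
    memset(page, 0, SKETCH_PAGE_SIZE); /* fallback on non-SSE targets */
#endif
}

/* Stand-in for the "actual USE" of the page: read every byte.
 * Returns 1 if the whole page is zero, 0 otherwise. */
int page_is_zero(const void *page)
{
    const unsigned char *p = (const unsigned char *)page;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

A benchmark that times only clear_page_nt() against clear_page_cached()
would show the kind of win reported above. To see the effect on the actual
use of the page, time the clear together with the first pass over the page
(page_is_zero() here): after the cached variant those reads can hit the cpu
cache, after the non-temporal variant they have to go to memory.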