From mboxrd@z Thu Jan 1 00:00:00 1970 From: John Lumby Subject: r8169 : always copying the rx buffer to new skb Date: Mon, 18 Apr 2011 13:08:17 -0400 Message-ID: <4DAC7001.9060800@hotmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit To: netdev@vger.kernel.org Return-path: Received: from toq8.bellnexxia.net ([209.226.175.204]:47283 "EHLO toq8-srv.bellnexxia.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754667Ab1DRROV (ORCPT ); Mon, 18 Apr 2011 13:14:21 -0400 Received: from toip5.srvr.bell.ca ([209.226.175.88]) by tomts13-srv.bellnexxia.net (InterMail vM.5.01.06.13 201-253-122-130-113-20050324) with ESMTP id <20110418170805.JJOS1798.tomts13-srv.bellnexxia.net@toip5.srvr.bell.ca> for ; Mon, 18 Apr 2011 13:08:05 -0400 Sender: netdev-owner@vger.kernel.org List-ID: Summary - current r8169 always memcpy's every received buffer to new skb, and I'd like to propose that it should not, which can improve throughput / reduce CPU utilization by around 10% In the patch 2010-10-09 Stanislaw Gruszka r8169: allocate with GFP_KERNEL flag when able to sleep the code for making a decision "shall I copy the buffer to new skb or unhook it from ring to pass up and allocate new" based on a module param called rx_copybreak, was removed, and instead we now always allocate naked data buffers (i.e. not wrapped in skb) and always memcpy each one to a new skb to pass up to netif. I think (not sure) this was related to a bug Bug 629158 Network adapter "disappears" after resuming from acpi suspend although the change from using GFP_ATOMIC to using GFP_KERNEL for initial allocation was made (I believe) in 2010-04-30 Eric Dumazet r8169: Fix rtl8169_rx_interrupt() So I am not entirely certain of the motivation for the removal of rx_copybreak and the "always memcpy". But I believe it is not necessary, at least not on all (most?) systems, and have measured 11% increase in throughput (aggregate Mbits/sec) on a heavy network benchmark on my own machine, on 2.6.39-rc2 with rps/rfs enabled, by patching the code back to the 2.6.39 equivalent of rx_copybreak. oprofile shows the improvement is all or mostly obtained from avoiding the memcpy'ing. And of course since the memcpy'ing is done in the driver before the netif/rps gets it, all that memcpy'ing is done on CPU0 (I think true on any SMP machine with an rtl8169?) The measurements are : aggregate (rx+tx) Mbits/sec through the NIC 2.6.39-rc2 unpatched 918 2.6.39-rc2 patched, rx_copybreak=64 1026 packet rates fluctuate around 60K - 70K pkts/sec rx plus 60K - 70K pkts/sec tx I would like to propose that : . switch back to wrapping databuffers in skb's (so we have the option of copying or unhooking) . re-introduce rx_copybreak module param so each sysadmin can control if wished. That is my main proposal, and I would be interested to hear thoughts on that. As a second proposal (which I made before), I'd like to suggest that the number of rx buffers allocated at open should be configurable by module param. This is not needed for my other proposal but may help reduce the possibility of temporary shortage of buffers. There is really no justification for assuming that 256 buffers is the correct number for all systems from a netbook to a 32-way server. I ran my "patched" measurement with num_rx_buffs=128 and there were no alloc failures logged. I would say that if there is any enthusiasm for the main avoid_memcpy proposal, then it is worth the extra small effort to do the num_rx_buffs as well. My current patch (including configurable num_rx_buffs) - lines deleted 120 lines added 668 BTW if this proposal is acceptable, I'm willing to do the patch work but I have only one machine with a r8169 (actually a RTL8168c) to test on. John Lumby