From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bruce Cole
Subject: [RFT] r8169 changes against 2.6.23-rc3
Date: Tue, 21 Aug 2007 12:32:52 -0700
Message-ID: <46CB3DE4.4060107@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: bacole@gmail.com
To: Francois Romieu , netdev@vger.kernel.org
Return-path:
Received: from ag-out-0708.google.com ([72.14.246.248]:2063 "EHLO ag-out-0708.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750819AbXHUTer (ORCPT ); Tue, 21 Aug 2007 15:34:47 -0400
Received: by ag-out-0708.google.com with SMTP id 35so560275aga for ; Tue, 21 Aug 2007 12:34:46 -0700 (PDT)
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On 8/20/07, Dirk wrote:
>> So it seems that when the driver tries to queue a packet while the
>> controller is busy processing the queue, the newly queued packet does
>> not get noticed by the controller (until further packet activity occurs).
>> Perhaps there is a problem with the memory barriers when adding to the
>> TX queue, but I'm a newbie on linux kernel memory barriers.
>
>One thing I noticed a while ago (march) is that floodpinging (ping -f)
>the r8169 host from an external system also increases performance
>without changing code.

Yes, I just tried this and saw the same result. Makes perfect sense: if the TX queue normally stays stuck until TCP retransmits, then keeping the TX queue busy keeps it from remaining stuck. I think this is a good demonstration that the underlying problem is a stuck TX queue, as suggested.

>I ended up (until now perhaps :-) with disabling the onboard nic and
>adding an e1000 card.

Yes, ditching the Realtek interface and going with an add-on NIC seems to be what everyone has been doing to get around this problem. Perhaps you'd like to try the busy-wait workaround with ndelay(10)? It has saved me from buying an e1000 card as well.

Speaking of the e1000, I notice that the TX queue processing code in that driver protects access to the queue with spin_lock_irqsave()/spin_unlock_irqrestore(). The r8169 driver seems to be missing equivalent code. The last time I dealt with kernel locking bugs was in the old days of splnet()/splx(), so I could use some help here, but I suspect this could be fixed with more careful locking.
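
To illustrate the kind of protection I mean, here is a rough userspace sketch. It is not r8169 or e1000 code. It assumes a pthread mutex as a stand-in for spin_lock_irqsave()/spin_unlock_irqrestore(), and the ring layout and the names tx_ring, tx_enqueue and tx_reclaim are made up for the example. The only point is that the path that queues a packet and the path that reclaims completed packets take the same lock before touching the ring indices.

```c
/*
 * Userspace sketch only. A pthread mutex stands in for the kernel's
 * spin_lock_irqsave()/spin_unlock_irqrestore(); every name below is
 * hypothetical and not taken from any real driver.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TX_RING_SIZE 4 /* hypothetical ring size, kept small on purpose */

struct tx_ring {
    pthread_mutex_t lock;    /* guards cur_tx, dirty_tx and slots */
    unsigned int cur_tx;     /* next slot the queueing path fills */
    unsigned int dirty_tx;   /* next slot the completion path reclaims */
    int slots[TX_RING_SIZE]; /* stand-in for TX descriptors */
};

void tx_ring_init(struct tx_ring *r)
{
    assert(r != NULL);
    pthread_mutex_init(&r->lock, NULL);
    r->cur_tx = 0;
    r->dirty_tx = 0;
}

/* Queue one packet id (non-negative). Returns 0, or -1 if the ring is full. */
int tx_enqueue(struct tx_ring *r, int pkt)
{
    int ret = -1;

    pthread_mutex_lock(&r->lock);
    if (r->cur_tx - r->dirty_tx < TX_RING_SIZE) {
        r->slots[r->cur_tx % TX_RING_SIZE] = pkt;
        r->cur_tx++;
        ret = 0;
    }
    pthread_mutex_unlock(&r->lock);
    return ret;
}

/* Reclaim the oldest queued packet. Returns its id, or -1 if none is pending. */
int tx_reclaim(struct tx_ring *r)
{
    int pkt = -1;

    pthread_mutex_lock(&r->lock);
    if (r->dirty_tx != r->cur_tx) {
        pkt = r->slots[r->dirty_tx % TX_RING_SIZE];
        r->dirty_tx++;
    }
    pthread_mutex_unlock(&r->lock);
    return pkt;
}
```

Because both functions update the indices only while holding the lock, a packet queued while the other path is running cannot be missed through a half-updated index. Whether this is what the real driver needs, or whether the issue is memory barriers as suggested earlier in the thread, is still an open question.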