From: Arnd Bergmann <arnd@arndb.de>
To: Stephen Hemminger <shemminger@osdl.org>
Cc: linuxppc-dev@ozlabs.org, akpm@osdl.org,
	James K Lewis <jklewis@us.ibm.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Jeff Garzik <jgarzik@pobox.com>,
	Jens Osterkamp <Jens.Osterkamp@de.ibm.com>,
	David Miller <davem@davemloft.net>,
	Linas Vepstas <linas@austin.ibm.com>
Subject: [RFC v2] HOWTO use NAPI to reduce TX interrupts
Date: Sun, 20 Aug 2006 19:48:19 +0200
Message-ID: <200608201948.20596.arnd@arndb.de>
In-Reply-To: <200608191325.19557.arnd@arndb.de>

A recent discussion about the spidernet driver resulted in the discovery
that network drivers are supposed to use NAPI for both their receive and
transmit paths, but this is documented nowhere.

In order to help the next person writing a NAPI-based driver, I wrote
down what I found missing from the existing documentation.

Please tell me if anything in here is still wrong or could use better
wording.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
This is the second version of my mini howto, after a few comments
I got from Stephen Hemminger and Avuton Olrich.
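
For readers who want something more concrete than the prose in the new
appendix, here is a rough userspace model of the scheme it describes:
the transmit path only queues buffers, the interrupt handler only
schedules the poll routine, and the poll routine is the single place
where sent buffers are freed. This is a sketch under assumptions, not
driver code. It calls no kernel functions, and every name in it
(struct tx_ring, model_xmit(), model_irq(), model_hw_send_one(),
model_poll(), TX_RING_SIZE, TX_IRQ_EVERY) is made up for illustration.
TX_IRQ_EVERY loosely plays the role of tx_max_coalesced_frames by
marking every fourth descriptor so that only it raises an interrupt.

```c
/* Userspace model only: no kernel APIs are used, all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TX_RING_SIZE 16	/* descriptors in the model ring */
#define TX_IRQ_EVERY 4	/* stands in for tx_max_coalesced_frames */

struct tx_desc {
	void *buf;	/* stands in for the skb */
	bool done;	/* set by the model hardware once the frame is sent */
	bool irq_mark;	/* this descriptor asks for a completion interrupt */
};

struct tx_ring {
	struct tx_desc desc[TX_RING_SIZE];
	unsigned int head;	 /* next slot the xmit path fills (free running) */
	unsigned int tail;	 /* oldest slot not yet reclaimed (free running) */
	unsigned int hw_next;	 /* next slot the model hardware will send */
	unsigned int since_mark; /* frames queued since the last marked one */
	bool poll_scheduled;	 /* stands in for having scheduled dev->poll() */
	unsigned int irqs;	 /* interrupts raised so far, for inspection */
};

/* Transmit path: queue a buffer, never reclaim here. False if the ring is full. */
static bool model_xmit(struct tx_ring *r, void *buf)
{
	struct tx_desc *d;

	if (r->head - r->tail == TX_RING_SIZE)
		return false;
	d = &r->desc[r->head % TX_RING_SIZE];
	d->buf = buf;
	d->done = false;
	/* Mark only every TX_IRQ_EVERY-th frame for an interrupt. */
	d->irq_mark = (++r->since_mark == TX_IRQ_EVERY);
	if (d->irq_mark)
		r->since_mark = 0;
	r->head++;
	return true;
}

/* Interrupt handler: do no reclaim work, only schedule the poll routine. */
static void model_irq(struct tx_ring *r)
{
	r->irqs++;
	r->poll_scheduled = true;
}

/* Model hardware: send one queued frame, interrupt only if it was marked. */
static bool model_hw_send_one(struct tx_ring *r)
{
	struct tx_desc *d;

	if (r->hw_next == r->head)
		return false;	/* nothing left to send */
	d = &r->desc[r->hw_next % TX_RING_SIZE];
	d->done = true;
	r->hw_next++;
	if (d->irq_mark)
		model_irq(r);
	return true;
}

/* Poll routine: the only place that frees buffers. Returns the number freed. */
static unsigned int model_poll(struct tx_ring *r)
{
	unsigned int n = 0;

	assert(r->head - r->tail <= TX_RING_SIZE);
	r->poll_scheduled = false;
	while (r->tail != r->head && r->desc[r->tail % TX_RING_SIZE].done) {
		struct tx_desc *d = &r->desc[r->tail % TX_RING_SIZE];

		free(d->buf);
		d->buf = NULL;
		r->tail++;
		n++;
	}
	return n;
}
```

In this model, queueing eight buffers and letting the model hardware
send them raises two interrupts rather than eight, and one model_poll()
call then frees all eight buffers. A real driver additionally has to
deal with DMA unmapping, stopping and waking the queue, and a time-based
trigger, none of which are modelled here.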

Index: linux-cg/Documentation/networking/NAPI_HOWTO.txt
===================================================================
--- linux-cg.orig/Documentation/networking/NAPI_HOWTO.txt	2006-08-20 16:51:12.000000000 +0200
+++ linux-cg/Documentation/networking/NAPI_HOWTO.txt	2006-08-20 19:42:20.000000000 +0200
@@ -1,11 +1,6 @@
-HISTORY:
-February 16/2002 -- revision 0.2.1:
-COR typo corrected
-February 10/2002 -- revision 0.2:
-some spell checking ;->
-January 12/2002 -- revision 0.1
-This is still work in progress so may change.
-To keep up to date please watch this space.
+Note: this document could use a serious cleanup by a good writer.
+It would be nice to split out the reference parts into a kerneldoc
+document and turn the rest into a tutorial.
 
 Introduction to NAPI
 ====================
@@ -738,6 +733,64 @@
 root         3  0.2  0.0     0     0  ?  RWN Aug 15 602:00 (ksoftirqd_CPU0)
 root       232  0.0  7.9 41400 40884  ?  S   Aug 15  74:12 gated 
 
+
+APPENDIX 4: Using NAPI for TX skb cleanup
+=========================================
+
+While most of the discussion is focused on optimizing the receive path, in
+most drivers it is also beneficial to free TX buffers from the dev->poll()
+function. Many devices trigger an interrupt for each packet that has been
+sent out to notify the driver that it can free the skb. This results in
+a large amount of interrupt processing that we want to avoid. It is also
+suboptimal to free skbs in a hardirq context, because dev_kfree_skb_irq()
+needs to schedule a softirq to do the actual work. Calling dev_kfree_skb()
+from dev->poll() directly avoids these extra softirq schedules.
+
+The simplistic approach of setting a long kernel timer to clean up
+descriptors results in poor throughput because a user process that tries
+to send out a lot of data then blocks on its socket send buffer, while
+the driver never frees up the skbs in that buffer until the timeout.
+
+Trying the cleanup every time that hard_start_xmit() is entered provides
+relatively good throughput, but typically causes extra processing overhead
+because of mmio accesses and/or spinlocks, so you normally want to batch
+skb reclaim.
+
+For optimal transmit throughput, the sent skbs need to be cleaned up
+before the chip runs out of data to transmit. Relying on an end-of-queue
+interrupt leaves a window between the interrupt and the arrival of new
+packets in the adapter during which there is no outgoing data on the
+wire, even if user data is available.  It may also be bad to defer
+freeing skbs for too long, because they may consume a significant
+amount of memory.
+
+Experience shows that a combination of events that trigger skb reclaim
+works best. These events include:
+- new packets coming in through hard_start_xmit()
+- packets coming in from the network through dev->poll()
+- time has passed since the first packet was sent over the wire
+  but has not been reclaimed (tx_coalesce_usecs)
+- a number of packets have been sent (tx_max_coalesced_frames)
+
+We can avoid expensive locking between these by using the poll() function
+as the only place to call skb reclaim. This also means that in the
+interrupt handler, we always call netif_rx_schedule() for any interrupt,
+including those for tx or e.g. PHY handling.  This is particularly
+helpful if reading the IRQ status does an auto mask operation.
+
+Depending on the actual hardware, slightly different methods for coalesced
+tx interrupts may be used:
+- a timer that starts with the successful transmission of a packet
+  may need to be replaced with a timer that is started when a packet
+  is submitted to the adapter.
+- instead of an interrupt that is triggered after a fixed number
+  of transmitted packets, it may be possible to mark a specific packet
+  so it generates an interrupt after processing.
+- if the adapter knows about the number of packets that have been
+  queued, a low-watermark interrupt may be used that fires when the
+  number drops below a user-defined value.
+
+
 --------------------------------------------------------------------
 
 relevant sites:
@@ -764,3 +817,4 @@
 Manfred Spraul <manfred@colorfullife.com>
 Donald Becker <becker@scyld.com>
 Jeff Garzik <jgarzik@pobox.com>
+Arnd Bergmann <arnd@arndb.de>

