public inbox for linux-kernel@vger.kernel.org
From: James Chapman <jchapman@katalix.com>
To: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Jan-Bernd Themann <ossthema@de.ibm.com>,
	akepner@sgi.com, netdev <netdev@vger.kernel.org>,
	Christoph Raisch <raisch@de.ibm.com>,
	Jan-Bernd Themann <themann@de.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-ppc <linuxppc-dev@ozlabs.org>,
	Marcus Eder <meder@de.ibm.com>, Thomas Klein <tklein@de.ibm.com>,
	Stefan Roscher <stefan.roscher@de.ibm.com>
Subject: Re: RFC: issues concerning the next NAPI interface
Date: Fri, 24 Aug 2007 18:16:45 +0100
Message-ID: <46CF127D.1090609@katalix.com>
In-Reply-To: <20070824085203.42f4305c@freepuppy.rosehill.hemminger.net>

Stephen Hemminger wrote:
> On Fri, 24 Aug 2007 17:47:15 +0200
> Jan-Bernd Themann <ossthema@de.ibm.com> wrote:
> 
>> Hi,
>>
>> On Friday 24 August 2007 17:37, akepner@sgi.com wrote:
>>> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
>>>> .......
>>>> 3) On modern systems the incoming packets are processed very fast. Especially
>>>>    on SMP systems when we use multiple queues we process only a few packets
>>>>    per napi poll cycle. So NAPI does not work very well here and the interrupt 
>>>>    rate is still high. What we need would be some sort of timer polling mode 
>>>>    which will schedule a device after a certain amount of time for high load 
>>>>    situations. With high precision timers this could work well. Current
>>>>    usual timers are too slow. A finer granularity would be needed to keep the
>>>>    latency down (and queue length moderate).
>>>>
>>> We found the same on ia64-sn systems with tg3 a couple of years 
>>> ago. Using simple interrupt coalescing ("don't interrupt until 
>>> you've received N packets or M usecs have elapsed") worked 
>>> reasonably well in practice. If your h/w supports that (and I'd 
>>> guess it does, since it's such a simple thing), you might try 
>>> it.
>>>
>> I don't see how this should work. Our latest machines are fast enough that they
>> simply empty the queue during the first poll iteration (in most cases).
>> Even if you wait until X packets have been received, it does not help for
>> the next poll cycle. The average number of packets we process per poll queue
>> is low. So a timer would be preferable that periodically polls the 
>> queue, without the need of generating a HW interrupt. This would allow us
>> to wait until a reasonable number of packets has been received in the meantime
>> to keep the poll overhead low. This would also be useful in combination
>> with LRO.
>>
> 
> You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have expired.

Does hardware interrupt mitigation really interact well with NAPI? In my 
experience, holding off interrupts for X packets or Y usecs does more 
harm than good; such hardware features are useful only when the OS has 
no NAPI-like mechanism.
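
For reference, the "X packets or Y usecs" hold-off rule under discussion can be modelled as below. This is a simplified sketch, not any particular NIC's logic: struct coalesce_params and coalesce_should_irq() are hypothetical names, and real hardware differs in details such as whether the timer starts at the first or the last pending frame (here it is assumed to start at the first).

```c
#include <stdbool.h>

/* Hypothetical model of a NIC's rx interrupt hold-off settings,
 * analogous to the "rx-frames" / "rx-usecs" knobs. */
struct coalesce_params {
    unsigned int max_frames; /* raise the IRQ once this many frames are pending */
    unsigned int max_usecs;  /* ...or once the oldest pending frame is this old */
};

/* Decide whether the (simulated) hardware should raise an rx interrupt now.
 * pending:           frames received but not yet signalled to the host
 * usecs_since_first: time elapsed since the first of those frames arrived */
bool coalesce_should_irq(const struct coalesce_params *p,
                         unsigned int pending,
                         unsigned int usecs_since_first)
{
    if (pending == 0)
        return false; /* nothing to report */
    return pending >= p->max_frames || usecs_since_first >= p->max_usecs;
}
```

With max_frames = 8 and max_usecs = 50, a single packet arriving on an otherwise idle link is not signalled until 50 usecs have passed; that added latency is the cost being weighed here against what NAPI polling already provides.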

When tuning NAPI drivers for packets/sec performance (which is a good 
indicator of driver performance), I make sure that the driver stays in 
NAPI polled mode while it has any rx or tx work to do. If the CPU is 
fast enough that all work is always completed on each poll, I have the 
driver stay in polled mode until dev->poll() is called N times with no 
work being done. This keeps interrupts disabled for reasonable traffic 
levels, while minimizing packet processing latency. No need for hardware 
interrupt mitigation.
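
A user-space simulation of the polling policy just described may make it more concrete. It is a sketch under stated assumptions: struct sim_dev, sim_irq(), sim_poll() and IDLE_POLL_LIMIT (the "N" above) are hypothetical names, ring and IRQ-mask operations are reduced to counters and flags, and none of it is the real dev->poll() interface.

```c
#include <stdbool.h>

/* Hypothetical "N": consecutive empty polls tolerated before the driver
 * leaves polled mode and re-enables device interrupts. */
#define IDLE_POLL_LIMIT 4

/* Simulated device state; a real driver keeps equivalents in its private
 * struct and manipulates real descriptor rings and IRQ-mask registers. */
struct sim_dev {
    unsigned int rx_pending;  /* packets waiting in the rx ring */
    unsigned int tx_pending;  /* completed tx descriptors to reclaim */
    unsigned int idle_polls;  /* consecutive polls that found no work */
    bool polling;             /* true while in polled mode (IRQs masked) */
    bool irq_enabled;
};

/* Interrupt path: mask the device IRQ and enter polled mode. */
void sim_irq(struct sim_dev *d)
{
    d->irq_enabled = false;
    d->polling = true;
    d->idle_polls = 0;
}

/* One poll call: do up to 'budget' units of tx and rx work and return the
 * amount done. Tx reclaim counts against the budget here for simplicity. */
int sim_poll(struct sim_dev *d, int budget)
{
    int work = 0;

    while (work < budget && d->tx_pending > 0) {
        d->tx_pending--;
        work++;
    }
    while (work < budget && d->rx_pending > 0) {
        d->rx_pending--;
        work++;
    }

    if (work > 0) {
        d->idle_polls = 0;  /* any work at all: stay in polled mode */
    } else if (++d->idle_polls >= IDLE_POLL_LIMIT) {
        d->polling = false; /* N empty polls in a row: leave polled mode */
        d->irq_enabled = true;
    }
    return work;
}
```

The design choice is that any work, even less than the budget, resets the idle counter, so the device only returns to interrupt mode after N consecutive empty polls rather than on the first poll that drains the rings.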

> The parameters for controlling it are already in ethtool, the issue is finding a good
> default set of values for a wide range of applications and architectures. Maybe some
> heuristic based on processor speed would be a good starting point. The dynamic irq
> moderation stuff is not widely used because it is too hard to get right.

I agree. It would be nice to find a way for the typical user to derive 
best values for these knobs for his/her particular system. Perhaps a 
tool using pktgen and network device phy internal loopback could be 
developed?

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
