* NAPI vs. interrupts
@ 2003-01-11 22:08 Jeff Garzik
2003-01-12 3:41 ` jamal
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Jeff Garzik @ 2003-01-11 22:08 UTC (permalink / raw)
To: netdev, linux-net; +Cc: davem
I am seeing some tg3 reports occasionally that show a fair number of
interrupts-per-second, even though tg3 is 100% NAPI.
It seems to me that as machines get faster and the amount of memory
increases [xlat: less waiting for free RAM in all parts of the kernel,
and fewer GFP_ATOMIC alloc failures], the likelihood increases that a
NAPI driver can process 100% of the RX and TX work without having to
request subsequent iterations of dev->poll().
NAPI's benefits kick in when there is some amount of system load.
However, if the box is fast enough to eliminate the cases where system
load would otherwise exist (interrupt and packet processing overhead),
the NAPI "worst case" kicks in, where a NAPI driver _always_ does
ack some irqs
mask irqs
ack some more irqs
process events
unmask irqs
whereas a non-NAPI driver _always_ does
ack irqs
process events
When there is load, the obvious NAPI benefits kick in. However, on
super-fast servers, SMP boxes, etc. it seems likely to me that one can
receive well in excess of 1,000 interrupts per second, simply because
the box is so fast it can run thousands of iterations of the NAPI "worst
case", above.
The purpose of this email is to solicit suggestions to develop a
strategy to fix what I believe is a problem with NAPI.
Here are some comments of mine:
1) Can this problem be alleviated entirely without driver changes? For
example, would it be reasonable to do pkts-per-second sampling in the
net core, and enable software mitigation based on that?
2) Implement hardware mitigation in addition to NAPI. Either the driver
does adaptive sampling, or simply hard-locks mitigation settings at
something that averages out to N pkts per second.
3) Implement an alternate driver path that follows the classical,
non-NAPI interrupt handling path in addition to NAPI, by logic similar
to this [warning: off the cuff and not analyzed... i.e. just an idea]:
ack irqs
call dev->poll() from irq handler
[processes events until budget runs out,
or available events are all processed]
if budget ran out,
mask irqs
netif_rx_schedule()
[this, #3, does not address the irq-per-sec problem directly, but does
lessen the effect of "worst case"]
Anyway, for tg3 specifically, I am leaning towards the latter part of #2,
hard-locking mitigation settings at something tests prove is
"reasonable", and in heavy load situations NAPI will kick in as
expected, and perform its magic ;-)
Comments/feedback requested.
Jeff
* Re: NAPI vs. interrupts
From: jamal @ 2003-01-12 3:41 UTC (permalink / raw)
To: Jeff Garzik; +Cc: netdev, linux-net, davem
On Sat, 11 Jan 2003, Jeff Garzik wrote:
> I am seeing some tg3 reports occasionally that show a fair number of
> interrupts-per-second, even though tg3 is 100% NAPI.
>
sample data?
> It seems to me that as machines get faster and the amount of memory
> increases [xlat: less waiting for free RAM in all parts of the kernel,
> and fewer GFP_ATOMIC alloc failures], the likelihood increases that a
> NAPI driver can process 100% of the RX and TX work without having to
> request subsequent iterations of dev->poll().
>
This is very interesting - I haven't come across such a CPU myself.
I am just about to unleash a P4 machine (> 2GHz), so I may see this.
> NAPI's benefits kick in when there is some amount of system load.
> However if the box is fast enough to eliminate cases where system load
> would otherwise exist (interrupt and packet processing overhead), the
> NAPI "worst case" kicks in, where a NAPI driver _always_ does
> ack some irqs
> mask irqs
> ack some more irqs
> process events
> unmask irqs
>
> whereas a non-NAPI driver _always_ does
> ack irqs
> process events
>
I think you may have added one more transaction to NAPI, but that's
beside the point.
Yes, this is what would happen in the worst case with NAPI.
When Manfred first posted his results, I was one of the people doubting
the effect of these extra IOs. Recently I got my hands on a Pentium
(lucky me!) and was able to see up to an 8% increase in CPU use with
NAPI vs. non-NAPI under what I would consider typical low-traffic
input (anywhere between 5,000 and 8,000 packets/sec coming into the
system). To describe the problem better: anywhere the CPU can
process the full rate (e.g. the 5,000 packets/sec arrivals on the
Pentium above), the system is penalized by the extra IO. Note that on
the above Pentium system, the effect of the IO became smaller when I
switched to MMIO-based PCI transactions. Of course, when the going got
tough on the Pentium, it died without NAPI.
> When there is load, the obvious NAPI benefits kick in. However, on
> super-fast servers, SMP boxes, etc. it seems likely to me that one can
> receive well in excess of 1,000 interrupts per second, simply because
> the box is so fast it can run thousands of iterations of the NAPI "worst
> case", above.
>
True, but you are better off with NAPI than without it on fast machines.
[I don't think you'll notice many CPU cycles disappearing on a P3, for
example.]
It's the slow machines that are a problem - and there you make the
sacrifice at small loads, where you spend the unnecessary CPU, but
benefit when you go under stress.
Let's take a worst-case doomsday scenario:
GigE's max rate is 1.4Mpps; take only 30% of that and you are
talking RX interrupts at around 500K/sec. Say you have two GigE NICs;
that's 1M receive interrupts/sec. Is there commodity-type h/ware which
can handle this?
Robert and I have been discussing this very issue for our upcoming
presentation at NordU/USENIX, and his argument is: who gives a shit if
you lose some CPU cycles at low rates? He has a point, of course. I have
been experimenting with some things which kick into NAPI at high rates
and maintain the old scheme otherwise, but what I can say at this point
is that they are experimental and that some of the benefits of NAPI
disappear as a result. For the scheme to work, all the NAPI benefits
have to be maintained and it has to be very unintrusive.
> The purpose of this email is to solicit suggestions to develop a
> strategy to fix what I believe is a problem with NAPI.
>
> Here are some comments of mine:
>
> 1) Can this problem be alleviated entirely without driver changes? For
> example, would it be reasonable to do pkts-per-second sampling in the
> net core, and enable software mitigation based on that?
>
> 2) Implement hardware mitigation in addition to NAPI. Either the driver
> does adaptive sampling, or simply hard-locks mitigation settings at
> something that averages out to N pkts per second.
>
> 3) Implement an alternate driver path that follows the classical,
> non-NAPI interrupt handling path in addition to NAPI, by logic similar
> to this[warning: off the cuff and not analyzed... i.e. just an idea]:
>
> ack irqs
> call dev->poll() from irq handler
> [processes events until budget runs out,
> or available events are all processed]
> if budget ran out,
> mask irqs
> netif_rx_schedule()
>
> [this, #3, does not address the irq-per-sec problem directly, but does
> lessen the effect of "worst case"]
>
> Anyway, for tg3 specifically, I am leaning towards the latter part of #2,
> hard-locking mitigation settings at something tests prove is
> "reasonable", and in heavy load situations NAPI will kick in as
> expected, and perform its magic ;-)
>
You'll run into all sorts of problems with 1 and 3, for example on SMP.
I think 2 is the best path for now. If we can collect data that shows
this to be an issue, we can accelerate getting a patch; I only work on
it when I am bored. For now I agree with Robert's philosophy: if we can
get the workaround for free while maintaining the NAPI benefits, great.
The question is, do we care about slow machines losing some cycles?
cheers,
jamal
* NAPI vs. interrupts
From: Robert Olsson @ 2003-01-13 15:57 UTC (permalink / raw)
To: Jeff Garzik; +Cc: netdev, linux-net, davem
Jeff Garzik writes:
> It seems to me that as machines get faster and the amount of memory
> increases [xlat: less waiting for free RAM in all parts of the kernel,
> and fewer GFP_ATOMIC alloc failures], the likelihood increases that a
> NAPI driver can process 100% of the RX and TX work without having to
> request subsequent iterations of dev->poll().
True.
> NAPI's benefits kick in when there is some amount of system load.
> However if the box is fast enough to eliminate cases where system load
> would otherwise exist (interrupt and packet processing overhead), the
> NAPI "worst case" kicks in, where a NAPI driver _always_ does
> ack some irqs
> mask irqs
> ack some more irqs
> process events
> unmask irqs
>
Another "worst case" :-)
NAPI: subsequent iterations of dev->poll() run at softirq,
whereas a non-NAPI driver _always_ does, at IRQ:
ack irqs
process events
So for this we pay the "insurance fee" of acking and disabling irqs to
get dev->poll() running. The acking we need in the non-NAPI case as
well, to see if the irq is for us, and in the NAPI case the ack at irq
time is passed on to dev->poll() (Davem's patch to e1000). So more or
less we have the cost of one PCI write (plus a PCI sync if MMIO) to
disable irqs and enable processing at softirq, and another PCI write to
re-enable irqs.
NAPI relies on the fact that interrupts are the best way to indicate
work and keep latency at an absolute minimum at sparse traffic levels,
while polling is unbeatable at high loads.
And the packet processing at softirq gives good system behavior: the
system stays manageable, and there are even some hooks to control the
balance between irq/softirq work and user-mode apps.
> 1) Can this problem be alleviated entirely without driver changes? For
> example, would it be reasonable to do pkts-per-second sampling in the
> net core, and enable software mitigation based on that?
We played with this some time ago... First, I think we should say we
need something from the particular device we are dealing with; with any
general measure we risk doing really bad things, i.e. adding latencies
to the wrong devices etc.
We tried different forms of averages and EWMA (exponentially weighted
moving average), but nothing was fast enough to "follow" the burstiness
of a device receiving packets.
The only usable measure we found was the number of RX packets on the
ring. This also has the "advantage" of being adjusted by the CPU
load. You have it in tulip.
> 2) Implement hardware mitigation in addition to NAPI. Either the driver
> does adaptive sampling, or simply hard-locks mitigation settings at
> something that averages out to N pkts per second.
Yes, should be doable...
But the real question is: do we need it? I'm asking this question
myself. I was disturbed by the interrupts myself, and I added mitigation
to tulip, and even some other hacks in tulip delaying the exit to
"done".
Still I'm not sure...
> 3) Implement an alternate driver path that follows the classical,
> non-NAPI interrupt handling path in addition to NAPI, by logic similar
> to this[warning: off the cuff and not analyzed... i.e. just an idea]:
>
> ack irqs
> call dev->poll() from irq handler
> [processes events until budget runs out,
> or available events are all processed]
Well, you need some more steps. The central backlog is needed again, and
you need to schedule the RX softirq to process the backlog's packets.
> if budget ran out,
> mask irqs
> netif_rx_schedule()
>
> [this, #3, does not address the irq-per-sec problem directly, but does
> lessen the effect of "worst case"]
No; "top" performance would probably be somewhat less due to the irqs.
> Anyway, for tg3 specifically, I am leaning towards the latter part of #2,
> hard-locking mitigation settings at something tests prove is
> "reasonable", and in heavy load situations NAPI will kick in as
> expected, and perform its magic ;-)
I've planned a test with the recent tg3 stuff for some time...
Cheers.
--ro
* Ethernet simulation driver for VME bus
From: Zhigang Liu @ 2003-01-13 16:46 UTC (permalink / raw)
To: Jeff Garzik; +Cc: netdev, linux-net
Hi, All:
Is there an ethernet simulation driver for VME bus?
The closest one I found is the hhnet driver from MontaVista, which
simulates an ethernet interface across a CompactPCI bus.
I am looking for driver support for the Tundra Universe II PCI-VME
bridge.
Thanks
--Zhigang
* RE: NAPI vs. interrupts
@ 2003-01-15 8:06 Feldman, Scott
From: Feldman, Scott @ 2003-01-15 8:06 UTC (permalink / raw)
To: Jeff Garzik, netdev, linux-net; +Cc: davem
> NAPI driver _always_ does
> ack some irqs
> mask irqs
> ack some more irqs
> process events
> unmask irqs
e1000+NAPI is this path.
> The purpose of this email is to solicit suggestions to
> develop a strategy to fix what I believe is a problem with NAPI.
>
> Here are some comments of mine:
>
> 1) Can this problem be alleviated entirely without driver
> changes? For example, would it be reasonable to do
> pkts-per-second sampling in the net core, and enable software
> mitigation based on that?
>
> 2) Implement hardware mitigation in addition to NAPI. Either
> the driver does adaptive sampling, or simply hard-locks
> mitigation settings at something that averages out to N pkts
> per second.
I've tried something similar to this while playing around with e1000
recently. Using ITR (InterruptThrottleRate), dial in a max intr/sec
rate of, say, 4000 intr/sec, and then just call netif_rx_schedule() for
each interrupt. Don't mask/unmask interrupts. If already polling,
netif_rx_schedule() does nothing. The code differences between the NAPI
path and the non-NAPI path were minimal, so I liked that, and your worst
case is gone. If you look at the average size of the Rx packets, you
could fine-tune ITR to get a pkt/intr rate that balances your quota, to
try to maximize the amount of time spent polling, but this is too fancy
for my taste. Anyway, worst case, 2*4000 PIO writes/sec was the
savings, but I can't say I could measure a difference.
-scott