From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: ajitk@serverengines.com, netdev@vger.kernel.org
Subject: Re: [net-next-2.6 PATCH][be2net] remove napi in the tx path and do tx completion processing in interrupt context
Date: Wed, 20 May 2009 11:25:44 +0200
Message-ID: <4A13CC98.60506@cosmosbay.com>
In-Reply-To: <20090519.151334.87370735.davem@davemloft.net>
David Miller wrote:
> From: Ajit Khaparde <ajitk@serverengines.com>
> Date: Tue, 19 May 2009 17:40:58 +0530
>
>> This patch will remove napi in tx path and do Tx completion
>> processing in interrupt context. This makes Tx completion
>> processing simpler without loss of performance.
>>
>> Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
>
> This is different from how every other NAPI driver does this.
>
> You should have a single NAPI context, that handles both TX and RX
> processing. Except, that for TX processing, no work budget
> adjustments are made. You simply unconditionally process all pending
> TX work without accounting it into the POLL call budget.
>
> I have no idea why this driver tried to split the RX and TX
> work like this, it accomplishes nothing but add overhead.
> Simply add the TX completion code to the RX poll handler
> and that's all you need to do. Also, make sure to run TX
> polling work before RX polling work, this makes fresh SKBs
> available for responses generated by RX packet processing.
>
> I bet this is why you really saw performance problems, rather than
> something to do with running it directly in interrupt context. There
> should be zero gain from that if you do the TX poll work properly in
> the RX poll handler. When you free TX packets in hardware interrupt
> context using dev_kfree_skb_any() that just schedules a software
> interrupt to do the actual SKB free, which adds just more overhead for
> TX processing work. You aren't avoiding software IRQ work by doing TX
> processing in the hardware interrupt handler, in fact you
> theoretically are doing more.
>
> So the only conclusion I can come to is that what is important is
> doing the TX completion work before the RX packets get processed in
> the NAPI poll handler, and you accomplish that more efficiently and
> more properly by simply moving the TX completion work to the top of
> the RX poll handler code.
>
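
The poll ordering described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not be2net
or kernel code: struct model_dev, model_tx_clean(), model_rx_clean() and
model_poll() are hypothetical names, and the counters stand in for real
descriptor rings. It shows TX completions being reaped first and without
budget accounting, then RX work being limited by the budget.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one device; not be2net code. */
struct model_dev {
	int tx_done_pending;	/* TX completions waiting to be reaped */
	int rx_pending;		/* RX packets waiting to be processed */
	int tx_reaped;		/* total TX completions handled so far */
	int rx_handled;		/* total RX packets handled so far */
};

/* Reap every pending TX completion; deliberately ignores any budget. */
static int model_tx_clean(struct model_dev *dev)
{
	int n = dev->tx_done_pending;

	dev->tx_reaped += n;
	dev->tx_done_pending = 0;
	return n;
}

/* Process at most 'budget' RX packets; return how many were handled. */
static int model_rx_clean(struct model_dev *dev, int budget)
{
	int n = dev->rx_pending < budget ? dev->rx_pending : budget;

	dev->rx_pending -= n;
	dev->rx_handled += n;
	return n;
}

/*
 * Single poll handler: TX first (not counted), then RX (counted).
 * Returns the RX work done; in this model a value below 'budget'
 * means polling would stop and interrupts would be re-enabled.
 */
static int model_poll(struct model_dev *dev, int budget)
{
	assert(dev != NULL && budget > 0);
	model_tx_clean(dev);
	return model_rx_clean(dev, budget);
}
```

With 100 TX completions and 70 RX packets pending, one model_poll()
call with a budget of 64 reaps all 100 TX completions but handles only
64 RX packets, leaving 6 for the next call.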

Thanks, David, for this analysis.

I would like to point out a scalability problem we currently have with
non-multiqueue devices on multi-core hosts under the scheme you
described/advocated. (This has nothing to do with the be2net patch;
please forgive me for jumping in.)

When a lot of network traffic is handled by one device, we enter a
ksoftirqd/NAPI mode where one CPU is almost entirely dedicated to
handling both TX completions and RX completions, while the other CPUs
run application code (and some parts of the TCP/UDP stack).
That is really expensive because of the many cache-line ping-pongs that
occur.

In that case, it would make sense to transfer most of the TX completion
work to the other CPUs (the CPUs that actually issue the xmits): skb
freeing, of course, and the sock_wfree() callbacks...

So maybe some NIC device drivers could let their ndo_start_xmit() do
some cleanup of previously sent skbs. Done correctly, this could lower
the number of cache-line ping-pongs.

This would give some breathing room to the CPU that would then only
take care of RX completions, and would probably give better throughput.
Some machines out there want to transmit a lot of frames while
receiving few...
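
A rough userspace sketch of the idea, under loud assumptions: struct
model_txring, model_reclaim(), model_start_xmit(), RING_SIZE and
XMIT_RECLAIM_MAX are all hypothetical, the model is single-threaded,
and a real driver would need proper locking or ordering between the
xmit path and the poll path. It only shows the transmit routine
reclaiming a bounded number of already-sent buffers before queueing a
new one, and counts whether each buffer was freed on the CPU that
queued it.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8		/* hypothetical ring size */
#define XMIT_RECLAIM_MAX 4	/* hypothetical cap on cleanup per xmit */

struct model_txring {
	unsigned int head;	/* next slot to fill */
	unsigned int tail;	/* oldest slot not yet reclaimed */
	unsigned int hw_done;	/* slots before this index were sent */
	int owner_cpu[RING_SIZE]; /* cpu that queued each buffer */
	int freed_local;	/* freed on the cpu that queued them */
	int freed_remote;	/* freed on some other cpu */
};

/* Free up to 'limit' completed buffers on behalf of 'cpu'. */
static int model_reclaim(struct model_txring *r, int cpu, int limit)
{
	int n = 0;

	while (r->tail != r->hw_done && n < limit) {
		if (r->owner_cpu[r->tail % RING_SIZE] == cpu)
			r->freed_local++;
		else
			r->freed_remote++;
		r->tail++;
		n++;
	}
	return n;
}

/* Transmit path: bounded cleanup first, then queue one buffer. */
static bool model_start_xmit(struct model_txring *r, int cpu)
{
	assert(r->head - r->tail <= RING_SIZE);
	model_reclaim(r, cpu, XMIT_RECLAIM_MAX);
	if (r->head - r->tail >= RING_SIZE)
		return false;	/* ring full, caller must retry later */
	r->owner_cpu[r->head % RING_SIZE] = cpu;
	r->head++;
	return true;
}
```

In this model, a buffer reclaimed from model_start_xmit() on the CPU
that queued it counts as a local free, while one reclaimed by a
different CPU (for example the one running the poll handler) counts as
a remote free, which is the case that causes the cache-line traffic
discussed above.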

There is also a minor latency problem with the current scheme: taking
care of TX completions takes some time and delays RX handling,
increasing the latency of incoming traffic.