netdev.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: David Ahern <dsahern@kernel.org>, David Wei <dw@davidwei.uk>,
	io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Pavel Begunkov <asml.silence@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Mina Almasry <almasrymina@google.com>
Subject: Re: [PATCH v1 00/15] io_uring zero copy rx
Date: Thu, 10 Oct 2024 12:11:54 -0600
Message-ID: <9bbab76f-70db-48ef-9dcc-7fedd75582cb@kernel.dk>
In-Reply-To: <7f8c6192-3652-4761-b2e3-8a4bddb71e29@kernel.dk>

On 10/10/24 8:21 AM, Jens Axboe wrote:
> On 10/9/24 11:12 AM, Jens Axboe wrote:
>> On 10/9/24 10:53 AM, Jens Axboe wrote:
>>> On 10/9/24 10:50 AM, Jens Axboe wrote:
>>>> On 10/9/24 10:35 AM, David Ahern wrote:
>>>>> On 10/9/24 9:43 AM, Jens Axboe wrote:
>>>>>> Yep basically line rate, I get 97-98Gbps. I originally used a slower box
>>>>>> as the sender, but then you're capped on the non-zc sender being too
>>>>>> slow. The intel box does better, but it's still basically maxing out the
>>>>>> sender at this point. So yeah, with a faster (or more efficient) sender,
>>>>>
>>>>> I am surprised by this comment. You should not see a Tx limited test
>>>>> (including CPU bound sender). Tx with ZC has been the easy option for a
>>>>> while now.
>>>>
>>>> I just set this up for testing yesterday and used the defaults! I'm sure
>>>> there is a zc option, it's just not the default and hence it wasn't used.
>>>> I'll give it a spin, will be useful for 200G testing.
>>>
>>> I think we're talking past each other. Yes, send with zerocopy has been
>>> available for a while now, both with io_uring and just sendmsg(), but
>>> I'm using kperf for testing and it does not look like it supports it.
>>> Might have to add it... We'll see how far I can get without it.
>>
>> Stanislav pointed me at:
>>
>> https://github.com/facebookexperimental/kperf/pull/2
>>
>> which adds zc send. I ran a quick test, and it does reduce cpu
>> utilization on the sender from 100% to 95%. I'll keep poking...
> 
> Update on this - did more testing and the 100 -> 95 was a bit of a
> fluke, it's still maxed. So I added io_uring send and sendzc support to
> kperf, and sendzc still maxed out the sender at 100G rates, with 100%
> CPU usage.
> 
> Poked a bit, and the reason is that it's all memcpy() off
> skb_orphan_frags_rx() -> skb_copy_ubufs(). At this point I asked Pavel,
> as that made no sense to me, and it turns out the kernel thinks there's a
> tap on the device. Maybe there is, haven't looked at that yet, but I
> just killed the orphaning and tested again.
> 
> This looks better: now I can get 100G line rate from a single thread
> using io_uring sendzc, with only 30% of that CPU busy (including irq
> time). That is good news, as it unlocks testing > 100G now that the
> sender is no longer the bottleneck.
> 
> Tap side still a mystery, but it unblocked testing. I'll figure that
> part out separately.
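
The zerocopy send path discussed in the quoted messages works roughly like this for plain sendmsg()/send(): the socket opts in with SO_ZEROCOPY, each send passes MSG_ZEROCOPY, and the buffer may only be reused once a completion covering that send has been read from the socket error queue. Below is a minimal sketch for Linux 4.14 or later, modeled on the kernel's MSG_ZEROCOPY documentation; the helper names zc_enable(), zc_send() and zc_reap() are hypothetical. If the kernel had to copy the pages after all, for instance in skb_copy_ubufs() because a packet tap is attached to the device, the completion should carry SO_EE_CODE_ZEROCOPY_COPIED. io_uring's zerocopy send reports completion through a separate notification CQE (flagged IORING_CQE_F_NOTIF) instead, which this sketch does not cover.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

/* Fallbacks for older libc headers (asm-generic / x86 values). */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/* Opt the socket in to MSG_ZEROCOPY. Returns 0 on success. */
static int zc_enable(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
}

/* Send with pinned user pages instead of an up-front copy. The buffer must
 * not be modified or freed until a completion covering this send is reaped. */
static ssize_t zc_send(int fd, const void *buf, size_t len)
{
	return send(fd, buf, len, MSG_ZEROCOPY);
}

/* Wait up to timeout_ms for one completion on the socket error queue.
 * On success, [*lo, *hi] is the range of completed zerocopy sends (counted
 * from 0 per socket) and *copied is 1 if the kernel fell back to copying. */
static int zc_reap(int fd, int timeout_ms, uint32_t *lo, uint32_t *hi, int *copied)
{
	struct pollfd pfd = { .fd = fd, .events = 0 };	/* POLLERR is always reported */
	union { char buf[256]; struct cmsghdr align; } control;
	struct msghdr msg = { .msg_control = control.buf,
			      .msg_controllen = sizeof(control.buf) };

	if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLERR))
		return -1;
	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return -1;

	for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
		struct sock_extended_err serr;

		if (!((cm->cmsg_level == SOL_IP && cm->cmsg_type == IP_RECVERR) ||
		      (cm->cmsg_level == SOL_IPV6 && cm->cmsg_type == IPV6_RECVERR)))
			continue;
		memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
		if (serr.ee_errno != 0 || serr.ee_origin != SO_EE_ORIGIN_ZEROCOPY)
			continue;
		*lo = serr.ee_info;
		*hi = serr.ee_data;
		*copied = !!(serr.ee_code & SO_EE_CODE_ZEROCOPY_COPIED);
		return 0;
	}
	return -1;
}
```

Completions can be coalesced, so one notification may cover a range of sends [lo, hi]; a real sender would keep several buffers in flight and recycle each one as its range completes.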

Further update: the above mystery was dhclient. Thanks a lot to David
for figuring that out very quickly.
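
A tap like dhclient's typically shows up as an AF_PACKET socket. Only packet sockets bound to ETH_P_ALL see transmitted packets (via dev_queue_xmit_nit()), so those are the ones that can force the skb_orphan_frags_rx() copy on the send side. One way to look for them is to scan /proc/net/packet. The sketch below assumes that file's column layout (`sk RefCnt Type Proto Iface R Rmem User Inode`, with Proto printed in hex and Iface 0 meaning all interfaces); the function name is hypothetical, and `ss -0 -p` is another way to list packet sockets together with their owning processes.

```c
#include <stdio.h>
#include <string.h>
#include <linux/if_ether.h>	/* ETH_P_ALL */

/* Count ETH_P_ALL packet sockets in the text of /proc/net/packet that would
 * see traffic on ifindex (ifindex < 0 matches any interface). Sockets with
 * Iface 0 are bound to all interfaces. Returns -1 on a malformed line. */
static int count_all_proto_taps(const char *text, int ifindex)
{
	const char *line = strchr(text, '\n');	/* skip the header line */
	int count = 0;

	while (line != NULL && line[1] != '\0') {
		unsigned int proto;
		int iface;

		line++;	/* step past the '\n' */
		/* Fields: sk RefCnt Type Proto Iface ...; skip sk, RefCnt, Type. */
		if (sscanf(line, "%*s %*d %*d %x %d", &proto, &iface) != 2)
			return -1;
		if (proto == ETH_P_ALL && (ifindex < 0 || iface == 0 || iface == ifindex))
			count++;
		line = strchr(line, '\n');
	}
	return count;
}
```

In practice the text would come from reading /proc/net/packet in full; a non-zero count for the sending interface is a hint that zerocopy sends on it may be copied.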

But the more interesting update: I got both links up on the receiving
side, providing 200G of bandwidth. I re-ran the test with proper zero
copy running on the sending side and io_uring zcrx on the receiver. The
receiver is two threads, BUT targeting the same queue on the two NICs.
Both receiver threads are bound to the same core (453 in this case). In
other words, a single CPU thread is running both rx threads, NAPI
included.
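
Binding each receiver thread to one core, as described above, can be done from the thread itself with sched_setaffinity(2). A minimal sketch follows; the helper names are hypothetical, and how this setup also got NAPI processing onto that core (for example via IRQ affinity or busy polling) is not stated here and not covered by the sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Pin the calling thread to one CPU. Both receiver threads would call this
 * with the same CPU number so that they share a single core.
 * Returns 0 on success, or -1 with errno set. */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE) {
		errno = EINVAL;
		return -1;
	}
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);	/* pid 0: calling thread */
}

/* Lowest-numbered CPU in the calling thread's current affinity mask, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}
```

Each receiver thread would call pin_self_to_cpu(453), or whichever core is chosen, before entering its receive loop; `taskset -c 453 <command>` pins a whole process from the shell instead.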

Basic thread usage from top here:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
10816 root      20   0  396640 393224      0 R  49.0   0.0   0:01.77 server
10818 root      20   0  396640 389128      0 R  49.0   0.0   0:01.76 server

and I get 98.4Gbps and 98.6Gbps on the receiver side, which is basically
the full combined link bandwidth. So 200G is not enough to saturate a
single CPU thread.
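
For reference, the two receive rates sum to 98.4 + 98.6 = 197.0 Gbps, i.e. 98.5% of the 200G combined link rate, while the two threads together show 49.0 + 49.0 = 98.0% of one CPU in top. How close "line rate" can get depends on per-packet framing overhead, which the email does not spell out. The sketch below estimates the TCP payload ceiling for a given MTU, assuming IPv4, the TCP timestamp option (on by default in Linux) and standard Ethernet overhead; the MTU actually used is not stated in the source, and the function name is hypothetical.

```c
/* Per-frame Ethernet cost on the wire beyond the MTU, in bytes:
 * preamble + SFD (8) + MAC header (14) + FCS (4) + inter-frame gap (12). */
#define ETH_WIRE_OVERHEAD 38.0

/* Headers inside the MTU, in bytes: IPv4 (20) + TCP (20) + timestamp option (12). */
#define IPV4_TCP_TS_HEADERS 52.0

/* Upper bound on TCP payload throughput (Gbps) over a link of link_gbps,
 * assuming every segment is carried in a full-sized frame of the given MTU. */
static double tcp_goodput_ceiling_gbps(double link_gbps, int mtu)
{
	return link_gbps * (mtu - IPV4_TCP_TS_HEADERS) / (mtu + ETH_WIRE_OVERHEAD);
}
```

Under these assumptions a 100G link tops out around 94.1 Gbps of payload at MTU 1500 and about 99.0 Gbps at MTU 9000, so per-link figures of 98.4 and 98.6 Gbps, if they are TCP payload rates, imply an MTU well above 1500.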

-- 
Jens Axboe


Thread overview: 126+ messages
2024-10-07 22:15 [PATCH v1 00/15] io_uring zero copy rx David Wei
2024-10-07 22:15 ` [PATCH v1 01/15] net: devmem: pull struct definitions out of ifdef David Wei
2024-10-09 20:17   ` Mina Almasry
2024-10-09 23:16     ` Pavel Begunkov
2024-10-10 18:01       ` Mina Almasry
2024-10-10 18:57         ` Pavel Begunkov
2024-10-13 22:38           ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 02/15] net: prefix devmem specific helpers David Wei
2024-10-09 20:19   ` Mina Almasry
2024-10-07 22:15 ` [PATCH v1 03/15] net: generalise net_iov chunk owners David Wei
2024-10-08 15:46   ` Stanislav Fomichev
2024-10-08 16:34     ` Pavel Begunkov
2024-10-09 16:28       ` Stanislav Fomichev
2024-10-11 18:44         ` David Wei
2024-10-11 22:02           ` Pavel Begunkov
2024-10-11 22:25             ` Mina Almasry
2024-10-11 23:12               ` Pavel Begunkov
2024-10-09 20:44   ` Mina Almasry
2024-10-09 22:13     ` Pavel Begunkov
2024-10-09 22:19     ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 04/15] net: page_pool: create hooks for custom page providers David Wei
2024-10-09 20:49   ` Mina Almasry
2024-10-09 22:02     ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 05/15] net: prepare for non devmem TCP memory providers David Wei
2024-10-09 20:56   ` Mina Almasry
2024-10-09 21:45     ` Pavel Begunkov
2024-10-13 22:33       ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 06/15] net: page_pool: add ->scrub mem provider callback David Wei
2024-10-09 21:00   ` Mina Almasry
2024-10-09 21:59     ` Pavel Begunkov
2024-10-10 17:54       ` Mina Almasry
2024-10-13 17:25         ` David Wei
2024-10-14 13:37           ` Pavel Begunkov
2024-10-14 22:58           ` Mina Almasry
2024-10-16 17:42             ` Pavel Begunkov
2024-11-01 17:18               ` Mina Almasry
2024-11-01 18:35                 ` Pavel Begunkov
2024-11-01 19:24                   ` Mina Almasry
2024-11-01 21:38                     ` Pavel Begunkov
2024-11-04 20:42                       ` Mina Almasry
2024-11-04 21:27                         ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 07/15] net: page pool: add helper creating area from pages David Wei
2024-10-09 21:11   ` Mina Almasry
2024-10-09 21:34     ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 08/15] net: add helper executing custom callback from napi David Wei
2024-10-08 22:25   ` Joe Damato
2024-10-09 15:09     ` Pavel Begunkov
2024-10-09 16:13       ` Joe Damato
2024-10-09 19:12         ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 09/15] io_uring/zcrx: add interface queue and refill queue David Wei
2024-10-09 17:50   ` Jens Axboe
2024-10-09 18:09     ` Jens Axboe
2024-10-09 19:08     ` Pavel Begunkov
2024-10-11 22:11     ` Pavel Begunkov
2024-10-13 17:32     ` David Wei
2024-10-07 22:15 ` [PATCH v1 10/15] io_uring/zcrx: add io_zcrx_area David Wei
2024-10-09 18:02   ` Jens Axboe
2024-10-09 19:05     ` Pavel Begunkov
2024-10-09 19:06       ` Jens Axboe
2024-10-09 21:29   ` Mina Almasry
2024-10-07 22:15 ` [PATCH v1 11/15] io_uring/zcrx: implement zerocopy receive pp memory provider David Wei
2024-10-09 18:10   ` Jens Axboe
2024-10-09 22:01   ` Mina Almasry
2024-10-09 22:58     ` Pavel Begunkov
2024-10-10 18:19       ` Mina Almasry
2024-10-10 20:26         ` Pavel Begunkov
2024-10-10 20:53           ` Mina Almasry
2024-10-10 20:58             ` Mina Almasry
2024-10-10 21:22             ` Pavel Begunkov
2024-10-11  0:32               ` Mina Almasry
2024-10-11  1:49                 ` Pavel Begunkov
2024-10-07 22:16 ` [PATCH v1 12/15] io_uring/zcrx: add io_recvzc request David Wei
2024-10-09 18:28   ` Jens Axboe
2024-10-09 18:51     ` Pavel Begunkov
2024-10-09 19:01       ` Jens Axboe
2024-10-09 19:27         ` Pavel Begunkov
2024-10-09 19:42           ` Jens Axboe
2024-10-09 19:47             ` Pavel Begunkov
2024-10-09 19:50               ` Jens Axboe
2024-10-07 22:16 ` [PATCH v1 13/15] io_uring/zcrx: add copy fallback David Wei
2024-10-08 15:58   ` Stanislav Fomichev
2024-10-08 16:39     ` Pavel Begunkov
2024-10-08 16:40     ` David Wei
2024-10-09 16:30       ` Stanislav Fomichev
2024-10-09 23:05         ` Pavel Begunkov
2024-10-11  6:22           ` David Wei
2024-10-11 14:43             ` Stanislav Fomichev
2024-10-09 18:38   ` Jens Axboe
2024-10-07 22:16 ` [PATCH v1 14/15] io_uring/zcrx: set pp memory provider for an rx queue David Wei
2024-10-09 18:42   ` Jens Axboe
2024-10-10 13:09     ` Pavel Begunkov
2024-10-10 13:19       ` Jens Axboe
2024-10-07 22:16 ` [PATCH v1 15/15] io_uring/zcrx: throttle receive requests David Wei
2024-10-09 18:43   ` Jens Axboe
2024-10-07 22:20 ` [PATCH v1 00/15] io_uring zero copy rx David Wei
2024-10-08 23:10 ` Joe Damato
2024-10-09 15:07   ` Pavel Begunkov
2024-10-09 16:10     ` Joe Damato
2024-10-09 16:12     ` Jens Axboe
2024-10-11  6:15       ` David Wei
2024-10-09 15:27 ` Jens Axboe
2024-10-09 15:38   ` David Ahern
2024-10-09 15:43     ` Jens Axboe
2024-10-09 15:49       ` Pavel Begunkov
2024-10-09 15:50         ` Jens Axboe
2024-10-09 16:35       ` David Ahern
2024-10-09 16:50         ` Jens Axboe
2024-10-09 16:53           ` Jens Axboe
2024-10-09 17:12             ` Jens Axboe
2024-10-10 14:21               ` Jens Axboe
2024-10-10 15:03                 ` David Ahern
2024-10-10 15:15                   ` Jens Axboe
2024-10-10 18:11                 ` Jens Axboe [this message]
2024-10-14  8:42                   ` David Laight
2024-10-09 16:55 ` Mina Almasry
2024-10-09 16:57   ` Jens Axboe
2024-10-09 19:32     ` Mina Almasry
2024-10-09 19:43       ` Pavel Begunkov
2024-10-09 19:47       ` Jens Axboe
2024-10-09 17:19   ` David Ahern
2024-10-09 18:21   ` Pedro Tammela
2024-10-10 13:19     ` Pavel Begunkov
2024-10-11  0:35     ` David Wei
2024-10-11 14:28       ` Pedro Tammela
2024-10-11  0:29   ` David Wei
2024-10-11 19:43     ` Mina Almasry
