public inbox for linux-kernel@vger.kernel.org
From: Pavel Begunkov <asml.silence@gmail.com>
To: David Ahern <dsahern@gmail.com>,
	io-uring@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: Jakub Kicinski <kuba@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Willem de Bruijn <willemb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>, Jens Axboe <axboe@kernel.dk>
Subject: Re: [RFC 00/12] io_uring zerocopy send
Date: Wed, 1 Dec 2021 19:11:39 +0000
Message-ID: <889c0306-afed-62cd-d95b-a20b8e798979@gmail.com>
In-Reply-To: <c4424a7a-2ef1-6524-9b10-1e7d1f1e1fe4@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2937 bytes --]

On 12/1/21 17:57, David Ahern wrote:
> On 12/1/21 8:32 AM, Pavel Begunkov wrote:
>>
>> Sure. First, for dummy I set the MTU by hand; not sure whether it can be
>> done from userspace, can it? Without it, __ip_append_data() falls into
>> the non-zerocopy path.
>>
[...]
>>
>> modprobe dummy numdummies=1
>> ip link set dummy0 up
> 
> 
> No change is needed to the dummy driver:
>    ip li add dummy0 type dummy
>    ip li set dummy0 up mtu 65536

awesome, thanks!
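
For reference, the MTU can also be changed from a program. The sketch
below is an illustration under stated assumptions, not what ip(8) does:
it uses the older SIOCSIFMTU ioctl (the mechanism ifconfig relies on),
whereas `ip link set ... mtu` goes through rtnetlink. The helper name
set_if_mtu() is hypothetical, and the call needs CAP_NET_ADMIN.

```c
#define _DEFAULT_SOURCE /* struct ifreq from <net/if.h> */
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the MTU of interface `ifname` with the
 * SIOCSIFMTU ioctl, roughly what `ifconfig <ifname> mtu <mtu>` does.
 * Requires CAP_NET_ADMIN. Returns 0 on success, -1 with errno set.
 */
static int set_if_mtu(const char *ifname, int mtu)
{
	struct ifreq ifr;
	int fd, ret, saved_errno;

	/* ifr_name must hold the name plus its terminating NUL. */
	if (strlen(ifname) >= IFNAMSIZ) {
		errno = EINVAL;
		return -1;
	}

	memset(&ifr, 0, sizeof(ifr));
	strcpy(ifr.ifr_name, ifname);
	ifr.ifr_mtu = mtu;

	/* Any socket serves as a handle for interface ioctls. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, SIOCSIFMTU, &ifr);
	saved_errno = errno; /* close() must not clobber the ioctl error */
	close(fd);
	errno = saved_errno;
	return ret;
}
```

Run as root, set_if_mtu("dummy0", 65536) should have the same effect as
the `ip li set dummy0 up mtu 65536` quoted above, minus bringing the link up.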

>> # force requests to <dummy_ip_addr> go through the dummy device
>> ip route add <dummy_ip_addr> dev dummy0
> 
> that command is not necessary.
> 
>>
>>
>> With dummy I was just sinking the traffic into the dummy device, which
>> was good enough for me. Omitting "taskset" and "nice":
>>
>> send-zc -4 -D <dummy_ip_addr> -t 10 udp
>>
>> Similarly with msg_zerocopy:
>>
>> <kernel>/tools/testing/selftests/net/msg_zerocopy -4 -p 6666 -D
>> <dummy_ip_addr> -t 10 -z udp
> 
> I get -ENOBUFS with '-z' and any local address.

Ah, right. Quoting Willem's MSG_ZEROCOPY letter:

"
Notification skbuffs are allocated from optmem. For sockets that
cannot effectively coalesce notifications, the optmem max may need
to be increased to avoid hitting -ENOBUFS:

   sysctl -w net.core.optmem_max=1048576
"


>> For loopback testing: zerocopy is not allowed there, as Willem explained
>> in the original MSG_ZEROCOPY cover letter, so I used a hack to bypass it:
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index ebb12a7d386d..42df33b175ce 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -2854,9 +2854,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
>>   /* Frags must be orphaned, even if refcounted, if skb might loop to rx path */
>>   static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
>>   {
>> -    if (likely(!skb_zcopy(skb)))
>> -        return 0;
>> -    return skb_copy_ubufs(skb, gfp_mask);
>> +    return skb_orphan_frags(skb, gfp_mask);
>>   }
>>   
> 
> That is the key change that is missing in your repo. All local traffic
> (traffic to the address on a dummy device falls under this comment) goes
> through loopback. That's just the way Linux works. If you look at the
> dummy driver, its xmit function just drops packets if any actually make
> it there.

Not at all; the measurements were done without this patch. In case it
sheds some light, I'm attaching a fresh flamegraph, same 115761.6 MB/s.

btw, why would traffic to a dummy device ever go through loopback? It
doesn't seem to make sense, though I may be missing something.


>>> mileage varies quite a bit.
>>
>> Interesting, any brief notes on the setup and the results? Dummy
> 
> VM on Chromebook. I just cloned your repos, built, installed and tested. As
> mentioned above, the skb_orphan_frags_rx change is missing from your
> repo and that is the key to your reported performance gains.
> 

-- 
Pavel Begunkov

[-- Attachment #2: perf.svg --]
[-- Type: image/svg+xml, Size: 118548 bytes --]

Thread overview: 41+ messages
2021-11-30 15:18 [RFC 00/12] io_uring zerocopy send Pavel Begunkov
2021-11-30 15:18 ` [RFC 01/12] skbuff: add SKBFL_DONT_ORPHAN flag Pavel Begunkov
2021-11-30 15:18 ` [RFC 02/12] skbuff: pass a struct ubuf_info in msghdr Pavel Begunkov
2021-11-30 15:18 ` [RFC 03/12] net/udp: add support msgdr::msg_ubuf Pavel Begunkov
2021-11-30 15:18 ` [RFC 04/12] net: add zerocopy_sg_from_iter for bvec Pavel Begunkov
2021-11-30 15:18 ` [RFC 05/12] net: optimise page get/free for bvec zc Pavel Begunkov
2021-12-01 19:20   ` Jonathan Lemon
2021-12-01 20:17     ` Pavel Begunkov
2021-11-30 15:18 ` [RFC 06/12] io_uring: add send notifiers registration Pavel Begunkov
2021-11-30 15:18 ` [RFC 07/12] io_uring: infrastructure for send zc notifications Pavel Begunkov
2021-11-30 15:18 ` [RFC 08/12] io_uring: wire send zc request type Pavel Begunkov
2021-11-30 15:18 ` [RFC 09/12] io_uring: add an option to flush zc notifications Pavel Begunkov
2021-11-30 15:18 ` [RFC 10/12] io_uring: opcode independent fixed buf import Pavel Begunkov
2021-11-30 15:18 ` [RFC 11/12] io_uring: sendzc with fixed buffers Pavel Begunkov
2021-11-30 15:19 ` [RFC 12/12] io_uring: cache struct ubuf_info Pavel Begunkov
2021-12-01  3:10 ` [RFC 00/12] io_uring zerocopy send David Ahern
2021-12-01 15:32   ` Pavel Begunkov
2021-12-01 17:57     ` David Ahern
2021-12-01 19:11       ` Pavel Begunkov [this message]
2021-12-01 19:20         ` David Ahern
2021-12-01 20:15           ` Pavel Begunkov
2021-12-01 21:51             ` Martin KaFai Lau
2021-12-01 22:35               ` David Ahern
2021-12-01 23:07                 ` Martin KaFai Lau
2021-12-01 23:18                   ` Pavel Begunkov
2021-12-02 15:48               ` Pavel Begunkov
2021-12-02 17:40                 ` Martin KaFai Lau
2021-12-01 20:42       ` Pavel Begunkov
2021-12-01 14:31 ` Pavel Begunkov
2021-12-01 17:49   ` David Ahern
2021-12-01 19:59     ` Pavel Begunkov
2021-12-01 18:10 ` Willem de Bruijn
2021-12-01 19:59   ` Pavel Begunkov
2021-12-01 20:29     ` Pavel Begunkov
2021-12-02  0:36       ` Willem de Bruijn
2021-12-02 16:25         ` Pavel Begunkov
2021-12-02  0:32     ` Willem de Bruijn
2021-12-02 16:45       ` Pavel Begunkov
2021-12-02 21:25         ` Willem de Bruijn
2021-12-03 16:19           ` Pavel Begunkov
2021-12-03 16:30             ` Willem de Bruijn
