From: Jiri Pirko <jiri@resnulli.us>
To: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
Jason Xing <kerneljasonxing@gmail.com>,
Heng Qi <hengqi@linux.alibaba.com>,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, ast@kernel.org,
daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org
Subject: Re: [patch net-next] virtio_net: add support for Byte Queue Limits
Date: Fri, 7 Jun 2024 11:57:37 +0200
Message-ID: <ZmLZkVML2a3mT2Hh@nanopsycho.orion>
In-Reply-To: <CACGkMEvQ04NBUBwrc9AyvLqskSbQ_4OBUK=B9a+iktLcPLeyrg@mail.gmail.com>
Fri, Jun 07, 2024 at 08:47:43AM CEST, jasowang@redhat.com wrote:
>On Fri, Jun 7, 2024 at 2:39 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Fri, Jun 07, 2024 at 08:25:19AM CEST, jasowang@redhat.com wrote:
>> >On Thu, Jun 6, 2024 at 9:45 PM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Thu, Jun 06, 2024 at 09:56:50AM CEST, jasowang@redhat.com wrote:
>> >> >On Thu, Jun 6, 2024 at 2:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> >>
>> >> >> On Thu, Jun 06, 2024 at 12:25:15PM +0800, Jason Wang wrote:
>> >> >> > > If the codes of orphan mode don't have an impact when you enable
>> >> >> > > napi_tx mode, please keep it if you can.
>> >> >> >
>> >> >> > For example, it complicates BQL implementation.
>> >> >> >
>> >> >> > Thanks
>> >> >>
>> >> >> I very much doubt sending interrupts to a VM can
>> >> >> *on all benchmarks* compete with not sending interrupts.
>> >> >
>> >> >It should not differ too much from the physical NIC. We can have one
>> >> >more round of benchmarks to see the difference.
>> >> >
>> >> >But if NAPI mode needs to win all of the benchmarks in order to get
>> >> >rid of orphan, that would be very difficult. Considering various bugs
>> >> >will be fixed by dropping skb_orphan(), it would be sufficient if most
>> >> >of the benchmarks don't show obvious differences.
>> >> >
>> >> >Looking at git history, there are commits that remove skb_orphan(), for example:
>> >> >
>> >> >commit 8112ec3b8722680251aecdcc23dfd81aa7af6340
>> >> >Author: Eric Dumazet <edumazet@google.com>
>> >> >Date: Fri Sep 28 07:53:26 2012 +0000
>> >> >
>> >> > mlx4: dont orphan skbs in mlx4_en_xmit()
>> >> >
>> >> > After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX
>> >> > completions) we no longer need to orphan skbs in mlx4_en_xmit()
>> >> > since skb wont stay a long time in TX ring before their release.
>> >> >
>> >> > Orphaning skbs in ndo_start_xmit() should be avoided as much as
>> >> > possible, since it breaks TCP Small Queue or other flow control
>> >> > mechanisms (per socket limits)
>> >> >
>> >> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>> >> > Acked-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
>> >> > Cc: Or Gerlitz <ogerlitz@mellanox.com>
>> >> > Signed-off-by: David S. Miller <davem@davemloft.net>
>> >> >
>> >> >>
>> >> >> So yea, it's great if napi and hardware are advanced enough
>> >> >> that the default can be changed, since this way virtio
>> >> >> is closer to a regular nic and more of the standard
>> >> >> infrastructure can be used.
>> >> >>
>> >> >> But dropping it will go against *no breaking userspace* rule.
>> >> >> Complicated? Tough.
>> >> >
>> >> >I don't know what kind of userspace is broken by this. Or why it is
>> >> >not broken since the day we enable NAPI mode by default.
>> >>
>> >> There is a module option that explicitly allows user to set
>> >> napi_tx=false
>> >> or
>> >> napi_weight=0
>> >>
>> >> So if you remove this option or ignore it, either breaks the user's
>> >> expectation.
>> >
>> >We can keep them, but I wonder what's the expectation of the user
>> >here? The only thing so far I can imagine is the performance
>> >difference.
>>
>> True.
>>
>> >
>> >> I personally would vote for this breakage. To carry ancient
>> >> things like this one forever does not make sense to me.
>> >
>> >Exactly.
>> >
>> >> While at it,
>> >> let's remove all virtio net module params. Thoughts?
>> >
>> >I tend to
>> >
>> >1) drop the orphan mode, but we can have some benchmarks first
>>
>> Any idea which? It would be really tricky to find the ones where
>> orphan mode makes a difference, I assume.
>
>True. Personally, I would like to just drop orphan mode. But I'm not
>sure others are happy with this.

How about doing it the other way around? I will take a stab at sending a
patch removing it. If anyone is against it and has solid data to prove
orphan mode is needed, let them provide it.

>
>Thanks
>
>>
>>
>> >2) keep the module parameters
>>
>> and ignore them, correct? Perhaps a warning would be good.
>>
>>
>> >
>> >Thanks
>> >
>> >>
>> >>
>> >>
>> >> >
>> >> >Thanks
>> >> >
>> >> >>
>> >> >> --
>> >> >> MST
>> >> >>
>> >> >
>> >>
>> >
>>
>
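
For readers who want the flow-control concern from the quoted mlx4 commit
in concrete terms, here is a toy userspace model. It is only a sketch under
stated assumptions, not kernel code and not the virtio_net implementation:
every name in it (toy_sock, toy_ring, toy_xmit, toy_complete,
toy_burst_high_water) is hypothetical, and the "limit" only stands in for
a per-socket in-flight cap of the TCP Small Queues kind. It shows that when
the socket charge is dropped at xmit time (orphan), the per-socket limit
never triggers and the ring absorbs the whole burst, whereas when the charge
is held until completion, the burst stops at the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all types and functions here are hypothetical. */
struct toy_sock {
	size_t wmem_charged;	/* bytes the socket is still charged for */
	size_t limit;		/* per-socket in-flight cap (TSQ-like) */
};

struct toy_ring {
	size_t bytes_in_ring;	/* bytes queued for TX, not yet completed */
	size_t max_seen;	/* high-water mark of bytes_in_ring */
};

/* Queue one packet; returns false if the socket limit throttles it. */
static bool toy_xmit(struct toy_sock *sk, struct toy_ring *ring,
		     size_t len, bool orphan)
{
	if (sk->wmem_charged + len > sk->limit)
		return false;

	sk->wmem_charged += len;
	ring->bytes_in_ring += len;
	if (ring->bytes_in_ring > ring->max_seen)
		ring->max_seen = ring->bytes_in_ring;

	/* Orphan mode: the socket charge is released right at xmit. */
	if (orphan)
		sk->wmem_charged -= len;
	return true;
}

/* Complete everything in the ring (a TX completion event). */
static void toy_complete(struct toy_sock *sk, struct toy_ring *ring,
			 bool orphan)
{
	/* Non-orphan mode: the charge is released only at completion. */
	if (!orphan)
		sk->wmem_charged -= ring->bytes_in_ring;
	ring->bytes_in_ring = 0;
}

/*
 * Offer npkts packets back to back with no completion in between and
 * return how many bytes piled up in the ring.
 */
size_t toy_burst_high_water(size_t npkts, size_t len, size_t limit,
			    bool orphan)
{
	struct toy_sock sk = { .wmem_charged = 0, .limit = limit };
	struct toy_ring ring = { .bytes_in_ring = 0, .max_seen = 0 };
	size_t high_water;
	size_t i;

	for (i = 0; i < npkts; i++)
		if (!toy_xmit(&sk, &ring, len, orphan))
			break;

	high_water = ring.max_seen;
	toy_complete(&sk, &ring, orphan);
	assert(sk.wmem_charged == 0);	/* every charge is released */
	return high_water;
}
```

With 100 packets of 1500 bytes and a 4500-byte limit, the non-orphan case
stops after three packets, while the orphan case queues all 100. The model
ignores ring size, timing and interrupts, so it says nothing about which
mode wins a benchmark.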