netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Xing <kerneljasonxing@gmail.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Jason Wang <jasowang@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	 Heng Qi <hengqi@linux.alibaba.com>,
	davem@davemloft.net, edumazet@google.com,  kuba@kernel.org,
	pabeni@redhat.com, xuanzhuo@linux.alibaba.com,
	 virtualization@lists.linux.dev, ast@kernel.org,
	daniel@iogearbox.net,  hawk@kernel.org, john.fastabend@gmail.com,
	netdev@vger.kernel.org
Subject: Re: [patch net-next] virtio_net: add support for Byte Queue Limits
Date: Fri, 7 Jun 2024 19:22:02 +0800	[thread overview]
Message-ID: <CAL+tcoCRSCvbwxNibDu7MdWXF77qjROd1eTeyZN6Sh-ckXDQrw@mail.gmail.com> (raw)
In-Reply-To: <ZmLZkVML2a3mT2Hh@nanopsycho.orion>

On Fri, Jun 7, 2024 at 5:57 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Fri, Jun 07, 2024 at 08:47:43AM CEST, jasowang@redhat.com wrote:
> >On Fri, Jun 7, 2024 at 2:39 PM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Fri, Jun 07, 2024 at 08:25:19AM CEST, jasowang@redhat.com wrote:
> >> >On Thu, Jun 6, 2024 at 9:45 PM Jiri Pirko <jiri@resnulli.us> wrote:
> >> >>
> >> >> Thu, Jun 06, 2024 at 09:56:50AM CEST, jasowang@redhat.com wrote:
> >> >> >On Thu, Jun 6, 2024 at 2:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> >>
> >> >> >> On Thu, Jun 06, 2024 at 12:25:15PM +0800, Jason Wang wrote:
> >> >> >> > > If the codes of orphan mode don't have an impact when you enable
> >> >> >> > > napi_tx mode, please keep it if you can.
> >> >> >> >
> >> >> >> > For example, it complicates BQL implementation.
> >> >> >> >
> >> >> >> > Thanks
> >> >> >>
> >> >> >> I very much doubt sending interrupts to a VM can
> >> >> >> *on all benchmarks* compete with not sending interrupts.
> >> >> >
> >> >> >It should not differ too much from the physical NIC. We can have one
> >> >> >more round of benchmarks to see the difference.
> >> >> >
> >> >> >But if NAPI mode needs to win all of the benchmarks in order to get
> >> >> >rid of orphan, that would be very difficult. Considering various bugs
> >> >> >will be fixed by dropping skb_orphan(), it would be sufficient if most
> >> >> >of the benchmark doesn't show obvious differences.
> >> >> >
> >> >> >Looking at git history, there're commits that removes skb_orphan(), for example:
> >> >> >
> >> >> >commit 8112ec3b8722680251aecdcc23dfd81aa7af6340
> >> >> >Author: Eric Dumazet <edumazet@google.com>
> >> >> >Date:   Fri Sep 28 07:53:26 2012 +0000
> >> >> >
> >> >> >    mlx4: dont orphan skbs in mlx4_en_xmit()
> >> >> >
> >> >> >    After commit e22979d96a55d (mlx4_en: Moving to Interrupts for TX
> >> >> >    completions) we no longer need to orphan skbs in mlx4_en_xmit()
> >> >> >    since skb wont stay a long time in TX ring before their release.
> >> >> >
> >> >> >    Orphaning skbs in ndo_start_xmit() should be avoided as much as
> >> >> >    possible, since it breaks TCP Small Queue or other flow control
> >> >> >    mechanisms (per socket limits)
> >> >> >
> >> >> >    Signed-off-by: Eric Dumazet <edumazet@google.com>
> >> >> >    Acked-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
> >> >> >    Cc: Or Gerlitz <ogerlitz@mellanox.com>
> >> >> >    Signed-off-by: David S. Miller <davem@davemloft.net>
> >> >> >
> >> >> >>
> >> >> >> So yea, it's great if napi and hardware are advanced enough
> >> >> >> that the default can be changed, since this way virtio
> >> >> >> is closer to a regular nic and more or standard
> >> >> >> infrastructure can be used.
> >> >> >>
> >> >> >> But dropping it will go against *no breaking userspace* rule.
> >> >> >> Complicated? Tough.
> >> >> >
> >> >> >I don't know what kind of userspace is broken by this. Or why it is
> >> >> >not broken since the day we enable NAPI mode by default.
> >> >>
> >> >> There is a module option that explicitly allows user to set
> >> >> napi_tx=false
> >> >> or
> >> >> napi_weight=0
> >> >>
> >> >> So if you remove this option or ignore it, both breaks the user
> >> >> expectation.
> >> >
> >> >We can keep them, but I wonder what's the expectation of the user
> >> >here? The only thing so far I can imagine is the performance
> >> >difference.
> >>
> >> True.
> >>
> >> >
> >> >> I personally would vote for this breakage. To carry ancient
> >> >> things like this one forever does not make sense to me.
> >> >
> >> >Exactly.
> >> >
> >> >> While at it,
> >> >> let's remove all virtio net module params. Thoughts?
> >> >
> >> >I tend to
> >> >
> >> >1) drop the orphan mode, but we can have some benchmarks first
> >>
> >> Any idea which? That would be really tricky to find the ones where
> >> orphan mode makes difference I assume.
> >
> >True. Personally, I would like to just drop orphan mode. But I'm not
> >sure others are happy with this.
>
> How about to do it other way around. I will take a stab at sending patch
> removing it. If anyone is against and has solid data to prove orphan
> mode is needed, let them provide those.

Honestly, we in production have to use skb orphan mode. I cannot
provide enough and strong evidence about why default mode on earth
causes performance degradation in some cases. I mean I don't know the
root cause. The only thing I can tell is if I enable the skb orphan
mode, then 1)  will let more skb pass the TCP layer, 2) some key
members like snd_cwnd in tcp will behave normally.

I know without orphan mode the skb will be controlled/limited by the
combination of TSO and napi_tx mode thanks to sk_wmem_alloc. So I
_guess_ the root cause is: the possible delay of interrupts generated
by the host machine will cause the delay of freeing skbs, resulting in
the slowing down the tx speed. If the interval between two interrupts
is very short like the real NIC, I think the issue would disappear.

That's all I can tell. Just for record. Hope this information could
also be useful to other readers.

Thanks,
Jason

>
>
> >
> >Thanks
> >
> >>
> >>
> >> >2) keep the module parameters
> >>
> >> and ignore them, correct? Perhaps a warning would be good.
> >>
> >>
> >> >
> >> >Thanks
> >> >
> >> >>
> >> >>
> >> >>
> >> >> >
> >> >> >Thanks
> >> >> >
> >> >> >>
> >> >> >> --
> >> >> >> MST
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >

  parent reply	other threads:[~2024-06-07 11:22 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-09 11:46 [patch net-next] virtio_net: add support for Byte Queue Limits Jiri Pirko
2024-05-09 12:41 ` Michael S. Tsirkin
2024-05-09 13:31   ` Jiri Pirko
2024-05-09 14:28     ` Michael S. Tsirkin
2024-05-10  4:25       ` Jason Wang
2024-05-10 10:37       ` Jiri Pirko
2024-05-10 10:52         ` Michael S. Tsirkin
2024-05-10 11:11           ` Jiri Pirko
2024-05-10 11:27             ` Michael S. Tsirkin
2024-05-10 11:36               ` Jiri Pirko
2024-05-15  7:34               ` Jiri Pirko
2024-05-15  8:20                 ` Michael S. Tsirkin
2024-05-15 10:12                   ` Jiri Pirko
2024-05-15 12:54                     ` Jiri Pirko
2024-05-16  4:48                       ` Jason Wang
2024-05-16 10:54                         ` Jiri Pirko
2024-05-16 12:31                           ` Michael S. Tsirkin
2024-05-16 15:25                             ` Jiri Pirko
2024-05-16 19:04                               ` Michael S. Tsirkin
2024-05-17  7:52                                 ` Jiri Pirko
     [not found]                 ` <CAA93jw6WanAQrPAFZ1hYVTXuWDwP+4J70LnmPOD2ugNwYK6HMA@mail.gmail.com>
2024-06-06  7:30                   ` Jiri Pirko
2024-05-10  4:25 ` Jason Wang
2024-05-10  7:11 ` Heng Qi
2024-05-10 10:35   ` Jiri Pirko
2024-05-20 12:48   ` Jiri Pirko
2024-06-05 11:30     ` Jiri Pirko
2024-06-05 11:42       ` Heng Qi
2024-06-06  0:20         ` Jason Wang
2024-06-06  2:58           ` Jason Xing
2024-06-06  4:25             ` Jason Wang
2024-06-06  6:05               ` Michael S. Tsirkin
2024-06-06  7:56                 ` Jason Wang
2024-06-06 13:45                   ` Jiri Pirko
2024-06-07  6:25                     ` Jason Wang
2024-06-07  6:39                       ` Jiri Pirko
2024-06-07  6:43                         ` Michael S. Tsirkin
2024-06-07  6:47                         ` Jason Wang
2024-06-07  9:57                           ` Jiri Pirko
2024-06-07 10:23                             ` Michael S. Tsirkin
2024-06-07 11:30                               ` Jiri Pirko
2024-06-10 14:18                                 ` Michael S. Tsirkin
2024-06-17  1:44                                   ` Jason Wang
2024-06-17  9:30                                     ` Jiri Pirko
2024-06-17 16:16                                       ` Michael S. Tsirkin
2024-06-18  1:19                                         ` Jason Wang
2024-06-18  0:52                                       ` Jason Wang
2024-06-18 18:23                                         ` Michael S. Tsirkin
2024-06-17 16:18                                     ` Michael S. Tsirkin
2024-06-07 11:22                             ` Jason Xing [this message]
2024-06-06 11:42               ` Jason Xing
2024-06-06 12:00                 ` Michael S. Tsirkin
2024-06-06 13:41               ` Jiri Pirko
2024-06-07  6:22                 ` Jason Wang
2024-06-07  6:39                   ` Michael S. Tsirkin
2024-06-07  6:40                   ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL+tcoCRSCvbwxNibDu7MdWXF77qjROd1eTeyZN6Sh-ckXDQrw@mail.gmail.com \
    --to=kerneljasonxing@gmail.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=hengqi@linux.alibaba.com \
    --cc=jasowang@redhat.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).