Re: [PATCH v5] virtio-net: clarify coalescing parameters settings

public inbox for virtio-comment@lists.linux.dev

From: Heng Qi <hengqi@linux.alibaba.com>
To: "Si-Wei Liu" <si-wei.liu@oracle.com>
Cc: Halil Pasic <pasic@linux.ibm.com>,
	Cornelia Huck <cohuck@redhat.com>,
	"virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
	Jason Wang <jasowang@redhat.com>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Parav Pandit <parav@nvidia.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v5] virtio-net: clarify coalescing parameters settings
Date: Sat, 22 Jun 2024 09:34:29 +0800
Message-ID: <1719020069.8729858-17-hengqi@linux.alibaba.com>
In-Reply-To: <b9e47c91-46aa-4526-b441-de48906f1ae4@oracle.com>

On Fri, 21 Jun 2024 16:46:27 -0700, "Si-Wei Liu" <si-wei.liu@oracle.com> wrote:
> 
> 
> On 6/20/2024 8:24 PM, Heng Qi wrote:
> > On Thu, 20 Jun 2024 18:21:03 -0700, "Si-Wei Liu" <si-wei.liu@oracle.com> wrote:
> >>
> >> On 6/20/2024 12:40 AM, Heng Qi wrote:
> >>> On Mon, 17 Jun 2024 16:31:25 -0700, "Si-Wei Liu" <si-wei.liu@oracle.com> wrote:
> >>>> On 6/16/2024 7:27 PM, Heng Qi wrote:
> >>>>> On Thu, 13 Jun 2024 02:13:50 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>>> On Tue, Jun 11, 2024 at 05:43:18PM +0000, Parav Pandit wrote:
> >>>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
> >>>>>>>> Sent: Tuesday, June 11, 2024 10:00 PM
> >>>>>>>> How can we make progress with
> >>>>>>>> the release but make sure we don't make backwards compat a pain?
> >>>>>>>>
> >>>>>>>> Ideas?
> >>>>>>>>
> >>>>>>> There is no functional break with this relaxation.
> >>>>>>> Device set some non-zero defaults and driver didn't modify them....
> >>>>>>> Anything broken? Unlikely.
> >>>> Generally it's inappropriate to leave this decision making to the device
> >>>> for what would be the best / most performant default config, as the
> >>>> device is generally considered agnostic to the guest load...
> >>> Instead, the performance of the virtual machine and the driver depends heavily
> >>> on how the device is implemented, just as we have proposed various ways to
> >>> offload the data queue in the device to the hardware. The reason why most
> >>> devices use software to emulate ctrlq instead of hardware offload is
> >>> that the driver previously had no performance requirements for ctrlq; that is,
> >>> the device implementation is responsible for meeting the driver's performance
> >>> requirements.
> >> I am not sure I follow the argument about ctrlq being s/w or h/w. I
> >> thought that was for the debate about why we need coalescing on a hardware
> >> offload device in the first place, instead of reusing event index or
> >> similar s/w based notification suppression. I don't doubt the value of
> >> coalescing, but how is it relevant to the default value disposition?
> >> Generally the default config disposition is not considered to be part of
> >> the device implementation, especially when the
> >> device can't easily figure out the specific workload to occur in the
> >> guest, and there's no perfect single default value that could meet every
> >> single performance metric across the board. This is a typical tuning
> >> knob left up to the user to adjust, so why would the device or driver need to
> >> set or load the initial value? The driver just needs to start with
> >> a certain value, be it 0 or non-zero, which the guest user can override at any
> >> point in time, depending on his/her need, and that's it! I guess I still
> >> don't understand your use case here, and why the device / driver default is of
> >> such importance.
> > I've explained that, and I understand your argument is: why is a default value needed
> > if users should be able to adjust it, right?
> Sounds about right. You'll soon realize that there's no perfect default
> that could work for everyone, in that it's just a static value to
> begin with. No matter what initial value the device comes up with,
> one user or another will come over to you and complain that what is
> loaded from the device doesn't match the workload they have, so those
> users are still expected to tweak it manually for their own needs. As
> long as it's a tunable that the guest user can control and override anytime,
> I don't think it makes much difference which initial config the device would
> start with.

When you have a large number of customers and they buy your machines,
how many users do you think have experience adjusting this value? More below.

> 
> > The default value takes effect when the user doesn't adjust it.
> > It's not practical to rely entirely on user adjustment,
> That's where the adaptive interrupt moderation framework (e.g. DIM for
> Linux) could come into play, I think?

As I said last time, this work is only useful when DIM is not enabled. I don't want to
explain it again; I've said it many times to different people in earlier
discussions. Could you please check the history of the discussion?

> 
> >   and for devices
> > that serve hundreds of thousands of customers, the device implementation
> > has to be comprehensive.
> Would you mind elaborating on the technical difficulty behind why the device
> implementation has to be comprehensive to serve hundreds of thousands of
> customers (i.e. adaptive moderation in the device), and whether there is a better
> way to improve the current interface (ctrlq vs. doorbell/MMIO register)
> for hardware offload implementations? I feel people reading the
> thread without the full background would become even more
> confused as to why it's relevant to your proposal; to me it's a completely
> distinct use case or problem area that we are talking about.

Regardless of whether DIM can be enabled by default (it cannot accelerate all
scenarios, right?), a considerable number of machines on the
cloud vendors' lines still have to rely on static values to provide boot-up
performance. The vendors also tune these values for scenarios
like ping-pong, so the boot-up performance is very good.

Thanks.

> 
> Thanks,
> -Siwei
> 
> >
> > Thanks.
> >
> >>>> Unless the
> >>>> device is specially hard-wired to some fixed guest setup that users
> >>>> couldn't change, it doesn't seem logical that the device could derive
> >>>> the best or most performant config on the driver's behalf. What if the guest
> >>>> wants the best latency for its load, but the device blindly guesses that
> >>>> the guest might prefer throughput and ends up applying a
> >>>> latency-impacting non-zero default?
> >>> The device does not want to guess and cannot guess. This patch does not force
> >>> the device to choose a non-zero value, but relaxes it to allow the device to
> >>> choose 0 or non-zero, which is very friendly to virtual machines with different
> >>> performance requirements, right?
> >> I don't understand the friendly part. Do you imply your VM users are on
> >> some kind of hard-wired setup where they cannot change these coalescing
> >> parameters after the driver is loaded? Can the owner of the VM in control
> >> apply a certain initial config for the coalescing parameters to the VM
> >> image? Or is it a problem of the guest driver not yet exposing
> >> coalescing parameters to the end user? Otherwise I would think that the
> >> guest user should be able to set the parameters that would best
> >> fit their own specific performance requirements. How could the device
> >> even help here? To be honest, I don't feel there's a lot of value in granting the
> >> device or host admin the flexibility to dictate the *best* config on the
> >> guest user's behalf. And you seem to admit the fact that
> >> the default doesn't really matter, be it 0 or non-zero.
> >>
> >>>> Could this device side change for
> >>>> the default config regress boot time performance (which may need best
> >>>> latency over throughput)?
> >>> Don't make these assumptions, what if the driver needs better throughput?
> >> There's a misconception here: that what we think the driver may need in
> >> terms of performance actually reflects what the guest user would like to
> >> have. The driver cannot read the guest user's mind to make the decision, either.
> >>
> >> Historically, there was a drastic change in the Linux virtio-net driver that
> >> changed the default disposition for XPS (and RPS as well) from
> >> throughput- and long-lived-connection-oriented to concurrency- and
> >> short-lived-connection-oriented, which regressed a lot of existing setups that
> >> expected sustained throughput and packet rate after a kernel upgrade.
> >> Although such a drastic change of the default disposition
> >> is not welcome (that is one of the reasons why I value a consistent
> >> initial value and backward-compatible behavior), I don't see people yelling
> >> at the virtio spec for offering less flexibility in the default disposition:
> >> given that the guest user can override the config at any time with their
> >> own tooling or script, there's no problem at all for them to just set
> >> the corresponding config back explicitly to what it was before.
> >>
> >>>>>>> And device/driver has better performance, is that a problem? Unlikely.
> >>>>>>>
> >>>> Even for the rare case of a hard-wired setup, the way to tackle the
> >>>> problem using the device's default is still quite questionable. Usually the
> >>>> mgmt software or network config utility should be equipped with some
> >>>> default value if need be. And we know the guest is in the best position to
> >>>> impose the best / most performant config for its own load. What is the
> >>>> issue or use case where this initial config couldn't be applied in advance by the
> >>>> guest mgmt software, so that it has to resort to the device to load some
> >>>> default (which is odd and irrelevant to any guest load) before the
> >>>> interface is brought up for operation, i.e. performing I/O?
> >>> Use cases are everywhere: Alibaba Cloud, MLX and all other modern network cards
> >>> have a non-zero default value.
> >> You seem to be referencing your own setup, basically, and the question
> >> is still left unanswered: why can't the initial config be done through
> >> the mgmt software or network config utility within the guest?
> >>
> >>> (0 is actually a kind of default value)
> >>>
> >>>>> Sorry, my vacation just ended.
> >>>>>
> >>>>>> Yes, it is possible. Driver can cache values it sets
> >>>>>> and never query device with get.
> >>>>> Don't we already have a lot of cases where the driver queries the device?
> >>>>> RSS context, device stats.
> >>>>>
> >>>>>> Before anything is set, driver will report incorrect values.
> >>>>> Devices that are widely used and follow good practices should have
> >>>>> some initialization value. Just reporting 0 is an incorrect value, although the
> >>>>> spec currently says so.
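To make the point about cached values concrete, here is a minimal C model of a driver that caches the coalescing values it sets and never issues a get. All names are hypothetical and are not taken from the virtio spec or any driver. If the device starts at a non-zero value, the driver's report is stale until its first set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: all names are illustrative, not from the virtio
 * spec or from any driver. */
struct coal_params {
	uint32_t max_usecs;
	uint32_t max_packets;
};

/* What the device is actually applying to a queue. */
struct model_device {
	struct coal_params active;
};

/* Driver-side copy, updated only when the driver itself issues a set. */
struct model_driver {
	struct coal_params cached;
};

static void driver_init(struct model_driver *drv)
{
	/* Assume the initial value the spec currently mandates: 0. */
	drv->cached.max_usecs = 0;
	drv->cached.max_packets = 0;
}

static void driver_set(struct model_driver *drv, struct model_device *dev,
		       struct coal_params p)
{
	assert(drv != NULL && dev != NULL);
	dev->active = p;	/* stands in for a control-queue set command */
	drv->cached = p;	/* the driver remembers what it wrote */
}

/* Report served from the cache; the device is never queried. */
static struct coal_params driver_get(const struct model_driver *drv)
{
	return drv->cached;
}

/* True when the reported value differs from what the device applies. */
static bool report_is_stale(const struct model_driver *drv,
			    const struct model_device *dev)
{
	struct coal_params r = driver_get(drv);

	return r.max_usecs != dev->active.max_usecs ||
	       r.max_packets != dev->active.max_packets;
}
```

With a device that starts at 0, the cache is accurate from the start. With a device that starts at, say, 50 us / 32 packets, the cached report stays at 0 until the driver's first set. This is the "incorrect values" case raised above.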
> >>>> I don't have an aligned view here, sorry. As I recall, having 0 as the
> >>>> default is just to keep the device started in a state where coalescing is
> >>>> disabled, so it's backward compatible and consistent with a
> >>>> driver that doesn't support coalescing, such that it won't yield surprising
> >>>> effects (e.g. regressed latency) inadvertently after the user's
> >>>> driver software gets upgraded. Unlike other virtio-net features that
> >>>> could improve performance 100% in all aspects, this coalescing feature is
> >>>> more of a performance tuning knob that may improve performance metrics
> >>>> of one dimension (such as CPU usage or throughput) while degrading the
> >>>> others (such as latency, jitter or connection rate).
> >>>> In other words, there's no single, fixed set of default config that the
> >>>> device could supply which is able to satisfy all kinds of guest load.
> >>>> Rather than rely on the device to offer a matching default for the driver
> >>>> (which I think is technically wrong), I'd lean towards having the guest
> >>>> software or network utility apply the initial config for the guest,
> >>>> as they should have better knowledge of the specific guest workload
> >>>> than the device could.
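The claim that a 0 default keeps coalescing disabled, and so matches a driver without coalescing support, can be sketched as a toy model. The model assumes a "first limit reached wins" rule, treats a 0 field as an unused limit, and treats both fields at 0 as "no coalescing". These rules are assumptions for illustration, not the spec's normative text, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct coal_params {
	uint32_t max_usecs;
	uint32_t max_packets;
};

/* Hypothetical per-queue bookkeeping for this illustration only. */
struct coal_state {
	uint32_t pending;		/* used buffers not yet notified */
	uint64_t first_pending_us;	/* arrival time of the oldest one */
};

/*
 * Called when a used buffer is produced at time now_us; returns true if a
 * notification would be sent now. Assumed rule: 0 in a field means that
 * limit is unused, and both 0 means no coalescing at all. A real device
 * would also fire from a timer; this sketch only checks on arrival.
 */
static bool on_used_buffer(struct coal_state *s, const struct coal_params *p,
			   uint64_t now_us)
{
	if (p->max_packets == 0 && p->max_usecs == 0)
		return true;	/* coalescing disabled: notify every buffer */

	if (s->pending == 0)
		s->first_pending_us = now_us;
	s->pending++;

	bool packets_hit = p->max_packets != 0 && s->pending >= p->max_packets;
	bool time_hit = p->max_usecs != 0 &&
			now_us - s->first_pending_us >= p->max_usecs;

	if (packets_hit || time_hit) {
		s->pending = 0;
		return true;
	}
	return false;
}

/* Count notifications for n buffers arriving every gap_us microseconds. */
static unsigned int count_notifications(const struct coal_params *p,
					unsigned int n, uint64_t gap_us)
{
	struct coal_state s = { 0, 0 };
	unsigned int notes = 0;

	assert(p != NULL);
	for (unsigned int i = 0; i < n; i++)
		if (on_used_buffer(&s, p, (uint64_t)i * gap_us))
			notes++;
	return notes;
}
```

With both limits at 0, every buffer is notified, as on a device without coalescing. Any non-zero default trades that per-buffer behavior for fewer notifications, which is where the latency concern comes from.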
> >>> Before this feature, a good device implementation should also support coalescing
> >>> (of course we don't necessarily assume it has coalescing).
> >> Again, I don't doubt the value of supporting coalescing.
> >>
> >>> In addition, there are virtual
> >>> machines that favor latency and others that favor throughput. If a vendor's device
> >>> needs to serve a low-latency virtual machine, it can
> >>> continue to keep the default value of 0.
> >> No, that's not what I was asking. There's no such requirement for any
> >> vendor to provide a low-latency or high-throughput VM. The more general
> >> use case is: the setup for a real-world workload might just be so
> >> complex that end users would prefer low latency on some virtual NICs or
> >> even some specific queues, while the other queues of a virtual NIC, or the
> >> rest of the virtual NICs, might have very different dispositions. Due to the
> >> needs and dynamics of workload scaling up & down, they might have more
> >> or fewer queues or virtual NICs to configure, so these dispositions would
> >> need to be readjusted at any point in time, which the device has no easy
> >> way to adapt to. The guest user should have better
> >> knowledge of the specific guest workload and setup than what the device
> >> could/should offer.
> >>
> >>>>>> What will break as a result? Hard to predict.
> >>>>>>
> >>>>>>
> >>>>>> Given the patch is big, I am inclined to say it just should use
> >>>>>> a feature bit.
> >>>>> This doesn't seem to break anything in my opinion, we just told the device that
> >>>>> now you can set more values.
> >>>> Another thing this could break is live migration between devices with
> >>>> varying defaults. How do you make sure the guest doesn't rely on some
> >>>> default from the source device, when on the destination it just doesn't
> >>>> get the same default coalescing value? To get rid of this side effect,
> >>>> the guest would still need to apply the initial config on its own
> >>>> anyway, which would eventually render this proposal of an arbitrary
> >>>> default rather pointless.
> >>> I don't quite understand why this would affect live migration; the values
> >>> would be migrated over.
> >> Then I don't see this discussed in the thread or described in the
> >> proposed text (wouldn't the initial value be part of the virtio device state
> >> to migrate over?). And your proposed per-queue coalescing parameters
> >> (default plus current value) would have to be described too as part of
> >> the virtio-net device state, so that people don't misunderstand your proposal.
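The migration question can be made concrete with a small C sketch. It hypothetically assumes that per-queue state holds both the device's initial value and the current value; the struct and function names are illustrative only. A migration that carries only the current value leaves the destination with its own initial value, and a guest relying on the device default would then observe that different value.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VQS 4	/* hypothetical queue count */

struct coal_params {
	uint32_t max_usecs;
	uint32_t max_packets;
};

/* Hypothetical per-queue coalescing state: if devices may choose their own
 * initial value, that value becomes device state as well. */
struct vq_coal_state {
	struct coal_params initial;	/* value the device started with */
	struct coal_params current;	/* value in effect now */
};

struct net_dev_state {
	struct vq_coal_state vq[NUM_VQS];
};

/* Reset: every queue starts at the vendor-chosen initial value. */
static void dev_reset(struct net_dev_state *d, struct coal_params vendor_default)
{
	for (int i = 0; i < NUM_VQS; i++) {
		d->vq[i].initial = vendor_default;
		d->vq[i].current = vendor_default;
	}
}

static struct coal_params get_initial(const struct net_dev_state *d, int vq)
{
	assert(vq >= 0 && vq < NUM_VQS);
	return d->vq[vq].initial;
}

/* Transfers only the values in effect; the destination keeps its own
 * initial value. */
static void migrate_current_only(const struct net_dev_state *src,
				 struct net_dev_state *dst)
{
	for (int i = 0; i < NUM_VQS; i++)
		dst->vq[i].current = src->vq[i].current;
}

/* Transfers both the initial and the current values. */
static void migrate_full(const struct net_dev_state *src,
			 struct net_dev_state *dst)
{
	for (int i = 0; i < NUM_VQS; i++)
		dst->vq[i] = src->vq[i];
}

static int same_params(struct coal_params a, struct coal_params b)
{
	return a.max_usecs == b.max_usecs && a.max_packets == b.max_packets;
}
```

Migrating only `current` keeps the active parameters identical but lets the initial value silently change across hosts. Migrating both requires the initial value to be defined as part of the device state, which is the gap pointed out above.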
> >>
> >> Thanks,
> >> -Siwei
> >>
> >>> Thanks.
> >>>
> >>>> Thanks,
> >>>> -Siwei
> >>>>> Using new feature bits does not seem necessary.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>>> -- 
> >>>>>> MST
> >>>>>>
>


Thread overview: 56+ messages
2024-05-28  4:47 [PATCH v5] virtio-net: clarify coalescing parameters settings Heng Qi
2024-05-28  4:50 ` Heng Qi
2024-05-31  6:36   ` Heng Qi
2024-05-31  9:39     ` Cornelia Huck
2024-06-07 20:02 ` Halil Pasic
2024-06-08  2:34   ` Heng Qi
2024-06-10 12:46     ` Halil Pasic
2024-06-10 13:35       ` Heng Qi
2024-06-10 14:50         ` Michael S. Tsirkin
2024-06-10 15:12           ` Parav Pandit
2024-06-11 14:04           ` Cornelia Huck
2024-06-10 20:19         ` Halil Pasic
2024-06-11 10:40           ` Heng Qi
2024-06-11 16:29             ` Michael S. Tsirkin
2024-06-11 17:43               ` Parav Pandit
2024-06-13  6:13                 ` Michael S. Tsirkin
2024-06-17  2:27                   ` Heng Qi
2024-06-17 23:31                     ` Si-Wei Liu
2024-06-20  7:40                       ` Heng Qi
2024-06-21  1:21                         ` Si-Wei Liu
2024-06-21  3:24                           ` Heng Qi
2024-06-21 23:46                             ` Si-Wei Liu
2024-06-22  1:34                               ` Heng Qi [this message]
2024-06-25  4:51                                 ` Si-Wei Liu
2024-06-25  5:56                                   ` Parav Pandit
2024-06-26  1:14                                     ` Si-Wei Liu
2024-06-27 10:37                                       ` Halil Pasic
2024-06-27 11:27                                         ` Parav Pandit
2024-06-27 12:35                                         ` Michael S. Tsirkin
2024-06-27 12:45                                           ` Parav Pandit
2024-06-27 12:52                                             ` Michael S. Tsirkin
2024-06-27 13:03                                               ` Parav Pandit
2024-06-27 14:59                                                 ` Michael S. Tsirkin
2024-06-27 17:27                                               ` Si-Wei Liu
2024-06-27 17:14                                           ` Si-Wei Liu
2024-06-27 22:18                                             ` Michael S. Tsirkin
2024-06-28  6:56                                               ` Si-Wei Liu
2024-06-28  8:23                                                 ` Jason Wang
2024-06-28 19:31                                                   ` Si-Wei Liu
2024-06-30 17:04                                                     ` Michael S. Tsirkin
2024-07-03  6:09                                                     ` Jason Wang
2024-07-02 20:37                                                   ` Halil Pasic
2024-07-02 21:04                                                     ` Michael S. Tsirkin
2024-07-03  5:01                                                     ` Jason Wang
2024-06-29  6:47                                           ` Halil Pasic
2024-06-30 16:55                                             ` Michael S. Tsirkin
2024-07-02 21:43                                               ` Halil Pasic
2024-06-27 12:13                                       ` Parav Pandit
2024-06-27 12:42                                         ` Michael S. Tsirkin
2024-06-25  7:53                               ` Jason Wang
2024-06-25  8:06                                 ` Michael S. Tsirkin
2024-06-25  8:13                                   ` Jason Wang
2024-06-25  8:21                                     ` Michael S. Tsirkin
2024-06-11 23:03 ` Michael S. Tsirkin
2024-06-17  2:35   ` Heng Qi
2024-06-25  7:26     ` Michael S. Tsirkin
