From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Parav Pandit <parav@nvidia.com>, Heng Qi <hengqi@linux.alibaba.com>
Cc: Halil Pasic <pasic@linux.ibm.com>,
Cornelia Huck <cohuck@redhat.com>,
"virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
Jason Wang <jasowang@redhat.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v5] virtio-net: clarify coalescing parameters settings
Date: Tue, 25 Jun 2024 18:14:15 -0700 [thread overview]
Message-ID: <bca1e5e8-65af-4825-b6fe-2ca3b6c41feb@oracle.com> (raw)
In-Reply-To: <PH0PR12MB54818CFB447152A25130F9B8DCD52@PH0PR12MB5481.namprd12.prod.outlook.com>
On 6/24/2024 10:56 PM, Parav Pandit wrote:
> Hi Si-Wei,
>
>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>> Sent: Tuesday, June 25, 2024 10:22 AM
>>
>> On 6/21/2024 6:34 PM, Heng Qi wrote:
>>> On Fri, 21 Jun 2024 16:46:27 -0700, "Si-Wei Liu"<si-wei.liu@oracle.com>
>> wrote:
>>>> On 6/20/2024 8:24 PM, Heng Qi wrote:
>>>>> On Thu, 20 Jun 2024 18:21:03 -0700, "Si-Wei Liu"<si-wei.liu@oracle.com>
>> wrote:
>>>>>> On 6/20/2024 12:40 AM, Heng Qi wrote:
>>>>>>> On Mon, 17 Jun 2024 16:31:25 -0700, "Si-Wei Liu"<si-
>> wei.liu@oracle.com> wrote:
>>>>>>>> On 6/16/2024 7:27 PM, Heng Qi wrote:
>>>>>>>>> On Thu, 13 Jun 2024 02:13:50 -0400, "Michael S.
>> Tsirkin"<mst@redhat.com> wrote:
>>>>>>>>>> On Tue, Jun 11, 2024 at 05:43:18PM +0000, Parav Pandit wrote:
>>>>>>>>>>>> From: Michael S. Tsirkin<mst@redhat.com>
>>>>>>>>>>>> Sent: Tuesday, June 11, 2024 10:00 PM
>>>>>>>>>>>>
>>>>>>>>>>>> How can we make progress with the release but be sure we don't
>>>>>>>>>>>> make backwards compat a pain?
>>>>>>>>>>>>
>>>>>>>>>>>> Ideas?
>>>>>>>>>>>>
>>>>>>>>>>> There is no functional break with this relaxation.
>>>>>>>>>>> Device set some non-zero defaults and driver didn't modify
>> them....
>>>>>>>>>>> Anything broken? Unlikely.
>>>>>>>> Generally it's inappropriate to leave this decision making to the
>>>>>>>> device for what would be the best / most performant default
>>>>>>>> config, as the device is generally considered agnostic to the guest
>> load...
>>>>>>> Instead, the performance of the virtual machine and the driver
>>>>>>> depends heavily on how the device is implemented, just as we have
>>>>>>> proposed various ways to offload the data queue in the device to
>>>>>>> the hardware. The reason why most devices use software to simulate
>>>>>>> ctrlq instead of using hardware offload is that the driver has no
>>>>>>> requirements for the performance of ctrlq before, that is, the
>>>>>>> device implementation is responsible for and meets the driver's
>> performance requirements.
>>>>>> I am not sure I follow the arguments of ctrlq being s/w or h/w, I
>>>>>> thought that it was for the debate why we need coalescing on
>>>>>> hardware offload device in the first place, instead of reusing
>>>>>> event index or similar s/w based notification suppression. I don't
>>>>>> doubt the value of coalescing, but how's it relevant to the default value
>> disposition?
>>>>>> Generally the default config disposition is not considered to be
>>>>>> part of device implementation, especially when it comes to the
>>>>>> situation where device can't easily figure out the specific
>>>>>> workload to occur in the guest, and there's no perfect single
>>>>>> default value that could meet every single performance metric
>>>>>> across the board. This is a typical tuning knob left up to the user
>>>>>> to adjust, so why does the device or driver need to set or load the
>>>>>> initial value? The driver just needs to start with a certain value,
>>>>>> be it 0 or non-zero, which guest user can override at any point of
>>>>>> time, depending on his/her need, and that's it! I guess I still
>>>>>> don't understand your user case here, why device / driver default is of
>> such importance.
>>>>> I've explained that, and I understand your argument is why default
>>>>> value is needed, and users should be able to adjust them, right?
>>>> Sounds about right. You'll soon realize that there's no perfect
>>>> default that could work with everyone - in that it's just a static
>>>> value to begin with, no matter whatever initial value the device
>>>> comes up with, one user or another will come over to you and complain
>>>> that what is loaded from the device doesn't match the workload they
>>>> have, so those users are still expected to adjust manually tweaking
>>>> for their own. As long as it's a tunable that guest user can control
>>>> and override anytime, I don't feel it too much different what initial
>>>> config the device would start with.
>>> When you have a large number of customers and they buy your machines,
>>> how many users do you think have experience adjusting this value? More
>> below.
>> This is not what this proposal alone could address, as it just shifts the
>> sheer pain to the device and host admin rather than providing any
>> flexibility. Technically I believe the right solution is to seek adaptive
>> coalescing that doesn't require the user to tune anything. Note that
>> adaptive coalescing / suppression could be done from the guest or by the
>> device. For e.g. the s/w event-index based notification suppression could
>> also work with hardware if the event index were somehow propagated through
>> a doorbell register rather than loaded from host memory over the PCI
>> transport.
>>
>>>>> The default value is working when the user doesn't adjust them.
>>>>> It's not practical to rely entirely on user adjustment,
>>>> That's where an adaptive interrupt moderation framework (e.g. DIM for
>>>> Linux) could come into play, I think?
>>> Last time, this work is only useful if DIM is not enabled. I don't
>>> want to explain it again, I've said it many times to different people
>>> in the history discussion, don't you want to check any history discussion?
>> I did before I replied. What I thought both Halil and Michael tried hard
>> to convey is that what you claimed to work by just getting the initial
>> config loaded from the device will not be sufficient to address all the
>> performance requirements in a broad and general sense, beyond satisfying
>> the goal you set for yourself. What I saw is that you kept proving your
>> point by referencing your own use case and design - this not only mixed
>> different things up but also ended up in a circular argument that
>> unfortunately wasted everyone's time.
>>
>>>>> and for devices
>>>>> that serve hundreds of thousands of customers, the device
>>>>> implementation has to be comprehensive.
>>>> Would you mind elaborating the technical difficulty for why the
>>>> device implementation has to be comprehensive to serve hundreds or
>>>> thousands of customers (i.e. adaptive moderation in the device), and
>>>> is there better way to improve the current interface (ctrlq v.s.
>>>> doorbell/MMIO register) that is hardware offload implementation wise.
>>>> I feel people reading the thread without understanding the full
>>>> background would become even more confused as to why it's relevant to
>>>> your proposal, to me it's completely distinct use case or problem area that
>> we are talking about.
>>> Regardless of whether DIM can be enabled by default (it cannot
>>> accelerate all scenarios, right?), therefore, a considerable number of
>>> machines on the cloud vendors' line still have to rely on static
>>> values to provide boot-up performance.
>> Sorry, this is not a good excuse for people to accept a sub-optimal solution.
>> Please talk to the other hardware vendor or read my above reply carefully.
>>
>> -Siwei
>>
>>> The vendors will also make optimizations to optimize scenarios like
>>> ping-pong, so the boot-up performance is very good.
>>>
>>> Thanks.
>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>> Thanks.
>>>>>
>>>>>>>> Unless the
>>>>>>>> device is specially hard-wired to some fixed guest setup that
>>>>>>>> users couldn't change, it doesn't seem logical that the device
>>>>>>>> could derive the best or most performant config on the driver's
>>>>>>>> behalf. What if the guest wants best latency for its load but the
>>>>>>>> device just blindly guesses the guest might prefer throughput,
>>>>>>>> and miserably applies a latency-impacting non-zero default?
>>>>>>> The device does not want to guess and cannot guess. This patch
>>>>>>> does not force the device to choose a non-zero value, but relaxes
>>>>>>> it to allow the device to choose 0 or non-zero, which is very
>>>>>>> friendly to virtual machines with different performance requirements,
>> right?
>>>>>> I don't understand the friendly part - do you imply your VM users
>>>>>> are on kinda fixed wired setup that they cannot change these
>>>>>> coalescing parameters after driver is loaded? Can the owner of the
>>>>>> VM in control apply certain initial config for the coalescing
>>>>>> parameters to the VM image? Or is it the problem of the guest
>>>>>> driver that doesn't yet expose coalescing parameters to the end
>>>>>> user? Otherwise I would think that guest user should be able to set
>>>>>> parameters accordingly that would best fit the specific performance
>>>>>> requirement of their own. How the device could even help here? I
>>>>>> don't feel there's a lot of value to grant the device or host admin
>>>>>> the flexibility to policy the *best* config on guest user's behalf,
>>>>>> to be honest. And you seem to admit the fact that the default doesn't
>> really matter, be it 0 or non-zero.
>>>>>>>> Could this device side change for the default config regress boot
>>>>>>>> time performance (which may need best latency over throughput)?
>>>>>>> Don't make these assumptions, what if the driver needs better
>> throughput?
>>>>>> There's a misconception here: what we think the driver may need in
>>>>>> terms of performance doesn't actually reflect what the guest user
>>>>>> would like to have. The driver cannot read the guest user's mind to
>>>>>> make the decision, either.
>>>>>> In history there was drastic change in the Linux virtio-net driver
>>>>>> that ever changed the default disposition for XPS (and RPS as well)
>>>>>> from throughput and long-lived connection oriented to concurrency
>>>>>> and short-lived oriented, which regressed a lot of existing setups
>>>>>> that expects sustained throughput and packet rate after kernel
>> upgrade.
>>>>>> Although the occurrence of such drastic change for default
>>>>>> disposition is not so welcome (that is one of the reasons why I
>>>>>> valued consistent initial value and back compatible behavior), I
>>>>>> don't see people yelling at virtio spec for less flexibility of
>>>>>> offering the default disposition, given that the guest user can
>>>>>> override the config any time with their own tooling or script,
>>>>>> there's no problem at all for them to just set the corresponding config
>> back explicitly to what it was before.
>>>>>>>>>>> And device/driver has better performance, is that a problem?
>> Unlikely.
>>>>>>>> Even for rare case with a hard wired setup, the way to tackle the
>>>>>>>> very problem using device's default is still quite questionable.
>>>>>>>> Usually the mgmt software or network config utility should be
>>>>>>>> equipped with some default value if need be. And we know the
>>>>>>>> guest has the best position to impose the best / most performant
>>>>>>>> config for its own load. What is the issue or use case that this
>>>>>>>> initial config couldn't be applied by the guest mgmt software
>>>>>>>> ahead but has to resort to the device to load some default (which
>>>>>>>> is odd and irrelevant to any guest load), before the interface is
>> brought up for operation i.e. performing I/O?
>>>>>>> Use cases are everywhere, Alibaba Cloud, MLX and all other modern
>>>>>>> network cards have a default value that is not 0.
>>>>>> You seem to be referencing your own setup basically, and the
>>>>>> question is still left unanswered - why can't the initial config be
>>>>>> done through the mgmt software or network config utility within the
>> guest?
>>>>>>> (0 is actually a kind of default value)
>>>>>>>
>>>>>>>>> Sorry, my vacation just ended.
>>>>>>>>>
>>>>>>>>>> Yes, it is possible. Driver can cache values it sets and never
>>>>>>>>>> query device with get.
>>>>>>>>> Don't we already have a lot of behaviors to drive queries from
>> devices?
>>>>>>>>> RSS context, device stats.
>>>>>>>>>
>>>>>>>>>> Before anything is set, driver will report incorrect values.
>>>>>>>>> Devices that are widely supported and supported by good
>>>>>>>>> practices should have a proper initialization value. Just reporting 0
>>>>>>>>> is an incorrect value, although the spec now says so.
>>>>>>>> I don't have an aligned view here, sorry. As I recall having 0 as
>>>>>>>> the default is just to keep device started in a state where
>>>>>>>> coalescing is disabled, so it's backward compatible and
>>>>>>>> consistent with a non-coalescing supporting driver - such that it
>>>>>>>> won't yield surprising effect (for e.g. regressed latency)
>>>>>>>> inadvertently after user's getting driver software upgraded.
>>>>>>>> Unlike the other virtio-net features that could 100% improve
>>>>>>>> performance in all aspect, this coalescing feature is more of a
>>>>>>>> performance tuning knob that may improve performance metrics
>>>>>>>> (such as cpu usage or throughput) of one dimension while demoting
>> the others (such as latency, jitter or connection rate) from the equation.
>>>>>>>> That said, there's not a single and fixed set of default config
>>>>>>>> that device could supply which is able to satisfy all kind of guest load.
>>>>>>>> Rather than rely on the device to offer a matching default for
>>>>>>>> driver (which I think it's technically wrong), I'd lean to having
>>>>>>>> guest software or network utility to apply the initial config for
>>>>>>>> the guest, where they should have best knowledge for the specific
>>>>>>>> guest workload than what device could do.
>>>>>>> Before this feature, a good device implementation should also
>>>>>>> support coalescing (of course we don't necessarily assume it has
>> coalescing).
>>>>>> Again, I don't doubt the value of supporting coalescing.
>>>>>>
>>>>>>> In addition, virtual
>>>>>>> machines that tend to favor latency and throughput exist. If the
>>>>>>> device supported by the manufacturer needs to provide a
>>>>>>> low-latency virtual machine, please continue to keep the default value
>> of 0.
>>>>>> No, that's not what I was asking. There's no such requirement for
>>>>>> any vendor to provide a low-latency or high throughput VM. The more
>>>>>> general use case is - the setup for real world workload might just
>>>>>> be too complex that end users would prefer low-latency on some
>>>>>> virtual NIC or even some specific queues, while the other queues of
>>>>>> a virtual NIC, or the rest of the virtual NICs, might have very different
>>>>>> dispositions. Due to the needs and dynamics of workload scaling up
>>>>>> & down, they might have more or less queues or virtual NICs to
>>>>>> configure, so these disposition would need to be readjusted at any
>>>>>> point of time, for which there's no easy way for device to adapt to
>>>>>> easily. The guest user should have best knowledge for the specific
>>>>>> guest workload and setup than what device could/should offer.
>>>>>>
>>>>>>>>>> What will break as a result? Hard to predict.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Given the patch is big, I am inclined to say it just should use
>>>>>>>>>> a feature bit.
>>>>>>>>> This doesn't seem to break anything in my opinion, we just told
>>>>>>>>> the device that now you can set more values.
>>>>>>>> Another thing this could break is live migration between devices
>>>>>>>> with varied default. How do you make sure the guest doesn't rely
>>>>>>>> on some default from the source device, while on the destination
>>>>>>>> it just doesn't get the same default coalescing value? To get
>>>>>>>> rid of this side effect the guest would still need to apply the
>>>>>>>> initial config for its own, anyway... Which eventually would
>>>>>>>> render this proposal with arbitrary default rather pointless.
>>>>>>> I don't quite understand why this would affect hot migration, the
>>>>>>> values would be migrated over.
>>>>>> Then I don't see this having been discussed (wouldn't the initial
>>>>>> value be part of the virtio device state to migrate over?) in the
>>>>>> thread or described in the proposed text. And your proposed
>>>>>> per-queue coalescing parameters (default plus current value) would
>>>>>> have to be described too as part of the virtio-net device state, so
>> that people don't misunderstand your proposal.
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Siwei
>>>>>>>>> Using new feature bits does not seem necessary.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> MST
>>>>>>>>>>
> I saw the need of this proposal slightly differently in the discussion with Heng in v4.
> The way I understood it is: the proposed relaxation enables the Linux driver flow below to work equally well as it does without the device offering VIRTIO_NET_F_VQ_NOTF_COAL.
>
> Flow is:
> 1. The device offered feature VIRTIO_NET_F_VQ_NOTF_COAL
> 2. The virtio-net driver negotiated VIRTNET_FEATURES that has VIRTIO_NET_F_VQ_NOTF_COAL
>
> 3. Because VIRTIO_NET_F_VQ_NOTF_COAL is negotiated, device is not applying any coalescing on the VQ, in a good hope that driver will perform VQ notification coalescing.
>
> 4. The virtio-net driver, even after negotiating VIRTIO_NET_F_VQ_NOTF_COAL, still keeps its dim disabled:
> vi->rq[queue].dim_enabled = false by default.
>
> 5. The virtio net driver enables dim only on user request via ethtool.
>
> So here, just because one is using a new driver and new device, it may get lower performance.
Impossible. An older driver / device pair doesn't have coalescing
advertised / enabled at all, and a new device & new driver pair with
coalescing negotiated starts safe at 0, i.e. with coalescing effectively
disabled. How can it get lower performance?
>
> Do we agree to above problem description?
This narrative has two problems:
- The term "lower performance" is misleading and inaccurate. The
implication of coalescing is that it yields worse latency, adding
delay and jitter that would hurt quite a lot of real-world workloads
that are latency sensitive. So, rather than saying coalescing helps
performance, it should clearly tell the truth: the coalescing
parameters are no more than tuning knobs with a performance
impact/implication, and whether that impact is positive or negative is
very much subject to the guest workload. The default disposition or
initial parameters loaded from the device thus merely reflect the
preference of the host admin / device owner / cloud vendor, e.g.
optimized for throughput-oriented load versus latency-oriented load.
Be noted, end users generally don't care what coalescing is about;
they do not connect performance to "throughput" or "bandwidth" the way
you do.
- The specific problem or use case is narrowly scoped and specifically
tailored (e.g. for a hard-wired guest setup that is already known to the
device owner / host admin), while for some reason the advantage of the
proposed solution is overly exaggerated. On one hand, the ethtool
interface is very Linux specific and exposes quite a lot of lower-level
device implementation detail to normal end users; in the case of a
static and fixed configuration, it definitely requires dedicated
knowledge from the user to fine-tune for their load. On the other hand,
if we are talking about normal users who don't have such knowledge AND
don't care so much about performance, then whatever coalescing
parameters will just work for them, and they don't really care about the
default. However, for those users who do care about getting the best or
well-balanced performance (N.B. I say performance in general, not
limited to only throughput or latency in one dimension) for their load,
what options do they have? IMHO what they would really like is to lean
on the guest/driver/device to auto-tune by itself, rather than taking
the steep learning curve of fiddling with low-level device parameters
manually by themselves, right?
If you agree with the above, I guess we could proceed with unlocking the
other good options, as listed below. To me, the question I got from the
above problem statement but still left unanswered is: what makes the
cloud vendor think it is in a better position to decide certain default
values on guest users' behalf? And why would it be a MUST for every
cloud vendor to do so in general?
Thanks,
-Siwei
> If no, I likely missed something in the long discussion and need to read all of it. :(
>
> If yes, few solutions are:
> 1. A user intervention is a must to avoid this regression. The user in the guest VM must enable DIM if the device supports it.
> Very hard to do this plumbing.
>
> 2. A net driver enables the DIM by default as long as VIRTIO_NET_F_VQ_NOTF_COAL is negotiated.
> (without ethtool setting)
>
> 3. A device relaxes the limitation and continue to apply the coalescing, until driver overrides it.
>
> In my humble opinion, Heng is solving it using option #3, that tends to work with existing and future drivers who may/may not enable the DIM.
>
>
>
>
Thread overview: 56+ messages
2024-05-28 4:47 [PATCH v5] virtio-net: clarify coalescing parameters settings Heng Qi
2024-05-28 4:50 ` Heng Qi
2024-05-31 6:36 ` Heng Qi
2024-05-31 9:39 ` Cornelia Huck
2024-06-07 20:02 ` Halil Pasic
2024-06-08 2:34 ` Heng Qi
2024-06-10 12:46 ` Halil Pasic
2024-06-10 13:35 ` Heng Qi
2024-06-10 14:50 ` Michael S. Tsirkin
2024-06-10 15:12 ` Parav Pandit
2024-06-11 14:04 ` Cornelia Huck
2024-06-10 20:19 ` Halil Pasic
2024-06-11 10:40 ` Heng Qi
2024-06-11 16:29 ` Michael S. Tsirkin
2024-06-11 17:43 ` Parav Pandit
2024-06-13 6:13 ` Michael S. Tsirkin
2024-06-17 2:27 ` Heng Qi
2024-06-17 23:31 ` Si-Wei Liu
2024-06-20 7:40 ` Heng Qi
2024-06-21 1:21 ` Si-Wei Liu
2024-06-21 3:24 ` Heng Qi
2024-06-21 23:46 ` Si-Wei Liu
2024-06-22 1:34 ` Heng Qi
2024-06-25 4:51 ` Si-Wei Liu
2024-06-25 5:56 ` Parav Pandit
2024-06-26 1:14 ` Si-Wei Liu [this message]
2024-06-27 10:37 ` Halil Pasic
2024-06-27 11:27 ` Parav Pandit
2024-06-27 12:35 ` Michael S. Tsirkin
2024-06-27 12:45 ` Parav Pandit
2024-06-27 12:52 ` Michael S. Tsirkin
2024-06-27 13:03 ` Parav Pandit
2024-06-27 14:59 ` Michael S. Tsirkin
2024-06-27 17:27 ` Si-Wei Liu
2024-06-27 17:14 ` Si-Wei Liu
2024-06-27 22:18 ` Michael S. Tsirkin
2024-06-28 6:56 ` Si-Wei Liu
2024-06-28 8:23 ` Jason Wang
2024-06-28 19:31 ` Si-Wei Liu
2024-06-30 17:04 ` Michael S. Tsirkin
2024-07-03 6:09 ` Jason Wang
2024-07-02 20:37 ` Halil Pasic
2024-07-02 21:04 ` Michael S. Tsirkin
2024-07-03 5:01 ` Jason Wang
2024-06-29 6:47 ` Halil Pasic
2024-06-30 16:55 ` Michael S. Tsirkin
2024-07-02 21:43 ` Halil Pasic
2024-06-27 12:13 ` Parav Pandit
2024-06-27 12:42 ` Michael S. Tsirkin
2024-06-25 7:53 ` Jason Wang
2024-06-25 8:06 ` Michael S. Tsirkin
2024-06-25 8:13 ` Jason Wang
2024-06-25 8:21 ` Michael S. Tsirkin
2024-06-11 23:03 ` Michael S. Tsirkin
2024-06-17 2:35 ` Heng Qi
2024-06-25 7:26 ` Michael S. Tsirkin