From: Halil Pasic <pasic@linux.ibm.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Si-Wei Liu <si-wei.liu@oracle.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Parav Pandit <parav@nvidia.com>,
Heng Qi <hengqi@linux.alibaba.com>,
Cornelia Huck <cohuck@redhat.com>,
"virtio-comment@lists.linux.dev" <virtio-comment@lists.linux.dev>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [PATCH v5] virtio-net: clarify coalescing parameters settings
Date: Tue, 2 Jul 2024 22:37:35 +0200 [thread overview]
Message-ID: <20240702223735.4fd2847a.pasic@linux.ibm.com> (raw)
In-Reply-To: <CACGkMEsxP0eKK2DbKVTig8MPW0J7z4o5xSSRtk4JAcOMoXmsPw@mail.gmail.com>
On Fri, 28 Jun 2024 16:23:17 +0800
Jason Wang <jasowang@redhat.com> wrote:
> > Hmmm, this next sentence you reference above was indeed for the
> > interaction between coalescing and used buffer suppression. Then what's
> > the best-effort part was about, really? Round-up or round down the set
> > value to the power of 2 to save space? How is it relevant to our
> > discussion? I think even with rounding it shouldn't be too off? (as
> > said, by best-effort v.s. give up)
>
> Would it help if we add something like below in the guidance?
>
It would IMHO greatly benefit clarity, provided we do get it right.
> When the device wants to send a notification it needs:
>
> 1) check notification suppression, don't send interrupt if it is
> suppressed otherwise goto 2)
> 2) check event index, don't send interrupt if event index is not
> crossed otherwise goto 3)
> 3) check coalescing, don't send interrupt if the coalescing limit is
> not exceed, otherwise send the interrupt
>
> (Just to demonstrate the idea, needs tweaking for sure)
Yes I agree this indeed needs tweaking. Let me make some points.
1) for virtio-net we have 3 distinct mechanisms that police used
buffer notification avoidance, but 2 of the three are actually
mutually exclusive if I understand it correctly. Namely we have
coalescing guarded by VIRTIO_NET_F_NOTF_COAL and then depending on
whether _F_EVENT_IDX was negotiated or not, we have "event_idx based" or
"crude flag based" notification suppression.
2) Let me just quote the entire section on the good old used buffer
notification suppression
"""
2.7.7.2 Device Requirements: Used Buffer Notification Suppression
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
* The device MUST ignore the used_event value.
* After the device writes a descriptor index into the used ring:
- If flags is 1, the device SHOULD NOT send a notification.
- If flags is 0, the device MUST send a notification.
Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
* The device MUST ignore the lower bit of flags.
* After the device writes a descriptor index into the used ring:
- If the idx field in the used ring (which determined where that
descriptor index was placed) was equal to used_event, the device
MUST send a notification.
- Otherwise the device SHOULD NOT send a notification.
Note: For example, if used_event is 0, then a device using
VIRTIO_F_EVENT_IDX would send a used buffer notification to the driver
after the first buffer is used (and again after the 65536th buffer,
etc).
"""
Frankly, a MUST is a must, and I don't see very little leeway
for the coalescing to avoid any notifications. But on the other hand,
that is the very purpose of the coalescing so IMHO we have a problem
here.
The only loophole here is "after", which could be stretched to
"eventually, before the end of the world". Which would basically reduce
the MUST ad-absurdum, and I would not like that.
3) AFAIU if multiple suppression mechanisms are active
concurrently we are aiming for some sort of an logical AND semantic i.e.
a further reduction over just a single one being employed (and not OR).
But IMHO the both the wording for the event_idx based and curde
suppression, as well as for coalescing.
Let me quote the full description for the coalescing:
"""
5.1.6.5.9.1 Operation
The device sends a used buffer notification once the notification conditions are met and if the notifications are not suppressed as explained in 2.7.7.
When the device has non-zero max_usecs and non-zero max_packets, it starts counting microseconds and packets upon receiving/sending a packet. The device counts packets and microseconds for each receive virtqueue and transmit virtqueue separately. In this case, the notification conditions are met when max_usecs microseconds elapse, or upon sending/receiving max_packets packets, whichever happens first. Afterwards, the device waits for the next packet and starts counting packets and microseconds again.
When the device has max_usecs = 0 or max_packets = 0, the notification conditions are met after every packet received/sent.
"""
Here "Afterwards" is ambiguous. What we want here is "afterwards" ==
"after the notification has been sent", but one can also read it as
"after the condition has been met".
4) I see two ways to make sense out of it.
My preferred way would be to replace "send a notification" with
"generate a notification" in the main text including "2.7.7.2 Device
Requirements: Used Buffer Notification Suppression", along with adding an
explanation which states that generated notifications are sent
immediately unless a device has an active facility that under a certain
set of conditions retains and possibly coalesces notifications.
Please notice that here coalescence is used with its dictionary
meaning (see https://dictionary.cambridge.org/dictionary/english/coalesce),
and that the current description of NET_F_NOTIF_COAL is more akin to a
suppression mechanism where notifications are suppressed or discarded and
not coalesced.
The notification coalescing facility would then retain the first
used buffer notification request for each queue (those are still
generated by the same rules), until either the packet count or the
timeout type max frequency (or delay we need to clarify that) condition
is met, and coalesce any further notification requests with the retained
one until it is sent.
The other way, which I'm not sure is actually viable, is basically to
try to clarify the hell out of the current approach as taken by
"5.1.6.5.9.1 Operation" by making it crystal clear when exactly are the
"notification conditions met" and what is the complement state of that,
and add a sentence to 2.7.7.2 that references device specific mechanisms
to do more suppression. And of course we would have to find a nice
definition for "if the notifications are not suppressed as explained in
2.7.7." such that loss of initiative is not possible.
IMHO any solution where things can stall because we would have sent
an used buffer notification without "coalescing" but we did not because
of coalescing is not acceptable. This could happen if we cross
event_idx when the coalescing notification conditions are not met and
thus do not notify, and then when the coalescing notification condition
are finally met (e.g. via timeout) the event_id type suppression
is active again because we crossed event_idx when coalescing
notification condition was not met, and we end up not notifying again.
Sorry for the wall of text.
Regards,
Halil
next prev parent reply other threads:[~2024-07-02 20:37 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-28 4:47 [PATCH v5] virtio-net: clarify coalescing parameters settings Heng Qi
2024-05-28 4:50 ` Heng Qi
2024-05-31 6:36 ` Heng Qi
2024-05-31 9:39 ` Cornelia Huck
2024-06-07 20:02 ` Halil Pasic
2024-06-08 2:34 ` Heng Qi
2024-06-10 12:46 ` Halil Pasic
2024-06-10 13:35 ` Heng Qi
2024-06-10 14:50 ` Michael S. Tsirkin
2024-06-10 15:12 ` Parav Pandit
2024-06-11 14:04 ` Cornelia Huck
2024-06-10 20:19 ` Halil Pasic
2024-06-11 10:40 ` Heng Qi
2024-06-11 16:29 ` Michael S. Tsirkin
2024-06-11 17:43 ` Parav Pandit
2024-06-13 6:13 ` Michael S. Tsirkin
2024-06-17 2:27 ` Heng Qi
2024-06-17 23:31 ` Si-Wei Liu
2024-06-20 7:40 ` Heng Qi
2024-06-21 1:21 ` Si-Wei Liu
2024-06-21 3:24 ` Heng Qi
2024-06-21 23:46 ` Si-Wei Liu
2024-06-22 1:34 ` Heng Qi
2024-06-25 4:51 ` Si-Wei Liu
2024-06-25 5:56 ` Parav Pandit
2024-06-26 1:14 ` Si-Wei Liu
2024-06-27 10:37 ` Halil Pasic
2024-06-27 11:27 ` Parav Pandit
2024-06-27 12:35 ` Michael S. Tsirkin
2024-06-27 12:45 ` Parav Pandit
2024-06-27 12:52 ` Michael S. Tsirkin
2024-06-27 13:03 ` Parav Pandit
2024-06-27 14:59 ` Michael S. Tsirkin
2024-06-27 17:27 ` Si-Wei Liu
2024-06-27 17:14 ` Si-Wei Liu
2024-06-27 22:18 ` Michael S. Tsirkin
2024-06-28 6:56 ` Si-Wei Liu
2024-06-28 8:23 ` Jason Wang
2024-06-28 19:31 ` Si-Wei Liu
2024-06-30 17:04 ` Michael S. Tsirkin
2024-07-03 6:09 ` Jason Wang
2024-07-02 20:37 ` Halil Pasic [this message]
2024-07-02 21:04 ` Michael S. Tsirkin
2024-07-03 5:01 ` Jason Wang
2024-06-29 6:47 ` Halil Pasic
2024-06-30 16:55 ` Michael S. Tsirkin
2024-07-02 21:43 ` Halil Pasic
2024-06-27 12:13 ` Parav Pandit
2024-06-27 12:42 ` Michael S. Tsirkin
2024-06-25 7:53 ` Jason Wang
2024-06-25 8:06 ` Michael S. Tsirkin
2024-06-25 8:13 ` Jason Wang
2024-06-25 8:21 ` Michael S. Tsirkin
2024-06-11 23:03 ` Michael S. Tsirkin
2024-06-17 2:35 ` Heng Qi
2024-06-25 7:26 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240702223735.4fd2847a.pasic@linux.ibm.com \
--to=pasic@linux.ibm.com \
--cc=cohuck@redhat.com \
--cc=hengqi@linux.alibaba.com \
--cc=jasowang@redhat.com \
--cc=mst@redhat.com \
--cc=parav@nvidia.com \
--cc=si-wei.liu@oracle.com \
--cc=virtio-comment@lists.linux.dev \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox