* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-25 2:30 Xiaoguang Wang
@ 2024-01-25 3:28 ` Lege Wang
2024-01-30 6:11 ` Lege Wang
2024-01-30 6:52 ` Parav Pandit
2 siblings, 0 replies; 16+ messages in thread
From: Lege Wang @ 2024-01-25 3:28 UTC (permalink / raw)
To: virtio-comment@lists.oasis-open.org
hi
I must say that this RFC patch just shows the rough design, and explain why we need
this feature. The patch itself is not mature enough, sorry, later after some discussions,
I'll submit one better.
Regards,
lege.wang
> -----Original Message-----
> From: Lege Wang <lege.wang@jaguarmicro.com>
> Sent: Thursday, January 25, 2024 10:30 AM
> To: virtio-comment@lists.oasis-open.org
> Cc: Lege Wang <lege.wang@jaguarmicro.com>
> Subject: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE:
> add new used buffer notification suppression mechanism
>
> Currently for the PCI transport, used buffer notification suppression
> informations are located in guest memory, for example, pvirtq_event_suppress
> for packed ring and virtq_avail for split ring, then PCI devices simulated
> by hypervisors, such as QEMU, can vist these suppression informations
> directly before issuing used event, the only overheads may be just proper
> memory barriers, but for some hardware PCI devices offered by
> DPU(Data Processing Unit), DPU would need to submit DMAs to copy guest
> suppression informations to DPU internal cache or memory, then check
> whether used event should be notified after examing this internal cache
> or memory, which is obvious expensive and time-consuming. Worsely, if
> device has written some used descriptors, and the above DMA operation has
> not been completed yet, device can not determine whether to issue used
> event, until it gets newest used buffer notification suppression information.
>
> To help improve this sitiuations, add a new feature:
> VIRTIO_F_USED_EVENT_AUTO_DISABLE. When this feature bit is negotiated,
> devices will place new notification area for used buffer event enable
> notification in PCI bar specfied by virtio_pci_notify_cap, also for better
> used buffer notification coalescing, the device will disable later used
> buffer notifications automatically after sending one used buffer notification
> to driver, and after the driver finishes handling current used buffer event,
> it needs to notify the device to enable used buffer notification again by
> writing above new notification area in PCI bar. Device will receive
> this notification reliably, and device knows that it can issue used event
> now, but it may choose appropriate timing to send used events according to
> its notification coalescing policy , now it also does not need to DMA guest
> memory to obtain notification suppression.
>
> Signed-off-by: Xiaoguang Wang <lege.wang@jaguarmicro.com>
> ---
> content.tex | 34 +++++++++++++++++++++++++++++++++
> transport-pci.tex | 48
> +++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 80 insertions(+), 2 deletions(-)
>
> diff --git a/content.tex b/content.tex
> index 0a62dce..5374510 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -448,6 +448,7 @@ \section{Driver Notifications} \label{sec:Basic Facilities
> of a Virtio Device /
> The driver is sometimes required to send an available buffer
> notification to the device.
>
> +\subsection{Available Buffer Notifications} \label{sec:Basic Facilities of a
> Virtio Device / Driver notifications / Available Buffer Notifications}
> When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
> this notification contains either a virtqueue index if
> VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied
> virtqueue
> @@ -488,6 +489,34 @@ \section{Driver Notifications} \label{sec:Basic
> Facilities of a Virtio Device /
> has been negotiated, these notifications would then have
> identical \field{next_off} and \field{next_wrap} values.
>
> +\subsection{Used Buffer Event Enable Notifications} \label{sec:Basic Facilities
> of a Virtio Device / Driver notifications / Used Buffer Event Enable
> Notifications}
> +This kind of notification is only necessary when
> VIRTIO_F_USED_EVENT_AUTO_DISABLE is negotiated.
> +When this feature bit is negotiated, the device will disable later used buffer
> notification
> +automatically after sending one used buffer notification to the driver, and
> after the driver
> +finishes handling current used buffer event, it needs to notify the device to
> enable used
> +buffer notification again.
> +
> +The notification method and supplying any such virtqueue notification config
> data
> +is transport specific, currently driver notifications to the device include
> +the following information:
> +
> +\begin{description}
> +\item [vq_index or vq_notif_config_data] Either virtqueue index or device
> + supplied queue notification config data corresponding to a virtqueue.
> +\item [next_off] Offset
> + within the ring where the next used ring entry will be written.
> + When VIRTIO_F_RING_PACKED has not been negotiated this refers to
> the
> + 15 least significant bits of the used index.
> + When VIRTIO_F_RING_PACKED has been negotiated this refers to the
> offset
> + (in units of descriptor entries)
> + within the descriptor ring where the next used descriptor will be
> written.
> +\item [next_wrap] Wrap Counter.
> + With VIRTIO_F_RING_PACKED this is the wrap counter
> + referring to the next used descriptor.
> + Without VIRTIO_F_RING_PACKED this is the most significant bit
> + (bit 15) of the used index.
> +\end{description}
> +
> \input{shared-mem.tex}
>
> \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device /
> Exporting Objects}
> @@ -872,6 +901,11 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved
> Feature Bits}
> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
> handling features reserved for future use.
>
> + \item[VIRTIO_F_USED_EVENT_AUTO_DISABLE(42)] This feature indicates
> that the device
> + will disable later used buffer notification automatically after sending one
> used buffer
> + notification to the driver.
> + See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications / Used
> Buffer Event Enable Notifications}
> +
> \end{description}
>
> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..21b80dc 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,8 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> /* About the administration virtqueue. */
> le16 admin_queue_index; /* read-only for driver */
> le16 admin_queue_num; /* read-only for driver */
> +
> + le16 queue_intr_notify_off; /* read-only for driver */
> };
> \end{lstlisting}
>
> @@ -428,6 +430,14 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> The value 0 indicates no supported administration virtqueues.
> This field is valid only if VIRTIO_F_ADMIN_VQ has been
> negotiated.
> +
> +\item[\field{queue_intr_notify_off}]
> + The driver reads this to calculate the offset from start of Used Buffer
> Event Enable Notification structure at
> + which this virtqueue is located.
> + \begin{note} this is \em{not} an offset in bytes.
> + See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI
> Device Layout / Notification capability} below.
> + \end{note}
> +
> \end{description}
>
> \devicenormative{\paragraph}{Common configuration structure layout}{Virtio
> Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common
> configuration structure layout}
> @@ -498,7 +508,7 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> \drivernormative{\paragraph}{Common configuration structure layout}{Virtio
> Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common
> configuration structure layout}
>
> The driver MUST NOT write to \field{device_feature}, \field{num_queues},
> -\field{config_generation}, \field{queue_notify_off} or
> +\field{config_generation}, \field{queue_notify_off},
> \field{queue_intr_notify_off} or
> \field{queue_notif_config_data}.
>
> If VIRTIO_F_RING_PACKED has been negotiated,
> @@ -541,6 +551,7 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options
> struct virtio_pci_notify_cap {
> struct virtio_pci_cap cap;
> le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
> + le32 intr_notify_off_multiplier; /* Multiplier for
> queue_intr_notify_off. Only if VIRTIO_F_USED_EVENT_AUTO_DISABLE is
> negotiated. */
> };
> \end{lstlisting}
>
> @@ -560,10 +571,26 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options
> the same Queue Notify address for all queues.
> \end{note}
>
> +\field{intr_notify_off_multiplier} is combined with the
> \field{queue_intr_notify_off} to
> +derive the Queue Used Buffer Event Enable Notify address within a BAR for a
> virtqueue:
> +
> +\begin{lstlisting}
> + cap.offset + queue_intr_notify_off * intr_notify_off_multiplier
> +\end{lstlisting}
> +
> +The \field{cap.offset} and \field{intr_notify_off_multiplier} are taken from the
> +notification capability structure above, and the \field{queue_intr_notify_off}
> is
> +taken from the common configuration structure.
> +
> +\begin{note}
> +For example, if \field{intr_notifier_off_multiplier} is 0, the device uses
> +the same Queue Used Buffer Event Enable Notify address for all queues.
> +\end{note}
> +
> \devicenormative{\paragraph}{Notification capability}{Virtio Transport
> Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
> The device MUST present at least one notification capability.
>
> -For devices not offering VIRTIO_F_NOTIFICATION_DATA:
> +For devices not offering VIRTIO_F_NOTIFICATION_DATA or
> VIRTIO_F_USED_EVENT_AUTO_DISABLE:
>
> The \field{cap.offset} MUST be 2-byte aligned.
>
> @@ -596,6 +623,23 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options
> cap.length >= queue_notify_off * notify_off_multiplier + 4
> \end{lstlisting}
>
> +For devices offering VIRTIO_F_USED_EVENT_AUTO_DISABLE::
> +
> +The device MUST either present \field{intr_notify_off_multiplier} as a
> +number that is a power of 2 that is also a multiple 4,
> +or present \field{intr_notify_off_multiplier} as 0.
> +
> +The \field{cap.offset} MUST be 4-byte aligned.
> +
> +The value \field{cap.length} presented by the device MUST be at least 8
> +and MUST be large enough to support queue notification offsets
> +for all supported queues in all possible configurations.
> +
> +For all queues, the value \field{cap.length} presented by the device MUST
> satisfy:
> +\begin{lstlisting}
> +cap.length >= queue_intr_notify_off * intr_notify_off_multiplier + 4
> +\end{lstlisting}
> +
> \subsubsection{ISR status capability}\label{sec:Virtio Transport Options /
> Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
>
> The VIRTIO_PCI_CAP_ISR_CFG capability
> --
> 2.43.0
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-25 2:30 Xiaoguang Wang
2024-01-25 3:28 ` Lege Wang
@ 2024-01-30 6:11 ` Lege Wang
2024-01-30 6:52 ` Parav Pandit
2 siblings, 0 replies; 16+ messages in thread
From: Lege Wang @ 2024-01-30 6:11 UTC (permalink / raw)
To: virtio-comment@lists.oasis-open.org
hi
Gentle ping, thanks.
Lege.wang
> -----Original Message-----
> From: Lege Wang <lege.wang@jaguarmicro.com>
> Sent: Thursday, January 25, 2024 10:30 AM
> To: virtio-comment@lists.oasis-open.org
> Cc: Lege Wang <lege.wang@jaguarmicro.com>
> Subject: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE:
> add new used buffer notification suppression mechanism
>
> Currently for the PCI transport, used buffer notification suppression
> informations are located in guest memory, for example, pvirtq_event_suppress
> for packed ring and virtq_avail for split ring, then PCI devices simulated
> by hypervisors, such as QEMU, can vist these suppression informations
> directly before issuing used event, the only overheads may be just proper
> memory barriers, but for some hardware PCI devices offered by
> DPU(Data Processing Unit), DPU would need to submit DMAs to copy guest
> suppression informations to DPU internal cache or memory, then check
> whether used event should be notified after examing this internal cache
> or memory, which is obvious expensive and time-consuming. Worsely, if
> device has written some used descriptors, and the above DMA operation has
> not been completed yet, device can not determine whether to issue used
> event, until it gets newest used buffer notification suppression information.
>
> To help improve this sitiuations, add a new feature:
> VIRTIO_F_USED_EVENT_AUTO_DISABLE. When this feature bit is negotiated,
> devices will place new notification area for used buffer event enable
> notification in PCI bar specfied by virtio_pci_notify_cap, also for better
> used buffer notification coalescing, the device will disable later used
> buffer notifications automatically after sending one used buffer notification
> to driver, and after the driver finishes handling current used buffer event,
> it needs to notify the device to enable used buffer notification again by
> writing above new notification area in PCI bar. Device will receive
> this notification reliably, and device knows that it can issue used event
> now, but it may choose appropriate timing to send used events according to
> its notification coalescing policy , now it also does not need to DMA guest
> memory to obtain notification suppression.
>
> Signed-off-by: Xiaoguang Wang <lege.wang@jaguarmicro.com>
> ---
> content.tex | 34 +++++++++++++++++++++++++++++++++
> transport-pci.tex | 48
> +++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 80 insertions(+), 2 deletions(-)
>
> diff --git a/content.tex b/content.tex
> index 0a62dce..5374510 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -448,6 +448,7 @@ \section{Driver Notifications} \label{sec:Basic Facilities
> of a Virtio Device /
> The driver is sometimes required to send an available buffer
> notification to the device.
>
> +\subsection{Available Buffer Notifications} \label{sec:Basic Facilities of a
> Virtio Device / Driver notifications / Available Buffer Notifications}
> When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
> this notification contains either a virtqueue index if
> VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied
> virtqueue
> @@ -488,6 +489,34 @@ \section{Driver Notifications} \label{sec:Basic
> Facilities of a Virtio Device /
> has been negotiated, these notifications would then have
> identical \field{next_off} and \field{next_wrap} values.
>
> +\subsection{Used Buffer Event Enable Notifications} \label{sec:Basic Facilities
> of a Virtio Device / Driver notifications / Used Buffer Event Enable
> Notifications}
> +This kind of notification is only necessary when
> VIRTIO_F_USED_EVENT_AUTO_DISABLE is negotiated.
> +When this feature bit is negotiated, the device will disable later used buffer
> notification
> +automatically after sending one used buffer notification to the driver, and
> after the driver
> +finishes handling current used buffer event, it needs to notify the device to
> enable used
> +buffer notification again.
> +
> +The notification method and supplying any such virtqueue notification config
> data
> +is transport specific, currently driver notifications to the device include
> +the following information:
> +
> +\begin{description}
> +\item [vq_index or vq_notif_config_data] Either virtqueue index or device
> + supplied queue notification config data corresponding to a virtqueue.
> +\item [next_off] Offset
> + within the ring where the next used ring entry will be written.
> + When VIRTIO_F_RING_PACKED has not been negotiated this refers to
> the
> + 15 least significant bits of the used index.
> + When VIRTIO_F_RING_PACKED has been negotiated this refers to the
> offset
> + (in units of descriptor entries)
> + within the descriptor ring where the next used descriptor will be
> written.
> +\item [next_wrap] Wrap Counter.
> + With VIRTIO_F_RING_PACKED this is the wrap counter
> + referring to the next used descriptor.
> + Without VIRTIO_F_RING_PACKED this is the most significant bit
> + (bit 15) of the used index.
> +\end{description}
> +
> \input{shared-mem.tex}
>
> \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device /
> Exporting Objects}
> @@ -872,6 +901,11 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved
> Feature Bits}
> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
> handling features reserved for future use.
>
> + \item[VIRTIO_F_USED_EVENT_AUTO_DISABLE(42)] This feature indicates
> that the device
> + will disable later used buffer notification automatically after sending one
> used buffer
> + notification to the driver.
> + See \ref{sec:Basic Facilities of a Virtio Device / Driver notifications / Used
> Buffer Event Enable Notifications}
> +
> \end{description}
>
> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..21b80dc 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,8 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> /* About the administration virtqueue. */
> le16 admin_queue_index; /* read-only for driver */
> le16 admin_queue_num; /* read-only for driver */
> +
> + le16 queue_intr_notify_off; /* read-only for driver */
> };
> \end{lstlisting}
>
> @@ -428,6 +430,14 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> The value 0 indicates no supported administration virtqueues.
> This field is valid only if VIRTIO_F_ADMIN_VQ has been
> negotiated.
> +
> +\item[\field{queue_intr_notify_off}]
> + The driver reads this to calculate the offset from start of Used Buffer
> Event Enable Notification structure at
> + which this virtqueue is located.
> + \begin{note} this is \em{not} an offset in bytes.
> + See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI
> Device Layout / Notification capability} below.
> + \end{note}
> +
> \end{description}
>
> \devicenormative{\paragraph}{Common configuration structure layout}{Virtio
> Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common
> configuration structure layout}
> @@ -498,7 +508,7 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> \drivernormative{\paragraph}{Common configuration structure layout}{Virtio
> Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common
> configuration structure layout}
>
> The driver MUST NOT write to \field{device_feature}, \field{num_queues},
> -\field{config_generation}, \field{queue_notify_off} or
> +\field{config_generation}, \field{queue_notify_off},
> \field{queue_intr_notify_off} or
> \field{queue_notif_config_data}.
>
> If VIRTIO_F_RING_PACKED has been negotiated,
> @@ -541,6 +551,7 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options
> struct virtio_pci_notify_cap {
> struct virtio_pci_cap cap;
> le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
> + le32 intr_notify_off_multiplier; /* Multiplier for
> queue_intr_notify_off. Only if VIRTIO_F_USED_EVENT_AUTO_DISABLE is
> negotiated. */
> };
> \end{lstlisting}
>
> @@ -560,10 +571,26 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options
> the same Queue Notify address for all queues.
> \end{note}
>
> +\field{intr_notify_off_multiplier} is combined with the
> \field{queue_intr_notify_off} to
> +derive the Queue Used Buffer Event Enable Notify address within a BAR for a
> virtqueue:
> +
> +\begin{lstlisting}
> + cap.offset + queue_intr_notify_off * intr_notify_off_multiplier
> +\end{lstlisting}
> +
> +The \field{cap.offset} and \field{intr_notify_off_multiplier} are taken from the
> +notification capability structure above, and the \field{queue_intr_notify_off}
> is
> +taken from the common configuration structure.
> +
> +\begin{note}
> +For example, if \field{intr_notifier_off_multiplier} is 0, the device uses
> +the same Queue Used Buffer Event Enable Notify address for all queues.
> +\end{note}
> +
> \devicenormative{\paragraph}{Notification capability}{Virtio Transport
> Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
> The device MUST present at least one notification capability.
>
> -For devices not offering VIRTIO_F_NOTIFICATION_DATA:
> +For devices not offering VIRTIO_F_NOTIFICATION_DATA or
> VIRTIO_F_USED_EVENT_AUTO_DISABLE:
>
> The \field{cap.offset} MUST be 2-byte aligned.
>
> @@ -596,6 +623,23 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options
> cap.length >= queue_notify_off * notify_off_multiplier + 4
> \end{lstlisting}
>
> +For devices offering VIRTIO_F_USED_EVENT_AUTO_DISABLE::
> +
> +The device MUST either present \field{intr_notify_off_multiplier} as a
> +number that is a power of 2 that is also a multiple 4,
> +or present \field{intr_notify_off_multiplier} as 0.
> +
> +The \field{cap.offset} MUST be 4-byte aligned.
> +
> +The value \field{cap.length} presented by the device MUST be at least 8
> +and MUST be large enough to support queue notification offsets
> +for all supported queues in all possible configurations.
> +
> +For all queues, the value \field{cap.length} presented by the device MUST
> satisfy:
> +\begin{lstlisting}
> +cap.length >= queue_intr_notify_off * intr_notify_off_multiplier + 4
> +\end{lstlisting}
> +
> \subsubsection{ISR status capability}\label{sec:Virtio Transport Options /
> Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
>
> The VIRTIO_PCI_CAP_ISR_CFG capability
> --
> 2.43.0
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-25 2:30 Xiaoguang Wang
2024-01-25 3:28 ` Lege Wang
2024-01-30 6:11 ` Lege Wang
@ 2024-01-30 6:52 ` Parav Pandit
2024-01-31 6:33 ` Lege Wang
2024-01-31 8:39 ` Lege Wang
2 siblings, 2 replies; 16+ messages in thread
From: Parav Pandit @ 2024-01-30 6:52 UTC (permalink / raw)
To: Xiaoguang Wang, virtio-comment@lists.oasis-open.org
> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Xiaoguang Wang
> Sent: Thursday, January 25, 2024 8:00 AM
>
> Currently for the PCI transport, used buffer notification suppression
> informations are located in guest memory, for example,
> pvirtq_event_suppress for packed ring and virtq_avail for split ring, then PCI
> devices simulated by hypervisors, such as QEMU, can vist these suppression
> informations directly before issuing used event, the only overheads may be
> just proper memory barriers, but for some hardware PCI devices offered by
> DPU(Data Processing Unit), DPU would need to submit DMAs to copy guest
> suppression informations to DPU internal cache or memory, then check
> whether used event should be notified after examing this internal cache or
> memory, which is obvious expensive and time-consuming. Worsely, if device
> has written some used descriptors, and the above DMA operation has not
> been completed yet, device can not determine whether to issue used event,
> until it gets newest used buffer notification suppression information.
>
> To help improve this sitiuations, add a new feature:
> VIRTIO_F_USED_EVENT_AUTO_DISABLE. When this feature bit is negotiated,
> devices will place new notification area for used buffer event enable
> notification in PCI bar specfied by virtio_pci_notify_cap, also for better used
> buffer notification coalescing, the device will disable later used buffer
> notifications automatically after sending one used buffer notification to driver,
> and after the driver finishes handling current used buffer event, it needs to
> notify the device to enable used buffer notification again by writing above new
> notification area in PCI bar. Device will receive this notification reliably, and
> device knows that it can issue used event now, but it may choose appropriate
> timing to send used events according to its notification coalescing policy , now
> it also does not need to DMA guest memory to obtain notification
> suppression.
>
Good description describing the problem and solution well.
Yes, we discussed this requirement already before at [1].
[1] https://lore.kernel.org/virtio-comment/1689820586.8879051-7-xuanzhuo@linux.alibaba.com/T/#mce485d2c7187b800575c5a63d83b2e9e77f474cc
> Signed-off-by: Xiaoguang Wang <lege.wang@jaguarmicro.com>
> ---
> content.tex | 34 +++++++++++++++++++++++++++++++++
> transport-pci.tex | 48
> +++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 80 insertions(+), 2 deletions(-)
>
> diff --git a/content.tex b/content.tex
> index 0a62dce..5374510 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -448,6 +448,7 @@ \section{Driver Notifications} \label{sec:Basic Facilities
> of a Virtio Device / The driver is sometimes required to send an available
> buffer notification to the device.
>
> +\subsection{Available Buffer Notifications} \label{sec:Basic Facilities
> +of a Virtio Device / Driver notifications / Available Buffer
> +Notifications}
> When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, this
> notification contains either a virtqueue index if
> VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied
> virtqueue @@ -488,6 +489,34 @@ \section{Driver Notifications}
> \label{sec:Basic Facilities of a Virtio Device / has been negotiated, these
> notifications would then have identical \field{next_off} and \field{next_wrap}
> values.
>
> +\subsection{Used Buffer Event Enable Notifications} \label{sec:Basic
> +Facilities of a Virtio Device / Driver notifications / Used Buffer Event Enable
> Notifications} This kind of notification is only necessary when
> VIRTIO_F_USED_EVENT_AUTO_DISABLE is negotiated.
> +When this feature bit is negotiated, the device will disable later used
Remove the word "later".
> +buffer notification automatically after sending one used buffer
> +notification to the driver, and after the driver finishes handling
> +current used buffer event, it needs to notify the device to enable used buffer
> notification again.
> +
> +The notification method and supplying any such virtqueue notification
> +config data is transport specific, currently driver notifications to
> +the device include the following information:
> +
> +\begin{description}
> +\item [vq_index or vq_notif_config_data] Either virtqueue index or device
> + supplied queue notification config data corresponding to a virtqueue.
> +\item [next_off] Offset
> + within the ring where the next used ring entry will be written.
> + When VIRTIO_F_RING_PACKED has not been negotiated this refers to the
> + 15 least significant bits of the used index.
> + When VIRTIO_F_RING_PACKED has been negotiated this refers to the
> offset
> + (in units of descriptor entries)
> + within the descriptor ring where the next used descriptor will be written.
> +\item [next_wrap] Wrap Counter.
> + With VIRTIO_F_RING_PACKED this is the wrap counter
> + referring to the next used descriptor.
> + Without VIRTIO_F_RING_PACKED this is the most significant bit
> + (bit 15) of the used index.
> +\end{description}
> +
This forces the driver to process _all_ used buffer notifications before enabling them again.
There is a race condition that used ring updates are occurring from the device (without notification) and driver updates the value.
For example,
1. driver posted descriptors from index 0 to 9.
2. Used buffer notification arrived for 0 to 3.
3. Driver processes used buffer ring from 0 to 3 and enables used buffer notification for index 4.
4. While #3 is happening, the device already updated the used buffer entries.
5. So device view of used ring entry is messed up as it already passed that point.
Hence a better driver used buffer notification should be,
1. index upto which it has processed
2. explicit enable/disable bit to enable the notification
These two fields clearly communicate to the device, upto how much used buffer is consumed and understand the driver consumption rate.
This way, there is no ambiguity for device on where to write next used ring entry.
(where to write next used entry is not driven by the notification enablement).
And the explicit enable bit indicates notifications are enabled back.
Technically, only enable bit is needed, but doing #1 is helpful for the device to understand driver consumption pace.
And for that may be current notification can be made 64-bits or as suggested in your patch to have new location.
> \input{shared-mem.tex}
>
> \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device /
> Exporting Objects} @@ -872,6 +901,11 @@ \chapter{Reserved Feature
> Bits}\label{sec:Reserved Feature Bits}
> \ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits}
> for
> handling features reserved for future use.
>
> + \item[VIRTIO_F_USED_EVENT_AUTO_DISABLE(42)] This feature indicates
> + that the device will disable later used buffer notification
> + automatically after sending one used buffer notification to the driver.
> + See \ref{sec:Basic Facilities of a Virtio Device / Driver
> + notifications / Used Buffer Event Enable Notifications}
The wording "disable later" is confusing, does not read proper.
May be to reword it as,
This feature indicates that the device will disable used buffer notification automatically after sending one used buffer notification.
> +
> \end{description}
>
> \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits} diff
> --git a/transport-pci.tex b/transport-pci.tex index a5c6719..21b80dc 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,8 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> /* About the administration virtqueue. */
> le16 admin_queue_index; /* read-only for driver */
> le16 admin_queue_num; /* read-only for driver */
> +
> + le16 queue_intr_notify_off; /* read-only for driver */
> };
I think it should be renamed to queue_used_buf_notify_off;
This is because notifications are slightly worded differently than interrupts.
> \end{lstlisting}
>
> @@ -428,6 +430,14 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
> The value 0 indicates no supported administration virtqueues.
> This field is valid only if VIRTIO_F_ADMIN_VQ has been
> negotiated.
> +
> +\item[\field{queue_intr_notify_off}]
> + The driver reads this to calculate the offset from start of Used Buffer
> Event Enable Notification structure at
> + which this virtqueue is located.
> + \begin{note} this is \em{not} an offset in bytes.
> + See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device
> Layout / Notification capability} below.
> + \end{note}
> +
This is strange. The notification is in the PCI BAR, right?
it can be done simple as just the BAR offset.
> \end{description}
>
> \devicenormative{\paragraph}{Common configuration structure
> layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> Common configuration structure layout} @@ -498,7 +508,7 @@
> \subsubsection{Common configuration structure layout}\label{sec:Virtio
> Transport \drivernormative{\paragraph}{Common configuration structure
> layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> Common configuration structure layout}
>
> The driver MUST NOT write to \field{device_feature}, \field{num_queues}, -
> \field{config_generation}, \field{queue_notify_off} or
> +\field{config_generation}, \field{queue_notify_off},
> +\field{queue_intr_notify_off} or
> \field{queue_notif_config_data}.
>
> If VIRTIO_F_RING_PACKED has been negotiated, @@ -541,6 +551,7 @@
> \subsubsection{Notification structure layout}\label{sec:Virtio Transport
> Options struct virtio_pci_notify_cap {
> struct virtio_pci_cap cap;
> le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
> + le32 intr_notify_off_multiplier; /* Multiplier for
> + queue_intr_notify_off. Only if VIRTIO_F_USED_EVENT_AUTO_DISABLE is
> + negotiated. */
> };
> \end{lstlisting}
>
I would really like to avoid extending this structure which is not in the PCI extended capability.
And we are almost full with 256B limit.
It is the time to have the PCI extended cap defined and place this field there.
If this multiplier is not useful, maybe it should be added only if needed.
Did you add it because to match the current flexibility of the spec, or your hw actually needs this multiplier?
> @@ -560,10 +571,26 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options the same Queue Notify address for
> all queues.
> \end{note}
>
> +\field{intr_notify_off_multiplier} is combined with the
> +\field{queue_intr_notify_off} to derive the Queue Used Buffer Event Enable
> Notify address within a BAR for a virtqueue:
> +
> +\begin{lstlisting}
> + cap.offset + queue_intr_notify_off * intr_notify_off_multiplier
> +\end{lstlisting}
> +
> +The \field{cap.offset} and \field{intr_notify_off_multiplier} are taken
> +from the notification capability structure above, and the
> +\field{queue_intr_notify_off} is taken from the common configuration
> structure.
> +
> +\begin{note}
> +For example, if \field{intr_notifier_off_multiplier} is 0, the device
> +uses the same Queue Used Buffer Event Enable Notify address for all queues.
> +\end{note}
> +
> \devicenormative{\paragraph}{Notification capability}{Virtio Transport
> Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} The
> device MUST present at least one notification capability.
>
> -For devices not offering VIRTIO_F_NOTIFICATION_DATA:
> +For devices not offering VIRTIO_F_NOTIFICATION_DATA or
> VIRTIO_F_USED_EVENT_AUTO_DISABLE:
>
> The \field{cap.offset} MUST be 2-byte aligned.
>
> @@ -596,6 +623,23 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options cap.length >= queue_notify_off *
> notify_off_multiplier + 4 \end{lstlisting}
>
> +For devices offering VIRTIO_F_USED_EVENT_AUTO_DISABLE::
> +
> +The device MUST either present \field{intr_notify_off_multiplier} as a
> +number that is a power of 2 that is also a multiple 4, or present
> +\field{intr_notify_off_multiplier} as 0.
> +
> +The \field{cap.offset} MUST be 4-byte aligned.
> +
> +The value \field{cap.length} presented by the device MUST be at least 8
> +and MUST be large enough to support queue notification offsets for all
> +supported queues in all possible configurations.
> +
> +For all queues, the value \field{cap.length} presented by the device MUST
> satisfy:
> +\begin{lstlisting}
> +cap.length >= queue_intr_notify_off * intr_notify_off_multiplier + 4
> +\end{lstlisting}
> +
> \subsubsection{ISR status capability}\label{sec:Virtio Transport Options /
> Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
>
> The VIRTIO_PCI_CAP_ISR_CFG capability
> --
> 2.43.0
>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-
> open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-30 6:52 ` Parav Pandit
@ 2024-01-31 6:33 ` Lege Wang
2024-01-31 6:45 ` Parav Pandit
2024-01-31 8:39 ` Lege Wang
1 sibling, 1 reply; 16+ messages in thread
From: Lege Wang @ 2024-01-31 6:33 UTC (permalink / raw)
To: Parav Pandit, virtio-comment@lists.oasis-open.org
hi,
Sorry for late response, I took a while to understand your comments, thanks.
> > notification area in PCI bar. Device will receive this notification reliably, and
> > device knows that it can issue used event now, but it may choose appropriate
> > timing to send used events according to its notification coalescing policy ,
> now
> > it also does not need to DMA guest memory to obtain notification
> > suppression.
> >
> Good description describing the problem and solution well.
> Yes, we discussed this requirement already before at [1].
>
> [1]
> https://lore.kernel.org/virtio-comment/1689820586.8879051-7-xuanzhuo@li
> nux.alibaba.com/T/#mce485d2c7187b800575c5a63d83b2e9e77f474cc
Cool, thanks for this sharing.
I'd also like to help develop following kernel virtio core patch for this feature if
this spec patch is accepted finally.
>
> >
> > +\subsection{Used Buffer Event Enable Notifications} \label{sec:Basic
> > +Facilities of a Virtio Device / Driver notifications / Used Buffer Event Enable
> > Notifications} This kind of notification is only necessary when
> > VIRTIO_F_USED_EVENT_AUTO_DISABLE is negotiated.
> > +When this feature bit is negotiated, the device will disable later used
> Remove the word "later".
OK.
> This forces the driver to process _all_ used buffer notifications before enabling
> them again.
> There is a race condition that used ring updates are occurring from the device
> (without notification) and driver updates the value.
> For example,
> 1. driver posted descriptors from index 0 to 9.
> 2. Used buffer notification arrived for 0 to 3.
> 3. Driver processes used buffer ring from 0 to 3 and enables used buffer
> notification for index 4.
> 4. While #3 is happening, the device already updated the used buffer entries.
> 5. So device view of used ring entry is messed up as it already passed that
> point.
>
> Hence a better driver used buffer notification should be,
>
> 1. index upto which it has processed
> 2. explicit enable/disable bit to enable the notification
>
> These two fields clearly communicate to the device, upto how much used
> buffer is consumed and understand the driver consumption rate.
> This way, there is no ambiguity for device on where to write next used ring
> entry.
> (where to write next used entry is not driven by the notification enablement).
>
> And the explicit enable bit indicates notifications are enabled back.
>
> Technically, only enable bit is needed, but doing #1 is helpful for the device to
> understand driver consumption pace.
Thanks for this valuable suggestion, which clarifies driver and device's behavior clearly.
> And for that may be current notification can be made 64-bits or as suggested
> in your patch to have new location.
After some thoughts, I decide to just make current notification extended to 64-bits for simplicity.
> The wording "disable later" is confusing, does not read proper.
>
> May be to reword it as,
>
> This feature indicates that the device will disable used buffer notification
> automatically after sending one used buffer notification.
OK, thanks.
> > +
> > + le16 queue_intr_notify_off; /* read-only for driver */
> > };
> I think it should be renamed to queue_used_buf_notify_off;
> This is because notifications are slightly worded differently than interrupts.
Since it may be simple to just extend current notification data to 64-bits, there's
no needs to add this field.
> > + which this virtqueue is located.
> > + \begin{note} this is \em{not} an offset in bytes.
> > + See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI
> Device
> > Layout / Notification capability} below.
> > + \end{note}
> > +
> This is strange. The notification is in the PCI BAR, right?
> it can be done simple as just the BAR offset.
See below comment.
>
> >
> > + queue_intr_notify_off. Only if VIRTIO_F_USED_EVENT_AUTO_DISABLE is
> > + negotiated. */
> > };
> > \end{lstlisting}
> >
> I would really like to avoid extending this structure which is not in the PCI
> extended capability.
> And we are almost full with 256B limit.
> It is the time to have the PCI extended cap defined and place this field there.
Yes, I understand your concerns. Initially I intended to add new PCI capability
to describe this new notification area, but found that this new capability would
be similar to virtio_pci_notify_cap, and will introduce many duplicate descriptions
in virtio spec doc. If we choose to extent current notification area, there're no needs
to touch current pci capability, the corresponding queue notify address to enable
used buffer notification within a BAR for a virtqueue would look like below:
cap.offset + queue_notify_off * notify_off_multiplier + 4
> If this multiplier is not useful, maybe it should be added only if needed.
> Did you add it because to match the current flexibility of the spec, or your hw
> actually needs this multiplier?
Yes, just to match the current spec ??
Regards,
lege.wang
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-31 6:33 ` Lege Wang
@ 2024-01-31 6:45 ` Parav Pandit
2024-01-31 7:21 ` Lege Wang
0 siblings, 1 reply; 16+ messages in thread
From: Parav Pandit @ 2024-01-31 6:45 UTC (permalink / raw)
To: Lege Wang, virtio-comment@lists.oasis-open.org
> From: Lege Wang <lege.wang@jaguarmicro.com>
> Sent: Wednesday, January 31, 2024 12:04 PM
>
> hi,
>
> Sorry for late response, I took a while to understand your comments, thanks.
> > > notification area in PCI bar. Device will receive this notification
> > > reliably, and device knows that it can issue used event now, but it
> > > may choose appropriate timing to send used events according to its
> > > notification coalescing policy ,
> > now
> > > it also does not need to DMA guest memory to obtain notification
> > > suppression.
> > >
> > Good description describing the problem and solution well.
> > Yes, we discussed this requirement already before at [1].
> >
> > [1]
> > https://lore.kernel.org/virtio-comment/1689820586.8879051-7-
> xuanzhuo@l
> > i nux.alibaba.com/T/#mce485d2c7187b800575c5a63d83b2e9e77f474cc
> Cool, thanks for this sharing.
> I'd also like to help develop following kernel virtio core patch for this feature if
> this spec patch is accepted finally.
>
> >
> > >
> > > +\subsection{Used Buffer Event Enable Notifications}
> > > +\label{sec:Basic Facilities of a Virtio Device / Driver
> > > +notifications / Used Buffer Event Enable
> > > Notifications} This kind of notification is only necessary when
> > > VIRTIO_F_USED_EVENT_AUTO_DISABLE is negotiated.
> > > +When this feature bit is negotiated, the device will disable later
> > > +used
> > Remove the word "later".
> OK.
>
> > This forces the driver to process _all_ used buffer notifications
> > before enabling them again.
> > There is a race condition that used ring updates are occurring from
> > the device (without notification) and driver updates the value.
> > For example,
> > 1. driver posted descriptors from index 0 to 9.
> > 2. Used buffer notification arrived for 0 to 3.
> > 3. Driver processes used buffer ring from 0 to 3 and enables used
> > buffer notification for index 4.
> > 4. While #3 is happening, the device already updated the used buffer entries.
> > 5. So device view of used ring entry is messed up as it already passed
> > that point.
> >
> > Hence a better driver used buffer notification should be,
> >
> > 1. index upto which it has processed
> > 2. explicit enable/disable bit to enable the notification
> >
> > These two fields clearly communicate to the device, upto how much used
> > buffer is consumed and understand the driver consumption rate.
> > This way, there is no ambiguity for device on where to write next used
> > ring entry.
> > (where to write next used entry is not driven by the notification
> enablement).
> >
> > And the explicit enable bit indicates notifications are enabled back.
> >
> > Technically, only enable bit is needed, but doing #1 is helpful for
> > the device to understand driver consumption pace.
> Thanks for this valuable suggestion, which clarifies driver and device's behavior
> clearly.
>
> > And for that may be current notification can be made 64-bits or as
> > suggested in your patch to have new location.
> After some thoughts, I decide to just make current notification extended to
> 64-bits for simplicity.
I think having 2 32-bit notification registers is better.
Existing one for avail desc posting.
New 32-bit one for used buffer consumption and re-enablement of notification.
This way we get good backward compatibility and also the optional behavior.
3rd advantage of keeping them as two is when used buffer notification enablement is done, it does not need to touch the device for avail posting.
In some systems, a 64-bit single PCIe TLP write may not be possible, through rare, it may be there.
So having it as separate 32-bit used buffer notification is good.
>
> > The wording "disable later" is confusing, does not read proper.
> >
> > May be to reword it as,
> >
> > This feature indicates that the device will disable used buffer
> > notification automatically after sending one used buffer notification.
> OK, thanks.
>
> > > +
> > > + le16 queue_intr_notify_off; /* read-only for driver */
> > > };
> > I think it should be renamed to queue_used_buf_notify_off; This is
> > because notifications are slightly worded differently than interrupts.
> Since it may be simple to just extend current notification data to 64-bits,
> there's no needs to add this field.
>
> > > + which this virtqueue is located.
> > > + \begin{note} this is \em{not} an offset in bytes.
> > > + See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus
> > > + / PCI
> > Device
> > > Layout / Notification capability} below.
> > > + \end{note}
> > > +
> > This is strange. The notification is in the PCI BAR, right?
> > it can be done simple as just the BAR offset.
> See below comment.
>
> >
> > >
> > > + queue_intr_notify_off. Only if VIRTIO_F_USED_EVENT_AUTO_DISABLE is
> > > + negotiated. */
> > > };
> > > \end{lstlisting}
> > >
> > I would really like to avoid extending this structure which is not in
> > the PCI extended capability.
> > And we are almost full with 256B limit.
> > It is the time to have the PCI extended cap defined and place this field there.
> Yes, I understand your concerns. Initially I intended to add new PCI capability
> to describe this new notification area, but found that this new capability
> would be similar to virtio_pci_notify_cap, and will introduce many duplicate
> descriptions in virtio spec doc. If we choose to extent current notification area,
> there're no needs to touch current pci capability, the corresponding queue
> notify address to enable used buffer notification within a BAR for a virtqueue
> would look like below:
> cap.offset + queue_notify_off * notify_off_multiplier + 4
>
Just to double check,
If avail buffer notification is at offset = X
Used buffer notification is at offset X + 4.
Right?
> > If this multiplier is not useful, maybe it should be added only if needed.
> > Did you add it because to match the current flexibility of the spec,
> > or your hw actually needs this multiplier?
> Yes, just to match the current spec ??
>
Ok. that explains.
Yes lets keep things simple as X, and X+4 offsets?
> Regards,
> lege.wang
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-31 6:45 ` Parav Pandit
@ 2024-01-31 7:21 ` Lege Wang
2024-01-31 7:24 ` Parav Pandit
0 siblings, 1 reply; 16+ messages in thread
From: Lege Wang @ 2024-01-31 7:21 UTC (permalink / raw)
To: Parav Pandit, virtio-comment@lists.oasis-open.org
hi,
> >
> > > And for that may be current notification can be made 64-bits or as
> > > suggested in your patch to have new location.
> > After some thoughts, I decide to just make current notification extended to
> > 64-bits for simplicity.
> I think having 2 32-bit notification registers is better.
> Existing one for avail desc posting.
> New 32-bit one for used buffer consumption and re-enablement of
> notification.
> This way we get good backward compatibility and also the optional behavior.
> 3rd advantage of keeping them as two is when used buffer notification
> enablement is done, it does not need to touch the device for avail posting.
Yes, indeed this is what I have in mind, I did not clarify it clearly.
These two 32-bits registers are adjacent, and both are 4-byte aligned.
>
> In some systems, a 64-bit single PCIe TLP write may not be possible, through
> rare, it may be there.
> So having it as separate 32-bit used buffer notification is good.
OK, I see.
>
> > Yes, I understand your concerns. Initially I intended to add new PCI capability
> > to describe this new notification area, but found that this new capability
> > would be similar to virtio_pci_notify_cap, and will introduce many duplicate
> > descriptions in virtio spec doc. If we choose to extent current notification
> area,
> > there're no needs to touch current pci capability, the corresponding queue
> > notify address to enable used buffer notification within a BAR for a virtqueue
> > would look like below:
> > cap.offset + queue_notify_off * notify_off_multiplier + 4
> >
> Just to double check,
>
> If avail buffer notification is at offset = X
> Used buffer notification is at offset X + 4.
> Right?
Yes.
>
> > > If this multiplier is not useful, maybe it should be added only if needed.
> > > Did you add it because to match the current flexibility of the spec,
> > > or your hw actually needs this multiplier?
> > Yes, just to match the current spec ??
> >
> Ok. that explains.
> Yes lets keep things simple as X, and X+4 offsets?
Yes, I think so.
Regards.
lege.wang
>
> > Regards,
> > lege.wang
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-31 7:21 ` Lege Wang
@ 2024-01-31 7:24 ` Parav Pandit
0 siblings, 0 replies; 16+ messages in thread
From: Parav Pandit @ 2024-01-31 7:24 UTC (permalink / raw)
To: Lege Wang, virtio-comment@lists.oasis-open.org
> From: Lege Wang <lege.wang@jaguarmicro.com>
> Sent: Wednesday, January 31, 2024 12:51 PM
> To: Parav Pandit <parav@nvidia.com>; virtio-comment@lists.oasis-open.org
> Subject: RE: [virtio-comment] [RFC PATCH]
> VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification
> suppression mechanism
>
> hi,
>
>
> > >
> > > > And for that may be current notification can be made 64-bits or as
> > > > suggested in your patch to have new location.
> > > After some thoughts, I decide to just make current notification
> > > extended to 64-bits for simplicity.
> > I think having 2 32-bit notification registers is better.
> > Existing one for avail desc posting.
> > New 32-bit one for used buffer consumption and re-enablement of
> > notification.
> > This way we get good backward compatibility and also the optional behavior.
> > 3rd advantage of keeping them as two is when used buffer notification
> > enablement is done, it does not need to touch the device for avail posting.
> Yes, indeed this is what I have in mind, I did not clarify it clearly.
> These two 32-bits registers are adjacent, and both are 4-byte aligned.
>
> >
> > In some systems, a 64-bit single PCIe TLP write may not be possible,
> > through rare, it may be there.
> > So having it as separate 32-bit used buffer notification is good.
> OK, I see.
>
> >
> > > Yes, I understand your concerns. Initially I intended to add new PCI
> > > capability to describe this new notification area, but found that
> > > this new capability would be similar to virtio_pci_notify_cap, and
> > > will introduce many duplicate descriptions in virtio spec doc. If we
> > > choose to extent current notification
> > area,
> > > there're no needs to touch current pci capability, the corresponding
> > > queue notify address to enable used buffer notification within a BAR
> > > for a virtqueue would look like below:
> > > cap.offset + queue_notify_off * notify_off_multiplier + 4
> > >
> > Just to double check,
> >
> > If avail buffer notification is at offset = X Used buffer notification
> > is at offset X + 4.
> > Right?
> Yes.
>
> >
> > > > If this multiplier is not useful, maybe it should be added only if needed.
> > > > Did you add it because to match the current flexibility of the
> > > > spec, or your hw actually needs this multiplier?
> > > Yes, just to match the current spec ??
> > >
> > Ok. that explains.
> > Yes lets keep things simple as X, and X+4 offsets?
> Yes, I think so.
>
Ok. Look forward to v1 with these small changes.
Thanks.
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-30 6:52 ` Parav Pandit
2024-01-31 6:33 ` Lege Wang
@ 2024-01-31 8:39 ` Lege Wang
2024-02-01 9:44 ` Lege Wang
2024-02-05 7:40 ` Parav Pandit
1 sibling, 2 replies; 16+ messages in thread
From: Lege Wang @ 2024-01-31 8:39 UTC (permalink / raw)
To: Parav Pandit, virtio-comment@lists.oasis-open.org
hi,
I'd like to have some more discussions about used buffer notification mechanism, thanks.
> > +\item [next_wrap] Wrap Counter.
> > + With VIRTIO_F_RING_PACKED this is the wrap counter
> > + referring to the next used descriptor.
> > + Without VIRTIO_F_RING_PACKED this is the most significant bit
> > + (bit 15) of the used index.
> > +\end{description}
> > +
> This forces the driver to process _all_ used buffer notifications before enabling
> them again.
> There is a race condition that used ring updates are occurring from the device
> (without notification) and driver updates the value.
> For example,
> 1. driver posted descriptors from index 0 to 9.
> 2. Used buffer notification arrived for 0 to 3.
> 3. Driver processes used buffer ring from 0 to 3 and enables used buffer
> notification for index 4.
> 4. While #3 is happening, the device already updated the used buffer entries.
> 5. So device view of used ring entry is messed up as it already passed that point.
>
> Hence a better driver used buffer notification should be,
>
> 1. index upto which it has processed
I would change it to " index upto which it will process next", the reason is that
virtio core driver codes have already a well-maintained "struct vring_virtqueue->last_used_idx",
we can use it for notification, and using this value, the device will also be able to learn
the driver consumption rate.
> 2. explicit enable/disable bit to enable the notification
I wonder if there's a case that the driver just notifies the used index info, but not enable
the device notification, the driver will always need one enable operation finally. And the notification
data would be similar to:
le32 {
union {
vq_index: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA not negotiated */
vq_notif_config_data: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA negotiated */
};
next_off : 15;
next_wrap : 1;
};
next_off and next_wrap will denote last_used_idx info, and vq_index or vq_notif_config_data is
also necessary, so for one 32-bit register, there's no room for this enable/disable bit.
Regards,
lege.wang
>
> These two fields clearly communicate to the device, upto how much used buffer
> is consumed and understand the driver consumption rate.
> This way, there is no ambiguity for device on where to write next used ring
> entry.
> (where to write next used entry is not driven by the notification enablement).
>
> And the explicit enable bit indicates notifications are enabled back.
>
> Technically, only enable bit is needed, but doing #1 is helpful for the device to
> understand driver consumption pace.
> And for that may be current notification can be made 64-bits or as suggested in
> your patch to have new location.
>
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-31 8:39 ` Lege Wang
@ 2024-02-01 9:44 ` Lege Wang
2024-02-05 7:40 ` Parav Pandit
1 sibling, 0 replies; 16+ messages in thread
From: Lege Wang @ 2024-02-01 9:44 UTC (permalink / raw)
To: Parav Pandit, virtio-comment@lists.oasis-open.org
hi,
A friendly reminder in case you missed this email
Regards,
lege.wang
> >
> > Hence a better driver used buffer notification should be,
> >
> > 1. index upto which it has processed
> I would change it to " index upto which it will process next", the reason is that
> virtio core driver codes have already a well-maintained "struct
> vring_virtqueue->last_used_idx",
> we can use it for notification, and using this value, the device will also be able
> to learn
> the driver consumption rate.
>
> > 2. explicit enable/disable bit to enable the notification
> I wonder if there's a case that the driver just notifies the used index info, but
> not enable
> the device notification, the driver will always need one enable operation finally.
> And the notification
> data would be similar to:
> le32 {
> union {
> vq_index: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA not
> negotiated */
> vq_notif_config_data: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA
> negotiated */
> };
> next_off : 15;
> next_wrap : 1;
> };
> next_off and next_wrap will denote last_used_idx info, and vq_index or
> vq_notif_config_data is
> also necessary, so for one 32-bit register, there's no room for this
> enable/disable bit.
>
>
> Regards,
> lege.wang
> >
> > These two fields clearly communicate to the device, upto how much used
> buffer
> > is consumed and understand the driver consumption rate.
> > This way, there is no ambiguity for device on where to write next used ring
> > entry.
> > (where to write next used entry is not driven by the notification enablement).
> >
> > And the explicit enable bit indicates notifications are enabled back.
> >
> > Technically, only enable bit is needed, but doing #1 is helpful for the device to
> > understand driver consumption pace.
> > And for that may be current notification can be made 64-bits or as suggested
> in
> > your patch to have new location.
> >
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-01-31 8:39 ` Lege Wang
2024-02-01 9:44 ` Lege Wang
@ 2024-02-05 7:40 ` Parav Pandit
2024-02-22 1:56 ` Lege Wang
1 sibling, 1 reply; 16+ messages in thread
From: Parav Pandit @ 2024-02-05 7:40 UTC (permalink / raw)
To: Lege Wang, virtio-comment@lists.oasis-open.org
> From: Lege Wang <lege.wang@jaguarmicro.com>
> Sent: Wednesday, January 31, 2024 2:10 PM
>
> hi,
>
> I'd like to have some more discussions about used buffer notification
> mechanism, thanks.
> > > +\item [next_wrap] Wrap Counter.
> > > + With VIRTIO_F_RING_PACKED this is the wrap counter
> > > + referring to the next used descriptor.
> > > + Without VIRTIO_F_RING_PACKED this is the most significant bit
> > > + (bit 15) of the used index.
> > > +\end{description}
> > > +
> > This forces the driver to process _all_ used buffer notifications
> > before enabling them again.
> > There is a race condition that used ring updates are occurring from
> > the device (without notification) and driver updates the value.
> > For example,
> > 1. driver posted descriptors from index 0 to 9.
> > 2. Used buffer notification arrived for 0 to 3.
> > 3. Driver processes used buffer ring from 0 to 3 and enables used
> > buffer notification for index 4.
> > 4. While #3 is happening, the device already updated the used buffer entries.
> > 5. So device view of used ring entry is messed up as it already passed that
> point.
> >
> > Hence a better driver used buffer notification should be,
> >
> > 1. index upto which it has processed
> I would change it to " index upto which it will process next", the reason is that
> virtio core driver codes have already a well-maintained "struct
> vring_virtqueue->last_used_idx", we can use it for notification, and using this
> value, the device will also be able to learn the driver consumption rate.
>
It the index upto which the driver _has_ already processed, right?
> > 2. explicit enable/disable bit to enable the notification
> I wonder if there's a case that the driver just notifies the used index info, but
> not enable the device notification,
Yes. there is a scenario.
For example, device has received 28 packets.
When the driver is polling, it was given budget to poll only 20 packets. In this case NAPI context will come back again to poll rest of the 8 packets.
In this scenario, it is desired to still update the polled completions (used elements), but not enable the notification.
Notification can be enabled later when remaining 8 or more packets are processed.
> the driver will always need one enable
> operation finally. And the notification data would be similar to:
> le32 {
> union {
> vq_index: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA not
> negotiated */
> vq_notif_config_data: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA
> negotiated */
> };
> next_off : 15;
> next_wrap : 1;
> };
> next_off and next_wrap will denote last_used_idx info, and vq_index or
> vq_notif_config_data is also necessary, so for one 32-bit register, there's no
> room for this enable/disable bit.
>
Right. There isn't enough room for enable bit.
I see two options.
1. When device wants to still work with 32-bit doorbells, it can reserve top bit of vq_index or next_off for enable/disable.
Here the restriction is the queue depth would be limited to 16K descriptors, which is fairly large enough.
Or having top bix of vq_index is better as, it may be well within limit for a device to offer 32K queues.
For non rdma devices 32K queues is also seems reasonable large value.
2. Device can offer 64-bit format where enable bit is located in the upper 32-bit.
3. Or may be option for the device with two different feature bits, because 64-bit doorbells are relatively new for virtio to adapt to.
>
> Regards,
> lege.wang
> >
> > These two fields clearly communicate to the device, upto how much used
> > buffer is consumed and understand the driver consumption rate.
> > This way, there is no ambiguity for device on where to write next used
> > ring entry.
> > (where to write next used entry is not driven by the notification
> enablement).
> >
> > And the explicit enable bit indicates notifications are enabled back.
> >
> > Technically, only enable bit is needed, but doing #1 is helpful for
> > the device to understand driver consumption pace.
> > And for that may be current notification can be made 64-bits or as
> > suggested in your patch to have new location.
> >
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread* RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism
2024-02-05 7:40 ` Parav Pandit
@ 2024-02-22 1:56 ` Lege Wang
0 siblings, 0 replies; 16+ messages in thread
From: Lege Wang @ 2024-02-22 1:56 UTC (permalink / raw)
To: Parav Pandit, virtio-comment@lists.oasis-open.org
hi,
I'm back from a long vacation, and have some internal work to do, sorry for
long late response.
> > >
> > > Hence a better driver used buffer notification should be,
> > >
> > > 1. index upto which it has processed
> > I would change it to " index upto which it will process next", the reason is
> that
> > virtio core driver codes have already a well-maintained "struct
> > vring_virtqueue->last_used_idx", we can use it for notification, and using this
> > value, the device will also be able to learn the driver consumption rate.
> >
> It the index upto which the driver _has_ already processed, right?
Yes, and your previous comment is right, ignore my comment above.
>
> > > 2. explicit enable/disable bit to enable the notification
> > I wonder if there's a case that the driver just notifies the used index info, but
> > not enable the device notification,
> Yes. there is a scenario.
> For example, device has received 28 packets.
> When the driver is polling, it was given budget to poll only 20 packets. In this
> case NAPI context will come back again to poll rest of the 8 packets.
> In this scenario, it is desired to still update the polled completions (used
> elements), but not enable the notification.
> Notification can be enabled later when remaining 8 or more packets are
> processed.
OK, I see.
>
> > the driver will always need one enable
> > operation finally. And the notification data would be similar to:
> > le32 {
> > union {
> > vq_index: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA
> not
> > negotiated */
> > vq_notif_config_data: 16; /* Used if VIRTIO_F_NOTIF_CONFIG_DATA
> > negotiated */
> > };
> > next_off : 15;
> > next_wrap : 1;
> > };
> > next_off and next_wrap will denote last_used_idx info, and vq_index or
> > vq_notif_config_data is also necessary, so for one 32-bit register, there's no
> > room for this enable/disable bit.
> >
> Right. There isn't enough room for enable bit.
> I see two options.
>
> 1. When device wants to still work with 32-bit doorbells, it can reserve top bit
> of vq_index or next_off for enable/disable.
> Here the restriction is the queue depth would be limited to 16K descriptors,
> which is fairly large enough.
> Or having top bix of vq_index is better as, it may be well within limit for a
> device to offer 32K queues.
> For non rdma devices 32K queues is also seems reasonable large value.
I'd like to not add these restrictions, in case some devices need bigger number of
descriptors or queue depth.
>
> 2. Device can offer 64-bit format where enable bit is located in the upper
> 32-bit.
>
> 3. Or may be option for the device with two different feature bits, because
> 64-bit doorbells are relatively new for virtio to adapt to.
Sounds feasible, I'll try this way.
I'll prepare a V1 patch as soon as possible, thanks for your patience.
Regards,
Lege.wang
>
> >
> > Regards,
> > lege.wang
> > >
> > > These two fields clearly communicate to the device, upto how much used
> > > buffer is consumed and understand the driver consumption rate.
> > > This way, there is no ambiguity for device on where to write next used
> > > ring entry.
> > > (where to write next used entry is not driven by the notification
> > enablement).
> > >
> > > And the explicit enable bit indicates notifications are enabled back.
> > >
> > > Technically, only enable bit is needed, but doing #1 is helpful for
> > > the device to understand driver consumption pace.
> > > And for that may be current notification can be made 64-bits or as
> > > suggested in your patch to have new location.
> > >
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 16+ messages in thread