All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Add VIRTIO_RING_F_LARGE_INDIRECT_DESC
@ 2021-11-19 13:21 Christian Schoenebeck
  2021-11-24 17:14 ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Schoenebeck @ 2021-11-19 13:21 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, Greg Kurz, Cornelia Huck

This new feature flag allows indirect descriptor tables to
exceed the queue size.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
Stefan noted that there should also be a numeric configuration field
reflecting a precise limit of indirect descriptors. The question is where
should that go to exactly? Some devices currently handle this in their
device specific configuration space. Wouldn't it make sense to handle that
in the common configuration space instead?
---
 content.tex    | 21 +++++++++++++++++++++
 split-ring.tex |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/content.tex b/content.tex
index 5d112af..b42a26b 100644
--- a/content.tex
+++ b/content.tex
@@ -6693,6 +6693,27 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   transport specific.
   For more details about driver notifications over PCI see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}.
 
+  \item[VIRTIO_RING_F_LARGE_INDIRECT_DESC(40)] This feature indicates that the
+  amount of descriptors in an indirect descriptor table is allowed to exceed
+  the Queue Size.
+
+  If this feature bit is not negotiated, then a driver MUST NOT create a
+  descriptor chain longer than the Queue Size, which then also applies to
+  indirect descriptor tables as in \ref{sec:Basic Facilities of a Virtio
+  Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
+  Descriptors}. Which means without this feature, the Queue Size limits
+  both the maximum amount of pending messages being emplaced in the vring,
+  as well as the actual bulk data size being transmitted.
+
+  With this feature enabled, the Queue Size only limits the maximum amount
+  of pending messages in the vring, but does not oppose a limit to the actual
+  bulk data size being transmitted. Decoupling these two configuration
+  parameters this way not only allows much larger bulk data being transferred
+  per message, but also avoids complicated synchronization mechanisms if
+  device only supports a very small amount of pending/active messages.
+  Due to the 16-bit size of a descriptor's "next" field there is still an
+  absolute limit of $2^{16}$ descriptors per indirect descriptor table.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
diff --git a/split-ring.tex b/split-ring.tex
index bfef62d..5284635 100644
--- a/split-ring.tex
+++ b/split-ring.tex
@@ -269,7 +269,7 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
 one table per descriptor).
 
 A driver MUST NOT create a descriptor chain longer than the Queue Size of
-the device.
+the device unless VIRTIO_RING_F_LARGE_INDIRECT_DESC has been negotiated.
 
 A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
 in \field{flags}.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Add VIRTIO_RING_F_LARGE_INDIRECT_DESC
  2021-11-19 13:21 [PATCH] Add VIRTIO_RING_F_LARGE_INDIRECT_DESC Christian Schoenebeck
@ 2021-11-24 17:14 ` Stefan Hajnoczi
  2021-11-25 13:24   ` [virtio-comment] " Christian Schoenebeck
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2021-11-24 17:14 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: virtio-comment, Greg Kurz, Cornelia Huck

[-- Attachment #1: Type: text/plain, Size: 3623 bytes --]

On Fri, Nov 19, 2021 at 02:21:12PM +0100, Christian Schoenebeck wrote:
> This new feature flag allows indirect descriptor tables to
> exceed the queue size.
> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---
> Stefan noted that there should also be a numeric configuration field
> reflecting a precise limit of indirect descriptors. The question is where
> should that go to exactly? Some devices currently handle this in their
> device specific configuration space. Wouldn't it make sense to handle that
> in the common configuration space instead?

It could be added as a read-only struct virtio_pci_common_cfg le16
queue_indirect_size field. The same needs to be done for the other
transports.

> ---
>  content.tex    | 21 +++++++++++++++++++++
>  split-ring.tex |  2 +-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/content.tex b/content.tex
> index 5d112af..b42a26b 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -6693,6 +6693,27 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    transport specific.
>    For more details about driver notifications over PCI see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Available Buffer Notifications}.
>  
> +  \item[VIRTIO_RING_F_LARGE_INDIRECT_DESC(40)] This feature indicates that the
> +  amount of descriptors in an indirect descriptor table is allowed to exceed
> +  the Queue Size.
> +
> +  If this feature bit is not negotiated, then a driver MUST NOT create a
> +  descriptor chain longer than the Queue Size, which then also applies to
> +  indirect descriptor tables as in \ref{sec:Basic Facilities of a Virtio
> +  Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> +  Descriptors}. Which means without this feature, the Queue Size limits
> +  both the maximum amount of pending messages being emplaced in the vring,

The word "buffers" is typically used instead of "messages".

> +  as well as the actual bulk data size being transmitted.
> +
> +  With this feature enabled, the Queue Size only limits the maximum amount
> +  of pending messages in the vring, but does not oppose a limit to the actual
> +  bulk data size being transmitted. Decoupling these two configuration

I think we're still limited by Queue Size in the virtqueue descriptor
table so this statement only applies when indirect descriptors are used:

  "... build data size being transmitted when indirect descriptors are used."

> +  parameters this way not only allows much larger bulk data being transferred
> +  per message, but also avoids complicated synchronization mechanisms if
> +  device only supports a very small amount of pending/active messages.
> +  Due to the 16-bit size of a descriptor's "next" field there is still an
> +  absolute limit of $2^{16}$ descriptors per indirect descriptor table.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> diff --git a/split-ring.tex b/split-ring.tex
> index bfef62d..5284635 100644
> --- a/split-ring.tex
> +++ b/split-ring.tex
> @@ -269,7 +269,7 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
>  one table per descriptor).
>  
>  A driver MUST NOT create a descriptor chain longer than the Queue Size of
> -the device.
> +the device unless VIRTIO_RING_F_LARGE_INDIRECT_DESC has been negotiated.
>  
>  A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
>  in \field{flags}.
> -- 
> 2.20.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [virtio-comment] Re: [PATCH] Add VIRTIO_RING_F_LARGE_INDIRECT_DESC
  2021-11-24 17:14 ` Stefan Hajnoczi
@ 2021-11-25 13:24   ` Christian Schoenebeck
  2021-11-25 14:58     ` Cornelia Huck
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Schoenebeck @ 2021-11-25 13:24 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, Greg Kurz, Cornelia Huck

On Mittwoch, 24. November 2021 18:14:59 CET Stefan Hajnoczi wrote:
> On Fri, Nov 19, 2021 at 02:21:12PM +0100, Christian Schoenebeck wrote:
> > This new feature flag allows indirect descriptor tables to
> > exceed the queue size.
> > 
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > ---
> > Stefan noted that there should also be a numeric configuration field
> > reflecting a precise limit of indirect descriptors. The question is where
> > should that go to exactly? Some devices currently handle this in their
> > device specific configuration space. Wouldn't it make sense to handle that
> > in the common configuration space instead?
> 
> It could be added as a read-only struct virtio_pci_common_cfg le16
> queue_indirect_size field. The same needs to be done for the other
> transports.

OK, do I have to make it clear that it is an optional field in 
virtio_pci_common_cfg?

Are you sure about read-only? QEMU currently reserves the worst case expected 
amount of descriptors on stack on every vring entry being processed. If that 
field was read-write instead, and if guest driver never uses the amount of 
descriptors as advertised by host device being support, guest driver could 
simply lower that value and reduce the pressure on host device.

So maybe it would make sense making it read-write with the requirement that 
guest driver must not increase the value? In the end that write option would 
be a soft feature, device could still ignore it and allocate more, it wouldn't 
hurt IMO and avoids another feature flag being required for this in future.

> > ---
> > 
> >  content.tex    | 21 +++++++++++++++++++++
> >  split-ring.tex |  2 +-
> >  2 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/content.tex b/content.tex
> > index 5d112af..b42a26b 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -6693,6 +6693,27 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved
> > Feature Bits}> 
> >    transport specific.
> >    For more details about driver notifications over PCI see
> >    \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific
> >    Initialization And Device Operation / Available Buffer Notifications}.> 
> > +  \item[VIRTIO_RING_F_LARGE_INDIRECT_DESC(40)] This feature indicates
> > that the +  amount of descriptors in an indirect descriptor table is
> > allowed to exceed +  the Queue Size.
> > +
> > +  If this feature bit is not negotiated, then a driver MUST NOT create a
> > +  descriptor chain longer than the Queue Size, which then also applies to
> > +  indirect descriptor tables as in \ref{sec:Basic Facilities of a Virtio
> > +  Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> > +  Descriptors}. Which means without this feature, the Queue Size limits
> > +  both the maximum amount of pending messages being emplaced in the
> > vring,
> 
> The word "buffers" is typically used instead of "messages".

If that's the common terminology in the virtio spec, OK, I'll change that.

> > +  as well as the actual bulk data size being transmitted.
> > +
> > +  With this feature enabled, the Queue Size only limits the maximum
> > amount
> > +  of pending messages in the vring, but does not oppose a limit to the
> > actual +  bulk data size being transmitted. Decoupling these two
> > configuration
> I think we're still limited by Queue Size in the virtqueue descriptor
> table so this statement only applies when indirect descriptors are used:
> 
>   "... build data size being transmitted when indirect descriptors are
> used."

I thought this was already implied in this block, but sure it makes it more 
clear, so fine with me.

> > +  parameters this way not only allows much larger bulk data being
> > transferred +  per message, but also avoids complicated synchronization
> > mechanisms if +  device only supports a very small amount of
> > pending/active messages. +  Due to the 16-bit size of a descriptor's
> > "next" field there is still an +  absolute limit of $2^{16}$ descriptors
> > per indirect descriptor table. +
> > 
> >  \end{description}
> >  
> >  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > 
> > diff --git a/split-ring.tex b/split-ring.tex
> > index bfef62d..5284635 100644
> > --- a/split-ring.tex
> > +++ b/split-ring.tex
> > @@ -269,7 +269,7 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic
> > Facilities of a Virtio Devi> 
> >  one table per descriptor).
> >  
> >  A driver MUST NOT create a descriptor chain longer than the Queue Size of
> > 
> > -the device.
> > +the device unless VIRTIO_RING_F_LARGE_INDIRECT_DESC has been negotiated.
> > 
> >  A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
> >  in \field{flags}.



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [virtio-comment] Re: [PATCH] Add VIRTIO_RING_F_LARGE_INDIRECT_DESC
  2021-11-25 13:24   ` [virtio-comment] " Christian Schoenebeck
@ 2021-11-25 14:58     ` Cornelia Huck
  2021-11-29 10:56       ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Cornelia Huck @ 2021-11-25 14:58 UTC (permalink / raw)
  To: Christian Schoenebeck, Stefan Hajnoczi; +Cc: virtio-comment, Greg Kurz

On Thu, Nov 25 2021, Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Mittwoch, 24. November 2021 18:14:59 CET Stefan Hajnoczi wrote:
>> On Fri, Nov 19, 2021 at 02:21:12PM +0100, Christian Schoenebeck wrote:
>> > This new feature flag allows indirect descriptor tables to
>> > exceed the queue size.
>> > 
>> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
>> > ---
>> > Stefan noted that there should also be a numeric configuration field
>> > reflecting a precise limit of indirect descriptors. The question is where
>> > should that go to exactly? Some devices currently handle this in their
>> > device specific configuration space. Wouldn't it make sense to handle that
>> > in the common configuration space instead?
>> 
>> It could be added as a read-only struct virtio_pci_common_cfg le16
>> queue_indirect_size field. The same needs to be done for the other
>> transports.
>
> OK, do I have to make it clear that it is an optional field in 
> virtio_pci_common_cfg?

It would need to be something like "this field only exists if
VIRTIO_RING_F_LARGE_INDIRECT_DESC has been negotiated".

I'm not quite sure how MMIO expresses optional registers.

For CCW, it's a bit more complicated: You would need to extend two
command payloads (vq_config_block and vq_info_block, assuming you want
to make this read/write); to do that, we will likely want to introduce
revision 3 (and make the feature bit dependent on revision 3).

>
> Are you sure about read-only? QEMU currently reserves the worst case expected 
> amount of descriptors on stack on every vring entry being processed. If that 
> field was read-write instead, and if guest driver never uses the amount of 
> descriptors as advertised by host device being support, guest driver could 
> simply lower that value and reduce the pressure on host device.
>
> So maybe it would make sense making it read-write with the requirement that 
> guest driver must not increase the value? In the end that write option would 
> be a soft feature, device could still ignore it and allocate more, it wouldn't 
> hurt IMO and avoids another feature flag being required for this in future.

I agree, this should be read/write. It's similar to the queue size,
where the driver may also configure a lower number.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [virtio-comment] Re: [PATCH] Add VIRTIO_RING_F_LARGE_INDIRECT_DESC
  2021-11-25 14:58     ` Cornelia Huck
@ 2021-11-29 10:56       ` Stefan Hajnoczi
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2021-11-29 10:56 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Christian Schoenebeck, virtio-comment, Greg Kurz

[-- Attachment #1: Type: text/plain, Size: 2509 bytes --]

On Thu, Nov 25, 2021 at 03:58:32PM +0100, Cornelia Huck wrote:
> On Thu, Nov 25 2021, Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> 
> > On Mittwoch, 24. November 2021 18:14:59 CET Stefan Hajnoczi wrote:
> >> On Fri, Nov 19, 2021 at 02:21:12PM +0100, Christian Schoenebeck wrote:
> >> > This new feature flag allows indirect descriptor tables to
> >> > exceed the queue size.
> >> > 
> >> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> >> > ---
> >> > Stefan noted that there should also be a numeric configuration field
> >> > reflecting a precise limit of indirect descriptors. The question is where
> >> > should that go to exactly? Some devices currently handle this in their
> >> > device specific configuration space. Wouldn't it make sense to handle that
> >> > in the common configuration space instead?
> >> 
> >> It could be added as a read-only struct virtio_pci_common_cfg le16
> >> queue_indirect_size field. The same needs to be done for the other
> >> transports.
> >
> > OK, do I have to make it clear that it is an optional field in 
> > virtio_pci_common_cfg?
> 
> It would need to be something like "this field only exists if
> VIRTIO_RING_F_LARGE_INDIRECT_DESC has been negotiated".
> 
> I'm not quite sure how MMIO expresses optional registers.
> 
> For CCW, it's a bit more complicated: You would need to extend two
> command payloads (vq_config_block and vq_info_block, assuming you want
> to make this read/write); to do that, we will likely want to introduce
> revision 3 (and make the feature bit dependent on revision 3).
> 
> >
> > Are you sure about read-only? QEMU currently reserves the worst case expected 
> > amount of descriptors on stack on every vring entry being processed. If that 
> > field was read-write instead, and if guest driver never uses the amount of 
> > descriptors as advertised by host device being support, guest driver could 
> > simply lower that value and reduce the pressure on host device.
> >
> > So maybe it would make sense making it read-write with the requirement that 
> > guest driver must not increase the value? In the end that write option would 
> > be a soft feature, device could still ignore it and allocate more, it wouldn't 
> > hurt IMO and avoids another feature flag being required for this in future.
> 
> I agree, this should be read/write. It's similar to the queue size,
> where the driver may also configure a lower number.

Sounds good.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-11-29 10:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-19 13:21 [PATCH] Add VIRTIO_RING_F_LARGE_INDIRECT_DESC Christian Schoenebeck
2021-11-24 17:14 ` Stefan Hajnoczi
2021-11-25 13:24   ` [virtio-comment] " Christian Schoenebeck
2021-11-25 14:58     ` Cornelia Huck
2021-11-29 10:56       ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.