Discussion of the VIRTIO specification
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Christian Schoenebeck <qemu_oss@crudebyte.com>
Cc: virtio-comment@lists.oasis-open.org,
	Cornelia Huck <cohuck@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>, Greg Kurz <groug@kaod.org>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [virtio-comment] [PATCH v3 1/4] Add VIRTIO_RING_F_INDIRECT_SIZE
Date: Wed, 23 Mar 2022 08:35:00 -0400	[thread overview]
Message-ID: <20220323081402-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <2085227.9aJWpeLndy@silver>

On Wed, Mar 23, 2022 at 11:20:07AM +0100, Christian Schoenebeck wrote:
> On Montag, 21. März 2022 23:13:55 CET Michael S. Tsirkin wrote:
> > On Mon, Mar 21, 2022 at 10:23:07AM +0100, Christian Schoenebeck wrote:
> > > On Sonntag, 20. März 2022 22:52:16 CET Michael S. Tsirkin wrote:
> > > > On Sun, Mar 20, 2022 at 06:43:53PM +0100, Christian Schoenebeck wrote:
> > > > > To be honest, I don't feel like discussing precise wordings at this
> > > > > point
> > > > > when you are questioning the proposal on design level already.
> > > > > 
> > > > > Maybe you make some more thorough thoughts on what you actually want
> > > > > this
> > > > > to be on design level before continueing to argue about precise
> > > > > terminology, which you are not using either BTW when you articulating
> > > > > your criticism.
> > > > > 
> > > > > Or even better: come up with your own proposol with the precise
> > > > > wording
> > > > > you
> > > > > feel appropriate.
> > > > 
> > > > OK let's go back and agree on what we are trying to achieve.  The github
> > > > issue and the cover letter imply that while indirect descriptors would
> > > > normally allow huge tables, we artificially limit them to queue size,
> > > > and you want to be able to relax that.
> > > 
> > > Correct, that's my motivation for all of this.
> > 
> > Okay. So I think that given this, we can limit the total number
> > of non-indirect descriptors, including non-indirect ones
> > in a chain + all the ones in indirect pointer table if any,
> > and excluding the indirect descriptor itself, and this
> > will address the issue you are describing here, right?
> 
> As far as I understand your suggestion, yes, it would cover:
> 
> A. My use case [1]: allowing indirect table length > queue size.
> B. Stefan's use case [2]: forcing indirect table length < queue size.
> C. Your use case: forcing chain (within FIFOs) length < queue size.
> 
> [1] https://lore.kernel.org/netdev/cover.1640870037.git.linux_oss@crudebyte.com/
> [2] https://lore.kernel.org/all/YXpmwP6RtvY0BmSM@stefanha-x1.localdomain/
> 
> However it would not cover:
> 
> D. Your other use case: blocking indirect tables.

Well, the practical use-case I have in mind is actually for when using
mergeable buffers with the network device, in practice there's never a
s/g but there's no way for device to know that. So maybe we can declare
that indirect must not be used when there's a single descriptor in the
table (esp when the new feature is accepted), and then just limiting s/g
to 1 will do the trick in this specific case.

> E. Potential other people's need: max. chain length (within FIFOs) != max.
>    indirect table length.

I don't understand this one, sorry.

> It's not clear to me though why you would want to exclude the descriptor
> pointing to a table from counting towards that limit.

Because it's a transient thing. It does not need to be processed
with a packet, it's just a way of supplying a longer s/g.
In other words what devices care about is the s/g list length,
bot how the s/g is specified.

I think what QEMU and Linux both do is very typical:
the indirect descriptor is used to fetch the table,
but then what QEMU cares about is whether it will overflow 1024
entries when preparing the iovec for linux.



> Instead of yet again mixing different limits again with each other, what about
> using 3 distinct fields for those limits instead:
> 
> 1. max. indirect table length (any descriptor in the table)
> 2. max. chain length (currently: descriptors within FIFOs)
> 3. max. total descriptors per buffer (sum of all descriptors, both from FIFOs
>    and indirect tables, including descriptor pointing to a table).

So this doesn't actually address at least what you called "your
use-case" above. Well it almost does, but not exactly because
of the extra indirect descriptor.

It's not that I'm against this kind of thing but personally
I'd like a bit more than a theoretical argument for
why all the extra complexity is helpful - remember that
all that needs to be coded in the driver, and any mistakes
often tend to paint us in the corner because of the need to
support existing drivers.


> As a side note: the spec currently refers to "table of indirect descriptors",
> hence my previous assumption that descriptors in such a table should be called
> "indirect descriptors", whereas you are apparently assuming that an "indirect
> descriptor" is only that single one pointing to a table and you see the
> descriptors in the table as "direct" ones I guess.

Oh. You are right. There's no special name for what I mistakenly
called an indirect descriptor. So what I called
"direct descriptors" are really "read/write descriptors".
It might make sense to add some text calling these something.
Scatter/gather descriptors maybe?

And what I called "indirect descriptor" is really just
descriptor with INDIRECT flag set. I am not sure we need
to call these anything unless we really start counting it.




> > Question:
> > - I am thinking about bi-directional descriptors such as
> >   block and scsi have. it looks like they have a separate limit for
> >   read and write parts of the chain. Should we have two limits
> >   then? Or should we just make driver use the lower of the two,
> >   i.e. the per vq limit applies to the total # of elements
> >   in a buffer, and read/write sgs are controlled separately
> >   by the per device control?
> > 
> > Stefan, any comments?
> 
> Leaving that question to Stefan. No opinion from my side at this point.
> 
> > > > Fair enough.
> > > > 
> > > > However, I feel trying to talk about indirect descriptor is too narrow a
> > > > use-case, simply because the issue is not indirect at all.  Why do we
> > > > limit number of segments? I think it's really because of backend
> > > > limitations. And indirect is only used by the frontend.  So limiting
> > > > that is really going about it wrong.
> > > 
> > > I am only aware about current implementation situation in QEMU and Linux
> > > kernel. As for those two: yes, it is not a limitation on Linux kernel
> > > side,
> > > but on QEMU side.
> > > 
> > > As for other implementations: no idea.
> > > 
> > > > So block for example has seg_max already. What should happen
> > > > if that exceeds queue size is not defined.
> > > > 
> > > > So maybe we can generalize that making it device independent?
> > > > The litmus paper for this is the block and scsi devices,
> > > > we should be able to use the new feature as a super-set.
> > > > 
> > > > Before we discuss solutions, did I formulate the problem correctly?
> > > 
> > > Keep in mind that I never worked on virtio code or virtio spec before. I
> > > just started to review virtio implementation of QEMU and Linux kernel and
> > > the virtio spec in November, specifically in context of 9p. I definitely
> > > don't know all the other virtioo device classes out there.
> > 
> > I think it's great that we have someone taking care of 9p btw!
> 
> Thanks!
> 
> > > In other words: I can't help you on fitting this appropriately into a
> > > superset picture.
> > > 
> > > Best regards,
> > > Christian Schoenebeck
> > 
> > I think we need some cleanups in the spec to make what you are trying to
> > do possible to do cleanly, specifically move this description:
> > 
> > 	A buffer consists of zero or more device-readable physically-contiguous
> > 	elements followed by zero or more physically-contiguous
> > 	device-writable elements (each buffer has at least one element).
> > 
> > out to the generic part from packed ring part, drop corresponding
> > text from split ring and make it refer to the term "element".
> > 
> > After this, the new field will just be "a number of elements per
> > buffer" (note that indirect descriptors do not themselves
> > describe elements and so won't be included in the math).
> > 
> > Christian, you mentioned you don't like the term buffer generally,
> > changing that can be done before or after this feature but IMHO
> > best not as part of it.
> 
> For pragmatic reasons I will refrain from questioning any virtio terms in
> foreseeable future and will just use the ones suggested by people.

It's not that the terminology is so great, but we do have used it for a
while and by now it's quite entrenched.

> > I think it's good in that it will fit better in the superset picture
> > addressing in addition to your requirement also the requirement to have
> > huge rings while limiting descriptors.
> > 
> > Does above sound like it addresses your requirements of having
> > a longer descriptor chain than queue size? If not what is not
> > addressed?
> > 
> > Thanks,
> 


  reply	other threads:[~2022-03-23 12:35 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-16 13:44 [PATCH v3 0/4] Add VIRTIO_RING_F_INDIRECT_SIZE and queue_indirect_size Christian Schoenebeck
2022-03-16 13:47 ` [PATCH v3 1/4] Add VIRTIO_RING_F_INDIRECT_SIZE Christian Schoenebeck
2022-03-17 13:40   ` [virtio-comment] " Cornelia Huck
2022-03-18 10:45     ` Christian Schoenebeck
2022-03-18 16:03       ` [virtio-comment] " Cornelia Huck
2022-03-19  9:33   ` [virtio-comment] " Michael S. Tsirkin
2022-03-19 12:00     ` Christian Schoenebeck
2022-03-20 12:31       ` Michael S. Tsirkin
2022-03-20 13:32         ` Christian Schoenebeck
2022-03-20 13:55           ` Michael S. Tsirkin
2022-03-20 15:17             ` Christian Schoenebeck
2022-03-20 16:06               ` Michael S. Tsirkin
2022-03-20 16:07                 ` Michael S. Tsirkin
2022-03-20 17:43                 ` Christian Schoenebeck
2022-03-20 21:52                   ` Michael S. Tsirkin
2022-03-21  9:23                     ` Christian Schoenebeck
2022-03-21 22:13                       ` Michael S. Tsirkin
2022-03-23 10:20                         ` Christian Schoenebeck
2022-03-23 12:35                           ` Michael S. Tsirkin [this message]
2022-03-24  9:16                             ` Christian Schoenebeck
2022-03-24 10:36                               ` Michael S. Tsirkin
2022-03-24 11:11                                 ` Christian Schoenebeck
2022-03-24 11:16                                   ` Michael S. Tsirkin
2022-03-24 11:52                                     ` Christian Schoenebeck
2022-03-16 13:50 ` [PATCH v3 2/4] Add PCI configuration field "queue_indirect_size" Christian Schoenebeck
2022-03-16 14:41   ` [virtio-comment] " Christian Schoenebeck
2022-03-16 13:52 ` [PATCH v3 3/4] Add MMIO configuration register "QueueIndirectNum" Christian Schoenebeck
2022-03-16 13:55 ` [PATCH v3 4/4] Add CCW configuration field "indirect_num" Christian Schoenebeck
2022-03-17 14:12   ` [virtio-comment] " Cornelia Huck
2022-03-18 11:02     ` Christian Schoenebeck
2022-03-18 16:06       ` Halil Pasic
2022-03-19 10:12         ` Christian Schoenebeck
2022-03-21 16:36           ` Cornelia Huck
2022-03-22  1:56             ` Halil Pasic
2022-03-22  9:57               ` Cornelia Huck
2022-03-22 11:21                 ` Halil Pasic
2022-03-18 16:10       ` Cornelia Huck
2022-03-19 10:23         ` Christian Schoenebeck
2022-03-21 16:25           ` Cornelia Huck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220323081402-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=asmadeus@codewreck.org \
    --cc=cohuck@redhat.com \
    --cc=groug@kaod.org \
    --cc=pasic@linux.ibm.com \
    --cc=qemu_oss@crudebyte.com \
    --cc=stefanha@redhat.com \
    --cc=virtio-comment@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox