From: Lars Ganrot <lga@napatech.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
"virtio-comment@lists.oasis-open.org"
<virtio-comment@lists.oasis-open.org>,
"virtio-dev@lists.oasis-open.org"
<virtio-dev@lists.oasis-open.org>,
"virtio@lists.oasis-open.org" <virtio@lists.oasis-open.org>
Subject: RE: [virtio-comment] Re: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
Date: Mon, 17 Aug 2020 08:11:36 +0000 [thread overview]
Message-ID: <3c6acf2d5d524d17bb14e7d7d1d55f30@napatech.com> (raw)
In-Reply-To: <20200813191623-mutt-send-email-mst@kernel.org>
> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Michael S. Tsirkin
> Sent: 14. august 2020 01:18
>
> On Wed, Aug 12, 2020 at 03:55:18PM +0000, Lars Ganrot wrote:
> > > From: virtio-comment@lists.oasis-open.org
> > > <virtio-comment@lists.oasis- open.org> On Behalf Of Cornelia Huck
> > > Sent: 12. august 2020 14:51
> > >
> > > On Tue, 11 Aug 2020 15:43:44 +0000
> > > Lars Ganrot <lga@napatech.com> wrote:
> > >
> > > > > From: virtio-comment@lists.oasis-open.org
> > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Lars Ganrot
> > > > > Sent: 11. august 2020 16:54
> > > > >
> > > > > > From: virtio-dev@lists.oasis-open.org
> > > > > > <virtio-dev@lists.oasis-open.org> On Behalf Of Michael S.
> > > > > > Tsirkin
> > > > > > Sent: 11. august 2020 10:23
> > > > > >
> > > > > > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > > > > > > On Mon, 10 Aug 2020 12:15:15 -0400 "Michael S. Tsirkin"
> > > > > > > <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > > Devices that normally use buffers in order can benefit
> > > > > > > > from ability to temporarily switch to handle some buffers out of
> order.
> > > > > > > >
> > > > > > > > As a case in point, a networking device might handle RX
> > > > > > > > buffers in order normally. However, should an access to an
> > > > > > > > RX buffer cause a page fault (e.g. when using PRI), the
> > > > > > > > device could benefit from ability to temporarily keep
> > > > > > > > using following buffers in the ring (possibly with higher
> > > > > > > > overhead) until the fault has
> > > been resolved.
> > > > > > > >
> > > > > > > > Page faults allow more features such as THP, auto-NUMA,
> > > > > > > > live migration.
> > > > > > > >
> > > > > > > > Out of order is of course already possible, however,
> > > > > > > > IN_ORDER is currently required for descriptor batching
> > > > > > > > where device marks a whole batch of buffers used in one go.
> > > > > > > >
> > > > > > > > The idea behind this proposal is to relax that
> > > > > > > > requirement, allowing batching without asking device to be
> > > > > > > > in orde rat all times, as
> > > > > > > > follows:
> > > > > > > >
> > > > > > > > Device uses buffers in any order. Eventually when device
> > > > > > > > detects that it has used all previously outstanding
> > > > > > > > buffers, it sets a FLUSH flag on the last buffer used. If
> > > > > > > > it set this flag on the last buffer used previously, and
> > > > > > > > now uses a batch of descriptors in-order, it can now
> > > > > > > > signal the last buffer used again setting the FLUSH
> > > > > flag.
> > > > > > > >
> > > > > > > > Driver can detect in-order when it sees two FLUSH flags
> > > > > > > > one after another. In other respects the feature is
> > > > > > > > similar to IN_ORDER from the driver implementation POV.
> > > > > > > >
> > > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > ---
> > > > > > > > content.tex | 9 ++++++++-
> > > > > > > > packed-ring.tex | 23 +++++++++++++++++++++++
> > > > > > > > split-ring.tex
> > > > > > > > |
> > > > > > > > 26
> > > > > > > > ++++++++++++++++++++++++--
> > > > > > > > 3 files changed, 55 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/content.tex b/content.tex index
> > > > > > > > 91735e3..8494eb6
> > > > > > > > 100644
> > > > > > > > --- a/content.tex
> > > > > > > > +++ b/content.tex
> > > > > > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic
> > > > > > > > Facilities of a Virtio Device / Virtqueues}
> > > > > > > >
> > > > > > > > Some devices always use descriptors in the same order in
> > > > > > > > which they have been made available. These devices can
> > > > > > > > offer the -VIRTIO_F_IN_ORDER feature. If negotiated, this
> > > > > > > > knowledge
> > > > > > > > +VIRTIO_F_IN_ORDER feature. Other devices sometimes use
> > > > > > > > +descriptors in the same order in which they have been
> > > > > > > > +made available. These devices can offer the
> > > > > > > > +VIRTIO_F_PARTIAL_ORDER feature. If one of the features
> > > > > > > > +VIRTIO_F_IN_ORDER or VIRTIO_F_PARTIAL_ORDER is
> > > > > > negotiated,
> > > > > > > > +this knowledge
> > > > > > >
> > > > > > > Do these two features conflict with each other? I.e., at
> > > > > > > most one of them may be negotiated (or offered?) at a time?
> > > > > >
> > > > > > Good point. I think so, yes. Will document.
> > > > >
> > > > > Isn't it more natural to think of VIRTIO_F_IN_ORDER as the
> > > > > simple case which always maintains ordered access, while the new
> > > > > feature flag allows active control of when descriptors are
> > > > > ordered and when not? To make it backward compatible let
> > > > > VIRTIO_F_IN_ORDER imply the new bit is set, while the new bit
> > > > > set by itself without VIRTIO_F_IN_ORDER set means only active
> > > > > control is offered. I guess a name like VIRTIO_F_CTRL_ORDER
> > > > > would be more appropriate with this
> > > interpretation.
> > > > >
> > > >
> > > > On second thought that might be a bit backwards - how about:
> > > >
> > > > Legacy case: VIRTIO_F_IN_ORDER==0/1 + VIRTIO_F_ORDER_RELAX==0
> This
> > > > proposal: VIRTIO_F_IN_ORDER==1 + VIRTIO_F_ORDER_RELAX==1
> Potential
> > > > future use: VIRTIO_F_???_ORDER==1 + VIRTIO_F_ORDER_RELAX==0/1
> > >
> > > What happens in the new device/old driver case?
> > > - device offers IN_ORDER and PARTIAL_ORDER
> > > - driver does not know PARTIAL_ORDER, accepts IN_ORDER
> > > - device now only can do complete ordering
> > >
> > > Maybe I don't understand the purpose of the new feature correctly,
> > > but I thought it was for those devices that don't do full in-order,
> > > but can do it for a subset of buffers? As such, the two features
> > > can't really imply each other: a device offering IN_ORDER might not
> > > know about the new feature and its mechanism, and a device offering
> > > the new feature, but not IN_ORDER probably does so because it cannot
> support full IN_ORDER.
> > >
> > > I think it makes the most sense if the device can offer both flags,
> > > but the driver must only accept at most one of them?
> > >
> >
> > Yes, you are absolutely right. Keeping them as two mutually exclusive
> options is probably best.
> >
> > Another aspect of the proposal is that the functionality is only
> > described for the packed ring layout, but the concept should be
> > equally applicable to the split ring.
>
> Pls take a look at the changes in split-ring.tex Are they unclear?
>
I guess not. I failed to understand that the used-ring 32-bit ID field is divided into a 16-bit ID and a 16-bit flags. Sorry for the confusion.
Does this mean that for a device that always sets the FLUSH-flag, the PARTIAL_ORDER feature will be identical to IN_ORDER for both packed and split rings? The first write to a Used structure (after creation of the Virtqueue) would not have 2 FLUSH-flags, but would still fulfil the condition.
> > The split ring does not have
> > flags for each used ring element, so the FLUSH flag functionality
> > needs to be a bit different. One way is to add an IN_ORDER-flag to the
> > virtq_used.flags, that allows the device to signal when it starts to
> > adhere to IN_ORDER buffer use. The driver acknowledges this when it is
> > ready to fulfil its part of the IN_ORDER restrictions by setting a new
> > virtq_avail.flag: IN_ORDER_ACK. At this point both device and driver
> > can assume IN_ORDER optimization until the device clears IN_ORDER.
> > After clearing IN_ORDER, the device must wait until IN_ORDER_ACK is
> > cleared before IN_ORDER can be asserted it again.
>
> It would be tricky to figure out which buffers does the handshake apply to. My
> proposal does it a bit differently, pls take a look.
>
>
> >
> > This is a little less fine-grained than the packed-ring solution with the FLUSH
> flag, but for infrequent scenarios like migration it should be OK.
> >
> > >
> > > This publicly archived list offers a means to provide input to the
> > > OASIS Virtual I/O Device (VIRTIO) TC.
> > >
> > > In order to verify user consent to the Feedback License terms and to
> > > minimize spam in the list archive, subscription is required before posting.
> > >
> > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > List help: virtio-comment-help@lists.oasis-open.org
> > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > Feedback License: https://www.oasis-
> > > open.org/who/ipr/feedback_license.pdf
> > > List Guidelines:
> > > https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > Committee: https://www.oasis-open.org/committees/virtio/
> > > Join OASIS: https://www.oasis-open.org/join/
> >
>
>
> This publicly archived list offers a means to provide input to the OASIS Virtual
> I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and to minimize
> spam in the list archive, subscription is required before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-
> open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
next prev parent reply other threads:[~2020-08-17 8:11 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-10 16:15 [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling Michael S. Tsirkin
2020-08-10 16:59 ` Cornelia Huck
2020-08-11 8:23 ` Michael S. Tsirkin
2020-08-11 8:39 ` Cornelia Huck
2020-08-11 14:53 ` [virtio-comment] RE: [virtio-dev] " Lars Ganrot
2020-08-11 15:43 ` Lars Ganrot
2020-08-12 12:50 ` [virtio] " Cornelia Huck
2020-08-12 15:55 ` [virtio-comment] " Lars Ganrot
2020-08-13 23:17 ` [virtio] " Michael S. Tsirkin
2020-08-17 8:11 ` Lars Ganrot [this message]
2021-09-06 6:33 ` Michael S. Tsirkin
2020-08-13 20:45 ` [virtio] " Michael S. Tsirkin
2022-03-29 8:33 ` [virtio-dev] " Stefan Hajnoczi
2022-03-29 10:30 ` Eugenio Perez Martin
2022-03-29 14:40 ` [virtio-comment] " Michael S. Tsirkin
2022-03-30 9:03 ` Eugenio Perez Martin
2022-03-29 14:39 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3c6acf2d5d524d17bb14e7d7d1d55f30@napatech.com \
--to=lga@napatech.com \
--cc=cohuck@redhat.com \
--cc=mst@redhat.com \
--cc=virtio-comment@lists.oasis-open.org \
--cc=virtio-dev@lists.oasis-open.org \
--cc=virtio@lists.oasis-open.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox