From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Parav Pandit <parav@nvidia.com>,
"virtio-comment@lists.oasis-open.org"
<virtio-comment@lists.oasis-open.org>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"sburla@marvell.com" <sburla@marvell.com>,
Shahaf Shuler <shahafs@nvidia.com>,
Maor Gottlieb <maorg@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>,
"lingshan.zhu@intel.com" <lingshan.zhu@intel.com>
Subject: [virtio-comment] Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands
Date: Thu, 16 Nov 2023 01:49:28 -0500
Message-ID: <20231116014250-mutt-send-email-mst@kernel.org>
In-Reply-To: <CACGkMEsWN38gptqRBN=50LY6S-yJ0cRQcJk=kih9Dj-7bm0V6Q@mail.gmail.com>
On Thu, Nov 16, 2023 at 12:24:27PM +0800, Jason Wang wrote:
> On Thu, Nov 16, 2023 at 1:37 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Monday, November 13, 2023 9:11 AM
> > >
> > > On Fri, Nov 10, 2023 at 2:46 PM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Thursday, November 9, 2023 1:29 PM
> > > >
> > > > [..]
> > > > > > > Besides the issue of performance, it's also racy, assuming we are
> > > > > > > logging IOVA.
> > > > > >
> > > > > > 0) device log IOVA
> > > > > > 1) hypervisor fetches IOVA from log buffer
> > > > > > 2) guest map IOVA to a new GPA
> > > > > > 3) hypervisor traverse guest table to get IOVA to new GPA
> > > > > >
> > > > > > Then we lost the old GPA.
> > > > >
> > > > > Interesting and a good point. And by the way e.g. vhost has the same
> > > > > issue. You need to flush dirty tracking info when changing the
> > > > > > mappings somehow. Parav what's the plan for this? Should be addressed in
> > > > > > the spec too.
> > > > >
> > > > As you listed the flush is needed for vhost or device-based DPT.
> > >
> > > What does DPT mean? Device Page Table? Let's not invent terminology which is
> > > not known by others please.
> > >
> > Sorry for using the acronym. I meant dirty page tracking.
> >
> > > We have discussed it many times. You can't just depend on ATS or reinventing
> > > wheels in virtio.
> > The dependency is on the iommu which would have the mapping of GIOVA to GPA like any sw implementation.
> > No dependency on ATS.
> >
> > >
> > > What's more, please try not to give me the impression that the proposal is
> > > optimized for a specific vendor (like device IOMMU stuffs).
> > >
> > You should stop calling this specific vendor thing.
>
> Well, as you have explained, the confusion came from "DPT" ...
>
> > One can equally say that suspend bit proposal is for the sw_vendor device who is forcing virtio hw device to only implement ioqueues + PASID + non_unified interface for PF, VF, SIOVs + non_TDISP based devices.
> >
> > > > > The necessary plumbing is already covered for this in the query (read and
> > > > > clear) command of this v3 proposal.
> > >
> > > The issue is logging via IOVA ... I don't see how "read and clear" can help.
> > >
> > Read and clear helps that ensures that all the dirty pages are reported, hence there is no mapping/unmapping race.
>
> Reported as IOVA ...
>
> > As everything is reported.
> >
> > > > It is listed in Device Write Records Read Command.
> > >
> > > Please explain how your proposal can solve the above race.
> > >
> > In below manner.
> > 1. guest has GIOVA to GPA_1 mapping
> > 2. RX packets occurred to GIOVA
> > 3. device reported dirty page log for GIOVA (hypervisor is yet to read)
> > 4. guest requested mapping change from GIOVA to GPA_2
> > 4.1 During this IOTLB is invalidated and dirty page report is queried ensuring, it can change the mapping
>
> It requires
>
> 1) hypervisor traps IOTLB invalidation, which doesn't work when
> nesting could be offloaded (IOMMUFD has started the work to support
> nesting)
> 2) query the device about the dirty page on each IOTLB invalidation which:
> 2.1) A huge round trip: guest IOTLB invalidation -> trapped by
> hypervisor -> start the query from the device -> device return ->
> hypervisor reports IOTLB invalidation is done -> let guest run. Have
> you benchmarked the RTT in this case? There are just too many places
> that cause the delay in the middle.

To be fair, invalidations are already expensive: e.g. with vhost iotlb
each one requires a slow system call.
This will make them *even more* expensive, which is a
problem for some but not all workloads. Again, I agree that the motivation,
tradeoffs and a comparison with both dirty tracking by the IOMMU and shadow vq
approaches really should be included.

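To make the race and the cost concrete, here is a toy model of the
sequence discussed above. It is only a sketch in Python: the Tracker
class, its method names and the addresses are all made up for
illustration, and none of this is taken from the patch or from any
hypervisor. It assumes the device logs IOVAs and the hypervisor
translates them with whatever mapping is current when it reads the log.

```python
class Tracker:
    """Toy model: device logs dirty IOVAs, hypervisor maps IOVA -> GPA."""

    def __init__(self):
        self.iova_to_gpa = {}    # guest IOMMU mapping (hypothetical model)
        self.device_log = []     # IOVAs the device has recorded as written
        self.dirty_gpas = set()  # pages the hypervisor would re-send

    def device_write(self, iova):
        # The device only knows the IOVA it wrote to.
        self.device_log.append(iova)

    def read_and_clear(self):
        # Translate with the mapping that is in place *now*, then clear.
        for iova in self.device_log:
            self.dirty_gpas.add(self.iova_to_gpa[iova])
        self.device_log.clear()

    def remap(self, iova, new_gpa, flush_first):
        # flush_first models querying the device on IOTLB invalidation,
        # i.e. the extra round trip being debated in this thread.
        if flush_first:
            self.read_and_clear()
        self.iova_to_gpa[iova] = new_gpa


def run(flush_first):
    t = Tracker()
    t.iova_to_gpa[0x1000] = 0xA000        # GIOVA -> GPA_1
    t.device_write(0x1000)                # RX lands at GIOVA
    t.remap(0x1000, 0xB000, flush_first)  # guest remaps GIOVA -> GPA_2
    t.read_and_clear()                    # hypervisor reads the log later
    return t.dirty_gpas
```

With run(False) the old GPA (0xA000) is never reported, which is the
race; with run(True) it is, at the price of one device query inside
every remap.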
> 2.2) Guest triggerable behaviour, malicious guest can simply do
> endless IOTLB invalidation to DOS the e.g admin virtqueue

I'm not sure how much to worry about it - just don't allow more
than one such query in flight per VM.

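Roughly what I have in mind, as a sketch only: QueryGate and its
methods are hypothetical names, not an existing hypervisor or virtio
interface. The assumption is that an invalidation which finds a query
already outstanding for its VM has to wait rather than be skipped, so
a guest that spams invalidations mostly slows itself down.

```python
class QueryGate:
    """At most one outstanding dirty-page query per VM (hypothetical)."""

    def __init__(self):
        self._in_flight = set()  # VM ids with a query outstanding
        self.deferred = {}       # per-VM count of requests made to wait

    def try_start(self, vm_id):
        # Returns True if the caller may issue a query now. On False the
        # caller must wait and retry; it must not skip the flush, or the
        # mapping race comes back.
        if vm_id in self._in_flight:
            self.deferred[vm_id] = self.deferred.get(vm_id, 0) + 1
            return False
        self._in_flight.add(vm_id)
        return True

    def complete(self, vm_id):
        # Called when the device has returned the write records.
        self._in_flight.discard(vm_id)
```

Other VMs are not blocked by this gate, so the shared admin virtqueue
sees at most one such command per VM at a time.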
> >
> > > >
> > > > When the page write record is fully read, it is flushed.
> > > > > How/when to use, I think its hypervisor specific, so we probably better off not
> > > > > documenting those details.
> > >
> > > Well, as the author of this proposal, at least you need to know how a hypervisor
> > > can work with your proposal, no?
> > >
> > Likely yes, but it is not the scope of the spec to list those paths etc.
>
> Fine, but as a reviewer I need to know if it can work with a hypervisor well.
>
> >
> > > > > May be such read is needed in some other path too depending on how
> > > > > hypervisor implemented.
> > >
> > > What do you mean by "May be ... some other path" here? You're inventing a
> > > mechanism that you don't know how a hypervisor can use?
> >
> > No. I meant hypervisor may have more operations that map/unmap/flush where it may need to implement it.
> > Some one may call it set_map(), some may say dma_map()...
>
> Ok.
>
> Thanks