From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
To: Parav Pandit <parav@nvidia.com>, "Michael S. Tsirkin" <mst@redhat.com>
Cc: "jasowang@redhat.com" <jasowang@redhat.com>,
"eperezma@redhat.com" <eperezma@redhat.com>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"stefanha@redhat.com" <stefanha@redhat.com>,
"virtio-comment@lists.oasis-open.org"
<virtio-comment@lists.oasis-open.org>
Subject: [virtio-comment] Re: [PATCH V2 6/6] virtio-pci: implement dirty page tracking
Date: Mon, 6 Nov 2023 11:52:21 +0800 [thread overview]
Message-ID: <445ff573-72c3-4fe4-9e07-e7fdd2dc5750@intel.com> (raw)
In-Reply-To: <PH0PR12MB54818179419B06412D7A9C84DCA5A@PH0PR12MB5481.namprd12.prod.outlook.com>
On 11/3/2023 11:47 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Friday, November 3, 2023 8:33 PM
>>
>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>> Sent: Friday, November 3, 2023 4:20 PM
>>>>
>>>> On Fri, Nov 03, 2023 at 06:34:37PM +0800, Zhu Lingshan wrote:
>>>>> +\item[\field{bitmap_addr}]
>>>>> + The driver use this to set the address of the bitmap which records
>>>>> +the
>>>> dirty pages
>>>>> + caused by the device.
>>>>> + Each bit in the bitmap represents one memory page, bit 0 in the bitmap
>>>>> + reprsents page 0 at address 0, bit 1 represents page 1, and so on
>>>>> +in a
>>>> linear manner.
>>>>> + When \field{enable} is set to 1 and the device writes to a memory page,
>>>>> + the device MUST set the corresponding bit to 1 which indicating
>>>>> +the
>>>> page is dirty.
>>>>> +\item[\field{bitmap_length}]
>>>>> + The driver use this to set the length in bytes of the bitmap.
>>>>> +\end{description}
>>>>> +
>>>>> +\devicenormative{\subsubsection}{Memory Dirty Pages Tracker
>>>>> +Capability}{Virtio Transport Options / Virtio Over PCI Bus / Memory
>>>>> +Dirty Pages Tracker Capability}
>>>>> +
>>>>> +The device MUST NOT set any bits beyond bitmap_length when
>>>>> +reporting
>>>> dirty pages.
>>>>> +
>>>>> +To prevent a read-modify-write procedure, if a memory page is
>>>>> +dirty,
>>> It is not to prevent; it is just not possible to do racy RMW. 😊
>> if you understand what is a atomic routine, you will not call it racy.
>>> Hence to work around you propose to mark all pages dirty. Too bad.
>>> This just does not work.
>> why? and this is optional.
> Because device cannot set individual bits in atomic way for same byte read by the cpu.
> 1. device read the byte that had bit 0 and 4 set.
> 2. cpu atomically clear these bits.
> 3. device wrote bits 0, 4, and new bits 2 and 3.
> 4. cpu now transferred page 0 and 4 again.
>
> Optional thing also needs to work. :)
Do you know both CPU and device actually don't read bit, they read bytes????
Do you know RC connected to memory controller????
Do you know there are locked transaction and atomic operations in PCI???
Do you know there are atomic read/write/clear even read and clear and so
on in CPU ISA????
>
>>> Secondly the bitmap array is function is for full guest memory size, while
>> there is lot of sparce region now and also in future.
>>> This is the second problem.
>> did you see gra_power and its comments?
> gra_power says the page size.
> Not the sparce multiple ranges of the guest memory.
> Device endup tracking uninterested area as well.
increase gra_power can reduce bitmap size, right?
Totally up to the hypervisor, right?
>
>>> This is exactly why I asked you to review the page write recording series of
>> admin commands and comment.
>>> And you never commented with sheer ignorance.
>>>
>>> So clearly the start stop method for specific range and without bandwidth
>> explosion, admin commands of [1] stands better.
>>> If you do [1] on the member device also using its AQ in future, it will work for
>> non-passthrough case.
>>> If you build non-passthrough live migration using [1], also it will work.
>>> So I don’t see any point of this series anymore.
>> As Jason pointed out, there are many problems in your proposal, you should
>> answer there. I don't need to repeat his words and duplicate the discussions.
> Many are already addressed in v3.
interesting, does your V3 support nested?
>
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202310/msg00475.h
>>> tml
>> you still need to explain why this does not work for pass-through.
> It does not work for following reasons.
> 1. Because all the fields that put on the member device are not in direct control of the hypervisor.
> The device is directly controlled by the guest including the device status and when it resets the device all the things stored in the device are lost.
have you seen PASID? and if the device reset, it has to forget
everything as expected, right?
>
> 2. the PCI FLR is clearing all the registers you exposed here.
see above
>
> 3. Endless expansion of config registers of dirty tracking is not scalable, as they are not init time registers not following the Appendix B guidelines.
endless expansion?? It is a complete set of dirty page tracking, right????
have you see this cap only controls? The device DMA writes the bitmap,
not by registers.
Again, if you want to fix Appendix B, OK.
>
> 4. bitmap based dirty tracking is not atomic between cpu and device.
> Hence, it is racy.
see above, the first reply.
>
> 5. All the device context needed for passthrough based hypervisor for a device type specific is missing.
> All of those can be used for non-passthrough as well.
> [1] https://lists.oasis-open.org/archives/virtio-comment/202311/msg00085.html
If you want to discuss this again, I don't want to wast time but only
asking you whether you want to define
virtio-fs device context
>
>> And I
>> remember this is a point-less topic as MST ever wants to mute another "pass-
>> through" thread.
> No. he did not say that.
> He meant to not endlessly debate which one is better.
> He clearly said, try to see if you can make multiple hypervisor model work.
> And your series shows a clear ignorance of his guidance.
Let me quote MST's reply here:
"I feel this discussion will keep meandering because the terminology is
vague. There's no single thing that is called "passthrough" -
vendors just build what is expedient with current hardware and
software. Nvidia has a bunch of people working on vfio so they
call that passthrough, Red Hat has people working on VDPA and
they call that passthrough, etc.
Before I mute this discussion for good, does anyone here have any
feeling progress is made? What kind of progress? "
So please don't discuss on pass-through anymore.
It seems only you need to develop the knowledge
>
>
>>>>> +optionally the device is permitted to set the entire byte, which
>>>>> +encompasses
>>>> the relevant bit, to 1.
>>>>> +
>>>>> +The device MAY increase \field{gra_power} to reduce
>> \field{bitmap_length}.
>>>>> +
>>>>> +The device must ignore any writes to \field{pasid} if PASID
>>>>> +Extended Capability is absent or the PASID functionality is
>>>>> +disabled in PASID Extended Capability
>>>> I have to say this is going to work very badly when the number of
>>>> dirty pages is
>>>> small: you will end up scanning and re-scanning all of bitmap.
>>>> And the resolution is apparently 8 pages? You have just multiplied
>>>> the migration bandwidth by a factor of 8.
>>> Yeah.
>>> And device does not even know previously reported pages are read by driver
>> or not. All guess work game for driver and device.
>> see my reply to him
> Please see above reply.
see above
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
next prev parent reply other threads:[~2023-11-06 3:52 UTC|newest]
Thread overview: 186+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-03 10:34 [virtio-comment] [PATCH V2 0/6] introduce basic facilities for virito live migration Zhu Lingshan
2023-11-03 10:34 ` [virtio-comment] [PATCH V2 1/6] virtio: introduce virtqueue state Zhu Lingshan
2023-11-03 11:35 ` [virtio-comment] " Parav Pandit
2023-11-03 14:39 ` [virtio-comment] " Zhu, Lingshan
2023-11-03 11:52 ` Michael S. Tsirkin
2023-11-03 14:49 ` Zhu, Lingshan
2023-11-06 9:35 ` Michael S. Tsirkin
2023-11-06 9:42 ` Zhu, Lingshan
2023-11-06 9:45 ` Michael S. Tsirkin
2023-11-07 8:11 ` Zhu, Lingshan
2023-11-07 8:22 ` Michael S. Tsirkin
2023-11-08 4:08 ` Zhu, Lingshan
2023-11-03 10:34 ` [virtio-comment] [PATCH V2 2/6] virtio: introduce SUSPEND bit in device status Zhu Lingshan
2023-11-03 11:35 ` [virtio-comment] " Parav Pandit
2023-11-03 14:55 ` [virtio-comment] " Zhu, Lingshan
2023-11-03 15:54 ` [virtio-comment] " Parav Pandit
2023-11-06 3:29 ` [virtio-comment] " Zhu, Lingshan
2023-11-06 4:07 ` [virtio-comment] " Parav Pandit
2023-11-06 9:21 ` Zhu, Lingshan
2023-11-06 10:52 ` Parav Pandit
2023-11-07 8:21 ` Zhu, Lingshan
2023-11-07 8:33 ` Michael S. Tsirkin
2023-11-07 9:24 ` Zhu, Lingshan
2023-11-08 7:42 ` Michael S. Tsirkin
2023-11-06 9:43 ` [virtio-comment] " Michael S. Tsirkin
2023-11-07 9:09 ` Zhu, Lingshan
2023-11-08 17:55 ` Michael S. Tsirkin
2023-11-09 9:55 ` Zhu, Lingshan
2023-11-03 10:34 ` [virtio-comment] [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND Zhu Lingshan
2023-11-06 9:49 ` [virtio-comment] " Michael S. Tsirkin
2023-11-07 9:27 ` Zhu, Lingshan
2023-11-08 17:46 ` Michael S. Tsirkin
2023-11-09 9:58 ` Zhu, Lingshan
2023-11-09 10:15 ` [virtio-comment] " Parav Pandit
2023-11-10 6:22 ` [virtio-comment] " Zhu, Lingshan
2023-11-10 6:31 ` [virtio-comment] " Parav Pandit
2023-11-13 9:23 ` Zhu, Lingshan
2023-11-15 17:35 ` Parav Pandit
2023-11-16 10:09 ` Zhu, Lingshan
2023-11-16 10:19 ` Parav Pandit
2023-11-16 12:09 ` Michael S. Tsirkin
2023-11-17 10:13 ` Zhu, Lingshan
2023-11-17 11:04 ` Michael S. Tsirkin
2023-11-22 1:41 ` Zhu, Lingshan
2023-11-22 7:30 ` Michael S. Tsirkin
2023-11-13 3:34 ` [virtio-comment] " Jason Wang
2023-11-15 17:39 ` [virtio-comment] " Parav Pandit
2023-11-16 4:19 ` Jason Wang
2023-11-16 5:27 ` Parav Pandit
2023-11-16 10:12 ` Zhu, Lingshan
2023-11-21 7:33 ` Jason Wang
2023-11-21 16:32 ` Parav Pandit
2023-11-22 5:28 ` Jason Wang
2023-11-22 6:11 ` Parav Pandit
2023-11-24 3:35 ` Jason Wang
2023-11-24 9:04 ` Michael S. Tsirkin
2023-11-24 11:50 ` Jason Wang
2023-11-24 12:17 ` Michael S. Tsirkin
2023-11-24 13:01 ` Jason Wang
2023-11-24 14:45 ` Michael S. Tsirkin
2023-11-27 6:38 ` Jason Wang
2023-11-27 8:27 ` Michael S. Tsirkin
2023-11-27 9:54 ` Zhu, Lingshan
2023-11-21 21:18 ` Michael S. Tsirkin
2023-11-22 1:51 ` Zhu, Lingshan
2023-11-22 6:47 ` Parav Pandit
2023-11-22 10:04 ` Zhu, Lingshan
2023-11-22 10:14 ` Parav Pandit
2023-11-22 6:49 ` Michael S. Tsirkin
2023-11-22 10:03 ` Zhu, Lingshan
2023-11-22 13:37 ` Michael S. Tsirkin
2023-11-22 5:28 ` Jason Wang
2023-11-22 6:32 ` Parav Pandit
2023-11-24 3:25 ` Jason Wang
2023-11-24 6:20 ` Michael S. Tsirkin
2023-11-24 6:28 ` Jason Wang
2023-11-24 6:43 ` Zhu, Lingshan
2023-11-24 8:50 ` Michael S. Tsirkin
2023-11-24 11:51 ` Jason Wang
2023-11-03 10:34 ` [virtio-comment] [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE Zhu Lingshan
2023-11-03 11:35 ` [virtio-comment] " Parav Pandit
2023-11-03 14:57 ` [virtio-comment] " Zhu, Lingshan
2023-11-03 15:50 ` Parav Pandit
2023-11-06 3:31 ` Zhu, Lingshan
2023-11-06 4:12 ` Parav Pandit
2023-11-06 9:27 ` Zhu, Lingshan
2023-11-06 10:52 ` Parav Pandit
2023-11-07 9:31 ` Zhu, Lingshan
2023-11-08 17:44 ` Michael S. Tsirkin
2023-11-09 10:00 ` Zhu, Lingshan
2023-11-09 10:02 ` Michael S. Tsirkin
2023-11-10 6:52 ` Zhu, Lingshan
2023-11-10 12:31 ` Parav Pandit
2023-11-13 3:46 ` Jason Wang
2023-11-13 9:23 ` Zhu, Lingshan
2023-11-15 17:36 ` Parav Pandit
2023-11-09 6:28 ` Parav Pandit
2023-11-09 8:41 ` Michael S. Tsirkin
2023-11-09 9:10 ` Parav Pandit
2023-11-09 9:53 ` Michael S. Tsirkin
2023-11-09 10:11 ` Parav Pandit
2023-11-09 10:09 ` Zhu, Lingshan
2023-11-09 10:25 ` Parav Pandit
2023-11-10 7:52 ` Zhu, Lingshan
2023-11-10 12:31 ` Parav Pandit
2023-11-13 9:25 ` Zhu, Lingshan
2023-11-15 17:35 ` Parav Pandit
2023-11-16 10:14 ` Zhu, Lingshan
2023-11-16 10:21 ` Parav Pandit
2023-11-17 10:02 ` Zhu, Lingshan
2023-11-17 10:06 ` Parav Pandit
2023-11-21 4:30 ` Jason Wang
2023-11-21 16:26 ` Parav Pandit
2023-11-22 4:15 ` Jason Wang
2023-11-22 7:15 ` Michael S. Tsirkin
2023-11-22 7:33 ` Parav Pandit
2023-11-22 14:43 ` Michael S. Tsirkin
2023-11-17 10:45 ` Michael S. Tsirkin
2023-11-22 1:32 ` Zhu, Lingshan
2023-11-22 6:53 ` Michael S. Tsirkin
2023-11-08 17:56 ` Michael S. Tsirkin
2023-11-13 9:29 ` Zhu, Lingshan
2023-11-13 10:10 ` Michael S. Tsirkin
2023-11-03 10:34 ` [virtio-comment] [PATCH V2 5/6] virtio: introduce dirty page tracking facility Zhu Lingshan
2023-11-03 11:35 ` [virtio-comment] " Parav Pandit
2023-11-03 14:11 ` [virtio-comment] " Zhu, Lingshan
2023-11-03 10:34 ` [virtio-comment] [PATCH V2 6/6] virtio-pci: implement dirty page tracking Zhu Lingshan
2023-11-03 10:46 ` [virtio-comment] " Michael S. Tsirkin
2023-11-03 14:21 ` Zhu, Lingshan
2023-11-06 9:16 ` Zhu, Lingshan
2023-11-06 10:15 ` Michael S. Tsirkin
2023-11-07 9:43 ` Zhu, Lingshan
2023-11-07 10:43 ` Michael S. Tsirkin
2023-11-03 10:50 ` Michael S. Tsirkin
2023-11-03 11:35 ` [virtio-comment] " Parav Pandit
2023-11-03 15:02 ` [virtio-comment] " Zhu, Lingshan
2023-11-03 15:47 ` [virtio-comment] " Parav Pandit
2023-11-05 16:12 ` [virtio-comment] " Michael S. Tsirkin
2023-11-06 3:58 ` Zhu, Lingshan
2023-11-06 10:33 ` Michael S. Tsirkin
2023-11-07 9:48 ` Zhu, Lingshan
2023-11-06 4:03 ` [virtio-comment] " Parav Pandit
2023-11-07 11:13 ` [virtio-comment] " Michael S. Tsirkin
2023-11-08 9:29 ` Zhu, Lingshan
2023-11-08 17:18 ` Michael S. Tsirkin
2023-11-09 10:29 ` Zhu, Lingshan
2023-11-09 10:41 ` Michael S. Tsirkin
2023-11-10 7:24 ` Zhu, Lingshan
2023-11-06 3:52 ` Zhu, Lingshan [this message]
2023-11-06 4:34 ` [virtio-comment] " Parav Pandit
2023-11-06 9:34 ` [virtio-comment] " Zhu, Lingshan
2023-11-06 10:52 ` [virtio-comment] " Parav Pandit
2023-11-06 11:05 ` [virtio-comment] " Michael S. Tsirkin
2023-11-06 11:07 ` [virtio-comment] " Parav Pandit
2023-11-06 11:21 ` [virtio-comment] " Michael S. Tsirkin
2023-11-07 9:52 ` Zhu, Lingshan
2023-11-07 11:33 ` Michael S. Tsirkin
2023-11-08 9:30 ` Zhu, Lingshan
2023-11-08 17:19 ` Michael S. Tsirkin
2023-11-09 10:34 ` Zhu, Lingshan
2023-11-06 11:13 ` [virtio-comment] " Parav Pandit
2023-11-07 10:01 ` [virtio-comment] " Zhu, Lingshan
2023-11-07 10:25 ` Michael S. Tsirkin
2023-11-07 11:12 ` [virtio-comment] " Parav Pandit
2023-11-07 11:24 ` Parav Pandit
2023-11-08 7:11 ` [virtio-comment] " Jason Wang
2023-11-08 7:16 ` [virtio-comment] " Parav Pandit
2023-11-07 11:31 ` [virtio-comment] " Michael S. Tsirkin
2023-11-08 9:36 ` Zhu, Lingshan
2023-11-07 12:00 ` Michael S. Tsirkin
2023-11-06 10:29 ` Michael S. Tsirkin
2023-11-06 11:21 ` [virtio-comment] " Parav Pandit
2023-11-06 11:27 ` [virtio-comment] " Michael S. Tsirkin
2023-11-06 11:31 ` [virtio-comment] " Parav Pandit
2023-11-07 10:02 ` [virtio-comment] " Zhu, Lingshan
2023-11-07 11:36 ` Michael S. Tsirkin
2023-11-05 16:20 ` Michael S. Tsirkin
2023-11-06 3:51 ` [virtio-comment] " Parav Pandit
2023-11-03 14:32 ` [virtio-comment] " Zhu, Lingshan
2023-11-05 16:16 ` Michael S. Tsirkin
2023-11-06 4:06 ` Zhu, Lingshan
2023-11-06 10:22 ` Michael S. Tsirkin
2023-11-07 10:44 ` Zhu, Lingshan
2023-11-07 11:29 ` Michael S. Tsirkin
2023-11-07 8:01 ` [virtio-comment] Re: [PATCH V2 0/6] introduce basic facilities for virito live migration Michael S. Tsirkin
2023-11-08 10:19 ` Zhu, Lingshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=445ff573-72c3-4fe4-9e07-e7fdd2dc5750@intel.com \
--to=lingshan.zhu@intel.com \
--cc=cohuck@redhat.com \
--cc=eperezma@redhat.com \
--cc=jasowang@redhat.com \
--cc=mst@redhat.com \
--cc=parav@nvidia.com \
--cc=stefanha@redhat.com \
--cc=virtio-comment@lists.oasis-open.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.