From: Pankaj Gupta <pagupta@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
Dan Williams <dan.j.williams@intel.com>,
Rik van Riel <riel@redhat.com>,
Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
Christoph Hellwig <hch@infradead.org>
Cc: Kevin Wolf <kwolf@redhat.com>, Jan Kara <jack@suse.cz>,
kvm-devel <kvm@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
Ross Zwisler <ross.zwisler@intel.com>,
Qemu Developers <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
Nitesh Narayan Lal <nilal@redhat.com>
Subject: Re: KVM "fake DAX" flushing interface - discussion
Date: Fri, 24 Nov 2017 07:40:07 -0500 (EST)
Message-ID: <336152896.34452750.1511527207457.JavaMail.zimbra@redhat.com>
In-Reply-To: <654f8935-258e-22ef-fae4-3e14e91e8fae@redhat.com>
Hello,
Thank you all for all the useful suggestions.
I want to summarize the discussions so far in the
thread. Please see below:
> >>
> >>> We can go with the "best" interface for what
> >>> could be a relatively slow flush (fsync on a
> >>> file on ssd/disk on the host), which requires
> >>> that the flushing task wait on completion
> >>> asynchronously.
> >>
> >>
> >> I'd like to clarify the interface of "wait on completion
> >> asynchronously" and KVM async page fault a bit more.
> >>
> >> Current design of async-page-fault only works on RAM rather
> >> than MMIO, i.e, if the page fault caused by accessing the
> >> device memory of a emulated device, it needs to go to
> >> userspace (QEMU) which emulates the operation in vCPU's
> >> thread.
> >>
> >> As i mentioned before the memory region used for vNVDIMM
> >> flush interface should be MMIO and consider its support
> >> on other hypervisors, so we do better push this async
> >> mechanism into the flush interface design itself rather
> >> than depends on kvm async-page-fault.
> >
> > I would expect this interface to be virtio-ring based to queue flush
> > requests asynchronously to the host.
>
> Could we reuse the virtio-blk device, only with a different device id?
As per previous discussions, there were suggestions on the two main parts of the project:
1] Expose vNVDIMM memory range to KVM guest.
- Add a flag in the ACPI NFIT table for this new memory type. Do we need NVDIMM spec
  changes for this?
- The guest should be able to add this memory to the system memory map. The name of the
  added memory in '/proc/iomem' should differ from persistent memory (shared memory?), as it
  does not satisfy the exact definition of persistent memory (it requires an explicit flush).
- The guest should not allow 'device-dax' and other fancy features which are not
  virtualization friendly.
2] Flushing interface to persist guest changes.
- As per the suggestion by ChristophH (CCed), we explored options other than virtio, such as MMIO.
  Most of these options do not fit the use case: we want to do an fsync on a
  file on ssd/disk on the host, and we cannot make guest vCPUs wait for that long.
- Adding a new driver (virtio-pmem) looks like duplicated work and seems unnecessary, so we can
  go with the existing pmem driver and add a flush specific to this new memory type.
- The suggestion by Paolo and (previously) Stefan to use virtio-blk makes sense if we just
  want a flush vehicle to send guest commands to the host and get a reply after asynchronous
  execution. There was a previous discussion [1] with Rik and Dan on this.
[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg08373.html
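To make the "flush vehicle" idea in 2] concrete, here is a rough userspace
simulation of the request/completion flow. It is only a sketch: a Python
queue stands in for the virtio ring, a thread stands in for the host side,
and the names FlushRequest, host_flush_worker, guest_flush and demo are
made up for illustration. None of them are QEMU, virtio or kernel interfaces.

```python
import os
import queue
import tempfile
import threading
from concurrent.futures import Future


class FlushRequest:
    """One entry on the simulated request ring (hypothetical type)."""

    def __init__(self, fd):
        self.fd = fd            # backing file descriptor on the "host"
        self.done = Future()    # completed by the host side


def host_flush_worker(ring):
    """Host side: drain the ring and fsync the backing file."""
    while True:
        req = ring.get()
        if req is None:         # sentinel: shut the worker down
            break
        try:
            os.fsync(req.fd)    # the potentially slow operation
            req.done.set_result(0)
        except OSError as err:
            req.done.set_exception(err)


def guest_flush(ring, fd):
    """Guest side: queue a flush request and return immediately."""
    req = FlushRequest(fd)
    ring.put(req)
    return req.done             # the flushing task waits on this


def demo():
    ring = queue.Queue()
    worker = threading.Thread(target=host_flush_worker, args=(ring,))
    worker.start()
    try:
        with tempfile.TemporaryFile() as backing:
            backing.write(b"guest data")
            backing.flush()     # push Python's buffer down to the OS
            pending = guest_flush(ring, backing.fileno())
            # Only the task that asked for the flush blocks here.
            return pending.result(timeout=10)
    finally:
        ring.put(None)
        worker.join()
```

The point of the sketch is that guest_flush() returns as soon as the
request is queued, so only the task that needs the flush waits on the
completion. A real device would carry the request and the reply in
virtqueue buffers instead.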
Is my understanding correct here?
Thanks,
Pankaj