From: Dan Williams <dan.j.williams@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: Pankaj Gupta <pagupta@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Rik van Riel <riel@redhat.com>,
Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
kvm-devel <kvm@vger.kernel.org>,
Qemu Developers <qemu-devel@nongnu.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
ross zwisler <ross.zwisler@linux.intel.com>,
Kevin Wolf <kwolf@redhat.com>,
Nitesh Narayan Lal <nilal@redhat.com>,
Haozhong Zhang <haozhong.zhang@intel.com>,
Ross Zwisler <ross.zwisler@intel.com>
Subject: Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion
Date: Thu, 18 Jan 2018 09:38:13 -0800 [thread overview]
Message-ID: <CAPcyv4j9b6ARvKcJkE25eNHatWACscMJTN_kCLSM6D+bfu_msA@mail.gmail.com> (raw)
In-Reply-To: <f1ca60cc-5506-a161-b473-f0de363b7e95@redhat.com>
On Thu, Jan 18, 2018 at 8:53 AM, David Hildenbrand <david@redhat.com> wrote:
> On 24.11.2017 13:40, Pankaj Gupta wrote:
>>
>> Hello,
>>
>> Thank you all for all the useful suggestions.
>> I want to summarize the discussions so far in the
>> thread. Please see below:
>>
>>>>>
>>>>>> We can go with the "best" interface for what
>>>>>> could be a relatively slow flush (fsync on a
>>>>>> file on ssd/disk on the host), which requires
>>>>>> that the flushing task wait on completion
>>>>>> asynchronously.
>>>>>
>>>>>
>>>>> I'd like to clarify the interface of "wait on completion
>>>>> asynchronously" and KVM async page fault a bit more.
>>>>>
>>>>> The current design of async page fault only works on RAM rather
>>>>> than MMIO, i.e., if the page fault is caused by accessing the
>>>>> device memory of an emulated device, it needs to go to
>>>>> userspace (QEMU), which emulates the operation in the vCPU's
>>>>> thread.
>>>>>
>>>>> As I mentioned before, the memory region used for the vNVDIMM
>>>>> flush interface should be MMIO and, considering its support
>>>>> on other hypervisors, we had better push this async
>>>>> mechanism into the flush interface design itself rather
>>>>> than depend on KVM async page fault.
>>>>
>>>> I would expect this interface to be virtio-ring based to queue flush
>>>> requests asynchronously to the host.
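
A rough guest-side sketch of what queueing flush requests over a virtio
ring could look like. The virtio_flush_req structure, the status byte, and
the completion handling are illustrative assumptions, not an existing ABI;
only the generic virtqueue calls are existing kernel API.

#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/completion.h>
#include <linux/errno.h>

/* Hypothetical per-request state; the host-visible status layout is made up. */
struct virtio_flush_req {
	struct completion done;		/* completed from the vq callback */
	u8 status;			/* written by the host (0 == fsync succeeded) */
};

/* Queue one flush request and sleep until the host has fsync'd the backing file. */
static int virtio_flush(struct virtqueue *vq, struct virtio_flush_req *req)
{
	struct scatterlist sg;
	int err;

	init_completion(&req->done);
	sg_init_one(&sg, &req->status, sizeof(req->status));

	/* Device-writable buffer: the host fills in the status on completion. */
	err = virtqueue_add_inbuf(vq, &sg, 1, req, GFP_KERNEL);
	if (err)
		return err;
	virtqueue_kick(vq);

	/* The vq interrupt callback is assumed to call complete(&req->done). */
	wait_for_completion(&req->done);
	return req->status ? -EIO : 0;
}
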
>>>
>>> Could we reuse the virtio-blk device, only with a different device id?
>>
>> As per previous discussions, there were suggestions on the two main parts of the project:
>>
>> 1] Expose vNVDIMM memory range to KVM guest.
>>
>> - Add a flag in the ACPI NFIT table for this new memory type. Do we need NVDIMM spec
>> changes for this?
>>
>> - The guest should be able to add this memory to its system memory map. The name of the added
>> memory in '/proc/iomem' should be different (shared memory?) from persistent memory, as it
>> does not satisfy the exact definition of persistent memory (it requires an explicit flush);
>> see the naming sketch after this list.
>>
>> - The guest should not allow 'device-dax' and other fancy features which are not
>> virtualization friendly.
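
On the /proc/iomem naming point above, a minimal illustration of how the
guest could claim the range under its own label so it is distinguishable
from real persistent memory; the resource name and the call site are
assumptions for illustration only.

#include <linux/ioport.h>

static struct resource shared_pmem_res = {
	.name  = "Virtio Shared Memory",	/* hypothetical label, not "Persistent Memory" */
	.flags = IORESOURCE_MEM,
};

static int claim_shared_range(resource_size_t start, resource_size_t size)
{
	shared_pmem_res.start = start;
	shared_pmem_res.end   = start + size - 1;
	/* Appears as its own entry in /proc/iomem, separate from real pmem. */
	return insert_resource(&iomem_resource, &shared_pmem_res);
}
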
>>
>> 2] Flushing interface to persist guest changes.
>>
>> - As per the suggestion by Christoph H (CCed), we explored options other than virtio, like MMIO etc.
>> Most of these options are not use-case friendly, as we want to do an fsync on a
>> file on an SSD/disk on the host and cannot make guest vCPUs wait for that time.
>>
>> - Adding a new driver (virtio-pmem) looks like repeated work and is not needed, so we can
>> go with the existing pmem driver and add a flush specific to this new memory type.
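
On that last point, a sketch of how the existing pmem flush path could
branch for this memory type. is_shared_pmem_region() and
virtio_pmem_host_flush() are assumed helpers (shown only as prototypes),
not existing kernel APIs; nvdimm_flush() is the existing call.

#include <linux/libnvdimm.h>

/* Hypothetical helpers, sketched here only as prototypes. */
bool is_shared_pmem_region(struct nd_region *nd_region);
void virtio_pmem_host_flush(struct nd_region *nd_region);

/* Called where the pmem driver currently issues nvdimm_flush(), e.g. on
 * REQ_PREFLUSH/REQ_FUA.  For a host-file-backed region the flush becomes a
 * request to the host (which does the fsync) instead of a CPU cache/WPQ flush. */
static void region_flush(struct nd_region *nd_region)
{
	if (is_shared_pmem_region(nd_region))		/* hypothetical check */
		virtio_pmem_host_flush(nd_region);	/* hypothetical paravirt flush */
	else
		nvdimm_flush(nd_region);		/* existing pmem behavior */
}
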
>
> I'd like to emphasize again that I would prefer a virtio-pmem-only
> solution.
>
> There are architectures out there (e.g. s390x) that don't support
> NVDIMMs - there is no HW interface to expose any such stuff.
>
> However, with virtio-pmem, we could also make it work on architectures
> that don't have ACPI and friends.
ACPI and virtio-only setups can share the same pmem driver. There are two
parts to this: region discovery and setting up the pmem driver. For
discovery you can either have an NFIT-bus-defined range, or have a new
virtio-pmem bus define it. The pmem driver itself is agnostic to how the
range is discovered.

In other words, pmem consumes 'regions' from libnvdimm, and a bus
provider like nfit, e820, or a new virtio mechanism produces 'regions'.
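
To make that concrete, a rough sketch of the provider side, loosely
modeled on the existing e820 provider in drivers/nvdimm/e820.c. The
virtio_pmem naming and the way the device reports its range are
assumptions; the libnvdimm calls are the existing registration API.

#include <linux/module.h>
#include <linux/ioport.h>
#include <linux/numa.h>
#include <linux/libnvdimm.h>

/* Hypothetical virtio-pmem probe step: register a bus with libnvdimm and
 * hand it the range discovered from the (hypothetical) virtio device.  The
 * pmem driver then attaches to the resulting region exactly as it does for
 * nfit- or e820-described regions. */
static int virtio_pmem_register_region(struct device *dev, struct resource *res)
{
	static struct nvdimm_bus_descriptor nd_desc = {
		.provider_name = "virtio-pmem",		/* hypothetical provider name */
		.module = THIS_MODULE,
	};
	struct nd_region_desc ndr_desc = { 0 };
	struct nvdimm_bus *nvdimm_bus;

	nvdimm_bus = nvdimm_bus_register(dev, &nd_desc);
	if (!nvdimm_bus)
		return -ENXIO;

	ndr_desc.res = res;			/* range reported by the device */
	ndr_desc.numa_node = NUMA_NO_NODE;
	if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc)) {
		nvdimm_bus_unregister(nvdimm_bus);
		return -ENXIO;
	}
	return 0;
}
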