From: Dan Williams <dan.j.williams@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: Pankaj Gupta <pagupta@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Rik van Riel <riel@redhat.com>,
Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
kvm-devel <kvm@vger.kernel.org>,
Qemu Developers <qemu-devel@nongnu.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
ross zwisler <ross.zwisler@linux.intel.com>,
Kevin Wolf <kwolf@redhat.com>,
Nitesh Narayan Lal <nilal@redhat.com>,
Haozhong Zhang <haozhong.zhang@intel.com>,
Ross Zwisler <ross.zwisler@intel.com>
Subject: Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion
Date: Thu, 18 Jan 2018 09:38:13 -0800 [thread overview]
Message-ID: <CAPcyv4j9b6ARvKcJkE25eNHatWACscMJTN_kCLSM6D+bfu_msA@mail.gmail.com> (raw)
In-Reply-To: <f1ca60cc-5506-a161-b473-f0de363b7e95@redhat.com>
On Thu, Jan 18, 2018 at 8:53 AM, David Hildenbrand <david@redhat.com> wrote:
> On 24.11.2017 13:40, Pankaj Gupta wrote:
>>
>> Hello,
>>
>> Thank you all for all the useful suggestions.
>> I want to summarize the discussions so far in the
>> thread. Please see below:
>>
>>>>>
>>>>>> We can go with the "best" interface for what
>>>>>> could be a relatively slow flush (fsync on a
>>>>>> file on ssd/disk on the host), which requires
>>>>>> that the flushing task wait on completion
>>>>>> asynchronously.
>>>>>
>>>>>
>>>>> I'd like to clarify the interface of "wait on completion
>>>>> asynchronously" and KVM async page fault a bit more.
>>>>>
>>>>> The current design of async page fault only works on RAM rather
>>>>> than MMIO, i.e., if the page fault is caused by accessing the
>>>>> device memory of an emulated device, it needs to go to
>>>>> userspace (QEMU), which emulates the operation in the vCPU's
>>>>> thread.
>>>>>
>>>>> As I mentioned before, the memory region used for the vNVDIMM
>>>>> flush interface should be MMIO and, considering its support
>>>>> on other hypervisors, we had better push this async
>>>>> mechanism into the flush interface design itself rather
>>>>> than depend on KVM async page fault.
>>>>
>>>> I would expect this interface to be virtio-ring based to queue flush
>>>> requests asynchronously to the host.
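
A rough guest-side sketch of what queueing flush requests over a virtio
ring could look like. The virtio_flush_req structure, the status byte, and
the completion handling are illustrative assumptions, not an existing ABI;
only the generic virtqueue calls are existing kernel API.

#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/completion.h>
#include <linux/errno.h>

/* Hypothetical per-request state; the host-visible status layout is made up. */
struct virtio_flush_req {
	struct completion done;		/* completed from the vq callback */
	u8 status;			/* written by the host (0 == fsync succeeded) */
};

/* Queue one flush request and sleep until the host has fsync'd the backing file. */
static int virtio_flush(struct virtqueue *vq, struct virtio_flush_req *req)
{
	struct scatterlist sg;
	int err;

	init_completion(&req->done);
	sg_init_one(&sg, &req->status, sizeof(req->status));

	/* Device-writable buffer: the host fills in the status on completion. */
	err = virtqueue_add_inbuf(vq, &sg, 1, req, GFP_KERNEL);
	if (err)
		return err;
	virtqueue_kick(vq);

	/* The vq interrupt callback is assumed to call complete(&req->done). */
	wait_for_completion(&req->done);
	return req->status ? -EIO : 0;
}
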
>>>
>>> Could we reuse the virtio-blk device, only with a different device id?
>>
>> As per previous discussions, there were suggestions on the two main parts of the project:
>>
>> 1] Expose vNVDIMM memory range to KVM guest.
>>
>> - Add a flag in the ACPI NFIT table for this new memory type. Do we need NVDIMM spec
>> changes for this?
>>
>> - The guest should be able to add this memory to its system memory map. The name of the added
>> memory in '/proc/iomem' should be different (shared memory?) from persistent memory, as it
>> does not satisfy the exact definition of persistent memory (it requires an explicit flush);
>> see the naming sketch after this list.
>>
>> - The guest should not allow 'device-dax' and other fancy features which are not
>> virtualization friendly.
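
On the /proc/iomem naming point above, a minimal illustration of how the
guest could claim the range under its own label so it is distinguishable
from real persistent memory; the resource name and the call site are
assumptions for illustration only.

#include <linux/ioport.h>

static struct resource shared_pmem_res = {
	.name  = "Virtio Shared Memory",	/* hypothetical label, not "Persistent Memory" */
	.flags = IORESOURCE_MEM,
};

static int claim_shared_range(resource_size_t start, resource_size_t size)
{
	shared_pmem_res.start = start;
	shared_pmem_res.end   = start + size - 1;
	/* Appears as its own entry in /proc/iomem, separate from real pmem. */
	return insert_resource(&iomem_resource, &shared_pmem_res);
}
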
>>
>> 2] Flushing interface to persist guest changes.
>>
>> - As per the suggestion by Christoph H (CCed), we explored options other than virtio, like MMIO etc.
>> Most of these options are not use-case friendly, as we want to do an fsync on a
>> file on an SSD/disk on the host and cannot make guest vCPUs wait for that time.
>>
>> - Adding a new driver (virtio-pmem) looks like repeated work and is not needed, so we can
>> go with the existing pmem driver and add a flush specific to this new memory type.
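
On that last point, a sketch of how the existing pmem flush path could
branch for this memory type. is_shared_pmem_region() and
virtio_pmem_host_flush() are assumed helpers (shown only as prototypes),
not existing kernel APIs; nvdimm_flush() is the existing call.

#include <linux/libnvdimm.h>

/* Hypothetical helpers, sketched here only as prototypes. */
bool is_shared_pmem_region(struct nd_region *nd_region);
void virtio_pmem_host_flush(struct nd_region *nd_region);

/* Called where the pmem driver currently issues nvdimm_flush(), e.g. on
 * REQ_PREFLUSH/REQ_FUA.  For a host-file-backed region the flush becomes a
 * request to the host (which does the fsync) instead of a CPU cache/WPQ flush. */
static void region_flush(struct nd_region *nd_region)
{
	if (is_shared_pmem_region(nd_region))		/* hypothetical check */
		virtio_pmem_host_flush(nd_region);	/* hypothetical paravirt flush */
	else
		nvdimm_flush(nd_region);		/* existing pmem behavior */
}
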
>
> I'd like to emphasize again that I would prefer a virtio-pmem-only
> solution.
>
> There are architectures out there (e.g. s390x) that don't support
> NVDIMMs - there is no HW interface to expose any such stuff.
>
> However, with virtio-pmem, we could also make it work on architectures
> that don't have ACPI and friends.
ACPI and virtio-only setups can share the same pmem driver. There are two
parts to this: region discovery and setting up the pmem driver. For
discovery you can either have an NFIT-bus-defined range, or have a new
virtio-pmem bus define it. The pmem driver itself is agnostic to how the
range is discovered.

In other words, pmem consumes 'regions' from libnvdimm, and a bus
provider like nfit, e820, or a new virtio mechanism produces 'regions'.
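
To make that concrete, a rough sketch of the provider side, loosely
modeled on the existing e820 provider in drivers/nvdimm/e820.c. The
virtio_pmem naming and the way the device reports its range are
assumptions; the libnvdimm calls are the existing registration API.

#include <linux/module.h>
#include <linux/ioport.h>
#include <linux/numa.h>
#include <linux/libnvdimm.h>

/* Hypothetical virtio-pmem probe step: register a bus with libnvdimm and
 * hand it the range discovered from the (hypothetical) virtio device.  The
 * pmem driver then attaches to the resulting region exactly as it does for
 * nfit- or e820-described regions. */
static int virtio_pmem_register_region(struct device *dev, struct resource *res)
{
	static struct nvdimm_bus_descriptor nd_desc = {
		.provider_name = "virtio-pmem",		/* hypothetical provider name */
		.module = THIS_MODULE,
	};
	struct nd_region_desc ndr_desc = { 0 };
	struct nvdimm_bus *nvdimm_bus;

	nvdimm_bus = nvdimm_bus_register(dev, &nd_desc);
	if (!nvdimm_bus)
		return -ENXIO;

	ndr_desc.res = res;			/* range reported by the device */
	ndr_desc.numa_node = NUMA_NO_NODE;
	if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc)) {
		nvdimm_bus_unregister(nvdimm_bus);
		return -ENXIO;
	}
	return 0;
}
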