From: Dan Williams
Date: Mon, 6 Nov 2017 08:57:19 -0800
Subject: Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion
To: Pankaj Gupta
Cc: Xiao Guangrong, Kevin Wolf, Jan Kara, kvm-devel, Haozhong Zhang,
 Stefan Hajnoczi, Ross Zwisler, Qemu Developers,
 "linux-nvdimm@lists.01.org", Paolo Bonzini, Nitesh Narayan Lal,
 Christoph Hellwig

On Sun, Nov 5, 2017 at 11:57 PM, Pankaj Gupta wrote:
>
>> [..]
>> >> Yes, the GUID will specifically identify this range as "Virtio Shared
>> >> Memory" (or whatever name survives after a bikeshed debate).
>> >> The libnvdimm core then needs to grow a new region type that mostly
>> >> behaves the same as a "pmem" region, but drivers/nvdimm/pmem.c grows
>> >> a new flush interface to perform the host communication. Device-dax
>> >> would be disallowed from attaching to this region type, or we could
>> >> grow a new device-dax type that does not allow the raw device to be
>> >> mapped, but allows a filesystem mounted on top to manage the flush
>> >> interface.
>> >
>> > I am afraid it is not a good idea for a single SPA range to be used
>> > for multiple purposes. The region used as "pmem" is directly mapped
>> > into the VM so that the guest can freely access it without the host's
>> > assistance; however, the region used for "host communication" is not
>> > mapped into the VM, so accessing it causes a VM-exit and the host gets
>> > the chance to do specific operations, e.g., flush the cache. So we had
>> > better define these two regions distinctly to avoid unnecessary
>> > complexity in the hypervisor.
>>
>> Good point, I was assuming that the mmio flush interface would be
>> discovered separately from the NFIT-defined memory range. Perhaps via
>> PCI in the guest? This piece of the proposal needs a bit more
>> thought...
>
> Also, in earlier discussions we agreed on a whole-device flush whenever
> the guest performs fsync on a DAX file. If we do an MMIO call for this,
> the guest CPU would be trapped until the device flush completes.
>
> Instead, if we perform an asynchronous flush, couldn't the guest CPUs
> be utilized by other tasks until the flush completes?

Yes, the interface for the guest to trigger and wait for flush requests
should be asynchronous, just like a storage "flush-cache" command.