From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 26 Apr 2018 13:13:44 -0400 (EDT)
From: Pankaj Gupta
Message-ID: <1302242642.23016855.1524762824836.JavaMail.zimbra@redhat.com>
In-Reply-To:
References: <20180425112415.12327-1-pagupta@redhat.com>
 <20180425112415.12327-3-pagupta@redhat.com>
 <20180426131517.GB30991@stefanha-x1.localdomain>
 <58645254.23011245.1524760853269.JavaMail.zimbra@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC v2 2/2] pmem: device flush over VIRTIO
To: Dan Williams
Cc: Stefan Hajnoczi, Linux Kernel Mailing List, KVM list, Qemu Developers,
 linux-nvdimm, Linux MM, Jan Kara, Stefan Hajnoczi, Rik van Riel,
 haozhong zhang, Nitesh Narayan Lal, Kevin Wolf, Paolo Bonzini,
 ross zwisler, David Hildenbrand, xiaoguangrong eric, Christoph Hellwig,
 Marcel Apfelbaum, "Michael S. Tsirkin", niteshnarayanlal@hotmail.com,
 Igor Mammedov, lcapitulino@redhat.com

> >> On Wed, Apr 25, 2018 at 04:54:14PM +0530, Pankaj Gupta wrote:
> >> > This patch adds functionality to perform a flush from the guest to
> >> > the host over VIRTIO when the 'ND_REGION_VIRTIO' flag is set on the
> >> > nd_region. The flag is set by the 'virtio-pmem' driver.
> >> >
> >> > Signed-off-by: Pankaj Gupta
> >> > ---
> >> >  drivers/nvdimm/region_devs.c | 7 +++++++
> >> >  1 file changed, 7 insertions(+)
> >> >
> >> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> >> > index a612be6..6c6454e 100644
> >> > --- a/drivers/nvdimm/region_devs.c
> >> > +++ b/drivers/nvdimm/region_devs.c
> >> > @@ -20,6 +20,7 @@
> >> >  #include
> >> >  #include "nd-core.h"
> >> >  #include "nd.h"
> >> > +#include
> >> >
> >> >  /*
> >> >   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
> >> > @@ -1074,6 +1075,12 @@ void nvdimm_flush(struct nd_region *nd_region)
> >> >  	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
> >> >  	int i, idx;
> >> >
> >> > +	/* call PV device flush */
> >> > +	if (test_bit(ND_REGION_VIRTIO, &nd_region->flags)) {
> >> > +		virtio_pmem_flush(&nd_region->dev);
> >> > +		return;
> >> > +	}
> >>
> >> How does libnvdimm know when flush has completed?
> >>
> >> Callers expect the flush to be finished when nvdimm_flush() returns but
> >> the virtio driver has only queued the request, it hasn't waited for
> >> completion!
> >
> > I tried to implement what nvdimm does right now. It just writes to the
> > flush hint address to make sure the data persists.
>
> nvdimm_flush() is currently expected to be synchronous. Currently it
> is sfence(); write to special address; sfence(). By the time the
> second sfence returns the data is flushed. So you would need to make
> this virtio flush interface synchronous as well, but that appears
> problematic because it would stop the guest for unbounded amounts of
> time. Instead, you need to rework nvdimm_flush() and the pmem driver
> to make these flush requests asynchronous and add the plumbing for
> completion callbacks via bio_endio().

o.k.
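Just to restate my understanding of the existing synchronous path in code,
something like the below (only a simplified sketch of the sequence you
describe; 'flush_hint' is an illustrative ioremapped pointer here, not the
real ndrd accessor):

#include <linux/io.h>

/*
 * Simplified sketch of the synchronous flush-hint sequence: fence, write
 * to the flush-hint address, fence.
 */
static void flush_hint_sketch(void __iomem *flush_hint)
{
        wmb();                  /* sfence on x86: order earlier stores to pmem */
        writeq(1, flush_hint);  /* write to the special (WPQ flush) address */
        wmb();                  /* second fence: the flush write has been issued */
}

and the virtio path needs to give the caller the same guarantee, only
asynchronously through the bio completion.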
>
> > I just did not want to block guest write requests till the host-side
> > fsync completes.
>
> You must complete the flush before bio_endio(), otherwise you're
> violating the expectations of the guest filesystem/block-layer.

sure!

>
> > be worse for operations on different guest files because all these
> > operations would happen ultimately on the same file at the host.
> >
> > I think with the current way we can achieve an asynchronous queueing
> > mechanism at the cost of not being 100% sure when the fsync would
> > complete, but it is assured it will happen. Also, it's an entire
> > block flush.
>
> No, again, that's broken. We need to add the plumbing for
> communicating the fsync() completion relative to the WRITE_{FLUSH,FUA}
> bio in the guest.

Sure. Thanks Dan & Stefan for the explanation and review.
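To make sure I have the direction right, the rough shape of the completion
plumbing I understand is being asked for is below. All the names
(vpmem_req, vpmem_flush_async, vpmem_flush_done) are made up for
illustration and are not from the posted patches; locking around the
virtqueue and the callback registration at probe time are omitted.

#include <linux/bio.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/virtio.h>

/* Illustrative request/response layout; not the actual virtio-pmem ABI. */
struct vpmem_resp {
        __le32 ret;
};

struct vpmem_req {
        struct vpmem_resp resp;         /* device writes its status here */
        struct bio *bio;                /* bio waiting on this flush */
};

/* Submit a flush request and return without waiting for the host. */
static int vpmem_flush_async(struct virtqueue *vq, struct bio *bio)
{
        struct vpmem_req *req;
        struct scatterlist sg, *sgs[1];
        int err;

        req = kmalloc(sizeof(*req), GFP_ATOMIC);
        if (!req)
                return -ENOMEM;
        req->bio = bio;

        sg_init_one(&sg, &req->resp, sizeof(req->resp));
        sgs[0] = &sg;

        /* One device-writable buffer; 'req' is handed back in the callback. */
        err = virtqueue_add_sgs(vq, sgs, 0, 1, req, GFP_ATOMIC);
        if (err) {
                kfree(req);
                return err;
        }
        virtqueue_kick(vq);
        return 0;
}

/* Virtqueue callback: runs after the host has completed its fsync(). */
static void vpmem_flush_done(struct virtqueue *vq)
{
        struct vpmem_req *req;
        unsigned int len;

        while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
                /* Only now is it safe to complete the guest's flush bio. */
                bio_endio(req->bio);
                kfree(req);
        }
}

So nvdimm_flush()/the pmem driver would only submit the request, and the
WRITE_{FLUSH,FUA} bio is completed from the virtqueue callback once the
host fsync() has actually finished.

Best regards,
Pankaj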