From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rik van Riel
Subject: Re: KVM "fake DAX" flushing interface - discussion
Date: Wed, 26 Jul 2017 19:46:27 -0400
Message-ID: <1501112787.4073.49.camel@redhat.com>
References: <1455443283.33337333.1500618150787.JavaMail.zimbra@redhat.com>
 <20170724102330.GE652@quack2.suse.cz>
 <1157879323.33809400.1500897967669.JavaMail.zimbra@redhat.com>
 <20170724123752.GN652@quack2.suse.cz>
 <1888117852.34216619.1500992835767.JavaMail.zimbra@redhat.com>
 <1501016375.26846.21.camel@redhat.com>
 <1063764405.34607875.1501076841865.JavaMail.zimbra@redhat.com>
 <1501104453.26846.45.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
To: Dan Williams
Cc: Pankaj Gupta, Jan Kara, Stefan Hajnoczi, Stefan Hajnoczi, kvm-devel,
 Qemu Developers, "linux-nvdimm@lists.01.org", ross zwisler,
 Paolo Bonzini, Kevin Wolf, Nitesh Narayan Lal, xiaoguangrong eric,
 Haozhong Zhang, Ross Zwisler

On Wed, 2017-07-26 at 14:40 -0700, Dan Williams wrote:
> On Wed, Jul 26, 2017 at 2:27 PM, Rik van Riel wrote:
> > On Wed, 2017-07-26 at 09:47 -0400, Pankaj Gupta wrote:
> > >
> > > Just want to summarize here (high level):
> > >
> > > This will require implementing new 'virtio-pmem' device which
> > > presents a DAX address range (like pmem) to guest with
> > > read/write (direct access) & device flush functionality. Also,
> > > qemu should implement corresponding support for flush using
> > > virtio.
> > >
> >
> > Alternatively, the existing pmem code, with
> > a flush-only block device on the side, which
> > is somehow associated with the pmem device.
> >
> > I wonder which alternative leads to the least
> > code duplication, and the least maintenance
> > hassle going forward.
>
> I'd much prefer to have another driver. I.e. a driver that refactors
> out some common pmem details into a shared object and can attach to
> ND_DEVICE_NAMESPACE_{IO,PMEM}. A control device on the side seems
> like a recipe for confusion.

At that point, would it make sense to expose these special
virtio-pmem areas to the guest in a slightly different way,
so the regions that need virtio flushing are not bound by
the regular driver, and the regular driver can continue to
work for memory regions that are backed by actual pmem in
the host?

> With a $new_driver in hand you can just do:
>
>    modprobe $new_driver
>    echo $namespace > /sys/bus/nd/drivers/nd_pmem/unbind
>    echo $namespace > /sys/bus/nd/drivers/$new_driver/new_id
>    echo $namespace > /sys/bus/nd/drivers/$new_driver/bind
>
> ...and the guest can arrange for $new_driver to be the default, so
> you don't need to do those steps each boot of the VM, by doing:
>
>     echo "blacklist nd_pmem" > /etc/modprobe.d/virt-dax-flush.conf
>     echo "alias nd:t4* $new_driver" >> /etc/modprobe.d/virt-dax-flush.conf
>     echo "alias nd:t5* $new_driver" >> /etc/modprobe.d/virt-dax-flush.conf
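
For anyone who wants to try the quoted steps, here is a minimal sketch that only prints them, so they can be reviewed before anything touches sysfs or /etc/modprobe.d. The function names, the driver name example_pmem_flush and the namespace name namespace0.0 are placeholders assumed for illustration. The thread itself only refers to $new_driver and $namespace, and no such driver exists yet.

```shell
#!/bin/sh
# Sketch only: emits the commands and config lines quoted above as text.
# Nothing here is executed against sysfs or written under /etc.

# Print the modprobe.d fragment that would make the replacement driver
# the default. $1 is the (hypothetical) module name.
gen_virt_dax_flush_conf() {
    new_driver=$1
    if [ -z "$new_driver" ]; then
        echo "usage: gen_virt_dax_flush_conf <driver>" >&2
        return 1
    fi
    printf 'blacklist nd_pmem\n'
    printf 'alias nd:t4* %s\n' "$new_driver"
    printf 'alias nd:t5* %s\n' "$new_driver"
}

# Print the one-off rebind sequence for a single namespace.
# $1 is the namespace device name, $2 the (hypothetical) module name.
print_rebind_steps() {
    namespace=$1
    new_driver=$2
    if [ -z "$namespace" ] || [ -z "$new_driver" ]; then
        echo "usage: print_rebind_steps <namespace> <driver>" >&2
        return 1
    fi
    printf 'modprobe %s\n' "$new_driver"
    printf 'echo %s > /sys/bus/nd/drivers/nd_pmem/unbind\n' "$namespace"
    printf 'echo %s > /sys/bus/nd/drivers/%s/new_id\n' "$namespace" "$new_driver"
    printf 'echo %s > /sys/bus/nd/drivers/%s/bind\n' "$namespace" "$new_driver"
}

# Example invocation with placeholder names.
print_rebind_steps namespace0.0 example_pmem_flush
gen_virt_dax_flush_conf example_pmem_flush
```

Redirecting the output of gen_virt_dax_flush_conf to /etc/modprobe.d/virt-dax-flush.conf as root would reproduce the persistent setup described in the quoted text.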