From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Date: Thu, 29 Apr 2021 16:44:23 +0000
Subject: Re: [PATCH v4 0/3] nvdimm: Enable sync-dax property for nvdimm
Message-Id: <433e352d-5341-520c-5c57-79650277a719@linux.ibm.com>
List-Id:
References: <161966810162.652.13723419108625443430.stgit@17be908f7c1c>
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Stefan Hajnoczi, Shivaprasad G Bhat
Cc: david@gibson.dropbear.id.au, groug@kaod.org, qemu-ppc@nongnu.org,
 ehabkost@redhat.com, marcel.apfelbaum@gmail.com, mst@redhat.com,
 imammedo@redhat.com, xiaoguangrong.eric@gmail.com,
 peter.maydell@linaro.org, eblake@redhat.com, qemu-arm@nongnu.org,
 richard.henderson@linaro.org, pbonzini@redhat.com,
 haozhong.zhang@intel.com, shameerali.kolothum.thodi@huawei.com,
 kwangwoo.lee@sk.com, armbru@redhat.com, qemu-devel@nongnu.org,
 linux-nvdimm@lists.01.org, kvm-ppc@vger.kernel.org,
 shivaprasadbhat@gmail.com, bharata@linux.vnet.ibm.com

On 4/29/21 9:25 PM, Stefan Hajnoczi wrote:
> On Wed, Apr 28, 2021 at 11:48:21PM -0400, Shivaprasad G Bhat wrote:
>> The nvdimm devices are expected to ensure write persistence during power
>> failure kind of scenarios.
>>
>> The libpmem has architecture specific instructions like dcbf on POWER
>> to flush the cache data to backend nvdimm device during normal writes
>> followed by explicit flushes if the backend devices are not synchronous
>> DAX capable.
>>
>> Qemu - virtual nvdimm devices are memory mapped. The dcbf in the guest
>> and the subsequent flush doesn't traslate to actual flush to the backend
>> file on the host in case of file backed v-nvdimms. This is addressed by
>> virtio-pmem in case of x86_64 by making explicit flushes translating to
>> fsync at qemu.
>>
>> On SPAPR, the issue is addressed by adding a new hcall to
>> request for an explicit flush from the guest ndctl driver when the backend
>> nvdimm cannot ensure write persistence with dcbf alone.
>> So, the approach here is to convey when the hcall flush is required in a
>> device tree property. The guest makes the hcall when the property is found,
>> instead of relying on dcbf.
>
> Sorry, I'm not very familiar with SPAPR. Why add a hypercall when the
> virtio-nvdimm device already exists?
>

On virtualized ppc64 platforms, guests use the papr_scm.ko kernel driver
for persistent memory support. This was done so that a single kernel
driver can support persistent memory across multiple hypervisors. To
avoid supporting multiple drivers in the guest, the -device nvdimm Qemu
command-line option results in Qemu using the PAPR SCM backend. What this
patch series does is make sure we expose the correct synchronous fault
support when we back such an nvdimm device with a file.

The existing PAPR SCM backend enables persistent memory support with the
help of multiple hypercalls:

#define H_SCM_READ_METADATA     0x3E4
#define H_SCM_WRITE_METADATA    0x3E8
#define H_SCM_BIND_MEM          0x3EC
#define H_SCM_UNBIND_MEM        0x3F0
#define H_SCM_UNBIND_ALL        0x3FC

Most of them are already implemented in Qemu. This patch series
implements the H_SCM_FLUSH hypercall.

-aneesh
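
For readers unfamiliar with the flow, the guest-side decision described in
the quoted cover letter can be sketched roughly as below. This is only an
illustration under assumptions, not the papr_scm.ko code: the struct, the
enum and the function name are made up for this example, and the boolean
stands in for "the device tree property was found".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not taken from papr_scm.ko or Qemu. */
enum flush_method {
    FLUSH_CACHE_ONLY,   /* cache flush instructions such as dcbf suffice */
    FLUSH_VIA_HCALL     /* guest must also ask the hypervisor to flush   */
};

/* Assumed guest-side view of one persistent memory region. */
struct guest_pmem_region {
    /* Assumption: true when the device tree property that signals
     * "hcall flush required" was found for this region. */
    bool hcall_flush_required;
};

/* Pick how the guest makes writes to this region persistent. */
static enum flush_method choose_flush(const struct guest_pmem_region *r)
{
    assert(r != NULL);
    return r->hcall_flush_required ? FLUSH_VIA_HCALL : FLUSH_CACHE_ONLY;
}
```

With a file-backed nvdimm the property would be present, so the sketch
picks the hypercall path; without the property it stays with the cache
flush alone.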