From: "Williams, Dan J" <dan.j.williams@intel.com>
To: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"x86@kernel.org" <x86@kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: [GIT PULL] libnvdimm for 4.13
Date: Fri, 7 Jul 2017 00:22:38 +0000
Message-ID: <1499386957.6081.29.camel@intel.com>
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13
to receive libnvdimm updates for the latest ACPI and UEFI
specifications. This pull request also includes new 'struct
dax_operations' callbacks that make it possible to undo the abuse [1] of
copy_user_nocache() for copy operations to pmem. The dax work originally
missed 4.12 while concerns raised by Al were addressed.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
All of the commits in this pull request have appeared in one or more
-next releases with no errors reported. However, Stephen did report a
late merge conflict between the nvdimm.git and vfs.git trees. Stephen's
merge resolution is here:
http://marc.info/?l=linux-kernel&m=149906411507301&w=2
but to match Al's changes we appear to also need the incremental
change below.
Please pull. I believe any straggling _flushcache() feedback at this
point can be fixed up post -rc1. I include commit 0aed55af8834 "x86,
uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass
operations" at the end of this message for reference.
---
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 2f46f8d4b508..073bb1feb0d0 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -97,6 +97,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
static __always_inline __must_check
@@ -151,7 +152,14 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
* IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
* destination is flushed from the cache on return.
*/
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
+static __always_inline __must_check
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ if (unlikely(!check_copy_size(addr, bytes, false)))
+ return 0;
+ else
+ return _copy_from_iter_flushcache(addr, bytes, i);
+}
#else
static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes,
struct iov_iter *i)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ee82300d98b9..0d18ede56a36 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -642,7 +642,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
EXPORT_SYMBOL(_copy_from_iter_nocache);
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(i->type & ITER_PIPE)) {
@@ -660,7 +660,7 @@ size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
return bytes;
}
-EXPORT_SYMBOL_GPL(copy_from_iter_flushcache);
+EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
#endif
bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
---
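The incremental change above follows the pattern used by the other copy_from_iter variants: a small __always_inline wrapper validates the destination with check_copy_size() and only then calls the out-of-line, underscore-prefixed worker. The user-space sketch below illustrates that split under simplifying assumptions: demo_check_copy_size(), _demo_copy() and demo_copy() are hypothetical names, and an explicit capacity argument stands in for the object-size checks the kernel helper performs. In this sketch a failed check reports zero bytes copied, so the caller sees a short copy rather than a false success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for check_copy_size(): refuse a copy that would
 * overrun a destination whose capacity is known. */
static bool demo_check_copy_size(size_t dst_capacity, size_t bytes)
{
	return bytes <= dst_capacity;
}

/* Out-of-line worker, playing the role of _copy_from_iter_flushcache(). */
static size_t _demo_copy(void *dst, const void *src, size_t bytes)
{
	memcpy(dst, src, bytes);
	return bytes;
}

/* Inline wrapper, playing the role of copy_from_iter_flushcache():
 * validate first, then delegate; report 0 bytes copied on failure. */
static inline size_t demo_copy(void *dst, size_t dst_capacity,
			       const void *src, size_t bytes)
{
	if (!demo_check_copy_size(dst_capacity, bytes))
		return 0;
	return _demo_copy(dst, src, bytes);
}
```

Keeping the check in the inline wrapper lets it see the caller's destination at compile time, while the bulk of the copy logic stays in one exported function.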
The following changes since commit 87085ff2e90ecfa91f8bb0cb0ce19ea661bd6f83:
thermal: int340x_thermal: fix compile after the UUID API switch (2017-06-09 16:37:31 +0200)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13
for you to fetch changes up to 9d92573fff3ec70785ef1815cc80573f70e7a921:
Merge branch 'for-4.13/dax' into libnvdimm-for-next (2017-07-03 16:54:58 -0700)
----------------------------------------------------------------
libnvdimm for 4.13
* Introduce the _flushcache() family of memory copy helpers and use them
for persistent memory write operations on x86. The _flushcache()
semantic indicates that the cache is either bypassed for the copy
operation (movnt) or any lines dirtied by the copy operation are
written back (clwb, clflushopt, or clflush).
* Extend dax_operations with ->copy_from_iter() and ->flush()
operations. These operations, along with other infrastructure updates,
allow all persistent-memory-specific dax functionality to be pushed
directly into libnvdimm and the pmem driver. They also allow
dax-specific sysfs attributes to be linked to a host device, for
example:
/sys/block/pmem0/dax/write_cache
* Add support for the new NVDIMM platform/firmware mechanisms introduced
in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace
label format, extensions to the address-range-scrub command set, new
error injection commands, and a new BTT (block-translation-table)
layout. These updates support inter-OS and pre-OS compatibility.
* Fix a longstanding memory corruption bug in nfit_test.
* Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
capable.
* Miscellaneous fixes and small updates across libnvdimm and the nfit
driver.
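The write-back half of the _flushcache() semantic comes down to visiting every cache line that a cached copy may have dirtied. The clean_cache_range() helper in commit 0aed55af8834 rounds the start address down to the cache-line size and walks forward until it passes the end of the range. The sketch below reproduces only that address arithmetic in portable C, assuming a 64-byte line in the usage shown; it records line addresses instead of executing clwb, and demo_lines_to_clean() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the start address of every cache line touched by
 * [addr, addr + size), mirroring the loop in clean_cache_range().
 * line_size must be a power of two. Returns the number of lines
 * written into out[] (at most max). */
static size_t demo_lines_to_clean(uintptr_t addr, size_t size,
				  uintptr_t line_size,
				  uintptr_t *out, size_t max)
{
	uintptr_t mask = line_size - 1;
	uintptr_t end = addr + size;
	size_t n = 0;

	for (uintptr_t p = addr & ~mask; p < end && n < max; p += line_size)
		out[n++] = p;	/* the kernel issues clwb(p) here */
	return n;
}
```

For example, a 0x80-byte range starting at 0x1003 spans three 64-byte lines (0x1000, 0x1040 and 0x1080), even though 0x80 is exactly two lines' worth of bytes.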
Acknowledgements that came after the branch was pushed:
commit 6aa734a2f38e "libnvdimm, region, pmem: fix 'badblocks'
sysfs_get_dirent() reference lifetime"
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
----------------------------------------------------------------
Arvind Yadav (1):
acpi, nfit: constify *_attribute_group
Dan Williams (29):
x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
dm: add ->copy_from_iter() dax operation support
libnvdimm, label: add v1.2 nvdimm label definitions
libnvdimm, label: add v1.2 interleave-set-cookie algorithm
libnvdimm, label: honor the lba size specified in v1.2 labels
libnvdimm, label: populate the type_guid property for v1.2 namespaces
libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces
libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces
libnvdimm, label: add v1.2 label checksum support
libnvdimm, label: add address abstraction identifiers
libnvdimm, label: switch to using v1.2 labels by default
filesystem-dax: convert to dax_copy_from_iter()
dax, pmem: introduce an optional 'flush' dax_operation
dm: add ->flush() dax operation support
filesystem-dax: convert to dax_flush()
x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush
x86, dax, libnvdimm: remove wb_cache_pmem() indirection
x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm
x86, libnvdimm, pmem: remove global pmem api
libnvdimm, pmem: fix persistence warning
libnvdimm, nfit: enable support for volatile ranges
dax: remove default copy_from_iter fallback
dax: convert to bitmask for flags
libnvdimm, pmem, dax: export a cache control attribute
libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
acpi, nfit: quiet invalid block-aperture-region warnings
libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
libnvdimm, namespace: record 'lbasize' for pmem namespaces
Merge branch 'for-4.13/dax' into libnvdimm-for-next
Jerry Hoemann (5):
libnvdimm: passthru functions clear to send
acpi, nfit: Enable DSM pass thru for root functions.
libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
acpi, nfit: Show bus_dsm_mask in sysfs
libnvdimm: New ACPI 6.2 DSM functions
Toshi Kani (3):
libnvdimm, pmem: Add sysfs notifications to badblocks
acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2
acpi/nfit: Issue Start ARS to retrieve existing records
Vishal Verma (4):
libnvdimm, btt: BTT updates for UEFI 2.7 format
libnvdimm, btt: fix btt_rw_page not returning errors
libnvdimm: fix the clear-error check in nsio_rw_bytes
libnvdimm, btt: convert some info messages to warn/err
Yasunori Goto (1):
tools/testing/nvdimm: fix nfit_test buffer overflow
MAINTAINERS | 4 +-
arch/powerpc/sysdev/axonram.c | 8 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/pmem.h | 136 ------------------
arch/x86/include/asm/string_64.h | 5 +
arch/x86/include/asm/uaccess_64.h | 11 ++
arch/x86/lib/usercopy_64.c | 134 ++++++++++++++++++
arch/x86/mm/pageattr.c | 6 +
drivers/acpi/nfit/core.c | 167 ++++++++++++++++++----
drivers/acpi/nfit/mce.c | 2 +-
drivers/acpi/nfit/nfit.h | 4 +-
drivers/block/brd.c | 8 ++
drivers/dax/super.c | 118 +++++++++++++++-
drivers/md/dm-linear.c | 30 ++++
drivers/md/dm-stripe.c | 40 ++++++
drivers/md/dm.c | 45 ++++++
drivers/nvdimm/btt.c | 45 ++++--
drivers/nvdimm/btt.h | 2 +
drivers/nvdimm/btt_devs.c | 54 +++++++-
drivers/nvdimm/bus.c | 15 +-
drivers/nvdimm/claim.c | 38 ++++-
drivers/nvdimm/core.c | 5 +-
drivers/nvdimm/dax_devs.c | 10 +-
drivers/nvdimm/dimm_devs.c | 10 +-
drivers/nvdimm/label.c | 251 +++++++++++++++++++++++++++++----
drivers/nvdimm/label.h | 21 ++-
drivers/nvdimm/namespace_devs.c | 282 ++++++++++++++++++++++++++++++++------
drivers/nvdimm/nd-core.h | 9 ++
drivers/nvdimm/nd.h | 17 ++-
drivers/nvdimm/pfn_devs.c | 12 +-
drivers/nvdimm/pmem.c | 63 ++++++++-
drivers/nvdimm/pmem.h | 15 ++
drivers/nvdimm/region.c | 17 ++-
drivers/nvdimm/region_devs.c | 88 ++++++++----
drivers/s390/block/dcssblk.c | 8 ++
fs/dax.c | 9 +-
include/linux/dax.h | 12 ++
include/linux/device-mapper.h | 6 +
include/linux/libnvdimm.h | 11 +-
include/linux/nd.h | 13 ++
include/linux/pmem.h | 142 -------------------
include/linux/string.h | 6 +
include/linux/uio.h | 15 ++
include/uapi/linux/ndctl.h | 42 +++++-
lib/Kconfig | 3 +
lib/iov_iter.c | 22 +++
tools/testing/nvdimm/test/nfit.c | 2 +-
47 files changed, 1504 insertions(+), 460 deletions(-)
delete mode 100644 arch/x86/include/asm/pmem.h
delete mode 100644 include/linux/pmem.h
---
commit 0aed55af88345b5d673240f90e671d79662fb01e
Author: Dan Williams <dan.j.williams@intel.com>
Date: Mon May 29 12:22:50 2017 -0700
x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".
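The store-then-fence ordering described above can be sketched in user space with compiler intrinsics. This is an illustration under stated assumptions, not the kernel code: on x86-64 it uses _mm_stream_si64() (the movnti instruction) followed by _mm_sfence(); on other architectures it falls back to a plain memcpy() plus a full fence, which orders the stores but does not bypass the cache. demo_nt_copy() is a hypothetical name, and whether the data reaches a power-fail-safe domain still depends on the platform; the fence only orders the stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Copy nwords 8-byte words from src to dst, then fence so the stores
 * are ordered before anything issued afterwards. */
static void demo_nt_copy(long long *dst, const long long *src, size_t nwords)
{
#if defined(__x86_64__)
	for (size_t i = 0; i < nwords; i++)
		_mm_stream_si64(&dst[i], src[i]);	/* movnti: non-temporal store */
	_mm_sfence();					/* order the weakly-ordered NT stores */
#else
	memcpy(dst, src, nwords * sizeof(*dst));	/* cached fallback */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```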
Implement __copy_from_user_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, which guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol; otherwise they fall back to copy_from_iter_nocache() and plain
memcpy(), respectively.
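The gating described above relies on a common kernel idiom: the architecture advertises its implementation with a macro (__HAVE_ARCH_MEMCPY_FLUSHCACHE in this patch), and a generic header supplies a plain memcpy() fallback when the macro is absent. Because the fallback silently loses the cache-flush guarantee, a caller such as pmem also checks the config symbol before relying on it. The sketch below models both halves in user space; DEMO_HAVE_ARCH_FLUSHCACHE, demo_memcpy_flushcache() and demo_writes_are_persistent() are hypothetical names, and the macro is deliberately left undefined so the fallback path is compiled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* An architecture port would define DEMO_HAVE_ARCH_FLUSHCACHE and provide
 * its own demo_memcpy_flushcache(); here it is left undefined. */
#ifndef DEMO_HAVE_ARCH_FLUSHCACHE
/* Generic fallback: the data is copied, but with no cache-flush guarantee. */
static inline void demo_memcpy_flushcache(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}
#define DEMO_FLUSHCACHE_ENABLED 0
#else
#define DEMO_FLUSHCACHE_ENABLED 1
#endif

/* Caller-side check, analogous to pmem_attach_disk() testing
 * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before trusting the copy. */
static bool demo_writes_are_persistent(void)
{
	return DEMO_FLUSHCACHE_ENABLED;
}
```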
This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].
The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.
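The 'copy_from_iter' dax operation is an optional per-driver override in struct dax_operations: a driver such as pmem installs its own routine, and other dax-capable drivers are left alone. The user-space sketch below models that dispatch. struct demo_dax_ops, demo_pmem_copy() and demo_dax_copy() are hypothetical, and the plain-copy default is an assumption of the sketch; the kernel's own dispatch may differ (the shortlog above includes a later commit, "dax: remove default copy_from_iter fallback").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced model of a per-driver operations table. */
struct demo_dax_ops {
	/* Optional override; NULL means "use the default copy". */
	size_t (*copy_from_iter)(void *dst, const void *src, size_t bytes);
};

static int demo_override_calls;	/* counts calls to the driver override */

/* Driver override, standing in for pmem_copy_from_iter(). */
static size_t demo_pmem_copy(void *dst, const void *src, size_t bytes)
{
	demo_override_calls++;
	memcpy(dst, src, bytes);	/* the real driver uses copy_from_iter_flushcache() */
	return bytes;
}

/* Dispatch: prefer the driver's routine, else fall back to a plain copy. */
static size_t demo_dax_copy(const struct demo_dax_ops *ops, void *dst,
			    const void *src, size_t bytes)
{
	if (ops && ops->copy_from_iter)
		return ops->copy_from_iter(dst, src, bytes);
	memcpy(dst, src, bytes);
	return bytes;
}

static const struct demo_dax_ops demo_pmem_ops = {
	.copy_from_iter = demo_pmem_copy,
};
static const struct demo_dax_ops demo_plain_ops = { .copy_from_iter = NULL };
```

Putting the override in the ops table keeps the cache-maintenance cost inside the one driver that needs it.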
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc7232a..bb273b2f50b5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,6 +54,7 @@ config X86
select ARCH_HAS_KCOV if X86_64
select ARCH_HAS_MMIO_FLUSH
select ARCH_HAS_PMEM_API if X86_64
+ select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 733bae07fb29..1f22bc277c45 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -109,6 +109,11 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt)
return 0;
}
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+#define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
+void memcpy_flushcache(void *dst, const void *src, size_t cnt);
+#endif
+
#endif /* __KERNEL__ */
#endif /* _ASM_X86_STRING_64_H */
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9a472e..b16f6a1d8b26 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -171,6 +171,10 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne
extern long __copy_user_nocache(void *dst, const void __user *src,
unsigned size, int zerorest);
+extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned size);
+extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+ size_t len);
+
static inline int
__copy_from_user_inatomic_nocache(void *dst, const void __user *src,
unsigned size)
@@ -179,6 +183,13 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
return __copy_user_nocache(dst, src, size, 0);
}
+static inline int
+__copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+ kasan_check_write(dst, size);
+ return __copy_user_flushcache(dst, src, size);
+}
+
unsigned long
copy_user_handle_tail(char *to, char *from, unsigned len);
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 3b7c40a2e3e1..f42d2fd86ca3 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -7,6 +7,7 @@
*/
#include <linux/export.h>
#include <linux/uaccess.h>
+#include <linux/highmem.h>
/*
* Zero Userspace
@@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len)
clac();
return len;
}
+
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/**
+ * clean_cache_range - write back a cache range with CLWB
+ * @addr: virtual start address
+ * @size: number of bytes to write back
+ *
+ * Write back a cache range using the CLWB (cache line write back)
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
+ */
+static void clean_cache_range(void *addr, size_t size)
+{
+ u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
+ unsigned long clflush_mask = x86_clflush_size - 1;
+ void *vend = addr + size;
+ void *p;
+
+ for (p = (void *)((unsigned long)addr & ~clflush_mask);
+ p < vend; p += x86_clflush_size)
+ clwb(p);
+}
+
+long __copy_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+ unsigned long flushed, dest = (unsigned long) dst;
+ long rc = __copy_user_nocache(dst, src, size, 0);
+
+ /*
+ * __copy_user_nocache() uses non-temporal stores for the bulk
+ * of the transfer, but we need to manually flush if the
+ * transfer is unaligned. A cached memory copy is used when
+ * destination or size is not naturally aligned. That is:
+ * - Require 8-byte alignment when size is 8 bytes or larger.
+ * - Require 4-byte alignment when size is 4 bytes.
+ */
+ if (size < 8) {
+ if (!IS_ALIGNED(dest, 4) || size != 4)
+ clean_cache_range(dst, 1);
+ } else {
+ if (!IS_ALIGNED(dest, 8)) {
+ dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+ clean_cache_range(dst, 1);
+ }
+
+ flushed = dest - (unsigned long) dst;
+ if (size > flushed && !IS_ALIGNED(size - flushed, 8))
+ clean_cache_range(dst + size - 1, 1);
+ }
+
+ return rc;
+}
+
+void memcpy_flushcache(void *_dst, const void *_src, size_t size)
+{
+ unsigned long dest = (unsigned long) _dst;
+ unsigned long source = (unsigned long) _src;
+
+ /* cache copy and flush to align dest */
+ if (!IS_ALIGNED(dest, 8)) {
+ unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest);
+
+ memcpy((void *) dest, (void *) source, len);
+ clean_cache_range((void *) dest, len);
+ dest += len;
+ source += len;
+ size -= len;
+ if (!size)
+ return;
+ }
+
+ /* 4x8 movnti loop */
+ while (size >= 32) {
+ asm("movq (%0), %%r8\n"
+ "movq 8(%0), %%r9\n"
+ "movq 16(%0), %%r10\n"
+ "movq 24(%0), %%r11\n"
+ "movnti %%r8, (%1)\n"
+ "movnti %%r9, 8(%1)\n"
+ "movnti %%r10, 16(%1)\n"
+ "movnti %%r11, 24(%1)\n"
+ :: "r" (source), "r" (dest)
+ : "memory", "r8", "r9", "r10", "r11");
+ dest += 32;
+ source += 32;
+ size -= 32;
+ }
+
+ /* 1x8 movnti loop */
+ while (size >= 8) {
+ asm("movq (%0), %%r8\n"
+ "movnti %%r8, (%1)\n"
+ :: "r" (source), "r" (dest)
+ : "memory", "r8");
+ dest += 8;
+ source += 8;
+ size -= 8;
+ }
+
+ /* 1x4 movnti loop */
+ while (size >= 4) {
+ asm("movl (%0), %%r8d\n"
+ "movnti %%r8d, (%1)\n"
+ :: "r" (source), "r" (dest)
+ : "memory", "r8");
+ dest += 4;
+ source += 4;
+ size -= 4;
+ }
+
+ /* cache copy for remaining bytes */
+ if (size) {
+ memcpy((void *) dest, (void *) source, size);
+ clean_cache_range((void *) dest, size);
+ }
+}
+EXPORT_SYMBOL_GPL(memcpy_flushcache);
+
+void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+ size_t len)
+{
+ char *from = kmap_atomic(page);
+
+ memcpy_flushcache(to, from + offset, len);
+ kunmap_atomic(from);
+}
+#endif
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 656acb5d7166..cbd5596e7562 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1842,8 +1842,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
}
if (rw)
- memcpy_to_pmem(mmio->addr.aperture + offset,
- iobuf + copied, c);
+ memcpy_flushcache(mmio->addr.aperture + offset, iobuf + copied, c);
else {
if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH)
mmio_flush_range((void __force *)
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 7ceb5fa4f2a1..b8b9c8ca7862 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -277,7 +277,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
rc = -EIO;
}
- memcpy_to_pmem(nsio->addr + offset, buf, size);
+ memcpy_flushcache(nsio->addr + offset, buf, size);
nvdimm_flush(to_nd_region(ndns->dev.parent));
return rc;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c544d466ea51..2f3aefe565c6 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -29,6 +29,7 @@
#include <linux/pfn_t.h>
#include <linux/slab.h>
#include <linux/pmem.h>
+#include <linux/uio.h>
#include <linux/dax.h>
#include <linux/nd.h>
#include "pmem.h"
@@ -80,7 +81,7 @@ static void write_pmem(void *pmem_addr, struct page *page,
{
void *mem = kmap_atomic(page);
- memcpy_to_pmem(pmem_addr, mem + off, len);
+ memcpy_flushcache(pmem_addr, mem + off, len);
kunmap_atomic(mem);
}
@@ -235,8 +236,15 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
}
+static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
+ void *addr, size_t bytes, struct iov_iter *i)
+{
+ return copy_from_iter_flushcache(addr, bytes, i);
+}
+
static const struct dax_operations pmem_dax_ops = {
.direct_access = pmem_dax_direct_access,
+ .copy_from_iter = pmem_copy_from_iter,
};
static void pmem_release_queue(void *q)
@@ -294,7 +302,8 @@ static int pmem_attach_disk(struct device *dev,
dev_set_drvdata(dev, pmem);
pmem->phys_addr = res->start;
pmem->size = resource_size(res);
- if (nvdimm_has_flush(nd_region) < 0)
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE)
+ || nvdimm_has_flush(nd_region) < 0)
dev_warn(dev, "unable to guarantee persistence of writes\n");
if (!devm_request_mem_region(dev, res->start, resource_size(res),
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b550edf2571f..985b0e11bd73 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1015,8 +1015,8 @@ void nvdimm_flush(struct nd_region *nd_region)
* The first wmb() is needed to 'sfence' all previous writes
* such that they are architecturally visible for the platform
* buffer flush. Note that we've already arranged for pmem
- * writes to avoid the cache via arch_memcpy_to_pmem(). The
- * final wmb() ensures ordering for the NVDIMM flush write.
+ * writes to avoid the cache via memcpy_flushcache(). The final
+ * wmb() ensures ordering for the NVDIMM flush write.
*/
wmb();
for (i = 0; i < nd_region->ndr_mappings; i++)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 5ec1f6c47716..bbe79ed90e2b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,9 @@ struct dax_operations {
*/
long (*direct_access)(struct dax_device *, pgoff_t, long,
void **, pfn_t *);
+ /* copy_from_iter: dax-driver override for default copy_from_iter */
+ size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
+ struct iov_iter *);
};
#if IS_ENABLED(CONFIG_DAX)
diff --git a/include/linux/string.h b/include/linux/string.h
index 537918f8a98e..7439d83eaa33 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -122,6 +122,12 @@ static inline __must_check int memcpy_mcsafe(void *dst, const void *src,
return 0;
}
#endif
+#ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE
+static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
+{
+ memcpy(dst, src, cnt);
+}
+#endif
void *memchr_inv(const void *s, int c, size_t n);
char *strreplace(char *s, char old, char new);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index f2d36a3d3005..55cd54a0e941 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -95,6 +95,21 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/*
+ * Note, users like pmem that depend on the stricter semantics of
+ * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
+ * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
+ * destination is flushed from the cache on return.
+ */
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
+#else
+static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes,
+ struct iov_iter *i)
+{
+ return copy_from_iter_nocache(addr, bytes, i);
+}
+#endif
bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
size_t iov_iter_zero(size_t bytes, struct iov_iter *);
unsigned long iov_iter_alignment(const struct iov_iter *i);
diff --git a/lib/Kconfig b/lib/Kconfig
index 0c8b78a9ae2e..2d1c4b3a085c 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -548,6 +548,9 @@ config ARCH_HAS_SG_CHAIN
config ARCH_HAS_PMEM_API
bool
+config ARCH_HAS_UACCESS_FLUSHCACHE
+ bool
+
config ARCH_HAS_MMIO_FLUSH
bool
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f835964c9485..c9a69064462f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -615,6 +615,28 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
}
EXPORT_SYMBOL(copy_from_iter_nocache);
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ if (unlikely(i->type & ITER_PIPE)) {
+ WARN_ON(1);
+ return 0;
+ }
+ iterate_and_advance(i, bytes, v,
+ __copy_from_user_flushcache((to += v.iov_len) - v.iov_len,
+ v.iov_base, v.iov_len),
+ memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+ v.bv_offset, v.bv_len),
+ memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
+ v.iov_len)
+ )
+
+ return bytes;
+}
+EXPORT_SYMBOL_GPL(copy_from_iter_flushcache);
+#endif
+
bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
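memcpy_flushcache() in the patch above splits each copy into a cached head that brings the destination to 8-byte alignment, then 32-byte, 8-byte and 4-byte movnti chunks, and finally a cached tail; only the head and the tail need clean_cache_range(). The portable sketch below reproduces just that bookkeeping, so the split can be checked for a given destination address and size. struct demo_plan and demo_plan_flushcache() are hypothetical names; no data is copied and no cache instructions are issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How many bytes memcpy_flushcache() sends down each path. */
struct demo_plan {
	size_t head;	/* cached copy + write-back to reach 8-byte alignment */
	size_t nt32;	/* bytes moved by the 4x8 movnti loop */
	size_t nt8;	/* bytes moved by the 1x8 movnti loop */
	size_t nt4;	/* bytes moved by the 1x4 movnti loop */
	size_t tail;	/* cached copy + write-back for the remainder */
};

static struct demo_plan demo_plan_flushcache(uintptr_t dest, size_t size)
{
	struct demo_plan p = { 0 };

	if (dest & 7) {					/* !IS_ALIGNED(dest, 8) */
		size_t to_align = 8 - (dest & 7);	/* ALIGN(dest, 8) - dest */
		p.head = size < to_align ? size : to_align;
		size -= p.head;
	}
	p.nt32 = size / 32 * 32;
	size -= p.nt32;
	p.nt8 = size / 8 * 8;
	size -= p.nt8;
	p.nt4 = size / 4 * 4;
	size -= p.nt4;
	p.tail = size;
	return p;
}
```

For a 100-byte copy to 0x1003, this gives a 5-byte cached head, 64 bytes through the 32-byte loop, 24 bytes through the 8-byte loop, one 4-byte store, and a 3-byte cached tail.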
drivers/nvdimm/nd-core.h | 9 ++
drivers/nvdimm/nd.h | 17 ++-
drivers/nvdimm/pfn_devs.c | 12 +-
drivers/nvdimm/pmem.c | 63 ++++++++-
drivers/nvdimm/pmem.h | 15 ++
drivers/nvdimm/region.c | 17 ++-
drivers/nvdimm/region_devs.c | 88 ++++++++----
drivers/s390/block/dcssblk.c | 8 ++
fs/dax.c | 9 +-
include/linux/dax.h | 12 ++
include/linux/device-mapper.h | 6 +
include/linux/libnvdimm.h | 11 +-
include/linux/nd.h | 13 ++
include/linux/pmem.h | 142 -------------------
include/linux/string.h | 6 +
include/linux/uio.h | 15 ++
include/uapi/linux/ndctl.h | 42 +++++-
lib/Kconfig | 3 +
lib/iov_iter.c | 22 +++
tools/testing/nvdimm/test/nfit.c | 2 +-
47 files changed, 1504 insertions(+), 460 deletions(-)
delete mode 100644 arch/x86/include/asm/pmem.h
delete mode 100644 include/linux/pmem.h
---
commit 0aed55af88345b5d673240f90e671d79662fb01e
Author: Dan Williams <dan.j.williams@intel.com>
Date: Mon May 29 12:22:50 2017 -0700
x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
The pmem driver needs to transfer data to a persistent memory destination
and rely on the destination writes not being cached. It is sufficient for
the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver, which will turn
around and fence previous writes with an "sfence".
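From userspace, the contract described above is an ordinary write followed by fsync(2). The copy itself may still sit in volatile buffers. The fsync() is the durability point, and it is what reaches the driver as a flush request. The sketch below is a minimal illustration. The helper name is hypothetical. The target can be any file the caller chooses, for example one on a filesystem backed by a pmem device.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset 0 and succeed only once
 * fsync() has succeeded. Returns 0 on success or -errno on failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;
	int rc = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -errno;
	while (done < len) {
		ssize_t n = pwrite(fd, p + done, len - done, (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			rc = -errno;
			goto out;
		}
		done += (size_t)n;
	}
	/* Until fsync() returns success the data is not guaranteed durable. */
	if (fsync(fd) < 0)
		rc = -errno;
out:
	if (close(fd) < 0 && rc == 0)
		rc = -errno;
	return rc;
}
```

Checking the fsync() return value matters here. A failed flush is the only signal that the data may not have reached the power-fail-safe domain.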
Implement __copy_from_user_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, which guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol; without it they fall back to copy_from_iter_nocache() and plain
memcpy() respectively.
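The head/body/tail structure of memcpy_flushcache() can be modeled in userspace with SSE2 intrinsics. This is a minimal sketch that assumes x86-64 and GCC or Clang, and the function name is hypothetical. The sketch makes three substitutions relative to the patch:

- _mm_stream_si64() stands in for the movnti loop.
- _mm_clflush() stands in for clean_cache_range()'s CLWB. CLFLUSH exists on every x86-64 CPU while CLWB does not, and unlike CLWB it also invalidates the line.
- The sub-8-byte tail uses a cached copy plus flush instead of the patch's 4-byte movnti step.

```c
#include <emmintrin.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64-byte cache lines; the kernel reads x86_clflush_size. */
#define CACHELINE 64

/* Flush every cache line overlapping [addr, addr + len). */
static void flush_range(void *addr, size_t len)
{
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	for (; p < end; p += CACHELINE)
		_mm_clflush((const void *)p);
}

void memcpy_flushcache_sketch(void *dst, const void *src, size_t size)
{
	char *d = dst;
	const char *s = src;

	/* Head: cached copy up to the first 8-byte aligned destination. */
	size_t head = (8 - ((uintptr_t)d & 7)) & 7;
	if (head > size)
		head = size;
	if (head) {
		memcpy(d, s, head);
		flush_range(d, head);
		d += head;
		s += head;
		size -= head;
	}

	/* Body: 8-byte non-temporal stores that bypass the cache. */
	while (size >= 8) {
		long long v;

		memcpy(&v, s, sizeof(v)); /* source may be unaligned */
		_mm_stream_si64((long long *)d, v);
		d += 8;
		s += 8;
		size -= 8;
	}

	/* Tail: cached copy plus explicit flush for the remaining bytes. */
	if (size) {
		memcpy(d, s, size);
		flush_range(d, size);
	}

	/* Order the non-temporal stores. The kernel instead leaves this
	 * fence to the wmb() in nvdimm_flush(). */
	_mm_sfence();
}
```

Only the unaligned edges go through the cache. The bulk of the transfer never dirties a cache line, so nothing needs to be written back later.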
This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics, it should be something private
to that driver [1], and Al's concern that anything uaccess-related belongs with
the rest of the uaccess code [2].
The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.
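The 'copy_from_iter' dax operation is an optional per-driver hook. Pmem supplies a flushing implementation, while a driver that leaves the pointer NULL gets a plain copy. The sketch below is a minimal userspace model of that dispatch. All struct and function names are hypothetical, and a flat buffer replaces struct iov_iter. A later commit in this pull, "dax: remove default copy_from_iter fallback", drops the default path, so the final code does not keep this fallback.

```c
#include <stddef.h>
#include <string.h>

struct dax_dev_model;

/* Hypothetical model of struct dax_operations; a flat source buffer
 * stands in for struct iov_iter. */
struct dax_ops_model {
	size_t (*copy_from_iter)(struct dax_dev_model *dev, void *addr,
				 const void *src, size_t bytes);
};

struct dax_dev_model {
	const struct dax_ops_model *ops;
	int flushing_copies; /* how often the driver override ran */
};

/* Stand-in for pmem_copy_from_iter(): the real driver calls
 * copy_from_iter_flushcache(); this model only copies and counts. */
static size_t pmem_like_copy(struct dax_dev_model *dev, void *addr,
			     const void *src, size_t bytes)
{
	memcpy(addr, src, bytes);
	dev->flushing_copies++;
	return bytes;
}

const struct dax_ops_model pmem_ops_model = {
	.copy_from_iter = pmem_like_copy,
};

/* A driver with no cache-maintenance needs leaves the hook unset. */
const struct dax_ops_model plain_ops_model = {
	.copy_from_iter = NULL,
};

/* Dispatch: use the driver override if present, else a plain copy, so
 * only drivers that need cache maintenance pay for it. */
size_t dax_copy_from_iter_model(struct dax_dev_model *dev, void *addr,
				const void *src, size_t bytes)
{
	if (dev->ops && dev->ops->copy_from_iter)
		return dev->ops->copy_from_iter(dev, addr, src, bytes);
	memcpy(addr, src, bytes);
	return bytes;
}
```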
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc7232a..bb273b2f50b5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,6 +54,7 @@ config X86
select ARCH_HAS_KCOV if X86_64
select ARCH_HAS_MMIO_FLUSH
select ARCH_HAS_PMEM_API if X86_64
+ select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 733bae07fb29..1f22bc277c45 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -109,6 +109,11 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt)
return 0;
}
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+#define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
+void memcpy_flushcache(void *dst, const void *src, size_t cnt);
+#endif
+
#endif /* __KERNEL__ */
#endif /* _ASM_X86_STRING_64_H */
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9a472e..b16f6a1d8b26 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -171,6 +171,10 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne
extern long __copy_user_nocache(void *dst, const void __user *src,
unsigned size, int zerorest);
+extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned size);
+extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+ size_t len);
+
static inline int
__copy_from_user_inatomic_nocache(void *dst, const void __user *src,
unsigned size)
@@ -179,6 +183,13 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
return __copy_user_nocache(dst, src, size, 0);
}
+static inline int
+__copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+ kasan_check_write(dst, size);
+ return __copy_user_flushcache(dst, src, size);
+}
+
unsigned long
copy_user_handle_tail(char *to, char *from, unsigned len);
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 3b7c40a2e3e1..f42d2fd86ca3 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -7,6 +7,7 @@
*/
#include <linux/export.h>
#include <linux/uaccess.h>
+#include <linux/highmem.h>
/*
* Zero Userspace
@@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len)
clac();
return len;
}
+
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/**
+ * clean_cache_range - write back a cache range with CLWB
+ * @addr: virtual start address
+ * @size: number of bytes to write back
+ *
+ * Write back a cache range using the CLWB (cache line write back)
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
+ */
+static void clean_cache_range(void *addr, size_t size)
+{
+ u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
+ unsigned long clflush_mask = x86_clflush_size - 1;
+ void *vend = addr + size;
+ void *p;
+
+ for (p = (void *)((unsigned long)addr & ~clflush_mask);
+ p < vend; p += x86_clflush_size)
+ clwb(p);
+}
+
+long __copy_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+ unsigned long flushed, dest = (unsigned long) dst;
+ long rc = __copy_user_nocache(dst, src, size, 0);
+
+ /*
+ * __copy_user_nocache() uses non-temporal stores for the bulk
+ * of the transfer, but we need to manually flush if the
+ * transfer is unaligned. A cached memory copy is used when
+ * destination or size is not naturally aligned. That is:
+ * - Require 8-byte alignment when size is 8 bytes or larger.
+ * - Require 4-byte alignment when size is 4 bytes.
+ */
+ if (size < 8) {
+ if (!IS_ALIGNED(dest, 4) || size != 4)
+ clean_cache_range(dst, 1);
+ } else {
+ if (!IS_ALIGNED(dest, 8)) {
+ dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+ clean_cache_range(dst, 1);
+ }
+
+ flushed = dest - (unsigned long) dst;
+ if (size > flushed && !IS_ALIGNED(size - flushed, 8))
+ clean_cache_range(dst + size - 1, 1);
+ }
+
+ return rc;
+}
+
+void memcpy_flushcache(void *_dst, const void *_src, size_t size)
+{
+ unsigned long dest = (unsigned long) _dst;
+ unsigned long source = (unsigned long) _src;
+
+ /* cache copy and flush to align dest */
+ if (!IS_ALIGNED(dest, 8)) {
+ unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest);
+
+ memcpy((void *) dest, (void *) source, len);
+ clean_cache_range((void *) dest, len);
+ dest += len;
+ source += len;
+ size -= len;
+ if (!size)
+ return;
+ }
+
+ /* 4x8 movnti loop */
+ while (size >= 32) {
+ asm("movq (%0), %%r8\n"
+ "movq 8(%0), %%r9\n"
+ "movq 16(%0), %%r10\n"
+ "movq 24(%0), %%r11\n"
+ "movnti %%r8, (%1)\n"
+ "movnti %%r9, 8(%1)\n"
+ "movnti %%r10, 16(%1)\n"
+ "movnti %%r11, 24(%1)\n"
+ :: "r" (source), "r" (dest)
+ : "memory", "r8", "r9", "r10", "r11");
+ dest += 32;
+ source += 32;
+ size -= 32;
+ }
+
+ /* 1x8 movnti loop */
+ while (size >= 8) {
+ asm("movq (%0), %%r8\n"
+ "movnti %%r8, (%1)\n"
+ :: "r" (source), "r" (dest)
+ : "memory", "r8");
+ dest += 8;
+ source += 8;
+ size -= 8;
+ }
+
+ /* 1x4 movnti loop */
+ while (size >= 4) {
+ asm("movl (%0), %%r8d\n"
+ "movnti %%r8d, (%1)\n"
+ :: "r" (source), "r" (dest)
+ : "memory", "r8");
+ dest += 4;
+ source += 4;
+ size -= 4;
+ }
+
+ /* cache copy for remaining bytes */
+ if (size) {
+ memcpy((void *) dest, (void *) source, size);
+ clean_cache_range((void *) dest, size);
+ }
+}
+EXPORT_SYMBOL_GPL(memcpy_flushcache);
+
+void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+ size_t len)
+{
+ char *from = kmap_atomic(page);
+
+ memcpy_flushcache(to, from + offset, len);
+ kunmap_atomic(from);
+}
+#endif
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 656acb5d7166..cbd5596e7562 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1842,8 +1842,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
}
if (rw)
- memcpy_to_pmem(mmio->addr.aperture + offset,
- iobuf + copied, c);
+ memcpy_flushcache(mmio->addr.aperture + offset, iobuf + copied, c);
else {
if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH)
mmio_flush_range((void __force *)
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 7ceb5fa4f2a1..b8b9c8ca7862 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -277,7 +277,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
rc = -EIO;
}
- memcpy_to_pmem(nsio->addr + offset, buf, size);
+ memcpy_flushcache(nsio->addr + offset, buf, size);
nvdimm_flush(to_nd_region(ndns->dev.parent));
return rc;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c544d466ea51..2f3aefe565c6 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -29,6 +29,7 @@
#include <linux/pfn_t.h>
#include <linux/slab.h>
#include <linux/pmem.h>
+#include <linux/uio.h>
#include <linux/dax.h>
#include <linux/nd.h>
#include "pmem.h"
@@ -80,7 +81,7 @@ static void write_pmem(void *pmem_addr, struct page *page,
{
void *mem = kmap_atomic(page);
- memcpy_to_pmem(pmem_addr, mem + off, len);
+ memcpy_flushcache(pmem_addr, mem + off, len);
kunmap_atomic(mem);
}
@@ -235,8 +236,15 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
}
+static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
+ void *addr, size_t bytes, struct iov_iter *i)
+{
+ return copy_from_iter_flushcache(addr, bytes, i);
+}
+
static const struct dax_operations pmem_dax_ops = {
.direct_access = pmem_dax_direct_access,
+ .copy_from_iter = pmem_copy_from_iter,
};
static void pmem_release_queue(void *q)
@@ -294,7 +302,8 @@ static int pmem_attach_disk(struct device *dev,
dev_set_drvdata(dev, pmem);
pmem->phys_addr = res->start;
pmem->size = resource_size(res);
- if (nvdimm_has_flush(nd_region) < 0)
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE)
+ || nvdimm_has_flush(nd_region) < 0)
dev_warn(dev, "unable to guarantee persistence of writes\n");
if (!devm_request_mem_region(dev, res->start, resource_size(res),
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b550edf2571f..985b0e11bd73 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1015,8 +1015,8 @@ void nvdimm_flush(struct nd_region *nd_region)
* The first wmb() is needed to 'sfence' all previous writes
* such that they are architecturally visible for the platform
* buffer flush. Note that we've already arranged for pmem
- * writes to avoid the cache via arch_memcpy_to_pmem(). The
- * final wmb() ensures ordering for the NVDIMM flush write.
+ * writes to avoid the cache via memcpy_flushcache(). The final
+ * wmb() ensures ordering for the NVDIMM flush write.
*/
wmb();
for (i = 0; i < nd_region->ndr_mappings; i++)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 5ec1f6c47716..bbe79ed90e2b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,9 @@ struct dax_operations {
*/
long (*direct_access)(struct dax_device *, pgoff_t, long,
void **, pfn_t *);
+ /* copy_from_iter: dax-driver override for default copy_from_iter */
+ size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
+ struct iov_iter *);
};
#if IS_ENABLED(CONFIG_DAX)
diff --git a/include/linux/string.h b/include/linux/string.h
index 537918f8a98e..7439d83eaa33 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -122,6 +122,12 @@ static inline __must_check int memcpy_mcsafe(void *dst, const void *src,
return 0;
}
#endif
+#ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE
+static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
+{
+ memcpy(dst, src, cnt);
+}
+#endif
void *memchr_inv(const void *s, int c, size_t n);
char *strreplace(char *s, char old, char new);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index f2d36a3d3005..55cd54a0e941 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -95,6 +95,21 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/*
+ * Note, users like pmem that depend on the stricter semantics of
+ * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
+ * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
+ * destination is flushed from the cache on return.
+ */
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
+#else
+static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes,
+ struct iov_iter *i)
+{
+ return copy_from_iter_nocache(addr, bytes, i);
+}
+#endif
bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
size_t iov_iter_zero(size_t bytes, struct iov_iter *);
unsigned long iov_iter_alignment(const struct iov_iter *i);
diff --git a/lib/Kconfig b/lib/Kconfig
index 0c8b78a9ae2e..2d1c4b3a085c 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -548,6 +548,9 @@ config ARCH_HAS_SG_CHAIN
config ARCH_HAS_PMEM_API
bool
+config ARCH_HAS_UACCESS_FLUSHCACHE
+ bool
+
config ARCH_HAS_MMIO_FLUSH
bool
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f835964c9485..c9a69064462f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -615,6 +615,28 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
}
EXPORT_SYMBOL(copy_from_iter_nocache);
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ if (unlikely(i->type & ITER_PIPE)) {
+ WARN_ON(1);
+ return 0;
+ }
+ iterate_and_advance(i, bytes, v,
+ __copy_from_user_flushcache((to += v.iov_len) - v.iov_len,
+ v.iov_base, v.iov_len),
+ memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+ v.bv_offset, v.bv_len),
+ memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
+ v.iov_len)
+ )
+
+ return bytes;
+}
+EXPORT_SYMBOL_GPL(copy_from_iter_flushcache);
+#endif
+
bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm