All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Williams, Dan J" <dan.j.williams@intel.com>
To: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: [GIT PULL] libnvdimm for 4.13
Date: Fri, 7 Jul 2017 00:22:38 +0000	[thread overview]
Message-ID: <1499386957.6081.29.camel@intel.com> (raw)

Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13

to receive, libnvdimm updates for the latest ACPI and UEFI
specifications. This pull request also includes new 'struct
dax_operations' enabling to undo the abuse [1] of copy_user_nocache()
for copy operations to pmem. The dax work originally missed 4.12 to
address concerns raised by Al.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html

All of the commits in this pull request have appeared in one or more
-next releases with no errors reported, however, Stephen did report a
late merge conflict between nvdimm.git and the vfs.git tree. Stephen's
merge resolution is here: http://marc.info/?l=linux-kernel&m=1499064115
07301&w=2, but to match Al's changes we appear to also need the
incremental change below.

Please pull, I believe any straggling _flushcache() feedback at this
point can be fixed up post -rc1. I include commit 0aed55af8834 "x86,
uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass
operations" at the end of this message for reference.

---

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 2f46f8d4b508..073bb1feb0d0 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -97,6 +97,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
 bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
 bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
 
 static __always_inline __must_check
@@ -151,7 +152,14 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
  * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
  * destination is flushed from the cache on return.
  */
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
+static __always_inline __must_check
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+	if (unlikely(!check_copy_size(addr, bytes, false)))
+		return bytes;
+	else
+		return _copy_from_iter_flushcache(addr, bytes, i);
+}
 #else
 static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes,
 				       struct iov_iter *i)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ee82300d98b9..0d18ede56a36 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -642,7 +642,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL(_copy_from_iter_nocache);
 
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 {
 	char *to = addr;
 	if (unlikely(i->type & ITER_PIPE)) {
@@ -660,7 +660,7 @@ size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 
 	return bytes;
 }
-EXPORT_SYMBOL_GPL(copy_from_iter_flushcache);
+EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
 #endif
 
 bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)

---

The following changes since commit 87085ff2e90ecfa91f8bb0cb0ce19ea661bd6f83:

  thermal: int340x_thermal: fix compile after the UUID API switch (2017-06-09 16:37:31 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13

for you to fetch changes up to 9d92573fff3ec70785ef1815cc80573f70e7a921:

  Merge branch 'for-4.13/dax' into libnvdimm-for-next (2017-07-03 16:54:58 -0700)

----------------------------------------------------------------
libnvdimm for 4.13

* Introduce the _flushcache() family of memory copy helpers and use them
  for persistent memory write operations on x86. The _flushcache()
  semantic indicates that the cache is either bypassed for the copy
  operation (movnt) or any lines dirtied by the copy operation are
  written back (clwb, clflushopt, or clflush).

* Extend dax_operations with ->copy_from_iter() and ->flush()
  operations. These operations and other infrastructure updates allow
  all persistent memory specific dax functionality to be pushed into
  libnvdimm and the pmem driver directly. It also allows dax-specific
  sysfs attributes to be linked to a host device, for example:
      /sys/block/pmem0/dax/write_cache

* Add support for the new NVDIMM platform/firmware mechanisms introduced
  in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace
  label format, extensions to the address-range-scrub command set, new
  error injection commands, and a new BTT (block-translation-table)
  layout. These updates support inter-OS and pre-OS compatibility.

* Fix a longstanding memory corruption bug in nfit_test.

* Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
  capable.

* Miscellaneous fixes and small updates across libnvdimm and the nfit
  driver.

Acknowledgements that came after the branch was pushed:

commit 6aa734a2f38e "libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime"
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

----------------------------------------------------------------
Arvind Yadav (1):
      acpi, nfit: constify *_attribute_group

Dan Williams (29):
      x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
      dm: add ->copy_from_iter() dax operation support
      libnvdimm, label: add v1.2 nvdimm label definitions
      libnvdimm, label: add v1.2 interleave-set-cookie algorithm
      libnvdimm, label: honor the lba size specified in v1.2 labels
      libnvdimm, label: populate the type_guid property for v1.2 namespaces
      libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces
      libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces
      libnvdimm, label: add v1.2 label checksum support
      libnvdimm, label: add address abstraction identifiers
      libnvdimm, label: switch to using v1.2 labels by default
      filesystem-dax: convert to dax_copy_from_iter()
      dax, pmem: introduce an optional 'flush' dax_operation
      dm: add ->flush() dax operation support
      filesystem-dax: convert to dax_flush()
      x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush
      x86, dax, libnvdimm: remove wb_cache_pmem() indirection
      x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm
      x86, libnvdimm, pmem: remove global pmem api
      libnvdimm, pmem: fix persistence warning
      libnvdimm, nfit: enable support for volatile ranges
      dax: remove default copy_from_iter fallback
      dax: convert to bitmask for flags
      libnvdimm, pmem, dax: export a cache control attribute
      libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
      acpi, nfit: quiet invalid block-aperture-region warnings
      libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
      libnvdimm, namespace: record 'lbasize' for pmem namespaces
      Merge branch 'for-4.13/dax' into libnvdimm-for-next

Jerry Hoemann (5):
      libnvdimm: passthru functions clear to send
      acpi, nfit: Enable DSM pass thru for root functions.
      libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
      acpi, nfit: Show bus_dsm_mask in sysfs
      libnvdimm: New ACPI 6.2 DSM functions

Toshi Kani (3):
      libnvdimm, pmem: Add sysfs notifications to badblocks
      acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2
      acpi/nfit: Issue Start ARS to retrieve existing records

Vishal Verma (4):
      libnvdimm, btt: BTT updates for UEFI 2.7 format
      libnvdimm, btt: fix btt_rw_page not returning errors
      libnvdimm: fix the clear-error check in nsio_rw_bytes
      libnvdimm, btt: convert some info messages to warn/err

Yasunori Goto (1):
      tools/testing/nvdimm: fix nfit_test buffer overflow

 MAINTAINERS                       |   4 +-
 arch/powerpc/sysdev/axonram.c     |   8 ++
 arch/x86/Kconfig                  |   1 +
 arch/x86/include/asm/pmem.h       | 136 ------------------
 arch/x86/include/asm/string_64.h  |   5 +
 arch/x86/include/asm/uaccess_64.h |  11 ++
 arch/x86/lib/usercopy_64.c        | 134 ++++++++++++++++++
 arch/x86/mm/pageattr.c            |   6 +
 drivers/acpi/nfit/core.c          | 167 ++++++++++++++++++----
 drivers/acpi/nfit/mce.c           |   2 +-
 drivers/acpi/nfit/nfit.h          |   4 +-
 drivers/block/brd.c               |   8 ++
 drivers/dax/super.c               | 118 +++++++++++++++-
 drivers/md/dm-linear.c            |  30 ++++
 drivers/md/dm-stripe.c            |  40 ++++++
 drivers/md/dm.c                   |  45 ++++++
 drivers/nvdimm/btt.c              |  45 ++++--
 drivers/nvdimm/btt.h              |   2 +
 drivers/nvdimm/btt_devs.c         |  54 +++++++-
 drivers/nvdimm/bus.c              |  15 +-
 drivers/nvdimm/claim.c            |  38 ++++-
 drivers/nvdimm/core.c             |   5 +-
 drivers/nvdimm/dax_devs.c         |  10 +-
 drivers/nvdimm/dimm_devs.c        |  10 +-
 drivers/nvdimm/label.c            | 251 +++++++++++++++++++++++++++++----
 drivers/nvdimm/label.h            |  21 ++-
 drivers/nvdimm/namespace_devs.c   | 282 ++++++++++++++++++++++++++++++++------
 drivers/nvdimm/nd-core.h          |   9 ++
 drivers/nvdimm/nd.h               |  17 ++-
 drivers/nvdimm/pfn_devs.c         |  12 +-
 drivers/nvdimm/pmem.c             |  63 ++++++++-
 drivers/nvdimm/pmem.h             |  15 ++
 drivers/nvdimm/region.c           |  17 ++-
 drivers/nvdimm/region_devs.c      |  88 ++++++++----
 drivers/s390/block/dcssblk.c      |   8 ++
 fs/dax.c                          |   9 +-
 include/linux/dax.h               |  12 ++
 include/linux/device-mapper.h     |   6 +
 include/linux/libnvdimm.h         |  11 +-
 include/linux/nd.h                |  13 ++
 include/linux/pmem.h              | 142 -------------------
 include/linux/string.h            |   6 +
 include/linux/uio.h               |  15 ++
 include/uapi/linux/ndctl.h        |  42 +++++-
 lib/Kconfig                       |   3 +
 lib/iov_iter.c                    |  22 +++
 tools/testing/nvdimm/test/nfit.c  |   2 +-
 47 files changed, 1504 insertions(+), 460 deletions(-)
 delete mode 100644 arch/x86/include/asm/pmem.h
 delete mode 100644 include/linux/pmem.h

---

commit 0aed55af88345b5d673240f90e671d79662fb01e
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Mon May 29 12:22:50 2017 -0700

    x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
    
    The pmem driver has a need to transfer data with a persistent memory
    destination and be able to rely on the fact that the destination writes are not
    cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
    (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
    to ensure data-writes have reached a power-fail-safe zone in the platform. The
    fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
    around and fence previous writes with an "sfence".
    
    Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
    memcpy_flushcache, that guarantee that the destination buffer is not dirty in
    the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
    will be used to replace the "pmem api" (include/linux/pmem.h +
    arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
    and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
    config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
    otherwise.
    
    This is meant to satisfy the concern from Linus that if a driver wants to do
    something beyond the normal nocache semantics it should be something private to
    that driver [1], and Al's concern that anything uaccess related belongs with
    the rest of the uaccess code [2].
    
    The first consumer of this interface is a new 'copy_from_iter' dax operation so
    that pmem can inject cache maintenance operations without imposing this
    overhead on other dax-capable drivers.
    
    [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
    
    Cc: <x86@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Toshi Kani <toshi.kani@hpe.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc7232a..bb273b2f50b5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,6 +54,7 @@ config X86
 	select ARCH_HAS_KCOV			if X86_64
 	select ARCH_HAS_MMIO_FLUSH
 	select ARCH_HAS_PMEM_API		if X86_64
+	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 733bae07fb29..1f22bc277c45 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -109,6 +109,11 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt)
 	return 0;
 }
 
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+#define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
+void memcpy_flushcache(void *dst, const void *src, size_t cnt);
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_64_H */
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9a472e..b16f6a1d8b26 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -171,6 +171,10 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne
 extern long __copy_user_nocache(void *dst, const void __user *src,
 				unsigned size, int zerorest);
 
+extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned size);
+extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+			   size_t len);
+
 static inline int
 __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
 				  unsigned size)
@@ -179,6 +183,13 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
 	return __copy_user_nocache(dst, src, size, 0);
 }
 
+static inline int
+__copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+	kasan_check_write(dst, size);
+	return __copy_user_flushcache(dst, src, size);
+}
+
 unsigned long
 copy_user_handle_tail(char *to, char *from, unsigned len);
 
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 3b7c40a2e3e1..f42d2fd86ca3 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -7,6 +7,7 @@
  */
 #include <linux/export.h>
 #include <linux/uaccess.h>
+#include <linux/highmem.h>
 
 /*
  * Zero Userspace
@@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len)
 	clac();
 	return len;
 }
+
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/**
+ * clean_cache_range - write back a cache range with CLWB
+ * @vaddr:	virtual start address
+ * @size:	number of bytes to write back
+ *
+ * Write back a cache range using the CLWB (cache line write back)
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
+ */
+static void clean_cache_range(void *addr, size_t size)
+{
+	u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
+	unsigned long clflush_mask = x86_clflush_size - 1;
+	void *vend = addr + size;
+	void *p;
+
+	for (p = (void *)((unsigned long)addr & ~clflush_mask);
+	     p < vend; p += x86_clflush_size)
+		clwb(p);
+}
+
+long __copy_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+	unsigned long flushed, dest = (unsigned long) dst;
+	long rc = __copy_user_nocache(dst, src, size, 0);
+
+	/*
+	 * __copy_user_nocache() uses non-temporal stores for the bulk
+	 * of the transfer, but we need to manually flush if the
+	 * transfer is unaligned. A cached memory copy is used when
+	 * destination or size is not naturally aligned. That is:
+	 *   - Require 8-byte alignment when size is 8 bytes or larger.
+	 *   - Require 4-byte alignment when size is 4 bytes.
+	 */
+	if (size < 8) {
+		if (!IS_ALIGNED(dest, 4) || size != 4)
+			clean_cache_range(dst, 1);
+	} else {
+		if (!IS_ALIGNED(dest, 8)) {
+			dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+			clean_cache_range(dst, 1);
+		}
+
+		flushed = dest - (unsigned long) dst;
+		if (size > flushed && !IS_ALIGNED(size - flushed, 8))
+			clean_cache_range(dst + size - 1, 1);
+	}
+
+	return rc;
+}
+
+void memcpy_flushcache(void *_dst, const void *_src, size_t size)
+{
+	unsigned long dest = (unsigned long) _dst;
+	unsigned long source = (unsigned long) _src;
+
+	/* cache copy and flush to align dest */
+	if (!IS_ALIGNED(dest, 8)) {
+		unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest);
+
+		memcpy((void *) dest, (void *) source, len);
+		clean_cache_range((void *) dest, len);
+		dest += len;
+		source += len;
+		size -= len;
+		if (!size)
+			return;
+	}
+
+	/* 4x8 movnti loop */
+	while (size >= 32) {
+		asm("movq    (%0), %%r8\n"
+		    "movq   8(%0), %%r9\n"
+		    "movq  16(%0), %%r10\n"
+		    "movq  24(%0), %%r11\n"
+		    "movnti  %%r8,   (%1)\n"
+		    "movnti  %%r9,  8(%1)\n"
+		    "movnti %%r10, 16(%1)\n"
+		    "movnti %%r11, 24(%1)\n"
+		    :: "r" (source), "r" (dest)
+		    : "memory", "r8", "r9", "r10", "r11");
+		dest += 32;
+		source += 32;
+		size -= 32;
+	}
+
+	/* 1x8 movnti loop */
+	while (size >= 8) {
+		asm("movq    (%0), %%r8\n"
+		    "movnti  %%r8,   (%1)\n"
+		    :: "r" (source), "r" (dest)
+		    : "memory", "r8");
+		dest += 8;
+		source += 8;
+		size -= 8;
+	}
+
+	/* 1x4 movnti loop */
+	while (size >= 4) {
+		asm("movl    (%0), %%r8d\n"
+		    "movnti  %%r8d,   (%1)\n"
+		    :: "r" (source), "r" (dest)
+		    : "memory", "r8");
+		dest += 4;
+		source += 4;
+		size -= 4;
+	}
+
+	/* cache copy for remaining bytes */
+	if (size) {
+		memcpy((void *) dest, (void *) source, size);
+		clean_cache_range((void *) dest, size);
+	}
+}
+EXPORT_SYMBOL_GPL(memcpy_flushcache);
+
+void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+		size_t len)
+{
+	char *from = kmap_atomic(page);
+
+	memcpy_flushcache(to, from + offset, len);
+	kunmap_atomic(from);
+}
+#endif
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 656acb5d7166..cbd5596e7562 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1842,8 +1842,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 		}
 
 		if (rw)
-			memcpy_to_pmem(mmio->addr.aperture + offset,
-					iobuf + copied, c);
+			memcpy_flushcache(mmio->addr.aperture + offset, iobuf + copied, c);
 		else {
 			if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH)
 				mmio_flush_range((void __force *)
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 7ceb5fa4f2a1..b8b9c8ca7862 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -277,7 +277,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 			rc = -EIO;
 	}
 
-	memcpy_to_pmem(nsio->addr + offset, buf, size);
+	memcpy_flushcache(nsio->addr + offset, buf, size);
 	nvdimm_flush(to_nd_region(ndns->dev.parent));
 
 	return rc;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c544d466ea51..2f3aefe565c6 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -29,6 +29,7 @@
 #include <linux/pfn_t.h>
 #include <linux/slab.h>
 #include <linux/pmem.h>
+#include <linux/uio.h>
 #include <linux/dax.h>
 #include <linux/nd.h>
 #include "pmem.h"
@@ -80,7 +81,7 @@ static void write_pmem(void *pmem_addr, struct page *page,
 {
 	void *mem = kmap_atomic(page);
 
-	memcpy_to_pmem(pmem_addr, mem + off, len);
+	memcpy_flushcache(pmem_addr, mem + off, len);
 	kunmap_atomic(mem);
 }
 
@@ -235,8 +236,15 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
 }
 
+static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	return copy_from_iter_flushcache(addr, bytes, i);
+}
+
 static const struct dax_operations pmem_dax_ops = {
 	.direct_access = pmem_dax_direct_access,
+	.copy_from_iter = pmem_copy_from_iter,
 };
 
 static void pmem_release_queue(void *q)
@@ -294,7 +302,8 @@ static int pmem_attach_disk(struct device *dev,
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
-	if (nvdimm_has_flush(nd_region) < 0)
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE)
+			|| nvdimm_has_flush(nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b550edf2571f..985b0e11bd73 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1015,8 +1015,8 @@ void nvdimm_flush(struct nd_region *nd_region)
 	 * The first wmb() is needed to 'sfence' all previous writes
 	 * such that they are architecturally visible for the platform
 	 * buffer flush.  Note that we've already arranged for pmem
-	 * writes to avoid the cache via arch_memcpy_to_pmem().  The
-	 * final wmb() ensures ordering for the NVDIMM flush write.
+	 * writes to avoid the cache via memcpy_flushcache().  The final
+	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
 	wmb();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 5ec1f6c47716..bbe79ed90e2b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,9 @@ struct dax_operations {
 	 */
 	long (*direct_access)(struct dax_device *, pgoff_t, long,
 			void **, pfn_t *);
+	/* copy_from_iter: dax-driver override for default copy_from_iter */
+	size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
+			struct iov_iter *);
 };
 
 #if IS_ENABLED(CONFIG_DAX)
diff --git a/include/linux/string.h b/include/linux/string.h
index 537918f8a98e..7439d83eaa33 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -122,6 +122,12 @@ static inline __must_check int memcpy_mcsafe(void *dst, const void *src,
 	return 0;
 }
 #endif
+#ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE
+static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
+{
+	memcpy(dst, src, cnt);
+}
+#endif
 void *memchr_inv(const void *s, int c, size_t n);
 char *strreplace(char *s, char old, char new);
 
diff --git a/include/linux/uio.h b/include/linux/uio.h
index f2d36a3d3005..55cd54a0e941 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -95,6 +95,21 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
 bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
 size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/*
+ * Note, users like pmem that depend on the stricter semantics of
+ * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
+ * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
+ * destination is flushed from the cache on return.
+ */
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
+#else
+static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes,
+				       struct iov_iter *i)
+{
+	return copy_from_iter_nocache(addr, bytes, i);
+}
+#endif
 bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
diff --git a/lib/Kconfig b/lib/Kconfig
index 0c8b78a9ae2e..2d1c4b3a085c 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -548,6 +548,9 @@ config ARCH_HAS_SG_CHAIN
 config ARCH_HAS_PMEM_API
 	bool
 
+config ARCH_HAS_UACCESS_FLUSHCACHE
+	bool
+
 config ARCH_HAS_MMIO_FLUSH
 	bool
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f835964c9485..c9a69064462f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -615,6 +615,28 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 }
 EXPORT_SYMBOL(copy_from_iter_nocache);
 
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+	char *to = addr;
+	if (unlikely(i->type & ITER_PIPE)) {
+		WARN_ON(1);
+		return 0;
+	}
+	iterate_and_advance(i, bytes, v,
+		__copy_from_user_flushcache((to += v.iov_len) - v.iov_len,
+					 v.iov_base, v.iov_len),
+		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len),
+		memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
+			v.iov_len)
+	)
+
+	return bytes;
+}
+EXPORT_SYMBOL_GPL(copy_from_iter_flushcache);
+#endif
+
 bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 {
 	char *to = addr;

WARNING: multiple messages have this Message-ID (diff)
From: "Williams, Dan J" <dan.j.williams@intel.com>
To: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: [GIT PULL] libnvdimm for 4.13
Date: Fri, 7 Jul 2017 00:22:38 +0000	[thread overview]
Message-ID: <1499386957.6081.29.camel@intel.com> (raw)

Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13

to receive, libnvdimm updates for the latest ACPI and UEFI
specifications. This pull request also includes new 'struct
dax_operations' enabling to undo the abuse [1] of copy_user_nocache()
for copy operations to pmem. The dax work originally missed 4.12 to
address concerns raised by Al.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html

All of the commits in this pull request have appeared in one or more
-next releases with no errors reported, however, Stephen did report a
late merge conflict between nvdimm.git and the vfs.git tree. Stephen's
merge resolution is here: http://marc.info/?l=linux-kernel&m=1499064115
07301&w=2, but to match Al's changes we appear to also need the
incremental change below.

Please pull, I believe any straggling _flushcache() feedback at this
point can be fixed up post -rc1. I include commit 0aed55af8834 "x86,
uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass
operations" at the end of this message for reference.

---

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 2f46f8d4b508..073bb1feb0d0 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -97,6 +97,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
 bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
 bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
 
 static __always_inline __must_check
@@ -151,7 +152,14 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
  * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
  * destination is flushed from the cache on return.
  */
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
+static __always_inline __must_check
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+	if (unlikely(!check_copy_size(addr, bytes, false)))
+		return bytes;
+	else
+		return _copy_from_iter_flushcache(addr, bytes, i);
+}
 #else
 static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes,
 				       struct iov_iter *i)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ee82300d98b9..0d18ede56a36 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -642,7 +642,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL(_copy_from_iter_nocache);
 
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
-size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 {
 	char *to = addr;
 	if (unlikely(i->type & ITER_PIPE)) {
@@ -660,7 +660,7 @@ size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 
 	return bytes;
 }
-EXPORT_SYMBOL_GPL(copy_from_iter_flushcache);
+EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
 #endif
 
 bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)

---

The following changes since commit 87085ff2e90ecfa91f8bb0cb0ce19ea661bd6f83:

  thermal: int340x_thermal: fix compile after the UUID API switch (2017-06-09 16:37:31 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13

for you to fetch changes up to 9d92573fff3ec70785ef1815cc80573f70e7a921:

  Merge branch 'for-4.13/dax' into libnvdimm-for-next (2017-07-03 16:54:58 -0700)

----------------------------------------------------------------
libnvdimm for 4.13

* Introduce the _flushcache() family of memory copy helpers and use them
  for persistent memory write operations on x86. The _flushcache()
  semantic indicates that the cache is either bypassed for the copy
  operation (movnt) or any lines dirtied by the copy operation are
  written back (clwb, clflushopt, or clflush).

* Extend dax_operations with ->copy_from_iter() and ->flush()
  operations. These operations and other infrastructure updates allow
  all persistent memory specific dax functionality to be pushed into
  libnvdimm and the pmem driver directly. It also allows dax-specific
  sysfs attributes to be linked to a host device, for example:
      /sys/block/pmem0/dax/write_cache

* Add support for the new NVDIMM platform/firmware mechanisms introduced
  in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace
  label format, extensions to the address-range-scrub command set, new
  error injection commands, and a new BTT (block-translation-table)
  layout. These updates support inter-OS and pre-OS compatibility.

* Fix a longstanding memory corruption bug in nfit_test.

* Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
  capable.

* Miscellaneous fixes and small updates across libnvdimm and the nfit
  driver.

Acknowledgements that came after the branch was pushed:

commit 6aa734a2f38e "libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime"
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

----------------------------------------------------------------
Arvind Yadav (1):
      acpi, nfit: constify *_attribute_group

Dan Williams (29):
      x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
      dm: add ->copy_from_iter() dax operation support
      libnvdimm, label: add v1.2 nvdimm label definitions
      libnvdimm, label: add v1.2 interleave-set-cookie algorithm
      libnvdimm, label: honor the lba size specified in v1.2 labels
      libnvdimm, label: populate the type_guid property for v1.2 namespaces
      libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces
      libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces
      libnvdimm, label: add v1.2 label checksum support
      libnvdimm, label: add address abstraction identifiers
      libnvdimm, label: switch to using v1.2 labels by default
      filesystem-dax: convert to dax_copy_from_iter()
      dax, pmem: introduce an optional 'flush' dax_operation
      dm: add ->flush() dax operation support
      filesystem-dax: convert to dax_flush()
      x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush
      x86, dax, libnvdimm: remove wb_cache_pmem() indirection
      x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm
      x86, libnvdimm, pmem: remove global pmem api
      libnvdimm, pmem: fix persistence warning
      libnvdimm, nfit: enable support for volatile ranges
      dax: remove default copy_from_iter fallback
      dax: convert to bitmask for flags
      libnvdimm, pmem, dax: export a cache control attribute
      libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
      acpi, nfit: quiet invalid block-aperture-region warnings
      libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
      libnvdimm, namespace: record 'lbasize' for pmem namespaces
      Merge branch 'for-4.13/dax' into libnvdimm-for-next

Jerry Hoemann (5):
      libnvdimm: passthru functions clear to send
      acpi, nfit: Enable DSM pass thru for root functions.
      libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
      acpi, nfit: Show bus_dsm_mask in sysfs
      libnvdimm: New ACPI 6.2 DSM functions

Toshi Kani (3):
      libnvdimm, pmem: Add sysfs notifications to badblocks
      acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2
      acpi/nfit: Issue Start ARS to retrieve existing records

Vishal Verma (4):
      libnvdimm, btt: BTT updates for UEFI 2.7 format
      libnvdimm, btt: fix btt_rw_page not returning errors
      libnvdimm: fix the clear-error check in nsio_rw_bytes
      libnvdimm, btt: convert some info messages to warn/err

Yasunori Goto (1):
      tools/testing/nvdimm: fix nfit_test buffer overflow

 MAINTAINERS                       |   4 +-
 arch/powerpc/sysdev/axonram.c     |   8 ++
 arch/x86/Kconfig                  |   1 +
 arch/x86/include/asm/pmem.h       | 136 ------------------
 arch/x86/include/asm/string_64.h  |   5 +
 arch/x86/include/asm/uaccess_64.h |  11 ++
 arch/x86/lib/usercopy_64.c        | 134 ++++++++++++++++++
 arch/x86/mm/pageattr.c            |   6 +
 drivers/acpi/nfit/core.c          | 167 ++++++++++++++++++----
 drivers/acpi/nfit/mce.c           |   2 +-
 drivers/acpi/nfit/nfit.h          |   4 +-
 drivers/block/brd.c               |   8 ++
 drivers/dax/super.c               | 118 +++++++++++++++-
 drivers/md/dm-linear.c            |  30 ++++
 drivers/md/dm-stripe.c            |  40 ++++++
 drivers/md/dm.c                   |  45 ++++++
 drivers/nvdimm/btt.c              |  45 ++++--
 drivers/nvdimm/btt.h              |   2 +
 drivers/nvdimm/btt_devs.c         |  54 +++++++-
 drivers/nvdimm/bus.c              |  15 +-
 drivers/nvdimm/claim.c            |  38 ++++-
 drivers/nvdimm/core.c             |   5 +-
 drivers/nvdimm/dax_devs.c         |  10 +-
 drivers/nvdimm/dimm_devs.c        |  10 +-
 drivers/nvdimm/label.c            | 251 +++++++++++++++++++++++++++++----
 drivers/nvdimm/label.h            |  21 ++-
 drivers/nvdimm/namespace_devs.c   | 282 ++++++++++++++++++++++++++++++++------
 drivers/nvdimm/nd-core.h          |   9 ++
 drivers/nvdimm/nd.h               |  17 ++-
 drivers/nvdimm/pfn_devs.c         |  12 +-
 drivers/nvdimm/pmem.c             |  63 ++++++++-
 drivers/nvdimm/pmem.h             |  15 ++
 drivers/nvdimm/region.c           |  17 ++-
 drivers/nvdimm/region_devs.c      |  88 ++++++++----
 drivers/s390/block/dcssblk.c      |   8 ++
 fs/dax.c                          |   9 +-
 include/linux/dax.h               |  12 ++
 include/linux/device-mapper.h     |   6 +
 include/linux/libnvdimm.h         |  11 +-
 include/linux/nd.h                |  13 ++
 include/linux/pmem.h              | 142 -------------------
 include/linux/string.h            |   6 +
 include/linux/uio.h               |  15 ++
 include/uapi/linux/ndctl.h        |  42 +++++-
 lib/Kconfig                       |   3 +
 lib/iov_iter.c                    |  22 +++
 tools/testing/nvdimm/test/nfit.c  |   2 +-
 47 files changed, 1504 insertions(+), 460 deletions(-)
 delete mode 100644 arch/x86/include/asm/pmem.h
 delete mode 100644 include/linux/pmem.h

---

commit 0aed55af88345b5d673240f90e671d79662fb01e
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Mon May 29 12:22:50 2017 -0700

    x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
    
    The pmem driver has a need to transfer data with a persistent memory
    destination and be able to rely on the fact that the destination writes are not
    cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
    (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
    to ensure data-writes have reached a power-fail-safe zone in the platform. The
    fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
    around and fence previous writes with an "sfence".
    
    Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
    memcpy_flushcache, that guarantee that the destination buffer is not dirty in
    the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
    will be used to replace the "pmem api" (include/linux/pmem.h +
    arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
    and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
    config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
    otherwise.
    
    This is meant to satisfy the concern from Linus that if a driver wants to do
    something beyond the normal nocache semantics it should be something private to
    that driver [1], and Al's concern that anything uaccess related belongs with
    the rest of the uaccess code [2].
    
    The first consumer of this interface is a new 'copy_from_iter' dax operation so
    that pmem can inject cache maintenance operations without imposing this
    overhead on other dax-capable drivers.
    
    [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
    
    Cc: <x86@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Toshi Kani <toshi.kani@hpe.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc7232a..bb273b2f50b5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,6 +54,7 @@ config X86
 	select ARCH_HAS_KCOV			if X86_64
 	select ARCH_HAS_MMIO_FLUSH
 	select ARCH_HAS_PMEM_API		if X86_64
+	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 733bae07fb29..1f22bc277c45 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -109,6 +109,11 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt)
 	return 0;
 }
 
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+#define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
+void memcpy_flushcache(void *dst, const void *src, size_t cnt);
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_64_H */
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c5504b9a472e..b16f6a1d8b26 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -171,6 +171,10 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne
 extern long __copy_user_nocache(void *dst, const void __user *src,
 				unsigned size, int zerorest);
 
+extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned size);
+extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+			   size_t len);
+
 static inline int
 __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
 				  unsigned size)
@@ -179,6 +183,13 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src,
 	return __copy_user_nocache(dst, src, size, 0);
 }
 
+static inline int
+__copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+	kasan_check_write(dst, size);
+	return __copy_user_flushcache(dst, src, size);
+}
+
 unsigned long
 copy_user_handle_tail(char *to, char *from, unsigned len);
 
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 3b7c40a2e3e1..f42d2fd86ca3 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -7,6 +7,7 @@
  */
 #include <linux/export.h>
 #include <linux/uaccess.h>
+#include <linux/highmem.h>
 
 /*
  * Zero Userspace
@@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len)
 	clac();
 	return len;
 }
+
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/**
+ * clean_cache_range - write back a cache range with CLWB
+ * @vaddr:	virtual start address
+ * @size:	number of bytes to write back
+ *
+ * Write back a cache range using the CLWB (cache line write back)
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
+ */
+static void clean_cache_range(void *addr, size_t size)
+{
+	u16 x86_clflush_size = boot_cpu_data.x86_clflush_size;
+	unsigned long clflush_mask = x86_clflush_size - 1;
+	void *vend = addr + size;
+	void *p;
+
+	for (p = (void *)((unsigned long)addr & ~clflush_mask);
+	     p < vend; p += x86_clflush_size)
+		clwb(p);
+}
+
+long __copy_user_flushcache(void *dst, const void __user *src, unsigned size)
+{
+	unsigned long flushed, dest = (unsigned long) dst;
+	long rc = __copy_user_nocache(dst, src, size, 0);
+
+	/*
+	 * __copy_user_nocache() uses non-temporal stores for the bulk
+	 * of the transfer, but we need to manually flush if the
+	 * transfer is unaligned. A cached memory copy is used when
+	 * destination or size is not naturally aligned. That is:
+	 *   - Require 8-byte alignment when size is 8 bytes or larger.
+	 *   - Require 4-byte alignment when size is 4 bytes.
+	 */
+	if (size < 8) {
+		if (!IS_ALIGNED(dest, 4) || size != 4)
+			clean_cache_range(dst, 1);
+	} else {
+		if (!IS_ALIGNED(dest, 8)) {
+			dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+			clean_cache_range(dst, 1);
+		}
+
+		flushed = dest - (unsigned long) dst;
+		if (size > flushed && !IS_ALIGNED(size - flushed, 8))
+			clean_cache_range(dst + size - 1, 1);
+	}
+
+	return rc;
+}
+
+void memcpy_flushcache(void *_dst, const void *_src, size_t size)
+{
+	unsigned long dest = (unsigned long) _dst;
+	unsigned long source = (unsigned long) _src;
+
+	/* cache copy and flush to align dest */
+	if (!IS_ALIGNED(dest, 8)) {
+		unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest);
+
+		memcpy((void *) dest, (void *) source, len);
+		clean_cache_range((void *) dest, len);
+		dest += len;
+		source += len;
+		size -= len;
+		if (!size)
+			return;
+	}
+
+	/* 4x8 movnti loop */
+	while (size >= 32) {
+		asm("movq    (%0), %%r8\n"
+		    "movq   8(%0), %%r9\n"
+		    "movq  16(%0), %%r10\n"
+		    "movq  24(%0), %%r11\n"
+		    "movnti  %%r8,   (%1)\n"
+		    "movnti  %%r9,  8(%1)\n"
+		    "movnti %%r10, 16(%1)\n"
+		    "movnti %%r11, 24(%1)\n"
+		    :: "r" (source), "r" (dest)
+		    : "memory", "r8", "r9", "r10", "r11");
+		dest += 32;
+		source += 32;
+		size -= 32;
+	}
+
+	/* 1x8 movnti loop */
+	while (size >= 8) {
+		asm("movq    (%0), %%r8\n"
+		    "movnti  %%r8,   (%1)\n"
+		    :: "r" (source), "r" (dest)
+		    : "memory", "r8");
+		dest += 8;
+		source += 8;
+		size -= 8;
+	}
+
+	/* 1x4 movnti loop */
+	while (size >= 4) {
+		asm("movl    (%0), %%r8d\n"
+		    "movnti  %%r8d,   (%1)\n"
+		    :: "r" (source), "r" (dest)
+		    : "memory", "r8");
+		dest += 4;
+		source += 4;
+		size -= 4;
+	}
+
+	/* cache copy for remaining bytes */
+	if (size) {
+		memcpy((void *) dest, (void *) source, size);
+		clean_cache_range((void *) dest, size);
+	}
+}
+EXPORT_SYMBOL_GPL(memcpy_flushcache);
+
+void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
+		size_t len)
+{
+	char *from = kmap_atomic(page);
+
+	memcpy_flushcache(to, from + offset, len);
+	kunmap_atomic(from);
+}
+#endif
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 656acb5d7166..cbd5596e7562 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1842,8 +1842,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 		}
 
 		if (rw)
-			memcpy_to_pmem(mmio->addr.aperture + offset,
-					iobuf + copied, c);
+			memcpy_flushcache(mmio->addr.aperture + offset, iobuf + copied, c);
 		else {
 			if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH)
 				mmio_flush_range((void __force *)
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 7ceb5fa4f2a1..b8b9c8ca7862 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -277,7 +277,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 			rc = -EIO;
 	}
 
-	memcpy_to_pmem(nsio->addr + offset, buf, size);
+	memcpy_flushcache(nsio->addr + offset, buf, size);
 	nvdimm_flush(to_nd_region(ndns->dev.parent));
 
 	return rc;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c544d466ea51..2f3aefe565c6 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -29,6 +29,7 @@
 #include <linux/pfn_t.h>
 #include <linux/slab.h>
 #include <linux/pmem.h>
+#include <linux/uio.h>
 #include <linux/dax.h>
 #include <linux/nd.h>
 #include "pmem.h"
@@ -80,7 +81,7 @@ static void write_pmem(void *pmem_addr, struct page *page,
 {
 	void *mem = kmap_atomic(page);
 
-	memcpy_to_pmem(pmem_addr, mem + off, len);
+	memcpy_flushcache(pmem_addr, mem + off, len);
 	kunmap_atomic(mem);
 }
 
@@ -235,8 +236,15 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
 }
 
+static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	return copy_from_iter_flushcache(addr, bytes, i);
+}
+
 static const struct dax_operations pmem_dax_ops = {
 	.direct_access = pmem_dax_direct_access,
+	.copy_from_iter = pmem_copy_from_iter,
 };
 
 static void pmem_release_queue(void *q)
@@ -294,7 +302,8 @@ static int pmem_attach_disk(struct device *dev,
 	dev_set_drvdata(dev, pmem);
 	pmem->phys_addr = res->start;
 	pmem->size = resource_size(res);
-	if (nvdimm_has_flush(nd_region) < 0)
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE)
+			|| nvdimm_has_flush(nd_region) < 0)
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b550edf2571f..985b0e11bd73 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1015,8 +1015,8 @@ void nvdimm_flush(struct nd_region *nd_region)
 	 * The first wmb() is needed to 'sfence' all previous writes
 	 * such that they are architecturally visible for the platform
 	 * buffer flush.  Note that we've already arranged for pmem
-	 * writes to avoid the cache via arch_memcpy_to_pmem().  The
-	 * final wmb() ensures ordering for the NVDIMM flush write.
+	 * writes to avoid the cache via memcpy_flushcache().  The final
+	 * wmb() ensures ordering for the NVDIMM flush write.
 	 */
 	wmb();
 	for (i = 0; i < nd_region->ndr_mappings; i++)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 5ec1f6c47716..bbe79ed90e2b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,9 @@ struct dax_operations {
 	 */
 	long (*direct_access)(struct dax_device *, pgoff_t, long,
 			void **, pfn_t *);
+	/* copy_from_iter: dax-driver override for default copy_from_iter */
+	size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
+			struct iov_iter *);
 };
 
 #if IS_ENABLED(CONFIG_DAX)
diff --git a/include/linux/string.h b/include/linux/string.h
index 537918f8a98e..7439d83eaa33 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -122,6 +122,12 @@ static inline __must_check int memcpy_mcsafe(void *dst, const void *src,
 	return 0;
 }
 #endif
+#ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE
+static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
+{
+	memcpy(dst, src, cnt);
+}
+#endif
 void *memchr_inv(const void *s, int c, size_t n);
 char *strreplace(char *s, char old, char new);
 
diff --git a/include/linux/uio.h b/include/linux/uio.h
index f2d36a3d3005..55cd54a0e941 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -95,6 +95,21 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
 bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
 size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+/*
+ * Note, users like pmem that depend on the stricter semantics of
+ * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
+ * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
+ * destination is flushed from the cache on return.
+ */
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
+#else
+static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes,
+				       struct iov_iter *i)
+{
+	return copy_from_iter_nocache(addr, bytes, i);
+}
+#endif
 bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
diff --git a/lib/Kconfig b/lib/Kconfig
index 0c8b78a9ae2e..2d1c4b3a085c 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -548,6 +548,9 @@ config ARCH_HAS_SG_CHAIN
 config ARCH_HAS_PMEM_API
 	bool
 
+config ARCH_HAS_UACCESS_FLUSHCACHE
+	bool
+
 config ARCH_HAS_MMIO_FLUSH
 	bool
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f835964c9485..c9a69064462f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -615,6 +615,28 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 }
 EXPORT_SYMBOL(copy_from_iter_nocache);
 
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+	char *to = addr;
+	if (unlikely(i->type & ITER_PIPE)) {
+		WARN_ON(1);
+		return 0;
+	}
+	iterate_and_advance(i, bytes, v,
+		__copy_from_user_flushcache((to += v.iov_len) - v.iov_len,
+					 v.iov_base, v.iov_len),
+		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len),
+		memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
+			v.iov_len)
+	)
+
+	return bytes;
+}
+EXPORT_SYMBOL_GPL(copy_from_iter_flushcache);
+#endif
+
 bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 {
 	char *to = addr;
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

             reply	other threads:[~2017-07-07  0:22 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-07  0:22 Williams, Dan J [this message]
2017-07-07  0:22 ` [GIT PULL] libnvdimm for 4.13 Williams, Dan J
2017-07-07  0:34 ` Al Viro
2017-07-07  0:34   ` Al Viro
2017-07-07  0:34   ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1499386957.6081.29.camel@intel.com \
    --to=dan.j.williams@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.