* [RFC PATCH] x86, uaccess, pmem: introduce copy_from_iter_writethru for dax + pmem [not found] <20170425012230.GX29622@ZenIV.linux.org.uk> @ 2017-04-26 21:56 ` Dan Williams 2017-04-27 6:30 ` Ingo Molnar 0 siblings, 1 reply; 16+ messages in thread From: Dan Williams @ 2017-04-26 21:56 UTC (permalink / raw) To: viro Cc: Jan Kara, Matthew Wilcox, x86, linux-kernel, hch, linux-block, linux-nvdimm, Jeff Moyer, Ingo Molnar, H. Peter Anvin, linux-fsdevel, Thomas Gleixner, Ross Zwisler The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_writethru, memcpy_page_writethru, and memcpy_writethru, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_writethru and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_writethru() and memcpy_writethru() are gated by the CONFIG_ARCH_HAS_UACCESS_WRITETHRU config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- This patch is based on a merge of vfs.git/for-next and nvdimm.git/libnvdimm-for-next. 
arch/x86/Kconfig | 1 arch/x86/include/asm/string_64.h | 5 + arch/x86/include/asm/uaccess_64.h | 13 ++++ arch/x86/lib/usercopy_64.c | 128 +++++++++++++++++++++++++++++++++++++ drivers/acpi/nfit/core.c | 2 - drivers/nvdimm/claim.c | 2 - drivers/nvdimm/pmem.c | 13 +++- drivers/nvdimm/region_devs.c | 2 - include/linux/dax.h | 3 + include/linux/string.h | 6 ++ include/linux/uio.h | 15 ++++ lib/Kconfig | 3 + lib/iov_iter.c | 22 ++++++ 13 files changed, 210 insertions(+), 5 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1d50fdff77ee..bd3ff407d707 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -54,6 +54,7 @@ config X86 select ARCH_HAS_KCOV if X86_64 select ARCH_HAS_MMIO_FLUSH select ARCH_HAS_PMEM_API if X86_64 + select ARCH_HAS_UACCESS_WRITETHRU if X86_64 select ARCH_HAS_SET_MEMORY select ARCH_HAS_SG_CHAIN select ARCH_HAS_STRICT_KERNEL_RWX diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h index 733bae07fb29..60173bc51603 100644 --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -109,6 +109,11 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt) return 0; } +#ifdef CONFIG_ARCH_HAS_UACCESS_WRITETHRU +#define __HAVE_ARCH_MEMCPY_WRITETHRU 1 +void memcpy_writethru(void *dst, const void *src, size_t cnt); +#endif + #endif /* __KERNEL__ */ #endif /* _ASM_X86_STRING_64_H */ diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index c5504b9a472e..748e8a50e4b3 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -171,6 +171,11 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne extern long __copy_user_nocache(void *dst, const void __user *src, unsigned size, int zerorest); +extern long __copy_user_writethru(void *dst, const void __user *src, + unsigned size); +extern void memcpy_page_writethru(char *to, struct page *page, size_t offset, + size_t len); + static inline int __copy_from_user_inatomic_nocache(void *dst, const void __user *src, unsigned size) @@ -179,6 +184,14 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src, return __copy_user_nocache(dst, src, size, 0); } +static inline int +__copy_from_user_inatomic_writethru(void *dst, const void __user *src, + unsigned size) +{ + kasan_check_write(dst, size); + return __copy_user_writethru(dst, src, size); +} + unsigned long copy_user_handle_tail(char *to, char *from, unsigned len); diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index 3b7c40a2e3e1..144cb5e59193 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -7,6 +7,7 @@ */ #include <linux/export.h> #include <linux/uaccess.h> +#include <linux/highmem.h> /* * Zero Userspace @@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len) clac(); return len; } + +#ifdef CONFIG_ARCH_HAS_UACCESS_WRITETHRU +/** + * clean_cache_range - write back a cache range with CLWB + * @vaddr: virtual start address + * @size: number of bytes to write back + * + * Write back a cache range using the CLWB (cache line write back) + * instruction. Note that @size is internally rounded up to be cache + * line size aligned. 
+ */ +static void clean_cache_range(void *addr, size_t size) +{ + u16 x86_clflush_size = boot_cpu_data.x86_clflush_size; + unsigned long clflush_mask = x86_clflush_size - 1; + void *vend = addr + size; + void *p; + + for (p = (void *)((unsigned long)addr & ~clflush_mask); + p < vend; p += x86_clflush_size) + clwb(p); +} + +long __copy_user_writethru(void *dst, const void __user *src, unsigned size) +{ + unsigned long flushed, dest = (unsigned long) dst; + long rc = __copy_user_nocache(dst, src, size, 0); + + /* + * __copy_user_nocache() uses non-temporal stores for the bulk + * of the transfer, but we need to manually flush if the + * transfer is unaligned. A cached memory copy is used when + * destination or size is not naturally aligned. That is: + * - Require 8-byte alignment when size is 8 bytes or larger. + * - Require 4-byte alignment when size is 4 bytes. + */ + if (size < 8) { + if (!IS_ALIGNED(dest, 4) || size != 4) + clean_cache_range(dst, 1); + } else { + if (!IS_ALIGNED(dest, 8)) { + dest = ALIGN(dest, boot_cpu_data.x86_clflush_size); + clean_cache_range(dst, 1); + } + + flushed = dest - (unsigned long) dst; + if (size > flushed && !IS_ALIGNED(size - flushed, 8)) + clean_cache_range(dst + size - 1, 1); + } + + return rc; +} + +void memcpy_writethru(void *_dst, const void *_src, size_t size) +{ + unsigned long dest = (unsigned long) _dst; + unsigned long source = (unsigned long) _src; + + /* cache copy and flush to align dest */ + if (!IS_ALIGNED(dest, 8)) { + unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest); + + memcpy((void *) dest, (void *) source, len); + clean_cache_range((void *) dest, len); + dest += len; + source += len; + size -= len; + if (!size) + return; + } + + /* 4x8 movnti loop */ + while (size >= 32) { + asm("movq (%0), %%r8\n" + "movq 8(%0), %%r9\n" + "movq 16(%0), %%r10\n" + "movq 24(%0), %%r11\n" + "movnti %%r8, (%1)\n" + "movnti %%r9, 8(%1)\n" + "movnti %%r10, 16(%1)\n" + "movnti %%r11, 24(%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8", "r9", "r10", "r11"); + dest += 32; + source += 32; + size -= 32; + } + + /* 1x8 movnti loop */ + while (size >= 8) { + asm("movq (%0), %%r8\n" + "movnti %%r8, (%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8"); + dest += 8; + source += 8; + size -= 8; + } + + /* 1x4 movnti loop */ + while (size >= 4) { + asm("movl (%0), %%r8d\n" + "movnti %%r8d, (%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8"); + dest += 4; + source += 4; + size -= 4; + } + + /* cache copy for remaining bytes */ + if (size) { + memcpy((void *) dest, (void *) source, size); + clean_cache_range((void *) dest, size); + } +} +EXPORT_SYMBOL_GPL(memcpy_writethru); + +void memcpy_page_writethru(char *to, struct page *page, size_t offset, + size_t len) +{ + char *from = kmap_atomic(page); + + memcpy_writethru(to, from + offset, len); + kunmap_atomic(from); +} +#endif diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index d0c07b2344e4..c84e242f91ed 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -1776,7 +1776,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk, } if (rw) - memcpy_to_pmem(mmio->addr.aperture + offset, + memcpy_writethru(mmio->addr.aperture + offset, iobuf + copied, c); else { if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH) diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index 3a35e8028b9c..38822f6fa49f 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -266,7 +266,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, rc = 
-EIO; } - memcpy_to_pmem(nsio->addr + offset, buf, size); + memcpy_writethru(nsio->addr + offset, buf, size); nvdimm_flush(to_nd_region(ndns->dev.parent)); return rc; diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 3b3dab73d741..28dc82a595a5 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -28,6 +28,7 @@ #include <linux/pfn_t.h> #include <linux/slab.h> #include <linux/pmem.h> +#include <linux/uio.h> #include <linux/dax.h> #include <linux/nd.h> #include "pmem.h" @@ -79,7 +80,7 @@ static void write_pmem(void *pmem_addr, struct page *page, { void *mem = kmap_atomic(page); - memcpy_to_pmem(pmem_addr, mem + off, len); + memcpy_writethru(pmem_addr, mem + off, len); kunmap_atomic(mem); } @@ -234,8 +235,15 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev, return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn); } +static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *i) +{ + return copy_from_iter_writethru(addr, bytes, i); +} + static const struct dax_operations pmem_dax_ops = { .direct_access = pmem_dax_direct_access, + .copy_from_iter = pmem_copy_from_iter, }; static void pmem_release_queue(void *q) @@ -288,7 +296,8 @@ static int pmem_attach_disk(struct device *dev, dev_set_drvdata(dev, pmem); pmem->phys_addr = res->start; pmem->size = resource_size(res); - if (nvdimm_has_flush(nd_region) < 0) + if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_WRITETHRU) + || nvdimm_has_flush(nd_region) < 0) dev_warn(dev, "unable to guarantee persistence of writes\n"); if (!devm_request_mem_region(dev, res->start, resource_size(res), diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b7cb5066d961..b668ba455c39 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -947,7 +947,7 @@ void nvdimm_flush(struct nd_region *nd_region) * The first wmb() is needed to 'sfence' all previous writes * such that they are architecturally visible for the platform * buffer flush. Note that we've already arranged for pmem - * writes to avoid the cache via arch_memcpy_to_pmem(). The + * writes to avoid the cache via memcpy_writethru(). The * final wmb() ensures ordering for the NVDIMM flush write. 
*/ wmb(); diff --git a/include/linux/dax.h b/include/linux/dax.h index d3158e74a59e..156f067d4db5 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -16,6 +16,9 @@ struct dax_operations { */ long (*direct_access)(struct dax_device *, pgoff_t, long, void **, pfn_t *); + /* copy_from_iter: dax-driver override for default copy_from_iter */ + size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t, + struct iov_iter *); }; int dax_read_lock(void); diff --git a/include/linux/string.h b/include/linux/string.h index 9d6f189157e2..f4e166d88e2a 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -122,6 +122,12 @@ static inline __must_check int memcpy_mcsafe(void *dst, const void *src, return 0; } #endif +#ifndef __HAVE_ARCH_MEMCPY_WRITETHRU +static inline void memcpy_writethru(void *dst, const void *src, size_t cnt) +{ + memcpy(dst, src, cnt); +} +#endif void *memchr_inv(const void *s, int c, size_t n); char *strreplace(char *s, char old, char new); diff --git a/include/linux/uio.h b/include/linux/uio.h index f2d36a3d3005..d284cb5e89fa 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -95,6 +95,21 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i); size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i); bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i); size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i); +#ifdef CONFIG_ARCH_HAS_UACCESS_WRITETHRU +/* + * Note, users like pmem that depend on copy_from_iter_writethru() + * having stricter semantics than copy_from_iter_nocache() must check + * for IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_WRITETHRU) before assuming + * that the destination is flushed from the cache on return. + */ +size_t copy_from_iter_writethru(void *addr, size_t bytes, struct iov_iter *i); +#else +static inline size_t copy_from_iter_writethru(void *addr, size_t bytes, + struct iov_iter *i) +{ + return copy_from_iter_nocache(addr, bytes, i); +} +#endif bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i); size_t iov_iter_zero(size_t bytes, struct iov_iter *); unsigned long iov_iter_alignment(const struct iov_iter *i); diff --git a/lib/Kconfig b/lib/Kconfig index 0c8b78a9ae2e..db31bc186df2 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -548,6 +548,9 @@ config ARCH_HAS_SG_CHAIN config ARCH_HAS_PMEM_API bool +config ARCH_HAS_UACCESS_WRITETHRU + bool + config ARCH_HAS_MMIO_FLUSH bool diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f7c93568ec99..afc3dc75346c 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -615,6 +615,28 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) } EXPORT_SYMBOL(copy_from_iter_nocache); +#ifdef CONFIG_ARCH_HAS_UACCESS_WRITETHRU +size_t copy_from_iter_writethru(void *addr, size_t bytes, struct iov_iter *i) +{ + char *to = addr; + if (unlikely(i->type & ITER_PIPE)) { + WARN_ON(1); + return 0; + } + iterate_and_advance(i, bytes, v, + __copy_from_user_inatomic_writethru((to += v.iov_len) - v.iov_len, + v.iov_base, v.iov_len), + memcpy_page_writethru((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len), + memcpy_writethru((to += v.iov_len) - v.iov_len, v.iov_base, + v.iov_len) + ) + + return bytes; +} +EXPORT_SYMBOL_GPL(copy_from_iter_writethru); +#endif + bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) { char *to = addr; ^ permalink raw reply related [flat|nested] 16+ messages in thread
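As an aside for readers tracing the new dax plumbing: a minimal sketch of how a filesystem-side caller could dispatch through the dax_operations ->copy_from_iter hook added above. The wrapper name dax_copy_from_iter() and the dax_get_ops() accessor are hypothetical, invented for this illustration; only the ->copy_from_iter operation itself comes from the patch.

/*
 * Illustrative sketch: route a dax write through the driver's
 * ->copy_from_iter override when one is registered, otherwise fall
 * back to the default cached copy_from_iter(). dax_get_ops() is an
 * assumed accessor returning the operations registered for the device.
 */
static size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
		void *addr, size_t bytes, struct iov_iter *i)
{
	const struct dax_operations *ops = dax_get_ops(dax_dev);

	if (ops->copy_from_iter)
		return ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i);
	return copy_from_iter(addr, bytes, i);
}

With the pmem driver's pmem_copy_from_iter() plugged into that hook, writes land in persistent memory with no dirty cachelines left behind, so the later REQ_FLUSH-triggered nvdimm_flush() only has to fence store buffers.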
* Re: [RFC PATCH] x86, uaccess, pmem: introduce copy_from_iter_writethru for dax + pmem 2017-04-26 21:56 ` [RFC PATCH] x86, uaccess, pmem: introduce copy_from_iter_writethru for dax + pmem Dan Williams @ 2017-04-27 6:30 ` Ingo Molnar 2017-04-28 19:39 ` [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations Dan Williams 0 siblings, 1 reply; 16+ messages in thread From: Ingo Molnar @ 2017-04-27 6:30 UTC (permalink / raw) To: Dan Williams Cc: viro, Jan Kara, Matthew Wilcox, x86, linux-kernel, hch, linux-block, linux-nvdimm, Jeff Moyer, Ingo Molnar, H. Peter Anvin, linux-fsdevel, Thomas Gleixner, Ross Zwisler * Dan Williams <dan.j.williams@intel.com> wrote: > +#ifdef CONFIG_ARCH_HAS_UACCESS_WRITETHRU > +#define __HAVE_ARCH_MEMCPY_WRITETHRU 1 > +void memcpy_writethru(void *dst, const void *src, size_t cnt); > +#endif This should be named memcpy_wt(), which is the well-known postfix for write-through. We already have ioremap_wt(), set_memory_wt(), etc. - no need to introduce a longer variant with uncommon spelling. Thanks, Ingo ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-04-27 6:30 ` Ingo Molnar @ 2017-04-28 19:39 ` Dan Williams 2017-05-05 6:54 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Dan Williams @ 2017-04-28 19:39 UTC (permalink / raw) To: viro Cc: Jan Kara, Matthew Wilcox, x86, linux-kernel, hch, linux-block, linux-nvdimm, jmoyer, Ingo Molnar, H. Peter Anvin, linux-fsdevel, Thomas Gleixner, ross.zwisler The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_wt, memcpy_page_wt, and memcpy_wt, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_wt and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_wt() and memcpy_wt() are gated by the CONFIG_ARCH_HAS_UACCESS_WT config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- Changes since the initial RFC: * s/writethru/wt/ since we already have ioremap_wt(), set_memory_wt(), etc. 
(Ingo) arch/x86/Kconfig | 1 arch/x86/include/asm/string_64.h | 5 + arch/x86/include/asm/uaccess_64.h | 11 +++ arch/x86/lib/usercopy_64.c | 128 +++++++++++++++++++++++++++++++++++++ drivers/acpi/nfit/core.c | 3 - drivers/nvdimm/claim.c | 2 - drivers/nvdimm/pmem.c | 13 +++- drivers/nvdimm/region_devs.c | 4 + include/linux/dax.h | 3 + include/linux/string.h | 6 ++ include/linux/uio.h | 15 ++++ lib/Kconfig | 3 + lib/iov_iter.c | 21 ++++++ 13 files changed, 208 insertions(+), 7 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1d50fdff77ee..398117923b1c 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -54,6 +54,7 @@ config X86 select ARCH_HAS_KCOV if X86_64 select ARCH_HAS_MMIO_FLUSH select ARCH_HAS_PMEM_API if X86_64 + select ARCH_HAS_UACCESS_WT if X86_64 select ARCH_HAS_SET_MEMORY select ARCH_HAS_SG_CHAIN select ARCH_HAS_STRICT_KERNEL_RWX diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h index 733bae07fb29..dfbd66b11c72 100644 --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -109,6 +109,11 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt) return 0; } +#ifdef CONFIG_ARCH_HAS_UACCESS_WT +#define __HAVE_ARCH_MEMCPY_WT 1 +void memcpy_wt(void *dst, const void *src, size_t cnt); +#endif + #endif /* __KERNEL__ */ #endif /* _ASM_X86_STRING_64_H */ diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index c5504b9a472e..07ded30c7e89 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -171,6 +171,10 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne extern long __copy_user_nocache(void *dst, const void __user *src, unsigned size, int zerorest); +extern long __copy_user_wt(void *dst, const void __user *src, unsigned size); +extern void memcpy_page_wt(char *to, struct page *page, size_t offset, + size_t len); + static inline int __copy_from_user_inatomic_nocache(void *dst, const void __user *src, unsigned size) @@ -179,6 +183,13 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src, return __copy_user_nocache(dst, src, size, 0); } +static inline int +__copy_from_user_inatomic_wt(void *dst, const void __user *src, unsigned size) +{ + kasan_check_write(dst, size); + return __copy_user_wt(dst, src, size); +} + unsigned long copy_user_handle_tail(char *to, char *from, unsigned len); diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index 3b7c40a2e3e1..0aeff66a022f 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -7,6 +7,7 @@ */ #include <linux/export.h> #include <linux/uaccess.h> +#include <linux/highmem.h> /* * Zero Userspace @@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len) clac(); return len; } + +#ifdef CONFIG_ARCH_HAS_UACCESS_WT +/** + * clean_cache_range - write back a cache range with CLWB + * @vaddr: virtual start address + * @size: number of bytes to write back + * + * Write back a cache range using the CLWB (cache line write back) + * instruction. Note that @size is internally rounded up to be cache + * line size aligned. 
+ */ +static void clean_cache_range(void *addr, size_t size) +{ + u16 x86_clflush_size = boot_cpu_data.x86_clflush_size; + unsigned long clflush_mask = x86_clflush_size - 1; + void *vend = addr + size; + void *p; + + for (p = (void *)((unsigned long)addr & ~clflush_mask); + p < vend; p += x86_clflush_size) + clwb(p); +} + +long __copy_user_wt(void *dst, const void __user *src, unsigned size) +{ + unsigned long flushed, dest = (unsigned long) dst; + long rc = __copy_user_nocache(dst, src, size, 0); + + /* + * __copy_user_nocache() uses non-temporal stores for the bulk + * of the transfer, but we need to manually flush if the + * transfer is unaligned. A cached memory copy is used when + * destination or size is not naturally aligned. That is: + * - Require 8-byte alignment when size is 8 bytes or larger. + * - Require 4-byte alignment when size is 4 bytes. + */ + if (size < 8) { + if (!IS_ALIGNED(dest, 4) || size != 4) + clean_cache_range(dst, 1); + } else { + if (!IS_ALIGNED(dest, 8)) { + dest = ALIGN(dest, boot_cpu_data.x86_clflush_size); + clean_cache_range(dst, 1); + } + + flushed = dest - (unsigned long) dst; + if (size > flushed && !IS_ALIGNED(size - flushed, 8)) + clean_cache_range(dst + size - 1, 1); + } + + return rc; +} + +void memcpy_wt(void *_dst, const void *_src, size_t size) +{ + unsigned long dest = (unsigned long) _dst; + unsigned long source = (unsigned long) _src; + + /* cache copy and flush to align dest */ + if (!IS_ALIGNED(dest, 8)) { + unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest); + + memcpy((void *) dest, (void *) source, len); + clean_cache_range((void *) dest, len); + dest += len; + source += len; + size -= len; + if (!size) + return; + } + + /* 4x8 movnti loop */ + while (size >= 32) { + asm("movq (%0), %%r8\n" + "movq 8(%0), %%r9\n" + "movq 16(%0), %%r10\n" + "movq 24(%0), %%r11\n" + "movnti %%r8, (%1)\n" + "movnti %%r9, 8(%1)\n" + "movnti %%r10, 16(%1)\n" + "movnti %%r11, 24(%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8", "r9", "r10", "r11"); + dest += 32; + source += 32; + size -= 32; + } + + /* 1x8 movnti loop */ + while (size >= 8) { + asm("movq (%0), %%r8\n" + "movnti %%r8, (%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8"); + dest += 8; + source += 8; + size -= 8; + } + + /* 1x4 movnti loop */ + while (size >= 4) { + asm("movl (%0), %%r8d\n" + "movnti %%r8d, (%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8"); + dest += 4; + source += 4; + size -= 4; + } + + /* cache copy for remaining bytes */ + if (size) { + memcpy((void *) dest, (void *) source, size); + clean_cache_range((void *) dest, size); + } +} +EXPORT_SYMBOL_GPL(memcpy_wt); + +void memcpy_page_wt(char *to, struct page *page, size_t offset, + size_t len) +{ + char *from = kmap_atomic(page); + + memcpy_wt(to, from + offset, len); + kunmap_atomic(from); +} +#endif diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index d0c07b2344e4..be9bba609f26 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -1776,8 +1776,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk, } if (rw) - memcpy_to_pmem(mmio->addr.aperture + offset, - iobuf + copied, c); + memcpy_wt(mmio->addr.aperture + offset, iobuf + copied, c); else { if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH) mmio_flush_range((void __force *) diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index 3a35e8028b9c..864ed42baaf0 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -266,7 +266,7 @@ static int nsio_rw_bytes(struct nd_namespace_common 
*ndns, rc = -EIO; } - memcpy_to_pmem(nsio->addr + offset, buf, size); + memcpy_wt(nsio->addr + offset, buf, size); nvdimm_flush(to_nd_region(ndns->dev.parent)); return rc; diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 3b3dab73d741..4be8f30de9b3 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -28,6 +28,7 @@ #include <linux/pfn_t.h> #include <linux/slab.h> #include <linux/pmem.h> +#include <linux/uio.h> #include <linux/dax.h> #include <linux/nd.h> #include "pmem.h" @@ -79,7 +80,7 @@ static void write_pmem(void *pmem_addr, struct page *page, { void *mem = kmap_atomic(page); - memcpy_to_pmem(pmem_addr, mem + off, len); + memcpy_wt(pmem_addr, mem + off, len); kunmap_atomic(mem); } @@ -234,8 +235,15 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev, return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn); } +static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *i) +{ + return copy_from_iter_wt(addr, bytes, i); +} + static const struct dax_operations pmem_dax_ops = { .direct_access = pmem_dax_direct_access, + .copy_from_iter = pmem_copy_from_iter, }; static void pmem_release_queue(void *q) @@ -288,7 +296,8 @@ static int pmem_attach_disk(struct device *dev, dev_set_drvdata(dev, pmem); pmem->phys_addr = res->start; pmem->size = resource_size(res); - if (nvdimm_has_flush(nd_region) < 0) + if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_WT) + || nvdimm_has_flush(nd_region) < 0) dev_warn(dev, "unable to guarantee persistence of writes\n"); if (!devm_request_mem_region(dev, res->start, resource_size(res), diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b7cb5066d961..016af2a6694d 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -947,8 +947,8 @@ void nvdimm_flush(struct nd_region *nd_region) * The first wmb() is needed to 'sfence' all previous writes * such that they are architecturally visible for the platform * buffer flush. Note that we've already arranged for pmem - * writes to avoid the cache via arch_memcpy_to_pmem(). The - * final wmb() ensures ordering for the NVDIMM flush write. + * writes to avoid the cache via memcpy_wt(). The final wmb() + * ensures ordering for the NVDIMM flush write. 
*/ wmb(); for (i = 0; i < nd_region->ndr_mappings; i++) diff --git a/include/linux/dax.h b/include/linux/dax.h index d3158e74a59e..156f067d4db5 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -16,6 +16,9 @@ struct dax_operations { */ long (*direct_access)(struct dax_device *, pgoff_t, long, void **, pfn_t *); + /* copy_from_iter: dax-driver override for default copy_from_iter */ + size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t, + struct iov_iter *); }; int dax_read_lock(void); diff --git a/include/linux/string.h b/include/linux/string.h index 9d6f189157e2..245e0a29b7e5 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -122,6 +122,12 @@ static inline __must_check int memcpy_mcsafe(void *dst, const void *src, return 0; } #endif +#ifndef __HAVE_ARCH_MEMCPY_WT +static inline void memcpy_wt(void *dst, const void *src, size_t cnt) +{ + memcpy(dst, src, cnt); +} +#endif void *memchr_inv(const void *s, int c, size_t n); char *strreplace(char *s, char old, char new); diff --git a/include/linux/uio.h b/include/linux/uio.h index f2d36a3d3005..30c43aa371b5 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -95,6 +95,21 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i); size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i); bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i); size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i); +#ifdef CONFIG_ARCH_HAS_UACCESS_WT +/* + * Note, users like pmem that depend on copy_from_iter_wt() having + * stricter semantics than copy_from_iter_nocache() must check + * for IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_WT) before assuming + * that the destination is flushed from the cache on return. + */ +size_t copy_from_iter_wt(void *addr, size_t bytes, struct iov_iter *i); +#else +static inline size_t copy_from_iter_wt(void *addr, size_t bytes, + struct iov_iter *i) +{ + return copy_from_iter_nocache(addr, bytes, i); +} +#endif bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i); size_t iov_iter_zero(size_t bytes, struct iov_iter *); unsigned long iov_iter_alignment(const struct iov_iter *i); diff --git a/lib/Kconfig b/lib/Kconfig index 0c8b78a9ae2e..f0752a7a9001 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -548,6 +548,9 @@ config ARCH_HAS_SG_CHAIN config ARCH_HAS_PMEM_API bool +config ARCH_HAS_UACCESS_WT + bool + config ARCH_HAS_MMIO_FLUSH bool diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f7c93568ec99..19ab9af091f9 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -615,6 +615,27 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) } EXPORT_SYMBOL(copy_from_iter_nocache); +#ifdef CONFIG_ARCH_HAS_UACCESS_WT +size_t copy_from_iter_wt(void *addr, size_t bytes, struct iov_iter *i) +{ + char *to = addr; + if (unlikely(i->type & ITER_PIPE)) { + WARN_ON(1); + return 0; + } + iterate_and_advance(i, bytes, v, + __copy_from_user_inatomic_wt((to += v.iov_len) - v.iov_len, + v.iov_base, v.iov_len), + memcpy_page_wt((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len), + memcpy_wt((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + ) + + return bytes; +} +EXPORT_SYMBOL_GPL(copy_from_iter_wt); +#endif + bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) { char *to = addr; ^ permalink raw reply related [flat|nested] 16+ messages in thread
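To sanity-check the head/tail flush rules in __copy_user_wt() above, here is a small userspace model of just the flush decisions (not the copy itself), assuming a 64-byte cache line as on current x86_64 parts. This is written for this discussion only; it is not kernel code:

/* Userspace model of __copy_user_wt()'s manual-flush decisions.
 * head = flush the line containing dst (unaligned start, or a small
 * copy that __copy_user_nocache() performs with cached stores);
 * tail = flush the line containing dst + size - 1 (cached tail copy). */
#include <stdio.h>

#define CLSIZE		64UL
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

static void model(unsigned long dst, unsigned long size)
{
	unsigned long dest = dst, flushed;
	int head = 0, tail = 0;

	if (size < 8) {
		if (!ALIGNED(dest, 4) || size != 4)
			head = 1;
	} else {
		if (!ALIGNED(dest, 8)) {
			dest = ALIGN_UP(dest, CLSIZE);
			head = 1;
		}
		flushed = dest - dst;
		if (size > flushed && !ALIGNED(size - flushed, 8))
			tail = 1;
	}
	printf("dst=%#lx size=%2lu -> head flush: %d, tail flush: %d\n",
			dst, size, head, tail);
}

int main(void)
{
	model(0x1000, 64);	/* aligned: movnt all the way, no flushes */
	model(0x1004, 4);	/* 4-byte movnti case: no flush needed */
	model(0x1004, 20);	/* unaligned head: one line flush */
	model(0x1000, 13);	/* aligned start, 5-byte cached tail */
	return 0;
}

Note how the dst=0x1004, size=20 case needs no tail flush: dest rounds up to the next cache line boundary, so flushed (60 bytes) already exceeds the copy size and the head-line flush covers the whole transfer.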
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-04-28 19:39 ` [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations Dan Williams @ 2017-05-05 6:54 ` Ingo Molnar 2017-05-05 14:12 ` Dan Williams 2017-05-05 20:39 ` Kani, Toshimitsu 2017-05-08 20:32 ` Ross Zwisler 2 siblings, 1 reply; 16+ messages in thread From: Ingo Molnar @ 2017-05-05 6:54 UTC (permalink / raw) To: Dan Williams Cc: viro, Jan Kara, Matthew Wilcox, x86, linux-kernel, hch, linux-block, linux-nvdimm, jmoyer, Ingo Molnar, H. Peter Anvin, linux-fsdevel, Thomas Gleixner, ross.zwisler * Dan Williams <dan.j.williams@intel.com> wrote: > The pmem driver has a need to transfer data with a persistent memory > destination and be able to rely on the fact that the destination writes > are not cached. It is sufficient for the writes to be flushed to a > cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect > userspace to call fsync() to ensure data-writes have reached a > power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or > REQ_FLUSH to the pmem driver which will turn around and fence previous > writes with an "sfence". > > Implement a __copy_from_user_inatomic_wt, memcpy_page_wt, and memcpy_wt, > that guarantee that the destination buffer is not dirty in the cpu cache > on completion. The new copy_from_iter_wt and sub-routines will be used > to replace the "pmem api" (include/linux/pmem.h + > arch/x86/include/asm/pmem.h). The availability of copy_from_iter_wt() > and memcpy_wt() are gated by the CONFIG_ARCH_HAS_UACCESS_WT config > symbol, and fallback to copy_from_iter_nocache() and plain memcpy() > otherwise. > > This is meant to satisfy the concern from Linus that if a driver wants > to do something beyond the normal nocache semantics it should be > something private to that driver [1], and Al's concern that anything > uaccess related belongs with the rest of the uaccess code [2]. > > [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html > [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html > > Cc: <x86@kernel.org> > Cc: Jan Kara <jack@suse.cz> > Cc: Jeff Moyer <jmoyer@redhat.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Matthew Wilcox <mawilcox@microsoft.com> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > Changes since the initial RFC: > * s/writethru/wt/ since we already have ioremap_wt(), set_memory_wt(), > etc. (Ingo) Looks good to me. I suspect you'd like to carry this in the nvdimm tree? Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-05 6:54 ` Ingo Molnar @ 2017-05-05 14:12 ` Dan Williams 0 siblings, 0 replies; 16+ messages in thread From: Dan Williams @ 2017-05-05 14:12 UTC (permalink / raw) To: Ingo Molnar Cc: Al Viro, Jan Kara, Matthew Wilcox, X86 ML, linux-kernel@vger.kernel.org, Christoph Hellwig, linux-block, linux-nvdimm@lists.01.org, jmoyer, Ingo Molnar, H. Peter Anvin, linux-fsdevel, Thomas Gleixner, Ross Zwisler On Thu, May 4, 2017 at 11:54 PM, Ingo Molnar <mingo@kernel.org> wrote: > > * Dan Williams <dan.j.williams@intel.com> wrote: > >> The pmem driver has a need to transfer data with a persistent memory >> destination and be able to rely on the fact that the destination writes >> are not cached. It is sufficient for the writes to be flushed to a >> cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect >> userspace to call fsync() to ensure data-writes have reached a >> power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or >> REQ_FLUSH to the pmem driver which will turn around and fence previous >> writes with an "sfence". >> >> Implement a __copy_from_user_inatomic_wt, memcpy_page_wt, and memcpy_wt, >> that guarantee that the destination buffer is not dirty in the cpu cache >> on completion. The new copy_from_iter_wt and sub-routines will be used >> to replace the "pmem api" (include/linux/pmem.h + >> arch/x86/include/asm/pmem.h). The availability of copy_from_iter_wt() >> and memcpy_wt() are gated by the CONFIG_ARCH_HAS_UACCESS_WT config >> symbol, and fallback to copy_from_iter_nocache() and plain memcpy() >> otherwise. >> >> This is meant to satisfy the concern from Linus that if a driver wants >> to do something beyond the normal nocache semantics it should be >> something private to that driver [1], and Al's concern that anything >> uaccess related belongs with the rest of the uaccess code [2]. >> >> [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html >> [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html >> >> Cc: <x86@kernel.org> >> Cc: Jan Kara <jack@suse.cz> >> Cc: Jeff Moyer <jmoyer@redhat.com> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: Christoph Hellwig <hch@lst.de> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: Al Viro <viro@zeniv.linux.org.uk> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Matthew Wilcox <mawilcox@microsoft.com> >> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> --- >> Changes since the initial RFC: >> * s/writethru/wt/ since we already have ioremap_wt(), set_memory_wt(), >> etc. (Ingo) > > Looks good to me. I suspect you'd like to carry this in the nvdimm tree? > > Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo!. Yes, I'll carry it in nvdimm.git for 4.13. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-04-28 19:39 ` [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations Dan Williams 2017-05-05 6:54 ` Ingo Molnar @ 2017-05-05 20:39 ` Kani, Toshimitsu 2017-05-05 22:25 ` Dan Williams 2017-05-08 20:32 ` Ross Zwisler 2 siblings, 1 reply; 16+ messages in thread From: Kani, Toshimitsu @ 2017-05-05 20:39 UTC (permalink / raw) To: dan.j.williams@intel.com, viro@zeniv.linux.org.uk Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz On Fri, 2017-04-28 at 12:39 -0700, Dan Williams wrote: > The pmem driver has a need to transfer data with a persistent memory > destination and be able to rely on the fact that the destination > writes are not cached. It is sufficient for the writes to be flushed > to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we > expect userspace to call fsync() to ensure data-writes have reached a > power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA > or REQ_FLUSH to the pmem driver which will turn around and fence > previous writes with an "sfence". > > Implement a __copy_from_user_inatomic_wt, memcpy_page_wt, and > memcpy_wt, that guarantee that the destination buffer is not dirty in > the cpu cache on completion. The new copy_from_iter_wt and sub- > routines will be used to replace the "pmem api" (include/linux/pmem.h > + arch/x86/include/asm/pmem.h). The availability of > copy_from_iter_wt() and memcpy_wt() are gated by the > CONFIG_ARCH_HAS_UACCESS_WT config symbol, and fallback to > copy_from_iter_nocache() and plain memcpy() otherwise. > > This is meant to satisfy the concern from Linus that if a driver > wants to do something beyond the normal nocache semantics it should > be something private to that driver [1], and Al's concern that > anything uaccess related belongs with the rest of the uaccess code > [2]. > > [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html > [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html > > Cc: <x86@kernel.org> > Cc: Jan Kara <jack@suse.cz> > Cc: Jeff Moyer <jmoyer@redhat.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Matthew Wilcox <mawilcox@microsoft.com> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> --- > Changes since the initial RFC: > * s/writethru/wt/ since we already have ioremap_wt(), > set_memory_wt(), etc. (Ingo) Sorry I should have said earlier, but I think the term "wt" is misleading. Non-temporal stores used in memcpy_wt() provide WC semantics, not WT semantics. How about using "nocache" as it's been used in __copy_user_nocache()? Thanks, -Toshi ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-05 20:39 ` Kani, Toshimitsu @ 2017-05-05 22:25 ` Dan Williams 2017-05-05 22:44 ` Kani, Toshimitsu 0 siblings, 1 reply; 16+ messages in thread From: Dan Williams @ 2017-05-05 22:25 UTC (permalink / raw) To: Kani, Toshimitsu Cc: viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Fri, 2017-04-28 at 12:39 -0700, Dan Williams wrote: >> The pmem driver has a need to transfer data with a persistent memory >> destination and be able to rely on the fact that the destination >> writes are not cached. It is sufficient for the writes to be flushed >> to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we >> expect userspace to call fsync() to ensure data-writes have reached a >> power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA >> or REQ_FLUSH to the pmem driver which will turn around and fence >> previous writes with an "sfence". >> >> Implement a __copy_from_user_inatomic_wt, memcpy_page_wt, and >> memcpy_wt, that guarantee that the destination buffer is not dirty in >> the cpu cache on completion. The new copy_from_iter_wt and sub- >> routines will be used to replace the "pmem api" (include/linux/pmem.h >> + arch/x86/include/asm/pmem.h). The availability of >> copy_from_iter_wt() and memcpy_wt() are gated by the >> CONFIG_ARCH_HAS_UACCESS_WT config symbol, and fallback to >> copy_from_iter_nocache() and plain memcpy() otherwise. >> >> This is meant to satisfy the concern from Linus that if a driver >> wants to do something beyond the normal nocache semantics it should >> be something private to that driver [1], and Al's concern that >> anything uaccess related belongs with the rest of the uaccess code >> [2]. >> >> [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364. >> html >> [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.ht >> ml >> >> Cc: <x86@kernel.org> >> Cc: Jan Kara <jack@suse.cz> >> Cc: Jeff Moyer <jmoyer@redhat.com> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: Christoph Hellwig <hch@lst.de> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: Al Viro <viro@zeniv.linux.org.uk> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Matthew Wilcox <mawilcox@microsoft.com> >> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> --- >> Changes since the initial RFC: >> * s/writethru/wt/ since we already have ioremap_wt(), >> set_memory_wt(), etc. (Ingo) > > Sorry I should have said earlier, but I think the term "wt" is > misleading. Non-temporal stores used in memcpy_wt() provide WC > semantics, not WT semantics. The non-temporal stores do, but memcpy_wt() is using a combination of non-temporal stores and explicit cache flushing. > How about using "nocache" as it's been > used in __copy_user_nocache()? The difference in my mind is that the "_nocache" suffix indicates opportunistic / optional cache pollution avoidance whereas "_wt" strictly arranges for caches not to contain dirty data upon completion of the routine. 
For example, non-temporal stores on older x86 cpus could potentially leave dirty data in the cache, so memcpy_wt on those cpus would need to use explicit cache flushing. ^ permalink raw reply [flat|nested] 16+ messages in thread
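To put Dan's distinction in code form: on a cpu where non-temporal stores cannot be trusted to leave the cache clean, a strict implementation can fall back to a cached copy followed by an explicit write-back. A minimal sketch under that assumption, reusing the clean_cache_range() helper from the patch (the fallback name is made up for this example):

/*
 * Hypothetical strict fallback: do a plain cached copy, then write
 * back every line the copy dirtied so the destination is clean on
 * return. clean_cache_range() is the CLWB loop from the patch above.
 */
static void memcpy_wt_fallback(void *dst, const void *src, size_t size)
{
	memcpy(dst, src, size);
	clean_cache_range(dst, size);
}

The guarantee comes from the flush; the movnt fast path in the patch is an optimization that avoids dirtying the lines in the first place, which is why "_nocache" (best-effort pollution avoidance) undersells the contract.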
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-05 22:25 ` Dan Williams @ 2017-05-05 22:44 ` Kani, Toshimitsu 2017-05-06 2:15 ` Dan Williams 0 siblings, 1 reply; 16+ messages in thread From: Kani, Toshimitsu @ 2017-05-05 22:44 UTC (permalink / raw) To: dan.j.williams@intel.com Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, viro@zeniv.linux.org.uk, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz On Fri, 2017-05-05 at 15:25 -0700, Dan Williams wrote: > On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: : > > > --- > > > Changes since the initial RFC: > > > * s/writethru/wt/ since we already have ioremap_wt(), > > > set_memory_wt(), etc. (Ingo) > > > > Sorry I should have said earlier, but I think the term "wt" is > > misleading. Non-temporal stores used in memcpy_wt() provide WC > > semantics, not WT semantics. > > The non-temporal stores do, but memcpy_wt() is using a combination of > non-temporal stores and explicit cache flushing. > > > How about using "nocache" as it's been > > used in __copy_user_nocache()? > > The difference in my mind is that the "_nocache" suffix indicates > opportunistic / optional cache pollution avoidance whereas "_wt" > strictly arranges for caches not to contain dirty data upon > completion of the routine. For example, non-temporal stores on older > x86 cpus could potentially leave dirty data in the cache, so > memcpy_wt on those cpus would need to use explicit cache flushing. I see. I agree that its behavior is different from the existing one with "_nocache". That said, I think "wt" or "write-through" generally means that writes allocate cachelines and keep them clean by writing to memory. So, subsequent reads to the destination will hit the cachelines. This is not the case with this interface. Thanks, -Toshi ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-05 22:44 ` Kani, Toshimitsu @ 2017-05-06 2:15 ` Dan Williams 2017-05-06 3:17 ` Kani, Toshimitsu 2017-05-06 9:46 ` Ingo Molnar 0 siblings, 2 replies; 16+ messages in thread From: Dan Williams @ 2017-05-06 2:15 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, viro@zeniv.linux.org.uk, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz On Fri, May 5, 2017 at 3:44 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Fri, 2017-05-05 at 15:25 -0700, Dan Williams wrote: >> On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> >> wrote: > : >> > > --- >> > > Changes since the initial RFC: >> > > * s/writethru/wt/ since we already have ioremap_wt(), >> > > set_memory_wt(), etc. (Ingo) >> > >> > Sorry I should have said earlier, but I think the term "wt" is >> > misleading. Non-temporal stores used in memcpy_wt() provide WC >> > semantics, not WT semantics. >> >> The non-temporal stores do, but memcpy_wt() is using a combination of >> non-temporal stores and explicit cache flushing. >> >> > How about using "nocache" as it's been >> > used in __copy_user_nocache()? >> >> The difference in my mind is that the "_nocache" suffix indicates >> opportunistic / optional cache pollution avoidance whereas "_wt" >> strictly arranges for caches not to contain dirty data upon >> completion of the routine. For example, non-temporal stores on older >> x86 cpus could potentially leave dirty data in the cache, so >> memcpy_wt on those cpus would need to use explicit cache flushing. > > I see. I agree that its behavior is different from the existing one > with "_nocache". That said, I think "wt" or "write-through" generally > means that writes allocate cachelines and keep them clean by writing to > memory. So, subsequent reads to the destination will hit the > cachelines. This is not the case with this interface. True... maybe _nocache_strict()? Or, leave it _wt() until someone comes along and is surprised that the cache is not warm for reads after memcpy_wt(), at which point we can ask "why not just use plain memcpy then?", or set the page-attributes to WT. ^ permalink raw reply [flat|nested] 16+ messages in thread
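For context on the page-attribute alternative mentioned above, a rough sketch of what that could look like with the existing set_memory_wt() API. Illustrative only: error handling and alignment checks are elided, the helper name is made up, and whether a WT mapping performs acceptably for pmem writes is exactly the open question:

/*
 * Hypothetical alternative: make the mapping itself write-through so
 * that plain memcpy() never leaves dirty lines behind. set_memory_wt()
 * works on whole pages, so addr is assumed page-aligned.
 */
static int pmem_map_wt(void *addr, size_t size)
{
	return set_memory_wt((unsigned long) addr,
			DIV_ROUND_UP(size, PAGE_SIZE));
}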
* RE: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-06 2:15 ` Dan Williams @ 2017-05-06 3:17 ` Kani, Toshimitsu 2017-05-06 9:46 ` Ingo Molnar 0 siblings, 0 replies; 16+ messages in thread From: Kani, Toshimitsu @ 2017-05-06 3:17 UTC (permalink / raw) To: Dan Williams Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, viro@zeniv.linux.org.uk, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz > On Fri, May 5, 2017 at 3:44 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > On Fri, 2017-05-05 at 15:25 -0700, Dan Williams wrote: > >> On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > >> wrote: > > : > >> > > --- > >> > > Changes since the initial RFC: > >> > > * s/writethru/wt/ since we already have ioremap_wt(), > >> > > set_memory_wt(), etc. (Ingo) > >> > > >> > Sorry I should have said earlier, but I think the term "wt" is > >> > misleading. Non-temporal stores used in memcpy_wt() provide WC > >> > semantics, not WT semantics. > >> > >> The non-temporal stores do, but memcpy_wt() is using a combination of > >> non-temporal stores and explicit cache flushing. > >> > >> > How about using "nocache" as it's been > >> > used in __copy_user_nocache()? > >> > >> The difference in my mind is that the "_nocache" suffix indicates > >> opportunistic / optional cache pollution avoidance whereas "_wt" > >> strictly arranges for caches not to contain dirty data upon > >> completion of the routine. For example, non-temporal stores on older > >> x86 cpus could potentially leave dirty data in the cache, so > >> memcpy_wt on those cpus would need to use explicit cache flushing. > > > > I see. I agree that its behavior is different from the existing one > > with "_nocache". That said, I think "wt" or "write-through" generally > > means that writes allocate cachelines and keep them clean by writing to > > memory. So, subsequent reads to the destination will hit the > > cachelines. This is not the case with this interface. > > True... maybe _nocache_strict()? Or, leave it _wt() until someone > comes along and is surprised that the cache is not warm for reads > after memcpy_wt(), at which point we can ask "why not just use plain > memcpy then?", or set the page-attributes to WT. I prefer _nocache_strict(), if it's not too long, since it avoids any confusion. If other arches actually implement it with WT semantics, we might become the one to change it, instead of the caller. Thanks, -Toshi ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-06 2:15 ` Dan Williams 2017-05-06 3:17 ` Kani, Toshimitsu @ 2017-05-06 9:46 ` Ingo Molnar 2017-05-06 13:57 ` Dan Williams 1 sibling, 1 reply; 16+ messages in thread From: Ingo Molnar @ 2017-05-06 9:46 UTC (permalink / raw) To: Dan Williams Cc: Kani, Toshimitsu, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, viro@zeniv.linux.org.uk, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz * Dan Williams <dan.j.williams@intel.com> wrote: > On Fri, May 5, 2017 at 3:44 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > On Fri, 2017-05-05 at 15:25 -0700, Dan Williams wrote: > >> On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > >> wrote: > > : > >> > > --- > >> > > Changes since the initial RFC: > >> > > * s/writethru/wt/ since we already have ioremap_wt(), > >> > > set_memory_wt(), etc. (Ingo) > >> > > >> > Sorry I should have said earlier, but I think the term "wt" is > >> > misleading. Non-temporal stores used in memcpy_wt() provide WC > >> > semantics, not WT semantics. > >> > >> The non-temporal stores do, but memcpy_wt() is using a combination of > >> non-temporal stores and explicit cache flushing. > >> > >> > How about using "nocache" as it's been > >> > used in __copy_user_nocache()? > >> > >> The difference in my mind is that the "_nocache" suffix indicates > >> opportunistic / optional cache pollution avoidance whereas "_wt" > >> strictly arranges for caches not to contain dirty data upon > >> completion of the routine. For example, non-temporal stores on older > >> x86 cpus could potentially leave dirty data in the cache, so > >> memcpy_wt on those cpus would need to use explicit cache flushing. > > > > I see. I agree that its behavior is different from the existing one > > with "_nocache". That said, I think "wt" or "write-through" generally > > means that writes allocate cachelines and keep them clean by writing to > > memory. So, subsequent reads to the destination will hit the > > cachelines. This is not the case with this interface. > > True... maybe _nocache_strict()? Or, leave it _wt() until someone > comes along and is surprised that the cache is not warm for reads > after memcpy_wt(), at which point we can ask "why not just use plain > memcpy then?", or set the page-attributes to WT. Perhaps a _nocache_flush() postfix, to signal both that it's non-temporal and that no cache line is left around afterwards (dirty or clean)? Thanks, Ingo ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-06 9:46 ` Ingo Molnar @ 2017-05-06 13:57 ` Dan Williams 2017-05-07 8:57 ` Ingo Molnar 0 siblings, 1 reply; 16+ messages in thread From: Dan Williams @ 2017-05-06 13:57 UTC (permalink / raw) To: Ingo Molnar Cc: Kani, Toshimitsu, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, viro@zeniv.linux.org.uk, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz On Sat, May 6, 2017 at 2:46 AM, Ingo Molnar <mingo@kernel.org> wrote: > > * Dan Williams <dan.j.williams@intel.com> wrote: > >> On Fri, May 5, 2017 at 3:44 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >> > On Fri, 2017-05-05 at 15:25 -0700, Dan Williams wrote: >> >> On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> >> >> wrote: >> > : >> >> > > --- >> >> > > Changes since the initial RFC: >> >> > > * s/writethru/wt/ since we already have ioremap_wt(), >> >> > > set_memory_wt(), etc. (Ingo) >> >> > >> >> > Sorry I should have said earlier, but I think the term "wt" is >> >> > misleading. Non-temporal stores used in memcpy_wt() provide WC >> >> > semantics, not WT semantics. >> >> >> >> The non-temporal stores do, but memcpy_wt() is using a combination of >> >> non-temporal stores and explicit cache flushing. >> >> >> >> > How about using "nocache" as it's been >> >> > used in __copy_user_nocache()? >> >> >> >> The difference in my mind is that the "_nocache" suffix indicates >> >> opportunistic / optional cache pollution avoidance whereas "_wt" >> >> strictly arranges for caches not to contain dirty data upon >> >> completion of the routine. For example, non-temporal stores on older >> >> x86 cpus could potentially leave dirty data in the cache, so >> >> memcpy_wt on those cpus would need to use explicit cache flushing. >> > >> > I see. I agree that its behavior is different from the existing one >> > with "_nocache". That said, I think "wt" or "write-through" generally >> > means that writes allocate cachelines and keep them clean by writing to >> > memory. So, subsequent reads to the destination will hit the >> > cachelines. This is not the case with this interface. >> >> True... maybe _nocache_strict()? Or, leave it _wt() until someone >> comes along and is surprised that the cache is not warm for reads >> after memcpy_wt(), at which point we can ask "why not just use plain >> memcpy then?", or set the page-attributes to WT. > > Perhaps a _nocache_flush() postfix, to signal both that it's non-temporal and that > no cache line is left around afterwards (dirty or clean)? Yes, I think "flush" belongs in the name, and to make it easily grep-able separate from _nocache we can call it _flushcache? An efficient implementation will use _nocache / non-temporal stores internally, but external consumers just care about the state of the cache after the call. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-06 13:57 ` Dan Williams @ 2017-05-07 8:57 ` Ingo Molnar 2017-05-08 3:01 ` Kani, Toshimitsu 0 siblings, 1 reply; 16+ messages in thread From: Ingo Molnar @ 2017-05-07 8:57 UTC (permalink / raw) To: Dan Williams Cc: Kani, Toshimitsu, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, viro@zeniv.linux.org.uk, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz * Dan Williams <dan.j.williams@intel.com> wrote: > On Sat, May 6, 2017 at 2:46 AM, Ingo Molnar <mingo@kernel.org> wrote: > > > > * Dan Williams <dan.j.williams@intel.com> wrote: > > > >> On Fri, May 5, 2017 at 3:44 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > >> > On Fri, 2017-05-05 at 15:25 -0700, Dan Williams wrote: > >> >> On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > >> >> wrote: > >> > : > >> >> > > --- > >> >> > > Changes since the initial RFC: > >> >> > > * s/writethru/wt/ since we already have ioremap_wt(), > >> >> > > set_memory_wt(), etc. (Ingo) > >> >> > > >> >> > Sorry I should have said earlier, but I think the term "wt" is > >> >> > misleading. Non-temporal stores used in memcpy_wt() provide WC > >> >> > semantics, not WT semantics. > >> >> > >> >> The non-temporal stores do, but memcpy_wt() is using a combination of > >> >> non-temporal stores and explicit cache flushing. > >> >> > >> >> > How about using "nocache" as it's been > >> >> > used in __copy_user_nocache()? > >> >> > >> >> The difference in my mind is that the "_nocache" suffix indicates > >> >> opportunistic / optional cache pollution avoidance whereas "_wt" > >> >> strictly arranges for caches not to contain dirty data upon > >> >> completion of the routine. For example, non-temporal stores on older > >> >> x86 cpus could potentially leave dirty data in the cache, so > >> >> memcpy_wt on those cpus would need to use explicit cache flushing. > >> > > >> > I see. I agree that its behavior is different from the existing one > >> > with "_nocache". That said, I think "wt" or "write-through" generally > >> > means that writes allocate cachelines and keep them clean by writing to > >> > memory. So, subsequent reads to the destination will hit the > >> > cachelines. This is not the case with this interface. > >> > >> True... maybe _nocache_strict()? Or, leave it _wt() until someone > >> comes along and is surprised that the cache is not warm for reads > >> after memcpy_wt(), at which point we can ask "why not just use plain > >> memcpy then?", or set the page-attributes to WT. > > > > Perhaps a _nocache_flush() postfix, to signal both that it's non-temporal and that > > no cache line is left around afterwards (dirty or clean)? > > Yes, I think "flush" belongs in the name, and to make it easily > grep-able separate from _nocache we can call it _flushcache? An > efficient implementation will use _nocache / non-temporal stores > internally, but external consumers just care about the state of the > cache after the call. _flushcache() works for me too. Thanks, Ingo ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-07 8:57 ` Ingo Molnar @ 2017-05-08 3:01 ` Kani, Toshimitsu 0 siblings, 0 replies; 16+ messages in thread From: Kani, Toshimitsu @ 2017-05-08 3:01 UTC (permalink / raw) To: Ingo Molnar, Dan Williams Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jmoyer@redhat.com, tglx@linutronix.de, hch@lst.de, viro@zeniv.linux.org.uk, x86@kernel.org, mawilcox@microsoft.com, hpa@zytor.com, linux-nvdimm@lists.01.org, mingo@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, jack@suse.cz > * Dan Williams <dan.j.williams@intel.com> wrote: > > > On Sat, May 6, 2017 at 2:46 AM, Ingo Molnar <mingo@kernel.org> wrote: > > > > > > * Dan Williams <dan.j.williams@intel.com> wrote: > > > > > >> On Fri, May 5, 2017 at 3:44 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > >> > On Fri, 2017-05-05 at 15:25 -0700, Dan Williams wrote: > > >> >> On Fri, May 5, 2017 at 1:39 PM, Kani, Toshimitsu > <toshi.kani@hpe.com> > > >> >> wrote: > > >> > : > > >> >> > > --- > > >> >> > > Changes since the initial RFC: > > >> >> > > * s/writethru/wt/ since we already have ioremap_wt(), > > >> >> > > set_memory_wt(), etc. (Ingo) > > >> >> > > > >> >> > Sorry I should have said earlier, but I think the term "wt" is > > >> >> > misleading. Non-temporal stores used in memcpy_wt() provide WC > > >> >> > semantics, not WT semantics. > > >> >> > > >> >> The non-temporal stores do, but memcpy_wt() is using a combination > of > > >> >> non-temporal stores and explicit cache flushing. > > >> >> > > >> >> > How about using "nocache" as it's been > > >> >> > used in __copy_user_nocache()? > > >> >> > > >> >> The difference in my mind is that the "_nocache" suffix indicates > > >> >> opportunistic / optional cache pollution avoidance whereas "_wt" > > >> >> strictly arranges for caches not to contain dirty data upon > > >> >> completion of the routine. For example, non-temporal stores on older > > >> >> x86 cpus could potentially leave dirty data in the cache, so > > >> >> memcpy_wt on those cpus would need to use explicit cache flushing. > > >> > > > >> > I see. I agree that its behavior is different from the existing one > > >> > with "_nocache". That said, I think "wt" or "write-through" generally > > >> > means that writes allocate cachelines and keep them clean by writing > to > > >> > memory. So, subsequent reads to the destination will hit the > > >> > cachelines. This is not the case with this interface. > > >> > > >> True... maybe _nocache_strict()? Or, leave it _wt() until someone > > >> comes along and is surprised that the cache is not warm for reads > > >> after memcpy_wt(), at which point we can ask "why not just use plain > > >> memcpy then?", or set the page-attributes to WT. > > > > > > Perhaps a _nocache_flush() postfix, to signal both that it's non-temporal > and that > > > no cache line is left around afterwards (dirty or clean)? > > > > Yes, I think "flush" belongs in the name, and to make it easily > > grep-able separate from _nocache we can call it _flushcache? An > > efficient implementation will use _nocache / non-temporal stores > > internally, but external consumers just care about the state of the > > cache after the call. > > _flushcache() works for me too. > Works for me too. Thanks, -Toshi ^ permalink raw reply [flat|nested] 16+ messages in thread
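With the naming settled on _flushcache, the caller-visible contract would look roughly like the sketch below. The exact prototypes are an assumption extrapolated from this thread rather than copied from a merged tree; the point is the difference in post-conditions:

/*
 * _nocache:    best-effort cache-pollution avoidance; dirty data MAY
 *              remain in destination cachelines on return.
 * _flushcache: hard guarantee that NO dirty destination lines remain,
 *              so a later sfence (from REQ_FLUSH / REQ_FUA handling)
 *              suffices to make the writes power-fail safe.
 */
void memcpy_flushcache(void *dst, const void *src, size_t cnt);
size_t copy_from_iter_flushcache(void *addr, size_t bytes,
				 struct iov_iter *i);

Neither variant promises warm cachelines for subsequent reads, which is exactly the surprise the "wt" name risked inviting.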
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-04-28 19:39 ` [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations Dan Williams 2017-05-05 6:54 ` Ingo Molnar 2017-05-05 20:39 ` Kani, Toshimitsu @ 2017-05-08 20:32 ` Ross Zwisler 2017-05-08 20:40 ` Dan Williams 2 siblings, 1 reply; 16+ messages in thread From: Ross Zwisler @ 2017-05-08 20:32 UTC (permalink / raw) To: Dan Williams Cc: viro, Jan Kara, Matthew Wilcox, x86, linux-kernel, hch, linux-block, linux-nvdimm, jmoyer, Ingo Molnar, H. Peter Anvin, linux-fsdevel, Thomas Gleixner, ross.zwisler On Fri, Apr 28, 2017 at 12:39:12PM -0700, Dan Williams wrote: > The pmem driver has a need to transfer data with a persistent memory > destination and be able to rely on the fact that the destination writes > are not cached. It is sufficient for the writes to be flushed to a > cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect > userspace to call fsync() to ensure data-writes have reached a > power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or > REQ_FLUSH to the pmem driver which will turn around and fence previous > writes with an "sfence". > > Implement a __copy_from_user_inatomic_wt, memcpy_page_wt, and memcpy_wt, > that guarantee that the destination buffer is not dirty in the cpu cache > on completion. The new copy_from_iter_wt and sub-routines will be used > to replace the "pmem api" (include/linux/pmem.h + > arch/x86/include/asm/pmem.h). The availability of copy_from_iter_wt() > and memcpy_wt() are gated by the CONFIG_ARCH_HAS_UACCESS_WT config > symbol, and fallback to copy_from_iter_nocache() and plain memcpy() > otherwise. > > This is meant to satisfy the concern from Linus that if a driver wants > to do something beyond the normal nocache semantics it should be > something private to that driver [1], and Al's concern that anything > uaccess related belongs with the rest of the uaccess code [2]. > > [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html > [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html > > Cc: <x86@kernel.org> > Cc: Jan Kara <jack@suse.cz> > Cc: Jeff Moyer <jmoyer@redhat.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: "H. 
Peter Anvin" <hpa@zytor.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Matthew Wilcox <mawilcox@microsoft.com> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- <> > diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h > index c5504b9a472e..07ded30c7e89 100644 > --- a/arch/x86/include/asm/uaccess_64.h > +++ b/arch/x86/include/asm/uaccess_64.h > @@ -171,6 +171,10 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne > extern long __copy_user_nocache(void *dst, const void __user *src, > unsigned size, int zerorest); > > +extern long __copy_user_wt(void *dst, const void __user *src, unsigned size); > +extern void memcpy_page_wt(char *to, struct page *page, size_t offset, > + size_t len); > + > static inline int > __copy_from_user_inatomic_nocache(void *dst, const void __user *src, > unsigned size) > @@ -179,6 +183,13 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src, > return __copy_user_nocache(dst, src, size, 0); > } > > +static inline int > +__copy_from_user_inatomic_wt(void *dst, const void __user *src, unsigned size) > +{ > + kasan_check_write(dst, size); > + return __copy_user_wt(dst, src, size); > +} > + > unsigned long > copy_user_handle_tail(char *to, char *from, unsigned len); > > diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c > index 3b7c40a2e3e1..0aeff66a022f 100644 > --- a/arch/x86/lib/usercopy_64.c > +++ b/arch/x86/lib/usercopy_64.c > @@ -7,6 +7,7 @@ > */ > #include <linux/export.h> > #include <linux/uaccess.h> > +#include <linux/highmem.h> > > /* > * Zero Userspace > @@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len) > clac(); > return len; > } > + > +#ifdef CONFIG_ARCH_HAS_UACCESS_WT > +/** > + * clean_cache_range - write back a cache range with CLWB > + * @vaddr: virtual start address > + * @size: number of bytes to write back > + * > + * Write back a cache range using the CLWB (cache line write back) > + * instruction. Note that @size is internally rounded up to be cache > + * line size aligned. > + */ > +static void clean_cache_range(void *addr, size_t size) > +{ > + u16 x86_clflush_size = boot_cpu_data.x86_clflush_size; > + unsigned long clflush_mask = x86_clflush_size - 1; > + void *vend = addr + size; > + void *p; > + > + for (p = (void *)((unsigned long)addr & ~clflush_mask); > + p < vend; p += x86_clflush_size) > + clwb(p); > +} > + > +long __copy_user_wt(void *dst, const void __user *src, unsigned size) > +{ > + unsigned long flushed, dest = (unsigned long) dst; > + long rc = __copy_user_nocache(dst, src, size, 0); > + > + /* > + * __copy_user_nocache() uses non-temporal stores for the bulk > + * of the transfer, but we need to manually flush if the > + * transfer is unaligned. A cached memory copy is used when > + * destination or size is not naturally aligned. That is: > + * - Require 8-byte alignment when size is 8 bytes or larger. > + * - Require 4-byte alignment when size is 4 bytes. 
> + */ > + if (size < 8) { > + if (!IS_ALIGNED(dest, 4) || size != 4) > + clean_cache_range(dst, 1); > + } else { > + if (!IS_ALIGNED(dest, 8)) { > + dest = ALIGN(dest, boot_cpu_data.x86_clflush_size); > + clean_cache_range(dst, 1); > + } > + > + flushed = dest - (unsigned long) dst; > + if (size > flushed && !IS_ALIGNED(size - flushed, 8)) > + clean_cache_range(dst + size - 1, 1); > + } > + > + return rc; > +} > + > +void memcpy_wt(void *_dst, const void *_src, size_t size) > +{ > + unsigned long dest = (unsigned long) _dst; > + unsigned long source = (unsigned long) _src; > + > + /* cache copy and flush to align dest */ > + if (!IS_ALIGNED(dest, 8)) { > + unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest); > + > + memcpy((void *) dest, (void *) source, len); > + clean_cache_range((void *) dest, len); > + dest += len; > + source += len; > + size -= len; > + if (!size) > + return; > + } > + > + /* 4x8 movnti loop */ > + while (size >= 32) { > + asm("movq (%0), %%r8\n" > + "movq 8(%0), %%r9\n" > + "movq 16(%0), %%r10\n" > + "movq 24(%0), %%r11\n" > + "movnti %%r8, (%1)\n" > + "movnti %%r9, 8(%1)\n" > + "movnti %%r10, 16(%1)\n" > + "movnti %%r11, 24(%1)\n" > + :: "r" (source), "r" (dest) > + : "memory", "r8", "r9", "r10", "r11"); > + dest += 32; > + source += 32; > + size -= 32; > + } > + > + /* 1x8 movnti loop */ > + while (size >= 8) { > + asm("movq (%0), %%r8\n" > + "movnti %%r8, (%1)\n" > + :: "r" (source), "r" (dest) > + : "memory", "r8"); > + dest += 8; > + source += 8; > + size -= 8; > + } > + > + /* 1x4 movnti loop */ > + while (size >= 4) { > + asm("movl (%0), %%r8d\n" > + "movnti %%r8d, (%1)\n" > + :: "r" (source), "r" (dest) > + : "memory", "r8"); > + dest += 4; > + source += 4; > + size -= 4; > + } > + > + /* cache copy for remaining bytes */ > + if (size) { > + memcpy((void *) dest, (void *) source, size); > + clean_cache_range((void *) dest, size); > + } > +} > +EXPORT_SYMBOL_GPL(memcpy_wt); I took a pretty hard look at the changes in arch/x86/lib/usercopy_64.c, and they look correct to me. The inline assembly for non-temporal copies mixed with C for loop control is IMHO much easier to follow than the pure assembly of __copy_user_nocache(). Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
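For readers without the kernel tree handy, the non-temporal store pattern Ross reviewed has a close user-space analogue via compiler intrinsics. A minimal sketch (assumes x86-64, 8-byte-aligned buffers, and a size that is a multiple of 8; the patch's unaligned head/tail handling and CLWB ranges are omitted):

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

static void memcpy_nt_sketch(void *dst, const void *src, size_t size)
{
	uint64_t *d = dst;
	const uint64_t *s = src;

	/* movnti equivalent: the stores take the write-combining path
	 * instead of allocating cache lines */
	for (; size >= 8; size -= 8)
		_mm_stream_si64((long long *)d++, (long long)*s++);

	/* order the weakly-ordered WC stores before any completion flag */
	_mm_sfence();
}

Note the kernel routines deliberately omit the sfence; as the changelog explains, the fence is deferred to the driver's REQ_FLUSH / REQ_FUA handling so a single sfence can cover many preceding copies.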
* Re: [PATCH v2] x86, uaccess: introduce copy_from_iter_wt for pmem / writethrough operations 2017-05-08 20:32 ` Ross Zwisler @ 2017-05-08 20:40 ` Dan Williams 0 siblings, 0 replies; 16+ messages in thread From: Dan Williams @ 2017-05-08 20:40 UTC (permalink / raw) To: Ross Zwisler, Dan Williams, Al Viro, Jan Kara, Matthew Wilcox, X86 ML, linux-kernel@vger.kernel.org, Christoph Hellwig, linux-block, linux-nvdimm@lists.01.org, jmoyer, Ingo Molnar, H. Peter Anvin, linux-fsdevel, Thomas Gleixner On Mon, May 8, 2017 at 1:32 PM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote: > On Fri, Apr 28, 2017 at 12:39:12PM -0700, Dan Williams wrote: >> The pmem driver has a need to transfer data with a persistent memory >> destination and be able to rely on the fact that the destination writes >> are not cached. It is sufficient for the writes to be flushed to a >> cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect >> userspace to call fsync() to ensure data-writes have reached a >> power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or >> REQ_FLUSH to the pmem driver which will turn around and fence previous >> writes with an "sfence". >> >> Implement a __copy_from_user_inatomic_wt, memcpy_page_wt, and memcpy_wt, >> that guarantee that the destination buffer is not dirty in the cpu cache >> on completion. The new copy_from_iter_wt and sub-routines will be used >> to replace the "pmem api" (include/linux/pmem.h + >> arch/x86/include/asm/pmem.h). The availability of copy_from_iter_wt() >> and memcpy_wt() are gated by the CONFIG_ARCH_HAS_UACCESS_WT config >> symbol, and fallback to copy_from_iter_nocache() and plain memcpy() >> otherwise. >> >> This is meant to satisfy the concern from Linus that if a driver wants >> to do something beyond the normal nocache semantics it should be >> something private to that driver [1], and Al's concern that anything >> uaccess related belongs with the rest of the uaccess code [2]. >> >> [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html >> [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html >> >> Cc: <x86@kernel.org> >> Cc: Jan Kara <jack@suse.cz> >> Cc: Jeff Moyer <jmoyer@redhat.com> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: Christoph Hellwig <hch@lst.de> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: Al Viro <viro@zeniv.linux.org.uk> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Matthew Wilcox <mawilcox@microsoft.com> >> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> [..] > I took a pretty hard look at the changes in arch/x86/lib/usercopy_64.c, and > they look correct to me. The inline assembly for non-temporal copies mixed > with C for loop control is IMHO much easier to follow than the pure assembly > of __copy_user_nocache(). > > Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Thanks Ross, I appreciate it. ^ permalink raw reply [flat|nested] 16+ messages in thread