All of lore.kernel.org
 help / color / mirror / Atom feed
* [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
@ 2017-04-19 17:05 ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
  To: stable
  Cc: Jan Kara, Matthew Wilcox, x86, Jens Axboe, linux-nvdimm,
	Ingo Molnar, Al Viro, H. Peter Anvin, Robert Hu, Thomas Gleixner,
	Christoph Hellwig

Hi -stable team,

Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
__copy_user_nocache cache-bypass assumptions", and another block layer
fix that allows the libnvdimm unit tests to run.

I have copied Jens and Jan on the block-layer fix in case they have any
concerns.

---

Dan Williams (2):
      x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
      block: fix del_gendisk() vs blkdev_ioctl crash


 arch/x86/include/asm/pmem.h |   45 +++++++++++++++++++++++++++++++------------
 block/genhd.c               |    1 -
 2 files changed, 32 insertions(+), 14 deletions(-)
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
@ 2017-04-19 17:05 ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
  To: stable
  Cc: Jan Kara, Toshi Kani, Matthew Wilcox, x86, Christoph Hellwig,
	Jens Axboe, linux-nvdimm, Jeff Moyer, Ingo Molnar, Al Viro,
	H. Peter Anvin, Robert Hu, Thomas Gleixner, Ross Zwisler

Hi -stable team,

Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
__copy_user_nocache cache-bypass assumptions", and another block layer
fix that allows the libnvdimm unit tests to run.

I have copied Jens and Jan on the block-layer fix in case they have any
concerns.

---

Dan Williams (2):
      x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
      block: fix del_gendisk() vs blkdev_ioctl crash


 arch/x86/include/asm/pmem.h |   45 +++++++++++++++++++++++++++++++------------
 block/genhd.c               |    1 -
 2 files changed, 32 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [4.4-stable PATCH 1/2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-19 17:05 ` Dan Williams
@ 2017-04-19 17:05   ` Dan Williams
  -1 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
  To: stable
  Cc: Jan Kara, Matthew Wilcox, linux-nvdimm, x86, Ingo Molnar, Al Viro,
	H. Peter Anvin, Thomas Gleixner, Christoph Hellwig

commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd upstream.

Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.

Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.

Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable@vger.kernel.org>
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/pmem.h |   45 +++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index d8ce3ec816ab..bd8ce6bcdfc9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -72,8 +72,8 @@ static inline void arch_wmb_pmem(void)
  * @size:	number of bytes to write back
  *
  * Write back a cache range using the CLWB (cache line write back)
- * instruction.  This function requires explicit ordering with an
- * arch_wmb_pmem() call.  This API is internal to the x86 PMEM implementation.
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
  */
 static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
 {
@@ -87,15 +87,6 @@ static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
 		clwb(p);
 }
 
-/*
- * copy_from_iter_nocache() on x86 only uses non-temporal stores for iovec
- * iterators, so for other types (bvec & kvec) we must do a cache write-back.
- */
-static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
-{
-	return iter_is_iovec(i) == false;
-}
-
 /**
  * arch_copy_from_iter_pmem - copy data from an iterator to PMEM
  * @addr:	PMEM destination address
@@ -114,8 +105,36 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
 	/* TODO: skip the write-back by always using non-temporal stores */
 	len = copy_from_iter_nocache(vaddr, bytes, i);
 
-	if (__iter_needs_pmem_wb(i))
-		__arch_wb_cache_pmem(vaddr, bytes);
+	/*
+	 * In the iovec case on x86_64 copy_from_iter_nocache() uses
+	 * non-temporal stores for the bulk of the transfer, but we need
+	 * to manually flush if the transfer is unaligned. A cached
+	 * memory copy is used when destination or size is not naturally
+	 * aligned. That is:
+	 *   - Require 8-byte alignment when size is 8 bytes or larger.
+	 *   - Require 4-byte alignment when size is 4 bytes.
+	 *
+	 * In the non-iovec case the entire destination needs to be
+	 * flushed.
+	 */
+	if (iter_is_iovec(i)) {
+		unsigned long flushed, dest = (unsigned long) addr;
+
+		if (bytes < 8) {
+			if (!IS_ALIGNED(dest, 4) || (bytes != 4))
+				__arch_wb_cache_pmem(addr, 1);
+		} else {
+			if (!IS_ALIGNED(dest, 8)) {
+				dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+				__arch_wb_cache_pmem(addr, 1);
+			}
+
+			flushed = dest - (unsigned long) addr;
+			if (bytes > flushed && !IS_ALIGNED(bytes - flushed, 8))
+				__arch_wb_cache_pmem(addr + bytes - 1, 1);
+		}
+	} else
+		__arch_wb_cache_pmem(addr, bytes);
 
 	return len;
 }

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [4.4-stable PATCH 1/2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
@ 2017-04-19 17:05   ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
  To: stable
  Cc: Jan Kara, Toshi Kani, Matthew Wilcox, x86, Christoph Hellwig,
	linux-nvdimm, Jeff Moyer, Ingo Molnar, Al Viro, H. Peter Anvin,
	Thomas Gleixner, Ross Zwisler

commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd upstream.

Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.

Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.

Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable@vger.kernel.org>
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/pmem.h |   45 +++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index d8ce3ec816ab..bd8ce6bcdfc9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -72,8 +72,8 @@ static inline void arch_wmb_pmem(void)
  * @size:	number of bytes to write back
  *
  * Write back a cache range using the CLWB (cache line write back)
- * instruction.  This function requires explicit ordering with an
- * arch_wmb_pmem() call.  This API is internal to the x86 PMEM implementation.
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
  */
 static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
 {
@@ -87,15 +87,6 @@ static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
 		clwb(p);
 }
 
-/*
- * copy_from_iter_nocache() on x86 only uses non-temporal stores for iovec
- * iterators, so for other types (bvec & kvec) we must do a cache write-back.
- */
-static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
-{
-	return iter_is_iovec(i) == false;
-}
-
 /**
  * arch_copy_from_iter_pmem - copy data from an iterator to PMEM
  * @addr:	PMEM destination address
@@ -114,8 +105,36 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
 	/* TODO: skip the write-back by always using non-temporal stores */
 	len = copy_from_iter_nocache(vaddr, bytes, i);
 
-	if (__iter_needs_pmem_wb(i))
-		__arch_wb_cache_pmem(vaddr, bytes);
+	/*
+	 * In the iovec case on x86_64 copy_from_iter_nocache() uses
+	 * non-temporal stores for the bulk of the transfer, but we need
+	 * to manually flush if the transfer is unaligned. A cached
+	 * memory copy is used when destination or size is not naturally
+	 * aligned. That is:
+	 *   - Require 8-byte alignment when size is 8 bytes or larger.
+	 *   - Require 4-byte alignment when size is 4 bytes.
+	 *
+	 * In the non-iovec case the entire destination needs to be
+	 * flushed.
+	 */
+	if (iter_is_iovec(i)) {
+		unsigned long flushed, dest = (unsigned long) addr;
+
+		if (bytes < 8) {
+			if (!IS_ALIGNED(dest, 4) || (bytes != 4))
+				__arch_wb_cache_pmem(addr, 1);
+		} else {
+			if (!IS_ALIGNED(dest, 8)) {
+				dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+				__arch_wb_cache_pmem(addr, 1);
+			}
+
+			flushed = dest - (unsigned long) addr;
+			if (bytes > flushed && !IS_ALIGNED(bytes - flushed, 8))
+				__arch_wb_cache_pmem(addr + bytes - 1, 1);
+		}
+	} else
+		__arch_wb_cache_pmem(addr, bytes);
 
 	return len;
 }

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [4.4-stable PATCH 2/2] block: fix del_gendisk() vs blkdev_ioctl crash
  2017-04-19 17:05 ` Dan Williams
@ 2017-04-19 17:05   ` Dan Williams
  -1 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
  To: stable; +Cc: Jens Axboe, Robert Hu, Jan Kara, linux-nvdimm

commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.

When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.

The nvdimm_revalidate_disk() implementation depends on
disk->driverfs_dev to be valid at entry.  However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.

There's no reason for del_gendisk() to clear ->driverfs_dev.  That
device is the parent of the disk.  It is guaranteed to not be freed
until the disk, as a child, drops its ->parent reference.

We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)->parent, but lets fix it globally since
->driverfs_dev follows the lifetime of the parent.  Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
 CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
 [..]
 Call Trace:
  [<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
  [<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
  [<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
  [<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
  [<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
  [<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
  [<ffffffff812916dd>] block_ioctl+0x3d/0x50
  [<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
  [<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
  [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
  [<ffffffff81264f09>] SyS_ioctl+0x79/0x90
  [<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76

Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Reported-by: Robert Hu <robert.hu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index a5bed6bc869d..3032453a89e6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -664,7 +664,6 @@ void del_gendisk(struct gendisk *disk)
 
 	kobject_put(disk->part0.holder_dir);
 	kobject_put(disk->slave_dir);
-	disk->driverfs_dev = NULL;
 	if (!sysfs_deprecated)
 		sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
 	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [4.4-stable PATCH 2/2] block: fix del_gendisk() vs blkdev_ioctl crash
@ 2017-04-19 17:05   ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
  To: stable; +Cc: Jens Axboe, Robert Hu, Jan Kara, linux-nvdimm

commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.

When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.

The nvdimm_revalidate_disk() implementation depends on
disk->driverfs_dev to be valid at entry.  However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.

There's no reason for del_gendisk() to clear ->driverfs_dev.  That
device is the parent of the disk.  It is guaranteed to not be freed
until the disk, as a child, drops its ->parent reference.

We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)->parent, but lets fix it globally since
->driverfs_dev follows the lifetime of the parent.  Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
 CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
 [..]
 Call Trace:
  [<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
  [<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
  [<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
  [<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
  [<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
  [<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
  [<ffffffff812916dd>] block_ioctl+0x3d/0x50
  [<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
  [<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
  [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
  [<ffffffff81264f09>] SyS_ioctl+0x79/0x90
  [<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76

Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Reported-by: Robert Hu <robert.hu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index a5bed6bc869d..3032453a89e6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -664,7 +664,6 @@ void del_gendisk(struct gendisk *disk)
 
 	kobject_put(disk->part0.holder_dir);
 	kobject_put(disk->slave_dir);
-	disk->driverfs_dev = NULL;
 	if (!sysfs_deprecated)
 		sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
 	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
  2017-04-19 17:05 ` Dan Williams
@ 2017-04-25 12:07   ` Greg KH
  -1 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2017-04-25 12:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, Matthew Wilcox, x86, stable, Jens Axboe, linux-nvdimm,
	Ingo Molnar, Al Viro, H. Peter Anvin, Robert Hu, Thomas Gleixner,
	Christoph Hellwig

On Wed, Apr 19, 2017 at 10:05:41AM -0700, Dan Williams wrote:
> Hi -stable team,
> 
> Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
> __copy_user_nocache cache-bypass assumptions", and another block layer
> fix that allows the libnvdimm unit tests to run.
> 
> I have copied Jens and Jan on the block-layer fix in case they have any
> concerns.

Looks good, thanks.

greg k-h
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
@ 2017-04-25 12:07   ` Greg KH
  0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2017-04-25 12:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: stable, Jan Kara, Toshi Kani, Matthew Wilcox, x86,
	Christoph Hellwig, Jens Axboe, linux-nvdimm, Jeff Moyer,
	Ingo Molnar, Al Viro, H. Peter Anvin, Robert Hu, Thomas Gleixner,
	Ross Zwisler

On Wed, Apr 19, 2017 at 10:05:41AM -0700, Dan Williams wrote:
> Hi -stable team,
> 
> Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
> __copy_user_nocache cache-bypass assumptions", and another block layer
> fix that allows the libnvdimm unit tests to run.
> 
> I have copied Jens and Jan on the block-layer fix in case they have any
> concerns.

Looks good, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-04-25 12:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-19 17:05 [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4 Dan Williams
2017-04-19 17:05 ` Dan Williams
2017-04-19 17:05 ` [4.4-stable PATCH 1/2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions Dan Williams
2017-04-19 17:05   ` Dan Williams
2017-04-19 17:05 ` [4.4-stable PATCH 2/2] block: fix del_gendisk() vs blkdev_ioctl crash Dan Williams
2017-04-19 17:05   ` Dan Williams
2017-04-25 12:07 ` [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4 Greg KH
2017-04-25 12:07   ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.