* [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
@ 2017-04-19 17:05 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
To: stable
Cc: Jan Kara, Matthew Wilcox, x86, Jens Axboe, linux-nvdimm,
Ingo Molnar, Al Viro, H. Peter Anvin, Robert Hu, Thomas Gleixner,
Christoph Hellwig
Hi -stable team,
Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
__copy_user_nocache cache-bypass assumptions", and another block layer
fix that allows the libnvdimm unit tests to run.
I have copied Jens and Jan on the block-layer fix in case they have any
concerns.
---
Dan Williams (2):
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
block: fix del_gendisk() vs blkdev_ioctl crash
arch/x86/include/asm/pmem.h | 45 +++++++++++++++++++++++++++++++------------
block/genhd.c | 1 -
2 files changed, 32 insertions(+), 14 deletions(-)
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 8+ messages in thread
* [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
@ 2017-04-19 17:05 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
To: stable
Cc: Jan Kara, Toshi Kani, Matthew Wilcox, x86, Christoph Hellwig,
Jens Axboe, linux-nvdimm, Jeff Moyer, Ingo Molnar, Al Viro,
H. Peter Anvin, Robert Hu, Thomas Gleixner, Ross Zwisler
Hi -stable team,
Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
__copy_user_nocache cache-bypass assumptions", and another block layer
fix that allows the libnvdimm unit tests to run.
I have copied Jens and Jan on the block-layer fix in case they have any
concerns.
---
Dan Williams (2):
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
block: fix del_gendisk() vs blkdev_ioctl crash
arch/x86/include/asm/pmem.h | 45 +++++++++++++++++++++++++++++++------------
block/genhd.c | 1 -
2 files changed, 32 insertions(+), 14 deletions(-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [4.4-stable PATCH 1/2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
2017-04-19 17:05 ` Dan Williams
@ 2017-04-19 17:05 ` Dan Williams
-1 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
To: stable
Cc: Jan Kara, Matthew Wilcox, linux-nvdimm, x86, Ingo Molnar, Al Viro,
H. Peter Anvin, Thomas Gleixner, Christoph Hellwig
commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd upstream.
Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.
Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.
Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable@vger.kernel.org>
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
arch/x86/include/asm/pmem.h | 45 +++++++++++++++++++++++++++++++------------
1 file changed, 32 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index d8ce3ec816ab..bd8ce6bcdfc9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -72,8 +72,8 @@ static inline void arch_wmb_pmem(void)
* @size: number of bytes to write back
*
* Write back a cache range using the CLWB (cache line write back)
- * instruction. This function requires explicit ordering with an
- * arch_wmb_pmem() call. This API is internal to the x86 PMEM implementation.
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
*/
static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
{
@@ -87,15 +87,6 @@ static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
clwb(p);
}
-/*
- * copy_from_iter_nocache() on x86 only uses non-temporal stores for iovec
- * iterators, so for other types (bvec & kvec) we must do a cache write-back.
- */
-static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
-{
- return iter_is_iovec(i) == false;
-}
-
/**
* arch_copy_from_iter_pmem - copy data from an iterator to PMEM
* @addr: PMEM destination address
@@ -114,8 +105,36 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
/* TODO: skip the write-back by always using non-temporal stores */
len = copy_from_iter_nocache(vaddr, bytes, i);
- if (__iter_needs_pmem_wb(i))
- __arch_wb_cache_pmem(vaddr, bytes);
+ /*
+ * In the iovec case on x86_64 copy_from_iter_nocache() uses
+ * non-temporal stores for the bulk of the transfer, but we need
+ * to manually flush if the transfer is unaligned. A cached
+ * memory copy is used when destination or size is not naturally
+ * aligned. That is:
+ * - Require 8-byte alignment when size is 8 bytes or larger.
+ * - Require 4-byte alignment when size is 4 bytes.
+ *
+ * In the non-iovec case the entire destination needs to be
+ * flushed.
+ */
+ if (iter_is_iovec(i)) {
+ unsigned long flushed, dest = (unsigned long) addr;
+
+ if (bytes < 8) {
+ if (!IS_ALIGNED(dest, 4) || (bytes != 4))
+ __arch_wb_cache_pmem(addr, 1);
+ } else {
+ if (!IS_ALIGNED(dest, 8)) {
+ dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+ __arch_wb_cache_pmem(addr, 1);
+ }
+
+ flushed = dest - (unsigned long) addr;
+ if (bytes > flushed && !IS_ALIGNED(bytes - flushed, 8))
+ __arch_wb_cache_pmem(addr + bytes - 1, 1);
+ }
+ } else
+ __arch_wb_cache_pmem(addr, bytes);
return len;
}
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [4.4-stable PATCH 1/2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
@ 2017-04-19 17:05 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
To: stable
Cc: Jan Kara, Toshi Kani, Matthew Wilcox, x86, Christoph Hellwig,
linux-nvdimm, Jeff Moyer, Ingo Molnar, Al Viro, H. Peter Anvin,
Thomas Gleixner, Ross Zwisler
commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd upstream.
Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.
Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.
Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable@vger.kernel.org>
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
arch/x86/include/asm/pmem.h | 45 +++++++++++++++++++++++++++++++------------
1 file changed, 32 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index d8ce3ec816ab..bd8ce6bcdfc9 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -72,8 +72,8 @@ static inline void arch_wmb_pmem(void)
* @size: number of bytes to write back
*
* Write back a cache range using the CLWB (cache line write back)
- * instruction. This function requires explicit ordering with an
- * arch_wmb_pmem() call. This API is internal to the x86 PMEM implementation.
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
*/
static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
{
@@ -87,15 +87,6 @@ static inline void __arch_wb_cache_pmem(void *vaddr, size_t size)
clwb(p);
}
-/*
- * copy_from_iter_nocache() on x86 only uses non-temporal stores for iovec
- * iterators, so for other types (bvec & kvec) we must do a cache write-back.
- */
-static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
-{
- return iter_is_iovec(i) == false;
-}
-
/**
* arch_copy_from_iter_pmem - copy data from an iterator to PMEM
* @addr: PMEM destination address
@@ -114,8 +105,36 @@ static inline size_t arch_copy_from_iter_pmem(void __pmem *addr, size_t bytes,
/* TODO: skip the write-back by always using non-temporal stores */
len = copy_from_iter_nocache(vaddr, bytes, i);
- if (__iter_needs_pmem_wb(i))
- __arch_wb_cache_pmem(vaddr, bytes);
+ /*
+ * In the iovec case on x86_64 copy_from_iter_nocache() uses
+ * non-temporal stores for the bulk of the transfer, but we need
+ * to manually flush if the transfer is unaligned. A cached
+ * memory copy is used when destination or size is not naturally
+ * aligned. That is:
+ * - Require 8-byte alignment when size is 8 bytes or larger.
+ * - Require 4-byte alignment when size is 4 bytes.
+ *
+ * In the non-iovec case the entire destination needs to be
+ * flushed.
+ */
+ if (iter_is_iovec(i)) {
+ unsigned long flushed, dest = (unsigned long) addr;
+
+ if (bytes < 8) {
+ if (!IS_ALIGNED(dest, 4) || (bytes != 4))
+ __arch_wb_cache_pmem(addr, 1);
+ } else {
+ if (!IS_ALIGNED(dest, 8)) {
+ dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+ __arch_wb_cache_pmem(addr, 1);
+ }
+
+ flushed = dest - (unsigned long) addr;
+ if (bytes > flushed && !IS_ALIGNED(bytes - flushed, 8))
+ __arch_wb_cache_pmem(addr + bytes - 1, 1);
+ }
+ } else
+ __arch_wb_cache_pmem(addr, bytes);
return len;
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [4.4-stable PATCH 2/2] block: fix del_gendisk() vs blkdev_ioctl crash
2017-04-19 17:05 ` Dan Williams
@ 2017-04-19 17:05 ` Dan Williams
-1 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
To: stable; +Cc: Jens Axboe, Robert Hu, Jan Kara, linux-nvdimm
commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.
When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.
The nvdimm_revalidate_disk() implementation depends on
disk->driverfs_dev to be valid at entry. However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.
There's no reason for del_gendisk() to clear ->driverfs_dev. That
device is the parent of the disk. It is guaranteed to not be freed
until the disk, as a child, drops its ->parent reference.
We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)->parent, but lets fix it globally since
->driverfs_dev follows the lifetime of the parent. Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G O 4.4.0-rc5 #2257
[..]
Call Trace:
[<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
[<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
[<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
[<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
[<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
[<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
[<ffffffff812916dd>] block_ioctl+0x3d/0x50
[<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
[<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
[<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
[<ffffffff81264f09>] SyS_ioctl+0x79/0x90
[<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Reported-by: Robert Hu <robert.hu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
block/genhd.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/genhd.c b/block/genhd.c
index a5bed6bc869d..3032453a89e6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -664,7 +664,6 @@ void del_gendisk(struct gendisk *disk)
kobject_put(disk->part0.holder_dir);
kobject_put(disk->slave_dir);
- disk->driverfs_dev = NULL;
if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [4.4-stable PATCH 2/2] block: fix del_gendisk() vs blkdev_ioctl crash
@ 2017-04-19 17:05 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-04-19 17:05 UTC (permalink / raw)
To: stable; +Cc: Jens Axboe, Robert Hu, Jan Kara, linux-nvdimm
commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.
When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.
The nvdimm_revalidate_disk() implementation depends on
disk->driverfs_dev to be valid at entry. However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.
There's no reason for del_gendisk() to clear ->driverfs_dev. That
device is the parent of the disk. It is guaranteed to not be freed
until the disk, as a child, drops its ->parent reference.
We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)->parent, but lets fix it globally since
->driverfs_dev follows the lifetime of the parent. Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G O 4.4.0-rc5 #2257
[..]
Call Trace:
[<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
[<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
[<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
[<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
[<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
[<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
[<ffffffff812916dd>] block_ioctl+0x3d/0x50
[<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
[<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
[<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
[<ffffffff81264f09>] SyS_ioctl+0x79/0x90
[<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Reported-by: Robert Hu <robert.hu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
block/genhd.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/genhd.c b/block/genhd.c
index a5bed6bc869d..3032453a89e6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -664,7 +664,6 @@ void del_gendisk(struct gendisk *disk)
kobject_put(disk->part0.holder_dir);
kobject_put(disk->slave_dir);
- disk->driverfs_dev = NULL;
if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
2017-04-19 17:05 ` Dan Williams
@ 2017-04-25 12:07 ` Greg KH
-1 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2017-04-25 12:07 UTC (permalink / raw)
To: Dan Williams
Cc: Jan Kara, Matthew Wilcox, x86, stable, Jens Axboe, linux-nvdimm,
Ingo Molnar, Al Viro, H. Peter Anvin, Robert Hu, Thomas Gleixner,
Christoph Hellwig
On Wed, Apr 19, 2017 at 10:05:41AM -0700, Dan Williams wrote:
> Hi -stable team,
>
> Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
> __copy_user_nocache cache-bypass assumptions", and another block layer
> fix that allows the libnvdimm unit tests to run.
>
> I have copied Jens and Jan on the block-layer fix in case they have any
> concerns.
Looks good, thanks.
greg k-h
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4
@ 2017-04-25 12:07 ` Greg KH
0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2017-04-25 12:07 UTC (permalink / raw)
To: Dan Williams
Cc: stable, Jan Kara, Toshi Kani, Matthew Wilcox, x86,
Christoph Hellwig, Jens Axboe, linux-nvdimm, Jeff Moyer,
Ingo Molnar, Al Viro, H. Peter Anvin, Robert Hu, Thomas Gleixner,
Ross Zwisler
On Wed, Apr 19, 2017 at 10:05:41AM -0700, Dan Williams wrote:
> Hi -stable team,
>
> Here is a backport for commit 11e63f6d920d "x86, pmem: fix broken
> __copy_user_nocache cache-bypass assumptions", and another block layer
> fix that allows the libnvdimm unit tests to run.
>
> I have copied Jens and Jan on the block-layer fix in case they have any
> concerns.
Looks good, thanks.
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-04-25 12:08 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-04-19 17:05 [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4 Dan Williams
2017-04-19 17:05 ` Dan Williams
2017-04-19 17:05 ` [4.4-stable PATCH 1/2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions Dan Williams
2017-04-19 17:05 ` Dan Williams
2017-04-19 17:05 ` [4.4-stable PATCH 2/2] block: fix del_gendisk() vs blkdev_ioctl crash Dan Williams
2017-04-19 17:05 ` Dan Williams
2017-04-25 12:07 ` [4.4-stable PATCH 0/2] libnvdimm: stable fixes for 4.4 Greg KH
2017-04-25 12:07 ` Greg KH
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.