From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Dan Williams <dan.j.williams@intel.com>
Subject: [PATCH 4.9 21/21] device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
Date: Tue, 25 Apr 2017 16:09:16 +0100 [thread overview]
Message-ID: <20170425150828.410579648@linuxfoundation.org> (raw)
In-Reply-To: <20170425150827.381333941@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams <dan.j.williams@intel.com>
commit 956a4cd2c957acf638ff29951aabaa9d8e92bbc2 upstream.
The following warning triggers with a new unit test that stresses the
device-dax interface.
===============================
[ ERR: suspicious RCU usage. ]
4.11.0-rc4+ #1049 Tainted: G O
-------------------------------
./include/linux/rcupdate.h:521 Illegal context switch in RCU read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
2 locks held by fio/9070:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8d0739d7>] __do_page_fault+0x167/0x4f0
#1: (rcu_read_lock){......}, at: [<ffffffffc03fbd02>] dax_dev_huge_fault+0x32/0x620 [dax]
Call Trace:
dump_stack+0x86/0xc3
lockdep_rcu_suspicious+0xd7/0x110
___might_sleep+0xac/0x250
__might_sleep+0x4a/0x80
__alloc_pages_nodemask+0x23a/0x360
alloc_pages_current+0xa1/0x1f0
pte_alloc_one+0x17/0x80
__pte_alloc+0x1e/0x120
__get_locked_pte+0x1bf/0x1d0
insert_pfn.isra.70+0x3a/0x100
? lookup_memtype+0xa6/0xd0
vm_insert_mixed+0x64/0x90
dax_dev_huge_fault+0x520/0x620 [dax]
? dax_dev_huge_fault+0x32/0x620 [dax]
dax_dev_fault+0x10/0x20 [dax]
__do_fault+0x1e/0x140
__handle_mm_fault+0x9af/0x10d0
handle_mm_fault+0x16d/0x370
? handle_mm_fault+0x47/0x370
__do_page_fault+0x28c/0x4f0
trace_do_page_fault+0x58/0x2a0
do_async_page_fault+0x1a/0xa0
async_page_fault+0x28/0x30
Inserting a page table entry may trigger an allocation while we are
holding a read lock to keep the device instance alive for the duration
of the fault. Use srcu for this keep-alive protection.
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/dax/Kconfig | 1 +
drivers/dax/dax.c | 13 +++++++------
2 files changed, 8 insertions(+), 6 deletions(-)
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -2,6 +2,7 @@ menuconfig DEV_DAX
tristate "DAX: direct access to differentiated memory"
default m if NVDIMM_DAX
depends on TRANSPARENT_HUGEPAGE
+ select SRCU
help
Support raw access to differentiated (persistence, bandwidth,
latency...) memory via an mmap(2) capable character
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -24,6 +24,7 @@
#include "dax.h"
static dev_t dax_devt;
+DEFINE_STATIC_SRCU(dax_srcu);
static struct class *dax_class;
static DEFINE_IDA(dax_minor_ida);
static int nr_dax = CONFIG_NR_DEV_DAX;
@@ -59,7 +60,7 @@ struct dax_region {
* @region - parent region
* @dev - device backing the character device
* @cdev - core chardev data
- * @alive - !alive + rcu grace period == no new mappings can be established
+ * @alive - !alive + srcu grace period == no new mappings can be established
* @id - child id in the region
* @num_resources - number of physical address extents in this device
* @res - array of physical address ranges
@@ -437,7 +438,7 @@ static int __dax_dev_pmd_fault(struct da
static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, unsigned int flags)
{
- int rc;
+ int rc, id;
struct file *filp = vma->vm_file;
struct dax_dev *dax_dev = filp->private_data;
@@ -445,9 +446,9 @@ static int dax_dev_pmd_fault(struct vm_a
current->comm, (flags & FAULT_FLAG_WRITE)
? "write" : "read", vma->vm_start, vma->vm_end);
- rcu_read_lock();
+ id = srcu_read_lock(&dax_srcu);
rc = __dax_dev_pmd_fault(dax_dev, vma, addr, pmd, flags);
- rcu_read_unlock();
+ srcu_read_unlock(&dax_srcu, id);
return rc;
}
@@ -563,11 +564,11 @@ static void unregister_dax_dev(void *dev
* Note, rcu is not protecting the liveness of dax_dev, rcu is
* ensuring that any fault handlers that might have seen
* dax_dev->alive == true, have completed. Any fault handlers
- * that start after synchronize_rcu() has started will abort
+ * that start after synchronize_srcu() has started will abort
* upon seeing dax_dev->alive == false.
*/
dax_dev->alive = false;
- synchronize_rcu();
+ synchronize_srcu(&dax_srcu);
unmap_mapping_range(dax_dev->inode->i_mapping, 0, 0, 1);
cdev_del(cdev);
device_unregister(dev);
next prev parent reply other threads:[~2017-04-25 15:27 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-25 15:08 [PATCH 4.9 00/21] 4.9.25-stable review Greg Kroah-Hartman
2017-04-25 15:08 ` [PATCH 4.9 01/21] KEYS: Disallow keyrings beginning with . to be joined as session keyrings Greg Kroah-Hartman
2017-04-25 15:08 ` [PATCH 4.9 02/21] KEYS: Change the name of the dead type to ".dead" to prevent user access Greg Kroah-Hartman
2017-04-25 15:08 ` [PATCH 4.9 03/21] KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings Greg Kroah-Hartman
2017-04-25 15:08 ` [PATCH 4.9 04/21] tracing: Allocate the snapshot buffer before enabling probe Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 05/21] ring-buffer: Have ring_buffer_iter_empty() return true when empty Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 06/21] mm: prevent NR_ISOLATE_* stats from going negative Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 07/21] cifs: Do not send echoes before Negotiate is complete Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 08/21] CIFS: remove bad_network_name flag Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 09/21] s390/mm: fix CMMA vs KSM vs others Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 10/21] Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 11/21] ACPI / power: Avoid maybe-uninitialized warning Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 12/21] mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 13/21] ubifs: Fix RENAME_WHITEOUT support Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 14/21] ubifs: Fix O_TMPFILE corner case in ubifs_link() Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 15/21] mac80211: reject ToDS broadcast data frames Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 16/21] mac80211: fix MU-MIMO follow-MAC mode Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 17/21] ubi/upd: Always flush after prepared for an update Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 18/21] powerpc/kprobe: Fix oops when kprobed on stdu instruction Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 19/21] x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs Greg Kroah-Hartman
2017-04-25 15:09 ` Greg Kroah-Hartman [this message]
2017-04-25 18:24 ` [PATCH 4.9 00/21] 4.9.25-stable review Shuah Khan
2017-04-26 2:23 ` Guenter Roeck
-- strict thread matches above, loose matches on Subject: below --
2017-04-25 15:09 [4.9,20/21] x86/mce: Make the MCE notifier a blocking one Greg Kroah-Hartman
2017-04-25 15:09 ` [PATCH 4.9 20/21] " Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170425150828.410579648@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=dan.j.williams@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.