From: Jane Chu <jane.chu@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>, linux-nvdimm@lists.01.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Will Deacon <will.deacon@arm.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Erwin Tsaur <erwin.tsaur@oracle.com>
Subject: Re: [PATCH 0/6] libnvdimm: Fix async operations and locking
Date: Tue, 18 Jun 2019 15:10:33 -0700
Message-ID: <53fa618d-376f-2200-c8ba-e22ba004cdc0@oracle.com>
In-Reply-To: <156029554317.419799.1324389595953183385.stgit@dwillia2-desk3.amr.corp.intel.com>

On 6/11/2019 4:25 PM, Dan Williams wrote:
> The libnvdimm subsystem uses async operations to parallelize device
> probing operations and to allow sysfs to trigger device_unregister() on
> deleted namespaces. A multithreaded stress test of the libnvdimm sysfs
> interface uncovered a case where device_unregister() is triggered
> multiple times, and the subsequent investigation uncovered a broken
> locking scenario.
> 
> The lack of lockdep coverage for device_lock() stymied the debug. That
> is, until patch6 "driver-core, libnvdimm: Let device subsystems add
> local lockdep coverage" solved that with a shadow lock, with lockdep
> coverage, to mirror device_lock() operations. Given the time saved with
> the shadow-lock debug hack, patch6 attempts to generalize a device_lock()
> debug facility that might be able to be carried upstream. Patch6 is
> staged at the end of this fix series in case it is contentious and needs
> to be dropped.
> 
> Patch1 "drivers/base: Introduce kill_device()" could be achieved with
> local libnvdimm infrastructure. However, the existing 'dead' flag in
> 'struct device_private' aims to solve similar async register/unregister
> races so the fix in patch2 "libnvdimm/bus: Prevent duplicate
> device_unregister() calls" can be implemented with existing driver-core
> infrastructure.
> 
> Patch3 is a rare lockdep warning that is intermittent based on
> namespaces racing ahead of the completion of probe of their parent
> region. It is not related to the other fixes; it just happened to
> trigger as a result of the async stress test.
> 
> Patch4 and patch5 address an ABBA deadlock tripped by the stress test.
> 
> These patches pass the failing stress test and the existing libnvdimm
> unit tests with CONFIG_PROVE_LOCKING=y and the new "dev->lockdep_mutex"
> shadow lock with no lockdep warnings.
> 
> ---
> 
> Dan Williams (6):
>        drivers/base: Introduce kill_device()
>        libnvdimm/bus: Prevent duplicate device_unregister() calls
>        libnvdimm/region: Register badblocks before namespaces
>        libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
>        libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
>        driver-core, libnvdimm: Let device subsystems add local lockdep coverage
> 
> 
>   drivers/acpi/nfit/core.c        |   28 ++++---
>   drivers/acpi/nfit/nfit.h        |   24 ++++++
>   drivers/base/core.c             |   30 ++++++--
>   drivers/nvdimm/btt_devs.c       |   16 ++--
>   drivers/nvdimm/bus.c            |  154 +++++++++++++++++++++++++++------------
>   drivers/nvdimm/core.c           |   10 +--
>   drivers/nvdimm/dimm_devs.c      |    4 +
>   drivers/nvdimm/namespace_devs.c |   36 +++++----
>   drivers/nvdimm/nd-core.h        |   71 ++++++++++++++++++
>   drivers/nvdimm/pfn_devs.c       |   24 +++---
>   drivers/nvdimm/pmem.c           |    4 +
>   drivers/nvdimm/region.c         |   24 +++---
>   drivers/nvdimm/region_devs.c    |   12 ++-
>   include/linux/device.h          |    6 ++
>   14 files changed, 308 insertions(+), 135 deletions(-)
> 

Tested-by: Jane Chu <jane.chu@oracle.com>

Specifically, running ndctl in multiple processes concurrently to
create and destroy namespaces led to a system panic, which I have
verified is fixed by this patch series.
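
For readers following along, the duplicate device_unregister() race
described in the cover letter can be illustrated in userspace. This is
only a minimal sketch under stated assumptions, not the code from the
series: a pthread mutex stands in for device_lock(), and struct
fake_device, fake_kill_device() and run_unregister_race() are
hypothetical names invented for the illustration. Several threads race
to tear down the same object, and a 'dead' flag tested and set under
the lock lets only the first caller proceed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_THREADS 64

/* Hypothetical userspace stand-in; not the kernel's struct device. */
struct fake_device {
	pthread_mutex_t lock;	/* plays the role of device_lock() */
	bool dead;		/* plays the role of the 'dead' flag */
	int unregister_calls;	/* how many times teardown actually ran */
};

/*
 * Mark the device dead. The caller must hold dev->lock.
 * Returns true only for the first caller, false for everyone after.
 */
static bool fake_kill_device(struct fake_device *dev)
{
	if (dev->dead)
		return false;
	dev->dead = true;
	return true;
}

/* Each worker models one path that wants to unregister the device. */
static void *unregister_worker(void *arg)
{
	struct fake_device *dev = arg;
	bool winner;

	pthread_mutex_lock(&dev->lock);
	winner = fake_kill_device(dev);
	pthread_mutex_unlock(&dev->lock);

	/* Only the single winner reaches the teardown step. */
	if (winner)
		dev->unregister_calls++;
	return NULL;
}

/* Race nthreads workers on one device; return the teardown count. */
int run_unregister_race(int nthreads)
{
	struct fake_device dev = { .dead = false, .unregister_calls = 0 };
	pthread_t tids[MAX_THREADS];
	int i, rc;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	pthread_mutex_init(&dev.lock, NULL);

	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&tids[i], NULL, unregister_worker, &dev);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);

	pthread_mutex_destroy(&dev.lock);
	return dev.unregister_calls;
}
```

Built with -pthread, run_unregister_race() should report one teardown
no matter how many threads race. Without the flag check, every worker
would reach the teardown step, which is the userspace analogue of
device_unregister() being triggered multiple times.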

Thanks!
-jane

Thread overview: 11+ messages
2019-06-11 23:25 [PATCH 0/6] libnvdimm: Fix async operations and locking Dan Williams
2019-06-11 23:25 ` [PATCH 1/6] drivers/base: Introduce kill_device() Dan Williams
2019-06-11 23:25 ` [PATCH 2/6] libnvdimm/bus: Prevent duplicate device_unregister() calls Dan Williams
2019-06-11 23:25 ` [PATCH 3/6] libnvdimm/region: Register badblocks before namespaces Dan Williams
2019-06-21  0:56   ` Verma, Vishal L
2019-06-11 23:26 ` [PATCH 4/6] libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl() Dan Williams
2019-06-11 23:26 ` [PATCH 5/6] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock Dan Williams
2019-06-11 23:26 ` [PATCH 6/6] driver-core, libnvdimm: Let device subsystems add local lockdep coverage Dan Williams
2019-06-19 22:21   ` Dan Williams
2019-06-20  6:34     ` Greg Kroah-Hartman
2019-06-18 22:10 ` Jane Chu [this message]
