public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: syzbot+4762dd74e32532cda5ff@syzkaller.appspotmail.com,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	stable@vger.kernel.org, Ashish Sangwan <a.sangwan@samsung.com>,
	Namjae Jeon <namjae.jeon@samsung.com>,
	Dirk Behme <dirk.behme@de.bosch.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org
Subject: Re: [PATCH] driver core: Fix uevent_show() vs driver detach race
Date: Sun, 14 Jul 2024 09:17:46 +0200	[thread overview]
Message-ID: <2024071438-perceive-earache-db11@gregkh> (raw)
In-Reply-To: <172081332794.577428.9738802016494057132.stgit@dwillia2-xfh.jf.intel.com>

On Fri, Jul 12, 2024 at 12:42:09PM -0700, Dan Williams wrote:
> uevent_show() wants to de-reference dev->driver->name. There is no clean
> way for a device attribute to de-reference dev->driver unless that
> attribute is defined via (struct device_driver).dev_groups. Instead, the
> anti-pattern of taking the device_lock() in the attribute handler risks
> deadlocks with code paths that remove device attributes while holding
> the lock.
> 
> This deadlock is typically invisible to lockdep given the device_lock()
> is marked lockdep_set_novalidate_class(), but some subsystems allocate a
> local lockdep key for @dev->mutex to reveal reports of the form:
> 
>  ======================================================
>  WARNING: possible circular locking dependency detected
>  6.10.0-rc7+ #275 Tainted: G           OE    N
>  ------------------------------------------------------
>  modprobe/2374 is trying to acquire lock:
>  ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220
> 
>  but task is already holding lock:
>  ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210
> 
>  which lock already depends on the new lock.
> 
> 
>  the existing dependency chain (in reverse order) is:
> 
>  -> #1 (&cxl_root_key){+.+.}-{3:3}:
>         __mutex_lock+0x99/0xc30
>         uevent_show+0xac/0x130
>         dev_attr_show+0x18/0x40
>         sysfs_kf_seq_show+0xac/0xf0
>         seq_read_iter+0x110/0x450
>         vfs_read+0x25b/0x340
>         ksys_read+0x67/0xf0
>         do_syscall_64+0x75/0x190
>         entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
>  -> #0 (kn->active#6){++++}-{0:0}:
>         __lock_acquire+0x121a/0x1fa0
>         lock_acquire+0xd6/0x2e0
>         kernfs_drain+0x1e9/0x200
>         __kernfs_remove+0xde/0x220
>         kernfs_remove_by_name_ns+0x5e/0xa0
>         device_del+0x168/0x410
>         device_unregister+0x13/0x60
>         devres_release_all+0xb8/0x110
>         device_unbind_cleanup+0xe/0x70
>         device_release_driver_internal+0x1c7/0x210
>         driver_detach+0x47/0x90
>         bus_remove_driver+0x6c/0xf0
>         cxl_acpi_exit+0xc/0x11 [cxl_acpi]
>         __do_sys_delete_module.isra.0+0x181/0x260
>         do_syscall_64+0x75/0x190
>         entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> The observation though is that driver objects are typically much longer
> lived than device objects. It is reasonable to perform lockless
> de-reference of a @driver pointer even if it is racing detach from a
> device. Given the infrequency of driver unregistration, use
> synchronize_rcu() in module_remove_driver() to close any potential
> races.  It is potentially overkill to suffer synchronize_rcu() just to
> handle the rare module removal racing uevent_show() event.
> 
> Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].
> 
> Fixes: c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()")
> Reported-by: syzbot+4762dd74e32532cda5ff@syzkaller.appspotmail.com
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Closes: http://lore.kernel.org/5aa5558f-90a4-4864-b1b1-5d6784c5607d@I-love.SAKURA.ne.jp [1]
> Link: http://lore.kernel.org/669073b8ea479_5fffa294c1@dwillia2-xfh.jf.intel.com.notmuch
> Cc: stable@vger.kernel.org
> Cc: Ashish Sangwan <a.sangwan@samsung.com>
> Cc: Namjae Jeon <namjae.jeon@samsung.com>
> Cc: Dirk Behme <dirk.behme@de.bosch.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/base/core.c   |   13 ++++++++-----
>  drivers/base/module.c |    4 ++++
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2b4c0624b704..b5399262198a 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -25,6 +25,7 @@
>  #include <linux/mutex.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/netdevice.h>
> +#include <linux/rcupdate.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
>  #include <linux/string_helpers.h>
> @@ -2640,6 +2641,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
>  static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
>  {
>  	const struct device *dev = kobj_to_dev(kobj);
> +	struct device_driver *driver;
>  	int retval = 0;
>  
>  	/* add device node properties if present */
> @@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
>  	if (dev->type && dev->type->name)
>  		add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
>  
> -	if (dev->driver)
> -		add_uevent_var(env, "DRIVER=%s", dev->driver->name);
> +	/* Synchronize with module_remove_driver() */
> +	rcu_read_lock();
> +	driver = READ_ONCE(dev->driver);
> +	if (driver)
> +		add_uevent_var(env, "DRIVER=%s", driver->name);
> +	rcu_read_unlock();

It's a lot of work for a "simple" thing of just "tell userspace what
happened" type of thing.  But it makes sense.

I'll queue this up after -rc1 is out, there's no rush before then as
it's only lockdep warnings happening now, right?

thanks,

greg k-h

  parent reply	other threads:[~2024-07-14  7:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-12 19:42 [PATCH] driver core: Fix uevent_show() vs driver detach race Dan Williams
2024-07-12 22:18 ` Tetsuo Handa
2024-07-12 23:49   ` Dan Williams
2024-07-13  6:05     ` Tetsuo Handa
2024-07-13 17:22       ` Dan Williams
2024-07-14  7:17 ` Greg KH [this message]
2024-07-14 16:49   ` Dan Williams
2024-08-06  9:00 ` Dirk Behme

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2024071438-perceive-earache-db11@gregkh \
    --to=gregkh@linuxfoundation.org \
    --cc=a.sangwan@samsung.com \
    --cc=dan.j.williams@intel.com \
    --cc=dirk.behme@de.bosch.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=namjae.jeon@samsung.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=rafael@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+4762dd74e32532cda5ff@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox