From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tyler Hall <tylerwhall@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Nicolai Stange <nicstange@gmail.com>,
Johannes Berg <johannes@sipsolutions.net>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: deadlock in debugfs synchronize_srcu() when unplugging USB
Date: Thu, 12 Oct 2017 14:04:50 -0700 [thread overview]
Message-ID: <20171012210450.GL3521@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAOjnSCYGprej+vEEsSXwr=wO+eWLe2d6sHQYTpp-DFpQ3pmguw@mail.gmail.com>
On Thu, Oct 12, 2017 at 04:01:48PM -0400, Tyler Hall wrote:
> Hi,
>
> I have a reproducible scenario wherein removing a USB device while
> reading /sys/kernel/debug/usb/devices causes a deadlock. This should
> not be specific to any USB device. Any USB device removal that causes
> a call to debugfs_remove() has inverted lock ordering with respect to
> the read() of debug/usb/devices.
>
> e.g.
> read thread: srcu_read_lock(&debugfs_srcu);
> -- usb unplug --
> remove thread: mutex_lock(&usb_bus_idr_lock);
> remove thread: synchronize_srcu(&debugfs_srcu); <- blocked
> read thread: mutex_lock(&usb_bus_idr_lock); <- blocked
> read thread: srcu_read_unlock(&debugfs_srcu, ...);
The reader cannot exit its SRCU read-side critical section until it
acquires usb_bus_idr_lock. The updater's synchronize_srcu() is not
permitted to return until all pre-existing readers complete, and it won't
release usb_bus_idr_lock until that happens. So you have a deadlock,
pure and simple.
Use of RCU and SRCU can greatly reduce the possibility of deadlock,
but as you can see, sufficiently clever code can still manage to get
into a deadlock state.
The rule is "Within a read-side critical section, never wait on anything
that directly or indirectly waits on a grace period." The above code
violates that rule, and so the above code needs to be fixed.
> This seems to be another flavor of what Johannes Berg reported:
> deadlock in synchronize_srcu() in debugfs?
> https://lkml.org/lkml/2017/3/23/415
It does look quite similar.
> I applied this patch set from Nicolai Stange and can no longer
> reproduce the hang.
> [RFC PATCH v2 0/9] debugfs: per-file removal protection
> https://lkml.org/lkml/2017/5/3/292
>
> As patch 2/9 in the series indicates, commit 49d200deaa68 ("debugfs:
> prevent access to removed files' private data") is where this was
> first introduced, and it is reproducible on v4.14-rc4.
>
> How should we move forward with the resolution of this debugfs change?
> It seems to me that the USB locking is reasonable but the debugfs
> global srcu is overly restrictive. This could lead to unexpected lock
> inversion any time a driver shares a mutex between its debugfs read
> and removal paths.
It looks like no one took Nicolai's series and that Nicolai never
reposted it. Would you like to forward-port and repost?
Thanx, Paul
> Backtrace below. Thanks!
>
> -Tyler Hall
>
> This is easier to reproduce by adding a sleep before the
> usb_bus_idr_lock, but I've seen it on an unmodified kernel.
>
> diff --git a/drivers/usb/core/devices.c b/drivers/usb/core/devices.c
> index 55dea2e7828f..534650cd0950 100644
> --- a/drivers/usb/core/devices.c
> +++ b/drivers/usb/core/devices.c
> @@ -614,6 +614,7 @@ static ssize_t usb_device_read(struct file *file,
> char __user *buf,
> if (!access_ok(VERIFY_WRITE, buf, nbytes))
> return -EFAULT;
>
> + msleep(1000);
> mutex_lock(&usb_bus_idr_lock);
> /* print devices for all busses */
> idr_for_each_entry(&usb_bus_idr, bus, id) {
>
>
> [ 24.240542] sysrq: SysRq : Show Blocked State
> [ 24.240765] task PC stack pid father
> [ 24.240975] kworker/0:2 D13840 881 2 0x80000000
> [ 24.241525] Workqueue: usb_hub_wq hub_event
> [ 24.241682] Call Trace:
> [ 24.242273] __schedule+0x317/0x6d0
> [ 24.242442] schedule+0x31/0x80
> [ 24.242514] schedule_timeout+0x1d0/0x320
> [ 24.242603] ? __queue_work+0x135/0x400
> [ 24.242689] wait_for_completion+0x92/0xf0
> [ 24.242765] ? wait_for_completion+0x92/0xf0
> [ 24.242841] ? wake_up_q+0x70/0x70
> [ 24.242907] __synchronize_srcu.part.14+0x71/0x90
> [ 24.242985] ? trace_event_raw_event_rcu_torture_read+0xe0/0xe0
> [ 24.243169] synchronize_srcu_expedited+0x22/0x30
> [ 24.243265] ? synchronize_srcu_expedited+0x22/0x30
> [ 24.243347] synchronize_srcu+0x9a/0xc0
> [ 24.243418] debugfs_remove+0x6d/0xa0
> [ 24.243490] bdi_unregister+0x8b/0x170
> [ 24.243558] del_gendisk+0x139/0x220
> [ 24.243624] sd_remove+0x5c/0xc0
> [ 24.243685] device_release_driver_internal+0x150/0x210
> [ 24.243769] device_release_driver+0xd/0x10
> [ 24.243841] bus_remove_device+0xdb/0x120
> [ 24.243915] device_del+0x1c3/0x2e0
> [ 24.243977] __scsi_remove_device+0xff/0x130
> [ 24.244122] scsi_forget_host+0x5b/0x60
> [ 24.244203] scsi_remove_host+0x74/0x140
> [ 24.244281] usb_stor_disconnect+0x54/0xc0
> [ 24.244357] usb_unbind_interface+0x6d/0x260
> [ 24.244437] device_release_driver_internal+0x150/0x210
> [ 24.244520] device_release_driver+0xd/0x10
> [ 24.244591] bus_remove_device+0xdb/0x120
> [ 24.244659] device_del+0x1c3/0x2e0
> [ 24.244722] usb_disable_device+0x97/0x1f0
> [ 24.244792] usb_disconnect+0x88/0x230
> [ 24.244853] hub_event+0x5b9/0x11e0
> [ 24.244915] ? add_timer+0x10e/0x230
> [ 24.244984] process_one_work+0x146/0x3e0
> [ 24.245124] worker_thread+0x43/0x3e0
> [ 24.245204] kthread+0x104/0x140
> [ 24.245266] ? create_worker+0x190/0x190
> [ 24.245333] ? kthread_create_on_node+0x40/0x40
> [ 24.245406] ret_from_fork+0x22/0x30
>
> [ 24.245542] cat D13712 1029 1018 0x00000000
> [ 24.245652] Call Trace:
> [ 24.245705] __schedule+0x317/0x6d0
> [ 24.245770] schedule+0x31/0x80
> [ 24.245830] schedule_preempt_disabled+0x9/0x10
> [ 24.245903] __mutex_lock.isra.2+0x225/0x470
> [ 24.245975] __mutex_lock_slowpath+0xe/0x10
> [ 24.246110] ? __mutex_lock_slowpath+0xe/0x10
> [ 24.246199] mutex_lock+0x2a/0x30
> [ 24.246261] usb_device_read+0xb6/0x140
> [ 24.246325] full_proxy_read+0x4f/0x90
> [ 24.246394] __vfs_read+0x23/0x120
> [ 24.246456] ? security_file_permission+0x96/0xb0
> [ 24.246533] ? rw_verify_area+0x49/0xb0
> [ 24.246593] vfs_read+0x8e/0x130
> [ 24.246646] SyS_read+0x41/0xa0
> [ 24.246698] entry_SYSCALL_64_fastpath+0x13/0x94
>
next prev parent reply other threads:[~2017-10-12 21:04 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-12 20:01 deadlock in debugfs synchronize_srcu() when unplugging USB Tyler Hall
2017-10-12 21:04 ` Paul E. McKenney [this message]
2017-10-16 7:51 ` Greg Kroah-Hartman
2017-10-16 8:15 ` Johannes Berg
2017-10-16 8:31 ` Greg Kroah-Hartman
2017-10-16 15:08 ` Tyler Hall
[not found] ` <CAOjnSCazvybpH5ZhHn=iTLm_dVGEdr=W1pUQv6zowRisB1MWTQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-30 23:15 ` [PATCH v3 0/8] debugfs: per-file removal protection Nicolai Stange
2017-10-30 23:15 ` Nicolai Stange
2017-10-30 23:15 ` [PATCH v3 1/8] debugfs: add support for more elaborate ->d_fsdata Nicolai Stange
2017-10-30 23:15 ` [PATCH v3 2/8] debugfs: implement per-file removal protection Nicolai Stange
2017-10-30 23:15 ` [PATCH v3 3/8] debugfs: debugfs_real_fops(): drop __must_hold sparse annotation Nicolai Stange
2017-10-30 23:15 ` [PATCH v3 4/8] debugfs: convert to debugfs_file_get() and -put() Nicolai Stange
2017-10-30 23:15 ` [PATCH v3 5/8] IB/hfi1: " Nicolai Stange
[not found] ` <20171030231554.6200-6-nicstange-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-07 14:29 ` Dennis Dalessandro
2017-11-07 14:29 ` Dennis Dalessandro
2017-10-30 23:15 ` [PATCH v3 6/8] debugfs: purge obsolete SRCU based removal protection Nicolai Stange
2017-10-30 23:15 ` [PATCH v3 7/8] debugfs: call debugfs_real_fops() only after debugfs_file_get() Nicolai Stange
2017-10-30 23:15 ` [PATCH v3 8/8] debugfs: defer debugfs_fsdata allocation to first usage Nicolai Stange
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171012210450.GL3521@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=gregkh@linuxfoundation.org \
--cc=johannes@sipsolutions.net \
--cc=linux-kernel@vger.kernel.org \
--cc=nicstange@gmail.com \
--cc=tylerwhall@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.