From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tyler Hall <tylerwhall@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nicolai Stange <nicstange@gmail.com>,
	Johannes Berg <johannes@sipsolutions.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: deadlock in debugfs synchronize_srcu() when unplugging USB
Date: Thu, 12 Oct 2017 14:04:50 -0700
Message-ID: <20171012210450.GL3521@linux.vnet.ibm.com>
In-Reply-To: <CAOjnSCYGprej+vEEsSXwr=wO+eWLe2d6sHQYTpp-DFpQ3pmguw@mail.gmail.com>

On Thu, Oct 12, 2017 at 04:01:48PM -0400, Tyler Hall wrote:
> Hi,
> 
> I have a reproducible scenario wherein removing a USB device while
> reading /sys/kernel/debug/usb/devices causes a deadlock. This should
> not be specific to any USB device. Any USB device removal that causes
> a call to debugfs_remove() has inverted lock ordering with respect to
> the read() of debug/usb/devices.
> 
> e.g.
> read thread: srcu_read_lock(&debugfs_srcu);
> -- usb unplug --
>   remove thread: mutex_lock(&usb_bus_idr_lock);
>   remove thread: synchronize_srcu(&debugfs_srcu); <- blocked
> read thread: mutex_lock(&usb_bus_idr_lock); <- blocked
> read thread: srcu_read_unlock(&debugfs_srcu, ...);

The reader cannot exit its SRCU read-side critical section until it
acquires usb_bus_idr_lock.  The updater's synchronize_srcu() is not
permitted to return until all pre-existing readers complete, and it won't
release usb_bus_idr_lock until that happens.  So you have a deadlock,
pure and simple.

Use of RCU and SRCU can greatly reduce the possibility of deadlock,
but as you can see, sufficiently clever code can still manage to get
into a deadlock state.

The rule is "Within a read-side critical section, never wait on anything
that directly or indirectly waits on a grace period."  The above code
violates that rule, and so the above code needs to be fixed.
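
One way to check the rule against the sequence quoted above is to write
down who waits on whom and look for a cycle.  The following is only a
userspace toy model in C, not kernel code: the enum, the struct, and all
function names are hypothetical and exist purely for illustration.  The
second scenario assumes, as one possible way out, that the updater no
longer holds usb_bus_idr_lock while it waits for the grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the parties that can block on one another. */
enum waiter { READER, UPDATER, GRACE_PERIOD, NR_WAITERS };

/* waits_on[a][b] is true when a cannot make progress until b does. */
struct wait_graph {
	bool waits_on[NR_WAITERS][NR_WAITERS];
};

/* Depth-first search: can "target" be reached by following waits from "from"? */
bool reaches(const struct wait_graph *g, int from, int target, bool *seen)
{
	for (int next = 0; next < NR_WAITERS; next++) {
		if (!g->waits_on[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (reaches(g, next, target, seen))
				return true;
		}
	}
	return false;
}

/* A deadlock exists if any party transitively waits on itself. */
bool has_deadlock(const struct wait_graph *g)
{
	assert(g != NULL);
	for (int w = 0; w < NR_WAITERS; w++) {
		bool seen[NR_WAITERS] = { false };

		if (reaches(g, w, w, seen))
			return true;
	}
	return false;
}

/* The sequence from the report. */
struct wait_graph reported_scenario(void)
{
	struct wait_graph g = { 0 };

	/* Reader, inside srcu_read_lock(), wants usb_bus_idr_lock. */
	g.waits_on[READER][UPDATER] = true;
	/* Updater calls synchronize_srcu() while holding that mutex. */
	g.waits_on[UPDATER][GRACE_PERIOD] = true;
	/* The grace period cannot end until the reader exits. */
	g.waits_on[GRACE_PERIOD][READER] = true;
	return g;
}

/*
 * Assumed variant: the updater does not hold the mutex while waiting
 * for the grace period, so the reader never waits on the updater.
 */
struct wait_graph rule_respected(void)
{
	struct wait_graph g = reported_scenario();

	g.waits_on[READER][UPDATER] = false;
	return g;
}
```

Calling has_deadlock() on the result of reported_scenario() finds the
cycle reader -> updater -> grace period -> reader, while the
rule_respected() variant has none.  The model says nothing about which
edge is the right one to remove in the real code; it only shows that the
rule amounts to keeping this graph acyclic.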

> This seems to be another flavor of what Johannes Berg reported:
> deadlock in synchronize_srcu() in debugfs?
> https://lkml.org/lkml/2017/3/23/415

It does look quite similar.

> I applied this patch set from Nicolai Stange and can no longer
> reproduce the hang.
> [RFC PATCH v2 0/9] debugfs: per-file removal protection
> https://lkml.org/lkml/2017/5/3/292
> 
> As patch 2/9 in the series indicates, commit 49d200deaa68 ("debugfs:
> prevent access to removed files' private data") is where this was
> first introduced, and it is reproducible on v4.14-rc4.
> 
> How should we move forward with the resolution of this debugfs change?
> It seems to me that the USB locking is reasonable but the debugfs
> global srcu is overly restrictive. This could lead to unexpected lock
> inversion any time a driver shares a mutex between its debugfs read
> and removal paths.

It looks like no one took Nicolai's series and that Nicolai never
reposted it.  Would you like to forward-port and repost?

						Thanx, Paul

> Backtrace below. Thanks!
> 
> -Tyler Hall
> 
> This is easier to reproduce by adding a sleep before the
> usb_bus_idr_lock, but I've seen it on an unmodified kernel.
> 
> diff --git a/drivers/usb/core/devices.c b/drivers/usb/core/devices.c
> index 55dea2e7828f..534650cd0950 100644
> --- a/drivers/usb/core/devices.c
> +++ b/drivers/usb/core/devices.c
> @@ -614,6 +614,7 @@ static ssize_t usb_device_read(struct file *file, char __user *buf,
>      if (!access_ok(VERIFY_WRITE, buf, nbytes))
>          return -EFAULT;
> 
> +    msleep(1000);
>      mutex_lock(&usb_bus_idr_lock);
>      /* print devices for all busses */
>      idr_for_each_entry(&usb_bus_idr, bus, id) {
> 
> 
> [   24.240542] sysrq: SysRq : Show Blocked State
> [   24.240765]   task                        PC stack   pid father
> [   24.240975] kworker/0:2     D13840   881      2 0x80000000
> [   24.241525] Workqueue: usb_hub_wq hub_event
> [   24.241682] Call Trace:
> [   24.242273]  __schedule+0x317/0x6d0
> [   24.242442]  schedule+0x31/0x80
> [   24.242514]  schedule_timeout+0x1d0/0x320
> [   24.242603]  ? __queue_work+0x135/0x400
> [   24.242689]  wait_for_completion+0x92/0xf0
> [   24.242765]  ? wait_for_completion+0x92/0xf0
> [   24.242841]  ? wake_up_q+0x70/0x70
> [   24.242907]  __synchronize_srcu.part.14+0x71/0x90
> [   24.242985]  ? trace_event_raw_event_rcu_torture_read+0xe0/0xe0
> [   24.243169]  synchronize_srcu_expedited+0x22/0x30
> [   24.243265]  ? synchronize_srcu_expedited+0x22/0x30
> [   24.243347]  synchronize_srcu+0x9a/0xc0
> [   24.243418]  debugfs_remove+0x6d/0xa0
> [   24.243490]  bdi_unregister+0x8b/0x170
> [   24.243558]  del_gendisk+0x139/0x220
> [   24.243624]  sd_remove+0x5c/0xc0
> [   24.243685]  device_release_driver_internal+0x150/0x210
> [   24.243769]  device_release_driver+0xd/0x10
> [   24.243841]  bus_remove_device+0xdb/0x120
> [   24.243915]  device_del+0x1c3/0x2e0
> [   24.243977]  __scsi_remove_device+0xff/0x130
> [   24.244122]  scsi_forget_host+0x5b/0x60
> [   24.244203]  scsi_remove_host+0x74/0x140
> [   24.244281]  usb_stor_disconnect+0x54/0xc0
> [   24.244357]  usb_unbind_interface+0x6d/0x260
> [   24.244437]  device_release_driver_internal+0x150/0x210
> [   24.244520]  device_release_driver+0xd/0x10
> [   24.244591]  bus_remove_device+0xdb/0x120
> [   24.244659]  device_del+0x1c3/0x2e0
> [   24.244722]  usb_disable_device+0x97/0x1f0
> [   24.244792]  usb_disconnect+0x88/0x230
> [   24.244853]  hub_event+0x5b9/0x11e0
> [   24.244915]  ? add_timer+0x10e/0x230
> [   24.244984]  process_one_work+0x146/0x3e0
> [   24.245124]  worker_thread+0x43/0x3e0
> [   24.245204]  kthread+0x104/0x140
> [   24.245266]  ? create_worker+0x190/0x190
> [   24.245333]  ? kthread_create_on_node+0x40/0x40
> [   24.245406]  ret_from_fork+0x22/0x30
> 
> [   24.245542] cat             D13712  1029   1018 0x00000000
> [   24.245652] Call Trace:
> [   24.245705]  __schedule+0x317/0x6d0
> [   24.245770]  schedule+0x31/0x80
> [   24.245830]  schedule_preempt_disabled+0x9/0x10
> [   24.245903]  __mutex_lock.isra.2+0x225/0x470
> [   24.245975]  __mutex_lock_slowpath+0xe/0x10
> [   24.246110]  ? __mutex_lock_slowpath+0xe/0x10
> [   24.246199]  mutex_lock+0x2a/0x30
> [   24.246261]  usb_device_read+0xb6/0x140
> [   24.246325]  full_proxy_read+0x4f/0x90
> [   24.246394]  __vfs_read+0x23/0x120
> [   24.246456]  ? security_file_permission+0x96/0xb0
> [   24.246533]  ? rw_verify_area+0x49/0xb0
> [   24.246593]  vfs_read+0x8e/0x130
> [   24.246646]  SyS_read+0x41/0xa0
> [   24.246698]  entry_SYSCALL_64_fastpath+0x13/0x94
> 
