Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: James Bottomley <jejb@linux.vnet.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Greg Kroah-Hartman <greg@kroah.com>,
	Hannes Reinecke <hare@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock
Date: Tue, 08 Nov 2016 17:43:35 -0800	[thread overview]
Message-ID: <1478655815.2368.34.camel@linux.vnet.ibm.com> (raw)
In-Reply-To: <87bmxpqxlb.fsf@xmission.com>

On Tue, 2016-11-08 at 18:57 -0600, Eric W. Biederman wrote:
> James Bottomley <jejb@linux.vnet.ibm.com> writes:
> 
> > On Tue, 2016-11-08 at 13:13 -0600, Eric W. Biederman wrote:
[...]
> > > Is it really the dropping of the lock that is causing this?
> > > I don't see that when I read those traces.
> > 
> > No, it's an ABBA lock inversion that causes this.  The traces are
> > somewhat dense, but they say it here:
> > 
> >  Possible unsafe locking scenario:
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(s_active#336);
> >                                lock(&shost->scan_mutex);
> >                                lock(s_active#336);
> >   lock(&shost->scan_mutex);
> > 
> >  *** DEADLOCK ***
> > 
> > The detailed explanation of this is here:
> > 
> > http://marc.info/?l=linux-scsi&m=147855187425596
> > 
> > The fix is ensuring that the CPU1 thread doesn't get into taking
> > s_active if CPU0 already has it using the KERNFS_SUICIDED/AL flag 
> > as an indicator.
> 
> So.  The kernfs code does not look safe to have kernfs_remove_self
> and kernfs_remove_by_name_ns racing against each other I agree.
> 
> The kernfs_remove_self path turns KERNFS_SUICIDAL into another 
> blocking lock by another name, and without lockdep annotations so I 
> don't know that it is safe.

Yes ... the number of hand rolled locks in that code make it super hard
to follow.

> The relevant bit from kernfs_remove_self is:
> 
> 	if (!(kn->flags & KERNFS_SUICIDAL)) {
> 		kn->flags |= KERNFS_SUICIDAL;
> 		__kernfs_remove(kn);
> 		kn->flags |= KERNFS_SUICIDED;
> 		ret = true;
> 	} else {
> 		wait_queue_head_t *waitq = &kernfs_root(kn)
> ->deactivate_waitq;
> 		DEFINE_WAIT(wait);
> 
> 		while (true) {
> 			prepare_to_wait(waitq, &wait,
> TASK_UNINTERRUPTIBLE);
> 
> 			if ((kn->flags & KERNFS_SUICIDED) &&
> 			    atomic_read(&kn->active) ==
> KN_DEACTIVATED_BIAS)
> 				break;
> 
> 			mutex_unlock(&kernfs_mutex);
> 			schedule();
> 			mutex_lock(&kernfs_mutex);
> 		}
> 		finish_wait(waitq, &wait);
> 		WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb));
> 		ret = false;
> 	}
> 
> I am pretty certain that if you are going to make kernfs_remove_self 
> and kernfs_remove_by_name_ns to be safe to race against each other, 
> not just the KERNFS_SUICIDAL check, but the wait when KERNFS_SUICIDAL 
> is set needs to be added ot kernfs_remove_by_name_ns.

I don't think you can do that: waiting for SUICIDED would introduce
another potential lock entanglement.  I'm reasonably happy that the
deactivation offset coupled with kernfs_drain in the non self remove
path means that the necessary cleanup is done when the directory itself
is removed.  That seems to be a common pattern in all non-self removes.

> And I suspect if you add the appropriate lockdep annotations to that 
> mess you will find yourself in a similar but related ABBA deadlock.

I can't prove the negative, but as long as there's no waiting on the
SUICIDED/AL flags in the non-self remove path, I believe we're safe
with the current patch.

> Which is why I would like a simpler easier to understand mechanism if
> we can.

I don't disagree: If you want to clean out the Augean Stables, I can
lend you the thigh length rubber boots and the gas mask.  However, I
think that what we're currently proposing: a simple patch to make
device_remove_file_self() actually work for everyone, along with
stringent testing is the better approach.

After all, if you look at

commit ac0ece9174aca9aa895ce0accc54f1f8ff12d117
Author: Tejun Heo <tj@kernel.org>
Date:   Mon Feb 3 14:03:03 2014 -0500

    scsi: use device_remove_file_self() instead of device_schedule_callback()

You'll see Tejun added all this stuff just to remove the async callback
we originally had.  Simply restoring the async callback back makes us
quite considerably worse off because the device_remove_file_self()
mechanism is in use elsewhere.  We need either to fix it and move on or
junk it and go back to the original.

James

next prev parent reply	other threads:[~2016-11-09  1:43 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-08  0:32 [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock Bart Van Assche
2016-11-08  7:01 ` Greg KH
2016-11-08 15:34   ` James Bottomley
2016-11-08 15:28 ` James Bottomley
2016-11-08 16:52   ` Bart Van Assche
2016-11-08 18:01     ` James Bottomley
2016-11-08 19:13       ` Eric W. Biederman
2016-11-08 23:33         ` Bart Van Assche
2016-11-09  1:22           ` Eric W. Biederman
2016-11-08 23:44         ` James Bottomley
2016-11-09  0:57           ` Eric W. Biederman
2016-11-09  1:43             ` James Bottomley [this message]
2016-11-09  2:10               ` Eric W. Biederman
2016-11-11  1:37                 ` James Bottomley
2016-11-11  4:13                   ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2018-06-26 22:25 Bart Van Assche
2016-10-26 18:44 Bart Van Assche
2016-10-27  9:36 ` Sagi Grimberg
2016-10-27 15:39   ` Bart Van Assche
2016-10-27  9:46 ` Johannes Thumshirn
2016-10-27 15:38   ` Bart Van Assche
2016-10-29  0:12     ` Johannes Thumshirn
2016-10-29  2:08 ` James Bottomley
2016-10-30 19:22   ` Bart Van Assche
2016-10-30 20:25     ` James Bottomley
2016-11-03 22:27   ` Bart Van Assche
2016-11-04 13:47     ` James Bottomley
2016-11-04 18:17       ` Bart Van Assche
2016-11-07 20:51         ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1478655815.2368.34.camel@linux.vnet.ibm.com \
    --to=jejb@linux.vnet.ibm.com \
    --cc=bart.vanassche@sandisk.com \
    --cc=ebiederm@xmission.com \
    --cc=greg@kroah.com \
    --cc=hare@suse.de \
    --cc=jthumshirn@suse.de \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).