linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: James Bottomley <jejb@linux.vnet.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Greg Kroah-Hartman <greg@kroah.com>,
	Hannes Reinecke <hare@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock
Date: Tue, 08 Nov 2016 15:44:44 -0800	[thread overview]
Message-ID: <1478648684.2368.17.camel@linux.vnet.ibm.com> (raw)
In-Reply-To: <87oa1pvl8f.fsf@xmission.com>

On Tue, 2016-11-08 at 13:13 -0600, Eric W. Biederman wrote:
> James Bottomley <jejb@linux.vnet.ibm.com> writes:
> 
> > On Tue, 2016-11-08 at 08:52 -0800, Bart Van Assche wrote:
> > > On 11/08/2016 07:28 AM, James Bottomley wrote:
> > > > On Mon, 2016-11-07 at 16:32 -0800, Bart Van Assche wrote:
> > > > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > > > > index cf4c636..44ec536 100644
> > > > > --- a/fs/kernfs/dir.c
> > > > > +++ b/fs/kernfs/dir.c
> > > > > @@ -1410,7 +1410,7 @@ int kernfs_remove_by_name_ns(struct
> > > > > kernfs_node
> > > > > *parent, const char *name,
> > > > >  	mutex_lock(&kernfs_mutex);
> > > > > 
> > > > >  	kn = kernfs_find_ns(parent, name, ns);
> > > > > -	if (kn)
> > > > > +	if (kn && !(kn->flags & KERNFS_SUICIDED))
> > > > 
> > > > Actually, wrong flag, you need KERNFS_SUICIDAL.  The reason is
> > > > that
> > > > kernfs_mutex is actually dropped half way through
> > > > __kernfs_remove, 
> > > > so KERNFS_SUICIDED is not set atomically with this mutex.
> > > 
> > > Hello James,
> > > 
> > > Sorry but what you wrote is not correct.
> > 
> > I think you agree it is dropped.  I don't need to add the bit about 
> > the reacquisition because the race is mediated by the first 
> > acquisition not the second one, if you mediate on KERNFS_SUICIDAL, 
> > you only need to worry about this because the mediation is in the 
> > first acquisition.   If you mediate on KERNFS_SUICIDED, you need to 
> > explain that the final thing that means the race can't happen is 
> > the unbreak in the sysfs delete path re-acquiring s_active ... the 
> > explanation of what's going on and why gets about 2x more complex.
> 
> Is it really the dropping of the lock that is causing this?
> I don't see that when I read those traces.

No, it's an ABBA lock inversion that causes this.  The traces are
somewhat dense, but they say it here:

 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(s_active#336);
                               lock(&shost->scan_mutex);
                               lock(s_active#336);
  lock(&shost->scan_mutex);

 *** DEADLOCK ***

The detailed explanation of this is here:

http://marc.info/?l=linux-scsi&m=147855187425596

The fix is ensuring that the CPU1 thread doesn't get into taking
s_active if CPU0 already has it using the KERNFS_SUICIDED/AL flag as an
indicator.

James


  parent reply	other threads:[~2016-11-08 23:44 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-08  0:32 [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock Bart Van Assche
2016-11-08  7:01 ` Greg KH
2016-11-08 15:34   ` James Bottomley
2016-11-08 15:28 ` James Bottomley
2016-11-08 16:52   ` Bart Van Assche
2016-11-08 18:01     ` James Bottomley
2016-11-08 19:13       ` Eric W. Biederman
2016-11-08 23:33         ` Bart Van Assche
2016-11-09  1:22           ` Eric W. Biederman
2016-11-08 23:44         ` James Bottomley [this message]
2016-11-09  0:57           ` Eric W. Biederman
2016-11-09  1:43             ` James Bottomley
2016-11-09  2:10               ` Eric W. Biederman
2016-11-11  1:37                 ` James Bottomley
2016-11-11  4:13                   ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2018-06-26 22:25 Bart Van Assche
2016-10-26 18:44 Bart Van Assche
2016-10-27  9:36 ` Sagi Grimberg
2016-10-27 15:39   ` Bart Van Assche
2016-10-27  9:46 ` Johannes Thumshirn
2016-10-27 15:38   ` Bart Van Assche
2016-10-29  0:12     ` Johannes Thumshirn
2016-10-29  2:08 ` James Bottomley
2016-10-30 19:22   ` Bart Van Assche
2016-10-30 20:25     ` James Bottomley
2016-11-03 22:27   ` Bart Van Assche
2016-11-04 13:47     ` James Bottomley
2016-11-04 18:17       ` Bart Van Assche
2016-11-07 20:51         ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1478648684.2368.17.camel@linux.vnet.ibm.com \
    --to=jejb@linux.vnet.ibm.com \
    --cc=bart.vanassche@sandisk.com \
    --cc=ebiederm@xmission.com \
    --cc=greg@kroah.com \
    --cc=hare@suse.de \
    --cc=jthumshirn@suse.de \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).