Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: ebiederm@xmission.com (Eric W. Biederman)
To: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Greg Kroah-Hartman <greg@kroah.com>,
	Hannes Reinecke <hare@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock
Date: Tue, 08 Nov 2016 18:57:20 -0600	[thread overview]
Message-ID: <87bmxpqxlb.fsf@xmission.com> (raw)
In-Reply-To: <1478648684.2368.17.camel@linux.vnet.ibm.com> (James Bottomley's message of "Tue, 08 Nov 2016 15:44:44 -0800")

James Bottomley <jejb@linux.vnet.ibm.com> writes:

> On Tue, 2016-11-08 at 13:13 -0600, Eric W. Biederman wrote:
>> James Bottomley <jejb@linux.vnet.ibm.com> writes:
>> 
>> > On Tue, 2016-11-08 at 08:52 -0800, Bart Van Assche wrote:
>> > > On 11/08/2016 07:28 AM, James Bottomley wrote:
>> > > > On Mon, 2016-11-07 at 16:32 -0800, Bart Van Assche wrote:
>> > > > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>> > > > > index cf4c636..44ec536 100644
>> > > > > --- a/fs/kernfs/dir.c
>> > > > > +++ b/fs/kernfs/dir.c
>> > > > > @@ -1410,7 +1410,7 @@ int kernfs_remove_by_name_ns(struct
>> > > > > kernfs_node
>> > > > > *parent, const char *name,
>> > > > >  	mutex_lock(&kernfs_mutex);
>> > > > > 
>> > > > >  	kn = kernfs_find_ns(parent, name, ns);
>> > > > > -	if (kn)
>> > > > > +	if (kn && !(kn->flags & KERNFS_SUICIDED))
>> > > > 
>> > > > Actually, wrong flag, you need KERNFS_SUICIDAL.  The reason is
>> > > > that
>> > > > kernfs_mutex is actually dropped half way through
>> > > > __kernfs_remove, 
>> > > > so KERNFS_SUICIDED is not set atomically with this mutex.
>> > > 
>> > > Hello James,
>> > > 
>> > > Sorry but what you wrote is not correct.
>> > 
>> > I think you agree it is dropped.  I don't need to add the bit about 
>> > the reacquisition because the race is mediated by the first 
>> > acquisition not the second one, if you mediate on KERNFS_SUICIDAL, 
>> > you only need to worry about this because the mediation is in the 
>> > first acquisition.   If you mediate on KERNFS_SUICIDED, you need to 
>> > explain that the final thing that means the race can't happen is 
>> > the unbreak in the sysfs delete path re-acquiring s_active ... the 
>> > explanation of what's going on and why gets about 2x more complex.
>> 
>> Is it really the dropping of the lock that is causing this?
>> I don't see that when I read those traces.
>
> No, it's an ABBA lock inversion that causes this.  The traces are
> somewhat dense, but they say it here:
>
>  Possible unsafe locking scenario:
>        CPU0                    CPU1
>        ----                    ----
>   lock(s_active#336);
>                                lock(&shost->scan_mutex);
>                                lock(s_active#336);
>   lock(&shost->scan_mutex);
>
>  *** DEADLOCK ***
>
> The detailed explanation of this is here:
>
> http://marc.info/?l=linux-scsi&m=147855187425596
>
> The fix is ensuring that the CPU1 thread doesn't get into taking
> s_active if CPU0 already has it using the KERNFS_SUICIDED/AL flag as an
> indicator.

So.  The kernfs code does not look safe to have kernfs_remove_self
and kernfs_remove_by_name_ns racing against each other I agree.

The kernfs_remove_self path turns KERNFS_SUICIDAL into another blocking
lock by another name, and without lockdep annotations so I don't know
that it is safe.

The relevant bit from kernfs_remove_self is:

	if (!(kn->flags & KERNFS_SUICIDAL)) {
		kn->flags |= KERNFS_SUICIDAL;
		__kernfs_remove(kn);
		kn->flags |= KERNFS_SUICIDED;
		ret = true;
	} else {
		wait_queue_head_t *waitq = &kernfs_root(kn)->deactivate_waitq;
		DEFINE_WAIT(wait);

		while (true) {
			prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE);

			if ((kn->flags & KERNFS_SUICIDED) &&
			    atomic_read(&kn->active) == KN_DEACTIVATED_BIAS)
				break;

			mutex_unlock(&kernfs_mutex);
			schedule();
			mutex_lock(&kernfs_mutex);
		}
		finish_wait(waitq, &wait);
		WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb));
		ret = false;
	}

I am pretty certain that if you are going to make kernfs_remove_self and
kernfs_remove_by_name_ns to be safe to race against each other, not just
the KERNFS_SUICIDAL check, but the wait when KERNFS_SUICIDAL is set
needs to be added ot kernfs_remove_by_name_ns.

And I suspect if you add the appropriate lockdep annotations to that mess
you will find yourself in a similar but related ABBA deadlock.

Which is why I would like a simpler easier to understand mechanism if we can.

Eric

next prev parent reply	other threads:[~2016-11-09  0:59 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-08  0:32 [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock Bart Van Assche
2016-11-08  7:01 ` Greg KH
2016-11-08 15:34   ` James Bottomley
2016-11-08 15:28 ` James Bottomley
2016-11-08 16:52   ` Bart Van Assche
2016-11-08 18:01     ` James Bottomley
2016-11-08 19:13       ` Eric W. Biederman
2016-11-08 23:33         ` Bart Van Assche
2016-11-09  1:22           ` Eric W. Biederman
2016-11-08 23:44         ` James Bottomley
2016-11-09  0:57           ` Eric W. Biederman [this message]
2016-11-09  1:43             ` James Bottomley
2016-11-09  2:10               ` Eric W. Biederman
2016-11-11  1:37                 ` James Bottomley
2016-11-11  4:13                   ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2018-06-26 22:25 Bart Van Assche
2016-10-26 18:44 Bart Van Assche
2016-10-27  9:36 ` Sagi Grimberg
2016-10-27 15:39   ` Bart Van Assche
2016-10-27  9:46 ` Johannes Thumshirn
2016-10-27 15:38   ` Bart Van Assche
2016-10-29  0:12     ` Johannes Thumshirn
2016-10-29  2:08 ` James Bottomley
2016-10-30 19:22   ` Bart Van Assche
2016-10-30 20:25     ` James Bottomley
2016-11-03 22:27   ` Bart Van Assche
2016-11-04 13:47     ` James Bottomley
2016-11-04 18:17       ` Bart Van Assche
2016-11-07 20:51         ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bmxpqxlb.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=bart.vanassche@sandisk.com \
    --cc=greg@kroah.com \
    --cc=hare@suse.de \
    --cc=jejb@linux.vnet.ibm.com \
    --cc=jthumshirn@suse.de \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.