All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Greg Kroah-Hartman <greg@kroah.com>,
	Hannes Reinecke <hare@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Sagi Grimberg <sagi@grimberg.me>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock
Date: Tue, 08 Nov 2016 20:10:44 -0600	[thread overview]
Message-ID: <87pom5o123.fsf@xmission.com> (raw)
In-Reply-To: <1478655815.2368.34.camel@linux.vnet.ibm.com> (James Bottomley's message of "Tue, 08 Nov 2016 17:43:35 -0800")

James Bottomley <jejb@linux.vnet.ibm.com> writes:

> On Tue, 2016-11-08 at 18:57 -0600, Eric W. Biederman wrote:
>> James Bottomley <jejb@linux.vnet.ibm.com> writes:
>> 
>> > On Tue, 2016-11-08 at 13:13 -0600, Eric W. Biederman wrote:
> [...]
>> > > Is it really the dropping of the lock that is causing this?
>> > > I don't see that when I read those traces.
>> > 
>> > No, it's an ABBA lock inversion that causes this.  The traces are
>> > somewhat dense, but they say it here:
>> > 
>> >  Possible unsafe locking scenario:
>> >        CPU0                    CPU1
>> >        ----                    ----
>> >   lock(s_active#336);
>> >                                lock(&shost->scan_mutex);
>> >                                lock(s_active#336);
>> >   lock(&shost->scan_mutex);
>> > 
>> >  *** DEADLOCK ***
>> > 
>> > The detailed explanation of this is here:
>> > 
>> > http://marc.info/?l=linux-scsi&m=147855187425596
>> > 
>> > The fix is ensuring that the CPU1 thread doesn't get into taking
>> > s_active if CPU0 already has it using the KERNFS_SUICIDED/AL flag 
>> > as an indicator.
>> 
>> So.  The kernfs code does not look safe to have kernfs_remove_self
>> and kernfs_remove_by_name_ns racing against each other I agree.
>> 
>> The kernfs_remove_self path turns KERNFS_SUICIDAL into another 
>> blocking lock by another name, and without lockdep annotations so I 
>> don't know that it is safe.
>
> Yes ... the number of hand rolled locks in that code make it super hard
> to follow.
>
>> The relevant bit from kernfs_remove_self is:
>> 
>> 	if (!(kn->flags & KERNFS_SUICIDAL)) {
>> 		kn->flags |= KERNFS_SUICIDAL;
>> 		__kernfs_remove(kn);
>> 		kn->flags |= KERNFS_SUICIDED;
>> 		ret = true;
>> 	} else {
>> 		wait_queue_head_t *waitq = &kernfs_root(kn)
>> ->deactivate_waitq;
>> 		DEFINE_WAIT(wait);
>> 
>> 		while (true) {
>> 			prepare_to_wait(waitq, &wait,
>> TASK_UNINTERRUPTIBLE);
>> 
>> 			if ((kn->flags & KERNFS_SUICIDED) &&
>> 			    atomic_read(&kn->active) ==
>> KN_DEACTIVATED_BIAS)
>> 				break;
>> 
>> 			mutex_unlock(&kernfs_mutex);
>> 			schedule();
>> 			mutex_lock(&kernfs_mutex);
>> 		}
>> 		finish_wait(waitq, &wait);
>> 		WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb));
>> 		ret = false;
>> 	}
>> 
>> I am pretty certain that if you are going to make kernfs_remove_self 
>> and kernfs_remove_by_name_ns to be safe to race against each other, 
>> not just the KERNFS_SUICIDAL check, but the wait when KERNFS_SUICIDAL 
>> is set needs to be added ot kernfs_remove_by_name_ns.
>
> I don't think you can do that: waiting for SUICIDED would introduce
> another potential lock entanglement.  I'm reasonably happy that the
> deactivation offset coupled with kernfs_drain in the non self remove
> path means that the necessary cleanup is done when the directory itself
> is removed.  That seems to be a common pattern in all non-self
> removes.

But if we don't I am pretty certain there will be asynchrounous behavior
in some cases that could potentially confuse userspace.

Which is partly why I would like to kill kernfs_remove_self.

>> And I suspect if you add the appropriate lockdep annotations to that 
>> mess you will find yourself in a similar but related ABBA deadlock.
>
> I can't prove the negative, but as long as there's no waiting on the
> SUICIDED/AL flags in the non-self remove path, I believe we're safe
> with the current patch.
>
>> Which is why I would like a simpler easier to understand mechanism if
>> we can.
>
> I don't disagree: If you want to clean out the Augean Stables, I can
> lend you the thigh length rubber boots and the gas mask.  However, I
> think that what we're currently proposing: a simple patch to make
> device_remove_file_self() actually work for everyone, along with
> stringent testing is the better approach.
>
> After all, if you look at
>
> commit ac0ece9174aca9aa895ce0accc54f1f8ff12d117
> Author: Tejun Heo <tj@kernel.org>
> Date:   Mon Feb 3 14:03:03 2014 -0500
>
>     scsi: use device_remove_file_self() instead of device_schedule_callback()
>
> You'll see Tejun added all this stuff just to remove the async callback
> we originally had.  Simply restoring the async callback back makes us
> quite considerably worse off because the device_remove_file_self()
> mechanism is in use elsewhere.  We need either to fix it and move on or
> junk it and go back to the original.

I vote we junk it and go back to something that resembles the original.
there are only about 5 other callers so this isn't that bad to do.

Tejun's work clearly added this deadlock 2 and a half years ago, and
it was nasty enough no one noticed until recently.

Using task_work_add(current, ...) as I posted earlier let's us retain
the synchronous property of the current API.

While we debate the details I am happy to look at scsi as a special case
and solve for scsi.  Then when we have the details worked out we can go
fix the other cases.  Given my preliminary patch in my last reply it
looks very straight forward to fix this sanely.

Eric


  reply	other threads:[~2016-11-09  2:13 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-08  0:32 [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock Bart Van Assche
2016-11-08  7:01 ` Greg KH
2016-11-08 15:34   ` James Bottomley
2016-11-08 15:28 ` James Bottomley
2016-11-08 16:52   ` Bart Van Assche
2016-11-08 18:01     ` James Bottomley
2016-11-08 19:13       ` Eric W. Biederman
2016-11-08 23:33         ` Bart Van Assche
2016-11-09  1:22           ` Eric W. Biederman
2016-11-08 23:44         ` James Bottomley
2016-11-09  0:57           ` Eric W. Biederman
2016-11-09  1:43             ` James Bottomley
2016-11-09  2:10               ` Eric W. Biederman [this message]
2016-11-11  1:37                 ` James Bottomley
2016-11-11  4:13                   ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2018-06-26 22:25 Bart Van Assche
2016-10-26 18:44 Bart Van Assche
2016-10-27  9:36 ` Sagi Grimberg
2016-10-27 15:39   ` Bart Van Assche
2016-10-27  9:46 ` Johannes Thumshirn
2016-10-27 15:38   ` Bart Van Assche
2016-10-29  0:12     ` Johannes Thumshirn
2016-10-29  2:08 ` James Bottomley
2016-10-30 19:22   ` Bart Van Assche
2016-10-30 20:25     ` James Bottomley
2016-11-03 22:27   ` Bart Van Assche
2016-11-04 13:47     ` James Bottomley
2016-11-04 18:17       ` Bart Van Assche
2016-11-07 20:51         ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pom5o123.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=bart.vanassche@sandisk.com \
    --cc=greg@kroah.com \
    --cc=hare@suse.de \
    --cc=jejb@linux.vnet.ibm.com \
    --cc=jthumshirn@suse.de \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.