linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>,
	linux-kernel@vger.kernel.org, Jens Axboe <jaxboe@fusionio.com>
Subject: Re: Strange block/scsi/workqueue issue
Date: Tue, 12 Apr 2011 14:15:12 +0900	[thread overview]
Message-ID: <20110412051512.GL9673@mtj.dyndns.org> (raw)
In-Reply-To: <1302583757.2558.21.camel@mulgrave.site>

Hello,

On Mon, Apr 11, 2011 at 11:49:17PM -0500, James Bottomley wrote:
> > But confused here.  Why does it make any difference whether the
> > release operation is in the request_fn context or not?  What makes
> > SCSI refcounting different from others?
> 
> I didn't say it did.  SCSI refcounting is fairly standard.
> 
> The problem isn't really anything to do with SCSI ... it's the way block
> queue destruction must now be called.  The block queue destruction
> includes a synchronous flush of the work queue.  That means it can't be
> called from the executing workqueue without deadlocking.  The last put
> of a SCSI device destroys the queue.  This now means that the last put
> of the SCSI device can't be in the block delay work path.  However, as
> the device shuts down that can very well wind up happening if
> blk_delay_queue() ends up being called as the device is dying.

Yeah, I understood that part.  I thought you were referring to the
problem caused by the proposed workqueue replacement in the patch when
you talked about workqueue related issues in the previous message, and
saying that there's an inherent problem with using workqueue directly.

> The entangled deadlock seems to have been introduced by commit
> 3cca6dc1c81e2407928dc4c6105252146fd3924f prior to that, there was no
> synchronous cancel in the destroy path.
> 
> A fix might be to shunt more stuff off to workqueues, but that's
> producing a more complex system which would be prone to entanglements
> that would be even harder to spot.

I don't agree there.  To me, the cause for entanglement seems to be
request_fn calling all the way through blocking destruction because it
detected that the final put was called with sleepable context.  It's
just weird and difficult to anticipate to directly call into sleepable
destruction path from request_fn whether it had sleepable context or
not.  With the yet-to-be-debugged bug caused by the conversion aside,
I think simply using workqueue is the better solution.

> Perhaps a better solution is just not to use sync cancellations in
> block?  As long as the work in the queue holds a queue ref, they can be
> done asynchronously.

Hmmm... maybe but at least I prefer doing explicit shutdown/draining
on destruction even if the base data structure is refcounted.  Things
become much more predictable that way.

Thanks.

-- 
tejun

  parent reply	other threads:[~2011-04-12  5:15 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-11 14:56 Strange block/scsi/workqueue issue Steven Whitehouse
2011-04-11 17:18 ` Tejun Heo
2011-04-11 17:29   ` Jens Axboe
2011-04-11 17:52   ` Steven Whitehouse
2011-04-12  0:14     ` Tejun Heo
2011-04-12  8:49       ` Steven Whitehouse
2011-04-12  0:47   ` James Bottomley
2011-04-12  2:51     ` Tejun Heo
2011-04-12  4:49       ` James Bottomley
2011-04-12  5:02         ` James Bottomley
2011-04-12  8:42           ` Steven Whitehouse
2011-04-12 13:42             ` James Bottomley
2011-04-12 14:06               ` Steven Whitehouse
2011-04-12 15:14                 ` James Bottomley
2011-04-12 16:04                   ` Steven Whitehouse
2011-04-12 16:27                     ` James Bottomley
2011-04-12 16:51                       ` Steven Whitehouse
2011-04-12 17:41                         ` James Bottomley
2011-04-12 18:33                           ` Steven Whitehouse
2011-04-12 19:56                             ` James Bottomley
2011-04-12 20:30                               ` Steven Whitehouse
2011-04-12 20:43                                 ` James Bottomley
2011-04-13  5:18                                   ` Tejun Heo
2011-04-13  6:06                                     ` Tejun Heo
2011-04-13  9:20                                       ` Steven Whitehouse
2011-04-13 14:00                                         ` Steven Whitehouse
2011-04-13 17:01                                           ` James Bottomley
2011-04-13 19:35                                             ` Steven Whitehouse
2011-04-13 20:12                                             ` Jens Axboe
2011-04-13 20:17                                               ` James Bottomley
2011-04-22 18:01                                                 ` Tejun Heo
2011-04-22 18:06                                                   ` James Bottomley
2011-04-22 18:30                                                     ` Tejun Heo
2011-05-31  6:05                                             ` Anton V. Boyarshinov
2011-04-22 18:03                                           ` Tejun Heo
2011-04-12  5:15         ` Tejun Heo [this message]
2011-04-12 15:15           ` James Bottomley
2011-04-13  5:11             ` Tejun Heo
2011-04-13 14:15               ` James Bottomley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110412051512.GL9673@mtj.dyndns.org \
    --to=tj@kernel.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=jaxboe@fusionio.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=swhiteho@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).