Re: Strange block/scsi/workqueue issue

From: Tejun Heo <tj@kernel.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>,
	linux-kernel@vger.kernel.org, Jens Axboe <jaxboe@fusionio.com>
Subject: Re: Strange block/scsi/workqueue issue
Date: Wed, 13 Apr 2011 14:11:39 +0900
Message-ID: <20110413051139.GC24161@mtj.dyndns.org>
In-Reply-To: <1302621318.2604.19.camel@mulgrave.site>

Hey, James.

On Tue, Apr 12, 2011 at 10:15:18AM -0500, James Bottomley wrote:
> So your idea is that all final puts should go through a workqueue?  Like
> I said, that would work, but it's not just SCSI ... any call path that
> destroys a queue has to be audited.

Yeap.

> The problem is nothing to do with sleeping context ... it's that any
> work called by the block workqueue can't destroy that queue.  In a
> refcounted model, that's a bit nasty.

I can see your point but please read on.

> > Hmmm... maybe but at least I prefer doing explicit shutdown/draining
> > on destruction even if the base data structure is refcounted.  Things
> > become much more predictable that way.
> 
> It is pretty much instantaneous.  Unless we're executing, we cancel the
> work.  If the work is already running, we just let it complete instead
> of waiting for it.
> 
> Synchronous waits are dangerous because they cause entanglement.

There are two different types of danger involved.  One is getting
trapped in a deadlock by recursing and ending up waiting for oneself.
The other is continuing to operate on objects which could be in a
dubious state.  I guess my point is that I prefer the former by a
large margin.

The deadlocks reproduce more reliably.  Lockdep and soft hang check
can detect them easily and a single stack dump will point us right to
where the problem is.  The latter is much trickier.  The problem is
more difficult to trigger and, even when it triggers, the effect often
wouldn't be obvious.  Auditing for correctness is more
difficult too - which fields are safe to access post-mortem?  Is there
any chance that the ongoing operation might reach out to hardware
which is already gone or claimed by another software entity?

In this particular case, IMHO it's reasonable for the block layer to
require that the destruction function not be called directly from the
request queue path, although that requirement definitely could have
used better documentation.
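
To make the trade-off concrete, here is a minimal userspace sketch in
Python, not kernel code.  Everything in it is hypothetical and chosen
for illustration only: the WorkQueue class, its destroy() method, and
the block_wq / release_wq names are assumptions, not the block layer or
workqueue API.  It shows a synchronous destroy that refuses to wait for
its own worker (the loud, easy-to-locate failure), followed by the
final teardown being handed to a separate context.

```python
import queue
import threading


class WorkQueue:
    """Tiny single-worker queue; a stand-in, not the kernel workqueue API."""

    def __init__(self):
        self._items = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            fn = self._items.get()
            if fn is None:  # sentinel: stop the worker
                return
            fn()

    def submit(self, fn):
        self._items.put(fn)

    def destroy(self):
        """Synchronous teardown: stop the worker and wait for it to exit."""
        if threading.current_thread() is self._thread:
            # Waiting here would mean waiting for oneself, so fail loudly.
            raise RuntimeError("destroy() called from the queue's own worker")
        self._items.put(None)
        self._thread.join()


def demo():
    events = []
    block_wq = WorkQueue()    # plays the queue that is being destroyed
    release_wq = WorkQueue()  # separate context that performs the teardown
    released = threading.Event()

    def release():
        block_wq.destroy()    # safe: runs on release_wq's worker
        events.append("destroyed from release context")
        released.set()

    def work():
        try:
            block_wq.destroy()  # direct call from the queue's own work item
        except RuntimeError:
            events.append("self-wait detected")
        release_wq.submit(release)  # defer the teardown instead

    block_wq.submit(work)
    released.wait(timeout=5)
    release_wq.destroy()
    return events


if __name__ == "__main__":
    print(demo())
```

The explicit check in destroy() is only a crude stand-in for what
lockdep would report; the point is that the failure shows up at the
call site rather than as a later access to a half-destroyed object.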

Thank you.

-- 
tejun
