Re: [patch] block: fix race between request completion and timeout handling

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jeff Moyer <jmoyer@redhat.com>
To: emilne@redhat.com
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, Hannes Reinecke <hare@suse.de>
Subject: Re: [patch] block: fix race between request completion and timeout handling
Date: Tue, 10 Sep 2013 14:01:49 -0400	[thread overview]
Message-ID: <x4961u8eesy.fsf@segfault.boston.devel.redhat.com> (raw)
In-Reply-To: <1378834539.3872.6443.camel@localhost.localdomain> (Ewan Milne's message of "Tue, 10 Sep 2013 13:35:39 -0400")

Ewan Milne <emilne@redhat.com> writes:

> If the request completes after blk_mark_rq_complete() is called but
> before blk_clear_req_complete() is called, the completion will not be
> processed, and we will have to wait for the request to timeout again.
> Maybe this is not so bad, as it should be extremely rare, but if the
> timeout were a large value for some reason, that could be a problem.
>
> It seems to me that the issue is really that there are 2 state variables
> (the REQ_ATOM_COMPLETE flag and the req->timeout_list) for the same
> state, and so manipulating both of these without a lock will always have
> a window.

Agreed.  Do you see a clean way of fixing that?

> Clearly it would be better to avoid a panic, so Jeff's fix would help.
>
> I'm not sure I follow how the issue Jeff is fixing caused this
> particular crash, though.  How did the request get back on the queue?
> The crash occurred when the SCSI EH was flushing the done_q requests.

Right, sorry.  I wrote that after having been away from the problem for
too long.  I left out an important part:

This would only actually explain the coredumps *IF* the request
structure was freed, reallocated *and* queued before the error handler
thread had a chance to process it.  That is possible, but it may make
sense to keep digging for another race.  I think that if this is what
was happening, we would see other instances of this problem showing up
as null pointer or garbage pointer dereferences, for example when the
request structure was not re-used.  It looks like we actually do run
into that situation in other reports.

I don't think this is a smoking gun, but I think the patch should go in
so we can further narrow down the search.

Thanks for looking at it, Ewan.

Cheers,
Jeff

     prev parent reply	other threads:[~2013-09-10 18:01 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-29 19:14 [patch] block: fix race between request completion and timeout handling Jeff Moyer
2013-09-10 17:35 ` Ewan Milne
2013-09-10 18:01   ` Jeff Moyer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=x4961u8eesy.fsf@segfault.boston.devel.redhat.com \
    --to=jmoyer@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=emilne@redhat.com \
    --cc=hare@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).