From: Mike Anderson <andmike@linux.vnet.ibm.com>
To: Jens Axboe <jens.axboe@oracle.com>,
	James Bottomley <James.Bottomley@suse.de>
Cc: linux-scsi@vger.kernel.org
Subject: scsi_eh startup on scsi_dispatch_cmd busy calls to scsi_queue_insert
Date: Fri, 28 May 2010 09:20:56 -0700
Message-ID: <20100528162055.GA32022@linux.vnet.ibm.com>

This email is on a similar topic to a previous email that I posted on the
subject of blk_abort_request calls through blk_abort_queue racing with requests
that had a timer started on them, but were later requeued due to condition
checks in scsi_request_fn / scsi_dispatch_cmd instead of completing through the
softirq path.
http://markmail.org/message/23vfel74dbtjzzho


While I have seen error cases using standard mainline kernels, I have attempted
to accelerate the error cases using a patched kernel. I added a patch for a
few sysfs attributes for controlling abort calls, target busy, and queuecommand
busy. During testing with IO load I could generate two error signatures.

1.) The timeout handler (scsi_eh) does not start up because the failed count
is greater than the busy count.

2.) A BUG_ON case, "kernel BUG at block/blk-core.c:956!", which is
"BUG_ON(blk_queued_rq(rq));".

These error cases occur if a request that is marked started is added to the
scsi_eh list, but a later determination decides not to completely start the
request. Not completely starting the request can occur through the path from
scsi_request_fn to the check of the return value of queuecommand in
scsi_dispatch_cmd.
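
The first signature can be illustrated with a small userspace model. This is
only a sketch and not kernel code: struct host_model and all function names
are hypothetical, and it assumes that the error handler starts only when the
failed count equals the busy count, and that the requeue path drops the busy
count after the timeout path has already raised the failed count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the host counters; not kernel code. */
struct host_model {
    int busy;   /* commands the host currently counts as in flight */
    int failed; /* commands handed over to the error handler */
};

/* Assumed start condition: every in-flight command has failed. */
static bool eh_can_start(const struct host_model *h)
{
    return h->failed > 0 && h->failed == h->busy;
}

/* Dispatch path: the command is counted as in flight. */
static void dispatch(struct host_model *h)
{
    h->busy++;
}

/* Timeout path: the command is added to the error handler list. */
static void timeout_to_eh(struct host_model *h)
{
    h->failed++;
}

/* Requeue path after a busy return: the command is no longer in flight. */
static void requeue_after_busy(struct host_model *h)
{
    assert(h->busy > 0);
    h->busy--;
}

/* Normal case: the command times out and stays owned by the error handler. */
static bool normal_sequence_starts_eh(void)
{
    struct host_model h = { 0, 0 };

    dispatch(&h);
    timeout_to_eh(&h);
    return eh_can_start(&h);
}

/* Racy case: the command times out and is then requeued as well. */
static bool racy_sequence_starts_eh(void)
{
    struct host_model h = { 0, 0 };

    dispatch(&h);
    timeout_to_eh(&h);
    requeue_after_busy(&h); /* now failed (1) is greater than busy (0) */
    return eh_can_start(&h);
}
```

In this model normal_sequence_starts_eh() returns true, while
racy_sequence_starts_eh() returns false because failed ends up greater than
busy and the two counts never become equal.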

James, in a response to a ping you indicated that if I was really seeing an
error in this area, I may need a check for complete in the non-softirq
requeue cases. I ran testing with a simple change that was not much more than a
wrapper around blk_mark_rq_complete with a return value. This appeared to
address the issue, but in one test case I created it still failed.
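
The idea of a completion check with a return value can be sketched in
userspace with a C11 atomic flag. This is not the change that was tested and
not the kernel implementation: struct request_model, claim_request() and the
owner names are hypothetical, and the model assumes that the timeout path and
the requeue path both try to claim the request before acting on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a request; not the kernel struct. */
struct request_model {
    atomic_flag complete; /* set by whichever path claims the request first */
};

enum owner { OWNER_NONE, OWNER_TIMEOUT, OWNER_REQUEUE };

/* Returns true only for the first caller; later callers must back off. */
static bool claim_request(struct request_model *rq)
{
    return !atomic_flag_test_and_set(&rq->complete);
}

/* The timeout path claims first, so the requeue path has to back off. */
static enum owner timeout_then_requeue(void)
{
    struct request_model rq = { .complete = ATOMIC_FLAG_INIT };
    enum owner owner = OWNER_NONE;

    if (claim_request(&rq))
        owner = OWNER_TIMEOUT;
    if (claim_request(&rq))
        owner = OWNER_REQUEUE; /* not reached: already claimed */
    return owner;
}

/* The requeue path claims first, so the timeout path has to back off. */
static enum owner requeue_then_timeout(void)
{
    struct request_model rq = { .complete = ATOMIC_FLAG_INIT };
    enum owner owner = OWNER_NONE;

    if (claim_request(&rq))
        owner = OWNER_REQUEUE;
    if (claim_request(&rq))
        owner = OWNER_TIMEOUT; /* not reached: already claimed */
    return owner;
}
```

Whichever path calls claim_request() first becomes the single owner in this
model. The sketch only covers the claim itself and says nothing about what
happens between the claim and the later lock acquisitions.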

I then used a modified scsi_debug module whose queuecommand delays for 100ms
longer than the timeout value before returning a busy response. Prior to
delaying in queuecommand I dropped the host_lock, which a few queuecommand
functions do. At a timeout value of 1 second I was able to generate the BUG_ON
case.

While this test case is on the edge it does point out that the lock dance of
queue_lock / host_lock from scsi_request_fn through the checking of the return
value of queuecommand would appear to leave a window open in the determination
of request ownership.

I also tried a patched test run attempting to use the cmd serial_number to hold
off scsi_eh startup on a command, but the possible drop of the host_lock in
queuecommand functions affects this alternate solution as well.

In older kernels we used to have serialization with the timeout handler in
scsi_dispatch_cmd through the use of "if (scsi_delete_timer(cmd))", which we
no longer have with the newer blk timeout. Since I did not run similar
testing on older kernels, it is unclear whether a window existed there.

Question:

1.) Does the edge case using the modified scsi_debug appear to be a valid
case? If so, do you see a method to close this window, or with the current
structure is there a timeout floor below which this window will always exist?


Thanks,

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com
