From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mike Anderson
Subject: block_abort_queue (blk_abort_request) racing with scsi_request_fn
Date: Tue, 11 May 2010 22:23:37 -0700
Message-ID: <20100512052336.GB15240@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
Sender: linux-scsi-owner@vger.kernel.org
To: Jens Axboe , James Bottomley
Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org
List-Id: dm-devel.ids

I was looking at a dump from a weekend run, and I believe I am seeing a
case where blk_abort_request, called through blk_abort_queue, picked up a
request for timeout that scsi_request_fn decided not to start. This test
was run under error injection.

I assume the case being hit in scsi_request_fn is one where a request has
been put on the timeout_list by blk_start_request, and then one of the
not_ready checks is hit and the request is not started. I believe the drop

It appears that my approach of walking the timeout_list in blk_abort_queue
and using blk_mark_rq_complete in blk_abort_request will not work in this
case.

While it would be good to have a way to ensure a command is started, it is
unclear whether, even at a low timeout of 1 second, a user other than
blk_abort_queue would hit this race.

The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
and scsi_dispatch_cmd makes it unclear to me whether usage of
blk_mark_rq_complete will cover all cases.

I looked at checking serial_number in scsi_times_out along with a couple
of blk_mark_rq_complete additions, but it is unclear whether this would be
good and / or work in all cases.

I looked at just accelerating the deadline by some default value, but it
is unclear whether that would be acceptable.

I also looked at using just the mark interface I previously posted and not
calling blk_abort_request at all, but that would change current behavior
that has been in use for a while.

Looking for suggestions.

Thanks,

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com
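
To make the suspected interleaving concrete, here is a minimal user-space sketch of the race described above. It is a toy model, not kernel code: every toy_* name is hypothetical, the locks are represented only by comments, and the behavior assumed for the start, requeue, and abort steps is a simplification of what the message describes. It shows only that an abort landing between the start step and the not_ready check runs timeout handling on a request that was never dispatched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. All toy_* names are hypothetical, not kernel APIs. */
struct toy_request {
	bool on_timeout_list;  /* armed by the toy "start request" step */
	bool dispatched;       /* true only once the command was issued */
	bool marked_complete;  /* stands in for the "complete" mark */
	bool timed_out;        /* timeout handling ran on this request */
};

/* Assumed model of blk_start_request: put request on the timeout list. */
static void toy_start_request(struct toy_request *rq)
{
	rq->on_timeout_list = true;
}

/* Assumed model of the not_ready path: requeue and disarm the timer. */
static void toy_requeue_request(struct toy_request *rq)
{
	rq->on_timeout_list = false;
	rq->marked_complete = false;
}

/* Assumed model of blk_abort_queue walking a one-entry timeout list. */
static void toy_abort_queue(struct toy_request *rq)
{
	if (!rq->on_timeout_list)
		return;                 /* nothing armed, nothing to abort */
	if (rq->marked_complete)
		return;                 /* someone else already owns it */
	rq->marked_complete = true;
	rq->on_timeout_list = false;
	rq->timed_out = true;           /* timeout handling runs */
}

/*
 * Walk one request through the dispatch path with the host not ready
 * (as under error injection). If abort_in_window is true, the abort
 * runs after the start step and before the not_ready check; otherwise
 * it runs after the requeue.
 * Returns true if timeout handling ran on a never-dispatched request.
 */
bool toy_race(bool abort_in_window)
{
	struct toy_request rq = { false, false, false, false };
	bool host_ready = false;

	toy_start_request(&rq);         /* queue_lock held */
	/* queue_lock dropped here; host_lock not yet taken */
	if (abort_in_window)
		toy_abort_queue(&rq);

	/* host_lock taken; readiness checks run */
	if (host_ready)
		rq.dispatched = true;
	else
		toy_requeue_request(&rq);   /* not_ready path */

	if (!abort_in_window)
		toy_abort_queue(&rq);

	assert(!rq.on_timeout_list);    /* either path disarms the timer */
	return rq.timed_out && !rq.dispatched;
}
```

In this model, toy_race(true) reports the problem case (timeout handling on a request that was then requeued rather than started), while toy_race(false) does not, because the requeue has already taken the request off the timeout list by the time the abort walks it.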