* [PATCH 0/2] fix race conditions in SCSI/Block leading to oops
@ 2011-04-22 15:37 James Bottomley
2011-04-22 15:38 ` [PATCH 1/2] block: add proper state guards to __elv_next_request James Bottomley
2011-04-22 15:39 ` [PATCH 2/2] put stricter guards on queue dead checks James Bottomley
0 siblings, 2 replies; 3+ messages in thread
From: James Bottomley @ 2011-04-22 15:37 UTC (permalink / raw)
To: linux-scsi; +Cc: Jens Axboe
There's a lkml thread on this:
http://marc.info/?t=130253369300004
But the synopsis is that some combination of delaying factors in the
boot of this machine exposed a race in SCSI and block between teardown
and reuse of SCSI devices. This two set series fixes the race.
James
---
James Bottomley (2):
block: add proper state guards to __elv_next_request
put stricter guards on queue dead checks
block/blk.h | 3 ++-
drivers/scsi/scsi_sysfs.c | 16 ++++++++--------
2 files changed, 10 insertions(+), 9 deletions(-)
--
1.7.4.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/2] block: add proper state guards to __elv_next_request
2011-04-22 15:37 [PATCH 0/2] fix race conditions in SCSI/Block leading to oops James Bottomley
@ 2011-04-22 15:38 ` James Bottomley
2011-04-22 15:39 ` [PATCH 2/2] put stricter guards on queue dead checks James Bottomley
1 sibling, 0 replies; 3+ messages in thread
From: James Bottomley @ 2011-04-22 15:38 UTC (permalink / raw)
To: linux-scsi; +Cc: Jens Axboe
blk_cleanup_queue() calls elevator_exit() and after this, we can't
touch the elevator without oopsing. __elv_next_request() must check
for this state because in the refcounted queue model, we can still
call it after blk_cleanup_queue() has been called.
This was reported as causing an oops attributable to scsi.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
block/blk.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/block/blk.h b/block/blk.h
index 6126346..4df474d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -62,7 +62,8 @@ static inline struct request *__elv_next_request(struct request_queue *q)
return rq;
}
- if (!q->elevator->ops->elevator_dispatch_fn(q, 0))
+ if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags) ||
+ !q->elevator->ops->elevator_dispatch_fn(q, 0))
return NULL;
}
}
--
1.7.4.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 2/2] put stricter guards on queue dead checks
2011-04-22 15:37 [PATCH 0/2] fix race conditions in SCSI/Block leading to oops James Bottomley
2011-04-22 15:38 ` [PATCH 1/2] block: add proper state guards to __elv_next_request James Bottomley
@ 2011-04-22 15:39 ` James Bottomley
1 sibling, 0 replies; 3+ messages in thread
From: James Bottomley @ 2011-04-22 15:39 UTC (permalink / raw)
To: linux-scsi; +Cc: Jens Axboe
SCSI uses request_queue->queuedata == NULL as a signal that the queue
is dying. We set this state in the sdev release function. However,
this allows a small window where we release the last reference but
haven't quite got to this stage yet and so something will try to take
a reference in scsi_request_fn and oops. It's very rare, but we had a
report here, so we're pushing this as a bug fix
The actual fix is to set request_queue->queuedata to NULL in
scsi_remove_device() before we drop the reference. This causes
correct automatic rejects from scsi_request_fn as people who hold
additional references try to submit work and prevents anything from
getting a new reference to the sdev that way.
Cc: stable@kernel.org
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
drivers/scsi/scsi_sysfs.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e44ff64..e639125 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -322,14 +322,8 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
kfree(evt);
}
- if (sdev->request_queue) {
- sdev->request_queue->queuedata = NULL;
- /* user context needed to free queue */
- scsi_free_queue(sdev->request_queue);
- /* temporary expedient, try to catch use of queue lock
- * after free of sdev */
- sdev->request_queue = NULL;
- }
+ /* NULL queue means the device can't be used */
+ sdev->request_queue = NULL;
scsi_target_reap(scsi_target(sdev));
@@ -937,6 +931,12 @@ void __scsi_remove_device(struct scsi_device *sdev)
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
+
+ /* cause the request function to reject all I/O requests */
+ sdev->request_queue->queuedata = NULL;
+
+ /* Freeing the queue signals to block that we're done */
+ scsi_free_queue(sdev->request_queue);
put_device(dev);
}
--
1.7.4.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2011-04-22 15:40 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-22 15:37 [PATCH 0/2] fix race conditions in SCSI/Block leading to oops James Bottomley
2011-04-22 15:38 ` [PATCH 1/2] block: add proper state guards to __elv_next_request James Bottomley
2011-04-22 15:39 ` [PATCH 2/2] put stricter guards on queue dead checks James Bottomley
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox