* Deadlock during DV when queue is full
@ 2007-05-28 23:42 Andrew Patterson
From: Andrew Patterson @ 2007-05-28 23:42 UTC (permalink / raw)
To: linux-scsi; +Cc: jens.axboe, Moore, Eric
I am running into a deadlock during domain validation (DV) when the
request queue is full. I am using the MPT Fusion spi driver and have hit
this problem with 2.6.16 and the latest scsi_misc kernels. The system is
running a load test on a U320 parallel SCSI bus with a drive that will
occasionally hang the bus until a host reset clears the condition; this
particular drive is known to not handle QAS very well. After the host
reset, the MPT Fusion driver attempts domain validation on all drives on
the bus. During DV, one or more of the queues lock up while trying to
execute various SCSI commands (INQUIRY, WRITE_BUFFER, etc.) via
scsi_execute(). A stack trace shows:
[ 2318.524898] events/1 D a0000001007258f0 0 16 2 (L-TLB)
[ 2318.532030]
[ 2318.532031] Call Trace:
[ 2318.532202] [<a000000100724750>] schedule+0x1550/0x1840
[ 2318.532204] sp=e00000010a8dfc60 bsp=e00000010a8d8ff0
[ 2318.546975] [<a0000001007258f0>] io_schedule+0x50/0x80
[ 2318.546977] sp=e00000010a8dfcf0 bsp=e00000010a8d8fd0
[ 2318.554417] [<a0000001003b8820>] get_request_wait+0x200/0x2c0
[ 2318.554419] sp=e00000010a8dfcf0 bsp=e00000010a8d8f78
[ 2318.562540] [<a0000001003b8990>] blk_get_request+0xb0/0x120
[ 2318.562542] sp=e00000010a8dfd40 bsp=e00000010a8d8f40
[ 2318.579166] [<a00000010058b5e0>] scsi_execute+0x40/0x1e0
[ 2318.579168] sp=e00000010a8dfd40 bsp=e00000010a8d8ee8
[ 2318.586863] [<a0000001005980f0>] spi_execute+0x70/0x120
[ 2318.586865] sp=e00000010a8dfd40 bsp=e00000010a8d8e88
[ 2318.594204] [<a000000100599650>] spi_dv_device_echo_buffer+0x2f0/0x520
[ 2318.594206] sp=e00000010a8dfdc0 bsp=e00000010a8d8e30
[ 2318.607333] [<a000000100597a30>] spi_dv_retrain+0x70/0x520
[ 2318.607335] sp=e00000010a8dfde0 bsp=e00000010a8d8dc0
[ 2318.616119] [<a000000100599170>] spi_dv_device+0xdf0/0xf00
[ 2318.616121] sp=e00000010a8dfde0 bsp=e00000010a8d8d40
[ 2318.630538] [<a00000020db7e360>] mptspi_dv_device+0x160/0x2c0 [mptspi]
[ 2318.630540] sp=e00000010a8dfdf0 bsp=e00000010a8d8ce0
[ 2318.638341] [<a00000020db7e660>] mptspi_dv_renegotiate_work+0x1a0/0x220 [mptspi]
[ 2318.638343] sp=e00000010a8dfdf0 bsp=e00000010a8d8cb0
[ 2318.652773] [<a0000001000b80c0>] run_workqueue+0x1c0/0x320
[ 2318.652775] sp=e00000010a8dfe00 bsp=e00000010a8d8c80
[ 2318.660003] [<a0000001000b8460>] worker_thread+0x240/0x280
[ 2318.660005] sp=e00000010a8dfe00 bsp=e00000010a8d8c50
[ 2318.667536] [<a0000001000c24e0>] kthread+0xa0/0x120
[ 2318.667538] sp=e00000010a8dfe30 bsp=e00000010a8d8c20
[ 2318.681699] [<a0000001000129f0>] kernel_thread_helper+0xd0/0x100
[ 2318.681701] sp=e00000010a8dfe30 bsp=e00000010a8d8bf0
[ 2318.689121] [<a0000001000094c0>] start_kernel_thread+0x20/0x40
[ 2318.689124] sp=e00000010a8dfe30 bsp=e00000010a8d8bf0
Some code examination and tracing show that get_request_wait() calls
get_request() to obtain a request. If get_request() returns NULL, it
will wait and try again. Here is the code from get_request_wait():
rq = get_request(q, rw_flags, bio, GFP_NOIO);
while (!rq) {
        DEFINE_WAIT(wait);
        struct request_list *rl = &q->rq;

        prepare_to_wait_exclusive(&rl->wait[rw], &wait,
                                  TASK_UNINTERRUPTIBLE);

        rq = get_request(q, rw_flags, bio, GFP_NOIO);

        if (!rq) {
                struct io_context *ioc;

                blk_add_trace_generic(q, bio, rw, BLK_TA_SLEEPRQ);

                __generic_unplug_device(q);
                spin_unlock_irq(q->queue_lock);
                io_schedule();

                /*
                 * After sleeping, we become a "batching" process and
                 * will be able to allocate at least one request, and
                 * up to a big batch of them for a small period time.
                 * See ioc_batching, ioc_set_batching
                 */
                ioc = current_io_context(GFP_NOIO, q->node);
                ioc_set_batching(q, ioc);

                spin_lock_irq(q->queue_lock);
        }
        finish_wait(&rl->wait[rw], &wait);
}
Note the io_schedule() here. As far as I can tell, there is no wakeup
for this wait queue; the only wakeups occur when a request is freed. No
requests can complete (and so none are freed) because error handling is
holding off request processing until the error condition is cleared, so
we get a deadlock.
Looking through get_request() we see:
if (rl->count[rw]+1 >= queue_congestion_on_threshold(q)) {
        if (rl->count[rw]+1 >= q->nr_requests) {
                ioc = current_io_context(GFP_ATOMIC, q->node);
                /*
                 * The queue will fill after this allocation, so set
                 * it as full, and mark this process as "batching".
                 * This process will be allowed to complete a batch of
                 * requests, others will be blocked.
                 */
                if (!blk_queue_full(q, rw)) {
                        ioc_set_batching(q, ioc);
                        blk_set_queue_full(q, rw);
                } else {
                        if (may_queue != ELV_MQUEUE_MUST
                                        && !ioc_batching(q, ioc)) {
                                /*
                                 * The queue is full and the allocating
                                 * process is not a "batcher", and not
                                 * exempted by the IO scheduler
                                 */
                                goto out;
                        }
                }
        }
        blk_set_queue_congested(q, rw);
}
On this heavily loaded system we take the "goto out" path because count
has reached nr_requests, and that path returns NULL. We would not take
it if ioc_batching were already set for this process, but that is only
done after the io_schedule() in get_request_wait().
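
To make the ordering concrete, here is an untested sketch (against the
2.6.16-era get_request_wait(), using only the helpers quoted above) of
what marking the process as a batcher before the retry would look like.
This is purely illustrative, not a proposed fix:

rq = get_request(q, rw_flags, bio, GFP_NOIO);
while (!rq) {
        DEFINE_WAIT(wait);
        struct io_context *ioc;
        struct request_list *rl = &q->rq;

        prepare_to_wait_exclusive(&rl->wait[rw], &wait,
                                  TASK_UNINTERRUPTIBLE);

        /*
         * Sketch: become a "batching" process *before* retrying, so a
         * full queue cannot force us to sleep on a wakeup that may
         * never come.  GFP_ATOMIC because queue_lock is still held,
         * matching what get_request() itself does.
         */
        ioc = current_io_context(GFP_ATOMIC, q->node);
        if (ioc)
                ioc_set_batching(q, ioc);

        rq = get_request(q, rw_flags, bio, GFP_NOIO);
        if (!rq) {
                __generic_unplug_device(q);
                spin_unlock_irq(q->queue_lock);
                io_schedule();
                spin_lock_irq(q->queue_lock);
        }
        finish_wait(&rl->wait[rw], &wait);
}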
--
Andrew Patterson
* Re: Deadlock during DV when queue is full
From: Jens Axboe @ 2007-05-30 18:01 UTC (permalink / raw)
To: Andrew Patterson; +Cc: linux-scsi, jens.axboe, Moore, Eric

On Mon, May 28 2007, Andrew Patterson wrote:
> I am running into a deadlock during domain validation (DV) when the
> request queue is full. [...] During DV, one or more of the queues lock
> up while trying to execute various SCSI commands (INQUIRY,
> WRITE_BUFFER, etc.) via scsi_execute(). A stack trace shows:

Ugh, that's nasty. If that is a valid scenario (and it looks like it
is), then we have to reserve a request (and SCSI command) for such uses,
as the below scenario is definitely livelock country.

> Note the io_schedule() here. As far as I can tell, there is no wakeup
> for this wait queue; the only wakeups occur when a request is freed.
> [...]
> On this heavily loaded system we take the "goto out" path because
> count has reached nr_requests, and that path returns NULL. We would
> not take it if ioc_batching were already set for this process, but
> that is only done after the io_schedule() in get_request_wait().

It doesn't matter, memory allocation could still block due to reclaim,
which won't happen because no more IO is getting through. Or if you went
atomic it could also fail.

There's no other solution than maintaining a cached request + command
for this. libata has a similar issue wrt error handling with NCQ; we may
need a command in error handling to retrieve the log page.

--
Jens Axboe
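
For concreteness, a reserved request + command pair of the kind
described above might look roughly like the sketch below. This is
illustrative only: the structure and helper names (struct scsi_reserve,
scsi_get_reserved_req()) are hypothetical and do not exist in the
kernel; blk_get_request() is the only real interface used here.

#include <linux/blkdev.h>
#include <linux/bitops.h>
#include <scsi/scsi_cmnd.h>

struct scsi_reserve {
        struct request   *rq;           /* pre-allocated at probe time */
        struct scsi_cmnd *cmd;          /* pre-allocated at probe time */
        unsigned long     in_use;       /* bit 0 set while handed out */
};

static struct request *scsi_get_reserved_req(struct scsi_reserve *resv,
                                             struct request_queue *q,
                                             int rw)
{
        struct request *rq;

        /* Try the normal pool first, without sleeping. */
        rq = blk_get_request(q, rw, GFP_ATOMIC);
        if (rq)
                return rq;

        /*
         * Pool exhausted (e.g. DV issuing commands while error handling
         * holds off the queue): hand out the pre-allocated request
         * instead of sleeping on a wakeup that may never come.
         */
        if (!test_and_set_bit(0, &resv->in_use))
                return resv->rq;

        return NULL;            /* reserved pair already in flight */
}

Freeing would mirror this: blk_put_request() for requests that came from
the normal pool, clear_bit() on in_use for the reserved one.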
* Re: Deadlock during DV when queue is full
From: James Bottomley @ 2007-05-30 18:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Patterson, linux-scsi, jens.axboe, Moore, Eric

On Wed, 2007-05-30 at 20:01 +0200, Jens Axboe wrote:
> On Mon, May 28 2007, Andrew Patterson wrote:
> > During DV, one or more of the queues lock up while trying to execute
> > various SCSI commands (INQUIRY, WRITE_BUFFER, etc.) via
> > scsi_execute().
>
> There's no other solution than maintaining a cached request + command
> for this. libata has a similar issue wrt error handling with NCQ; we
> may need a command in error handling to retrieve the log page.

Actually, there is another solution: DV is careful only to be using a
single command for its processes ... if we could use the eh command for
this, then I think the problem would go away ... unfortunately, that's a
bit more complex to achieve than it sounds.

James
* Re: Deadlock during DV when queue is full
From: Jens Axboe @ 2007-05-30 18:55 UTC (permalink / raw)
To: James Bottomley; +Cc: Andrew Patterson, linux-scsi, jens.axboe, Moore, Eric

On Wed, May 30 2007, James Bottomley wrote:
> Actually, there is another solution: DV is careful only to be using a
> single command for its processes ... if we could use the eh command
> for this, then I think the problem would go away ... unfortunately,
> that's a bit more complex to achieve than it sounds.

That would be fine, the key is just to have such a reserved command. Is
there also a reserved request?

--
Jens Axboe
* Re: Deadlock during DV when queue is full
From: James Bottomley @ 2007-05-30 19:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Patterson, linux-scsi, jens.axboe, Moore, Eric

On Wed, 2007-05-30 at 20:55 +0200, Jens Axboe wrote:
> That would be fine, the key is just to have such a reserved command.
> Is there also a reserved request?

Yes ... we clean out the failing command in error recovery and reuse it,
so we know it has both a command and a request.

James
* Re: Deadlock during DV when queue is full
From: Jens Axboe @ 2007-05-30 19:03 UTC (permalink / raw)
To: James Bottomley; +Cc: Andrew Patterson, linux-scsi, jens.axboe, Moore, Eric

On Wed, May 30 2007, James Bottomley wrote:
> > > Actually, there is another solution: DV is careful only to be
> > > using a single command for its processes ... if we could use the
> > > eh command for this, then I think the problem would go away ...
> > > unfortunately, that's a bit more complex to achieve than it
> > > sounds.

(btw this is not another solution, it's indeed the solution of keeping a
reserved request :-)

> > That would be fine, the key is just to have such a reserved command.
> > Is there also a reserved request?
>
> Yes ... we clean out the failing command in error recovery and reuse
> it, so we know it has both a command and a request.

Sounds a bit hackish, unless the failed command never needs to be
retried.

--
Jens Axboe
* Re: Deadlock during DV when queue is full
From: James Bottomley @ 2007-05-30 19:07 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Patterson, linux-scsi, jens.axboe, Moore, Eric

On Wed, 2007-05-30 at 21:03 +0200, Jens Axboe wrote:
> > Yes ... we clean out the failing command in error recovery and reuse
> > it, so we know it has both a command and a request.
>
> Sounds a bit hackish, unless the failed command never needs to be
> retried.

Oh, it does. We clean it out, save the necessary state on the stack and
reuse it, then restore the data and send it for a retry. Up until a
while ago this is what all the old_ fields were for in the SCSI command;
now, after Christoph fixed it, we save them on the stack instead.

James
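
As a rough illustration of the save/reuse/restore pattern described
above (loosely modelled on what scsi_send_eh_cmnd() does; the fields
shown are only a subset of the state the real code saves, and the helper
name is made up for this sketch):

#include <linux/string.h>
#include <linux/dma-mapping.h>
#include <scsi/scsi_cmnd.h>

/*
 * Sketch only: save part of a failed command's state on the stack,
 * reuse the command (and its attached request) to send a different CDB,
 * then restore it so the original command can still be retried.
 * Caller guarantees cdb_len <= MAX_COMMAND_SIZE.
 */
static void eh_borrow_and_restore(struct scsi_cmnd *scmd,
                                  const unsigned char *new_cdb,
                                  int cdb_len)
{
        unsigned char old_cmnd[MAX_COMMAND_SIZE];
        enum dma_data_direction old_dir = scmd->sc_data_direction;

        memcpy(old_cmnd, scmd->cmnd, sizeof(old_cmnd));

        /* reuse the command structure for the borrowed CDB */
        memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
        memcpy(scmd->cmnd, new_cdb, cdb_len);
        scmd->sc_data_direction = DMA_NONE;

        /* ... issue the borrowed command and wait for completion ... */

        /* restore the original command so it can be retried */
        memcpy(scmd->cmnd, old_cmnd, sizeof(old_cmnd));
        scmd->sc_data_direction = old_dir;
}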
* Re: Deadlock during DV when queue is full
From: Jens Axboe @ 2007-05-30 19:11 UTC (permalink / raw)
To: James Bottomley; +Cc: Andrew Patterson, linux-scsi, jens.axboe, Moore, Eric

On Wed, May 30 2007, James Bottomley wrote:
> Oh, it does. We clean it out, save the necessary state on the stack
> and reuse it, then restore the data and send it for a retry. Up until
> a while ago this is what all the old_ fields were for in the SCSI
> command; now, after Christoph fixed it, we save them on the stack
> instead.

I guess the scsi command doesn't need a whole lot of saved state, but
the request does.

--
Jens Axboe
* Re: Deadlock during DV when queue is full
  2007-05-30 19:11 ` Jens Axboe
@ 2007-05-30 19:22 ` James Bottomley
  2007-05-31  4:19 ` Andrew Patterson
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2007-05-30 19:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Patterson, linux-scsi, Moore, Eric

On Wed, 2007-05-30 at 21:11 +0200, Jens Axboe wrote:
> > > > > > > There's no other solution than maintaining a cached request + command
> > > > > > > for this. libata has a similar issue wrt error handling with ncq, we may
> > > > > > > need a command in error handling to retrieve the log page.
> > > > > >
> > > > > > Actually, there is another solution: DV is careful only to be using a
> > > > > > single command for its processes ... if we could use the eh command for
> > > > > > this, then I think the problem would go away ... unfortunately, that's a
> > > > > > bit more complex to achieve than it sounds.
> > >
> > > (btw this is not another solution, it's indeed the solution of keeping a
> > > reserved request :-)
> > >
> > > > > That would be fine, the key is just to have such a reserved command. Is
> > > > > there also a reserved request?
> > > >
> > > > Yes ... we clean out the failing command in error recovery and reuse it,
> > > > so we know it has both a command and a request.
> > >
> > > Sounds a bit hackish, unless the failed command never needs to be
> > > retried.
> >
> > Oh, it does.  We clean it out, save the necessary on the stack and reuse
> > it, then restore the data and send it for a retry.  Up until a while ago
> > it was what all the old_ fields were for in the SCSI command; now, after
> > Christoph fixed it, we save them on the stack instead.
>
> I guess the scsi command doesn't need a whole lot of saved state, but
> the request does.

Well, we're careful never to let the block layer see it again.  We have
a special injection point scsi_send_eh_cmnd() for this.  On the other
hand ... supposing we were to push it back on the block queue for a
resend, what would we need to save?  It's fully prepared and set up as
special because of the attached command, so I'm not sure there would be
anything, as long as we don't let it go into the normal block completion
paths ... I'm just wondering if we can get rid of the special injection
path.

James

^ permalink raw reply	[flat|nested] 11+ messages in thread
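The reuse pattern being discussed here — clean out the failing command, save
the necessary state on the stack, borrow the command (and the request attached
to it) for the internal operation, then restore it and send it for a retry —
can be sketched in a few lines.  The struct and helpers below are made up for
illustration; they are not the real scsi_cmnd nor the scsi_send_eh_cmnd()
internals.

    /* Sketch only: save-on-stack, reuse, restore, resend. */
    #include <stdio.h>
    #include <string.h>

    struct fake_cmd {
            unsigned char cdb[16];    /* command descriptor block */
            int           cdb_len;
            unsigned int  buflen;
    };

    static void send_cmd(const struct fake_cmd *cmd, const char *what)
    {
            printf("sending %s: opcode 0x%02x, %u byte buffer\n",
                   what, cmd->cdb[0], cmd->buflen);
    }

    /*
     * Borrow an already-prepared command for an internal operation, then
     * put it back the way it was and retry the original.
     */
    static void borrow_for_internal(struct fake_cmd *cmd)
    {
            struct fake_cmd saved = *cmd;        /* save state on the stack */

            memset(cmd->cdb, 0, sizeof(cmd->cdb));
            cmd->cdb[0] = 0x12;                  /* e.g. an INQUIRY */
            cmd->cdb_len = 6;
            cmd->buflen = 96;
            send_cmd(cmd, "internal command");

            *cmd = saved;                        /* restore the original... */
            send_cmd(cmd, "retried original");   /* ...and send it for a retry */
    }

    int main(void)
    {
            struct fake_cmd failed = { .cdb = { 0x2a }, .cdb_len = 10,
                                       .buflen = 4096 };

            borrow_for_internal(&failed);
            return 0;
    }

The open question in the message above is whether the request needs any such
saving at all if it never re-enters the normal block completion paths.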
* Re: Deadlock during DV when queue is full
  2007-05-30 19:22 ` James Bottomley
@ 2007-05-31  4:19 ` Andrew Patterson
  2007-05-31  5:31 ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Patterson @ 2007-05-31 4:19 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, linux-scsi, Moore, Eric

On Wed, 2007-05-30 at 14:22 -0500, James Bottomley wrote:
> On Wed, 2007-05-30 at 21:11 +0200, Jens Axboe wrote:
> > > > > > > > There's no other solution than maintaining a cached request + command
> > > > > > > > for this. libata has a similar issue wrt error handling with ncq, we may
> > > > > > > > need a command in error handling to retrieve the log page.
> > > > > > >
> > > > > > > Actually, there is another solution: DV is careful only to be using a
> > > > > > > single command for its processes ... if we could use the eh command for
> > > > > > > this, then I think the problem would go away ... unfortunately, that's a
> > > > > > > bit more complex to achieve than it sounds.
> > > >
> > > > (btw this is not another solution, it's indeed the solution of keeping a
> > > > reserved request :-)
> > > >
> > > > > > That would be fine, the key is just to have such a reserved command. Is
> > > > > > there also a reserved request?
> > > > >
> > > > > Yes ... we clean out the failing command in error recovery and reuse it,
> > > > > so we know it has both a command and a request.
> > > >
> > > > Sounds a bit hackish, unless the failed command never needs to be
> > > > retried.
> > >
> > > Oh, it does.  We clean it out, save the necessary on the stack and reuse
> > > it, then restore the data and send it for a retry.  Up until a while ago
> > > it was what all the old_ fields were for in the SCSI command; now, after
> > > Christoph fixed it, we save them on the stack instead.
> >
> > I guess the scsi command doesn't need a whole lot of saved state, but
> > the request does.
>
> Well, we're careful never to let the block layer see it again.  We have
> a special injection point scsi_send_eh_cmnd() for this.  On the other
> hand ... supposing we were to push it back on the block queue for a
> resend, what would we need to save?  It's fully prepared and set up as
> special because of the attached command, so I'm not sure there would be
> anything, as long as we don't let it go into the normal block completion
> paths ... I'm just wondering if we can get rid of the special injection
> path.
>
> James

Rather than reusing a request, can we just somehow signal the block
layer to let this command exceed the current nr_request limit and go on
through?  This would be similar to the ioc_batching mechanism currently
in use.

Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread
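Andrew's suggestion would amount to letting a specially flagged allocation push
past nr_requests, much as ioc_batching exempts a "batching" process.  A rough
user-space model follows; the names are invented and this is not the real
get_request() interface.

    /* Sketch only: a flagged caller may exceed the nr_requests limit. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct fake_queue {
            int nr_requests;    /* normal allocation limit       */
            int count;          /* requests currently allocated  */
    };

    struct fake_request { bool eh_exempt; };

    static struct fake_request *get_request(struct fake_queue *q, bool eh_exempt)
    {
            struct fake_request *rq;

            if (q->count + 1 >= q->nr_requests && !eh_exempt)
                    return NULL;        /* queue full: normal callers must wait */

            /* The flagged caller may exceed the limit, but the allocation
             * itself can still fail under memory pressure, which is the
             * objection raised in the reply that follows. */
            rq = calloc(1, sizeof(*rq));
            if (rq) {
                    rq->eh_exempt = eh_exempt;
                    q->count++;
            }
            return rq;
    }

    int main(void)
    {
            struct fake_queue q = { .nr_requests = 4, .count = 4 };

            printf("normal caller: %p\n", (void *)get_request(&q, false));
            printf("eh/DV caller : %p\n", (void *)get_request(&q, true));
            return 0;
    }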
* Re: Deadlock during DV when queue is full
  2007-05-31  4:19 ` Andrew Patterson
@ 2007-05-31  5:31 ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2007-05-31 5:31 UTC (permalink / raw)
To: Andrew Patterson; +Cc: James Bottomley, linux-scsi, Moore, Eric

On Wed, May 30 2007, Andrew Patterson wrote:
> On Wed, 2007-05-30 at 14:22 -0500, James Bottomley wrote:
> > On Wed, 2007-05-30 at 21:11 +0200, Jens Axboe wrote:
> > > > > > > > > There's no other solution than maintaining a cached request + command
> > > > > > > > > for this. libata has a similar issue wrt error handling with ncq, we may
> > > > > > > > > need a command in error handling to retrieve the log page.
> > > > > > > >
> > > > > > > > Actually, there is another solution: DV is careful only to be using a
> > > > > > > > single command for its processes ... if we could use the eh command for
> > > > > > > > this, then I think the problem would go away ... unfortunately, that's a
> > > > > > > > bit more complex to achieve than it sounds.
> > > > >
> > > > > (btw this is not another solution, it's indeed the solution of keeping a
> > > > > reserved request :-)
> > > > >
> > > > > > > That would be fine, the key is just to have such a reserved command. Is
> > > > > > > there also a reserved request?
> > > > > >
> > > > > > Yes ... we clean out the failing command in error recovery and reuse it,
> > > > > > so we know it has both a command and a request.
> > > > >
> > > > > Sounds a bit hackish, unless the failed command never needs to be
> > > > > retried.
> > > >
> > > > Oh, it does.  We clean it out, save the necessary on the stack and reuse
> > > > it, then restore the data and send it for a retry.  Up until a while ago
> > > > it was what all the old_ fields were for in the SCSI command; now, after
> > > > Christoph fixed it, we save them on the stack instead.
> > >
> > > I guess the scsi command doesn't need a whole lot of saved state, but
> > > the request does.
> >
> > Well, we're careful never to let the block layer see it again.  We have
> > a special injection point scsi_send_eh_cmnd() for this.  On the other
> > hand ... supposing we were to push it back on the block queue for a
> > resend, what would we need to save?  It's fully prepared and set up as
> > special because of the attached command, so I'm not sure there would be
> > anything, as long as we don't let it go into the normal block completion
> > paths ... I'm just wondering if we can get rid of the special injection
> > path.
> >
> > James
>
> Rather than reusing a request, can we just somehow signal the block
> layer to let this command exceed the current nr_request limit and go on
> through?  This would be similar to the ioc_batching mechanism currently
> in use.
>
> Andrew

And if there's no more memory left, what do you do?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-05-31 5:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-28 23:42 Deadlock during DV when queue is full Andrew Patterson
2007-05-30 18:01 ` Jens Axboe
2007-05-30 18:43   ` James Bottomley
2007-05-30 18:55     ` Jens Axboe
2007-05-30 19:02       ` James Bottomley
2007-05-30 19:03         ` Jens Axboe
2007-05-30 19:07           ` James Bottomley
2007-05-30 19:11             ` Jens Axboe
2007-05-30 19:22               ` James Bottomley
2007-05-31  4:19                 ` Andrew Patterson
2007-05-31  5:31                   ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).