From: Mike Anderson <andmike@linux.vnet.ibm.com>
To: Hannes Reinecke <hare@suse.de>, Jens Axboe <jens.axboe@oracle.com>
Cc: Christof Schmitt <christof.schmitt@de.ibm.com>,
linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: Deadlock during multipath failover
Date: Thu, 12 Feb 2009 12:44:50 -0800
Message-ID: <20090212204450.GA3556@linux.vnet.ibm.com>
In-Reply-To: <4994234D.1000404@suse.de>
Hannes Reinecke <hare@suse.de> wrote:
> Hi Christof,
>
> Christof Schmitt wrote:
>> During failover tests on a current distribution kernel, we found this
>> problem. From reading the code, the upstream kernel has the same
>> problem:
>>
>> During multipath failover tests with SCSI on System z, the kernel
>> deadlocks in this situation:
>>
>>> STACK:
>>> 0 blk_add_timer+206 [0x2981ea]
>>> 1 blk_rq_timed_out+132 [0x2982a8]
>>> 2 blk_abort_request+114 [0x29833e]
>>> 3 blk_abort_queue+92 [0x2983a8]
>>> 4 deactivate_path+74 [0x3e00009625a]
>>> 5 run_workqueue+236 [0x149e04]
>>> 6 worker_thread+294 [0x149fce]
>>> 7 kthread+110 [0x14f436]
>>> 8 kernel_thread_starter+6 [0x10941a]
>>
>> blk_abort_queue takes the queue_lock with spin_lock_irqsave and walks
>> the timeout_list with list_for_each_entry_safe. Since a path to a SCSI
>> device just failed, the rport state is FC_PORTSTATE_BLOCKED. This
>> rport state triggers blk_add_timer, which calls list_add_tail to move
>> the request to the end of the timeout_list. Thus, the
>> list_for_each_entry_safe never reaches the end of the timeout_list; it
>> continuously moves the requests to the end of the list.
>>
> Hmm. That would be fixed by using list_splice() here:
>
> diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> index a095353..67bcc3f 100644
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -209,12 +209,15 @@ void blk_abort_queue(struct request_queue *q)
>  {
>  	unsigned long flags;
>  	struct request *rq, *tmp;
> +	LIST_HEAD(list);
>  
>  	spin_lock_irqsave(q->queue_lock, flags);
>  
>  	elv_abort_queue(q);
>  
> -	list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
> +	list_splice_init(&q->timeout_list, &list);
> +
> +	list_for_each_entry_safe(rq, tmp, &list, timeout_list)
>  		blk_abort_request(rq);
>  
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>
>> The rport state FC_PORTSTATE_BLOCKED would end when the function
>> fc_timeout_deleted_rport runs to remove the rport. But this
>> function was scheduled with queue_delayed_work. The timer has already
>> expired, but the timer function does not run, because the timer
>> interrupt is disabled by the spin_lock_irqsave call.
>>
> .. but this shouldn't happen anymore when using splice, as
> the timer will be called _after_ the irqrestore above.

If this patch does not address the deadlock, another option would be to
test without blk_abort_request (using only elv_abort_queue), so that
in-flight IOs are not aborted at this point.

We observed reduced IO delays during storage failover testing (target
responsive but timing out IOs) with this code, but I do not have a good
breakdown of the number of IOs handled by elv_abort_queue vs.
blk_abort_request vs. IO delay (it is also config dependent).

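
For anyone following the list-walk argument above, here is a rough
user-space model of it. This is only a sketch, not kernel code: the list
helpers are simplified stand-ins for <linux/list.h>, abort_request()
assumes every request is re-queued at the tail (the blocked-rport case),
and the names walk(), walk_in_place(), walk_spliced(), pending() and the
cap parameter are made up for illustration.

```c
#include <assert.h>

/* Minimal user-space stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

#define MAX_RQ 16

static struct list_head timeout_list;	/* models q->timeout_list */
static struct list_head rqs[MAX_RQ];	/* models the queued requests */

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

/* Move every entry of 'from' to the front of 'to'; 'from' ends up empty. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
	if (from->next == from)
		return;
	from->next->prev = to;
	from->prev->next = to->next;
	to->next->prev = from->prev;
	to->next = from->next;
	list_init(from);
}

/* Assumed behaviour: the request is unlinked and re-queued at the tail. */
static void abort_request(struct list_head *rq)
{
	list_del_init(rq);
	list_add_tail(rq, &timeout_list);
}

/*
 * Walk 'head' the way list_for_each_entry_safe does, but give up after
 * 'cap' calls so the demo terminates. Returns the number of calls made.
 */
static int walk(struct list_head *head, int cap)
{
	struct list_head *pos, *tmp;
	int calls = 0;

	for (pos = head->next, tmp = pos->next; pos != head && calls < cap;
	     pos = tmp, tmp = pos->next) {
		abort_request(pos);
		calls++;
	}
	return calls;
}

/* Reset the model so that timeout_list holds n requests. */
static void fill(int n)
{
	int i;

	assert(n >= 0 && n <= MAX_RQ);
	list_init(&timeout_list);
	for (i = 0; i < n; i++)
		list_add_tail(&rqs[i], &timeout_list);
}

/* Number of requests currently on timeout_list. */
int pending(void)
{
	struct list_head *pos;
	int n = 0;

	for (pos = timeout_list.next; pos != &timeout_list; pos = pos->next)
		n++;
	return n;
}

/* Current code: walk timeout_list while entries are re-added to it. */
int walk_in_place(int n, int cap)
{
	fill(n);
	return walk(&timeout_list, cap);
}

/* Proposed patch: splice to a local list first, then walk that list. */
int walk_spliced(int n, int cap)
{
	struct list_head local;

	fill(n);
	list_init(&local);
	list_splice_init(&timeout_list, &local);
	return walk(&local, cap);
}
```

With three requests, walk_in_place(3, 1000) only stops because it runs
into the cap, while walk_spliced(3, 1000) visits each request once and
leaves all three back on timeout_list. The model says nothing about
whether the re-armed timers then fire as expected; that still needs
testing on real hardware.
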
-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com
Thread overview: 6+ messages
2009-02-12 9:40 Deadlock during multipath failover Christof Schmitt
2009-02-12 13:25 ` Hannes Reinecke
2009-02-12 20:44 ` Mike Anderson [this message]
2009-02-13 10:50 ` Christof Schmitt
2009-02-17 18:57 ` Jens Axboe