From: Hannes Reinecke <hare@suse.de>
To: Christof Schmitt <christof.schmitt@de.ibm.com>
Cc: linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: Deadlock during multipath failover
Date: Thu, 12 Feb 2009 14:25:33 +0100
Message-ID: <4994234D.1000404@suse.de>
In-Reply-To: <20090212094022.GA4973@schmichrtp.de.ibm.com>

Hi Christof,

Christof Schmitt wrote:
> During failover tests on a current distribution kernel, we found this
> problem. From reading the code, the upstream kernel has the same
> problem:
> 
> During multipath failover tests with SCSI on System z, the kernel
> deadlocks in this situation:
> 
>>  STACK:
>>  0 blk_add_timer+206 [0x2981ea]
>>  1 blk_rq_timed_out+132 [0x2982a8]
>>  2 blk_abort_request+114 [0x29833e]
>>  3 blk_abort_queue+92 [0x2983a8]
>>  4 deactivate_path+74 [0x3e00009625a]
>>  5 run_workqueue+236 [0x149e04]
>>  6 worker_thread+294 [0x149fce]
>>  7 kthread+110 [0x14f436]
>>  8 kernel_thread_starter+6 [0x10941a]
> 
> blk_abort_queue takes the queue_lock with spin_lock_irqsave and walks
> the timeout_list with list_for_each_entry_safe. Since a path to a SCSI
> device just failed, the rport state is FC_PORTSTATE_BLOCKED. This
> rport state triggers blk_add_timer, which calls list_add_tail to move
> the request to the end of the timeout_list. Thus,
> list_for_each_entry_safe never reaches the end of the timeout_list; it
> continuously moves the requests to the end of the list.
> 
Hmm. That would be fixed by using list_splice() here:

diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index a095353..67bcc3f 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -209,12 +209,15 @@ void blk_abort_queue(struct request_queue *q)
 {
        unsigned long flags;
        struct request *rq, *tmp;
+       LIST_HEAD(list);
 
        spin_lock_irqsave(q->queue_lock, flags);
 
        elv_abort_queue(q);
 
-       list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
+       list_splice_init(&q->timeout_list, &list);
+
+       list_for_each_entry_safe(rq, tmp, &list, timeout_list)
                blk_abort_request(rq);
 
        spin_unlock_irqrestore(q->queue_lock, flags);

> The rport state FC_PORTSTATE_BLOCKED would end when the function
> fc_timeout_deleted_rport runs to remove the rport. But this
> function was scheduled via queue_delayed_work. The timer has already
> expired, but the timer function does not run, because the timer
> interrupt is disabled by the spin_lock_irqsave call.
> 
.. but this shouldn't happen anymore when using splice, as
the timer will be called _after_ the irqrestore above.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Thread overview: 6+ messages
2009-02-12  9:40 Deadlock during multipath failover Christof Schmitt
2009-02-12 13:25 ` Hannes Reinecke [this message]
2009-02-12 20:44   ` Mike Anderson
2009-02-13 10:50     ` Christof Schmitt
2009-02-17 18:57       ` Jens Axboe
