Re: Deadlock during multipath failover

From: Christof Schmitt <christof.schmitt@de.ibm.com>
To: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Hannes Reinecke <hare@suse.de>,
	Jens Axboe <jens.axboe@oracle.com>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: Deadlock during multipath failover
Date: Fri, 13 Feb 2009 11:50:43 +0100
Message-ID: <20090213105043.GA9888@schmichrtp.de.ibm.com>
In-Reply-To: <20090212204450.GA3556@linux.vnet.ibm.com>

On Thu, Feb 12, 2009 at 12:44:50PM -0800, Mike Anderson wrote:
> Hannes Reinecke <hare@suse.de> wrote:
> > Hi Christof,
> >
> > Christof Schmitt wrote:
> >> During failover tests on a current distribution kernel, we found this
> >> problem. From reading the code, the upstream kernel has the same
> >> problem:
> >>
> >> During multipath failover tests with SCSI on System z, the kernel
> >> deadlocks in this situation:
> >>
> >>>  STACK:
> >>>  0 blk_add_timer+206 [0x2981ea]
> >>>  1 blk_rq_timed_out+132 [0x2982a8]
> >>>  2 blk_abort_request+114 [0x29833e]
> >>>  3 blk_abort_queue+92 [0x2983a8]
> >>>  4 deactivate_path+74 [0x3e00009625a]
> >>>  5 run_workqueue+236 [0x149e04]
> >>>  6 worker_thread+294 [0x149fce]
> >>>  7 kthread+110 [0x14f436]
> >>>  8 kernel_thread_starter+6 [0x10941a]
> >>
> > >> blk_abort_queue takes the queue_lock with spin_lock_irqsave and walks
> > >> the timeout_list with list_for_each_entry_safe. Since a path to a SCSI
> > >> device just failed, the rport state is FC_PORTSTATE_BLOCKED. This
> > >> rport state triggers blk_add_timer, which calls list_add_tail to move
> > >> the request to the end of the timeout_list. Thus,
> > >> list_for_each_entry_safe never reaches the end of the timeout_list; it
> > >> continuously moves the requests to the end of the list.
> >>
> > > Hmm. That would be fixed by using list_splice() here:
> >
> > diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> > index a095353..67bcc3f 100644
> > --- a/block/blk-timeout.c
> > +++ b/block/blk-timeout.c
> > @@ -209,12 +209,15 @@ void blk_abort_queue(struct request_queue *q)
> > {
> >        unsigned long flags;
> >        struct request *rq, *tmp;
> > +       LIST_HEAD(list);
> >
> >        spin_lock_irqsave(q->queue_lock, flags);
> >
> >        elv_abort_queue(q);
> >
> > -       list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
> > +       list_splice_init(&q->timeout_list, &list);
> > +
> > +       list_for_each_entry_safe(rq, tmp, &list, timeout_list)
> >                blk_abort_request(rq);
> >
> >        spin_unlock_irqrestore(q->queue_lock, flags);
> >
> > >> The rport state FC_PORTSTATE_BLOCKED would end once the function
> > >> fc_timeout_deleted_rport runs and removes the rport. But this
> > >> function was scheduled via queue_delayed_work. The timer has already
> > >> expired, but the timer function does not run, because the timer
> > >> interrupt is disabled by the spin_lock_irqsave call.
> >>
> > .. but this shouldn't happen anymore when using splice, as
> > the timer will be called _after_ the irqrestore above.
> 
> If this patch does not address the deadlock, another option to look into
> would be to run some testing without blk_abort_request (just using
> elv_abort_queue) and not try to abort in-flight IOs at this time.
> 
> We observed reduced IO delays during storage failover testing (target
> responsive but timing out IOs) with this code, but I do not have good
> breakdown data on the number of IOs handled by elv_abort_queue vs
> blk_abort_request vs IO delay (it is also config dependent).

The patch fixes the observed deadlock. While the rport is BLOCKED,
blk_abort_request only resets the timer for each request, so I would
guess there is no big difference between calling blk_abort_request and
not calling it, at least in this scenario.

Christof Schmitt
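
For readers who want to reproduce the list behaviour outside the kernel,
here is a minimal userspace sketch of the loop discussed above. It is a
simplified model, not the kernel code: the list helpers are hand-written
stand-ins for the kernel's list_head API, abort_request() assumes every
request hits the "reset timer" case (as with a BLOCKED rport) and is simply
re-queued at the tail of timeout_list, and run_abort_model() and its step
cap are hypothetical names added only so the non-terminating case can be
observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

struct request { struct list_head timeout_list; };
struct queue   { struct list_head timeout_list; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_del_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* Move every entry of 'from' onto 'to' and leave 'from' empty. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
    struct list_head *first = from->next, *last = from->prev;

    if (first == from)
        return;
    first->prev = to;
    last->next = to->next;
    to->next->prev = last;
    to->next = first;
    list_init(from);
}

/* Model of the BLOCKED case: the request is only re-queued at the tail. */
static void abort_request(struct queue *q, struct request *rq)
{
    list_del_entry(&rq->timeout_list);
    list_add_tail(&rq->timeout_list, &q->timeout_list);
}

/* Same traversal order as list_for_each_entry_safe; -1 means "gave up". */
static int walk(struct list_head *head, struct queue *q, int max_steps)
{
    struct list_head *pos, *n;
    int steps = 0;

    for (pos = head->next, n = pos->next; pos != head; pos = n, n = n->next) {
        if (steps == max_steps)
            return -1;
        abort_request(q, (struct request *)((char *)pos -
                          offsetof(struct request, timeout_list)));
        steps++;
    }
    return steps;
}

#define MAX_REQ 8

/* Build a queue with nreq requests and abort them, with or without splice. */
int run_abort_model(int nreq, int use_splice, int max_steps)
{
    struct request reqs[MAX_REQ];
    struct queue q;
    struct list_head local;
    int i;

    assert(nreq >= 0 && nreq <= MAX_REQ);
    list_init(&q.timeout_list);
    list_init(&local);
    for (i = 0; i < nreq; i++)
        list_add_tail(&reqs[i].timeout_list, &q.timeout_list);

    if (!use_splice)
        return walk(&q.timeout_list, &q, max_steps);

    list_splice_init(&q.timeout_list, &local);
    return walk(&local, &q, max_steps);
}
```

In this model, walking q.timeout_list directly with two or more queued
requests never reaches the list head, so the step cap is hit, while walking
the spliced local list visits each request exactly once because re-queued
requests land on a different list. A single queued request terminates in
both variants, since the saved next pointer is already the list head.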
