Re: [RFC PATCH] scsi: Add failfast mode to avoid infinite retry loop

From: Ewan Milne <emilne@redhat.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Eiichi Tsukata <eiichi.tsukata.xh@hitachi.com>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [RFC PATCH] scsi: Add failfast mode to avoid infinite retry loop
Date: Fri, 23 Aug 2013 15:36:55 -0400
Message-ID: <1377286615.3872.25.camel@localhost.localdomain>
In-Reply-To: <1377263977.2095.1.camel@dabdike>

On Fri, 2013-08-23 at 06:19 -0700, James Bottomley wrote:
> On Fri, 2013-08-23 at 18:10 +0900, Eiichi Tsukata wrote:
> > Yes, basically the device should be offlined on error detection.
> > Just offlining the disk is enough when an error occurs on "not" os-installed
> > system disk. Panic is going too far on such case.
> > 
> > However, in a clustered environment where computers use each its own
> > disk and do not share the same disk, calling panic() will be suitable
> > when an error occurs in system disk.
> 
> However, when not in a clustered environment, it won't be.  Decisions
> about whether to panic the system or not are user space policy, and
> should not be embedded into subsystems.  What we need to do is to come
> up with a way of detecting the condition, reporting it and possibly
> taking some action.
> 
> >  Because even on such disk error, cluster monitoring
> > tool may not be able to detect the system failure while heartbeat can
> > continue working.
> > So, I think basically offlining is enough and also, panic is necessary
> > on some cases.

The way I have seen this done in such a clustered environment is to have
the heartbeat agent on each system periodically attempt to access the
disk.  If that I/O hangs, other systems will see loss of heartbeat.
You really don't want to panic the kernel.  Among other things, it may
make it difficult to get the system up again later for long enough to
figure out what is wrong.
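
As a rough illustration of the approach described above, here is a minimal
user-space sketch. It is not taken from any particular cluster product: the
names probe_disk() and heartbeat_loop(), the probe file path, and the
send_heartbeat callback are all hypothetical. The point it shows is that the
disk probe runs in the same thread as the heartbeat, so a hung I/O blocks
the loop and the peers simply stop seeing heartbeats.

```python
import os
import time


def probe_disk(path, size=512):
    """Write a small record and fsync it so the I/O has to reach the device.

    'path' is an assumed probe file on the disk being monitored. A plain
    read could be satisfied from the page cache, hence the write + fsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, b"\0" * size)
        os.fsync(fd)  # blocks here if the device stops responding
    finally:
        os.close(fd)


def heartbeat_loop(probe, send_heartbeat, interval=1.0, rounds=None):
    """Send a heartbeat only after the disk probe has completed.

    'probe' and 'send_heartbeat' are caller-supplied callables (assumptions
    for this sketch). If probe() hangs, no heartbeat goes out; if it raises,
    the exception propagates and the loop ends, which also stops heartbeats.
    'rounds' limits the number of iterations; None means run forever.
    """
    done = 0
    while rounds is None or done < rounds:
        probe()
        send_heartbeat()
        time.sleep(interval)
        done += 1
```

With this arrangement the decision about what to do with a node that has
gone quiet (fence it, fail over, etc.) stays with the cluster software on
the other systems rather than in the kernel.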

> 
> Offline seems a bit drastic ... what happens if you send it a target
> reset?
> 
> James
