Re: [PATCH] scsi_error: count medium access timeout only once per EH run

public inbox for linux-scsi@vger.kernel.org

From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: "Ewan D. Milne" <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	James Bottomley <james.bottomley@hansenpartnership.com>,
	linux-scsi@vger.kernel.org,
	Lawrence Oberman <loberman@redhat.com>,
	Benjamin Block <bblock@linux.vnet.ibm.com>,
	Steffen Maier <maier@de.ibm.com>, Hannes Reinecke <hare@suse.com>
Subject: Re: [PATCH] scsi_error: count medium access timeout only once per EH run
Date: Mon, 27 Feb 2017 22:04:53 -0500
Message-ID: <yq1a897xbfu.fsf@oracle.com>
In-Reply-To: <1488223993.10197.146.camel@localhost.localdomain> (Ewan D. Milne's message of "Mon, 27 Feb 2017 14:33:13 -0500")

>>>>> "Ewan" == Ewan D Milne <emilne@redhat.com> writes:

Ewan,

Ewan> So, this is good, the current implementation has a flaw in that
Ewan> under certain conditions, a device will get offlined immediately,
Ewan> (i.e. if there are a few medium access commands pending, and they
Ewan> all time out), which isn't what was intended.

Yeah. That was OK for my use case. I was trying to prevent the server
from going into a tailspin. There was no chance of recovering the disk.

But ideally we'd be offlining based on how many times we retry the same
medium access command.
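
A minimal sketch of what per-command counting could look like, as plain
userspace C rather than anything taken from scsi_error.c or sd.c. The
struct, the function name and the threshold of 2 are all hypothetical
and only serve to illustrate the idea:

```c
#include <stdbool.h>

/* Assumed threshold for illustration; not a value from the kernel. */
#define SKETCH_MAX_CMD_TIMEOUTS 2

/* Hypothetical stand-in for a command; not struct scsi_cmnd. */
struct sketch_cmd {
	unsigned int medium_timeouts;	/* timeouts seen by this one command */
};

/*
 * Record one more timeout of the same medium access command.
 * Returns true once this particular command has timed out often
 * enough that the device should be offlined.
 */
static bool sketch_cmd_timed_out(struct sketch_cmd *cmd)
{
	cmd->medium_timeouts++;
	return cmd->medium_timeouts >= SKETCH_MAX_CMD_TIMEOUTS;
}
```

With this shape, several different commands each timing out once would
not trip the threshold. Only a command that keeps timing out across
retries would.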

Ewan> as separate medium access timeouts, but I think the original
Ewan> intent of Martin's change wasn't to operate on such a short
Ewan> time-scale, am I right, Martin?

On the device that begat my original patch, SPC command responses were
handled by the SAS controller firmware on behalf of all discovered
devices. Regardless of whether said drives were still alive or not.

Medium Access commands, however, would always get passed on to the
physical drive for processing. So when a drive went pining for the
fjords, TUR would always succeed whereas reads and writes would time
out.
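
For illustration, here is a rough userspace sketch of the counting the
patch subject describes: bump a per-device counter at most once per EH
run, and only when TUR succeeded while a medium access command timed
out. All names and the threshold of 2 are assumptions made for this
example. It is not the code from the patch or from sd.c:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed threshold for illustration only. */
#define SKETCH_MAX_MEDIUM_TIMEOUTS 2

/* Hypothetical per-device state; not struct scsi_disk. */
struct sketch_dev {
	unsigned int medium_access_timed_out;	/* counted per EH run */
	bool offline;
};

/* Hypothetical record of one command handled in an EH run. */
struct sketch_failed_cmd {
	bool is_medium_access;	/* e.g. READ/WRITE, as opposed to TUR */
	bool timed_out;
};

/*
 * Called once at the end of an EH run. However many medium access
 * commands timed out during the run, the counter goes up by one.
 * If TUR failed as well, this path does nothing; other escalation
 * would deal with that case and is not modelled here.
 */
static void sketch_eh_run_done(struct sketch_dev *dev,
			       const struct sketch_failed_cmd *cmds,
			       size_t ncmds, bool tur_ok)
{
	bool saw_medium_timeout = false;
	size_t i;

	for (i = 0; i < ncmds; i++)
		if (cmds[i].is_medium_access && cmds[i].timed_out)
			saw_medium_timeout = true;

	if (!tur_ok || !saw_medium_timeout)
		return;

	if (++dev->medium_access_timed_out >= SKETCH_MAX_MEDIUM_TIMEOUTS)
		dev->offline = true;
}
```

In this sketch, three pending reads timing out in a single EH run
count as one event, so the device is not offlined until a second run
hits the same condition.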

-- 
Martin K. Petersen	Oracle Linux Engineering


Thread overview: 5+ messages
2017-02-23 10:27 [PATCH] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
2017-02-23 14:13 ` Laurence Oberman
2017-02-27 19:33 ` Ewan D. Milne
2017-02-28  3:04   ` Martin K. Petersen [this message]
2017-02-28 10:03   ` Hannes Reinecke
