From: Phil Turmel <philip@turmel.org>
To: James J <james.j@shiftmail.org>,
	Dark Penguin <darkpenguin@yandex.ru>,
	linux-raid <linux-raid@vger.kernel.org>
Subject: Re: md failing mechanism
Date: Fri, 22 Jan 2016 19:44:55 -0500	[thread overview]
Message-ID: <56A2CD07.9000804@turmel.org> (raw)
In-Reply-To: <56A2BDF7.7020101@shiftmail.org>

On 01/22/2016 06:40 PM, James J wrote:
> On 22/01/2016 22:44, Dark Penguin wrote:
>>
>> As I understand, one way around this problem is to change the kernel
>> timeout to exceed the drive timeout by changing
>> /sys/block/sd?/device/timeout to something larger than the default 30,
>> but I'd have to do that after every reboot, is all that correct?
>>
> 
> No, this part needs further investigation and comments from the gurus.

Yes, DP had that correct.
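Because the sysfs value does not survive a reboot, it has to be reapplied at every boot. Below is a minimal Python sketch of that step. It assumes it runs as root from some boot-time hook, such as a udev rule, a systemd unit or rc.local; none of these is prescribed here. The function name `set_scsi_timeouts`, the `sd*` glob and the 180-second default are illustrative assumptions, not values taken from this thread. The `sysfs_block` parameter exists only so the function can be pointed at a scratch directory for testing.

```python
import glob
import os

# Assumed default for drives without SCT ERC; not a value from this message.
DEFAULT_TIMEOUT_S = 180


def set_scsi_timeouts(timeout_s: int = DEFAULT_TIMEOUT_S,
                      sysfs_block: str = "/sys/block") -> list[str]:
    """Write timeout_s into every sd*/device/timeout file under sysfs_block.

    The kernel forgets this setting at reboot, so a function like this would
    have to be called from a boot-time hook. Returns the files it wrote.
    """
    written = []
    pattern = os.path.join(sysfs_block, "sd*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(f"{timeout_s}\n")
        written.append(path)
    return written
```

Calling `set_scsi_timeouts()` with no arguments would touch the real `/sys/block` entries. The `sd*` pattern is a little broader than `sd?` so that it also catches names like `sdaa` on large systems.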

> With a SCSI timeout 30 secs, which is the setting you had at the time of
> the incident AFAIU, what should have happened was that the drive should
> have been kicked out at the 30th second, this is BEFORE it had a chance
> to return a read failure because your desktop drive takes more than
> 30 secs to return a read failure. This was what you indeed expected but
> it is not what has happened.

His problem description doesn't perfectly match a timeout mismatch.  He
probably had a real problem that was exacerbated by his now-discovered
timeout problem.  He no longer has the dmesg so further speculation is
moot.  If it happens again, we can look closer.

> The recommendation of raising the timeout to 120+ is for the opposite
> purpose of what you want. It is for the case the sysadmin accepts to
> wait a long time because he wants to prevent the kicking of the drive at
> the first read-error (normally drives are kicked for a write error).
> This might be wanted in order to a) defer the replacement of the drive,
> either to perform the replacement at a more opportune time and/or in a
> better manner such as a no-degrade replace operation, or b) because he
> does not want to replace the drive at all: maybe he believes that the
> error might be spurious and will not happen again and the drive is still
> of acceptable fitness for the purpose, e.g. in a low-cost file server.

No.  If you have a drive that doesn't support SCT ERC (scterc) or has it
turned off, you *must* set a timeout longer than the drive's native timeout or
you will have great problems.  I suggest you read the references to the
archives I posted.
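The rule can be illustrated with a deliberately simplified model of a single read that hits an unreadable sector. The function name and the recovery times used below are hypothetical. Real drives vary, and the kernel's error handling involves more than one comparison.

```python
def read_outcome(kernel_timeout_s: float, drive_recovery_s: float) -> str:
    """Simplified outcome of one read that hits an unreadable sector.

    kernel_timeout_s: value of /sys/block/sdX/device/timeout.
    drive_recovery_s: how long the drive retries internally before reporting
        the error (bounded by SCT ERC when supported and enabled; assumed to
        be far longer on desktop drives without it).
    """
    if drive_recovery_s < kernel_timeout_s:
        # The drive gives up first and reports a read error (URE); md can
        # reconstruct the data from redundancy and rewrite the sector.
        return "ure-reported"
    # The kernel gives up first and resets the link while the drive is still
    # busy; md's follow-up rewrite fails, so md kicks the member.
    return "timeout-kicked"
```

Here are three cases under this model:

- A drive with SCT ERC set to 7 seconds: `read_outcome(30, 7)` gives `"ure-reported"`.
- A desktop drive without ERC that retries for 120 seconds: `read_outcome(30, 120)` gives `"timeout-kicked"`.
- The same desktop drive after raising the kernel timeout to 180: `read_outcome(180, 120)` gives `"ure-reported"` again.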

Keep in mind that in a properly working array UREs are *fixed* when
discovered: md reconstructs the data from the other members and overwrites
the bad sector.  This is vital to array robustness, as many UREs are
transient (don't need relocation at all).
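Here is a toy model of that repair path for a two-way mirror. It is a sketch only: `None` stands in for an unreadable sector, and `read_block` is a hypothetical helper, not md's code. It shows why a URE that the drive actually reports is harmless in a redundant array. The data is read from another member and written back over the bad sector.

```python
from typing import Optional


def read_block(mirrors: list[list[Optional[bytes]]], idx: int,
               primary: int = 0) -> bytes:
    """Toy RAID1 read; None marks an unreadable sector (URE)."""
    data = mirrors[primary][idx]
    if data is not None:
        return data
    # URE on the primary: take the block from any member that can read it...
    for m, member in enumerate(mirrors):
        if m != primary and member[idx] is not None:
            good = member[idx]
            # ...and overwrite the bad sector. A transient URE is cured by the
            # rewrite; a physically bad sector can be remapped by the drive.
            mirrors[primary][idx] = good
            return good
    raise OSError(f"block {idx} unreadable on all members")
```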

Phil

