From: Wols Lists <antlists@youngman.org.uk>
To: Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com>,
	linux-raid@vger.kernel.org
Subject: Re: Disk Monitoring
Date: Wed, 28 Jun 2017 13:43:56 +0100
Message-ID: <5953A48C.9080500@youngman.org.uk>
In-Reply-To: <CAJH6TXgvrVckHDmh1oiN9mupLrsS2NP3J44bG1_wE9Nnx4=yHQ@mail.gmail.com>

On 28/06/17 11:25, Gandalf Corvotempesta wrote:
> Hi to all
> I always used hardware raid but with my next server I would like to use mdadm.
> 
> Some questions:
> 
> 1) all raid controllers have proactive monitoring features, like
> patrol read, consistency check and (more or less) some SMART
> integration.
> Any counterpart in mdadm?
> 
> 2) thanks to these features, raid controllers are usually able to detect
> disk issues before they cause data-loss. what about mdadm ?
> 
> How and when do you replace disks ? Based on which params? Do you
> always wait for a total failure before replacing the disk?

Not wise. mdadm has the --replace option which will copy a failing
drive. This ensures redundancy is not lost during a disk replacement
(unless other stuff goes wrong too).

You need to use something like SMART to monitor disk health; read up on
smartctl. Okay, disks often fail unexpectedly even when SMART says
they're healthy, but if things like the reallocated sector count start
climbing it's an indication of trouble ...
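
As a rough sketch of what "watching for a climbing count" could look like,
the Python below parses text in the layout printed by `smartctl -A` and
compares two snapshots. The function names, the sample text and the choice
of watched attributes are my own assumptions for illustration; attribute
names vary between vendors, so check what your drives actually report.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value (int) from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by this test.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer; ignore it here
    return attrs


def climbing(previous, current,
             watched=("Reallocated_Sector_Ct", "Current_Pending_Sector")):
    """Return {name: (old, new)} for watched attributes whose raw value rose."""
    return {
        name: (previous.get(name, 0), current[name])
        for name in watched
        if name in current and current[name] > previous.get(name, 0)
    }


# Hypothetical sample in the usual `smartctl -A` table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
```

Save the parsed values from one run, feed them in as `previous` on the next
run, and anything that comes back from `climbing()` deserves a closer look.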

Some people are very aggressive and replace disks at the first hint of
trouble. Other people only replace disks when things start going badly
wrong. Your call. The whole point of raid is to enable recovery when
things have otherwise gone irretrievably wrong, but it's best not to
push your luck that far, as many people have found out ...
> 
> Is mdadm able to notify about possible bad things before they happen ?

You probably need to turn on kernel logging. And monitor the logs!

Also keep an eye on /proc/mdstat.
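
If you'd rather not eyeball it, something along these lines could pick out
degraded arrays from the text of /proc/mdstat. It is only a sketch: the
function name and the sample text are made up for illustration, and it
relies on the usual "[n/m] [UU_]" status line, where an underscore marks a
missing or failed member.

```python
import re

# "md0 : active raid1 ..." introduces an array.
ARRAY_RE = re.compile(r"^(md\S+)\s*:\s*(\S+)")
# "[3/2] [UU_]" gives configured/working device counts and per-device state.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: state_string} for arrays with a missing member."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            if "_" in status.group(3):
                result[current] = status.group(3)
            current = None
    return result


# Hypothetical sample: md0 is healthy, md1 has lost one of three members.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
      976630464 blocks super 1.2 [2/2] [UU]

md1 : active raid5 sdc1[0] sdd1[1] sde1[2](F)
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""
```

On a real box you would pass in the contents of /proc/mdstat, for example
from cron, and send yourself mail whenever the result is not empty.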

I don't know what state xosview is in at the moment but that's my
favourite monitoring tool. Run it on the server with the array, use X to
display it on your local desktop. Last I checked, the raid monitoring
stuff was broken, but the author knows and was fixing it.
> 
> Many times in the past our raid controllers forced a bad sector
> reallocation during proactive tasks like patrol read. This saved me
> many times before. I've tried not replacing a disk when this
> reallocation was made (it was a test server) and after some weeks the
> disk failed totally.

Read up on how disks fail. If you tell mdadm to do a "scrub" it will
read the array from end to end. This should cause any dodgy sectors to
be rewritten. Note that this doesn't mean anything is wrong - just as
RAM decays and needs to be refreshed every few tens of milliseconds, so
data on disk decays and needs to be refreshed every few years. It's only
when the magnetic coating begins to physically decay that you need to
worry about the health of the disk on that score.
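
For what it's worth, a scrub is normally kicked off through the md sysfs
interface: writing "check" to the array's sync_action file starts a
read-only pass, and mismatch_cnt can be read once it has finished. A minimal
sketch, assuming the usual /sys/block/mdX/md/ layout and root privileges;
the function names and the sysfs_root parameter are mine, the latter only
there so the code can be tried against a scratch directory.

```python
import os


def start_scrub(array, action="check", sysfs_root="/sys/block"):
    """Ask md to scrub `array` (e.g. "md0") by writing to sync_action."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    path = os.path.join(sysfs_root, array, "md", "sync_action")
    with open(path, "w") as handle:
        handle.write(action + "\n")
    return path


def mismatch_count(array, sysfs_root="/sys/block"):
    """Read the mismatch counter md maintains for `array`."""
    path = os.path.join(sysfs_root, array, "md", "mismatch_cnt")
    with open(path) as handle:
        return int(handle.read().strip())
```

Many distributions already ship a cron job or timer that does the equivalent
once a month, so check for that before adding your own.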

Cheers,
Wol

