From: Dark Penguin <darkpenguin@yandex.ru>
To: linux-raid@vger.kernel.org
Subject: md failing mechanism
Date: Fri, 22 Jan 2016 20:59:45 +0300
Message-ID: <56A26E11.2090703@yandex.ru>

Greetings,

Recently, I had my first drive failure in a software RAID1 on a file 
server, and I was really surprised by what happened. I always thought 
that when md can't complete a read request on one of the drives, it is 
supposed to mark that drive as faulty and read from another drive. 
Instead, it kept trying to read from the faulty drive no matter what, 
which apparently caused Samba to wait until the read finished, and so 
every Samba share on the server became inaccessible.


What I expected:
- A user tries to read a file via Samba.
- Samba issues a read request to md.
- md tries to read the file from one of the drives... the drive is 
struggling to read a bad sector...
- md thinks: okay, this is taking too long and production can't wait; 
I'll just read from another drive instead.
- It reads from another drive successfully, and users continue their work.
- Finally, the "bad" drive gives up on trying to read the bad sector and 
returns an error. md marks the drive as faulty and sends an email 
telling me to replace the drive as soon as possible.
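
To make that expectation concrete, here is a minimal Python sketch of the
policy I had in mind. It is only a toy model, not how md is actually
implemented: the function name read_with_fallback, the mirror names and the
timeout value are all made up for illustration, and the "drives" are plain
Python functions.

```python
import concurrent.futures
import time


def read_with_fallback(mirrors, sector, timeout_s):
    """Toy model of the expected policy (NOT md's real logic).

    mirrors   -- list of (name, read_fn) pairs; read_fn(sector) returns bytes
    timeout_s -- how long to wait for one mirror before trying the next

    Returns (data, suspects), where suspects lists the mirrors that timed
    out or returned an error and should be reported to the admin.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(mirrors)))
    suspects = []
    try:
        for name, read_fn in mirrors:
            future = pool.submit(read_fn, sector)
            try:
                # Wait a bounded time instead of blocking on a struggling drive.
                return future.result(timeout=timeout_s), suspects
            except (concurrent.futures.TimeoutError, OSError):
                # Too slow or failed outright: remember it, try the next mirror.
                suspects.append(name)
        raise OSError("no mirror could serve sector %d" % sector)
    finally:
        # Do not wait for a read that is still stuck in the background.
        pool.shutdown(wait=False)


def struggling_drive(sector):
    """Hypothetical drive that retries a bad sector for a long time."""
    time.sleep(0.3)
    return b"late"


def healthy_drive(sector):
    """Hypothetical drive that answers immediately."""
    return b"ok"
```

Calling read_with_fallback([("sda", struggling_drive), ("sdb", healthy_drive)], 7, 0.05)
gets the data from the second mirror and reports "sda" as the suspect, which
is the outcome I expected from the real array.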


What happened instead:
- A user tries to read a file via Samba.
- Samba issues a read request to md.
- md tries to read the file from one of the drives... the drive is 
struggling to read a bad sector... Samba is waiting for md, md is 
waiting for the drive, and the drive is trying again and again to read 
this blasted sector like its life depends on it, while users see that 
the network folder doesn't respond anymore at all.

This goes on forever, until users call me, I come to investigate, see 
Samba down, see a lot of errors in dmesg, and then I manually mark this 
drive as faulty.


Now, that happened a while ago; I did not have the most recent kernel on 
that server (I think it was 3.2 from Debian Wheezy, or something a little 
newer from backports). I can't simply try it again on a new server, 
because I can't build a working RAID1, write data to it, and then 
destroy some sectors to see what happens. So I just want to ask: is that 
really how it works? Was that supposed to happen? I thought the main 
point of a RAID1 was to avoid any downtime, especially in such cases! 
Or is it a known issue that has been fixed in more recent versions, so I 
should just update my kernels and expect different behaviour next time?


-- 
darkpenguin
