* Monitoring for disk failures
@ 2014-01-27 11:43 Alin Dobre
2014-01-27 13:10 ` Duncan
2014-01-28 9:03 ` Anand Jain
0 siblings, 2 replies; 4+ messages in thread
From: Alin Dobre @ 2014-01-27 11:43 UTC
To: linux-btrfs
Hi all!
I am trying to create a very simple script that would alert in case of
disk failures in a Btrfs RAID.
Digging into the code, I have noticed that the "btrfs fi sh" command
should display a warning if there is a missing disk. However, when
testing in QEMU, I used "drive_del" via QMP to remove a "live" SCSI
drive that was already mounted as part of a RAID10 array, and the
"fi sh" command still gave no indication that the drive was missing.
Then I tried removing a SCSI disk from the host via
"echo 1 >/sys/block/sdX/device/delete" to actually make the kernel SCSI
host forget about it, and "fi sh" still didn't show anything.
I have tested using btrfs-progs v3.12 and kernel 3.13.0.
Do you guys know what's wrong with the setup explained above or do you
have any indication on how to detect if there is a failing disk, part of
a Btrfs RAID?
Cheers,
Alin.
* Re: Monitoring for disk failures
From: Duncan @ 2014-01-27 13:10 UTC
To: linux-btrfs
Alin Dobre posted on Mon, 27 Jan 2014 11:43:33 +0000 as excerpted:
> I am trying to create a very simple script that would alert in case of
> disk failures from a RAID Btrfs.
>
> Digging into the code, I have noticed that the "btrfs fi sh" command
> should display a warning if there is a missing disk. However, testing in
> a Qemu, I used "drive_del" via QMP to remove a "live" SCSI drive,
> already mounted as part of a RAID10 array, the "fi sh" command still
> gave no indication that the drive is missing. Then, I tried removing a
> scsi disk from the host via "echo 1 >/sys/block/sdX/device/delete" to
> actually make the kernel SCSI host forget about it, and "fi sh" still
> doesn't show anything.
>
> I have tested using btrfs-progs v3.12 and kernel 3.13.0.
Without actually trying it here... I believe by default that'd update
only when there was an I/O error.
Did you try btrfs filesystem show --all-devices? That scans differently.
If that doesn't work try btrfs device scan first as that updates the in-
kernel list, then filesystem show. Alternatively, monitor the kernel log
for output as the scanned devices show up there.
And if /that/ doesn't work, try show, followed by a probe of all the
devices listed by show. But I strongly suspect a device scan will force
the update you're looking for.
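For the last suggestion (run show, then probe every device it lists), here is a minimal Python sketch. It assumes that "btrfs filesystem show" prints one "devid N size ... used ... path /dev/..." line per device; that line shape and the helper names (listed_devices, is_block_device, vanished_devices) are illustrative assumptions, not something confirmed in this thread.

```python
import os
import re
import stat

# Assumed line shape in `btrfs filesystem show` output:
#   devid    1 size 10.00GiB used 2.03GiB path /dev/sdb
DEVID_LINE = re.compile(
    r"^\s*devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)\s*$"
)


def listed_devices(show_output):
    """Return [(devid, path), ...] parsed from `btrfs filesystem show` text."""
    devices = []
    for line in show_output.splitlines():
        match = DEVID_LINE.match(line)
        if match:
            devices.append((int(match.group(1)), match.group(2)))
    return devices


def is_block_device(path):
    """Probe one path: True only if it exists and is a block device node."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False


def vanished_devices(show_output, probe=is_block_device):
    """Return the listed device paths that fail the probe."""
    return [path for _, path in listed_devices(show_output) if not probe(path)]
```

The text to parse could come from subprocess.check_output(["btrfs", "filesystem", "show"], universal_newlines=True). Note that a stat() probe only notices a device node that has disappeared; if the node lingers after the drive is pulled, the probe argument would have to be swapped for something that actually reads from the device.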
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Monitoring for disk failures
From: Anand Jain @ 2014-01-28 9:03 UTC
To: Alin Dobre; +Cc: linux-btrfs
Alin,
I am aware of it and working on it. I also reported a more
critical bug earlier as below.
[bug] its messy when missing device reappears after its been replaced
in RAID1
We see I/O errors when a disk goes missing. But no, as of now
a mounted FS will never report a missing disk. Only after you
unmount and mount again (not remount) will the kernel realize
that the disk is missing.
Note: don't be happy about
btrfs fi show -d <all-devices>
reporting the missing disk (when the FS is mounted), since that is
not in line with the kernel. With the -d option, btrfs-progs
adds its 'own' intelligence to show the disk as missing.
That is not what the end user wants: the end user wants to
know how the btrfs kernel is managing the missing disk, and
wants to find that out through btrfs-progs. In many places
btrfs-progs is way more intelligent than is actually
needed. That's wrong.
More to come.
Thanks, Anand
* Re: Monitoring for disk failures
From: Anand Jain @ 2014-01-28 9:15 UTC
To: Duncan, linux-btrfs
> Without actually trying it here... I believe by default that'd update
> only when there was an I/O error.
>
> Did you try btrfs filesystem show --all-devices? That scans differently.
That will show the disk as missing based on its own probes, but the
kernel does not know that the disk is missing.
> If that doesn't work try btrfs device scan first as that updates the in-
> kernel list, then filesystem show.
Nope. Scan does not remove the old (missing disk) entries.
I am writing patch(es).
> Alternatively, monitor the kernel log
> for output as the scanned devices show up there.
That's the best choice as of now, or btrfs fi show -d will
show the disk as missing.
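As a rough illustration of the kernel-log approach, the Python sketch below scans log text for btrfs per-device error counters. The message shape it matches ("bdev <path> errs: wr N, rd N, flush N, corrupt N, gen N") is an assumption about what the kernel prints on I/O errors and should be checked against the real log on the target kernel; the function names are hypothetical.

```python
import re

# Assumed kernel message shape (verify against your own kernel log):
#   btrfs: bdev /dev/sdc errs: wr 5, rd 0, flush 1, corrupt 0, gen 0
ERR_LINE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def error_counters(log_text):
    """Map each device path to the most recent counters seen in the log."""
    latest = {}
    for line in log_text.splitlines():
        match = ERR_LINE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


def failing_devices(log_text):
    """Return a sorted list of devices with at least one non-zero counter."""
    counters = error_counters(log_text)
    return sorted(dev for dev, c in counters.items() if any(c.values()))
```

The log text could be the output of dmesg or the contents of a syslog file; an alerting script would run failing_devices() on it periodically and raise an alarm whenever the returned list is non-empty.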
Thanks, Anand