linux-btrfs.vger.kernel.org archive mirror
* Monitoring for disk failures
@ 2014-01-27 11:43 Alin Dobre
  2014-01-27 13:10 ` Duncan
  2014-01-28  9:03 ` Anand Jain
  0 siblings, 2 replies; 4+ messages in thread
From: Alin Dobre @ 2014-01-27 11:43 UTC (permalink / raw)
  To: linux-btrfs

Hi all!

I am trying to create a very simple script that would raise an alert in
case of a disk failure in a Btrfs RAID.

Digging into the code, I noticed that the "btrfs fi sh" command
should display a warning if there is a missing disk. However, when testing
in QEMU, I used "drive_del" via QMP to remove a "live" SCSI drive that was
already mounted as part of a RAID10 array, and the "fi sh" command still
gave no indication that the drive was missing. Then I tried removing a
SCSI disk from the host via "echo 1 >/sys/block/sdX/device/delete" to
actually make the kernel SCSI host forget about it, and "fi sh" still
didn't show anything.

I have tested using btrfs-progs v3.12 and kernel 3.13.0.

Do you guys know what's wrong with the setup described above, or do you
have any pointers on how to detect a failing disk that is part of
a Btrfs RAID?

Cheers,
Alin.


* Re: Monitoring for disk failures
  2014-01-27 11:43 Monitoring for disk failures Alin Dobre
@ 2014-01-27 13:10 ` Duncan
  2014-01-28  9:15   ` Anand Jain
  2014-01-28  9:03 ` Anand Jain
  1 sibling, 1 reply; 4+ messages in thread
From: Duncan @ 2014-01-27 13:10 UTC (permalink / raw)
  To: linux-btrfs

Alin Dobre posted on Mon, 27 Jan 2014 11:43:33 +0000 as excerpted:

> I am trying to create a very simple script that would raise an alert in
> case of a disk failure in a Btrfs RAID.
> 
> Digging into the code, I noticed that the "btrfs fi sh" command
> should display a warning if there is a missing disk. However, when testing
> in QEMU, I used "drive_del" via QMP to remove a "live" SCSI drive that was
> already mounted as part of a RAID10 array, and the "fi sh" command still
> gave no indication that the drive was missing. Then I tried removing a
> SCSI disk from the host via "echo 1 >/sys/block/sdX/device/delete" to
> actually make the kernel SCSI host forget about it, and "fi sh" still
> didn't show anything.
> 
> I have tested using btrfs-progs v3.12 and kernel 3.13.0.

Without actually trying it here... I believe by default that'd update 
only when there was an I/O error.

Did you try btrfs filesystem show --all-devices?  That scans differently.

If that doesn't work, try btrfs device scan first, as that updates the
in-kernel list, then filesystem show.  Alternatively, monitor the kernel
log for output, as the scanned devices show up there.

And if /that/ doesn't work, try show, followed by a probe of all the 
devices listed by show.  But I strongly suspect a device scan will force 
the update you're looking for.
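
The "show, followed by a probe of all the devices listed by show" idea
can be sketched roughly as below. This is only an illustration under
stated assumptions: the "devid ... path /dev/..." line shape is assumed
from btrfs-progs output of this era and is not a stable interface, and
the function names are made up for the example. The probe here is just a
check that the device node still exists, so it can catch a removed
device but not a disk that is failing while still present.

```python
import os
import re

# Assumed line shape in "btrfs filesystem show" output (not a stable API):
#   devid    1 size 1.00GiB used 138.38MiB path /dev/sdb
DEVICE_LINE = re.compile(r"^\s*devid\s+\d+\s.*\bpath\s+(\S+)\s*$")


def listed_devices(show_output):
    """Return the device paths named in 'btrfs filesystem show' output."""
    paths = []
    for line in show_output.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def vanished_devices(show_output, exists=os.path.exists):
    """Return the listed device paths whose device node no longer exists.

    'exists' is injectable so the check can be exercised without real disks.
    """
    return [path for path in listed_devices(show_output) if not exists(path)]
```

To use it, the text of "btrfs filesystem show" (captured, for example,
with subprocess.run(..., capture_output=True, text=True).stdout) would
be passed to vanished_devices(), and a non-empty result would trigger
the alert.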

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Monitoring for disk failures
  2014-01-27 11:43 Monitoring for disk failures Alin Dobre
  2014-01-27 13:10 ` Duncan
@ 2014-01-28  9:03 ` Anand Jain
  1 sibling, 0 replies; 4+ messages in thread
From: Anand Jain @ 2014-01-28  9:03 UTC (permalink / raw)
  To: Alin Dobre; +Cc: linux-btrfs


Alin,

  I am aware of it and am working on it. I also reported a more
  critical bug earlier, as below.

  [bug] its messy when missing device reappears after its been replaced
  in RAID1


  We see I/O errors when a disk goes missing. But no, as of now
  a mounted FS will never report a missing disk. Only after you
  unmount and mount again (not remount) will the kernel notice
  that the disk is missing.

  Note: don't be reassured by
     btrfs fi show -d <all-devices>
  reporting the missing disk (while the fs is mounted), since
  that is not in line with the kernel. With the -d option,
  btrfs-progs adds its 'own' intelligence to show the disk as
  missing. That is not what the end user wants: the end user
  wants to know how the btrfs kernel is managing the missing
  disk, and wants to find that out using btrfs-progs. In many
  places btrfs-progs is way more intelligent than it actually
  needs to be. That's wrong.
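
  For a script that still wants to use the -d output as a hint, a
  rough sketch follows. It assumes that btrfs-progs prints a line
  containing "Some devices missing" under the affected filesystem;
  that marker string and the function name are assumptions for
  illustration, not a documented interface. As noted above, a hit only
  reflects the probing done by btrfs-progs, not what the kernel knows.

```python
import re

# Assumption: btrfs-progs prints a line containing this marker under a
# filesystem whose device count does not match what it found.
MISSING_MARKER = "Some devices missing"

# Assumed header line shape: "Label: none  uuid: <uuid>"
UUID_LINE = re.compile(r"^Label:.*\buuid:\s*(\S+)")


def filesystems_with_missing(show_output):
    """Return the uuids of filesystems whose block contains the marker."""
    flagged = []
    current = None
    for line in show_output.splitlines():
        match = UUID_LINE.match(line.strip())
        if match:
            # A new filesystem block starts here.
            current = match.group(1)
        elif MISSING_MARKER in line and current is not None:
            if current not in flagged:
                flagged.append(current)
    return flagged
```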

  More to come.

Thanks, Anand





* Re: Monitoring for disk failures
  2014-01-27 13:10 ` Duncan
@ 2014-01-28  9:15   ` Anand Jain
  0 siblings, 0 replies; 4+ messages in thread
From: Anand Jain @ 2014-01-28  9:15 UTC (permalink / raw)
  To: Duncan, linux-btrfs



> Without actually trying it here... I believe by default that'd update
> only when there was an I/O error.
>
> Did you try btrfs filesystem show --all-devices?  That scans differently.

  That will show the disk as missing based on its own probes, but the
  kernel does not know that the disk is missing.

> If that doesn't work try btrfs device scan first as that updates the
> in-kernel list, then filesystem show.

  Nope. Scan does not remove the old (missing disk) entries.
  I am writing patch(es) for that.

> Alternatively, monitor the kernel log
> for output as the scanned devices show up there.

  That's the best choice as of now, OR btrfs fi show -d will
  show the disk as missing.
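
  A minimal sketch of the kernel-log approach, for illustration only.
  It assumes that btrfs kernel messages contain the string "btrfs" in
  some letter case, and the keyword list is a guess at what is worth
  alerting on. Neither comes from the kernel documentation, so both
  should be tuned against real log output.

```python
import re

# Assumptions, not taken from this thread: btrfs kernel messages contain
# "btrfs" (any case), and these keywords are worth alerting on.
BTRFS_TAG = re.compile(r"btrfs", re.IGNORECASE)
ALERT_WORDS = ("error", "errs:", "failed", "missing")


def suspicious_lines(log_lines, words=ALERT_WORDS):
    """Return the btrfs-related log lines that contain an alert keyword."""
    hits = []
    for line in log_lines:
        lowered = line.lower()
        if BTRFS_TAG.search(line) and any(word in lowered for word in words):
            hits.append(line.rstrip("\n"))
    return hits
```

  The input could be the lines of dmesg output or of a syslog file; a
  cron job would then send a mail whenever the returned list is not
  empty.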

Thanks, Anand
