* polling mdX/md/degraded in sysfs
@ 2012-01-05 8:34 Mikhail Balabin
2012-01-08 9:06 ` Alexander Lyakas
0 siblings, 1 reply; 7+ messages in thread
From: Mikhail Balabin @ 2012-01-05 8:34 UTC (permalink / raw)
To: linux-raid
Hi!
I'm playing around with monitoring software raid status via sysfs
entries. In my case it's a raid1 array. According to
Documentation/md.txt any md device with redundancy should contain file
"degraded" (for example, /sys/block/md0/md/degraded) with the number
of devices by which the array is degraded. It is stated that this
file can be polled to monitor changes in the array, but it does not
work for me. Here is my (stripped-down) python code:
import select

fileName = "/sys/block/md0/md/degraded"
epoll = select.epoll()
while True:
    file = open(fileName)
    status = file.read()
    print(status)

    epoll.register(file.fileno(), select.EPOLLPRI | select.EPOLLERR)
    epoll.poll()
    print("==== poll ====")
    epoll.unregister(file.fileno())
    file.close()
The script works fine for /proc/mdstat or /proc/mounts, but does not
show any events for /sys/block/md0/md/degraded. Is there a problem in
my code? Or is the documentation inaccurate?
Mikhail Balabin
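[For reference, the conventional pattern for polling a sysfs attribute keeps a single fd open, reads it to arm the notification, waits for POLLPRI, then seeks back to offset 0 and re-reads. A minimal sketch of that pattern; the function name and callback interface are invented for illustration, and whether the kernel actually notifies this attribute is exactly the question in this thread:]

```python
import select

def poll_sysfs(path, handle, timeout_ms=None):
    """Watch a pollable sysfs attribute.

    Calls handle(value) with the current contents, then blocks until the
    kernel signals a change (sysfs_notify() -> POLLPRI) and repeats.
    timeout_ms bounds each wait; handle may return False to stop.
    """
    with open(path) as f:
        poller = select.poll()
        poller.register(f.fileno(), select.POLLPRI | select.POLLERR)
        while True:
            f.seek(0)                # sysfs files must be re-read from offset 0
            if handle(f.read().strip()) is False:
                return
            poller.poll(timeout_ms)  # wakes on sysfs_notify() or timeout
```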
* Re: polling mdX/md/degraded in sysfs
From: Alexander Lyakas @ 2012-01-08 9:06 UTC (permalink / raw)
To: Mikhail Balabin; +Cc: linux-raid

Hi,
well, at least according to the 2.6.38-8 kernel code, this attribute is
notified in 3 cases:
# When the array is started (e.g., via the RUN_ARRAY ioctl)
# When a "reshape" is initiated via sysfs
# When a spare is activated after successful completion of
  resync/recover/check/repair

If you want to monitor changes in the array, what works for me is the
following:
# Arrange for some script/executable to be called by the MD monitor
# Every time your script/executable is called, go and check the details
  you are interested in (e.g., mdadm --detail). The MD monitor also
  provides a description of the event (see man mdadm for possible
  events), but at least for me it is not always accurate, especially
  when there are several very fast changes in the array.
# If you want to monitor resync/recover/check/repair progress, you need
  to specify both the --delay and --increment options to the MD monitor.

Alex.

On Thu, Jan 5, 2012 at 10:34 AM, Mikhail Balabin <mbalabin@gmail.com> wrote:
> Hi!
>
> I'm playing around with monitoring software raid status via sysfs
> entries. In my case it's a raid1 array. [...]
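[Alexander's suggestion of hooking a script into the MD monitor could look roughly like this. This is a sketch: mdadm --monitor --program invokes the given program with the event name, the md device, and possibly a member device as arguments, but the handler name and everything beyond that argument convention are assumptions here:]

```python
#!/usr/bin/env python3
# Hypothetical handler for: mdadm --monitor --program /usr/local/bin/md-event
# mdadm invokes it as: md-event <event name> <md device> [<member device>]
import sys

def handle_event(argv):
    """Format the event mdadm reported; a real handler would re-query
    the array state (e.g. run `mdadm --detail <device>`) rather than
    trusting the event description alone."""
    event = argv[0] if argv else "Unknown"
    device = argv[1] if len(argv) > 1 else "?"
    member = argv[2] if len(argv) > 2 else None
    return f"{event} on {device}" + (f" ({member})" if member else "")

if __name__ == "__main__":
    print(handle_event(sys.argv[1:]))
```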
* Re: polling mdX/md/degraded in sysfs
From: Mikhail Balabin @ 2012-01-08 11:37 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid

Hi,

To trigger my script, I was doing mdadm --fail on my array. I have not
waited long enough for the array resync to finish with the script
running, so it's possible that some events can be caught by the script.
I will check it later to make sure.

Still, md.txt states that "any increase or decrease in the count of
missing devices will trigger an event". It is strange that arguably the
most important event, a disk failure, does not trigger a poll. I think
the behavior specified in the documentation is more logical, and the
lack of the event may be considered a (very minor) kernel bug. I use
the 2.6.39 Debian-shipped kernel, by the way.

The workaround is simple, though: polling /proc/mdstat works fine for
both the disk failure and disk resync events. After detecting an event
I can read the /sys entries, which is much more comfortable than
parsing the human-readable /proc/mdstat.

I tried mdadm --monitor first, but it did not fully suit my needs. The
story is, I have been running a raid1 array on my workstation for about
a year now. Some time ago one of the disks started failing, but I
noticed the failure only a month or so later. So I decided that I need
a small tool to monitor the array's health. I thought that mdadm's
email notification is a somewhat clumsy and unreliable solution for a
workstation. mdadm --program can pop up a message, but that does not
work if the array is already degraded at startup (if the array was shut
down uncleanly as a result of a power failure, for example). mdadm is
typically started before the graphical shell, so I could not see a
popup message in this case.

So I've hacked up a small script displaying a system tray icon which
turns red when something bad happens to my array. A nice little project
to do if you've caught a cold and are staying home over the new year's
holidays :)

Mikhail Balabin

2012/1/8 Alexander Lyakas <alex.bolshoy@gmail.com>:
> Hi,
> well, at least according to 2.6.38-8 kernel code, this attribute is
> notified in 3 cases: [...]
* Re: polling mdX/md/degraded in sysfs
From: Alexander Lyakas @ 2012-01-08 17:19 UTC (permalink / raw)
To: Mikhail Balabin; +Cc: linux-raid

> The workaround is simple, though: polling /proc/mdstat works fine for
> both disk failure and disk resync event.
That's exactly what mdadm --monitor does, BTW.

> After the detection of an event I can read /sys entries, it's much
> more comfortable than parsing human-readable /proc/mdstat.
Yes, and the internal MD ioctls are even better (but you need to code C).

Alex.
* Re: polling mdX/md/degraded in sysfs
From: NeilBrown @ 2012-01-09 0:44 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: Mikhail Balabin, linux-raid

On Sun, 8 Jan 2012 11:06:59 +0200 Alexander Lyakas <alex.bolshoy@gmail.com> wrote:

> Hi,
> well, at least according to 2.6.38-8 kernel code, this attribute is
> notified in 3 cases:
> # When the array is started (e.g., via RUN_ARRAY ioctl)
> # When "reshape" is initiated via sysfs
> # When a spare is activated after successful completion of
>   resync/recover/check/repair

Hmm... it really should notify when a disk fails.

Mikhail: could you please test whether this patch makes it work better
for you?

Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1c1c562..33aa06f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7383,6 +7383,7 @@ static int remove_and_add_spares(struct mddev *mddev)
 {
 	struct md_rdev *rdev;
 	int spares = 0;
+	int removed = 0;
 
 	mddev->curr_resync_completed = 0;
 
@@ -7396,8 +7397,13 @@ static int remove_and_add_spares(struct mddev *mddev)
 				    mddev, rdev) == 0) {
 				sysfs_unlink_rdev(mddev, rdev);
 				rdev->raid_disk = -1;
+				removed++;
 			}
 		}
+	if (removed)
+		sysfs_notify(&mddev->kobj, NULL,
+			     "degraded");
+
 	list_for_each_entry(rdev, &mddev->disks, same_set) {
 		if (rdev->raid_disk >= 0 &&
* Re: polling mdX/md/degraded in sysfs
From: Mikhail Balabin @ 2012-01-09 10:35 UTC (permalink / raw)
To: NeilBrown; +Cc: Alexander Lyakas, linux-raid

2012/1/9 NeilBrown <neilb@suse.de>:
> Hmm... it really should notify when a disk fails.
>
> Mikhail: could you please test if this patch makes it work better for you?

Ok, I'll try to test the patch against 3.2 today.

Mikhail Balabin
* Re: polling mdX/md/degraded in sysfs
From: Mikhail Balabin @ 2012-01-10 15:19 UTC (permalink / raw)
To: NeilBrown, Alexander Lyakas; +Cc: linux-raid

The patch has fixed the issue. I suppose it should go to the mainline.
Alexander, Neil, thanks for your efforts.

Mikhail Balabin