* mdadm udev rules change @ 2009-04-06 14:57 Doug Ledford 2009-04-08 7:31 ` Neil Brown 0 siblings, 1 reply; 4+ messages in thread From: Doug Ledford @ 2009-04-06 14:57 UTC (permalink / raw) To: LinuxRaid RAID; +Cc: Neil Brown [-- Attachment #1: Type: text/plain, Size: 1262 bytes --] I'm not attaching a patch for this because it's so simple. Long story short, watching both add and change events in udev rules is bad for md devices. Specifically, the kernel will generate a change event on things like array stop, and on things like fdisk close. In the case of array stop, it can result in the array being assembled again immediately. In the case of fdisk close, the situation is worse. Let's say you stop all the md devices on some block device in order to repartition. You run fdisk, change the partition table, then issue a write of the table. The write of the table triggers the change event *before* the kernel updates the partition table in memory for the block device, causing udev to rerun the incremental rules on the old partition table and restart all the arrays you just stopped with the old partition table layout, at which point the kernel is unable to reread the partition table. So, once you've enable incremental assembly, it becomes apparent that what we really want is to only start devices on add, not on add|change. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 203 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: mdadm udev rules change 2009-04-06 14:57 mdadm udev rules change Doug Ledford @ 2009-04-08 7:31 ` Neil Brown 2009-04-09 19:20 ` Doug Ledford 0 siblings, 1 reply; 4+ messages in thread From: Neil Brown @ 2009-04-08 7:31 UTC (permalink / raw) To: Doug Ledford; +Cc: LinuxRaid RAID On Monday April 6, dledford@redhat.com wrote: > I'm not attaching a patch for this because it's so simple. Long story > short, watching both add and change events in udev rules is bad for md > devices. Specifically, the kernel will generate a change event on > things like array stop, and on things like fdisk close. In the case > of array stop, it can result in the array being assembled again > immediately. In the case of fdisk close, the situation is worse. > Let's say you stop all the md devices on some block device in order to > repartition. You run fdisk, change the partition table, then issue a > write of the table. The write of the table triggers the change event > *before* the kernel updates the partition table in memory for the > block device, causing udev to rerun the incremental rules on the old > partition table and restart all the arrays you just stopped with the > old partition table layout, at which point the kernel is unable to > reread the partition table. So, once you've enable incremental > assembly, it becomes apparent that what we really want is to only > start devices on add, not on add|change. So you aren't really talking about events on md devices, but rather on events that might become components of md devices - correct? So in udev-md-raid.rules we currently have .... ACTION!="add|change", GOTO="md_end" # import data from a raid member and activate it #ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm --examine --export $tempnode", RUN+="/sbin/mdadm --incremental $env{DEVNAME}" # import data from a raid set KERNEL!="md*", GOTO="md_end" ...... And you are saying that if we uncomment the "#ENV{...." line, then we only want it to fire on add devices. So something like: .... ACTION!="add|change", GOTO="md_end" +ACTION=="change", GOTO="no_incr" # import data from a raid member and activate it #ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm --examine --export $tempnode", RUN+="/sbin/mdadm --incremental $env{DEVNAME}" # import data from a raid set +LABEL="no_incr" KERNEL!="md*", GOTO="md_end" ...... Is that correct? Your explanation certainly seems reasonable. Thanks. NeilBrown ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: mdadm udev rules change 2009-04-08 7:31 ` Neil Brown @ 2009-04-09 19:20 ` Doug Ledford 2009-04-10 9:22 ` Neil Brown 0 siblings, 1 reply; 4+ messages in thread From: Doug Ledford @ 2009-04-09 19:20 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid RAID [-- Attachment #1: Type: text/plain, Size: 3615 bytes --] On Apr 8, 2009, at 3:31 AM, Neil Brown wrote: > On Monday April 6, dledford@redhat.com wrote: >> I'm not attaching a patch for this because it's so simple. Long >> story >> short, watching both add and change events in udev rules is bad for >> md >> devices. Specifically, the kernel will generate a change event on >> things like array stop, and on things like fdisk close. In the case >> of array stop, it can result in the array being assembled again >> immediately. In the case of fdisk close, the situation is worse. >> Let's say you stop all the md devices on some block device in order >> to >> repartition. You run fdisk, change the partition table, then issue a >> write of the table. The write of the table triggers the change event >> *before* the kernel updates the partition table in memory for the >> block device, causing udev to rerun the incremental rules on the old >> partition table and restart all the arrays you just stopped with the >> old partition table layout, at which point the kernel is unable to >> reread the partition table. So, once you've enable incremental >> assembly, it becomes apparent that what we really want is to only >> start devices on add, not on add|change. > > So you aren't really talking about events on md devices, but rather on > events that might become components of md devices - correct? > So in udev-md-raid.rules we currently have > .... > ACTION!="add|change", GOTO="md_end" > > # import data from a raid member and activate it > #ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm > --examine --export $tempnode", RUN+="/sbin/mdadm --incremental > $env{DEVNAME}" > # import data from a raid set > KERNEL!="md*", GOTO="md_end" > ...... > > And you are saying that if we uncomment the "#ENV{...." line, then we > only want it to fire on add devices. So something like: > > .... > ACTION!="add|change", GOTO="md_end" > +ACTION=="change", GOTO="no_incr" > > # import data from a raid member and activate it > #ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm > --examine --export $tempnode", RUN+="/sbin/mdadm --incremental > $env{DEVNAME}" > # import data from a raid set > +LABEL="no_incr" > KERNEL!="md*", GOTO="md_end" > ...... > > Is that correct? > > Your explanation certainly seems reasonable. As I was reading through this, I certainly thought the above implementation was reasonable, and then I got this new bug report: https://bugzilla.redhat.com/show_bug.cgi?id=495034 So, mdadm-3.0-0.devel3.5 leaves the default udev rules file in place and only defines a little 70-mdadm-incremental.rules that only does the incremental assembly. That file only watches for add events. The earlier mdadm that the user says doesn't have the bug replaces the existing rules file entirely and uncomments the incremental part. Long story short, it would appear the add|change in the normal rules file is possible responsible for a long string of these: # udevadm monitor .... UDEV [1239266575.473953] change /devices/virtual/block/md5 (block) UEVENT[1239266575.510739] change /devices/virtual/block/md5 (block) UDEV [1239266575.516114] change /devices/virtual/block/md5 (block) UEVENT[1239266575.552984] change /devices/virtual/block/md5 (block) .... I would do some investigation on your own machines, but given udev change semantics, I think the proper course of action is simply to not watch changes. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 203 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: mdadm udev rules change 2009-04-09 19:20 ` Doug Ledford @ 2009-04-10 9:22 ` Neil Brown 0 siblings, 0 replies; 4+ messages in thread From: Neil Brown @ 2009-04-10 9:22 UTC (permalink / raw) To: Doug Ledford; +Cc: LinuxRaid RAID On Thursday April 9, dledford@redhat.com wrote: > On Apr 8, 2009, at 3:31 AM, Neil Brown wrote: > > > > And you are saying that if we uncomment the "#ENV{...." line, then we > > only want it to fire on add devices. So something like: > > > > .... > > ACTION!="add|change", GOTO="md_end" > > +ACTION=="change", GOTO="no_incr" > > > > # import data from a raid member and activate it > > #ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm > > --examine --export $tempnode", RUN+="/sbin/mdadm --incremental > > $env{DEVNAME}" > > # import data from a raid set > > +LABEL="no_incr" > > KERNEL!="md*", GOTO="md_end" > > ...... > > > > Is that correct? > > > > Your explanation certainly seems reasonable. > > As I was reading through this, I certainly thought the above > implementation was reasonable, and then I got this new bug report: > > https://bugzilla.redhat.com/show_bug.cgi?id=495034 I'm guessing this bug is because the line TEST!="md/array_state", GOTO="md_end" is missing from the udev rules file?? It is a relatively recent addition. > > I would do some investigation on your own machines, but given udev > change semantics, I think the proper course of action is simply to not > watch changes. We absolutely have to watch change events on md devices. When the "new" event happens, the array doesn't exist yet. We have just started building it. Once it is fully assembled and started a change event happens and it is on that event that we create /dev files, look for partitions, and mount filesystems. NeilBrown ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-04-10 9:22 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-04-06 14:57 mdadm udev rules change Doug Ledford 2009-04-08 7:31 ` Neil Brown 2009-04-09 19:20 ` Doug Ledford 2009-04-10 9:22 ` Neil Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).