* Minor mdadm fixes
@ 2010-01-11 20:38 Doug Ledford
  2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
                   ` (6 more replies)
  0 siblings, 7 replies; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC
  To: linux-raid

These are a number of minor fixes we carry in our mdadm at the moment.
Would prefer not to carry them ourselves ;-)

Neil, any clue when you think you might release mdadm-3.1.2?

* [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
  2010-01-11 20:38 Minor mdadm fixes Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
  2010-01-18 22:01   ` Neil Brown
  2010-01-18 22:13   ` Dan Williams
  2010-01-11 20:38 ` [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot Doug Ledford
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC
  To: linux-raid; +Cc: Doug Ledford

Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 super-intel.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index d6951cc..fcf438c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 	dd->fd = fd;
 	dd->e = NULL;
 	rv = imsm_read_serial(fd, devname, dd->serial);
-	if (rv) {
+	if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
+		memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
+		fd2devname(fd, (char *) dd->serial);
+	} else if (rv) {
 		fprintf(stderr,
 			Name ": failed to retrieve scsi serial, aborting\n");
 		free(dd);
-- 
1.6.5.2

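For readers following the logic of the hunk above, here is a minimal standalone sketch of the fallback it adds: when the serial query fails and the opt-in is present, the device name is stored in the serial field instead of aborting. This is not mdadm code. The names `env_enabled`, `serial_or_devname` and `SERIAL_LEN` are hypothetical, the buffer size is an assumption, and the opt-in is passed in as a parameter so the decision is easy to exercise on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed size, standing in for MAX_RAID_SERIAL_LEN. */
#define SERIAL_LEN 16

/* Hypothetical stand-in for check_env(): true when the variable is "1". */
int env_enabled(const char *name)
{
	const char *val = getenv(name);

	return val != NULL && strcmp(val, "1") == 0;
}

/*
 * query_rv:      result of the serial-number query (0 means it succeeded
 *                and serial[] is already filled in).
 * allow_devname: non-zero if the caller opted in to the fallback.
 * Returns 0 when serial[] is usable, non-zero when the caller should abort.
 */
int serial_or_devname(int query_rv, int allow_devname,
		      const char *devname, unsigned char serial[SERIAL_LEN])
{
	if (query_rv == 0)
		return 0;

	if (allow_devname) {
		/* Zero first so the copied name is always NUL-terminated. */
		memset(serial, 0, SERIAL_LEN);
		strncpy((char *)serial, devname, SERIAL_LEN - 1);
		return 0;
	}

	fprintf(stderr, "failed to retrieve serial for %s, aborting\n",
		devname);
	return query_rv;
}
```

A caller would pass `env_enabled("IMSM_DEVNAME_AS_SERIAL")` as the second argument and abort only on a non-zero return.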
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries. 2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford @ 2010-01-18 22:01 ` Neil Brown 2010-01-18 22:13 ` Dan Williams 1 sibling, 0 replies; 66+ messages in thread From: Neil Brown @ 2010-01-18 22:01 UTC (permalink / raw) Cc: linux-raid, Doug Ledford On Mon, 11 Jan 2010 15:38:10 -0500 Doug Ledford <dledford@redhat.com> wrote: Applied, thanks. NeilBrown > Signed-off-by: Doug Ledford <dledford@redhat.com> > --- > super-intel.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/super-intel.c b/super-intel.c > index d6951cc..fcf438c 100644 > --- a/super-intel.c > +++ b/super-intel.c > @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk, > dd->fd = fd; > dd->e = NULL; > rv = imsm_read_serial(fd, devname, dd->serial); > - if (rv) { > + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) { > + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN); > + fd2devname(fd, (char *) dd->serial); > + } else if (rv) { > fprintf(stderr, > Name ": failed to retrieve scsi serial, aborting\n"); > free(dd); ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries. 2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford 2010-01-18 22:01 ` Neil Brown @ 2010-01-18 22:13 ` Dan Williams 2010-01-19 1:55 ` Doug Ledford 1 sibling, 1 reply; 66+ messages in thread From: Dan Williams @ 2010-01-18 22:13 UTC (permalink / raw) To: Doug Ledford; +Cc: linux-raid Hi Doug, On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote: > Signed-off-by: Doug Ledford <dledford@redhat.com> > --- > super-intel.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/super-intel.c b/super-intel.c > index d6951cc..fcf438c 100644 > --- a/super-intel.c > +++ b/super-intel.c > @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk, > dd->fd = fd; > dd->e = NULL; > rv = imsm_read_serial(fd, devname, dd->serial); > - if (rv) { > + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) { > + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN); > + fd2devname(fd, (char *) dd->serial); > + } else if (rv) { This just duplicates the check already inside imsm_read_serial(). Containers on loopback devices worked before this patch, so I'll send a revert. -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries. 2010-01-18 22:13 ` Dan Williams @ 2010-01-19 1:55 ` Doug Ledford 2010-01-19 4:42 ` Dan Williams 0 siblings, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-01-19 1:55 UTC (permalink / raw) To: Dan Williams; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 1589 bytes --] On 01/18/2010 05:13 PM, Dan Williams wrote: > Hi Doug, > > On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote: >> Signed-off-by: Doug Ledford <dledford@redhat.com> >> --- >> super-intel.c | 5 ++++- >> 1 files changed, 4 insertions(+), 1 deletions(-) >> >> diff --git a/super-intel.c b/super-intel.c >> index d6951cc..fcf438c 100644 >> --- a/super-intel.c >> +++ b/super-intel.c >> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk, >> dd->fd = fd; >> dd->e = NULL; >> rv = imsm_read_serial(fd, devname, dd->serial); >> - if (rv) { >> + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) { >> + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN); >> + fd2devname(fd, (char *) dd->serial); >> + } else if (rv) { > > This just duplicates the check already inside imsm_read_serial(). > Containers on loopback devices worked before this patch, so I'll send > a revert. > > -- > Dan Me thinks you didn't try it, because this does not duplicate the code in imsm_read_serial(). That code is needed to assemble an IMSM array that already exists on loopback devices. This is needed to *create* an imsm container on fresh loopback devices. I'm assuming your imsm container superblocks already existed or some such. 
-- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
  2010-01-19  1:55     ` Doug Ledford
@ 2010-01-19  4:42       ` Dan Williams
  2010-01-19  5:31         ` Doug Ledford
  0 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2010-01-19 4:42 UTC
  To: Doug Ledford; +Cc: linux-raid

On Mon, Jan 18, 2010 at 6:55 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 01/18/2010 05:13 PM, Dan Williams wrote:
>> Hi Doug,
>>
>> On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote:
>>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>>> ---
>>>  super-intel.c |    5 ++++-
>>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/super-intel.c b/super-intel.c
>>> index d6951cc..fcf438c 100644
>>> --- a/super-intel.c
>>> +++ b/super-intel.c
>>> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
>>>  	dd->fd = fd;
>>>  	dd->e = NULL;
>>>  	rv = imsm_read_serial(fd, devname, dd->serial);
>>> -	if (rv) {
>>> +	if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
>>> +		memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
>>> +		fd2devname(fd, (char *) dd->serial);
>>> +	} else if (rv) {
>>
>> This just duplicates the check already inside imsm_read_serial().
>> Containers on loopback devices worked before this patch, so I'll send
>> a revert.
>>
>> --
>> Dan
>
> Me thinks you didn't try it, because this does not duplicate the code in
> imsm_read_serial().  That code is needed to assemble an IMSM array that
> already exists on loopback devices.  This is needed to *create* an imsm
> container on fresh loopback devices.  I'm assuming your imsm container
> superblocks already existed or some such.
>

Me thinks you did not try it either :-)

# export IMSM_DEVNAME_AS_SERIAL=1
# mdadm --zero-superblock /dev/loop[0-3]
# mdadm -Eb /dev/loop[0-4]
# mdadm -E /dev/loop0
mdadm: No md superblock detected on /dev/loop0.
# mdadm --create /dev/md/imsm /dev/loop[0-3] -n 4 -e imsm
mdadm: /dev/loop0 appears to contain an ext2fs file system
    size=306816K  mtime=Sat Nov 21 10:54:51 2009
mdadm: /dev/loop1 appears to contain an ext2fs file system
    size=306816K  mtime=Sat Nov 21 10:54:51 2009
mdadm: imsm unable to enumerate platform support
    array may not be compatible with hardware/firmware
Continue creating array? y
mdadm: container /dev/md/imsm prepared.
# mdadm -Eb /dev/loop[0-3]
ARRAY metadata=imsm
   spares=4

# mdadm -E /dev/loop0
/dev/loop0:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.0.00
    Orig Family : 00000000
         Family : 697a43ec
     Generation : 00000001
           UUID : ffffffff:ffffffff:ffffffff:ffffffff
       Checksum : c3d8d367 correct
    MPB Sectors : 1
          Disks : 1
   RAID Devices : 0

  Disk00 Serial : /dev/loop0
          State : spare
             Id : 00000000
    Usable Size : 204382 (99.81 MiB 104.64 MB)

This is with:
commit 6acad4811b06335a2602fa1eeaec3a8f47f96591
Author: Michael Evan <mjevans1983@gmail.com>
Date:   Wed Dec 9 21:52:18 2009 -0800

--
Dan

* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries. 2010-01-19 4:42 ` Dan Williams @ 2010-01-19 5:31 ` Doug Ledford 2010-01-19 5:47 ` Dan Williams 0 siblings, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-01-19 5:31 UTC (permalink / raw) To: Dan Williams, Linux RAID Mailing List [-- Attachment #1: Type: text/plain, Size: 3495 bytes --] On 01/18/2010 11:42 PM, Dan Williams wrote: > On Mon, Jan 18, 2010 at 6:55 PM, Doug Ledford <dledford@redhat.com> wrote: >> On 01/18/2010 05:13 PM, Dan Williams wrote: >>> Hi Doug, >>> >>> On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote: >>>> Signed-off-by: Doug Ledford <dledford@redhat.com> >>>> --- >>>> super-intel.c | 5 ++++- >>>> 1 files changed, 4 insertions(+), 1 deletions(-) >>>> >>>> diff --git a/super-intel.c b/super-intel.c >>>> index d6951cc..fcf438c 100644 >>>> --- a/super-intel.c >>>> +++ b/super-intel.c >>>> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk, >>>> dd->fd = fd; >>>> dd->e = NULL; >>>> rv = imsm_read_serial(fd, devname, dd->serial); >>>> - if (rv) { >>>> + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) { >>>> + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN); >>>> + fd2devname(fd, (char *) dd->serial); >>>> + } else if (rv) { >>> >>> This just duplicates the check already inside imsm_read_serial(). >>> Containers on loopback devices worked before this patch, so I'll send >>> a revert. >>> >>> -- >>> Dan >> >> Me thinks you didn't try it, because this does not duplicate the code in >> imsm_read_serial(). That code is needed to assemble an IMSM array that >> already exists on loopback devices. This is needed to *create* an imsm >> container on fresh loopback devices. I'm assuming your imsm container >> superblocks already existed or some such. 
>> > > Me thinks you did not try it either :-) > > # export IMSM_DEVNAME_AS_SERIAL=1 > # mdadm --zero-superblock /dev/loop[0-3] > # mdadm -Eb /dev/loop[0-4] > # mdadm -E /dev/loop0 > mdadm: No md superblock detected on /dev/loop0. > # mdadm --create /dev/md/imsm /dev/loop[0-3] -n 4 -e imsm > mdadm: /dev/loop0 appears to contain an ext2fs file system > size=306816K mtime=Sat Nov 21 10:54:51 2009 > mdadm: /dev/loop1 appears to contain an ext2fs file system > size=306816K mtime=Sat Nov 21 10:54:51 2009 > mdadm: imsm unable to enumerate platform support > array may not be compatible with hardware/firmware > Continue creating array? y > mdadm: container /dev/md/imsm prepared. > # mdadm -Eb /dev/loop[0-3] > ARRAY metadata=imsm > spares=4 > > # mdadm -E /dev/loop0 > /dev/loop0: > Magic : Intel Raid ISM Cfg Sig. > Version : 1.0.00 > Orig Family : 00000000 > Family : 697a43ec > Generation : 00000001 > UUID : ffffffff:ffffffff:ffffffff:ffffffff > Checksum : c3d8d367 correct > MPB Sectors : 1 > Disks : 1 > RAID Devices : 0 > > Disk00 Serial : /dev/loop0 > State : spare > Id : 00000000 > Usable Size : 204382 (99.81 MiB 104.64 MB) > > This is with: > commit 6acad4811b06335a2602fa1eeaec3a8f47f96591 > Author: Michael Evan <mjevans1983@gmail.com> > Date: Wed Dec 9 21:52:18 2009 -0800 > > -- > Dan Ah, OK. I did say we had been carrying this around in our SRPM for a while, I just hadn't tried removing it since it was necessary. I take it you are implying that that changeset is the one that rendered it no longer necessary? -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
  2010-01-19  5:31         ` Doug Ledford
@ 2010-01-19  5:47           ` Dan Williams
  0 siblings, 0 replies; 66+ messages in thread
From: Dan Williams @ 2010-01-19 5:47 UTC
  To: Doug Ledford; +Cc: Linux RAID Mailing List

On Mon, Jan 18, 2010 at 10:31 PM, Doug Ledford <dledford@redhat.com> wrote:
> Ah, OK.  I did say we had been carrying this around in our SRPM for a
> while, I just hadn't tried removing it since it was necessary.

No worries, I suspected as much.

> I take
> it you are implying that that changeset is the one that rendered it no
> longer necessary?

Nah, this is just a recent point before Neil applied this patch.  I
did a quick look for the commit that fixed this, but nothing popped
out.

--
Dan

* [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
  2010-01-11 20:38 Minor mdadm fixes Doug Ledford
  2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
  2010-01-18 22:09   ` Neil Brown
  2010-01-11 20:38 ` [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later Doug Ledford
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC
  To: linux-raid; +Cc: Doug Ledford

Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 mdmon.c |   12 ++++++------
 msg.c   |    2 +-
 util.c  |    4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/mdmon.c b/mdmon.c
index 0ec4259..b1d7aef 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -118,7 +118,7 @@ static int test_pidfile(char *devname)
 	char path[100];
 	struct stat st;
 
-	sprintf(path, "/var/run/mdadm/%s.pid", devname);
+	sprintf(path, "/dev/.mdadm/%s.pid", devname);
 	return stat(path, &st);
 }
 
@@ -132,7 +132,7 @@ int make_pidfile(char *devname, int o_excl)
 	if (sigterm)
 		return -1;
 
-	sprintf(path, "/var/run/mdadm/%s.pid", devname);
+	sprintf(path, "/dev/.mdadm/%s.pid", devname);
 
 	fd = open(path, O_RDWR|O_CREAT|o_excl, 0600);
 	if (fd < 0)
@@ -163,7 +163,7 @@ pid_t devname2mdmon(char *devname)
 	pid_t pid = -1;
 	int fd;
 
-	sprintf(buf, "/var/run/mdadm/%s.pid", devname);
+	sprintf(buf, "/dev/.mdadm/%s.pid", devname);
 	fd = open(buf, O_RDONLY|O_NOATIME);
 	if (fd < 0)
 		return -1;
@@ -217,9 +217,9 @@ void remove_pidfile(char *devname)
 	if (sigterm)
 		return;
 
-	sprintf(buf, "/var/run/mdadm/%s.pid", devname);
+	sprintf(buf, "/dev/.mdadm/%s.pid", devname);
 	unlink(buf);
-	sprintf(buf, "/var/run/mdadm/%s.sock", devname);
+	sprintf(buf, "/dev/.mdadm/%s.sock", devname);
 	unlink(buf);
 }
 
@@ -233,7 +233,7 @@ int make_control_sock(char *devname)
 	if (sigterm)
 		return -1;
 
-	sprintf(path, "/var/run/mdadm/%s.sock", devname);
+	sprintf(path, "/dev/.mdadm/%s.sock", devname);
 	unlink(path);
 	sfd = socket(PF_LOCAL, SOCK_STREAM, 0);
 	if (sfd < 0)
diff --git a/msg.c b/msg.c
index 8d52b94..c3ab243 100644
--- a/msg.c
+++ b/msg.c
@@ -147,7 +147,7 @@ int connect_monitor(char *devname)
 	int pos;
 	char *c;
 
-	pos = sprintf(path, "/var/run/mdadm/");
+	pos = sprintf(path, "/dev/.mdadm/");
 	if (is_subarray(devname)) {
 		devname++;
 		c = strchr(devname, '/');
diff --git a/util.c b/util.c
index 5feec43..864af69 100644
--- a/util.c
+++ b/util.c
@@ -1469,7 +1469,7 @@ int mdmon_running(int devnum)
 	char pid[10];
 	int fd;
 	int n;
-	sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
+	sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
 	fd = open(path, O_RDONLY, 0);
 
 	if (fd < 0)
@@ -1489,7 +1489,7 @@ int signal_mdmon(int devnum)
 	char pid[10];
 	int fd;
 	int n;
-	sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
+	sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
 	fd = open(path, O_RDONLY, 0);
 
 	if (fd < 0)
-- 
1.6.5.2

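The patch above changes the same directory string in eight separate sprintf() calls across three files. As an illustration only, and not something the patch itself does, the sketch below shows how such a path could be built in one place so that the directory is a single definition. `RUN_DIR` and `run_file_path` are hypothetical names; the use of snprintf() with a length check is an assumption of this sketch rather than a description of mdadm.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical single definition of the directory the patch switches to. */
#define RUN_DIR "/dev/.mdadm"

/*
 * Build "<RUN_DIR>/<devname>.<ext>" into buf (ext is e.g. "pid" or "sock").
 * Returns 0 on success, -1 if the result would not fit in len bytes.
 */
int run_file_path(char *buf, size_t len, const char *devname, const char *ext)
{
	int n = snprintf(buf, len, "%s/%s.%s", RUN_DIR, devname, ext);

	/* snprintf returns the length it wanted to write; >= len means truncation. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With a helper along these lines, moving the files again would be a one-line change, which is relevant given how contested the location turns out to be in the replies.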
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
  2010-01-11 20:38 ` [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot Doug Ledford
@ 2010-01-18 22:09   ` Neil Brown
  2010-01-19  7:21     ` Luca Berra
  2010-01-19 17:51     ` Doug Ledford
  0 siblings, 2 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:09 UTC
  Cc: linux-raid, Doug Ledford

On Mon, 11 Jan 2010 15:38:11 -0500
Doug Ledford <dledford@redhat.com> wrote:

> Signed-off-by: Doug Ledford <dledford@redhat.com>

I really really don't like this.
I wasn't very keen on allowing the map file to be found in /dev,
but this is just too ugly.

I understand there is a problem here, but I don't like this approach to a
solution.  I'll give it more thought when I get home from LCA2010 and see
what I can come up with.

Thanks,
NeilBrown

> ---
>  mdmon.c |   12 ++++++------
>  msg.c   |    2 +-
>  util.c  |    4 ++--
>  3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/mdmon.c b/mdmon.c
> index 0ec4259..b1d7aef 100644
> --- a/mdmon.c
> +++ b/mdmon.c
> @@ -118,7 +118,7 @@ static int test_pidfile(char *devname)
>  	char path[100];
>  	struct stat st;
>
> -	sprintf(path, "/var/run/mdadm/%s.pid", devname);
> +	sprintf(path, "/dev/.mdadm/%s.pid", devname);
>  	return stat(path, &st);
>  }
>
> @@ -132,7 +132,7 @@ int make_pidfile(char *devname, int o_excl)
>  	if (sigterm)
>  		return -1;
>
> -	sprintf(path, "/var/run/mdadm/%s.pid", devname);
> +	sprintf(path, "/dev/.mdadm/%s.pid", devname);
>
>  	fd = open(path, O_RDWR|O_CREAT|o_excl, 0600);
>  	if (fd < 0)
> @@ -163,7 +163,7 @@ pid_t devname2mdmon(char *devname)
>  	pid_t pid = -1;
>  	int fd;
>
> -	sprintf(buf, "/var/run/mdadm/%s.pid", devname);
> +	sprintf(buf, "/dev/.mdadm/%s.pid", devname);
>  	fd = open(buf, O_RDONLY|O_NOATIME);
>  	if (fd < 0)
>  		return -1;
> @@ -217,9 +217,9 @@ void remove_pidfile(char *devname)
>  	if (sigterm)
>  		return;
>
> -	sprintf(buf, "/var/run/mdadm/%s.pid", devname);
> +	sprintf(buf, "/dev/.mdadm/%s.pid", devname);
>  	unlink(buf);
> -	sprintf(buf, "/var/run/mdadm/%s.sock", devname);
> +	sprintf(buf, "/dev/.mdadm/%s.sock", devname);
>  	unlink(buf);
>  }
>
> @@ -233,7 +233,7 @@ int make_control_sock(char *devname)
>  	if (sigterm)
>  		return -1;
>
> -	sprintf(path, "/var/run/mdadm/%s.sock", devname);
> +	sprintf(path, "/dev/.mdadm/%s.sock", devname);
>  	unlink(path);
>  	sfd = socket(PF_LOCAL, SOCK_STREAM, 0);
>  	if (sfd < 0)
> diff --git a/msg.c b/msg.c
> index 8d52b94..c3ab243 100644
> --- a/msg.c
> +++ b/msg.c
> @@ -147,7 +147,7 @@ int connect_monitor(char *devname)
>  	int pos;
>  	char *c;
>
> -	pos = sprintf(path, "/var/run/mdadm/");
> +	pos = sprintf(path, "/dev/.mdadm/");
>  	if (is_subarray(devname)) {
>  		devname++;
>  		c = strchr(devname, '/');
> diff --git a/util.c b/util.c
> index 5feec43..864af69 100644
> --- a/util.c
> +++ b/util.c
> @@ -1469,7 +1469,7 @@ int mdmon_running(int devnum)
>  	char pid[10];
>  	int fd;
>  	int n;
> -	sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
> +	sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
>  	fd = open(path, O_RDONLY, 0);
>
>  	if (fd < 0)
> @@ -1489,7 +1489,7 @@ int signal_mdmon(int devnum)
>  	char pid[10];
>  	int fd;
>  	int n;
> -	sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
> +	sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
>  	fd = open(path, O_RDONLY, 0);
>
>  	if (fd < 0)

* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
  2010-01-18 22:09   ` Neil Brown
@ 2010-01-19  7:21     ` Luca Berra
  2010-01-19 17:51     ` Doug Ledford
  1 sibling, 0 replies; 66+ messages in thread
From: Luca Berra @ 2010-01-19 7:21 UTC
  To: linux-raid

On Tue, Jan 19, 2010 at 11:09:30AM +1300, Neil Brown wrote:
>On Mon, 11 Jan 2010 15:38:11 -0500
>Doug Ledford <dledford@redhat.com> wrote:
>
>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>
>I really really don't like this.
>I wasn't very keen on allowing the map file to be found in /dev,
>but this it just too ugly.
>
>I understand there is a problem here, but I don't like this approach to a
>solution.  I'll give it more though when I get home from LCA2010 and see
>what I can come up with.
>
I'll try holding my breath till then :)
well, actually i'll use Doug's patch until a better solution is found.

L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \

* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
  2010-01-18 22:09   ` Neil Brown
  2010-01-19  7:21     ` Luca Berra
@ 2010-01-19 17:51     ` Doug Ledford
  2010-02-01 20:32       ` Bill Davidsen
  2010-02-04  6:40       ` Neil Brown
  1 sibling, 2 replies; 66+ messages in thread
From: Doug Ledford @ 2010-01-19 17:51 UTC
  To: Neil Brown; +Cc: linux-raid

On 01/18/2010 05:09 PM, Neil Brown wrote:
> On Mon, 11 Jan 2010 15:38:11 -0500
> Doug Ledford <dledford@redhat.com> wrote:
>
>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>
> I really really don't like this.
> I wasn't very keen on allowing the map file to be found in /dev,
> but this it just too ugly.

I've had to rewrite my response to this a few times :-/

So, let's be clear: you are objecting to these non device special files
being located under /dev.  Not necessarily *where* they are under /dev,
just that they are under /dev at all.  That's what I get from your
statement above.

First with devfs, then later with udev, the old unix tradition of only
device special files under /dev is truly dead.  And it should be.  The
files we are creating are needed prior to / filesystem bring up, and
they are needed simply in order to fully populate /dev.  In fact, an
argument can be made that a new tradition, that files related to the
creation and maintenance of device special files belong under /dev with
the files they relate to, has been created.  And this new tradition
makes sense and is elegant on the basis that it requires only one
read/write filesystem mount point during device special file
population.  It also makes sense that this new tradition would
supersede the old tradition on the basis that the old tradition was
created prior to the advent of hot plug and the need to have any
read/write data just to populate your device special files.  The old
tradition didn't have the flexibility to deal with modern hot plug
architectures, the new tradition fixes that, and does so as elegantly
as possible.

That being the case, the big player in the game, udev, is following the
new tradition by creating an entire tree of non device special files
under /dev/.udev and using that to store the information it needs.  And
here mdadm/mdmon are, the small players in the device bring up game
that only have minor bit parts compared to udev, holding up progress
and playing the recalcitrant old fart.

Sorry Neil, but the war has already been decided and this is a dead
battle.  Files related to device special file bring up belong under
/dev along with the files we are creating.  Your claim that these
changes are ugly is misplaced and based upon adherence to a dead
tradition that has been replaced by a more sensible tradition.  Maybe
you don't like where they are under /dev, but the fact that they are
under /dev is definitely the right thing to do and is not in the least
bit ugly.

> I understand there is a problem here, but I don't like this approach to a
> solution.  I'll give it more though when I get home from LCA2010 and see
> what I can come up with.

Feel free to come up with something different.  But, if your solution
involves maintaining an additional read/write mount area in deference to
a long dead unix tradition, I'm just going to shake my head and patch
your solution away to something sane.

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
  2010-01-19 17:51     ` Doug Ledford
@ 2010-02-01 20:32       ` Bill Davidsen
  2010-02-01 21:32         ` Doug Ledford
  2010-02-04  6:40       ` Neil Brown
  1 sibling, 1 reply; 66+ messages in thread
From: Bill Davidsen @ 2010-02-01 20:32 UTC
  To: Doug Ledford; +Cc: Neil Brown, linux-raid

Doug Ledford wrote:
> On 01/18/2010 05:09 PM, Neil Brown wrote:
>
>> I understand there is a problem here, but I don't like this approach to a
>> solution.  I'll give it more though when I get home from LCA2010 and see
>> what I can come up with.
>>
>
> Feel free to come up with something different.  But, if your solution
> involves maintaining an additional read/write mount area in deference to
> a long dead unix tradition, I'm just going to shake my head and patch
> your solution away to something sane.
>
>
I don't understand your argument here. Not the one where you say you're
going to ignore Neil and do what you want because you can, I understand
that, but the "additional read/write mount area" part: isn't /var/run
r/w on all systems now? Could you clarify why this is "additional" here?

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein

* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
  2010-02-01 20:32       ` Bill Davidsen
@ 2010-02-01 21:32         ` Doug Ledford
  2010-02-01 22:42           ` Bill Davidsen
  0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-01 21:32 UTC
  To: Bill Davidsen; +Cc: Neil Brown, linux-raid

On 02/01/2010 03:32 PM, Bill Davidsen wrote:
> Doug Ledford wrote:
>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>
>>> I understand there is a problem here, but I don't like this approach
>>> to a
>>> solution.  I'll give it more though when I get home from LCA2010 and see
>>> what I can come up with.
>>>
>>
>> Feel free to come up with something different.  But, if your solution
>> involves maintaining an additional read/write mount area in deference to
>> a long dead unix tradition, I'm just going to shake my head and patch
>> your solution away to something sane.
>>
>>
> I don't understand you argument here. Not the one where you say you're
> going to ignore Neil and do what you want because you can, I understand
> that, but the "additional read/write mount area" part, isn't /var/run
> r/w on all systems now? Could you clarify why this is "additional" here?
>

It's not necessarily read/write in the initrd time frame, and putting
the mdadm files there means it would have to be.  We didn't make these
changes because we wanted to, we made them because using mdadm raid
arrays for the root filesystem combined with incremental assembly or
with imsm raid devices was broken otherwise.

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-01 21:32 ` Doug Ledford @ 2010-02-01 22:42 ` Bill Davidsen 2010-02-02 4:08 ` Michael Evans 2010-02-02 18:07 ` Doug Ledford 0 siblings, 2 replies; 66+ messages in thread From: Bill Davidsen @ 2010-02-01 22:42 UTC (permalink / raw) To: Doug Ledford; +Cc: Neil Brown, linux-raid Doug Ledford wrote: > On 02/01/2010 03:32 PM, Bill Davidsen wrote: > >> Doug Ledford wrote: >> >>> On 01/18/2010 05:09 PM, Neil Brown wrote: >>> >>> >>>> I understand there is a problem here, but I don't like this approach >>>> to a >>>> solution. I'll give it more though when I get home from LCA2010 and see >>>> what I can come up with. >>>> >>>> >>> Feel free to come up with something different. But, if your solution >>> involves maintaining an additional read/write mount area in deference to >>> a long dead unix tradition, I'm just going to shake my head and patch >>> your solution away to something sane. >>> >>> >>> >> I don't understand you argument here. Not the one where you say you're >> going to ignore Neil and do what you want because you can, I understand >> that, but the "additional read/write mount area" part, isn't /var/run >> r/w on all systems now? Could you clarify why this is "additional" here? >> >> > > It's not necessarily read/write in the initrd time frame, and putting > the mdadm files there means it would have to be. We didn't make these > changes because we wanted to, we made them because using mdadm raid > arrays for the root filesystem combined with incremental assembly or > with imsm raid devices was broken otherwise. > > Do understand that my disquiet related to this isn't because you put a non-device in /dev, it's that you didn't put a process PID in /var/run. 
And frankly, once you let (force) one group of threads to be somewhere else, other services will want their PIDs in some other place, and anyone maintaining an application which presents information on what's running will need to know where that information is. In other words, it's not where you put it, it's where you *didn't* put it, which seems to be an invitation to put stuff just anywhere. Neil argues that they are not devices; I argue that they are PIDs. It's not as though it would be a huge effort to move them after pivot_root: it's a little code or script, and it lives in space which will be released. -- Bill Davidsen <davidsen@tmr.com> "We can't solve today's problems by using the same thinking we used in creating them." - Einstein
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-01 22:42 ` Bill Davidsen @ 2010-02-02 4:08 ` Michael Evans 2010-02-02 7:17 ` Luca Berra ` (2 more replies) 2010-02-02 18:07 ` Doug Ledford 1 sibling, 3 replies; 66+ messages in thread From: Michael Evans @ 2010-02-02 4:08 UTC (permalink / raw) To: Bill Davidsen; +Cc: Doug Ledford, Neil Brown, linux-raid On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote: > Doug Ledford wrote: >> >> On 02/01/2010 03:32 PM, Bill Davidsen wrote: >> >>> >>> Doug Ledford wrote: >>> >>>> >>>> On 01/18/2010 05:09 PM, Neil Brown wrote: >>>> >>>>> >>>>> I understand there is a problem here, but I don't like this approach >>>>> to a >>>>> solution. I'll give it more though when I get home from LCA2010 and >>>>> see >>>>> what I can come up with. >>>>> >>>> >>>> Feel free to come up with something different. But, if your solution >>>> involves maintaining an additional read/write mount area in deference to >>>> a long dead unix tradition, I'm just going to shake my head and patch >>>> your solution away to something sane. >>>> >>>> >>> >>> I don't understand you argument here. Not the one where you say you're >>> going to ignore Neil and do what you want because you can, I understand >>> that, but the "additional read/write mount area" part, isn't /var/run >>> r/w on all systems now? Could you clarify why this is "additional" here? >>> >>> >> >> It's not necessarily read/write in the initrd time frame, and putting >> the mdadm files there means it would have to be. We didn't make these >> changes because we wanted to, we made them because using mdadm raid >> arrays for the root filesystem combined with incremental assembly or >> with imsm raid devices was broken otherwise. >> >> > > Do understand that my disquiet related to this isn't because you put a > non-device in /dev, it's that you > didn't put a process PID in /var/run. 
And frankly, once you let (force) one > group of threads to be somewhere > else, other services will want their PIDs some other place, and anyone > maintaining an application > which presents information on what's running will need to know where that > information. > > In other words, it's not where you put it, it's where you *didn't* put it, > that seems to be an > invitation to put stuff just anywhere. Neil argues that they are not > devices, I argue that > they are PIDs. It's not as though it were a huge effort to move it after > pivot root, it's a little code > or script and in space which will be released. > > -- > Bill Davidsen <davidsen@tmr.com> > "We can't solve today's problems by using the same thinking we > used in creating them." - Einstein > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Thank you for stating your concern; knowing that, I think a very plausible solution is obvious.
# at initrd/initramfs creation time
ln -s /dev/.run /var/run
# initrd/initramfs script
mkdir /dev/.run
The usual area becomes a symlink to a memory disk. Most systems have ample memory to support a few extra tiny files there. Cleanup on reboot is automatic. Any systems that are memory constrained probably either already have a drive they could swap this data out to, or would rather keep the writes from reaching flash media anyway.
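The symlink proposal above can be tried out safely with a small sketch. It runs against a scratch directory rather than the real /dev and /var, and it uses a relative link target so that the link resolves inside that scratch tree (a real initramfs would presumably use the absolute /dev/.run target shown in the message). The helper name setup_run_symlink and the demo pid value are illustrative assumptions, not taken from the thread.

```shell
#!/bin/sh
# Sketch only: mimic "/var/run is a symlink to /dev/.run" inside a scratch
# tree. setup_run_symlink is a hypothetical helper name.

setup_run_symlink() {
    sr_root=$1
    # Image-creation time: /var/run becomes a symlink. The target is written
    # relative here so it resolves inside the scratch tree; on a real system
    # it would be the absolute path /dev/.run.
    mkdir -p "$sr_root/dev" "$sr_root/var" || return 1
    ln -s ../dev/.run "$sr_root/var/run" || return 1
    # Boot time (initramfs script): create the directory the link points at.
    mkdir "$sr_root/dev/.run"
}

# Demo: a pid file written through /var/run lands under /dev/.run.
demo_root=$(mktemp -d)
setup_run_symlink "$demo_root"
echo 1234 > "$demo_root/var/run/mdmon.pid"
ls -l "$demo_root/var/run" "$demo_root/dev/.run"
rm -rf "$demo_root"
```

Anything that writes to /var/run in this layout is really writing under /dev, which is why nothing has to be copied after pivot_root; the cost, raised later in the thread, is that subdirectories shipped by packages in /var/run would no longer exist at boot.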
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-02 4:08 ` Michael Evans @ 2010-02-02 7:17 ` Luca Berra 2010-02-02 15:42 ` Bill Davidsen 2010-02-02 18:11 ` Doug Ledford 2 siblings, 0 replies; 66+ messages in thread From: Luca Berra @ 2010-02-02 7:17 UTC (permalink / raw) To: linux-raid; +Cc: initramfs CCing initramfs since it could be of interest. Background: Doug's patch moves the mdmon pid file and socket from /var/run to /dev to preserve them after pivot_root. The rationale is that mdmon gets started in the initramfs when imsm or ddf arrays are activated. Neil does not like the proposed solution. On Mon, Feb 01, 2010 at 08:08:54PM -0800, Michael Evans wrote: >On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote: >> Doug Ledford wrote: >>> >>> On 02/01/2010 03:32 PM, Bill Davidsen wrote: >>> >>>> >>>> Doug Ledford wrote: >>>> >>>>> >>>>> On 01/18/2010 05:09 PM, Neil Brown wrote: >>>>> >>>>>> >>>>>> I understand there is a problem here, but I don't like this approach >>>>>> to a >>>>>> solution. I'll give it more though when I get home from LCA2010 and >>>>>> see >>>>>> what I can come up with. >>>>>> >>>>> >>>>> Feel free to come up with something different. But, if your solution >>>>> involves maintaining an additional read/write mount area in deference to >>>>> a long dead unix tradition, I'm just going to shake my head and patch >>>>> your solution away to something sane. >>>>> >>>>> >>>> >>>> I don't understand you argument here. Not the one where you say you're >>>> going to ignore Neil and do what you want because you can, I understand >>>> that, but the "additional read/write mount area" part, isn't /var/run >>>> r/w on all systems now? Could you clarify why this is "additional" here? >>>> >>>> >>> >>> It's not necessarily read/write in the initrd time frame, and putting >>> the mdadm files there means it would have to be.
We didn't make these >>> changes because we wanted to, we made them because using mdadm raid >>> arrays for the root filesystem combined with incremental assembly or >>> with imsm raid devices was broken otherwise. >>> >>> >> >> Do understand that my disquiet related to this isn't because you put a >> non-device in /dev, it's that you >> didn't put a process PID in /var/run. And frankly, once you let (force) one >> group of threads to be somewhere >> else, other services will want their PIDs some other place, and anyone >> maintaining an application >> which presents information on what's running will need to know where that >> information. >> >> In other words, it's not where you put it, it's where you *didn't* put it, >> that seems to be an >> invitation to put stuff just anywhere. Neil argues that they are not >> devices, I argue that >> they are PIDs. It's not as though it were a huge effort to move it after >> pivot root, it's a little code >> or script and in space which will be released. >> >> -- >> Bill Davidsen <davidsen@tmr.com> >> "We can't solve today's problems by using the same thinking we >> used in creating them." - Einstein >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >Thank you for stating your concern; I think knowing that a very >plausible solution is obvious. > ># at initrd/initramfs creation time >ln -s /dev/.run /var/run > >#initrd/initramfs script >mkdir /dev/.run > >The usual area becomes a symlink to a memory disk .Most systems have >ample memory to support a few extra tiny files there. Cleanup on >reboot is automatic. Any systems that are memory constrained probably >already either have a drive they could swap this data out to, or would >rather save the writes from reaching flash media anyway. 
This could be interesting, but then you have to move things back to /var/run after pivot_root, and we cannot move a socket. Still, if it would suit both parties, we could:
- keep the mdmon pid file in /var/run and use initramfs magic to preserve those pid files; this could become a standard solution when the next daemon arrives that needs to start from initramfs.
- put the socket in /dev/md, and I defy anyone to say sockets do not belong in /dev, bringing forth the syslog daemon as a witness.
Regards, L. -- Luca Berra -- bluca@comedia.it Communication Media & Services S.r.l.
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-02 4:08 ` Michael Evans 2010-02-02 7:17 ` Luca Berra @ 2010-02-02 15:42 ` Bill Davidsen 2010-02-02 18:19 ` Doug Ledford 2010-02-02 18:11 ` Doug Ledford 2 siblings, 1 reply; 66+ messages in thread From: Bill Davidsen @ 2010-02-02 15:42 UTC (permalink / raw) To: Michael Evans; +Cc: Doug Ledford, Neil Brown, linux-raid Michael Evans wrote: > On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote: > >> Doug Ledford wrote: >> >>> On 02/01/2010 03:32 PM, Bill Davidsen wrote: >>> >>> >>>> Doug Ledford wrote: >>>> >>>> >>>>> On 01/18/2010 05:09 PM, Neil Brown wrote: >>>>> >>>>> >>>>>> I understand there is a problem here, but I don't like this approach >>>>>> to a >>>>>> solution. I'll give it more though when I get home from LCA2010 and >>>>>> see >>>>>> what I can come up with. >>>>>> >>>>>> >>>>> Feel free to come up with something different. But, if your solution >>>>> involves maintaining an additional read/write mount area in deference to >>>>> a long dead unix tradition, I'm just going to shake my head and patch >>>>> your solution away to something sane. >>>>> >>>>> >>>>> >>>> I don't understand you argument here. Not the one where you say you're >>>> going to ignore Neil and do what you want because you can, I understand >>>> that, but the "additional read/write mount area" part, isn't /var/run >>>> r/w on all systems now? Could you clarify why this is "additional" here? >>>> >>>> >>>> >>> It's not necessarily read/write in the initrd time frame, and putting >>> the mdadm files there means it would have to be. We didn't make these >>> changes because we wanted to, we made them because using mdadm raid >>> arrays for the root filesystem combined with incremental assembly or >>> with imsm raid devices was broken otherwise. 
>>> >>> >>> >> Do understand that my disquiet related to this isn't because you put a >> non-device in /dev, it's that you >> didn't put a process PID in /var/run. And frankly, once you let (force) one >> group of threads to be somewhere >> else, other services will want their PIDs some other place, and anyone >> maintaining an application >> which presents information on what's running will need to know where that >> information. >> >> In other words, it's not where you put it, it's where you *didn't* put it, >> that seems to be an >> invitation to put stuff just anywhere. Neil argues that they are not >> devices, I argue that >> they are PIDs. It's not as though it were a huge effort to move it after >> pivot root, it's a little code >> or script and in space which will be released. >> >> -- >> Bill Davidsen <davidsen@tmr.com> >> "We can't solve today's problems by using the same thinking we >> used in creating them." - Einstein >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > > Thank you for stating your concern; I think knowing that a very > plausible solution is obvious. > > # at initrd/initramfs creation time > ln -s /dev/.run /var/run > > #initrd/initramfs script > mkdir /dev/.run > > The usual area becomes a symlink to a memory disk .Most systems have > ample memory to support a few extra tiny files there. Cleanup on > reboot is automatic. Any systems that are memory constrained probably > already either have a drive they could swap this data out to, or would > rather save the writes from reaching flash media anyway. > > The only possible side effect of that is that applications which put information in /var/run/subdir would have to create the subdir at run time rather than at the time of installing the application. 
And looking at my /var/run directory, many applications do seem to have subdirectories in /var/run which were created when the applications were installed. I count 31 on this system; a quick check on other systems reveals up to 41, and 14-24 of those directories have not been used since the system was installed. That is, the applications have never been run. Does it really make sense to force modification of every application which installs a subdirectory in /var/run, and incur the overhead in each of those applications of checking for the directory and creating it if missing, as opposed to a single line in an init script to copy the boot time PID files from /dev to /var/run? It seems as if a lot of work and overhead is being generated for the applications, just to save a tiny bit of work for the people implementing a new boot procedure.
(cd /dev/.run && find . -depth | cpio -pdm /var/run) && rm -rf /dev/.run
Not only would this need a change in Fedora packages, but anyone writing a package for Linux in general would have to do it "the Fedora way", and even though Fedora is popular, I think some applications would choose to avoid the overhead and need ugly hacks in rc.local to create the directories at boot. All in all, I think the overhead belongs in the boot process, not in all the existing applications. -- Bill Davidsen <davidsen@tmr.com> "We can't solve today's problems by using the same thinking we used in creating them." - Einstein
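The copy-back step argued for above can be sketched as a small function. This is a minimal sketch under stated assumptions: it swaps the find | cpio pipeline for a plain `cp -Rp`, it works on a scratch tree instead of the real /dev/.run and /var/run, and the names migrate_run_files and the cups subdirectory are made up for illustration. Because a copy leaves the originals in place, the staging tree is removed with `rm -rf` rather than `rmdir`. Note Luca's caveat from earlier in the thread: a socket cannot be carried over this way.

```shell
#!/bin/sh
# Sketch only: copy boot-time runtime files from a /dev/.run-style staging
# directory into /var/run once the real root is mounted, then drop the
# staging copy. migrate_run_files is a hypothetical helper name.

migrate_run_files() {
    mr_src=$1
    mr_dst=$2
    # Nothing was staged during early boot: nothing to do.
    [ -d "$mr_src" ] || return 0
    mkdir -p "$mr_dst" || return 1
    # "src/." copies the directory contents (dotfiles included) into the
    # destination; -p keeps modes and timestamps. Entries that already
    # exist only in the destination are left alone.
    cp -Rp "$mr_src/." "$mr_dst/" || return 1
    # A copy leaves the originals behind, so remove the whole tree.
    rm -rf "$mr_src"
}

# Demo in a scratch tree standing in for / after pivot_root.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/dev/.run/mdmon" "$demo_root/var/run/cups"
echo 321 > "$demo_root/dev/.run/mdmon/md0.pid"
migrate_run_files "$demo_root/dev/.run" "$demo_root/var/run"
find "$demo_root/var/run" | sort
rm -rf "$demo_root"
```

The subdirectory that was installed into /var/run ahead of time survives untouched, which is the property the message above is arguing for.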
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-02 15:42 ` Bill Davidsen @ 2010-02-02 18:19 ` Doug Ledford 2010-02-04 13:50 ` Bernd Schubert 0 siblings, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-02-02 18:19 UTC (permalink / raw) To: Bill Davidsen; +Cc: Michael Evans, Neil Brown, linux-raid [-- Attachment #1: Type: text/plain, Size: 6481 bytes --] On 02/02/2010 10:42 AM, Bill Davidsen wrote: > Michael Evans wrote: >> On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote: >> >>> Doug Ledford wrote: >>> >>>> On 02/01/2010 03:32 PM, Bill Davidsen wrote: >>>> >>>> >>>>> Doug Ledford wrote: >>>>> >>>>> >>>>>> On 01/18/2010 05:09 PM, Neil Brown wrote: >>>>>> >>>>>> >>>>>>> I understand there is a problem here, but I don't like this approach >>>>>>> to a >>>>>>> solution. I'll give it more though when I get home from LCA2010 and >>>>>>> see >>>>>>> what I can come up with. >>>>>>> >>>>>>> >>>>>> Feel free to come up with something different. But, if your solution >>>>>> involves maintaining an additional read/write mount area in >>>>>> deference to >>>>>> a long dead unix tradition, I'm just going to shake my head and patch >>>>>> your solution away to something sane. >>>>>> >>>>>> >>>>>> >>>>> I don't understand you argument here. Not the one where you say you're >>>>> going to ignore Neil and do what you want because you can, I >>>>> understand >>>>> that, but the "additional read/write mount area" part, isn't /var/run >>>>> r/w on all systems now? Could you clarify why this is "additional" >>>>> here? >>>>> >>>>> >>>>> >>>> It's not necessarily read/write in the initrd time frame, and putting >>>> the mdadm files there means it would have to be. We didn't make these >>>> changes because we wanted to, we made them because using mdadm raid >>>> arrays for the root filesystem combined with incremental assembly or >>>> with imsm raid devices was broken otherwise. 
>>>> >>>> >>>> >>> Do understand that my disquiet related to this isn't because you put a >>> non-device in /dev, it's that you >>> didn't put a process PID in /var/run. And frankly, once you let >>> (force) one >>> group of threads to be somewhere >>> else, other services will want their PIDs some other place, and anyone >>> maintaining an application >>> which presents information on what's running will need to know where >>> that >>> information. >>> >>> In other words, it's not where you put it, it's where you *didn't* >>> put it, >>> that seems to be an >>> invitation to put stuff just anywhere. Neil argues that they are not >>> devices, I argue that >>> they are PIDs. It's not as though it were a huge effort to move it after >>> pivot root, it's a little code >>> or script and in space which will be released. >>> >>> -- >>> Bill Davidsen <davidsen@tmr.com> >>> "We can't solve today's problems by using the same thinking we >>> used in creating them." - Einstein >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >> >> Thank you for stating your concern; I think knowing that a very >> plausible solution is obvious. >> >> # at initrd/initramfs creation time >> ln -s /dev/.run /var/run >> >> #initrd/initramfs script >> mkdir /dev/.run >> >> The usual area becomes a symlink to a memory disk .Most systems have >> ample memory to support a few extra tiny files there. Cleanup on >> reboot is automatic. Any systems that are memory constrained probably >> already either have a drive they could swap this data out to, or would >> rather save the writes from reaching flash media anyway. >> >> > The only possible side effect of that is that applications which put > information in /var/run/subdir would have to create the subdir at run > time rather than at the time of installing the application. 
And looking > at my /var/run directory many applications do seem to have > subdirectories in /var/run which were created when the applications were > installed. I count 31 on this system, a quick check on other systems > reveals up to 41 and 14-24 of those directories have not been used since > the system was installed. That is, the applications have never been run. > > Does it really make sense to force modification of every application > which installs a subdirectory in /var/run, and incur the overhead in > each of those applications of checking for the directory and creating it > if missing, as opposed to a single line in an init script to copy the > boot time PID files from /dev to /var/run? No. > It seems as if a lot of work > and overhead is being generated for the applications, just to save a > tiny bit of work for the people implementing a new boot procedure. > > (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir /dev/.run) I'm not sure I would do this either. While moving the file is possible, mdmon is actually intended to be run longer than the /var/run filesystem might be read/write. I think I would leave the mdmon files in /dev somewhere and link to them from /var/run. > Not only would this need a change in Fedora packages, but anyone writing > a package for Linux in general would have to do it "the Fedora way" and > even though Fedora is popular, I think some applications would choose to > avoid the overhead and need ugly hacks in rc.local to create the > directories at boot. > > All in all, I think the overhead belongs in the boot process, not all > the existing applications. It doesn't have to exist either place. We just need a set, accepted way to handle the problem. 
At this point I'm inclined to suggest that we use /dev/md/.mdadm and /dev/md/.mdmon for the respective files of each application (such as /dev/md/.mdadm/mdadm.map and /dev/md/.mdmon/*.pid), and that we use static symbolic links in the rpm/deb package to point from /var/run to those two directories. The boot process doesn't have to be changed and utilities don't have to be changed. Only the rpm/deb package needs to be updated to include the links, and both mdmon and mdadm need to be modified to create their respective directories if they don't already exist and to put their files in those directories. That's it. Well, I'd have to get Dan Walsh to update the SELinux rules for mdadm too, since the real directory location would change. But still, relatively painless stuff. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
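The layout suggested above (real directories under /dev/md, static links from /var/run, tools creating their directory on demand) can be sketched as follows. This is only an illustration in a scratch tree: the helper names install_run_links and ensure_state_dir, the link names mdadm and mdmon under /var/run, and the md127.pid file name are assumptions, and the link targets are relative so that they resolve inside the scratch tree, whereas a package would more likely ship absolute /dev/md/... targets.

```shell
#!/bin/sh
# Sketch only: real state directories under /dev/md, static symlinks from
# /var/run. Helper names and link names are hypothetical.

# Packaging step: ship the links. They dangle until the tools create the
# real directories at run time.
install_run_links() {
    il_root=$1
    mkdir -p "$il_root/var/run" || return 1
    ln -s ../../dev/md/.mdadm "$il_root/var/run/mdadm" || return 1
    ln -s ../../dev/md/.mdmon "$il_root/var/run/mdmon"
}

# Runtime step: what mdadm or mdmon would do before writing a file.
# mkdir -p succeeds whether or not the directory already exists.
ensure_state_dir() {
    mkdir -p "$1"
}

demo_root=$(mktemp -d)
install_run_links "$demo_root"
ensure_state_dir "$demo_root/dev/md/.mdmon"
echo 555 > "$demo_root/dev/md/.mdmon/md127.pid"
# Readers that still look under /var/run find the file through the link.
cat "$demo_root/var/run/mdmon/md127.pid"
rm -rf "$demo_root"
```

Since the links are static, nothing has to run after pivot_root; the only runtime work is the directory creation inside the tools themselves.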
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-02 18:19 ` Doug Ledford @ 2010-02-04 13:50 ` Bernd Schubert 2010-02-04 15:03 ` Bernd Schubert 0 siblings, 1 reply; 66+ messages in thread From: Bernd Schubert @ 2010-02-04 13:50 UTC (permalink / raw) To: Doug Ledford; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid On Tuesday 02 February 2010, Doug Ledford wrote: > On 02/02/2010 10:42 AM, Bill Davidsen wrote: > > Michael Evans wrote: > > It seems as if a lot of work > > and overhead is being generated for the applications, just to save a > > tiny bit of work for the people implementing a new boot procedure. > > > > (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir > > /dev/.run) What about using "mount --move" to move /var/run of the initramfs to the final /var/run before the chroot command? Cheers, Bernd
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-04 13:50 ` Bernd Schubert @ 2010-02-04 15:03 ` Bernd Schubert 2010-02-04 15:48 ` Doug Ledford 0 siblings, 1 reply; 66+ messages in thread From: Bernd Schubert @ 2010-02-04 15:03 UTC (permalink / raw) To: Doug Ledford; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid On Thursday 04 February 2010, Bernd Schubert wrote: > On Tuesday 02 February 2010, Doug Ledford wrote: > > On 02/02/2010 10:42 AM, Bill Davidsen wrote: > > > Michael Evans wrote: > > > It seems as if a lot of work > > > and overhead is being generated for the applications, just to save a > > > tiny bit of work for the people implementing a new boot procedure. > > > > > > (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir > > > /dev/.run) > > What about to use "mount --move" from /var/run of the initrams to final > /var/run before the chroot command? Oops, I meant pivot_root. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-04 15:03 ` Bernd Schubert @ 2010-02-04 15:48 ` Doug Ledford 2010-02-04 16:40 ` Bernd Schubert 0 siblings, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-02-04 15:48 UTC (permalink / raw) To: Bernd Schubert; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid [-- Attachment #1: Type: text/plain, Size: 930 bytes --] On 02/04/2010 10:03 AM, Bernd Schubert wrote: > On Thursday 04 February 2010, Bernd Schubert wrote: >> On Tuesday 02 February 2010, Doug Ledford wrote: >>> On 02/02/2010 10:42 AM, Bill Davidsen wrote: >>>> Michael Evans wrote: >>>> It seems as if a lot of work >>>> and overhead is being generated for the applications, just to save a >>>> tiny bit of work for the people implementing a new boot procedure. >>>> >>>> (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir >>>> /dev/.run) >> >> What about to use "mount --move" from /var/run of the initrams to final >> /var/run before the chroot command? > > Oops, I meant pivot_root. Static files in /var/run would be lost that way. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-04 15:48 ` Doug Ledford @ 2010-02-04 16:40 ` Bernd Schubert 2010-02-04 17:35 ` Doug Ledford 0 siblings, 1 reply; 66+ messages in thread From: Bernd Schubert @ 2010-02-04 16:40 UTC (permalink / raw) To: Doug Ledford; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid On Thursday 04 February 2010, Doug Ledford wrote: > On 02/04/2010 10:03 AM, Bernd Schubert wrote: > > On Thursday 04 February 2010, Bernd Schubert wrote: > >> On Tuesday 02 February 2010, Doug Ledford wrote: > >>> On 02/02/2010 10:42 AM, Bill Davidsen wrote: > >>>> Michael Evans wrote: > >>>> It seems as if a lot of work > >>>> and overhead is being generated for the applications, just to save a > >>>> tiny bit of work for the people implementing a new boot procedure. > >>>> > >>>> (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir > >>>> /dev/.run) > >> > >> What about to use "mount --move" from /var/run of the initrams to final > >> /var/run before the chroot command? > > > > Oops, I meant pivot_root. > > Static files in /var/run would be lost that way. > That should be easy to avoid by using a simple subdir, /var/run/mdadm, and doing the mount --move on that instead. Cheers, Bernd
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-04 16:40 ` Bernd Schubert @ 2010-02-04 17:35 ` Doug Ledford 0 siblings, 0 replies; 66+ messages in thread From: Doug Ledford @ 2010-02-04 17:35 UTC (permalink / raw) To: Bernd Schubert; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid [-- Attachment #1: Type: text/plain, Size: 1297 bytes --] On 02/04/2010 11:40 AM, Bernd Schubert wrote: > On Thursday 04 February 2010, Doug Ledford wrote: >> On 02/04/2010 10:03 AM, Bernd Schubert wrote: >>> On Thursday 04 February 2010, Bernd Schubert wrote: >>>> On Tuesday 02 February 2010, Doug Ledford wrote: >>>>> On 02/02/2010 10:42 AM, Bill Davidsen wrote: >>>>>> Michael Evans wrote: >>>>>> It seems as if a lot of work >>>>>> and overhead is being generated for the applications, just to save a >>>>>> tiny bit of work for the people implementing a new boot procedure. >>>>>> >>>>>> (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir >>>>>> /dev/.run) >>>> >>>> What about to use "mount --move" from /var/run of the initrams to final >>>> /var/run before the chroot command? >>> >>> Oops, I meant pivot_root. >> >> Static files in /var/run would be lost that way. >> > > That should be easy using a simple subdir /var/run/mdadm and to mount --move > this. Except that /var/run/mdadm doesn't even need moved. Only the mdmon files need this. See my upcoming email for more details. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-02 4:08 ` Michael Evans 2010-02-02 7:17 ` Luca Berra 2010-02-02 15:42 ` Bill Davidsen @ 2010-02-02 18:11 ` Doug Ledford 2 siblings, 0 replies; 66+ messages in thread From: Doug Ledford @ 2010-02-02 18:11 UTC (permalink / raw) To: Michael Evans; +Cc: Bill Davidsen, Neil Brown, linux-raid [-- Attachment #1: Type: text/plain, Size: 4263 bytes --] On 02/01/2010 11:08 PM, Michael Evans wrote: > On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote: >> Doug Ledford wrote: >>> >>> On 02/01/2010 03:32 PM, Bill Davidsen wrote: >>> >>>> >>>> Doug Ledford wrote: >>>> >>>>> >>>>> On 01/18/2010 05:09 PM, Neil Brown wrote: >>>>> >>>>>> >>>>>> I understand there is a problem here, but I don't like this approach >>>>>> to a >>>>>> solution. I'll give it more though when I get home from LCA2010 and >>>>>> see >>>>>> what I can come up with. >>>>>> >>>>> >>>>> Feel free to come up with something different. But, if your solution >>>>> involves maintaining an additional read/write mount area in deference to >>>>> a long dead unix tradition, I'm just going to shake my head and patch >>>>> your solution away to something sane. >>>>> >>>>> >>>> >>>> I don't understand you argument here. Not the one where you say you're >>>> going to ignore Neil and do what you want because you can, I understand >>>> that, but the "additional read/write mount area" part, isn't /var/run >>>> r/w on all systems now? Could you clarify why this is "additional" here? >>>> >>>> >>> >>> It's not necessarily read/write in the initrd time frame, and putting >>> the mdadm files there means it would have to be. We didn't make these >>> changes because we wanted to, we made them because using mdadm raid >>> arrays for the root filesystem combined with incremental assembly or >>> with imsm raid devices was broken otherwise. 
>>> >>> >> >> Do understand that my disquiet related to this isn't because you put a >> non-device in /dev, it's that you >> didn't put a process PID in /var/run. And frankly, once you let (force) one >> group of threads to be somewhere >> else, other services will want their PIDs some other place, and anyone >> maintaining an application >> which presents information on what's running will need to know where that >> information. >> >> In other words, it's not where you put it, it's where you *didn't* put it, >> that seems to be an >> invitation to put stuff just anywhere. Neil argues that they are not >> devices, I argue that >> they are PIDs. It's not as though it were a huge effort to move it after >> pivot root, it's a little code >> or script and in space which will be released. >> >> -- >> Bill Davidsen <davidsen@tmr.com> >> "We can't solve today's problems by using the same thinking we >> used in creating them." - Einstein >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > Thank you for stating your concern; I think knowing that a very > plausible solution is obvious. > > # at initrd/initramfs creation time > ln -s /dev/.run /var/run > > #initrd/initramfs script > mkdir /dev/.run > > The usual area becomes a symlink to a memory disk .Most systems have > ample memory to support a few extra tiny files there. Cleanup on > reboot is automatic. Any systems that are memory constrained probably > already either have a drive they could swap this data out to, or would > rather save the writes from reaching flash media anyway. It's highly likely that mdmon would need its own directory, so ln -s /dev/.mdmon /var/run/mdmon would be more suitable. This is due to SELinux contexts. 
I know mdadm needed /var/run/mdadm so that, in monitor mode with strong SELinux enabled, it could have its own private context, and that context could then be given the permissions it needed (create dev file, access sendmail, etc.). A number of these actions are the very type of actions that programs perform when compromised, so part of strong security is granting those perms only to programs that legitimately need them. Mdmon may not need the same perms that mdadm did, but I still wouldn't be surprised if it needs its own context/directory due to the relative danger of handing out raw disk access. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-01 22:42 ` Bill Davidsen 2010-02-02 4:08 ` Michael Evans @ 2010-02-02 18:07 ` Doug Ledford 2010-02-02 18:18 ` Bill Davidsen 1 sibling, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-02-02 18:07 UTC (permalink / raw) To: Bill Davidsen; +Cc: Neil Brown, linux-raid [-- Attachment #1: Type: text/plain, Size: 799 bytes --] On 02/01/2010 05:42 PM, Bill Davidsen wrote: > In other words, it's not where you put it, it's where you *didn't* put > it, that seems to be an > invitation to put stuff just anywhere. Neil argues that they are not > devices, I argue that > they are PIDs. It's not as though it were a huge effort to move it after > pivot root, it's a little code > or script and in space which will be released. On the pid files I see your point. Not that it changes the problem, but it does at least point out that more needs to be done and that the current solution is incomplete. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-02 18:07 ` Doug Ledford @ 2010-02-02 18:18 ` Bill Davidsen 0 siblings, 0 replies; 66+ messages in thread From: Bill Davidsen @ 2010-02-02 18:18 UTC (permalink / raw) To: Doug Ledford; +Cc: Neil Brown, linux-raid Doug Ledford wrote: > On 02/01/2010 05:42 PM, Bill Davidsen wrote: > > >> In other words, it's not where you put it, it's where you *didn't* put >> it, that seems to be an >> invitation to put stuff just anywhere. Neil argues that they are not >> devices, I argue that >> they are PIDs. It's not as though it were a huge effort to move it after >> pivot root, it's a little code >> or script and in space which will be released. >> > > On the pid files I see your point. Not that it changes the problem, but > it does at least point out that more needs to be done and that the > current solution is incomplete. > > Good, and your point about context and such is well taken. I still feel that the way to solve this would be to copy *just* the PID files to a real /var/run and any directories being created would have default permissions. It just feels as if there's a single point at which this could be done. And of course "if you broke it, you should fix it." That's a popular thing Linus said, but the principle dates back to MULTICS, at least (70's). -- Bill Davidsen <davidsen@tmr.com> "We can't solve today's problems by using the same thinking we used in creating them." - Einstein ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-01-19 17:51 ` Doug Ledford 2010-02-01 20:32 ` Bill Davidsen @ 2010-02-04 6:40 ` Neil Brown 2010-02-04 18:45 ` Doug Ledford 1 sibling, 1 reply; 66+ messages in thread From: Neil Brown @ 2010-02-04 6:40 UTC (permalink / raw) To: Doug Ledford Cc: linux-raid, initramfs, Dan Williams, martin f krafft, Michal Marek [cc:ing initramfs because anther part of this thread was already cc:ed there, but this is the one I wanted to reply to. cc:ed to various md/mdadm maintainers too] On Tue, 19 Jan 2010 12:51:52 -0500 Doug Ledford <dledford@redhat.com> wrote: > On 01/18/2010 05:09 PM, Neil Brown wrote: > > On Mon, 11 Jan 2010 15:38:11 -0500 > > Doug Ledford <dledford@redhat.com> wrote: > > > >> Signed-off-by: Doug Ledford <dledford@redhat.com> > > > > I really really don't like this. > > I wasn't very keen on allowing the map file to be found in /dev, > > but this it just too ugly. > > I've had to rewrite my response to this a few times :-/ > > So, let's be clear: you are objecting to these non device special files > being located under /dev. Not necessarily *where* they are under /dev, > just that they are under /dev at all. That's what I get from your > statement above. > > First with devfs, then later with udev, the old unix tradition of only > device special files under /dev is truly dead. And it should be. The > files we are creating are needed prior to / filesystem bring up, and > they are needed simply in order to fully populate /dev. In fact, an > argument can be made that a new tradition, that files related to the > creation and maintenance of device special files belong under /dev with > the files they relate to, has been created. And this new tradition > makes sense and is elegant on the basis that it requires only one > read/write filesystem mount point during device special file population. 
> It also makes sense that this new tradition would supersede the old > tradition on the basis that the old tradition was created prior to the > advent of hot plug and the need to have any read/write data just to > populate your device special files. The old tradition didn't have the > flexibility to deal with modern hot plug architectures, the new > tradition fixes that, and does so as elegantly as possible. > > That being the case, the big player in the game, udev, is following the > new tradition by creating an entire tree of non device special files > under /dev/.udev and using that to store the information it needs. And > here mdadm/mdmon are, the small players in the device bring up game that > only have minor bit parts compared to udev, holding up progress and > playing the recalcitrant old fart. Sorry Neil, but the war has already > been decided and this is a dead battle. Files related to device special > file bring up belong under /dev along with the files we are creating. > Your claim that these changes are ugly are misplaced and based upon > adherence to a dead tradition that has been replaced by a more sensible > tradition. Maybe you don't like where they are under /dev, but the fact > that they are under /dev is definitely the right thing to do and is not > in the least bit ugly. > > > I understand there is a problem here, but I don't like this approach to a > > solution. I'll give it more though when I get home from LCA2010 and see > > what I can come up with. > > Feel free to come up with something different. But, if your solution > involves maintaining an additional read/write mount area in deference to > a long dead unix tradition, I'm just going to shake my head and patch > your solution away to something sane. > So I've had a good long think about this. Your arguments about using /dev do have some merit. However they sound more like post-hoc justification then genuine motivation. 
If the train of thought went:
   I need some files that are related to device management. Where shall I
   put them? I know, I'll put them in /dev.
then it would be more convincing. But the logic actually went:
   I need some files to persist from early boot through to when the system
   has all basic filesystems mounted. Where shall I put them? I know, I'll
   put them in /dev.
That sounds a lot less convincing.

Given that chain of thought I would be more likely to come to the conclusion "I know, I'll put them in /lib/init/rw". Or at least I would on Debian - I don't know that any non-Debian-derived distros support that directory.

The fact that Debian does have this directory and stores in there things that are not related to devices suggests that there is a real need for "persists from early boot" that does not fit in /dev. So if I put mdadm bits in /dev just because I can then I am making the /proc mistake of valuing pragmatics over elegance, and that is not a good long-term direction.

Your argument that "udev does it so it must be OK" is also fairly weak. I would rather be a "recalcitrant old fart" than "wrong" any day. The fact that udev uses "/dev/.udev" is already an admission of failure. Prefixing a file name with '.' effectively says "I don't know where to put this, and I know it doesn't really belong here, but I cannot think of anything better so I'm going to do it anyway - shhh don't tell anyone". If only the founding fathers had given us a $HOME/rc directory for all the rc files we would be a lot better off.

But there is still a problem that needs to be solved.

mdmon needs to be running before any of a certain class of md arrays (those with user-space managed metadata) can be written to. Because some filesystems choose to write to the device even when the filesystem is mounted read-only (which should be a hanging offence, but isn't yet) we potentially need mdmon running before the root filesystem is mounted.
Because we want to unmount and completely discard the filesystem that holds the mdmon binary that was run early, we need to kill it and start a new one running from the final namespace. This is also needed as to a small extent the filesystem is used to communicate between mdadm and a running mdmon, and having them have the same root is less confusing.

There are three ways we can achieve this.

1/ If we can assume that between the time when the original "mount" completes and when the "mount -o remount,rw" happens the filesystem doesn't write to the device, then we can simply kill mdmon after the root is mounted, and restart it before remounting. However I don't trust filesystem implementers so I won't recommend that.

2/ Before the pivot root we can kill the old mdmon and start the new one chrooted into the final root.

3/ After the pivot root we can kill the old mdmon and start the new one.

Number 2 is the approach that we (well, mostly Dan) originally intended and that the code implements ... or tries to. It got broken and I never noticed. I think I have fixed it now for 3.1.2. However it requires that /var/run exists and is writeable during early boot. I'm not sure that I am really comfortable requiring that. If the contents of /var/run are not going to persist then it would be better if they didn't exist. mdadm currently relies on that non-existence for proper handling of the "mapfile".

Number 3 would seem simplest except for the simple task of finding out which process to kill, and how to wait for it to clean up and die.

This is where the suggestion of putting some key files in /dev comes from. If the mdmon pid file and socket were in /dev then a new mdmon would be able to find them, signal the pid, and read on the socket until it got EOF (because the other end was closed). If they aren't in /dev (or /lib/init/rw) then it isn't possible to find them.

I could hunt through /proc to find the process called "mdmon" with the right args, kill that, and wait until it has gone.
But that is rather ugly and I want to avoid "ugly".

A really key consideration here is to make it all really easy for the distro package maintainers because debugging issues with early boot is really hard, and the maintainers have all got more interesting things to do with their time. So while I could suggest that the above ugliness be put in a script if you don't want to make /var/run persist from early boot (my preferred solution), I'm not going to do that.

I think that what I will do is:

 - the "official" homes for the pid and unix-domain-sock are in /var/run (preferably /var/run/mdadm/ but Doug said something about needing /var/run/mdmon/ to placate the monster that is SELinux - I need more information about that). When mdadm wants to communicate with mdmon it always looks there.

 - There is an alternative home which is /lib/init/rw/mdadm/ by default, but a 'make' option can easily change that if a distro wants to. If I cannot access or mkdir /var/run/mdadm, I will mkdir /lib/init/rw/mdadm to have somewhere to create files

 - mdadm when run in the "take over from previous instance" mode will look in /lib/init/rw/mdadm for the relevant .pid and .sock files if they aren't in /var/run/mdadm

 - mdmon.8 will list the various options with details.

So I get to maintain a Unix tradition which might still have some life in it after all, and Doug gets a very easy way to patch in his own version of sanity.

(comments always welcome - I have made the changes described above and pushed them to git://neil.brown.name/mdadm, but it isn't too late to change it completely if that turns out to be best)

NeilBrown

^ permalink raw reply [flat|nested] 66+ messages in thread
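The lookup order described in the message above (the "official" directory first, then the early-boot alternative) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was pushed to the mdadm tree: the function name `find_mdmon_pid` and the `<devname>.pid` file layout are hypothetical, and the directory list is passed in by the caller.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not mdadm's actual code: search a list of
 * directories, in order, for "<devname>.pid" and return the pid stored
 * in the first usable file, or -1 if no directory yields one.
 */
static long find_mdmon_pid(const char *const dirs[], size_t ndirs,
                           const char *devname)
{
    char path[4096];
    size_t i;

    assert(dirs != NULL && devname != NULL);

    for (i = 0; i < ndirs; i++) {
        FILE *f;
        long pid = -1;
        int n = snprintf(path, sizeof(path), "%s/%s.pid", dirs[i], devname);

        if (n < 0 || (size_t)n >= sizeof(path))
            continue;           /* path too long, skip this directory */
        f = fopen(path, "r");
        if (!f)
            continue;           /* not here, try the next (fallback) dir */
        if (fscanf(f, "%ld", &pid) != 1)
            pid = -1;           /* file exists but holds no number */
        fclose(f);
        if (pid > 0)
            return pid;
    }
    return -1;
}
```

A caller following the proposal would pass something like `{ "/var/run/mdadm", "/lib/init/rw/mdadm" }` as the directory list, so that the early-boot location is consulted only when the official one has no usable pid file.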
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-04 6:40 ` Neil Brown @ 2010-02-04 18:45 ` Doug Ledford [not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2010-02-08 3:45 ` Neil Brown 0 siblings, 2 replies; 66+ messages in thread From: Doug Ledford @ 2010-02-04 18:45 UTC (permalink / raw) To: Neil Brown Cc: linux-raid, initramfs, Dan Williams, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham [-- Attachment #1: Type: text/plain, Size: 20983 bytes --] On 02/04/2010 01:40 AM, Neil Brown wrote: > > [cc:ing initramfs because anther part of this thread was already > cc:ed there, but this is the one I wanted to reply to. > cc:ed to various md/mdadm maintainers too] > > On Tue, 19 Jan 2010 12:51:52 -0500 > Doug Ledford <dledford@redhat.com> wrote: > >> On 01/18/2010 05:09 PM, Neil Brown wrote: >>> On Mon, 11 Jan 2010 15:38:11 -0500 >>> Doug Ledford <dledford@redhat.com> wrote: >>> >>>> Signed-off-by: Doug Ledford <dledford@redhat.com> >>> >>> I really really don't like this. >>> I wasn't very keen on allowing the map file to be found in /dev, >>> but this it just too ugly. >> >> I've had to rewrite my response to this a few times :-/ >> >> So, let's be clear: you are objecting to these non device special files >> being located under /dev. Not necessarily *where* they are under /dev, >> just that they are under /dev at all. That's what I get from your >> statement above. >> >> First with devfs, then later with udev, the old unix tradition of only >> device special files under /dev is truly dead. And it should be. The >> files we are creating are needed prior to / filesystem bring up, and >> they are needed simply in order to fully populate /dev. In fact, an >> argument can be made that a new tradition, that files related to the >> creation and maintenance of device special files belong under /dev with >> the files they relate to, has been created. 
And this new tradition >> makes sense and is elegant on the basis that it requires only one >> read/write filesystem mount point during device special file population. >> It also makes sense that this new tradition would supersede the old >> tradition on the basis that the old tradition was created prior to the >> advent of hot plug and the need to have any read/write data just to >> populate your device special files. The old tradition didn't have the >> flexibility to deal with modern hot plug architectures, the new >> tradition fixes that, and does so as elegantly as possible. >> >> That being the case, the big player in the game, udev, is following the >> new tradition by creating an entire tree of non device special files >> under /dev/.udev and using that to store the information it needs. And >> here mdadm/mdmon are, the small players in the device bring up game that >> only have minor bit parts compared to udev, holding up progress and >> playing the recalcitrant old fart. Sorry Neil, but the war has already >> been decided and this is a dead battle. Files related to device special >> file bring up belong under /dev along with the files we are creating. >> Your claim that these changes are ugly are misplaced and based upon >> adherence to a dead tradition that has been replaced by a more sensible >> tradition. Maybe you don't like where they are under /dev, but the fact >> that they are under /dev is definitely the right thing to do and is not >> in the least bit ugly. >> >>> I understand there is a problem here, but I don't like this approach to a >>> solution. I'll give it more though when I get home from LCA2010 and see >>> what I can come up with. >> >> Feel free to come up with something different. But, if your solution >> involves maintaining an additional read/write mount area in deference to >> a long dead unix tradition, I'm just going to shake my head and patch >> your solution away to something sane. 
>> > > So I've had a good long think about this. > > Your arguments about using /dev do have some merit. However they sound more > like post-hoc justification then genuine motivation. > If the train of thought went: > I need some files that are related to device management. Where shall I > put them? I know, I'll put them in /dev. > then it would be more convincing. But the logic actually went: > I need some files to persist from early boot through to when the system > has all basic filesystems mounted. Where shall I put them? I know, I'll > put them in /dev. > That sounds a lot less convincing. To be fair, if post-hoc versus initial made any difference what so ever, then so would the fact that I wouldn't have chosen to have these files exist at all. I would have made incremental assembly work without a map file and I would have made imsm superblock handling be in the kernel. So, I'm dealing with the consequences of decisions I didn't make and wouldn't have made. I don't think it's then fair to put some sort of 'premeditated' versus 'dealing with the situation' bias on my response. > Given that chain of thought I would be more likely to come to the conclusion > "I know, I'll put them in /lib/init/rw". Or at least I would on Debian - > I don't know that any non-Debian-derived distros support that directory. I have no idea. Not one of the files in question belongs there any more than in /dev or anywhere else for that matter though, so I wouldn't come to that conclusion in your shoes. But I find it somewhat disheartening to hear you disparage my choice to put the files in /dev because "I just wanted someplace to throw them" and then you would suggest /lib/init/rw when in fact, according to this debian bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=%23403863#35 the whole /lib/init/rw things is *exactly* that same thing. 
It's a "we needed someplace to throw some files and didn't want to go through committee so we found someplace we owned and could do what we want" thing. In addition, as the person that reported this bug pointed out, things like pid files and map files are just as big of a FHS violation in /lib as they are in /dev. Neither place is the right place. Hell, they even had to make modifications to chkrootkit to accommodate this new directory and the files in there. Your choice of one over the other is purely personal aesthetics, and there are real and legitimate reasons to prefer *not* to have this directory. Boot complexity being the main one. The fact that at least the mdadm map file is an enumeration of device special files and mdadm devices and as such really belongs much more in /dev than in /lib is another. > The fact that Debian does have this directory and stores in there things that > are not related to devices suggests that there is a real need for "persists > from early boot" that does not fit in /dev. So if I put mdadm bits in /dev > just because I can then I am making the /proc mistake of valuing pragmatics > over elegance, and that is not a good long-term direction. > > Your argument that "udev does it so it must be OK" is also fairly weak. I > would rather be a "recalcitrant old fart" than "wrong" any day. > The fact that udev uses "/dev/.udev" is already an admission of failure. I disagree. > Prefixing a file name with '.' effectively says "I don't know where to put > this, and I know it doesn't really belong here, but I cannot think of > anything better so I'm going to do it anyway - shhh don't tell anyone". I disagree with that too, all except the shhh don't tell anyone part. Yes dot files by default keep something from being seen. But in the context of /dev/.udev the idea makes sense. 
The udev files are directly related to device bring up, but a big part of the reason udev is in use today was to unclutter /dev and remove device special files that we used to create *in case* the device existed and replace them with the device special files that are there for the devices that actually do exist. So, since udev is there to declutter /dev, it would not then make sense to turn around and add back in new clutter, so .udev instead of udev. > If only the founding fathers had given us a $HOME/rc directory for all the > rc files we would be a lot better off. > > But there is still a problem that needs to be solved. > > mdmon needs to be running before any a certain class of md arrays (those with > user-space managed metadata) can be written to. Because some filesystems > choose to write to the device even when the filesystem is mounted read-only > (which should be a hanging offence, but isn't yet) Just to sidestep a second on the filesystem issue, there are only two choices when it comes to filesystems: allow them to be mounted read only (truly read only) and inconsistent or pseudo read only (where the filesystem itself is the only thing that writes to the filesystem) and be able to guarantee consistency. The only way for a journaled filesystem to provide the guarantee it does is that it writes to the device during mount even if its a read only mount. This is because they guarantee to always be able to *restore* a filesystem to a sane state, not that it will always *be* in a sane state. If they didn't do that restore on mount, then possibly the thing that is inconsistent is /sbin/init and the machine doesn't boot. In other words, the point of a journaled filesystem would be wasted if they didn't do what they do. 
The only other option is to do the replay in page cache and allow the page cache and physical device to differ until the filesystem goes read write, but I'm not sure that level of complexity is warranted or advisable, especially since it could easily confuse anything that tries to read from the disks directly. > we potentially need mdmon > running before the root filesystem is mounted. > > Because we want to unmount and completely discard the filesystem that holds > the mdmon binary that was run early, we need to kill it and start a new one > running from final namespace. This is also needed as to a small extent the > filesystem is used to communicate between mdadm and a running mdmon, and > having them have the same root is less confusing. > > There are three ways we can achieve this. > > 1/ If we can assume that between the time when the original "mount" completes > and when the "mount -o remount,rw" happens the filesystem doesn't write to > the device, then we can simply kill mdmon after the root is mounted, and > restart it before remounting. However I don't trust filesystem > implementers so I won't recommend that. > > 2/ Before the pivot root we can kill the old mdmon and start the new one > chrooted into the final root. > 3/ After the pivot root we can kill the old mdmon and start the new one. > > Number 2 is the approach that we (Well mostly Dan) originally intended and > that the code implements ... or tries to. It got broken and I never > noticed. I think I have fixed it now for 3.1.2. Note, as I recall, Hans switched things to be #3 for various reasons. That he switched it to #3 doesn't effect mdmon really, as it still is just killing and restarting, but doing it after the pivot root solved a couple issues. I don't recall what they were, you would have to talk to Hans about that. And you left part of the issue out. Yes, all the before bring up stuff is true, but also true is that we want mdmon to hang around longer than anyone else. 
By the time mdmon is ready to be shutdown, /var/run is once again read only. So clean up can't be done. On the other hand, if the files for mdmon are on a temporary filesystem that is rebuilt at every boot...you get the point. > However it requires that /var/run exists and is writeable during early boot. > I'm not sure that I am really comfortable requiring that. If the contents > of /var/run are not going to persist then it would be better if they didn't > exist. mdadm current relies on that non-existence for proper handing of the > "mapfile". Can you explain this? I see nothing in the sources that tells me what you mean by the non-existence of /var/run causing the mapfile to be handled properly (and I'm not sure that's a valid requirement to put on the system anyway because now you are dictating that if another early boot application needs read only access to /var/run and we create /var/run for that purpose, then it would in some way break mdadm's operation). > Number 3 would seem simplest except for the simple task of > finding out which process to kill, and how to wait for it to clean up and > die. > > This is where the suggestion of putting some key files in /dev comes from. > If the mdmon pid file and socket were in /dev then a new mdmon would be able > to find them, signal the pid, and read on the socket until it got EOF > (because the other end was closed). If they aren't in /dev (or /lib/init/rw) > then it isn't possible to find them. > > I could hunt through /proc to find the process called "mdmon" with the right > args, kill that, and wait until it has gone. But that is rather ugly and I > want to avoid "ugly". > > A really key consideration here is to make it all really easy for the distro > package maintainers because debugging issues with early boot is really hard, > and the maintainers have all got more interesting things to do with their > time. 
> So while I could suggest that the above ugliness be put in a script if you > don't want to make /var/run persist from early boot (my preferred solution), > I'm not going to do that. > > I think that what I will do is: > > - the "official" homes for the pid and unix-domain-sock are in /var/run > (preferably /var/run/mdadm/ but Doug said something about needing > /var/run/mdmon/ to placate the monster that is SELinux - I need more > information about that). mdmon does not need access to sendmail, so it should not be in the same context as the mdadm files. This allows a more restrictive set of perms on mdmon than on mdadm itself. If we put the mdmon files in /var/run/mdadm, then they will have to have the same context as mdadm, and because mdadm does so many things, it's already got an overly liberal set of permissions compared to what mdmon realistically needs. > When mdadm wants to communicate with mdmon it always looks there. > > - There is an alternative home which is /lib/init/rw/mdadm/ by default, What happens to the files later in the boot process. Are they left here? Or are they migrated to an appropriate location later? If they are just left here, then this makes even *less* sense than putting the files under /dev as you've created a diversion zone in the filesystem. Someplace to throw things that *should* be elsewhere and then leave them there. Hopefully nothing gets left here. And if nothing gets left here, then whether the temporary spot is /dev/gonna_be_deleted_after_stuff_is_moved_out or /lib/init/rw makes no real difference except in the complexity of the initramfs, and more complex is more prone to break so I go with the single rw mount point/area. > but a 'make' option can easily change that if a distro wants to. Thank you, I'm sure I'll end up using that. 
> If I cannot access or mkdir /var/run/mdadm, I will mkdir /lib/init/rw/mdadm
> to have some where to create files

And so we are back to preserving two different read/write areas in the filesystem for very early boot, at least in the default, which is why I'm sure I'll use the make option.

> - mdadm when run in the "take over from previous instance" mode will
> look in /lib/init/rw/mdadm for the relevant .pid and .sock files if they
> aren't in /var/run/mdadm

Now I'm a bit concerned. What happens when the new program starts up? If /var/run is now read/write, will the new mdmon then write the files in /var/run/mdadm (or mdmon)? I would expect it to do this in preference to /lib/init/rw/mdadm, because if it doesn't, then the issue that Bill Davidsen brought up (the problem not being files under /dev, but certain files *not* being under /var/run) creeps right back up. So, are you going to symlink /var/run/mdadm (or mdmon) to /lib/init/rw/mdadm? If so, then you are now doing *exactly* as I proposed except in /lib/init/rw/mdadm instead of something like /dev/md/.mdadm. If you don't, then I foresee problems in your future: when mdmon is restarted in the root context, it will write files in the real /var/run/mdadm directory, but before mdmon ever shuts down, the / filesystem will be readonly, and so those files will never get cleaned up. On the next boot you will have stale files there that you will have to work around when it comes to mdmon restart time, as you'll need to ignore or clean out /var/run/mdadm and then use the ones in /lib/init/rw/mdadm instead.

I'm sorry Neil, but this is sounding uglier and uglier by the minute, not elegant.

> - mdmon.8 will list the various options with details.
>
>
> So I get to maintain a Unix tradition which might still have some life it
> after all, and Doug gets a very easy way to patch in his own version of
> sanity.
>
> (comments always welcome - I have made the changes described above and pushed
> them to git://neil.brown.name/mdadm, but it isn't to late to change it
> completely if that turns out to be best)

I made my proposal in another email. But, I didn't necessarily argue for it. Since you've argued for yours, and since this is going to a mailing list that I don't think significant parts of the original thread went to, I'll present mine with the arguments.

Let's look at this on a file by file basis. First, for mdadm:

mdadm.map - incremental map file, needs to be read/write before / is read/write if using incremental assembly on root array. Used to be stored in /var/run/mdadm/mdadm.map. This isn't read/write early enough, so incremental assembly would break. Neil noted something above about if /var/run/mdadm doesn't exist and isn't writable then mdadm does something different in mdadm current, but I looked in the git repo and could not see where the specific problem a readonly /var/run caused would be fixed, so I'll assume for now that a readonly /var/run is still just as broken as before. We moved the file to /dev/md/.mdadm.map, but Neil didn't like that and made it /dev/.mdadm.map instead. I would actually propose /dev/md/incremental.map as it A) isn't hidden and I believe it shouldn't be hidden because of E later on, B) clearly indicates the purpose of the file, C) would be in an md specific/owned area of /dev, D) is unlikely to ever conflict with someone's desired md device name, E) is a file specific to the enumeration and bring up of md device special files and as such can be argued to belong in /dev anyway, and F) solves the problem of needing a read/write /var/run for incremental assembly to work.

mdadm.pid - this is only used by mdadm in monitor mode, which is not started until after the filesystem is read/write. This can safely reside in /var/run/mdadm as it does today, no changes needed.
Now the files for mdmon: devname.pid, devname.sock - we use one mdmon per imsm array and each mdmon has its own pid and sock file named after the array it is watching. The problem being that if our root filesystem is on one of these imsm arrays, we need mdmon up and running so it can mark the array dirty because we will likely cause writes via possible journal replays as we mount root. Likewise, even though there is code in mdmon to clean up the pid/sock files, if we are talking about the mdmon for the root filesystem, that cleanup can't happen as we need mdmon around to mark the array clean after the final writes from going readonly are complete (and in fact, during the final halt script on Fedora, we specifically exclude *all* mdmon instances from the last killall that we do, then we call mdadm to --wait-clean so we know that all the mdmons have marked the devices clean after the readonly remount, then we reboot, so we don't even kill the mdmon programs, ever). That means they will never clean up their sock and pid files. As it turns out, being on a tmpfs, permanently, is best for the mdmon files. We need them to be written before the system comes up, and we need them to stick around while the system goes down (we actually read the pid files to find what pids to omit from the global killall we do), but we also want them to go away when we reboot. So, location wise, /dev isn't necessarily the right place for them. However, now that we use udev for dev, semantic wise it's perfect. And we do have the one argument that they are at least related to the bring up and take down of device special files. So, for these files, I would actually argue for either /dev/.udev/mdmon with a symlink from /var/run/mdmon to this location, or for /dev/md/.mdmon, again with a symlink from /var/run/mdmon. So that's my suggestion for how to handle this stuff. 
-- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
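The shutdown step described above (reading the mdmon pid files to find which pids to omit from the global killall) could be sketched roughly as follows. This is only an illustration under stated assumptions: collect_mdmon_pids() is a hypothetical helper, not a function from mdadm or from any initscript, and the directory it scans is whatever location the thread eventually settles on.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of mdadm): collect the pids recorded in
 * every "<devname>.pid" file found directly under `dir`.
 * Returns the number of pids stored in `pids` (at most `max`),
 * or -1 if `dir` cannot be opened.
 */
static int collect_mdmon_pids(const char *dir, pid_t *pids, int max)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int n = 0;

    if (!d)
        return -1;

    while (n < max && (de = readdir(d)) != NULL) {
        size_t len = strlen(de->d_name);
        char path[4096];
        FILE *f;
        long pid;
        int r;

        /* Skip anything that does not end in ".pid" (e.g. the .sock files). */
        if (len <= 4 || strcmp(de->d_name + len - 4, ".pid") != 0)
            continue;

        r = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (r < 0 || (size_t)r >= sizeof(path))
            continue;

        f = fopen(path, "r");
        if (!f)
            continue;
        /* A pid file is assumed to hold one decimal pid. */
        if (fscanf(f, "%ld", &pid) == 1 && pid > 0)
            pids[n++] = (pid_t)pid;
        fclose(f);
    }
    closedir(d);
    return n;
}
```

A halt script would then pass the collected pids to whatever omit mechanism its killall provides, and only afterwards wait for the arrays to be marked clean.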
[parent not found: <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2010-02-04 23:04 ` Dan Williams [not found] ` <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2010-02-06 17:51 ` Doug Ledford 2010-02-07 22:13 ` Hans de Goede 1 sibling, 2 replies; 66+ messages in thread From: Dan Williams @ 2010-02-04 23:04 UTC (permalink / raw) To: Doug Ledford Cc: Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham On Thu, Feb 4, 2010 at 11:45 AM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > To be fair, if post-hoc versus initial made any difference what so ever, > then so would the fact that I wouldn't have chosen to have these files > exist at all. I would have made incremental assembly work without a map > file and I would have made imsm superblock handling be in the kernel. > So, I'm dealing with the consequences of decisions I didn't make and > wouldn't have made. I don't think it's then fair to put some sort of > 'premeditated' versus 'dealing with the situation' bias on my response. On the argument about where to place the mdmon files I am now torn between the "Neil" and "Doug" positions, but on the decision of where to place imsm superblock handling I stand behind the design decision to put it in userspace. 1/ If you take a look at native md superblock support you see that the support code is duplicated between kernel-space and user space, having it all handled in userspace means only one code base to maintain (elegant aspect #1). 2/ The kernel can simply worry about the *mechanism* of providing raid while all the assembly *policy* and support for any number of superblock formats is relegated to where policy belongs (elegant aspect #2). 
2a/ This simply follows in the path of the design decision to not support in-kernel auto-assembly of version-1 superblocks which started the requirement to use an initramfs to boot software raid. (this is a not so elegant aspect because it mandates an initramfs to boot, but I don't think a general purpose distro can ever get away from that requirement). I will say that needing to touch several software packages (kernel, initramfs, initscripts, mdadm) to get imsm superblock support has added some excitement to the process in the short term. Long term I think the elegant aspects of the decision will prove their worth. -- Dan ^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-02-05 0:21 ` Bill Davidsen 2010-02-05 12:14 ` Luca Berra 0 siblings, 1 reply; 66+ messages in thread From: Bill Davidsen @ 2010-02-05 0:21 UTC (permalink / raw) To: Dan Williams Cc: Doug Ledford, Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham Dan Williams wrote: > On Thu, Feb 4, 2010 at 11:45 AM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > >> To be fair, if post-hoc versus initial made any difference what so ever, >> then so would the fact that I wouldn't have chosen to have these files >> exist at all. I would have made incremental assembly work without a map >> file and I would have made imsm superblock handling be in the kernel. >> So, I'm dealing with the consequences of decisions I didn't make and >> wouldn't have made. I don't think it's then fair to put some sort of >> 'premeditated' versus 'dealing with the situation' bias on my response. >> > > On the argument about where to place the mdmon files I am now torn > between the "Neil" and "Doug" positions, but on the decision of where > to place imsm superblock handling I stand behind the design decision > to put it in userspace. > > 1/ If you take a look at native md superblock support you see that the > support code is duplicated between kernel-space and user space, having > it all handled in userspace means only one code base to maintain > (elegant aspect #1). > That is the decision which I question. Having anything mission critical in user space means that there suddenly arise ownership, privilege and scheduling issues which just don't exist for things in the kernel. Just my opinion, I believe it introduces additional points of failure. 
Perhaps like crypto it could be called from user or kernel space but live in the kernel. -- Bill Davidsen <davidsen-sQDSfeB7uhw@public.gmane.org> "We can't solve today's problems by using the same thinking we used in creating them." - Einstein ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-05 0:21 ` Bill Davidsen @ 2010-02-05 12:14 ` Luca Berra 0 siblings, 0 replies; 66+ messages in thread From: Luca Berra @ 2010-02-05 12:14 UTC (permalink / raw) To: linux-raid On Thu, Feb 04, 2010 at 07:21:59PM -0500, Bill Davidsen wrote: >> 1/ If you take a look at native md superblock support you see that the >> support code is duplicated between kernel-space and user space, having >> it all handled in userspace means only one code base to maintain >> (elegant aspect #1). >> > > That is the decision which I question. Having anything mission critical in > user space means that there suddenly arise ownership, privilege and > scheduling issues which just don't exist for things in the kernel. lol @ /sbin/mount -- Luca Berra -- bluca@comedia.it Communication Media & Services S.r.l. /"\ \ / ASCII RIBBON CAMPAIGN X AGAINST HTML MAIL / \ ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-04 23:04 ` Dan Williams [not found] ` <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-02-06 17:51 ` Doug Ledford [not found] ` <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-02-06 17:51 UTC (permalink / raw) To: Dan Williams Cc: Neil Brown, linux-raid, initramfs, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham [-- Attachment #1: Type: text/plain, Size: 4352 bytes --] On 02/04/2010 06:04 PM, Dan Williams wrote: > On Thu, Feb 4, 2010 at 11:45 AM, Doug Ledford <dledford@redhat.com> wrote: >> To be fair, if post-hoc versus initial made any difference what so ever, >> then so would the fact that I wouldn't have chosen to have these files >> exist at all. I would have made incremental assembly work without a map >> file and I would have made imsm superblock handling be in the kernel. >> So, I'm dealing with the consequences of decisions I didn't make and >> wouldn't have made. I don't think it's then fair to put some sort of >> 'premeditated' versus 'dealing with the situation' bias on my response. > > On the argument about where to place the mdmon files I am now torn > between the "Neil" and "Doug" positions, but on the decision of where > to place imsm superblock handling I stand behind the design decision > to put it in userspace. > > 1/ If you take a look at native md superblock support you see that the > support code is duplicated between kernel-space and user space, having > it all handled in userspace means only one code base to maintain > (elegant aspect #1). Elegance is in the eye of the beholder. More on that in a minute. 
> 2/ The kernel can simply worry about the *mechanism* of providing raid > while all the assembly *policy* and support for any number of > superblock formats is relegated to where policy belongs (elegant > aspect #2). I would argue that dirty/clean state manipulation is *not* policy and *is* mechanism. So, by your definition of what should be in the kernel combined with my definition of what dirty/clean state manipulation is, the solution is not only not elegant, it's flat incorrect. > 2a/ This simply follows in the path of the design decision to not > support in-kernel auto-assembly of version-1 superblocks which started > the requirement to use an initramfs to boot software raid. (this is a > not so elegant aspect because it mandates an initramfs to boot, but I > don't think a general purpose distro can ever get away from that > requirement). I'm fine with needing mdadm to assemble the device. I'm not fine with needing mdmon once it's assembled. > I will say that needing to touch several software packages (kernel, > initramfs, initscripts, mdadm) to get imsm superblock support has > added some excitement to the process in the short term. Long term I > think the elegant aspects of the decision will prove their worth. I will say that needing to touch multiple software packages might not be a bad thing, but think of *how* they had to be changed. We had to add special exceptions for mdmon all over the place: kernel scheduler (for suspend/resume, mdmon can't be frozen like the rest of user space or else writing our suspend to disk image doesn't work), initramfs, initscripts after initramfs, initscripts on halt, SELinux. In all these cases, we had to take something that we want to keep simple and add special case rules and exceptions for mdmon. That pretty solidly says that while this arrangement may have been elegant for *you*, it was not elegant in the overall grand scheme of things. 
What would have been smart was to leave array creation, assembly, verification, and modification to user space, but to put *all* of the raid mechanics, including superblock clean/dirty state processing and array shut down capabilities, in the kernel. Had you done that, I would have called your solution elegant. It's at this point that I feel obliged to mention that, in terms of this whole big argument, the incremental map file has at least some amount of sense belonging in /dev, it's really the mdmon .pid and .sock files that don't, and those files wouldn't even exist had you designed things as I mention here. It's the fact that you have two files per device that you should be placing in a specific place on the filesystem in order for them to be useful and adhere to standards yet the program they belong to needs to exist outside the context of any filesystem that I think is pretty strong evidence of the inelegance of this design. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2010-02-06 21:07 ` Dan Williams [not found] ` <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2010-02-08 4:23 ` Neil Brown 1 sibling, 1 reply; 66+ messages in thread From: Dan Williams @ 2010-02-06 21:07 UTC (permalink / raw) To: Doug Ledford Cc: Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham >> 1/ If you take a look at native md superblock support you see that the >> support code is duplicated between kernel-space and user space, having >> it all handled in userspace means only one code base to maintain >> (elegant aspect #1). > > Elegance is in the eye of the beholder. More on that in a minute. > True, but let's agree that superblock formats are quirky, arbitrary and all around inelegant. Only needing to write that code once is at the very least an aid to one's sanity. >> 2/ The kernel can simply worry about the *mechanism* of providing raid >> while all the assembly *policy* and support for any number of >> superblock formats is relegated to where policy belongs (elegant >> aspect #2). > > I would argue that dirty/clean state manipulation is *not* policy and > *is* mechanism. So, by your definition of what should be in the kernel > combined with my definition of what dirty/clean state manipulation is, > the solution is not only not elegant, it's flat incorrect. You are conveniently blurring the lines between event generation and event handling. The kernel handles all the detail of detecting, notifying and reaping the event. The arbitrary superblock specific actions that need to happen in response to that event are really not very interesting to rest of the mechanism of providing raid. 
You could argue that I am conveniently drawing a line, and you would be right. There are convenient aspects of having this portion of the solution in userspace which do not compromise the integrity of the raid mechanism. We can now also handle spare assignment policy, hot-plug policy, corner case disagreements between a superblock's definition of a "container", all without thrashing the kernel. > >> 2a/ This simply follows in the path of the design decision to not >> support in-kernel auto-assembly of version-1 superblocks which started >> the requirement to use an initramfs to boot software raid. (this is a >> not so elegant aspect because it mandates an initramfs to boot, but I >> don't think a general purpose distro can ever get away from that >> requirement). > > I'm fine with needing mdadm to assemble the device. I'm not fine with > needing mdmon once it's assembled. > >> I will say that needing to touch several software packages (kernel, >> initramfs, initscripts, mdadm) to get imsm superblock support has >> added some excitement to the process in the short term. Long term I >> think the elegant aspects of the decision will prove their worth. > > I will say that needing to touch multiple software packages might not be > a bad thing, but think of *how* they had to be changed. We had to add > special exceptions for mdmon all over the place: kernel scheduler (for > suspend/resume, mdmon can't be frozen like the rest of user space or > else writing our suspend to disk image doesn't work), initramfs, > initscripts after initramfs, initscripts on halt, SELinux. In all these > cases, we had to take something that we want to keep simple and add > special case rules and exceptions for mdmon. That pretty solidly says > that while this arrangement may have been elegant for *you*, it was not > elegant in the overall grand scheme of things. No, nothing elegant about that, but I think you would agree this isn't something we threw over the wall and walked away from. 
Making mdmon more convenient to handle is hopefully an obvious priority. Yes, I know you would like to see it die, but we are where we are. > > What would have been smart was to leave array creation, assembly, > verfication, and modification to user space, but to put *all* of the > raid mechanics, including superblock clean/dirty state processing and > array shut down capabilities, in the kernel. Had you done that, I would > have called your solution elegant. > > It's at this point that I feel obliged to mention that, in terms of this > whole big argument, the incremental map file has at least some amount of > sense belonging in /dev, it's really the mdmon .pid and .sock files that > don't, and those files wouldn't even exist had you designed things as I > mention here. It's the fact that you have two files per device that you > should be placing in a specific place on the filesystem in order for > them to be useful and adhere to standards yet the program they belong to > needs to exist outside the context of any filesystem that I think is > pretty strong evidence of the inelegance of this design. > This comment makes me see Neil's argument in a different light, (hopefully I am not mischaracterizing it), but essentially we are waiting for the standards to catch up with this new class of program. FUSE, CUSE, and mdmon belong to a class of programs that move traditionally exclusive kernel space functionality to userspace. Debian's /lib/init/rw looks to be a response to this grey area of the standards (not that I have any familiarity with the LSB). -- Dan ^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-02-06 21:46 ` martin f krafft 2010-02-06 22:06 ` Michael Evans 2010-02-08 15:32 ` Doug Ledford 1 sibling, 1 reply; 66+ messages in thread From: martin f krafft @ 2010-02-06 21:46 UTC (permalink / raw) To: Dan Williams Cc: Doug Ledford, Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede, Bill Nottingham [-- Attachment #1: Type: text/plain, Size: 1987 bytes --] also sprach Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> [2010.02.07.1007 +1300]: > This comment makes me see Neil's argument in a different light, > (hopefully I am not mischaracterizing it), but essentially we are > waiting for the standards to catch up with this new class of > program. FUSE, CUSE, and mdmon belong to a class of programs that > move traditionally exclusive kernel space functionality to > userspace. Debian's /lib/init/rw looks to be a response to this > grey area of the standards (not that I have any familiarity with > the LSB). I have not read the full thread for lack of time, but I would like to chime in that I favour user-space over kernel-space any day: it makes for stabler systems, better interfaces, and easier upgrades — even though it's definitely more work for the distro maintainers. So mdmon seems like a good idea, even though some details might need to be worked out to everyone's satisfaction yet. I agree with Dan that this trend is new and that slow-moving standards like the FHS have yet to catch up. But they cannot catch up if distros don't explore the field. Debian's latest move in this exploration was indeed /lib/init/rw, but it's questionable, not only because it's a tmpfs, which makes it unusable for e.g. 
md bitmaps — unless we invented a place that moved to persistent storage as early as possible, in a way that would make it accessible early during the next boot. But now I am diverting the topic… -- .''`. martin f. krafft <madduck@d.o> Related projects: : :' : proud Debian developer http://debiansystem.info `. `'` http://people.debian.org/~madduck http://vcs-pkg.org `- Debian - when you have better things to do than fixing systems "there are two major products that come out of berkeley: lsd and unix. we don't believe this to be a coincidence." -- jeremy s. anderson [-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-06 21:46 ` martin f krafft @ 2010-02-06 22:06 ` Michael Evans 0 siblings, 0 replies; 66+ messages in thread From: Michael Evans @ 2010-02-06 22:06 UTC (permalink / raw) To: Dan Williams, Doug Ledford, Neil Brown, linux-raid, initramfs, Michal Marek On Sat, Feb 6, 2010 at 1:46 PM, martin f krafft <madduck@debian.org> wrote: > also sprach Dan Williams <dan.j.williams@intel.com> [2010.02.07.1007 +1300]: >> This comment makes me see Neil's argument in a different light, >> (hopefully I am not mischaracterizing it), but essentially we are >> waiting for the standards to catch up with this new class of >> program. FUSE, CUSE, and mdmon belong to a class of programs that >> move traditionally exclusive kernel space functionality to >> userspace. Debian's /lib/init/rw looks to be a response to this >> grey area of the standards (not that I have any familiarity with >> the LSB). > > I have not read the full thread for lack of time, but I would like > to chime in that I favour user-space over kernel-space any day: it > makes for stabler systems, better interfaces, and easier upgrades > — even though it's definitely more work for the distro maintainers. > > So mdmon seems like a good idea, even though some details might need > to be worked out to everyone's satisfaction yet. > > I agree with Dan that this trend is new and that slow-moving > standards like the FHS have yet to catch up. But they cannot catch > up if distros don't explore the field. Debian's latest move in this > exploration was indeed /lib/init/rw, but it's questionable, not only > because it's a tmpfs, which makes it unusable for e.g. md bitmaps > — unless we invented a place that moved to persistent storage as > early as possible, in a way that would make it accessible early > during the next boot. But now I am diverting the topic… > > -- > .''`. martin f. 
krafft <madduck@d.o> Related projects: > : :' : proud Debian developer http://debiansystem.info > `. `'` http://people.debian.org/~madduck http://vcs-pkg.org > `- Debian - when you have better things to do than fixing systems > > "there are two major products that come out of berkeley: lsd and unix. > we don't believe this to be a coincidence." > -- jeremy s. anderson > > Shouldn't all /state/ information be held in the kernel in some form and exported via one of the virtual filesystems? (dev, proc, sysfs) This way if some userspace need exists the file can be read. If some kernel access is required it's already within known structures. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2010-02-06 21:46 ` martin f krafft @ 2010-02-08 15:32 ` Doug Ledford 2010-02-08 21:38 ` Neil Brown 1 sibling, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-02-08 15:32 UTC (permalink / raw) To: Dan Williams Cc: Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham [-- Attachment #1: Type: text/plain, Size: 2373 bytes --] On 02/06/2010 04:07 PM, Dan Williams wrote: > This comment makes me see Neil's argument in a different light, > (hopefully I am not mischaracterizing it), but essentially we are > waiting for the standards to catch up with this new class of program. > FUSE, CUSE, and mdmon belong to a class of programs that move > traditionally exclusive kernel space functionality to userspace. > Debian's /lib/init/rw looks to be a response to this grey area of the > standards (not that I have any familiarity with the LSB). So if we want to argue that the standards are simply behind the times, and we need to do something that makes sense regardless of the standards, then I don't think anything in /dev or /lib makes sense. The files that need to be created pre-rw-root are varied in their type and purpose between different things. What we really need is simply an early boot /tmp area. So, why not make a top level directory that clearly delineates this nature? Something like /pre-init or /early-tmp or whatever? Or possibly /tmp/pre-boot or /tmp/pre-init or /tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for possible standards acceptance later)? 
I was thinking that mdmon's files would be stuck there, but then I remembered that we are doing option #3 for mdmon, restarting after the system is up and running, so only the mdmon instances from the initramfs would put their files there, the final ones would be on the real /var/run area. So, since as far as I know the mdmon .sock files were the only pre-boot files that couldn't be moved later (but effectively get moved by restarting mdmon after r/w /var/run), any and all files in /tmp/pre-pivot-root should be removed once the system is up and running, and quite possibly the filesystem could be entirely done away with. At least then the naming would be to Neil's satisfaction I think, and mine. And personally, when the standards are simply behind the times, I have no problem blazing ahead and letting them catch up when they get off their asses. -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-08 15:32 ` Doug Ledford @ 2010-02-08 21:38 ` Neil Brown 2010-02-09 0:20 ` Michael Evans [not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 0 siblings, 2 replies; 66+ messages in thread From: Neil Brown @ 2010-02-08 21:38 UTC (permalink / raw) To: Doug Ledford Cc: Dan Williams, linux-raid, initramfs, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham On Mon, 08 Feb 2010 10:32:53 -0500 Doug Ledford <dledford@redhat.com> wrote: > On 02/06/2010 04:07 PM, Dan Williams wrote: > > > This comment makes me see Neil's argument in a different light, > > (hopefully I am not mischaracterizing it), but essentially we are > > waiting for the standards to catch up with this new class of program. > > FUSE, CUSE, and mdmon belong to a class of programs that move > > traditionally exclusive kernel space functionality to userspace. > > Debian's /lib/init/rw looks to be a response to this grey area of the > > standards (not that I have any familiarity with the LSB). > > So if we want to argue that the standards are simply behind the times, > and we need to do something that makes sense regardless of the > standards, then I don't think anything in /dev or /lib makes sense. The > files that need to be created pre-rw-root are varied in their type and > purpose between different things. What we really need is simply an > early boot /tmp area. So, why not make a top level directory that > clearly delineates this nature? Something like /pre-init or /early-tmp > or whatever? Or possibly /tmp/pre-boot or /tmp/pre-init or > /tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux > specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for > possible standards acceptance later)? 
I was thinking that mdmon's files > would be stuck there, but then I remembered that we are doing option #3 > for mdmon, restarting after the system is up and running, so only the > mdmon instances from the initramfs would put their files there, the > final ones would be on the real /var/run area. So, since as far as I > know the mdmon .sock files were the only pre-boot files that couldn't be > moved later (but effectively get moved by restarting mdmon after r/w > /var/run), any and all files in /tmp/pre-pivot-root should be removed > once the system is up and running, and quite possibly the filesystem > could be entirely done away with. At least then the naming would be to > Neil's satisfaction I think, and mine. And personally, when the > standards are simply behind the times, I have no problem blazing ahead > and letting them catch up when they get off their asses. > > That's the spirit!!! Let's figure out what we really want/need, and just do it. Following my recent discovery that mdmon prevents /var from being unmounted at shutdown, I wonder if we really want something generic that persists from very early boot to very late shutdown, rather than just the early-boot part. So something like /var/run, but not dependent on /var and guaranteed to be in-memory (or swap) and created very early by initramfs. /run ??? Trivial implementation for most distros would be to make it a symlink to /dev/run. I would prefer a name a little more descriptive than "/run" - something that reflects the idea that it is particularly for early-boot or late-shutdown - but nothing comes to mind. I could probably actually live with "/dev/run" as the permanent home for the mdmon files: /dev/run/mdmon/*.{sock,pid} It addresses most of the issues I had with the original suggestion (hidden files, non-generic approach) so the "cons" are weaker. And I now understand the "pros" better (races with cleaning /var/run, issues with unmounting /var etc). Anyone second the motion? 
NeilBrown ^ permalink raw reply [flat|nested] 66+ messages in thread
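For illustration, the /dev/run/mdmon/*.{sock,pid} layout suggested above amounts to one pid file and one socket per monitored array under a single run directory. A minimal sketch, assuming a hypothetical helper named mdmon_file_path() (it is not taken from the mdadm source) and treating the run directory as a parameter, since the thread has not settled on one:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from mdadm): build "<rundir>/<devname>.<suffix>"
 * into `buf`, e.g. "/dev/run/mdmon/md127.pid".
 * Returns 0 on success, -1 if the result does not fit in `len` bytes.
 */
static int mdmon_file_path(char *buf, size_t len, const char *rundir,
                           const char *devname, const char *suffix)
{
    int r = snprintf(buf, len, "%s/%s.%s", rundir, devname, suffix);

    return (r < 0 || (size_t)r >= len) ? -1 : 0;
}
```

Keeping the directory as an argument means the same code would serve /dev/run/mdmon, /var/run/mdmon, or any other location a distro picks.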
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-08 21:38 ` Neil Brown @ 2010-02-09 0:20 ` Michael Evans [not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 1 sibling, 0 replies; 66+ messages in thread From: Michael Evans @ 2010-02-09 0:20 UTC (permalink / raw) To: Neil Brown Cc: Doug Ledford, Dan Williams, linux-raid, initramfs, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham On Mon, Feb 8, 2010 at 1:38 PM, Neil Brown <neilb@suse.de> wrote: > On Mon, 08 Feb 2010 10:32:53 -0500 > Doug Ledford <dledford@redhat.com> wrote: > >> On 02/06/2010 04:07 PM, Dan Williams wrote: >> >> > This comment makes me see Neil's argument in a different light, >> > (hopefully I am not mischaracterizing it), but essentially we are >> > waiting for the standards to catch up with this new class of program. >> > FUSE, CUSE, and mdmon belong to a class of programs that move >> > traditionally exclusive kernel space functionality to userspace. >> > Debian's /lib/init/rw looks to be a response to this grey area of the >> > standards (not that I have any familiarity with the LSB). >> >> So if we want to argue that the standards are simply behind the times, >> and we need to do something that makes sense regardless of the >> standards, then I don't think anything in /dev or /lib makes sense. The >> files that need to be created pre-rw-root are varied in their type and >> purpose between different things. What we really need is simply an >> early boot /tmp area. So, why not make a top level directory that >> clearly delineates this nature? Something like /pre-init or /early-tmp >> or whatever? Or possibly /tmp/pre-boot or /tmp/pre-init or >> /tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux >> specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for >> possible standards acceptance later)? 
I was thinking that mdmon's files >> would be stuck there, but then I remembered that we are doing option #3 >> for mdmon, restarting after the system is up and running, so only the >> mdmon instances from the initramfs would put their files there, the >> final ones would be on the real /var/run area. So, since as far as I >> know the mdmon .sock files were the only pre-boot files that couldn't be >> moved later (but effectively get moved by restarting mdmon after r/w >> /var/run), any and all files in /tmp/pre-pivot-root should be removed >> once the system is up and running, and quite possibly the filesystem >> could be entirely done away with. At least then the naming would be to >> Neil's satisfaction I think, and mine. And personally, when the >> standards are simply behind the times, I have no problem blazing ahead >> and letting them catch up when they get off their asses. >> >> > > That's the spirit!!! > Let's figure out what we really want/need, and just do it. > > Following my recent discovery that mdmon prevents /var from being unmounted > at shutdown, I wonder if we really want something generic that persists from > very early boot to very late shutdown, rather than just the early-boot part. > So something like /var/run, but not dependent on /var and guaranteed to be > in-memory (or swap) and created very early by initramfs. > > /run > ??? > Trivial implementation for most distros would be to make it a symlink > to /dev/run. > > I would prefer a name a little more descriptive than "/run" - something that > reflects the idea that it is particularly for early-boot or late-shutdown - > but nothing comes to mind. > > I could probably actually live with "/dev/run" as the permanent home for the > mdmon files: /dev/run/mdmon/*.{sock,pid} > It addresses most of the issues I had with the original suggestion (hidden > files, non-generic approach) so the "cons" are weaker. 
And I now understand > the "pros" better (races with cleaning /var/run, issues with unmounting /var > etc). > > Anyone second the motion? > > NeilBrown > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > What about systems that have only devices known at compile/create time and thus might be created with a fully static /dev for extra simplicity. We should not simply expect that /dev is read-write as a system requirement. This is one reason why my previous solutions suggested using a known area and symlinking in an implementation defined way that mdadm/mdmon didn't need to know about. Maybe a good name for it is '/state' as in system state information. It would be reasonable to expect it to be a ram/swap backed filesystem for SMALL files to exist as a user-space state area for various daemons and such. However any information in there which is also potentially useful to in-kernel code should probably be re-located to an entry exposed via sysfs. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 66+ messages in thread
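[Editorial sketch, not part of the original message.] The point above, that mdadm/mdmon should not have to assume /dev is writable, could be handled by probing a short list of candidate directories and using the first one that is usable. The following is a minimal sketch under that assumption; `pick_state_dir` and the candidate paths in the usage note are hypothetical names chosen for illustration, not anything mdadm actually implements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return the first candidate that is an existing directory which the
 * caller may write to and search, or NULL if none qualifies.
 * stat() follows symlinks, so a distro-provided symlink (for example
 * a hypothetical /state -> /dev/state) is accepted transparently.
 */
const char *pick_state_dir(const char *const *candidates, size_t n)
{
	assert(candidates != NULL);

	for (size_t i = 0; i < n; i++) {
		struct stat st;

		if (stat(candidates[i], &st) != 0)
			continue;	/* missing, or a dangling symlink */
		if (!S_ISDIR(st.st_mode))
			continue;	/* exists but is not a directory */
		if (access(candidates[i], W_OK | X_OK) == 0)
			return candidates[i];
	}
	return NULL;
}
```

A caller might pass something like `{ "/state", "/dev/run", "/var/run" }` (all assumed names from this thread) and refuse to start if the result is NULL, which keeps the choice of location a distribution decision rather than something compiled into the daemon.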
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2010-02-09 2:19 ` martin f krafft [not found] ` <20100209021949.GB11780-0owbi4v4jRjYceiJAzDLgeTW4wlIGRCZ@public.gmane.org> 2010-02-09 20:30 ` Doug Ledford 1 sibling, 1 reply; 66+ messages in thread From: martin f krafft @ 2010-02-09 2:19 UTC (permalink / raw) To: Neil Brown Cc: Doug Ledford, Dan Williams, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede, Bill Nottingham [-- Attachment #1: Type: text/plain, Size: 1961 bytes --] also sprach Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> [2010.02.09.1038 +1300]: > I could probably actually live with "/dev/run" as the permanent > home for the mdmon files: /dev/run/mdmon/*.{sock,pid} It > addresses most of the issues I had with the original suggestion > (hidden files, non-generic approach) so the "cons" are weaker. > And I now understand the "pros" better (races with cleaning > /var/run, issues with unmounting /var etc). Note that initramfs already carries /dev across the pivot_root, and initramfs already uses /dev/.initramfs to carry stuff across. I am not sure /dev/run will fly past the Debian Police. On the other hand, it would be convenient, since it'll work out-of-the-box, at least on Debian systems. I don't really like the idea of a symlink in / though. Nor do I really have a better idea. > Anyone second the motion? I am all for finding a solution that works, but I don't think it's as easy as "the standards are slow, so let's just forge ahead with mdadm only and give them something to standardise". I wouldn't mind avoiding all the bikeshedding, and maybe it'll just work, but having to change things later might possibly be a lot of trouble — after all, we don't want to break people's systems then. On the other hand, this is something that is reinitialised on every boot, isn't it? 
If that's the case and there don't seem to be complications with a later move, then I say: yeah, let's go ahead.

-- 
 .''`.   martin f. krafft <madduck@d.o>      Related projects:
: :'  :  proud Debian developer               http://debiansystem.info
`. `'`   http://people.debian.org/~madduck    http://vcs-pkg.org
  `-  Debian - when you have better things to do than fixing systems

"when a gentoo admin tells me that the KISS principle is good for 'busy sysadmins', and that it's not an evolutionary step backwards, i wonder whether their tape is already running backwards."

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <20100209021949.GB11780-0owbi4v4jRjYceiJAzDLgeTW4wlIGRCZ@public.gmane.org> @ 2010-02-09 20:34 ` Doug Ledford [not found] ` <4B71C6CA.3010407-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-02-09 20:34 UTC (permalink / raw) To: Neil Brown, Dan Williams, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede [-- Attachment #1: Type: text/plain, Size: 2408 bytes --] On 02/08/2010 09:19 PM, martin f krafft wrote: > also sprach Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> [2010.02.09.1038 +1300]: >> I could probably actually live with "/dev/run" as the permanent >> home for the mdmon files: /dev/run/mdmon/*.{sock,pid} It >> addresses most of the issues I had with the original suggestion >> (hidden files, non-generic approach) so the "cons" are weaker. >> And I now understand the "pros" better (races with cleaning >> /var/run, issues with unmounting /var etc). > > Note that initramfs already carries /dev across the pivot_root, and > initramfs already uses /dev/.initramfs to carry stuff across. And the things carried in there should be able to be trivially moved to a final location. > I am not sure /dev/run will fly past the Debian Police. On the other > hand, it would be convenient, since it'll work out-of-the-box, at > least on Debian systems. I don't really like the idea of a symlink > in / though. Nor do I really have a better idea. Pursuant to the comments that this should work even if /dev is not read/write, it really needs to officially be a top-level directory (or else some other mount point that is separate from /dev, I think; I guess it could be in /tmp itself). >> Anyone second the motion? 
> > I am all for finding a solution that works, but I don't think it's > as easy as "the standards are slow, so let's just forge ahead with > mdadm only and give them something to standardise". > > I wouldn't mind avoiding all the bikeshedding, and maybe it'll just > work, but having to change things later might possibly be a lot of > trouble — after all, we don't want to break people's systems then. I don't think so. Once it's all set up, any future change should be no more than a coordinate package update cycle where initscripts, mkinitrd, dracut, and a few other select packages that use the locations are all updated simultaneously. > On the other hand, this is something that is reinitialised on every > boot, isn't it? If that's the case and there don't seem to be > complications with a later move, then I say: yeah, let's go ahead. > -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <4B71C6CA.3010407-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2010-02-10 0:58 ` Mr. James W. Laferriere [not found] ` <alpine.LNX.2.01.1002091553580.10004-pIN9qAC4yfKseEBmXaVrNB5FPEiCeG3sAL8bYrjMMd8@public.gmane.org> 0 siblings, 1 reply; 66+ messages in thread From: Mr. James W. Laferriere @ 2010-02-10 0:58 UTC (permalink / raw) To: Doug Ledford Cc: Neil Brown, Dan Williams, linux-raid maillist, initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede, Bill Nottingham Hello Doug , On Tue, 9 Feb 2010, Doug Ledford wrote: > On 02/08/2010 09:19 PM, martin f krafft wrote: ...snip... >> I am all for finding a solution that works, but I don't think it's >> as easy as "the standards are slow, so let's just forge ahead with >> mdadm only and give them something to standardise". >> >> I wouldn't mind avoiding all the bikeshedding, and maybe it'll just >> work, but having to change things later might possibly be a lot of >> trouble -- after all, we don't want to break people's systems then. > > I don't think so. Once it's all set up, any future change should be no > more than a coordinate package update cycle where initscripts, mkinitrd, > dracut, and a few other select packages that use the locations are all > updated simultaneously. The key words in the above are: 'select packages' & 'simultaneously' . How is the community of linux distributors going to accomplish this ? Heck some of the distributors don't even use the same names for packages that do exactly the same thing . The simultaneity is going to hurt a lot more than is mentioned here . But that said , the idea of a /'name' area for this is imo a very good thing . Rather than hiding it below others . Tia , JimL --
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network&System Engineer | 3237 Holden Road | Give me Linux |
| babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org | Fairbanks, AK. 99709 | only on AXP |
+------------------------------------------------------------------+
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <alpine.LNX.2.01.1002091553580.10004-pIN9qAC4yfKseEBmXaVrNB5FPEiCeG3sAL8bYrjMMd8@public.gmane.org> @ 2010-02-10 1:33 ` Neil Brown 2010-02-10 9:46 ` Harald Hoyer [not found] ` <20100210123321.324e5de6-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 0 siblings, 2 replies; 66+ messages in thread From: Neil Brown @ 2010-02-10 1:33 UTC (permalink / raw) To: Mr. James W. Laferriere Cc: Doug Ledford, Dan Williams, linux-raid maillist, initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede, Bill Nottingham On Tue, 9 Feb 2010 15:58:52 -0900 (AKST) "Mr. James W. Laferriere" <babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org> wrote: > Hello Doug , > > On Tue, 9 Feb 2010, Doug Ledford wrote: > > On 02/08/2010 09:19 PM, martin f krafft wrote: > ...snip... > >> I am all for finding a solution that works, but I don't think it's > >> as easy as "the standards are slow, so let's just forge ahead with > >> mdadm only and give them something to standardise". > >> > >> I wouldn't mind avoiding all the bikeshedding, and maybe it'll just > >> work, but having to change things later might possibly be a lot of > >> trouble ? after all, we don't want to break people's systems then. > > > > I don't think so. Once it's all set up, any future change should be no > > more than a coordinate package update cycle where initscripts, mkinitrd, > > dracut, and a few other select packages that use the locations are all > > updated simultaneously. > > The key words in the above are: 'select packages' & 'simultaneously' . > How is the community of linux distributors going to accomplish this ? > Heck some of the distributors don't even use the same names for packages > that do exactly the same thing . > > The Simultainity is going to hurt alot more than is mentioned here . Simultaneity only needs to be within one host, not across all distros. I don't think it should be that hard to manage. 
> > But that said , the idea of a /'name' area for this is imo a very good > thing . Rather hiding it below others . Thanks. One idea that has occurred to me is that maybe /sys is the right place to put this stuff!!! If only sysfs directories could be writeable, I could write the pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a similar name. I could of course get the md module to create a file called "mdmon.pid" and allow it to be read and written much like a normal file. But I don't think I want to do that - and I couldn't use that solution for the socket in any case. Not a short-term solution, but something to keep in mind longer-term maybe... NeilBrown ^ permalink raw reply [flat|nested] 66+ messages in thread
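[Editorial sketch, not part of the original message.] Whatever directory is finally chosen, the per-array files named above (`/dev/run/mdmon/*.{sock,pid}`) come down to composing `<dir>/<devname>.<ext>` and, for the socket, making sure the result fits in `sun_path`, which is a fixed-size array and therefore limits how deep the directory can be. A minimal sketch, assuming the hypothetical helper names `state_path` and `state_sock_addr` (these are not mdmon's real functions):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Build "<dir>/<devname>.<ext>" into buf.
 * Returns 0 on success, -1 if the result would not fit in len bytes.
 */
int state_path(char *buf, size_t len, const char *dir,
	       const char *devname, const char *ext)
{
	int n;

	assert(buf && dir && devname && ext);
	n = snprintf(buf, len, "%s/%s.%s", dir, devname, ext);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Fill in an AF_UNIX address for the per-array control socket.
 * Fails (-1) when the path is too long for sun_path, so a very
 * deep state directory is rejected up front instead of truncated.
 */
int state_sock_addr(struct sockaddr_un *addr, const char *dir,
		    const char *devname)
{
	assert(addr != NULL);
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	return state_path(addr->sun_path, sizeof(addr->sun_path),
			  dir, devname, "sock");
}
```

Keeping the directory a parameter means the same code works whether the files end up under /dev/run, a top-level tmpfs, or /var/run after a restart; only the caller's choice changes.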
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-10 1:33 ` Neil Brown @ 2010-02-10 9:46 ` Harald Hoyer [not found] ` <20100210123321.324e5de6-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 1 sibling, 0 replies; 66+ messages in thread From: Harald Hoyer @ 2010-02-10 9:46 UTC (permalink / raw) To: Neil Brown Cc: Mr. James W. Laferriere, Doug Ledford, Dan Williams, linux-raid maillist, initramfs, Michal Marek, Hans de Goede, Bill Nottingham On 02/10/2010 02:33 AM, Neil Brown wrote: > One idea that has occurred to me is that maybe /sys is the right place to put > this stuff!!! If only sysfs directories could be writeable, I could write the > pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a > similar name. Another idea might also be /dev/shm/ which is also writable.. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <20100210123321.324e5de6-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2010-02-10 15:49 ` Dan Williams 2010-02-10 16:06 ` Michael Evans 0 siblings, 1 reply; 66+ messages in thread From: Dan Williams @ 2010-02-10 15:49 UTC (permalink / raw) To: Neil Brown Cc: Mr. James W. Laferriere, Doug Ledford, linux-raid maillist, initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede, Bill Nottingham On Tue, Feb 9, 2010 at 6:33 PM, Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> wrote: > On Tue, 9 Feb 2010 15:58:52 -0900 (AKST) > "Mr. James W. Laferriere" <babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org> wrote: >> >> But that said , the idea of a /'name' area for this is imo a very good >> thing . Rather hiding it below others . > > Thanks. > > One idea that has occurred to me is that maybe /sys is the right place to put > this stuff!!! If only sysfs directories could be writeable, I could write the > pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a > similar name. Hmm... we already have /sys/kernel/debug as a simple mount point for debugfs. What about adding /sys/kernel/init as a mount point for this tmpfs? -- Dan ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-10 15:49 ` Dan Williams @ 2010-02-10 16:06 ` Michael Evans [not found] ` <4877c76c1002100806w66e504deg767f6ecc8cc7fa8a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 66+ messages in thread From: Michael Evans @ 2010-02-10 16:06 UTC (permalink / raw) To: Dan Williams Cc: Neil Brown, Mr. James W. Laferriere, Doug Ledford, linux-raid maillist, initramfs, Michal Marek, Hans de Goede, Bill Nottingham On Wed, Feb 10, 2010 at 7:49 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, Feb 9, 2010 at 6:33 PM, Neil Brown <neilb@suse.de> wrote: >> On Tue, 9 Feb 2010 15:58:52 -0900 (AKST) >> "Mr. James W. Laferriere" <babydr@baby-dragons.com> wrote: >>> >>> But that said , the idea of a /'name' area for this is imo a very good >>> thing . Rather hiding it below others . >> >> Thanks. >> >> One idea that has occurred to me is that maybe /sys is the right place to put >> this stuff!!! If only sysfs directories could be writeable, I could write the >> pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a >> similar name. > > Hmm... we already have /sys/kernel/debug as a simple mount point for > debugfs. What about adding /sys/kernel/init as a mount point for this > tmpfs? > > -- > Dan > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Except that isn't quite accurate is it? This is less to do with init for the kernel and more to do with various pieces of system state information. /sys/early_rw Isn't very descriptive, but might make sense. It also might not quite be what we want to mean, as the files in it could also linger past root unmount as the system is brought down. 
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <4877c76c1002100806w66e504deg767f6ecc8cc7fa8a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-02-11 2:30 ` Doug Ledford 0 siblings, 0 replies; 66+ messages in thread From: Doug Ledford @ 2010-02-11 2:30 UTC (permalink / raw) To: Michael Evans Cc: Dan Williams, Neil Brown, Mr. James W. Laferriere, linux-raid maillist, initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede, Bill Nottingham [-- Attachment #1: Type: text/plain, Size: 4105 bytes --] On 02/10/2010 11:06 AM, Michael Evans wrote: > On Wed, Feb 10, 2010 at 7:49 AM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote: >> On Tue, Feb 9, 2010 at 6:33 PM, Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> wrote: >>> On Tue, 9 Feb 2010 15:58:52 -0900 (AKST) >>> "Mr. James W. Laferriere" <babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org> wrote: >>>> >>>> But that said , the idea of a /'name' area for this is imo a very good >>>> thing . Rather hiding it below others . >>> >>> Thanks. >>> >>> One idea that has occurred to me is that maybe /sys is the right place to put >>> this stuff!!! If only sysfs directories could be writeable, I could write the >>> pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a >>> similar name. >> >> Hmm... we already have /sys/kernel/debug as a simple mount point for >> debugfs. What about adding /sys/kernel/init as a mount point for this >> tmpfs? >> >> -- >> Dan >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > Except that isn't quite accurate is it? This is less to do with init > for the kernel and more to do with various pieces of system state > information. > > /sys/early_rw > > Isn't very descriptive, but might make sense. 
It also might not quite > be what we want to mean, as the files in it could also linger past > root unmount as the system is brought down. Well, *no* name really fits well so far. The fact of the matter is that we are talking about lots of different types of files, and different lifetimes for those files. E.g.: the dhcp lease file only needs to be there until it is moved after root is mounted; the mdmon.sock file can't be moved, but when mdmon is restarted it will get a new file in the proper location (unless we use this location on restart too in order to avoid shutdown issues, although I'm not convinced we need to do this; it seems to me we could just as easily switch to using remount ro as the norm instead of umount, and problem solved); the mdmon.pid file exists so we know what processes to restart; and there are other files too that I'm not so familiar with. The only thing all these files have in common is that they violate a core tenet of unix philosophy/prior art. Specifically, the concept of everything as a file in unix means that the unix kernel is not really functional without a filesystem. Hence why unix never booted into a basic interpreter without a disk, but instead always panicked. But, in the past, old time unix kernels always brought up the root filesystem before doing anything else. That is no longer true, and we are struggling to access our root filesystem to create files when the real root filesystem does not yet exist. That is the one thing all of these files have in common: they are being created before the kernel is ready to deal with files properly. So since this is specifically a kernel-not-ready thing, I think /sys/kernel makes sense. Then I would suggest naming whatever we put in there according to this one common trait. I could see /sys/kernel/pre-init-tmp (or ptmp for short). 
If someone wanted to do some neat kernel programming, maybe we could make /sys/kernel/early-root and allow programs to create files in there as well as directory hierarchies, and maybe add a syscall that would actually move all the files in here to the real root sometime after pivot root and read-write bring-up are complete (that would just be cool...no manually moving files, just bring root up r/w, clean out /var/run and any other cleanups we do before proceeding, then do this syscall and get things moved from early-root to the real root). Anyway, my $.02. -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2010-02-09 2:19 ` martin f krafft @ 2010-02-09 20:30 ` Doug Ledford 1 sibling, 0 replies; 66+ messages in thread From: Doug Ledford @ 2010-02-09 20:30 UTC (permalink / raw) To: Neil Brown Cc: Dan Williams, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham [-- Attachment #1: Type: text/plain, Size: 4752 bytes --] On 02/08/2010 04:38 PM, Neil Brown wrote: > On Mon, 08 Feb 2010 10:32:53 -0500 > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > >> On 02/06/2010 04:07 PM, Dan Williams wrote: >> >>> This comment makes me see Neil's argument in a different light, >>> (hopefully I am not mischaracterizing it), but essentially we are >>> waiting for the standards to catch up with this new class of program. >>> FUSE, CUSE, and mdmon belong to a class of programs that move >>> traditionally exclusive kernel space functionality to userspace. >>> Debian's /lib/init/rw looks to be a response to this grey area of the >>> standards (not that I have any familiarity with the LSB). >> >> So if we want to argue that the standards are simply behind the times, >> and we need to do something that makes sense regardless of the >> standards, then I don't think anything in /dev or /lib makes sense. The >> files that need to be created pre-rw-root are varied in their type and >> purpose between different things. What we really need is simply an >> early boot /tmp area. So, why not make a top level directory that >> clearly delineates this nature? Something like /pre-init or /early-tmp >> or whatever? 
Or possibly /tmp/pre-boot or /tmp/pre-init or >> /tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux >> specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for >> possible standards acceptance later)? I was thinking that mdmon's files >> would be stuck there, but then I remembered that we are doing option #3 >> for mdmon, restarting after the system is up and running, so only the >> mdmon instances from the initramfs would put their files there, the >> final ones would be on the real /var/run area. So, since as far as I >> know the mdmon .sock files were the only pre-boot files that couldn't be >> moved later (but effectively get moved by restarting mdmon after r/w >> /var/run), any and all files in /tmp/pre-pivot-root should be removed >> once the system is up and running, and quite possibly the filesystem >> could be entirely done away with. At least then the naming would be to >> Neil's satisfaction I think, and mine. And personally, when the >> standards are simply behind the times, I have no problem blazing ahead >> and letting them catch up when they get off their asses. >> >> > > That's the spirit!!! > Let's figure out what we really want/need, and just do it. > > Following my recent discovery that mdmon prevents /var from being unmounted > at shutdown, I wonder if we really want something generic that persists from > very early boot to very late shutdown, rather than just the early-boot part. > So something like /var/run, but not dependent on /var and guaranteed to be > in-memory (or swap) and created very early by initramfs. > > /run > ??? > Trivial implementation for most distros would be to make it a symlink > to /dev/run. > > I would prefer a name a little more descriptive than "/run" - something that > reflects the idea that it is particularly for early-boot or late-shutdown - > but nothing comes to mind. 
> > I could probably actually live with "/dev/run" as the permanent home for the > mdmon files: /dev/run/mdmon/*.{sock,pid} > It addresses most of the issues I had with the original suggestion (hidden > files, non-generic approach) so the "cons" are weaker. And I now understand > the "pros" better (races with cleaning /var/run, issues with unmounting /var > etc). > > Anyone second the motion? I second the idea, but I hate the name run. Mainly because it's not really descriptive to the issue at hand. I mean, everything needs to run, the part that's different about all of this is that it needs to run *pre-root-filesystem-available*. If we were to stick with unix tradition of being short and cryptic (but make sense when explained), then /ptmp -> /dev/ptmp might work with the explanation being that it's a pre-init temporary file area (or pre-root-filesystem temporary file area). Of course, the /ptmp -> /dev/ptmp would only be if we are using the dev filesystem for simplicity's sake. However, as someone else mentioned, realistically if we want this to be accepted, it should not be dependent on udev and a tmpfs based dev directory, it should stand on its own. Meaning that it *should* be its own minor tmpfs filesystem mounted at /ptmp. But maybe that's something to work toward versus a first step. -- Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2010-02-06 21:07 ` Dan Williams @ 2010-02-08 4:23 ` Neil Brown 1 sibling, 0 replies; 66+ messages in thread From: Neil Brown @ 2010-02-08 4:23 UTC (permalink / raw) To: Doug Ledford Cc: Dan Williams, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham On Sat, 06 Feb 2010 12:51:02 -0500 Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > I will say that needing to touch multiple software packages might not be > a bad thing, but think of *how* they had to be changed. We had to add > special exceptions for mdmon all over the place: kernel scheduler (for > suspend/resume, mdmon can't be frozen like the rest of user space or > else writing our suspend to disk image doesn't work), initramfs, > initscripts after initramfs, initscripts on halt, SELinux. In all these > cases, we had to take something that we want to keep simple and add > special case rules and exceptions for mdmon. That pretty solidly says > that while this arrangement may have been elegant for *you*, it was not > elegant in the overall grand scheme of things. > or it just means we are breaking new ground :-) The suspend/resume issue you bring up is an important one and to my mind is currently unsolved. Based on my limited understanding of hibernation, I think that mdmon should be to quiesce (but not actually be frozen) prior to taking the in-memory snapshot, then thawed prior to writing that snapshot out to disk. Further when it is thawed after resume-from-disk it needs to know it has been thawed so it can check the metadata on-disk to see if any failure happened while it slept. Similar thing would be needed for suspend through fuse. Do you know exactly what was done to the scheduler in redhat? 
NeilBrown ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot [not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2010-02-04 23:04 ` Dan Williams @ 2010-02-07 22:13 ` Hans de Goede 2010-02-07 23:06 ` Neil Brown 1 sibling, 1 reply; 66+ messages in thread From: Hans de Goede @ 2010-02-07 22:13 UTC (permalink / raw) To: Doug Ledford Cc: Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA, initramfs-u79uwXL29TY76Z2rM5mHXA, Dan Williams, martin f krafft, Michal Marek, Bill Nottingham Hi All, On 02/04/2010 07:45 PM, Doug Ledford wrote: > On 02/04/2010 01:40 AM, Neil Brown wrote: >> <snip> >> Because we want to unmount and completely discard the filesystem that holds >> the mdmon binary that was run early, we need to kill it and start a new one >> running from final namespace. This is also needed as to a small extent the >> filesystem is used to communicate between mdadm and a running mdmon, and >> having them have the same root is less confusing. >> >> There are three ways we can achieve this. >> >> 1/ If we can assume that between the time when the original "mount" completes >> and when the "mount -o remount,rw" happens the filesystem doesn't write to >> the device, then we can simply kill mdmon after the root is mounted, and >> restart it before remounting. However I don't trust filesystem >> implementers so I won't recommend that. >> >> 2/ Before the pivot root we can kill the old mdmon and start the new one >> chrooted into the final root. >> 3/ After the pivot root we can kill the old mdmon and start the new one. >> >> Number 2 is the approach that we (Well mostly Dan) originally intended and >> that the code implements ... or tries to. It got broken and I never >> noticed. I think I have fixed it now for 3.1.2. > > Note, as I recall, Hans switched things to be #3 for various reasons. 
> That he switched it to #3 doesn't effect mdmon really, as it still is > just killing and restarting, but doing it after the pivot root solved a > couple issues. I don't recall what they were, you would have to talk to > Hans about that. > The reason I made this change is that, although the mdmon takeover mechanism was designed to be used as in 2., at the time I was integrating this code into Fedora and tying all the bits together, the mdmon code for doing 2. was very, very broken. Back then I sent Dan a long list of issues with it, which I believe are all fixed now. But option 3. just worked from the time I integrated it and has kept working, so I've never seen a need to switch things back to 2. again. Given that 2. requires all kinds of trickery and is hard to get right, whereas 3. is pretty easy to get right and much less prone to break (regress), I think that staying with 3. is a good solution / decision. As for the whole question of where to store the mdmon .pid and .sock files, my 2 cents is that /dev is the only dir where a socket file (which cannot be moved across filesystems) can be made in the initramfs and still be accessible from the real root. Other things like /lib/whythefuckputthisinslashlib/rw can only be implemented by: 1) adding a second tmpfs which stays alive after the chroot to the real root. 2) symlinks which need to be present on both the real root and the initramfs, with the big problem being ensuring from the initramfs that they are there on the read-only root fs. Both of which are needlessly complicated and fragile. So as far as I'm concerned Fedora and the next RHEL will have these files under /dev. And if upstream does not want this, then we will just keep patching mdadm / mdmon to do this till the end of time. Note that /dev is already (ab)used in the same way for passing dhcp leases from the initramfs to the running system when / lives on a network device, and a few other state things which need to be passed between the initramfs and the real root. 
Pretty? No but effective and simple, and anytime you have this state passing problem the most likely solution you will end up with, because it is KISS and KISS is good. Regards, Hans ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-07 22:13 ` Hans de Goede @ 2010-02-07 23:06 ` Neil Brown 0 siblings, 0 replies; 66+ messages in thread From: Neil Brown @ 2010-02-07 23:06 UTC (permalink / raw) To: Hans de Goede Cc: Doug Ledford, linux-raid, initramfs, Dan Williams, martin f krafft, Michal Marek, Bill Nottingham On Sun, 07 Feb 2010 23:13:49 +0100 Hans de Goede <hdegoede@redhat.com> wrote: > Both of which is needlessly complicated and fragile. So as for as I'm concerned > Fedora and the next RHEL will have these files under /dev. And if upstream > does not want this, then we will just keep patching mdadm / mdmon to do this > till the end of time. Note that /dev is already (ab)used in the same way > for passing dhcp leases from the initramfs to the running system when / lives > on a network device, and a few other state things which need to be passed > between the initramfs and the real root. > > Pretty? No but effective and simple, and anytime you have this state passing > problem the most likely solution you will end up with, because it is > KISS and KISS is good. You admit that /dev is being abused, yet you seem proud of it. Odd. Maybe I misunderstand. Your dhcp lease example is perfect (thanks!) for demonstrating that something is needed beyond devices. i.e. some sort of generic place to pass files from 'before' and 'after' pivot root is needed. The thing I like about /lib/init/rw is that it is clearly admitting this need and trying to address it. I have no particular attachment to the name (and would much rather use /var/run!) but it is the honesty and forward thinking that I like. By contrast /dev/.udev seems dishonest (as it tries to hide) and not forward thinking (as it appear to be udev specific). If it was /dev/udev or /dev/UDEV it would be better. If it was /dev/RUN/udev it would be better still. 
Though I would really like the carry-over filesystem to be /init and to contain 'dev' and 'var/run' and anything else needed, and after pivotroot, the interesting parts are bind mounted to their final home.
    mount --bind /init/dev /dev
Yes. "Keep it simple" is very important. So is being generic and forward-looking. I haven't seen much evidence of being forward looking in the various suggestions and reference examples that have been put forward. Yes, the only difference among a lot of the options put forward is the name of a directory. Are names really so important? Emphatically YES. They guide the way people think. Bad names confuse people, good names educate people. So my leaning is still to default to the best name currently available which seems to be /lib/init/rw, and to make it easy to choose a distro-specific name at compile time. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-04 18:45 ` Doug Ledford [not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2010-02-08 3:45 ` Neil Brown 2010-02-08 16:56 ` Bill Nottingham 1 sibling, 1 reply; 66+ messages in thread From: Neil Brown @ 2010-02-08 3:45 UTC (permalink / raw) To: Doug Ledford Cc: linux-raid, initramfs, Dan Williams, martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham On Thu, 04 Feb 2010 13:45:07 -0500 Doug Ledford <dledford@redhat.com> wrote: > On 02/04/2010 01:40 AM, Neil Brown wrote: > > > > [cc:ing initramfs because anther part of this thread was already > > cc:ed there, but this is the one I wanted to reply to. > > cc:ed to various md/mdadm maintainers too] > > > > On Tue, 19 Jan 2010 12:51:52 -0500 > > Doug Ledford <dledford@redhat.com> wrote: > > > >> On 01/18/2010 05:09 PM, Neil Brown wrote: > >>> On Mon, 11 Jan 2010 15:38:11 -0500 > >>> Doug Ledford <dledford@redhat.com> wrote: > >>> > >>>> Signed-off-by: Doug Ledford <dledford@redhat.com> > >>> > >>> I really really don't like this. > >>> I wasn't very keen on allowing the map file to be found in /dev, > >>> but this it just too ugly. > >> > >> I've had to rewrite my response to this a few times :-/ > >> > >> So, let's be clear: you are objecting to these non device special files > >> being located under /dev. Not necessarily *where* they are under /dev, > >> just that they are under /dev at all. That's what I get from your > >> statement above. > >> > >> First with devfs, then later with udev, the old unix tradition of only > >> device special files under /dev is truly dead. And it should be. The > >> files we are creating are needed prior to / filesystem bring up, and > >> they are needed simply in order to fully populate /dev. 
In fact, an > >> argument can be made that a new tradition, that files related to the > >> creation and maintenance of device special files belong under /dev with > >> the files they relate to, has been created. And this new tradition > >> makes sense and is elegant on the basis that it requires only one > >> read/write filesystem mount point during device special file population. > >> It also makes sense that this new tradition would supersede the old > >> tradition on the basis that the old tradition was created prior to the > >> advent of hot plug and the need to have any read/write data just to > >> populate your device special files. The old tradition didn't have the > >> flexibility to deal with modern hot plug architectures, the new > >> tradition fixes that, and does so as elegantly as possible. > >> > >> That being the case, the big player in the game, udev, is following the > >> new tradition by creating an entire tree of non device special files > >> under /dev/.udev and using that to store the information it needs. And > >> here mdadm/mdmon are, the small players in the device bring up game that > >> only have minor bit parts compared to udev, holding up progress and > >> playing the recalcitrant old fart. Sorry Neil, but the war has already > >> been decided and this is a dead battle. Files related to device special > >> file bring up belong under /dev along with the files we are creating. > >> Your claim that these changes are ugly are misplaced and based upon > >> adherence to a dead tradition that has been replaced by a more sensible > >> tradition. Maybe you don't like where they are under /dev, but the fact > >> that they are under /dev is definitely the right thing to do and is not > >> in the least bit ugly. > >> > >>> I understand there is a problem here, but I don't like this approach to a > >>> solution. I'll give it more though when I get home from LCA2010 and see > >>> what I can come up with. 
> >> > >> Feel free to come up with something different. But, if your solution > >> involves maintaining an additional read/write mount area in deference to > >> a long dead unix tradition, I'm just going to shake my head and patch > >> your solution away to something sane. > >> > > > > So I've had a good long think about this. > > > > Your arguments about using /dev do have some merit. However they sound more > > like post-hoc justification then genuine motivation. > > If the train of thought went: > > I need some files that are related to device management. Where shall I > > put them? I know, I'll put them in /dev. > > then it would be more convincing. But the logic actually went: > > I need some files to persist from early boot through to when the system > > has all basic filesystems mounted. Where shall I put them? I know, I'll > > put them in /dev. > > That sounds a lot less convincing. > > To be fair, if post-hoc versus initial made any difference what so ever, > then so would the fact that I wouldn't have chosen to have these files > exist at all. I would have made incremental assembly work without a map > file and I would have made imsm superblock handling be in the kernel. > So, I'm dealing with the consequences of decisions I didn't make and > wouldn't have made. I don't think it's then fair to put some sort of > 'premeditated' versus 'dealing with the situation' bias on my response. > > > Given that chain of thought I would be more likely to come to the conclusion > > "I know, I'll put them in /lib/init/rw". Or at least I would on Debian - > > I don't know that any non-Debian-derived distros support that directory. > > I have no idea. Not one of the files in question belongs there any more > than in /dev or anywhere else for that matter though, so I wouldn't come > to that conclusion in your shoes. 
But I find it somewhat disheartening > to hear you disparage my choice to put the files in /dev because "I just > wanted someplace to throw them" and then you would suggest /lib/init/rw I think names are really important. If you were suggesting /dev/init/rw I wouldn't be able to suggest that /lib/init/rw is any better. But I think it is better than /dev/. > > But there is still a problem that needs to be solved. > > > > mdmon needs to be running before any a certain class of md arrays (those with > > user-space managed metadata) can be written to. Because some filesystems > > choose to write to the device even when the filesystem is mounted read-only > > (which should be a hanging offence, but isn't yet) > > Just to sidestep a second on the filesystem issue, there are only two > choices when it comes to filesystems: allow them to be mounted read only > (truly read only) and inconsistent or pseudo read only (where the > filesystem itself is the only thing that writes to the filesystem) and > be able to guarantee consistency. The only way for a journaled > filesystem to provide the guarantee it does is that it writes to the > device during mount even if its a read only mount. This is because they > guarantee to always be able to *restore* a filesystem to a sane state, > not that it will always *be* in a sane state. If they didn't do that > restore on mount, then possibly the thing that is inconsistent is > /sbin/init and the machine doesn't boot. In other words, the point of a > journaled filesystem would be wasted if they didn't do what they do. > The only other option is to do the replay in page cache and allow the > page cache and physical device to differ until the filesystem goes read > write, but I'm not sure that level of complexity is warranted or > advisable, especially since it could easily confuse anything that tries > to read from the disks directly. The other other option is to build a lookup table from the journal (a TLB ??) 
and at the very last step before reading from storage, map the sector address through this lookup table and thus possibly read from the journal instead of from the main FS. I'm fairly sure this would work for ext3 journals. I'm less confident of XFS simply because I am less familiar with it. This would not necessarily present a filesystem that is completely consistent from a 'write' perspective (there could be allocated inodes that aren't referenced and maybe the free-space bitmaps might not be 100%). But it should give all the consistency for reading from the filesystem, which is all you need. Yes, it is added complexity in the filesystem, but not much I think, and very localised. > > > we potentially need mdmon > > running before the root filesystem is mounted. > > > > Because we want to unmount and completely discard the filesystem that holds > > the mdmon binary that was run early, we need to kill it and start a new one > > running from final namespace. This is also needed as to a small extent the > > filesystem is used to communicate between mdadm and a running mdmon, and > > having them have the same root is less confusing. > > > > There are three ways we can achieve this. > > > > 1/ If we can assume that between the time when the original "mount" completes > > and when the "mount -o remount,rw" happens the filesystem doesn't write to > > the device, then we can simply kill mdmon after the root is mounted, and > > restart it before remounting. However I don't trust filesystem > > implementers so I won't recommend that. > > > > 2/ Before the pivot root we can kill the old mdmon and start the new one > > chrooted into the final root. > > 3/ After the pivot root we can kill the old mdmon and start the new one. > > > > Number 2 is the approach that we (Well mostly Dan) originally intended and > > that the code implements ... or tries to. It got broken and I never > > noticed. I think I have fixed it now for 3.1.2. 
> > Note, as I recall, Hans switched things to be #3 for various reasons. > That he switched it to #3 doesn't effect mdmon really, as it still is > just killing and restarting, but doing it after the pivot root solved a > couple issues. I don't recall what they were, you would have to talk to > Hans about that. > > And you left part of the issue out. Yes, all the before bring up stuff > is true, but also true is that we want mdmon to hang around longer than > anyone else. By the time mdmon is ready to be shutdown, /var/run is > once again read only. So clean up can't be done. On the other hand, if > the files for mdmon are on a temporary filesystem that is rebuilt at > every boot...you get the point. Yes, I have not been thinking much about the shutdown side of the equation. Cleanup isn't an issue - you do not need to clean up /var/run when shutting down because it always happens on boot (and won't happen on a crash anyway). The only possible issue that I can see is if you want to unmount /var before setting / to read-only. You won't be able to do this because mdmon holds an open file descriptor on /var. So instead of unmounting /var you would need to remount it read-only, and then remount '/' read-only. Is that going to be a problem? > > > However it requires that /var/run exists and is writeable during early boot. > > I'm not sure that I am really comfortable requiring that. If the contents > > of /var/run are not going to persist then it would be better if they didn't > > exist. mdadm current relies on that non-existence for proper handing of the > > "mapfile". > > Can you explain this? 
I see nothing in the sources that tells me what > you mean by the non-existence of /var/run causing the mapfile to be > handled properly (and I'm not sure that's a valid requirement to put on > the system anyway because now you are dictating that if another early > boot application needs read only access to /var/run and we create > /var/run for that purpose, then it would in some way break mdadm's > operation). When mdadm writes to the "mapfile" it tries to create it in /var/run. If that doesn't work it tries to create it in /dev. So if /var/run exists and is writeable during early boot the mapfile will be created there. If this is not preserved then the information that was stored in the mapfile will be lost. The code for this is all very early in mapfile.c Yes, I agree that requiring the non-existence of /var/run is somewhat fragile. I hadn't completely thought that through until I wrote the above quoted text. Is it a reasonable requirement? I would like to think so as having a /var/run that spontaneously disappears would seem to break the principle of least surprise. Unfortunately I don't like the alternatives (though clearly you do). However ... as I note below, this might be a non-issue. There may not really be any need to preserve the mapfile across pivot_root. > > - the "official" homes for the pid and unix-domain-sock are in /var/run > > (preferably /var/run/mdadm/ but Doug said something about needing > > /var/run/mdmon/ to placate the monster that is SELinux - I need more > > information about that). > > mdmon does not need access to sendmail, so it should not be in the same > context as the mdadm files. This allows a more restrictive set of perms > on mdmon than on mdadm itself. If we put the mdmon files in > /var/run/mdadm, then they will have to have the same context as mdadm, > and because mdadm does so many things, it's already got an overly > liberal set of permissions compared to what mdmon realistically needs. 
And you cannot allow two programs in different contexts to write to the same directory? Am I going to have to learn how SELinux works ?(he asked with dread). Would it work to use /var/run/mdadm/mdmon ?? I'm not necessarily suggesting that, just scoping out the range of options. > > > When mdadm wants to communicate with mdmon it always looks there. > > > > - There is an alternative home which is /lib/init/rw/mdadm/ by default, > > What happens to the files later in the boot process. Are they left > here? Or are they migrated to an appropriate location later? If they > are just left here, then this makes even *less* sense than putting the > files under /dev as you've created a diversion zone in the filesystem. > Someplace to throw things that *should* be elsewhere and then leave them > there. Hopefully nothing gets left here. And if nothing gets left > here, then whether the temporary spot is > /dev/gonna_be_deleted_after_stuff_is_moved_out or /lib/init/rw makes no > real difference except in the complexity of the initramfs, and more > complex is more prone to break so I go with the single rw mount point/area. The $dev.pid and $dev.sock files belong to the running mdmon. When we kill the initramfs mdmon and start a new one, these files are removed and new ones are created in /var/run. If /var/run is not writeable they are created in the alternate until /var/run becomes writeable (we monitor /proc/mounts for changes) and then remove and recreate the files. The mapfile is read from the alternate if it doesn't exist in /var/run, and written to /var/run if possible when a write is needed. So it is effectively copied at the first update. And as I said elsewhere, I think names are very important, in part because people copy them. And /dev/temp_place_for_files_carried_over_from_initramfs/ would be a lot better than /dev/.mdmon as the purpose would be obvious and the example set for others would be clear. 
I would put things in /dev/temp_place_for_files_carried_over_from_initramfs/var/run/mdmon I think. > > > - mdadm when run in the "take over from previous instance" mode will > > look in /lib/init/rw/mdadm for the relevant .pid and .sock files if they > > aren't in /var/run/mdadm > > Now I'm a bit concerned. What happens when the new program starts up? > If /var/run is now read/write, will the new mdmon then write the files > in /var/run/mdadm (or mdmon)? If it does do this in preference to > /lib/init/rw/mdadm, which I would expect because if it doesn't then the > issue that Bill Davidson brought up about the issue not being files > under /dev but actually being certain files *not* being under /var/run > creeps right back up. So, are you going to symlink /var/run/mdadm (or > mdmon) to /lib/init/rw/mdadm? If so, then you are now doing *exactly* > as I proposed except in /lib/init/rw/mdadm instead of something like > /dev/md/.mdadm. If you don't, then I foresee problems in your future in > that when mdmon is restarted in the root context, it will write files in > the real /var/run/mdadm directory, but before mdmon ever shuts down, the > / filesystem will be readonly, and so those files will never get > cleaned, and on the next boot you will have stale files there that you > will have to workaround when it comes mdmon restart time as you'll need > to ignore or clean out /var/run/mdadm and then use the ones in > /lib/init/rw/mdadm instead. I'm sorry Neil, but this is sounding uglier > and uglier by the minute, not elegant. But /var/run is cleaned by init scripts. All non-directories are removed. I'm fairly sure that all distros do this. I guess that means that mdmon might find its .pid and .sock files get removed after it has created them, which would be embarrassing. (Of course if /var/run were a tmpfs, there would be no need for embarrassment...). .... no, that shouldn't be a problem. 
As long as we run the mdmon --all / after /var/run has been mounted and clean all should be happiness. No, I'm not suggesting symlinks. The "alternate" location is only used temporarily to carry information across from before to after pivot_root. > > > - mdmon.8 will list the various options with details. > > > > > > So I get to maintain a Unix tradition which might still have some life it > > after all, and Doug gets a very easy way to patch in his own version of > > sanity. > > > > (comments always welcome - I have made the changes described above and pushed > > them to git://neil.brown.name/mdadm, but it isn't to late to change it > > completely if that turns out to be best) > > I made my proposal in another email. But, I didn't necessarily argue > for it. Since you've argued for yours, and since this is going to a > mailing list that I don't think significant parts of the original thread > went to, I'll present mine with the arguments. > > Let's look at this on a file by file basis. First, for mdadm: > > mdadm.map - incremental map file, needs to be read/write before / is > read/write if using incremental assembly on root array. Used to be > stored in /var/run/mdadm/mdadm.map. This isn't read/write early enough, > so incremental assembly would break. Neil noted something above about > if /var/run/mdadm doesn't exist and isn't writable then mdadm does > something different in mdadm current, but I looked in the git repo and > could not see where the specific problem a readonly /var/run caused > would be fixed, so I'll assume for now that a readonly /var/run is still > just as broken as before. We moved the file to /dev/md/.mdadm.map, but > Neil didn't like that and made it /dev/.mdadm.map instead. 
I would > actually propose /dev/md/incremental.map as it A) isn't hidden and I > believe it shouldn't be hidden because of E later on, B) clearly > indicates the purpose of the file, C) would be in an md specific/owned > area of /dev, D) is unlike to ever conflict with someone's desired md > device name, E) is a file specific to the enumeration and bring up of md > device special files and as such can be argued to belong in /dev anyway, > and F) solves the problem of needing a read/write /var/run for > incremental assembly to work. The mapfile isn't used only for incremental assembly, so "incremental.map" wouldn't be a good name. There are (if I remember correctly) two main uses for the "mapfile". The first is as a cache for the mapping from UUID to md device (major/minor number). This is particularly needed for Incremental mode so that when a new device is found, it is easy to find if an md device already is (partially) assembled for that array. Being a cache, this information can be recreated at any time - simply read the metadata from some device in each array and record the UUID. This can be done with mdadm --incremental --rebuild-map (or mdadm -Ir). I think "mdadm --incremental" might even do this transparently if the mapfile cannot be found. The other use is to record the 'name' of the array. This 'name' might be extracted from the metadata (if the metadata stores a name), might be specified on the mdadm command line or in /etc/mdadm.conf, or might be generated from the metadata, the chosen minor device and other 'random' information to generate a unique name in cases where a clash with a preexisting name cannot be ruled out and would be inconvenient. This name is used by the udev rules to tell udev what name to create in /dev/md/. This isn't a pure cache as the name may be based on user input, or on the order of array discovery. 
However the names created by "mdadm -Ir" during boot should be the same as any names generated by mdadm calls in the initramfs unless there were significant differences between mdadm.conf in initramfs versus the final root. So it is probable that we don't need to preserve the mapfile across pivot_root. I think we did before, but there have been a number of improvements in --incremental since then, particularly the auto-generation of the mapfile. > > mdadm.pid - this is only used my mdadm in monitor mode, which is not > started until after the filesystem is read/write. This can safely > reside in /var/run/mdadm as it does today, no changes needed. Agreed. > > Now the files for mdmon: > > devname.pid, devname.sock - we use one mdmon per imsm array and each > mdmon has its own pid and sock file named after the array it is > watching. The problem being that if our root filesystem is on one of > these imsm arrays, we need mdmon up and running so it can mark the array > dirty because we will likely cause writes via possible journal replays > as we mount root. Likewise, even though there is code in mdmon to clean > up the pid/sock files, if we are talking about the mdmon for the root > filesystem, that cleanup can't happen as we need mdmon around to mark > the array clean after the final writes from going readonly are complete > (and in fact, during the final halt script on Fedora, we specifically > exclude *all* mdmon instances from the last killall that we do, then we > call mdadm to --wait-clean so we know that all the mdmons have marked > the devices clean after the readonly remount, then we reboot, so we > don't even kill the mdmon programs, ever). That means they will never > clean up their sock and pid files. As it turns out, being on a tmpfs, > permanently, is best for the mdmon files. 
We need them to be written > before the system comes up, and we need them to stick around while the > system goes down (we actually read the pid files to find what pids to > omit from the global killall we do), but we also want them to go away > when we reboot. So, location wise, /dev isn't necessarily the right > place for them. However, now that we use udev for dev, semantic wise > it's perfect. And we do have the one argument that they are at least > related to the bring up and take down of device special files. So, for > these files, I would actually argue for either /dev/.udev/mdmon with a > symlink from /var/run/mdmon to this location, or for /dev/md/.mdmon, > again with a symlink from /var/run/mdmon. Points where I differ are: 1/ clean-up: it is a non-issue. initscripts already do that. 2/ udev model: I don't agree that it is a good model to copy. > > So that's my suggestion for how to handle this stuff. > Thanks. Following this step in the discussion I plan to: 1/ remove the 'switchroot' option (option 2 in a previous email) from mdmon. I don't think anyone will use it and it has no convincing benefit, and some real costs. 2/ remove the watching of /proc/mounts to see when /var becomes writeable. Rather I will require (and document) that /var/run/ should be writeable (and cleaned) before mdmon --all is run to take over from any mdmon that might still be running from the initramfs. This removes any possible race with automatic cleaning of /var/run/. 3/ Document that mdmon may prevent /var from being unmounted and recommend "-o remount,ro" as an alternative. 4/ Use the "alternate run" directory as an alternate location for the mapfile, rather than explicitly using /dev/.mdadm.map. I should have done this before, but forgot. If we get two (or more) distros agreeing on a generic name for a scratch area to carry files over from before the pivot_root, then I will certainly consider using that rather than /lib/init/rw, even if it is in /dev. 
Hopefully it will not have a leading '.' in any name component. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot 2010-02-08 3:45 ` Neil Brown @ 2010-02-08 16:56 ` Bill Nottingham 0 siblings, 0 replies; 66+ messages in thread From: Bill Nottingham @ 2010-02-08 16:56 UTC (permalink / raw) To: Neil Brown Cc: Doug Ledford, linux-raid, initramfs, Dan Williams, martin f krafft, Michal Marek, Hans de Goede Neil Brown (neilb@suse.de) said: > Yes, I have not been thinking much about the shutdown side of the equation. > Cleanup isn't an issue - you do not need to clean up /var/run when shutting > down because it always happens on boot (and won't happen on a crash anyway). > The only possible issue that I can see is if you want to unmount /var before > setting / to read-only. You won't be able to do this because mdmon holds an > open file descriptor on /var. > So instead of unmounting /var you would need to remount it read-only, and > then remount '/' read-only. > > Is that going to be a problem? It's certainly a change in behavior. Historically all non-root filesystems can be cleanly unmounted, then root is marked read-only, then you halt/reboot. > The first is as a cache for the mapping from UUID to md device (major/minor > number). This is particularly need for Incremental mode so that when a new > device is found, it is easy to find if an md device already is (partially) > assembled for that array. > Being a cache, this information can be recreated at any time - simply read > the meta from some device in each array in record the UUID. This can be > done with > mdadm --incremental --rebuild-map > (or mdadm -Ir). > I think "mdadm --incremental" might even do this transparently if the mapfile > cannot be found. This seems like it could be integrated with the udev database, could it not? (Whether or not you want this dependency is another matter.) > 3/ Document that at mdmon may prevent /var from being unmounted and > recommend "-o remount,ro" as an alternative. 
As said above, I think this is a problem. Bill ^ permalink raw reply [flat|nested] 66+ messages in thread
* [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later 2010-01-11 20:38 Minor mdadm fixes Doug Ledford 2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford 2010-01-11 20:38 ` [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot Doug Ledford @ 2010-01-11 20:38 ` Doug Ledford 2010-01-18 22:02 ` Neil Brown 2010-01-11 20:38 ` [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out Doug Ledford ` (3 subsequent siblings) 6 siblings, 1 reply; 66+ messages in thread From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw) To: linux-raid; +Cc: Doug Ledford

Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 Detail.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Detail.c b/Detail.c
index 0e47a05..ba07c83 100644
--- a/Detail.c
+++ b/Detail.c
@@ -174,7 +174,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
                 if (sra && sra->array.major_version < 0)
                         printf("MD_METADATA=%s\n", sra->text_version);
                 else
-                        printf("MD_METADATA=%d.%02d\n",
+                        printf("MD_METADATA=%d.%d\n",
                                array.major_version, array.minor_version);
         }
 
@@ -226,7 +226,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
                 if (sra && sra->array.major_version < 0)
                         printf(" metadata=%s", sra->text_version);
                 else
-                        printf(" metadata=%d.%02d",
+                        printf(" metadata=%d.%d",
                                array.major_version, array.minor_version);
         }
 
@@ -259,7 +259,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
                 if (sra && sra->array.major_version < 0)
                         printf(" Version : %s\n", sra->text_version);
                 else
-                        printf(" Version : %d.%02d\n",
+                        printf(" Version : %d.%d\n",
                                array.major_version, array.minor_version);
         }
 
-- 
1.6.5.2

^ permalink raw reply related [flat|nested] 66+ messages in thread
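The patch does not spell out how the padded form causes confusion when read back. One plausible way, stated here as an assumption, is a string comparison against the usual version names: "%d.%02d" renders version 1.2 as "1.02", which no longer matches "1.2". The sketch below shows just that effect; format_metadata_version and is_known_metadata_name are hypothetical helpers, not mdadm functions, and the name table is an illustrative subset.

```c
#include <stdio.h>
#include <string.h>

/* Format a metadata version either the old way ("%d.%02d") or the way
 * the patch changes it to ("%d.%d"). (Hypothetical helper.) */
int format_metadata_version(char *buf, size_t len,
                            int major, int minor, int zero_pad)
{
    if (zero_pad)
        return snprintf(buf, len, "%d.%02d", major, minor);
    return snprintf(buf, len, "%d.%d", major, minor);
}

/* Return 1 if `text` exactly matches one of the version names listed
 * here, 0 otherwise. The table is illustrative, not mdadm's own list. */
int is_known_metadata_name(const char *text)
{
    static const char *const names[] = { "0.90", "1.0", "1.1", "1.2" };
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (strcmp(text, names[i]) == 0)
            return 1;
    return 0;
}
```

Version 0.90 is unaffected by the change, since its minor number already has two digits and both formats print "0.90"; only the single-digit minors of the 1.x formats differ.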
* Re: [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later 2010-01-11 20:38 ` [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later Doug Ledford @ 2010-01-18 22:02 ` Neil Brown 0 siblings, 0 replies; 66+ messages in thread From: Neil Brown @ 2010-01-18 22:02 UTC (permalink / raw) Cc: linux-raid, Doug Ledford

On Mon, 11 Jan 2010 15:38:12 -0500 Doug Ledford <dledford@redhat.com> wrote:

> Signed-off-by: Doug Ledford <dledford@redhat.com>

Applied, thanks.

NeilBrown

> ---
>  Detail.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Detail.c b/Detail.c
> index 0e47a05..ba07c83 100644
> --- a/Detail.c
> +++ b/Detail.c
> @@ -174,7 +174,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
>                  if (sra && sra->array.major_version < 0)
>                          printf("MD_METADATA=%s\n", sra->text_version);
>                  else
> -                        printf("MD_METADATA=%d.%02d\n",
> +                        printf("MD_METADATA=%d.%d\n",
>                                 array.major_version, array.minor_version);
>          }
>
> @@ -226,7 +226,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
>                  if (sra && sra->array.major_version < 0)
>                          printf(" metadata=%s", sra->text_version);
>                  else
> -                        printf(" metadata=%d.%02d",
> +                        printf(" metadata=%d.%d",
>                                 array.major_version, array.minor_version);
>          }
>
> @@ -259,7 +259,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
>                  if (sra && sra->array.major_version < 0)
>                          printf(" Version : %s\n", sra->text_version);
>                  else
> -                        printf(" Version : %d.%02d\n",
> +                        printf(" Version : %d.%d\n",
>                                 array.major_version, array.minor_version);
>          }
>

^ permalink raw reply [flat|nested] 66+ messages in thread
* [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out
  2010-01-11 20:38 Minor mdadm fixes Doug Ledford
  ` (2 preceding siblings ...)
  2010-01-11 20:38 ` [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
  2010-01-18 22:03   ` Neil Brown
  2010-01-11 20:38 ` [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file Doug Ledford
  ` (2 subsequent siblings)
  6 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
  To: linux-raid; +Cc: Doug Ledford

Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 Detail.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Detail.c b/Detail.c
index ba07c83..e05ba10 100644
--- a/Detail.c
+++ b/Detail.c
@@ -203,6 +203,11 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
 			if (mp && mp->path &&
 			    strncmp(mp->path, "/dev/md/", 8) == 0)
 				printf("MD_DEVNAME=%s\n", mp->path+8);
+			if (mp && (mp->uuid[0] || mp->uuid[1] || mp->uuid[2] ||
+			    mp->uuid[3]))
+				printf("MD_UUID=%08x:%08x:%08x:%08x\n",
+				       mp->uuid[0], mp->uuid[1], mp->uuid[2],
+				       mp->uuid[3]);
 		}
 		goto out;
 	}
-- 
1.6.5.2

^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out
  2010-01-11 20:38 ` [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out Doug Ledford
@ 2010-01-18 22:03   ` Neil Brown
  0 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:03 UTC (permalink / raw)
  Cc: linux-raid, Doug Ledford

On Mon, 11 Jan 2010 15:38:13 -0500
Doug Ledford <dledford@redhat.com> wrote:

> Signed-off-by: Doug Ledford <dledford@redhat.com>

Already have this functionality since commit
aae5a11207cf6da1682e6a76e116a19e21473f03
13Oct 2009.

Thanks,
NeilBrown

> ---
>  Detail.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/Detail.c b/Detail.c
> index ba07c83..e05ba10 100644
> --- a/Detail.c
> +++ b/Detail.c
> @@ -203,6 +203,11 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
>  			if (mp && mp->path &&
>  			    strncmp(mp->path, "/dev/md/", 8) == 0)
>  				printf("MD_DEVNAME=%s\n", mp->path+8);
> +			if (mp && (mp->uuid[0] || mp->uuid[1] || mp->uuid[2] ||
> +			    mp->uuid[3]))
> +				printf("MD_UUID=%08x:%08x:%08x:%08x\n",
> +				       mp->uuid[0], mp->uuid[1], mp->uuid[2],
> +				       mp->uuid[3]);
>  		}
>  		goto out;
>  	}

^ permalink raw reply [flat|nested] 66+ messages in thread
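Note on patch 4/5: Neil reports that equivalent functionality was already upstream, so the following only illustrates the output format used in the patch. It is a minimal sketch that assumes the UUID is held as four 32-bit words and that `unsigned int` is 32 bits wide. `format_md_uuid` is a hypothetical name, not an mdadm function, and unlike the patch it writes to a buffer and omits the trailing newline.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of mdadm): format a UUID stored as four
 * 32-bit words in the "MD_UUID=%08x:%08x:%08x:%08x" form from the patch.
 * As in the patch, an all-zero UUID is treated as "not set": nothing is
 * formatted and 0 is returned. Otherwise returns snprintf()'s result.
 */
static int format_md_uuid(char *buf, size_t len, const unsigned int uuid[4])
{
	if (!(uuid[0] || uuid[1] || uuid[2] || uuid[3])) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "MD_UUID=%08x:%08x:%08x:%08x",
			uuid[0], uuid[1], uuid[2], uuid[3]);
}
```

Each word is zero-padded to eight hex digits, so the line has a fixed width and can be split on `:` by a script that consumes the `--export` output.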
* [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file
  2010-01-11 20:38 Minor mdadm fixes Doug Ledford
  ` (3 preceding siblings ...)
  2010-01-11 20:38 ` [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
  2010-01-18 22:03   ` Neil Brown
  2010-01-12  0:49 ` Minor mdadm fixes Mr. James W. Laferriere
  2010-01-18 22:05 ` Neil Brown
  6 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
  To: linux-raid; +Cc: Doug Ledford

Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 config.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/config.c b/config.c
index c962afd..2943221 100644
--- a/config.c
+++ b/config.c
@@ -677,12 +677,21 @@ void homehostline(char *line)
 static char *auto_options = NULL;
 void autoline(char *line)
 {
+	char *w;
+
 	if (auto_options) {
 		fprintf(stderr, Name ": AUTO line may only be give once."
 			" Subsequent lines ignored\n");
 		return;
 	}
-	auto_options = line;
+
+	auto_options = dl_strdup(line);
+	dl_init(auto_options);
+
+	for (w=dl_next(line); w != line ; w=dl_next(w)) {
+		char *w2 = dl_strdup(w);
+		dl_add(auto_options, w2);
+	}
 }
 
 int loaded = 0;
-- 
1.6.5.2

^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file
  2010-01-11 20:38 ` [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file Doug Ledford
@ 2010-01-18 22:03   ` Neil Brown
  0 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:03 UTC (permalink / raw)
  Cc: linux-raid, Doug Ledford

On Mon, 11 Jan 2010 15:38:14 -0500
Doug Ledford <dledford@redhat.com> wrote:

> Signed-off-by: Doug Ledford <dledford@redhat.com>

Applied, thanks.

NeilBrown

> ---
>  config.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/config.c b/config.c
> index c962afd..2943221 100644
> --- a/config.c
> +++ b/config.c
> @@ -677,12 +677,21 @@ void homehostline(char *line)
>  static char *auto_options = NULL;
>  void autoline(char *line)
>  {
> +	char *w;
> +
>  	if (auto_options) {
>  		fprintf(stderr, Name ": AUTO line may only be give once."
>  			" Subsequent lines ignored\n");
>  		return;
>  	}
> -	auto_options = line;
> +
> +	auto_options = dl_strdup(line);
> +	dl_init(auto_options);
> +
> +	for (w=dl_next(line); w != line ; w=dl_next(w)) {
> +		char *w2 = dl_strdup(w);
> +		dl_add(auto_options, w2);
> +	}
>  }
>
>  int loaded = 0;

^ permalink raw reply [flat|nested] 66+ messages in thread
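Note on patch 5/5: the patch replaces a stored pointer to the parsed line with a word-by-word copy. The thread does not state the cause of the segfault; a plausible reading is that the parsed line does not outlive the call, so the saved pointer later dangles. The sketch below shows the same deep-copy idea on a simplified list. It is an assumption-laden stand-in: `struct word` and the `word_list_*` functions are hypothetical and use a sentinel head, whereas mdadm's own `dl_*` helpers (used in the patch) are not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a parsed config line: a circular, doubly
 * linked list whose head is a sentinel node with text == NULL. */
struct word {
	struct word *prev, *next;
	char *text;
};

static struct word *word_list_new(void)
{
	struct word *head = calloc(1, sizeof(*head));

	if (head)
		head->prev = head->next = head;
	return head;
}

/* Append a private copy of text at the tail; returns 0 on success. */
static int word_list_add(struct word *head, const char *text)
{
	struct word *w = calloc(1, sizeof(*w));

	if (!w)
		return -1;
	w->text = malloc(strlen(text) + 1);
	if (!w->text) {
		free(w);
		return -1;
	}
	strcpy(w->text, text);
	w->prev = head->prev;
	w->next = head;
	head->prev->next = w;
	head->prev = w;
	return 0;
}

/* Free every word, its text, and the sentinel head. */
static void word_list_free(struct word *head)
{
	struct word *w = head->next;

	while (w != head) {
		struct word *next = w->next;

		free(w->text);
		free(w);
		w = next;
	}
	free(head);
}

/* Deep copy: the result shares no memory with src, so it stays valid
 * after src has been freed. Returns NULL on allocation failure. */
static struct word *word_list_copy(const struct word *src)
{
	struct word *copy = word_list_new();
	const struct word *w;

	if (!copy)
		return NULL;
	for (w = src->next; w != src; w = w->next) {
		if (word_list_add(copy, w->text) < 0) {
			word_list_free(copy);
			return NULL;
		}
	}
	return copy;
}
```

A caller that wants to keep the words of a line would store the result of `word_list_copy(line)` rather than `line` itself, and could then free `line` without affecting the stored copy.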
* Re: Minor mdadm fixes
  2010-01-11 20:38 Minor mdadm fixes Doug Ledford
  ` (4 preceding siblings ...)
  2010-01-11 20:38 ` [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file Doug Ledford
@ 2010-01-12  0:49 ` Mr. James W. Laferriere
  2010-01-12  3:10   ` Andre Noll
  2010-01-18 22:05 ` Neil Brown
  6 siblings, 1 reply; 66+ messages in thread
From: Mr. James W. Laferriere @ 2010-01-12 0:49 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-raid

Hello Doug,

On Mon, 11 Jan 2010, Doug Ledford wrote:
> These are a number of minor fixes we carry in our mdadm at the moment. Would
> prefer not to carry them ourselves ;-)
>
> Neil, any clue when you think might release mdadm-3.1.2?

Would you please annotate the git diffs you sent out?
For one, I am very confused about the change of

- sprintf(path, "/var/run/mdadm/%s.pid", devname);
+ sprintf(path, "/dev/.mdadm/%s.pid", devname);

Why?

None of the other patches has an annotation either. If one is a kernel
coding guru they are probably very readable. I am not, but I would like to
know what the intention is for the change(s).

Tia, JimL
-- 
+------------------------------------------------------------------+
| James W. Laferriere     | System Techniques    | Give me VMS     |
| Network&System Engineer | 3237 Holden Road     | Give me Linux   |
| babydr@baby-dragons.com | Fairbanks, AK. 99709 | only on AXP     |
+------------------------------------------------------------------+

^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: Minor mdadm fixes
  2010-01-12  0:49 ` Minor mdadm fixes Mr. James W. Laferriere
@ 2010-01-12  3:10   ` Andre Noll
  2010-01-12  3:36     ` Doug Ledford
  0 siblings, 1 reply; 66+ messages in thread
From: Andre Noll @ 2010-01-12 3:10 UTC (permalink / raw)
  To: Mr. James W. Laferriere; +Cc: Doug Ledford, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1463 bytes --]

On 15:49, Mr. James W. Laferriere wrote:
> Hello Doug ,
>
> On Mon, 11 Jan 2010, Doug Ledford wrote:
> >These are a number of minor fixes we carry in our mdadm at the moment.
> >Would
> >prefer not to carry them ourselves ;-)
> >
> >Neil, any clue when you think might release mdadm-3.1.2?
> Would you please annotate the git diffs you sent out ?
> For one I am very confused about the change of
>
> - sprintf(path, "/var/run/mdadm/%s.pid", devname);
> + sprintf(path, "/dev/.mdadm/%s.pid", devname);
>
> why ? For one .
>
> None of the other patches has a annotation either . While if one is
> a kernel coding guru they are probably very readable . I am not but would
> like to know what the intention is for the change(s) .

The annotation is the subject line :)

Before switching root it is sometimes desirable to mount --bind (or
--move) the current /dev into the new root as the programs started
from the new root will need the device nodes. For example the init
scripts that run from the initramfs move /dev after it has been mounted
as a ramfs and populated.

I believe (Doug, please correct me if the following is wrong) the
advantage of storing mdmon's pid files in /dev is that in /dev they
remain visible after the switch to the new root. That's the "handoff
after pivotroot" in the subject.

Regards
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: Minor mdadm fixes
  2010-01-12  3:10   ` Andre Noll
@ 2010-01-12  3:36     ` Doug Ledford
  2010-01-12  4:39       ` Andre Noll
  0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-12 3:36 UTC (permalink / raw)
  To: Andre Noll; +Cc: Mr. James W. Laferriere, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3021 bytes --]

On 01/11/2010 10:10 PM, Andre Noll wrote:
> On 15:49, Mr. James W. Laferriere wrote:
>> Hello Doug ,
>>
>> On Mon, 11 Jan 2010, Doug Ledford wrote:
>>> These are a number of minor fixes we carry in our mdadm at the moment.
>>> Would
>>> prefer not to carry them ourselves ;-)
>>>
>>> Neil, any clue when you think might release mdadm-3.1.2?
>> Would you please annotate the git diffs you sent out ?
>> For one I am very confused about the change of
>>
>> - sprintf(path, "/var/run/mdadm/%s.pid", devname);
>> + sprintf(path, "/dev/.mdadm/%s.pid", devname);
>>
>> why ? For one .
>>
>> None of the other patches has a annotation either . While if one is
>> a kernel coding guru they are probably very readable . I am not but would
>> like to know what the intention is for the change(s) .
>
> The annotation is the subject line :)
>
> Before switching root it is sometimes desirable to mount --bind (or
> --move) the current /dev into the new root as the programs started
> from the new root will need the device nodes. For example the init
> scripts that run from the initramfs move /dev after it has been mounted
> as a ramfs and populated.
>
> I believe (Doug, please correct me if the following is wrong) the
> advantage of storing mdmon's pid files in /dev is that in /dev they
> remain visible after the switch to the new root. That's the "handoff
> after pivotroot" in the subject.

I'm a little fuzzy on this myself. The original patch was from another
Red Hatter that works on dracut, the new mkinitrd replacement in Fedora
12. When integrating IMSM support in the md raid stack and dracut,
a problem arose with starting mdmon in the initramfs filesystem
and then transitioning it to the new filesystem. It turns out that, as
you point out, because /dev is moved from the initramfs to the new root
(mainly because udev is now started in the initramfs), we could avoid a
number of issues caused by mdmon's files being in /var/run instead of
/dev.

This also allowed us to do the mdmon restart *after* the switchroot
had taken place and solved a number of issues with getting mdmon
support to work.

Like I said, I'm a bit fuzzy on the details myself because it was Hans
de Goede that was doing this work, not me, and I just vaguely recall
the problems this patch solved. But it's not surprising, given that we
already did a similar patch to move the mdadm.map file from
/var/run/mdadm to /dev simply because during the very early boot stages
after you have done the switch root, /var/run is usually read only
while /dev/ is read write, and incremental assembly won't work properly
when it can't write to the mdadm.map file. So, really this is in the
same vein although the specific issues are different.

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 66+ messages in thread
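Note on the pid-file change discussed above: the quoted diff swaps the directory prefix of the pid file from /var/run/mdadm to /dev/.mdadm. The sketch below shows that path construction with the directory passed in as a parameter. `pid_file_path` is a hypothetical helper, not mdmon code, and it deliberately uses `snprintf` with a length check in place of the `sprintf` shown in the quoted diff.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not mdmon code): build "<dir>/<devname>.pid".
 * Returns 0 on success, or -1 if formatting failed or the result
 * would not fit in buf (in which case buf holds a truncated string).
 */
static int pid_file_path(char *buf, size_t len,
			 const char *dir, const char *devname)
{
	int n = snprintf(buf, len, "%s/%s.pid", dir, devname);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Called with `"/dev/.mdadm"` and a device name such as `md127`, it yields `/dev/.mdadm/md127.pid`. Keeping the directory in one place makes a later relocation of the files a one-line change.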
* Re: Minor mdadm fixes
  2010-01-12  3:36     ` Doug Ledford
@ 2010-01-12  4:39       ` Andre Noll
  2010-01-12  4:46         ` Doug Ledford
  0 siblings, 1 reply; 66+ messages in thread
From: Andre Noll @ 2010-01-12 4:39 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Mr. James W. Laferriere, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1367 bytes --]

On 22:36, Doug Ledford wrote:
> > I believe (Doug, please correct me if the following is wrong) the
> > advantage of storing mdmon's pid files in /dev is that in /dev they
> > remain visible after the switch to the new root. That's the "handoff
> > after pivotroot" in the subject.
>
> I'm a little fuzzy on this myself. The original patch was from another
> Red Hatter that works on dracut, the new mkinitrd replacement in Fedora
> 12. When integrating IMSM support in the md raid stack and dracut,
> a problem arose with starting mdmon in the initramfs filesystem
> and then transitioning it to the new filesystem. It turns out that, as
> you point out, because /dev is moved from the initramfs to the new root
> (mainly because udev is now started in the initramfs), we could avoid a
> number of issues caused by mdmon's files being in /var/run instead of
> /dev.

One could also mount /var/run in the initramfs and move it over
like /dev, but that gets a bit messy because /var might be on yet
another fs.

> This also allowed us to do the mdmon restart *after* the switchroot
> had taken place and solved a number of issues with getting mdmon
> support to work.

Just out of interest: Why does mdmon need the restart at all?

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: Minor mdadm fixes
  2010-01-12  4:39       ` Andre Noll
@ 2010-01-12  4:46         ` Doug Ledford
  2010-01-12  5:21           ` Andre Noll
  0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-12 4:46 UTC (permalink / raw)
  To: Andre Noll; +Cc: Mr. James W. Laferriere, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]

On 01/11/2010 11:39 PM, Andre Noll wrote:
> On 22:36, Doug Ledford wrote:
>>> I believe (Doug, please correct me if the following is wrong) the
>>> advantage of storing mdmon's pid files in /dev is that in /dev they
>>> remain visible after the switch to the new root. That's the "handoff
>>> after pivotroot" in the subject.
>>
>> I'm a little fuzzy on this myself. The original patch was from another
>> Red Hatter that works on dracut, the new mkinitrd replacement in Fedora
>> 12. When integrating IMSM support in the md raid stack and dracut,
>> a problem arose with starting mdmon in the initramfs filesystem
>> and then transitioning it to the new filesystem. It turns out that, as
>> you point out, because /dev is moved from the initramfs to the new root
>> (mainly because udev is now started in the initramfs), we could avoid a
>> number of issues caused by mdmon's files being in /var/run instead of
>> /dev.
>
> One could also mount /var/run in the initramfs and move it over
> like /dev, but that gets a bit messy because /var might be on yet
> another fs.
>
>> This also allowed us to do the mdmon restart *after* the switchroot
>> had taken place and solved a number of issues with getting mdmon
>> support to work.
>
> Just out of interest: Why does mdmon need the restart at all?

Because of the design and limitations of the page cache. If you start a
long running program from the initramfs, then the page cache usage will
pin the initramfs in memory. You need to reexec the program from the
hard drive so that the initramfs can be freed. Of course, this is all
because superblock handling for imsm superblocks is in user
space...<grumble, grumble, grumble>.

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: Minor mdadm fixes
  2010-01-12  4:46         ` Doug Ledford
@ 2010-01-12  5:21           ` Andre Noll
  0 siblings, 0 replies; 66+ messages in thread
From: Andre Noll @ 2010-01-12 5:21 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Mr. James W. Laferriere, linux-raid

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

On 23:46, Doug Ledford wrote:
> > Just out of interest: Why does mdmon need the restart at all?
>
> Because of the design and limitations of the page cache. If you start a
> long running program from the initramfs, then the page cache usage will
> pin the initramfs in memory. You need to reexec the program from the
> hard drive so that the initramfs can be freed. Of course, this is all
> because superblock handling for imsm superblocks is in user
> space...<grumble, grumble, grumble>.

You could copy the mdmon executable to /dev and execute it there :)

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: Minor mdadm fixes
  2010-01-11 20:38 Minor mdadm fixes Doug Ledford
  ` (5 preceding siblings ...)
  2010-01-12  0:49 ` Minor mdadm fixes Mr. James W. Laferriere
@ 2010-01-18 22:05 ` Neil Brown
  6 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:05 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-raid

On Mon, 11 Jan 2010 15:38:09 -0500
Doug Ledford <dledford@redhat.com> wrote:

> These are a number of minor fixes we carry in our mdadm at the moment. Would
> prefer not to carry them ourselves ;-)

Fair enough - I've taken three of them...

>
> Neil, any clue when you think might release mdadm-3.1.2?

Maybe in February. Feel free to remind me if I haven't done anything by
about the 14th.

Thanks,
NeilBrown

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 66+ messages in thread