* Minor mdadm fixes
@ 2010-01-11 20:38 Doug Ledford
2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
` (6 more replies)
0 siblings, 7 replies; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
To: linux-raid
These are a number of minor fixes we carry in our mdadm at the moment. Would
prefer not to carry them ourselves ;-)
Neil, any clue when you think you might release mdadm-3.1.2?
* [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
2010-01-18 22:01 ` Neil Brown
2010-01-18 22:13 ` Dan Williams
2010-01-11 20:38 ` [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot Doug Ledford
` (5 subsequent siblings)
6 siblings, 2 replies; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
To: linux-raid; +Cc: Doug Ledford
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
super-intel.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index d6951cc..fcf438c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
dd->fd = fd;
dd->e = NULL;
rv = imsm_read_serial(fd, devname, dd->serial);
- if (rv) {
+ if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
+ memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
+ fd2devname(fd, (char *) dd->serial);
+ } else if (rv) {
fprintf(stderr,
Name ": failed to retrieve scsi serial, aborting\n");
free(dd);
--
1.6.5.2
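The fallback this patch adds can be sketched in isolation. The snippet below is a simplified illustration, not mdadm code: `SERIAL_LEN`, `env_enabled` and `serial_or_devname` are hypothetical names, the length of 16 is an assumption standing in for `MAX_RAID_SERIAL_LEN`, and `env_enabled` assumes that `check_env()` treats a variable set to 1 as enabled.

```c
#include <stdlib.h>
#include <string.h>

#define SERIAL_LEN 16 /* assumed stand-in for MAX_RAID_SERIAL_LEN */

/* Assumed behaviour of check_env(): true when the variable is set to 1. */
int env_enabled(const char *name)
{
	const char *val = getenv(name);

	return val != NULL && atoi(val) == 1;
}

/*
 * read_rv is the result of the serial query (0 on success).
 * Returns 0 if 'serial' holds a usable value, -1 if the caller should abort.
 */
int serial_or_devname(int read_rv, int allow_fallback,
		      const char *devname, unsigned char serial[SERIAL_LEN])
{
	size_t n;

	if (read_rv == 0)
		return 0;		/* real serial was read, keep it */
	if (!allow_fallback)
		return -1;		/* no serial and no fallback requested */

	/* Use the (possibly truncated) device name as a fake serial. */
	memset(serial, 0, SERIAL_LEN);
	n = strlen(devname);
	if (n > SERIAL_LEN - 1)
		n = SERIAL_LEN - 1;
	memcpy(serial, devname, n);
	return 0;
}
```

A caller would pass `env_enabled("IMSM_DEVNAME_AS_SERIAL")` as `allow_fallback`, so that the device name is only used when the variable is set and the serial query has failed.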
* [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
2010-01-18 22:09 ` Neil Brown
2010-01-11 20:38 ` [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later Doug Ledford
` (4 subsequent siblings)
6 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
To: linux-raid; +Cc: Doug Ledford
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
mdmon.c | 12 ++++++------
msg.c | 2 +-
util.c | 4 ++--
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/mdmon.c b/mdmon.c
index 0ec4259..b1d7aef 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -118,7 +118,7 @@ static int test_pidfile(char *devname)
char path[100];
struct stat st;
- sprintf(path, "/var/run/mdadm/%s.pid", devname);
+ sprintf(path, "/dev/.mdadm/%s.pid", devname);
return stat(path, &st);
}
@@ -132,7 +132,7 @@ int make_pidfile(char *devname, int o_excl)
if (sigterm)
return -1;
- sprintf(path, "/var/run/mdadm/%s.pid", devname);
+ sprintf(path, "/dev/.mdadm/%s.pid", devname);
fd = open(path, O_RDWR|O_CREAT|o_excl, 0600);
if (fd < 0)
@@ -163,7 +163,7 @@ pid_t devname2mdmon(char *devname)
pid_t pid = -1;
int fd;
- sprintf(buf, "/var/run/mdadm/%s.pid", devname);
+ sprintf(buf, "/dev/.mdadm/%s.pid", devname);
fd = open(buf, O_RDONLY|O_NOATIME);
if (fd < 0)
return -1;
@@ -217,9 +217,9 @@ void remove_pidfile(char *devname)
if (sigterm)
return;
- sprintf(buf, "/var/run/mdadm/%s.pid", devname);
+ sprintf(buf, "/dev/.mdadm/%s.pid", devname);
unlink(buf);
- sprintf(buf, "/var/run/mdadm/%s.sock", devname);
+ sprintf(buf, "/dev/.mdadm/%s.sock", devname);
unlink(buf);
}
@@ -233,7 +233,7 @@ int make_control_sock(char *devname)
if (sigterm)
return -1;
- sprintf(path, "/var/run/mdadm/%s.sock", devname);
+ sprintf(path, "/dev/.mdadm/%s.sock", devname);
unlink(path);
sfd = socket(PF_LOCAL, SOCK_STREAM, 0);
if (sfd < 0)
diff --git a/msg.c b/msg.c
index 8d52b94..c3ab243 100644
--- a/msg.c
+++ b/msg.c
@@ -147,7 +147,7 @@ int connect_monitor(char *devname)
int pos;
char *c;
- pos = sprintf(path, "/var/run/mdadm/");
+ pos = sprintf(path, "/dev/.mdadm/");
if (is_subarray(devname)) {
devname++;
c = strchr(devname, '/');
diff --git a/util.c b/util.c
index 5feec43..864af69 100644
--- a/util.c
+++ b/util.c
@@ -1469,7 +1469,7 @@ int mdmon_running(int devnum)
char pid[10];
int fd;
int n;
- sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
+ sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
fd = open(path, O_RDONLY, 0);
if (fd < 0)
@@ -1489,7 +1489,7 @@ int signal_mdmon(int devnum)
char pid[10];
int fd;
int n;
- sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
+ sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
fd = open(path, O_RDONLY, 0);
if (fd < 0)
--
1.6.5.2
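The patch changes the same directory literal in nine `sprintf` calls across three files. One way to keep the location in a single place is sketched below; this is an illustration under assumptions, not what the patch or mdadm does. `MDMON_DIR` and `mdmon_path` are hypothetical names, and `snprintf` is used so that an over-long name is reported instead of overflowing the buffer.

```c
#include <stdio.h>

/* Hypothetical macro; the directory itself is the one used in the patch. */
#define MDMON_DIR "/dev/.mdadm"

/*
 * Build "<MDMON_DIR>/<devname>.<ext>" in buf.
 * Returns 0 on success, -1 if the result did not fit or formatting failed.
 */
int mdmon_path(char *buf, size_t len, const char *devname, const char *ext)
{
	int n = snprintf(buf, len, "%s/%s.%s", MDMON_DIR, devname, ext);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With such a helper, moving the files again would mean changing one definition rather than every call site.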
* [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
2010-01-11 20:38 ` [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
2010-01-18 22:02 ` Neil Brown
2010-01-11 20:38 ` [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out Doug Ledford
` (3 subsequent siblings)
6 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
To: linux-raid; +Cc: Doug Ledford
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
Detail.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/Detail.c b/Detail.c
index 0e47a05..ba07c83 100644
--- a/Detail.c
+++ b/Detail.c
@@ -174,7 +174,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
if (sra && sra->array.major_version < 0)
printf("MD_METADATA=%s\n", sra->text_version);
else
- printf("MD_METADATA=%d.%02d\n",
+ printf("MD_METADATA=%d.%d\n",
array.major_version, array.minor_version);
}
@@ -226,7 +226,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
if (sra && sra->array.major_version < 0)
printf(" metadata=%s", sra->text_version);
else
- printf(" metadata=%d.%02d",
+ printf(" metadata=%d.%d",
array.major_version, array.minor_version);
}
@@ -259,7 +259,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
if (sra && sra->array.major_version < 0)
printf(" Version : %s\n", sra->text_version);
else
- printf(" Version : %d.%02d\n",
+ printf(" Version : %d.%d\n",
array.major_version, array.minor_version);
}
--
1.6.5.2
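To see how the zero padding can confuse a reader of the output, consider the sketch below. `format_metadata` is a hypothetical helper, not mdadm code; it only reproduces the two `printf` formats from the diff. With `%d.%02d`, metadata version 1.2 is printed as `1.02`, which no longer matches the `1.2` spelling accepted by `--metadata`, while 0.90 comes out as `0.90` under both formats.

```c
#include <stdio.h>

/*
 * Format a metadata version into buf.
 * zero_pad != 0 uses the old "%d.%02d" format, zero_pad == 0 the new "%d.%d".
 * Returns the value of snprintf().
 */
int format_metadata(char *buf, size_t len, int major, int minor, int zero_pad)
{
	if (zero_pad)
		return snprintf(buf, len, "%d.%02d", major, minor);
	return snprintf(buf, len, "%d.%d", major, minor);
}
```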
* [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
` (2 preceding siblings ...)
2010-01-11 20:38 ` [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
2010-01-18 22:03 ` Neil Brown
2010-01-11 20:38 ` [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file Doug Ledford
` (2 subsequent siblings)
6 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
To: linux-raid; +Cc: Doug Ledford
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
Detail.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/Detail.c b/Detail.c
index ba07c83..e05ba10 100644
--- a/Detail.c
+++ b/Detail.c
@@ -203,6 +203,11 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
if (mp && mp->path &&
strncmp(mp->path, "/dev/md/", 8) == 0)
printf("MD_DEVNAME=%s\n", mp->path+8);
+ if (mp && (mp->uuid[0] || mp->uuid[1] || mp->uuid[2] ||
+ mp->uuid[3]))
+ printf("MD_UUID=%08x:%08x:%08x:%08x\n",
+ mp->uuid[0], mp->uuid[1], mp->uuid[2],
+ mp->uuid[3]);
}
goto out;
}
--
1.6.5.2
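The output format added here can be tried on its own. The sketch below assumes the UUID is held as four 32-bit words, as the `mp->uuid[0..3]` accesses in the diff suggest; `format_md_uuid` is a hypothetical name, and `unsigned int` is used so that the arguments match the `%08x` conversions.

```c
#include <stdio.h>

/*
 * Write "MD_UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx" into buf.
 * Returns 0 without touching buf when the UUID is all zero (treated as
 * "not set", as in the patch), otherwise the value of snprintf().
 */
int format_md_uuid(char *buf, size_t len, const unsigned int uuid[4])
{
	if (!(uuid[0] || uuid[1] || uuid[2] || uuid[3]))
		return 0;
	return snprintf(buf, len, "MD_UUID=%08x:%08x:%08x:%08x",
			uuid[0], uuid[1], uuid[2], uuid[3]);
}
```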
* [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
` (3 preceding siblings ...)
2010-01-11 20:38 ` [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out Doug Ledford
@ 2010-01-11 20:38 ` Doug Ledford
2010-01-18 22:03 ` Neil Brown
2010-01-12 0:49 ` Minor mdadm fixes Mr. James W. Laferriere
2010-01-18 22:05 ` Neil Brown
6 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-11 20:38 UTC (permalink / raw)
To: linux-raid; +Cc: Doug Ledford
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
config.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/config.c b/config.c
index c962afd..2943221 100644
--- a/config.c
+++ b/config.c
@@ -677,12 +677,21 @@ void homehostline(char *line)
static char *auto_options = NULL;
void autoline(char *line)
{
+ char *w;
+
if (auto_options) {
fprintf(stderr, Name ": AUTO line may only be give once."
" Subsequent lines ignored\n");
return;
}
- auto_options = line;
+
+ auto_options = dl_strdup(line);
+ dl_init(auto_options);
+
+ for (w=dl_next(line); w != line ; w=dl_next(w)) {
+ char *w2 = dl_strdup(w);
+ dl_add(auto_options, w2);
+ }
}
int loaded = 0;
--
1.6.5.2
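The subject does not say what triggers the segfault. A plausible reading of the diff is that `auto_options = line` kept a pointer to a word list owned by the config parser, and that the patch replaces it with a private copy. The general idea, deep-copying a word list so that it outlives the caller's storage, is sketched below with a plain NULL-terminated array instead of mdadm's `dl_*` list helpers; `copy_words` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Deep-copy a NULL-terminated array of strings.
 * Returns a newly allocated array (also NULL-terminated) whose strings
 * are independent of the originals, or NULL on allocation failure.
 */
char **copy_words(char *const *words)
{
	size_t n = 0, i;
	char **copy;

	while (words[n] != NULL)
		n++;

	copy = malloc((n + 1) * sizeof(*copy));
	if (copy == NULL)
		return NULL;

	for (i = 0; i < n; i++) {
		size_t len = strlen(words[i]) + 1;

		copy[i] = malloc(len);
		if (copy[i] == NULL) {
			/* Undo the partial copy before reporting failure. */
			while (i-- > 0)
				free(copy[i]);
			free(copy);
			return NULL;
		}
		memcpy(copy[i], words[i], len);
	}
	copy[n] = NULL;
	return copy;
}
```

After the copy, the original words can be changed or freed without affecting the saved list.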
* Re: Minor mdadm fixes
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
` (4 preceding siblings ...)
2010-01-11 20:38 ` [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file Doug Ledford
@ 2010-01-12 0:49 ` Mr. James W. Laferriere
2010-01-12 3:10 ` Andre Noll
2010-01-18 22:05 ` Neil Brown
6 siblings, 1 reply; 66+ messages in thread
From: Mr. James W. Laferriere @ 2010-01-12 0:49 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
Hello Doug,
On Mon, 11 Jan 2010, Doug Ledford wrote:
> These are a number of minor fixes we carry in our mdadm at the moment. Would
> prefer not to carry them ourselves ;-)
>
> Neil, any clue when you think might release mdadm-3.1.2?
Would you please annotate the git diffs you sent out?
For one, I am very confused about the change of
- sprintf(path, "/var/run/mdadm/%s.pid", devname);
+ sprintf(path, "/dev/.mdadm/%s.pid", devname);
Why was this changed?
None of the other patches has an annotation either. They are probably very
readable to a kernel coding guru. I am not one, but I would like to
know what the intention is for the change(s).
TIA, JimL
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network&System Engineer | 3237 Holden Road | Give me Linux |
| babydr@baby-dragons.com | Fairbanks, AK. 99709 | only on AXP |
+------------------------------------------------------------------+
* Re: Minor mdadm fixes
2010-01-12 0:49 ` Minor mdadm fixes Mr. James W. Laferriere
@ 2010-01-12 3:10 ` Andre Noll
2010-01-12 3:36 ` Doug Ledford
0 siblings, 1 reply; 66+ messages in thread
From: Andre Noll @ 2010-01-12 3:10 UTC (permalink / raw)
To: Mr. James W. Laferriere; +Cc: Doug Ledford, linux-raid
On 15:49, Mr. James W. Laferriere wrote:
> Hello Doug ,
>
> On Mon, 11 Jan 2010, Doug Ledford wrote:
> >These are a number of minor fixes we carry in our mdadm at the moment.
> >Would
> >prefer not to carry them ourselves ;-)
> >
> >Neil, any clue when you think might release mdadm-3.1.2?
> Would you please annotate the git diffs you sent out ?
> For one I am very confused about the change of
>
> - sprintf(path, "/var/run/mdadm/%s.pid", devname);
> + sprintf(path, "/dev/.mdadm/%s.pid", devname);
>
> why ? For one .
>
> None of the other patches has a annotation either . While if one is
> a kernel coding guru they are probably very readable . I am not but would
> like to know what the intention is for the change(s) .
The annotation is the subject line :)
Before switching root it is sometimes desirable to mount --bind (or
--move) the current /dev into the new root as the programs started
from the new root will need the device nodes. For example the init
scripts that run from the initramfs move /dev after it has been mounted
as a ramfs and populated.
I believe (Doug, please correct me if the following is wrong) the
advantage of storing mdmon's pid files in /dev is that in /dev they
remain visible after the switch to the new root. That's the "handoff
after pivotroot" in the subject.
Regards
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
* Re: Minor mdadm fixes
2010-01-12 3:10 ` Andre Noll
@ 2010-01-12 3:36 ` Doug Ledford
2010-01-12 4:39 ` Andre Noll
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-12 3:36 UTC (permalink / raw)
To: Andre Noll; +Cc: Mr. James W. Laferriere, linux-raid
On 01/11/2010 10:10 PM, Andre Noll wrote:
> On 15:49, Mr. James W. Laferriere wrote:
>> Hello Doug ,
>>
>> On Mon, 11 Jan 2010, Doug Ledford wrote:
>>> These are a number of minor fixes we carry in our mdadm at the moment.
>>> Would
>>> prefer not to carry them ourselves ;-)
>>>
>>> Neil, any clue when you think might release mdadm-3.1.2?
>> Would you please annotate the git diffs you sent out ?
>> For one I am very confused about the change of
>>
>> - sprintf(path, "/var/run/mdadm/%s.pid", devname);
>> + sprintf(path, "/dev/.mdadm/%s.pid", devname);
>>
>> why ? For one .
>>
>> None of the other patches has a annotation either . While if one is
>> a kernel coding guru they are probably very readable . I am not but would
>> like to know what the intention is for the change(s) .
>
> The annotation is the subject line :)
>
> Before switching root it is sometimes desirable to mount --bind (or
> --move) the current /dev into the new root as the programs started
> from the new root will need the device nodes. For example the init
> scripts that run from the initramfs move /dev after it has been mounted
> as a ramfs and populated.
>
> I believe (Doug, please correct me if the following is wrong) the
> advantage of storing mdmon's pid files in /dev is that in /dev they
> remain visible after the switch to the new root. That's the "handoff
> after pivotroot" in the subject.
I'm a little fuzzy on this myself. The original patch was from another
Red Hatter that works on dracut, the new mkinitrd replacement in Fedora
12. When integrating IMSM support in the md raid stack and dracut,
a problem arose with starting mdmon in the initramfs filesystem
and then transitioning it to the new filesystem. It turns out that, as
you point out, because /dev is moved from the initramfs to the new root
(mainly because udev is now started in the initramfs), we could avoid a
number of issues caused by mdmon's files being in /var/run instead of
/dev. This also allowed us to do the mdmon restart *after* the
switchroot had taken place and solved a number of issues with getting
mdmon support to work. Like I said, I'm a bit fuzzy on the details
myself because it was Hans de Goede that was doing this work, not me,
and I just vaguely recall the problems this patch solved. But, it's not
surprising given that we already did a similar patch to move the
mdadm.map file from /var/run/mdadm to /dev simply because during the
very early boot stages after you have done the switch root, /var/run is
usually read only while /dev/ is read write and incremental assembly
won't work properly when it can't write to the mdadm.map file. So,
really this is in the same vein although the specific issues are different.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: Minor mdadm fixes
2010-01-12 3:36 ` Doug Ledford
@ 2010-01-12 4:39 ` Andre Noll
2010-01-12 4:46 ` Doug Ledford
0 siblings, 1 reply; 66+ messages in thread
From: Andre Noll @ 2010-01-12 4:39 UTC (permalink / raw)
To: Doug Ledford; +Cc: Mr. James W. Laferriere, linux-raid
On 22:36, Doug Ledford wrote:
> > I believe (Doug, please correct me if the following is wrong) the
> > advantage of storing mdmon's pid files in /dev is that in /dev they
> > remain visible after the switch to the new root. That's the "handoff
> > after pivotroot" in the subject.
>
> I'm a little fuzzy on this myself. The original patch was from another
> Red Hatter that works on dracut, the new mkinitrd replacement in Fedora
> 12. When integrating IMSM support in the md raid stack and dracut,
> there became a problem with starting mdmon in the initramfs filesystem
> and then transitioning it to the new filesystem. It turns out that, as
> you point out, because /dev is moved from the initramfs to the new root
> (mainly because udev is now started in the initramfs), we could avoid a
> number of issues caused by mdmon's files being in /var/run instead of
> /dev.
One could also mount /var/run in the initramfs and move it over
like /dev, but that gets a bit messy because /var might be on yet
another fs.
> This also allowed us to do the mdmon restart *after* the switchroot
> had taken place and solved a number of issues with getting mdmon
> support to work.
Just out of interest: Why does mdmon need the restart at all?
Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
* Re: Minor mdadm fixes
2010-01-12 4:39 ` Andre Noll
@ 2010-01-12 4:46 ` Doug Ledford
2010-01-12 5:21 ` Andre Noll
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-12 4:46 UTC (permalink / raw)
To: Andre Noll; +Cc: Mr. James W. Laferriere, linux-raid
On 01/11/2010 11:39 PM, Andre Noll wrote:
> On 22:36, Doug Ledford wrote:
>>> I believe (Doug, please correct me if the following is wrong) the
>>> advantage of storing mdmon's pid files in /dev is that in /dev they
>>> remain visible after the switch to the new root. That's the "handoff
>>> after pivotroot" in the subject.
>>
>> I'm a little fuzzy on this myself. The original patch was from another
>> Red Hatter that works on dracut, the new mkinitrd replacement in Fedora
>> 12. When integrating IMSM support in the md raid stack and dracut,
>> there became a problem with starting mdmon in the initramfs filesystem
>> and then transitioning it to the new filesystem. It turns out that, as
>> you point out, because /dev is moved from the initramfs to the new root
>> (mainly because udev is now started in the initramfs), we could avoid a
>> number of issues caused by mdmon's files being in /var/run instead of
>> /dev.
>
> One could also mount /var/run in the initramfs and move it over
> like /dev, but that gets a bit messy because /var might be on yet
> another fs.
>
>> This also allowed us to do the mdmon restart *after* the switchroot
>> had taken place and solved a number of issues with getting mdmon
>> support to work.
>
> Just out of interest: Why does mdmon need the restart at all?
Because of the design and limitations of the page cache. If you start a
long-running program from the initramfs, then the page cache usage will pin
the initramfs in memory. You need to re-exec the program from the hard
drive so that the initramfs can be freed. Of course, this is all
because superblock handling for imsm superblocks is in user
space...<grumble, grumble, grumble>.
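The re-exec step described above can be sketched with `execv`: once the real root is mounted, the running process replaces its image with the copy on disk, so it no longer runs from the initramfs copy of the binary. How mdmon actually performs its takeover is not shown in this thread; `reexec_from` is a hypothetical name and the caller chooses the path.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Replace the current process image with the binary at 'path'.
 * On success this never returns; on failure it reports the error
 * and returns -1 so the caller can keep running the old image.
 */
int reexec_from(const char *path, char *const argv[])
{
	execv(path, argv);	/* only returns if the exec failed */
	perror("execv");
	return -1;
}
```

A monitor would typically call this with its own argument vector, after the pid file and socket are somewhere the new instance can still reach, which is the point of keeping them under /dev.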
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: Minor mdadm fixes
2010-01-12 4:46 ` Doug Ledford
@ 2010-01-12 5:21 ` Andre Noll
0 siblings, 0 replies; 66+ messages in thread
From: Andre Noll @ 2010-01-12 5:21 UTC (permalink / raw)
To: Doug Ledford; +Cc: Mr. James W. Laferriere, linux-raid
On 23:46, Doug Ledford wrote:
> > Just out of interest: Why does mdmon need the restart at all?
>
> Because of the design and limitation of page cache. If you start a long
> running program from the initramfs, then the page cache usage will pin
> the initramfs in memory. You need to reexec the program from the hard
> drive so that the initramfs can be freed. Of course, this is all
> because superblock handling for imsm superblocks is in user
> space...<grumble, grumble, grumble>.
You could copy the mdmon executable to /dev and execute it there :)
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
@ 2010-01-18 22:01 ` Neil Brown
2010-01-18 22:13 ` Dan Williams
1 sibling, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:01 UTC (permalink / raw)
Cc: linux-raid, Doug Ledford
On Mon, 11 Jan 2010 15:38:10 -0500
Doug Ledford <dledford@redhat.com> wrote:
Applied, thanks.
NeilBrown
> Signed-off-by: Doug Ledford <dledford@redhat.com>
> ---
> super-intel.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index d6951cc..fcf438c 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
> dd->fd = fd;
> dd->e = NULL;
> rv = imsm_read_serial(fd, devname, dd->serial);
> - if (rv) {
> + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
> + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
> + fd2devname(fd, (char *) dd->serial);
> + } else if (rv) {
> fprintf(stderr,
> Name ": failed to retrieve scsi serial, aborting\n");
> free(dd);
* Re: [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later
2010-01-11 20:38 ` [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later Doug Ledford
@ 2010-01-18 22:02 ` Neil Brown
0 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:02 UTC (permalink / raw)
Cc: linux-raid, Doug Ledford
On Mon, 11 Jan 2010 15:38:12 -0500
Doug Ledford <dledford@redhat.com> wrote:
> Signed-off-by: Doug Ledford <dledford@redhat.com>
Applied, thanks.
NeilBrown
> ---
> Detail.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Detail.c b/Detail.c
> index 0e47a05..ba07c83 100644
> --- a/Detail.c
> +++ b/Detail.c
> @@ -174,7 +174,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
> if (sra && sra->array.major_version < 0)
> printf("MD_METADATA=%s\n", sra->text_version);
> else
> - printf("MD_METADATA=%d.%02d\n",
> + printf("MD_METADATA=%d.%d\n",
> array.major_version, array.minor_version);
> }
>
> @@ -226,7 +226,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
> if (sra && sra->array.major_version < 0)
> printf(" metadata=%s", sra->text_version);
> else
> - printf(" metadata=%d.%02d",
> + printf(" metadata=%d.%d",
> array.major_version, array.minor_version);
> }
>
> @@ -259,7 +259,7 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
> if (sra && sra->array.major_version < 0)
> printf(" Version : %s\n", sra->text_version);
> else
> - printf(" Version : %d.%02d\n",
> + printf(" Version : %d.%d\n",
> array.major_version, array.minor_version);
> }
>
* Re: [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out
2010-01-11 20:38 ` [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out Doug Ledford
@ 2010-01-18 22:03 ` Neil Brown
0 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:03 UTC (permalink / raw)
Cc: linux-raid, Doug Ledford
On Mon, 11 Jan 2010 15:38:13 -0500
Doug Ledford <dledford@redhat.com> wrote:
> Signed-off-by: Doug Ledford <dledford@redhat.com>
Already have this functionality since
commit aae5a11207cf6da1682e6a76e116a19e21473f03
13 Oct 2009.
Thanks,
NeilBrown
> ---
> Detail.c | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/Detail.c b/Detail.c
> index ba07c83..e05ba10 100644
> --- a/Detail.c
> +++ b/Detail.c
> @@ -203,6 +203,11 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
> if (mp && mp->path &&
> strncmp(mp->path, "/dev/md/", 8) == 0)
> printf("MD_DEVNAME=%s\n", mp->path+8);
> + if (mp && (mp->uuid[0] || mp->uuid[1] || mp->uuid[2] ||
> + mp->uuid[3]))
> + printf("MD_UUID=%08x:%08x:%08x:%08x\n",
> + mp->uuid[0], mp->uuid[1], mp->uuid[2],
> + mp->uuid[3]);
> }
> goto out;
> }
* Re: [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file
2010-01-11 20:38 ` [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file Doug Ledford
@ 2010-01-18 22:03 ` Neil Brown
0 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:03 UTC (permalink / raw)
Cc: linux-raid, Doug Ledford
On Mon, 11 Jan 2010 15:38:14 -0500
Doug Ledford <dledford@redhat.com> wrote:
> Signed-off-by: Doug Ledford <dledford@redhat.com>
Applied, thanks.
NeilBrown
> ---
> config.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/config.c b/config.c
> index c962afd..2943221 100644
> --- a/config.c
> +++ b/config.c
> @@ -677,12 +677,21 @@ void homehostline(char *line)
> static char *auto_options = NULL;
> void autoline(char *line)
> {
> + char *w;
> +
> if (auto_options) {
> fprintf(stderr, Name ": AUTO line may only be give once."
> " Subsequent lines ignored\n");
> return;
> }
> - auto_options = line;
> +
> + auto_options = dl_strdup(line);
> + dl_init(auto_options);
> +
> + for (w=dl_next(line); w != line ; w=dl_next(w)) {
> + char *w2 = dl_strdup(w);
> + dl_add(auto_options, w2);
> + }
> }
>
> int loaded = 0;
* Re: Minor mdadm fixes
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
` (5 preceding siblings ...)
2010-01-12 0:49 ` Minor mdadm fixes Mr. James W. Laferriere
@ 2010-01-18 22:05 ` Neil Brown
6 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:05 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Mon, 11 Jan 2010 15:38:09 -0500
Doug Ledford <dledford@redhat.com> wrote:
> These are a number of minor fixes we carry in our mdadm at the moment. Would
> prefer not to carry them ourselves ;-)
Fair enough - I've taken three of them...
>
> Neil, any clue when you think might release mdadm-3.1.2?
Maybe in February. Feel free to remind me if I haven't done anything by
about the 14th.
Thanks,
NeilBrown
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-01-11 20:38 ` [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot Doug Ledford
@ 2010-01-18 22:09 ` Neil Brown
2010-01-19 7:21 ` Luca Berra
2010-01-19 17:51 ` Doug Ledford
0 siblings, 2 replies; 66+ messages in thread
From: Neil Brown @ 2010-01-18 22:09 UTC (permalink / raw)
Cc: linux-raid, Doug Ledford
On Mon, 11 Jan 2010 15:38:11 -0500
Doug Ledford <dledford@redhat.com> wrote:
> Signed-off-by: Doug Ledford <dledford@redhat.com>
I really really don't like this.
I wasn't very keen on allowing the map file to be found in /dev,
but this is just too ugly.
I understand there is a problem here, but I don't like this approach to a
solution. I'll give it more thought when I get home from LCA2010 and see
what I can come up with.
Thanks,
NeilBrown
> ---
> mdmon.c | 12 ++++++------
> msg.c | 2 +-
> util.c | 4 ++--
> 3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/mdmon.c b/mdmon.c
> index 0ec4259..b1d7aef 100644
> --- a/mdmon.c
> +++ b/mdmon.c
> @@ -118,7 +118,7 @@ static int test_pidfile(char *devname)
> char path[100];
> struct stat st;
>
> - sprintf(path, "/var/run/mdadm/%s.pid", devname);
> + sprintf(path, "/dev/.mdadm/%s.pid", devname);
> return stat(path, &st);
> }
>
> @@ -132,7 +132,7 @@ int make_pidfile(char *devname, int o_excl)
> if (sigterm)
> return -1;
>
> - sprintf(path, "/var/run/mdadm/%s.pid", devname);
> + sprintf(path, "/dev/.mdadm/%s.pid", devname);
>
> fd = open(path, O_RDWR|O_CREAT|o_excl, 0600);
> if (fd < 0)
> @@ -163,7 +163,7 @@ pid_t devname2mdmon(char *devname)
> pid_t pid = -1;
> int fd;
>
> - sprintf(buf, "/var/run/mdadm/%s.pid", devname);
> + sprintf(buf, "/dev/.mdadm/%s.pid", devname);
> fd = open(buf, O_RDONLY|O_NOATIME);
> if (fd < 0)
> return -1;
> @@ -217,9 +217,9 @@ void remove_pidfile(char *devname)
> if (sigterm)
> return;
>
> - sprintf(buf, "/var/run/mdadm/%s.pid", devname);
> + sprintf(buf, "/dev/.mdadm/%s.pid", devname);
> unlink(buf);
> - sprintf(buf, "/var/run/mdadm/%s.sock", devname);
> + sprintf(buf, "/dev/.mdadm/%s.sock", devname);
> unlink(buf);
> }
>
> @@ -233,7 +233,7 @@ int make_control_sock(char *devname)
> if (sigterm)
> return -1;
>
> - sprintf(path, "/var/run/mdadm/%s.sock", devname);
> + sprintf(path, "/dev/.mdadm/%s.sock", devname);
> unlink(path);
> sfd = socket(PF_LOCAL, SOCK_STREAM, 0);
> if (sfd < 0)
> diff --git a/msg.c b/msg.c
> index 8d52b94..c3ab243 100644
> --- a/msg.c
> +++ b/msg.c
> @@ -147,7 +147,7 @@ int connect_monitor(char *devname)
> int pos;
> char *c;
>
> - pos = sprintf(path, "/var/run/mdadm/");
> + pos = sprintf(path, "/dev/.mdadm/");
> if (is_subarray(devname)) {
> devname++;
> c = strchr(devname, '/');
> diff --git a/util.c b/util.c
> index 5feec43..864af69 100644
> --- a/util.c
> +++ b/util.c
> @@ -1469,7 +1469,7 @@ int mdmon_running(int devnum)
> char pid[10];
> int fd;
> int n;
> - sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
> + sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
> fd = open(path, O_RDONLY, 0);
>
> if (fd < 0)
> @@ -1489,7 +1489,7 @@ int signal_mdmon(int devnum)
> char pid[10];
> int fd;
> int n;
> - sprintf(path, "/var/run/mdadm/%s.pid", devnum2devname(devnum));
> + sprintf(path, "/dev/.mdadm/%s.pid", devnum2devname(devnum));
> fd = open(path, O_RDONLY, 0);
>
> if (fd < 0)
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
2010-01-18 22:01 ` Neil Brown
@ 2010-01-18 22:13 ` Dan Williams
2010-01-19 1:55 ` Doug Ledford
1 sibling, 1 reply; 66+ messages in thread
From: Dan Williams @ 2010-01-18 22:13 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
Hi Doug,
On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote:
> Signed-off-by: Doug Ledford <dledford@redhat.com>
> ---
> super-intel.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index d6951cc..fcf438c 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
> dd->fd = fd;
> dd->e = NULL;
> rv = imsm_read_serial(fd, devname, dd->serial);
> - if (rv) {
> + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
> + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
> + fd2devname(fd, (char *) dd->serial);
> + } else if (rv) {
This just duplicates the check already inside imsm_read_serial().
Containers on loopback devices worked before this patch, so I'll send
a revert.
--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
2010-01-18 22:13 ` Dan Williams
@ 2010-01-19 1:55 ` Doug Ledford
2010-01-19 4:42 ` Dan Williams
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-19 1:55 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]
On 01/18/2010 05:13 PM, Dan Williams wrote:
> Hi Doug,
>
> On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote:
>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>> ---
>> super-intel.c | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/super-intel.c b/super-intel.c
>> index d6951cc..fcf438c 100644
>> --- a/super-intel.c
>> +++ b/super-intel.c
>> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
>> dd->fd = fd;
>> dd->e = NULL;
>> rv = imsm_read_serial(fd, devname, dd->serial);
>> - if (rv) {
>> + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
>> + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
>> + fd2devname(fd, (char *) dd->serial);
>> + } else if (rv) {
>
> This just duplicates the check already inside imsm_read_serial().
> Containers on loopback devices worked before this patch, so I'll send
> a revert.
>
> --
> Dan
Me thinks you didn't try it, because this does not duplicate the code in
imsm_read_serial(). That code is needed to assemble an IMSM array that
already exists on loopback devices. This is needed to *create* an imsm
container on fresh loopback devices. I'm assuming your imsm container
superblocks already existed or some such.
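The decision the patch makes can be sketched in shell for readers who do not want to trace the C. This is only an illustration: `choose_serial` is a hypothetical helper, not part of mdadm, and treating any non-empty IMSM_DEVNAME_AS_SERIAL value as "enabled" is an assumption about how check_env() behaves.

```shell
#!/bin/sh
# Hypothetical helper, not part of mdadm: mirrors the patch's decision logic.
#   $1 = serial number read from the device (empty if the query failed)
#   $2 = device name to fall back on
choose_serial() {
    serial=$1
    devname=$2
    if [ -n "$serial" ]; then
        # The serial query worked: use the real serial.
        printf '%s\n' "$serial"
    elif [ -n "${IMSM_DEVNAME_AS_SERIAL:-}" ]; then
        # Query failed but the override is set: use the device name.
        # Assumption: any non-empty value counts as "set".
        printf '%s\n' "$devname"
    else
        echo "failed to retrieve scsi serial, aborting" >&2
        return 1
    fi
}
```

With the variable set, `choose_serial "" /dev/loop0` prints the device name, which matches the "Disk00 Serial : /dev/loop0" line an examined loopback member would show.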
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
2010-01-19 1:55 ` Doug Ledford
@ 2010-01-19 4:42 ` Dan Williams
2010-01-19 5:31 ` Doug Ledford
0 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2010-01-19 4:42 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Mon, Jan 18, 2010 at 6:55 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 01/18/2010 05:13 PM, Dan Williams wrote:
>> Hi Doug,
>>
>> On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote:
>>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>>> ---
>>> super-intel.c | 5 ++++-
>>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/super-intel.c b/super-intel.c
>>> index d6951cc..fcf438c 100644
>>> --- a/super-intel.c
>>> +++ b/super-intel.c
>>> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
>>> dd->fd = fd;
>>> dd->e = NULL;
>>> rv = imsm_read_serial(fd, devname, dd->serial);
>>> - if (rv) {
>>> + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
>>> + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
>>> + fd2devname(fd, (char *) dd->serial);
>>> + } else if (rv) {
>>
>> This just duplicates the check already inside imsm_read_serial().
>> Containers on loopback devices worked before this patch, so I'll send
>> a revert.
>>
>> --
>> Dan
>
> Me thinks you didn't try it, because this does not duplicate the code in
> imsm_read_serial(). That code is needed to assemble an IMSM array that
> already exists on loopback devices. This is needed to *create* an imsm
> container on fresh loopback devices. I'm assuming your imsm container
> superblocks already existed or some such.
>
Me thinks you did not try it either :-)
# export IMSM_DEVNAME_AS_SERIAL=1
# mdadm --zero-superblock /dev/loop[0-3]
# mdadm -Eb /dev/loop[0-4]
# mdadm -E /dev/loop0
mdadm: No md superblock detected on /dev/loop0.
# mdadm --create /dev/md/imsm /dev/loop[0-3] -n 4 -e imsm
mdadm: /dev/loop0 appears to contain an ext2fs file system
size=306816K mtime=Sat Nov 21 10:54:51 2009
mdadm: /dev/loop1 appears to contain an ext2fs file system
size=306816K mtime=Sat Nov 21 10:54:51 2009
mdadm: imsm unable to enumerate platform support
array may not be compatible with hardware/firmware
Continue creating array? y
mdadm: container /dev/md/imsm prepared.
# mdadm -Eb /dev/loop[0-3]
ARRAY metadata=imsm
spares=4
# mdadm -E /dev/loop0
/dev/loop0:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.0.00
    Orig Family : 00000000
         Family : 697a43ec
     Generation : 00000001
           UUID : ffffffff:ffffffff:ffffffff:ffffffff
       Checksum : c3d8d367 correct
    MPB Sectors : 1
          Disks : 1
   RAID Devices : 0
  Disk00 Serial : /dev/loop0
          State : spare
             Id : 00000000
    Usable Size : 204382 (99.81 MiB 104.64 MB)
This is with:
commit 6acad4811b06335a2602fa1eeaec3a8f47f96591
Author: Michael Evan <mjevans1983@gmail.com>
Date: Wed Dec 9 21:52:18 2009 -0800
--
Dan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
2010-01-19 4:42 ` Dan Williams
@ 2010-01-19 5:31 ` Doug Ledford
2010-01-19 5:47 ` Dan Williams
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-01-19 5:31 UTC (permalink / raw)
To: Dan Williams, Linux RAID Mailing List
[-- Attachment #1: Type: text/plain, Size: 3495 bytes --]
On 01/18/2010 11:42 PM, Dan Williams wrote:
> On Mon, Jan 18, 2010 at 6:55 PM, Doug Ledford <dledford@redhat.com> wrote:
>> On 01/18/2010 05:13 PM, Dan Williams wrote:
>>> Hi Doug,
>>>
>>> On Mon, Jan 11, 2010 at 1:38 PM, Doug Ledford <dledford@redhat.com> wrote:
>>>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>>>> ---
>>>> super-intel.c | 5 ++++-
>>>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/super-intel.c b/super-intel.c
>>>> index d6951cc..fcf438c 100644
>>>> --- a/super-intel.c
>>>> +++ b/super-intel.c
>>>> @@ -3208,7 +3208,10 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
>>>> dd->fd = fd;
>>>> dd->e = NULL;
>>>> rv = imsm_read_serial(fd, devname, dd->serial);
>>>> - if (rv) {
>>>> + if (rv && check_env("IMSM_DEVNAME_AS_SERIAL")) {
>>>> + memset(dd->serial, 0, MAX_RAID_SERIAL_LEN);
>>>> + fd2devname(fd, (char *) dd->serial);
>>>> + } else if (rv) {
>>>
>>> This just duplicates the check already inside imsm_read_serial().
>>> Containers on loopback devices worked before this patch, so I'll send
>>> a revert.
>>>
>>> --
>>> Dan
>>
>> Me thinks you didn't try it, because this does not duplicate the code in
>> imsm_read_serial(). That code is needed to assemble an IMSM array that
>> already exists on loopback devices. This is needed to *create* an imsm
>> container on fresh loopback devices. I'm assuming your imsm container
>> superblocks already existed or some such.
>>
>
> Me thinks you did not try it either :-)
>
> # export IMSM_DEVNAME_AS_SERIAL=1
> # mdadm --zero-superblock /dev/loop[0-3]
> # mdadm -Eb /dev/loop[0-4]
> # mdadm -E /dev/loop0
> mdadm: No md superblock detected on /dev/loop0.
> # mdadm --create /dev/md/imsm /dev/loop[0-3] -n 4 -e imsm
> mdadm: /dev/loop0 appears to contain an ext2fs file system
> size=306816K mtime=Sat Nov 21 10:54:51 2009
> mdadm: /dev/loop1 appears to contain an ext2fs file system
> size=306816K mtime=Sat Nov 21 10:54:51 2009
> mdadm: imsm unable to enumerate platform support
> array may not be compatible with hardware/firmware
> Continue creating array? y
> mdadm: container /dev/md/imsm prepared.
> # mdadm -Eb /dev/loop[0-3]
> ARRAY metadata=imsm
> spares=4
>
> # mdadm -E /dev/loop0
> /dev/loop0:
> Magic : Intel Raid ISM Cfg Sig.
> Version : 1.0.00
> Orig Family : 00000000
> Family : 697a43ec
> Generation : 00000001
> UUID : ffffffff:ffffffff:ffffffff:ffffffff
> Checksum : c3d8d367 correct
> MPB Sectors : 1
> Disks : 1
> RAID Devices : 0
>
> Disk00 Serial : /dev/loop0
> State : spare
> Id : 00000000
> Usable Size : 204382 (99.81 MiB 104.64 MB)
>
> This is with:
> commit 6acad4811b06335a2602fa1eeaec3a8f47f96591
> Author: Michael Evan <mjevans1983@gmail.com>
> Date: Wed Dec 9 21:52:18 2009 -0800
>
> --
> Dan
Ah, OK. I did say we had been carrying this around in our SRPM for a
while, I just hadn't tried removing it since it was necessary. I take
it you are implying that that changeset is the one that rendered it no
longer necessary?
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries.
2010-01-19 5:31 ` Doug Ledford
@ 2010-01-19 5:47 ` Dan Williams
0 siblings, 0 replies; 66+ messages in thread
From: Dan Williams @ 2010-01-19 5:47 UTC (permalink / raw)
To: Doug Ledford; +Cc: Linux RAID Mailing List
On Mon, Jan 18, 2010 at 10:31 PM, Doug Ledford <dledford@redhat.com> wrote:
> Ah, OK. I did say we had been carrying this around in our SRPM for a
> while, I just hadn't tried removing it since it was necessary.
No worries, I suspected as much.
> I take
> it you are implying that that changeset is the one that rendered it no
> longer necessary?
Nah, this is just a recent point before Neil applied this patch. I
did a quick look for the commit that fixed this, but nothing popped
out.
--
Dan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-01-18 22:09 ` Neil Brown
@ 2010-01-19 7:21 ` Luca Berra
2010-01-19 17:51 ` Doug Ledford
1 sibling, 0 replies; 66+ messages in thread
From: Luca Berra @ 2010-01-19 7:21 UTC (permalink / raw)
To: linux-raid
On Tue, Jan 19, 2010 at 11:09:30AM +1300, Neil Brown wrote:
>On Mon, 11 Jan 2010 15:38:11 -0500
>Doug Ledford <dledford@redhat.com> wrote:
>
>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>
>I really really don't like this.
>I wasn't very keen on allowing the map file to be found in /dev,
>but this it just too ugly.
>
>I understand there is a problem here, but I don't like this approach to a
>solution. I'll give it more though when I get home from LCA2010 and see
>what I can come up with.
>
I'll try holding my breath till then :)
Well, actually I'll use Doug's patch until a better solution is found.
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-01-18 22:09 ` Neil Brown
2010-01-19 7:21 ` Luca Berra
@ 2010-01-19 17:51 ` Doug Ledford
2010-02-01 20:32 ` Bill Davidsen
2010-02-04 6:40 ` Neil Brown
1 sibling, 2 replies; 66+ messages in thread
From: Doug Ledford @ 2010-01-19 17:51 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3211 bytes --]
On 01/18/2010 05:09 PM, Neil Brown wrote:
> On Mon, 11 Jan 2010 15:38:11 -0500
> Doug Ledford <dledford@redhat.com> wrote:
>
>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>
> I really really don't like this.
> I wasn't very keen on allowing the map file to be found in /dev,
> but this it just too ugly.
I've had to rewrite my response to this a few times :-/
So, let's be clear: you are objecting to these non device special files
being located under /dev. Not necessarily *where* they are under /dev,
just that they are under /dev at all. That's what I get from your
statement above.
First with devfs, then later with udev, the old unix tradition of only
device special files under /dev is truly dead. And it should be. The
files we are creating are needed prior to / filesystem bring up, and
they are needed simply in order to fully populate /dev. In fact, an
argument can be made that a new tradition, that files related to the
creation and maintenance of device special files belong under /dev with
the files they relate to, has been created. And this new tradition
makes sense and is elegant on the basis that it requires only one
read/write filesystem mount point during device special file population.
It also makes sense that this new tradition would supersede the old
tradition on the basis that the old tradition was created prior to the
advent of hot plug and the need to have any read/write data just to
populate your device special files. The old tradition didn't have the
flexibility to deal with modern hot plug architectures, the new
tradition fixes that, and does so as elegantly as possible.
That being the case, the big player in the game, udev, is following the
new tradition by creating an entire tree of non device special files
under /dev/.udev and using that to store the information it needs. And
here mdadm/mdmon are, the small players in the device bring up game that
only have minor bit parts compared to udev, holding up progress and
playing the recalcitrant old fart. Sorry Neil, but the war has already
been decided and this is a dead battle. Files related to device special
file bring up belong under /dev along with the files we are creating.
Your claim that these changes are ugly is misplaced and based upon
adherence to a dead tradition that has been replaced by a more sensible
tradition. Maybe you don't like where they are under /dev, but the fact
that they are under /dev is definitely the right thing to do and is not
in the least bit ugly.
> I understand there is a problem here, but I don't like this approach to a
> solution. I'll give it more though when I get home from LCA2010 and see
> what I can come up with.
Feel free to come up with something different. But, if your solution
involves maintaining an additional read/write mount area in deference to
a long dead unix tradition, I'm just going to shake my head and patch
your solution away to something sane.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-01-19 17:51 ` Doug Ledford
@ 2010-02-01 20:32 ` Bill Davidsen
2010-02-01 21:32 ` Doug Ledford
2010-02-04 6:40 ` Neil Brown
1 sibling, 1 reply; 66+ messages in thread
From: Bill Davidsen @ 2010-02-01 20:32 UTC (permalink / raw)
To: Doug Ledford; +Cc: Neil Brown, linux-raid
Doug Ledford wrote:
> On 01/18/2010 05:09 PM, Neil Brown wrote:
>
>
>> I understand there is a problem here, but I don't like this approach to a
>> solution. I'll give it more though when I get home from LCA2010 and see
>> what I can come up with.
>>
>
> Feel free to come up with something different. But, if your solution
> involves maintaining an additional read/write mount area in deference to
> a long dead unix tradition, I'm just going to shake my head and patch
> your solution away to something sane.
>
>
I don't understand your argument here. Not the part where you say you're
going to ignore Neil and do what you want because you can; I understand
that. I mean the "additional read/write mount area" part: isn't /var/run
r/w on all systems now? Could you clarify why this is "additional" here?
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-01 20:32 ` Bill Davidsen
@ 2010-02-01 21:32 ` Doug Ledford
2010-02-01 22:42 ` Bill Davidsen
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-01 21:32 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Neil Brown, linux-raid
[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]
On 02/01/2010 03:32 PM, Bill Davidsen wrote:
> Doug Ledford wrote:
>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>
>>> I understand there is a problem here, but I don't like this approach
>>> to a
>>> solution. I'll give it more though when I get home from LCA2010 and see
>>> what I can come up with.
>>>
>>
>> Feel free to come up with something different. But, if your solution
>> involves maintaining an additional read/write mount area in deference to
>> a long dead unix tradition, I'm just going to shake my head and patch
>> your solution away to something sane.
>>
>>
> I don't understand you argument here. Not the one where you say you're
> going to ignore Neil and do what you want because you can, I understand
> that, but the "additional read/write mount area" part, isn't /var/run
> r/w on all systems now? Could you clarify why this is "additional" here?
>
It's not necessarily read/write in the initrd time frame, and putting
the mdadm files there means it would have to be. We didn't make these
changes because we wanted to, we made them because using mdadm raid
arrays for the root filesystem combined with incremental assembly or
with imsm raid devices was broken otherwise.
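The constraint described here (the usual run directory may not be writable yet during the initrd phase) can be sketched as a small probe. `pick_rundir` is a hypothetical helper for illustration only, and it relies on the shell's `-w` test reporting a read-only location as not writable.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that exists
# and is writable, or return non-zero if none qualifies.
pick_rundir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example call (not executed here):
#   rundir=$(pick_rundir /var/run/mdadm /dev/.mdadm) || exit 1
```

A probe like this only chooses a location. It does not solve the handoff problem, because the daemon and its clients must still agree on one path both before and after the root switch.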
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-01 21:32 ` Doug Ledford
@ 2010-02-01 22:42 ` Bill Davidsen
2010-02-02 4:08 ` Michael Evans
2010-02-02 18:07 ` Doug Ledford
0 siblings, 2 replies; 66+ messages in thread
From: Bill Davidsen @ 2010-02-01 22:42 UTC (permalink / raw)
To: Doug Ledford; +Cc: Neil Brown, linux-raid
Doug Ledford wrote:
> On 02/01/2010 03:32 PM, Bill Davidsen wrote:
>
>> Doug Ledford wrote:
>>
>>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>>
>>>
>>>> I understand there is a problem here, but I don't like this approach
>>>> to a
>>>> solution. I'll give it more though when I get home from LCA2010 and see
>>>> what I can come up with.
>>>>
>>>>
>>> Feel free to come up with something different. But, if your solution
>>> involves maintaining an additional read/write mount area in deference to
>>> a long dead unix tradition, I'm just going to shake my head and patch
>>> your solution away to something sane.
>>>
>>>
>>>
>> I don't understand you argument here. Not the one where you say you're
>> going to ignore Neil and do what you want because you can, I understand
>> that, but the "additional read/write mount area" part, isn't /var/run
>> r/w on all systems now? Could you clarify why this is "additional" here?
>>
>>
>
> It's not necessarily read/write in the initrd time frame, and putting
> the mdadm files there means it would have to be. We didn't make these
> changes because we wanted to, we made them because using mdadm raid
> arrays for the root filesystem combined with incremental assembly or
> with imsm raid devices was broken otherwise.
>
>
Do understand that my disquiet about this isn't because you put a
non-device in /dev; it's that you didn't put a process PID in /var/run.
And frankly, once you let (or force) one group of threads to be somewhere
else, other services will want their PIDs in some other place, and anyone
maintaining an application which presents information on what's running
will need to know where that information is.
In other words, it's not where you put it, it's where you *didn't* put
it, which seems to be an invitation to put stuff just anywhere. Neil
argues that they are not devices; I argue that they are PIDs. It's not as
though it would be a huge effort to move it after pivot root: it's a
little code or script, in space which will be released.
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-01 22:42 ` Bill Davidsen
@ 2010-02-02 4:08 ` Michael Evans
2010-02-02 7:17 ` Luca Berra
` (2 more replies)
2010-02-02 18:07 ` Doug Ledford
1 sibling, 3 replies; 66+ messages in thread
From: Michael Evans @ 2010-02-02 4:08 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Doug Ledford, Neil Brown, linux-raid
On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote:
> Doug Ledford wrote:
>>
>> On 02/01/2010 03:32 PM, Bill Davidsen wrote:
>>
>>>
>>> Doug Ledford wrote:
>>>
>>>>
>>>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>>>
>>>>>
>>>>> I understand there is a problem here, but I don't like this approach
>>>>> to a
>>>>> solution. I'll give it more though when I get home from LCA2010 and
>>>>> see
>>>>> what I can come up with.
>>>>>
>>>>
>>>> Feel free to come up with something different. But, if your solution
>>>> involves maintaining an additional read/write mount area in deference to
>>>> a long dead unix tradition, I'm just going to shake my head and patch
>>>> your solution away to something sane.
>>>>
>>>>
>>>
>>> I don't understand you argument here. Not the one where you say you're
>>> going to ignore Neil and do what you want because you can, I understand
>>> that, but the "additional read/write mount area" part, isn't /var/run
>>> r/w on all systems now? Could you clarify why this is "additional" here?
>>>
>>>
>>
>> It's not necessarily read/write in the initrd time frame, and putting
>> the mdadm files there means it would have to be. We didn't make these
>> changes because we wanted to, we made them because using mdadm raid
>> arrays for the root filesystem combined with incremental assembly or
>> with imsm raid devices was broken otherwise.
>>
>>
>
> Do understand that my disquiet related to this isn't because you put a
> non-device in /dev, it's that you
> didn't put a process PID in /var/run. And frankly, once you let (force) one
> group of threads to be somewhere
> else, other services will want their PIDs some other place, and anyone
> maintaining an application
> which presents information on what's running will need to know where that
> information.
>
> In other words, it's not where you put it, it's where you *didn't* put it,
> that seems to be an
> invitation to put stuff just anywhere. Neil argues that they are not
> devices, I argue that
> they are PIDs. It's not as though it were a huge effort to move it after
> pivot root, it's a little code
> or script and in space which will be released.
>
> --
> Bill Davidsen <davidsen@tmr.com>
> "We can't solve today's problems by using the same thinking we
> used in creating them." - Einstein
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Thank you for stating your concern; knowing that, I think a very
plausible solution is obvious.
# at initrd/initramfs creation time
ln -s /dev/.run /var/run
#initrd/initramfs script
mkdir /dev/.run
The usual area becomes a symlink to a memory disk. Most systems have
ample memory to support a few extra tiny files there. Cleanup on
reboot is automatic. Any systems that are memory constrained probably
already either have a drive they could swap this data out to, or would
rather save the writes from reaching flash media anyway.
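The two steps above can be tried safely under a scratch root before being wired into a real image. This is a sketch under assumptions: `setup_run_symlink` is a hypothetical name, and it does both steps at once, whereas the suggestion above splits them between image creation time and boot time.

```shell
#!/bin/sh
# Hypothetical sketch: build the layout under a scratch root ($1) so it
# can be tried without touching the real /dev or /var.
setup_run_symlink() {
    root=$1
    # Boot-time step: the directory that will hold the runtime files.
    mkdir -p "$root/dev/.run" "$root/var" || return 1
    # Image-creation step: the link target is absolute, so it resolves to
    # /dev/.run once $root becomes the root of the running initramfs.
    ln -s /dev/.run "$root/var/run"
}
```

If `$root/var/run` already exists as a directory, `ln -s` would create the link inside it instead of replacing it, so a real script would need to check for that first.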
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-02 4:08 ` Michael Evans
@ 2010-02-02 7:17 ` Luca Berra
2010-02-02 15:42 ` Bill Davidsen
2010-02-02 18:11 ` Doug Ledford
2 siblings, 0 replies; 66+ messages in thread
From: Luca Berra @ 2010-02-02 7:17 UTC (permalink / raw)
To: linux-raid; +Cc: initramfs
CCing initramfs since it could be of interest.
Background:
Doug's patch moves the mdmon pid file and socket from /var/run to /dev
to preserve them after pivot-root.
The rationale is that mdmon gets started in the initramfs when imsm or ddf
arrays are activated.
Neil does not like the proposed solution.
On Mon, Feb 01, 2010 at 08:08:54PM -0800, Michael Evans wrote:
>On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote:
>> Doug Ledford wrote:
>>>
>>> On 02/01/2010 03:32 PM, Bill Davidsen wrote:
>>>
>>>>
>>>> Doug Ledford wrote:
>>>>
>>>>>
>>>>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>>>>
>>>>>>
>>>>>> I understand there is a problem here, but I don't like this approach
>>>>>> to a
>>>>>> solution. I'll give it more though when I get home from LCA2010 and
>>>>>> see
>>>>>> what I can come up with.
>>>>>>
>>>>>
>>>>> Feel free to come up with something different. But, if your solution
>>>>> involves maintaining an additional read/write mount area in deference to
>>>>> a long dead unix tradition, I'm just going to shake my head and patch
>>>>> your solution away to something sane.
>>>>>
>>>>>
>>>>
>>>> I don't understand you argument here. Not the one where you say you're
>>>> going to ignore Neil and do what you want because you can, I understand
>>>> that, but the "additional read/write mount area" part, isn't /var/run
>>>> r/w on all systems now? Could you clarify why this is "additional" here?
>>>>
>>>>
>>>
>>> It's not necessarily read/write in the initrd time frame, and putting
>>> the mdadm files there means it would have to be. We didn't make these
>>> changes because we wanted to, we made them because using mdadm raid
>>> arrays for the root filesystem combined with incremental assembly or
>>> with imsm raid devices was broken otherwise.
>>>
>>>
>>
>> Do understand that my disquiet related to this isn't because you put a
>> non-device in /dev, it's that you
>> didn't put a process PID in /var/run. And frankly, once you let (force) one
>> group of threads to be somewhere
>> else, other services will want their PIDs some other place, and anyone
>> maintaining an application
>> which presents information on what's running will need to know where that
>> information.
>>
>> In other words, it's not where you put it, it's where you *didn't* put it,
>> that seems to be an
>> invitation to put stuff just anywhere. Neil argues that they are not
>> devices, I argue that
>> they are PIDs. It's not as though it were a huge effort to move it after
>> pivot root, it's a little code
>> or script and in space which will be released.
>>
>> --
>> Bill Davidsen <davidsen@tmr.com>
>> "We can't solve today's problems by using the same thinking we
>> used in creating them." - Einstein
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>Thank you for stating your concern; I think knowing that a very
>plausible solution is obvious.
>
># at initrd/initramfs creation time
>ln -s /dev/.run /var/run
>
>#initrd/initramfs script
>mkdir /dev/.run
>
>The usual area becomes a symlink to a memory disk .Most systems have
>ample memory to support a few extra tiny files there. Cleanup on
>reboot is automatic. Any systems that are memory constrained probably
>already either have a drive they could swap this data out to, or would
>rather save the writes from reaching flash media anyway.
This could be interesting, but then you have to move things back to
/var/run after pivot root, and we cannot move a socket.
Still, if it would suit both parties, we could:
- keep the mdmon pid file in /var/run and use initramfs magic to preserve
those pid files; this could become a standard solution when the next
daemon arrives that needs to start from initramfs.
- put the socket in /dev/md, and I defy anyone to say sockets do not belong
in /dev, bringing forth the syslog daemon as a witness.
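A rough sketch of the split proposed above, for illustration only. The helper names are hypothetical, the `/dev/md/<name>.sock` file name is an assumption (the proposal only names the directory), and copying pid files is just one possible form the "initramfs magic" could take.

```shell
#!/bin/sh
# Hypothetical layout following the proposal above.
# The pid file stays where the unpatched code put it; the socket name
# under /dev/md is an assumption.
mdmon_pidfile() { printf '/var/run/mdadm/%s.pid\n' "$1"; }
mdmon_socket()  { printf '/dev/md/%s.sock\n' "$1"; }

# Copy every pid file from the initramfs run directory ($1) into the
# run directory of the new root ($2) before switching roots.
preserve_pidfiles() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*.pid; do
        [ -e "$f" ] || continue   # the glob matched nothing
        cp -p "$f" "$dst/" || return 1
    done
    return 0
}
```

Only the pid files are copied. The socket is deliberately left in place under /dev, since a listening socket cannot be moved the same way.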
Regards,
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-02 4:08 ` Michael Evans
2010-02-02 7:17 ` Luca Berra
@ 2010-02-02 15:42 ` Bill Davidsen
2010-02-02 18:19 ` Doug Ledford
2010-02-02 18:11 ` Doug Ledford
2 siblings, 1 reply; 66+ messages in thread
From: Bill Davidsen @ 2010-02-02 15:42 UTC (permalink / raw)
To: Michael Evans; +Cc: Doug Ledford, Neil Brown, linux-raid
Michael Evans wrote:
> On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote:
>
>> Doug Ledford wrote:
>>
>>> On 02/01/2010 03:32 PM, Bill Davidsen wrote:
>>>
>>>
>>>> Doug Ledford wrote:
>>>>
>>>>
>>>>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>>>>
>>>>>
>>>>>> I understand there is a problem here, but I don't like this approach
>>>>>> to a
>>>>>> solution. I'll give it more though when I get home from LCA2010 and
>>>>>> see
>>>>>> what I can come up with.
>>>>>>
>>>>>>
>>>>> Feel free to come up with something different. But, if your solution
>>>>> involves maintaining an additional read/write mount area in deference to
>>>>> a long dead unix tradition, I'm just going to shake my head and patch
>>>>> your solution away to something sane.
>>>>>
>>>>>
>>>>>
>>>> I don't understand you argument here. Not the one where you say you're
>>>> going to ignore Neil and do what you want because you can, I understand
>>>> that, but the "additional read/write mount area" part, isn't /var/run
>>>> r/w on all systems now? Could you clarify why this is "additional" here?
>>>>
>>>>
>>>>
>>> It's not necessarily read/write in the initrd time frame, and putting
>>> the mdadm files there means it would have to be. We didn't make these
>>> changes because we wanted to, we made them because using mdadm raid
>>> arrays for the root filesystem combined with incremental assembly or
>>> with imsm raid devices was broken otherwise.
>>>
>>>
>>>
>> Do understand that my disquiet related to this isn't because you put a
>> non-device in /dev, it's that you
>> didn't put a process PID in /var/run. And frankly, once you let (force) one
>> group of threads to be somewhere
>> else, other services will want their PIDs some other place, and anyone
>> maintaining an application
>> which presents information on what's running will need to know where that
>> information.
>>
>> In other words, it's not where you put it, it's where you *didn't* put it,
>> that seems to be an
>> invitation to put stuff just anywhere. Neil argues that they are not
>> devices, I argue that
>> they are PIDs. It's not as though it were a huge effort to move it after
>> pivot root, it's a little code
>> or script and in space which will be released.
>>
>> --
>> Bill Davidsen <davidsen@tmr.com>
>> "We can't solve today's problems by using the same thinking we
>> used in creating them." - Einstein
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>
> Thank you for stating your concern; I think knowing that a very
> plausible solution is obvious.
>
> # at initrd/initramfs creation time
> ln -s /dev/.run /var/run
>
> #initrd/initramfs script
> mkdir /dev/.run
>
> The usual area becomes a symlink to a memory disk .Most systems have
> ample memory to support a few extra tiny files there. Cleanup on
> reboot is automatic. Any systems that are memory constrained probably
> already either have a drive they could swap this data out to, or would
> rather save the writes from reaching flash media anyway.
>
>
The only possible side effect of that is that applications which put
information in /var/run/subdir would have to create the subdir at run
time rather than at the time of installing the application. And looking
at my /var/run directory many applications do seem to have
subdirectories in /var/run which were created when the applications were
installed. I count 31 on this system, a quick check on other systems
reveals up to 41 and 14-24 of those directories have not been used since
the system was installed. That is, the applications have never been run.
Does it really make sense to force modification of every application
which installs a subdirectory in /var/run, and incur the overhead in
each of those applications of checking for the directory and creating it
if missing, as opposed to a single line in an init script to copy the
boot time PID files from /dev to /var/run? It seems as if a lot of work
and overhead is being generated for the applications, just to save a
tiny bit of work for the people implementing a new boot procedure.
(cd /dev/.run && find . -depth | cpio -pdm /var/run; cd -; rmdir /dev/.run)
Not only would this need a change in Fedora packages, but anyone writing
a package for Linux in general would have to do it "the Fedora way" and
even though Fedora is popular, I think some applications would choose to
avoid the overhead and need ugly hacks in rc.local to create the
directories at boot.
All in all, I think the overhead belongs in the boot process, not all
the existing applications.
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-01 22:42 ` Bill Davidsen
2010-02-02 4:08 ` Michael Evans
@ 2010-02-02 18:07 ` Doug Ledford
2010-02-02 18:18 ` Bill Davidsen
1 sibling, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-02 18:07 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Neil Brown, linux-raid
[-- Attachment #1: Type: text/plain, Size: 799 bytes --]
On 02/01/2010 05:42 PM, Bill Davidsen wrote:
> In other words, it's not where you put it, it's where you *didn't* put
> it, that seems to be an
> invitation to put stuff just anywhere. Neil argues that they are not
> devices, I argue that
> they are PIDs. It's not as though it were a huge effort to move it after
> pivot root, it's a little code
> or script and in space which will be released.
On the pid files I see your point. Not that it changes the problem, but
it does at least point out that more needs to be done and that the
current solution is incomplete.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-02 4:08 ` Michael Evans
2010-02-02 7:17 ` Luca Berra
2010-02-02 15:42 ` Bill Davidsen
@ 2010-02-02 18:11 ` Doug Ledford
2 siblings, 0 replies; 66+ messages in thread
From: Doug Ledford @ 2010-02-02 18:11 UTC (permalink / raw)
To: Michael Evans; +Cc: Bill Davidsen, Neil Brown, linux-raid
[-- Attachment #1: Type: text/plain, Size: 4263 bytes --]
On 02/01/2010 11:08 PM, Michael Evans wrote:
> On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote:
>> Doug Ledford wrote:
>>>
>>> On 02/01/2010 03:32 PM, Bill Davidsen wrote:
>>>
>>>>
>>>> Doug Ledford wrote:
>>>>
>>>>>
>>>>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>>>>
>>>>>>
>>>>>> I understand there is a problem here, but I don't like this approach
>>>>>> to a
>>>>>> solution. I'll give it more though when I get home from LCA2010 and
>>>>>> see
>>>>>> what I can come up with.
>>>>>>
>>>>>
>>>>> Feel free to come up with something different. But, if your solution
>>>>> involves maintaining an additional read/write mount area in deference to
>>>>> a long dead unix tradition, I'm just going to shake my head and patch
>>>>> your solution away to something sane.
>>>>>
>>>>>
>>>>
>>>> I don't understand you argument here. Not the one where you say you're
>>>> going to ignore Neil and do what you want because you can, I understand
>>>> that, but the "additional read/write mount area" part, isn't /var/run
>>>> r/w on all systems now? Could you clarify why this is "additional" here?
>>>>
>>>>
>>>
>>> It's not necessarily read/write in the initrd time frame, and putting
>>> the mdadm files there means it would have to be. We didn't make these
>>> changes because we wanted to, we made them because using mdadm raid
>>> arrays for the root filesystem combined with incremental assembly or
>>> with imsm raid devices was broken otherwise.
>>>
>>>
>>
>> Do understand that my disquiet related to this isn't because you put a
>> non-device in /dev, it's that you
>> didn't put a process PID in /var/run. And frankly, once you let (force) one
>> group of threads to be somewhere
>> else, other services will want their PIDs some other place, and anyone
>> maintaining an application
>> which presents information on what's running will need to know where that
>> information.
>>
>> In other words, it's not where you put it, it's where you *didn't* put it,
>> that seems to be an
>> invitation to put stuff just anywhere. Neil argues that they are not
>> devices, I argue that
>> they are PIDs. It's not as though it were a huge effort to move it after
>> pivot root, it's a little code
>> or script and in space which will be released.
>>
>> --
>> Bill Davidsen <davidsen@tmr.com>
>> "We can't solve today's problems by using the same thinking we
>> used in creating them." - Einstein
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> Thank you for stating your concern; I think knowing that a very
> plausible solution is obvious.
>
> # at initrd/initramfs creation time
> ln -s /dev/.run /var/run
>
> #initrd/initramfs script
> mkdir /dev/.run
>
> The usual area becomes a symlink to a memory disk .Most systems have
> ample memory to support a few extra tiny files there. Cleanup on
> reboot is automatic. Any systems that are memory constrained probably
> already either have a drive they could swap this data out to, or would
> rather save the writes from reaching flash media anyway.
It's highly likely that mdmon would need its own directory, so ln -s
/dev/.mdmon /var/run/mdmon would be more suitable. This is due to
SELinux contexts. I know mdadm needed /var/run/mdadm so that in monitor
mode with strong SELinux enabled it could have its own private context
and that context could then be given the permissions it needed (create
dev file, access sendmail, etc., a number of these actions are the very
type of actions that programs do when compromised so part of strong
security is only granting those perms to programs that legitimately need
them). Mdmon may not need the same perms that mdadm did, but I still
wouldn't be surprised if it needs its own context/directory due to the
relative danger of handing out raw disk access.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-02 18:07 ` Doug Ledford
@ 2010-02-02 18:18 ` Bill Davidsen
0 siblings, 0 replies; 66+ messages in thread
From: Bill Davidsen @ 2010-02-02 18:18 UTC (permalink / raw)
To: Doug Ledford; +Cc: Neil Brown, linux-raid
Doug Ledford wrote:
> On 02/01/2010 05:42 PM, Bill Davidsen wrote:
>
>
>> In other words, it's not where you put it, it's where you *didn't* put
>> it, that seems to be an
>> invitation to put stuff just anywhere. Neil argues that they are not
>> devices, I argue that
>> they are PIDs. It's not as though it were a huge effort to move it after
>> pivot root, it's a little code
>> or script and in space which will be released.
>>
>
> On the pid files I see your point. Not that it changes the problem, but
> it does at least point out that more needs to be done and that the
> current solution is incomplete.
>
>
Good, and your point about context and such is well taken. I still feel
that the way to solve this would be to copy *just* the PID files to a real
/var/run and any directories being created would have default permissions.
It just feels as if there's a single point at which this could be done. And
of course "if you broke it, you should fix it." That's a popular thing Linus
said, but the principle dates back to MULTICS, at least (70's).
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-02 15:42 ` Bill Davidsen
@ 2010-02-02 18:19 ` Doug Ledford
2010-02-04 13:50 ` Bernd Schubert
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-02 18:19 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Michael Evans, Neil Brown, linux-raid
[-- Attachment #1: Type: text/plain, Size: 6481 bytes --]
On 02/02/2010 10:42 AM, Bill Davidsen wrote:
> Michael Evans wrote:
>> On Mon, Feb 1, 2010 at 2:42 PM, Bill Davidsen <davidsen@tmr.com> wrote:
>>
>>> Doug Ledford wrote:
>>>
>>>> On 02/01/2010 03:32 PM, Bill Davidsen wrote:
>>>>
>>>>
>>>>> Doug Ledford wrote:
>>>>>
>>>>>
>>>>>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>>>>>
>>>>>>
>>>>>>> I understand there is a problem here, but I don't like this approach
>>>>>>> to a
>>>>>>> solution. I'll give it more though when I get home from LCA2010 and
>>>>>>> see
>>>>>>> what I can come up with.
>>>>>>>
>>>>>>>
>>>>>> Feel free to come up with something different. But, if your solution
>>>>>> involves maintaining an additional read/write mount area in
>>>>>> deference to
>>>>>> a long dead unix tradition, I'm just going to shake my head and patch
>>>>>> your solution away to something sane.
>>>>>>
>>>>>>
>>>>>>
>>>>> I don't understand you argument here. Not the one where you say you're
>>>>> going to ignore Neil and do what you want because you can, I
>>>>> understand
>>>>> that, but the "additional read/write mount area" part, isn't /var/run
>>>>> r/w on all systems now? Could you clarify why this is "additional"
>>>>> here?
>>>>>
>>>>>
>>>>>
>>>> It's not necessarily read/write in the initrd time frame, and putting
>>>> the mdadm files there means it would have to be. We didn't make these
>>>> changes because we wanted to, we made them because using mdadm raid
>>>> arrays for the root filesystem combined with incremental assembly or
>>>> with imsm raid devices was broken otherwise.
>>>>
>>>>
>>>>
>>> Do understand that my disquiet related to this isn't because you put a
>>> non-device in /dev, it's that you
>>> didn't put a process PID in /var/run. And frankly, once you let
>>> (force) one
>>> group of threads to be somewhere
>>> else, other services will want their PIDs some other place, and anyone
>>> maintaining an application
>>> which presents information on what's running will need to know where
>>> that
>>> information.
>>>
>>> In other words, it's not where you put it, it's where you *didn't*
>>> put it,
>>> that seems to be an
>>> invitation to put stuff just anywhere. Neil argues that they are not
>>> devices, I argue that
>>> they are PIDs. It's not as though it were a huge effort to move it after
>>> pivot root, it's a little code
>>> or script and in space which will be released.
>>>
>>> --
>>> Bill Davidsen <davidsen@tmr.com>
>>> "We can't solve today's problems by using the same thinking we
>>> used in creating them." - Einstein
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
>> Thank you for stating your concern; I think knowing that a very
>> plausible solution is obvious.
>>
>> # at initrd/initramfs creation time
>> ln -s /dev/.run /var/run
>>
>> #initrd/initramfs script
>> mkdir /dev/.run
>>
>> The usual area becomes a symlink to a memory disk .Most systems have
>> ample memory to support a few extra tiny files there. Cleanup on
>> reboot is automatic. Any systems that are memory constrained probably
>> already either have a drive they could swap this data out to, or would
>> rather save the writes from reaching flash media anyway.
>>
>>
> The only possible side effect of that is that applications which put
> information in /var/run/subdir would have to create the subdir at run
> time rather than at the time of installing the application. And looking
> at my /var/run directory many applications do seem to have
> subdirectories in /var/run which were created when the applications were
> installed. I count 31 on this system, a quick check on other systems
> reveals up to 41 and 14-24 of those directories have not been used since
> the system was installed. That is, the applications have never been run.
>
> Does it really make sense to force modification of every application
> which installs a subdirectory in /var/run, and incur the overhead in
> each of those applications of checking for the directory and creating it
> if missing, as opposed to a single line in an init script to copy the
> boot time PID files from /dev to /var/run?
No.
> It seems as if a lot of work
> and overhead is being generated for the applications, just to save a
> tiny bit of work for the people implementing a new boot procedure.
>
> (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir /dev/.run)
I'm not sure I would do this either. While moving the file is possible,
mdmon is actually intended to be run longer than the /var/run filesystem
might be read/write. I think I would leave the mdmon files in /dev
somewhere and link to them from /var/run.
> Not only would this need a change in Fedora packages, but anyone writing
> a package for Linux in general would have to do it "the Fedora way" and
> even though Fedora is popular, I think some applications would choose to
> avoid the overhead and need ugly hacks in rc.local to create the
> directories at boot.
>
> All in all, I think the overhead belongs in the boot process, not all
> the existing applications.
It doesn't have to exist either place. We just need a set, accepted way
to handle the problem. At this point I'm inclined to suggest that we
use /dev/md/.mdadm and /dev/md/.mdmon for the respective files for each
application (such as /dev/md/.mdadm/mdadm.map and /dev/md/.mdmon/*.pid)
and we use static symbolic links in the rpm/deb package to point from
/var/run to those two directories. The boot process doesn't have to be
changed, utilities don't have to be changed, only the rpm/deb package
needs to be updated to include the links, and both mdmon and mdadm need
to be modified to create their respective directories if they don't
exist already and put
their files in those directories. That's it. Well, I'd have to get Dan
Walsh to update the SELinux rules for mdadm too since the real directory
location would change. But still, relatively painless stuff.
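
As a rough illustration of the layout suggested above, here is a minimal
shell sketch. It assumes a scratch directory (ROOT) standing in for the real
root so it can be tried without privileges, and it uses relative link targets
only to stay inside that scratch tree; a real package would presumably link
to the absolute /dev/md paths. The ensure_dir helper, the pid file name
md127.pid and the pid value 4242 are made up for the example and are not
part of mdadm or any package script.

```shell
#!/bin/sh
# Sketch only: ROOT is a scratch tree standing in for "/", so nothing
# outside it is touched. ensure_dir is a hypothetical helper.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/dev/md" "$ROOT/var/run"

# Package install time: static links from /var/run into /dev/md.
# The targets do not need to exist yet; until then the links dangle.
ln -s ../../dev/md/.mdadm "$ROOT/var/run/mdadm"
ln -s ../../dev/md/.mdmon "$ROOT/var/run/mdmon"

# Run time: each tool creates its own directory if it is missing.
ensure_dir() {
    [ -d "$1" ] || mkdir -p "$1"
}
ensure_dir "$ROOT/dev/md/.mdadm"
ensure_dir "$ROOT/dev/md/.mdmon"

# Files written under /dev/md are then visible through /var/run.
: > "$ROOT/dev/md/.mdadm/mdadm.map"
echo 4242 > "$ROOT/dev/md/.mdmon/md127.pid"   # made-up name and pid
ls "$ROOT/var/run/mdadm" "$ROOT/var/run/mdmon"
```

The point of the sketch is that nothing in the boot scripts has to copy or
move files later: anything that looks in /var/run follows the link.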
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-01-19 17:51 ` Doug Ledford
2010-02-01 20:32 ` Bill Davidsen
@ 2010-02-04 6:40 ` Neil Brown
2010-02-04 18:45 ` Doug Ledford
1 sibling, 1 reply; 66+ messages in thread
From: Neil Brown @ 2010-02-04 6:40 UTC (permalink / raw)
To: Doug Ledford
Cc: linux-raid, initramfs, Dan Williams, martin f krafft,
Michal Marek
[cc:ing initramfs because another part of this thread was already
cc:ed there, but this is the one I wanted to reply to.
cc:ed to various md/mdadm maintainers too]
On Tue, 19 Jan 2010 12:51:52 -0500
Doug Ledford <dledford@redhat.com> wrote:
> On 01/18/2010 05:09 PM, Neil Brown wrote:
> > On Mon, 11 Jan 2010 15:38:11 -0500
> > Doug Ledford <dledford@redhat.com> wrote:
> >
> >> Signed-off-by: Doug Ledford <dledford@redhat.com>
> >
> > I really really don't like this.
> > I wasn't very keen on allowing the map file to be found in /dev,
> > but this it just too ugly.
>
> I've had to rewrite my response to this a few times :-/
>
> So, let's be clear: you are objecting to these non device special files
> being located under /dev. Not necessarily *where* they are under /dev,
> just that they are under /dev at all. That's what I get from your
> statement above.
>
> First with devfs, then later with udev, the old unix tradition of only
> device special files under /dev is truly dead. And it should be. The
> files we are creating are needed prior to / filesystem bring up, and
> they are needed simply in order to fully populate /dev. In fact, an
> argument can be made that a new tradition, that files related to the
> creation and maintenance of device special files belong under /dev with
> the files they relate to, has been created. And this new tradition
> makes sense and is elegant on the basis that it requires only one
> read/write filesystem mount point during device special file population.
> It also makes sense that this new tradition would supersede the old
> tradition on the basis that the old tradition was created prior to the
> advent of hot plug and the need to have any read/write data just to
> populate your device special files. The old tradition didn't have the
> flexibility to deal with modern hot plug architectures, the new
> tradition fixes that, and does so as elegantly as possible.
>
> That being the case, the big player in the game, udev, is following the
> new tradition by creating an entire tree of non device special files
> under /dev/.udev and using that to store the information it needs. And
> here mdadm/mdmon are, the small players in the device bring up game that
> only have minor bit parts compared to udev, holding up progress and
> playing the recalcitrant old fart. Sorry Neil, but the war has already
> been decided and this is a dead battle. Files related to device special
> file bring up belong under /dev along with the files we are creating.
> Your claim that these changes are ugly are misplaced and based upon
> adherence to a dead tradition that has been replaced by a more sensible
> tradition. Maybe you don't like where they are under /dev, but the fact
> that they are under /dev is definitely the right thing to do and is not
> in the least bit ugly.
>
> > I understand there is a problem here, but I don't like this approach to a
> > solution. I'll give it more though when I get home from LCA2010 and see
> > what I can come up with.
>
> Feel free to come up with something different. But, if your solution
> involves maintaining an additional read/write mount area in deference to
> a long dead unix tradition, I'm just going to shake my head and patch
> your solution away to something sane.
>
So I've had a good long think about this.
Your arguments about using /dev do have some merit. However they sound more
like post-hoc justification than genuine motivation.
If the train of thought went:
I need some files that are related to device management. Where shall I
put them? I know, I'll put them in /dev.
then it would be more convincing. But the logic actually went:
I need some files to persist from early boot through to when the system
has all basic filesystems mounted. Where shall I put them? I know, I'll
put them in /dev.
That sounds a lot less convincing.
Given that chain of thought I would be more likely to come to the conclusion
"I know, I'll put them in /lib/init/rw". Or at least I would on Debian -
I don't know that any non-Debian-derived distros support that directory.
The fact that Debian does have this directory and stores in there things that
are not related to devices suggests that there is a real need for "persists
from early boot" that does not fit in /dev. So if I put mdadm bits in /dev
just because I can then I am making the /proc mistake of valuing pragmatics
over elegance, and that is not a good long-term direction.
Your argument that "udev does it so it must be OK" is also fairly weak. I
would rather be a "recalcitrant old fart" than "wrong" any day.
The fact that udev uses "/dev/.udev" is already an admission of failure.
Prefixing a file name with '.' effectively says "I don't know where to put
this, and I know it doesn't really belong here, but I cannot think of
anything better so I'm going to do it anyway - shhh don't tell anyone".
If only the founding fathers had given us a $HOME/rc directory for all the
rc files we would be a lot better off.
But there is still a problem that needs to be solved.
mdmon needs to be running before any of a certain class of md arrays (those with
user-space managed metadata) can be written to. Because some filesystems
choose to write to the device even when the filesystem is mounted read-only
(which should be a hanging offence, but isn't yet) we potentially need mdmon
running before the root filesystem is mounted.
Because we want to unmount and completely discard the filesystem that holds
the mdmon binary that was run early, we need to kill it and start a new one
running from the final namespace. This is also needed because, to a small extent, the
filesystem is used to communicate between mdadm and a running mdmon, and
having them have the same root is less confusing.
There are three ways we can achieve this.
1/ If we can assume that between the time when the original "mount" completes
and when the "mount -o remount,rw" happens the filesystem doesn't write to
the device, then we can simply kill mdmon after the root is mounted, and
restart it before remounting. However I don't trust filesystem
implementers so I won't recommend that.
2/ Before the pivot root we can kill the old mdmon and start the new one
chrooted into the final root.
3/ After the pivot root we can kill the old mdmon and start the new one.
Number 2 is the approach that we (Well mostly Dan) originally intended and
that the code implements ... or tries to. It got broken and I never
noticed. I think I have fixed it now for 3.1.2.
However it requires that /var/run exists and is writeable during early boot.
I'm not sure that I am really comfortable requiring that. If the contents
of /var/run are not going to persist then it would be better if they didn't
exist. mdadm currently relies on that non-existence for proper handling of the
"mapfile".
Number 3 would seem simplest except for the simple task of
finding out which process to kill, and how to wait for it to clean up and
die.
This is where the suggestion of putting some key files in /dev comes from.
If the mdmon pid file and socket were in /dev then a new mdmon would be able
to find them, signal the pid, and read on the socket until it got EOF
(because the other end was closed). If they aren't in /dev (or /lib/init/rw)
then it isn't possible to find them.
I could hunt through /proc to find the process called "mdmon" with the right
args, kill that, and wait until it has gone. But that is rather ugly and I
want to avoid "ugly".
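
The pid-file part of that handoff could look roughly like the shell sketch
below. It is only an illustration under stated assumptions: takeover_old is
a hypothetical helper, the pid file name md127.pid is made up, a background
"sleep" stands in for the old mdmon, and polling with "kill -0" replaces the
read-until-EOF on the socket, which plain sh cannot do easily.

```shell
#!/bin/sh
# Sketch only: stop the daemon named in a pid file and wait for it to exit.
# takeover_old is a hypothetical helper, not mdmon code. Polling with
# "kill -0" stands in for reading the socket until EOF.
takeover_old() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1                   # nothing to take over from
    oldpid=$(cat "$pidfile")
    kill -TERM "$oldpid" 2>/dev/null || return 0    # already gone (or not ours)
    tries=0
    while kill -0 "$oldpid" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -gt 10 ] && return 2             # give up after about 10s
        sleep 1
    done
    return 0
}

# Demonstration with a stand-in process instead of a real mdmon.
RUNDIR=$(mktemp -d)
sleep 300 &
echo $! > "$RUNDIR/md127.pid"
takeover_old "$RUNDIR/md127.pid"
status=$?
echo "takeover status: $status"
```

None of this works unless the new instance can find the pid file in the
first place, which is the whole argument for a location that survives the
pivot.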
A really key consideration here is to make it all really easy for the distro
package maintainers because debugging issues with early boot is really hard,
and the maintainers have all got more interesting things to do with their
time.
So while I could suggest that the above ugliness be put in a script if you
don't want to make /var/run persist from early boot (my preferred solution),
I'm not going to do that.
I think that what I will do is:
- the "official" homes for the pid and unix-domain-sock are in /var/run
(preferably /var/run/mdadm/ but Doug said something about needing
/var/run/mdmon/ to placate the monster that is SELinux - I need more
information about that).
When mdadm wants to communicate with mdmon it always looks there.
- There is an alternative home which is /lib/init/rw/mdadm/ by default,
but a 'make' option can easily change that if a distro wants to.
If I cannot access or mkdir /var/run/mdadm, I will mkdir /lib/init/rw/mdadm
to have somewhere to create files.
- mdadm when run in the "take over from previous instance" mode will
look in /lib/init/rw/mdadm for the relevant .pid and .sock files if they
aren't in /var/run/mdadm
- mdmon.8 will list the various options with details.
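
The lookup order in that list can be sketched in shell as follows. This is
a sketch under assumptions, not the code pushed to git: the two homes are
placed under a scratch directory so it runs unprivileged, pick_run_dir and
find_runfile are hypothetical helpers, md127.pid and the pid 4242 are made
up, and "early boot" is simulated by a plain file occupying the /var/run
path so that the mkdir fails.

```shell
#!/bin/sh
# Sketch only: PRIMARY and ALTERNATE mirror the two homes described above,
# but live under a scratch directory. pick_run_dir and find_runfile are
# hypothetical helpers, not mdadm code.
SCRATCH=$(mktemp -d)
PRIMARY="$SCRATCH/var/run/mdadm"
ALTERNATE="$SCRATCH/lib/init/rw/mdadm"

# Choose where to create new files: the official home if it can be
# created and written, otherwise the alternative home.
pick_run_dir() {
    if mkdir -p "$PRIMARY" 2>/dev/null && [ -w "$PRIMARY" ]; then
        echo "$PRIMARY"
    else
        mkdir -p "$ALTERNATE" && echo "$ALTERNATE"
    fi
}

# Locate an existing file: official home first, then the alternative.
find_runfile() {
    for dir in "$PRIMARY" "$ALTERNATE"; do
        if [ -e "$dir/$1" ]; then
            echo "$dir/$1"
            return 0
        fi
    done
    return 1
}

# Early boot: /var/run cannot be created (simulated by a plain file
# occupying the path), so the pid file lands in the alternative home.
mkdir -p "$SCRATCH/var"
: > "$SCRATCH/var/run"
early_dir=$(pick_run_dir)
echo 4242 > "$early_dir/md127.pid"

# Later: /var/run is usable, but the old pid file is still found
# through the alternative home.
rm "$SCRATCH/var/run"
late_dir=$(pick_run_dir)
found=$(find_runfile md127.pid)
echo "early=$early_dir late=$late_dir found=$found"
```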
So I get to maintain a Unix tradition which might still have some life in it
after all, and Doug gets a very easy way to patch in his own version of
sanity.
(comments always welcome - I have made the changes described above and pushed
them to git://neil.brown.name/mdadm, but it isn't too late to change it
completely if that turns out to be best)
NeilBrown
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-02 18:19 ` Doug Ledford
@ 2010-02-04 13:50 ` Bernd Schubert
2010-02-04 15:03 ` Bernd Schubert
0 siblings, 1 reply; 66+ messages in thread
From: Bernd Schubert @ 2010-02-04 13:50 UTC (permalink / raw)
To: Doug Ledford; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid
On Tuesday 02 February 2010, Doug Ledford wrote:
> On 02/02/2010 10:42 AM, Bill Davidsen wrote:
> > Michael Evans wrote:
> > It seems as if a lot of work
> > and overhead is being generated for the applications, just to save a
> > tiny bit of work for the people implementing a new boot procedure.
> >
> > (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir
> > /dev/.run)
What about using "mount --move" to move /var/run of the initramfs to the final
/var/run before the chroot command?
Cheers,
Bernd
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-04 13:50 ` Bernd Schubert
@ 2010-02-04 15:03 ` Bernd Schubert
2010-02-04 15:48 ` Doug Ledford
0 siblings, 1 reply; 66+ messages in thread
From: Bernd Schubert @ 2010-02-04 15:03 UTC (permalink / raw)
To: Doug Ledford; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid
On Thursday 04 February 2010, Bernd Schubert wrote:
> On Tuesday 02 February 2010, Doug Ledford wrote:
> > On 02/02/2010 10:42 AM, Bill Davidsen wrote:
> > > Michael Evans wrote:
> > > It seems as if a lot of work
> > > and overhead is being generated for the applications, just to save a
> > > tiny bit of work for the people implementing a new boot procedure.
> > >
> > > (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir
> > > /dev/.run)
>
> What about to use "mount --move" from /var/run of the initrams to final
> /var/run before the chroot command?
Oops, I meant pivot_root.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-04 15:03 ` Bernd Schubert
@ 2010-02-04 15:48 ` Doug Ledford
2010-02-04 16:40 ` Bernd Schubert
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-04 15:48 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid
[-- Attachment #1: Type: text/plain, Size: 930 bytes --]
On 02/04/2010 10:03 AM, Bernd Schubert wrote:
> On Thursday 04 February 2010, Bernd Schubert wrote:
>> On Tuesday 02 February 2010, Doug Ledford wrote:
>>> On 02/02/2010 10:42 AM, Bill Davidsen wrote:
>>>> Michael Evans wrote:
>>>> It seems as if a lot of work
>>>> and overhead is being generated for the applications, just to save a
>>>> tiny bit of work for the people implementing a new boot procedure.
>>>>
>>>> (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir
>>>> /dev/.run)
>>
>> What about to use "mount --move" from /var/run of the initrams to final
>> /var/run before the chroot command?
>
> Oops, I meant pivot_root.
Static files in /var/run would be lost that way.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-04 15:48 ` Doug Ledford
@ 2010-02-04 16:40 ` Bernd Schubert
2010-02-04 17:35 ` Doug Ledford
0 siblings, 1 reply; 66+ messages in thread
From: Bernd Schubert @ 2010-02-04 16:40 UTC (permalink / raw)
To: Doug Ledford; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid
On Thursday 04 February 2010, Doug Ledford wrote:
> On 02/04/2010 10:03 AM, Bernd Schubert wrote:
> > On Thursday 04 February 2010, Bernd Schubert wrote:
> >> On Tuesday 02 February 2010, Doug Ledford wrote:
> >>> On 02/02/2010 10:42 AM, Bill Davidsen wrote:
> >>>> Michael Evans wrote:
> >>>> It seems as if a lot of work
> >>>> and overhead is being generated for the applications, just to save a
> >>>> tiny bit of work for the people implementing a new boot procedure.
> >>>>
> >>>> (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir
> >>>> /dev/.run)
> >>
> >> What about to use "mount --move" from /var/run of the initrams to final
> >> /var/run before the chroot command?
> >
> > Oops, I meant pivot_root.
>
> Static files in /var/run would be lost that way.
>
That should be easy using a simple subdir /var/run/mdadm and to mount --move
this.
Cheers,
Bernd
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-04 16:40 ` Bernd Schubert
@ 2010-02-04 17:35 ` Doug Ledford
0 siblings, 0 replies; 66+ messages in thread
From: Doug Ledford @ 2010-02-04 17:35 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Bill Davidsen, Michael Evans, Neil Brown, linux-raid
On 02/04/2010 11:40 AM, Bernd Schubert wrote:
> On Thursday 04 February 2010, Doug Ledford wrote:
>> On 02/04/2010 10:03 AM, Bernd Schubert wrote:
>>> On Thursday 04 February 2010, Bernd Schubert wrote:
>>>> On Tuesday 02 February 2010, Doug Ledford wrote:
>>>>> On 02/02/2010 10:42 AM, Bill Davidsen wrote:
>>>>>> Michael Evans wrote:
>>>>>> It seems as if a lot of work
>>>>>> and overhead is being generated for the applications, just to save a
>>>>>> tiny bit of work for the people implementing a new boot procedure.
>>>>>>
>>>>>> (cd /dev .run && find . -depth | cpio -pdm /var/run; cd -; rmdir
>>>>>> /dev/.run)
>>>>
>>>> What about to use "mount --move" from /var/run of the initrams to final
>>>> /var/run before the chroot command?
>>>
>>> Oops, I meant pivot_root.
>>
>> Static files in /var/run would be lost that way.
>>
>
> That should be easy using a simple subdir /var/run/mdadm and to mount --move
> this.
Except that /var/run/mdadm doesn't even need to be moved. Only the mdmon
files need this. See my upcoming email for more details.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-04 6:40 ` Neil Brown
@ 2010-02-04 18:45 ` Doug Ledford
[not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-02-08 3:45 ` Neil Brown
0 siblings, 2 replies; 66+ messages in thread
From: Doug Ledford @ 2010-02-04 18:45 UTC (permalink / raw)
To: Neil Brown
Cc: linux-raid, initramfs, Dan Williams, martin f krafft,
Michal Marek, Hans de Goede, Bill Nottingham
On 02/04/2010 01:40 AM, Neil Brown wrote:
>
> [cc:ing initramfs because anther part of this thread was already
> cc:ed there, but this is the one I wanted to reply to.
> cc:ed to various md/mdadm maintainers too]
>
> On Tue, 19 Jan 2010 12:51:52 -0500
> Doug Ledford <dledford@redhat.com> wrote:
>
>> On 01/18/2010 05:09 PM, Neil Brown wrote:
>>> On Mon, 11 Jan 2010 15:38:11 -0500
>>> Doug Ledford <dledford@redhat.com> wrote:
>>>
>>>> Signed-off-by: Doug Ledford <dledford@redhat.com>
>>>
>>> I really really don't like this.
>>> I wasn't very keen on allowing the map file to be found in /dev,
>>> but this it just too ugly.
>>
>> I've had to rewrite my response to this a few times :-/
>>
>> So, let's be clear: you are objecting to these non device special files
>> being located under /dev. Not necessarily *where* they are under /dev,
>> just that they are under /dev at all. That's what I get from your
>> statement above.
>>
>> First with devfs, then later with udev, the old unix tradition of only
>> device special files under /dev is truly dead. And it should be. The
>> files we are creating are needed prior to / filesystem bring up, and
>> they are needed simply in order to fully populate /dev. In fact, an
>> argument can be made that a new tradition, that files related to the
>> creation and maintenance of device special files belong under /dev with
>> the files they relate to, has been created. And this new tradition
>> makes sense and is elegant on the basis that it requires only one
>> read/write filesystem mount point during device special file population.
>> It also makes sense that this new tradition would supersede the old
>> tradition on the basis that the old tradition was created prior to the
>> advent of hot plug and the need to have any read/write data just to
>> populate your device special files. The old tradition didn't have the
>> flexibility to deal with modern hot plug architectures, the new
>> tradition fixes that, and does so as elegantly as possible.
>>
>> That being the case, the big player in the game, udev, is following the
>> new tradition by creating an entire tree of non device special files
>> under /dev/.udev and using that to store the information it needs. And
>> here mdadm/mdmon are, the small players in the device bring up game that
>> only have minor bit parts compared to udev, holding up progress and
>> playing the recalcitrant old fart. Sorry Neil, but the war has already
>> been decided and this is a dead battle. Files related to device special
>> file bring up belong under /dev along with the files we are creating.
>> Your claim that these changes are ugly are misplaced and based upon
>> adherence to a dead tradition that has been replaced by a more sensible
>> tradition. Maybe you don't like where they are under /dev, but the fact
>> that they are under /dev is definitely the right thing to do and is not
>> in the least bit ugly.
>>
>>> I understand there is a problem here, but I don't like this approach to a
>>> solution. I'll give it more though when I get home from LCA2010 and see
>>> what I can come up with.
>>
>> Feel free to come up with something different. But, if your solution
>> involves maintaining an additional read/write mount area in deference to
>> a long dead unix tradition, I'm just going to shake my head and patch
>> your solution away to something sane.
>>
>
> So I've had a good long think about this.
>
> Your arguments about using /dev do have some merit. However they sound more
> like post-hoc justification then genuine motivation.
> If the train of thought went:
> I need some files that are related to device management. Where shall I
> put them? I know, I'll put them in /dev.
> then it would be more convincing. But the logic actually went:
> I need some files to persist from early boot through to when the system
> has all basic filesystems mounted. Where shall I put them? I know, I'll
> put them in /dev.
> That sounds a lot less convincing.
To be fair, if post-hoc versus initial made any difference whatsoever,
then so would the fact that I wouldn't have chosen to have these files
exist at all. I would have made incremental assembly work without a map
file and I would have made imsm superblock handling be in the kernel.
So, I'm dealing with the consequences of decisions I didn't make and
wouldn't have made. I don't think it's then fair to put some sort of
'premeditated' versus 'dealing with the situation' bias on my response.
> Given that chain of thought I would be more likely to come to the conclusion
> "I know, I'll put them in /lib/init/rw". Or at least I would on Debian -
> I don't know that any non-Debian-derived distros support that directory.
I have no idea. Not one of the files in question belongs there any more
than in /dev or anywhere else for that matter though, so I wouldn't come
to that conclusion in your shoes. But I find it somewhat disheartening
to hear you disparage my choice to put the files in /dev because "I just
wanted someplace to throw them" and then you would suggest /lib/init/rw
when in fact, according to this debian bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=%23403863#35
the whole /lib/init/rw thing is *exactly* that same thing. It's a "we
needed someplace to throw some files and didn't want to go through
committee so we found someplace we owned and could do what we want"
thing. In addition, as the person that reported this bug pointed out,
things like pid files and map files are just as big of a FHS violation
in /lib as they are in /dev. Neither place is the right place. Hell,
they even had to make modifications to chkrootkit to accommodate this
new directory and the files in there. Your choice of one over the other
is purely personal aesthetics, and there are real and legitimate reasons
to prefer *not* to have this directory. Boot complexity being the main
one. The fact that at least the mdadm map file is an enumeration of
device special files and mdadm devices and as such really belongs much
more in /dev than in /lib is another.
> The fact that Debian does have this directory and stores in there things that
> are not related to devices suggests that there is a real need for "persists
> from early boot" that does not fit in /dev. So if I put mdadm bits in /dev
> just because I can then I am making the /proc mistake of valuing pragmatics
> over elegance, and that is not a good long-term direction.
>
> Your argument that "udev does it so it must be OK" is also fairly weak. I
> would rather be a "recalcitrant old fart" than "wrong" any day.
> The fact that udev uses "/dev/.udev" is already an admission of failure.
I disagree.
> Prefixing a file name with '.' effectively says "I don't know where to put
> this, and I know it doesn't really belong here, but I cannot think of
> anything better so I'm going to do it anyway - shhh don't tell anyone".
I disagree with that too, all except the shhh don't tell anyone part.
Yes dot files by default keep something from being seen. But in the
context of /dev/.udev the idea makes sense. The udev files are directly
related to device bring up, but a big part of the reason udev is in use
today was to unclutter /dev and remove device special files that we used
to create *in case* the device existed and replace them with the device
special files that are there for the devices that actually do exist.
So, since udev is there to declutter /dev, it would not then make sense
to turn around and add back in new clutter, so .udev instead of udev.
> If only the founding fathers had given us a $HOME/rc directory for all the
> rc files we would be a lot better off.
>
> But there is still a problem that needs to be solved.
>
> mdmon needs to be running before any a certain class of md arrays (those with
> user-space managed metadata) can be written to. Because some filesystems
> choose to write to the device even when the filesystem is mounted read-only
> (which should be a hanging offence, but isn't yet)
Just to sidestep a second on the filesystem issue, there are only two
choices when it comes to filesystems: allow them to be mounted read only
(truly read only) and inconsistent or pseudo read only (where the
filesystem itself is the only thing that writes to the filesystem) and
be able to guarantee consistency. The only way for a journaled
filesystem to provide the guarantee it does is that it writes to the
device during mount even if it's a read only mount. This is because they
guarantee to always be able to *restore* a filesystem to a sane state,
not that it will always *be* in a sane state. If they didn't do that
restore on mount, then possibly the thing that is inconsistent is
/sbin/init and the machine doesn't boot. In other words, the point of a
journaled filesystem would be wasted if they didn't do what they do.
The only other option is to do the replay in page cache and allow the
page cache and physical device to differ until the filesystem goes read
write, but I'm not sure that level of complexity is warranted or
advisable, especially since it could easily confuse anything that tries
to read from the disks directly.
> we potentially need mdmon
> running before the root filesystem is mounted.
>
> Because we want to unmount and completely discard the filesystem that holds
> the mdmon binary that was run early, we need to kill it and start a new one
> running from final namespace. This is also needed as to a small extent the
> filesystem is used to communicate between mdadm and a running mdmon, and
> having them have the same root is less confusing.
>
> There are three ways we can achieve this.
>
> 1/ If we can assume that between the time when the original "mount" completes
> and when the "mount -o remount,rw" happens the filesystem doesn't write to
> the device, then we can simply kill mdmon after the root is mounted, and
> restart it before remounting. However I don't trust filesystem
> implementers so I won't recommend that.
>
> 2/ Before the pivot root we can kill the old mdmon and start the new one
> chrooted into the final root.
> 3/ After the pivot root we can kill the old mdmon and start the new one.
>
> Number 2 is the approach that we (Well mostly Dan) originally intended and
> that the code implements ... or tries to. It got broken and I never
> noticed. I think I have fixed it now for 3.1.2.
Note, as I recall, Hans switched things to be #3 for various reasons.
That he switched it to #3 doesn't affect mdmon really, as it still is
just killing and restarting, but doing it after the pivot root solved a
couple issues. I don't recall what they were, you would have to talk to
Hans about that.
And you left part of the issue out. Yes, all the before bring up stuff
is true, but also true is that we want mdmon to hang around longer than
anyone else. By the time mdmon is ready to be shutdown, /var/run is
once again read only. So clean up can't be done. On the other hand, if
the files for mdmon are on a temporary filesystem that is rebuilt at
every boot...you get the point.
> However it requires that /var/run exists and is writeable during early boot.
> I'm not sure that I am really comfortable requiring that. If the contents
> of /var/run are not going to persist then it would be better if they didn't
> exist. mdadm current relies on that non-existence for proper handing of the
> "mapfile".
Can you explain this? I see nothing in the sources that tells me what
you mean by the non-existence of /var/run causing the mapfile to be
handled properly (and I'm not sure that's a valid requirement to put on
the system anyway because now you are dictating that if another early
boot application needs read only access to /var/run and we create
/var/run for that purpose, then it would in some way break mdadm's
operation).
> Number 3 would seem simplest except for the simple task of
> finding out which process to kill, and how to wait for it to clean up and
> die.
>
> This is where the suggestion of putting some key files in /dev comes from.
> If the mdmon pid file and socket were in /dev then a new mdmon would be able
> to find them, signal the pid, and read on the socket until it got EOF
> (because the other end was closed). If they aren't in /dev (or /lib/init/rw)
> then it isn't possible to find them.
>
> I could hunt through /proc to find the process called "mdmon" with the right
> args, kill that, and wait until it has gone. But that is rather ugly and I
> want to avoid "ugly".
>
> A really key consideration here is to make it all really easy for the distro
> package maintainers because debugging issues with early boot is really hard,
> and the maintainers have all got more interesting things to do with their
> time.
> So while I could suggest that the above ugliness be put in a script if you
> don't want to make /var/run persist from early boot (my preferred solution),
> I'm not going to do that.
>
> I think that what I will do is:
>
> - the "official" homes for the pid and unix-domain-sock are in /var/run
> (preferably /var/run/mdadm/ but Doug said something about needing
> /var/run/mdmon/ to placate the monster that is SELinux - I need more
> information about that).
mdmon does not need access to sendmail, so it should not be in the same
context as the mdadm files. This allows a more restrictive set of perms
on mdmon than on mdadm itself. If we put the mdmon files in
/var/run/mdadm, then they will have to have the same context as mdadm,
and because mdadm does so many things, it's already got an overly
liberal set of permissions compared to what mdmon realistically needs.
> When mdadm wants to communicate with mdmon it always looks there.
>
> - There is an alternative home which is /lib/init/rw/mdadm/ by default,
What happens to the files later in the boot process? Are they left
here? Or are they migrated to an appropriate location later? If they
are just left here, then this makes even *less* sense than putting the
files under /dev as you've created a diversion zone in the filesystem.
Someplace to throw things that *should* be elsewhere and then leave them
there. Hopefully nothing gets left here. And if nothing gets left
here, then whether the temporary spot is
/dev/gonna_be_deleted_after_stuff_is_moved_out or /lib/init/rw makes no
real difference except in the complexity of the initramfs, and more
complex is more prone to break so I go with the single rw mount point/area.
> but a 'make' option can easily change that if a distro wants to.
Thank you, I'm sure I'll end up using that.
> If I cannot access or mkdir /var/run/mdadm, I will mkdir /lib/init/rw/mdadm
> to have some where to create files
And so we are back to preserving two different read/write areas in the
filesystem for very early boot, at least in the default, which is why
I'm sure I'll use the make option.
> - mdadm when run in the "take over from previous instance" mode will
> look in /lib/init/rw/mdadm for the relevant .pid and .sock files if they
> aren't in /var/run/mdadm
Now I'm a bit concerned. What happens when the new program starts up?
If /var/run is now read/write, will the new mdmon then write the files
in /var/run/mdadm (or mdmon)? I would expect it to do this in preference
to /lib/init/rw/mdadm, because if it doesn't then the issue that Bill
Davidsen brought up, that the problem is not files being under /dev but
certain files *not* being under /var/run, creeps right back up. So, are
you going to symlink /var/run/mdadm (or
mdmon) to /lib/init/rw/mdadm? If so, then you are now doing *exactly*
as I proposed except in /lib/init/rw/mdadm instead of something like
/dev/md/.mdadm. If you don't, then I foresee problems in your future in
that when mdmon is restarted in the root context, it will write files in
the real /var/run/mdadm directory, but before mdmon ever shuts down, the
/ filesystem will be readonly, and so those files will never get
cleaned, and on the next boot you will have stale files there that you
will have to workaround when it comes mdmon restart time as you'll need
to ignore or clean out /var/run/mdadm and then use the ones in
/lib/init/rw/mdadm instead. I'm sorry Neil, but this is sounding uglier
and uglier by the minute, not elegant.
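To make the stale-file concern concrete, here is a minimal sketch of a "first directory that has the pid file wins" lookup. It is an illustration only, not code from mdadm: the function name find_mdmon_dir and the array name md127 are made up, and the directory order is simply the one described above.

```shell
# Hypothetical helper (not part of mdadm): print the first directory in
# the argument list that contains <array>.pid, or return 1 if none does.
find_mdmon_dir() {
    array=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$array.pid" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Assumed usage: prefer the "official" home, fall back to the early-boot one.
# find_mdmon_dir md127 /var/run/mdadm /lib/init/rw/mdadm
```

With a lookup ordered like this, a pid file left behind in the first directory from a previous boot would be found before the live one in the second directory, which is the workaround problem described above.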
> - mdmon.8 will list the various options with details.
>
>
> So I get to maintain a Unix tradition which might still have some life it
> after all, and Doug gets a very easy way to patch in his own version of
> sanity.
>
> (comments always welcome - I have made the changes described above and pushed
> them to git://neil.brown.name/mdadm, but it isn't to late to change it
> completely if that turns out to be best)
I made my proposal in another email. But, I didn't necessarily argue
for it. Since you've argued for yours, and since this is going to a
mailing list that I don't think significant parts of the original thread
went to, I'll present mine with the arguments.
Let's look at this on a file by file basis. First, for mdadm:
mdadm.map - incremental map file, needs to be read/write before / is
read/write if using incremental assembly on root array. Used to be
stored in /var/run/mdadm/mdadm.map. This isn't read/write early enough,
so incremental assembly would break. Neil noted above that current
mdadm does something different if /var/run/mdadm doesn't exist and
isn't writable, but I looked in the git repo and
could not see where the specific problem a readonly /var/run caused
would be fixed, so I'll assume for now that a readonly /var/run is still
just as broken as before. We moved the file to /dev/md/.mdadm.map, but
Neil didn't like that and made it /dev/.mdadm.map instead. I would
actually propose /dev/md/incremental.map as it A) isn't hidden and I
believe it shouldn't be hidden because of E later on, B) clearly
indicates the purpose of the file, C) would be in an md specific/owned
area of /dev, D) is unlikely to ever conflict with someone's desired md
device name, E) is a file specific to the enumeration and bring up of md
device special files and as such can be argued to belong in /dev anyway,
and F) solves the problem of needing a read/write /var/run for
incremental assembly to work.
mdadm.pid - this is only used by mdadm in monitor mode, which is not
started until after the filesystem is read/write. This can safely
reside in /var/run/mdadm as it does today, no changes needed.
Now the files for mdmon:
devname.pid, devname.sock - we use one mdmon per imsm array and each
mdmon has its own pid and sock file named after the array it is
watching. The problem being that if our root filesystem is on one of
these imsm arrays, we need mdmon up and running so it can mark the array
dirty because we will likely cause writes via possible journal replays
as we mount root. Likewise, even though there is code in mdmon to clean
up the pid/sock files, if we are talking about the mdmon for the root
filesystem, that cleanup can't happen as we need mdmon around to mark
the array clean after the final writes from going readonly are complete
(and in fact, during the final halt script on Fedora, we specifically
exclude *all* mdmon instances from the last killall that we do, then we
call mdadm to --wait-clean so we know that all the mdmons have marked
the devices clean after the readonly remount, then we reboot, so we
don't even kill the mdmon programs, ever). That means they will never
clean up their sock and pid files. As it turns out, being on a tmpfs,
permanently, is best for the mdmon files. We need them to be written
before the system comes up, and we need them to stick around while the
system goes down (we actually read the pid files to find what pids to
omit from the global killall we do), but we also want them to go away
when we reboot. So, location wise, /dev isn't necessarily the right
place for them. However, now that we use udev for dev, semantic wise
it's perfect. And we do have the one argument that they are at least
related to the bring up and take down of device special files. So, for
these files, I would actually argue for either /dev/.udev/mdmon with a
symlink from /var/run/mdmon to this location, or for /dev/md/.mdmon,
again with a symlink from /var/run/mdmon.
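As a rough sketch of the "read the pid files to find what pids to omit" step, something along these lines would do. This is not the actual Fedora halt script; the function name collect_mdmon_pids is hypothetical, and the directory passed in is whichever location ends up holding the devname.pid files.

```shell
# Hypothetical helper (not taken from any distro's scripts): print the pid
# stored in each *.pid file of the given directory, one per line.
collect_mdmon_pids() {
    dir=$1
    for f in "$dir"/*.pid; do
        [ -f "$f" ] || continue       # glob matched nothing, or not a regular file
        pid=$(cat "$f")
        case $pid in
            ''|*[!0-9]*) continue ;;  # skip empty or non-numeric contents
        esac
        printf '%s\n' "$pid"
    done
}

# Assumed usage: build the list of pids the final killall should leave alone.
# omit_pids=$(collect_mdmon_pids /dev/md/.mdmon)
```

Because the directory sits on a tmpfs, the list is rebuilt from scratch at every boot and can never contain pids from a previous run.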
So that's my suggestion for how to handle this stuff.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2010-02-04 23:04 ` Dan Williams
[not found] ` <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-06 17:51 ` Doug Ledford
2010-02-07 22:13 ` Hans de Goede
1 sibling, 2 replies; 66+ messages in thread
From: Dan Williams @ 2010-02-04 23:04 UTC (permalink / raw)
To: Doug Ledford
Cc: Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA,
initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek,
Hans de Goede, Bill Nottingham
On Thu, Feb 4, 2010 at 11:45 AM, Doug Ledford <dledford@redhat.com> wrote:
> To be fair, if post-hoc versus initial made any difference what so ever,
> then so would the fact that I wouldn't have chosen to have these files
> exist at all. I would have made incremental assembly work without a map
> file and I would have made imsm superblock handling be in the kernel.
> So, I'm dealing with the consequences of decisions I didn't make and
> wouldn't have made. I don't think it's then fair to put some sort of
> 'premeditated' versus 'dealing with the situation' bias on my response.
On the argument about where to place the mdmon files I am now torn
between the "Neil" and "Doug" positions, but on the decision of where
to place imsm superblock handling I stand behind the design decision
to put it in userspace.
1/ If you take a look at native md superblock support you see that the
support code is duplicated between kernel-space and user space, having
it all handled in userspace means only one code base to maintain
(elegant aspect #1).
2/ The kernel can simply worry about the *mechanism* of providing raid
while all the assembly *policy* and support for any number of
superblock formats is relegated to where policy belongs (elegant
aspect #2).
2a/ This simply follows in the path of the design decision to not
support in-kernel auto-assembly of version-1 superblocks which started
the requirement to use an initramfs to boot software raid. (this is a
not so elegant aspect because it mandates an initramfs to boot, but I
don't think a general purpose distro can ever get away from that
requirement).
I will say that needing to touch several software packages (kernel,
initramfs, initscripts, mdadm) to get imsm superblock support has
added some excitement to the process in the short term. Long term I
think the elegant aspects of the decision will prove their worth.
--
Dan
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-05 0:21 ` Bill Davidsen
2010-02-05 12:14 ` Luca Berra
0 siblings, 1 reply; 66+ messages in thread
From: Bill Davidsen @ 2010-02-05 0:21 UTC (permalink / raw)
To: Dan Williams
Cc: Doug Ledford, Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA,
initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek,
Hans de Goede, Bill Nottingham
Dan Williams wrote:
> On Thu, Feb 4, 2010 at 11:45 AM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
>> To be fair, if post-hoc versus initial made any difference what so ever,
>> then so would the fact that I wouldn't have chosen to have these files
>> exist at all. I would have made incremental assembly work without a map
>> file and I would have made imsm superblock handling be in the kernel.
>> So, I'm dealing with the consequences of decisions I didn't make and
>> wouldn't have made. I don't think it's then fair to put some sort of
>> 'premeditated' versus 'dealing with the situation' bias on my response.
>>
>
> On the argument about where to place the mdmon files I am now torn
> between the "Neil" and "Doug" positions, but on the decision of where
> to place imsm superblock handling I stand behind the design decision
> to put it in userspace.
>
> 1/ If you take a look at native md superblock support you see that the
> support code is duplicated between kernel-space and user space, having
> it all handled in userspace means only one code base to maintain
> (elegant aspect #1).
>
That is the decision which I question. Having anything mission critical
in user space means that there suddenly arise ownership, privilege and
scheduling issues which just don't exist for things in the kernel.
Just my opinion, I believe it introduces additional points of failure.
Perhaps like crypto it could be called from user or kernel space but
live in the kernel.
--
Bill Davidsen <davidsen-sQDSfeB7uhw@public.gmane.org>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-05 0:21 ` Bill Davidsen
@ 2010-02-05 12:14 ` Luca Berra
0 siblings, 0 replies; 66+ messages in thread
From: Luca Berra @ 2010-02-05 12:14 UTC (permalink / raw)
To: linux-raid
On Thu, Feb 04, 2010 at 07:21:59PM -0500, Bill Davidsen wrote:
>> 1/ If you take a look at native md superblock support you see that the
>> support code is duplicated between kernel-space and user space, having
>> it all handled in userspace means only one code base to maintain
>> (elegant aspect #1).
>>
>
> That is the decision which I question. Having anything mission critical in
> user space means that there suddenly arise ownership, privilege and
> scheduling issues which just don't exist for things in the kernel.
lol @ /sbin/mount
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-04 23:04 ` Dan Williams
[not found] ` <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-06 17:51 ` Doug Ledford
[not found] ` <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
1 sibling, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-06 17:51 UTC (permalink / raw)
To: Dan Williams
Cc: Neil Brown, linux-raid, initramfs, martin f krafft, Michal Marek,
Hans de Goede, Bill Nottingham
On 02/04/2010 06:04 PM, Dan Williams wrote:
> On Thu, Feb 4, 2010 at 11:45 AM, Doug Ledford <dledford@redhat.com> wrote:
>> To be fair, if post-hoc versus initial made any difference what so ever,
>> then so would the fact that I wouldn't have chosen to have these files
>> exist at all. I would have made incremental assembly work without a map
>> file and I would have made imsm superblock handling be in the kernel.
>> So, I'm dealing with the consequences of decisions I didn't make and
>> wouldn't have made. I don't think it's then fair to put some sort of
>> 'premeditated' versus 'dealing with the situation' bias on my response.
>
> On the argument about where to place the mdmon files I am now torn
> between the "Neil" and "Doug" positions, but on the decision of where
> to place imsm superblock handling I stand behind the design decision
> to put it in userspace.
>
> 1/ If you take a look at native md superblock support you see that the
> support code is duplicated between kernel-space and user space, having
> it all handled in userspace means only one code base to maintain
> (elegant aspect #1).
Elegance is in the eye of the beholder. More on that in a minute.
> 2/ The kernel can simply worry about the *mechanism* of providing raid
> while all the assembly *policy* and support for any number of
> superblock formats is relegated to where policy belongs (elegant
> aspect #2).
I would argue that dirty/clean state manipulation is *not* policy and
*is* mechanism. So, by your definition of what should be in the kernel
combined with my definition of what dirty/clean state manipulation is,
the solution is not only not elegant, it's flat incorrect.
> 2a/ This simply follows in the path of the design decision to not
> support in-kernel auto-assembly of version-1 superblocks which started
> the requirement to use an initramfs to boot software raid. (this is a
> not so elegant aspect because it mandates an initramfs to boot, but I
> don't think a general purpose distro can ever get away from that
> requirement).
I'm fine with needing mdadm to assemble the device. I'm not fine with
needing mdmon once it's assembled.
> I will say that needing to touch several software packages (kernel,
> initramfs, initscripts, mdadm) to get imsm superblock support has
> added some excitement to the process in the short term. Long term I
> think the elegant aspects of the decision will prove their worth.
I will say that needing to touch multiple software packages might not be
a bad thing, but think of *how* they had to be changed. We had to add
special exceptions for mdmon all over the place: kernel scheduler (for
suspend/resume, mdmon can't be frozen like the rest of user space or
else writing our suspend to disk image doesn't work), initramfs,
initscripts after initramfs, initscripts on halt, SELinux. In all these
cases, we had to take something that we want to keep simple and add
special case rules and exceptions for mdmon. That pretty solidly says
that while this arrangement may have been elegant for *you*, it was not
elegant in the overall grand scheme of things.
What would have been smart was to leave array creation, assembly,
verification, and modification to user space, but to put *all* of the
raid mechanics, including superblock clean/dirty state processing and
array shut down capabilities, in the kernel. Had you done that, I would
have called your solution elegant.
It's at this point that I feel obliged to mention that, in terms of this
whole big argument, the incremental map file has at least some amount of
sense belonging in /dev, it's really the mdmon .pid and .sock files that
don't, and those files wouldn't even exist had you designed things as I
mention here. It's the fact that you have two files per device that you
should be placing in a specific place on the filesystem in order for
them to be useful and adhere to standards yet the program they belong to
needs to exist outside the context of any filesystem that I think is
pretty strong evidence of the inelegance of this design.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2010-02-06 21:07 ` Dan Williams
[not found] ` <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-08 4:23 ` Neil Brown
1 sibling, 1 reply; 66+ messages in thread
From: Dan Williams @ 2010-02-06 21:07 UTC (permalink / raw)
To: Doug Ledford
Cc: Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA,
initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek,
Hans de Goede, Bill Nottingham
>> 1/ If you take a look at native md superblock support you see that the
>> support code is duplicated between kernel-space and user space, having
>> it all handled in userspace means only one code base to maintain
>> (elegant aspect #1).
>
> Elegance is in the eye of the beholder. More on that in a minute.
>
True, but let's agree that superblock formats are quirky, arbitrary
and all around inelegant. Only needing to write that code once is at
the very least an aid to one's sanity.
>> 2/ The kernel can simply worry about the *mechanism* of providing raid
>> while all the assembly *policy* and support for any number of
>> superblock formats is relegated to where policy belongs (elegant
>> aspect #2).
>
> I would argue that dirty/clean state manipulation is *not* policy and
> *is* mechanism. So, by your definition of what should be in the kernel
> combined with my definition of what dirty/clean state manipulation is,
> the solution is not only not elegant, it's flat incorrect.
You are conveniently blurring the lines between event generation and
event handling. The kernel handles all the detail of detecting,
notifying and reaping the event. The arbitrary superblock specific
actions that need to happen in response to that event are really not
very interesting to the rest of the mechanism of providing raid. You
could argue that I am conveniently drawing a line, and you would be
right. There are convenient aspects of having this portion of the
solution in userspace which do not compromise the integrity of the
raid mechanism.
We can now also handle spare assignment policy, hot-plug policy, and
corner case disagreements over a superblock's definition of a
"container", all without thrashing the kernel.
>
>> 2a/ This simply follows in the path of the design decision to not
>> support in-kernel auto-assembly of version-1 superblocks which started
>> the requirement to use an initramfs to boot software raid. (this is a
>> not so elegant aspect because it mandates an initramfs to boot, but I
>> don't think a general purpose distro can ever get away from that
>> requirement).
>
> I'm fine with needing mdadm to assemble the device. I'm not fine with
> needing mdmon once it's assembled.
>
>> I will say that needing to touch several software packages (kernel,
>> initramfs, initscripts, mdadm) to get imsm superblock support has
>> added some excitement to the process in the short term. Long term I
>> think the elegant aspects of the decision will prove their worth.
>
> I will say that needing to touch multiple software packages might not be
> a bad thing, but think of *how* they had to be changed. We had to add
> special exceptions for mdmon all over the place: kernel scheduler (for
> suspend/resume, mdmon can't be frozen like the rest of user space or
> else writing our suspend to disk image doesn't work), initramfs,
> initscripts after initramfs, initscripts on halt, SELinux. In all these
> cases, we had to take something that we want to keep simple and add
> special case rules and exceptions for mdmon. That pretty solidly says
> that while this arrangement may have been elegant for *you*, it was not
> elegant in the overall grand scheme of things.
No, nothing elegant about that, but I think you would agree this isn't
something we threw over the wall and walked away from. Making mdmon
more convenient to handle is hopefully an obvious priority. Yes, I
know you would like to see it die, but we are where we are.
>
> What would have been smart was to leave array creation, assembly,
> verification, and modification to user space, but to put *all* of the
> raid mechanics, including superblock clean/dirty state processing and
> array shut down capabilities, in the kernel. Had you done that, I would
> have called your solution elegant.
>
> It's at this point that I feel obliged to mention that, in terms of this
> whole big argument, the incremental map file has at least some amount of
> sense belonging in /dev, it's really the mdmon .pid and .sock files that
> don't, and those files wouldn't even exist had you designed things as I
> mention here. It's the fact that you have two files per device that you
> should be placing in a specific place on the filesystem in order for
> them to be useful and adhere to standards yet the program they belong to
> needs to exist outside the context of any filesystem that I think is
> pretty strong evidence of the inelegance of this design.
>
This comment makes me see Neil's argument in a different light,
(hopefully I am not mischaracterizing it), but essentially we are
waiting for the standards to catch up with this new class of program.
FUSE, CUSE, and mdmon belong to a class of programs that move
traditionally exclusive kernel space functionality to userspace.
Debian's /lib/init/rw looks to be a response to this grey area of the
standards (not that I have any familiarity with the LSB).
--
Dan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-06 21:46 ` martin f krafft
2010-02-06 22:06 ` Michael Evans
2010-02-08 15:32 ` Doug Ledford
1 sibling, 1 reply; 66+ messages in thread
From: martin f krafft @ 2010-02-06 21:46 UTC (permalink / raw)
To: Dan Williams
Cc: Doug Ledford, Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA,
initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede,
Bill Nottingham
[-- Attachment #1: Type: text/plain, Size: 1987 bytes --]
also sprach Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> [2010.02.07.1007 +1300]:
> This comment makes me see Neil's argument in a different light,
> (hopefully I am not mischaracterizing it), but essentially we are
> waiting for the standards to catch up with this new class of
> program. FUSE, CUSE, and mdmon belong to a class of programs that
> move traditionally exclusive kernel space functionality to
> userspace. Debian's /lib/init/rw looks to be a response to this
> grey area of the standards (not that I have any familiarity with
> the LSB).
I have not read the full thread for lack of time, but I would like
to chime in that I favour user-space over kernel-space any day: it
makes for stabler systems, better interfaces, and easier upgrades
— even though it's definitely more work for the distro maintainers.
So mdmon seems like a good idea, even though some details might need
to be worked out to everyone's satisfaction yet.
I agree with Dan that this trend is new and that slow-moving
standards like the FHS have yet to catch up. But they cannot catch
up if distros don't explore the field. Debian's latest move in this
exploration was indeed /lib/init/rw, but it's questionable, not only
because it's a tmpfs, which makes it unusable for e.g. md bitmaps
— unless we invented a place that moved to persistent storage as
early as possible, in a way that would make it accessible early
during the next boot. But now I am diverting the topic…
--
.''`. martin f. krafft <madduck@d.o> Related projects:
: :' : proud Debian developer http://debiansystem.info
`. `'` http://people.debian.org/~madduck http://vcs-pkg.org
`- Debian - when you have better things to do than fixing systems
"there are two major products that come out of berkeley: lsd and unix.
we don't believe this to be a coincidence."
-- jeremy s. anderson
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-06 21:46 ` martin f krafft
@ 2010-02-06 22:06 ` Michael Evans
0 siblings, 0 replies; 66+ messages in thread
From: Michael Evans @ 2010-02-06 22:06 UTC (permalink / raw)
To: Dan Williams, Doug Ledford, Neil Brown, linux-raid, initramfs,
Michal Marek
On Sat, Feb 6, 2010 at 1:46 PM, martin f krafft <madduck@debian.org> wrote:
> also sprach Dan Williams <dan.j.williams@intel.com> [2010.02.07.1007 +1300]:
>> This comment makes me see Neil's argument in a different light,
>> (hopefully I am not mischaracterizing it), but essentially we are
>> waiting for the standards to catch up with this new class of
>> program. FUSE, CUSE, and mdmon belong to a class of programs that
>> move traditionally exclusive kernel space functionality to
>> userspace. Debian's /lib/init/rw looks to be a response to this
>> grey area of the standards (not that I have any familiarity with
>> the LSB).
>
> I have not read the full thread for lack of time, but I would like
> to chime in that I favour user-space over kernel-space any day: it
> makes for stabler systems, better interfaces, and easier upgrades
> — even though it's definitely more work for the distro maintainers.
>
> So mdmon seems like a good idea, even though some details might need
> to be worked out to everyone's satisfaction yet.
>
> I agree with Dan that this trend is new and that slow-moving
> standards like the FHS have yet to catch up. But they cannot catch
> up if distros don't explore the field. Debian's latest move in this
> exploration was indeed /lib/init/rw, but it's questionable, not only
> because it's a tmpfs, which makes it unusable for e.g. md bitmaps
> — unless we invented a place that moved to persistent storage as
> early as possible, in a way that would make it accessible early
> during the next boot. But now I am diverting the topic…
>
> --
> .''`. martin f. krafft <madduck@d.o> Related projects:
> : :' : proud Debian developer http://debiansystem.info
> `. `'` http://people.debian.org/~madduck http://vcs-pkg.org
> `- Debian - when you have better things to do than fixing systems
>
> "there are two major products that come out of berkeley: lsd and unix.
> we don't believe this to be a coincidence."
> -- jeremy s. anderson
>
Shouldn't all /state/ information be held in the kernel in some form
and exported via one of the virtual filesystems? (dev, proc, sysfs)
This way if some userspace need exists the file can be read. If some
kernel access is required it's already within known structures.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-02-04 23:04 ` Dan Williams
@ 2010-02-07 22:13 ` Hans de Goede
2010-02-07 23:06 ` Neil Brown
1 sibling, 1 reply; 66+ messages in thread
From: Hans de Goede @ 2010-02-07 22:13 UTC (permalink / raw)
To: Doug Ledford
Cc: Neil Brown, linux-raid-u79uwXL29TY76Z2rM5mHXA,
initramfs-u79uwXL29TY76Z2rM5mHXA, Dan Williams, martin f krafft,
Michal Marek, Bill Nottingham
Hi All,
On 02/04/2010 07:45 PM, Doug Ledford wrote:
> On 02/04/2010 01:40 AM, Neil Brown wrote:
>>
<snip>
>> Because we want to unmount and completely discard the filesystem that holds
>> the mdmon binary that was run early, we need to kill it and start a new one
>> running from final namespace. This is also needed as to a small extent the
>> filesystem is used to communicate between mdadm and a running mdmon, and
>> having them have the same root is less confusing.
>>
>> There are three ways we can achieve this.
>>
>> 1/ If we can assume that between the time when the original "mount" completes
>> and when the "mount -o remount,rw" happens the filesystem doesn't write to
>> the device, then we can simply kill mdmon after the root is mounted, and
>> restart it before remounting. However I don't trust filesystem
>> implementers so I won't recommend that.
>>
>> 2/ Before the pivot root we can kill the old mdmon and start the new one
>> chrooted into the final root.
>> 3/ After the pivot root we can kill the old mdmon and start the new one.
>>
>> Number 2 is the approach that we (Well mostly Dan) originally intended and
>> that the code implements ... or tries to. It got broken and I never
>> noticed. I think I have fixed it now for 3.1.2.
>
> Note, as I recall, Hans switched things to be #3 for various reasons.
> That he switched it to #3 doesn't affect mdmon really, as it still is
> just killing and restarting, but doing it after the pivot root solved a
> couple issues. I don't recall what they were, you would have to talk to
> Hans about that.
>
The reason I made this change was that, although the mdmon takeover
mechanism was designed to be used as in 2., at the time I was integrating
this code into Fedora and tying all the bits together, the mdmon code for
doing 2. was very, very broken. Back then I sent Dan a long list of issues
with it, which I believe are all fixed now.
But using option 3. just worked from the time I integrated this and
has stayed working. I've never seen a need to switch things back to 2.,
and given that 2. requires all kinds of trickery and is hard to get right,
whereas 3. is pretty easy to get right and much less prone to break
(regress), I think that staying with 3. is a good solution / decision.
As for the whole question of where to store the mdmon .pid and .sock files,
my 2 cents is that /dev is the only dir where a socket file (which cannot be
moved across filesystems) can be made in the initramfs and still be accessible
from the real root. Other things like /lib/whythefuckputthisinslashlib/rw
can only be implemented by:
1) adding a second tmpfs which stays alive after the chroot to the real
root.
2) symlinks which need to be present on both the real root and the initramfs,
with the big problem being ensuring they are there on the read-only
root fs from the initramfs.
Both of which are needlessly complicated and fragile. So as far as I'm concerned
Fedora and the next RHEL will have these files under /dev. And if upstream
does not want this, then we will just keep patching mdadm / mdmon to do this
till the end of time. Note that /dev is already (ab)used in the same way
for passing dhcp leases from the initramfs to the running system when / lives
on a network device, and a few other state things which need to be passed
between the initramfs and the real root.
Pretty? No, but effective and simple, and any time you have this state-passing
problem it is the most likely solution you will end up with, because it is
KISS and KISS is good.
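The constraint described above (the socket has to be created on a filesystem
that is still reachable after the switch to the real root) can be illustrated
with a minimal sketch. It assumes a hypothetical helper named listen_in_dir()
and a "<name>.sock" naming scheme; neither is taken from the mdmon sources.
The directory is a parameter, so the same code would serve /dev or any other
carry-over location.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Create a listening unix-domain socket at "<dir>/<name>.sock".
 * Returns the listening fd, or -1 on any failure.
 * Hypothetical helper for illustration; not the mdmon implementation.
 */
int listen_in_dir(const char *dir, const char *name)
{
    struct sockaddr_un addr;
    int fd, n;

    assert(dir != NULL && name != NULL);

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    n = snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/%s.sock", dir, name);
    if (n < 0 || (size_t)n >= sizeof(addr.sun_path))
        return -1;                  /* path does not fit in sun_path */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    unlink(addr.sun_path);          /* drop a stale socket from an earlier run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The socket node lives in whatever filesystem holds the directory, which is
why the choice of directory decides whether a client in the real root can
still reach a daemon that was started from the initramfs.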
Regards,
Hans
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-07 22:13 ` Hans de Goede
@ 2010-02-07 23:06 ` Neil Brown
0 siblings, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-02-07 23:06 UTC (permalink / raw)
To: Hans de Goede
Cc: Doug Ledford, linux-raid, initramfs, Dan Williams,
martin f krafft, Michal Marek, Bill Nottingham
On Sun, 07 Feb 2010 23:13:49 +0100
Hans de Goede <hdegoede@redhat.com> wrote:
> Both of which are needlessly complicated and fragile. So as far as I'm concerned
> Fedora and the next RHEL will have these files under /dev. And if upstream
> does not want this, then we will just keep patching mdadm / mdmon to do this
> till the end of time. Note that /dev is already (ab)used in the same way
> for passing dhcp leases from the initramfs to the running system when / lives
> on a network device, and a few other state things which need to be passed
> between the initramfs and the real root.
>
> Pretty? No but effective and simple, and anytime you have this state passing
> problem the most likely solution you will end up with, because it is
> KISS and KISS is good.
You admit that /dev is being abused, yet you seem proud of it. Odd. Maybe
I misunderstand.
Your dhcp lease example is perfect (thanks!) for demonstrating that something
is needed beyond devices. i.e. some sort of generic place to pass files from
'before' to 'after' pivot root is needed.
The thing I like about /lib/init/rw is that it is clearly admitting this need
and trying to address it. I have no particular attachment to the name (and
would much rather use /var/run!) but it is the honesty and forward thinking
that I like.
By contrast /dev/.udev seems dishonest (as it tries to hide) and not forward
thinking (as it appears to be udev specific).
If it was /dev/udev or /dev/UDEV it would be better.
If it was /dev/RUN/udev it would be better still.
Though I would really like the carry-over filesystem to be
/init
and it contain 'dev' and 'var/run' and anything else needed,
and after pivotroot, the interesting parts are bind mounted to their final
home.
mount --bind /init/dev /dev
Yes. "Keep it simple" is very important. So is being generic and
forward-looking. I haven't seen much evidence of being forward looking in
the various suggestions and reference examples that have been put forward.
Yes, the only difference among a lot of the options put forward is the name
of a directory. Are names really so important? Emphatically YES. They
guide the way people think. Bad names confuse people, good names educate
people.
So my leaning is still to default to the best name currently available which
seems to be /lib/init/rw, and to make it easy to choose a distro-specific
name at compile time.
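A compile-time choice of directory could look roughly like the sketch below.
The macro name MDMON_STATE_DIR, the function mdmon_state_path(), and the
"<container>.<suffix>" file naming are assumptions made for illustration, not
names from mdadm; only the /lib/init/rw default comes from the discussion above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical build-time knob.  A distro that prefers another location
 * would build with something like:
 *     cc -DMDMON_STATE_DIR='"/dev/.mdadm"' ...
 * Everyone else gets the default discussed in this thread.
 */
#ifndef MDMON_STATE_DIR
#define MDMON_STATE_DIR "/lib/init/rw"
#endif

/*
 * Build "<state dir>/<container>.<suffix>" (for example the .pid or .sock
 * file) into 'buf'.  Returns the length of the path, or -1 if it does not
 * fit in 'len' bytes.
 */
int mdmon_state_path(char *buf, size_t len, const char *container,
                     const char *suffix)
{
    int n;

    assert(buf != NULL && container != NULL && suffix != NULL);

    n = snprintf(buf, len, "%s/%s.%s", MDMON_STATE_DIR, container, suffix);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Keeping the directory in a single macro means the default and a
distro-specific override differ by one compiler flag rather than by a patch.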
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-04 18:45 ` Doug Ledford
[not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2010-02-08 3:45 ` Neil Brown
2010-02-08 16:56 ` Bill Nottingham
1 sibling, 1 reply; 66+ messages in thread
From: Neil Brown @ 2010-02-08 3:45 UTC (permalink / raw)
To: Doug Ledford
Cc: linux-raid, initramfs, Dan Williams, martin f krafft,
Michal Marek, Hans de Goede, Bill Nottingham
On Thu, 04 Feb 2010 13:45:07 -0500
Doug Ledford <dledford@redhat.com> wrote:
> On 02/04/2010 01:40 AM, Neil Brown wrote:
> >
> > [cc:ing initramfs because another part of this thread was already
> > cc:ed there, but this is the one I wanted to reply to.
> > cc:ed to various md/mdadm maintainers too]
> >
> > On Tue, 19 Jan 2010 12:51:52 -0500
> > Doug Ledford <dledford@redhat.com> wrote:
> >
> >> On 01/18/2010 05:09 PM, Neil Brown wrote:
> >>> On Mon, 11 Jan 2010 15:38:11 -0500
> >>> Doug Ledford <dledford@redhat.com> wrote:
> >>>
> >>>> Signed-off-by: Doug Ledford <dledford@redhat.com>
> >>>
> >>> I really really don't like this.
> >>> I wasn't very keen on allowing the map file to be found in /dev,
> >>> but this is just too ugly.
> >>
> >> I've had to rewrite my response to this a few times :-/
> >>
> >> So, let's be clear: you are objecting to these non device special files
> >> being located under /dev. Not necessarily *where* they are under /dev,
> >> just that they are under /dev at all. That's what I get from your
> >> statement above.
> >>
> >> First with devfs, then later with udev, the old unix tradition of only
> >> device special files under /dev is truly dead. And it should be. The
> >> files we are creating are needed prior to / filesystem bring up, and
> >> they are needed simply in order to fully populate /dev. In fact, an
> >> argument can be made that a new tradition, that files related to the
> >> creation and maintenance of device special files belong under /dev with
> >> the files they relate to, has been created. And this new tradition
> >> makes sense and is elegant on the basis that it requires only one
> >> read/write filesystem mount point during device special file population.
> >> It also makes sense that this new tradition would supersede the old
> >> tradition on the basis that the old tradition was created prior to the
> >> advent of hot plug and the need to have any read/write data just to
> >> populate your device special files. The old tradition didn't have the
> >> flexibility to deal with modern hot plug architectures, the new
> >> tradition fixes that, and does so as elegantly as possible.
> >>
> >> That being the case, the big player in the game, udev, is following the
> >> new tradition by creating an entire tree of non device special files
> >> under /dev/.udev and using that to store the information it needs. And
> >> here mdadm/mdmon are, the small players in the device bring up game that
> >> only have minor bit parts compared to udev, holding up progress and
> >> playing the recalcitrant old fart. Sorry Neil, but the war has already
> >> been decided and this is a dead battle. Files related to device special
> >> file bring up belong under /dev along with the files we are creating.
> >> Your claim that these changes are ugly are misplaced and based upon
> >> adherence to a dead tradition that has been replaced by a more sensible
> >> tradition. Maybe you don't like where they are under /dev, but the fact
> >> that they are under /dev is definitely the right thing to do and is not
> >> in the least bit ugly.
> >>
> >>> I understand there is a problem here, but I don't like this approach to a
> >>> solution. I'll give it more thought when I get home from LCA2010 and see
> >>> what I can come up with.
> >>
> >> Feel free to come up with something different. But, if your solution
> >> involves maintaining an additional read/write mount area in deference to
> >> a long dead unix tradition, I'm just going to shake my head and patch
> >> your solution away to something sane.
> >>
> >
> > So I've had a good long think about this.
> >
> > Your arguments about using /dev do have some merit. However they sound more
> > like post-hoc justification than genuine motivation.
> > If the train of thought went:
> > I need some files that are related to device management. Where shall I
> > put them? I know, I'll put them in /dev.
> > then it would be more convincing. But the logic actually went:
> > I need some files to persist from early boot through to when the system
> > has all basic filesystems mounted. Where shall I put them? I know, I'll
> > put them in /dev.
> > That sounds a lot less convincing.
>
> To be fair, if post-hoc versus initial made any difference whatsoever,
> then so would the fact that I wouldn't have chosen to have these files
> exist at all. I would have made incremental assembly work without a map
> file and I would have made imsm superblock handling be in the kernel.
> So, I'm dealing with the consequences of decisions I didn't make and
> wouldn't have made. I don't think it's then fair to put some sort of
> 'premeditated' versus 'dealing with the situation' bias on my response.
>
> > Given that chain of thought I would be more likely to come to the conclusion
> > "I know, I'll put them in /lib/init/rw". Or at least I would on Debian -
> > I don't know that any non-Debian-derived distros support that directory.
>
> I have no idea. Not one of the files in question belongs there any more
> than in /dev or anywhere else for that matter though, so I wouldn't come
> to that conclusion in your shoes. But I find it somewhat disheartening
> to hear you disparage my choice to put the files in /dev because "I just
> wanted someplace to throw them" and then you would suggest /lib/init/rw
I think names are really important. If you were suggesting
/dev/init/rw
I wouldn't be able to suggest that /lib/init/rw is any better.
But I think it is better than /dev/.
> > But there is still a problem that needs to be solved.
> >
> > mdmon needs to be running before a certain class of md arrays (those with
> > user-space managed metadata) can be written to. Because some filesystems
> > choose to write to the device even when the filesystem is mounted read-only
> > (which should be a hanging offence, but isn't yet)
>
> Just to sidestep a second on the filesystem issue, there are only two
> choices when it comes to filesystems: allow them to be mounted read only
> (truly read only) and inconsistent or pseudo read only (where the
> filesystem itself is the only thing that writes to the filesystem) and
> be able to guarantee consistency. The only way for a journaled
> filesystem to provide the guarantee it does is that it writes to the
> device during mount even if it's a read only mount. This is because they
> guarantee to always be able to *restore* a filesystem to a sane state,
> not that it will always *be* in a sane state. If they didn't do that
> restore on mount, then possibly the thing that is inconsistent is
> /sbin/init and the machine doesn't boot. In other words, the point of a
> journaled filesystem would be wasted if they didn't do what they do.
> The only other option is to do the replay in page cache and allow the
> page cache and physical device to differ until the filesystem goes read
> write, but I'm not sure that level of complexity is warranted or
> advisable, especially since it could easily confuse anything that tries
> to read from the disks directly.
The other other option is to build a lookup table from the journal (a TLB ??)
and at the very last step before reading from storage, map the sector address
through this lookup table and thus possibly read from the journal instead
of from the main FS. I'm fairly sure this would work for ext3 journals.
I'm less confident of XFS simply because I am less familiar with them.
This would not necessarily present a filesystem that is completely consistent
from a 'write' perspective (there could be allocated inodes that aren't
referenced and maybe the free-space bitmaps might not be 100%). But it
should give all the consistency for reading from the filesystem, which is all
you need.
Yes, it is added complexity in the filesystem, but not much I think, and very
localised.
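The lookup-table idea above can be sketched in a few lines of C. This is only
an illustration under stated assumptions: the names remap_entry,
remap_table_sort() and remap_sector() are invented here, the table is assumed
to hold at most one entry per filesystem sector (the newest journal copy), and
nothing about the real ext3 or XFS journal formats is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per sector that has a newer copy sitting in the journal. */
struct remap_entry {
    uint64_t fs_sector;       /* where the data lives in the main FS */
    uint64_t journal_sector;  /* where its newest copy sits in the journal */
};

static int cmp_entry(const void *a, const void *b)
{
    const struct remap_entry *x = a, *y = b;

    return (x->fs_sector > y->fs_sector) - (x->fs_sector < y->fs_sector);
}

/* Sort the table once after scanning the journal so lookups can bisect. */
void remap_table_sort(struct remap_entry *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    if (n > 1)
        qsort(tab, n, sizeof(*tab), cmp_entry);
}

/*
 * Last step before a read is issued: return the journal sector if the
 * table has an entry for 'sector', otherwise return 'sector' unchanged.
 */
uint64_t remap_sector(const struct remap_entry *tab, size_t n, uint64_t sector)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].fs_sector == sector)
            return tab[mid].journal_sector;
        if (tab[mid].fs_sector < sector)
            lo = mid + 1;
        else
            hi = mid;
    }
    return sector;  /* not journalled: read from the main FS */
}
```

Nothing is written to the device in this scheme; the table exists only in
memory and can be thrown away once the filesystem goes read-write and the
journal is replayed for real.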
>
> > we potentially need mdmon
> > running before the root filesystem is mounted.
> >
> > Because we want to unmount and completely discard the filesystem that holds
> > the mdmon binary that was run early, we need to kill it and start a new one
> > running from final namespace. This is also needed as to a small extent the
> > filesystem is used to communicate between mdadm and a running mdmon, and
> > having them have the same root is less confusing.
> >
> > There are three ways we can achieve this.
> >
> > 1/ If we can assume that between the time when the original "mount" completes
> > and when the "mount -o remount,rw" happens the filesystem doesn't write to
> > the device, then we can simply kill mdmon after the root is mounted, and
> > restart it before remounting. However I don't trust filesystem
> > implementers so I won't recommend that.
> >
> > 2/ Before the pivot root we can kill the old mdmon and start the new one
> > chrooted into the final root.
> > 3/ After the pivot root we can kill the old mdmon and start the new one.
> >
> > Number 2 is the approach that we (Well mostly Dan) originally intended and
> > that the code implements ... or tries to. It got broken and I never
> > noticed. I think I have fixed it now for 3.1.2.
>
> Note, as I recall, Hans switched things to be #3 for various reasons.
> That he switched it to #3 doesn't affect mdmon really, as it still is
> just killing and restarting, but doing it after the pivot root solved a
> couple issues. I don't recall what they were, you would have to talk to
> Hans about that.
>
> And you left part of the issue out. Yes, all the before bring up stuff
> is true, but also true is that we want mdmon to hang around longer than
> anyone else. By the time mdmon is ready to be shutdown, /var/run is
> once again read only. So clean up can't be done. On the other hand, if
> the files for mdmon are on a temporary filesystem that is rebuilt at
> every boot...you get the point.
Yes, I have not been thinking much about the shutdown side of the equation.
Cleanup isn't an issue - you do not need to clean up /var/run when shutting
down because it always happens on boot (and won't happen on a crash anyway).
The only possible issue that I can see is if you want to unmount /var before
setting / to read-only. You won't be able to do this because mdmon holds an
open file descriptor on /var.
So instead of unmounting /var you would need to remount it read-only, and
then remount '/' read-only.
Is that going to be a problem?
>
> > However it requires that /var/run exists and is writeable during early boot.
> > I'm not sure that I am really comfortable requiring that. If the contents
> > of /var/run are not going to persist then it would be better if they didn't
> > exist. mdadm currently relies on that non-existence for proper handling of the
> > "mapfile".
>
> Can you explain this? I see nothing in the sources that tells me what
> you mean by the non-existence of /var/run causing the mapfile to be
> handled properly (and I'm not sure that's a valid requirement to put on
> the system anyway because now you are dictating that if another early
> boot application needs read only access to /var/run and we create
> /var/run for that purpose, then it would in some way break mdadm's
> operation).
When mdadm writes to the "mapfile" it tries to create it in /var/run. If
that doesn't work it tries to create it in /dev.
So if /var/run exists and is writeable during early boot the mapfile will be
created there. If this is not preserved then the information that was stored
in the mapfile will be lost.
The code for this is all very early in mapfile.c.
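
A minimal sketch of the "try /var/run, then fall back to /dev" behaviour
described above. This is not the code from mapfile.c; the helper name
open_first_writeable() and its signature are assumptions made only for
illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create "name" in each directory of dirs[]
 * in order and return an open fd for the first one that works, or -1 if
 * none is usable. On success the chosen path is left in out.
 */
int open_first_writeable(const char *const dirs[], int ndirs,
                         const char *name, char *out, size_t outlen)
{
    for (int i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= outlen)
            continue;       /* path does not fit; try the next directory */

        int fd = open(out, O_RDWR | O_CREAT, 0600);
        if (fd >= 0)
            return fd;      /* first directory that accepts the file wins */
    }
    if (outlen > 0)
        out[0] = '\0';      /* nothing worked: leave an empty path behind */
    return -1;
}
```

Called with a directory list such as { "/var/run", "/dev" }, the file ends
up in /var/run whenever that directory exists and is writeable, which is
exactly why an early, non-persistent /var/run would capture the mapfile.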
Yes, I agree that requiring the non-existence of /var/run is somewhat
fragile. I hadn't completely thought that through until I wrote the above
quoted text.
Is it a reasonable requirement? I would like to think so as having
a /var/run that spontaneously disappears would seem to break the principle of
least surprise. Unfortunately I don't like the alternatives (though clearly
you do).
However ... as I note below, this might be a non-issue. There may not really
be any need to preserve the mapfile across pivot_root.
> > - the "official" homes for the pid and unix-domain-sock are in /var/run
> > (preferably /var/run/mdadm/ but Doug said something about needing
> > /var/run/mdmon/ to placate the monster that is SELinux - I need more
> > information about that).
>
> mdmon does not need access to sendmail, so it should not be in the same
> context as the mdadm files. This allows a more restrictive set of perms
> on mdmon than on mdadm itself. If we put the mdmon files in
> /var/run/mdadm, then they will have to have the same context as mdadm,
> and because mdadm does so many things, it's already got an overly
> liberal set of permissions compared to what mdmon realistically needs.
And you cannot allow two programs in different contexts to write to the same
directory? Am I going to have to learn how SELinux works? (he asked with
dread).
Would it work to use /var/run/mdadm/mdmon ?? I'm not necessarily suggesting
that, just scoping out the range of options.
>
> > When mdadm wants to communicate with mdmon it always looks there.
> >
> > - There is an alternative home which is /lib/init/rw/mdadm/ by default,
>
> What happens to the files later in the boot process? Are they left
> here? Or are they migrated to an appropriate location later? If they
> are just left here, then this makes even *less* sense than putting the
> files under /dev as you've created a diversion zone in the filesystem.
> Someplace to throw things that *should* be elsewhere and then leave them
> there. Hopefully nothing gets left here. And if nothing gets left
> here, then whether the temporary spot is
> /dev/gonna_be_deleted_after_stuff_is_moved_out or /lib/init/rw makes no
> real difference except in the complexity of the initramfs, and more
> complex is more prone to break so I go with the single rw mount point/area.
The $dev.pid and $dev.sock files belong to the running mdmon.
When we kill the initramfs mdmon and start a new one, these files are removed
and new ones are created in /var/run. If /var/run is not writeable they are
created in the alternate until /var/run becomes writeable (we
monitor /proc/mounts for changes), at which point we remove and recreate the files.
The mapfile is read from the alternate if it doesn't exist in /var/run, and
written to /var/run if possible when a write is needed. So it is
effectively copied at the first update.
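
The read side of that rule can be sketched as follows. This is only an
illustration under assumptions: mapfile_for_read() is a hypothetical name,
and the two paths are supplied by the caller rather than taken from mdadm.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: pick the file to read the map from.
 * Prefer the primary location (e.g. under /var/run); fall back to the
 * alternate location only when the primary is not readable.
 * Returns NULL when neither exists.
 */
const char *mapfile_for_read(const char *primary, const char *alternate)
{
    if (access(primary, R_OK) == 0)
        return primary;
    if (access(alternate, R_OK) == 0)
        return alternate;
    return NULL;
}
```

Because writes always prefer /var/run, the first update after /var/run
becomes writeable reads from the alternate and writes to the primary, which
is the "effectively copied" behaviour.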
And as I said elsewhere, I think names are very important, in part because
people copy them. And
/dev/temp_place_for_files_carried_over_from_initramfs/
would be a lot better than /dev/.mdmon as the purpose would be obvious and
the example set for others would be clear. I would put things in
/dev/temp_place_for_files_carried_over_from_initramfs/var/run/mdmon
I think.
>
> > - mdadm when run in the "take over from previous instance" mode will
> > look in /lib/init/rw/mdadm for the relevant .pid and .sock files if they
> > aren't in /var/run/mdadm
>
> Now I'm a bit concerned. What happens when the new program starts up?
> If /var/run is now read/write, will the new mdmon then write the files
> in /var/run/mdadm (or mdmon)? If it does do this in preference to
> /lib/init/rw/mdadm, which I would expect because if it doesn't then the
> issue that Bill Davidson brought up about the issue not being files
> under /dev but actually being certain files *not* being under /var/run
> creeps right back up. So, are you going to symlink /var/run/mdadm (or
> mdmon) to /lib/init/rw/mdadm? If so, then you are now doing *exactly*
> as I proposed except in /lib/init/rw/mdadm instead of something like
> /dev/md/.mdadm. If you don't, then I foresee problems in your future in
> that when mdmon is restarted in the root context, it will write files in
> the real /var/run/mdadm directory, but before mdmon ever shuts down, the
> / filesystem will be readonly, and so those files will never get
> cleaned, and on the next boot you will have stale files there that you
> will have to workaround when it comes mdmon restart time as you'll need
> to ignore or clean out /var/run/mdadm and then use the ones in
> /lib/init/rw/mdadm instead. I'm sorry Neil, but this is sounding uglier
> and uglier by the minute, not elegant.
But /var/run is cleaned by init scripts. All non-directories are removed.
I'm fairly sure that all distros do this.
I guess that means that mdmon might find its .pid and .sock files get
removed after it has created them, which would be embarrassing.
(Of course if /var/run were a tmpfs, there would be no need for
embarrassment...).
.... no, that shouldn't be a problem. As long as we run the
mdmon --all /
after /var/run has been mounted and cleaned, all should be happiness.
No, I'm not suggesting symlinks. The "alternate" location is only used
temporarily to carry information across from before to after pivot_root.
>
> > - mdmon.8 will list the various options with details.
> >
> >
> > So I get to maintain a Unix tradition which might still have some life in it
> > after all, and Doug gets a very easy way to patch in his own version of
> > sanity.
> >
> > (comments always welcome - I have made the changes described above and pushed
> > them to git://neil.brown.name/mdadm, but it isn't too late to change it
> > completely if that turns out to be best)
>
> I made my proposal in another email. But, I didn't necessarily argue
> for it. Since you've argued for yours, and since this is going to a
> mailing list that I don't think significant parts of the original thread
> went to, I'll present mine with the arguments.
>
> Let's look at this on a file by file basis. First, for mdadm:
>
> mdadm.map - incremental map file, needs to be read/write before / is
> read/write if using incremental assembly on root array. Used to be
> stored in /var/run/mdadm/mdadm.map. This isn't read/write early enough,
> so incremental assembly would break. Neil noted something above about
> if /var/run/mdadm doesn't exist and isn't writable then mdadm does
> something different in mdadm current, but I looked in the git repo and
> could not see where the specific problem a readonly /var/run caused
> would be fixed, so I'll assume for now that a readonly /var/run is still
> just as broken as before. We moved the file to /dev/md/.mdadm.map, but
> Neil didn't like that and made it /dev/.mdadm.map instead. I would
> actually propose /dev/md/incremental.map as it A) isn't hidden and I
> believe it shouldn't be hidden because of E later on, B) clearly
> indicates the purpose of the file, C) would be in an md specific/owned
> area of /dev, D) is unlikely to ever conflict with someone's desired md
> device name, E) is a file specific to the enumeration and bring up of md
> device special files and as such can be argued to belong in /dev anyway,
> and F) solves the problem of needing a read/write /var/run for
> incremental assembly to work.
The mapfile isn't used only for incremental assembly, so "incremental.map"
wouldn't be a good name.
There are (if I remember correctly) two main uses for the "mapfile".
The first is as a cache for the mapping from UUID to md device (major/minor
number). This is particularly needed for Incremental mode so that when a new
device is found, it is easy to find if an md device already is (partially)
assembled for that array.
Being a cache, this information can be recreated at any time - simply read
the metadata from some device in each array and record the UUID. This can be
done with
mdadm --incremental --rebuild-map
(or mdadm -Ir).
I think "mdadm --incremental" might even do this transparently if the mapfile
cannot be found.
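
To make the "cache" use concrete, here is a rough sketch of a UUID to
device lookup. The struct layout and the names map_cache_entry and
map_cache_lookup() are assumptions for the example; they are not mdadm's
actual data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache record: one assembled (or partly assembled) array. */
struct map_cache_entry {
    unsigned int uuid[4];   /* 128-bit array UUID as four words */
    int dev_major;          /* major number of the md device */
    int dev_minor;          /* minor number of the md device */
};

/*
 * Return the entry whose UUID matches, or NULL if the array is not yet
 * known. A miss means a new md device has to be chosen for the array.
 */
const struct map_cache_entry *
map_cache_lookup(const struct map_cache_entry *tab, size_t n,
                 const unsigned int uuid[4])
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(tab[i].uuid, uuid, sizeof(tab[i].uuid)) == 0)
            return &tab[i];
    return NULL;
}
```

Since every field here can be re-read from the metadata of the running
arrays, losing such a table costs only a rebuild, which is the point being
made about the first use of the mapfile.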
The other use is to record the 'name' of the array. This 'name' might be
extracted from the metadata (if the metadata stores a name), might be
specified on the mdadm command line or in /etc/mdadm.conf, or might be
generated from the metadata, the chosen minor device and other 'random'
information to generate a unique name in cases where a clash with a
preexisting name cannot be ruled out and would be inconvenient.
This name is used by the udev rules to tell udev what name to create
in /dev/md/.
This isn't a pure cache as the name may be based on user input, or on the
order of array discovery.
However the names created by "mdadm -Ir" during boot should be the same
as any names generated by mdadm calls in the initramfs unless there were
significant differences between mdadm.conf in initramfs versus the final root.
So it is probable that we don't need to preserve the mapfile across
pivot_root. I think we did before, but there have been a number of
improvements in --incremental since then, particularly the auto-generation of
the mapfile.
>
> mdadm.pid - this is only used by mdadm in monitor mode, which is not
> started until after the filesystem is read/write. This can safely
> reside in /var/run/mdadm as it does today, no changes needed.
Agreed.
>
> Now the files for mdmon:
>
> devname.pid, devname.sock - we use one mdmon per imsm array and each
> mdmon has its own pid and sock file named after the array it is
> watching. The problem being that if our root filesystem is on one of
> these imsm arrays, we need mdmon up and running so it can mark the array
> dirty because we will likely cause writes via possible journal replays
> as we mount root. Likewise, even though there is code in mdmon to clean
> up the pid/sock files, if we are talking about the mdmon for the root
> filesystem, that cleanup can't happen as we need mdmon around to mark
> the array clean after the final writes from going readonly are complete
> (and in fact, during the final halt script on Fedora, we specifically
> exclude *all* mdmon instances from the last killall that we do, then we
> call mdadm to --wait-clean so we know that all the mdmons have marked
> the devices clean after the readonly remount, then we reboot, so we
> don't even kill the mdmon programs, ever). That means they will never
> clean up their sock and pid files. As it turns out, being on a tmpfs,
> permanently, is best for the mdmon files. We need them to be written
> before the system comes up, and we need them to stick around while the
> system goes down (we actually read the pid files to find what pids to
> omit from the global killall we do), but we also want them to go away
> when we reboot. So, location wise, /dev isn't necessarily the right
> place for them. However, now that we use udev for dev, semantic wise
> it's perfect. And we do have the one argument that they are at least
> related to the bring up and take down of device special files. So, for
> these files, I would actually argue for either /dev/.udev/mdmon with a
> symlink from /var/run/mdmon to this location, or for /dev/md/.mdmon,
> again with a symlink from /var/run/mdmon.
Points where I differ are:
1/ clean-up: it is a non-issue. initscripts already do that.
2/ udev model: I don't agree that it is a good model to copy.
>
> So that's my suggestion for how to handle this stuff.
>
Thanks.
Following this step in the discussion I plan to:
1/ remove the 'switchroot' option (option 2 in a previous email)
from mdmon. I don't think anyone will use it and it has
no convincing benefit, and some real costs.
2/ remove the watching of /proc/mounts to see when /var becomes
writeable. Rather I will require (and document) that
/var/run/ should be writeable (and cleaned) before
mdmon --all
is run to take over from any mdmon that might still be running
from the initramfs. This removes any possible race with
automatic cleaning of /var/run/
3/ Document that mdmon may prevent /var from being unmounted and
recommend "-o remount,ro" as an alternative.
4/ Use the "alternate run" directory as an alternate location for
the mapfile, rather than explicitly using /dev/.mdadm.map.
I should have done this before, but forgot.
If we get two (or more) distros agreeing on a generic name for a scratch area
to carry files over from before the pivot_root, then I will certainly
consider using that rather than /lib/init/rw, even if it is in /dev.
Hopefully it will not have a leading '.' in any name component.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-02-06 21:07 ` Dan Williams
@ 2010-02-08 4:23 ` Neil Brown
1 sibling, 0 replies; 66+ messages in thread
From: Neil Brown @ 2010-02-08 4:23 UTC (permalink / raw)
To: Doug Ledford
Cc: Dan Williams, linux-raid,
initramfs, martin f krafft, Michal Marek,
Hans de Goede, Bill Nottingham
On Sat, 06 Feb 2010 12:51:02 -0500
Doug Ledford <dledford@redhat.com> wrote:
> I will say that needing to touch multiple software packages might not be
> a bad thing, but think of *how* they had to be changed. We had to add
> special exceptions for mdmon all over the place: kernel scheduler (for
> suspend/resume, mdmon can't be frozen like the rest of user space or
> else writing our suspend to disk image doesn't work), initramfs,
> initscripts after initramfs, initscripts on halt, SELinux. In all these
> cases, we had to take something that we want to keep simple and add
> special case rules and exceptions for mdmon. That pretty solidly says
> that while this arrangement may have been elegant for *you*, it was not
> elegant in the overall grand scheme of things.
>
or it just means we are breaking new ground :-)
The suspend/resume issue you bring up is an important one and to my mind is
currently unsolved.
Based on my limited understanding of hibernation, I think that mdmon should
be told to quiesce (but not actually be frozen) prior to taking the in-memory
snapshot, then thawed prior to writing that snapshot out to disk.
Further when it is thawed after resume-from-disk it needs to know it has been
thawed so it can check the metadata on-disk to see if any failure happened
while it slept.
A similar thing would be needed for suspend through fuse.
Do you know exactly what was done to the scheduler in redhat?
NeilBrown
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-06 21:46 ` martin f krafft
@ 2010-02-08 15:32 ` Doug Ledford
2010-02-08 21:38 ` Neil Brown
1 sibling, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-08 15:32 UTC (permalink / raw)
To: Dan Williams
Cc: Neil Brown, linux-raid,
initramfs, martin f krafft, Michal Marek,
Hans de Goede, Bill Nottingham
On 02/06/2010 04:07 PM, Dan Williams wrote:
> This comment makes me see Neil's argument in a different light,
> (hopefully I am not mischaracterizing it), but essentially we are
> waiting for the standards to catch up with this new class of program.
> FUSE, CUSE, and mdmon belong to a class of programs that move
> traditionally exclusive kernel space functionality to userspace.
> Debian's /lib/init/rw looks to be a response to this grey area of the
> standards (not that I have any familiarity with the LSB).
So if we want to argue that the standards are simply behind the times,
and we need to do something that makes sense regardless of the
standards, then I don't think anything in /dev or /lib makes sense. The
files that need to be created pre-rw-root are varied in their type and
purpose between different things. What we really need is simply an
early boot /tmp area. So, why not make a top level directory that
clearly delineates this nature? Something like /pre-init or /early-tmp
or whatever? Or possibly /tmp/pre-boot or /tmp/pre-init or
/tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux
specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for
possible standards acceptance later)? I was thinking that mdmon's files
would be stuck there, but then I remembered that we are doing option #3
for mdmon, restarting after the system is up and running, so only the
mdmon instances from the initramfs would put their files there, the
final ones would be on the real /var/run area. So, since as far as I
know the mdmon .sock files were the only pre-boot files that couldn't be
moved later (but effectively get moved by restarting mdmon after r/w
/var/run), any and all files in /tmp/pre-pivot-root should be removed
once the system is up and running, and quite possibly the filesystem
could be entirely done away with. At least then the naming would be to
Neil's satisfaction I think, and mine. And personally, when the
standards are simply behind the times, I have no problem blazing ahead
and letting them catch up when they get off their asses.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-08 3:45 ` Neil Brown
@ 2010-02-08 16:56 ` Bill Nottingham
0 siblings, 0 replies; 66+ messages in thread
From: Bill Nottingham @ 2010-02-08 16:56 UTC (permalink / raw)
To: Neil Brown
Cc: Doug Ledford, linux-raid, initramfs, Dan Williams,
martin f krafft, Michal Marek, Hans de Goede
Neil Brown (neilb@suse.de) said:
> Yes, I have not been thinking much about the shutdown side of the equation.
> Cleanup isn't an issue - you do not need to clean up /var/run when shutting
> down because it always happens on boot (and won't happen on a crash anyway).
> The only possible issue that I can see is if you want to unmount /var before
> setting / to read-only. You won't be able to do this because mdmon holds an
> open file descriptor on /var.
> So instead of unmounting /var you would need to remount it read-only, and
> then remount '/' read-only.
>
> Is that going to be a problem?
It's certainly a change in behavior. Historically all non-root filesystems
can be cleanly unmounted, then root is marked read-only, then you
halt/reboot.
> The first is as a cache for the mapping from UUID to md device (major/minor
> number). This is particularly needed for Incremental mode so that when a new
> device is found, it is easy to find if an md device already is (partially)
> assembled for that array.
> Being a cache, this information can be recreated at any time - simply read
> the metadata from some device in each array and record the UUID. This can be
> done with
> mdadm --incremental --rebuild-map
> (or mdadm -Ir).
> I think "mdadm --incremental" might even do this transparently if the mapfile
> cannot be found.
This seems like it could be integrated with the udev database, could it not?
(Whether or not you want this dependency is another matter.)
> 3/ Document that mdmon may prevent /var from being unmounted and
> recommend "-o remount,ro" as an alternative.
As said above, I think this is a problem.
Bill
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-08 15:32 ` Doug Ledford
@ 2010-02-08 21:38 ` Neil Brown
2010-02-09 0:20 ` Michael Evans
[not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
0 siblings, 2 replies; 66+ messages in thread
From: Neil Brown @ 2010-02-08 21:38 UTC (permalink / raw)
To: Doug Ledford
Cc: Dan Williams, linux-raid, initramfs, martin f krafft,
Michal Marek, Hans de Goede, Bill Nottingham
On Mon, 08 Feb 2010 10:32:53 -0500
Doug Ledford <dledford@redhat.com> wrote:
> On 02/06/2010 04:07 PM, Dan Williams wrote:
>
> > This comment makes me see Neil's argument in a different light,
> > (hopefully I am not mischaracterizing it), but essentially we are
> > waiting for the standards to catch up with this new class of program.
> > FUSE, CUSE, and mdmon belong to a class of programs that move
> > traditionally exclusive kernel space functionality to userspace.
> > Debian's /lib/init/rw looks to be a response to this grey area of the
> > standards (not that I have any familiarity with the LSB).
>
> So if we want to argue that the standards are simply behind the times,
> and we need to do something that makes sense regardless of the
> standards, then I don't think anything in /dev or /lib makes sense. The
> files that need to be created pre-rw-root are varied in their type and
> purpose between different things. What we really need is simply an
> early boot /tmp area. So, why not make a top level directory that
> clearly delineates this nature? Something like /pre-init or /early-tmp
> or whatever? Or possibly /tmp/pre-boot or /tmp/pre-init or
> /tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux
> specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for
> possible standards acceptance later)? I was thinking that mdmon's files
> would be stuck there, but then I remembered that we are doing option #3
> for mdmon, restarting after the system is up and running, so only the
> mdmon instances from the initramfs would put their files there, the
> final ones would be on the real /var/run area. So, since as far as I
> know the mdmon .sock files were the only pre-boot files that couldn't be
> moved later (but effectively get moved by restarting mdmon after r/w
> /var/run), any and all files in /tmp/pre-pivot-root should be removed
> once the system is up and running, and quite possibly the filesystem
> could be entirely done away with. At least then the naming would be to
> Neil's satisfaction I think, and mine. And personally, when the
> standards are simply behind the times, I have no problem blazing ahead
> and letting them catch up when they get off their asses.
>
>
That's the spirit!!!
Let's figure out what we really want/need, and just do it.
Following my recent discovery that mdmon prevents /var from being unmounted
at shutdown, I wonder if we really want something generic that persists from
very early boot to very late shutdown, rather than just the early-boot part.
So something like /var/run, but not dependent on /var and guaranteed to be
in-memory (or swap) and created very early by initramfs.
/run
???
Trivial implementation for most distros would be to make it a symlink
to /dev/run.
I would prefer a name a little more descriptive than "/run" - something that
reflects the idea that it is particularly for early-boot or late-shutdown -
but nothing comes to mind.
I could probably actually live with "/dev/run" as the permanent home for the
mdmon files: /dev/run/mdmon/*.{sock,pid}
It addresses most of the issues I had with the original suggestion (hidden
files, non-generic approach) so the "cons" are weaker. And I now understand
the "pros" better (races with cleaning /var/run, issues with unmounting /var
etc).
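
As a small sketch of the file naming implied by /dev/run/mdmon/*.{sock,pid}
(the helper name mdmon_file_path() and the idea of passing the run
directory as a parameter are assumptions for illustration, not mdmon's
actual code):

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "<rundir>/<devname>.<ext>" into out.
 * Returns 0 on success, -1 if the result would not fit in outlen bytes.
 */
int mdmon_file_path(char *out, size_t outlen, const char *rundir,
                    const char *devname, const char *ext)
{
    int n = snprintf(out, outlen, "%s/%s.%s", rundir, devname, ext);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Keeping the directory as a single parameter means that whichever name is
finally agreed on (/var/run/mdmon, /dev/run/mdmon, or something under /run)
only has to change in one place.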
Anyone second the motion?
NeilBrown
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-08 21:38 ` Neil Brown
@ 2010-02-09 0:20 ` Michael Evans
[not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
1 sibling, 0 replies; 66+ messages in thread
From: Michael Evans @ 2010-02-09 0:20 UTC (permalink / raw)
To: Neil Brown
Cc: Doug Ledford, Dan Williams, linux-raid, initramfs,
martin f krafft, Michal Marek, Hans de Goede, Bill Nottingham
On Mon, Feb 8, 2010 at 1:38 PM, Neil Brown <neilb@suse.de> wrote:
> On Mon, 08 Feb 2010 10:32:53 -0500
> Doug Ledford <dledford@redhat.com> wrote:
>
>> On 02/06/2010 04:07 PM, Dan Williams wrote:
>>
>> > This comment makes me see Neil's argument in a different light,
>> > (hopefully I am not mischaracterizing it), but essentially we are
>> > waiting for the standards to catch up with this new class of program.
>> > FUSE, CUSE, and mdmon belong to a class of programs that move
>> > traditionally exclusive kernel space functionality to userspace.
>> > Debian's /lib/init/rw looks to be a response to this grey area of the
>> > standards (not that I have any familiarity with the LSB).
>>
>> So if we want to argue that the standards are simply behind the times,
>> and we need to do something that makes sense regardless of the
>> standards, then I don't think anything in /dev or /lib makes sense. The
>> files that need to be created pre-rw-root are varied in their type and
>> purpose between different things. What we really need is simply an
>> early boot /tmp area. So, why not make a top level directory that
>> clearly delineates this nature? Something like /pre-init or /early-tmp
>> or whatever? Or possibly /tmp/pre-boot or /tmp/pre-init or
>> /tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux
>> specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for
>> possible standards acceptance later)? I was thinking that mdmon's files
>> would be stuck there, but then I remembered that we are doing option #3
>> for mdmon, restarting after the system is up and running, so only the
>> mdmon instances from the initramfs would put their files there, the
>> final ones would be on the real /var/run area. So, since as far as I
>> know the mdmon .sock files were the only pre-boot files that couldn't be
>> moved later (but effectively get moved by restarting mdmon after r/w
>> /var/run), any and all files in /tmp/pre-pivot-root should be removed
>> once the system is up and running, and quite possibly the filesystem
>> could be entirely done away with. At least then the naming would be to
>> Neil's satisfaction I think, and mine. And personally, when the
>> standards are simply behind the times, I have no problem blazing ahead
>> and letting them catch up when they get off their asses.
>>
>>
>
> That's the spirit!!!
> Let's figure out what we really want/need, and just do it.
>
> Following my recent discovery that mdmon prevents /var from being unmounted
> at shutdown, I wonder if we really want something generic that persists from
> very early boot to very late shutdown, rather than just the early-boot part.
> So something like /var/run, but not dependent on /var and guaranteed to be
> in-memory (or swap) and created very early by initramfs.
>
> /run
> ???
> Trivial implementation for most distros would be to make it a symlink
> to /dev/run.
>
> I would prefer a name a little more descriptive than "/run" - something that
> reflects the idea that it is particularly for early-boot or late-shutdown -
> but nothing comes to mind.
>
> I could probably actually live with "/dev/run" as the permanent home for the
> mdmon files: /dev/run/mdmon/*.{sock,pid}
> It addresses most of the issues I had with the original suggestion (hidden
> files, non-generic approach) so the "cons" are weaker. And I now understand
> the "pros" better (races with cleaning /var/run, issues with unmounting /var
> etc).
>
> Anyone second the motion?
>
> NeilBrown
What about systems that have only devices known at compile/create time
and thus might be created with a fully static /dev for extra
simplicity? We should not simply expect that /dev is read-write as a
system requirement. This is one reason why my previous solutions
suggested using a known area and symlinking in an implementation
defined way that mdadm/mdmon didn't need to know about.
Maybe a good name for it is '/state' as in system state information.
It would be reasonable to expect it to be a ram/swap backed filesystem
for SMALL files to exist as a user-space state area for various
daemons and such.
However any information in there which is also potentially useful to
in-kernel code should probably be re-located to an entry exposed via
sysfs.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2010-02-09 2:19 ` martin f krafft
[not found] ` <20100209021949.GB11780-0owbi4v4jRjYceiJAzDLgeTW4wlIGRCZ@public.gmane.org>
2010-02-09 20:30 ` Doug Ledford
1 sibling, 1 reply; 66+ messages in thread
From: martin f krafft @ 2010-02-09 2:19 UTC (permalink / raw)
To: Neil Brown
Cc: Doug Ledford, Dan Williams, linux-raid,
initramfs, Michal Marek, Hans de Goede,
Bill Nottingham
thus spoke Neil Brown <neilb@suse.de> [2010.02.09.1038 +1300]:
> I could probably actually live with "/dev/run" as the permanent
> home for the mdmon files: /dev/run/mdmon/*.{sock,pid} It
> addresses most of the issues I had with the original suggestion
> (hidden files, non-generic approach) so the "cons" are weaker.
> And I now understand the "pros" better (races with cleaning
> /var/run, issues with unmounting /var etc).
Note that initramfs already carries /dev across the pivot_root, and
initramfs already uses /dev/.initramfs to carry stuff across.
I am not sure /dev/run will fly past the Debian Police. On the other
hand, it would be convenient, since it'll work out-of-the-box, at
least on Debian systems. I don't really like the idea of a symlink
in / though. Nor do I really have a better idea.
> Anyone second the motion?
I am all for finding a solution that works, but I don't think it's
as easy as "the standards are slow, so let's just forge ahead with
mdadm only and give them something to standardise".
I wouldn't mind avoiding all the bikeshedding, and maybe it'll just
work, but having to change things later might possibly be a lot of
trouble — after all, we don't want to break people's systems then.
On the other hand, this is something that is reinitialised on every
boot, isn't it? If that's the case and there don't seem to be
complications with a later move, then I say: yeah, let's go ahead.
--
.''`. martin f. krafft <madduck@d.o> Related projects:
: :' : proud Debian developer http://debiansystem.info
`. `'` http://people.debian.org/~madduck http://vcs-pkg.org
`- Debian - when you have better things to do than fixing systems
"when a gentoo admin tells me that the KISS principle is good for
'busy sysadmins', and that it's not an evolutionary step backwards,
i wonder whether their tape is already running backwards."
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-02-09 2:19 ` martin f krafft
@ 2010-02-09 20:30 ` Doug Ledford
1 sibling, 0 replies; 66+ messages in thread
From: Doug Ledford @ 2010-02-09 20:30 UTC (permalink / raw)
To: Neil Brown
Cc: Dan Williams, linux-raid-u79uwXL29TY76Z2rM5mHXA,
initramfs-u79uwXL29TY76Z2rM5mHXA, martin f krafft, Michal Marek,
Hans de Goede, Bill Nottingham
On 02/08/2010 04:38 PM, Neil Brown wrote:
> On Mon, 08 Feb 2010 10:32:53 -0500
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
>> On 02/06/2010 04:07 PM, Dan Williams wrote:
>>
>>> This comment makes me see Neil's argument in a different light,
>>> (hopefully I am not mischaracterizing it), but essentially we are
>>> waiting for the standards to catch up with this new class of program.
>>> FUSE, CUSE, and mdmon belong to a class of programs that move
>>> traditionally exclusive kernel space functionality to userspace.
>>> Debian's /lib/init/rw looks to be a response to this grey area of the
>>> standards (not that I have any familiarity with the LSB).
>>
>> So if we want to argue that the standards are simply behind the times,
>> and we need to do something that makes sense regardless of the
>> standards, then I don't think anything in /dev or /lib makes sense. The
>> files that need to be created pre-rw-root are varied in their type and
>> purpose between different things. What we really need is simply an
>> early boot /tmp area. So, why not make a top level directory that
>> clearly delineates this nature? Something like /pre-init or /early-tmp
>> or whatever? Or possibly /tmp/pre-boot or /tmp/pre-init or
>> /tmp/pre-pivot-root (the pre-pivot-root naming is awfully linux
>> specific, so maybe /tmp/pre-init or /tmp/pre-boot would be better for
>> possible standards acceptance later)? I was thinking that mdmon's files
>> would be stuck there, but then I remembered that we are doing option #3
>> for mdmon, restarting after the system is up and running, so only the
>> mdmon instances from the initramfs would put their files there, the
>> final ones would be on the real /var/run area. So, since as far as I
>> know the mdmon .sock files were the only pre-boot files that couldn't be
>> moved later (but effectively get moved by restarting mdmon after r/w
>> /var/run), any and all files in /tmp/pre-pivot-root should be removed
>> once the system is up and running, and quite possibly the filesystem
>> could be entirely done away with. At least then the naming would be to
>> Neil's satisfaction I think, and mine. And personally, when the
>> standards are simply behind the times, I have no problem blazing ahead
>> and letting them catch up when they get off their asses.
>>
>>
>
> That's the spirit!!!
> Let's figure out what we really want/need, and just do it.
>
> Following my recent discovery that mdmon prevents /var from being unmounted
> at shutdown, I wonder if we really want something generic that persists from
> very early boot to very late shutdown, rather than just the early-boot part.
> So something like /var/run, but not dependent on /var and guaranteed to be
> in-memory (or swap) and created very early by initramfs.
>
> /run
> ???
> Trivial implementation for most distros would be to make it a symlink
> to /dev/run.
>
> I would prefer a name a little more descriptive than "/run" - something that
> reflects the idea that it is particularly for early-boot or late-shutdown -
> but nothing comes to mind.
>
> I could probably actually live with "/dev/run" as the permanent home for the
> mdmon files: /dev/run/mdmon/*.{sock,pid}
> It addresses most of the issues I had with the original suggestion (hidden
> files, non-generic approach) so the "cons" are weaker. And I now understand
> the "pros" better (races with cleaning /var/run, issues with unmounting /var
> etc).
>
> Anyone second the motion?
I second the idea, but I hate the name run. Mainly because it's not
really descriptive of the issue at hand. I mean, everything needs to
run; the part that's different about all of this is that it needs to run
*pre-root-filesystem-available*.
If we were to stick with unix tradition of being short and cryptic (but
make sense when explained), then /ptmp -> /dev/ptmp might work with the
explanation being that it's a pre-init temporary file area (or
pre-root-filesystem temporary file area).
Of course, the /ptmp -> /dev/ptmp would only be if we are using the dev
filesystem for simplicity's sake. However, as someone else mentioned,
realistically if we want this to be accepted, it should not be dependent
on udev and a tmpfs-based dev directory; it should stand on its own.
Meaning that it *should* be its own minor tmpfs filesystem mounted at
/ptmp. But maybe that's something to work toward versus a first step.
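A minimal sketch of the "own minor tmpfs filesystem mounted at /ptmp" idea, assuming Linux and the standard mount(2) call. The function names, the 0755 mode and the 1 MiB size limit are illustrative assumptions, not anything agreed in this thread, and the mount itself needs root.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>

/* Hypothetical helper: format the tmpfs option string, e.g.
 * "mode=0755,size=1024k". Returns 0 on success, -1 if buf is too small. */
int early_tmp_options(char *buf, size_t len, unsigned mode, unsigned size_kb)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "mode=%04o,size=%uk", mode, size_kb);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: create the mount point if needed and mount a small
 * tmpfs on it. Mode and size are assumptions chosen for illustration.
 * Requires root; returns 0 on success, -1 on error (errno set). */
int mount_early_tmp(const char *target)
{
    char opts[64];

    if (early_tmp_options(opts, sizeof(opts), 0755, 1024) != 0)
        return -1;
    if (mkdir(target, 0755) != 0 && errno != EEXIST)
        return -1;
    return mount("tmpfs", target, "tmpfs", MS_NOSUID | MS_NODEV, opts);
}
```

An initramfs init could call mount_early_tmp("/ptmp") before starting anything that needs to write state, which keeps the area independent of udev and of how /dev is set up.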
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <20100209021949.GB11780-0owbi4v4jRjYceiJAzDLgeTW4wlIGRCZ@public.gmane.org>
@ 2010-02-09 20:34 ` Doug Ledford
[not found] ` <4B71C6CA.3010407-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2010-02-09 20:34 UTC (permalink / raw)
To: Neil Brown, Dan Williams, linux-raid-u79uwXL29TY76Z2rM5mHXA,
initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede
On 02/08/2010 09:19 PM, martin f krafft wrote:
> also sprach Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> [2010.02.09.1038 +1300]:
>> I could probably actually live with "/dev/run" as the permanent
>> home for the mdmon files: /dev/run/mdmon/*.{sock,pid} It
>> addresses most of the issues I had with the original suggestion
>> (hidden files, non-generic approach) so the "cons" are weaker.
>> And I now understand the "pros" better (races with cleaning
>> /var/run, issues with unmounting /var etc).
>
> Note that initramfs already carries /dev across the pivot_root, and
> initramfs already uses /dev/.initramfs to carry stuff across.
And the things carried in there should be able to be trivially moved to
a final location.
> I am not sure /dev/run will fly past the Debian Police. On the other
> hand, it would be convenient, since it'll work out-of-the-box, at
> least on Debian systems. I don't really like the idea of a symlink
> in / though. Nor do I really have a better idea.
Pursuant to the comments that this should work even if /dev is not
read/write, it really needs to officially be a top-level directory (or
else some other mount point that is separate from /dev, I think; I guess
it could be in /tmp itself).
>> Anyone second the motion?
>
> I am all for finding a solution that works, but I don't think it's
> as easy as "the standards are slow, so let's just forge ahead with
> mdadm only and give them something to standardise".
>
> I wouldn't mind avoiding all the bikeshedding, and maybe it'll just
> work, but having to change things later might possibly be a lot of
> trouble — after all, we don't want to break people's systems then.
I don't think so. Once it's all set up, any future change should be no
more than a coordinated package update cycle where initscripts, mkinitrd,
dracut, and a few other select packages that use the locations are all
updated simultaneously.
> On the other hand, this is something that is reinitialised on every
> boot, isn't it? If that's the case and there don't seem to be
> complications with a later move, then I say: yeah, let's go ahead.
>
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <4B71C6CA.3010407-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2010-02-10 0:58 ` Mr. James W. Laferriere
[not found] ` <alpine.LNX.2.01.1002091553580.10004-pIN9qAC4yfKseEBmXaVrNB5FPEiCeG3sAL8bYrjMMd8@public.gmane.org>
0 siblings, 1 reply; 66+ messages in thread
From: Mr. James W. Laferriere @ 2010-02-10 0:58 UTC (permalink / raw)
To: Doug Ledford
Cc: Neil Brown, Dan Williams, linux-raid maillist,
initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede,
Bill Nottingham
Hello Doug ,
On Tue, 9 Feb 2010, Doug Ledford wrote:
> On 02/08/2010 09:19 PM, martin f krafft wrote:
...snip...
>> I am all for finding a solution that works, but I don't think it's
>> as easy as "the standards are slow, so let's just forge ahead with
>> mdadm only and give them something to standardise".
>>
>> I wouldn't mind avoiding all the bikeshedding, and maybe it'll just
>> work, but having to change things later might possibly be a lot of
>> trouble -- after all, we don't want to break people's systems then.
>
> I don't think so. Once it's all set up, any future change should be no
> more than a coordinate package update cycle where initscripts, mkinitrd,
> dracut, and a few other select packages that use the locations are all
> updated simultaneously.
The key words in the above are: 'select packages' & 'simultaneously' .
How is the community of linux distributors going to accomplish this ?
Heck some of the distributors don't even use the same names for packages
that do exactly the same thing .
The Simultaneity is going to hurt a lot more than is mentioned here .
But that said , the idea of a /'name' area for this is imo a very good
thing . Rather than hiding it below others .
Tia , JimL
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network&System Engineer | 3237 Holden Road | Give me Linux |
| babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org | Fairbanks, AK. 99709 | only on AXP |
+------------------------------------------------------------------+
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <alpine.LNX.2.01.1002091553580.10004-pIN9qAC4yfKseEBmXaVrNB5FPEiCeG3sAL8bYrjMMd8@public.gmane.org>
@ 2010-02-10 1:33 ` Neil Brown
2010-02-10 9:46 ` Harald Hoyer
[not found] ` <20100210123321.324e5de6-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
0 siblings, 2 replies; 66+ messages in thread
From: Neil Brown @ 2010-02-10 1:33 UTC (permalink / raw)
To: Mr. James W. Laferriere
Cc: Doug Ledford, Dan Williams, linux-raid maillist,
initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede,
Bill Nottingham
On Tue, 9 Feb 2010 15:58:52 -0900 (AKST)
"Mr. James W. Laferriere" <babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org> wrote:
> Hello Doug ,
>
> On Tue, 9 Feb 2010, Doug Ledford wrote:
> > On 02/08/2010 09:19 PM, martin f krafft wrote:
> ...snip...
> >> I am all for finding a solution that works, but I don't think it's
> >> as easy as "the standards are slow, so let's just forge ahead with
> >> mdadm only and give them something to standardise".
> >>
> >> I wouldn't mind avoiding all the bikeshedding, and maybe it'll just
> >> work, but having to change things later might possibly be a lot of
> >> trouble -- after all, we don't want to break people's systems then.
> >
> > I don't think so. Once it's all set up, any future change should be no
> > more than a coordinate package update cycle where initscripts, mkinitrd,
> > dracut, and a few other select packages that use the locations are all
> > updated simultaneously.
>
> The key words in the above are: 'select packages' & 'simultaneously' .
> How is the community of linux distributors going to accomplish this ?
> Heck some of the distributors don't even use the same names for packages
> that do exactly the same thing .
>
> The Simultainity is going to hurt alot more than is mentioned here .
Simultaneity only needs to be within one host, not across all distros. I
don't think it should be that hard to manage.
>
> But that said , the idea of a /'name' area for this is imo a very good
> thing . Rather hiding it below others .
Thanks.
One idea that has occurred to me is that maybe /sys is the right place to put
this stuff!!! If only sysfs directories could be writeable, I could write the
pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a
similar name.
I could of course get the md module to create a file called "mdmon.pid" and
allow it to be read and written much like a normal file. But I don't think I
want to do that - and I couldn't use that solution for the socket in any case.
Not a short-term solution, but something to keep in mind longer-term maybe...
NeilBrown
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-10 1:33 ` Neil Brown
@ 2010-02-10 9:46 ` Harald Hoyer
[not found] ` <20100210123321.324e5de6-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
1 sibling, 0 replies; 66+ messages in thread
From: Harald Hoyer @ 2010-02-10 9:46 UTC (permalink / raw)
To: Neil Brown
Cc: Mr. James W. Laferriere, Doug Ledford, Dan Williams,
linux-raid maillist, initramfs, Michal Marek, Hans de Goede,
Bill Nottingham
On 02/10/2010 02:33 AM, Neil Brown wrote:
> One idea that has occurred to me is that maybe /sys is the right place to put
> this stuff!!! If only sysfs directories could be writeable, I could write the
> pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a
> similar name.
Another idea might be /dev/shm/, which is also writable.
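Since several candidate locations are on the table, here is a small sketch of how a program could probe them at startup and take the first one that is a writable directory. The function name is hypothetical and the candidate list is only an example drawn from names mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

/* Hypothetical helper: return the first entry of a NULL-terminated list
 * that exists, is a directory, and is writable and searchable by the
 * calling process. Returns NULL if none qualifies. */
const char *first_writable_dir(const char *const *candidates)
{
    struct stat st;
    size_t i;

    assert(candidates != NULL);
    for (i = 0; candidates[i] != NULL; i++) {
        if (stat(candidates[i], &st) != 0 || !S_ISDIR(st.st_mode))
            continue;
        if (access(candidates[i], W_OK | X_OK) == 0)
            return candidates[i];
    }
    return NULL;
}
```

A caller might pass something like { "/dev/run", "/dev/shm", "/var/run", NULL }. Probing like this only answers "is it writable now"; it does not address the naming and standardisation questions raised above.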
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <20100210123321.324e5de6-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2010-02-10 15:49 ` Dan Williams
2010-02-10 16:06 ` Michael Evans
0 siblings, 1 reply; 66+ messages in thread
From: Dan Williams @ 2010-02-10 15:49 UTC (permalink / raw)
To: Neil Brown
Cc: Mr. James W. Laferriere, Doug Ledford, linux-raid maillist,
initramfs-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Hans de Goede,
Bill Nottingham
On Tue, Feb 9, 2010 at 6:33 PM, Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:
> On Tue, 9 Feb 2010 15:58:52 -0900 (AKST)
> "Mr. James W. Laferriere" <babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org> wrote:
>>
>> But that said , the idea of a /'name' area for this is imo a very good
>> thing . Rather hiding it below others .
>
> Thanks.
>
> One idea that has occurred to me is that maybe /sys is the right place to put
> this stuff!!! If only sysfs directories could be writeable, I could write the
> pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a
> similar name.
Hmm... we already have /sys/kernel/debug as a simple mount point for
debugfs. What about adding /sys/kernel/init as a mount point for this
tmpfs?
--
Dan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
2010-02-10 15:49 ` Dan Williams
@ 2010-02-10 16:06 ` Michael Evans
[not found] ` <4877c76c1002100806w66e504deg767f6ecc8cc7fa8a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 66+ messages in thread
From: Michael Evans @ 2010-02-10 16:06 UTC (permalink / raw)
To: Dan Williams
Cc: Neil Brown, Mr. James W. Laferriere, Doug Ledford,
linux-raid maillist, initramfs, Michal Marek, Hans de Goede,
Bill Nottingham
On Wed, Feb 10, 2010 at 7:49 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Feb 9, 2010 at 6:33 PM, Neil Brown <neilb@suse.de> wrote:
>> On Tue, 9 Feb 2010 15:58:52 -0900 (AKST)
>> "Mr. James W. Laferriere" <babydr@baby-dragons.com> wrote:
>>>
>>> But that said , the idea of a /'name' area for this is imo a very good
>>> thing . Rather hiding it below others .
>>
>> Thanks.
>>
>> One idea that has occurred to me is that maybe /sys is the right place to put
>> this stuff!!! If only sysfs directories could be writeable, I could write the
>> pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a
>> similar name.
>
> Hmm... we already have /sys/kernel/debug as a simple mount point for
> debugfs. What about adding /sys/kernel/init as a mount point for this
> tmpfs?
>
> --
> Dan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Except that isn't quite accurate, is it? This has less to do with init
for the kernel and more to do with various pieces of system state
information.
/sys/early_rw
Isn't very descriptive, but might make sense. It also might not quite
be what we want to mean, as the files in it could also linger past
root unmount as the system is brought down.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
[not found] ` <4877c76c1002100806w66e504deg767f6ecc8cc7fa8a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-02-11 2:30 ` Doug Ledford
0 siblings, 0 replies; 66+ messages in thread
From: Doug Ledford @ 2010-02-11 2:30 UTC (permalink / raw)
To: Michael Evans
Cc: Dan Williams, Neil Brown, Mr. James W. Laferriere,
linux-raid maillist, initramfs-u79uwXL29TY76Z2rM5mHXA,
Michal Marek, Hans de Goede, Bill Nottingham
On 02/10/2010 11:06 AM, Michael Evans wrote:
> On Wed, Feb 10, 2010 at 7:49 AM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> On Tue, Feb 9, 2010 at 6:33 PM, Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:
>>> On Tue, 9 Feb 2010 15:58:52 -0900 (AKST)
>>> "Mr. James W. Laferriere" <babydr-hujCQpUib4khwW3g317DAQ@public.gmane.org> wrote:
>>>>
>>>> But that said , the idea of a /'name' area for this is imo a very good
>>>> thing . Rather hiding it below others .
>>>
>>> Thanks.
>>>
>>> One idea that has occurred to me is that maybe /sys is the right place to put
>>> this stuff!!! If only sysfs directories could be writeable, I could write the
>>> pid file in /sys/class/block/md0/md/mdmon.pid and create a socket with a
>>> similar name.
>>
>> Hmm... we already have /sys/kernel/debug as a simple mount point for
>> debugfs. What about adding /sys/kernel/init as a mount point for this
>> tmpfs?
>>
>> --
>> Dan
>>
>
> Except that isn't quite accurate is it? This is less to do with init
> for the kernel and more to do with various pieces of system state
> information.
>
> /sys/early_rw
>
> Isn't very descriptive, but might make sense. It also might not quite
> be what we want to mean, as the files in it could also linger past
> root unmount as the system is brought down.
Well, *no* name really fits well so far. The fact of the matter is that
we are talking about lots of different types of files, and different
lifetimes for those files. Eg: dhcp lease file, only needs to be there
until moved after root mounted; mdmon.sock file, can't be moved, but
when mdmon is restarted it will get a new file in the proper location
(unless we use this location on restart too in order to avoid shutdown
issues, although I'm not convinced we need to do this; it seems to me we
could just as easily switch to using remount ro as the norm instead of
umount and problem solved); mdmon.pid file so we know what processes to restart;
other files too that I'm not so familiar with. The only thing all these
files have in common is that they violate a core tenet of unix
philosophy/prior art. Specifically, the concept of everything as a file
in unix means that the unix kernel is not really functional without a
filesystem. Hence why unix never booted into a basic interpreter
without a disk, but instead always panicked. But, in the past, old time
unix kernels always brought up the root filesystem before doing anything
else. That is no longer true, and we are struggling to access our root
filesystem to create files when the real root filesystem does not yet
exist. That is the one thing all of these files have in common:
they are being created before the kernel is ready to deal with files
properly. So since this is specifically a kernel-not-ready thing, I
think /sys/kernel makes sense. Then I would suggest naming whatever we
put in there according to this one common trait. I could see
/sys/kernel/pre-init-tmp (or ptmp for short). If someone wanted to do
some neat kernel programming maybe we could make /sys/kernel/early-root
and allow programs to create files in there as well as directory
hierarchies, and maybe add a syscall that would actually move all the
files in here to the real root sometime after pivot root and read write
bring up are complete (that would just be cool...no manually moving
files, just bring root up r/w, clean out /var/run and any other cleanups
we do before proceeding, then do this syscall and get things moved from
early-root to the real root). Anyway, my $.02.
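Absent such a syscall, the "manually moving files" step for regular files such as a dhcp lease could look roughly like the sketch below. It assumes the early area and the final location may be on different filesystems, so rename(2) is tried first and a copy is the fallback when it fails with EXDEV. The function names are hypothetical, ownership and permissions are not preserved, and it does not apply to sockets, which (as noted above) can't be moved this way.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

/* Copy the contents of a regular file. Returns 0 on success, -1 on error;
 * a partially written destination is removed on failure. */
static int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n;
    int rv = 0;
    int in, out;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    while (rv == 0 && (n = read(in, buf, sizeof(buf))) > 0) {
        char *p = buf;

        while (n > 0) {         /* handle short writes */
            ssize_t w = write(out, p, (size_t)n);

            if (w < 0) {
                rv = -1;
                break;
            }
            p += w;
            n -= w;
        }
    }
    if (rv == 0 && n < 0)       /* read() itself failed */
        rv = -1;
    close(in);
    if (close(out) != 0)
        rv = -1;
    if (rv != 0)
        unlink(dst);
    return rv;
}

/* Hypothetical helper: move src to dst, falling back to copy + unlink
 * when the two paths are on different filesystems. */
int move_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_file(src, dst) != 0)
        return -1;
    return unlink(src);
}
```

An init script equivalent would run this once the real root is read-write and /var/run has been cleaned, for each file left in the early area.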
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 66+ messages in thread
end of thread, other threads:[~2010-02-11 2:30 UTC | newest]
Thread overview: 66+ messages
2010-01-11 20:38 Minor mdadm fixes Doug Ledford
2010-01-11 20:38 ` [[Patch mdadm] 1/5] Make the IMSM_DEVNAME_AS_SERIAL option work when creating containers. This allows a person to testing using loopback devices that don't support serial number queries Doug Ledford
2010-01-18 22:01 ` Neil Brown
2010-01-18 22:13 ` Dan Williams
2010-01-19 1:55 ` Doug Ledford
2010-01-19 4:42 ` Dan Williams
2010-01-19 5:31 ` Doug Ledford
2010-01-19 5:47 ` Dan Williams
2010-01-11 20:38 ` [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot Doug Ledford
2010-01-18 22:09 ` Neil Brown
2010-01-19 7:21 ` Luca Berra
2010-01-19 17:51 ` Doug Ledford
2010-02-01 20:32 ` Bill Davidsen
2010-02-01 21:32 ` Doug Ledford
2010-02-01 22:42 ` Bill Davidsen
2010-02-02 4:08 ` Michael Evans
2010-02-02 7:17 ` Luca Berra
2010-02-02 15:42 ` Bill Davidsen
2010-02-02 18:19 ` Doug Ledford
2010-02-04 13:50 ` Bernd Schubert
2010-02-04 15:03 ` Bernd Schubert
2010-02-04 15:48 ` Doug Ledford
2010-02-04 16:40 ` Bernd Schubert
2010-02-04 17:35 ` Doug Ledford
2010-02-02 18:11 ` Doug Ledford
2010-02-02 18:07 ` Doug Ledford
2010-02-02 18:18 ` Bill Davidsen
2010-02-04 6:40 ` Neil Brown
2010-02-04 18:45 ` Doug Ledford
[not found] ` <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-02-04 23:04 ` Dan Williams
[not found] ` <e9c3a7c21002041504w17565653m5a8b8cd90543cf1e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-05 0:21 ` Bill Davidsen
2010-02-05 12:14 ` Luca Berra
2010-02-06 17:51 ` Doug Ledford
[not found] ` <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-02-06 21:07 ` Dan Williams
[not found] ` <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-06 21:46 ` martin f krafft
2010-02-06 22:06 ` Michael Evans
2010-02-08 15:32 ` Doug Ledford
2010-02-08 21:38 ` Neil Brown
2010-02-09 0:20 ` Michael Evans
[not found] ` <20100209083838.6568cac0-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-02-09 2:19 ` martin f krafft
[not found] ` <20100209021949.GB11780-0owbi4v4jRjYceiJAzDLgeTW4wlIGRCZ@public.gmane.org>
2010-02-09 20:34 ` Doug Ledford
[not found] ` <4B71C6CA.3010407-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-02-10 0:58 ` Mr. James W. Laferriere
[not found] ` <alpine.LNX.2.01.1002091553580.10004-pIN9qAC4yfKseEBmXaVrNB5FPEiCeG3sAL8bYrjMMd8@public.gmane.org>
2010-02-10 1:33 ` Neil Brown
2010-02-10 9:46 ` Harald Hoyer
[not found] ` <20100210123321.324e5de6-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-02-10 15:49 ` Dan Williams
2010-02-10 16:06 ` Michael Evans
[not found] ` <4877c76c1002100806w66e504deg767f6ecc8cc7fa8a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-02-11 2:30 ` Doug Ledford
2010-02-09 20:30 ` Doug Ledford
2010-02-08 4:23 ` Neil Brown
2010-02-07 22:13 ` Hans de Goede
2010-02-07 23:06 ` Neil Brown
2010-02-08 3:45 ` Neil Brown
2010-02-08 16:56 ` Bill Nottingham
2010-01-11 20:38 ` [[Patch mdadm] 3/5] We don't like %02d as a metadata format specifier, it confuses us when we read the output back later Doug Ledford
2010-01-18 22:02 ` Neil Brown
2010-01-11 20:38 ` [[Patch mdadm] 4/5] When using -D --export the UUID is helpful, so print it out Doug Ledford
2010-01-18 22:03 ` Neil Brown
2010-01-11 20:38 ` [[Patch mdadm] 5/5] Fix segfault when the AUTO keyword is used in the config file Doug Ledford
2010-01-18 22:03 ` Neil Brown
2010-01-12 0:49 ` Minor mdadm fixes Mr. James W. Laferriere
2010-01-12 3:10 ` Andre Noll
2010-01-12 3:36 ` Doug Ledford
2010-01-12 4:39 ` Andre Noll
2010-01-12 4:46 ` Doug Ledford
2010-01-12 5:21 ` Andre Noll
2010-01-18 22:05 ` Neil Brown