* Partitioned arrays initially missing from /proc/partitions
@ 2006-12-01 20:53 Mike Accetta
  2007-04-23 14:56 ` David Greaves
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Accetta @ 2006-12-01 20:53 UTC (permalink / raw)
  To: linux-raid

In setting up a partitioned array as the boot disk and using a nash 
initrd to find the root file system by volume label, I see a delay in 
the appearance of the /dev/md_d0p partitions in /proc/partitions.  When 
the mdadm --assemble command completes, only /dev/md_d0 is visible. 
Since the raid partitions are not visible after the assemble, the volume 
label search will not consult them in looking for the root volume and 
the boot gets aborted. When I run a similar assemble command while up 
multi-user in a friendlier debug environment I see the same effect and 
observe that pretty much any access of /dev/md_d0 has the side effect of 
then making the /dev/md_d0p partitions visible in /proc/partitions.

I tried a few experiments changing the --assemble code in mdadm.  If I 
open() and close() /dev/md_d0 after assembly *before* closing the file 
descriptor which the assemble step used to assemble the array, there is 
no effect.  Even doing a BLKRRPART ioctl call on the assembly fd or the 
newly opened fd has no effect.  The kernel prints "unknown partition"
diagnostics on the console.  However, if the assembly fd is first 
close()'d, a simple open() of /dev/md_d0 and immediate close() of that 
fd has the side effect of making the /dev/md_d0p partitions visible and 
one sees the console disk partitioning confirmation from the kernel as well.

Adding the open()/close() after assembly within mdadm solves my problem, 
but I thought I'd raise the issue on the list as it seems there is a bug 
somewhere.  I see in the kernel md driver that the RUN_ARRAY ioctl() 
calls do_md_run() which calls md_probe() which calls add_disk() and I 
gather that this would normally have the side effect of making the 
partitions visible.  However, my experiments at user level seem to imply 
that the array isn't completely usable until the assembly file 
descriptor is closed, even on return from the ioctl(), and hence the 
kernel add_disk() isn't having the desired partitioning side effect at 
the point it is being invoked.
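
For concreteness, the workaround amounts to no more than the following
(a minimal standalone sketch of what I described; the change I actually
made lives inside mdadm's assembly path):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Once the fd used for assembly has been closed, a bare open()/close()
 * of the array device is enough to make the partitions show up in
 * /proc/partitions. */
int main(int argc, char *argv[])
{
    int fd;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s /dev/md_d0\n", argv[0]);
        return 1;
    }
    if ((fd = open(argv[1], O_RDONLY)) == -1) {
        perror("open");
        return 1;
    }
    close(fd);
    return 0;
}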

This is all with kernel 2.6.18 and mdadm 2.3.1
-- 
Mike Accetta

ECI Telecom Ltd.
Data Networking Division (previously Laurel Networks)


* Re: Partitioned arrays initially missing from /proc/partitions
  2006-12-01 20:53 Partitioned arrays initially missing from /proc/partitions Mike Accetta
@ 2007-04-23 14:56 ` David Greaves
  2007-04-23 19:31   ` Mike Accetta
  0 siblings, 1 reply; 15+ messages in thread
From: David Greaves @ 2007-04-23 14:56 UTC (permalink / raw)
  To: Mike Accetta, Neil Brown; +Cc: linux-raid

Hi Neil

I think this is a bug.

Essentially if I create an auto=part md device then I get md_d0p? partitions.
If I stop the array and just re-assemble, I don't.

It looks like the same (?) problem as Mike (see below - Mike do you have a
patch?) but I'm on 2.6.20.7 with mdadm v2.5.6

FWIW I upgraded from 2.6.16, where it worked (using in-kernel detection,
which isn't working in 2.6.20 for some reason, but I don't mind).


Here's a simple sequence of commands:

teak:~# mdadm --stop /dev/md_d0
mdadm: stopped /dev/md_d0

teak:~# mdadm --create /dev/md_d0 -l5 -n5 --bitmap=internal -e1.2 --auto=part
--name media --force /dev/sde1 /dev/sdc1 /dev/sdd1 missing /dev/sdf1
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Mon Apr 23 15:02:13 2007
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Mon Apr 23 15:02:13 2007
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Mon Apr 23 15:02:13 2007
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Mon Apr 23 15:02:13 2007
Continue creating array? y
mdadm: array /dev/md_d0 started.

teak:~# grep md /proc/partitions
 254     0 1250241792 md_d0
 254     1 1250144138 md_d0p1
 254     2      97652 md_d0p2

teak:~# mdadm --stop /dev/md_d0
mdadm: stopped /dev/md_d0

teak:~# mdadm --assemble /dev/md_d0 --auto=part  /dev/sde1 /dev/sdc1 /dev/sdd1
/dev/sdf1
mdadm: /dev/md_d0 has been started with 4 drives (out of 5).

teak:~# grep md /proc/partitions
 254     0 1250241792 md_d0


If I then run cfdisk it finds the partition table. When I write it I get:
teak:~# cfdisk /dev/md_d0

Disk has been changed.

WARNING: If you have created or modified any
DOS 6.x partitions, please see the cfdisk manual
page for additional information.
teak:~# grep md /proc/partitions
 254     0 1250241792 md_d0
 254     1 1250144138 md_d0p1
 254     2      97652 md_d0p2


and the syslog:
Apr 23 15:13:13 localhost kernel: md: md_d0 stopped.
Apr 23 15:13:13 localhost kernel: md: unbind<sde1>
Apr 23 15:13:13 localhost kernel: md: export_rdev(sde1)
Apr 23 15:13:13 localhost kernel: md: unbind<sdf1>
Apr 23 15:13:13 localhost kernel: md: export_rdev(sdf1)
Apr 23 15:13:13 localhost kernel: md: unbind<sdd1>
Apr 23 15:13:13 localhost kernel: md: export_rdev(sdd1)
Apr 23 15:13:13 localhost kernel: md: unbind<sdc1>
Apr 23 15:13:13 localhost kernel: md: export_rdev(sdc1)
Apr 23 15:13:13 localhost mdadm: DeviceDisappeared event detected on md device
/dev/md_d0
Apr 23 15:13:36 localhost kernel: md: bind<sde1>
Apr 23 15:13:36 localhost kernel: md: bind<sdc1>
Apr 23 15:13:36 localhost kernel: md: bind<sdd1>
Apr 23 15:13:36 localhost kernel: md: bind<sdf1>
Apr 23 15:13:36 localhost kernel: raid5: device sdf1 operational as raid disk 4
Apr 23 15:13:36 localhost kernel: raid5: device sdd1 operational as raid disk 2
Apr 23 15:13:36 localhost kernel: raid5: device sdc1 operational as raid disk 1
Apr 23 15:13:36 localhost kernel: raid5: device sde1 operational as raid disk 0
Apr 23 15:13:36 localhost kernel: raid5: allocated 5236kB for md_d0
Apr 23 15:13:36 localhost kernel: raid5: raid level 5 set md_d0 active with 4
out of 5 devices, algorithm 2
Apr 23 15:13:36 localhost kernel: RAID5 conf printout:
Apr 23 15:13:36 localhost kernel:  --- rd:5 wd:4
Apr 23 15:13:36 localhost kernel:  disk 0, o:1, dev:sde1
Apr 23 15:13:36 localhost kernel:  disk 1, o:1, dev:sdc1
Apr 23 15:13:36 localhost kernel:  disk 2, o:1, dev:sdd1
Apr 23 15:13:36 localhost kernel:  disk 4, o:1, dev:sdf1
Apr 23 15:13:36 localhost kernel: md_d0: bitmap initialized from disk: read 1/1
pages, set 19078 bits, status: 0
Apr 23 15:13:36 localhost kernel: created bitmap (10 pages) for device md_d0
Apr 23 15:13:36 localhost kernel:  md_d0: p1 p2
Apr 23 15:13:54 localhost kernel: md: md_d0 stopped.
Apr 23 15:13:54 localhost kernel: md: unbind<sdf1>
Apr 23 15:13:54 localhost kernel: md: export_rdev(sdf1)
Apr 23 15:13:54 localhost kernel: md: unbind<sdd1>
Apr 23 15:13:54 localhost kernel: md: export_rdev(sdd1)
Apr 23 15:13:54 localhost kernel: md: unbind<sdc1>
Apr 23 15:13:54 localhost kernel: md: export_rdev(sdc1)
Apr 23 15:13:54 localhost kernel: md: unbind<sde1>
Apr 23 15:13:54 localhost kernel: md: export_rdev(sde1)
Apr 23 15:13:54 localhost mdadm: DeviceDisappeared event detected on md device
/dev/md_d0
Apr 23 15:14:04 localhost kernel: md: md_d0 stopped.
Apr 23 15:14:04 localhost kernel: md: bind<sdc1>
Apr 23 15:14:04 localhost kernel: md: bind<sdd1>
Apr 23 15:14:04 localhost kernel: md: bind<sdf1>
Apr 23 15:14:04 localhost kernel: md: bind<sde1>
Apr 23 15:14:04 localhost kernel: raid5: device sde1 operational as raid disk 0
Apr 23 15:14:04 localhost kernel: raid5: device sdf1 operational as raid disk 4
Apr 23 15:14:04 localhost kernel: raid5: device sdd1 operational as raid disk 2
Apr 23 15:14:04 localhost kernel: raid5: device sdc1 operational as raid disk 1
Apr 23 15:14:04 localhost kernel: raid5: allocated 5236kB for md_d0
Apr 23 15:14:04 localhost kernel: raid5: raid level 5 set md_d0 active with 4
out of 5 devices, algorithm 2
Apr 23 15:14:04 localhost kernel: RAID5 conf printout:
Apr 23 15:14:04 localhost kernel:  --- rd:5 wd:4
Apr 23 15:14:04 localhost kernel:  disk 0, o:1, dev:sde1
Apr 23 15:14:04 localhost kernel:  disk 1, o:1, dev:sdc1
Apr 23 15:14:04 localhost kernel:  disk 2, o:1, dev:sdd1
Apr 23 15:14:04 localhost kernel:  disk 4, o:1, dev:sdf1
Apr 23 15:14:04 localhost kernel: md_d0: bitmap initialized from disk: read 1/1
pages, set 0 bits, status: 0
Apr 23 15:14:04 localhost kernel: created bitmap (10 pages) for device md_d0
Apr 23 15:14:04 localhost kernel:  md_d0: unknown partition table

after cfdisk write:
Apr 23 15:33:00 localhost kernel:  md_d0: p1 p2


Back in Dec 2006, Mike Accetta wrote:
> In setting up a partitioned array as the boot disk and using a nash
> initrd to find the root file system by volume label, I see a delay in
> the appearance of the /dev/md_d0p partitions in /proc/partitions.  When
> the mdadm --assemble command completes, only /dev/md_d0 is visible.
> Since the raid partitions are not visible after the assemble, the volume
> label search will not consult them in looking for the root volume and
> the boot gets aborted. When I run a similar assemble command while up
> multi-user in a friendlier debug environment I see the same effect and
> observe that pretty much any access of /dev/md_d0 has the side effect of
> then making the /dev/md_d0p partitions visible in /proc/partitions.
>
> I tried a few experiments changing the --assemble code in mdadm.  If I
> open() and close() /dev/md_d0 after assembly *before* closing the file
> descriptor which the assemble step used to assemble the array, there is
> no effect.  Even doing a BLKRRPART ioctl call on the assembly fd or the
> newly opened fd has no effect.  The kernel prints "unknown partition"
> diagnostics on the console.  However, if the assembly fd is first
> close()'d, a simple open() of /dev/md_d0 and immediate close() of that
> fd has the side effect of making the /dev/md_d0p partitions visible and
> one sees the console disk partitioning confirmation from the kernel as
> well.
>
> Adding the open()/close() after assembly within mdadm solves my problem,
> but I thought I'd raise the issue on the list as it seems there is a bug
> somewhere.  I see in the kernel md driver that the RUN_ARRAY ioctl()
> calls do_md_run() which calls md_probe() which calls add_disk() and I
> gather that this would normally have the side effect of making the
> partitions visible.  However, my experiments at user level seem to imply
> that the array isn't completely usable until the assembly file
> descriptor is closed, even on return from the ioctl(), and hence the
> kernel add_disk() isn't having the desired partitioning side effect at
> the point it is being invoked.
>
> This is all with kernel 2.6.18 and mdadm 2.3.1



* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-23 14:56 ` David Greaves
@ 2007-04-23 19:31   ` Mike Accetta
  2007-04-23 23:52     ` Neil Brown
  2007-04-24  9:37     ` David Greaves
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Accetta @ 2007-04-23 19:31 UTC (permalink / raw)
  To: David Greaves; +Cc: Neil Brown, linux-raid

David Greaves writes:

...
> It looks like the same (?) problem as Mike (see below - Mike do you have a
> patch?) but I'm on 2.6.20.7 with mdadm v2.5.6
...

We have since started assembling the array from the initrd using
--homehost and --auto-update-homehost which takes a different path through
the code, and in this path the kernel figures out there are partitions
on the array before mdadm exits.

For the previous code path, we had been running with the patch I described
in my original post which I've included below.  I'd guess that the bug
is actually in the kernel code and I looked at it briefly but couldn't
figure out how things all fit together well enough to come up with a
patch there.  The user level patch is a bit of a hack and there may be
other code paths that also need a similar patch.  I only made this patch
in the assembly code path we were executing at the time.

==== BUILD/mdadm/mdadm.c#2 (text) - BUILD/mdadm/mdadm.c#3 (text) ==== content
@@ -983,6 +983,10 @@
                                                               NULL,
                                                               readonly, runstop, NULL, verbose-quiet, force);
                                        close(mdfd);
+                                       mdfd = open(array_list->devname, O_RDONLY); 
+                                       if (mdfd >= 0) {
+                                           close(mdfd);
+                                       }
                                }
                }
                break;
--
Mike Accetta

ECI Telecom Ltd.
Data Networking Division (previously Laurel Networks)


* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-23 19:31   ` Mike Accetta
@ 2007-04-23 23:52     ` Neil Brown
  2007-04-24  9:22       ` David Greaves
  2007-04-24 10:49       ` David Greaves
  2007-04-24  9:37     ` David Greaves
  1 sibling, 2 replies; 15+ messages in thread
From: Neil Brown @ 2007-04-23 23:52 UTC (permalink / raw)
  To: Mike Accetta; +Cc: David Greaves, linux-raid


This problem is very hard to solve inside the kernel.
The partitions will not be visible until the array is opened *after*
it has been created.  Making the partitions visible before that would
be possible, but would not be easy.

I think the best solution is Mike's solution which is to simply
open/close the array after it has been assembled.  I will make sure
this is in the next release of mdadm.

Note that you can still access the partitions even though they do not
appear in /proc/partitions.  Any attempt to access any of them will
make them all appear in /proc/partitions.  But I understand there is
sometimes value in seeing them before accessing them.

NeilBrown


* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-23 23:52     ` Neil Brown
@ 2007-04-24  9:22       ` David Greaves
  2007-04-24 10:57         ` Neil Brown
  2007-04-24 10:49       ` David Greaves
  1 sibling, 1 reply; 15+ messages in thread
From: David Greaves @ 2007-04-24  9:22 UTC (permalink / raw)
  To: Neil Brown; +Cc: Mike Accetta, linux-raid

Neil Brown wrote:
> This problem is very hard to solve inside the kernel.
> The partitions will not be visible until the array is opened *after*
> it has been created.  Making the partitions visible before that would
> be possible, but would not be easy.
> 
> I think the best solution is Mike's solution which is to simply
> open/close the array after it has been assembled.  I will make sure
> this is in the next release of mdadm.
> 
> Note that you can still access the partitions even though they do not
> appear in /proc/partitions. Any attempt to access any of them will
> make them all appear in /proc/partitions.  But I understand there is
> sometimes value in seeing them before accessing them.
> 
> NeilBrown

Um. Are you sure?
The reason I noticed is that I couldn't mount them until they appeared; see
these cut'n'pastes from my terminal history:

teak:~# mount /media/
mount: /dev/md_d0p1 is not a valid block device

teak:~# mount /dev/md_d0p1 /media
mount: you must specify the filesystem type

teak:~# xfs_repair -ln /dev/md_d0p2 /dev/md_d0p1
Usage: xfs_repair [-nLvV] [-o subopt[=value]] [-l logdev] [-r rtdev] devname
teak:~# ll /dev/md*
brw-rw---- 1 root disk 254, 0 2007-04-23 15:44 /dev/md_d0
brw-rw---- 1 root disk 254, 1 2007-04-23 14:46 /dev/md_d0p1
brw-rw---- 1 root disk 254, 2 2007-04-23 14:46 /dev/md_d0p2
brw-rw---- 1 root disk 254, 3 2007-04-23 15:44 /dev/md_d0p3
brw-rw---- 1 root disk 254, 4 2007-04-23 15:44 /dev/md_d0p4

/dev/md:
total 0
teak:~# /etc/init.d/mdadm-raid stop
Stopping MD array md_d0...done (stopped).
teak:~# /etc/init.d/mdadm-raid start
Assembling MD array md_d0...done (degraded [4/5]).
Generating udev events for MD arrays...done.
teak:~# cfdisk /dev/md_d0

teak:~# mount /dev/md_d0p1
mount: /dev/md_d0p1 is not a valid block device

and so on...

Notice the cfdisk command above. I did this to check the on-array table (it was
good). I assume cfdisk opens the array - but the partitions were still not there
afterwards. I did not do a 'Write' from in cfdisk this time.

I wouldn't be so concerned about a cosmetic thing in /proc/partitions - the problem
is that I can't mount my array after doing an assemble and I have to --create
each time - not the nicest solution.

Oh, I'm using udev FWIW.

David



* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-23 19:31   ` Mike Accetta
  2007-04-23 23:52     ` Neil Brown
@ 2007-04-24  9:37     ` David Greaves
  2007-04-24  9:46       ` David Greaves
  1 sibling, 1 reply; 15+ messages in thread
From: David Greaves @ 2007-04-24  9:37 UTC (permalink / raw)
  To: Mike Accetta, Neil Brown; +Cc: linux-raid

Mike Accetta wrote:
> David Greaves writes:
> 
> ...
>> It looks like the same (?) problem as Mike (see below - Mike do you have a
>> patch?) but I'm on 2.6.20.7 with mdadm v2.5.6
> ...
> 
> We have since started assembling the array from the initrd using
> --homehost and --auto-update-homehost which takes a different path through
> the code, and in this path the kernel figures out there are partitions
> on the array before mdadm exits.
Just tried that - doesn't work :)


> For the previous code path, we had been running with the patch I described
> in my original post which I've included below.  I'd guess that the bug
> is actually in the kernel code and I looked at it briefly but couldn't
> figure out how things all fit together well enough to come up with a
> patch there.  The user level patch is a bit of a hack and there may be
> other code paths that also need a similar patch.  I only made this patch
> in the assembly code path we were executing at the time.
> 
> ==== BUILD/mdadm/mdadm.c#2 (text) - BUILD/mdadm/mdadm.c#3 (text) ==== content
> @@ -983,6 +983,10 @@
>                                                                NULL,
>                                                                readonly, runstop, NULL, verbose-quiet, force);
>                                         close(mdfd);
> +                                       mdfd = open(array_list->devname, O_RDONLY); 
> +                                       if (mdfd >= 0) {
> +                                           close(mdfd);
> +                                       }
>                                 }
>                 }
>                 break;

Thanks Mike

But this doesn't work for me either :(

I changed array_list to devlist in line with 2.6.9 and it compiles and runs OK.

teak:~# mdadm --stop /dev/md_d0
mdadm: stopped /dev/md_d0
teak:~# /everything/devel/mdadm/mdadm-2.5.6/mdadm --assemble /dev/md_d0
/dev/sd[bcdef]1
mdadm: With Fudge.
mdadm: /dev/md_d0 has been started with 5 drives.
mdadm: Fudging partition creation.
teak:~# mount /media
mount: /dev/md_d0p1 is not a valid block device
teak:~#

I also wrote a small C program to call the RAID_AUTORUN ioctl - that didn't work
either because I'd compiled RAID as a module so the ioctl isn't defined.
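
Something along these lines - a reconstruction rather than the exact
program, assuming the RAID_AUTORUN ioctl from <linux/raid/md_u.h>:

#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#include <linux/major.h>        /* MD_MAJOR, needed by md_u.h */
#include <linux/raid/md_u.h>    /* RAID_AUTORUN */

/* Ask the kernel to autostart arrays; only available when md is
 * built in, which is why it fails with md compiled as a module. */
int main(int argc, char *argv[])
{
    int fd;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <md device>\n", argv[0]);
        return 1;
    }
    if ((fd = open(argv[1], O_RDONLY)) == -1) {
        perror("open");
        return 1;
    }
    if (ioctl(fd, RAID_AUTORUN, 0) != 0)
        perror("RAID_AUTORUN ioctl");
    close(fd);
    return 0;
}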

currently recompiling the kernel to allow autorun...

David


* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-24  9:37     ` David Greaves
@ 2007-04-24  9:46       ` David Greaves
  0 siblings, 0 replies; 15+ messages in thread
From: David Greaves @ 2007-04-24  9:46 UTC (permalink / raw)
  To: linux-raid; +Cc: Neil Brown

David Greaves wrote:

> currently recompiling the kernel to allow autorun...

Which of course won't work because I'm on 1.2 superblocks:
md: Autodetecting RAID arrays.
md: invalid raid superblock magic on sdb1
md: sdb1 has invalid sb, not importing!
md: invalid raid superblock magic on sdc1
md: sdc1 has invalid sb, not importing!
md: invalid raid superblock magic on sdd1
md: sdd1 has invalid sb, not importing!
md: invalid raid superblock magic on sde1
md: sde1 has invalid sb, not importing!
md: invalid raid superblock magic on sdf1
md: sdf1 has invalid sb, not importing!
md: autorun ...
md: ... autorun DONE.


David

PS Dropped Mike from cc since I doubt he's too interested :)


* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-23 23:52     ` Neil Brown
  2007-04-24  9:22       ` David Greaves
@ 2007-04-24 10:49       ` David Greaves
  2007-04-24 11:38         ` Neil Brown
  1 sibling, 1 reply; 15+ messages in thread
From: David Greaves @ 2007-04-24 10:49 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> This problem is very hard to solve inside the kernel.
> The partitions will not be visible until the array is opened *after*
> it has been created.  Making the partitions visible before that would
> be possible, but would not be easy.
> 
> I think the best solution is Mike's solution which is to simply
> open/close the array after it has been assembled.  I will make sure
> this is in the next release of mdadm.
> 
> Note that you can still access the partitions even though they do not
> appear in /proc/partitions.  Any attempt to access any of them will
> make them all appear in /proc/partitions.  But I understand there is
> sometimes value in seeing them before accessing them.
> 
> NeilBrown

For anyone else who is in this boat and doesn't fancy finding somewhere in
mdadm to hack, here's a simple program that issues the BLKRRPART ioctl.
This re-reads the block device partition table and 'works for me'.

I think partx -a would do the same job but for some reason partx isn't in
util-linux for Debian...

Neil, isn't it easy to just do this after an assemble?

David


[-- Attachment #2: raid_readparts.c --]
[-- Type: text/x-csrc, Size: 642 bytes --]

#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#include <linux/fs.h>           /* BLKRRPART */

int main(int argc, char *argv[])
{
    int fd;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <md device>\n", argv[0]);
        return 1;
    }

    if ((fd = open(argv[1], O_RDONLY)) == -1) {
        fprintf(stderr, "Can't open md device %s\n", argv[1]);
        return 1;
    }

    /* Ask the kernel to re-read the partition table on this device. */
    if (ioctl(fd, BLKRRPART, NULL) != 0) {
        fprintf(stderr, "ioctl failed\n");
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}
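
Compile and run it against the whole-array device, e.g. (same device
name as in the transcripts above):

teak:~# gcc -o raid_readparts raid_readparts.c
teak:~# ./raid_readparts /dev/md_d0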



* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-24  9:22       ` David Greaves
@ 2007-04-24 10:57         ` Neil Brown
  2007-04-24 12:00           ` David Greaves
  0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2007-04-24 10:57 UTC (permalink / raw)
  To: David Greaves; +Cc: Mike Accetta, linux-raid

On Tuesday April 24, david@dgreaves.com wrote:
> Neil Brown wrote:
> > This problem is very hard to solve inside the kernel.
> > The partitions will not be visible until the array is opened *after*
> > it has been created.  Making the partitions visible before that would
> > be possible, but would not be easy.
> > 
> > I think the best solution is Mike's solution which is to simply
> > open/close the array after it has been assembled.  I will make sure
> > this is in the next release of mdadm.
> > 
> > Note that you can still access the partitions even though they do not
> > appear in /proc/partitions. Any attempt to access any of them will
> > make them all appear in /proc/partitions.  But I understand there is
> > sometimes value in seeing them before accessing them.
> > 
> > NeilBrown
> 
> Um. Are you sure?

"Works for me".

What happens if you
  blockdev --rereadpt /dev/md_d0
?? It probably works then.
It sounds like someone is deliberately removing all the partition
info.

Can you try this patch and see if it reports anyone calling
'2' on md_d0 ??

diff .prev/block/ioctl.c ./block/ioctl.c
--- .prev/block/ioctl.c	2007-04-17 11:42:15.000000000 +1000
+++ ./block/ioctl.c	2007-04-24 20:55:41.000000000 +1000
@@ -17,6 +17,7 @@ static int blkpg_ioctl(struct block_devi
 	long long start, length;
 	int part;
 	int i;
+	char b[BDEVNAME_SIZE];
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
@@ -30,6 +31,8 @@ static int blkpg_ioctl(struct block_devi
 	part = p.pno;
 	if (part <= 0 || part >= disk->minors)
 		return -EINVAL;
+	printk("blkpg_ioctl: %s called %d on %s\n",
+	       current->comm, a.op, bdevname(bdev, b));
 	switch (a.op) {
 		case BLKPG_ADD_PARTITION:
 			start = p.start >> 9;




* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-24 10:49       ` David Greaves
@ 2007-04-24 11:38         ` Neil Brown
  2007-04-24 12:32           ` David Greaves
  2007-04-24 15:39           ` Doug Ledford
  0 siblings, 2 replies; 15+ messages in thread
From: Neil Brown @ 2007-04-24 11:38 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-raid

On Tuesday April 24, david@dgreaves.com wrote:
> Neil Brown wrote:
> > This problem is very hard to solve inside the kernel.
> > The partitions will not be visible until the array is opened *after*
> > it has been created.  Making the partitions visible before that would
> > be possible, but would not be easy.
> > 
> > I think the best solution is Mike's solution which is to simply
> > open/close the array after it has been assembled.  I will make sure
> > this is in the next release of mdadm.
> > 
> > Note that you can still access the partitions even though they do not
> > appear in /proc/partitions.  Any attempt to access any of them will
> > make them all appear in /proc/partitions.  But I understand there is
> > sometimes value in seeing them before accessing them.
> > 
> > NeilBrown
> 
> For anyone else who is in this boat and doesn't fancy finding somewhere in mdadm
>  to hack, here's a simple program that issues the BLKRRPART ioctl.
> This re-reads the block device partition table and 'works for me'.

blockdev --rereadpt /dev/md_d0
does the same thing.

> 
> I think partx -a would do the same job but for some reason partx isn't in
> utils-linux for Debian...
> 
> Neil, isn't it easy to just do this after an assemble?

Yes, but it should not be needed, and I'd like to understand why it
is.
One of the last things do_md_run does is
   mddev->changed = 1;

When you next open /dev/md_d0, md_open is called which calls
check_disk_change().
This will call into md_fops->md_media_changed which will return the
value of mddev->changed, which will be '1'.
So check_disk_change will then call md_fops->revalidate_disk which
will set mddev->changed to 0, and will then set bd_invalidated to 1
(as bd_disk->minors > 1 (being 64)).

md_open will then return into do_open (in fs/block_dev.c) and because
bd_invalidated is true, it will call rescan_partitions and the
partitions will appear.

Hmmm... there is room for a race there.  If some other process opens
/dev/md_d0 before mdadm gets to close it, it will call
rescan_partitions before first calling  bd_set_size to update the size
of the bdev.  So when we try to read the partition table, it will
appear to be reading past the EOF, and will not actually read
anything..

I guess udev must be opening the block device at exactly the wrong
time. 

I can simulate this by holding /dev/md_d0 open while assembling the
array.  If I do that, the partitions don't get created.
Yuck.
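
A sketch of that simulation, for anyone who wants to reproduce it: hold
the device node open from a second process while mdadm assembles the
array (device name illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hold the md device node open so that a concurrent assembly hits the
 * race described above: the partition rescan then runs before
 * bd_set_size has updated the bdev size, and reads nothing. */
int main(void)
{
    int fd = open("/dev/md_d0", O_RDONLY);

    if (fd == -1) {
        perror("open /dev/md_d0");
        return 1;
    }
    pause();    /* keep the device open until the process is killed */
    return 0;
}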

Maybe I could call bd_set_size in md_open before calling
check_disk_change..

Yep, this patch seems to fix it.  Could you confirm?

Thanks,

NeilBrown

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-04-17 11:42:15.000000000 +1000
+++ ./drivers/md/md.c	2007-04-24 21:29:51.000000000 +1000
@@ -4485,6 +4485,8 @@ static int md_open(struct inode *inode, 
 	mddev_get(mddev);
 	mddev_unlock(mddev);
 
+	if (mddev->changed)
+		bd_set_size(inode->i_bdev, mddev->array_size << 1);
 	check_disk_change(inode->i_bdev);
  out:
 	return err;



* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-24 10:57         ` Neil Brown
@ 2007-04-24 12:00           ` David Greaves
  0 siblings, 0 replies; 15+ messages in thread
From: David Greaves @ 2007-04-24 12:00 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> On Tuesday April 24, david@dgreaves.com wrote:
>> Neil Brown wrote:
>>> This problem is very hard to solve inside the kernel.
>>> The partitions will not be visible until the array is opened *after*
>>> it has been created.  Making the partitions visible before that would
>>> be possible, but would not be easy.
>>>
>>> I think the best solution is Mike's solution which is to simply
>>> open/close the array after it has been assembled.  I will make sure
>>> this is in the next release of mdadm.
>>>
>>> Note that you can still access the partitions even though they do not
>>> appear in /proc/partitions. Any attempt to access any of them will
>>> make them all appear in /proc/partitions.  But I understand there is
>>> sometimes value in seeing them before accessing them.
>>>
>>> NeilBrown
>> Um. Are you sure?
> 
> "Works for me".
Lucky you ;)

> What happens if you
>   blockdev --rereadpt /dev/md_d0
> ?? It probably works then.
Well, that's probably the same as my BLKRRPART ioctl so I guess yes.
[confirmed - yes, but blockdev seems to do it twice - I get 2 kernel messages]

> It sounds like someone is deliberately removing all the partition
> info.
Gremlins?

> Can you try this patch and see if it reports anyone calling
> '2' on md_d0 ??

Nope, not being called at all.

teak:~# mdadm --assemble /dev/md_d0 --auto=parts /dev/sd[bcdef]1
mdadm: /dev/md_d0 has been started with 5 drives.

dmesg:
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdb1>
md: bind<sdf1>
md: bind<sde1>
raid5: device sde1 operational as raid disk 0
raid5: device sdf1 operational as raid disk 4
raid5: device sdb1 operational as raid disk 3
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: allocated 5236kB for md_d0
raid5: raid level 5 set md_d0 active with 5 out of 5 devices, algorithm 2
RAID5 conf printout:
 --- rd:5 wd:5
 disk 0, o:1, dev:sde1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdb1
 disk 4, o:1, dev:sdf1
md_d0: bitmap initialized from disk: read 1/1 pages, set 0 bits, status: 0
created bitmap (10 pages) for device md_d0


teak:~# mount /media
mount: special device /dev/md_d0p1 does not exist

no dmesg


teak:~# blockdev --rereadpt /dev/md_d0
dmesg:
 md_d0: p1 p2
 md_d0: p1 p2


Did I mention 2.6.20.7 and mdadm v2.5.6 and udev?

I'd be happy if I've done something wrong...

anyway, more config data...

teak:~# mdadm --detail /dev/md_d0
/dev/md_d0:
        Version : 01.02.03
  Creation Time : Mon Apr 23 15:13:35 2007
     Raid Level : raid5
     Array Size : 1250241792 (1192.32 GiB 1280.25 GB)
    Device Size : 625120896 (298.08 GiB 320.06 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 24 12:49:26 2007
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : media
           UUID : f7835ba6:e38b6feb:c0cd2e2d:3079db59
         Events : 25292

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       5       8       17        3      active sync   /dev/sdb1
       4       8       81        4      active sync   /dev/sdf1
teak:~# cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md_d0 auto=part level=raid5 num-devices=5
UUID=f7835ba6:e38b6feb:c0cd2e2d:3079db59
MAILADDR david@dgreaves.com



David


* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-24 11:38         ` Neil Brown
@ 2007-04-24 12:32           ` David Greaves
  2007-05-07  8:28             ` David Greaves
  2007-04-24 15:39           ` Doug Ledford
  1 sibling, 1 reply; 15+ messages in thread
From: David Greaves @ 2007-04-24 12:32 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> On Tuesday April 24, david@dgreaves.com wrote:
>> Neil, isn't it easy to just do this after an assemble?
> 
> Yes, but it should not be needed, and I'd like to understand why it
> is.
> One of the last things do_md_run does is
>    mddev->changed = 1;
> 
> When you next open /dev/md_d0, md_open is called which calls
> check_disk_change().
> This will call into md_fops->md_media_changed which will return the
> value of mddev->changed, which will be '1'.
> So check_disk_change will then call md_fops->revalidate_disk which
> will set mddev->changed to 0, and will then set bd_invalidated to 1
> (as bd_disk->minors > 1 (being 64)).
> 
> md_open will then return into do_open (in fs/block_dev.c) and because
> bd_invalidated is true, it will call rescan_partitions and the
> partitions will appear.
> 
> Hmmm... there is room for a race there.  If some other process opens
> /dev/md_d0 before mdadm gets to close it, it will call
> rescan_partitions before first calling  bd_set_size to update the size
> of the bdev.  So when we try to read the partition table, it will
> appear to be reading past the EOF, and will not actually read
> anything..
> 
> I guess udev must be opening the block device at exactly the wrong
> time. 
> 
> I can simulate this by holding /dev/md_d0 open while assembling the
> array.  If I do that, the partitions don't get created.
> Yuck.
> 
> Maybe I could call bd_set_size in md_open before calling
> check_disk_change..
> 
> Yep, this patch seems to fix it.  Could you confirm?
almost...

teak:~# mdadm --assemble /dev/md_d0 --auto=parts /dev/sd[bcdef]1
mdadm: /dev/md_d0 has been started with 5 drives.
teak:~# mount /media
teak:~# umount /media
teak:~# mdadm --stop /dev/md_d0
mdadm: stopped /dev/md_d0
teak:~# mdadm --assemble /dev/md_d0 --auto=parts /dev/sd[bcdef]1
mdadm: /dev/md_d0 has been started with 5 drives.
teak:~# mount /media
mount: No such file or directory
teak:~# mount /media
teak:~#
(second mount succeeds second time around)



md: md_d0 stopped.
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdb1>
md: bind<sdf1>
md: bind<sde1>
raid5: device sde1 operational as raid disk 0
raid5: device sdf1 operational as raid disk 4
raid5: device sdb1 operational as raid disk 3
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: allocated 5236kB for md_d0
raid5: raid level 5 set md_d0 active with 5 out of 5 devices, algorithm 2
RAID5 conf printout:
 --- rd:5 wd:5
 disk 0, o:1, dev:sde1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdb1
 disk 4, o:1, dev:sdf1
md_d0: bitmap initialized from disk: read 1/1 pages, set 0 bits, status: 0
created bitmap (10 pages) for device md_d0
 md_d0: p1 p2
Filesystem "md_d0p1": Disabling barriers, not supported with external log device
XFS mounting filesystem md_d0p1
Ending clean XFS mount for filesystem: md_d0p1
md: md_d0 stopped.
md: unbind<sde1>
md: export_rdev(sde1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: md_d0 stopped.
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdb1>
md: bind<sdf1>
md: bind<sde1>
raid5: device sde1 operational as raid disk 0
raid5: device sdf1 operational as raid disk 4
raid5: device sdb1 operational as raid disk 3
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: allocated 5236kB for md_d0
raid5: raid level 5 set md_d0 active with 5 out of 5 devices, algorithm 2
RAID5 conf printout:
 --- rd:5 wd:5
 disk 0, o:1, dev:sde1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sdb1
 disk 4, o:1, dev:sdf1
md_d0: bitmap initialized from disk: read 1/1 pages, set 0 bits, status: 0
created bitmap (10 pages) for device md_d0
 md_d0: p1 p2
XFS: Invalid device [/dev/md_d0p2], error=-2
Filesystem "md_d0p1": Disabling barriers, not supported with external log device
XFS mounting filesystem md_d0p1
Ending clean XFS mount for filesystem: md_d0p1




* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-24 11:38         ` Neil Brown
  2007-04-24 12:32           ` David Greaves
@ 2007-04-24 15:39           ` Doug Ledford
  1 sibling, 0 replies; 15+ messages in thread
From: Doug Ledford @ 2007-04-24 15:39 UTC (permalink / raw)
  To: Neil Brown; +Cc: David Greaves, linux-raid

Neil Brown wrote:
>
> Yes, but it should not be needed, and I'd like to understand why it
> is.
> One of the last things do_md_run does is
>    mddev->changed = 1;
> 
> When you next open /dev/md_d0, md_open is called which calls
> check_disk_change().
> This will call into md_fops->md_media_changed which will return the
> value of mddev->changed, which will be '1'.
> So check_disk_change will then call md_fops->revalidate_disk which
> will set mddev->changed to 0, and will then set bd_invalidated to 1
> (as bd_disk->minors > 1 (being 64)).
> 
> md_open will then return into do_open (in fs/block_dev.c) and because
> bd_invalidated is true, it will call rescan_partitions and the
> partitions will appear.

Yuck.  The md stack should populate the partition information on device 
creation *without* needing someone to open the resulting device.  That 
you can tweak mdadm to open the device after creation is fine, but 
unless no other program is allowed to use the ioctls to start devices, 
and unless this is a documented part of the API, waiting until second 
open to populate the device info is just flat wrong.  It breaks all 
sorts of expectations people have regarding things like mount by label, etc.

> Hmmm... there is room for a race there.  If some other process opens
> /dev/md_d0 before mdadm gets to close it, it will call
> rescan_partitions before first calling  bd_set_size to update the size
> of the bdev.  So when we try to read the partition table, it will
> appear to be reading past the EOF, and will not actually read
> anything..
> 
> I guess udev must be opening the block device at exactly the wrong
> time. 
> 
> I can simulate this by holding /dev/md_d0 open while assembling the
> array.  If I do that, the partitions don't get created.
> Yuck.
> 
> Maybe I could call bd_set_size in md_open before calling
> check_disk_change..
> 
> Yep, this patch seems to fix it.  Could you confirm?
> 
> Thanks,
> 
> NeilBrown
> 
> diff .prev/drivers/md/md.c ./drivers/md/md.c
> --- .prev/drivers/md/md.c	2007-04-17 11:42:15.000000000 +1000
> +++ ./drivers/md/md.c	2007-04-24 21:29:51.000000000 +1000
> @@ -4485,6 +4485,8 @@ static int md_open(struct inode *inode, 
>  	mddev_get(mddev);
>  	mddev_unlock(mddev);
>  
> +	if (mddev->changed)
> +		bd_set_size(inode->i_bdev, mddev->array_size << 1);
>  	check_disk_change(inode->i_bdev);
>   out:
>  	return err;
> 


-- 
Doug Ledford <dledford@redhat.com>
http://people.redhat.com/dledford

Infiniband specific RPMs can be found at
http://people.redhat.com/dledford/Infiniband


* Re: Partitioned arrays initially missing from /proc/partitions
  2007-04-24 12:32           ` David Greaves
@ 2007-05-07  8:28             ` David Greaves
  2007-05-07  9:01               ` Neil Brown
  0 siblings, 1 reply; 15+ messages in thread
From: David Greaves @ 2007-05-07  8:28 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Doug Ledford

Hi Neil

Just wondering what the status is here - do you need any more from me or is it
on your stack?

The patch helped but didn't cure.
After a clean boot it mounted correctly first try.
Then I unmounted, stopped and re-assembled the array.
The next mount failed.
The subsequent mount succeeded.

How do other block devices initialise their partitions on 'discovery'?

David

David Greaves wrote:
> Neil Brown wrote:
>> On Tuesday April 24, david@dgreaves.com wrote:
>>> Neil, isn't it easy to just do this after an assemble?
>> Yes, but it should not be needed, and I'd like to understand why it
>> is.
>> One of the last things do_md_run does is
>>    mddev->changed = 1;
>>
>> When you next open /dev/md_d0, md_open is called which calls
>> check_disk_change().
>> This will call into md_fops->md_media_changed which will return the
>> value of mddev->changed, which will be '1'.
>> So check_disk_change will then call md_fops->revalidate_disk which
>> will set mddev->changed to 0, and will then set bd_invalidated to 1
>> (as bd_disk->minors > 1 (being 64)).
>>
>> md_open will then return into do_open (in fs/block_dev.c) and because
>> bd_invalidated is true, it will call rescan_partitions and the
>> partitions will appear.
>>
>> Hmmm... there is room for a race there.  If some other process opens
>> /dev/md_d0 before mdadm gets to close it, it will call
>> rescan_partitions before first calling  bd_set_size to update the size
>> of the bdev.  So when we try to read the partition table, it will
>> appear to be reading past the EOF, and will not actually read
>> anything..
>>
>> I guess udev must be opening the block device at exactly the wrong
>> time. 
>>
>> I can simulate this by holding /dev/md_d0 open while assembling the
>> array.  If I do that, the partitions don't get created.
>> Yuck.
>>
>> Maybe I could call bd_set_size in md_open before calling
>> check_disk_change..
>>
>> Yep, this patch seems to fix it.  Could you confirm?
> almost...
> 
> teak:~# mdadm --assemble /dev/md_d0 --auto=parts /dev/sd[bcdef]1
> mdadm: /dev/md_d0 has been started with 5 drives.
> teak:~# mount /media
> teak:~# umount /media
> teak:~# mdadm --stop /dev/md_d0
> mdadm: stopped /dev/md_d0
> teak:~# mdadm --assemble /dev/md_d0 --auto=parts /dev/sd[bcdef]1
> mdadm: /dev/md_d0 has been started with 5 drives.
> teak:~# mount /media
> mount: No such file or directory
> teak:~# mount /media
> teak:~#
> (second mount succeeds second time around)
> 
> 
> 
> md: md_d0 stopped.
> md: bind<sdc1>
> md: bind<sdd1>
> md: bind<sdb1>
> md: bind<sdf1>
> md: bind<sde1>
> raid5: device sde1 operational as raid disk 0
> raid5: device sdf1 operational as raid disk 4
> raid5: device sdb1 operational as raid disk 3
> raid5: device sdd1 operational as raid disk 2
> raid5: device sdc1 operational as raid disk 1
> raid5: allocated 5236kB for md_d0
> raid5: raid level 5 set md_d0 active with 5 out of 5 devices, algorithm 2
> RAID5 conf printout:
>  --- rd:5 wd:5
>  disk 0, o:1, dev:sde1
>  disk 1, o:1, dev:sdc1
>  disk 2, o:1, dev:sdd1
>  disk 3, o:1, dev:sdb1
>  disk 4, o:1, dev:sdf1
> md_d0: bitmap initialized from disk: read 1/1 pages, set 0 bits, status: 0
> created bitmap (10 pages) for device md_d0
>  md_d0: p1 p2
> Filesystem "md_d0p1": Disabling barriers, not supported with external log device
> XFS mounting filesystem md_d0p1
> Ending clean XFS mount for filesystem: md_d0p1
> md: md_d0 stopped.
> md: unbind<sde1>
> md: export_rdev(sde1)
> md: unbind<sdf1>
> md: export_rdev(sdf1)
> md: unbind<sdb1>
> md: export_rdev(sdb1)
> md: unbind<sdd1>
> md: export_rdev(sdd1)
> md: unbind<sdc1>
> md: export_rdev(sdc1)
> md: md_d0 stopped.
> md: bind<sdc1>
> md: bind<sdd1>
> md: bind<sdb1>
> md: bind<sdf1>
> md: bind<sde1>
> raid5: device sde1 operational as raid disk 0
> raid5: device sdf1 operational as raid disk 4
> raid5: device sdb1 operational as raid disk 3
> raid5: device sdd1 operational as raid disk 2
> raid5: device sdc1 operational as raid disk 1
> raid5: allocated 5236kB for md_d0
> raid5: raid level 5 set md_d0 active with 5 out of 5 devices, algorithm 2
> RAID5 conf printout:
>  --- rd:5 wd:5
>  disk 0, o:1, dev:sde1
>  disk 1, o:1, dev:sdc1
>  disk 2, o:1, dev:sdd1
>  disk 3, o:1, dev:sdb1
>  disk 4, o:1, dev:sdf1
> md_d0: bitmap initialized from disk: read 1/1 pages, set 0 bits, status: 0
> created bitmap (10 pages) for device md_d0
>  md_d0: p1 p2
> XFS: Invalid device [/dev/md_d0p2], error=-2
> Filesystem "md_d0p1": Disabling barriers, not supported with external log device
> XFS mounting filesystem md_d0p1
> Ending clean XFS mount for filesystem: md_d0p1
> 
> 



* Re: Partitioned arrays initially missing from /proc/partitions
  2007-05-07  8:28             ` David Greaves
@ 2007-05-07  9:01               ` Neil Brown
  0 siblings, 0 replies; 15+ messages in thread
From: Neil Brown @ 2007-05-07  9:01 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-raid, Doug Ledford

On Monday May 7, david@dgreaves.com wrote:
> Hi Neil
> 
> Just wondering what the status is here - do you need any more from me or is it
> on your stack?
> 
> The patch helped but didn't cure.
> After a clean boot it mounted correctly first try.
> Then I unmounted, stopped and re-assembled the array.
> The next mount failed.
> The subsequent mount succeeded.

I just wrote the following patch.  I think it does what you want.
Let me know how it goes.

> 
> How do other block devices initialise their partitions on 'discovery'?
> 

They don't call add_disk() until the disk actually exists.
md has to call add_disk before the array exists, so that it can be
created.

NeilBrown

-----------------------------------
Improve partition detection in md arrays.

md currently uses ->media_changed to make sure rescan_partitions
is called on md arrays after they are assembled.

However that doesn't happen until the array is opened, which is later
than some people would like.

So use blkdev_ioctl to do the rescan immediately after the
array has been assembled.

This means we can remove all the ->changed infrastructure as it was only used
to trigger a partition rescan.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c           |   26 ++++++++------------------
 ./drivers/md/raid1.c        |    1 -
 ./drivers/md/raid5.c        |    2 --
 ./include/linux/raid/md_k.h |    6 +++---
 4 files changed, 11 insertions(+), 24 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-05-07 16:43:02.000000000 +1000
+++ ./drivers/md/md.c	2007-05-07 17:47:15.000000000 +1000
@@ -3104,6 +3104,7 @@ static int do_md_run(mddev_t * mddev)
 	struct gendisk *disk;
 	struct mdk_personality *pers;
 	char b[BDEVNAME_SIZE];
+	struct block_device *bdev;
 
 	if (list_empty(&mddev->disks))
 		/* cannot run an array with no devices.. */
@@ -3331,7 +3332,13 @@ static int do_md_run(mddev_t * mddev)
 	md_wakeup_thread(mddev->thread);
 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
 
-	mddev->changed = 1;
+	bdev = bdget_disk(mddev->gendisk, 0);
+	if (bdev) {
+		bd_set_size(bdev, mddev->array_size << 1);
+		blkdev_ioctl(bdev->bd_inode, NULL, BLKRRPART, 0);
+		bdput(bdev);
+	}
+
 	md_new_event(mddev);
 	kobject_uevent(&mddev->gendisk->kobj, KOBJ_CHANGE);
 	return 0;
@@ -3453,7 +3460,6 @@ static int do_md_stop(mddev_t * mddev, i
 			mddev->pers = NULL;
 
 			set_capacity(disk, 0);
-			mddev->changed = 1;
 
 			if (mddev->ro)
 				mddev->ro = 0;
@@ -4593,20 +4599,6 @@ static int md_release(struct inode *inod
 	return 0;
 }
 
-static int md_media_changed(struct gendisk *disk)
-{
-	mddev_t *mddev = disk->private_data;
-
-	return mddev->changed;
-}
-
-static int md_revalidate(struct gendisk *disk)
-{
-	mddev_t *mddev = disk->private_data;
-
-	mddev->changed = 0;
-	return 0;
-}
 static struct block_device_operations md_fops =
 {
 	.owner		= THIS_MODULE,
@@ -4614,8 +4606,6 @@ static struct block_device_operations md
 	.release	= md_release,
 	.ioctl		= md_ioctl,
 	.getgeo		= md_getgeo,
-	.media_changed	= md_media_changed,
-	.revalidate_disk= md_revalidate,
 };
 
 static int md_thread(void * arg)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2007-05-07 16:43:01.000000000 +1000
+++ ./drivers/md/raid1.c	2007-05-07 17:02:27.000000000 +1000
@@ -2063,7 +2063,6 @@ static int raid1_resize(mddev_t *mddev, 
 	 */
 	mddev->array_size = sectors>>1;
 	set_capacity(mddev->gendisk, mddev->array_size << 1);
-	mddev->changed = 1;
 	if (mddev->array_size > mddev->size && mddev->recovery_cp == MaxSector) {
 		mddev->recovery_cp = mddev->size << 1;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2007-05-07 16:43:01.000000000 +1000
+++ ./drivers/md/raid5.c	2007-05-07 17:03:05.000000000 +1000
@@ -4514,7 +4514,6 @@ static int raid5_resize(mddev_t *mddev, 
 	sectors &= ~((sector_t)mddev->chunk_size/512 - 1);
 	mddev->array_size = (sectors * (mddev->raid_disks-conf->max_degraded))>>1;
 	set_capacity(mddev->gendisk, mddev->array_size << 1);
-	mddev->changed = 1;
 	if (sectors/2  > mddev->size && mddev->recovery_cp == MaxSector) {
 		mddev->recovery_cp = mddev->size << 1;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
@@ -4649,7 +4648,6 @@ static void end_reshape(raid5_conf_t *co
 		conf->mddev->array_size = conf->mddev->size *
 			(conf->raid_disks - conf->max_degraded);
 		set_capacity(conf->mddev->gendisk, conf->mddev->array_size << 1);
-		conf->mddev->changed = 1;
 
 		bdev = bdget_disk(conf->mddev->gendisk, 0);
 		if (bdev) {

diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h	2007-05-07 16:43:02.000000000 +1000
+++ ./include/linux/raid/md_k.h	2007-05-07 17:00:19.000000000 +1000
@@ -201,9 +201,9 @@ struct mddev_s
 	struct mutex			reconfig_mutex;
 	atomic_t			active;
 
-	int				changed;	/* true if we might need to reread partition info */
-	int				degraded;	/* whether md should consider
-							 * adding a spare
+	int				degraded;	/* whether md should
+							 * consider adding a
+							 * spare
 							 */
 	int				barriers_work;	/* initialised to true, cleared as soon
 							 * as a barrier request to slave

