linux-raid.vger.kernel.org archive mirror
* Partitioned raid and major number
@ 2004-02-25 14:56 Miquel van Smoorenburg
  2004-02-25 18:46 ` H. Peter Anvin
  2004-02-25 23:25 ` Neil Brown
  0 siblings, 2 replies; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-02-25 14:56 UTC (permalink / raw)
  To: linux-raid

Hello,

	I see that Linus merged partitioned raid into bitkeeper.
The major number of partitioned raid devices is allocated dynamically.

I want to set up a server with 2 disks in RAID1 mode, partitioned.
To be able to boot from it, the RAID1 device needs to have a fixed
major number (I don't want to be forced to use an initrd). Is it
planned to ask LANANA for a fixed major number? If not, would a
patch to pass the major number on the kernel command line be accepted ?

Thanks,

Mike.


* Re: Partitioned raid and major number
  2004-02-25 14:56 Partitioned raid and major number Miquel van Smoorenburg
@ 2004-02-25 18:46 ` H. Peter Anvin
  2004-02-25 23:25 ` Neil Brown
  1 sibling, 0 replies; 23+ messages in thread
From: H. Peter Anvin @ 2004-02-25 18:46 UTC (permalink / raw)
  To: linux-raid

Followup to:  <20040225145624.GA1513@cistron.nl>
By author:    Miquel van Smoorenburg <miquels@cistron.nl>
In newsgroup: linux.dev.raid
>
> Hello,
> 
> 	I see that Linus merged partitioned raid into bitkeeper.
> The major number of partitioned raid devices is allocated dynamically.
> 
> I want to set up a server with 2 disks in RAID1 mode, partitioned.
> To be able to boot from it, the RAID1 device needs to have a fixed
> major number (I don't want to be forced to use an initrd). Is it
> planned to ask LANANA for a fixed major number? If not, would a
> patch to pass the major number on the kernel command line be accepted ?
> 

Please ask <device@lanana.org> for a device number.  Make sure to
specify if it's 2.6-specific (and hence will be assigned a number
above 255) or not.

	-hpa


* Re: Partitioned raid and major number
  2004-02-25 14:56 Partitioned raid and major number Miquel van Smoorenburg
  2004-02-25 18:46 ` H. Peter Anvin
@ 2004-02-25 23:25 ` Neil Brown
  2004-02-26 21:51   ` Miquel van Smoorenburg
  1 sibling, 1 reply; 23+ messages in thread
From: Neil Brown @ 2004-02-25 23:25 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Wednesday February 25, miquels@cistron.nl wrote:
> Hello,
> 
> 	I see that Linus merged partitioned raid into bitkeeper.
> The major number of partitioned raid devices is allocated dynamically.
> 
> I want to set up a server with 2 disks in RAID1 mode, partitioned.
> To be able to boot from it, the RAID1 device needs to have a fixed
> major number (I don't want to be forced to use an initrd). Is it
> planned to ask LANANA for a fixed major number? If not, would a
> patch to pass the major number on the kernel command line be accepted ?

The lack of a statically allocated device number is not the problem.
You can have a kernel parameter that says
       root=/dev/md_d0p1
and it should manage to find the device thanks to sysfs.
The bit that you cannot do yet is assemble the array as a
partitionable array rather than a non-partitionable array.
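
As a rough illustration of the sysfs lookup: a minimal sketch, assuming a
block device exposes its number as a "major:minor" line in a `dev`
attribute under /sys/block, and that a partition name is the disk name
followed by 'p' and a number. The helper names (split_partition,
parse_sysfs_dev) are hypothetical; this is not the kernel's own lookup
code, and it only handles the 'p'-separated naming form.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split a name such as "md_d0p1" into the disk name "md_d0" and the
 * partition number 1.  Names without a "<digit>p<digits>" suffix are
 * copied unchanged and 0 is returned. */
int split_partition(const char *name, char *disk, size_t disk_len)
{
	size_t len = strlen(name);
	size_t end = len;

	assert(disk_len > 0);

	/* Walk back over the trailing digits. */
	while (end > 0 && isdigit((unsigned char)name[end - 1]))
		end--;

	if (end == len || end < 2 || name[end - 1] != 'p' ||
	    !isdigit((unsigned char)name[end - 2])) {
		snprintf(disk, disk_len, "%s", name);
		return 0;
	}

	/* Copy everything before the 'p' separator. */
	snprintf(disk, disk_len, "%.*s", (int)(end - 1), name);
	return atoi(name + end);
}

/* Parse the text of a sysfs `dev` attribute, e.g. "254:0\n".
 * Returns 0 on success, -1 if the text is not in major:minor form. */
int parse_sysfs_dev(const char *text, unsigned *major, unsigned *minor)
{
	return sscanf(text, "%u:%u", major, minor) == 2 ? 0 : -1;
}
```

With these, "md_d0p1" splits into disk "md_d0" and partition 1, and the
partition's device number would be the disk's major with the disk's
minor plus 1, whatever major was allocated at boot.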

Would you be willing to try the following patch?

With it:
  If you put 
         raid=partitionable
  or just
         raid=part
as a kernel parameter, then all auto-detected raid arrays will be
partitionable, using the dynamically allocated major.

Also, if you use e.g. "md=0,/dev/sda,/dev/sdb" to assemble your arrays
at boot time, you can now use:
         "md=d0,/dev/sda,/dev/sdb"
to assemble as a partitionable array (so it will be /dev/md/d0 instead
of /dev/md0.  Hence the 'd').

I use md= to assemble my root arrays, so I would now use:

    md=d0,/dev/sda,/dev/sdb root=/dev/md_d0p1

to use the first partition of the raid array on sda and sdb as my root
filesystem.
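
The 'd' prefix handling can be sketched in user space as follows. This is
a simplified illustration, not the md_setup() code from the patch: it
assumes only the superblock form "md=[d]N,dev1,dev2,..." (no level or
chunk fields), and struct md_arg, parse_md_arg and md_node_name are
hypothetical names. The limit of 256 matches the MAX_MD_DEVS value the
driver prints at boot.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MD_DEVS 256

struct md_arg {
	int partitioned;        /* 1 if the value started with 'd' */
	int minor;              /* array number N */
	const char *devices;    /* points at "dev1,dev2,..." inside the input */
};

/* Parse the value of an md= parameter.  Returns 0 on success, -1 on error. */
int parse_md_arg(const char *str, struct md_arg *out)
{
	char *end;
	long minor;

	assert(str != NULL && out != NULL);

	out->partitioned = 0;
	if (*str == 'd') {
		out->partitioned = 1;
		str++;
	}

	minor = strtol(str, &end, 10);
	if (end == str || *end != ',')
		return -1;      /* no number, or no device list after it */
	if (minor < 0 || minor >= MAX_MD_DEVS)
		return -1;      /* array number out of range */
	if (strlen(end + 1) == 0)
		return -1;      /* no member devices listed */

	out->minor = (int)minor;
	out->devices = end + 1;
	return 0;
}

/* Build the non-devfs node name: /dev/md0 or /dev/md_d0. */
void md_node_name(const struct md_arg *a, char *buf, size_t len)
{
	snprintf(buf, len, "/dev/md%s%d", a->partitioned ? "_d" : "", a->minor);
}
```

So "d0,/dev/sda,/dev/sdb" gives a partitionable array named /dev/md_d0,
while "0,/dev/sda,/dev/sdb" gives the traditional /dev/md0.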

Please let me know if you try it and whether it works. I will be
testing it in a day or so.

NeilBrown



 ----------- Diffstat output ------------
 ./drivers/md/md.c     |   19 ++++++---
 ./init/do_mounts_md.c |   99 +++++++++++++++++++++++++++++---------------------
 2 files changed, 70 insertions(+), 48 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2004-02-24 11:49:11.000000000 +1100
+++ ./drivers/md/md.c	2004-02-26 10:17:05.000000000 +1100
@@ -60,7 +60,7 @@
 
 
 #ifndef MODULE
-static void autostart_arrays (void);
+static void autostart_arrays (int part);
 #endif
 
 static mdk_personality_t *pers[MAX_PERSONALITY];
@@ -1795,7 +1795,7 @@ static void autorun_array(mddev_t *mddev
  *
  * If "unit" is allocated, then bump its reference count
  */
-static void autorun_devices(void)
+static void autorun_devices(int part)
 {
 	struct list_head candidates;
 	struct list_head *tmp;
@@ -1828,7 +1828,12 @@ static void autorun_devices(void)
 			       bdevname(rdev0->bdev, b), rdev0->preferred_minor);
 			break;
 		}
-		dev = MKDEV(MD_MAJOR, rdev0->preferred_minor);
+		if (part)
+			dev = MKDEV(mdp_major,
+				    rdev0->preferred_minor << MdpMinorShift);
+		else
+			dev = MKDEV(MD_MAJOR, rdev0->preferred_minor);
+
 		md_probe(dev, NULL, NULL);
 		mddev = mddev_find(dev);
 		if (!mddev) {
@@ -1925,7 +1930,7 @@ static int autostart_array(dev_t startde
 	/*
 	 * possibly return codes
 	 */
-	autorun_devices();
+	autorun_devices(0);
 	return 0;
 
 }
@@ -2507,7 +2512,7 @@ static int md_ioctl(struct inode *inode,
 #ifndef MODULE
 		case RAID_AUTORUN:
 			err = 0;
-			autostart_arrays();
+			autostart_arrays(arg);
 			goto done;
 #endif
 		default:;
@@ -3685,7 +3690,7 @@ void md_autodetect_dev(dev_t dev)
 }
 
 
-static void autostart_arrays(void)
+static void autostart_arrays(int part)
 {
 	char b[BDEVNAME_SIZE];
 	mdk_rdev_t *rdev;
@@ -3710,7 +3715,7 @@ static void autostart_arrays(void)
 	}
 	dev_cnt = 0;
 
-	autorun_devices();
+	autorun_devices(part);
 }
 
 #endif

diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
--- ./init/do_mounts_md.c~current~	2004-02-26 09:02:33.000000000 +1100
+++ ./init/do_mounts_md.c	2004-02-26 10:15:07.000000000 +1100
@@ -12,14 +12,17 @@
  * The code for that is here.
  */
 
-static int __initdata raid_noautodetect;
+static int __initdata raid_noautodetect, raid_autopart;
 
 static struct {
-	char device_set [MAX_MD_DEVS];
-	int pers[MAX_MD_DEVS];
-	int chunk[MAX_MD_DEVS];
-	char *device_names[MAX_MD_DEVS];
-} md_setup_args __initdata;
+	int minor;
+	int partitioned;
+	int pers;
+	int chunk;
+	char *device_names;
+} md_setup_args[MAX_MD_DEVS] __initdata;
+
+static int md_setup_ents __initdata;
 
 /*
  * Parse the command-line parameters given our kernel, but do not
@@ -43,21 +46,37 @@ static struct {
  */
 static int __init md_setup(char *str)
 {
-	int minor, level, factor, fault, pers;
+	int minor, level, factor, fault, pers, partitioned = 0;
 	char *pername = "";
-	char *str1 = str;
+	char *str1;
+	int ent;
 
+	if (*str == 'd') {
+		partitioned = 1;
+		str++;
+	}
 	if (get_option(&str, &minor) != 2) {	/* MD Number */
 		printk(KERN_WARNING "md: Too few arguments supplied to md=.\n");
 		return 0;
 	}
+	str1 = str;
 	if (minor >= MAX_MD_DEVS) {
 		printk(KERN_WARNING "md: md=%d, Minor device number too high.\n", minor);
 		return 0;
-	} else if (md_setup_args.device_names[minor]) {
-		printk(KERN_WARNING "md: md=%d, Specified more than once. "
-		       "Replacing previous definition.\n", minor);
 	}
+	for (ent=0 ; ent< md_setup_ents ; ent++) 
+		if (md_setup_args[ent].minor == minor &&
+		    md_setup_args[ent].partitioned == partitioned) {
+			printk(KERN_WARNING "md: md=%s%d, Specified more than once. "
+			       "Replacing previous definition.\n", partitioned?"d":"", minor);
+			break;
+		}
+	if (ent >= MAX_MD_DEVS) {
+		printk(KERN_WARNING "md: md=%s%d - too many md initialisations\n", partitioned?"d":"", minor);
+		return 0;
+	}
+	if (ent >= md_setup_ents)
+		md_setup_ents++;
 	switch (get_option(&str, &level)) {	/* RAID Personality */
 	case 2: /* could be 0 or -1.. */
 		if (level == 0 || level == LEVEL_LINEAR) {
@@ -66,24 +85,16 @@ static int __init md_setup(char *str)
 				printk(KERN_WARNING "md: Too few arguments supplied to md=.\n");
 				return 0;
 			}
-			md_setup_args.pers[minor] = level;
-			md_setup_args.chunk[minor] = 1 << (factor+12);
-			switch(level) {
-			case LEVEL_LINEAR:
+			md_setup_args[ent].pers = level;
+			md_setup_args[ent].chunk = 1 << (factor+12);
+			if (level ==  LEVEL_LINEAR) {
 				pers = LINEAR;
 				pername = "linear";
-				break;
-			case 0:
+			} else {
 				pers = RAID0;
 				pername = "raid0";
-				break;
-			default:
-				printk(KERN_WARNING
-				       "md: The kernel has not been configured for raid%d support!\n",
-				       level);
-				return 0;
 			}
-			md_setup_args.pers[minor] = pers;
+			md_setup_args[ent].pers = pers;
 			break;
 		}
 		/* FALL THROUGH */
@@ -91,35 +102,38 @@ static int __init md_setup(char *str)
 		str = str1;
 		/* FALL THROUGH */
 	case 0:
-		md_setup_args.pers[minor] = 0;
+		md_setup_args[ent].pers = 0;
 		pername="super-block";
 	}
 
 	printk(KERN_INFO "md: Will configure md%d (%s) from %s, below.\n",
 		minor, pername, str);
-	md_setup_args.device_names[minor] = str;
+	md_setup_args[ent].device_names = str;
+	md_setup_args[ent].partitioned = partitioned;
+	md_setup_args[ent].minor = minor;
 
 	return 1;
 }
 
 static void __init md_setup_drive(void)
 {
-	int minor, i;
+	int minor, i, ent, partitioned;
 	dev_t dev;
 	dev_t devices[MD_SB_DISKS+1];
 
-	for (minor = 0; minor < MAX_MD_DEVS; minor++) {
+	for (ent = 0; ent < md_setup_ents ; ent++) {
 		int fd;
 		int err = 0;
 		char *devname;
 		mdu_disk_info_t dinfo;
 		char name[16], devfs_name[16];
 
-		if (!(devname = md_setup_args.device_names[minor]))
-			continue;
-		
-		sprintf(name, "/dev/md%d", minor);
-		sprintf(devfs_name, "/dev/md/%d", minor);
+		minor = md_setup_args[ent].minor;
+		partitioned = md_setup_args[ent].partitioned;
+		devname = md_setup_args[ent].device_names;
+
+		sprintf(name, "/dev/md%s%d", partitioned?"_d":"", minor);
+		sprintf(devfs_name, "/dev/md/%s%d", partitioned?"d":"", minor);
 		create_dev(name, MKDEV(MD_MAJOR, minor), devfs_name);
 		for (i = 0; i < MD_SB_DISKS && devname != 0; i++) {
 			char *p;
@@ -143,20 +157,19 @@ static void __init md_setup_drive(void)
 			}
 
 			devices[i] = dev;
-			md_setup_args.device_set[minor] = 1;
 
 			devname = p;
 		}
 		devices[i] = 0;
 
-		if (!md_setup_args.device_set[minor])
+		if (!i)
 			continue;
 
-		printk(KERN_INFO "md: Loading md%d: %s\n", minor, md_setup_args.device_names[minor]);
+		printk(KERN_INFO "md: Loading md%s%d: %s\n", partitioned?"_d":"", minor, md_setup_args[ent].device_names);
 
 		fd = open(name, 0, 0);
 		if (fd < 0) {
-			printk(KERN_ERR "md: open failed - cannot start array %d\n", minor);
+			printk(KERN_ERR "md: open failed - cannot start array %s\n", name);
 			continue;
 		}
 		if (sys_ioctl(fd, SET_ARRAY_INFO, 0) == -EBUSY) {
@@ -167,10 +180,10 @@ static void __init md_setup_drive(void)
 			continue;
 		}
 
-		if (md_setup_args.pers[minor]) {
+		if (md_setup_args[ent].pers) {
 			/* non-persistent */
 			mdu_array_info_t ainfo;
-			ainfo.level = pers_to_level(md_setup_args.pers[minor]);
+			ainfo.level = pers_to_level(md_setup_args[ent].pers);
 			ainfo.size = 0;
 			ainfo.nr_disks =0;
 			ainfo.raid_disks =0;
@@ -181,7 +194,7 @@ static void __init md_setup_drive(void)
 
 			ainfo.state = (1 << MD_SB_CLEAN);
 			ainfo.layout = 0;
-			ainfo.chunk_size = md_setup_args.chunk[minor];
+			ainfo.chunk_size = md_setup_args[ent].chunk;
 			err = sys_ioctl(fd, SET_ARRAY_INFO, (long)&ainfo);
 			for (i = 0; !err && i <= MD_SB_DISKS; i++) {
 				dev = devices[i];
@@ -229,6 +242,10 @@ static int __init raid_setup(char *str)
 
 		if (!strncmp(str, "noautodetect", wlen))
 			raid_noautodetect = 1;
+		if (strncmp(str, "partitionable", wlen)==0)
+			raid_autopart = 1;
+		if (strncmp(str, "part", wlen)==0)
+			raid_autopart = 1;
 		pos += wlen+1;
 	}
 	return 1;
@@ -245,7 +262,7 @@ void __init md_run_setup(void)
 	else {
 		int fd = open("/dev/md0", 0, 0);
 		if (fd >= 0) {
-			sys_ioctl(fd, RAID_AUTORUN, 0);
+			sys_ioctl(fd, RAID_AUTORUN, raid_autopart);
 			close(fd);
 		}
 	}


* Re: Partitioned raid and major number
  2004-02-25 23:25 ` Neil Brown
@ 2004-02-26 21:51   ` Miquel van Smoorenburg
  2004-02-27  0:21     ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-02-26 21:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On Thu, 26 Feb 2004 00:25:46, Neil Brown wrote:
> On Wednesday February 25, miquels@cistron.nl wrote:
> > Hello,
> > 
> > 	I see that Linus merged partitioned raid into bitkeeper.
> > The major number of partitioned raid devices is allocated dynamically.
> > 
> > I want to set up a server with 2 disks in RAID1 mode, partitioned.
> > To be able to boot from it, the RAID1 device needs to have a fixed
> > major number (I don't want to be forced to use an initrd). Is it
> > planned to ask LANANA for a fixed major number? If not, would a
> > patch to pass the major number on the kernel command line be accepted ?
> 
> The lack of a statically allocated device number is not the problem.
> You can have a kernel parameter that says
>        root=/dev/md_d0p1
> and it should manage to find the device thanks to sysfs.
> The bit that you cannot do yet is assemble the array as a
> partitionable array rather than a non-partitionable array.

Hmm, is there anyone who has > 128 MD devices on one system? If not, why not
use the top 128 minors for, say, 8 MD devices each with 16 partitions ?
Then the kernel command line option "md=0,/dev/sda,/dev/sdb,part" would create
a partitionable md0 array. And if you don't add "part" things stay as they
were with 256 MD minors.
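
To make the suggestion concrete: a rough sketch, assuming "top 128" means
minor numbers 128..255 under the existing MD major, split into 8 arrays
of 16 minors each, with slot 0 of each group standing for the whole
array. The function name is hypothetical and nothing like this exists in
the kernel.

```c
#include <assert.h>

#define MDP_BASE_MINOR   128   /* first minor reserved for partitionable arrays */
#define MDP_SLOTS        16    /* minors per array: whole array + partitions */
#define MDP_ARRAYS       8     /* 8 * 16 = 128 minors in total */

/* Map (array, slot) to a minor number in the proposed layout.
 * slot 0 is the whole array, slots 1..15 are its partitions. */
int mdp_proposed_minor(int array, int slot)
{
	assert(array >= 0 && array < MDP_ARRAYS);
	assert(slot >= 0 && slot < MDP_SLOTS);
	return MDP_BASE_MINOR + array * MDP_SLOTS + slot;
}
```

The first partitionable array would start at minor 128 and the last slot
of the eighth array would land on minor 255, so minors 0..127 stay
available for ordinary md devices.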

That's just a suggestion. Feel free to completely ignore it (as you
probably will since dynamically allocated majors, initrd and udev are the
future some people say ..)

> Would you be willing to try the following patch?

Sure.

> With it:
>   If you put 
>          raid=partitionable
>   or just
>          raid=part
> as a kernel parameter, then all auto-detected raid arrays will be
> partitionable, using the dynamically allocated major.
> 
> Also, if you use e.g. "md=0,/dev/sda,/dev/sdb" to assemble your arrays
> at boot time, you can now use:
>          "md=d0,/dev/sda,/dev/sdb"
> to assemble as a partitionable array (so it will be /dev/md/d0 instead
> of /dev/md0.  Hence the 'd').

I did exactly that on a server with /dev/sda and /dev/sdb (those are 2
SATA disks through libata). Since /dev/sda contains the currently installed
system I marked it as failed in /etc/raidtab before creating the array.

But it doesn't work.

# cat /proc/cmdline
auto BOOT_IMAGE=Linux ro root=801 raid=partitionable md=d0,/dev/sda,/dev/sdb
# ls /sys/block
md0  sda  sdb
# ls /sys/block/md0
dev  range  size  stat
# dmesg | grep md:
md: Will configure md0 (super-block) from /dev/sda,/dev/sdb, below.
md: raid1 personality registered as nr 3
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Loading md_d0: /dev/sda
md: invalid raid superblock magic on sda
md: sda has invalid sb, not importing!
md: md_import_device returned -22
md: bind<sdb>

Is it because the first disk is invalid ? That shouldn't happen, right?

Thanks,

Mike.


* Re: Partitioned raid and major number
@ 2004-02-26 23:01 Michael
  0 siblings, 0 replies; 23+ messages in thread
From: Michael @ 2004-02-26 23:01 UTC (permalink / raw)
  To: linux-raid

<snip>
> That's just a suggestion. Feel free to completely ignore it (as you
> probably will since dynamically allocated majors, initrd and udev
> are the future some people say ..)
> 
<snip>

Well.... :-(
Back at raid v 0.5x, initrd was the only way you could start and stop
root mounted raid and it was a huge pain in the butt. Modifying the
initrd file every time you must make a change to some little thing is
no fun at all. I for one am very fond of partition type "FD".

Michael
Michael@Insulin-Pumpers.org


* Re: Partitioned raid and major number
  2004-02-26 21:51   ` Miquel van Smoorenburg
@ 2004-02-27  0:21     ` Neil Brown
  2004-02-27  1:17       ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2004-02-27  0:21 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Thursday February 26, miquels@cistron.nl wrote:
> 
> Hmm, is there anyone who has > 128 MD devices on one system? If not, why not
> use the top 128 minors for, say, 8 MD devices each with 16 partitions ?
> Then the kernel command line option "md=0,/dev/sda,/dev/sdb,part" would create
> a partitionable md0 array. And if you don't add "part" things stay as they
> were with 256 MD minors.

Maybe there is someone with > 128 MD arrays.  There is no way to find
out except to break it and wait about a year or two.

I did consider having some partitionable and some non-partitionable
arrays under the one MAJOR.  It would be technically quite easy, but I
think it is conceptually harder to work with - the mapping from minor
number to device would not be what people have come to expect.

> > With it:
> >   If you put 
> >          raid=partitionable
> >   or just
> >          raid=part
> > as a kernel parameter, then all auto-detected raid arrays will be
> > partitionable, using the dynamically allocated major.
> > 
> > Also, if you use e.g. "md=0,/dev/sda,/dev/sdb" to assemble your arrays
> > at boot time, you can now use:
> >          "md=d0,/dev/sda,/dev/sdb"
> > to assemble as a partitionable array (so it will be /dev/md/d0 instead
> > of /dev/md0.  Hence the 'd').
> 
> I did exactly that on a server with /dev/sda and /dev/sdb (those are 2
> SATA disks through libata). Since /dev/sda contains the currently installed
> system I marked it as failed in /etc/raidtab before creating the array.
> 
> But it doesn't work.
> 
> # cat /proc/cmdline
> auto BOOT_IMAGE=Linux ro root=801 raid=partitionable md=d0,/dev/sda,/dev/sdb
> # ls /sys/block
> md0  sda  sdb
> # ls /sys/block/md0
> dev  range  size  stat
> # dmesg | grep md:
> md: Will configure md0 (super-block) from /dev/sda,/dev/sdb, below.
> md: raid1 personality registered as nr 3
> md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
> md: Autodetecting RAID arrays.
> md: autorun ...
> md: ... autorun DONE.
> md: Loading md_d0: /dev/sda
> md: invalid raid superblock magic on sda
> md: sda has invalid sb, not importing!
> md: md_import_device returned -22
> md: bind<sdb>
> 
> Is it because the first disk is invalid ? That shouldn't happen,
> right?

Right.  I missed a bit in the patch.
(I assume you are still wanting to boot off /dev/sda until you copy
the data into /dev/md/d0p* - then you will use root=/dev/md_d0p1)

NeilBrown



 ----------- Diffstat output ------------
 ./init/do_mounts_md.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
--- ./init/do_mounts_md.c~current~	2004-02-26 10:15:07.000000000 +1100
+++ ./init/do_mounts_md.c	2004-02-27 11:20:14.000000000 +1100
@@ -134,7 +134,8 @@ static void __init md_setup_drive(void)
 
 		sprintf(name, "/dev/md%s%d", partitioned?"_d":"", minor);
 		sprintf(devfs_name, "/dev/md/%s%d", partitioned?"d":"", minor);
-		create_dev(name, MKDEV(MD_MAJOR, minor), devfs_name);
+		dev = name_to_dev_t(name);
+		create_dev(name, dev, devfs_name);
 		for (i = 0; i < MD_SB_DISKS && devname != 0; i++) {
 			char *p;
 			char comp_name[64];


* Re: Partitioned raid and major number
  2004-02-27  0:21     ` Neil Brown
@ 2004-02-27  1:17       ` Neil Brown
  2004-02-27 16:56         ` Miquel van Smoorenburg
  2004-03-09 16:46         ` Booting from partitioned raid, do_mounts_md.c patch " Miquel van Smoorenburg
  0 siblings, 2 replies; 23+ messages in thread
From: Neil Brown @ 2004-02-27  1:17 UTC (permalink / raw)
  To: Miquel van Smoorenburg, linux-raid

On Friday February 27, neilb@cse.unsw.edu.au wrote:
> 
> Right.  I missed a bit in the patch.
> (I assume you are still wanting to boot off /dev/sda until you copy
> the data into /dev/md/d0p* - then you will use root=/dev/md_d0p1)

Sorry, that patch was wrong.
This one, on top of the original patch, works for me (I finally got
around to testing it).

 NeilBrown



 ----------- Diffstat output ------------
 ./drivers/md/md.c     |    2 +-
 ./init/do_mounts_md.c |    9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2004-02-26 10:17:05.000000000 +1100
+++ ./drivers/md/md.c	2004-02-27 12:06:31.000000000 +1100
@@ -1450,7 +1450,7 @@ abort:
 	return 1;
 }
 
-static int mdp_major = 0;
+int mdp_major = 0;
 
 static struct kobject *md_probe(dev_t dev, int *part, void *data)
 {

diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
--- ./init/do_mounts_md.c~current~	2004-02-26 10:15:07.000000000 +1100
+++ ./init/do_mounts_md.c	2004-02-27 12:09:29.000000000 +1100
@@ -24,6 +24,7 @@ static struct {
 
 static int md_setup_ents __initdata;
 
+extern int mdp_major;
 /*
  * Parse the command-line parameters given our kernel, but do not
  * actually try to invoke the MD device now; that is handled by
@@ -115,6 +116,8 @@ static int __init md_setup(char *str)
 	return 1;
 }
 
+#define MdpMinorShift 6
+
 static void __init md_setup_drive(void)
 {
 	int minor, i, ent, partitioned;
@@ -134,7 +137,11 @@ static void __init md_setup_drive(void)
 
 		sprintf(name, "/dev/md%s%d", partitioned?"_d":"", minor);
 		sprintf(devfs_name, "/dev/md/%s%d", partitioned?"d":"", minor);
-		create_dev(name, MKDEV(MD_MAJOR, minor), devfs_name);
+		if (partitioned)
+			dev = MKDEV(mdp_major, minor << MdpMinorShift);
+		else
+			dev = MKDEV(MD_MAJOR, minor);
+		create_dev(name, dev, devfs_name);
 		for (i = 0; i < MD_SB_DISKS && devname != 0; i++) {
 			char *p;
 			char comp_name[64];
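
For reference, the minor numbering that follows from a shift of 6 can be
sketched like this. It is an illustration only, assuming the usual block
layer convention that partition p of a disk gets the disk's first minor
plus p; the function names are hypothetical and MDP_MINOR_SHIFT simply
mirrors MdpMinorShift from the patch.

```c
#include <assert.h>

#define MDP_MINOR_SHIFT 6   /* 1 << 6 = 64 minors per partitionable array */

/* First minor of partitionable array N (the whole-array device). */
unsigned mdp_first_minor(unsigned array)
{
	return array << MDP_MINOR_SHIFT;
}

/* Minor of partition `part` on array N; part 0 is the whole array. */
unsigned mdp_part_minor(unsigned array, unsigned part)
{
	assert(part < (1u << MDP_MINOR_SHIFT));
	return mdp_first_minor(array) + part;
}
```

Under these assumptions md_d0 is minor 0, md_d0p1 is minor 1, and md_d1
starts at minor 64, all under the dynamically allocated mdp major.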


* Re: Partitioned raid and major number
  2004-02-27  1:17       ` Neil Brown
@ 2004-02-27 16:56         ` Miquel van Smoorenburg
  2004-02-28  1:09           ` Miquel van Smoorenburg
  2004-03-01  0:16           ` Partitioned raid and major number Neil Brown
  2004-03-09 16:46         ` Booting from partitioned raid, do_mounts_md.c patch " Miquel van Smoorenburg
  1 sibling, 2 replies; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-02-27 16:56 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On 2004.02.27 02:17, Neil Brown wrote:
> On Friday February 27, neilb@cse.unsw.edu.au wrote:
> > 
> > Right.  I missed a bit in the patch.
> > (I assume you are still wanting to boot off /dev/sda until you copy
> > the data into /dev/md/d0p* - then you will use root=/dev/md_d0p1)
> 
> Sorry, that patch was wrong.
> This one, on top of the original patch, works for me (I finally got
> around to testing it).

Yes, it works! Great.

Now how to enable RAID1 on an existing disk... I hoped that I could create
an array with /dev/sda and /dev/sdb, with /dev/sdb marked as failed-disk.
Because initializing a RAID1 array with just one working disk doesn't
destroy the existing contents of the disk, right ? (I kept the last few
MB of the disk as free space for the raid superblock).

Unfortunately the current tools (or the kernel) don't let me do that
(/dev/sda is busy).

Two more minor issues - one, if partitioned MD is on (/dev/md/d0 etc)
the standard /dev/md0 device doesn't work anymore. For accessing the whole
device (management purposes / tools) shouldn't both /dev/md0 and
/dev/md/d0 open the same device ?

Two, shouldn't raid=partitionable md=d0,/dev/sda,/dev/sdb simply be
md=d0,/dev/sda,/dev/sdb,partitionable ? You could even leave out the 'd'
then and make it md=0,/dev/sda,/dev/sdb,partitionable. Together with
(one) this would make a bit more sense.

I hope to figure out how to migrate an existing 1-disk setup to RAID1 on
a live machine over the weekend.

Mike.


* Re: Partitioned raid and major number
  2004-02-27 16:56         ` Miquel van Smoorenburg
@ 2004-02-28  1:09           ` Miquel van Smoorenburg
  2004-02-28  7:27             ` Luca Berra
  2004-03-01  0:54             ` Neil Brown
  2004-03-01  0:16           ` Partitioned raid and major number Neil Brown
  1 sibling, 2 replies; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-02-28  1:09 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On Fri, 27 Feb 2004 17:56:14, Miquel van Smoorenburg wrote:
> On 2004.02.27 02:17, Neil Brown wrote:
> > On Friday February 27, neilb@cse.unsw.edu.au wrote:
> > > 
> > > Right.  I missed a bit in the patch.
> > > (I assume you are still wanting to boot off /dev/sda until you copy
> > > the data into /dev/md/d0p* - then you will use root=/dev/md_d0p1)
> > 
> > Sorry, that patch was wrong.
> > This one, on top of the original patch, works for me (I finally got
> > around to testing it).
> 
> Yes, it works! Great.
> 
> Now how to enable RAID1 on an existing disk...

Hmm. With a dynamic major, the system might fail at checking the root
file system at boot. At that time, /dev is still read-only, and
/dev/md/d0p1 might not be the correct device yet.

So either mdp needs its own partition number, or we need a /dev/root
device that's an alias for the current root (like /dev/console).

Fortunately, that's very easy. Which makes me wonder why this hasn't
been done before .. what am I overlooking ?

Patch below uses 4,1 which is just arbitrary, of course. Comments ?

--- linux-2.6.3/fs/block_dev.c	2004-02-18 04:59:58.000000000 +0100
+++ linux-2.6.3-bk8-mdp/fs/block_dev.c	2004-02-28 01:58:27.000000000 +0100
@@ -339,6 +339,11 @@ struct block_device *bdget(dev_t dev)
 	struct block_device *bdev;
 	struct inode *inode;
 
+#if 1 /* XXX miquels */
+	if (dev == MKDEV(4, 1))
+		dev = current->fs->pwdmnt->mnt_sb->s_dev;
+#endif
+
 	inode = iget5_locked(bd_mnt->mnt_sb, hash(dev),
 			bdev_test, bdev_set, &dev);
 

Mike.


* Re: Partitioned raid and major number
  2004-02-28  1:09           ` Miquel van Smoorenburg
@ 2004-02-28  7:27             ` Luca Berra
  2004-03-01  0:54             ` Neil Brown
  1 sibling, 0 replies; 23+ messages in thread
From: Luca Berra @ 2004-02-28  7:27 UTC (permalink / raw)
  To: linux-raid

On Sat, Feb 28, 2004 at 02:09:16AM +0100, Miquel van Smoorenburg wrote:
>Hmm. With a dynamic major, the system might fail at checking the root
>file system at boot. At that time, /dev is still read-only, and
>/dev/md/d0p1 might not be the correct device yet.
>
>So either mdp needs its own partition number, or we need a /dev/root
>device that's an alias for the current root (like /dev/console).
>
>Fortunately, that's very easy. Which makes me wonder why this hasn't
>been done before .. what am I overlooking ?
>
>Patch below uses 4,1 which is just arbitrary, of course. Comments ?
>
I have been missing this feature in Linux, and I don't know why it was not
done before...
Having /dev/root also solves the similar problem for people whose root is on
device-mapper, as mine is. (dm has dynamic majors)

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: Partitioned raid and major number
  2004-02-27 16:56         ` Miquel van Smoorenburg
  2004-02-28  1:09           ` Miquel van Smoorenburg
@ 2004-03-01  0:16           ` Neil Brown
  2004-03-01  0:42             ` Miquel van Smoorenburg
  1 sibling, 1 reply; 23+ messages in thread
From: Neil Brown @ 2004-03-01  0:16 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Friday February 27, miquels@cistron.nl wrote:
> On 2004.02.27 02:17, Neil Brown wrote:
> > On Friday February 27, neilb@cse.unsw.edu.au wrote:
> > > 
> > > Right.  I missed a bit in the patch.
> > > (I assume you are still wanting to boot off /dev/sda until you copy
> > > the data into /dev/md/d0p* - then you will use root=/dev/md_d0p1)
> > 
> > Sorry, that patch was wrong.
> > This one, on top of the original patch, works for me (I finally got
> > around to testing it).
> 
> Yes, it works! Great.
> 
> Now how to enable RAID1 on an existing disk... I hoped that I could create
> an array with /dev/sda and /dev/sdb, with /dev/sdb marked as failed-disk.
> Because initializing a RAID1 array with just one working disk doesn't
> destroy the existing contents of the disk, right ? (I kept the last few
> MB of the disk as free space for the raid superblock).
> 
> Unfortunately the current tools (or the kernel) don't let me do that
> (/dev/sda is busy).

Yep.  Currently it isn't really possible while sda is mounted.  
You need to boot off some other media and create the array there.

You are right: creating a raid1 with just one working disk doesn't
destroy existing contents - except for last 128K or so.
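
To put a number on "last 128K or so": a minimal sketch, assuming the 0.90
superblock format (the version the md driver reports in the dmesg output
earlier in the thread), where the superblock sits at the start of the
last full 64K block of the device. The constants mirror
MD_RESERVED_BYTES and the MD_NEW_SIZE_SECTORS macro from the md headers
as I understand them; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MD_RESERVED_BYTES   (64 * 1024)
#define MD_RESERVED_SECTORS (MD_RESERVED_BYTES / 512)   /* 128 sectors */

/* Sector at which the 0.90 superblock starts: round the device size
 * down to a 64K boundary, then step back one more 64K block. */
uint64_t md_sb_offset_sectors(uint64_t dev_sectors)
{
	assert(dev_sectors >= MD_RESERVED_SECTORS);
	return (dev_sectors & ~(uint64_t)(MD_RESERVED_SECTORS - 1))
		- MD_RESERVED_SECTORS;
}

/* Bytes at the end of the device that are no longer usable for data. */
uint64_t md_tail_bytes(uint64_t dev_sectors)
{
	return (dev_sectors - md_sb_offset_sectors(dev_sectors)) * 512;
}
```

Under these assumptions the area at risk is always at least 64K and just
under 128K, so leaving a few MB free at the end of the disk is more than
enough.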

> 
> Two more minor issues - one, if partitioned MD is on (/dev/md/d0 etc)
> the standard /dev/md0 device doesn't work anymore. For accessing the whole
> device (management purposes / tools) shouldn't both /dev/md0 and
> /dev/md/d0 open the same device ?

No - they are completely different devices.  Making them appear as one
device has all sorts of problems relating to confusing bits of code
that think they have exclusive access.  The idea is appealing but
wasn't worth the effort.



> 
> Two, shouldn't raid=partitionable md=d0,/dev/sda,/dev/sdb simply be
> md=d0,/dev/sda,/dev/sdb,partitionable ? You could even leave out the 'd'
> then and make it md=0,/dev/sda,/dev/sdb,partitionable. Together with
> (one) this would make a bit more sense.

No.
   raid=partitionable
and
   md=d0,.....
are related, but mean substantially different things.  You normally
only need one of these, not both.

  raid=partitionable
means that any auto-detected (using partition type 0xfd) arrays are
assembled as partitionable.  This does not apply to you at all because
you are not making arrays out of partitions.

 md=d0,......
assembles an array from explicit devices, as a partitionable array.
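
The raid= side of this can be sketched in user space as follows. This is
an illustration only: it uses exact word matching on a comma-separated
list, which is not how the patch compares the words, and struct raid_opts
and parse_raid_opts are hypothetical names.

```c
#include <assert.h>
#include <string.h>

struct raid_opts {
	int noautodetect;   /* skip auto-detection of type 0xfd partitions */
	int autopart;       /* assemble auto-detected arrays as partitionable */
};

/* True if the len-byte word at w is exactly `name`. */
static int word_is(const char *w, size_t len, const char *name)
{
	return strlen(name) == len && strncmp(w, name, len) == 0;
}

/* Parse the value of raid=, e.g. "noautodetect" or "partitionable". */
void parse_raid_opts(const char *str, struct raid_opts *o)
{
	assert(str != NULL && o != NULL);

	o->noautodetect = 0;
	o->autopart = 0;

	while (*str) {
		size_t len = strcspn(str, ",");   /* length of the next word */

		if (word_is(str, len, "noautodetect"))
			o->noautodetect = 1;
		else if (word_is(str, len, "partitionable") ||
			 word_is(str, len, "part"))
			o->autopart = 1;

		str += len;
		if (*str == ',')
			str++;
	}
}
```

Note that nothing here touches the md= list: the two parameters are
parsed separately, which is why raid=partitionable has no effect on an
array assembled with md=d0,...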

NeilBrown


* Re: Partitioned raid and major number
  2004-03-01  0:16           ` Partitioned raid and major number Neil Brown
@ 2004-03-01  0:42             ` Miquel van Smoorenburg
  2004-03-01  1:09               ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-03-01  0:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On Mon, 01 Mar 2004 01:16:03, Neil Brown wrote:
> On Friday February 27, miquels@cistron.nl wrote:
> > Now how to enable RAID1 on an existing disk... I hoped that I could create
> > an array with /dev/sda and /dev/sdb, with /dev/sdb marked as failed-disk.
> > Because initializing a RAID1 array with just one working disk doesn't
> > destroy the existing contents of the disk, right ? (I kept the last few
> > MB of the disk as free space for the raid superblock).
> > 
> > Unfortunately the current tools (or the kernel) don't let me do that
> > (/dev/sda is busy).
> 
> Yep.  Currently it isn't really possible while sda is mounted.  
> You need to boot off some other media and create the array there.

I hacked on it this weekend. I added a SET_ARRAY_INFO_CONFONLY ioctl
that creates an mddev, but marks it "confonly" internally, meaning you
can configure it and add disks to it, but it can't be started/run.
The confonly array doesn't lock the disks when you add_new_disk().
I also patched raidtools2 to accept --confonly to mkraid, which uses
this new functionality. And that allows me to create a new raid array
on a disk that is currently in use (you still have to use --really-force,
of course).

What do you think of that approach ? Converting from a 1 disk setup
to a 2-disk RAID1 setup on an existing system is something that lots
of people want to do, seeing that the software raid howto even has
a few paragraphs dedicated to it.

Though it works, and I can boot from it, lilo doesn't understand it yet
so I'll have to hack on that next.

But if it works it would probably eventually be possible to add ICH5-R
etc raid1 superblock support to it. Or just write a valid ICH5-R
raid1 superblock to the disk (hopefully at another offset) so that the
BIOS knows this is a RAID1 setup and can boot when sda/hda is dead.

> > Two more minor issues - one, if partitioned MD is on (/dev/md/d0 etc)
> > the standard /dev/md0 device doesn't work anymore. For accessing the whole
> > device (management purposes / tools) shouldn't both /dev/md0 and
> > /dev/md/d0 open the same device ?
> 
> No - they are completely different devices.  Making them appear as one
> device has all sorts of problems relating to confusing bits of code
> that think they have exclusive access.  The idea was appealing but
> wasn't worth the effort.

Hmm yes, I understand. Thanks for explaining all that.

BTW, if you want to boot from a partitionable raid, you need the /dev/root
patch I posted before or you can't check the root filesystem. That patch
I think will not be accepted since stat(/dev/root) and
fd=open(/dev/root);fstat(fd) will return different things, which is
inconsistent. How do you feel about applying for a static device number
for partitioned raid ? Hpa is also on this list, I noticed, and from his
reaction I think it wouldn't be a problem. Also it would be easier for
bootloaders like LILO to detect and deal with this.

Besides, if you check the current devices.txt you'll see that although
we've almost run out of majors that's only true for character devices.
There's plenty, plenty of block majors left.

Mike.


* Re: Partitioned raid and major number
  2004-02-28  1:09           ` Miquel van Smoorenburg
  2004-02-28  7:27             ` Luca Berra
@ 2004-03-01  0:54             ` Neil Brown
  2004-03-01  1:04               ` Miquel van Smoorenburg
                                 ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Neil Brown @ 2004-03-01  0:54 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Saturday February 28, miquels@cistron.nl wrote:
> 
> Hmm. With a dynamic major, the system might fail at checking the root
> file system at boot. At that time, /dev is still read-only, and
> /dev/md/d0p1 might not be the correct device yet.
> 
> So either mdp needs its own major number, or we need a /dev/root
> device that's an alias for the current root (like /dev/console).
> 

Yes, I think this is a real problem.
There are a number of avenues that could be followed to fix it.
One is your suggestion.

Another is to make "rootfs" remountable like this:

--- ./fs/ramfs/inode.c~current~	2004-03-01 11:20:58.000000000 +1100
+++ ./fs/ramfs/inode.c	2004-03-01 11:21:15.000000000 +1100
@@ -207,7 +207,7 @@ static struct super_block *ramfs_get_sb(
 static struct super_block *rootfs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	return get_sb_nodev(fs_type, flags|MS_NOUSER, data, ramfs_fill_super);
+	return get_sb_single(fs_type, flags, data, ramfs_fill_super);
 }
 
 static struct file_system_type ramfs_fs_type = {



And then:

  mount -t rootfs rootfs /mnt/root
  fsck /mnt/root/dev/root

Another is to add "rootdev" to /proc/*, as in appended patch. Then
  ln -s /proc/self/rootdev /dev/root

and providing /proc is mounted, /dev/root will work.

I think I prefer the /proc/self/rootdev approach despite it being the
bigger patch.

I might try to push it on linux-kernel.

NeilBrown



diff ./fs/proc/base.c~current~ ./fs/proc/base.c
--- ./fs/proc/base.c~current~	2004-03-01 11:28:24.000000000 +1100
+++ ./fs/proc/base.c	2004-03-01 11:48:07.000000000 +1100
@@ -50,6 +50,7 @@ enum pid_directory_inos {
 	PROC_TGID_MEM,
 	PROC_TGID_CWD,
 	PROC_TGID_ROOT,
+	PROC_TGID_ROOTDEV,
 	PROC_TGID_EXE,
 	PROC_TGID_FD,
 	PROC_TGID_ENVIRON,
@@ -73,6 +74,7 @@ enum pid_directory_inos {
 	PROC_TID_MEM,
 	PROC_TID_CWD,
 	PROC_TID_ROOT,
+	PROC_TID_ROOTDEV,
 	PROC_TID_EXE,
 	PROC_TID_FD,
 	PROC_TID_ENVIRON,
@@ -115,6 +117,7 @@ static struct pid_entry tgid_base_stuff[
 	E(PROC_TGID_MEM,       "mem",     S_IFREG|S_IRUSR|S_IWUSR),
 	E(PROC_TGID_CWD,       "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_ROOT,      "root",    S_IFLNK|S_IRWXUGO),
+	E(PROC_TGID_ROOTDEV,   "rootdev", S_IFBLK|S_IRUSR|S_IWUSR),
 	E(PROC_TGID_EXE,       "exe",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_MOUNTS,    "mounts",  S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
@@ -137,6 +140,7 @@ static struct pid_entry tid_base_stuff[]
 	E(PROC_TID_MEM,        "mem",     S_IFREG|S_IRUSR|S_IWUSR),
 	E(PROC_TID_CWD,        "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_ROOT,       "root",    S_IFLNK|S_IRWXUGO),
+	E(PROC_TID_ROOTDEV,    "rootdev", S_IFBLK|S_IRUSR|S_IWUSR),
 	E(PROC_TID_EXE,        "exe",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_MOUNTS,     "mounts",  S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
@@ -771,6 +775,32 @@ static struct inode_operations proc_pid_
 	.follow_link	= proc_pid_follow_link
 };
 
+int proc_pid_get_attr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct fs_struct *fs;
+	int result = -ENOENT;
+	generic_fillattr(inode, stat);
+	task_lock(proc_task(inode));
+	fs = proc_task(inode)->fs;
+	if(fs)
+		atomic_inc(&fs->count);
+	task_unlock(proc_task(inode));
+	if (fs) {
+		read_lock(&fs->lock);
+		stat->rdev = fs->pwdmnt->mnt_sb->s_dev;
+		read_unlock(&fs->lock);
+		result = 0;
+		put_fs_struct(fs);
+	}
+
+	return result;
+}
+
+static struct inode_operations proc_pid_dev_inode_operations = {
+	.getattr	= proc_pid_get_attr,
+};
+
 static int pid_alive(struct task_struct *p)
 {
 	BUG_ON(p->pids[PIDTYPE_PID].pidptr != &p->pids[PIDTYPE_PID].pid);
@@ -1319,6 +1349,10 @@ static struct dentry *proc_pident_lookup
 			inode->i_op = &proc_pid_link_inode_operations;
 			ei->op.proc_get_link = proc_root_link;
 			break;
+		case PROC_TID_ROOTDEV:
+		case PROC_TGID_ROOTDEV:
+			inode->i_op = &proc_pid_dev_inode_operations;
+			break;
 		case PROC_TID_ENVIRON:
 		case PROC_TGID_ENVIRON:
 			inode->i_fop = &proc_info_file_operations;



* Re: Partitioned raid and major number
  2004-03-01  0:54             ` Neil Brown
@ 2004-03-01  1:04               ` Miquel van Smoorenburg
  2004-03-02  0:36                 ` H. Peter Anvin
  2004-03-01 15:38               ` Miquel van Smoorenburg
  2004-03-09 15:34               ` /dev/root (was: Re: Partitioned raid and major number) Miquel van Smoorenburg
  2 siblings, 1 reply; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-03-01  1:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On Mon, 01 Mar 2004 01:54:29, Neil Brown wrote:
> On Saturday February 28, miquels@cistron.nl wrote:
> > 
> > Hmm. With a dynamic major, the system might fail at checking the root
> > file system at boot. At that time, /dev is still read-only, and
> > /dev/md/d0p1 might not be the correct device yet.
> > 
> > So either mdp needs its own major number, or we need a /dev/root
> > device that's an alias for the current root (like /dev/console).
> > 
> 
> Yes, I think this is a real problem.
> There are a number of avenues that could be followed to fix it.
> One is your suggestion.

That was a quick hack and as I said, I think it's flawed :)

> Another is to make "rootfs" remountable like this:

POSIX allows different semantics for / and //, I mentioned before that
perhaps we should make "cd //" chdir to rootfs instead of /. Then
you can also have //proc and //sys without explicitly mounting them.
But I don't think the time has come for that yet (besides it needs
more thought wrt namespaces, chroot etc).

> Another is to add "rootdev" to /proc/*, as in appended patch. Then
>   ln -s /proc/self/rootdev /dev/root
> 
> and providing /proc is mounted, /dev/root will work.

I like that approach.

Mike.


* Re: Partitioned raid and major number
  2004-03-01  0:42             ` Miquel van Smoorenburg
@ 2004-03-01  1:09               ` Neil Brown
  2004-03-09 15:32                 ` Creating partitionable raid on existing disk (was: Re: Partitioned raid and major number) Miquel van Smoorenburg
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2004-03-01  1:09 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Monday March 1, miquels@cistron.nl wrote:
> 
> What do you think of that approach ? Converting from a 1 disk setup
> to a 2-disk RAID1 setup on an existing system is something that lots
> of people want to do, seeing that the software raid howto even has
> a few paragraphs dedicated to it.

What I am thinking of doing is allowing:

   md=0,1,/dev/sda

to assemble a raid1 array without a superblock which uses just
/dev/sda.
Then

   md=d0,1,/dev/sda root=/dev/md_d0p1

would boot off md/d0p1 instead of sda1, but it would be the same data.

Then you would be able to add mirrors to this with something like:

  mdadm --grow /dev/md/d0 --disks=2
  mdadm /dev/md/d0 --add /dev/sdb

and you could convert it into an array with a persistent superblock
using:
   mdadm --grow /dev/md/d0 --persistent=yes

The only difficult bit is that setting a persistent superblock means
reducing the size of the device, and I would like it to be hard to do
that in error, but not impossible.

> 
> Though it works, and I can boot from it, lilo doesn't understand it yet
> so I'll have to hack on that next.

I have lilo working well with partitioned raid in 2.4.
I have a stanza in /etc/lilo.conf like:

boot=/dev/Mda
disk=/dev/Mda
 bios=0x80
 sectors=63
 heads=255
 cylinders=1024
 partition=/dev/md/d0p1
   start=1

where /dev/Mda is a symlink to /dev/md/d0, because lilo thinks it
understands device names that start "/dev/md".
The "start=" number is fairly important - lilo cannot or does not
figure this out itself, so you have to tell it.  It is the start of
/dev/md/d0p1 in /dev/md/d0.


> 
> But if it works it would probably eventually be possible to add ICH5-R
> etc raid1 superblock support to it. Or just write a valid ICH5-R
> raid1 superblock to the disk (hopefully at another offset) so that the
> BIOS knows this is a RAID1 setup and can boot when sda/hda is dead.

Not knowing anything about ICH5-R superblocks, I cannot comment, but I
would like to be able to support multiple superblock formats.
> 
> BTW, if you want to boot from a partitionable raid, you need the /dev/root
> patch I posted before or you can't check the root filesystem. That patch
> I think will not be accepted since stat(/dev/root) and
> fd=open(/dev/root);fstat(fd) will return different things, which is
> inconsistent. How do you feel about applying for a static device number
> for partitioned raid ? Hpa is also on this list, I noticed, and from his
> reaction I think it wouldn't be a problem. Also it would be easier for
> bootloaders like LILO to detect and deal with this.
> 
> Besides, if you check the current devices.txt you'll see that although
> we've almost run out of majors that's only true for character devices.
> There's plenty, plenty of block majors left.

I realise that we could get a major allocated, but I would rather not.
As there seems to be a push for dynamic device numbers, I would like
to ride with it and find out all the implications.

NeilBrown


* Re: Partitioned raid and major number
  2004-03-01  0:54             ` Neil Brown
  2004-03-01  1:04               ` Miquel van Smoorenburg
@ 2004-03-01 15:38               ` Miquel van Smoorenburg
  2004-03-09 15:34               ` /dev/root (was: Re: Partitioned raid and major number) Miquel van Smoorenburg
  2 siblings, 0 replies; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-03-01 15:38 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On 2004.03.01 01:54, Neil Brown wrote:
> Another is to add "rootdev" to /proc/*, as in appended patch. Then
>   ln -s /proc/self/rootdev /dev/root
> 
> and providing /proc is mounted, /dev/root will work.
> 
> I think I prefer the /proc/self/rootdev approach despite it being the
> bigger patch.
> 
> I might try to push it on linux-kernel.

It doesn't work. Here's a version that does:

--- linux-2.6.3/fs/proc/base.c	2004-02-18 04:58:32.000000000 +0100
+++ linux-2.6.3-bk8-mdp/fs/proc/base.c	2004-03-01 15:20:22.000000000 +0100
@@ -50,6 +50,7 @@
 	PROC_TGID_MEM,
 	PROC_TGID_CWD,
 	PROC_TGID_ROOT,
+	PROC_TGID_ROOTDEV,
 	PROC_TGID_EXE,
 	PROC_TGID_FD,
 	PROC_TGID_ENVIRON,
@@ -73,6 +74,7 @@
 	PROC_TID_MEM,
 	PROC_TID_CWD,
 	PROC_TID_ROOT,
+	PROC_TID_ROOTDEV,
 	PROC_TID_EXE,
 	PROC_TID_FD,
 	PROC_TID_ENVIRON,
@@ -115,6 +117,7 @@
 	E(PROC_TGID_MEM,       "mem",     S_IFREG|S_IRUSR|S_IWUSR),
 	E(PROC_TGID_CWD,       "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_ROOT,      "root",    S_IFLNK|S_IRWXUGO),
+	E(PROC_TGID_ROOTDEV,   "rootdev", S_IFBLK|S_IRUSR|S_IWUSR),
 	E(PROC_TGID_EXE,       "exe",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_MOUNTS,    "mounts",  S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
@@ -137,6 +140,7 @@
 	E(PROC_TID_MEM,        "mem",     S_IFREG|S_IRUSR|S_IWUSR),
 	E(PROC_TID_CWD,        "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_ROOT,       "root",    S_IFLNK|S_IRWXUGO),
+	E(PROC_TID_ROOTDEV,    "rootdev", S_IFBLK|S_IRUSR|S_IWUSR),
 	E(PROC_TID_EXE,        "exe",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_MOUNTS,     "mounts",  S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
@@ -771,6 +775,32 @@
 	.follow_link	= proc_pid_follow_link
 };
 
+static int init_pid_rootdev_inode(struct inode *inode)
+{
+	struct fs_struct *fs;
+	struct vfsmount *vmnt;
+	int result = -ENOENT;
+	dev_t rootdev = 0;
+
+	task_lock(proc_task(inode));
+	fs = proc_task(inode)->fs;
+	if(fs)
+		atomic_inc(&fs->count);
+	task_unlock(proc_task(inode));
+	if (fs) {
+		read_lock(&fs->lock);
+		vmnt = mntget(fs->rootmnt);
+		rootdev = vmnt->mnt_sb->s_dev;
+		mntput(vmnt);
+		read_unlock(&fs->lock);
+		result = 0;
+		put_fs_struct(fs);
+	}
+	init_special_inode(inode, inode->i_mode, rootdev);
+
+	return result;
+}
+
 static int pid_alive(struct task_struct *p)
 {
 	BUG_ON(p->pids[PIDTYPE_PID].pidptr != &p->pids[PIDTYPE_PID].pid);
@@ -958,7 +988,9 @@
 	ei->type = ino;
 	inode->i_uid = 0;
 	inode->i_gid = 0;
-	if (ino == PROC_TGID_INO || ino == PROC_TID_INO || task_dumpable(task)) {
+	if (ino != PROC_TGID_ROOTDEV && ino != PROC_TID_ROOTDEV &&
+	    (ino == PROC_TGID_INO || ino == PROC_TID_INO ||
+	     task_dumpable(task))) {
 		inode->i_uid = task->euid;
 		inode->i_gid = task->egid;
 	}
@@ -988,7 +1020,10 @@
 	struct inode *inode = dentry->d_inode;
 	struct task_struct *task = proc_task(inode);
 	if (pid_alive(task)) {
-		if (proc_type(inode) == PROC_TGID_INO || proc_type(inode) == PROC_TID_INO || task_dumpable(task)) {
+		int ino = proc_type(inode);
+		if (ino != PROC_TGID_ROOTDEV && ino != PROC_TID_ROOTDEV &&
+		    (ino == PROC_TGID_INO || ino == PROC_TID_INO ||
+		     task_dumpable(task))) {
 			inode->i_uid = task->euid;
 			inode->i_gid = task->egid;
 		} else {
@@ -1319,6 +1354,10 @@
 			inode->i_op = &proc_pid_link_inode_operations;
 			ei->op.proc_get_link = proc_root_link;
 			break;
+		case PROC_TID_ROOTDEV:
+		case PROC_TGID_ROOTDEV:
+			init_pid_rootdev_inode(inode);
+			break;
 		case PROC_TID_ENVIRON:
 		case PROC_TGID_ENVIRON:
 			inode->i_fop = &proc_info_file_operations;



* Re: Partitioned raid and major number
  2004-03-01  1:04               ` Miquel van Smoorenburg
@ 2004-03-02  0:36                 ` H. Peter Anvin
  0 siblings, 0 replies; 23+ messages in thread
From: H. Peter Anvin @ 2004-03-02  0:36 UTC (permalink / raw)
  To: linux-raid

Followup to:  <20040301010419.GK14194@drinkel.cistron.nl>
By author:    Miquel van Smoorenburg <miquels@cistron.nl>
In newsgroup: linux.dev.raid
> 
> POSIX allows different semantics for / and //, I mentioned before that
> perhaps we should make "cd //" chdir to rootfs instead of /. Then
> you can also have //proc and //sys without explicitly mounting them.
> But I don't think the time has come for that yet (besides it needs
> more thought wrt namespaces, chroot etc).
> 

It's allowed, but definitely not recommended.

I do like the idea of making rootfs remountable, though.

> > Another is to add "rootdev" to /proc/*, as in appended patch. Then
> >   ln -s /proc/self/rootdev /dev/root
> > 
> > and providing /proc is mounted, /dev/root will work.
> 
> I like that approach.

.. assuming the rootfs isn't a nodev filesystem.
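
The caveat can be checked from user space. The following is a minimal sketch, assuming Linux with glibc (for major() and minor() in <sys/sysmacros.h>) and assuming that filesystems without a backing block device report an anonymous device with major 0. The names struct root_dev and root_device() are hypothetical and are not part of any patch in this thread.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Device number of the filesystem that holds a path. */
struct root_dev {
    unsigned int maj;
    unsigned int min;
    int anonymous;      /* 1 if there is no backing block device */
};

/*
 * Fill *out with the device number of the filesystem containing path
 * (use "/" for the root filesystem).  Returns 0 on success, -1 if
 * stat() fails.
 */
int root_device(const char *path, struct root_dev *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    out->maj = major(st.st_dev);
    out->min = minor(st.st_dev);
    /* Assumption: nodev filesystems (NFS, tmpfs, ...) get major 0. */
    out->anonymous = (out->maj == 0);
    return 0;
}
```

If root_device("/", &rd) reports rd.anonymous, a block-device node built from that number would not refer to anything that fsck could open.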

	-hpa


* Creating partitionable raid on existing disk (was: Re: Partitioned raid and major number)
  2004-03-01  1:09               ` Neil Brown
@ 2004-03-09 15:32                 ` Miquel van Smoorenburg
  2004-03-10  2:41                   ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-03-09 15:32 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On 2004.03.01 02:09, Neil Brown wrote:
> On Monday March 1, miquels@cistron.nl wrote:
> > 
> > What do you think of that approach ? Converting from a 1 disk setup
> > to a 2-disk RAID1 setup on an existing system is something that lots
> > of people want to do, seeing that the software raid howto even has
> > a few paragraphs dedicated to it.
> 
> What I am thinking of doing is allowing:
> 
>    md=0,1,/dev/sda
> 
> to assemble a raid1 array without a superblock which uses just
> /dev/sda.
> Then
> 
>    md=d0,1,/dev/sda root=/dev/md_d0p1
> 
> would boot off md/d0p1 instead of sda1, but it would be the same data.
> 
> Then you would be able to add mirrors to this with something like:
> 
>   mdadm --grow /dev/md/d0 --disks=2
>   mdadm /dev/md/d0 --add /dev/sdb
> 
> and you could convert it into an array with a persistent superblock
> using:
>    mdadm --grow /dev/md/d0 --persistent=yes
> 
> The only difficult bit is that setting a persistent superblock means
> reducing the size of the device, and I would like it to be hard to do
> that in error, but not impossible.

Any progress on this?

The right thing to do would probably be to check if the device actually has
partitions - refuse to reduce the size of the device if it doesn't have
any partitions, or if any partition overlaps with the MD superblock.
That should be easy enough I think.
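
A rough sketch of the overlap check, assuming the version 0.90 superblock layout, in which the superblock sits in the last 64 KiB-aligned 64 KiB block of the device, and assuming 512-byte sectors. The function names are hypothetical, and the partition table is assumed to have been read elsewhere.

```c
#include <stdint.h>

/* 64 KiB reserved for the superblock, in 512-byte sectors. */
#define MD_RESERVED_SECTORS UINT64_C(128)

/*
 * Compute the sector at which a 0.90 superblock would start on a device
 * of dev_sectors sectors.  Returns 0 on success, -1 if the device is too
 * small to hold a superblock at all.
 */
int md_sb_offset(uint64_t dev_sectors, uint64_t *offset)
{
    if (dev_sectors < MD_RESERVED_SECTORS)
        return -1;
    /* Round down to a 64 KiB boundary, then step back one block. */
    *offset = (dev_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS;
    return 0;
}

/*
 * Returns 1 if the partition [start, start + len) reaches into the
 * superblock area, 0 if it does not, -1 if the device is too small.
 */
int partition_overlaps_sb(uint64_t dev_sectors, uint64_t start, uint64_t len)
{
    uint64_t sb;

    if (md_sb_offset(dev_sectors, &sb) != 0)
        return -1;
    return start + len > sb ? 1 : 0;
}
```

For a device of 1000000 sectors the superblock would start at sector 999808, so a partition starting at sector 63 may be at most 999745 sectors long.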

But it would take changes in both userlevel tools (add --grow to mdadm) and
the kernel, right ? So in the short run, I'll be better off by booting a
Knoppix CD and running mdadm from there, I suppose.

Are you interested in the patch I made to be able to initialize (but not run!)
an array on a disk that is otherwise busy (i.e. can't be bd_claim()ed) ?
Basically it just adds a SET_ARRAY_INFO_CONFONLY ioctl. I have patches for
both the kernel and mkraid, and they're pretty simple.

Mike.


* /dev/root (was: Re: Partitioned raid and major number)
  2004-03-01  0:54             ` Neil Brown
  2004-03-01  1:04               ` Miquel van Smoorenburg
  2004-03-01 15:38               ` Miquel van Smoorenburg
@ 2004-03-09 15:34               ` Miquel van Smoorenburg
  2004-03-10  2:05                 ` Neil Brown
  2 siblings, 1 reply; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-03-09 15:34 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On 2004.03.01 01:54, Neil Brown wrote:
> On Saturday February 28, miquels@cistron.nl wrote:
> > 
> > Hmm. With a dynamic major, the system might fail at checking the root
> > file system at boot. At that time, /dev is still read-only, and
> > /dev/md/d0p1 might not be the correct device yet.
> > 
> > So either mdp needs its own major number, or we need a /dev/root
> > device that's an alias for the current root (like /dev/console).
> > 
> 
> Yes, I think this is a real problem.
> There are a number of avenues that could be followed to fix it.
> One is your suggestion.
> 
> Another is to make "rootfs" remountable
>
> And then:
> 
>   mount -t rootfs rootfs /mnt/root
>   fsck /mnt/root/dev/root
> 
> Another is to add "rootdev" to /proc/*, as in appended patch. Then
>   ln -s /proc/self/rootdev /dev/root
> 
> and providing /proc is mounted, /dev/root will work.
> 
> I might try to push it on linux-kernel.

Mind if I post all three approaches (/dev/root alias device, rootfs, /proc/pid/rootdev)
to linux-kernel and ask input on what approach is the preferred one ?

Mike.


* Booting from partitioned raid,  do_mounts_md.c patch (was: Re: Partitioned raid and major number)
  2004-02-27  1:17       ` Neil Brown
  2004-02-27 16:56         ` Miquel van Smoorenburg
@ 2004-03-09 16:46         ` Miquel van Smoorenburg
  2004-03-10  2:36           ` Neil Brown
  1 sibling, 1 reply; 23+ messages in thread
From: Miquel van Smoorenburg @ 2004-03-09 16:46 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miquel van Smoorenburg, linux-raid

On 2004.02.27 02:17, Neil Brown wrote:
> On Friday February 27, neilb@cse.unsw.edu.au wrote:
> > 
> > Right.  I missed a bit in the patch.
> > (I assume you are still wanting to boot off /dev/sda until you copy
> > the data into /dev/md/d0p* - then you will use root=/dev/md_d0p1)
> 
> Sorry, that patch was wrong.
> This one, on top of the original patch, works for me (I finally got
> around to testing it).
> 
> 
>  ----------- Diffstat output ------------
>  ./drivers/md/md.c     |    2 +-
>  ./init/do_mounts_md.c |    9 ++++++++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff ./drivers/md/md.c~current~ ./drivers/md/md.c
> diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c

I've tested this a lot as well, and it works fine. On my supermicro 1U test
machine I can now pull out one of the two disks, and the machine still boots.
I can even take out disk2, insert disk1 in the slot for disk2 and it
still boots. Which is pretty cool ;)

I didn't see this in -rc2-mm1 yet - is this going to be submitted to -mm soon ?

Mike.


* Re: /dev/root (was: Re: Partitioned raid and major number)
  2004-03-09 15:34               ` /dev/root (was: Re: Partitioned raid and major number) Miquel van Smoorenburg
@ 2004-03-10  2:05                 ` Neil Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2004-03-10  2:05 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Tuesday March 9, miquels@cistron.nl wrote:
> 
> Mind if I post all three approaches (/dev/root alias device, rootfs, /proc/pid/rootdev)
> to linux-kernel and ask input on what approach is the preferred one ?

Go right ahead.

Thanks,
NeilBrown


* Re: Booting from partitioned raid,  do_mounts_md.c patch (was: Re: Partitioned raid and major number)
  2004-03-09 16:46         ` Booting from partitioned raid, do_mounts_md.c patch " Miquel van Smoorenburg
@ 2004-03-10  2:36           ` Neil Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2004-03-10  2:36 UTC (permalink / raw)
  To: Andrew Morton, Miquel van Smoorenburg; +Cc: linux-raid

On Tuesday March 9, miquels@cistron.nl wrote:
> > 
> > Sorry, that patch was wrong.
> > This one, on top of the original patch, works for me (I finally got
> > around to testing it).
> > 
> > 
> >  ----------- Diffstat output ------------
> >  ./drivers/md/md.c     |    2 +-
> >  ./init/do_mounts_md.c |    9 ++++++++-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff ./drivers/md/md.c~current~ ./drivers/md/md.c
> > diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
> 
> I've tested this a lot as well, and it works fine. On my supermicro 1U test
> machine I can now pull out one of the two disks, and the machine still boots.
> I can even take out disk2, insert disk1 in the slot for disk2 and it
> still boots. Which is pretty cool ;)
> 
> I didn't see this in -rc2-mm1 yet - is this going to be submitted to -mm soon ?
> 

Thanks for reminding me.

Andrew: please include this patch which completes the change started
by "md-array-assembly-fix".

Thanks,
NeilBrown


----------------------------------------------------
Make sure correct major is used when assembling partitioned md arrays from boot parameters.

We need to make mdp_major available to do_mounts_md.c, and use it
there.

 ----------- Diffstat output ------------
 ./drivers/md/md.c     |    2 +-
 ./init/do_mounts_md.c |    9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2004-02-28 11:19:17.000000000 +1100
+++ ./drivers/md/md.c	2004-02-28 11:19:17.000000000 +1100
@@ -1450,7 +1450,7 @@ abort:
 	return 1;
 }
 
-static int mdp_major = 0;
+int mdp_major = 0;
 
 static struct kobject *md_probe(dev_t dev, int *part, void *data)
 {

diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
--- ./init/do_mounts_md.c~current~	2004-02-28 11:19:17.000000000 +1100
+++ ./init/do_mounts_md.c	2004-02-28 11:19:17.000000000 +1100
@@ -24,6 +24,7 @@ static struct {
 
 static int md_setup_ents __initdata;
 
+extern int mdp_major;
 /*
  * Parse the command-line parameters given our kernel, but do not
  * actually try to invoke the MD device now; that is handled by
@@ -115,6 +116,8 @@ static int __init md_setup(char *str)
 	return 1;
 }
 
+#define MdpMinorShift 6
+
 static void __init md_setup_drive(void)
 {
 	int minor, i, ent, partitioned;
@@ -134,7 +137,11 @@ static void __init md_setup_drive(void)
 
 		sprintf(name, "/dev/md%s%d", partitioned?"_d":"", minor);
 		sprintf(devfs_name, "/dev/md/%s%d", partitioned?"d":"", minor);
-		create_dev(name, MKDEV(MD_MAJOR, minor), devfs_name);
+		if (partitioned)
+			dev = MKDEV(mdp_major, minor << MdpMinorShift);
+		else
+			dev = MKDEV(MD_MAJOR, minor);
+		create_dev(name, dev, devfs_name);
 		for (i = 0; i < MD_SB_DISKS && devname != 0; i++) {
 			char *p;
 			char comp_name[64];
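
The minor-number arithmetic in this patch can be sketched in user space as follows. This is an illustration only. It assumes that partition N of an array takes the Nth minor after the whole-device minor, as for other partitioned block devices, and that minors are 20 bits wide as in 2.6. The names MDP_MINOR_SHIFT and mdp_minor() are hypothetical stand-ins for MdpMinorShift and the MKDEV() expression above.

```c
#define MDP_MINOR_SHIFT 6    /* mirrors MdpMinorShift in the patch */
#define MINOR_BITS      20   /* assumption: 2.6-style 20-bit minors */

/*
 * Minor number for partition `part` of partitionable array `unit`.
 * part == 0 means the whole array (e.g. /dev/md/d0), part == 1 the first
 * partition (e.g. /dev/md/d0p1), and so on.
 * Returns -1 if the unit or partition number is out of range.
 */
int mdp_minor(int unit, int part)
{
    if (part < 0 || part >= (1 << MDP_MINOR_SHIFT))
        return -1;
    if (unit < 0 || unit >= (1 << (MINOR_BITS - MDP_MINOR_SHIFT)))
        return -1;
    /* Each array owns a block of 64 consecutive minors. */
    return (unit << MDP_MINOR_SHIFT) + part;
}
```

Under these assumptions each array has room for the whole device plus 63 partitions: array d1 starts at minor 64, and its first partition is minor 65.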



* Re: Creating partitionable raid on existing disk (was: Re: Partitioned raid and major number)
  2004-03-09 15:32                 ` Creating partitionable raid on existing disk (was: Re: Partitioned raid and major number) Miquel van Smoorenburg
@ 2004-03-10  2:41                   ` Neil Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2004-03-10  2:41 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-raid

On Tuesday March 9, miquels@cistron.nl wrote:
> On 2004.03.01 02:09, Neil Brown wrote:
> > On Monday March 1, miquels@cistron.nl wrote:
> > > 
> > > What do you think of that approach ? Converting from a 1 disk setup
> > > to a 2-disk RAID1 setup on an existing system is something that lots
> > > of people want to do, seeing that the software raid howto even has
> > > a few paragraphs dedicated to it.
> > 
> > What I am thinking of doing is allowing:
> > 
> >    md=0,1,/dev/sda
> > 
> > to assemble a raid1 array without a superblock which uses just
> > /dev/sda.
...
> 
> Any progress on this?

No, not yet.

> 
> The right thing to do would probably be to check if the device actually has
> partitions - refuse to reduce the size of the device if it doesn't have
> any partitions, or if any partition overlaps with the MD superblock.
> That should be easy enough I think.

Certainly do-able.  But I'm not sure it is completely correct.
Suppose the device isn't partitioned, and has a single filesystem on
it, which is smaller than the whole device.  Why not shrink it then?

I really want some general interface to find out how much of a drive
is in use.  Maybe if "bd_claim" took a parameter which said how much was
being claimed.....

Or maybe claimants should reduce the i_size of the block device.

> 
> But it would take changes in both userlevel tools (add --grow to mdadm) and
> the kernel, right ? So in the short run, I'll be better off by booting a
> Knoppix CD and running mdadm from there, I suppose.

Yes, there is a bit of work to be done before you can use this
approach.

> 
> Are you interested in the patch I made to be able to initialize (but not run!)
> an array on a disk that is otherwise busy (i.e. can't be bd_claim()ed) ?
> Basically it just adds a SET_ARRAY_INFO_CONFONLY ioctl. I have patches for
> both the kernel and mkraid, and they're pretty simple.

Not really.  If it just writes out a superblock, then it can be done
entirely in user-space - no kernel patch should be needed.

NeilBrown


end of thread, other threads:[~2004-03-10  2:41 UTC | newest]

Thread overview: 23+ messages
2004-02-25 14:56 Partitioned raid and major number Miquel van Smoorenburg
2004-02-25 18:46 ` H. Peter Anvin
2004-02-25 23:25 ` Neil Brown
2004-02-26 21:51   ` Miquel van Smoorenburg
2004-02-27  0:21     ` Neil Brown
2004-02-27  1:17       ` Neil Brown
2004-02-27 16:56         ` Miquel van Smoorenburg
2004-02-28  1:09           ` Miquel van Smoorenburg
2004-02-28  7:27             ` Luca Berra
2004-03-01  0:54             ` Neil Brown
2004-03-01  1:04               ` Miquel van Smoorenburg
2004-03-02  0:36                 ` H. Peter Anvin
2004-03-01 15:38               ` Miquel van Smoorenburg
2004-03-09 15:34               ` /dev/root (was: Re: Partitioned raid and major number) Miquel van Smoorenburg
2004-03-10  2:05                 ` Neil Brown
2004-03-01  0:16           ` Partitioned raid and major number Neil Brown
2004-03-01  0:42             ` Miquel van Smoorenburg
2004-03-01  1:09               ` Neil Brown
2004-03-09 15:32                 ` Creating partitionable raid on existing disk (was: Re: Partitioned raid and major number) Miquel van Smoorenburg
2004-03-10  2:41                   ` Neil Brown
2004-03-09 16:46         ` Booting from partitioned raid, do_mounts_md.c patch " Miquel van Smoorenburg
2004-03-10  2:36           ` Neil Brown
  -- strict thread matches above, loose matches on Subject: below --
2004-02-26 23:01 Partitioned raid and major number Michael
