linux-raid.vger.kernel.org archive mirror
* Found a new bug!
  2005-07-17  8:27 ` [PATCH md ] When resizing an array, we need to update resync_max_sectors as well as size NeilBrown
@ 2005-07-17 12:10   ` djani22
  2005-07-17 22:13     ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: djani22 @ 2005-07-17 12:10 UTC (permalink / raw)
  To: linux-raid

Hi all!

I think I found a new bug in the kernel! (Or in mdadm?)

First I tried this:
mkraid --configfile /etc/raidtab.nw /dev/md0 -R
DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
handling MD device /dev/md0
analyzing super-block
couldn't get device size for /dev/md31 -- File too large
mkraid: aborted.
(In addition to the above messages, see the syslog and /proc/mdstat as well
 for potential clues.)

Next I tried this:

./create_linear
mdadm: /dev/md31 appears to be part of a raid array:
    level=0 devices=1 ctime=Sun Jul 17 13:30:27 2005
Continue creating array? y
./create_linear: line 1:  2853 Segmentation fault      mdadm --create
/dev/md0 --chunk=32 --level=linear --force --raid-devices=1 /dev/md31

After this little script, half of the RAID subsystem hangs:

raidtools does nothing, and mdadm does nothing either.
Even cat /proc/mdstat hangs!
But the /dev/md31 device is still working.

/proc/mdstat from 2 seconds earlier (via watch cat /proc/mdstat):

Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [faulty]
md31 : active raid0 md4[3] md3[2] md2[1] md1[0]
      7814332928 blocks 32k chunks

md4 : active raid1 nbd3[0]
      1953583296 blocks [2/1] [U_]

md3 : active raid1 nbd2[0]
      1953583296 blocks [2/1] [U_]

md2 : active raid1 nbd1[0]
      1953583296 blocks [2/1] [U_]

md1 : active raid1 nbd0[0]
      1953583296 blocks [2/1] [U_]

unused devices: <none>

Kernel: 2.6.13-rc3
raidtools-1.00.3
mdadm-1.12.0

The background:
I am trying to build a big array, ~8 TB.

I use 5 PCs for this:
4 as "disk nodes" exported over nbd, and 1 as the "concentrator".
(From a previous idea on this list. ;)
On the concentrator, the first RAID level (md1-md4) gives the ability to
back up and swap the disk nodes. (node-spare)
The next level (md31) is for performance. ;)
And the last level (md0, linear) is for scalability.

Why not use LVM for the last level?
Well, I tried that, but cat /dev/.../LV >/dev/null manages only 15-16 MB/s,
while cat /dev/md31 >/dev/null manages 34-38 MB/s.
(The network is gigabit Ethernet, but only 32-bit/33 MHz PCI!)

Thanks
Janos


----- Original Message -----
From: "NeilBrown" <neilb@cse.unsw.edu.au>
To: "Andrew Morton" <akpm@osdl.org>
Cc: <linux-raid@vger.kernel.org>
Sent: Sunday, July 17, 2005 10:27 AM
Subject: [PATCH md ] When resizing an array, we need to update resync_max_sectors as well as size.


> Another md patch against 2.6.13-rc2-mm2, suitable for 2.6.13.
> Thanks,
> NeilBrown
>
> ### Comments for Changeset
>
> Without this, an attempt to 'grow' an array will claim to have synced
> the extra part without actually having done anything.
>
> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
>
> ### Diffstat output
>  ./drivers/md/raid1.c     |    1 +
>  ./drivers/md/raid5.c     |    1 +
>  ./drivers/md/raid6main.c |    1 +
>  3 files changed, 3 insertions(+)
>
> diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
> --- ./drivers/md/raid1.c~current~ 2005-07-17 18:25:47.000000000 +1000
> +++ ./drivers/md/raid1.c 2005-07-17 17:18:13.000000000 +1000
> @@ -1467,6 +1467,7 @@ static int raid1_resize(mddev_t *mddev,
>   set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   }
>   mddev->size = mddev->array_size;
> + mddev->resync_max_sectors = sectors;
>   return 0;
>  }
>
>
> diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
> --- ./drivers/md/raid5.c~current~ 2005-07-17 18:25:47.000000000 +1000
> +++ ./drivers/md/raid5.c 2005-07-17 18:25:52.000000000 +1000
> @@ -1931,6 +1931,7 @@ static int raid5_resize(mddev_t *mddev,
>   set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   }
>   mddev->size = sectors /2;
> + mddev->resync_max_sectors = sectors;
>   return 0;
>  }
>
>
> diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
> --- ./drivers/md/raid6main.c~current~ 2005-07-17 18:25:47.000000000 +1000
> +++ ./drivers/md/raid6main.c 2005-07-17 17:19:04.000000000 +1000
> @@ -2095,6 +2095,7 @@ static int raid6_resize(mddev_t *mddev,
>   set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   }
>   mddev->size = sectors /2;
> + mddev->resync_max_sectors = sectors;
>   return 0;
>  }
>



* Re: Found a new bug!
  2005-07-17 12:10   ` Found a new bug! djani22
@ 2005-07-17 22:13     ` Neil Brown
  2005-07-17 22:31       ` djani22
  2005-08-14 22:38       ` djani22
  0 siblings, 2 replies; 9+ messages in thread
From: Neil Brown @ 2005-07-17 22:13 UTC (permalink / raw)
  To: djani22; +Cc: linux-raid

On Sunday July 17, djani22@dynamicweb.hu wrote:
> Hi all!
> 
> I think I found a new bug in the kernel! (Or in mdadm?)

Yes.  With the current code you cannot have components of a 'linear'
which are larger than 2^32 sectors.  I'll try to put together a fix
for this in the next day or so.

NeilBrown
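
For reference, the arithmetic of that limit, as a minimal userspace sketch.
This is not the kernel code itself; it assumes the 2.6-era
sector_div()/do_div() behaviour, where the divisor is only 32 bits wide on
platforms without native 64-bit division. The sizes are taken from the
/proc/mdstat above.

#include <stdio.h>
#include <stdint.h>

/* Emulate 2.6-era sector_div(): 64-bit dividend, 32-bit divisor. */
static uint32_t emulated_sector_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

int main(void)
{
	/* md31 from the /proc/mdstat above: 7814332928 1K blocks,
	 * already larger than 2^32 = 4294967296. */
	uint64_t component = 7814332928ULL;
	uint64_t block = 5000000000ULL;	/* a block inside the array */
	uint64_t idx = block;

	/* The divisor silently truncates to 32 bits:
	 * 7814332928 mod 2^32 = 3519365632. */
	uint32_t truncated = (uint32_t)component;

	emulated_sector_div(&idx, truncated);
	printf("divisor used: %u, index: %llu (correct index: %llu)\n",
	       truncated, (unsigned long long)idx,
	       (unsigned long long)(block / component));
	/* Prints index 1 instead of 0: the wrong divisor makes md/linear
	 * index its hash table incorrectly -- a plausible route to the
	 * crash and hang reported above. */
	return 0;
}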


* Re: Found a new bug!
@ 2005-07-17 22:20 djani22
  0 siblings, 0 replies; 9+ messages in thread
From: djani22 @ 2005-07-17 22:20 UTC (permalink / raw)
  To: linux-raid

I have played some more with the 8 TB beast (/dev/md31):

 I tried to put another RAID type on top of it.
raidtools simply refused. ("file too large")

 But mdadm let me build a raid0 from the one big drive! (/dev/md31 ->
/dev/md0)

 But when I tried to fill the XFS filesystem on it, the kernel dropped the
mount point after the first 8-10 GB.
xfs_repair log: http://download.netcenter.hu/raid-bug/xfs.log1

 After xfs_repair, I mounted the md again, and almost every file contains
garbage.
A RAID allocation problem?

 Is it impossible to use >2 TB block devices as RAID components?

Janos




* Re: Found a new bug!
  2005-07-17 22:13     ` Neil Brown
@ 2005-07-17 22:31       ` djani22
  2005-08-14 22:38       ` djani22
  1 sibling, 0 replies; 9+ messages in thread
From: djani22 @ 2005-07-17 22:31 UTC (permalink / raw)
  To: linux-raid


----- Original Message -----
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, July 18, 2005 12:13 AM
Subject: Re: Found a new bug!


> On Sunday July 17, djani22@dynamicweb.hu wrote:
> > Hi all!
> >
> > I think I found a new bug in the kernel! (Or in mdadm?)
>
> Yes.  With the current code you cannot have components of a 'linear'
> which are larger than 2^32 sectors.  I'll try to put together a fix
> for this in the next day or so.
>
> NeilBrown

Thanks for the help!

One more question:

I didn't find a way that is usable for me, but my system must start anyway....

I have created the XFS directly on the 8 TB raid0 (/dev/md31), and the
copy is now running...
Will it be possible in the future to convert it into part of the planned
linear array without backing up all the data?

Thanks.



* Re: Found a new bug!
  2005-07-17 22:13     ` Neil Brown
  2005-07-17 22:31       ` djani22
@ 2005-08-14 22:38       ` djani22
  2005-08-15  1:21         ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: djani22 @ 2005-08-14 22:38 UTC (permalink / raw)
  To: linux-raid

Hello list, Neil!

Is there any news on the 2 TB RAID-component problem?
Sooner or later I will need to join two 8 TB arrays into one big 16 TB one. :-)

Thanks, 

Janos


----- Original Message ----- 
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, July 18, 2005 12:13 AM
Subject: Re: Found a new bug!


> On Sunday July 17, djani22@dynamicweb.hu wrote:
> > Hi all!
> > 
> > I think I found a new bug in the kernel! (Or in mdadm?)
> 
> Yes.  With the current code you cannot have components of a 'linear'
> which are larger than 2^32 sectors.  I'll try to put together a fix
> for this in the next day or so.
> 
> NeilBrown


* Re: Found a new bug!
  2005-08-14 22:38       ` djani22
@ 2005-08-15  1:21         ` Neil Brown
  2005-08-15 10:50           ` djani22
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2005-08-15  1:21 UTC (permalink / raw)
  To: djani22; +Cc: linux-raid

On Monday August 15, djani22@dynamicweb.hu wrote:
> Hello list, Neil!
> 
> Is there any news on the 2 TB RAID-component problem?
> Sooner or later I will need to join two 8 TB arrays into one big 16 TB one. :-)

Thanks for the reminder.

The following patch should work, but my test machine won't boot the
current -mm kernels :-( so it is hard to test properly.

Let me know the results if you are able to test it.

Thanks,
NeilBrown

---------------------------------
Support md/linear array with components greater than 2 terabytes.

linear currently uses division by the size of the smallest component
device to find which device a request goes to.
If that smallest device is larger than 2 terabytes, then the division
will not work on some systems.

So we introduce a pre-shift, and take care not to make the hash table
too large, much like the code in raid0.

Also get rid of conf->nr_zones, which is not needed.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/linear.c         |   99 ++++++++++++++++++++++++++++--------------
 ./include/linux/raid/linear.h |    4 -
 2 files changed, 70 insertions(+), 33 deletions(-)

diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
--- ./drivers/md/linear.c~current~	2005-08-15 11:18:21.000000000 +1000
+++ ./drivers/md/linear.c	2005-08-15 11:18:27.000000000 +1000
@@ -38,7 +38,8 @@ static inline dev_info_t *which_dev(mdde
 	/*
 	 * sector_div(a,b) returns the remainer and sets a to a/b
 	 */
-	(void)sector_div(block, conf->smallest->size);
+	block >>= conf->preshift;
+	(void)sector_div(block, conf->hash_spacing);
 	hash = conf->hash_table[block];
 
 	while ((sector>>1) >= (hash->size + hash->offset))
@@ -47,7 +48,7 @@ static inline dev_info_t *which_dev(mdde
 }
 
 /**
- *	linear_mergeable_bvec -- tell bio layer if a two requests can be merged
+ *	linear_mergeable_bvec -- tell bio layer if two requests can be merged
  *	@q: request queue
  *	@bio: the buffer head that's been built up so far
  *	@biovec: the request that could be merged to it.
@@ -116,7 +117,7 @@ static int linear_run (mddev_t *mddev)
 	dev_info_t **table;
 	mdk_rdev_t *rdev;
 	int i, nb_zone, cnt;
-	sector_t start;
+	sector_t min_spacing;
 	sector_t curr_offset;
 	struct list_head *tmp;
 
@@ -127,11 +128,6 @@ static int linear_run (mddev_t *mddev)
 	memset(conf, 0, sizeof(*conf) + mddev->raid_disks*sizeof(dev_info_t));
 	mddev->private = conf;
 
-	/*
-	 * Find the smallest device.
-	 */
-
-	conf->smallest = NULL;
 	cnt = 0;
 	mddev->array_size = 0;
 
@@ -159,8 +155,6 @@ static int linear_run (mddev_t *mddev)
 		disk->size = rdev->size;
 		mddev->array_size += rdev->size;
 
-		if (!conf->smallest || (disk->size < conf->smallest->size))
-			conf->smallest = disk;
 		cnt++;
 	}
 	if (cnt != mddev->raid_disks) {
@@ -168,6 +162,36 @@ static int linear_run (mddev_t *mddev)
 		goto out;
 	}
 
+	min_spacing = mddev->array_size;
+	sector_div(min_spacing, PAGE_SIZE/sizeof(struct dev_info *));
+
+	/* min_spacing is the minimum spacing that will fit the hash
+	 * table in one PAGE.  This may be much smaller than needed.
+	 * We find the smallest non-terminal set of consecutive devices
+	 * that is larger than min_spacing as use the size of that as
+	 * the actual spacing 
+	 */
+	conf->hash_spacing = mddev->array_size;
+	for (i=0; i < cnt-1 ; i++) {
+		sector_t sz = 0;
+		int j;
+		for (j=i; i<cnt-1 && sz < min_spacing ; j++)
+			sz += conf->disks[j].size;
+		if (sz >= min_spacing && sz < conf->hash_spacing)
+			conf->hash_spacing = sz;
+	}
+
+	/* hash_spacing may be too large for sector_div to work with,
+	 * so we might need to pre-shift 
+	 */
+	conf->preshift = 0;
+	if (sizeof(sector_t) > sizeof(u32)) {
+		sector_t space = conf->hash_spacing;
+		while (space > (sector_t)(~(u32)0)) {
+			space >>= 1;
+			conf->preshift++;
+		}
+	}
 	/*
 	 * This code was restructured to work around a gcc-2.95.3 internal
 	 * compiler error.  Alter it with care.
@@ -177,39 +201,52 @@ static int linear_run (mddev_t *mddev)
 		unsigned round;
 		unsigned long base;
 
-		sz = mddev->array_size;
-		base = conf->smallest->size;
+		sz = mddev->array_size >> conf->preshift;
+		sz += 1; /* force round-up */
+		base = conf->hash_spacing >> conf->preshift;
 		round = sector_div(sz, base);
-		nb_zone = conf->nr_zones = sz + (round ? 1 : 0);
+		nb_zone = sz + (round ? 1 : 0);
 	}
-			
-	conf->hash_table = kmalloc (sizeof (dev_info_t*) * nb_zone,
+	BUG_ON(nb_zone > PAGE_SIZE / sizeof(struct dev_info *));
+
+	conf->hash_table = kmalloc (sizeof (struct dev_info *) * nb_zone,
 					GFP_KERNEL);
 	if (!conf->hash_table)
 		goto out;
 
 	/*
 	 * Here we generate the linear hash table
+	 * First calculate the device offsets.
 	 */
+	conf->disks[0].offset = 0;
+	for (i=1; i<mddev->raid_disks; i++)
+		conf->disks[i].offset =
+			conf->disks[i-1].offset +
+			conf->disks[i-1].size;
+
 	table = conf->hash_table;
-	start = 0;
 	curr_offset = 0;
-	for (i = 0; i < cnt; i++) {
-		dev_info_t *disk = conf->disks + i;
-
-		disk->offset = curr_offset;
-		curr_offset += disk->size;
-
-		/* 'curr_offset' is the end of this disk
-		 * 'start' is the start of table
+	i = 0;
+	for (curr_offset = 0;
+	     curr_offset < mddev->array_size;
+	     curr_offset += conf->hash_spacing) {
+
+		while (i < mddev->raid_disks-1 &&
+		       curr_offset >= conf->disks[i+1].offset)
+			i++;
+
+		*table ++ = conf->disks + i;
+	}
+
+	if (conf->preshift) {
+		conf->hash_spacing >>= conf->preshift;
+		/* round hash_spacing up so that when we divide by it,
+		 * we err on the side of "too-low", which is safest.
 		 */
-		while (start < curr_offset) {
-			*table++ = disk;
-			start += conf->smallest->size;
-		}
+		conf->hash_spacing++;
 	}
-	if (table-conf->hash_table != nb_zone)
-		BUG();
+
+	BUG_ON(table - conf->hash_table > nb_zone);
 
 	blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec);
 	mddev->queue->unplug_fn = linear_unplug;
@@ -299,7 +336,7 @@ static void linear_status (struct seq_fi
 	sector_t s = 0;
   
 	seq_printf(seq, "      ");
-	for (j = 0; j < conf->nr_zones; j++)
+	for (j = 0; j < mddev->raid_disks; j++)
 	{
 		char b[BDEVNAME_SIZE];
 		s += conf->smallest_size;

diff ./include/linux/raid/linear.h~current~ ./include/linux/raid/linear.h
--- ./include/linux/raid/linear.h~current~	2005-08-15 11:18:21.000000000 +1000
+++ ./include/linux/raid/linear.h	2005-08-15 09:13:55.000000000 +1000
@@ -14,8 +14,8 @@ typedef struct dev_info dev_info_t;
 struct linear_private_data
 {
 	dev_info_t		**hash_table;
-	dev_info_t		*smallest;
-	int			nr_zones;
+	sector_t		hash_spacing;
+	int			preshift; /* shift before dividing by hash_spacing */
 	dev_info_t		disks[0];
 };
 

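To see the pre-shift trick above in isolation, here is a self-contained
userspace sketch -- not the kernel code itself, and again assuming a
division helper whose divisor must fit in 32 bits; the sizes are the ones
from this thread.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t hash_spacing = 7814332928ULL;	/* > 2^32, as in md31 */
	uint64_t block = 5000000000ULL;		/* offset to look up */
	int preshift = 0;

	/* Shift the divisor down until it fits in 32 bits... */
	uint64_t space = hash_spacing;
	while (space > (uint64_t)(~(uint32_t)0)) {
		space >>= 1;
		preshift++;
	}

	/* ...and round it up, so the computed index errs on the low
	 * side -- the safe side, matching the comment in the patch. */
	uint32_t divisor = (uint32_t)(hash_spacing >> preshift) + 1;

	/* The lookup pre-shifts the offset by the same amount first. */
	uint64_t idx = (block >> preshift) / divisor;

	printf("preshift=%d divisor=%u index=%llu (exact: %llu)\n",
	       preshift, divisor, (unsigned long long)idx,
	       (unsigned long long)(block / hash_spacing));
	return 0;
}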

* Re: Found a new bug!
  2005-08-15  1:21         ` Neil Brown
@ 2005-08-15 10:50           ` djani22
  2005-08-18  4:34             ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: djani22 @ 2005-08-15 10:50 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Thanks, I will test it when I can...

At the moment my system is a working online system, and the one 8 TB
space is all I can use...
True, maybe I could build a linear array from only one source device, but:
my first problem is that an XFS filesystem, with valuable data that I
can't back up, already exists on my 8 TB device.
That is still OK, but I can't insert a RAID layer, because of the RAID
superblock, and XFS isn't shrinkable. :-(

The only way (I think) is to plug in another raw device and build an array
from the 8 TB device + a new small device, to give the FS enough space.

But that is too risky for me!

Do you think it is safe?

Currently I use 2.6.13-rc3.
Is this patch good for this version, or only for the latest one?

Which is the latest? 2.6.13-rc6, rc6-git7, or 2.6.14 -git cvs? :)

Thanks,

Janos



* Re: Found a new bug!
  2005-08-15 10:50           ` djani22
@ 2005-08-18  4:34             ` Neil Brown
  2005-08-18 15:39               ` djani22
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2005-08-18  4:34 UTC (permalink / raw)
  To: djani22; +Cc: linux-raid

On Monday August 15, djani22@dynamicweb.hu wrote:
> Thanks, I will test it when I can...
> 
> At the moment my system is a working online system, and the one 8 TB
> space is all I can use...
> True, maybe I could build a linear array from only one source device, but:
> my first problem is that an XFS filesystem, with valuable data that I
> can't back up, already exists on my 8 TB device.
> That is still OK, but I can't insert a RAID layer, because of the RAID
> superblock, and XFS isn't shrinkable. :-(
> 
> The only way (I think) is to plug in another raw device and build an array
> from the 8 TB device + a new small device, to give the FS enough space.
> 
> But that is too risky for me!

Yes, I wouldn't bother just for testing.  I've managed to put together
some huge devices with sparse files and multi-layer linear arrays (ext3
won't allow files as big as 2TB) and I am happy that the patch works.

Longer term, I have been thinking of enhancing mdadm so that when you
create a linear array, it copies the few blocks from the end that will
be overwritten by the superblock onto the start of the second
device.  This would allow a single device to be extended into a linear
array without loss.  (I also have patches to hot-add devices to the
end of a linear array which I really should dust off and get into
mainline).
> 
> Do you think it is safe?
> 
> Currently I use 2.6.13-rc3.
> > Is this patch good for this version, or only for the latest one?
> >
> > Which is the latest? 2.6.13-rc6, rc6-git7, or 2.6.14 -git cvs? :)

The patch should be good against any reasonably recent version of
2.6.  I always work against the latest -mm, but this code has been
largely untouched for a while so there shouldn't be any patch
conflicts.

NeilBrown
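
A hedged sketch of the arithmetic behind that mdadm idea, assuming the
0.90 superblock layout of the time: a 64 KiB reserved area at the end of
the device, aligned down to a 64 KiB boundary. The device size, the device
names, and the dd command in the comment are illustrative assumptions, not
tested mdadm behaviour.

#include <stdio.h>
#include <stdint.h>

#define MD_RESERVED_SECTORS 128ULL	/* 64 KiB in 512-byte sectors */

/* Where the 0.90 superblock lands on a device of the given size. */
static uint64_t sb_offset(uint64_t dev_sectors)
{
	return (dev_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS;
}

int main(void)
{
	/* The ~8 TB device from this thread: 7814332928 1K blocks. */
	uint64_t dev_sectors = 7814332928ULL * 2;
	uint64_t off = sb_offset(dev_sectors);

	/* The tail from 'off' onward is what the superblock would
	 * overwrite, so mdadm would first copy it to the start of the
	 * second device -- roughly (hypothetical command):
	 *   dd if=/dev/big of=/dev/second bs=512 skip=OFF count=TAIL
	 */
	printf("superblock at sector %llu, tail of %llu sectors to save\n",
	       (unsigned long long)off,
	       (unsigned long long)(dev_sectors - off));
	return 0;
}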


* Re: Found a new bug!
  2005-08-18  4:34             ` Neil Brown
@ 2005-08-18 15:39               ` djani22
  0 siblings, 0 replies; 9+ messages in thread
From: djani22 @ 2005-08-18 15:39 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

----- Original Message -----
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, August 18, 2005 6:34 AM
Subject: Re: Found a new bug!


> On Monday August 15, djani22@dynamicweb.hu wrote:
> > [...]
> > But that is too risky for me!
>
> Yes, I wouldn't bother just for testing.  I've managed to put together
> some huge devices with sparse files and multi-layer linear arrays (ext3
> won't allow files as big as 2TB) and I am happy that the patch works.
>
> Longer term, I have been thinking of enhancing mdadm so that when you
> create a linear array, it copies the few blocks from the end that will
> be overwritten by the superblock onto the start of the second
> device.  This would allow a single device to be extended into a linear
> array without loss.  (I also have patches to hot-add devices to the
> end of a linear array which I really should dust off and get into
> mainline).

Yes!
That is a very good idea!
I can do it manually with dd, but some people can't.
This, and sometimes the reverse of it, would be a useful option!

In my case:
I added some small HDDs to my big array, to try the patch.
That worked fine.
But later, when I tried to swap the small drive for another big one, there
was no easy way to do it.
When I copy the small drive with dd or cat onto the 2nd big array, the
superblock ends up in the wrong place.
(or not?)


> >
> > Do you think it is safe?
> >
> > Currently I use 2.6.13-rc3.
> > Is this patch good for this version, or only for the latest one?
> >
> > Which is the latest? 2.6.13-rc6, rc6-git7, or 2.6.14 -git cvs? :)
>
> The patch should be good against any reasonably recent version of
> 2.6.  I always work against the latest -mm, but this code has been
> largely untouched for a while so there shouldn't be any patch
> conflicts.

Thanks, I will try it!
But in the last month my system's downtime has been almost greater than its
uptime, and right now I am trying to fix that very bad statistic. :-)


Janos


