From: djani22@dynamicweb.hu
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: linux-raid@vger.kernel.org
Subject: Re: Found a new bug!
Date: Mon, 15 Aug 2005 12:50:58 +0200
Message-ID: <011101c5a187$3bc8cf00$0400a8c0@LocalHost>
In-Reply-To: 17151.60931.26972.713074@cse.unsw.edu.au
Thanks, I will test it when I can...
At the moment my system is a live production system, and I only have one 8TB
volume that I can use...
That's right, maybe I could build a linear array from only one source device, but:
My first problem is that my 8TB device already holds an XFS filesystem with
valuable data that I can't back up.
That part is still OK, but I can't insert a RAID layer underneath it, because
of the RAID superblock, and XFS isn't shrinkable. :-(
The only way (I think) is to plug in another raw device and build an array
from the 8TB device plus the new small device, to give the filesystem more space.
But that is too risky for me!
Do you think it is safe?
Currently I use 2.6.13-rc3.
Does this patch apply to this version, or only to the latest one?
Which is the latest? 2.6.13-rc6, rc6-git7, or 2.6.14 -git cvs? :)
Thanks,
Janos
----- Original Message -----
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, August 15, 2005 3:21 AM
Subject: Re: Found a new bug!
> On Monday August 15, djani22@dynamicweb.hu wrote:
> > Hello list, Neil!
> >
> > Is there any news on the problem with RAID components larger than 2TB?
> > Sooner or later I will need to join two 8TB arrays into one big 16TB array. :-)
>
> Thanks for the reminder.
>
> The following patch should work, but my test machine won't boot the
> current -mm kernels :-( so it is hard to test properly.
>
> Let me know the results if you are able to test it.
>
> Thanks,
> NeilBrown
>
> ---------------------------------
> Support md/linear array with components greater than 2 terabytes.
>
> linear currently uses division by the size of the smallest component
> device to find which device a request goes to.
> If that smallest device is larger than 2 terabytes, then the division
> will not work on some systems.
>
> So we introduce a pre-shift, and take care not to make the hash table
> too large, much like the code in raid0.
>
> Also get rid of conf->nr_zones, which is not needed.
>
> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
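The failure mode described in the patch note above can be illustrated with a small user-space sketch. It assumes, as a model, that the division helper takes a 32-bit divisor, so a wider divisor silently loses its high bits. It then applies the pre-shift idea: shift both the block number and the spacing right until the spacing fits, and round the shifted divisor up so the resulting bucket can only come out too low. The helper names `div_by_u32`, `naive_bucket`, `preshift_for` and `preshift_bucket` are hypothetical and are not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: models a sector_div()-style helper whose divisor is a
 * 32-bit value, so any wider divisor is truncated by the caller. */
static uint64_t div_by_u32(uint64_t n, uint32_t divisor)
{
    assert(divisor != 0);
    return n / divisor;
}

/* What dividing by a >32-bit size effectively does: the high bits of
 * the divisor are lost before the division happens. */
static uint64_t naive_bucket(uint64_t block, uint64_t spacing)
{
    return div_by_u32(block, (uint32_t)spacing);
}

/* Number of right shifts needed before spacing fits in 32 bits.
 * This sketch stops at >= UINT32_MAX (rather than >) so that the +1
 * round-up in preshift_bucket() always still fits in 32 bits. */
static int preshift_for(uint64_t spacing)
{
    int shift = 0;
    while (spacing >= UINT32_MAX) {
        spacing >>= 1;
        shift++;
    }
    return shift;
}

/* Pre-shift both operands; when shifting, round the divisor up so the
 * quotient can only be too low, never too high. */
static uint64_t preshift_bucket(uint64_t block, uint64_t spacing)
{
    int shift = preshift_for(spacing);
    uint64_t divisor = (spacing >> shift) + (shift ? 1 : 0);
    return div_by_u32(block >> shift, (uint32_t)divisor);
}
```

For example, with a spacing of 2^32 + 5 blocks the truncated divisor is 5, so `naive_bucket(2^32 + 10, 2^32 + 5)` returns 858993461 instead of 1, while `preshift_bucket` returns 1. A bucket that comes out too low is harmless in linear.c, because `which_dev()` walks forward from the hashed device until it reaches the one that contains the block.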
>
> ### Diffstat output
> ./drivers/md/linear.c | 99 ++++++++++++++++++++++++++++--------------
> ./include/linux/raid/linear.h | 4 -
> 2 files changed, 70 insertions(+), 33 deletions(-)
>
> diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
> --- ./drivers/md/linear.c~current~ 2005-08-15 11:18:21.000000000 +1000
> +++ ./drivers/md/linear.c 2005-08-15 11:18:27.000000000 +1000
> @@ -38,7 +38,8 @@ static inline dev_info_t *which_dev(mdde
> /*
> * sector_div(a,b) returns the remainer and sets a to a/b
> */
> - (void)sector_div(block, conf->smallest->size);
> + block >>= conf->preshift;
> + (void)sector_div(block, conf->hash_spacing);
> hash = conf->hash_table[block];
>
> while ((sector>>1) >= (hash->size + hash->offset))
> @@ -47,7 +48,7 @@ static inline dev_info_t *which_dev(mdde
> }
>
> /**
> - * linear_mergeable_bvec -- tell bio layer if a two requests can be merged
> + * linear_mergeable_bvec -- tell bio layer if two requests can be merged
> * @q: request queue
> * @bio: the buffer head that's been built up so far
> * @biovec: the request that could be merged to it.
> @@ -116,7 +117,7 @@ static int linear_run (mddev_t *mddev)
> dev_info_t **table;
> mdk_rdev_t *rdev;
> int i, nb_zone, cnt;
> - sector_t start;
> + sector_t min_spacing;
> sector_t curr_offset;
> struct list_head *tmp;
>
> @@ -127,11 +128,6 @@ static int linear_run (mddev_t *mddev)
> memset(conf, 0, sizeof(*conf) + mddev->raid_disks*sizeof(dev_info_t));
> mddev->private = conf;
>
> - /*
> - * Find the smallest device.
> - */
> -
> - conf->smallest = NULL;
> cnt = 0;
> mddev->array_size = 0;
>
> @@ -159,8 +155,6 @@ static int linear_run (mddev_t *mddev)
> disk->size = rdev->size;
> mddev->array_size += rdev->size;
>
> - if (!conf->smallest || (disk->size < conf->smallest->size))
> - conf->smallest = disk;
> cnt++;
> }
> if (cnt != mddev->raid_disks) {
> @@ -168,6 +162,36 @@ static int linear_run (mddev_t *mddev)
> goto out;
> }
>
> + min_spacing = mddev->array_size;
> + sector_div(min_spacing, PAGE_SIZE/sizeof(struct dev_info *));
> +
> + /* min_spacing is the minimum spacing that will fit the hash
> + * table in one PAGE. This may be much smaller than needed.
> + * We find the smallest non-terminal set of consecutive devices
> + * that is larger than min_spacing and use the size of that as
> + * the actual spacing
> + */
> + conf->hash_spacing = mddev->array_size;
> + for (i=0; i < cnt-1 ; i++) {
> + sector_t sz = 0;
> + int j;
> + for (j=i; j<cnt-1 && sz < min_spacing ; j++)
> + sz += conf->disks[j].size;
> + if (sz >= min_spacing && sz < conf->hash_spacing)
> + conf->hash_spacing = sz;
> + }
> +
> + /* hash_spacing may be too large for sector_div to work with,
> + * so we might need to pre-shift
> + */
> + conf->preshift = 0;
> + if (sizeof(sector_t) > sizeof(u32)) {
> + sector_t space = conf->hash_spacing;
> + while (space > (sector_t)(~(u32)0)) {
> + space >>= 1;
> + conf->preshift++;
> + }
> + }
> /*
> * This code was restructured to work around a gcc-2.95.3 internal
> * compiler error. Alter it with care.
> @@ -177,39 +201,52 @@ static int linear_run (mddev_t *mddev)
> unsigned round;
> unsigned long base;
>
> - sz = mddev->array_size;
> - base = conf->smallest->size;
> + sz = mddev->array_size >> conf->preshift;
> + sz += 1; /* force round-up */
> + base = conf->hash_spacing >> conf->preshift;
> round = sector_div(sz, base);
> - nb_zone = conf->nr_zones = sz + (round ? 1 : 0);
> + nb_zone = sz + (round ? 1 : 0);
> }
> -
> - conf->hash_table = kmalloc (sizeof (dev_info_t*) * nb_zone,
> + BUG_ON(nb_zone > PAGE_SIZE / sizeof(struct dev_info *));
> +
> + conf->hash_table = kmalloc (sizeof (struct dev_info *) * nb_zone,
> GFP_KERNEL);
> if (!conf->hash_table)
> goto out;
>
> /*
> * Here we generate the linear hash table
> + * First calculate the device offsets.
> */
> + conf->disks[0].offset = 0;
> + for (i=1; i<mddev->raid_disks; i++)
> + conf->disks[i].offset =
> + conf->disks[i-1].offset +
> + conf->disks[i-1].size;
> +
> table = conf->hash_table;
> - start = 0;
> curr_offset = 0;
> - for (i = 0; i < cnt; i++) {
> - dev_info_t *disk = conf->disks + i;
> -
> - disk->offset = curr_offset;
> - curr_offset += disk->size;
> -
> - /* 'curr_offset' is the end of this disk
> - * 'start' is the start of table
> + i = 0;
> + for (curr_offset = 0;
> + curr_offset < mddev->array_size;
> + curr_offset += conf->hash_spacing) {
> +
> + while (i < mddev->raid_disks-1 &&
> + curr_offset >= conf->disks[i+1].offset)
> + i++;
> +
> + *table ++ = conf->disks + i;
> + }
> +
> + if (conf->preshift) {
> + conf->hash_spacing >>= conf->preshift;
> + /* round hash_spacing up so that when we divide by it,
> + * we err on the side of "too-low", which is safest.
> */
> - while (start < curr_offset) {
> - *table++ = disk;
> - start += conf->smallest->size;
> - }
> + conf->hash_spacing++;
> }
> - if (table-conf->hash_table != nb_zone)
> - BUG();
> +
> + BUG_ON(table - conf->hash_table > nb_zone);
>
> blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec);
> mddev->queue->unplug_fn = linear_unplug;
> @@ -299,7 +336,7 @@ static void linear_status (struct seq_fi
> sector_t s = 0;
>
> seq_printf(seq, " ");
> - for (j = 0; j < conf->nr_zones; j++)
> + for (j = 0; j < mddev->raid_disks; j++)
> {
> char b[BDEVNAME_SIZE];
> s += conf->smallest_size;
>
> diff ./include/linux/raid/linear.h~current~ ./include/linux/raid/linear.h
> --- ./include/linux/raid/linear.h~current~ 2005-08-15 11:18:21.000000000 +1000
> +++ ./include/linux/raid/linear.h 2005-08-15 09:13:55.000000000 +1000
> @@ -14,8 +14,8 @@ typedef struct dev_info dev_info_t;
> struct linear_private_data
> {
> dev_info_t **hash_table;
> - dev_info_t *smallest;
> - int nr_zones;
> + sector_t hash_spacing;
> + int preshift; /* shift before dividing by hash_spacing */
> dev_info_t disks[0];
> };
>
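Taken together, the pieces of the patch can be approximated by a rough user-space model of how the patched `linear_run()` builds the hash table and how `which_dev()` uses it. This is a sketch under stated assumptions, not the kernel code: the struct and function names (`linear_model`, `linear_model_build`, `linear_model_lookup`) and the `MAX_DISKS`/`MAX_ENTRIES` limits are hypothetical, `MAX_ENTRIES` assumes a 4096-byte page holding 8-byte pointers, sizes are in whatever block unit the array uses, and plain 64-bit division stands in for `sector_div()`. The inner spacing loop bounds `j` rather than `i`, so it cannot run past the last device.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DISKS   8
#define MAX_ENTRIES 512  /* assumption: 4096-byte page / 8-byte pointer */

/* Hypothetical user-space model of the patched linear_private_data. */
struct linear_model {
    int cnt;                    /* number of component devices */
    uint64_t size[MAX_DISKS];   /* component sizes, in array blocks */
    uint64_t offset[MAX_DISKS]; /* start of each component in the array */
    uint64_t array_size;
    uint64_t hash_spacing;      /* bucket width (shifted + rounded if preshift) */
    int preshift;               /* shift applied before dividing */
    int nb_zone;                /* number of hash buckets */
    int table[MAX_ENTRIES];     /* component index for each bucket */
};

static void linear_model_build(struct linear_model *c, int max_entries)
{
    uint64_t min_spacing, space, sz, base, curr;
    int i, j, n;

    assert(c->cnt >= 1 && c->cnt <= MAX_DISKS);
    assert(max_entries >= 1 && max_entries <= MAX_ENTRIES);

    /* Device offsets are the running sum of the preceding sizes. */
    c->array_size = 0;
    for (i = 0; i < c->cnt; i++) {
        c->offset[i] = c->array_size;
        c->array_size += c->size[i];
    }
    assert(c->array_size > 0);

    /* Smallest spacing that keeps the table within max_entries slots. */
    min_spacing = c->array_size / max_entries;
    if (min_spacing == 0)
        min_spacing = 1;  /* guard for toy-sized inputs; not in the patch */

    /* Smallest run of consecutive non-final devices >= min_spacing. */
    c->hash_spacing = c->array_size;
    for (i = 0; i < c->cnt - 1; i++) {
        sz = 0;
        for (j = i; j < c->cnt - 1 && sz < min_spacing; j++)
            sz += c->size[j];
        if (sz >= min_spacing && sz < c->hash_spacing)
            c->hash_spacing = sz;
    }

    /* Shift until the divisor fits in 32 bits. */
    c->preshift = 0;
    for (space = c->hash_spacing; space > UINT32_MAX; space >>= 1)
        c->preshift++;

    /* Bucket count, computed the same way as in the patch. */
    sz = (c->array_size >> c->preshift) + 1;
    base = c->hash_spacing >> c->preshift;
    c->nb_zone = (int)(sz / base + (sz % base ? 1 : 0));
    assert(c->nb_zone <= max_entries);

    /* Bucket k starts at block k * hash_spacing; record the device
     * that holds that first block. */
    i = 0;
    n = 0;
    for (curr = 0; curr < c->array_size; curr += c->hash_spacing) {
        while (i < c->cnt - 1 && curr >= c->offset[i + 1])
            i++;
        c->table[n++] = i;
    }
    assert(n <= c->nb_zone);

    /* Keep the shifted spacing, rounded up so lookups err low. */
    if (c->preshift)
        c->hash_spacing = (c->hash_spacing >> c->preshift) + 1;
}

/* Map an array block to a component index, as which_dev() does. */
static int linear_model_lookup(const struct linear_model *c, uint64_t block)
{
    int i = c->table[(block >> c->preshift) / c->hash_spacing];

    /* The bucket may point at an earlier device; walk forward. */
    while (block >= c->offset[i] + c->size[i])
        i++;
    return i;
}
```

With three devices of 100, 50 and 200 blocks and a four-entry table, the spacing comes out as 100, so block 150 hashes to bucket 1 (device 1) and the lookup walks forward to device 2.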