* [PATCH md ] When resizing an array, we need to update resync_max_sectors as well as size. [not found] <20050717182650.24540.patches@notabene> @ 2005-07-17 8:27 ` NeilBrown 2005-07-17 12:10 ` Found a new bug! djani22 0 siblings, 1 reply; 20+ messages in thread From: NeilBrown @ 2005-07-17 8:27 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid Another md patch against 2.6.13-rc2-mm2, suitable for 2.6.13. Thanks, NeilBrown ### Comments for Changeset Without this, an attempt to 'grow' an array will claim to have synced the extra part without actually having done anything. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> ### Diffstat output ./drivers/md/raid1.c | 1 + ./drivers/md/raid5.c | 1 + ./drivers/md/raid6main.c | 1 + 3 files changed, 3 insertions(+) diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c --- ./drivers/md/raid1.c~current~ 2005-07-17 18:25:47.000000000 +1000 +++ ./drivers/md/raid1.c 2005-07-17 17:18:13.000000000 +1000 @@ -1467,6 +1467,7 @@ static int raid1_resize(mddev_t *mddev, set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); } mddev->size = mddev->array_size; + mddev->resync_max_sectors = sectors; return 0; } diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c --- ./drivers/md/raid5.c~current~ 2005-07-17 18:25:47.000000000 +1000 +++ ./drivers/md/raid5.c 2005-07-17 18:25:52.000000000 +1000 @@ -1931,6 +1931,7 @@ static int raid5_resize(mddev_t *mddev, set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); } mddev->size = sectors /2; + mddev->resync_max_sectors = sectors; return 0; } diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c --- ./drivers/md/raid6main.c~current~ 2005-07-17 18:25:47.000000000 +1000 +++ ./drivers/md/raid6main.c 2005-07-17 17:19:04.000000000 +1000 @@ -2095,6 +2095,7 @@ static int raid6_resize(mddev_t *mddev, set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); } mddev->size = sectors /2; + mddev->resync_max_sectors = sectors; return 0; } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Found a new bug! 2005-07-17 8:27 ` [PATCH md ] When resizing an array, we need to update resync_max_sectors as well as size NeilBrown @ 2005-07-17 12:10 ` djani22 2005-07-17 22:13 ` Neil Brown 0 siblings, 1 reply; 20+ messages in thread From: djani22 @ 2005-07-17 12:10 UTC (permalink / raw) To: linux-raid

Hi all!

I think I found a new bug in the kernel! (or mdadm?)

First I tried this:

mkraid --configfile /etc/raidtab.nw /dev/md0 -R
DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
handling MD device /dev/md0
analyzing super-block
couldn't get device size for /dev/md31 -- File too large
mkraid: aborted.
(In addition to the above messages, see the syslog and /proc/mdstat as well for potential clues.)

Next I tried this:

./create_linear
mdadm: /dev/md31 appears to be part of a raid array:
    level=0 devices=1 ctime=Sun Jul 17 13:30:27 2005
Continue creating array? y
./create_linear: line 1: 2853 Segmentation fault mdadm --create /dev/md0 --chunk=32 --level=linear --force --raid-devices=1 /dev/md31

After this little script, half of the raid subsystem hangs: raidtools does nothing, mdadm does nothing either, AND even cat /proc/mdstat hangs! But the /dev/md31 device is still working.

mdstat from the previous 2s (watch cat /proc/mdstat):

Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [faulty]
md31 : active raid0 md4[3] md3[2] md2[1] md1[0]
      7814332928 blocks 32k chunks
md4 : active raid1 nbd3[0]
      1953583296 blocks [2/1] [U_]
md3 : active raid1 nbd2[0]
      1953583296 blocks [2/1] [U_]
md2 : active raid1 nbd1[0]
      1953583296 blocks [2/1] [U_]
md1 : active raid1 nbd0[0]
      1953583296 blocks [2/1] [U_]
unused devices: <none>

Kernel: 2.6.13-rc3
raidtools-1.00.3
mdadm-1.12.0

The background: I am trying to build a big array, ~8TB. I use 5 PCs for this: 4 as "disk nodes" with nbd and 1 as the "concentrator". (from a previous idea on this list ;) In the concentrator, the first raid level (md1-4) is there for the ability to back up and swap the disk nodes (node-spare). The next level (md31) is for performance. ;) And the last level (md0 linear) is for scalability.

Why not use LVM for the last level? Well, I tried that, but cat /dev/.../LV >/dev/null can do only 15-16 MB/s, while cat /dev/md31 >/dev/null can do 34-38 MB/s. (the network is G-Ethernet, but only 32bit/33MHz PCI!)

Thanks
Janos

----- Original Message -----
From: "NeilBrown" <neilb@cse.unsw.edu.au>
To: "Andrew Morton" <akpm@osdl.org>
Cc: <linux-raid@vger.kernel.org>
Sent: Sunday, July 17, 2005 10:27 AM
Subject: [PATCH md ] When resizing an array, we need to update resync_max_sectors as well as size.

> Another md patch against 2.6.13-rc2-mm2, suitable for 2.6.13.
> Thanks,
> NeilBrown
>
> ### Comments for Changeset
>
> Without this, and attempt to 'grow' an array will claim to have synced
> the extra part without actually having done anything.
> > Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> > > ### Diffstat output > ./drivers/md/raid1.c | 1 + > ./drivers/md/raid5.c | 1 + > ./drivers/md/raid6main.c | 1 + > 3 files changed, 3 insertions(+) > > diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c > --- ./drivers/md/raid1.c~current~ 2005-07-17 18:25:47.000000000 +1000 > +++ ./drivers/md/raid1.c 2005-07-17 17:18:13.000000000 +1000 > @@ -1467,6 +1467,7 @@ static int raid1_resize(mddev_t *mddev, > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > } > mddev->size = mddev->array_size; > + mddev->resync_max_sectors = sectors; > return 0; > } > > > diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c > --- ./drivers/md/raid5.c~current~ 2005-07-17 18:25:47.000000000 +1000 > +++ ./drivers/md/raid5.c 2005-07-17 18:25:52.000000000 +1000 > @@ -1931,6 +1931,7 @@ static int raid5_resize(mddev_t *mddev, > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > } > mddev->size = sectors /2; > + mddev->resync_max_sectors = sectors; > return 0; > } > > > diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c > --- ./drivers/md/raid6main.c~current~ 2005-07-17 18:25:47.000000000 +1000 > +++ ./drivers/md/raid6main.c 2005-07-17 17:19:04.000000000 +1000 > @@ -2095,6 +2095,7 @@ static int raid6_resize(mddev_t *mddev, > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > } > mddev->size = sectors /2; > + mddev->resync_max_sectors = sectors; > return 0; > } > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Found a new bug! 2005-07-17 12:10 ` Found a new bug! djani22 @ 2005-07-17 22:13 ` Neil Brown 2005-07-17 22:31 ` djani22 2005-08-14 22:38 ` djani22 0 siblings, 2 replies; 20+ messages in thread From: Neil Brown @ 2005-07-17 22:13 UTC (permalink / raw) To: djani22; +Cc: linux-raid On Sunday July 17, djani22@dynamicweb.hu wrote: > Hi all! > > I think I found a new bug in the kernel ! (or mdadm?) Yes. With the current code you cannot have components of a 'linear' which are larger than 2^32 sectors. I'll try to put together a fix for this in the next day or so. NeilBrown ^ permalink raw reply [flat|nested] 20+ messages in thread
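To make the limit concrete, here is a minimal userspace sketch of the failure mode (the sizes below are made up, and a plain division by a u32 stands in for the kernel's sector_div()/do_div(), whose divisor is only 32 bits wide on the affected systems). Once a component's size no longer fits in 32 bits, the division in linear's which_dev() silently truncates the divisor and the hash lookup picks the wrong slot:

/* Userspace sketch, not kernel code: the component and offset are
 * invented values; a u32 divisor models sector_div()'s 32-bit divisor. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t component = 6ULL << 30;   /* a ~6TB component, in 1K blocks as md counts them */
	uint64_t block     = 5ULL << 30;   /* an offset that should land in hash slot 0 */
	uint32_t truncated = (uint32_t)component;   /* what a 32-bit divisor actually sees */

	printf("correct hash index:   %llu\n", (unsigned long long)(block / component));
	printf("truncated hash index: %llu\n", (unsigned long long)(block / truncated));
	return 0;
}

With these numbers the correct index is 0, but the truncated divisor yields 2, so the lookup walks off into the wrong part of the hash table.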
* Re: Found a new bug! 2005-07-17 22:13 ` Neil Brown @ 2005-07-17 22:31 ` djani22 0 siblings, 0 replies; 20+ messages in thread From: djani22 @ 2005-07-17 22:31 UTC (permalink / raw) To: linux-raid

----- Original Message -----
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, July 18, 2005 12:13 AM
Subject: Re: Found a new bug!

> On Sunday July 17, djani22@dynamicweb.hu wrote:
> > Hi all!
> >
> > I think I found a new bug in the kernel ! (or mdadm?)
>
> Yes. With the current code you cannot have components of a 'linear'
> which are larger than 2^32 sectors. I'll try to put together a fix
> for this in the next day or so.
>
> NeilBrown

Thanks for the help!

One more question: I didn't find a usable way for me, but my system must start anyway.... I have created the XFS directly on the 8TB raid0 (/dev/md31), and the copy is now running... Is it possible in the future to convert it to be part of the planned linear array without backing up all the data?

Thanks.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Found a new bug! 2005-07-17 22:13 ` Neil Brown 2005-07-17 22:31 ` djani22 @ 2005-08-14 22:38 ` djani22 2005-08-15 1:21 ` Neil Brown 1 sibling, 1 reply; 20+ messages in thread From: djani22 @ 2005-08-14 22:38 UTC (permalink / raw) To: linux-raid

Hello list, Neil!

Is there any news on the 2TB raid component problem? Sooner or later, I will need to join two 8TB arrays into one big 16TB one. :-)

Thanks,
Janos

----- Original Message -----
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, July 18, 2005 12:13 AM
Subject: Re: Found a new bug!

> On Sunday July 17, djani22@dynamicweb.hu wrote:
> > Hi all!
> >
> > I think I found a new bug in the kernel ! (or mdadm?)
>
> Yes. With the current code you cannot have components of a 'linear'
> which are larger than 2^32 sectors. I'll try to put together a fix
> for this in the next day or so.
>
> NeilBrown
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Found a new bug! 2005-08-14 22:38 ` djani22 @ 2005-08-15 1:21 ` Neil Brown 2005-08-15 10:50 ` djani22 0 siblings, 1 reply; 20+ messages in thread From: Neil Brown @ 2005-08-15 1:21 UTC (permalink / raw) To: djani22; +Cc: linux-raid On Monday August 15, djani22@dynamicweb.hu wrote: > Hello list, Neil! > > Is there something news with the 2TB raid-input problem? > Sooner or later, I will need to join two 8TB array to one big 16TB. :-) Thanks for the reminder. The following patch should work, but my test machine won't boot the current -mm kernels :-( so it is hard to test properly. Let me know the results if you are able to test it. Thanks, NeilBrown --------------------------------- Support md/linear array with components greater than 2 terabytes. linear currently uses division by the size of the smallest componenet device to find which device a request goes to. If that smallest device is larger than 2 terabytes, then the division will not work on some systems. So we introduce a pre-shift, and take care not to make the hash table too large, much like the code in raid0. Also get rid of conf->nr_zones, which is not needed. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> ### Diffstat output ./drivers/md/linear.c | 99 ++++++++++++++++++++++++++++-------------- ./include/linux/raid/linear.h | 4 - 2 files changed, 70 insertions(+), 33 deletions(-) diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c --- ./drivers/md/linear.c~current~ 2005-08-15 11:18:21.000000000 +1000 +++ ./drivers/md/linear.c 2005-08-15 11:18:27.000000000 +1000 @@ -38,7 +38,8 @@ static inline dev_info_t *which_dev(mdde /* * sector_div(a,b) returns the remainer and sets a to a/b */ - (void)sector_div(block, conf->smallest->size); + block >>= conf->preshift; + (void)sector_div(block, conf->hash_spacing); hash = conf->hash_table[block]; while ((sector>>1) >= (hash->size + hash->offset)) @@ -47,7 +48,7 @@ static inline dev_info_t *which_dev(mdde } /** - * linear_mergeable_bvec -- tell bio layer if a two requests can be merged + * linear_mergeable_bvec -- tell bio layer if two requests can be merged * @q: request queue * @bio: the buffer head that's been built up so far * @biovec: the request that could be merged to it. @@ -116,7 +117,7 @@ static int linear_run (mddev_t *mddev) dev_info_t **table; mdk_rdev_t *rdev; int i, nb_zone, cnt; - sector_t start; + sector_t min_spacing; sector_t curr_offset; struct list_head *tmp; @@ -127,11 +128,6 @@ static int linear_run (mddev_t *mddev) memset(conf, 0, sizeof(*conf) + mddev->raid_disks*sizeof(dev_info_t)); mddev->private = conf; - /* - * Find the smallest device. - */ - - conf->smallest = NULL; cnt = 0; mddev->array_size = 0; @@ -159,8 +155,6 @@ static int linear_run (mddev_t *mddev) disk->size = rdev->size; mddev->array_size += rdev->size; - if (!conf->smallest || (disk->size < conf->smallest->size)) - conf->smallest = disk; cnt++; } if (cnt != mddev->raid_disks) { @@ -168,6 +162,36 @@ static int linear_run (mddev_t *mddev) goto out; } + min_spacing = mddev->array_size; + sector_div(min_spacing, PAGE_SIZE/sizeof(struct dev_info *)); + + /* min_spacing is the minimum spacing that will fit the hash + * table in one PAGE. This may be much smaller than needed. 
+ * We find the smallest non-terminal set of consecutive devices + * that is larger than min_spacing as use the size of that as + * the actual spacing + */ + conf->hash_spacing = mddev->array_size; + for (i=0; i < cnt-1 ; i++) { + sector_t sz = 0; + int j; + for (j=i; i<cnt-1 && sz < min_spacing ; j++) + sz += conf->disks[j].size; + if (sz >= min_spacing && sz < conf->hash_spacing) + conf->hash_spacing = sz; + } + + /* hash_spacing may be too large for sector_div to work with, + * so we might need to pre-shift + */ + conf->preshift = 0; + if (sizeof(sector_t) > sizeof(u32)) { + sector_t space = conf->hash_spacing; + while (space > (sector_t)(~(u32)0)) { + space >>= 1; + conf->preshift++; + } + } /* * This code was restructured to work around a gcc-2.95.3 internal * compiler error. Alter it with care. @@ -177,39 +201,52 @@ static int linear_run (mddev_t *mddev) unsigned round; unsigned long base; - sz = mddev->array_size; - base = conf->smallest->size; + sz = mddev->array_size >> conf->preshift; + sz += 1; /* force round-up */ + base = conf->hash_spacing >> conf->preshift; round = sector_div(sz, base); - nb_zone = conf->nr_zones = sz + (round ? 1 : 0); + nb_zone = sz + (round ? 1 : 0); } - - conf->hash_table = kmalloc (sizeof (dev_info_t*) * nb_zone, + BUG_ON(nb_zone > PAGE_SIZE / sizeof(struct dev_info *)); + + conf->hash_table = kmalloc (sizeof (struct dev_info *) * nb_zone, GFP_KERNEL); if (!conf->hash_table) goto out; /* * Here we generate the linear hash table + * First calculate the device offsets. */ + conf->disks[0].offset = 0; + for (i=1; i<mddev->raid_disks; i++) + conf->disks[i].offset = + conf->disks[i-1].offset + + conf->disks[i-1].size; + table = conf->hash_table; - start = 0; curr_offset = 0; - for (i = 0; i < cnt; i++) { - dev_info_t *disk = conf->disks + i; - - disk->offset = curr_offset; - curr_offset += disk->size; - - /* 'curr_offset' is the end of this disk - * 'start' is the start of table + i = 0; + for (curr_offset = 0; + curr_offset < mddev->array_size; + curr_offset += conf->hash_spacing) { + + while (i < mddev->raid_disks-1 && + curr_offset >= conf->disks[i+1].offset) + i++; + + *table ++ = conf->disks + i; + } + + if (conf->preshift) { + conf->hash_spacing >>= conf->preshift; + /* round hash_spacing up so that when we divide by it, + * we err on the side of "too-low", which is safest. */ - while (start < curr_offset) { - *table++ = disk; - start += conf->smallest->size; - } + conf->hash_spacing++; } - if (table-conf->hash_table != nb_zone) - BUG(); + + BUG_ON(table - conf->hash_table > nb_zone); blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec); mddev->queue->unplug_fn = linear_unplug; @@ -299,7 +336,7 @@ static void linear_status (struct seq_fi sector_t s = 0; seq_printf(seq, " "); - for (j = 0; j < conf->nr_zones; j++) + for (j = 0; j < mddev->raid_disks; j++) { char b[BDEVNAME_SIZE]; s += conf->smallest_size; diff ./include/linux/raid/linear.h~current~ ./include/linux/raid/linear.h --- ./include/linux/raid/linear.h~current~ 2005-08-15 11:18:21.000000000 +1000 +++ ./include/linux/raid/linear.h 2005-08-15 09:13:55.000000000 +1000 @@ -14,8 +14,8 @@ typedef struct dev_info dev_info_t; struct linear_private_data { dev_info_t **hash_table; - dev_info_t *smallest; - int nr_zones; + sector_t hash_spacing; + int preshift; /* shift before dividing by hash_spacing */ dev_info_t disks[0]; }; ^ permalink raw reply [flat|nested] 20+ messages in thread
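The heart of the patch is the pre-shift before the 32-bit division. A rough userspace model of the idea (a sketch, not the kernel code; div32() is a hypothetical stand-in for sector_div(), and the spacing/offset values are invented):

#include <stdio.h>
#include <stdint.h>

/* Hypothetical stand-in for sector_div(): a division whose divisor
 * must fit in 32 bits. */
static uint64_t div32(uint64_t n, uint32_t base)
{
	return n / base;
}

static uint64_t hash_index(uint64_t block, uint64_t hash_spacing, int *preshift)
{
	uint32_t spacing;

	*preshift = 0;
	while ((hash_spacing >> *preshift) >= 0xffffffffULL)
		(*preshift)++;             /* shift until the divisor fits in 32 bits */

	spacing = (uint32_t)(hash_spacing >> *preshift);
	if (*preshift)
		spacing++;                 /* round up, so we only ever err on the low side */

	return div32(block >> *preshift, spacing);
}

int main(void)
{
	uint64_t spacing = 6ULL << 30;     /* made-up hash_spacing, too big for a u32 */
	uint64_t block   = 13ULL << 30;
	int preshift;
	uint64_t idx = hash_index(block, spacing, &preshift);

	printf("preshift=%d index=%llu (exact: %llu)\n", preshift,
	       (unsigned long long)idx, (unsigned long long)(block / spacing));
	return 0;
}

Because the shifted divisor is rounded up, the computed index can only come out too low, which is why which_dev() then walks forward until the sector really falls inside the hashed disk.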
* Re: Found a new bug! 2005-08-15 1:21 ` Neil Brown @ 2005-08-15 10:50 ` djani22 2005-08-16 13:54 ` perfomance question djani22 2005-08-18 4:34 ` Found a new bug! Neil Brown 0 siblings, 2 replies; 20+ messages in thread From: djani22 @ 2005-08-15 10:50 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid Thanks, I will test it, when I can... In this moment, my system is an working online system, and now only one 8TB space what I can use... Thats right, maybe I can built linear array from only one soure device,but: My first problem is, on my 8TB device is already exists XFS filesystem, with valuable data, what I can't backup. It is still OK, but I can't insert one raid layer, because the raid's superblock, and the XFS is'nt shrinkable. :-( The only one way (I think) to plug in another raw device, and build an array from 8TB-device + new small device, to get much space to FS. But it is too risky for me! Do you think it is safe? Currently I use 2.6.13-rc3. This patch is good for this version, or only the last version? Witch is the last? 2.6.13-rc6 or rc6-git7, or 2.6.14 -git cvs? :) Thanks, Janos ----- Original Message ----- From: "Neil Brown" <neilb@cse.unsw.edu.au> To: <djani22@dynamicweb.hu> Cc: <linux-raid@vger.kernel.org> Sent: Monday, August 15, 2005 3:21 AM Subject: Re: Found a new bug! > On Monday August 15, djani22@dynamicweb.hu wrote: > > Hello list, Neil! > > > > Is there something news with the 2TB raid-input problem? > > Sooner or later, I will need to join two 8TB array to one big 16TB. :-) > > Thanks for the reminder. > > The following patch should work, but my test machine won't boot the > current -mm kernels :-( so it is hard to test properly. > > Let me know the results if you are able to test it. > > Thanks, > NeilBrown > > --------------------------------- > Support md/linear array with components greater than 2 terabytes. > > linear currently uses division by the size of the smallest componenet > device to find which device a request goes to. > If that smallest device is larger than 2 terabytes, then the division > will not work on some systems. > > So we introduce a pre-shift, and take care not to make the hash table > too large, much like the code in raid0. > > Also get rid of conf->nr_zones, which is not needed. > > Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> > > ### Diffstat output > ./drivers/md/linear.c | 99 ++++++++++++++++++++++++++++-------------- > ./include/linux/raid/linear.h | 4 - > 2 files changed, 70 insertions(+), 33 deletions(-) > > diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c > --- ./drivers/md/linear.c~current~ 2005-08-15 11:18:21.000000000 +1000 > +++ ./drivers/md/linear.c 2005-08-15 11:18:27.000000000 +1000 > @@ -38,7 +38,8 @@ static inline dev_info_t *which_dev(mdde > /* > * sector_div(a,b) returns the remainer and sets a to a/b > */ > - (void)sector_div(block, conf->smallest->size); > + block >>= conf->preshift; > + (void)sector_div(block, conf->hash_spacing); > hash = conf->hash_table[block]; > > while ((sector>>1) >= (hash->size + hash->offset)) > @@ -47,7 +48,7 @@ static inline dev_info_t *which_dev(mdde > } > > /** > - * linear_mergeable_bvec -- tell bio layer if a two requests can be merged > + * linear_mergeable_bvec -- tell bio layer if two requests can be merged > * @q: request queue > * @bio: the buffer head that's been built up so far > * @biovec: the request that could be merged to it. 
> @@ -116,7 +117,7 @@ static int linear_run (mddev_t *mddev) > dev_info_t **table; > mdk_rdev_t *rdev; > int i, nb_zone, cnt; > - sector_t start; > + sector_t min_spacing; > sector_t curr_offset; > struct list_head *tmp; > > @@ -127,11 +128,6 @@ static int linear_run (mddev_t *mddev) > memset(conf, 0, sizeof(*conf) + mddev->raid_disks*sizeof(dev_info_t)); > mddev->private = conf; > > - /* > - * Find the smallest device. > - */ > - > - conf->smallest = NULL; > cnt = 0; > mddev->array_size = 0; > > @@ -159,8 +155,6 @@ static int linear_run (mddev_t *mddev) > disk->size = rdev->size; > mddev->array_size += rdev->size; > > - if (!conf->smallest || (disk->size < conf->smallest->size)) > - conf->smallest = disk; > cnt++; > } > if (cnt != mddev->raid_disks) { > @@ -168,6 +162,36 @@ static int linear_run (mddev_t *mddev) > goto out; > } > > + min_spacing = mddev->array_size; > + sector_div(min_spacing, PAGE_SIZE/sizeof(struct dev_info *)); > + > + /* min_spacing is the minimum spacing that will fit the hash > + * table in one PAGE. This may be much smaller than needed. > + * We find the smallest non-terminal set of consecutive devices > + * that is larger than min_spacing as use the size of that as > + * the actual spacing > + */ > + conf->hash_spacing = mddev->array_size; > + for (i=0; i < cnt-1 ; i++) { > + sector_t sz = 0; > + int j; > + for (j=i; i<cnt-1 && sz < min_spacing ; j++) > + sz += conf->disks[j].size; > + if (sz >= min_spacing && sz < conf->hash_spacing) > + conf->hash_spacing = sz; > + } > + > + /* hash_spacing may be too large for sector_div to work with, > + * so we might need to pre-shift > + */ > + conf->preshift = 0; > + if (sizeof(sector_t) > sizeof(u32)) { > + sector_t space = conf->hash_spacing; > + while (space > (sector_t)(~(u32)0)) { > + space >>= 1; > + conf->preshift++; > + } > + } > /* > * This code was restructured to work around a gcc-2.95.3 internal > * compiler error. Alter it with care. > @@ -177,39 +201,52 @@ static int linear_run (mddev_t *mddev) > unsigned round; > unsigned long base; > > - sz = mddev->array_size; > - base = conf->smallest->size; > + sz = mddev->array_size >> conf->preshift; > + sz += 1; /* force round-up */ > + base = conf->hash_spacing >> conf->preshift; > round = sector_div(sz, base); > - nb_zone = conf->nr_zones = sz + (round ? 1 : 0); > + nb_zone = sz + (round ? 1 : 0); > } > - > - conf->hash_table = kmalloc (sizeof (dev_info_t*) * nb_zone, > + BUG_ON(nb_zone > PAGE_SIZE / sizeof(struct dev_info *)); > + > + conf->hash_table = kmalloc (sizeof (struct dev_info *) * nb_zone, > GFP_KERNEL); > if (!conf->hash_table) > goto out; > > /* > * Here we generate the linear hash table > + * First calculate the device offsets. 
> */ > + conf->disks[0].offset = 0; > + for (i=1; i<mddev->raid_disks; i++) > + conf->disks[i].offset = > + conf->disks[i-1].offset + > + conf->disks[i-1].size; > + > table = conf->hash_table; > - start = 0; > curr_offset = 0; > - for (i = 0; i < cnt; i++) { > - dev_info_t *disk = conf->disks + i; > - > - disk->offset = curr_offset; > - curr_offset += disk->size; > - > - /* 'curr_offset' is the end of this disk > - * 'start' is the start of table > + i = 0; > + for (curr_offset = 0; > + curr_offset < mddev->array_size; > + curr_offset += conf->hash_spacing) { > + > + while (i < mddev->raid_disks-1 && > + curr_offset >= conf->disks[i+1].offset) > + i++; > + > + *table ++ = conf->disks + i; > + } > + > + if (conf->preshift) { > + conf->hash_spacing >>= conf->preshift; > + /* round hash_spacing up so that when we divide by it, > + * we err on the side of "too-low", which is safest. > */ > - while (start < curr_offset) { > - *table++ = disk; > - start += conf->smallest->size; > - } > + conf->hash_spacing++; > } > - if (table-conf->hash_table != nb_zone) > - BUG(); > + > + BUG_ON(table - conf->hash_table > nb_zone); > > blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec); > mddev->queue->unplug_fn = linear_unplug; > @@ -299,7 +336,7 @@ static void linear_status (struct seq_fi > sector_t s = 0; > > seq_printf(seq, " "); > - for (j = 0; j < conf->nr_zones; j++) > + for (j = 0; j < mddev->raid_disks; j++) > { > char b[BDEVNAME_SIZE]; > s += conf->smallest_size; > > diff ./include/linux/raid/linear.h~current~ ./include/linux/raid/linear.h > --- ./include/linux/raid/linear.h~current~ 2005-08-15 11:18:21.000000000 +1000 > +++ ./include/linux/raid/linear.h 2005-08-15 09:13:55.000000000 +1000 > @@ -14,8 +14,8 @@ typedef struct dev_info dev_info_t; > struct linear_private_data > { > dev_info_t **hash_table; > - dev_info_t *smallest; > - int nr_zones; > + sector_t hash_spacing; > + int preshift; /* shift before dividing by hash_spacing */ > dev_info_t disks[0]; > }; > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* perfomance question. 2005-08-15 10:50 ` djani22 @ 2005-08-16 13:54 ` djani22 2005-08-16 14:30 ` RAID6 Query Colonel Hell 2005-08-18 4:59 ` perfomance question Neil Brown 0 siblings, 2 replies; 20+ messages in thread From: djani22 @ 2005-08-16 13:54 UTC (permalink / raw) To: linux-raid

Hello list,

I have a performance problem. (again) :-)

What chunk size is better in raid5 and raid0? Lots of small chunks, or fewer bigger ones? I know it depends on the FS, but I am asking only about the raid code! Which is better for reads, and which for writes?

Thanks
Janos

^ permalink raw reply [flat|nested] 20+ messages in thread
* RAID6 Query 2005-08-16 13:54 ` perfomance question djani22 @ 2005-08-16 14:30 ` Colonel Hell 2005-08-16 15:40 ` dean gaudet 0 siblings, 1 reply; 20+ messages in thread From: Colonel Hell @ 2005-08-16 14:30 UTC (permalink / raw) To: linux-raid

Hi,

I just went through a couple of papers describing RAID6. I don't know how relevant this discussion group is for the query ... but here I go :) ... I couldn't figure out why the P+Q configuration is better than P+Q' where Q' == P. What I mean is, instead of calculating a new checksum (through a lot of GF theory etc.), just store the parity block (P) again. In that case don't we have the same amount of fault tolerance? :-s ...

Let me know; here are the links which I went through.
http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
http://www.intel.com/design/storage/papers/308122.htm

Regards,
Amritanshu.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: RAID6 Query 2005-08-16 14:30 ` RAID6 Query Colonel Hell @ 2005-08-16 15:40 ` dean gaudet 2005-08-16 16:44 ` Colonel Hell 0 siblings, 1 reply; 20+ messages in thread From: dean gaudet @ 2005-08-16 15:40 UTC (permalink / raw) To: Colonel Hell; +Cc: linux-raid On Tue, 16 Aug 2005, Colonel Hell wrote: > I just went thru a couple of papers describing RAID6. > I dunno how relevant this discussion grp is for the qry ...but here I go :) ... > I couldnt figure out why is P+Q configuration better over P+q' where > q' == P. What I mean is instead of calculating a new checksum (thru a > lot of GF theory etc) just store the parity block (P)again. In this > case as well we have the same amount of fault tolerance or not > :-s ... this is no better than raid5 at surviving a two disk failure. i.e. consider the case of two data blocks missing -- you can't reconstruct if all you have is parity. -dean ^ permalink raw reply [flat|nested] 20+ messages in thread
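A toy illustration of the point (a hedged sketch with made-up data bytes, not the kernel's optimized raid6 code; the arithmetic assumes the usual GF(2^8) field with the 0x11d polynomial and generator {02} described in the papers linked above). P plus a Reed-Solomon Q give two independent equations, so two missing data blocks can be solved for; a second copy of P would only repeat the first equation:

#include <stdio.h>
#include <stdint.h>

/* GF(2^8) multiply, reducing by the 0x11d polynomial (0x1d once the
 * overflow bit is dropped). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t p = 0;
	while (b) {
		if (b & 1)
			p ^= a;
		b >>= 1;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
	}
	return p;
}

static uint8_t gf_pow(uint8_t a, int n)
{
	uint8_t r = 1;
	while (n--)
		r = gf_mul(r, a);
	return r;
}

static uint8_t gf_inv(uint8_t a)   /* brute force; fine for a demo */
{
	int i;
	for (i = 1; i < 256; i++)
		if (gf_mul(a, (uint8_t)i) == 1)
			return (uint8_t)i;
	return 0;
}

int main(void)
{
	uint8_t d[4] = { 0xde, 0xad, 0xbe, 0xef };   /* one byte per data disk */
	uint8_t P = 0, Q = 0;
	int i, x = 1, y = 3;                         /* pretend disks 1 and 3 die */
	uint8_t Pxy, Qxy, gx, gy, Dx, Dy;

	for (i = 0; i < 4; i++) {
		P ^= d[i];                           /* P = sum D_i        */
		Q ^= gf_mul(gf_pow(2, i), d[i]);     /* Q = sum g^i * D_i  */
	}

	Pxy = P;                                     /* fold in the survivors */
	Qxy = Q;
	for (i = 0; i < 4; i++) {
		if (i == x || i == y)
			continue;
		Pxy ^= d[i];
		Qxy ^= gf_mul(gf_pow(2, i), d[i]);
	}

	/* Two equations, two unknowns:  Dx ^ Dy = Pxy,  g^x*Dx ^ g^y*Dy = Qxy.
	 * With Q' == P we would only have the first equation, twice. */
	gx = gf_pow(2, x);
	gy = gf_pow(2, y);
	Dy = gf_mul(gf_inv(gx ^ gy), (uint8_t)(gf_mul(gx, Pxy) ^ Qxy));
	Dx = Pxy ^ Dy;

	printf("recovered %02x %02x (expected %02x %02x)\n", Dx, Dy, d[x], d[y]);
	return 0;
}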
* Re: RAID6 Query 2005-08-16 15:40 ` dean gaudet @ 2005-08-16 16:44 ` Colonel Hell 0 siblings, 0 replies; 20+ messages in thread From: Colonel Hell @ 2005-08-16 16:44 UTC (permalink / raw) To: dean gaudet; +Cc: linux-raid thanks and sorry for a stupid qry suffering from foot-in-the-mouth disease :P On 8/16/05, dean gaudet <dean-list-linux-raid@arctic.org> wrote: > On Tue, 16 Aug 2005, Colonel Hell wrote: > > > I just went thru a couple of papers describing RAID6. > > I dunno how relevant this discussion grp is for the qry ...but here I go :) ... > > I couldnt figure out why is P+Q configuration better over P+q' where > > q' == P. What I mean is instead of calculating a new checksum (thru a > > lot of GF theory etc) just store the parity block (P)again. In this > > case as well we have the same amount of fault tolerance or not > > :-s ... > > this is no better than raid5 at surviving a two disk failure. i.e. > consider the case of two data blocks missing -- you can't reconstruct if > all you have is parity. > > -dean > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: perfomance question. 2005-08-16 13:54 ` perfomance question djani22 2005-08-16 14:30 ` RAID6 Query Colonel Hell @ 2005-08-18 4:59 ` Neil Brown 2005-08-18 15:20 ` djani22 1 sibling, 1 reply; 20+ messages in thread From: Neil Brown @ 2005-08-18 4:59 UTC (permalink / raw) To: djani22; +Cc: linux-raid

On Tuesday August 16, djani22@dynamicweb.hu wrote:
> Hello list,
>
> I have performance problem. (again) :-)
>
> What chunk size is better in raid5, and raid0?
> The lot of small chunks, or some bigger?

This is highly dependent on workload and hardware performance. The best thing to do is develop a test that simulates your real workload, run it with various stripe sizes, and see which one wins.

I suspect there would be very little gain in going to very small chunk sizes (<16k). Anywhere between there and 1Meg is worth trying.

mdadm uses a default of 64k, which is probably not too bad for most situations, but I cannot promise that it is optimal for any.

Sorry I cannot be more helpful.

Your performance problem may not be chunk-size related. Maybe increasing the readahead (with blockdev) would help...

NeilBrown

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: perfomance question. 2005-08-18 4:59 ` perfomance question Neil Brown @ 2005-08-18 15:20 ` djani22 0 siblings, 0 replies; 20+ messages in thread From: djani22 @ 2005-08-18 15:20 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid

Thanks for trying to help me!

My problem is solved, it looks like. It was a kernel problem. (I think...) When I switched to 2.6.13-rc6 (from rc3), the problem was gone!

It is very interesting! I use SWRAID to distribute the load equally across the nodes. (raid0, chunk size 32k) On my system with 2.6.13-rc3, "node-3" got many more (4x - 5x) read requests, but I don't know why, don't ask! :-)

First I thought the XFS log was somehow always on the 3rd chunk. I sent this question to the XFS list too, and got this answer: "The XFS log is always write, except recoverying." - That's right! The next idea was to break the 32k chunks up further, and that is why I sent the previous letter here. But I had more problems (a network-layer bug) with 13-rc3, tried the newer kernel, and the problem was gone. :-) It looks like some network issue.

Thanks
Janos

----- Original Message -----
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, August 18, 2005 6:59 AM
Subject: Re: perfomance question.

> On Tuesday August 16, djani22@dynamicweb.hu wrote:
> > Hello list,
> >
> > I have performance problem. (again) :-)
> >
> > What chunk size is better in raid5, and raid0?
> > The lot of small chunks, or some bigger?
>
> This is highly dependant one workload and hardware performance.
> The best thing to do is develop a test that simulates your real
> workload and run it with various stripe sizes, and see which one wins.
>
> I suspect there would be very little gain in going to very small chunk
> sizes (<16k). Anywhere between there and 1Meg is worth trying.
>
> mdadm uses a default of 64k which is probably not too bad for most
> situations, but I cannot promise it being optimal for any.
>
> Sorry I cannot be more helpful.
>
> Your performance problem may not be chunk-size related. Maybe
> increasing the readahead (with blockdev) would help...
>
> NeilBrown
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Found a new bug! 2005-08-15 10:50 ` djani22 2005-08-16 13:54 ` perfomance question djani22 @ 2005-08-18 4:34 ` Neil Brown 2005-08-18 15:39 ` djani22 1 sibling, 1 reply; 20+ messages in thread From: Neil Brown @ 2005-08-18 4:34 UTC (permalink / raw) To: djani22; +Cc: linux-raid On Monday August 15, djani22@dynamicweb.hu wrote: > Thanks, I will test it, when I can... > > In this moment, my system is an working online system, and now only one 8TB > space what I can use... > Thats right, maybe I can built linear array from only one soure device,but: > My first problem is, on my 8TB device is already exists XFS filesystem, with > valuable data, what I can't backup. > It is still OK, but I can't insert one raid layer, because the raid's > superblock, and the XFS is'nt shrinkable. :-( > > The only one way (I think) to plug in another raw device, and build an array > from 8TB-device + new small device, to get much space to FS. > > But it is too risky for me! Yes, I wouldn't bother just for testing. I've managed to put together some huge devices with spare files and multi-layer linear arrays (ext3 won't allow files as big as 2TB) and I am happy that the patch works. Longer term, I have been thinking of enhancing mdadm so that when you create a linear array, it copies the few blocks from the end that will be over written by the superblock onto the start of the second device. This would allow a single device to be extended into a linear array without loss. (I also have patches to hot-add devices to the end of a linear array which I really should dust-off and get into mainline). > > Do you think it is safe? > > Currently I use 2.6.13-rc3. > This patch is good for this version, or only the last version? > > Witch is the last? 2.6.13-rc6 or rc6-git7, or 2.6.14 -git cvs? :) The patch should be good against any reasonable recent version of 2.6. I always work against the latest -mm, but this code has been largely untouched for a while so there shouldn't be any patch conflicts. NeilBrown ^ permalink raw reply [flat|nested] 20+ messages in thread
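For reference, a small sketch of where that superblock sits on a component, which is exactly the region mdadm would have to copy to the start of the second device (this assumes the classic 0.90 layout, i.e. a superblock written at the start of the last 64KiB-aligned 64KiB of the device; the device size used here is just md31's block count from the earlier report, converted to sectors):

#include <stdio.h>
#include <stdint.h>

#define MD_RESERVED_BYTES   (64 * 1024)
#define MD_RESERVED_SECTORS (MD_RESERVED_BYTES / 512)

/* Start of the 64K region reserved for a 0.90 superblock, in 512-byte
 * sectors (an assumption based on the usual 0.90 placement, not copied
 * from the md sources). */
static uint64_t sb_offset_sectors(uint64_t dev_sectors)
{
	return (dev_sectors & ~(uint64_t)(MD_RESERVED_SECTORS - 1))
		- MD_RESERVED_SECTORS;
}

int main(void)
{
	uint64_t dev = 7814332928ULL * 2;   /* md31: 1K blocks -> sectors */

	printf("superblock region starts at sector %llu of %llu\n",
	       (unsigned long long)sb_offset_sectors(dev),
	       (unsigned long long)dev);
	return 0;
}

Whatever the filesystem keeps in that final region is what would have to be relocated onto the new member, so the filesystem still sees one contiguous device.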
* Re: Found a new bug! 2005-08-18 4:34 ` Found a new bug! Neil Brown @ 2005-08-18 15:39 ` djani22 2005-08-20 9:55 ` Oops in raid1? djani22 0 siblings, 1 reply; 20+ messages in thread From: djani22 @ 2005-08-18 15:39 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid ----- Original Message ----- From: "Neil Brown" <neilb@cse.unsw.edu.au> To: <djani22@dynamicweb.hu> Cc: <linux-raid@vger.kernel.org> Sent: Thursday, August 18, 2005 6:34 AM Subject: Re: Found a new bug! > On Monday August 15, djani22@dynamicweb.hu wrote: > > Thanks, I will test it, when I can... > > > > In this moment, my system is an working online system, and now only one 8TB > > space what I can use... > > Thats right, maybe I can built linear array from only one soure device,but: > > My first problem is, on my 8TB device is already exists XFS filesystem, with > > valuable data, what I can't backup. > > It is still OK, but I can't insert one raid layer, because the raid's > > superblock, and the XFS is'nt shrinkable. :-( > > > > The only one way (I think) to plug in another raw device, and build an array > > from 8TB-device + new small device, to get much space to FS. > > > > But it is too risky for me! > > Yes, I wouldn't bother just for testing. I've managed to put together > some huge devices with spare files and multi-layer linear arrays (ext3 > won't allow files as big as 2TB) and I am happy that the patch works. > > Longer term, I have been thinking of enhancing mdadm so that when you > create a linear array, it copies the few blocks from the end that will > be over written by the superblock onto the start of the second > device. This would allow a single device to be extended into a linear > array without loss. (I also have patches to hot-add devices to the > end of a linear array which I really should dust-off and get into > mainline). Yes! This is very good idea! I can do that manually with dd, but some people can't. This, and sometimes reverse of this is a usefull options! In my case: I add some small HDD to my big array, to try the patch. Thats ok. But later, when I try to change the small to another big, there is no easy way, to do this. When I copy the small drive with dd or cat to 2nd big array, the superblock is wrong placed. (or not?) > > > > Do you think it is safe? > > > > Currently I use 2.6.13-rc3. > > This patch is good for this version, or only the last version? > > > > Witch is the last? 2.6.13-rc6 or rc6-git7, or 2.6.14 -git cvs? :) > > The patch should be good against any reasonable recent version of > 2.6. I always work against the latest -mm, but this code has been > largely untouched for a while so there shouldn't be any patch > conflicts. Thanks, I will try it! But in the last month my system's downtime is almost more than uptime, and now I try to fix this very bad stat. :-) Janos ^ permalink raw reply [flat|nested] 20+ messages in thread
* Oops in raid1? 2005-08-18 15:39 ` djani22 @ 2005-08-20 9:55 ` djani22 2005-08-20 15:53 ` Pallai Roland 0 siblings, 1 reply; 20+ messages in thread From: djani22 @ 2005-08-20 9:55 UTC (permalink / raw) To: linux-raid [-- Attachment #1: Type: text/plain, Size: 3895 bytes --] Hello list, Neil! I found this, bud don't know what is this exactly... It is not look like the *NBD's deadlock. :-/ Neil! It is the "original" 2.6.13-rc6, not with your patch! Only with two mods, what I get from netdev list, and attached to this letter.... Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] Unable to handle kernel paging request at virtual address a014d7a5 Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] printing eip: Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] c0118cee Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] *pde = f7bedd02 Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] Oops: 0000 [#1] Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] SMP Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] Modules linked in: netconsole gnbd Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] CPU: 0 Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] EIP: 0060:[<c0118cee>] Not tainted VLI Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] EFLAGS: 00010296 (2.6.13-rc6) Aug 20 01:07:23 192.168.2.50 kernel: [42992885.040000] EIP is at kmap+0x1e/0x54 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] eax: 00000246 ebx: a014d7a5 ecx: c11ef260 edx: cabbc400 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] esi: 00008000 edi: 00000001 ebp: f6c7fe00 esp: f6c7fdf4 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] ds: 007b es: 007b ss: 0068 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] Process md3_raid1 (pid: 2769, threadinfo=f6c7e000 task=f7eef020) Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] Stack: c0577800 00000006 f5f93cfc f6c7fe54 f895a9cc a014d7a5 00000001 c f793000 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] 00001000 00004000 d3fc3180 f73e9bf0 f895e718 cabbc400 007ea037 0 1000000 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] d4175a4c f895e6f0 65000000 00f03d8d 00100000 d4175a4c f895e6f0 f 895e700 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] Call Trace: Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0103ca2>] show_stack+0x9a/0xd0 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0103e6d>] show_registers+0x175/0x209 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c010408c>] die+0xfa/0x17c Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0117b68>] do_page_fault+0x269/0x7bd Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c01038d7>] error_code+0x4f/0x54 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<f895a9cc>] __gnbd_send_req+0x196/0x28d [gnbd] Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<f895af12>] do_gnbd_request+0xe5/0x198 [gnbd] Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0383a0d>] __generic_unplug_device+0x28/0x2e Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c038150f>] __elv_add_request+0xaa/0xac Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0384e5b>] __make_request+0x20d/0x512 Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0385490>] generic_make_request+0xb2/0x27a Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c04748a2>] raid1d+0xbf/0x2cb Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c04825c9>] md_thread+0x134/0x16f Aug 20 01:07:24 192.168.2.50 kernel: 
[42992885.040000] [<c01010d5>] kernel_thread_helper+0x5/0xb Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] Code: 89 c1 81 e1 ff ff 0f 00 eb b0 90 90 90 55 89 e5 53 83 ec 08 8b 5d 08 c7 44 24 04 06 00 00 00 c7 04 24 00 78 57 c0 e8 72 47 00 00 <8b> 03 c1 e8 1e 8b 14 85 14 db 73 c0 8b 82 0c 04 00 00 05 00 09 Aug 20 01:07:24 192.168.2.50 Fatal exception: panic in 5 seconds Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] <0>Fatal exception: panic in 5 seconds Aug 20 01:07:27 192.168.2.50 [42992890.060000] Kernel panic - not syncing: Fatal exception Janos [-- Attachment #2: p.txt --] [-- Type: text/plain, Size: 567 bytes --] diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1474,6 +1474,10 @@ static void tcp_mark_head_lost(struct so int cnt = packets; BUG_TRAP(cnt <= tp->packets_out); + if (unlikely(cnt > tp->packets_out)) { + printk("packets_out = %d, fackets_out = %d, reordering = %d, sack_ok = 0x%x, mss_cache=%d\n", tp->packets_out, tp->fackets_out, tp->reordering, tp->rx_opt.sack_ok, tp->mss_cache); + dump_stack(); + } sk_stream_for_retrans_queue(skb, sk) { cnt -= tcp_skb_pcount(skb); [-- Attachment #3: fix.txt --] [-- Type: text/plain, Size: 854 bytes --] diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1370,15 +1370,21 @@ int tcp_retransmit_skb(struct sock *sk, if (skb->len > cur_mss) { int old_factor = tcp_skb_pcount(skb); - int new_factor; + int diff; if (tcp_fragment(sk, skb, cur_mss, cur_mss)) return -ENOMEM; /* We'll try again later. */ /* New SKB created, account for it. */ - new_factor = tcp_skb_pcount(skb); - tp->packets_out -= old_factor - new_factor; - tp->packets_out += tcp_skb_pcount(skb->next); + diff = old_factor - tcp_skb_pcount(skb) - + tcp_skb_pcount(skb->next); + tp->packets_out -= diff; + + if (diff > 0) { + tp->fackets_out -= diff; + if ((int)tp->fackets_out < 0) + tp->fackets_out = 0; + } } /* Collapse two adjacent packets if worthwhile and we can. */ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Oops in raid1? 2005-08-20 9:55 ` Oops in raid1? djani22 @ 2005-08-20 15:53 ` Pallai Roland 2005-08-20 16:26 ` djani22 0 siblings, 1 reply; 20+ messages in thread From: Pallai Roland @ 2005-08-20 15:53 UTC (permalink / raw) To: djani22; +Cc: linux-raid Hi, On Sat, 2005-08-20 at 11:55 +0200, djani22@dynamicweb.hu wrote: > I found this, bud don't know what is this exactly... > It is not look like the *NBD's deadlock. :-/ it's exactly a GNBD bug, imho > [...] > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] Process md3_raid1 > (pid: 2769, threadinfo=f6c7e000 task=f7eef020) > [...] > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0117b68>] > do_page_fault+0x269/0x7bd > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c01038d7>] > error_code+0x4f/0x54 > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<f895a9cc>] > __gnbd_send_req+0x196/0x28d [gnbd] > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<f895af12>] > do_gnbd_request+0xe5/0x198 [gnbd] > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0383a0d>] > __generic_unplug_device+0x28/0x2e > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c038150f>] > __elv_add_request+0xaa/0xac > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0384e5b>] > __make_request+0x20d/0x512 > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0385490>] > generic_make_request+0xb2/0x27a > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c04748a2>] > raid1d+0xbf/0x2cb > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c04825c9>] > md_thread+0x134/0x16f > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c01010d5>] > kernel_thread_helper+0x5/0xb -- dap ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Oops in raid1? 2005-08-20 15:53 ` Pallai Roland @ 2005-08-20 16:26 ` djani22 2005-08-20 16:50 ` Pallai Roland 0 siblings, 1 reply; 20+ messages in thread From: djani22 @ 2005-08-20 16:26 UTC (permalink / raw) To: Pallai Roland; +Cc: linux-raid ----- Original Message ----- From: "Pallai Roland" <dap@mail.index.hu> To: <djani22@dynamicweb.hu> Cc: <linux-raid@vger.kernel.org> Sent: Saturday, August 20, 2005 5:53 PM Subject: Re: Oops in raid1? > > Hi, > > On Sat, 2005-08-20 at 11:55 +0200, djani22@dynamicweb.hu wrote: > > I found this, bud don't know what is this exactly... > > It is not look like the *NBD's deadlock. :-/ > it's exactly a GNBD bug, imho Hmmm. Possibly... I get this message, when high upload. (disk write) But the GNBD generates this, in that situation, and thats why I think, this is something else... Jul 17 23:05:10 dy-base kernel: ------------[ cut here ]------------ Jul 17 23:05:10 dy-base kernel: kernel BUG at mm/highmem.c:183! Jul 17 23:05:10 dy-base kernel: invalid operand: 0000 [#1] Jul 17 23:05:10 dy-base kernel: PREEMPT SMP Jul 17 23:05:10 dy-base kernel: Modules linked in: gnbd Jul 17 23:05:10 dy-base kernel: CPU: 0 Jul 17 23:05:10 dy-base kernel: EIP: 0060:[<c0155aff>] Tainted: G B VLI Jul 17 23:05:10 dy-base kernel: EFLAGS: 00010246 (2.6.13-rc3-plus-NFS) Jul 17 23:05:10 dy-base kernel: EIP is at kunmap_high+0x1f/0xa0 Jul 17 23:05:10 dy-base kernel: eax: 00000000 ebx: c1a98cc0 ecx: c1a98cc0 edx: 00000202 Jul 17 23:05:10 dy-base kernel: esi: dc9f0900 edi: 00000000 ebp: d5a1e600 esp: ee6c3e74 Jul 17 23:05:10 dy-base kernel: ds: 007b es: 007b ss: 0068 Jul 17 23:05:10 dy-base kernel: Process md4_raid1 (pid: 15185, threadinfo=ee6c2000 task=d224e020) Jul 17 23:05:10 dy-base kernel: Stack: c1a98cc0 00001000 f883fa7e c1a98cc0 00000001 c009c000 00001000 00000000 Jul 17 23:05:10 dy-base kernel: 40e38500 003d2431 007ea037 01000000 e593104c ee6c2000 5d000000 0000faff Jul 17 23:05:10 dy-base kernel: 00200000 c055b2c6 e593104c f8842d08 f8842d18 f6e7abf0 f884001b f8842d08 Jul 17 23:05:10 dy-base kernel: Call Trace: ---> Jul 17 23:05:10 dy-base kernel: [<f883fa7e>] __gnbd_send_req+0x15e/0x280 [gnbd] Jul 17 23:05:10 dy-base kernel: [<c055b2c6>] preempt_schedule+0x56/0x80 Jul 17 23:05:10 dy-base kernel: [<f884001b>] do_gnbd_request+0xeb/0x1a0 [gnbd] Jul 17 23:05:10 dy-base kernel: [<c03788f6>] __generic_unplug_device+0x36/0x40 Jul 17 23:05:10 dy-base kernel: [<c037891e>] generic_unplug_device+0x1e/0x30 Jul 17 23:05:10 dy-base kernel: [<c0461018>] unplug_slaves+0xe8/0x100 Jul 17 23:05:10 dy-base kernel: [<c0462405>] raid1d+0x205/0x2a0 Jul 17 23:05:10 dy-base kernel: [<c0470919>] md_thread+0x159/0x1a0 Jul 17 23:05:10 dy-base kernel: [<c0137370>] autoremove_wake_function+0x0/0x60 Jul 17 23:05:10 dy-base kernel: [<c01030d2>] ret_from_fork+0x6/0x14 Jul 17 23:05:10 dy-base kernel: [<c0137370>] autoremove_wake_function+0x0/0x60 Jul 17 23:05:10 dy-base kernel: [<c04707c0>] md_thread+0x0/0x1a0 Jul 17 23:05:10 dy-base kernel: [<c0101205>] kernel_thread_helper+0x5/0x10 Jul 17 23:05:10 dy-base kernel: Code: ff 8d 74 26 00 8d bc 27 00 00 00 00 83 ec 08 89 5c 24 04 89 c3 b8 80 10 6d c0 e8 0d 68 4 Jul 17 23:05:10 dy-base kernel: <6>note: md4_raid1[15185] exited with preempt_count 1 Thanks Janos > > > [...] > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] Process md3_raid1 > > (pid: 2769, threadinfo=f6c7e000 task=f7eef020) > > [...] 
> > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0117b68>] > > do_page_fault+0x269/0x7bd > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c01038d7>] > > error_code+0x4f/0x54 > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<f895a9cc>] > > __gnbd_send_req+0x196/0x28d [gnbd] > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<f895af12>] > > do_gnbd_request+0xe5/0x198 [gnbd] > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0383a0d>] > > __generic_unplug_device+0x28/0x2e > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c038150f>] > > __elv_add_request+0xaa/0xac > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0384e5b>] > > __make_request+0x20d/0x512 > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c0385490>] > > generic_make_request+0xb2/0x27a > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c04748a2>] > > raid1d+0xbf/0x2cb > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c04825c9>] > > md_thread+0x134/0x16f > > Aug 20 01:07:24 192.168.2.50 kernel: [42992885.040000] [<c01010d5>] > > kernel_thread_helper+0x5/0xb > > > -- > dap > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Oops in raid1? 2005-08-20 16:26 ` djani22 @ 2005-08-20 16:50 ` Pallai Roland 2005-08-20 16:57 ` djani22 0 siblings, 1 reply; 20+ messages in thread From: Pallai Roland @ 2005-08-20 16:50 UTC (permalink / raw) To: djani22; +Cc: linux-raid

On Sat, 2005-08-20 at 18:26 +0200, djani22@dynamicweb.hu wrote:
> I get this message, when high upload. (disk write)
> But the GNBD generates this, in that situation, and thats why I think, this
> is something else...

Yes, it seems like it's another bug in GNBD, but the backtrace is clear in the first case too: it is the request to the underlying device that generated that panic, not raid1d's own.

All in all, try disabling preempt mode, that may help..

--
dap

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Oops in raid1? 2005-08-20 16:50 ` Pallai Roland @ 2005-08-20 16:57 ` djani22 0 siblings, 0 replies; 20+ messages in thread From: djani22 @ 2005-08-20 16:57 UTC (permalink / raw) To: Pallai Roland; +Cc: linux-raid ----- Original Message ----- From: "Pallai Roland" <dap@mail.index.hu> To: <djani22@dynamicweb.hu> Cc: <linux-raid@vger.kernel.org> Sent: Saturday, August 20, 2005 6:50 PM Subject: Re: Oops in raid1? > > On Sat, 2005-08-20 at 18:26 +0200, djani22@dynamicweb.hu wrote: > > I get this message, when high upload. (disk write) > > But the GNBD generates this, in that situation, and thats why I think, this > > is something else... > yes, seems like it's an another bug in the GNBD, but the backtrace is > clear in the first case too, the request to the underlying device what's > generated that panic, not the raid1d's own. OK, I understand. :-) In this case, I'll send it to RedHat's list... > > all in all, try to disable the preempt mode, that may help.. Yes, I know it! :-) The preempt-kernel is much older! :-) Thanks for help! Janos > > > -- > dap ^ permalink raw reply [flat|nested] 20+ messages in thread
Thread overview: 20+ messages
[not found] <20050717182650.24540.patches@notabene>
2005-07-17 8:27 ` [PATCH md ] When resizing an array, we need to update resync_max_sectors as well as size NeilBrown
2005-07-17 12:10 ` Found a new bug! djani22
2005-07-17 22:13 ` Neil Brown
2005-07-17 22:31 ` djani22
2005-08-14 22:38 ` djani22
2005-08-15 1:21 ` Neil Brown
2005-08-15 10:50 ` djani22
2005-08-16 13:54 ` perfomance question djani22
2005-08-16 14:30 ` RAID6 Query Colonel Hell
2005-08-16 15:40 ` dean gaudet
2005-08-16 16:44 ` Colonel Hell
2005-08-18 4:59 ` perfomance question Neil Brown
2005-08-18 15:20 ` djani22
2005-08-18 4:34 ` Found a new bug! Neil Brown
2005-08-18 15:39 ` djani22
2005-08-20 9:55 ` Oops in raid1? djani22
2005-08-20 15:53 ` Pallai Roland
2005-08-20 16:26 ` djani22
2005-08-20 16:50 ` Pallai Roland
2005-08-20 16:57 ` djani22