* RE: Degraded Array
From: Leslie Rhorer @ 2010-12-04 4:47 UTC
To: linux-raid
> -----Original Message-----
> From: Majed B. [mailto:majedb@gmail.com]
> Sent: Friday, December 03, 2010 10:27 PM
> To: lrhorer@satx.rr.com
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Degraded Array
>
> You have a degraded array now with 1 disk down. If you proceed, more
> disks might pop out due to errors.
Well, sort of. A significant fraction of the data is now striped
across 12 drives with no remaining redundancy (12 + 0), rather than 11 data
drives plus one drive of surviving redundancy (11 + 1). There are no errors
occurring on the drives, although of course an unrecoverable error could
happen at any time.
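For reference, the state and reshape progress can be confirmed with
something like the following (the device name /dev/md0 here is just a
stand-in for my array):

    cat /proc/mdstat
    mdadm --detail /dev/md0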
> It's best to backup your data,
The data is backed up. Except in extreme circumstances, I would
never start a re-shape without a current backup.
> run a check on the array, fix it then try to resume the reshape.
The array is in good health, other than the two kicked drives. I'm not sure
I understand what you mean, though. I'm asking about the two offline
drives. Should I add the 13th back? It still has substantially the same
data as the other 12 drives, discounting the amount that has been
re-written. If so, how can I safely stop the array re-shape and re-add the
drive? (This is under mdadm 2.6.7.2.)
>
> On Sat, Dec 4, 2010 at 5:42 AM, Leslie Rhorer <lrhorer@satx.rr.com> wrote:
> >
> > Hello everyone.
> >
> > I was just growing one of my RAID6 arrays from 13 to 14
> > members. The array growth had passed its critical stage and had been
> > growing for several minutes when the system came to a screeching halt.
> > It hit the big red switch, and when the system rebooted, the array
I meant to type *I*, not *It*.
> > assembled, but two members are missing. One of the members is the new
> > drive and the other is the 13th drive in the RAID set. Of course, the
> > array can run well enough with only 12 members, but it's definitely not
> > the best situation, especially since the re-shape will take another day
> > and a half. Is it best I go ahead and leave the array in its current
> > state until the re-shape is done, or should I go ahead and add back the
> > two failed drives?
> >
>
>
>
> --
> Majed B.
* Re: Degraded Array
From: Neil Brown @ 2010-12-04 6:44 UTC
To: Majed B.; +Cc: lrhorer, linux-raid
On Sat, 4 Dec 2010 07:26:36 +0300 "Majed B." <majedb@gmail.com> wrote:
> You have a degraded array now with 1 disk down. If you proceed, more
> disks might pop out due to errors.
>
> It's best to backup your data, run a check on the array, fix it then
> try to resume the reshape.
Backups are always a good idea, but are sometimes impractical.
I don't think running a 'check' would help at all. A 'reshape' will do much
the same sort of work, and more.
It isn't strictly true that the array is '1 disk down'. Parts of it are 1
disk down, parts are 2 disks down: the stripes that have already been
reshaped span 14 devices, two of which are missing, so they have no
redundancy left, while the stripes not yet reshaped span 13 devices with
only one missing. As the reshape progresses, more and more of the array
will be 2 disks down. We don't really want that.
This case isn't really handled well at present. You want to do a 'recovery'
and a 'reshape' at the same time. This is quite possible, but doesn't
currently happen when you restart a reshape in the middle (added to my todo
list).
I suggest you:
- apply the patch below to mdadm.
- assemble the array with --update=revert-reshape. You should give
it a --backup-file too.
- let the reshape complete so you are back to 13 devices.
- add a spare and let it recover
- then add a spare and reshape the array.
Of course, you need to be running a new enough kernel to be able to decrease
the number of devices in a raid5 or raid6 array.
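In rough command form, that sequence might look like the following sketch.
The md device name, member partitions, and backup-file paths are
placeholders only; substitute your own.

    mdadm --assemble /dev/md0 --update=revert-reshape \
          --backup-file=/root/md0-revert.bak /dev/sd[a-l]1
    (let the reverted reshape run to completion, back to 13 devices)
    mdadm /dev/md0 --add /dev/sdm1
    (wait for the recovery to finish)
    mdadm /dev/md0 --add /dev/sdn1
    mdadm --grow /dev/md0 --raid-devices=14 --backup-file=/root/md0-grow.bak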
NeilBrown
>
> On Sat, Dec 4, 2010 at 5:42 AM, Leslie Rhorer <lrhorer@satx.rr.com> wrote:
> >
> > Hello everyone.
> >
> > I was just growing one of my RAID6 arrays from 13 to 14
> > members. The array growth had passed its critical stage and had been
> > growing for several minutes when the system came to a screeching halt. It
> > hit the big red switch, and when the system rebooted, the array assembled,
> > but two members are missing. One of the members is the new drive and the
> > other is the 13th drive in the RAID set. Of course, the array can run well
> > enough with only 12 members, but it’s definitely not the best situation,
> > especially since the re-shape will take another day and a half. Is it best
> > I go ahead and leave the array in its current state until the re-shape is
> > done, or should I go ahead and add back the two failed drives?
> >
>
>
>
> --
> Majed B.
commit 12bab17f765a4130c7bd133a0bbb3b83f3f492b0
Author: NeilBrown <neilb@suse.de>
Date: Sat Dec 4 17:37:14 2010 +1100
Support reverting of reshape.
Allow --update=revert-reshape to do what you would expect.
FIXME
needs review. Think about interface and use cases.
Document.
diff --git a/Assemble.c b/Assemble.c
index afd4e60..c034e37 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -592,6 +592,12 @@ int Assemble(struct supertype *st, char *mddev,
/* Ok, no bad inconsistancy, we can try updating etc */
bitmap_done = 0;
content->update_private = NULL;
+ if (update && strcmp(update, "revert-reshape") == 0 &&
+ (content->reshape_active == 0 || content->delta_disks <= 0)) {
+ fprintf(stderr, Name ": Cannot revert-reshape on this array\n");
+ close(mdfd);
+ return 1;
+ }
for (tmpdev = devlist; tmpdev; tmpdev=tmpdev->next) if (tmpdev->used == 1) {
char *devname = tmpdev->devname;
struct stat stb;
diff --git a/mdadm.c b/mdadm.c
index 08e8ea4..7cf51b5 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -662,6 +662,8 @@ int main(int argc, char *argv[])
continue;
if (strcmp(update, "devicesize")==0)
continue;
+ if (strcmp(update, "revert-reshape")==0)
+ continue;
if (strcmp(update, "byteorder")==0) {
if (ss) {
fprintf(stderr, Name ": must not set metadata type with --update=byteorder.\n");
@@ -688,7 +690,8 @@ int main(int argc, char *argv[])
}
fprintf(outf, "Valid --update options are:\n"
" 'sparc2.2', 'super-minor', 'uuid', 'name', 'resync',\n"
- " 'summaries', 'homehost', 'byteorder', 'devicesize'.\n");
+ " 'summaries', 'homehost', 'byteorder', 'devicesize',\n"
+ " 'revert-reshape'.\n");
exit(outf == stdout ? 0 : 2);
case O(INCREMENTAL,NoDegraded):
diff --git a/super0.c b/super0.c
index ae3e885..01d5cfa 100644
--- a/super0.c
+++ b/super0.c
@@ -545,6 +545,19 @@ static int update_super0(struct supertype *st, struct mdinfo *info,
}
if (strcmp(update, "_reshape_progress")==0)
sb->reshape_position = info->reshape_progress;
+ if (strcmp(update, "revert-reshape") == 0 &&
+ sb->minor_version > 90 && sb->delta_disks != 0) {
+ int tmp;
+ sb->raid_disks -= sb->delta_disks;
+ sb->delta_disks = - sb->delta_disks;
+ tmp = sb->new_layout;
+ sb->new_layout = sb->layout;
+ sb->layout = tmp;
+
+ tmp = sb->new_chunk;
+ sb->new_chunk = sb->chunk_size;
+ sb->chunk_size = tmp;
+ }
sb->sb_csum = calc_sb0_csum(sb);
return rv;
diff --git a/super1.c b/super1.c
index 0eb0323..805777e 100644
--- a/super1.c
+++ b/super1.c
@@ -781,6 +781,19 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
}
if (strcmp(update, "_reshape_progress")==0)
sb->reshape_position = __cpu_to_le64(info->reshape_progress);
+ if (strcmp(update, "revert-reshape") == 0 && sb->delta_disks) {
+ __u32 temp;
+ sb->raid_disks = __cpu_to_le32(__le32_to_cpu(sb->raid_disks) + __le32_to_cpu(sb->delta_disks));
+ sb->delta_disks = __cpu_to_le32(-__le32_to_cpu(sb->delta_disks));
+ printf("REverted to %d\n", (int)__le32_to_cpu(sb->delta_disks));
+ temp = sb->new_layout;
+ sb->new_layout = sb->layout;
+ sb->layout = temp;
+
+ temp = sb->new_chunk;
+ sb->new_chunk = sb->chunksize;
+ sb->chunksize = temp;
+ }
sb->sb_csum = calc_sb_1_csum(sb);
return rv;