* [PATCH 0/3] UT and error case changes @ 2011-03-14 14:09 Adam Kwolek 2011-03-14 14:09 ` [PATCH 1/3] imsm: FIX: existing backup file fails unit tests Adam Kwolek ` (3 more replies) 0 siblings, 4 replies; 14+ messages in thread From: Adam Kwolek @ 2011-03-14 14:09 UTC (permalink / raw) To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer The following series implements 2 changes: 1. Fix for unit tests failure. UT suits 12 and 13 fails, when backup file cannot be opened for grow operation (backup file exists already). 2. I've got proposal for handling/rollback metadata in error case. Currently in case of error external metadata can remain in reshape state. In some cases metadata can be automatically restored to initial state (i.e. metadata during imsm container operation can be rolled back when error occurs on first reshaped array before reshape is started). For such cases, additional superswitch function can be introduced. Metadata shouldn't be rolled back in restart case. I'm passing restart flag to abort function in Grow.c only, as this is general rule. In the same way array reshape state is checked. This is proposal, so I've put no implementation in to imsm handler (no metadata update is created yet). Please let me know your opinion. If you will like it, I'll fill out gaps in imsm code. BR Adam --- Adam Kwolek (3): imsm: Add metadata abort changes handler template External metadata has to be restored to initial state in error case imsm: FIX: existing backup file fails unit tests Grow.c | 26 ++++++++++++++++++++++++++ mdadm.h | 10 ++++++++++ super-intel.c | 15 +++++++++++++++ tests/imsm-grow-template | 8 ++++++-- 4 files changed, 57 insertions(+), 2 deletions(-) -- Signature ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] imsm: FIX: existing backup file fails unit tests 2011-03-14 14:09 [PATCH 0/3] UT and error case changes Adam Kwolek @ 2011-03-14 14:09 ` Adam Kwolek 2011-03-14 14:09 ` [PATCH 2/3] External metadata has to be restored to initial state in error case Adam Kwolek ` (2 subsequent siblings) 3 siblings, 0 replies; 14+ messages in thread From: Adam Kwolek @ 2011-03-14 14:09 UTC (permalink / raw) To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer During normal test execution, backup file is deleted after test execution. If test is interrupted/broken, backup file can remain for next run. When backup file exists before unit test run, suits 12 and 13 fails. To avoid this remove backup file before grow is executed. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> --- tests/imsm-grow-template | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/tests/imsm-grow-template b/tests/imsm-grow-template index 7c212c4..d03752d 100644 --- a/tests/imsm-grow-template +++ b/tests/imsm-grow-template @@ -17,8 +17,10 @@ function grow_member() { local offset=$6 local chunk=$7 local array_size=$((comps * size)) + local backup_imsm=/tmp/backup_imsm - ( set -ex; mdadm --grow $member --chunk=$chunk --level=$level --backup-file=/tmp/backup_imsm ) + rm -f $backup_imsm + ( set -ex; mdadm --grow $member --chunk=$chunk --level=$level --backup-file=$backup_imsm ) local status=$? if [ $negative_test -ne 0 ]; then if [ $status -eq 0 ]; then @@ -71,6 +73,7 @@ done imsm_check container $num_disks num_disks=$((num_disks + add_to_num_disks)) +backup_imsm=/tmp/backup_imsm # Grow each member or a container depending on the type of an operation if [ $migration_test -ne 0 ]; then @@ -82,7 +85,8 @@ if [ $migration_test -ne 0 ]; then grow_member $member1 $new_num_disks $vol1_new_num_comps $vol1_new_level $vol1_comp_size $vol1_offset $vol1_new_chunk fi else - ( set -x; mdadm --grow $container --raid-disks=$num_disks --backup-file=/tmp/backup_imsm ) + rm -f $backup_imsm + ( set -x; mdadm --grow $container --raid-disks=$num_disks --backup-file=$backup_imsm ) grow_status=$? if [ $negative_test -ne 0 ]; then if [ $grow_status -eq 0 ]; then ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] External metadata has to be restored to initial state in error case 2011-03-14 14:09 [PATCH 0/3] UT and error case changes Adam Kwolek 2011-03-14 14:09 ` [PATCH 1/3] imsm: FIX: existing backup file fails unit tests Adam Kwolek @ 2011-03-14 14:09 ` Adam Kwolek 2011-03-14 14:09 ` [PATCH 3/3] imsm: Add metadata abort changes handler template Adam Kwolek 2011-03-14 21:53 ` [PATCH 0/3] UT and error case changes NeilBrown 3 siblings, 0 replies; 14+ messages in thread From: Adam Kwolek @ 2011-03-14 14:09 UTC (permalink / raw) To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer When any error occurs in reshape_array(), mdadm should have ability to rollback changes. Based on metadata knowledge abort_metadata_changes() should decide: 1. metadata abort is allowed? In case of container operation abort can be allowed on first array only It depends on metadata implementation. 2. If rollback is possible, execute required action to prepare update. Returned value is: 1 if rollback succeed (send update) 0 if rollback not possible (no action) -1 for other error Metadata handler can be called only if reshape is not started in md. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> --- Grow.c | 26 ++++++++++++++++++++++++++ mdadm.h | 10 ++++++++++ 2 files changed, 36 insertions(+), 0 deletions(-) diff --git a/Grow.c b/Grow.c index 40e693e..26fd7eb 100644 --- a/Grow.c +++ b/Grow.c @@ -594,6 +594,31 @@ static void sync_metadata(struct supertype *st) } } +static int abort_metadata_changes(struct supertype *st, struct mdinfo *sra, + int restart) +{ + int ret_val = 0; + char action[40]; + + /* for not restarted external metadata only + */ + if (!st->ss->external || restart) + return ret_val; + + /* check if reshape is not started in md + */ + if (sra && sysfs_get_str(sra, NULL, "sync_action", action, 40) > 0 && + strncmp(action, "reshape", 7) == 0) + return ret_val; + + if (st->ss->abort_metadata_changes) + ret_val = st->ss->abort_metadata_changes(st); + if (ret_val > 0) + sync_metadata(st); + + return ret_val; +} + static int subarray_set_num(char *container, struct mdinfo *sra, char *name, int n) { /* when dealing with external metadata subarrays we need to be @@ -2187,6 +2212,7 @@ out: exit(0); release: + abort_metadata_changes(st, sra, restart); if (orig_level != UnSet && sra) { c = map_num(pers, orig_level); if (c && sysfs_set_str(sra, NULL, "level", c) == 0) diff --git a/mdadm.h b/mdadm.h index d3ed50a..d502b13 100644 --- a/mdadm.h +++ b/mdadm.h @@ -727,6 +727,16 @@ extern struct superswitch { struct supertype *st, unsigned long blocks, int *fds, unsigned long long *offsets, int dests, int *destfd, unsigned long long *destoffsets); + /* abort_metadata_changes() will rollbacks metadata changes + * in case of error. + * It can be called for not restart case + * and when reshape has not been started in md. + * Function should return values: + * 1 if abort succeed + * 0 if abort not possible + * -1 for other error + */ + int (*abort_metadata_changes)(struct supertype *st); /* for mdmon */ int (*open_new)(struct supertype *c, struct active_array *a, ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/3] imsm: Add metadata abort changes handler template 2011-03-14 14:09 [PATCH 0/3] UT and error case changes Adam Kwolek 2011-03-14 14:09 ` [PATCH 1/3] imsm: FIX: existing backup file fails unit tests Adam Kwolek 2011-03-14 14:09 ` [PATCH 2/3] External metadata has to be restored to initial state in error case Adam Kwolek @ 2011-03-14 14:09 ` Adam Kwolek 2011-03-14 21:53 ` [PATCH 0/3] UT and error case changes NeilBrown 3 siblings, 0 replies; 14+ messages in thread From: Adam Kwolek @ 2011-03-14 14:09 UTC (permalink / raw) To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer Add metadata rollback function implementation/placeholder for imsm metadata. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> --- super-intel.c | 15 +++++++++++++++ 1 files changed, 15 insertions(+), 0 deletions(-) diff --git a/super-intel.c b/super-intel.c index 44c100b..2c2bd1e 100644 --- a/super-intel.c +++ b/super-intel.c @@ -7364,6 +7364,20 @@ static int imsm_manage_reshape( afd, sra, reshape, st, stripes, fds, offsets, dests, destfd, destoffsets); } + +/* imsm metadata rollback handler, manage metadata for error scenario + */ +static int imsm_abort_metadata_changes(struct supertype *st) +{ + int ret_val = -1; + + dprintf("imsm: imsm_abort_metadata_changes() called\n"); + + /* ToDo: when possible, prepare rollback metadata update + */ + + return ret_val; +} #endif /* MDASSEMBLE */ struct superswitch super_imsm = { @@ -7386,6 +7400,7 @@ struct superswitch super_imsm = { .get_disk_controller_domain = imsm_get_disk_controller_domain, .reshape_super = imsm_reshape_super, .manage_reshape = imsm_manage_reshape, + .abort_metadata_changes = imsm_abort_metadata_changes, #endif .match_home = match_home_imsm, .uuid_from_super= uuid_from_super_imsm, ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] UT and error case changes 2011-03-14 14:09 [PATCH 0/3] UT and error case changes Adam Kwolek ` (2 preceding siblings ...) 2011-03-14 14:09 ` [PATCH 3/3] imsm: Add metadata abort changes handler template Adam Kwolek @ 2011-03-14 21:53 ` NeilBrown 2011-03-15 7:28 ` Kwolek, Adam 3 siblings, 1 reply; 14+ messages in thread From: NeilBrown @ 2011-03-14 21:53 UTC (permalink / raw) To: Adam Kwolek Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer On Mon, 14 Mar 2011 15:09:20 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote: > The following series implements 2 changes: > 1. Fix for unit tests failure. > UT suits 12 and 13 fails, when backup file cannot be opened > for grow operation (backup file exists already). Thanks - applied. > > 2. I've got proposal for handling/rollback metadata in error case. > Currently in case of error external metadata can remain in reshape state. > In some cases metadata can be automatically restored to initial state > (i.e. metadata during imsm container operation can be rolled back > when error occurs on first reshaped array before reshape is started). > For such cases, additional superswitch function can be introduced. > > Metadata shouldn't be rolled back in restart case. > I'm passing restart flag to abort function in Grow.c only, as this is general rule. > In the same way array reshape state is checked. > > This is proposal, so I've put no implementation in to imsm handler (no metadata update is created yet). > Please let me know your opinion. If you will like it, I'll fill out gaps in imsm code. I'm not sure.. it sounds like it might be a good idea, but I'd like to have some concrete examples to help me think about it. Do you have an example in mind of an error which might be detected after the metadata has been updated, but before the reshape has actually started, and for which reverting the metadata update is likely to be useful? The more general case of changing your mind just after a reshape has started and asking the reshape to revert would certainly be useful, but md isn't capable of that in general (yet). Thanks, NeilBrown ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH 0/3] UT and error case changes 2011-03-14 21:53 ` [PATCH 0/3] UT and error case changes NeilBrown @ 2011-03-15 7:28 ` Kwolek, Adam 2011-03-18 2:07 ` NeilBrown 2011-03-22 2:23 ` Something wrong with __prep_thunderdome in super-intel.c NeilBrown 0 siblings, 2 replies; 14+ messages in thread From: Kwolek, Adam @ 2011-03-15 7:28 UTC (permalink / raw) To: NeilBrown Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech > -----Original Message----- > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid- > owner@vger.kernel.org] On Behalf Of NeilBrown > Sent: Monday, March 14, 2011 10:54 PM > To: Kwolek, Adam > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed; > Neubauer, Wojciech > Subject: Re: [PATCH 0/3] UT and error case changes > > On Mon, 14 Mar 2011 15:09:20 +0100 Adam Kwolek <adam.kwolek@intel.com> > wrote: > > > The following series implements 2 changes: > > 1. Fix for unit tests failure. > > UT suits 12 and 13 fails, when backup file cannot be opened > > for grow operation (backup file exists already). > > Thanks - applied. > > > > > > 2. I've got proposal for handling/rollback metadata in error case. > > Currently in case of error external metadata can remain in reshape > state. > > In some cases metadata can be automatically restored to initial state > > (i.e. metadata during imsm container operation can be rolled back > > when error occurs on first reshaped array before reshape is started). > > For such cases, additional superswitch function can be introduced. > > > > Metadata shouldn't be rolled back in restart case. > > I'm passing restart flag to abort function in Grow.c only, as this is > general rule. > > In the same way array reshape state is checked. > > > > This is proposal, so I've put no implementation in to imsm handler (no > metadata update is created yet). > > Please let me know your opinion. If you will like it, I'll fill out > gaps in imsm code. > > I'm not sure.. it sounds like it might be a good idea, but I'd like to > have > some concrete examples to help me think about it. I've got this idea during correcting UT so, i.e.: wrong/already existing/ backup file name passed by used. Backup file verification/opening is when metadata is in reshape state already. This error causes mdadm exit at reshape position == 0. It is too early position for reshape restart. User has manually restore metadata information and this can be quite difficult. Metadata rollback can be possible for container operation if: 1. external metadata case (checked in Grow.c) 2. This is action on first array in container (check in metadata handler) 2. md doesn't start reshape (checked in Grow.c) 3. Other conditions/metadata specific (checked in Grow.c) We can also pass some additional status to reshape_container() to indicate that metadata is rolled back and allow for container unfreeze in such case also. This can allow mdadm to leave array/container in state that work is started from (hopefully ;)). BR Adam > > Do you have an example in mind of an error which might be detected after > the metadata has been updated, but before the reshape has actually > started, > and for which reverting the metadata update is likely to be useful? > > The more general case of changing your mind just after a reshape has > started and asking the reshape to revert would certainly be useful, but > md isn't capable of that in general (yet). > > Thanks, > NeilBrown > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] UT and error case changes 2011-03-15 7:28 ` Kwolek, Adam @ 2011-03-18 2:07 ` NeilBrown 2011-03-22 2:23 ` Something wrong with __prep_thunderdome in super-intel.c NeilBrown 1 sibling, 0 replies; 14+ messages in thread From: NeilBrown @ 2011-03-18 2:07 UTC (permalink / raw) To: Kwolek, Adam Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed, Neubauer, Wojciech On Tue, 15 Mar 2011 07:28:00 +0000 "Kwolek, Adam" <adam.kwolek@intel.com> wrote: > > > > -----Original Message----- > > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid- > > owner@vger.kernel.org] On Behalf Of NeilBrown > > Sent: Monday, March 14, 2011 10:54 PM > > To: Kwolek, Adam > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed; > > Neubauer, Wojciech > > Subject: Re: [PATCH 0/3] UT and error case changes > > > > On Mon, 14 Mar 2011 15:09:20 +0100 Adam Kwolek <adam.kwolek@intel.com> > > wrote: > > > > > The following series implements 2 changes: > > > 1. Fix for unit tests failure. > > > UT suits 12 and 13 fails, when backup file cannot be opened > > > for grow operation (backup file exists already). > > > > Thanks - applied. > > > > > > > > > > 2. I've got proposal for handling/rollback metadata in error case. > > > Currently in case of error external metadata can remain in reshape > > state. > > > In some cases metadata can be automatically restored to initial state > > > (i.e. metadata during imsm container operation can be rolled back > > > when error occurs on first reshaped array before reshape is started). > > > For such cases, additional superswitch function can be introduced. > > > > > > Metadata shouldn't be rolled back in restart case. > > > I'm passing restart flag to abort function in Grow.c only, as this is > > general rule. > > > In the same way array reshape state is checked. > > > > > > This is proposal, so I've put no implementation in to imsm handler (no > > metadata update is created yet). > > > Please let me know your opinion. If you will like it, I'll fill out > > gaps in imsm code. > > > > I'm not sure.. it sounds like it might be a good idea, but I'd like to > > have > > some concrete examples to help me think about it. > > > > I've got this idea during correcting UT so, i.e.: > wrong/already existing/ backup file name passed by used. > Backup file verification/opening is when metadata is in reshape state already. > This error causes mdadm exit at reshape position == 0. It is too early position for reshape restart. > User has manually restore metadata information and this can be quite difficult. > > Metadata rollback can be possible for container operation if: > 1. external metadata case (checked in Grow.c) > 2. This is action on first array in container (check in metadata handler) > 2. md doesn't start reshape (checked in Grow.c) > 3. Other conditions/metadata specific (checked in Grow.c) > > We can also pass some additional status to reshape_container() to indicate that metadata is rolled back > and allow for container unfreeze in such case also. > This can allow mdadm to leave array/container in state that work is started from (hopefully ;)). > > BR > Adam > I think I see your point - there are some issues with some failure possibilities at start-up. We should of course do as much checking as possible before committing to the reshape, and I think we do. But it is still possible that something could go wrong. It would probably be good to take the first backup after freezing IO, but before updating the metadata to show that a reshape has started. That would ensure that a crash at any time will either be able to replay the backup and continue the reshape, or won't see the reshape at all. It might be a bit awkward to do that with the current code, I'm not sure. Another possibility is to at least write out a 'zeroed' backup file and arrange that when we restart and array, if the backup file is empty, then that is a clear sign that no reshape has started and ->update_super("_reshape_progress") could revert the reshape. Longer term: I want to eventually teach md to be able to revert a reshape that has already started. In that case we will need some sort of metadata method to record that reshape is going backwards - though it may end up looking like something we already have... NeilBrown ^ permalink raw reply [flat|nested] 14+ messages in thread
* Something wrong with __prep_thunderdome in super-intel.c 2011-03-15 7:28 ` Kwolek, Adam 2011-03-18 2:07 ` NeilBrown @ 2011-03-22 2:23 ` NeilBrown 2011-03-25 2:40 ` Dan Williams 1 sibling, 1 reply; 14+ messages in thread From: NeilBrown @ 2011-03-22 2:23 UTC (permalink / raw) To: Williams, Dan J Cc: Kwolek, Adam, linux-raid@vger.kernel.org, Ciechanowski, Ed, Neubauer, Wojciech Hi Dan, I think you were the original author of imsm_thunderdome and __prep_thunderdome - yes? I found a case (thank to the test suite) where it isn't working correctly, If I have a container with 2 devices, and the second one is failed, then imsm_thunderdome returns NULL. __prep_thunderdome sees the first and adds it to the table of superblocks. Then it sees the second notices the family_num and checksum are the same, and so replaces the first with the second in the table. Then in imsm_thunderdome, d->serial is full of 'nul', so disk_list_get doesn't find anything so the super_table becomes empty and nothing works. So it could be: load_and_parse_mpb is wrong for putting the nul serial in there __prep_thunderdome is wrong for thinking the two are equivalent imsm_thunderdome is wrong for giving up when just one device isn't found and I really don't know which. You can easily reproduce with ./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01] ./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm ./mdadm /dev/md/r1 -f /dev/loop1 ./mdadm -E /dev/md/imsm and notice that nothing gets printed. If you fail loop0 instead, it works properly. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Something wrong with __prep_thunderdome in super-intel.c 2011-03-22 2:23 ` Something wrong with __prep_thunderdome in super-intel.c NeilBrown @ 2011-03-25 2:40 ` Dan Williams 2011-03-25 8:43 ` Kwolek, Adam 2011-03-28 1:35 ` NeilBrown 0 siblings, 2 replies; 14+ messages in thread From: Dan Williams @ 2011-03-25 2:40 UTC (permalink / raw) To: NeilBrown Cc: Kwolek, Adam, linux-raid@vger.kernel.org, Ciechanowski, Ed, Neubauer, Wojciech, Wojcik, Krzysztof On Mon, 2011-03-21 at 19:23 -0700, NeilBrown wrote: > Hi Dan, > > I think you were the original author of imsm_thunderdome and > __prep_thunderdome - yes? yes. > > I found a case (thank to the test suite) where it isn't working correctly, > > If I have a container with 2 devices, and the second one is failed, then > imsm_thunderdome returns NULL. > > __prep_thunderdome sees the first and adds it to the table of superblocks. > Then it sees the second notices the family_num and checksum are the same, and > so replaces the first with the second in the table. > > Then in imsm_thunderdome, d->serial is full of 'nul', so disk_list_get > doesn't find anything so the super_table becomes empty and nothing works. > > So it could be: > load_and_parse_mpb is wrong for putting the nul serial in there > __prep_thunderdome is wrong for thinking the two are equivalent > imsm_thunderdome is wrong for giving up when just one device isn't found > > and I really don't know which. hmm... <context switch out of isci driver review mode> > > You can easily reproduce with > > ./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01] > ./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm > ./mdadm /dev/md/r1 -f /dev/loop1 > ./mdadm -E /dev/md/imsm > > and notice that nothing gets printed. > If you fail loop0 instead, it works properly. The first thought is that the generation number on the good device should be ahead of the bad, but lo and behold it isn't. We should stop advancing the generation number on failed devices. The regression that Krzysztof's patch somewhat addresses is that thunderdome expects that failed devices should at least be able to look themselves up in their own mpb. Fixing up the serial number so that other devices see the other disk as out of date is the expectation. Stopping metadata write outs to failed devices fixes both problems. Krzysztof care to try the following with your recent change reverted? I verified it addresses the problem above, but I have not had a chance to double check the orom compatibility (although I suspect it will be ok). This also simplifies the calling convention to mark_missing and mark_failure. -- Dan diff --git a/super-intel.c b/super-intel.c index 2b41e08..6a6f738 100644 --- a/super-intel.c +++ b/super-intel.c @@ -5213,13 +5213,14 @@ static int is_resyncing(struct imsm_dev *dev) } /* return true if we recorded new information */ -static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) +static int mark_failure(struct imsm_dev *dev, struct dl *dl) { __u32 ord; int slot; struct imsm_map *map; char buf[MAX_RAID_SERIAL_LEN+3]; - unsigned int len, shift = 0; + struct imsm_disk *disk = &dl->disk; + unsigned int len, shift = 0, idx = dl->index; /* new failures are always set in map[0] */ map = get_imsm_map(dev, 0); @@ -5232,6 +5233,7 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) if (is_failed(disk) && (ord & IMSM_ORD_REBUILD)) return 0; + dl->index = -2; sprintf(buf, "%s:0", disk->serial); if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN) shift = len - MAX_RAID_SERIAL_LEN + 1; @@ -5244,9 +5246,11 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) return 1; } -static void mark_missing(struct imsm_dev *dev, struct imsm_disk *disk, int idx) +static void mark_missing(struct imsm_dev *dev, struct dl *dl) { - mark_failure(dev, disk, idx); + struct imsm_disk *disk = &dl->disk; + + mark_failure(dev, dl); if (disk->scsi_id == __cpu_to_le32(~(__u32)0)) return; @@ -5269,7 +5273,7 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev) dprintf("imsm: mark missing\n"); end_migration(dev, map_state); for (dl = super->missing; dl; dl = dl->next) - mark_missing(dev, &dl->disk, dl->index); + mark_missing(dev, dl); super->updates_pending++; } @@ -5497,7 +5501,7 @@ static void imsm_set_disk(struct active_array *a, int n, int state) struct intel_super *super = a->container->sb; struct imsm_dev *dev = get_imsm_dev(super, inst); struct imsm_map *map = get_imsm_map(dev, 0); - struct imsm_disk *disk; + struct dl *dl; int failed; __u32 ord; __u8 map_state; @@ -5512,11 +5516,11 @@ static void imsm_set_disk(struct active_array *a, int n, int state) dprintf("imsm: set_disk %d:%x\n", n, state); ord = get_imsm_ord_tbl_ent(dev, n, -1); - disk = get_imsm_disk(super, ord_to_idx(ord)); + dl = get_imsm_dl_disk(super, ord_to_idx(ord)); /* check for new failures */ if (state & DS_FAULTY) { - if (mark_failure(dev, disk, ord_to_idx(ord))) + if (mark_failure(dev, dl)) super->updates_pending++; } @@ -6265,7 +6269,7 @@ static int apply_takeover_update(struct imsm_update_takeover *u, for (du = super->missing; du; du = du->next) if (du->index >= 0) { set_imsm_ord_tbl_ent(map, du->index, du->index); - mark_missing(dev_new, &du->disk, du->index); + mark_missing(dev_new, du); } return 1; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* RE: Something wrong with __prep_thunderdome in super-intel.c 2011-03-25 2:40 ` Dan Williams @ 2011-03-25 8:43 ` Kwolek, Adam 2011-03-25 18:50 ` Dan Williams 2011-03-28 2:28 ` NeilBrown 2011-03-28 1:35 ` NeilBrown 1 sibling, 2 replies; 14+ messages in thread From: Kwolek, Adam @ 2011-03-25 8:43 UTC (permalink / raw) To: Williams, Dan J, NeilBrown Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Neubauer, Wojciech, Wojcik, Krzysztof > -----Original Message----- > From: Williams, Dan J > Sent: Friday, March 25, 2011 3:41 AM > To: NeilBrown > Cc: Kwolek, Adam; linux-raid@vger.kernel.org; Ciechanowski, Ed; > Neubauer, Wojciech; Wojcik, Krzysztof > Subject: Re: Something wrong with __prep_thunderdome in super-intel.c > > On Mon, 2011-03-21 at 19:23 -0700, NeilBrown wrote: > > Hi Dan, > > > > I think you were the original author of imsm_thunderdome and > > __prep_thunderdome - yes? > > yes. > > > > > I found a case (thank to the test suite) where it isn't working > correctly, > > > > If I have a container with 2 devices, and the second one is failed, > then > > imsm_thunderdome returns NULL. > > > > __prep_thunderdome sees the first and adds it to the table of > superblocks. > > Then it sees the second notices the family_num and checksum are the > same, and > > so replaces the first with the second in the table. > > > > Then in imsm_thunderdome, d->serial is full of 'nul', so > disk_list_get > > doesn't find anything so the super_table becomes empty and nothing > works. > > > > So it could be: > > load_and_parse_mpb is wrong for putting the nul serial in there > > __prep_thunderdome is wrong for thinking the two are equivalent > > imsm_thunderdome is wrong for giving up when just one device isn't > found > > > > and I really don't know which. > > hmm... > > <context switch out of isci driver review mode> > > > > > You can easily reproduce with > > > > ./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01] > > ./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm > > ./mdadm /dev/md/r1 -f /dev/loop1 > > ./mdadm -E /dev/md/imsm > > > > and notice that nothing gets printed. > > If you fail loop0 instead, it works properly. > > The first thought is that the generation number on the good device > should be ahead of the bad, but lo and behold it isn't. We should stop > advancing the generation number on failed devices. The regression that > Krzysztof's patch somewhat addresses is that thunderdome expects that > failed devices should at least be able to look themselves up in their > own mpb. Fixing up the serial number so that other devices see the > other disk as out of date is the expectation. > > Stopping metadata write outs to failed devices fixes both problems. > Krzysztof care to try the following with your recent change reverted? I > verified it addresses the problem above, but I have not had a chance to > double check the orom compatibility (although I suspect it will be ok). > This also simplifies the calling convention to mark_missing and > mark_failure. > > -- > Dan > > diff --git a/super-intel.c b/super-intel.c > index 2b41e08..6a6f738 100644 > --- a/super-intel.c > +++ b/super-intel.c > @@ -5213,13 +5213,14 @@ static int is_resyncing(struct imsm_dev *dev) > } > > /* return true if we recorded new information */ > -static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, > int idx) > +static int mark_failure(struct imsm_dev *dev, struct dl *dl) > { > __u32 ord; > int slot; > struct imsm_map *map; > char buf[MAX_RAID_SERIAL_LEN+3]; > - unsigned int len, shift = 0; > + struct imsm_disk *disk = &dl->disk; > + unsigned int len, shift = 0, idx = dl->index; > > /* new failures are always set in map[0] */ > map = get_imsm_map(dev, 0); > @@ -5232,6 +5233,7 @@ static int mark_failure(struct imsm_dev *dev, > struct imsm_disk *disk, int idx) > if (is_failed(disk) && (ord & IMSM_ORD_REBUILD)) > return 0; > > + dl->index = -2; > sprintf(buf, "%s:0", disk->serial); > if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN) > shift = len - MAX_RAID_SERIAL_LEN + 1; > @@ -5244,9 +5246,11 @@ static int mark_failure(struct imsm_dev *dev, > struct imsm_disk *disk, int idx) > return 1; > } > > -static void mark_missing(struct imsm_dev *dev, struct imsm_disk *disk, > int idx) > +static void mark_missing(struct imsm_dev *dev, struct dl *dl) > { > - mark_failure(dev, disk, idx); > + struct imsm_disk *disk = &dl->disk; > + > + mark_failure(dev, dl); > > if (disk->scsi_id == __cpu_to_le32(~(__u32)0)) > return; > @@ -5269,7 +5273,7 @@ static void handle_missing(struct intel_super > *super, struct imsm_dev *dev) > dprintf("imsm: mark missing\n"); > end_migration(dev, map_state); > for (dl = super->missing; dl; dl = dl->next) > - mark_missing(dev, &dl->disk, dl->index); > + mark_missing(dev, dl); > super->updates_pending++; > } > > @@ -5497,7 +5501,7 @@ static void imsm_set_disk(struct active_array *a, > int n, int state) > struct intel_super *super = a->container->sb; > struct imsm_dev *dev = get_imsm_dev(super, inst); > struct imsm_map *map = get_imsm_map(dev, 0); > - struct imsm_disk *disk; > + struct dl *dl; > int failed; > __u32 ord; > __u8 map_state; > @@ -5512,11 +5516,11 @@ static void imsm_set_disk(struct active_array > *a, int n, int state) > dprintf("imsm: set_disk %d:%x\n", n, state); > > ord = get_imsm_ord_tbl_ent(dev, n, -1); > - disk = get_imsm_disk(super, ord_to_idx(ord)); > + dl = get_imsm_dl_disk(super, ord_to_idx(ord)); > > /* check for new failures */ > if (state & DS_FAULTY) { > - if (mark_failure(dev, disk, ord_to_idx(ord))) > + if (mark_failure(dev, dl)) > super->updates_pending++; > } > > @@ -6265,7 +6269,7 @@ static int apply_takeover_update(struct > imsm_update_takeover *u, > for (du = super->missing; du; du = du->next) > if (du->index >= 0) { > set_imsm_ord_tbl_ent(map, du->index, du->index); > - mark_missing(dev_new, &du->disk, du->index); > + mark_missing(dev_new, du); > } > > return 1; > Unfortunately this doesn't close issue. I've reached the same from IMSM compatibility point of view, but have in mind that unit test suit 11 fails using this approach. Put focus on test 5 in this suit (I'm using it, as this is the only one that fails for me) for IMSM (note that first test pass is for native metadata). Using your path tests 5b and 5c fails. My code fails 5c test only. I'm currently working on it. BR Adam ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Something wrong with __prep_thunderdome in super-intel.c 2011-03-25 8:43 ` Kwolek, Adam @ 2011-03-25 18:50 ` Dan Williams 2011-03-28 2:28 ` NeilBrown 1 sibling, 0 replies; 14+ messages in thread From: Dan Williams @ 2011-03-25 18:50 UTC (permalink / raw) To: Kwolek, Adam Cc: NeilBrown, linux-raid@vger.kernel.org, Ciechanowski, Ed, Neubauer, Wojciech, Wojcik, Krzysztof On 3/25/2011 1:43 AM, Kwolek, Adam wrote: > > Unfortunately this doesn't close issue. > > I've reached the same from IMSM compatibility point of view, but have in mind that unit test suit 11 fails using this approach. > Put focus on test 5 in this suit (I'm using it, as this is the only one that fails for me) for IMSM (note that first test pass is for native metadata). > Using your path tests 5b and 5c fails. My code fails 5c test only. I'm currently working on it. I am seeing a different failing set (after applying Anna's patches): 1, 1a, 2, 4, 5b, 5c... One thing too look out for is that failed disks will no longer self identify. Prior to this mdadm would update the metadata on failed disks to show "failed", now the disk must be viewed in relation to others to determine if it is stale or not. This makes the mdadm -f case more in line with the hot removed case. -- Dan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Something wrong with __prep_thunderdome in super-intel.c 2011-03-25 8:43 ` Kwolek, Adam 2011-03-25 18:50 ` Dan Williams @ 2011-03-28 2:28 ` NeilBrown 1 sibling, 0 replies; 14+ messages in thread From: NeilBrown @ 2011-03-28 2:28 UTC (permalink / raw) To: Kwolek, Adam Cc: Williams, Dan J, linux-raid@vger.kernel.org, Ciechanowski, Ed, Neubauer, Wojciech, Wojcik, Krzysztof On Fri, 25 Mar 2011 08:43:26 +0000 "Kwolek, Adam" <adam.kwolek@intel.com> wrote: > > Unfortunately this doesn't close issue. > > I've reached the same from IMSM compatibility point of view, but have in mind that unit test suit 11 fails using this approach. > Put focus on test 5 in this suit (I'm using it, as this is the only one that fails for me) for IMSM (note that first test pass is for native metadata). > Using your path tests 5b and 5c fails. My code fails 5c test only. I'm currently working on it. I've got it passing all migration tests with your patch. The '5' tests have a problem that they are trying to use both dev6 and dev7 which have the same backing device and are used for multipath testing. I change it to use dev10 instead of dev7 and everything passes. So I'll be releasing 3.2.1 shortly with what I've got. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Something wrong with __prep_thunderdome in super-intel.c 2011-03-25 2:40 ` Dan Williams 2011-03-25 8:43 ` Kwolek, Adam @ 2011-03-28 1:35 ` NeilBrown 2011-03-28 16:56 ` Dan Williams 1 sibling, 1 reply; 14+ messages in thread From: NeilBrown @ 2011-03-28 1:35 UTC (permalink / raw) To: Dan Williams Cc: Kwolek, Adam, linux-raid@vger.kernel.org, Ciechanowski, Ed, Neubauer, Wojciech, Wojcik, Krzysztof On Thu, 24 Mar 2011 19:40:46 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > > hmm... > > <context switch out of isci driver review mode> :-) > > > > > You can easily reproduce with > > > > ./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01] > > ./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm > > ./mdadm /dev/md/r1 -f /dev/loop1 > > ./mdadm -E /dev/md/imsm > > > > and notice that nothing gets printed. > > If you fail loop0 instead, it works properly. > > The first thought is that the generation number on the good device > should be ahead of the bad, but lo and behold it isn't. We should stop > advancing the generation number on failed devices. The regression that > Krzysztof's patch somewhat addresses is that thunderdome expects that > failed devices should at least be able to look themselves up in their > own mpb. Fixing up the serial number so that other devices see the > other disk as out of date is the expectation. > > Stopping metadata write outs to failed devices fixes both problems. > Krzysztof care to try the following with your recent change reverted? I > verified it addresses the problem above, but I have not had a chance to > double check the orom compatibility (although I suspect it will be ok). > This also simplifies the calling convention to mark_missing and > mark_failure. Thanks for looking at this Dan. I tried your patch, however ... > > diff --git a/super-intel.c b/super-intel.c > index 2b41e08..6a6f738 100644 > --- a/super-intel.c > +++ b/super-intel.c > @@ -5213,13 +5213,14 @@ static int is_resyncing(struct imsm_dev *dev) > } > > /* return true if we recorded new information */ > -static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) > +static int mark_failure(struct imsm_dev *dev, struct dl *dl) > { > __u32 ord; > int slot; > struct imsm_map *map; > char buf[MAX_RAID_SERIAL_LEN+3]; > - unsigned int len, shift = 0; > + struct imsm_disk *disk = &dl->disk; > + unsigned int len, shift = 0, idx = dl->index; > > /* new failures are always set in map[0] */ > map = get_imsm_map(dev, 0); > @@ -5232,6 +5233,7 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) > if (is_failed(disk) && (ord & IMSM_ORD_REBUILD)) > return 0; > > + dl->index = -2; > sprintf(buf, "%s:0", disk->serial); > if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN) > shift = len - MAX_RAID_SERIAL_LEN + 1; > @@ -5244,9 +5246,11 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) > return 1; > } > > -static void mark_missing(struct imsm_dev *dev, struct imsm_disk *disk, int idx) > +static void mark_missing(struct imsm_dev *dev, struct dl *dl) > { > - mark_failure(dev, disk, idx); > + struct imsm_disk *disk = &dl->disk; > + > + mark_failure(dev, dl); > > if (disk->scsi_id == __cpu_to_le32(~(__u32)0)) > return; > @@ -5269,7 +5273,7 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev) > dprintf("imsm: mark missing\n"); > end_migration(dev, map_state); > for (dl = super->missing; dl; dl = dl->next) > - mark_missing(dev, &dl->disk, dl->index); > + mark_missing(dev, dl); > super->updates_pending++; > } > > @@ -5497,7 +5501,7 @@ static void imsm_set_disk(struct active_array *a, int n, int state) > struct intel_super *super = a->container->sb; > struct imsm_dev *dev = get_imsm_dev(super, inst); > struct imsm_map *map = get_imsm_map(dev, 0); > - struct imsm_disk *disk; > + struct dl *dl; > int failed; > __u32 ord; > __u8 map_state; > @@ -5512,11 +5516,11 @@ static void imsm_set_disk(struct active_array *a, int n, int state) > dprintf("imsm: set_disk %d:%x\n", n, state); > > ord = get_imsm_ord_tbl_ent(dev, n, -1); > - disk = get_imsm_disk(super, ord_to_idx(ord)); > + dl = get_imsm_dl_disk(super, ord_to_idx(ord)); This sometimes return NULL, leading to bad stuff and mdmon crashing.... So there is more to this than meets the eye... I'll stop trying this patch. Thanks, NeilBrown > > /* check for new failures */ > if (state & DS_FAULTY) { > - if (mark_failure(dev, disk, ord_to_idx(ord))) > + if (mark_failure(dev, dl)) > super->updates_pending++; > } > > @@ -6265,7 +6269,7 @@ static int apply_takeover_update(struct imsm_update_takeover *u, > for (du = super->missing; du; du = du->next) > if (du->index >= 0) { > set_imsm_ord_tbl_ent(map, du->index, du->index); > - mark_missing(dev_new, &du->disk, du->index); > + mark_missing(dev_new, du); > } > > return 1; > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Something wrong with __prep_thunderdome in super-intel.c 2011-03-28 1:35 ` NeilBrown @ 2011-03-28 16:56 ` Dan Williams 0 siblings, 0 replies; 14+ messages in thread From: Dan Williams @ 2011-03-28 16:56 UTC (permalink / raw) To: NeilBrown Cc: Kwolek, Adam, linux-raid@vger.kernel.org, Ciechanowski, Ed, Neubauer, Wojciech, Wojcik, Krzysztof On Sun, 2011-03-27 at 18:35 -0700, NeilBrown wrote: > On Thu, 24 Mar 2011 19:40:46 -0700 Dan Williams <dan.j.williams@intel.com> > wrote: > > > <context switch out of isci driver review mode> > > :-) > [..] > > - disk = get_imsm_disk(super, ord_to_idx(ord)); > > + dl = get_imsm_dl_disk(super, ord_to_idx(ord)); > > This sometimes return NULL, leading to bad stuff and mdmon crashing.... > > So there is more to this than meets the eye... Yes, (and I chalk this up to context switch latency), setting the index to -2 is not correct as other paths need to be able to reference a valid disk index until the failed device is removed via a rebuild. > I'll stop trying this patch. Ok, here is a proposed v2 on top of the latest devel-3.2, but I need to play with it a bit more, and figure out what the spare migration test is complaining about. diff --git a/super-intel.c b/super-intel.c index 6e12af2..e2f66aa 100644 --- a/super-intel.c +++ b/super-intel.c @@ -3993,7 +3993,7 @@ static int write_super_imsm(struct supertype *st, int doclose) /* write the mpb for disks that compose raid devices */ for (d = super->disks; d ; d = d->next) { - if (d->index < 0) + if (d->index < 0 || is_failed(&d->disk)) continue; if (store_imsm_mpb(d->fd, mpb)) fprintf(stderr, "%s: failed for device %d:%d %s\n", @@ -5218,6 +5218,8 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) __u32 ord; int slot; struct imsm_map *map; + char buf[MAX_RAID_SERIAL_LEN+3]; + unsigned int len, shift = 0; /* new failures are always set in map[0] */ map = get_imsm_map(dev, 0); @@ -5230,6 +5232,11 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx) if (is_failed(disk) && (ord & IMSM_ORD_REBUILD)) return 0; + sprintf(buf, "%s:0", disk->serial); + if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN) + shift = len - MAX_RAID_SERIAL_LEN + 1; + strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN); + disk->status |= FAILED_DISK; set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUILD); if (map->failed_disk_num == 0xff) ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-03-28 16:56 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-03-14 14:09 [PATCH 0/3] UT and error case changes Adam Kwolek 2011-03-14 14:09 ` [PATCH 1/3] imsm: FIX: existing backup file fails unit tests Adam Kwolek 2011-03-14 14:09 ` [PATCH 2/3] External metadata has to be restored to initial state in error case Adam Kwolek 2011-03-14 14:09 ` [PATCH 3/3] imsm: Add metadata abort changes handler template Adam Kwolek 2011-03-14 21:53 ` [PATCH 0/3] UT and error case changes NeilBrown 2011-03-15 7:28 ` Kwolek, Adam 2011-03-18 2:07 ` NeilBrown 2011-03-22 2:23 ` Something wrong with __prep_thunderdome in super-intel.c NeilBrown 2011-03-25 2:40 ` Dan Williams 2011-03-25 8:43 ` Kwolek, Adam 2011-03-25 18:50 ` Dan Williams 2011-03-28 2:28 ` NeilBrown 2011-03-28 1:35 ` NeilBrown 2011-03-28 16:56 ` Dan Williams
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).