From: NeilBrown <neilb@suse.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Kwolek, Adam" <adam.kwolek@intel.com>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
"Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
"Neubauer, Wojciech" <Wojciech.Neubauer@intel.com>,
"Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>
Subject: Re: Something wrong with __prep_thunderdome in super-intel.c
Date: Mon, 28 Mar 2011 12:35:09 +1100 [thread overview]
Message-ID: <20110328123509.043555e7@notabene.brown> (raw)
In-Reply-To: <1301020846.15264.12.camel@dwillia2-linux>
On Thu, 24 Mar 2011 19:40:46 -0700 Dan Williams <dan.j.williams@intel.com>
wrote:
>
> hmm...
>
> <context switch out of isci driver review mode>
:-)
>
> >
> > You can easily reproduce with
> >
> > ./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01]
> > ./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm
> > ./mdadm /dev/md/r1 -f /dev/loop1
> > ./mdadm -E /dev/md/imsm
> >
> > and notice that nothing gets printed.
> > If you fail loop0 instead, it works properly.
>
> The first thought is that the generation number on the good device
> should be ahead of the bad, but lo and behold it isn't. We should stop
> advancing the generation number on failed devices. The regression that
> Krzysztof's patch somewhat addresses is that thunderdome expects that
> failed devices should at least be able to look themselves up in their
> own mpb. Fixing up the serial number so that other devices see the
> other disk as out of date is the expectation.
>
> Stopping metadata write outs to failed devices fixes both problems.
> Krzysztof care to try the following with your recent change reverted? I
> verified it addresses the problem above, but I have not had a chance to
> double check the orom compatibility (although I suspect it will be ok).
> This also simplifies the calling convention to mark_missing and
> mark_failure.
Thanks for looking at this Dan.
I tried your patch, however ...
>
> diff --git a/super-intel.c b/super-intel.c
> index 2b41e08..6a6f738 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -5213,13 +5213,14 @@ static int is_resyncing(struct imsm_dev *dev)
> }
>
> /* return true if we recorded new information */
> -static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> +static int mark_failure(struct imsm_dev *dev, struct dl *dl)
> {
> __u32 ord;
> int slot;
> struct imsm_map *map;
> char buf[MAX_RAID_SERIAL_LEN+3];
> - unsigned int len, shift = 0;
> + struct imsm_disk *disk = &dl->disk;
> + unsigned int len, shift = 0, idx = dl->index;
>
> /* new failures are always set in map[0] */
> map = get_imsm_map(dev, 0);
> @@ -5232,6 +5233,7 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
> return 0;
>
> + dl->index = -2;
> sprintf(buf, "%s:0", disk->serial);
> if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
> shift = len - MAX_RAID_SERIAL_LEN + 1;
> @@ -5244,9 +5246,11 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> return 1;
> }
>
> -static void mark_missing(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> +static void mark_missing(struct imsm_dev *dev, struct dl *dl)
> {
> - mark_failure(dev, disk, idx);
> + struct imsm_disk *disk = &dl->disk;
> +
> + mark_failure(dev, dl);
>
> if (disk->scsi_id == __cpu_to_le32(~(__u32)0))
> return;
> @@ -5269,7 +5273,7 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
> dprintf("imsm: mark missing\n");
> end_migration(dev, map_state);
> for (dl = super->missing; dl; dl = dl->next)
> - mark_missing(dev, &dl->disk, dl->index);
> + mark_missing(dev, dl);
> super->updates_pending++;
> }
>
> @@ -5497,7 +5501,7 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
> struct intel_super *super = a->container->sb;
> struct imsm_dev *dev = get_imsm_dev(super, inst);
> struct imsm_map *map = get_imsm_map(dev, 0);
> - struct imsm_disk *disk;
> + struct dl *dl;
> int failed;
> __u32 ord;
> __u8 map_state;
> @@ -5512,11 +5516,11 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
> dprintf("imsm: set_disk %d:%x\n", n, state);
>
> ord = get_imsm_ord_tbl_ent(dev, n, -1);
> - disk = get_imsm_disk(super, ord_to_idx(ord));
> + dl = get_imsm_dl_disk(super, ord_to_idx(ord));
This sometimes return NULL, leading to bad stuff and mdmon crashing....
So there is more to this than meets the eye...
I'll stop trying this patch.
Thanks,
NeilBrown
>
> /* check for new failures */
> if (state & DS_FAULTY) {
> - if (mark_failure(dev, disk, ord_to_idx(ord)))
> + if (mark_failure(dev, dl))
> super->updates_pending++;
> }
>
> @@ -6265,7 +6269,7 @@ static int apply_takeover_update(struct imsm_update_takeover *u,
> for (du = super->missing; du; du = du->next)
> if (du->index >= 0) {
> set_imsm_ord_tbl_ent(map, du->index, du->index);
> - mark_missing(dev_new, &du->disk, du->index);
> + mark_missing(dev_new, du);
> }
>
> return 1;
>
next prev parent reply other threads:[~2011-03-28 1:35 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-14 14:09 [PATCH 0/3] UT and error case changes Adam Kwolek
2011-03-14 14:09 ` [PATCH 1/3] imsm: FIX: existing backup file fails unit tests Adam Kwolek
2011-03-14 14:09 ` [PATCH 2/3] External metadata has to be restored to initial state in error case Adam Kwolek
2011-03-14 14:09 ` [PATCH 3/3] imsm: Add metadata abort changes handler template Adam Kwolek
2011-03-14 21:53 ` [PATCH 0/3] UT and error case changes NeilBrown
2011-03-15 7:28 ` Kwolek, Adam
2011-03-18 2:07 ` NeilBrown
2011-03-22 2:23 ` Something wrong with __prep_thunderdome in super-intel.c NeilBrown
2011-03-25 2:40 ` Dan Williams
2011-03-25 8:43 ` Kwolek, Adam
2011-03-25 18:50 ` Dan Williams
2011-03-28 2:28 ` NeilBrown
2011-03-28 1:35 ` NeilBrown [this message]
2011-03-28 16:56 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110328123509.043555e7@notabene.brown \
--to=neilb@suse.de \
--cc=Wojciech.Neubauer@intel.com \
--cc=adam.kwolek@intel.com \
--cc=dan.j.williams@intel.com \
--cc=ed.ciechanowski@intel.com \
--cc=krzysztof.wojcik@intel.com \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.