* Fixes for DDF test case (race conditions in mdmon)
@ 2013-03-25 23:10 mwilck
  2013-03-25 23:11 ` [PATCH 1/8] DDF: __write_ddf_structure: Fix wrong reference to ddf->primary mwilck
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:10 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
This is a series of patches to fix the DDF tests case(tests/10ddf-create).
Patch 3 fixes a problem that was implicitly introduced by commit 
c1ea5a98 - "mdadm -Db" would now print UUIDs for subarrays, but the
wrong ones.
Patch 5-7 are the interesting part. The DDF test case kept failing for
me right there where a comment already says "# This failed once. 
The raid5 was resyncing". I tracked this down to race conditions
between mdmon and the kernel when arrays are stopped - the kernel
cleans up in sysfs and reading the attributes from there is unreliable.
Files may have already vanished, read() may fail or even succeed
with incorrect results.
The other patches are hopefully self-explaining.
Regards
Martin
Martin Wilck (8):
  DDF: __write_ddf_structure: Fix wrong reference to ddf->primary
  DDF: __write_init_super_ddf: just use seq number of active header
  DDF: brief_detail_super_ddf: print correct UUID for subarrays
  DDF: add code to debug state changes
  monitor: don't call pselect() on deleted sysfs files
  monitor: read_and_act: handle race conditions for resync_start
  monitor: treat unreadable array_state as clean
  tests/10ddf-create: omit log output check
 monitor.c          |   38 ++++++++++++----
 super-ddf.c        |  124 ++++++++++++++++++++++++++++++++++++++-------------
 tests/10ddf-create |    6 +--
 3 files changed, 121 insertions(+), 47 deletions(-)
^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH 1/8] DDF: __write_ddf_structure: Fix wrong reference to ddf->primary
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-03-25 23:11 ` [PATCH 2/8] DDF: __write_init_super_ddf: just use seq number of active header mwilck
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
Should reference "header" instead here.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/super-ddf.c b/super-ddf.c
index c5f6f5c..32dd023 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -2443,7 +2443,7 @@ static int __write_ddf_structure(struct dl *d, struct ddf_super *ddf, __u8 type,
 					&dummy);
 		}
 		if (c) {
-			vdc->seqnum = ddf->primary.seq;
+			vdc->seqnum = header->seq;
 			vdc->crc = calc_crc(vdc, conf_size);
 			if (write(fd, vdc, conf_size) < 0)
 				break;
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 2/8] DDF: __write_init_super_ddf: just use seq number of active header
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
  2013-03-25 23:11 ` [PATCH 1/8] DDF: __write_ddf_structure: Fix wrong reference to ddf->primary mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-03-25 23:11 ` [PATCH 4/8] DDF: add code to debug state changes mwilck
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
It's not necessary to check for 0xffffffff, which is a valid
sequential number.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)
diff --git a/super-ddf.c b/super-ddf.c
index 32dd023..85af345 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -2483,12 +2483,7 @@ static int __write_init_super_ddf(struct supertype *st)
 	}
 	memset(null_aligned, 0xff, NULL_CONF_SZ);
 
-	if (ddf->primary.seq != 0xffffffff)
-		seq = __cpu_to_be32(__be32_to_cpu(ddf->primary.seq)+1);
-	else if (ddf->secondary.seq != 0xffffffff)
-		seq = __cpu_to_be32(__be32_to_cpu(ddf->secondary.seq)+1);
-	else
-		seq = __cpu_to_be32(1);
+	seq = ddf->active->seq + 1;
 
 	/* try to write updated metadata,
 	 * if we catch a failure move on to the next disk
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 4/8] DDF: add code to debug state changes
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
  2013-03-25 23:11 ` [PATCH 1/8] DDF: __write_ddf_structure: Fix wrong reference to ddf->primary mwilck
  2013-03-25 23:11 ` [PATCH 2/8] DDF: __write_init_super_ddf: just use seq number of active header mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-03-25 23:11 ` [PATCH 3/8] DDF: brief_detail_super_ddf: print correct UUID for subarrays mwilck
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
The 10ddf-create test case fails sporadically because wrong meta
data is written, making the array appear inconsistent when it's
restarted. Added code to aid debugging this.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |   49 ++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 36 insertions(+), 13 deletions(-)
diff --git a/super-ddf.c b/super-ddf.c
index bd08d85..455f0f8 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -446,6 +446,28 @@ struct ddf_super {
 #define offsetof(t,f) ((size_t)&(((t*)0)->f))
 #endif
 
+#if DEBUG
+static int all_ff(char *guid);
+static void pr_state(struct ddf_super *ddf, const char *msg)
+{
+	unsigned int i;
+	dprintf("%s/%s: ", __func__, msg);
+	for (i = 0; i < __be16_to_cpu(ddf->active->max_vd_entries); i++) {
+		if (all_ff(ddf->virt->entries[i].guid))
+			continue;
+		dprintf("%u(s=%02x i=%02x) ", i,
+			ddf->virt->entries[i].state,
+			ddf->virt->entries[i].init_state);
+	}
+	dprintf("\n");
+}
+#else
+static void pr_state(const struct ddf_super *ddf, const char *msg) {}
+#endif
+
+#define ddf_set_updates_pending(x) \
+	do { (x)->updates_pending = 1; pr_state(x, __func__); } while (0)
+
 static unsigned int calc_crc(void *buf, int len)
 {
 	/* crcs are always at the same place as in the ddf_header */
@@ -1886,7 +1908,7 @@ static int init_super_ddf(struct supertype *st,
 		memset(&vd->entries[i], 0xff, sizeof(struct virtual_entry));
 
 	st->sb = ddf;
-	ddf->updates_pending = 1;
+	ddf_set_updates_pending(ddf);
 	return 1;
 }
 
@@ -2164,7 +2186,7 @@ static int init_super_ddf_bvd(struct supertype *st,
 	vcl->next = ddf->conflist;
 	ddf->conflist = vcl;
 	ddf->currentconf = vcl;
-	ddf->updates_pending = 1;
+	ddf_set_updates_pending(ddf);
 	return 1;
 }
 
@@ -2268,7 +2290,7 @@ static void add_to_super_ddf_bvd(struct supertype *st,
 
 	ddf->phys->entries[dl->pdnum].type &= ~__cpu_to_be16(DDF_Global_Spare);
 	ddf->phys->entries[dl->pdnum].type |= __cpu_to_be16(DDF_Active_in_VD);
-	ddf->updates_pending = 1;
+	ddf_set_updates_pending(ddf);
 }
 
 /* add a device to a container, either while creating it or while
@@ -2371,7 +2393,7 @@ static int add_to_super_ddf(struct supertype *st,
 	} else {
 		dd->next = ddf->dlist;
 		ddf->dlist = dd;
-		ddf->updates_pending = 1;
+		ddf_set_updates_pending(ddf);
 	}
 
 	return 0;
@@ -2520,6 +2542,7 @@ static int __write_init_super_ddf(struct supertype *st)
 	char *null_aligned;
 	__u32 seq;
 
+	pr_state(ddf, __func__);
 	if (posix_memalign((void**)&null_aligned, 4096, NULL_CONF_SZ) != 0) {
 		return -ENOMEM;
 	}
@@ -3599,7 +3622,7 @@ static int ddf_set_array_state(struct active_array *a, int consistent)
 	else
 		ddf->virt->entries[inst].state |= DDF_state_inconsistent;
 	if (old != ddf->virt->entries[inst].state)
-		ddf->updates_pending = 1;
+		ddf_set_updates_pending(ddf);
 
 	old = ddf->virt->entries[inst].init_state;
 	ddf->virt->entries[inst].init_state &= ~DDF_initstate_mask;
@@ -3610,7 +3633,7 @@ static int ddf_set_array_state(struct active_array *a, int consistent)
 	else
 		ddf->virt->entries[inst].init_state |= DDF_init_quick;
 	if (old != ddf->virt->entries[inst].init_state)
-		ddf->updates_pending = 1;
+		ddf_set_updates_pending(ddf);
 
 	dprintf("ddf mark %d %s %llu\n", inst, consistent?"clean":"dirty",
 		a->info.resync_start);
@@ -3677,7 +3700,7 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 				~__cpu_to_be16(DDF_Global_Spare);
 			ddf->phys->entries[pd].type |=
 				__cpu_to_be16(DDF_Active_in_VD);
-			ddf->updates_pending = 1;
+			ddf_set_updates_pending(ddf);
 		}
 	} else {
 		int old = ddf->phys->entries[pd].state;
@@ -3688,7 +3711,7 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 			ddf->phys->entries[pd].state  &= __cpu_to_be16(~DDF_Rebuilding);
 		}
 		if (old != ddf->phys->entries[pd].state)
-			ddf->updates_pending = 1;
+			ddf_set_updates_pending(ddf);
 	}
 
 	dprintf("ddf: set_disk %d to %x\n", n, state);
@@ -3743,7 +3766,7 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 		ddf->virt->entries[inst].state =
 			(ddf->virt->entries[inst].state & ~DDF_state_mask)
 			| state;
-		ddf->updates_pending = 1;
+		ddf_set_updates_pending(ddf);
 	}
 
 }
@@ -3836,7 +3859,7 @@ static void ddf_process_update(struct supertype *st,
 					break;
 				}
 			}
-			ddf->updates_pending = 1;
+			ddf_set_updates_pending(ddf);
 			return;
 		}
 		if (!all_ff(ddf->phys->entries[ent].guid))
@@ -3844,7 +3867,7 @@ static void ddf_process_update(struct supertype *st,
 		ddf->phys->entries[ent] = pd->entries[0];
 		ddf->phys->used_pdes = __cpu_to_be16(1 +
 						     __be16_to_cpu(ddf->phys->used_pdes));
-		ddf->updates_pending = 1;
+		ddf_set_updates_pending(ddf);
 		if (ddf->add_list) {
 			struct active_array *a;
 			struct dl *al = ddf->add_list;
@@ -3876,7 +3899,7 @@ static void ddf_process_update(struct supertype *st,
 		ddf->virt->entries[ent] = vd->entries[0];
 		ddf->virt->populated_vdes = __cpu_to_be16(1 +
 							  __be16_to_cpu(ddf->virt->populated_vdes));
-		ddf->updates_pending = 1;
+		ddf_set_updates_pending(ddf);
 		break;
 
 	case DDF_VD_CONF_MAGIC:
@@ -4005,7 +4028,7 @@ static void ddf_process_update(struct supertype *st,
 			pd2++;
 		}
 
-		ddf->updates_pending = 1;
+		ddf_set_updates_pending(ddf);
 		break;
 	case DDF_SPARE_ASSIGN_MAGIC:
 	default: break;
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 3/8] DDF: brief_detail_super_ddf: print correct UUID for subarrays
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
                   ` (2 preceding siblings ...)
  2013-03-25 23:11 ` [PATCH 4/8] DDF: add code to debug state changes mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-03-25 23:11 ` [PATCH 8/8] tests/10ddf-create: omit log output check mwilck
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
Commit c1ea5a98 caused brief_detail_super_ddf() to be called
for subarrays. But the UUID printed was always the one of the
container. This is wrong and actually worse than printing no UUID
at all, and causes the DDF test case (10ddf-create) to fail.
This patch adds code to determine the MD UUID of a subarray correctly.
The hard part is to figure out for which subarray the function is
called. Moved that to an extra function.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |   66 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 54 insertions(+), 12 deletions(-)
diff --git a/super-ddf.c b/super-ddf.c
index 85af345..bd08d85 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -44,6 +44,9 @@ unsigned long crc32(
 	const unsigned char *buf,
 	unsigned len);
 
+#define DDF_NOTFOUND (~0U)
+#define DDF_CONTAINER (DDF_NOTFOUND-1)
+
 /* The DDF metadata handling.
  * DDF metadata lives at the end of the device.
  * The last 512 byte block provides an 'anchor' which is used to locate
@@ -1219,8 +1222,40 @@ static void examine_super_ddf(struct supertype *st, char *homehost)
 
 static void getinfo_super_ddf(struct supertype *st, struct mdinfo *info, char *map);
 
+static void uuid_from_ddf_guid(const char *guid, int uuid[4]);
 static void uuid_from_super_ddf(struct supertype *st, int uuid[4]);
 
+static unsigned int get_vd_num_of_subarray(struct supertype *st)
+{
+	/*
+	 * Figure out the VD number for this supertype.
+	 * Returns DDF_CONTAINER for the container itself,
+	 * and DDF_NOTFOUND on error.
+	 */
+	struct ddf_super *ddf = st->sb;
+	struct mdinfo *sra;
+	char *sub, *end;
+	unsigned int vcnum;
+
+	if (*st->container_devnm == '\0')
+		return DDF_CONTAINER;
+
+	sra = sysfs_read(-1, st->devnm, GET_VERSION);
+	if (!sra || sra->array.major_version != -1 ||
+	    sra->array.minor_version != -2 ||
+	    !is_subarray(sra->text_version))
+		return DDF_NOTFOUND;
+
+	sub = strchr(sra->text_version + 1, '/');
+	if (sub != NULL)
+		vcnum = strtoul(sub + 1, &end, 10);
+	if (sub == NULL || *sub == '\0' || *end != '\0' ||
+	    vcnum >= __be16_to_cpu(ddf->active->max_vd_entries))
+		return DDF_NOTFOUND;
+
+	return vcnum;
+}
+
 static void brief_examine_super_ddf(struct supertype *st, int verbose)
 {
 	/* We just write a generic DDF ARRAY entry
@@ -1282,13 +1317,16 @@ static void detail_super_ddf(struct supertype *st, char *homehost)
 
 static void brief_detail_super_ddf(struct supertype *st)
 {
-	/* FIXME I really need to know which array we are detailing.
-	 * Can that be stored in ddf_super??
-	 */
-//	struct ddf_super *ddf = st->sb;
 	struct mdinfo info;
 	char nbuf[64];
-	getinfo_super_ddf(st, &info, NULL);
+	struct ddf_super *ddf = st->sb;
+	unsigned int vcnum = get_vd_num_of_subarray(st);
+	if (vcnum == DDF_CONTAINER)
+		uuid_from_super_ddf(st, info.uuid);
+	else if (vcnum == DDF_NOTFOUND)
+		return;
+	else
+		uuid_from_ddf_guid(ddf->virt->entries[vcnum].guid, info.uuid);
 	fname_from_uuid(st, &info, nbuf,':');
 	printf(" UUID=%s", nbuf + 5);
 }
@@ -1337,6 +1375,16 @@ static int find_phys(struct ddf_super *ddf, __u32 phys_refnum)
 	return -1;
 }
 
+static void uuid_from_ddf_guid(const char *guid, int uuid[4])
+{
+	char buf[20];
+	struct sha1_ctx ctx;
+	sha1_init_ctx(&ctx);
+	sha1_process_bytes(guid, DDF_GUID_LEN, &ctx);
+	sha1_finish_ctx(&ctx, buf);
+	memcpy(uuid, buf, 4*4);
+}
+
 static void uuid_from_super_ddf(struct supertype *st, int uuid[4])
 {
 	/* The uuid returned here is used for:
@@ -1361,18 +1409,12 @@ static void uuid_from_super_ddf(struct supertype *st, int uuid[4])
 	struct ddf_super *ddf = st->sb;
 	struct vcl *vcl = ddf->currentconf;
 	char *guid;
-	char buf[20];
-	struct sha1_ctx ctx;
 
 	if (vcl)
 		guid = vcl->conf.guid;
 	else
 		guid = ddf->anchor.guid;
-
-	sha1_init_ctx(&ctx);
-	sha1_process_bytes(guid, DDF_GUID_LEN, &ctx);
-	sha1_finish_ctx(&ctx, buf);
-	memcpy(uuid, buf, 4*4);
+	uuid_from_ddf_guid(guid, uuid);
 }
 
 static void getinfo_super_ddf_bvd(struct supertype *st, struct mdinfo *info, char *map);
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 8/8] tests/10ddf-create: omit log output check
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
                   ` (3 preceding siblings ...)
  2013-03-25 23:11 ` [PATCH 3/8] DDF: brief_detail_super_ddf: print correct UUID for subarrays mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-03-25 23:11 ` [PATCH 7/8] monitor: treat unreadable array_state as clean mwilck
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
The test script was counting output lines - its expectations
don't match the current code any more. Remove this pointless
test.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 tests/10ddf-create |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)
diff --git a/tests/10ddf-create b/tests/10ddf-create
index 0925612..3055a5d 100644
--- a/tests/10ddf-create
+++ b/tests/10ddf-create
@@ -67,11 +67,7 @@ mdadm -Ss
 # and now assemble fully incrementally.
 for i in  $dev8 $dev9 $dev10 $dev11 $dev12
 do 
-  #./mdadm -I $i -vv 2>&1 | wc -l > /tmp/cnt
-  ./mdadm -I $i 2> /tmp/thing
-  wc -l < /tmp/thing > /tmp/cnt
-  # should find container and 2 devices, so 3 lines.
-  [ `cat /tmp/cnt` -eq 3 ]
+   mdadm -I $i -c /var/tmp/mdadm.conf
 done
 check nosync
 udevadm settle
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 7/8] monitor: treat unreadable array_state as clean
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
                   ` (4 preceding siblings ...)
  2013-03-25 23:11 ` [PATCH 8/8] tests/10ddf-create: omit log output check mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-03-25 23:11 ` [PATCH 6/8] monitor: read_and_act: handle race conditions for resync_start mwilck
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
Failure to read array_state can only mean the array has been
deleted by the kernel; it is not an indication that the array
is dirty.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 monitor.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/monitor.c b/monitor.c
index 60c5d5a..47432b2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -265,7 +265,7 @@ static int read_and_act(struct active_array *a)
 		 */
 		a->container->ss->set_array_state(a, 0);
 	}
-	if (a->curr_state <= inactive &&
+	if ((a->curr_state == bad_word || a->curr_state <= inactive) &&
 	    a->prev_state > inactive) {
 		/* array has been stopped */
 		a->container->ss->set_array_state(a, 1);
@@ -288,8 +288,7 @@ static int read_and_act(struct active_array *a)
 		a->container->ss->set_array_state(a, 1);
 	}
 	if (a->curr_state == active ||
-	    a->curr_state == suspended ||
-	    a->curr_state == bad_word)
+	    a->curr_state == suspended)
 		ret |= ARRAY_DIRTY;
 	if (a->curr_state == readonly) {
 		/* Well, I'm ready to handle things.  If readonly
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 6/8] monitor: read_and_act: handle race conditions for resync_start
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
                   ` (5 preceding siblings ...)
  2013-03-25 23:11 ` [PATCH 7/8] monitor: treat unreadable array_state as clean mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-03-25 23:11 ` [PATCH 5/8] monitor: don't call pselect() on deleted sysfs files mwilck
  2013-04-23  7:15 ` Fixes for DDF test case (race conditions in mdmon) NeilBrown
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
When arrays are stopped, sysfs attributes may be deleted by
the kernel, and attempts to read these attributes will fail.
Setting resync_start to 0 is wrong in this case, because it
may make is_resync_complete() erroneously return
FALSE for a clean array. It is better to leave resync_start
untouched (the previously read value for this array).
Otherwise set_array_state() will pass thewrong state information
to the metadata handler, which will write it to disk, and at
the next restart an unnecessary recovery is started for the
array.
It is also possible that resync_start is actually *not* deleted
yet when read_and_act is running, and an apparently valid
value of "0" is read from it, with the same effect as described
above. This happens if the kernel has already called md_clean()
on the array (setting recovery_cp = 0), but the delayed removal
of "resync_start" hasn't happened yet. Therefore, in "clear"
state, "resync_start" shouldn't be read at all.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 monitor.c |   24 +++++++++++++++++-------
 1 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/monitor.c b/monitor.c
index 3cb4214..60c5d5a 100644
--- a/monitor.c
+++ b/monitor.c
@@ -75,18 +75,21 @@ static int read_attr(char *buf, int len, int fd)
 	return n;
 }
 
-static unsigned long long read_resync_start(int fd)
+static void read_resync_start(int fd, unsigned long long *v)
 {
 	char buf[30];
 	int n;
 
 	n = read_attr(buf, 30, fd);
-	if (n <= 0)
-		return 0;
+	if (n <= 0) {
+		dprintf("%s: Failed to read resync_start (%d)\n",
+			__func__, fd);
+		return;
+	}
 	if (strncmp(buf, "none", 4) == 0)
-		return MaxSector;
+		*v = MaxSector;
 	else
-		return strtoull(buf, NULL, 10);
+		*v = strtoull(buf, NULL, 10);
 }
 
 static unsigned long long read_sync_completed(int fd)
@@ -237,13 +240,20 @@ static int read_and_act(struct active_array *a)
 
 	a->curr_state = read_state(a->info.state_fd);
 	a->curr_action = read_action(a->action_fd);
-	a->info.resync_start = read_resync_start(a->resync_start_fd);
+	if (a->curr_state != clear)
+		/*
+		 * In "clear" state, resync_start may wrongly be set to "0"
+		 * when the kernel called md_clean but didn't remove the
+		 * sysfs attributes yet
+		 */
+		read_resync_start(a->resync_start_fd, &a->info.resync_start);
 	sync_completed = read_sync_completed(a->sync_completed_fd);
 	for (mdi = a->info.devs; mdi ; mdi = mdi->next) {
 		mdi->next_state = 0;
 		mdi->curr_state = 0;
 		if (mdi->state_fd >= 0) {
-			mdi->recovery_start = read_resync_start(mdi->recovery_fd);
+			read_resync_start(mdi->recovery_fd,
+					  &mdi->recovery_start);
 			mdi->curr_state = read_dev_state(mdi->state_fd);
 		}
 	}
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 5/8] monitor: don't call pselect() on deleted sysfs files
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
                   ` (6 preceding siblings ...)
  2013-03-25 23:11 ` [PATCH 6/8] monitor: read_and_act: handle race conditions for resync_start mwilck
@ 2013-03-25 23:11 ` mwilck
  2013-04-23  7:15 ` Fixes for DDF test case (race conditions in mdmon) NeilBrown
  8 siblings, 0 replies; 13+ messages in thread
From: mwilck @ 2013-03-25 23:11 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
It makes no sense to listen for events on files that have
been deleted. This happens when arrays are stopped and the
kernel removes the associated sysfs structures.
Calling pselect() on the deleted attributes may cause a storm
of wake events.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 monitor.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/monitor.c b/monitor.c
index e034a6a..3cb4214 100644
--- a/monitor.c
+++ b/monitor.c
@@ -38,8 +38,17 @@ static int write_attr(char *attr, int fd)
 
 static void add_fd(fd_set *fds, int *maxfd, int fd)
 {
+	struct stat st;
 	if (fd < 0)
 		return;
+	if (fstat(fd, &st) == -1) {
+		dprintf("%s: Invalid fd %d\n", __func__, fd);
+		return;
+	}
+	if (st.st_nlink == 0) {
+		dprintf("%s: fd %d was deleted\n", __func__, fd);
+		return;
+	}
 	if (fd > *maxfd)
 		*maxfd = fd;
 	FD_SET(fd, fds);
-- 
1.7.1
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: Fixes for DDF test case (race conditions in mdmon)
  2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
                   ` (7 preceding siblings ...)
  2013-03-25 23:11 ` [PATCH 5/8] monitor: don't call pselect() on deleted sysfs files mwilck
@ 2013-04-23  7:15 ` NeilBrown
  2013-04-23 17:30   ` Martin Wilck
  2013-04-23 18:10   ` [PATCH] DDF: fix bug in compare_super_ddf mwilck
  8 siblings, 2 replies; 13+ messages in thread
From: NeilBrown @ 2013-04-23  7:15 UTC (permalink / raw)
  To: mwilck; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]
On Fri, 25 Oct 2013 12:07:31 +0200 mwilck@arcor.de wrote:
> This is a series of patches to fix the DDF tests case(tests/10ddf-create).
> 
> Patch 3 fixes a problem that was implicitly introduced by commit 
> c1ea5a98 - "mdadm -Db" would now print UUIDs for subarrays, but the
> wrong ones.
> 
> Patch 5-7 are the interesting part. The DDF test case kept failing for
> me right there where a comment already says "# This failed once. 
> The raid5 was resyncing". I tracked this down to race conditions
> between mdmon and the kernel when arrays are stopped - the kernel
> cleans up in sysfs and reading the attributes from there is unreliable.
> Files may have already vanished, read() may fail or even succeed
> with incorrect results.
> 
> The other patches are hopefully self-explaining.
> 
> Regards
> Martin
> 
> Martin Wilck (8):
>   DDF: __write_ddf_structure: Fix wrong reference to ddf->primary
>   DDF: __write_init_super_ddf: just use seq number of active header
>   DDF: brief_detail_super_ddf: print correct UUID for subarrays
>   DDF: add code to debug state changes
>   monitor: don't call pselect() on deleted sysfs files
>   monitor: read_and_act: handle race conditions for resync_start
>   monitor: treat unreadable array_state as clean
>   tests/10ddf-create: omit log output check
> 
>  monitor.c          |   38 ++++++++++++----
>  super-ddf.c        |  124 ++++++++++++++++++++++++++++++++++++++-------------
>  tests/10ddf-create |    6 +--
>  3 files changed, 121 insertions(+), 47 deletions(-)
Thanks for these, and sorry for the delay in getting to them
(though the Date: field was
   Date: Fri, 25 Oct 2013 12:07:31 +0200
 so I've actually processed them 6 months before they will have been sent!!!)
All make sense and have been applied and push out.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Fixes for DDF test case (race conditions in mdmon)
  2013-04-23  7:15 ` Fixes for DDF test case (race conditions in mdmon) NeilBrown
@ 2013-04-23 17:30   ` Martin Wilck
  2013-04-23 18:10   ` [PATCH] DDF: fix bug in compare_super_ddf mwilck
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Wilck @ 2013-04-23 17:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid
On 04/23/2013 09:15 AM, NeilBrown wrote:
> Thanks for these, and sorry for the delay in getting to them
> (though the Date: field was
>    Date: Fri, 25 Oct 2013 12:07:31 +0200
>  so I've actually processed them 6 months before they will have been sent!!!)
Yeah, I had a problem with my RTC and didn't notice until I had sent out
the patches. Sorry about that.
> All make sense and have been applied and push out.
Thanks a lot,
Martin
^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH] DDF: fix bug in compare_super_ddf
  2013-04-23  7:15 ` Fixes for DDF test case (race conditions in mdmon) NeilBrown
  2013-04-23 17:30   ` Martin Wilck
@ 2013-04-23 18:10   ` mwilck
  2013-04-24  6:34     ` NeilBrown
  1 sibling, 1 reply; 13+ messages in thread
From: mwilck @ 2013-04-23 18:10 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck
Fix bug in previous patch
"DDF: compare_super_ddf: merge local info of other superblock"
Just discovered this bug in my last patch set - unfortunately, just after
you committed it.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/super-ddf.c b/super-ddf.c
index 455f0f8..ea8439b 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -3514,7 +3514,7 @@ static int compare_super_ddf(struct supertype *st, struct supertype *tst)
 
 		vl1->next = first->conflist;
 		vl1->block_sizes = NULL;
-		if (vl1->conf.sec_elmnt_count > 1) {
+		if (vl2->conf.sec_elmnt_count > 1) {
 			vl1->other_bvds = xcalloc(vl2->conf.sec_elmnt_count - 1,
 						  sizeof(struct vd_config *));
 		} else
-- 
1.7.3.4
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH] DDF: fix bug in compare_super_ddf
  2013-04-23 18:10   ` [PATCH] DDF: fix bug in compare_super_ddf mwilck
@ 2013-04-24  6:34     ` NeilBrown
  0 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2013-04-24  6:34 UTC (permalink / raw)
  To: mwilck; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 963 bytes --]
On Tue, 23 Apr 2013 20:10:16 +0200 mwilck@arcor.de wrote:
> Fix bug in previous patch
> "DDF: compare_super_ddf: merge local info of other superblock"
> 
> Just discovered this bug in my last patch set - unfortunately, just after
> you committed it.
Life's like that sometimes :-)
Applied- thanks.
NeilBrown
> 
> Signed-off-by: Martin Wilck <mwilck@arcor.de>
> ---
>  super-ddf.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/super-ddf.c b/super-ddf.c
> index 455f0f8..ea8439b 100644
> --- a/super-ddf.c
> +++ b/super-ddf.c
> @@ -3514,7 +3514,7 @@ static int compare_super_ddf(struct supertype *st, struct supertype *tst)
>  
>  		vl1->next = first->conflist;
>  		vl1->block_sizes = NULL;
> -		if (vl1->conf.sec_elmnt_count > 1) {
> +		if (vl2->conf.sec_elmnt_count > 1) {
>  			vl1->other_bvds = xcalloc(vl2->conf.sec_elmnt_count - 1,
>  						  sizeof(struct vd_config *));
>  		} else
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2013-04-24  6:34 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-25 23:10 Fixes for DDF test case (race conditions in mdmon) mwilck
2013-03-25 23:11 ` [PATCH 1/8] DDF: __write_ddf_structure: Fix wrong reference to ddf->primary mwilck
2013-03-25 23:11 ` [PATCH 2/8] DDF: __write_init_super_ddf: just use seq number of active header mwilck
2013-03-25 23:11 ` [PATCH 4/8] DDF: add code to debug state changes mwilck
2013-03-25 23:11 ` [PATCH 3/8] DDF: brief_detail_super_ddf: print correct UUID for subarrays mwilck
2013-03-25 23:11 ` [PATCH 8/8] tests/10ddf-create: omit log output check mwilck
2013-03-25 23:11 ` [PATCH 7/8] monitor: treat unreadable array_state as clean mwilck
2013-03-25 23:11 ` [PATCH 6/8] monitor: read_and_act: handle race conditions for resync_start mwilck
2013-03-25 23:11 ` [PATCH 5/8] monitor: don't call pselect() on deleted sysfs files mwilck
2013-04-23  7:15 ` Fixes for DDF test case (race conditions in mdmon) NeilBrown
2013-04-23 17:30   ` Martin Wilck
2013-04-23 18:10   ` [PATCH] DDF: fix bug in compare_super_ddf mwilck
2013-04-24  6:34     ` NeilBrown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).