linux-raid.vger.kernel.org archive mirror
* [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly
@ 2014-04-24  7:22 Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

As mentioned in the kernel enabling [1], these cache volumes augment the
existing imsm model in that each cache leg implies its own container.  In
the standard cache configuration two single-drive-raid0 volumes (from
separate containers) are associated into a cached volume.  The diagram
below attempts to make this clearer.

+-----------------------+              +----------------------+             
|          sda          | SSD          |         sdb          | HDD         
| +-------------------+ |              | +------------------+ |             
| |   /dev/md/imsm0   | | Container0   | |  /dev/md/imsm1   | | Container1  
| | +---------------+ | |              | | +--------------+ | |             
| | | /dev/md/vol0  | | | RAID Volume0 | | | /dev/md/vol1 | | | RAID Volume1
| | |  +---------+  | | |              | | | +----------+ | | |             
| | |  |SRT Cache|  | | |              | | | |SRT Target| | | |             
+-+-+--+----+----+--+-+-+              +-+-+-+----+-----+-+-+-+             
            |                                     |                         
            |                                     |                         
            |          HDD Cached by SSD          |                         
            |           +--------------+          |                         
            +-----------+ /dev/md/isrt +----------+                         
                        +--------------+                                    

In support of the standard mdadm volume discovery model a uuid is
synthesized from the combination of the two container-family-numbers and
immutable volume-ids.  Examine --brief is modified to aggregate cache
legs across containers.

Create support is not included, but existing volumes can be
auto-assembled:

  mdadm -Ebs > conf
  mdadm -Asc conf

To facilitate testing, the patches are also available on github; note
that this branch will rebase in response to review feedback.

  git://github.com/djbw/mdadm isrt

[1]: http://marc.info/?l=linux-raid&m=139832034826379&w=2

---

Dan Williams (11):
      sysfs: fix sysfs_set_array() to accept valid negative array levels
      make must_be_container() more selective
      Assemble: show the uuid in the verbose case
      Assemble: teardown partially assembled arrays
      Examine: support for coalescing "cache legs"
      imsm: immutable volume id
      imsm: cache metadata definitions
      imsm: read cache metadata
      imsm: examine cache configurations
      imsm: assemble cache volumes
      imsm: support cache enabled arrays


 Assemble.c    |   27 ++-
 Examine.c     |   92 +++++++++-
 Makefile      |    2 
 isrt-intel.h  |  270 ++++++++++++++++++++++++++++++
 maps.c        |    1 
 mdadm.h       |    4 
 super-intel.c |  516 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 sysfs.c       |   13 +
 util.c        |   18 +-
 9 files changed, 865 insertions(+), 78 deletions(-)
 create mode 100644 isrt-intel.h


* [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 02/11] make must_be_container() more selective Dan Williams
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

From: Dan Williams <dan.j.williams@intel.com>

Assume this FIXME was to prevent loading a personality for containers.
Fix it up to accept the values that correlate with the actual md kernel
personalities.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 sysfs.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/sysfs.c b/sysfs.c
index 9a1d856960e8..4cbd4e5d051b 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -628,8 +628,9 @@ int sysfs_set_array(struct mdinfo *info, int vers)
 			return 1;
 		}
 	}
-	if (info->array.level < 0)
-		return 0; /* FIXME */
+	/* containers have no personality, they're rather bland */
+	if (info->array.level <= LEVEL_CONTAINER)
+		return 0;
 	rv |= sysfs_set_str(info, NULL, "level",
 			    map_num(pers, info->array.level));
 	if (info->reshape_active && info->delta_disks != UnSet)



* [RFC mdadm PATCH 02/11] make must_be_container() more selective
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case Dan Williams
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Cache configurations in mid-assembly may appear to be a "container" as
they are 0-sized and external.  Teach must_be_container() to look for
"external:<metadata name>" as the container identifier from
<sysfs>/md/metadata_version.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 util.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/util.c b/util.c
index afb2bb110f24..93f9200fa4c7 100644
--- a/util.c
+++ b/util.c
@@ -1176,18 +1176,19 @@ int get_dev_size(int fd, char *dname, unsigned long long *sizep)
 }
 
 /* Return true if this can only be a container, not a member device.
- * i.e. is and md device and size is zero
+ * i.e. is an md device and the text_version matches an external
+ * metadata format
  */
 int must_be_container(int fd)
 {
-	unsigned long long size;
-	if (md_get_version(fd) < 0)
+	struct mdinfo *sra = sysfs_read(fd, NULL, GET_VERSION);
+	struct superswitch *ss;
+
+	if (!sra)
 		return 0;
-	if (get_dev_size(fd, NULL, &size) == 0)
-		return 1;
-	if (size == 0)
-		return 1;
-	return 0;
+	ss = version_to_superswitch(sra->text_version);
+	sysfs_free(sra);
+	return ss ? ss->external : 0;
 }
 
 /* Sets endofpart parameter to the last block used by the last GPT partition on the device.



* [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 02/11] make must_be_container() more selective Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays Dan Williams
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Make the verbose output more usable when the array name is not given.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Assemble.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 05ace561fb50..a72d427f4773 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1288,9 +1288,21 @@ try_again:
 	 */
 	if (!st && ident->st)
 		st = ident->st;
-	if (c->verbose>0)
-		pr_err("looking for devices for %s\n",
-		       mddev ? mddev : "further assembly");
+
+	if (c->verbose > 0) {
+		char uuid[64], *id;
+
+		if (mddev)
+			id = mddev;
+		else if (ident->uuid_set) {
+			__fname_from_uuid(ident->uuid,
+					  st ? st->ss->swapuuid : 0,
+					  uuid, ':');
+			id = uuid + 5;
+		} else
+			id = "further assembly";
+		pr_err("looking for devices for %s\n", id);
+	}
 
 	content = &info;
 	if (st)



* [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (2 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs" Dan Williams
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

In the scenario of assembling composite md arrays, e.g. /dev/md2 with
/dev/md1 and /dev/md0 as components, we want /dev/md2 assembly to be
delayed until the components are available.  If we attempt to assemble
/dev/md2 when only /dev/md0 is available /dev/md2 will not be fully
initialized, and /dev/md1 will be assembled to an equally defunct
/dev/md3.

So tear down the early /dev/md2 on the expectation that more devices will
arrive in a later assembly pass.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Assemble.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index a72d427f4773..9a2399ef6411 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1147,16 +1147,17 @@ static int start_array(int mdfd,
 		if (sparecnt)
 			fprintf(stderr, " and %d spare%s", sparecnt, sparecnt==1?"":"s");
 		if (!enough(content->array.level, content->array.raid_disks,
-			    content->array.layout, 1, avail))
+			    content->array.layout, 1, avail)) {
 			fprintf(stderr, " - not enough to start the array.\n");
-		else if (!enough(content->array.level,
+			ioctl(mdfd, STOP_ARRAY, NULL);
+		} else if (!enough(content->array.level,
 				 content->array.raid_disks,
 				 content->array.layout, clean,
-				 avail))
+				 avail)) {
 			fprintf(stderr, " - not enough to start the "
 				"array while not clean - consider "
 				"--force.\n");
-		else {
+		} else {
 			if (req_cnt == (unsigned)content->array.raid_disks)
 				fprintf(stderr, " - need all %d to start it", req_cnt);
 			else



* [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs"
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (3 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 06/11] imsm: immutable volume id Dan Williams
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

'isrt' volumes introduce a new category of array.  They have
components that are subarrays from separate containers (likely thanks to
the constraint that all active members in an imsm container must be
members of all subarrays).  We want '-Eb' to identify the composite
volume uuid, but the default coalescing (by container) results in
duplicated output of the cache volume uuid.

Instead, introduce infrastructure to handle this directly.

1/ add ->cache_legs to struct mdinfo to indicate how many subarrays in a given
   container are components (legs) of a cache association.

2/ add ->cache_leg to struct supertype to indicate a cache leg to
   enumerate via ->getinfo_super()

3/ teach Examine to coalesce cache volumes across containers by uuid and
   dump their details via ->brief_examine_cache() extension to struct
   superswitch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Examine.c |   92 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 mdadm.h   |    3 ++
 2 files changed, 83 insertions(+), 12 deletions(-)

diff --git a/Examine.c b/Examine.c
index 953b8eee2360..945af8454a5f 100644
--- a/Examine.c
+++ b/Examine.c
@@ -30,6 +30,73 @@
 #endif
 #include	"md_u.h"
 #include	"md_p.h"
+
+struct array {
+	struct supertype *st;
+	struct mdinfo info;
+	void *devs;
+	struct array *next;
+	int spares;
+	int cache_leg;
+};
+
+static struct array *add_cache_legs(struct array *caches, struct supertype *st,
+				    struct mdinfo *info, struct array *arrays)
+{
+	struct mdinfo cache_info;
+	struct array *ap;
+	int i;
+
+	for (i = 1; i <= info->cache_legs; i++) {
+		/* in the case where the cache leg is assembled its uuid
+		 * may appear in the arrays list, so we need to check
+		 * both the caches list and the arrays list for
+		 * duplicates
+		 */
+		struct array *lists[] = { caches, arrays };
+		int j;
+
+		st->cache_leg = i;
+		st->ss->getinfo_super(st, &cache_info, NULL);
+		st->cache_leg = 0;
+		for (j = 0; j < 2; j++) {
+			for (ap = lists[j]; ap; ap = ap->next) {
+				if (st->ss == ap->st->ss
+				    && same_uuid(ap->info.uuid, cache_info.uuid,
+						 st->ss->swapuuid))
+					break;
+			}
+			if (ap)
+				break;
+		}
+		if (!ap) {
+			ap = xcalloc(1, sizeof(*ap));
+			ap->devs = dl_head();
+			ap->next = caches;
+			ap->st = st;
+			ap->cache_leg = i;
+			caches = ap;
+			memcpy(&ap->info, &cache_info, sizeof(cache_info));
+		}
+	}
+
+	return caches;
+}
+
+static void free_arrays(struct array *arrays)
+{
+	struct array *ap;
+
+	while (arrays) {
+		ap = arrays;
+		arrays = ap->next;
+
+		ap->st->ss->free_super(ap->st);
+		free(ap);
+	}
+}
+
+
 int Examine(struct mddev_dev *devlist,
 	    struct context *c,
 	    struct supertype *forcest)
@@ -54,14 +121,7 @@ int Examine(struct mddev_dev *devlist,
 	int fd;
 	int rv = 0;
 	int err = 0;
-
-	struct array {
-		struct supertype *st;
-		struct mdinfo info;
-		void *devs;
-		struct array *next;
-		int spares;
-	} *arrays = NULL;
+	struct array *arrays = NULL, *caches = NULL;
 
 	for (; devlist ; devlist = devlist->next) {
 		struct supertype *st;
@@ -131,13 +191,14 @@ int Examine(struct mddev_dev *devlist,
 					break;
 			}
 			if (!ap) {
-				ap = xmalloc(sizeof(*ap));
+				ap = xcalloc(1, sizeof(*ap));
 				ap->devs = dl_head();
 				ap->next = arrays;
-				ap->spares = 0;
 				ap->st = st;
 				arrays = ap;
 				st->ss->getinfo_super(st, &ap->info, NULL);
+				caches = add_cache_legs(caches, st, &ap->info,
+							arrays);
 			} else
 				st->ss->getinfo_super(st, &ap->info, NULL);
 			if (!have_container &&
@@ -179,11 +240,18 @@ int Examine(struct mddev_dev *devlist,
 					printf("\n");
 				ap->st->ss->brief_examine_subarrays(ap->st, c->verbose);
 			}
-			ap->st->ss->free_super(ap->st);
-			/* FIXME free ap */
 			if (ap->spares || c->verbose > 0)
 				printf("\n");
 		}
+		/* list container caches after their parent containers
+		 * and subarrays
+		 */
+		for (ap = caches; ap; ap = ap->next)
+			if (ap->st->ss->brief_examine_cache)
+				ap->st->ss->brief_examine_cache(ap->st, ap->cache_leg);
+		free_arrays(arrays);
+		free_arrays(caches);
+
 	}
 	return rv;
 }
diff --git a/mdadm.h b/mdadm.h
index f6a614e19316..111f90f599af 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -233,6 +233,7 @@ struct mdinfo {
 	int container_enough; /* flag external handlers can set to
 			       * indicate that subarrays have not enough (-1),
 			       * enough to start (0), or all expected disks (1) */
+	int cache_legs; /* number of cross-container cache members in this 'array' */
 	char		sys_name[20];
 	struct mdinfo *devs;
 	struct mdinfo *next;
@@ -684,6 +685,7 @@ extern struct superswitch {
 	void (*examine_super)(struct supertype *st, char *homehost);
 	void (*brief_examine_super)(struct supertype *st, int verbose);
 	void (*brief_examine_subarrays)(struct supertype *st, int verbose);
+	void (*brief_examine_cache)(struct supertype *st, int cache);
 	void (*export_examine_super)(struct supertype *st);
 	int (*examine_badblocks)(struct supertype *st, int fd, char *devname);
 	int (*copy_metadata)(struct supertype *st, int from, int to);
@@ -1006,6 +1008,7 @@ struct supertype {
 				 Used when examining metadata to display content of disk
 				 when user has no hw/firmare compatible system.
 			      */
+	int cache_leg; /* hack to interrogate cache legs within containers */
 	struct metadata_update *updates;
 	struct metadata_update **update_tail;
 



* [RFC mdadm PATCH 06/11] imsm: immutable volume id
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (4 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs" Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:22 ` [RFC mdadm PATCH 07/11] imsm: cache metadata definitions Dan Williams
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Support the new extensions to have a volume-id that is immutable and
unique for the life of the container.  Prior to this change deleting and
recreating a volume would result in it having the same uuid as the
previous volume in that position.  Now, every time a volume is created, a
container generation count is incremented, allowing the volume-ids to
include the container generation as salt.

TODO update kill_subarray_imsm() and update_subarray_imsm() to allow
deletion and renaming (respectively) of arrays with an immutable id.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 super-intel.c |   77 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index f0a7ab5ccc7a..07e4c68982cd 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -188,8 +188,9 @@ struct imsm_dev {
 	__u16 cache_policy;
 	__u8  cng_state;
 	__u8  cng_sub_state;
-#define IMSM_DEV_FILLERS 10
-	__u32 filler[IMSM_DEV_FILLERS];
+	__u16 dev_id;
+	__u16 fill;
+	__u32 filler[9];
 	struct imsm_vol vol;
 } __attribute__ ((packed));
 
@@ -209,8 +210,9 @@ struct imsm_super {
 	__u32 orig_family_num;		/* 0x40 - 0x43 original family num */
 	__u32 pwr_cycle_count;		/* 0x44 - 0x47 simulated power cycle count for array */
 	__u32 bbm_log_size;		/* 0x48 - 0x4B - size of bad Block Mgmt Log in bytes */
-#define IMSM_FILLERS 35
-	__u32 filler[IMSM_FILLERS];	/* 0x4C - 0xD7 RAID_MPB_FILLERS */
+	__u16 create_events;		/* counter for generating unique ids */
+	__u16 fill1;
+	__u32 filler[34];
 	struct imsm_disk disk[1];	/* 0xD8 diskTbl[numDisks] */
 	/* here comes imsm_dev[num_raid_devs] */
 	/* here comes BBM logs */
@@ -1984,6 +1986,30 @@ static int match_home_imsm(struct supertype *st, char *homehost)
 	return -1;
 }
 
+static void volume_uuid_from_super(struct intel_super *super, struct sha1_ctx *ctx)
+{
+	struct imsm_dev *dev = NULL;
+
+	if (super->current_vol >= 0)
+		dev = get_imsm_dev(super, super->current_vol);
+
+	if (!dev)
+		return;
+
+	/* if the container is tracking creation events then dev_id is
+	 * valid and we can advertise an immutable uuid, otherwise use the
+	 * old volume-position/name method
+	 */
+	if (super->anchor->create_events) {
+		sha1_process_bytes(&dev->dev_id, sizeof(dev->dev_id), ctx);
+	} else {
+		__u32 vol = super->current_vol;
+
+		sha1_process_bytes(&vol, sizeof(vol), ctx);
+		sha1_process_bytes(dev->volume, MAX_RAID_SERIAL_LEN, ctx);
+	}
+}
+
 static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
 {
 	/* The uuid returned here is used for:
@@ -2012,7 +2038,6 @@ static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
 
 	char buf[20];
 	struct sha1_ctx ctx;
-	struct imsm_dev *dev = NULL;
 	__u32 family_num;
 
 	/* some mdadm versions failed to set ->orig_family_num, in which
@@ -2025,13 +2050,7 @@ static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
 	sha1_init_ctx(&ctx);
 	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
 	sha1_process_bytes(&family_num, sizeof(__u32), &ctx);
-	if (super->current_vol >= 0)
-		dev = get_imsm_dev(super, super->current_vol);
-	if (dev) {
-		__u32 vol = super->current_vol;
-		sha1_process_bytes(&vol, sizeof(vol), &ctx);
-		sha1_process_bytes(dev->volume, MAX_RAID_SERIAL_LEN, &ctx);
-	}
+	volume_uuid_from_super(super, &ctx);
 	sha1_finish_ctx(&ctx, buf);
 	memcpy(uuid, buf, 4*4);
 }
@@ -4562,6 +4581,39 @@ static int check_name(struct intel_super *super, char *name, int quiet)
 	return !reason;
 }
 
+static void new_dev_id(struct intel_super *super, struct imsm_dev *new_dev)
+{
+	struct imsm_super *mpb = super->anchor;
+	__u16 create_events, i;
+
+	/* only turn on create_events tracking for newly born
+	 * containers, lest we change the uuid of live volumes (see
+	 * volume_uuid_from_super())
+	 */
+	create_events = __le16_to_cpu(mpb->create_events);
+	if (super->current_vol > 0 && !create_events)
+		return;
+
+	/* catch the case of create_events wrapping to an existing id in
+	 * the mpb
+	 */
+	do {
+		create_events++;
+		/* wrap to 1, because zero means no dev_id support */
+		if (create_events == 0)
+			create_events = 1;
+		for (i = 0; i < mpb->num_raid_devs; i++) {
+			struct imsm_dev *dev = __get_imsm_dev(mpb, i);
+
+			if (dev->dev_id == __cpu_to_le16(create_events))
+				break;
+		}
+	} while (i < mpb->num_raid_devs);
+
+	new_dev->dev_id = __cpu_to_le16(create_events);
+	mpb->create_events = __cpu_to_le16(create_events);
+}
+
 static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
 				  unsigned long long size, char *name,
 				  char *homehost, int *uuid,
@@ -4655,6 +4707,7 @@ static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
 		return 0;
 	dv = xmalloc(sizeof(*dv));
 	dev = xcalloc(1, sizeof(*dev) + sizeof(__u32) * (info->raid_disks - 1));
+	new_dev_id(super, dev);
 	strncpy((char *) dev->volume, name, MAX_RAID_SERIAL_LEN);
 	array_blocks = calc_array_size(info->level, info->raid_disks,
 					       info->layout, info->chunk_size,



* [RFC mdadm PATCH 07/11] imsm: cache metadata definitions
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (5 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 06/11] imsm: immutable volume id Dan Williams
@ 2014-04-24  7:22 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 08/11] imsm: read cache metadata Dan Williams
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:22 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Makefile      |    2 
 isrt-intel.h  |  256 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 super-intel.c |   16 +++-
 3 files changed, 271 insertions(+), 3 deletions(-)
 create mode 100644 isrt-intel.h

diff --git a/Makefile b/Makefile
index b823d85f89e3..7d50df69a744 100644
--- a/Makefile
+++ b/Makefile
@@ -127,7 +127,7 @@ CHECK_OBJS = restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
 
 SRCS =  $(patsubst %.o,%.c,$(OBJS))
 
-INCL = mdadm.h part.h bitmap.h
+INCL = mdadm.h part.h bitmap.h isrt-intel.h platform-intel.h
 
 MON_OBJS = mdmon.o monitor.o managemon.o util.o maps.o mdstat.o sysfs.o \
 	policy.o lib.o \
diff --git a/isrt-intel.h b/isrt-intel.h
new file mode 100644
index 000000000000..50365de1a620
--- /dev/null
+++ b/isrt-intel.h
@@ -0,0 +1,256 @@
+/*
+ * mdadm - Intel(R) Smart Response Technology Support
+ *
+ * Copyright (C) 2011-2014 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef __ISRT_INTEL_H__
+#define __ISRT_INTEL_H__
+
+enum {
+	/* for a given cache device how many volumes can be associated */
+	MAX_NV_CACHE_VOLS = 1,
+	/* likely should be dynamically configurable when this driver is
+	 * made more generic
+	 */
+	ISRT_FRAME_SIZE = 8192,
+	VOL_CONFIG_RESERVED = 32,
+	MD_HEADER_RESERVED = 32,
+	MAX_RAID_SERIAL_LEN = 16,
+	NVC_SIG_LEN = 32,
+	ISRT_DEV_IDX = 0,
+	ISRT_TARGET_DEV_IDX = 1,
+
+	NV_CACHE_MODE_OFF          = 0,
+	NV_CACHE_MODE_OFF_TO_SAFE  = 1, /* powerfail recovery state */
+	NV_CACHE_MODE_OFF_TO_PERF  = 2, /* powerfail recovery state */
+	NV_CACHE_MODE_SAFE         = 3,
+	NV_CACHE_MODE_SAFE_TO_OFF  = 4,
+	NV_CACHE_MODE_PERF         = 5,
+	NV_CACHE_MODE_PERF_TO_OFF  = 6,
+	NV_CACHE_MODE_PERF_TO_SAFE = 7,
+	NV_CACHE_MODE_IS_FAILING   = 8,
+	NV_CACHE_MODE_HAS_FAILED   = 9,
+	NV_CACHE_MODE_DIS_PERF     = 10, /* caching on volume or nv cache disabled */
+	NV_CACHE_MODE_DIS_SAFE     = 11, /* volume or NV cache not associated */
+};
+
+struct segment_index_pair {
+	__u32 segment;
+	__u32 index;
+};
+
+#define NV_CACHE_CONFIG_SIG "Intel IMSM NV Cache Cfg. Sig.   "
+#define MAX_NVC_SIZE_GB            128UL      /* Max NvCache we can support is 128GB */
+#define NVC_FRAME_SIZE             8192UL
+#define NVC_FRAME_SIZE_IN_KB       (NVC_FRAME_SIZE / 1024UL)                  /* 8 */
+#define NVC_FRAMES_PER_GB          (1024UL * (1024UL / NVC_FRAME_SIZE_IN_KB))   /* 128k */
+#define MAX_NVC_FRAMES             (MAX_NVC_SIZE_GB * NVC_FRAMES_PER_GB)    /* 16m */
+#define SEGIDX_PAIRS_PER_NVC_FRAME (NVC_FRAME_SIZE / sizeof(struct segment_index_pair)) /* 1k */
+#define SEGHEAP_SEGS_PER_NVC_FRAME (NVC_FRAME_SIZE / sizeof(__u32)) /* 2k */
+#define FRAMES_PER_SEGHEAP_FRAME   (SEGIDX_PAIRS_PER_NVC_FRAME \
+				    * SEGHEAP_SEGS_PER_NVC_FRAME) /* 2m */
+#define MAX_SEGHEAP_NVC_FRAMES     (MAX_NVC_FRAMES/FRAMES_PER_SEGHEAP_FRAME)  /* 8 */
+#define MAX_SEGHEAP_TOC_ENTRIES    (MAX_SEGHEAP_NVC_FRAMES + 1)
+
+
+/* XXX: size of enum guarantees? */
+enum nvc_shutdown_state {
+	ShutdownStateNormal,
+	ShutdownStateS4CrashDmpStart,
+	ShutdownStateS4CrashDmpEnd,
+	ShutdownStateS4CrashDmpFailed
+};
+
+struct isrt_mpb {
+	/*
+	 * The metadata array (packed_md0_nba or packed_md1_nba) is the base for
+	 * the Metadata Delta Log changes.  The current contents of the Metadata
+	 * Delta Log applied to this packed metadata base becomes the working
+	 * packed metadata upon recovery from a power failure.  The alternate
+	 * packed metadata array, indicated by (md_base_for_delta_log ^1) is
+	 * where the next complete write of packed metadata from DRAM will be
+	 * written. On a clean shutdown, packed metadata will also be written to
+	 * the alternate array.
+	 */
+	__u32 packed_md0_nba; /* Start of primary packed metadata array */
+	__u32 packed_md1_nba; /* Start of secondary packed metadata array */
+	__u32 md_base_for_delta_log; /* 0 or 1; which packed md array the delta log applies to */
+	__u32 packed_md_size; /* Size of packed metadata array in bytes */
+	__u32 aux_packed_md_nba; /* Start of array of extra metadata for driver use */
+	__u32 aux_packed_md_size; /* Size of array of extra metadata for driver use */
+	__u32 cache_frame0_nba; /* Start of actual cache frames */
+	__u32 seg_num_index_nba; /* Start of the Seg_num_index array */
+	__u32 seg_num_heap_nba; /* Start of the Seg_num_heap */
+	__u32 seg_num_heap_size; /* Size of the Seg_num Heap in bytes (always a */
+	/*
+	 * Multiple of NVM_PAGE_SIZE bytes. The Seg_nums in the tail of the last
+	 * page are all set to 0xFFFFFFFF
+	 */
+	__u32 seg_heap_toc[MAX_SEGHEAP_TOC_ENTRIES];
+	__u32 md_delta_log_nba; /* Start of the Metadata Delta Log region */
+	/*  The Delta Log is a circular buffer */
+	__u32 md_delta_log_max_size; /* Size of the Metadata Delta Log region in bytes */
+	__u32 orom_frames_to_sync_nba; /* Start of the orom_frames_to_sync record */
+	__u32 num_cache_frames; /* Total number of cache frames */
+	__u32 cache_frame_size; /* Size of each cache frame in bytes */
+	__u32 lba_alignment; /* Offset to add to host I/O request LBA before
+			       * shifting to form the segment number
+			       */
+	__u32 valid_frame_gen_num; /* Valid cache frame generation number */
+	/*
+	 * If the cache frame metadata contains a smaller generation number,
+	 * that frame's contents are considered invalid.
+	 */
+	__u32 packed_md_frame_gen_num; /* Packed metadata frame generation number */
+	/*
+	 * This is the frame generation number associated with all frames in the
+	 * packed metadata array. If this is < valid_frame_gen_num, then all
+	 * frames in packed metadata are considered invalid.
+	 */
+	__u32 curr_clean_batch_num; /* Initialized to 0, incremented whenever
+				      * the cache goes clean. If this value is
+				      * greater than the Nv_cache_metadata
+				      * dirty_batch_num in the atomic metadata
+				      * of the cache frame, the frame is
+				      * considered clean.
+				      */
+	__u32 total_used_sectors; /* Total number of NVM sectors of size
+				    * NVM_SECTOR_SIZE used by cache frames and
+				    * metadata.
+				    */
+	/* OROM I/O Log fields */
+	__u32 orom_log_nba; /* OROM I/O Log area for next boot */
+	__u32 orom_log_size; /* OROM I/O Log size in 512-byte blocks */
+
+	/* Hibernate/Crashdump Extent_log */
+	__u32 s4_crash_dmp_extent_log_nba; /* I/O Extent Log area created by the */
+					   /* hibernate/crashdump driver for OROM */
+	/* Driver shutdown state utilized by the OROM */
+	enum nvc_shutdown_state driver_shutdown_state;
+
+	__u32 validity_bits;
+	__u64 nvc_hdr_array_in_dram;
+
+	/* The following fields are used in managing the Metadata Delta Log. */
+
+	/*
+	 * Every delta record in the Metadata Delta Log  has a copy of the value
+	 * of this field at the time the record was written. This gen num is
+	 * incremented by 1 every time the log fills up, and allows powerfail
+	 * recovery to easily find the end of the log (it's the first record
+	 * whose gen num field is < curr_delta_log_gen_num.)
+	 */
+	__u32 curr_delta_log_gen_num;
+	/*
+	 * This is the Nba to the start of the current generation of delta
+	 * records in the log.  Since the log is circular, the current log
+	 * extends from md_delta_log_first up to and including
+	 * ((md_delta_log_first + max_records - 2) % max_records).  NOTE: when reading
+	 * the delta log, the actual end of the log is indicated by the first
+	 * record whose gen num field is <curr_delta_log_gen_num, so the
+	 * 'max_records-2' guarantees we'll have at least one delta record whose
+	 * gen num field will qualify to mark the end of the log.
+	 */
+	__u32 md_delta_log_first;
+	/*
+	 * How many free frames are used in the Metadata Delta Log. After every
+	 * write of a delta log record that contains at least one
+	 * Md_delta_log_entry, there must always be exactly
+	 */
+
+	__u32 md_delta_log_num_free_frames;
+	__u32 num_dirty_frames; /* Number of dirty frames in cache when this
+				  * isrt_mpb was written.
+				  */
+	__u32 num_dirty_frames_at_mode_trans; /* Number of dirty frames from
+						* the start of the most recent
+						* transition out of Performance
+						* mode (Perf_to_safe/Perf_to_off)
+						*/
+} __attribute__((packed));
+
+
+struct nv_cache_vol_config_md {
+	__u32 acc_vol_orig_family_num; /* Unique Volume Id of the accelerated
+					 * volume caching to the NVC Volume
+					 */
+	__u16 acc_vol_dev_id; /* (original family + dev_id).  If there is no
+				* volume associated with Nv_cache, both of these
+				* fields are 0.
+				*/
+	__u16 nv_cache_mode; /* NV Cache mode of this volume */
+	/*
+	 * The serial_no of the accelerated volume associated with Nv_cache.  If
+	 * there is no volume associated with Nv_cache, acc_vol_name[0] = 0
+	 */
+	char acc_vol_name[MAX_RAID_SERIAL_LEN];
+	__u32 flags;
+	__u32 power_cycle_count; /* Power Cycle Count of the underlying disk or
+				   * volume from the last device enumeration.
+				   */
+	/* Used to determine separation case. */
+	__u32  expansion_space[VOL_CONFIG_RESERVED];
+} __attribute__((packed));
+
+struct nv_cache_config_md_header {
+	char signature[NVC_SIG_LEN]; /* "Intel IMSM NV Cache Cfg. Sig.   " */
+	__u16  version_number; /* NV_CACHE_CFG_MD_VERSION */
+	__u16  header_length; /* Length in bytes */
+	__u32  total_length; /* Length of the entire Config Metadata including
+			       * header and volume(s) in bytes
+			       */
+	/* Elements above here will never change even in new versions */
+	__u16  num_volumes; /* Number of volumes that have config metadata.
+			      * In 9.0 it's either 0 or 1.
+			      */
+	__u32 expansion_space[MD_HEADER_RESERVED];
+	/* Array of Volume Config Metadata entries.  Contains "num_volumes"
+	 * entries.  In 9.0 'MAX_NV_CACHE_VOLS' = 1. */
+	struct nv_cache_vol_config_md vol_config_md[MAX_NV_CACHE_VOLS];
+} __attribute__((packed));
+
+struct nv_cache_control_data {
+	struct nv_cache_config_md_header hdr;
+	struct isrt_mpb mpb;
+} __attribute__((packed));
+
+/* One or more sectors in NAND page are bad */
+#define NVC_PACKED_SECTORS_BAD (1 << 0)
+#define NVC_PACKED_DIRTY (1 << 1)
+#define NVC_PACKED_FRAME_TYPE_SHIFT (2)
+/* If set, frame is in clean area of LRU list */
+#define NVC_PACKED_IN_CLEAN_AREA (1 << 5)
+/*
+ * This frame was TRIMMed (OROM shouldn't expect the delta log rebuild to match
+ * the packed metadata stored on a clean shutdown).
+ */
+#define NVC_PACKED_TRIMMED (1 << 6)
+
+struct nv_cache_packed_md {
+	__u32 seg_num; /* Disk Segment currently assigned to frame */
+	__u16 per_sector_validity; /* Per sector validity */
+	__u8 flags;
+	union {
+		__u8 pad;
+		/* repurpose padding for driver state */
+		__u8 locked;
+	};
+} __attribute__((packed));
+
+#define SEGMENTS_PER_PAGE_SHIFT 6
+#define SEGMENTS_PER_PAGE (1 << SEGMENTS_PER_PAGE_SHIFT)
+#define SEGMENTS_PER_PAGE_MASK (SEGMENTS_PER_PAGE-1)
+#define FRAME_SHIFT 4
+#define SECTORS_PER_FRAME (1 << FRAME_SHIFT)
+#define FRAME_MASK (SECTORS_PER_FRAME-1)
+
+#endif /* __ISRT_INTEL_H__ */
diff --git a/super-intel.c b/super-intel.c
index 07e4c68982cd..acc46368322f 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -22,6 +22,7 @@
 #include "mdmon.h"
 #include "sha1.h"
 #include "platform-intel.h"
+#include "isrt-intel.h"
 #include <values.h>
 #include <scsi/sg.h>
 #include <ctype.h>
@@ -39,7 +40,6 @@
 #define MPB_VERSION_CNG "1.2.06"
 #define MPB_VERSION_ATTRIBS "1.3.00"
 #define MAX_SIGNATURE_LENGTH  32
-#define MAX_RAID_SERIAL_LEN   16
 
 /* supports RAID0 */
 #define MPB_ATTRIB_RAID0		__cpu_to_le32(0x00000001)
@@ -179,6 +179,8 @@ struct imsm_dev {
 #define DEV_CLONE_N_GO		__cpu_to_le32(0x400)
 #define DEV_CLONE_MAN_SYNC	__cpu_to_le32(0x800)
 #define DEV_CNG_MASTER_DISK_NUM	__cpu_to_le32(0x1000)
+/* Volume is being used as NvCache for an accelerated volume */
+#define DEV_NVC_VOLUME          __cpu_to_le32(0x4000)
 	__u32 status;	/* Persistent RaidDev status */
 	__u32 reserved_blocks; /* Reserved blocks at beginning of volume */
 	__u8  migr_priority;
@@ -189,8 +191,18 @@ struct imsm_dev {
 	__u8  cng_state;
 	__u8  cng_sub_state;
 	__u16 dev_id;
+	__u8 nv_cache_mode;
+#define DEV_NVC_CLEAN		(0)
+#define DEV_NVC_DIRTY		(1)
+#define DEV_NVC_HEALTH_GOOD     (0 << 1)
+#define DEV_NVC_HEALTH_FAILED	(1 << 1)
+#define DEV_NVC_HEALTH_READONLY	(2 << 1)
+#define DEV_NVC_HEALTH_BACKUP	(3 << 1)
+	__u8 nv_cache_flags;
+	__u32 nvc_orig_family_num; /* Unique Volume Id of the cache */
+	__u16 nvc_dev_id;	   /* volume associated with this volume */
 	__u16 fill;
-	__u32 filler[9];
+	__u32 filler[7];
 	struct imsm_vol vol;
 } __attribute__ ((packed));
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC mdadm PATCH 08/11] imsm: read cache metadata
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (6 preceding siblings ...)
  2014-04-24  7:22 ` [RFC mdadm PATCH 07/11] imsm: cache metadata definitions Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 09/11] imsm: examine cache configurations Dan Williams
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Add support for identifying cache volumes and retrieving the associated
cache mpb located at the start of the volume marked with a
DEV_NVC_VOLUME flag.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 super-intel.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index acc46368322f..f179d80b8209 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -345,6 +345,7 @@ static unsigned int mpb_sectors(struct imsm_super *mpb)
 struct intel_dev {
 	struct imsm_dev *dev;
 	struct intel_dev *next;
+	struct nv_cache_control_data *nvc;
 	unsigned index;
 };
 
@@ -3086,6 +3087,7 @@ static void free_devlist(struct intel_super *super)
 
 	while (super->devlist) {
 		dv = super->devlist->next;
+		free(super->devlist->nvc);
 		free(super->devlist->dev);
 		free(super->devlist);
 		super->devlist = dv;
@@ -3467,9 +3469,34 @@ static void end_migration(struct imsm_dev *dev, struct intel_super *super,
 }
 #endif
 
-static int parse_raid_devices(struct intel_super *super)
+static int load_cache(int fd, struct intel_dev *dv)
 {
-	int i;
+	struct imsm_dev *dev = dv->dev;
+	struct imsm_map *map = get_imsm_map(dev, MAP_X);
+	off_t offset = pba_of_lba0(map) << 9;
+	ssize_t size = (sizeof(*dv->nvc) + 511) & ~511;
+	int ret;
+
+	if (posix_memalign((void**) &dv->nvc, 512, size) != 0) {
+		pr_err("Failed to allocate cache anchor buffer"
+				" for %.16s\n", dev->volume);
+		return 1;
+	}
+
+	ret = pread(fd, dv->nvc, size, offset);
+	if (ret != size) {
+		pr_err("Failed to read cache metadata for %.16s: %s\n",
+			dev->volume, strerror(errno));
+		free(dv->nvc);
+		dv->nvc = NULL;
+	}
+
+	return ret != size;
+}
+
+static int load_raid_devices(int fd, struct intel_super *super)
+{
+	int i, err;
 	struct imsm_dev *dev_new;
 	size_t len, len_migr;
 	size_t max_len = 0;
@@ -3496,6 +3523,16 @@ static int parse_raid_devices(struct intel_super *super)
 		dv->index = i;
 		dv->next = super->devlist;
 		super->devlist = dv;
+
+		/* volumes that serve as caches have metadata at offset 0 from
+		 * the start of the volume
+		 */
+		if (dv->dev->status & DEV_NVC_VOLUME) {
+			err = load_cache(fd, dv);
+			if (err)
+				return err;
+		} else
+			dv->nvc = NULL;
 	}
 
 	/* ensure that super->buf is large enough when all raid devices
@@ -3718,8 +3755,7 @@ static void clear_hi(struct intel_super *super)
 	}
 }
 
-static int
-load_and_parse_mpb(int fd, struct intel_super *super, char *devname, int keep_fd)
+static int load_mpb(int fd, struct intel_super *super, char *devname, int keep_fd)
 {
 	int err;
 
@@ -3729,7 +3765,10 @@ load_and_parse_mpb(int fd, struct intel_super *super, char *devname, int keep_fd
 	err = load_imsm_disk(fd, super, devname, keep_fd);
 	if (err)
 		return err;
-	err = parse_raid_devices(super);
+	err = load_raid_devices(fd, super);
+	if (err)
+		return err;
+
 	clear_hi(super);
 	return err;
 }
@@ -4384,13 +4423,13 @@ static int get_super_block(struct intel_super **super_list, char *devnm, char *d
 	}
 
 	find_intel_hba_capability(dfd, s, devname);
-	err = load_and_parse_mpb(dfd, s, NULL, keep_fd);
+	err = load_mpb(dfd, s, NULL, keep_fd);
 
 	/* retry the load if we might have raced against mdmon */
 	if (err == 3 && devnm && mdmon_running(devnm))
 		for (retry = 0; retry < 3; retry++) {
 			usleep(3000);
-			err = load_and_parse_mpb(dfd, s, NULL, keep_fd);
+			err = load_mpb(dfd, s, NULL, keep_fd);
 			if (err != 3)
 				break;
 		}
@@ -4473,7 +4512,7 @@ static int load_super_imsm(struct supertype *st, int fd, char *devname)
 		free_imsm(super);
 		return 2;
 	}
-	rv = load_and_parse_mpb(fd, super, devname, 0);
+	rv = load_mpb(fd, super, devname, 0);
 
 	if (rv) {
 		if (devname)



* [RFC mdadm PATCH 09/11] imsm: examine cache configurations
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (7 preceding siblings ...)
  2014-04-24  7:23 ` [RFC mdadm PATCH 08/11] imsm: read cache metadata Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 10/11] imsm: assemble cache volumes Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays Dan Williams
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Allow -E to show the cache associations of volumes.  The UUIDs are
calculated from the stored "orig_family_num" and "dev_id" in the cache
metadata.

For -Eb a UUID is synthesized from the <cache>:<cache-target> tuple for
the purpose of identifying the complete volume.

The proposed assembly hierarchy is:

1/ (2) containers (one for the cache "array", one for the cache-target
   "array")

2/ (2) subarrays (one for the cache "volume", one for the cache-target
   "volume")

3/ (1) stacked array with the subarray from 2/ as component members

...where "array" and "volume" are the imsm terminology for an mdadm
container and subarray, respectively.

TODO: what to do about the name of the composite volume?  Leave it
dynamically assigned for now; we could have it take over the cache-target
name, but that name is not available when examining the cache device...

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 isrt-intel.h  |   12 ++++
 super-intel.c |  184 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 172 insertions(+), 24 deletions(-)

diff --git a/isrt-intel.h b/isrt-intel.h
index 50365de1a620..6d7e92f4da37 100644
--- a/isrt-intel.h
+++ b/isrt-intel.h
@@ -43,6 +43,18 @@ enum {
 	NV_CACHE_MODE_DIS_SAFE     = 11, /* volume or NV cache not associated */
 };
 
+static inline int nvc_enabled(__u8 mode)
+{
+	switch (mode) {
+	case NV_CACHE_MODE_OFF:
+	case NV_CACHE_MODE_DIS_PERF:
+	case NV_CACHE_MODE_DIS_SAFE:
+		return 0;
+	default:
+		return 1;
+	}
+}
+
 struct segment_index_pair {
 	__u32 segment;
 	__u32 index;
diff --git a/super-intel.c b/super-intel.c
index f179d80b8209..7a7a48e9e6d7 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -770,18 +770,40 @@ static struct imsm_dev *__get_imsm_dev(struct imsm_super *mpb, __u8 index)
 	return NULL;
 }
 
-static struct imsm_dev *get_imsm_dev(struct intel_super *super, __u8 index)
+static struct intel_dev *get_intel_dev(struct intel_super *super, __u8 index)
 {
 	struct intel_dev *dv;
 
-	if (index >= super->anchor->num_raid_devs)
-		return NULL;
 	for (dv = super->devlist; dv; dv = dv->next)
 		if (dv->index == index)
-			return dv->dev;
+			return dv;
+	return NULL;
+}
+
+static int is_isrt_leg(struct intel_dev *dv)
+{
+	return dv->nvc || nvc_enabled(dv->dev->nv_cache_mode);
+}
+
+static struct intel_dev *get_isrt_leg(struct intel_super *super, int leg)
+{
+	struct intel_dev *dv;
+
+	for (dv = super->devlist; dv; dv = dv->next)
+		if (!is_isrt_leg(dv))
+			continue;
+		else if (--leg == 0)
+			return dv;
 	return NULL;
 }
 
+static struct imsm_dev *get_imsm_dev(struct intel_super *super, __u8 index)
+{
+	struct intel_dev *dv = get_intel_dev(super, index);
+
+	return dv ? dv->dev : NULL;
+}
+
 /*
  * for second_map:
  *  == MAP_0 get first map
@@ -1122,20 +1144,112 @@ static int is_gen_migration(struct imsm_dev *dev);
 static __u64 blocks_per_migr_unit(struct intel_super *super,
 				  struct imsm_dev *dev);
 
-static void print_imsm_dev(struct intel_super *super,
-			   struct imsm_dev *dev,
-			   char *uuid,
-			   int disk_idx)
+/* generate the <cache> + <cache_target> (in that order) UUID */
+static int cache_volume_uuid(struct intel_super *super, struct intel_dev *dv, int uuid[4])
+{
+	char buf[20];
+	struct sha1_ctx ctx;
+	struct imsm_dev *dev = dv->dev;
+
+	if (!is_isrt_leg(dv))
+		return 1;
+
+	sha1_init_ctx(&ctx);
+	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+	if (dv->nvc) {
+		struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+		/* self id + cache target id */
+		sha1_process_bytes(&super->anchor->orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&dev->dev_id, sizeof(dv->dev->dev_id), &ctx);
+		sha1_process_bytes(&cfg->acc_vol_orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&cfg->acc_vol_dev_id, sizeof(cfg->acc_vol_dev_id), &ctx);
+	} else if (nvc_enabled(dev->nv_cache_mode)) {
+		/* cache id + self id */
+		sha1_process_bytes(&dev->nvc_orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&dev->nvc_dev_id, sizeof(dev->nvc_dev_id), &ctx);
+		sha1_process_bytes(&super->anchor->orig_family_num, sizeof(__u32), &ctx);
+		sha1_process_bytes(&dev->dev_id, sizeof(dev->dev_id), &ctx);
+	}
+	sha1_finish_ctx(&ctx, buf);
+	memcpy(uuid, buf, 4*4);
+	return 0;
+}
+
+static void cache_target_uuid(struct intel_super *super, struct intel_dev *dv, int uuid[4])
+{
+	char buf[20];
+	struct sha1_ctx ctx;
+	struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+	sha1_init_ctx(&ctx);
+	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+	sha1_process_bytes(&cfg->acc_vol_orig_family_num, sizeof(__u32), &ctx);
+	sha1_process_bytes(&cfg->acc_vol_dev_id, sizeof(cfg->acc_vol_dev_id), &ctx);
+	sha1_finish_ctx(&ctx, buf);
+	memcpy(uuid, buf, 4*4);
+}
+
+static void cache_uuid(struct intel_super *super, struct imsm_dev *dev, int uuid[4])
+{
+	char buf[20];
+	struct sha1_ctx ctx;
+
+	sha1_init_ctx(&ctx);
+	sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+	sha1_process_bytes(&dev->nvc_orig_family_num, sizeof(__u32), &ctx);
+	sha1_process_bytes(&dev->nvc_dev_id, sizeof(dev->nvc_dev_id), &ctx);
+	sha1_finish_ctx(&ctx, buf);
+	memcpy(uuid, buf, 4*4);
+}
+
+static void examine_cache(struct intel_super *super, struct intel_dev *dv)
+{
+	int uuid[4];
+	char uuid_str[64];
+	char *cache_role = NULL;
+	struct imsm_dev *dev = dv->dev;
+
+	if (dv->nvc) {
+		cache_role = "cache";
+		cache_target_uuid(super, dv, uuid);
+	}
+	if (nvc_enabled(dev->nv_cache_mode)) {
+		if (cache_role)
+			cache_role = NULL; /* can't have it both ways */
+		else {
+			cache_role = "cache-target";
+			cache_uuid(super, dev, uuid);
+		}
+	}
+	__fname_from_uuid(uuid, 0, uuid_str, ':');
+
+	if (!cache_role)
+		return;
+
+	printf("          Magic : Intel(R) Smart Response Technology\n");
+	printf("     Cache role : %s\n", cache_role);
+	printf("     Cache peer : %s\n", uuid_str + 5);
+	cache_volume_uuid(super, dv, uuid);
+	__fname_from_uuid(uuid, 0, uuid_str, ':');
+	printf("   Cache volume : %s\n", uuid_str + 5);
+}
+
+static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
+			   struct mdinfo *info, int disk_idx)
 {
 	__u64 sz;
+	__u32 ord;
 	int slot, i;
+	char uuid_str[64];
+	struct imsm_dev *dev = dv->dev;
 	struct imsm_map *map = get_imsm_map(dev, MAP_0);
 	struct imsm_map *map2 = get_imsm_map(dev, MAP_1);
-	__u32 ord;
 
 	printf("\n");
 	printf("[%.16s]:\n", dev->volume);
-	printf("           UUID : %s\n", uuid);
+	__fname_from_uuid(info->uuid, 0, uuid_str, ':');
+	printf("           UUID : %s\n", uuid_str + 5);
 	printf("     RAID Level : %d", get_imsm_raid_level(map));
 	if (map2)
 		printf(" <-- %d", get_imsm_raid_level(map2));
@@ -1224,6 +1338,11 @@ static void print_imsm_dev(struct intel_super *super,
 	}
 	printf("\n");
 	printf("    Dirty State : %s\n", dev->vol.dirty ? "dirty" : "clean");
+
+	if (is_isrt_leg(dv)) {
+		printf("\n");
+		examine_cache(super, dv);
+	}
 }
 
 static void print_imsm_disk(struct imsm_disk *disk, int index, __u32 reserved)
@@ -1443,13 +1562,12 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
 		       (unsigned long long) __le64_to_cpu(log->first_spare_lba));
 	}
 	for (i = 0; i < mpb->num_raid_devs; i++) {
+		struct intel_dev *dv = get_intel_dev(super, i);
 		struct mdinfo info;
-		struct imsm_dev *dev = __get_imsm_dev(mpb, i);
 
 		super->current_vol = i;
 		getinfo_super_imsm(st, &info, NULL);
-		fname_from_uuid(st, &info, nbuf, ':');
-		print_imsm_dev(super, dev, nbuf + 5, super->disks->index);
+		print_imsm_dev(super, dv, &info, super->disks->index);
 	}
 	for (i = 0; i < mpb->num_disks; i++) {
 		if (i == super->disks->index)
@@ -1466,7 +1584,6 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
 
 static void brief_examine_super_imsm(struct supertype *st, int verbose)
 {
-	/* We just write a generic IMSM ARRAY entry */
 	struct mdinfo info;
 	char nbuf[64];
 	struct intel_super *super = st->sb;
@@ -1481,14 +1598,28 @@ static void brief_examine_super_imsm(struct supertype *st, int verbose)
 	printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
 }
 
+static void brief_examine_cache_imsm(struct supertype *st, int cache_leg)
+{
+	int uuid[4];
+	char nbuf[64];
+	struct intel_super *super = st->sb;
+	struct intel_dev *dv = get_isrt_leg(super, cache_leg);
+
+	if (!dv)
+		return;
+
+	cache_volume_uuid(super, dv, uuid);
+	__fname_from_uuid(uuid, 0, nbuf, ':');
+	printf("ARRAY UUID=%s\n", nbuf + 5);
+}
+
 static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
 {
-	/* We just write a generic IMSM ARRAY entry */
-	struct mdinfo info;
+	int i;
 	char nbuf[64];
 	char nbuf1[64];
+	struct mdinfo info;
 	struct intel_super *super = st->sb;
-	int i;
 
 	if (!super->anchor->num_raid_devs)
 		return;
@@ -1496,13 +1627,13 @@ static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
 	getinfo_super_imsm(st, &info, NULL);
 	fname_from_uuid(st, &info, nbuf, ':');
 	for (i = 0; i < super->anchor->num_raid_devs; i++) {
-		struct imsm_dev *dev = get_imsm_dev(super, i);
+		struct intel_dev *dv = get_intel_dev(super, i);
 
 		super->current_vol = i;
 		getinfo_super_imsm(st, &info, NULL);
 		fname_from_uuid(st, &info, nbuf1, ':');
 		printf("ARRAY /dev/md/%.16s container=%s member=%d UUID=%s\n",
-		       dev->volume, nbuf + 5, i, nbuf1 + 5);
+		       dv->dev->volume, nbuf + 5, i, nbuf1 + 5);
 	}
 }
 
@@ -2827,12 +2958,12 @@ static struct imsm_disk *get_imsm_missing(struct intel_super *super, __u8 index)
 
 static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map)
 {
-	struct intel_super *super = st->sb;
-	struct imsm_disk *disk;
-	int map_disks = info->array.raid_disks;
-	int max_enough = -1;
 	int i;
+	struct imsm_disk *disk;
 	struct imsm_super *mpb;
+	struct intel_super *super = st->sb;
+	int max_enough = -1, cache_legs = 0;
+	int map_disks = info->array.raid_disks;
 
 	if (super->current_vol >= 0) {
 		getinfo_super_imsm_volume(st, info, map);
@@ -2869,7 +3000,8 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 	mpb = super->anchor;
 
 	for (i = 0; i < mpb->num_raid_devs; i++) {
-		struct imsm_dev *dev = get_imsm_dev(super, i);
+		struct intel_dev *dv = get_intel_dev(super, i);
+		struct imsm_dev *dev = dv->dev;
 		int failed, enough, j, missing = 0;
 		struct imsm_map *map;
 		__u8 state;
@@ -2877,6 +3009,8 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 		failed = imsm_count_failed(super, dev, MAP_0);
 		state = imsm_check_degraded(super, dev, failed, MAP_0);
 		map = get_imsm_map(dev, MAP_0);
+		if (is_isrt_leg(dv))
+			cache_legs++;
 
 		/* any newly missing disks?
 		 * (catches single-degraded vs double-degraded)
@@ -2917,6 +3051,7 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 	}
 	dprintf("%s: enough: %d\n", __func__, max_enough);
 	info->container_enough = max_enough;
+	info->cache_legs = cache_legs;
 
 	if (super->disks) {
 		__u32 reserved = imsm_reserved_sectors(super, super->disks);
@@ -10578,6 +10713,7 @@ struct superswitch super_imsm = {
 	.examine_super	= examine_super_imsm,
 	.brief_examine_super = brief_examine_super_imsm,
 	.brief_examine_subarrays = brief_examine_subarrays_imsm,
+	.brief_examine_cache = brief_examine_cache_imsm,
 	.export_examine_super = export_examine_super_imsm,
 	.detail_super	= detail_super_imsm,
 	.brief_detail_super = brief_detail_super_imsm,



* [RFC mdadm PATCH 10/11] imsm: assemble cache volumes
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (8 preceding siblings ...)
  2014-04-24  7:23 ` [RFC mdadm PATCH 09/11] imsm: examine cache configurations Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  2014-04-24  7:23 ` [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays Dan Williams
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Teach load_super to examine the passed-in fd and determine whether it is a
cache or cache-target md device.

Generate info to allow the two halves of the cache to be assembled.

XXX: what are the rules we need for compare_super to determine stale
cache associations?

Create a LEVEL_ISRT md device.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 isrt-intel.h  |    2 +
 maps.c        |    1 
 mdadm.h       |    1 
 super-intel.c |  191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 sysfs.c       |    8 ++
 util.c        |    1 
 6 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/isrt-intel.h b/isrt-intel.h
index 6d7e92f4da37..ea106c5ac02c 100644
--- a/isrt-intel.h
+++ b/isrt-intel.h
@@ -28,6 +28,8 @@ enum {
 	NVC_SIG_LEN = 32,
 	ISRT_DEV_IDX = 0,
 	ISRT_TARGET_DEV_IDX = 1,
+	ISRT_ROLE_CACHE = 0,
+	ISRT_ROLE_TARGET = 1,
 
 	NV_CACHE_MODE_OFF          = 0,
 	NV_CACHE_MODE_OFF_TO_SAFE  = 1, /* powerfail recovery state */
diff --git a/maps.c b/maps.c
index 64f1df2c42c3..28c010fdf9bf 100644
--- a/maps.c
+++ b/maps.c
@@ -93,6 +93,7 @@ mapping_t pers[] = {
 	{ "10", 10},
 	{ "faulty", LEVEL_FAULTY},
 	{ "container", LEVEL_CONTAINER},
+	{ "isrt", LEVEL_ISRT },
 	{ NULL, 0}
 };
 
diff --git a/mdadm.h b/mdadm.h
index 111f90f599af..e613d3866d8b 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1457,6 +1457,7 @@ char *xstrdup(const char *str);
 #define	LEVEL_MULTIPATH		(-4)
 #define	LEVEL_LINEAR		(-1)
 #define	LEVEL_FAULTY		(-5)
+#define	LEVEL_ISRT		(-12)
 
 /* kernel module doesn't know about these */
 #define LEVEL_CONTAINER		(-100)
diff --git a/super-intel.c b/super-intel.c
index 7a7a48e9e6d7..e69d2a044e92 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -379,6 +379,8 @@ struct intel_super {
 	int updates_pending; /* count of pending updates for mdmon */
 	int current_vol; /* index of raid device undergoing creation */
 	unsigned long long create_offset; /* common start for 'current_vol' */
+	int load_cache; /* flag to indicate we are operating on the cache metadata */
+	int cache_dev; /* subarray/volume index of the cache volume */
 	__u32 random; /* random data for seeding new family numbers */
 	struct intel_dev *devlist;
 	struct dl {
@@ -1246,6 +1248,9 @@ static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
 	struct imsm_map *map = get_imsm_map(dev, MAP_0);
 	struct imsm_map *map2 = get_imsm_map(dev, MAP_1);
 
+	if (super->load_cache)
+		examine_cache(super, dv);
+
 	printf("\n");
 	printf("[%.16s]:\n", dev->volume);
 	__fname_from_uuid(info->uuid, 0, uuid_str, ':');
@@ -1339,7 +1344,7 @@ static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
 	printf("\n");
 	printf("    Dirty State : %s\n", dev->vol.dirty ? "dirty" : "clean");
 
-	if (is_isrt_leg(dv)) {
+	if (!super->load_cache) {
 		printf("\n");
 		examine_cache(super, dv);
 	}
@@ -1514,6 +1519,8 @@ static int imsm_check_attributes(__u32 attributes)
 
 #ifndef MDASSEMBLE
 static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map);
+static void getinfo_super_imsm_cache(struct intel_super *super, struct intel_dev *dv,
+				     struct mdinfo *info, char *map);
 
 static void examine_super_imsm(struct supertype *st, char *homehost)
 {
@@ -1527,6 +1534,18 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
 	__u32 reserved = imsm_reserved_sectors(super, super->disks);
 	struct dl *dl;
 
+	if (super->load_cache) {
+		struct intel_dev *dv = get_intel_dev(super, super->cache_dev);
+		struct mdinfo info;
+
+		super->load_cache = 0;
+		super->current_vol = super->cache_dev;
+		getinfo_super_imsm(st, &info, NULL);
+		super->load_cache = 1;
+		print_imsm_dev(super, dv, &info, super->disks->index);
+		return;
+	}
+
 	snprintf(str, MPB_SIG_LEN, "%s", mpb->sig);
 	printf("          Magic : %s\n", str);
 	snprintf(str, strlen(MPB_VERSION_RAID0), "%s", get_imsm_version(mpb));
@@ -1595,21 +1614,24 @@ static void brief_examine_super_imsm(struct supertype *st, int verbose)
 
 	getinfo_super_imsm(st, &info, NULL);
 	fname_from_uuid(st, &info, nbuf, ':');
-	printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
+	if (super->load_cache)
+		printf("ARRAY UUID=%s\n", nbuf + 5);
+	else
+		printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
 }
 
 static void brief_examine_cache_imsm(struct supertype *st, int cache_leg)
 {
-	int uuid[4];
 	char nbuf[64];
+	struct mdinfo info;
 	struct intel_super *super = st->sb;
 	struct intel_dev *dv = get_isrt_leg(super, cache_leg);
 
 	if (!dv)
 		return;
 
-	cache_volume_uuid(super, dv, uuid);
-	__fname_from_uuid(uuid, 0, nbuf, ':');
+	getinfo_super_imsm_cache(super, dv, &info, NULL);
+	fname_from_uuid(st, &info, nbuf, ':');
 	printf("ARRAY UUID=%s\n", nbuf + 5);
 }
 
@@ -1621,6 +1643,10 @@ static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
 	struct mdinfo info;
 	struct intel_super *super = st->sb;
 
+	/* don't re-report container metadata info */
+	if (super->load_cache)
+		return;
+
 	if (!super->anchor->num_raid_devs)
 		return;
 
@@ -2956,6 +2982,71 @@ static struct imsm_disk *get_imsm_missing(struct intel_super *super, __u8 index)
 	return NULL;
 }
 
+static void getinfo_super_imsm_cache(struct intel_super *super, struct intel_dev *dv,
+				     struct mdinfo *info, char *dmap)
+{
+	__u16 nv_cache_mode;
+	int role_failed = 0, role;
+	struct imsm_dev *dev = dv->dev;
+	struct imsm_map *map = get_imsm_map(dev, MAP_X);
+
+	memset(info, 0, sizeof(*info));
+
+	role = dv->nvc ? ISRT_ROLE_CACHE : ISRT_ROLE_TARGET;
+	if (role == ISRT_ROLE_CACHE) {
+		struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+		nv_cache_mode = cfg->nv_cache_mode;
+		info->events = 0;
+	} else {
+		nv_cache_mode = dev->nv_cache_mode;
+		info->events = 1; /* make Assemble choose the cache target */
+	}
+
+	if (map->map_state == IMSM_T_STATE_FAILED ||
+	    nv_cache_mode == NV_CACHE_MODE_IS_FAILING ||
+	    nv_cache_mode == NV_CACHE_MODE_HAS_FAILED)
+		role_failed = 1;
+
+	info->array.raid_disks    = 2;
+	info->array.level         = LEVEL_ISRT;
+	info->array.layout        = 0;
+	info->array.md_minor      = -1;
+	info->array.ctime         = 0;
+	info->array.utime         = 0;
+	info->array.chunk_size    = 0;
+
+	info->disk.major = 0;
+	info->disk.minor = 0;
+	info->disk.raid_disk = role;
+	info->reshape_active = 0;
+	info->array.major_version = -1;
+	info->array.minor_version = -2;
+	strcpy(info->text_version, "isrt");
+	info->safe_mode_delay = 0;
+	info->disk.number = role;
+	info->name[0] = 0;
+	info->recovery_start = MaxSector;
+	info->data_offset = 0;
+	info->custom_array_size = __le32_to_cpu(dev->size_high);
+	info->custom_array_size <<= 32;
+	info->custom_array_size |= __le32_to_cpu(dev->size_low);
+	info->component_size = info->custom_array_size;
+
+	if (role_failed)
+		info->disk.state = (1 << MD_DISK_FAULTY);
+	else
+		info->disk.state = (1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC);
+	cache_volume_uuid(super, dv, info->uuid);
+
+	if (dmap) {
+		/* we can only report self-state */
+		dmap[!role] = 1;
+		dmap[role] = !role_failed;
+	}
+}
+
+
 static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map)
 {
 	int i;
@@ -2965,6 +3056,20 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 	int max_enough = -1, cache_legs = 0;
 	int map_disks = info->array.raid_disks;
 
+	if (super->load_cache || st->cache_leg) {
+		struct intel_dev *dv;
+
+		if (st->cache_leg) {
+			dv = get_isrt_leg(super, st->cache_leg);
+			if (!dv)
+				return;
+		} else
+			dv = get_intel_dev(super, super->cache_dev);
+
+		getinfo_super_imsm_cache(super, dv, info, map);
+		return;
+	}
+
 	if (super->current_vol >= 0) {
 		getinfo_super_imsm_volume(st, info, map);
 		return;
@@ -3266,6 +3371,27 @@ static int compare_super_imsm(struct supertype *st, struct supertype *tst)
 		}
 	}
 
+	/* cache configuration metadata lives on member arrays, as long
+	 * as they mutually agree on the volume-uuid then consider them a match
+	 * XXX: sufficient? we do have the failure checks in
+	 * getinfo_super_cache() to mitigate
+	 */
+	if (first->load_cache != sec->load_cache)
+		return 3;
+	else if (first->load_cache) {
+		struct intel_dev *first_dv, *sec_dv;
+		int first_uuid[4], sec_uuid[4];
+
+		first_dv = get_intel_dev(first, first->cache_dev);
+		sec_dv = get_intel_dev(sec, sec->cache_dev);
+		cache_volume_uuid(first, first_dv, first_uuid);
+		cache_volume_uuid(sec, sec_dv, sec_uuid);
+		if (memcmp(first_uuid, sec_uuid, sizeof(first_uuid)))
+			return 3;
+		else
+			return 0;
+	}
+
 	/* if an anchor does not have num_raid_devs set then it is a free
 	 * floating spare
 	 */
@@ -4621,6 +4747,52 @@ static int load_container_imsm(struct supertype *st, int fd, char *devname)
 {
 	return load_super_imsm_all(st, fd, &st->sb, devname, NULL, 1);
 }
+
+static int load_super_cache(struct supertype *st, int fd, char *devname)
+{
+	struct mdinfo *sra = sysfs_read(fd, 0, GET_VERSION);
+	char *subarray, *devnm, *ep;
+	int cfd, cache_dev, err = 1;
+	struct intel_super *super;
+	struct intel_dev *dv;
+
+	if (sra && sra->array.major_version == -1 &&
+	    is_subarray(sra->text_version))
+		/* pass */;
+	else
+		goto out;
+
+	/* modify sra->text_version in place */
+	ep = strchr(sra->text_version+1, '/');
+	*ep = '\0';
+	devnm = sra->text_version+1;
+	subarray = ep+1;
+
+	cfd = open_dev(devnm);
+	if (cfd < 0)
+		goto out;
+
+	err = load_container_imsm(st, cfd, devname);
+	close(cfd);
+	if (err)
+		goto out;
+
+	super = st->sb;
+	cache_dev = strtoul(subarray, &ep, 10);
+	/* validate this volume is a cache or cache-target */
+	if (*ep != '\0' || !(dv = get_intel_dev(super, cache_dev))
+	    || !is_isrt_leg(dv)) {
+		free_super_imsm(st);
+		err = 2;
+		goto out;
+	}
+
+	super->load_cache = 1;
+	super->cache_dev = cache_dev;
+ out:
+	sysfs_free(sra);
+	return err;
+}
 #endif
 
 static int load_super_imsm(struct supertype *st, int fd, char *devname)
@@ -4634,6 +4806,15 @@ static int load_super_imsm(struct supertype *st, int fd, char *devname)
 
 	free_super_imsm(st);
 
+#ifndef MDASSEMBLE
+	/* check if this is a component leg of a cache array and load
+	 * the cache metadata from the parent container
+	 */
+	rv = load_super_cache(st, fd, devname);
+	if (rv == 0)
+		return rv;
+#endif
+
 	super = alloc_super();
 	/* Load hba and capabilities if they exist.
 	 * But do not preclude loading metadata in case capabilities or hba are
diff --git a/sysfs.c b/sysfs.c
index 4cbd4e5d051b..898edde49392 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -638,7 +638,13 @@ int sysfs_set_array(struct mdinfo *info, int vers)
 	rv |= sysfs_set_num(info, NULL, "raid_disks", raid_disks);
 	rv |= sysfs_set_num(info, NULL, "chunk_size", info->array.chunk_size);
 	rv |= sysfs_set_num(info, NULL, "layout", info->array.layout);
-	rv |= sysfs_set_num(info, NULL, "component_size", info->component_size/2);
+	if (info->array.level == LEVEL_ISRT) {
+		/* FIXME: how do we support asymmetric component sizes for
+		 * external metadata?
+		 */
+		rv |= sysfs_set_num(info, NULL, "component_size", 0);
+	} else
+		rv |= sysfs_set_num(info, NULL, "component_size", info->component_size/2);
 	if (info->custom_array_size) {
 		int rc;
 
diff --git a/util.c b/util.c
index 93f9200fa4c7..c9c4dec0fac1 100644
--- a/util.c
+++ b/util.c
@@ -362,6 +362,7 @@ int enough(int level, int raid_disks, int layout, int clean, char *avail)
 
 	case LEVEL_MULTIPATH:
 		return avail_disks>= 1;
+	case LEVEL_ISRT:
 	case LEVEL_LINEAR:
 	case 0:
 		return avail_disks == raid_disks;



* [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays
  2014-04-24  7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
                   ` (9 preceding siblings ...)
  2014-04-24  7:23 ` [RFC mdadm PATCH 10/11] imsm: assemble cache volumes Dan Williams
@ 2014-04-24  7:23 ` Dan Williams
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24  7:23 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang

Advertise the caching (MPB_ATTRIB_NVM) metadata attribute as supported, so cache-enabled arrays are no longer rejected for carrying an unknown attribute.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 super-intel.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index e69d2a044e92..10c38b248ce6 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -81,7 +81,8 @@
 					MPB_ATTRIB_RAID1           | \
 					MPB_ATTRIB_RAID10          | \
 					MPB_ATTRIB_RAID5           | \
-					MPB_ATTRIB_EXP_STRIPE_SIZE)
+					MPB_ATTRIB_EXP_STRIPE_SIZE | \
+					MPB_ATTRIB_NVM)
 
 /* Define attributes that are unused but not harmful */
 #define MPB_ATTRIB_IGNORED		(MPB_ATTRIB_NEVER_USE)


