* [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly
@ 2014-04-24 7:22 Dan Williams
2014-04-24 7:22 ` [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels Dan Williams
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
As mentioned in the kernel enabling [1], these cache volumes augment the
existing imsm model in that each cache leg implies a container. In the
standard cache configuration two single-drive-raid0 volumes (from
separate containers) are associated into a cached volume. The diagram
below attempts to make this clearer.
+-----------------------+              +----------------------+
|          sda          | SSD          |         sdb          | HDD
| +-------------------+ |              | +------------------+ |
| |   /dev/md/imsm0   | | Container0   | |  /dev/md/imsm1   | | Container1
| | +---------------+ | |              | | +--------------+ | |
| | | /dev/md/vol0  | | | RAID Volume0 | | | /dev/md/vol1 | | | RAID Volume1
| | |  +---------+  | | |              | | | +----------+ | | |
| | |  |SRT Cache|  | | |              | | | |SRT Target| | | |
+-+-+--+----+----+--+-+-+              +-+-+-+----+-----+-+-+-+
            |                                     |
            |                                     |
            |          HDD Cached by SSD          |
            |           +--------------+          |
            +-----------+ /dev/md/isrt +----------+
                        +--------------+
In support of the standard mdadm volume discovery model a uuid is
synthesized from the combination of the two container-family-numbers and
immutable volume-ids. Examine_brief is modified to aggregate cache legs
across containers.
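As a rough illustration of the uuid synthesis described above (not the
code from this series): the sketch below derives a stable 128-bit id
from the family number and volume id of each leg. The struct leg_id
type, the composite_uuid() name, and the use of FNV-1a are assumptions
made for the example; the hash is only a stand-in so the sketch stays
self-contained, and the cache-then-target ordering is likewise assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-leg identity: one entry per container/volume pair. */
struct leg_id {
    uint32_t family_num;  /* container family number */
    uint16_t dev_id;      /* immutable volume id within that container */
};

/* One FNV-1a step (stand-in hash, not what mdadm uses). */
static uint32_t fnv1a_byte(uint32_t h, uint8_t b)
{
    return (h ^ b) * 16777619u;
}

/* Feed a leg into the hash in a fixed (little-endian) byte order. */
static uint32_t mix_leg(uint32_t h, const struct leg_id *leg)
{
    int i;

    for (i = 0; i < 4; i++)
        h = fnv1a_byte(h, (uint8_t)(leg->family_num >> (8 * i)));
    for (i = 0; i < 2; i++)
        h = fnv1a_byte(h, (uint8_t)(leg->dev_id >> (8 * i)));
    return h;
}

/* Same four inputs always give the same uuid, in any discovery order
 * of the containers, because the legs are mixed cache first, target
 * second.
 */
void composite_uuid(const struct leg_id *cache, const struct leg_id *target,
                    uint32_t uuid[4])
{
    uint32_t k;

    assert(cache && target && uuid);
    for (k = 0; k < 4; k++) {
        uint32_t h = 2166136261u + k;  /* per-word seed */

        h = mix_leg(h, cache);
        h = mix_leg(h, target);
        uuid[k] = h;
    }
}
```

The point is only that every input is immutable for the life of the
containers, so the composite id does not change across reboots or
rediscovery.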
Create support is not included, but existing volumes can be
auto-assembled:
mdadm -Ebs > conf
mdadm -Asc conf
To facilitate testing, the patches are also available on github; note
that this branch will be rebased according to review feedback.
git://github.com/djbw/mdadm isrt
[1]: http://marc.info/?l=linux-raid&m=139832034826379&w=2
---
Dan Williams (11):
sysfs: fix sysfs_set_array() to accept valid negative array levels
make must_be_container() more selective
Assemble: show the uuid in the verbose case
Assemble: teardown partially assembled arrays
Examine: support for coalescing "cache legs"
imsm: immutable volume id
imsm: cache metadata definitions
imsm: read cache metadata
imsm: examine cache configurations
imsm: assemble cache volumes
imsm: support cache enabled arrays
Assemble.c | 27 ++-
Examine.c | 92 +++++++++-
Makefile | 2
isrt-intel.h | 270 ++++++++++++++++++++++++++++++
maps.c | 1
mdadm.h | 4
super-intel.c | 516 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
sysfs.c | 13 +
util.c | 18 +-
9 files changed, 865 insertions(+), 78 deletions(-)
create mode 100644 isrt-intel.h
* [RFC mdadm PATCH 01/11] sysfs: fix sysfs_set_array() to accept valid negative array levels
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
From: Dan Williams <dan.j.williams@intel.com>
Assume this FIXME was to prevent loading a personality for containers.
Fix it up to accept the values that correlate with the actual md kernel
personalities.
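A minimal sketch of the level check this change introduces. The numeric
values below are assumed to mirror the LEVEL_* definitions in mdadm.h
and are restated under SKETCH_ names only so the example is
self-contained; level_has_personality() is a hypothetical helper, not a
function in mdadm.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match mdadm.h; restated here for illustration only. */
enum {
    SKETCH_LEVEL_LINEAR    = -1,
    SKETCH_LEVEL_MULTIPATH = -4,
    SKETCH_LEVEL_FAULTY    = -5,
    SKETCH_LEVEL_CONTAINER = -100,
};

/* Should a "level" string be written to sysfs for this array? */
bool level_has_personality(int level)
{
    /* The single comparison only works because the container
     * sentinel sorts below every level backed by a kernel
     * personality.
     */
    assert(SKETCH_LEVEL_CONTAINER < SKETCH_LEVEL_FAULTY);

    return level > SKETCH_LEVEL_CONTAINER;
}
```

With the old "level < 0" test, linear, multipath and faulty arrays were
skipped along with containers; with the new test only containers (and
anything below them) are.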
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
sysfs.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/sysfs.c b/sysfs.c
index 9a1d856960e8..4cbd4e5d051b 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -628,8 +628,9 @@ int sysfs_set_array(struct mdinfo *info, int vers)
return 1;
}
}
- if (info->array.level < 0)
- return 0; /* FIXME */
+ /* containers have no personality, they're rather bland */
+ if (info->array.level <= LEVEL_CONTAINER)
+ return 0;
rv |= sysfs_set_str(info, NULL, "level",
map_num(pers, info->array.level));
if (info->reshape_active && info->delta_disks != UnSet)
* [RFC mdadm PATCH 02/11] make must_be_container() more selective
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Cache configurations in mid-assembly may appear to be a "container" as
they are 0-sized and external. Teach must_be_container() to look for
"external:<metadata name>" as the container identifier from
<sysfs>/md/metadata_version.
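To make the distinction concrete, here is a small sketch of the string
test, assuming the sysfs value has already been read and stripped of
its trailing newline. version_names_container() and the
external_formats table are hypothetical names for this example; the
patch itself goes through sysfs_read() and version_to_superswitch().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* External metadata handler names, assumed here for illustration. */
static const char *const external_formats[] = { "imsm", "ddf" };

/* True only for "external:<metadata name>", i.e. a container.
 * A member array reports "external:/<container>/<index>" instead and
 * a native array reports a plain version such as "1.2".
 */
bool version_names_container(const char *metadata_version)
{
    static const char prefix[] = "external:";
    const size_t plen = sizeof(prefix) - 1;
    size_t i;

    assert(metadata_version != NULL);

    if (strncmp(metadata_version, prefix, plen) != 0)
        return false;   /* native metadata */

    for (i = 0; i < sizeof(external_formats) / sizeof(external_formats[0]); i++)
        if (strcmp(metadata_version + plen, external_formats[i]) == 0)
            return true;

    return false;       /* external, but not a container */
}
```

A zero-sized, externally managed cache array that is still being
assembled fails this test, which is what keeps it from being mistaken
for a container.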
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
util.c | 17 +++++++++--------
1 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/util.c b/util.c
index afb2bb110f24..93f9200fa4c7 100644
--- a/util.c
+++ b/util.c
@@ -1176,18 +1176,19 @@ int get_dev_size(int fd, char *dname, unsigned long long *sizep)
}
/* Return true if this can only be a container, not a member device.
- * i.e. is and md device and size is zero
+ * i.e. is an md device and the text_version matches an external
+ * metadata format
*/
int must_be_container(int fd)
{
- unsigned long long size;
- if (md_get_version(fd) < 0)
+ struct mdinfo *sra = sysfs_read(fd, NULL, GET_VERSION);
+ struct superswitch *ss;
+
+ if (!sra)
return 0;
- if (get_dev_size(fd, NULL, &size) == 0)
- return 1;
- if (size == 0)
- return 1;
- return 0;
+ ss = version_to_superswitch(sra->text_version);
+ sysfs_free(sra);
+ return ss ? ss->external : 0;
}
/* Sets endofpart parameter to the last block used by the last GPT partition on the device.
* [RFC mdadm PATCH 03/11] Assemble: show the uuid in the verbose case
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Make the verbose output more usable when the array name is not given.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Assemble.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/Assemble.c b/Assemble.c
index 05ace561fb50..a72d427f4773 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1288,9 +1288,21 @@ try_again:
*/
if (!st && ident->st)
st = ident->st;
- if (c->verbose>0)
- pr_err("looking for devices for %s\n",
- mddev ? mddev : "further assembly");
+
+ if (c->verbose > 0) {
+ char uuid[64], *id;
+
+ if (mddev)
+ id = mddev;
+ else if (ident->uuid_set) {
+ __fname_from_uuid(ident->uuid,
+ st ? st->ss->swapuuid : 0,
+ uuid, ':');
+ id = uuid + 5;
+ } else
+ id = "further assembly";
+ pr_err("looking for devices for %s\n", id);
+ }
content = &info;
if (st)
* [RFC mdadm PATCH 04/11] Assemble: teardown partially assembled arrays
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
In the scenario of assembling composite md arrays, e.g. /dev/md2 with
/dev/md1 and /dev/md0 as components, we want /dev/md2 assembly to be
delayed until the components are available. If we attempt to assemble
/dev/md2 when only /dev/md0 is available /dev/md2 will not be fully
initialized, and /dev/md1 will be assembled to an equally defunct
/dev/md3.
So tear down the early /dev/md2 on the expectation that more devices will
arrive in a later assembly pass.
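The resulting three-way decision in start_array() can be summarised
with the sketch below. The two booleans stand for the two enough()
calls in the hunk (the first with clean forced to 1, the second with
the real clean state); the enum and next_action() are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum assembly_action {
    ACTION_START,       /* enough devices, go ahead */
    ACTION_NEED_FORCE,  /* startable only if treated as clean */
    ACTION_TEARDOWN,    /* stop the array, retry on a later pass */
};

enum assembly_action next_action(bool enough_if_clean, bool enough_as_is)
{
    /* being startable in the real state implies being startable
     * when assumed clean
     */
    assert(!enough_as_is || enough_if_clean);

    if (!enough_if_clean)
        return ACTION_TEARDOWN;
    if (!enough_as_is)
        return ACTION_NEED_FORCE;
    return ACTION_START;
}
```

Only the first case issues STOP_ARRAY; the "consider --force" case is
left assembled as before.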
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Assemble.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/Assemble.c b/Assemble.c
index a72d427f4773..9a2399ef6411 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1147,16 +1147,17 @@ static int start_array(int mdfd,
if (sparecnt)
fprintf(stderr, " and %d spare%s", sparecnt, sparecnt==1?"":"s");
if (!enough(content->array.level, content->array.raid_disks,
- content->array.layout, 1, avail))
+ content->array.layout, 1, avail)) {
fprintf(stderr, " - not enough to start the array.\n");
- else if (!enough(content->array.level,
+ ioctl(mdfd, STOP_ARRAY, NULL);
+ } else if (!enough(content->array.level,
content->array.raid_disks,
content->array.layout, clean,
- avail))
+ avail)) {
fprintf(stderr, " - not enough to start the "
"array while not clean - consider "
"--force.\n");
- else {
+ } else {
if (req_cnt == (unsigned)content->array.raid_disks)
fprintf(stderr, " - need all %d to start it", req_cnt);
else
* [RFC mdadm PATCH 05/11] Examine: support for coalescing "cache legs"
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
'isrt' volumes introduce a new category of array. They have
components that are subarrays from separate containers (likely thanks to
the constraint that all active members in an imsm container must be
members of all subarrays). We want '-Eb' to identify the composite
volume uuid, but the default coalescing (by container) results in
duplicated output of the cache volume uuid.
Instead, introduce infrastructure to handle this directly.
1/ add ->cache_legs to struct mdinfo to indicate how many subarrays in a given
container are components (legs) of a cache association.
2/ add ->cache_leg to struct supertype to indicate a cache leg to
enumerate via ->getinfo_super()
3/ teach Examine to coalesce cache volumes across containers by uuid and
dump their details via ->brief_examine_cache() extension to struct
superswitch.
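The coalescing in step 3/ boils down to "add a cache entry only if its
uuid is in neither list". A stripped-down sketch follows, assuming a
plain memcmp() is good enough for uuid comparison (the patch uses
same_uuid() with the handler's swapuuid setting). struct leg and
add_leg_once() are hypothetical names for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct leg {
    int uuid[4];
    int cache_leg;      /* 1-based leg index within its container */
    struct leg *next;
};

static struct leg *find_uuid(struct leg *list, const int uuid[4])
{
    for (; list; list = list->next)
        if (memcmp(list->uuid, uuid, sizeof(list->uuid)) == 0)
            return list;
    return NULL;
}

/* Prepend a new entry to 'caches' unless the uuid was already seen,
 * either as a cache leg from another container or as an array that is
 * already listed. Returns the (possibly new) head of 'caches'.
 */
struct leg *add_leg_once(struct leg *caches, struct leg *arrays,
                         const int uuid[4], int cache_leg)
{
    struct leg *n;

    if (find_uuid(caches, uuid) || find_uuid(arrays, uuid))
        return caches;

    n = calloc(1, sizeof(*n));
    assert(n != NULL);
    memcpy(n->uuid, uuid, sizeof(n->uuid));
    n->cache_leg = cache_leg;
    n->next = caches;
    return n;
}
```

Because both containers report the same composite uuid for their leg,
the second container's leg is dropped and '-Eb' prints the cache volume
once.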
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Examine.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------
mdadm.h | 3 ++
2 files changed, 83 insertions(+), 12 deletions(-)
diff --git a/Examine.c b/Examine.c
index 953b8eee2360..945af8454a5f 100644
--- a/Examine.c
+++ b/Examine.c
@@ -30,6 +30,73 @@
#endif
#include "md_u.h"
#include "md_p.h"
+
+struct array {
+ struct supertype *st;
+ struct mdinfo info;
+ void *devs;
+ struct array *next;
+ int spares;
+ int cache_leg;
+};
+
+static struct array *add_cache_legs(struct array *caches, struct supertype *st,
+ struct mdinfo *info, struct array *arrays)
+{
+ struct mdinfo cache_info;
+ struct array *ap;
+ int i;
+
+ for (i = 1; i <= info->cache_legs; i++) {
+ /* in the case where the cache leg is assembled its uuid
+ * may appear in the arrays list, so we need to check
+ * both the caches list and the arrays list for
+ * duplicates
+ */
+ struct array *lists[] = { caches, arrays };
+ int j;
+
+ st->cache_leg = i;
+ st->ss->getinfo_super(st, &cache_info, NULL);
+ st->cache_leg = 0;
+ for (j = 0; j < 2; j++) {
+ for (ap = lists[j]; ap; ap = ap->next) {
+ if (st->ss == ap->st->ss
+ && same_uuid(ap->info.uuid, cache_info.uuid,
+ st->ss->swapuuid))
+ break;
+ }
+ if (ap)
+ break;
+ }
+ if (!ap) {
+ ap = xcalloc(1, sizeof(*ap));
+ ap->devs = dl_head();
+ ap->next = caches;
+ ap->st = st;
+ ap->cache_leg = i;
+ caches = ap;
+ memcpy(&ap->info, &cache_info, sizeof(cache_info));
+ }
+ }
+
+ return caches;
+}
+
+static void free_arrays(struct array *arrays)
+{
+ struct array *ap;
+
+ while (arrays) {
+ ap = arrays;
+ arrays = ap->next;
+
+ ap->st->ss->free_super(ap->st);
+ free(ap);
+ }
+}
+
+
int Examine(struct mddev_dev *devlist,
struct context *c,
struct supertype *forcest)
@@ -54,14 +121,7 @@ int Examine(struct mddev_dev *devlist,
int fd;
int rv = 0;
int err = 0;
-
- struct array {
- struct supertype *st;
- struct mdinfo info;
- void *devs;
- struct array *next;
- int spares;
- } *arrays = NULL;
+ struct array *arrays = NULL, *caches = NULL;
for (; devlist ; devlist = devlist->next) {
struct supertype *st;
@@ -131,13 +191,14 @@ int Examine(struct mddev_dev *devlist,
break;
}
if (!ap) {
- ap = xmalloc(sizeof(*ap));
+ ap = xcalloc(1, sizeof(*ap));
ap->devs = dl_head();
ap->next = arrays;
- ap->spares = 0;
ap->st = st;
arrays = ap;
st->ss->getinfo_super(st, &ap->info, NULL);
+ caches = add_cache_legs(caches, st, &ap->info,
+ arrays);
} else
st->ss->getinfo_super(st, &ap->info, NULL);
if (!have_container &&
@@ -179,11 +240,18 @@ int Examine(struct mddev_dev *devlist,
printf("\n");
ap->st->ss->brief_examine_subarrays(ap->st, c->verbose);
}
- ap->st->ss->free_super(ap->st);
- /* FIXME free ap */
if (ap->spares || c->verbose > 0)
printf("\n");
}
+ /* list container caches after their parent containers
+ * and subarrays
+ */
+ for (ap = caches; ap; ap = ap->next)
+ if (ap->st->ss->brief_examine_cache)
+ ap->st->ss->brief_examine_cache(ap->st, ap->cache_leg);
+ free_arrays(arrays);
+ free_arrays(caches);
+
}
return rv;
}
diff --git a/mdadm.h b/mdadm.h
index f6a614e19316..111f90f599af 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -233,6 +233,7 @@ struct mdinfo {
int container_enough; /* flag external handlers can set to
* indicate that subarrays have not enough (-1),
* enough to start (0), or all expected disks (1) */
+ int cache_legs; /* number of cross-container cache members in this 'array' */
char sys_name[20];
struct mdinfo *devs;
struct mdinfo *next;
@@ -684,6 +685,7 @@ extern struct superswitch {
void (*examine_super)(struct supertype *st, char *homehost);
void (*brief_examine_super)(struct supertype *st, int verbose);
void (*brief_examine_subarrays)(struct supertype *st, int verbose);
+ void (*brief_examine_cache)(struct supertype *st, int cache);
void (*export_examine_super)(struct supertype *st);
int (*examine_badblocks)(struct supertype *st, int fd, char *devname);
int (*copy_metadata)(struct supertype *st, int from, int to);
@@ -1006,6 +1008,7 @@ struct supertype {
Used when examining metadata to display content of disk
when user has no hw/firmare compatible system.
*/
+ int cache_leg; /* hack to interrogate cache legs within containers */
struct metadata_update *updates;
struct metadata_update **update_tail;
* [RFC mdadm PATCH 06/11] imsm: immutable volume id
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Support the new extensions to have a volume-id that is immutable and
unique for the life of the container. Prior to this change deleting and
recreating a volume would result in it having the same uuid as the
previous volume in that position. Now, every time a volume is created a
container generation count is incremented allowing the volume-ids to
include container generation salt.
TODO: update kill_subarray_imsm() and update_subarray_imsm() to allow
deletion and renaming (respectively) of arrays with an immutable id.
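The id allocation, including the wrap case, can be exercised in
isolation with the sketch below. It ignores the on-disk little-endian
conversion and takes the ids already in use as a plain array;
next_dev_id() is a hypothetical name for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the next generation count that is nonzero and not used by any
 * existing volume. Zero is reserved to mean "no dev_id support".
 */
uint16_t next_dev_id(uint16_t create_events, const uint16_t *in_use, size_t n)
{
    size_t i;

    /* the loop terminates only if some nonzero id is still free */
    assert(n < UINT16_MAX);

    do {
        create_events++;
        if (create_events == 0)
            create_events = 1;      /* skip the reserved value */
        for (i = 0; i < n; i++)
            if (in_use[i] == create_events)
                break;
    } while (i < n);

    return create_events;
}
```

For example, with volumes 1 and 2 present and the counter at 65535, the
next id is 3: the counter wraps past zero to 1, then skips both ids
still in use.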
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
super-intel.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++---------
1 files changed, 65 insertions(+), 12 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index f0a7ab5ccc7a..07e4c68982cd 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -188,8 +188,9 @@ struct imsm_dev {
__u16 cache_policy;
__u8 cng_state;
__u8 cng_sub_state;
-#define IMSM_DEV_FILLERS 10
- __u32 filler[IMSM_DEV_FILLERS];
+ __u16 dev_id;
+ __u16 fill;
+ __u32 filler[9];
struct imsm_vol vol;
} __attribute__ ((packed));
@@ -209,8 +210,9 @@ struct imsm_super {
__u32 orig_family_num; /* 0x40 - 0x43 original family num */
__u32 pwr_cycle_count; /* 0x44 - 0x47 simulated power cycle count for array */
__u32 bbm_log_size; /* 0x48 - 0x4B - size of bad Block Mgmt Log in bytes */
-#define IMSM_FILLERS 35
- __u32 filler[IMSM_FILLERS]; /* 0x4C - 0xD7 RAID_MPB_FILLERS */
+ __u16 create_events; /* counter for generating unique ids */
+ __u16 fill1;
+ __u32 filler[34];
struct imsm_disk disk[1]; /* 0xD8 diskTbl[numDisks] */
/* here comes imsm_dev[num_raid_devs] */
/* here comes BBM logs */
@@ -1984,6 +1986,30 @@ static int match_home_imsm(struct supertype *st, char *homehost)
return -1;
}
+static void volume_uuid_from_super(struct intel_super *super, struct sha1_ctx *ctx)
+{
+ struct imsm_dev *dev = NULL;
+
+ if (super->current_vol >= 0)
+ dev = get_imsm_dev(super, super->current_vol);
+
+ if (!dev)
+ return;
+
+ /* if the container is tracking creation events then dev_id is
+ * valid and we can advertise an immutable uuid, otherwise use the
+ * old volume-position/name method
+ */
+ if (super->anchor->create_events) {
+ sha1_process_bytes(&dev->dev_id, sizeof(dev->dev_id), ctx);
+ } else {
+ __u32 vol = super->current_vol;
+
+ sha1_process_bytes(&vol, sizeof(vol), ctx);
+ sha1_process_bytes(dev->volume, MAX_RAID_SERIAL_LEN, ctx);
+ }
+}
+
static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
{
/* The uuid returned here is used for:
@@ -2012,7 +2038,6 @@ static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
char buf[20];
struct sha1_ctx ctx;
- struct imsm_dev *dev = NULL;
__u32 family_num;
/* some mdadm versions failed to set ->orig_family_num, in which
@@ -2025,13 +2050,7 @@ static void uuid_from_super_imsm(struct supertype *st, int uuid[4])
sha1_init_ctx(&ctx);
sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
sha1_process_bytes(&family_num, sizeof(__u32), &ctx);
- if (super->current_vol >= 0)
- dev = get_imsm_dev(super, super->current_vol);
- if (dev) {
- __u32 vol = super->current_vol;
- sha1_process_bytes(&vol, sizeof(vol), &ctx);
- sha1_process_bytes(dev->volume, MAX_RAID_SERIAL_LEN, &ctx);
- }
+ volume_uuid_from_super(super, &ctx);
sha1_finish_ctx(&ctx, buf);
memcpy(uuid, buf, 4*4);
}
@@ -4562,6 +4581,39 @@ static int check_name(struct intel_super *super, char *name, int quiet)
return !reason;
}
+static void new_dev_id(struct intel_super *super, struct imsm_dev *new_dev)
+{
+ struct imsm_super *mpb = super->anchor;
+ __u16 create_events, i;
+
+ /* only turn on create_events tracking for newly born
+ * containers, lest we change the uuid of live volumes (see
+ * volume_uuid_from_super())
+ */
+ create_events = __le16_to_cpu(mpb->create_events);
+ if (super->current_vol > 0 && !create_events)
+ return;
+
+ /* catch the case of create_events wrapping to an existing id in
+ * the mpb
+ */
+ do {
+ create_events++;
+ /* wrap to 1, because zero means no dev_id support */
+ if (create_events == 0)
+ create_events = 1;
+ for (i = 0; i < mpb->num_raid_devs; i++) {
+ struct imsm_dev *dev = __get_imsm_dev(mpb, i);
+
+ if (dev->dev_id == __cpu_to_le16(create_events))
+ break;
+ }
+ } while (i < mpb->num_raid_devs);
+
+ new_dev->dev_id = __cpu_to_le16(create_events);
+ mpb->create_events = __cpu_to_le16(create_events);
+}
+
static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
unsigned long long size, char *name,
char *homehost, int *uuid,
@@ -4655,6 +4707,7 @@ static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
return 0;
dv = xmalloc(sizeof(*dv));
dev = xcalloc(1, sizeof(*dev) + sizeof(__u32) * (info->raid_disks - 1));
+ new_dev_id(super, dev);
strncpy((char *) dev->volume, name, MAX_RAID_SERIAL_LEN);
array_blocks = calc_array_size(info->level, info->raid_disks,
info->layout, info->chunk_size,
* [RFC mdadm PATCH 07/11] imsm: cache metadata definitions
From: Dan Williams @ 2014-04-24 7:22 UTC
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Makefile | 2
isrt-intel.h | 256 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
super-intel.c | 16 +++-
3 files changed, 271 insertions(+), 3 deletions(-)
create mode 100644 isrt-intel.h
diff --git a/Makefile b/Makefile
index b823d85f89e3..7d50df69a744 100644
--- a/Makefile
+++ b/Makefile
@@ -127,7 +127,7 @@ CHECK_OBJS = restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
SRCS = $(patsubst %.o,%.c,$(OBJS))
-INCL = mdadm.h part.h bitmap.h
+INCL = mdadm.h part.h bitmap.h isrt-intel.h platform-intel.h
MON_OBJS = mdmon.o monitor.o managemon.o util.o maps.o mdstat.o sysfs.o \
policy.o lib.o \
diff --git a/isrt-intel.h b/isrt-intel.h
new file mode 100644
index 000000000000..50365de1a620
--- /dev/null
+++ b/isrt-intel.h
@@ -0,0 +1,256 @@
+/*
+ * mdadm - Intel(R) Smart Response Technology Support
+ *
+ * Copyright (C) 2011-2014 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#ifndef __ISRT_INTEL_H__
+#define __ISRT_INTEL_H__
+
+enum {
+ /* for a given cache device how many volumes can be associated */
+ MAX_NV_CACHE_VOLS = 1,
+ /* likely should be dynamically configurable when this driver is
+ * made more generic
+ */
+ ISRT_FRAME_SIZE = 8192,
+ VOL_CONFIG_RESERVED = 32,
+ MD_HEADER_RESERVED = 32,
+ MAX_RAID_SERIAL_LEN = 16,
+ NVC_SIG_LEN = 32,
+ ISRT_DEV_IDX = 0,
+ ISRT_TARGET_DEV_IDX = 1,
+
+ NV_CACHE_MODE_OFF = 0,
+ NV_CACHE_MODE_OFF_TO_SAFE = 1, /* powerfail recovery state */
+ NV_CACHE_MODE_OFF_TO_PERF = 2, /* powerfail recovery state */
+ NV_CACHE_MODE_SAFE = 3,
+ NV_CACHE_MODE_SAFE_TO_OFF = 4,
+ NV_CACHE_MODE_PERF = 5,
+ NV_CACHE_MODE_PERF_TO_OFF = 6,
+ NV_CACHE_MODE_PERF_TO_SAFE = 7,
+ NV_CACHE_MODE_IS_FAILING = 8,
+ NV_CACHE_MODE_HAS_FAILED = 9,
+ NV_CACHE_MODE_DIS_PERF = 10, /* caching on volume or nv cache disabled */
+ NV_CACHE_MODE_DIS_SAFE = 11, /* volume or NV cache not associated */
+};
+
+struct segment_index_pair {
+ __u32 segment;
+ __u32 index;
+};
+
+#define NV_CACHE_CONFIG_SIG "Intel IMSM NV Cache Cfg. Sig. "
+#define MAX_NVC_SIZE_GB 128UL /* Max NvCache we can support is 128GB */
+#define NVC_FRAME_SIZE 8192UL
+#define NVC_FRAME_SIZE_IN_KB (NVC_FRAME_SIZE / 1024UL) /* 8 */
+#define NVC_FRAMES_PER_GB (1024UL * (1024UL / NVC_FRAME_SIZE_IN_KB)) /* 128k */
+#define MAX_NVC_FRAMES (MAX_NVC_SIZE_GB * NVC_FRAMES_PER_GB) /* 16m */
+#define SEGIDX_PAIRS_PER_NVC_FRAME (NVC_FRAME_SIZE / sizeof(struct segment_index_pair)) /* 1k */
+#define SEGHEAP_SEGS_PER_NVC_FRAME (NVC_FRAME_SIZE / sizeof(__u32)) /* 2k */
+#define FRAMES_PER_SEGHEAP_FRAME (SEGIDX_PAIRS_PER_NVC_FRAME \
+ * SEGHEAP_SEGS_PER_NVC_FRAME) /* 2m */
+#define MAX_SEGHEAP_NVC_FRAMES (MAX_NVC_FRAMES/FRAMES_PER_SEGHEAP_FRAME) /* 8 */
+#define MAX_SEGHEAP_TOC_ENTRIES (MAX_SEGHEAP_NVC_FRAMES + 1)
+
+
+/* XXX: size of enum guarantees? */
+enum nvc_shutdown_state {
+ ShutdownStateNormal,
+ ShutdownStateS4CrashDmpStart,
+ ShutdownStateS4CrashDmpEnd,
+ ShutdownStateS4CrashDmpFailed
+};
+
+struct isrt_mpb {
+ /*
+ * Metadata array (packed_md0_nba or packed_md1_nba) is the base for
+ * the Metadata Delta Log changes. The current contents of the Metadata
+ * Delta Log applied to this packed metadata base becomes the working
+ * packed metadata upon recovery from a power failure. The alternate
+ * packed metadata array, indicated by (md_base_for_delta_log ^1) is
+ * where the next complete write of packed metadata from DRAM will be
+ * written. On a clean shutdown, packed metadata will also be written to
+ * the alternate array.
+ */
+ __u32 packed_md0_nba; /* Start of primary packed metadata array */
+ __u32 packed_md1_nba; /* Start of secondary packed metadata array */
+ __u32 md_base_for_delta_log; /* 0 or 1. Indicates which packed */
+ __u32 packed_md_size; /* Size of packed metadata array in bytes */
+ __u32 aux_packed_md_nba; /* Start of array of extra metadata for driver use */
+ __u32 aux_packed_md_size; /* Size of array of extra metadata for driver use */
+ __u32 cache_frame0_nba; /* Start of actual cache frames */
+ __u32 seg_num_index_nba; /* Start of the Seg_num_index array */
+ __u32 seg_num_heap_nba; /* Start of the Seg_num_heap */
+ __u32 seg_num_heap_size; /* Size of the Seg_num Heap in bytes (always a */
+ /*
+ * Multiple of NVM_PAGE_SIZE bytes. The Seg_nums in the tail of the last
+ * page are all set to 0xFFFFFFFF
+ */
+ __u32 seg_heap_toc[MAX_SEGHEAP_TOC_ENTRIES];
+ __u32 md_delta_log_nba; /* Start of the Metadata Delta Log region */
+ /* The Delta Log is a circular buffer */
+ __u32 md_delta_log_max_size; /* Size of the Metadata Delta Log region in bytes */
+ __u32 orom_frames_to_sync_nba; /* Start of the orom_frames_to_sync record */
+ __u32 num_cache_frames; /* Total number of cache frames */
+ __u32 cache_frame_size; /* Size of each cache frame in bytes */
+ __u32 lba_alignment; /* Offset to add to host I/O request LBA before
+ * shifting to form the segment number
+ */
+ __u32 valid_frame_gen_num; /* Valid cache frame generation number */
+ /*
+ * If the cache frame metadata contains a smaller generation number,
+ * that frame's contents are considered invalid.
+ */
+ __u32 packed_md_frame_gen_num; /* Packed metadata frame generation number */
+ /*
+ * This is the frame generation number associated with all frames in the
+ * packed metadata array. If this is < valid_frame_gen_num, then all
+ * frames in packed metadata are considered invalid.
+ */
+ __u32 curr_clean_batch_num; /* Initialized to 0, incremented whenever
+ * the cache goes clean. If this value is
+ * greater than the Nv_cache_metadata
+ * dirty_batch_num in the atomic metadata
+ * of the cache frame, the frame is
+ * considered clean.
+ */
+ __u32 total_used_sectors; /* Total number of NVM sectors of size
+ * NVM_SECTOR_SIZE used by cache frames and
+ * metadata.
+ */
+ /* OROM I/O Log fields */
+ __u32 orom_log_nba; /* OROM I/O Log area for next boot */
+ __u32 orom_log_size; /* OROM I/O Log size in 512-byte blocks */
+
+ /* Hibernate/Crashdump Extent_log */
+ __u32 s4_crash_dmp_extent_log_nba; /* I/O Extent Log area created by the */
+ /* hibernate/crashdump driver for OROM */
+ /* Driver shutdown state utilized by the OROM */
+ enum nvc_shutdown_state driver_shutdown_state;
+
+ __u32 validity_bits;
+ __u64 nvc_hdr_array_in_dram;
+
+ /* The following fields are used in managing the Metadata Delta Log. */
+
+ /*
+ * Every delta record in the Metadata Delta Log has a copy of the value
+ * of this field at the time the record was written. This gen num is
+ * incremented by 1 every time the log fills up, and allows powerfail
+ * recovery to easily find the end of the log (it's the first record
+ * whose gen num field is < curr_delta_log_gen_num.)
+ */
+ __u32 curr_delta_log_gen_num;
+ /*
+ * This is the Nba to the start of the current generation of delta
+ * records in the log. Since the log is circular, the current log
+ * extends from md_delta_log_first up to and including
+ * ((md_delta_log_first + max_records - 2) % max_records). NOTE: when reading
+ * the delta log, the actual end of the log is indicated by the first
+ * record whose gen num field is <curr_delta_log_gen_num, so the
+ * 'max_records-2' guarantees we'll have at least one delta record whose
+ * gen num field will qualify to mark the end of the log.
+ */
+ __u32 md_delta_log_first;
+ /*
+ * How many free frames are used in the Metadata Delta Log. After every
+ * write of a delta log record that contains at least one
+ * Md_delta_log_entry, there must always be exactly
+ */
+
+ __u32 md_delta_log_num_free_frames;
+ __u32 num_dirty_frames; /* Number of dirty frames in cache when this
+ * isrt_mpb was written.
+ */
+ __u32 num_dirty_frames_at_mode_trans; /* Number of dirty frames from
+ * the start of the most recent
+ * transition out of Performance
+ * mode (Perf_to_safe/Perf_to_off)
+ */
+} __attribute__((packed));
+
+
+struct nv_cache_vol_config_md {
+ __u32 acc_vol_orig_family_num; /* Unique Volume Id of the accelerated
+ * volume caching to the NVC Volume
+ */
+ __u16 acc_vol_dev_id; /* (original family + dev_id ) if there is no
+ * volume associated with Nv_cache, both of these
+ * fields are 0.
+ */
+ __u16 nv_cache_mode; /* NV Cache mode of this volume */
+ /*
+ * The serial_no of the accelerated volume associated with Nv_cache. If
+ * there is no volume associated with Nv_cache, acc_vol_name[0] = 0
+ */
+ char acc_vol_name[MAX_RAID_SERIAL_LEN];
+ __u32 flags;
+ __u32 power_cycle_count; /* Power Cycle Count of the underlying disk or
+ * volume from the last device enumeration.
+ */
+ /* Used to determine separation case. */
+ __u32 expansion_space[VOL_CONFIG_RESERVED];
+} __attribute__((packed));
+
+struct nv_cache_config_md_header {
+ char signature[NVC_SIG_LEN]; /* "Intel IMSM NV Cache Cfg. Sig. " */
+ __u16 version_number; /* NV_CACHE_CFG_MD_VERSION */
+ __u16 header_length; /* Length by bytes */
+ __u32 total_length; /* Length of the entire Config Metadata including
+ * header and volume(s) in bytes
+ */
+ /* Elements above here will never change even in new versions */
+ __u16 num_volumes; /* Number of volumes that have config metadata. in
+ * 9.0 it's either 0 or 1
+ */
+ __u32 expansion_space[MD_HEADER_RESERVED];
+ struct nv_cache_vol_config_md vol_config_md[MAX_NV_CACHE_VOLS]; /* Array of Volume */
+ /* Config Metadata entries. Contains "num_volumes" */
+ /* entries. In 9.0 'MAX_NV_CACHE_VOLS' = 1. */
+} __attribute__((packed));
+
+struct nv_cache_control_data {
+ struct nv_cache_config_md_header hdr;
+ struct isrt_mpb mpb;
+} __attribute__((packed));
+
+/* One or more sectors in NAND page are bad */
+#define NVC_PACKED_SECTORS_BAD (1 << 0)
+#define NVC_PACKED_DIRTY (1 << 1)
+#define NVC_PACKED_FRAME_TYPE_SHIFT (2)
+/* If set, frame is in clean area of LRU list */
+#define NVC_PACKED_IN_CLEAN_AREA (1 << 5)
+/*
+ * This frame was TRIMMed (OROM shouldn't expect the delta log rebuild to match
+ * the packed metadata stored on a clean shutdown).
+ */
+#define NVC_PACKED_TRIMMED (1 << 6)
+
+struct nv_cache_packed_md {
+ __u32 seg_num; /* Disk Segment currently assigned to frame */
+ __u16 per_sector_validity; /* Per sector validity */
+ __u8 flags;
+ union {
+ __u8 pad;
+ /* repurpose padding for driver state */
+ __u8 locked;
+ };
+} __attribute__((packed));
+
+#define SEGMENTS_PER_PAGE_SHIFT 6
+#define SEGMENTS_PER_PAGE (1 << SEGMENTS_PER_PAGE_SHIFT)
+#define SEGMENTS_PER_PAGE_MASK (SEGMENTS_PER_PAGE-1)
+#define FRAME_SHIFT 4
+#define SECTORS_PER_FRAME (1 << FRAME_SHIFT)
+#define FRAME_MASK (SECTORS_PER_FRAME-1)
+
+#endif /* __ISRT_INTEL_H__ */
diff --git a/super-intel.c b/super-intel.c
index 07e4c68982cd..acc46368322f 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -22,6 +22,7 @@
#include "mdmon.h"
#include "sha1.h"
#include "platform-intel.h"
+#include "isrt-intel.h"
#include <values.h>
#include <scsi/sg.h>
#include <ctype.h>
@@ -39,7 +40,6 @@
#define MPB_VERSION_CNG "1.2.06"
#define MPB_VERSION_ATTRIBS "1.3.00"
#define MAX_SIGNATURE_LENGTH 32
-#define MAX_RAID_SERIAL_LEN 16
/* supports RAID0 */
#define MPB_ATTRIB_RAID0 __cpu_to_le32(0x00000001)
@@ -179,6 +179,8 @@ struct imsm_dev {
#define DEV_CLONE_N_GO __cpu_to_le32(0x400)
#define DEV_CLONE_MAN_SYNC __cpu_to_le32(0x800)
#define DEV_CNG_MASTER_DISK_NUM __cpu_to_le32(0x1000)
+/* Volume is being used as NvCache for an accelerated volume */
+#define DEV_NVC_VOLUME __cpu_to_le32(0x4000)
__u32 status; /* Persistent RaidDev status */
__u32 reserved_blocks; /* Reserved blocks at beginning of volume */
__u8 migr_priority;
@@ -189,8 +191,18 @@ struct imsm_dev {
__u8 cng_state;
__u8 cng_sub_state;
__u16 dev_id;
+ __u8 nv_cache_mode;
+#define DEV_NVC_CLEAN (0)
+#define DEV_NVC_DIRTY (1)
+#define DEV_NVC_HEALTH_GOOD (0 << 1)
+#define DEV_NVC_HEALTH_FAILED (1 << 1)
+#define DEV_NVC_HEALTH_READONLY (2 << 1)
+#define DEV_NVC_HEALTH_BACKUP (3 << 1)
+ __u8 nv_cache_flags;
+ __u32 nvc_orig_family_num; /* Unique Volume Id of the cache */
+ __u16 nvc_dev_id; /* volume associated with this volume */
__u16 fill;
- __u32 filler[9];
+ __u32 filler[7];
struct imsm_vol vol;
} __attribute__ ((packed));
* [RFC mdadm PATCH 08/11] imsm: read cache metadata
2014-04-24 7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
` (6 preceding siblings ...)
2014-04-24 7:22 ` [RFC mdadm PATCH 07/11] imsm: cache metadata definitions Dan Williams
@ 2014-04-24 7:23 ` Dan Williams
2014-04-24 7:23 ` [RFC mdadm PATCH 09/11] imsm: examine cache configurations Dan Williams
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24 7:23 UTC (permalink / raw)
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Add support for identifying cache volumes and retrieving the associated
cache mpb located at the start of the volume marked with a
DEV_NVC_VOLUME flag.
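As a rough illustration of the read described above: load_cache() in this patch rounds the size of the cache control structure up to a whole 512-byte sector and shifts the volume's starting sector into a byte offset before calling pread(). The sketch below isolates only that arithmetic. The helper names round_up_to_sector() and sector_to_byte_offset() are hypothetical and do not appear in the patch, which open-codes both expressions.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE  512u
#define SECTOR_SHIFT 9

/* Round a byte count up to the next multiple of the 512-byte sector size.
 * Hypothetical helper: the patch writes this inline as (len + 511) & ~511.
 */
static size_t round_up_to_sector(size_t len)
{
	return (len + (SECTOR_SIZE - 1)) & ~(size_t)(SECTOR_SIZE - 1);
}

/* Convert a starting sector number into a byte offset.
 * Hypothetical helper: the patch writes this inline as lba << 9.
 */
static uint64_t sector_to_byte_offset(uint64_t lba)
{
	return lba << SECTOR_SHIFT;
}
```

Rounding the length keeps the buffer from posix_memalign() a whole number of sectors long, so the metadata read always covers complete 512-byte sectors.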
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
super-intel.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 47 insertions(+), 8 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index acc46368322f..f179d80b8209 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -345,6 +345,7 @@ static unsigned int mpb_sectors(struct imsm_super *mpb)
struct intel_dev {
struct imsm_dev *dev;
struct intel_dev *next;
+ struct nv_cache_control_data *nvc;
unsigned index;
};
@@ -3086,6 +3087,7 @@ static void free_devlist(struct intel_super *super)
while (super->devlist) {
dv = super->devlist->next;
+ free(super->devlist->nvc);
free(super->devlist->dev);
free(super->devlist);
super->devlist = dv;
@@ -3467,9 +3469,34 @@ static void end_migration(struct imsm_dev *dev, struct intel_super *super,
}
#endif
-static int parse_raid_devices(struct intel_super *super)
+static int load_cache(int fd, struct intel_dev *dv)
{
- int i;
+ struct imsm_dev *dev = dv->dev;
+ struct imsm_map *map = get_imsm_map(dev, MAP_X);
+ off_t offset = pba_of_lba0(map) << 9;
+ ssize_t size = (sizeof(*dv->nvc) + 511) & ~511;
+ int ret;
+
+ if (posix_memalign((void**) &dv->nvc, 512, size) != 0) {
+ pr_err("Failed to allocate cache anchor buffer"
+ " for %.16s\n", dev->volume);
+ return 1;
+ }
+
+ ret = pread(fd, dv->nvc, size, offset);
+ if (ret != size) {
+ pr_err("Failed to read cache metadata for %.16s: %s\n",
+ dev->volume, strerror(errno));
+ free(dv->nvc);
+ dv->nvc = NULL;
+ }
+
+ return ret != size;
+}
+
+static int load_raid_devices(int fd, struct intel_super *super)
+{
+ int i, err;
struct imsm_dev *dev_new;
size_t len, len_migr;
size_t max_len = 0;
@@ -3496,6 +3523,16 @@ static int parse_raid_devices(struct intel_super *super)
dv->index = i;
dv->next = super->devlist;
super->devlist = dv;
+
+ /* volumes that serve as caches have metadata at offset-0 from
+ * the start of the volume
+ */
+ if (dv->dev->status & DEV_NVC_VOLUME) {
+ err = load_cache(fd, dv);
+ if (err)
+ return err;
+ } else
+ dv->nvc = NULL;
}
/* ensure that super->buf is large enough when all raid devices
@@ -3718,8 +3755,7 @@ static void clear_hi(struct intel_super *super)
}
}
-static int
-load_and_parse_mpb(int fd, struct intel_super *super, char *devname, int keep_fd)
+static int load_mpb(int fd, struct intel_super *super, char *devname, int keep_fd)
{
int err;
@@ -3729,7 +3765,10 @@ load_and_parse_mpb(int fd, struct intel_super *super, char *devname, int keep_fd
err = load_imsm_disk(fd, super, devname, keep_fd);
if (err)
return err;
- err = parse_raid_devices(super);
+ err = load_raid_devices(fd, super);
+ if (err)
+ return err;
+
clear_hi(super);
return err;
}
@@ -4384,13 +4423,13 @@ static int get_super_block(struct intel_super **super_list, char *devnm, char *d
}
find_intel_hba_capability(dfd, s, devname);
- err = load_and_parse_mpb(dfd, s, NULL, keep_fd);
+ err = load_mpb(dfd, s, NULL, keep_fd);
/* retry the load if we might have raced against mdmon */
if (err == 3 && devnm && mdmon_running(devnm))
for (retry = 0; retry < 3; retry++) {
usleep(3000);
- err = load_and_parse_mpb(dfd, s, NULL, keep_fd);
+ err = load_mpb(dfd, s, NULL, keep_fd);
if (err != 3)
break;
}
@@ -4473,7 +4512,7 @@ static int load_super_imsm(struct supertype *st, int fd, char *devname)
free_imsm(super);
return 2;
}
- rv = load_and_parse_mpb(fd, super, devname, 0);
+ rv = load_mpb(fd, super, devname, 0);
if (rv) {
if (devname)
* [RFC mdadm PATCH 09/11] imsm: examine cache configurations
2014-04-24 7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
` (7 preceding siblings ...)
2014-04-24 7:23 ` [RFC mdadm PATCH 08/11] imsm: read cache metadata Dan Williams
@ 2014-04-24 7:23 ` Dan Williams
2014-04-24 7:23 ` [RFC mdadm PATCH 10/11] imsm: assemble cache volumes Dan Williams
2014-04-24 7:23 ` [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays Dan Williams
10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24 7:23 UTC (permalink / raw)
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Allow -E to show the cache associations of volumes. The UUIDs are
calculated from the stored "orig_family_num" and "dev_id" in the cache
metadata.
For -Eb a UUID is synthesized from the union of the
<cache>:<cache-target> tuple for the purposes of identifying the
complete volume.
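The property that matters for the synthesized UUID is that both legs hash the same bytes, always in <cache>:<cache-target> order, even though each leg only knows its own id plus the peer id stored in its metadata. The following is a minimal sketch of that ordering, not the patch's implementation: struct leg_id, pack_cache_tuple() and the two id_seen_from_*() helpers are hypothetical names, the mpb signature that the patch hashes first is omitted, and a 32-bit FNV-1a hash stands in for the SHA1 used by cache_volume_uuid().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Identity of one leg: container family number plus immutable volume id */
struct leg_id {
	uint32_t family_num;
	uint16_t dev_id;
};

/* Serialize <cache>:<cache-target>, always in that order, into buf.
 * Returns the number of bytes written (12).
 */
static size_t pack_cache_tuple(const struct leg_id *cache,
			       const struct leg_id *target,
			       unsigned char buf[12])
{
	size_t n = 0;

	memcpy(buf + n, &cache->family_num, sizeof(cache->family_num));
	n += sizeof(cache->family_num);
	memcpy(buf + n, &cache->dev_id, sizeof(cache->dev_id));
	n += sizeof(cache->dev_id);
	memcpy(buf + n, &target->family_num, sizeof(target->family_num));
	n += sizeof(target->family_num);
	memcpy(buf + n, &target->dev_id, sizeof(target->dev_id));
	n += sizeof(target->dev_id);
	return n;
}

/* Stand-in hash (32-bit FNV-1a); the patch uses SHA1, not this */
static uint32_t fnv1a(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Cache leg: its own id comes first, the stored target id second */
static uint32_t id_seen_from_cache(const struct leg_id *self,
				   const struct leg_id *peer)
{
	unsigned char buf[12];

	return fnv1a(buf, pack_cache_tuple(self, peer, buf));
}

/* Target leg: the stored cache id comes first, its own id second */
static uint32_t id_seen_from_target(const struct leg_id *self,
				    const struct leg_id *peer)
{
	unsigned char buf[12];

	return fnv1a(buf, pack_cache_tuple(peer, self, buf));
}
```

Because the argument order is swapped in the target-side helper, examining either leg yields the same identifier, which is what lets -Eb coalesce the two legs into one ARRAY line.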
The proposed assembly hierarchy is:
1/ (2) containers (one for the cache "array", one for the cache-target
"array")
2/ (2) subarrays (one for the cache "volume", one for the cache-target
"volume")
3/ (1) stacked array with the subarrays from 2/ as component members
...where "array" and "volume" are the imsm terminology for an mdadm
container and subarray.
TODO: what to do about the name of the composite volume? Leave it
dynamically assigned for now. We could have it take over the cache-target
name, but that name is not available when examining the cache device...
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
isrt-intel.h | 12 ++++
super-intel.c | 184 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 172 insertions(+), 24 deletions(-)
diff --git a/isrt-intel.h b/isrt-intel.h
index 50365de1a620..6d7e92f4da37 100644
--- a/isrt-intel.h
+++ b/isrt-intel.h
@@ -43,6 +43,18 @@ enum {
NV_CACHE_MODE_DIS_SAFE = 11, /* volume or NV cache not associated */
};
+static inline int nvc_enabled(__u8 mode)
+{
+ switch (mode) {
+ case NV_CACHE_MODE_OFF:
+ case NV_CACHE_MODE_DIS_PERF:
+ case NV_CACHE_MODE_DIS_SAFE:
+ return 0;
+ default:
+ return 1;
+ }
+}
+
struct segment_index_pair {
__u32 segment;
__u32 index;
diff --git a/super-intel.c b/super-intel.c
index f179d80b8209..7a7a48e9e6d7 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -770,18 +770,40 @@ static struct imsm_dev *__get_imsm_dev(struct imsm_super *mpb, __u8 index)
return NULL;
}
-static struct imsm_dev *get_imsm_dev(struct intel_super *super, __u8 index)
+static struct intel_dev *get_intel_dev(struct intel_super *super, __u8 index)
{
struct intel_dev *dv;
- if (index >= super->anchor->num_raid_devs)
- return NULL;
for (dv = super->devlist; dv; dv = dv->next)
if (dv->index == index)
- return dv->dev;
+ return dv;
+ return NULL;
+}
+
+static int is_isrt_leg(struct intel_dev *dv)
+{
+ return dv->nvc || nvc_enabled(dv->dev->nv_cache_mode);
+}
+
+static struct intel_dev *get_isrt_leg(struct intel_super *super, int leg)
+{
+ struct intel_dev *dv;
+
+ for (dv = super->devlist; dv; dv = dv->next)
+ if (!is_isrt_leg(dv))
+ continue;
+ else if (--leg == 0)
+ return dv;
return NULL;
}
+static struct imsm_dev *get_imsm_dev(struct intel_super *super, __u8 index)
+{
+ struct intel_dev *dv = get_intel_dev(super, index);
+
+ return dv ? dv->dev : NULL;
+}
+
/*
* for second_map:
* == MAP_0 get first map
@@ -1122,20 +1144,112 @@ static int is_gen_migration(struct imsm_dev *dev);
static __u64 blocks_per_migr_unit(struct intel_super *super,
struct imsm_dev *dev);
-static void print_imsm_dev(struct intel_super *super,
- struct imsm_dev *dev,
- char *uuid,
- int disk_idx)
+/* generate the <cache> + <cache_target> (in that order) UUID */
+static int cache_volume_uuid(struct intel_super *super, struct intel_dev *dv, int uuid[4])
+{
+ char buf[20];
+ struct sha1_ctx ctx;
+ struct imsm_dev *dev = dv->dev;
+
+ if (!is_isrt_leg(dv))
+ return 1;
+
+ sha1_init_ctx(&ctx);
+ sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+ if (dv->nvc) {
+ struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+ /* self id + cache target id */
+ sha1_process_bytes(&super->anchor->orig_family_num, sizeof(__u32), &ctx);
+ sha1_process_bytes(&dev->dev_id, sizeof(dv->dev->dev_id), &ctx);
+ sha1_process_bytes(&cfg->acc_vol_orig_family_num, sizeof(__u32), &ctx);
+ sha1_process_bytes(&cfg->acc_vol_dev_id, sizeof(cfg->acc_vol_dev_id), &ctx);
+ } else if (nvc_enabled(dev->nv_cache_mode)) {
+ /* cache id + self id */
+ sha1_process_bytes(&dev->nvc_orig_family_num, sizeof(__u32), &ctx);
+ sha1_process_bytes(&dev->nvc_dev_id, sizeof(dev->nvc_dev_id), &ctx);
+ sha1_process_bytes(&super->anchor->orig_family_num, sizeof(__u32), &ctx);
+ sha1_process_bytes(&dev->dev_id, sizeof(dev->dev_id), &ctx);
+ }
+ sha1_finish_ctx(&ctx, buf);
+ memcpy(uuid, buf, 4*4);
+ return 0;
+}
+
+static void cache_target_uuid(struct intel_super *super, struct intel_dev *dv, int uuid[4])
+{
+ char buf[20];
+ struct sha1_ctx ctx;
+ struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+ sha1_init_ctx(&ctx);
+ sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+ sha1_process_bytes(&cfg->acc_vol_orig_family_num, sizeof(__u32), &ctx);
+ sha1_process_bytes(&cfg->acc_vol_dev_id, sizeof(cfg->acc_vol_dev_id), &ctx);
+ sha1_finish_ctx(&ctx, buf);
+ memcpy(uuid, buf, 4*4);
+}
+
+static void cache_uuid(struct intel_super *super, struct imsm_dev *dev, int uuid[4])
+{
+ char buf[20];
+ struct sha1_ctx ctx;
+
+ sha1_init_ctx(&ctx);
+ sha1_process_bytes(super->anchor->sig, MPB_SIG_LEN, &ctx);
+ sha1_process_bytes(&dev->nvc_orig_family_num, sizeof(__u32), &ctx);
+ sha1_process_bytes(&dev->nvc_dev_id, sizeof(dev->nvc_dev_id), &ctx);
+ sha1_finish_ctx(&ctx, buf);
+ memcpy(uuid, buf, 4*4);
+}
+
+static void examine_cache(struct intel_super *super, struct intel_dev *dv)
+{
+ int uuid[4];
+ char uuid_str[64];
+ char *cache_role = NULL;
+ struct imsm_dev *dev = dv->dev;
+
+ if (dv->nvc) {
+ cache_role = "cache";
+ cache_target_uuid(super, dv, uuid);
+ }
+ if (nvc_enabled(dev->nv_cache_mode)) {
+ if (cache_role)
+ cache_role = NULL; /* can't have it both ways */
+ else {
+ cache_role = "cache-target";
+ cache_uuid(super, dev, uuid);
+ }
+ }
+ if (!cache_role)
+ return;
+
+ __fname_from_uuid(uuid, 0, uuid_str, ':');
+
+ printf(" Magic : Intel (R) Smart Response Technology\n");
+ printf(" Cache role : %s\n", cache_role);
+ printf(" Cache peer : %s\n", uuid_str + 5);
+ cache_volume_uuid(super, dv, uuid);
+ __fname_from_uuid(uuid, 0, uuid_str, ':');
+ printf(" Cache volume : %s\n", uuid_str + 5);
+}
+
+static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
+ struct mdinfo *info, int disk_idx)
{
__u64 sz;
+ __u32 ord;
int slot, i;
+ char uuid_str[64];
+ struct imsm_dev *dev = dv->dev;
struct imsm_map *map = get_imsm_map(dev, MAP_0);
struct imsm_map *map2 = get_imsm_map(dev, MAP_1);
- __u32 ord;
printf("\n");
printf("[%.16s]:\n", dev->volume);
- printf(" UUID : %s\n", uuid);
+ __fname_from_uuid(info->uuid, 0, uuid_str, ':');
+ printf(" UUID : %s\n", uuid_str + 5);
printf(" RAID Level : %d", get_imsm_raid_level(map));
if (map2)
printf(" <-- %d", get_imsm_raid_level(map2));
@@ -1224,6 +1338,11 @@ static void print_imsm_dev(struct intel_super *super,
}
printf("\n");
printf(" Dirty State : %s\n", dev->vol.dirty ? "dirty" : "clean");
+
+ if (is_isrt_leg(dv)) {
+ printf("\n");
+ examine_cache(super, dv);
+ }
}
static void print_imsm_disk(struct imsm_disk *disk, int index, __u32 reserved)
@@ -1443,13 +1562,12 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
(unsigned long long) __le64_to_cpu(log->first_spare_lba));
}
for (i = 0; i < mpb->num_raid_devs; i++) {
+ struct intel_dev *dv = get_intel_dev(super, i);
struct mdinfo info;
- struct imsm_dev *dev = __get_imsm_dev(mpb, i);
super->current_vol = i;
getinfo_super_imsm(st, &info, NULL);
- fname_from_uuid(st, &info, nbuf, ':');
- print_imsm_dev(super, dev, nbuf + 5, super->disks->index);
+ print_imsm_dev(super, dv, &info, super->disks->index);
}
for (i = 0; i < mpb->num_disks; i++) {
if (i == super->disks->index)
@@ -1466,7 +1584,6 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
static void brief_examine_super_imsm(struct supertype *st, int verbose)
{
- /* We just write a generic IMSM ARRAY entry */
struct mdinfo info;
char nbuf[64];
struct intel_super *super = st->sb;
@@ -1481,14 +1598,28 @@ static void brief_examine_super_imsm(struct supertype *st, int verbose)
printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
}
+static void brief_examine_cache_imsm(struct supertype *st, int cache_leg)
+{
+ int uuid[4];
+ char nbuf[64];
+ struct intel_super *super = st->sb;
+ struct intel_dev *dv = get_isrt_leg(super, cache_leg);
+
+ if (!dv)
+ return;
+
+ cache_volume_uuid(super, dv, uuid);
+ __fname_from_uuid(uuid, 0, nbuf, ':');
+ printf("ARRAY UUID=%s\n", nbuf + 5);
+}
+
static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
{
- /* We just write a generic IMSM ARRAY entry */
- struct mdinfo info;
+ int i;
char nbuf[64];
char nbuf1[64];
+ struct mdinfo info;
struct intel_super *super = st->sb;
- int i;
if (!super->anchor->num_raid_devs)
return;
@@ -1496,13 +1627,13 @@ static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
for (i = 0; i < super->anchor->num_raid_devs; i++) {
- struct imsm_dev *dev = get_imsm_dev(super, i);
+ struct intel_dev *dv = get_intel_dev(super, i);
super->current_vol = i;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf1, ':');
printf("ARRAY /dev/md/%.16s container=%s member=%d UUID=%s\n",
- dev->volume, nbuf + 5, i, nbuf1 + 5);
+ dv->dev->volume, nbuf + 5, i, nbuf1 + 5);
}
}
@@ -2827,12 +2958,12 @@ static struct imsm_disk *get_imsm_missing(struct intel_super *super, __u8 index)
static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map)
{
- struct intel_super *super = st->sb;
- struct imsm_disk *disk;
- int map_disks = info->array.raid_disks;
- int max_enough = -1;
int i;
+ struct imsm_disk *disk;
struct imsm_super *mpb;
+ struct intel_super *super = st->sb;
+ int max_enough = -1, cache_legs = 0;
+ int map_disks = info->array.raid_disks;
if (super->current_vol >= 0) {
getinfo_super_imsm_volume(st, info, map);
@@ -2869,7 +3000,8 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
mpb = super->anchor;
for (i = 0; i < mpb->num_raid_devs; i++) {
- struct imsm_dev *dev = get_imsm_dev(super, i);
+ struct intel_dev *dv = get_intel_dev(super, i);
+ struct imsm_dev *dev = dv->dev;
int failed, enough, j, missing = 0;
struct imsm_map *map;
__u8 state;
@@ -2877,6 +3009,8 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
failed = imsm_count_failed(super, dev, MAP_0);
state = imsm_check_degraded(super, dev, failed, MAP_0);
map = get_imsm_map(dev, MAP_0);
+ if (is_isrt_leg(dv))
+ cache_legs++;
/* any newly missing disks?
* (catches single-degraded vs double-degraded)
@@ -2917,6 +3051,7 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
}
dprintf("%s: enough: %d\n", __func__, max_enough);
info->container_enough = max_enough;
+ info->cache_legs = cache_legs;
if (super->disks) {
__u32 reserved = imsm_reserved_sectors(super, super->disks);
@@ -10578,6 +10713,7 @@ struct superswitch super_imsm = {
.examine_super = examine_super_imsm,
.brief_examine_super = brief_examine_super_imsm,
.brief_examine_subarrays = brief_examine_subarrays_imsm,
+ .brief_examine_cache = brief_examine_cache_imsm,
.export_examine_super = export_examine_super_imsm,
.detail_super = detail_super_imsm,
.brief_detail_super = brief_detail_super_imsm,
* [RFC mdadm PATCH 10/11] imsm: assemble cache volumes
2014-04-24 7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
` (8 preceding siblings ...)
2014-04-24 7:23 ` [RFC mdadm PATCH 09/11] imsm: examine cache configurations Dan Williams
@ 2014-04-24 7:23 ` Dan Williams
2014-04-24 7:23 ` [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays Dan Williams
10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24 7:23 UTC (permalink / raw)
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Teach load_super to examine the passed in fd and determine if it is a
cache or cache-target md device.
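To decide whether the fd is a cache leg, load_super_cache() in this patch splits the member array's external-metadata version string into the parent container name and the volume index. Below is a minimal standalone sketch of that split, assuming the string has the form "/<container>/<index>" (for example "/md127/1"). The function name split_subarray_version() is hypothetical, and unlike the patch this sketch only accepts a leading '/' and rejects a string with no second '/'.

```c
#include <stdlib.h>
#include <string.h>

/* Split "/<container>/<index>" in place (the buffer is modified).
 * On success return 0, point *devnm at the container name and store the
 * volume index in *index. Return -1 if the string is malformed.
 */
static int split_subarray_version(char *text_version, char **devnm, int *index)
{
	char *sep, *end;
	unsigned long val;

	if (text_version[0] != '/')
		return -1;

	/* find the '/' that separates the container from the index */
	sep = strchr(text_version + 1, '/');
	if (!sep || sep == text_version + 1 || sep[1] == '\0')
		return -1;
	*sep = '\0';

	/* the index must be a plain decimal number with nothing after it */
	val = strtoul(sep + 1, &end, 10);
	if (*end != '\0')
		return -1;

	*devnm = text_version + 1;
	*index = (int)val;
	return 0;
}
```

With the container name in hand the caller can open the container, load its metadata, and then check that the volume at the parsed index really is a cache or cache-target.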
Generate info to allow the two halves of the cache to be assembled.
XXX: what are the rules we need for compare_super to determine stale
cache associations?
Create a LEVEL_ISRT md device.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
isrt-intel.h | 2 +
maps.c | 1
mdadm.h | 1
super-intel.c | 191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
sysfs.c | 8 ++
util.c | 1
6 files changed, 198 insertions(+), 6 deletions(-)
diff --git a/isrt-intel.h b/isrt-intel.h
index 6d7e92f4da37..ea106c5ac02c 100644
--- a/isrt-intel.h
+++ b/isrt-intel.h
@@ -28,6 +28,8 @@ enum {
NVC_SIG_LEN = 32,
ISRT_DEV_IDX = 0,
ISRT_TARGET_DEV_IDX = 1,
+ ISRT_ROLE_CACHE = 0,
+ ISRT_ROLE_TARGET = 1,
NV_CACHE_MODE_OFF = 0,
NV_CACHE_MODE_OFF_TO_SAFE = 1, /* powerfail recovery state */
diff --git a/maps.c b/maps.c
index 64f1df2c42c3..28c010fdf9bf 100644
--- a/maps.c
+++ b/maps.c
@@ -93,6 +93,7 @@ mapping_t pers[] = {
{ "10", 10},
{ "faulty", LEVEL_FAULTY},
{ "container", LEVEL_CONTAINER},
+ { "isrt", LEVEL_ISRT },
{ NULL, 0}
};
diff --git a/mdadm.h b/mdadm.h
index 111f90f599af..e613d3866d8b 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1457,6 +1457,7 @@ char *xstrdup(const char *str);
#define LEVEL_MULTIPATH (-4)
#define LEVEL_LINEAR (-1)
#define LEVEL_FAULTY (-5)
+#define LEVEL_ISRT (-12)
/* kernel module doesn't know about these */
#define LEVEL_CONTAINER (-100)
diff --git a/super-intel.c b/super-intel.c
index 7a7a48e9e6d7..e69d2a044e92 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -379,6 +379,8 @@ struct intel_super {
int updates_pending; /* count of pending updates for mdmon */
int current_vol; /* index of raid device undergoing creation */
unsigned long long create_offset; /* common start for 'current_vol' */
+ int load_cache; /* flag to indicate we are operating on the cache metadata */
+ int cache_dev; /* subarray/volume index of the cache volume */
__u32 random; /* random data for seeding new family numbers */
struct intel_dev *devlist;
struct dl {
@@ -1246,6 +1248,9 @@ static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
struct imsm_map *map = get_imsm_map(dev, MAP_0);
struct imsm_map *map2 = get_imsm_map(dev, MAP_1);
+ if (super->load_cache)
+ examine_cache(super, dv);
+
printf("\n");
printf("[%.16s]:\n", dev->volume);
__fname_from_uuid(info->uuid, 0, uuid_str, ':');
@@ -1339,7 +1344,7 @@ static void print_imsm_dev(struct intel_super *super, struct intel_dev *dv,
printf("\n");
printf(" Dirty State : %s\n", dev->vol.dirty ? "dirty" : "clean");
- if (is_isrt_leg(dv)) {
+ if (!super->load_cache) {
printf("\n");
examine_cache(super, dv);
}
@@ -1514,6 +1519,8 @@ static int imsm_check_attributes(__u32 attributes)
#ifndef MDASSEMBLE
static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map);
+static void getinfo_super_imsm_cache(struct intel_super *super, struct intel_dev *dv,
+ struct mdinfo *info, char *map);
static void examine_super_imsm(struct supertype *st, char *homehost)
{
@@ -1527,6 +1534,18 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
__u32 reserved = imsm_reserved_sectors(super, super->disks);
struct dl *dl;
+ if (super->load_cache) {
+ struct intel_dev *dv = get_intel_dev(super, super->cache_dev);
+ struct mdinfo info;
+
+ super->load_cache = 0;
+ super->current_vol = super->cache_dev;
+ getinfo_super_imsm(st, &info, NULL);
+ super->load_cache = 1;
+ print_imsm_dev(super, dv, &info, super->disks->index);
+ return;
+ }
+
snprintf(str, MPB_SIG_LEN, "%s", mpb->sig);
printf(" Magic : %s\n", str);
snprintf(str, strlen(MPB_VERSION_RAID0), "%s", get_imsm_version(mpb));
@@ -1595,21 +1614,24 @@ static void brief_examine_super_imsm(struct supertype *st, int verbose)
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
- printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
+ if (super->load_cache)
+ printf("ARRAY UUID=%s\n", nbuf + 5);
+ else
+ printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
}
static void brief_examine_cache_imsm(struct supertype *st, int cache_leg)
{
- int uuid[4];
char nbuf[64];
+ struct mdinfo info;
struct intel_super *super = st->sb;
struct intel_dev *dv = get_isrt_leg(super, cache_leg);
if (!dv)
return;
- cache_volume_uuid(super, dv, uuid);
- __fname_from_uuid(uuid, 0, nbuf, ':');
+ getinfo_super_imsm_cache(super, dv, &info, NULL);
+ fname_from_uuid(st, &info, nbuf, ':');
printf("ARRAY UUID=%s\n", nbuf + 5);
}
@@ -1621,6 +1643,10 @@ static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
struct mdinfo info;
struct intel_super *super = st->sb;
+ /* don't re-report container metadata info */
+ if (super->load_cache)
+ return;
+
if (!super->anchor->num_raid_devs)
return;
@@ -2956,6 +2982,71 @@ static struct imsm_disk *get_imsm_missing(struct intel_super *super, __u8 index)
return NULL;
}
+static void getinfo_super_imsm_cache(struct intel_super *super, struct intel_dev *dv,
+ struct mdinfo *info, char *dmap)
+{
+ __u16 nv_cache_mode;
+ int role_failed = 0, role;
+ struct imsm_dev *dev = dv->dev;
+ struct imsm_map *map = get_imsm_map(dev, MAP_X);
+
+ memset(info, 0, sizeof(*info));
+
+ role = dv->nvc ? ISRT_ROLE_CACHE : ISRT_ROLE_TARGET;
+ if (role == ISRT_ROLE_CACHE) {
+ struct nv_cache_vol_config_md *cfg = &dv->nvc->hdr.vol_config_md[0];
+
+ nv_cache_mode = cfg->nv_cache_mode;
+ info->events = 0;
+ } else {
+ nv_cache_mode = dev->nv_cache_mode;
+ info->events = 1; /* make Assemble choose the cache target */
+ }
+
+ if (map->map_state == IMSM_T_STATE_FAILED ||
+ nv_cache_mode == NV_CACHE_MODE_IS_FAILING ||
+ nv_cache_mode == NV_CACHE_MODE_HAS_FAILED)
+ role_failed = 1;
+
+ info->array.raid_disks = 2;
+ info->array.level = LEVEL_ISRT;
+ info->array.layout = 0;
+ info->array.md_minor = -1;
+ info->array.ctime = 0;
+ info->array.utime = 0;
+ info->array.chunk_size = 0;
+
+ info->disk.major = 0;
+ info->disk.minor = 0;
+ info->disk.raid_disk = role;
+ info->reshape_active = 0;
+ info->array.major_version = -1;
+ info->array.minor_version = -2;
+ strcpy(info->text_version, "isrt");
+ info->safe_mode_delay = 0;
+ info->disk.number = role;
+ info->name[0] = 0;
+ info->recovery_start = MaxSector;
+ info->data_offset = 0;
+ info->custom_array_size = __le32_to_cpu(dev->size_high);
+ info->custom_array_size <<= 32;
+ info->custom_array_size |= __le32_to_cpu(dev->size_low);
+ info->component_size = info->custom_array_size;
+
+ if (role_failed)
+ info->disk.state = (1 << MD_DISK_FAULTY);
+ else
+ info->disk.state = (1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC);
+ cache_volume_uuid(super, dv, info->uuid);
+
+ if (dmap) {
+ /* we can only report self-state */
+ dmap[!role] = 1;
+ dmap[role] = !role_failed;
+ }
+}
+
+
static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *map)
{
int i;
@@ -2965,6 +3056,20 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
int max_enough = -1, cache_legs = 0;
int map_disks = info->array.raid_disks;
+ if (super->load_cache || st->cache_leg) {
+ struct intel_dev *dv;
+
+ if (st->cache_leg) {
+ dv = get_isrt_leg(super, st->cache_leg);
+ if (!dv)
+ return;
+ } else
+ dv = get_intel_dev(super, super->cache_dev);
+
+ getinfo_super_imsm_cache(super, dv, info, map);
+ return;
+ }
+
if (super->current_vol >= 0) {
getinfo_super_imsm_volume(st, info, map);
return;
@@ -3266,6 +3371,27 @@ static int compare_super_imsm(struct supertype *st, struct supertype *tst)
}
}
+ /* cache configuration metadata lives on member arrays, as long
+ * as they mutually agree on the volume-uuid then consider them a match
+ * XXX: sufficient? we do have the failure checks in
+ * getinfo_super_imsm_cache() to mitigate
+ */
+ if (first->load_cache != sec->load_cache)
+ return 3;
+ else if (first->load_cache) {
+ struct intel_dev *first_dv, *sec_dv;
+ int first_uuid[4], sec_uuid[4];
+
+ first_dv = get_intel_dev(first, first->cache_dev);
+ sec_dv = get_intel_dev(sec, sec->cache_dev);
+ cache_volume_uuid(first, first_dv, first_uuid);
+ cache_volume_uuid(sec, sec_dv, sec_uuid);
+ if (memcmp(first_uuid, sec_uuid, sizeof(first_uuid)))
+ return 3;
+ else
+ return 0;
+ }
+
/* if an anchor does not have num_raid_devs set then it is a free
* floating spare
*/
@@ -4621,6 +4747,52 @@ static int load_container_imsm(struct supertype *st, int fd, char *devname)
{
return load_super_imsm_all(st, fd, &st->sb, devname, NULL, 1);
}
+
+static int load_super_cache(struct supertype *st, int fd, char *devname)
+{
+ struct mdinfo *sra = sysfs_read(fd, 0, GET_VERSION);
+ char *subarray, *devnm, *ep;
+ int cfd, cache_dev, err = 1;
+ struct intel_super *super;
+ struct intel_dev *dv;
+
+ if (sra && sra->array.major_version == -1 &&
+ is_subarray(sra->text_version))
+ /* pass */;
+ else
+ goto out;
+
+ /* modify sra->text_version in place */
+ ep = strchr(sra->text_version+1, '/');
+ *ep = '\0';
+ devnm = sra->text_version+1;
+ subarray = ep+1;
+
+ cfd = open_dev(devnm);
+ if (cfd < 0)
+ goto out;
+
+ err = load_container_imsm(st, cfd, devname);
+ close(cfd);
+ if (err)
+ goto out;
+
+ super = st->sb;
+ cache_dev = strtoul(subarray, &ep, 10);
+ /* validate this volume is a cache or cache-target */
+ if (*ep != '\0' || !(dv = get_intel_dev(super, cache_dev))
+ || !is_isrt_leg(dv)) {
+ free_super_imsm(st);
+ err = 2;
+ goto out;
+ }
+
+ super->load_cache = 1;
+ super->cache_dev = cache_dev;
+ out:
+ sysfs_free(sra);
+ return err;
+}
#endif
static int load_super_imsm(struct supertype *st, int fd, char *devname)
@@ -4634,6 +4806,15 @@ static int load_super_imsm(struct supertype *st, int fd, char *devname)
free_super_imsm(st);
+#ifndef MDASSEMBLE
+ /* check if this is a component leg of a cache array and load
+ * the cache metadata from the parent container
+ */
+ rv = load_super_cache(st, fd, devname);
+ if (rv == 0)
+ return rv;
+#endif
+
super = alloc_super();
/* Load hba and capabilities if they exist.
* But do not preclude loading metadata in case capabilities or hba are
diff --git a/sysfs.c b/sysfs.c
index 4cbd4e5d051b..898edde49392 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -638,7 +638,13 @@ int sysfs_set_array(struct mdinfo *info, int vers)
rv |= sysfs_set_num(info, NULL, "raid_disks", raid_disks);
rv |= sysfs_set_num(info, NULL, "chunk_size", info->array.chunk_size);
rv |= sysfs_set_num(info, NULL, "layout", info->array.layout);
- rv |= sysfs_set_num(info, NULL, "component_size", info->component_size/2);
+ if (info->array.level == LEVEL_ISRT) {
+ /* FIXME: how do we support asymmetric component sizes for
+ * external metadata?
+ */
+ rv |= sysfs_set_num(info, NULL, "component_size", 0);
+ } else
+ rv |= sysfs_set_num(info, NULL, "component_size", info->component_size/2);
if (info->custom_array_size) {
int rc;
diff --git a/util.c b/util.c
index 93f9200fa4c7..c9c4dec0fac1 100644
--- a/util.c
+++ b/util.c
@@ -362,6 +362,7 @@ int enough(int level, int raid_disks, int layout, int clean, char *avail)
case LEVEL_MULTIPATH:
return avail_disks>= 1;
+ case LEVEL_ISRT:
case LEVEL_LINEAR:
case 0:
return avail_disks == raid_disks;
* [RFC mdadm PATCH 11/11] imsm: support cache enabled arrays
2014-04-24 7:22 [RFC mdadm PATCH 00/11] Intel(R) Smart Response Technology mdadm enumeration/assembly Dan Williams
` (9 preceding siblings ...)
2014-04-24 7:23 ` [RFC mdadm PATCH 10/11] imsm: assemble cache volumes Dan Williams
@ 2014-04-24 7:23 ` Dan Williams
10 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2014-04-24 7:23 UTC (permalink / raw)
To: neilb; +Cc: linux-raid, jes.sorensen, artur.paszkiewicz, dave.jiang
Turn on attribute support for caching.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
super-intel.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index e69d2a044e92..10c38b248ce6 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -81,7 +81,8 @@
MPB_ATTRIB_RAID1 | \
MPB_ATTRIB_RAID10 | \
MPB_ATTRIB_RAID5 | \
- MPB_ATTRIB_EXP_STRIPE_SIZE)
+ MPB_ATTRIB_EXP_STRIPE_SIZE | \
+ MPB_ATTRIB_NVM)
/* Define attributes that are unused but not harmful */
#define MPB_ATTRIB_IGNORED (MPB_ATTRIB_NEVER_USE)