* [RFC 0/6] Waiting for the missing device in mirror
From: Lidong Zhong @ 2015-06-08 7:48 UTC
To: lvm-devel
Hi List,
The implementation here adds another policy for a missing leg/log device
in a mirror. We want to wait for the device for some time in case of a
temporary device failure, especially a network disconnection for clvmd,
to avoid a full disk recovery.
This version is a draft, and there are many immature places to improve, so
comments and suggestions are welcome.
The corresponding kernel part is here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=ed63287dd670f8e9d2412a913de7fdc50a689831
Regards,
Lidong
Lidong Zhong (6):
Enable the keep_log feature while creating a mirror device
Replace table info and export two simple APIs to do suspend/resume
Mark a device if already being waited
Write the device number into metadata
Add another policy for the missing device -- wait
lvconvert: implement the wait policy
lib/config/config_settings.h | 2 +
lib/config/defaults.h | 2 +
lib/format_text/export.c | 4 +
lib/format_text/import_vsn1.c | 5 +
lib/metadata/lv_manip.c | 1 +
lib/metadata/metadata-exported.h | 1 +
lib/metadata/pv.h | 1 +
lib/mirror/mirrored.c | 3 +
lib/misc/lvm-globals.c | 11 ++
lib/misc/lvm-globals.h | 2 +
libdm/libdevmapper.h | 7 +
libdm/libdm-deptree.c | 96 +++++++++++-
tools/commands.h | 4 +-
tools/lvconvert.c | 325 +++++++++++++++++++++++++++++++++++++--
tools/lvcreate.c | 4 +-
15 files changed, 453 insertions(+), 15 deletions(-)
--
1.8.1.4
* [RFC 1/6] Enable the keep_log feature while creating a mirror device
From: Lidong Zhong @ 2015-06-08 7:48 UTC
To: lvm-devel
Here we refer to the --trackchanges parameter in lvconvert to enable
this feature. This feature depends on handle_errors.
---
lib/metadata/lv_manip.c | 1 +
lib/metadata/metadata-exported.h | 1 +
lib/mirror/mirrored.c | 3 +++
lib/misc/lvm-globals.c | 11 +++++++++++
lib/misc/lvm-globals.h | 2 ++
libdm/libdevmapper.h | 1 +
libdm/libdm-deptree.c | 17 ++++++++++++++++-
tools/commands.h | 4 ++--
tools/lvcreate.c | 4 +++-
9 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/lib/metadata/lv_manip.c b/lib/metadata/lv_manip.c
index 1251a5d..e955a02 100644
--- a/lib/metadata/lv_manip.c
+++ b/lib/metadata/lv_manip.c
@@ -7036,6 +7036,7 @@ static struct logical_volume *_lv_create_an_lv(struct volume_group *vg,
/* FIXME This will not pass cluster lock! */
init_mirror_in_sync(lp->nosync);
+ init_track_changes(lp->track_changes);
if (lp->nosync) {
log_warn("WARNING: New %s won't be synchronised. "
diff --git a/lib/metadata/metadata-exported.h b/lib/metadata/metadata-exported.h
index 0e52153..7b5e078 100644
--- a/lib/metadata/metadata-exported.h
+++ b/lib/metadata/metadata-exported.h
@@ -848,6 +848,7 @@ struct lvcreate_params {
int32_t minor; /* all */
int log_count; /* mirror */
int nosync; /* mirror */
+ int track_changes; /* mirror */
int pool_metadata_spare; /* pools */
int type; /* type arg is given */
int temporary; /* temporary LV */
diff --git a/lib/mirror/mirrored.c b/lib/mirror/mirrored.c
index e57e9bb..8447b8e 100644
--- a/lib/mirror/mirrored.c
+++ b/lib/mirror/mirrored.c
@@ -371,6 +371,9 @@ static int _add_log(struct dm_pool *mem, struct lv_segment *seg,
if (_block_on_error_available && !(seg->status & PVMOVE))
log_flags |= DM_BLOCK_ON_ERROR;
+ if (track_changes())
+ log_flags |= DM_KEEPLOG;
+
return dm_tree_node_add_mirror_target_log(node, region_size, clustered, log_dlid, area_count, log_flags);
}
diff --git a/lib/misc/lvm-globals.c b/lib/misc/lvm-globals.c
index 6455788..3a00beb 100644
--- a/lib/misc/lvm-globals.c
+++ b/lib/misc/lvm-globals.c
@@ -39,6 +39,7 @@ static int _ignorelockingfailure = 0;
static int _security_level = SECURITY_LEVEL;
static char _cmd_name[30] = "";
static int _mirror_in_sync = 0;
+static int _track_changes = 0;
static int _dmeventd_monitor = DEFAULT_DMEVENTD_MONITOR;
static int _background_polling = DEFAULT_BACKGROUND_POLLING;
static int _ignore_suspended_devices = 0;
@@ -121,6 +122,11 @@ void init_mirror_in_sync(int in_sync)
_mirror_in_sync = in_sync;
}
+void init_track_changes(int track_changes)
+{
+ _track_changes = track_changes;
+}
+
void init_dmeventd_monitor(int reg)
{
_dmeventd_monitor = reg;
@@ -277,6 +283,11 @@ int mirror_in_sync(void)
return _mirror_in_sync;
}
+int track_changes(void)
+{
+ return _track_changes;
+}
+
int dmeventd_monitor_mode(void)
{
return _dmeventd_monitor;
diff --git a/lib/misc/lvm-globals.h b/lib/misc/lvm-globals.h
index b25f4ae..55ff3c2 100644
--- a/lib/misc/lvm-globals.h
+++ b/lib/misc/lvm-globals.h
@@ -37,6 +37,7 @@ void init_ignorelockingfailure(int level);
void init_lockingfailed(int level);
void init_security_level(int level);
void init_mirror_in_sync(int in_sync);
+void init_track_changes(int track_changes);
void init_dmeventd_monitor(int reg);
void init_background_polling(int polling);
void init_ignore_suspended_devices(int ignore);
@@ -69,6 +70,7 @@ int ignorelockingfailure(void);
int lockingfailed(void);
int security_level(void);
int mirror_in_sync(void);
+int track_changes(void);
int background_polling(void);
int ignore_suspended_devices(void);
int ignore_lvm_mirrors(void);
diff --git a/libdm/libdevmapper.h b/libdm/libdevmapper.h
index c811641..30cab21 100644
--- a/libdm/libdevmapper.h
+++ b/libdm/libdevmapper.h
@@ -717,6 +717,7 @@ int dm_tree_node_add_mirror_target(struct dm_tree_node *node,
#define DM_FORCESYNC 0x00000002 /* Force resync */
#define DM_BLOCK_ON_ERROR 0x00000004 /* On error, suspend I/O */
#define DM_CORELOG 0x00000008 /* In-memory log */
+#define DM_KEEPLOG 0x00000010 /* Keep log while device missing */
int dm_tree_node_add_mirror_target_log(struct dm_tree_node *node,
uint32_t region_size,
diff --git a/libdm/libdm-deptree.c b/libdm/libdm-deptree.c
index 578f645..c08c776 100644
--- a/libdm/libdm-deptree.c
+++ b/libdm/libdm-deptree.c
@@ -2167,6 +2167,7 @@ static int _mirror_emit_segment_line(struct dm_task *dmt, struct load_segment *s
int block_on_error = 0;
int handle_errors = 0;
int dm_log_userspace = 0;
+ int track_changes = 0;
struct utsname uts;
unsigned log_parm_count;
int pos = 0, parts;
@@ -2203,6 +2204,14 @@ static int _mirror_emit_segment_line(struct dm_task *dmt, struct load_segment *s
block_on_error = 1;
}
+ if (seg->flags & DM_KEEPLOG) {
+ track_changes = 1;
+ if (!handle_errors) {
+ log_error("Feature keep_log depends on handle_errors");
+ return_0;
+ }
+ }
+
if (seg->clustered) {
/* Cluster mirrors require a UUID */
if (!seg->uuid)
@@ -2229,6 +2238,10 @@ static int _mirror_emit_segment_line(struct dm_task *dmt, struct load_segment *s
/* "handle_errors" is a feature arg now */
if (handle_errors)
log_parm_count--;
+
+ if (track_changes)
+ log_parm_count--;
+
/* DM_CORELOG does not count in the param list */
if (seg->flags & DM_CORELOG)
@@ -2280,7 +2293,9 @@ static int _mirror_emit_segment_line(struct dm_task *dmt, struct load_segment *s
if (_emit_areas_line(dmt, seg, params, paramsize, &pos) <= 0)
return_0;
- if (handle_errors)
+ if (handle_errors && track_changes)
+ EMIT_PARAMS(pos, " 2 handle_errors keep_log");
+ else if (handle_errors)
EMIT_PARAMS(pos, " 1 handle_errors");
return 1;
diff --git a/tools/commands.h b/tools/commands.h
index 8e87681..003fc57 100644
--- a/tools/commands.h
+++ b/tools/commands.h
@@ -315,7 +315,7 @@ xx(lvcreate,
"\t -L|--size LogicalVolumeSize[bBsSkKmMgGtTpPeE]}\n"
"\t[-M|--persistent {y|n}] [-j|--major major] [--minor minor]\n"
"\t[--metadataprofile ProfileName]\n"
- "\t[-m|--mirrors Mirrors [--nosync]\n"
+ "\t[-m|--mirrors Mirrors [--nosync][--trackchanges]\n"
"\t [{--mirrorlog {disk|core|mirrored}|--corelog}]]\n"
"\t[-n|--name LogicalVolumeName]\n"
"\t[--noudevsync]\n"
@@ -381,7 +381,7 @@ xx(lvcreate,
chunksize_ARG, contiguous_ARG, corelog_ARG, discards_ARG, errorwhenfull_ARG,
extents_ARG, ignoreactivationskip_ARG, ignoremonitoring_ARG, major_ARG,
metadataprofile_ARG, minor_ARG, mirrorlog_ARG, mirrors_ARG, monitor_ARG,
- minrecoveryrate_ARG, maxrecoveryrate_ARG, name_ARG, nosync_ARG,
+ minrecoveryrate_ARG, maxrecoveryrate_ARG, name_ARG, nosync_ARG, trackchanges_ARG,
noudevsync_ARG, permission_ARG, persistent_ARG,
//pooldatasize_ARG,
poolmetadatasize_ARG, poolmetadataspare_ARG,
diff --git a/tools/lvcreate.c b/tools/lvcreate.c
index e41f76c..ac64677 100644
--- a/tools/lvcreate.c
+++ b/tools/lvcreate.c
@@ -535,6 +535,7 @@ static int _read_mirror_and_raid_params(struct cmd_context *cmd,
lp->mirrors = seg_is_mirrored(lp) ? 2 : 1;
lp->nosync = arg_is_set(cmd, nosync_ARG);
+ lp->track_changes = arg_is_set(cmd, trackchanges_ARG);
if (!(lp->region_size = arg_uint_value(cmd, regionsize_ARG, 0)) &&
((lp->region_size = get_default_region_size(cmd)) <= 0)) {
@@ -720,7 +721,8 @@ static int _lvcreate_params(struct cmd_context *cmd,
#define MIRROR_ARGS \
corelog_ARG,\
- mirrorlog_ARG
+ mirrorlog_ARG,\
+ trackchanges_ARG
#define MIRROR_RAID_ARGS \
nosync_ARG,\
--
1.8.1.4
* [RFC 3/6] Mark a device if already being waited
From: Lidong Zhong @ 2015-06-08 7:48 UTC
To: lvm-devel
We use a file under /tmp, named after the UUID of the missing device, to
record that this device is already being waited for. This avoids waiting
for the same device more than once when there are two or more missing
devices in one mirror. Please correct me if there is a better approach.
---
tools/lvconvert.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/tools/lvconvert.c b/tools/lvconvert.c
index fe8b761..8fcbefd 100644
--- a/tools/lvconvert.c
+++ b/tools/lvconvert.c
@@ -962,6 +962,49 @@ static void _lvconvert_mirrors_repair_ask(struct cmd_context *cmd,
}
/*
+ * Here we use a file under /tmp, named after the UUID of the PV,
+ * to identify whether this device is already being waited for.
+ */
+#define WAITING_DIR "/tmp/"
+static int already_being_waited(struct physical_volume *pv)
+{
+ char uuid[64];
+ char path[70];
+ FILE *fp;
+ int r = 0;
+
+ id_write_format(&pv->id, uuid, 64);
+ sprintf(path, WAITING_DIR"%s", uuid);
+ fp = fopen(path, "r");
+ if (!fp) {
+ fp = fopen(path, "w");
+ fclose(fp);
+ } else
+ r = 1;
+ return r;
+}
+
+static int remove_device_from_waiting_dir(struct physical_volume *pv)
+{
+ char uuid[64];
+ char path[70];
+ FILE *fp;
+ int r = 1;
+
+ id_write_format(&pv->id, uuid, 64);
+ sprintf(path, WAITING_DIR"%s", uuid);
+ fp = fopen(path, "r");
+ if (!fp) {
+ log_error("It should open the file successfully");
+ r = 0;
+ } else {
+ fclose(fp);
+ remove(path);
+ }
+ return r;
+}
+
+/*
* _get_log_count
* @lv: the mirror LV
*
--
1.8.1.4
* [RFC 4/6] Write the device number into metadata
From: Lidong Zhong @ 2015-06-08 7:48 UTC
To: lvm-devel
Here we write the missing device's major/minor number into the
metadata. Please correct me if there is a better approach; otherwise we
have to make the same change to the other metadata formats.
---
lib/format_text/export.c | 4 ++++
lib/format_text/import_vsn1.c | 5 +++++
lib/metadata/pv.h | 1 +
3 files changed, 10 insertions(+)
diff --git a/lib/format_text/export.c b/lib/format_text/export.c
index 018772e..be7f06d 100644
--- a/lib/format_text/export.c
+++ b/lib/format_text/export.c
@@ -504,6 +504,10 @@ static int _print_pvs(struct formatter *f, struct volume_group *vg)
outhint(f, "device = \"%s\"",
dm_escape_double_quotes(buffer, pv_dev_name(pv)));
+ if (pv_dev(pv))
+ outhint(f, "dev_t = %u", pv_dev(pv)->dev);
+ else
+ outhint(f, "dev_t = %u", pv->old_dev);
outnl(f);
if (!_print_flag_config(f, pv->status, PV_FLAGS))
diff --git a/lib/format_text/import_vsn1.c b/lib/format_text/import_vsn1.c
index 2d62905..f33c01e 100644
--- a/lib/format_text/import_vsn1.c
+++ b/lib/format_text/import_vsn1.c
@@ -228,6 +228,11 @@ static int _read_pv(struct format_instance *fid,
log_info("Recovering a previously MISSING PV %s with no MDAs.",
pv_dev_name(pv));
}
+ if ((pv->status & MISSING_PV) && dm_config_has_node(pvn, "dev_t") &&
+ !_read_uint32(pvn, "dev_t", &pv->old_dev)) {
+ log_error("Couldn't read dev_t for physical volume when it's missing.");
+ return 0;
+ }
/* Late addition */
if (dm_config_has_node(pvn, "dev_size") &&
diff --git a/lib/metadata/pv.h b/lib/metadata/pv.h
index 2d436cb..cdd0674 100644
--- a/lib/metadata/pv.h
+++ b/lib/metadata/pv.h
@@ -26,6 +26,7 @@ struct physical_volume {
struct id id;
struct id old_id; /* Set during pvchange -u. */
struct device *dev;
+ dev_t old_dev; /* Only meaningful when PV is missing */
const struct format_type *fmt;
struct format_instance *fid;
--
1.8.1.4
* [RFC 5/6] Add another policy for the missing device -- wait
From: Lidong Zhong @ 2015-06-08 7:48 UTC
To: lvm-devel
We add another policy for the missing device. If the missing device
does not come back within the waiting time, the original policy
(remove/allocate) is executed.
Not sure about the meaning of vsn(), so I put down the kernel version...
---
lib/config/config_settings.h | 2 ++
lib/config/defaults.h | 2 ++
2 files changed, 4 insertions(+)
diff --git a/lib/config/config_settings.h b/lib/config/config_settings.h
index 885a2fa..bfea715 100644
--- a/lib/config/config_settings.h
+++ b/lib/config/config_settings.h
@@ -1147,6 +1147,8 @@ cfg(activation_mirror_log_fault_policy_CFG, "mirror_log_fault_policy", activatio
"Defines how a device failure in a 'mirror' log LV is handled.\n"
"The mirror_image_fault_policy description for mirrored LVs\n"
"also applies to mirrored log LVs.\n")
+cfg(activation_mirror_keep_log_CFG, "mirror_keep_log", activation_CFG_SECTION, 0, CFG_TYPE_INT, DEFAULT_MIRROR_KEEP_LOG, vsn(4, 2, 0), NULL, 0, NULL, "Enable waiting for the missing device to come back within a fixed time.\n")
+cfg(activation_mirror_keep_log_timeout_CFG, "mirror_keep_log_timeout", activation_CFG_SECTION, 0, CFG_TYPE_INT, DEFAULT_MIRROR_KEEP_LOG_TIMEOUT, vsn(4, 2, 0), NULL, 0, NULL, "Defines the waiting time.\n")
cfg(activation_mirror_device_fault_policy_CFG, "mirror_device_fault_policy", activation_CFG_SECTION, 0, CFG_TYPE_STRING, DEFAULT_MIRROR_DEVICE_FAULT_POLICY, vsn(1, 2, 10), NULL, vsn(2, 2, 57),
"This has been replaced by the activation/mirror_image_fault_policy setting.\n",
diff --git a/lib/config/defaults.h b/lib/config/defaults.h
index 6793d01..6fbadfd 100644
--- a/lib/config/defaults.h
+++ b/lib/config/defaults.h
@@ -216,6 +216,8 @@
#define DEFAULT_MIRROR_DEVICE_FAULT_POLICY "remove"
#define DEFAULT_MIRROR_LOG_FAULT_POLICY "allocate"
+#define DEFAULT_MIRROR_KEEP_LOG 0
+#define DEFAULT_MIRROR_KEEP_LOG_TIMEOUT 60
#define DEFAULT_SNAPSHOT_AUTOEXTEND_THRESHOLD 100
#define DEFAULT_SNAPSHOT_AUTOEXTEND_PERCENT 20
#define DEFAULT_THIN_POOL_AUTOEXTEND_THRESHOLD 100
--
1.8.1.4
* [RFC 6/6] lvconvert: implement the wait policy
From: Lidong Zhong @ 2015-06-08 7:48 UTC
To: lvm-devel
Using udev, we wait for the missing device for a fixed time. If it does
not come back, lvconvert proceeds according to the original policy set.
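The 0..3 value that this patch documents for replace_log/replace_mirrors can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: the function names and the enum are hypothetical and do not exist in lvm2; only the numeric meaning (0 remove, 1 replace, 2 wait+remove, 3 wait+replace) is taken from the patch comment.

```c
#include <string.h>

/* Hypothetical names for this sketch only; the numeric values follow the
 * comment in the patch: 0 remove, 1 replace, 2 wait+remove, 3 wait+replace. */
enum {
	SKETCH_REMOVE = 0,
	SKETCH_REPLACE = 1,
	SKETCH_WAIT = 2		/* added on top of remove/replace */
};

/* Fold a fault policy string and the keep_log switch into one value. */
static int sketch_encode_policy(const char *policy, int keep_log)
{
	/* As in the patch, any policy other than "remove" counts as replace. */
	int value = strcmp(policy, "remove") ? SKETCH_REPLACE : SKETCH_REMOVE;

	if (keep_log)
		value += SKETCH_WAIT;

	return value;
}

/* Does the encoded value ask lvconvert to wait first? */
static int sketch_policy_waits(int value)
{
	return value >= SKETCH_WAIT;
}

/* The policy to fall back to once waiting has failed or is not applicable. */
static int sketch_policy_without_wait(int value)
{
	return sketch_policy_waits(value) ? value - SKETCH_WAIT : value;
}
```

With this encoding, falling back after a timeout is a single subtraction, which matches what the patch does with "replace_mimages -= 2".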
---
tools/lvconvert.c | 282 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 271 insertions(+), 11 deletions(-)
diff --git a/tools/lvconvert.c b/tools/lvconvert.c
index 8fcbefd..47564de 100644
--- a/tools/lvconvert.c
+++ b/tools/lvconvert.c
@@ -16,6 +16,9 @@
#include "polldaemon.h"
#include "lv_alloc.h"
#include "lvconvert_poll.h"
+#include "lvm-wrappers.h"
+#include <blkid/blkid.h>
+#include <libudev.h>
struct lvconvert_params {
int cache;
@@ -925,6 +928,13 @@ static struct logical_volume *_original_lv(struct logical_volume *lv)
return next_lv;
}
+/*
+ * replace_log/replace_mirrors value
+ * 0: remove
+ * 1: replace
+ * 2: wait+remove
+ * 3: wait+replace
+ */
static void _lvconvert_mirrors_repair_ask(struct cmd_context *cmd,
int failed_log, int failed_mirrors,
int *replace_log, int *replace_mirrors)
@@ -932,12 +942,20 @@ static void _lvconvert_mirrors_repair_ask(struct cmd_context *cmd,
const char *leg_policy, *log_policy;
int force = arg_count(cmd, force_ARG);
int yes = arg_count(cmd, yes_ARG);
+ int keep_log = 0;
if (arg_count(cmd, use_policies_ARG)) {
leg_policy = find_config_tree_str(cmd, activation_mirror_image_fault_policy_CFG, NULL);
log_policy = find_config_tree_str(cmd, activation_mirror_log_fault_policy_CFG, NULL);
*replace_mirrors = strcmp(leg_policy, "remove");
*replace_log = strcmp(log_policy, "remove");
+ keep_log = find_config_tree_int(cmd, activation_mirror_keep_log_CFG, NULL);
+ *replace_mirrors = !!strcmp(leg_policy, "remove");
+ *replace_log = !!strcmp(log_policy, "remove");
+ if (keep_log) {
+ *replace_mirrors += 2;
+ *replace_log += 2;
+ }
return;
}
@@ -1004,6 +1022,193 @@ static int remove_device_from_waiting_dir(struct physical_volume *pv)
return r;
}
+static int get_a_missing_device(struct logical_volume *lv, struct physical_volume **pv)
+{
+ struct lv_segment *lvseg;
+ unsigned s;
+ int r = 0;
+
+
+ dm_list_iterate_items(lvseg, &lv->segments) {
+ for (s = 0; s < lvseg->area_count; s++) {
+ if (seg_type(lvseg, s) == AREA_LV) {
+ if (is_temporary_mirror_layer(seg_lv(lvseg, s)))
+ return get_a_missing_device(seg_lv(lvseg, s), pv);
+ else if (seg_lv(lvseg, s)->status & PARTIAL_LV)
+ return get_a_missing_device(seg_lv(lvseg, s), pv);
+ }
+ else if (seg_type(lvseg, s) == AREA_PV &&
+ is_missing_pv(seg_pv(lvseg, s))) {
+ if (already_being_waited(seg_pv(lvseg, s)))
+ continue;
+ *pv = seg_pv(lvseg, s);
+ r = 1;
+ break;
+ }
+ }
+ if (r == 1)
+ break;
+ }
+ return r;
+}
+
+static int missing_device_return(struct logical_volume *lv, struct physical_volume **pv,
+ dev_t *old, dev_t *new, int wait_time)
+{
+ struct udev_monitor *mon;
+ struct udev_device *device;
+ const char *new_uuid, *dev_nod;
+ struct udev *udev;
+ char uuid[64] __attribute__((aligned(8)));
+ blkid_probe pr;
+ int fd;
+ fd_set fds;
+ struct timeval tv;
+ int r = 0;
+
+ r = get_a_missing_device(lv, pv);
+ if (!r) {
+ log_error("Failed to get the missing pv");
+ goto bad;
+ }
+ /*
+ * How to get the missing device dev_t?
+ */
+ *old = (*pv)->old_dev;
+ if (!*old) {
+ log_error("Failed to get the dev_t of missing pv");
+ goto bad;
+ }
+
+ if (!id_write_format((&(*pv)->id), uuid, sizeof(uuid))) {
+ stack;
+ r = 0;
+ goto bad;
+ }
+
+ if (!(udev = udev_new())) {
+ r = 0;
+ goto bad;
+ }
+
+ mon = udev_monitor_new_from_netlink(udev, "udev");
+ udev_monitor_filter_add_match_subsystem_devtype(mon, "block", NULL);
+ udev_monitor_enable_receiving(mon);
+ fd = udev_monitor_get_fd(mon);
+
+ tv.tv_sec = wait_time;
+ tv.tv_usec = 0;
+ FD_ZERO(&fds);
+ FD_SET(fd, &fds);
+ while(1) {
+ r = select(fd+1, &fds, NULL, NULL, &tv);
+ if (r > 0 && FD_ISSET(fd, &fds)) {
+ /*
+ * check if it's the missing device
+ */
+ device = udev_monitor_receive_device(mon);
+ if (device) {
+ dev_nod = udev_device_get_devnode(device);
+ if (NULL == dev_nod) {
+ continue;
+ }
+ pr = blkid_new_probe_from_filename(dev_nod);
+ if (!pr) {
+ log_error("Failed to open %s",dev_nod);
+ goto out;
+ }
+ blkid_do_probe(pr);
+ blkid_probe_lookup_value(pr, "UUID", &new_uuid, NULL);
+
+ if (!strncmp(new_uuid, uuid, sizeof(uuid))) {
+ *new = udev_device_get_devnum(device);
+ r = 1;
+ break;
+ }
+ blkid_free_probe(pr);
+ udev_device_unref(device);
+ } else {
+ log_error("No device from udev_monitor_receive_device.");
+ goto out;
+ }
+ }
+ /*
+ * timeout while waiting for the missing device
+ */
+ else if (r == 0) {
+ break;
+ }
+ }
+
+out:
+ udev_unref(udev);
+bad:
+ if(*pv)
+ remove_device_from_waiting_dir(*pv);
+ return r;
+}
+
+static int start_sync_lv(struct logical_volume *lv)
+{
+ char target[70];
+ int r = 1;
+
+ sprintf(target,"%s-%s", lv->vg->name, lv->name);
+
+ if (!dm_lv_simple_suspend(target) ||
+ !dm_lv_simple_resume(target)) {
+ r = 0;
+ log_error("Failed to suspend/resume %s", lv->name);
+ }
+
+ return r;
+}
+
+static int lv_replace_table(struct cmd_context *cmd, struct logical_volume *lv,
+ struct physical_volume *pv, dev_t old, dev_t new)
+{
+ struct logical_volume *sub_lv = NULL;
+ struct lv_segment *lvseg;
+ char target[70];
+ unsigned s;
+ int r = 0;
+
+ /*check which lv is on the missing pv*/
+ dm_list_iterate_items(lvseg, &lv->segments) {
+ for (s = 0; s < lvseg->area_count; s++) {
+ if ((seg_type(lvseg, s) == AREA_LV) &&
+ (seg_lv(lvseg, s)->status & PARTIAL_LV)) {
+ sub_lv = seg_lv(lvseg, s);
+ if (lv_is_on_pv(sub_lv, pv)) {
+ r = 1;
+ break;
+ }
+ }
+ }
+ if (1 == r) {
+ r = 0;
+ break;
+ }
+ }
+
+ if (!sub_lv)
+ goto fail;
+
+ sprintf(target,"%s-%s", sub_lv->vg->name, sub_lv->name);
+ if (!dm_lv_simple_suspend(target))
+ goto fail;
+
+ if (!dm_replace_mirror_table(target, old, new))
+ goto fail;
+
+ if (!dm_lv_simple_resume(target))
+ goto fail;
+ r = 1;
+fail:
+ return r;
+}
+
+
/*
* _get_log_count
* @lv: the mirror LV
@@ -1510,6 +1715,10 @@ static int _lvconvert_mirrors_repair(struct cmd_context *cmd,
int replace_logs = 0;
int replace_mimages = 0;
uint32_t log_count;
+ int wait_time = 0;
+ int daemon_mode = 0;
+ dev_t old, new;
+ struct physical_volume *pv = NULL;
uint32_t original_mimages = lv_mirror_count(lv);
uint32_t original_logs = _get_log_count(lv);
@@ -1525,28 +1734,79 @@ static int _lvconvert_mirrors_repair(struct cmd_context *cmd,
return 1;
}
+ /*
+ * Count the failed log devices
+ */
failed_mimages = _failed_mirrors_count(lv);
failed_logs = _failed_logs_count(lv);
- if (!mirror_remove_missing(cmd, lv, 0))
- return_0;
+ /*
+ * Find out our policies
+ */
+ _lvconvert_mirrors_repair_ask(cmd, failed_logs, failed_mimages,
+ &replace_logs, &replace_mimages);
+
if (failed_mimages)
log_print_unless_silent("Mirror status: %d of %d images failed.",
failed_mimages, original_mimages);
- /*
- * Count the failed log devices
- */
- if (failed_logs)
+ if (failed_logs) {
log_print_unless_silent("Mirror log status: %d of %d images failed.",
failed_logs, original_logs);
+ if (replace_logs >= 2) {
+ log_error("The log device failed, so there is probably no point in keeping the bitmap. "
+ "keep_log is not applied to log devices.");
+ replace_logs -= 2;
+ }
+ }
- /*
- * Find out our policies
- */
- _lvconvert_mirrors_repair_ask(cmd, failed_logs, failed_mimages,
- &replace_logs, &replace_mimages);
+ if (replace_mimages >= 2) {
+ if (!udev_is_running()) {
+ log_error("Udev is not running! This feature needs udev support");
+ goto feature_fail;
+ }
+ /*
+ * Here we make it a daemon to wait
+ */
+ daemon_mode = become_daemon(cmd, 0);
+ if (daemon_mode == 0)
+ return ECMD_PROCESSED; /*CHECK: safe for parent to return directly*/
+ else if (daemon_mode != 1) {
+ log_error("Failed to become a daemon; disabling this feature");
+ goto feature_fail;
+ }
+ wait_time = find_config_tree_int(cmd, activation_mirror_keep_log_timeout_CFG, NULL);
+ log_print_unless_silent("This mirror LV will wait %d secs before "
+ "adopting new policy", wait_time);
+ if (missing_device_return(lv, &pv, &old, &new, wait_time)) {
+ if (!pv) {
+ log_error("Failed to get the missing device");
+ goto feature_fail;
+ }
+
+ /* Remove the MISSING flag */
+ pv->status &= ~MISSING_PV;
+ if ((MAJOR(old) == MAJOR(new)) && (MINOR(old) == MINOR(new)))
+ goto skip_replace;
+ if (!lv_replace_table(cmd, lv, pv, old, new)) {
+ log_error("Failed to replace table since major/minor changed");
+ goto feature_fail;
+ }
+skip_replace:
+ /*Do sync now*/
+ if (!start_sync_lv(lv)) {
+ log_error("Failed to start the incremental sync");
+ goto feature_fail;
+ }
+ return 1;
+ }
+feature_fail:
+ replace_mimages -= 2;
+ }
+
+ if (!mirror_remove_missing(cmd, lv, 0))
+ return_0;
/*
* Second phase - replace faulty devices
--
1.8.1.4
* [RFC 0/6] Waiting for the missing device in mirror
From: Zdenek Kabelac @ 2015-06-08 8:38 UTC
To: lvm-devel
On 8.6.2015 at 09:48, Lidong Zhong wrote:
> Hi List,
>
> The implementation here is trying to add another policy for the
> missing leg/log device in mirror. We want to wait the device for some
> time in case of a temporary device failure, especially a network disconnection
> for clvmd, to avoid a full disk recovery.
>
> This version is kind of a draft. There are many immature places to improve. So comments
> and suggestions are welcomed.
>
> The responding kernel part is here:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/
> commit/?h=for-next&id=ed63287dd670f8e9d2412a913de7fdc50a689831
Hi
I think you should start with a precise description of what you are trying
to achieve or fix; then we can discuss how to reach the desired goal.
From a quick look over the patches, there are a number of problems that do
not fit the lvm2 design.
#1 - Never store any device major:minor in lvm2 metadata - everything is
strictly PV UUID oriented (there are a number of daemons these days)
#2 - The activation layer and the command layer are 2 separate entities, so
your command may run on a different node than the one where the actual
activation happens (unless you do a local activation). The layer separator is
currently the 'lock': the code before the lock and the code after it do not
share any data, and the 'activation' layer knows only what is in the metadata
written on disk (purely as an optimization, there is some internal mechanism
for caching and reusing some existing data).
#3 - There is no 'hidden' data exchange channel via /tmp for activation -
everything goes strictly via written and committed metadata, and for every
such metadata state there needs to be some clear recovery path (e.g. what
happens after 'power-off' with each committed lvm2 metadata state)
I do not yet quite understand what you are trying to achieve, but I have not
noticed any patch for 'dmeventd', which is the actual tool that fixes broken
mirrors. Could that tool somehow be used to detect temporary network
failures?
Regards
Zdenek
* [RFC 0/6] Waiting for the missing device in mirror
From: Lidong Zhong @ 2015-06-09 3:19 UTC
To: lvm-devel
>> On 6/8/2015 at 04:38 PM, in message <55755485.2080802@redhat.com>, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> On 8.6.2015 at 09:48, Lidong Zhong wrote:
> > Hi List,
> >
> > The implementation here is trying to add another policy for the
> > missing leg/log device in mirror. We want to wait the device for some
> > time in case of a temporary device failure, especially a network
> disconnection
> > for clvmd, to avoid a full disk recovery.
> >
> > This version is kind of a draft. There are many immature places to improve.
> So comments
> > and suggestions are welcomed.
> >
> > The responding kernel part is here:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/
> > commit/?h=for-next&id=ed63287dd670f8e9d2412a913de7fdc50a689831
>
> Hi
>
Hi Zdenek,
Thanks for your reply.
> I think you should please start first with the very precise description what
> you are trying to achieve/fix - then we should discuss how to reach desired goal.
>
Sorry, my fault. Here is the situation:
If one leg of the mirror fails, then with the current implementation the failed leg
is either removed or replaced. However, if the failure is temporary (such as a
network failure with clvmd), we have to do a full sync of the disk when we re-add it
to the mirror, which takes a long time. So we plan to add another policy for the
missing device: wait for the device for a configurable time. Then we only need an
incremental sync covering the writes made while the device was gone.
What I do in the patch series is:
Add a new feature to the mirror target that allows bios to still be written to the
remaining mirror devices while the bitmap is kept. The kernel side is already implemented.
We add a KEEP_LOG feature, which depends on the existing HANDLE_ERRORS feature. In
userspace, the --trackchanges parameter must be given when creating a dm-mirror device to
enable this feature.
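The dependency between the two features shows up in the feature arguments at the end of the mirror table line. The following is a minimal sketch, not the libdevmapper code: the function name and the SKETCH_* flag bits are hypothetical, and only the two emitted strings (" 1 handle_errors" and " 2 handle_errors keep_log") are taken from patch 1/6.

```c
#include <stdio.h>

/* Hypothetical flag bits for this sketch only; not the libdevmapper values. */
#define SKETCH_HANDLE_ERRORS 0x1u
#define SKETCH_KEEP_LOG      0x2u

/*
 * Write the trailing feature arguments of a mirror table line into buf.
 * Returns the number of characters written, 0 when no feature is requested,
 * or -1 if keep_log is requested without handle_errors or buf is too small.
 */
static int sketch_emit_mirror_features(unsigned flags, char *buf, size_t size)
{
	int n;

	/* keep_log depends on handle_errors, so reject it on its own. */
	if ((flags & SKETCH_KEEP_LOG) && !(flags & SKETCH_HANDLE_ERRORS))
		return -1;

	if (flags & SKETCH_KEEP_LOG)
		n = snprintf(buf, size, " 2 handle_errors keep_log");
	else if (flags & SKETCH_HANDLE_ERRORS)
		n = snprintf(buf, size, " 1 handle_errors");
	else {
		if (size)
			buf[0] = '\0';
		return 0;
	}

	return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

The leading number is the count of feature words that follow, which is why it changes from 1 to 2 when keep_log is added.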
When dmeventd gets a device failure event, it calls lvconvert according to the policy set in
lvm.conf, so most of our work is in the lvconvert command.
We add another policy, mirror_keep_log/mirror_keep_log_timeout, in the activation section.
If this policy is set, lvconvert becomes a daemon and waits for the missing device for up to
mirror_keep_log_timeout seconds. If the device does not come back, it acts as before, based on
the mirror_device_fault_policy set. Otherwise, it starts the incremental sync and returns.
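Waiting "for up to a timeout" on an event source such as a udev monitor descriptor can be sketched with select() and one absolute deadline. This is an illustration under assumptions, not the code from patch 6/6: both function names are hypothetical, and the descriptor could be any readable fd. The sketch re-arms the fd set and recomputes the remaining time on every call, so a caller that receives an unrelated event can call it again without extending the overall wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Compute an absolute deadline "secs" seconds from now (hypothetical helper). */
static int sketch_deadline_after(int secs, struct timespec *deadline)
{
	if (clock_gettime(CLOCK_MONOTONIC, deadline) < 0)
		return -1;
	deadline->tv_sec += secs;
	return 0;
}

/*
 * Wait until fd is readable or the deadline has passed.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int sketch_wait_readable_until(int fd, const struct timespec *deadline)
{
	struct timespec now;
	struct timeval tv;
	fd_set fds;
	int r;

	for (;;) {
		if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
			return -1;

		/* Remaining time until the deadline. */
		tv.tv_sec = deadline->tv_sec - now.tv_sec;
		tv.tv_usec = (deadline->tv_nsec - now.tv_nsec) / 1000;
		if (tv.tv_usec < 0) {
			tv.tv_sec--;
			tv.tv_usec += 1000000;
		}
		if (tv.tv_sec < 0)
			return 0;	/* deadline already passed */

		/* select() modifies the set, so rebuild it every time. */
		FD_ZERO(&fds);
		FD_SET(fd, &fds);

		r = select(fd + 1, &fds, NULL, NULL, &tv);
		if (r > 0)
			return 1;
		if (r == 0)
			return 0;
		if (errno != EINTR)
			return -1;
		/* Interrupted by a signal: retry with the time that is left. */
	}
}
```

A caller would compute the deadline once from mirror_keep_log_timeout and then loop: wait, read the event, check whether it belongs to the missing device, and wait again if not.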
Some immature points are:
1\ It creates a temporary file under /tmp, named after the UUID of the device, so that when there
are two or more failed devices the daemons do not all wait for the same one.
2\ The major:minor of the missing device may change when it comes back, so I put the original
device number into the metadata. (As already pointed out, this does not fit the rules.)
and some others.
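The marker-file idea from point 1\ can be sketched so that the existence check and the creation happen in one atomic step. This is a minimal sketch assuming POSIX open() with O_CREAT | O_EXCL; the function names are hypothetical and this is not the code from patch 3/6. It only illustrates the mechanism and does not address the objection that /tmp must not be used as a hidden data exchange channel.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "<dir>/<name>" into path; returns 0 on success, -1 if it does not fit. */
static int sketch_marker_path(char *path, size_t size,
			      const char *dir, const char *name)
{
	int n = snprintf(path, size, "%s/%s", dir, name);

	return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/*
 * Try to claim the marker for a device.
 * Returns 1 if this caller created the marker, 0 if somebody else already
 * holds it, -1 on any other error.
 */
static int sketch_claim_marker(const char *dir, const char *name)
{
	char path[4096];
	int fd;

	if (sketch_marker_path(path, sizeof(path), dir, name) < 0)
		return -1;

	/* O_EXCL makes "check and create" one atomic operation. */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return (errno == EEXIST) ? 0 : -1;

	close(fd);
	return 1;
}

/* Remove the marker again; returns 0 on success, -1 on error. */
static int sketch_release_marker(const char *dir, const char *name)
{
	char path[4096];

	if (sketch_marker_path(path, sizeof(path), dir, name) < 0)
		return -1;

	return unlink(path);
}
```

Two waiters that race for the same device name cannot both get 1 from sketch_claim_marker(), which a separate "open for reading, then open for writing" sequence cannot guarantee.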
> With very light overview of patches there are number of problem which can't
> fit lvm2 design.
>
> #1 - Never store any device major:minor in lvm2 metadata - everything is
> strictly PV UUID oriented (there are number of daemons these days)
>
I thought about storing this info in lvmetad. But what should we do if the
lvmetad service is not running?
> #2 - Activation layer & Command layer are 2 separate entities - so your
> command may run on different node then the actual activation happens (unless
> you do a local activation) - the layer separator is ATM 'lock' - the code
> before lock and after lock do not share any data - and the 'activation' layer
> knows only what is in written metadata on disk (just for optimization purposes
> there is some internal mechanism of caching and reusing of some existing data).
>
I don't quite understand this part. I guess it's related to replacing the table info and
starting the sync in my code. I will look deeper into this part. Thanks.
> #3 - There is no 'hidden' data exchange channel via /tmp for activation -
> everything goes strictly via written and committed metadata, and for every
> such metadata state there needs to be some clear recovery path (e.g. what
> happens after 'power-off' with each committed lvm2 metadata state)
>
Do you mean I should put the waiting-device info into the metadata?
Regards,
Lidong
> I do not yet quite understand what you are trying to achieve - but I've not
> noticed any patch for 'dmeventd', which is the actual tool that fixes broken
> mirrors - so could that tool somehow be used for detecting temporary
> network failures?
>
> Regards
>
> Zdenek
* [RFC 0/6] Waiting for the missing device in mirror
2015-06-09 3:19 ` Lidong Zhong
@ 2015-06-09 7:12 ` Zdenek Kabelac
0 siblings, 0 replies; 9+ messages in thread
From: Zdenek Kabelac @ 2015-06-09 7:12 UTC (permalink / raw)
To: lvm-devel
On 9.6.2015 at 05:19, Lidong Zhong wrote:
>>> On 6/8/2015 at 04:38 PM, in message <55755485.2080802@redhat.com>, Zdenek
> Kabelac <zkabelac@redhat.com> wrote:
>> On 8.6.2015 at 09:48, Lidong Zhong wrote:
>>> Hi List,
>>>
>>> The implementation here is trying to add another policy for the
>>> missing leg/log device in mirror. We want to wait the device for some
>>> time in case of a temporary device failure, especially a network
>>> disconnection for clvmd, to avoid a full disk recovery.
>>>
>>> This version is kind of a draft. There are many immature places to
>>> improve. So comments and suggestions are welcomed.
>>>
>>> The responding kernel part is here:
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/
>>> commit/?h=for-next&id=ed63287dd670f8e9d2412a913de7fdc50a689831
>>
>> Hi
>>
> Hi Zdenek,
>
> Thanks for your reply.
>> I think you should please start first with a very precise description of what
>> you are trying to achieve/fix - then we should discuss how to reach the desired
>> goal.
>>
>
> Sorry, my fault. Here is the situation:
> If one leg of the mirror fails, then under the current implementation the failed leg
> is either removed or replaced. However, if the failure is temporary (such as a
> network failure in clvmd), we have to do a full sync of the disk when we re-add it to the mirror,
> which takes a long time. So we plan to add another policy for the missing device, that is
> waiting for the device for a configurable time. Then we could just do an incremental sync
> of the changes made while the device was missing.
>
> What I do in the patch series is:
> Add a new feature to the mirror target that lets bios still be written to the remaining
> mirror devices while the bitmap is kept. The kernel side of this is already implemented.
> We add a KEEP_LOG feature, which depends on the current HANDLE_ERRORS feature. In
> userspace, the parameter --trackchanges must be passed when creating a dm-mirror device to
> enable this feature.
Before we start to think about enhancing the old mirror, which is really
incapable of easily tracking multiple lost legs compared with the new 'raid1' target:
does the user need to activate the mirror on multiple nodes at once (using cmirrord
and gfs)?
For an exclusive mirror I'd advise switching to the superior new 'raid1' --type, which
already provides the 'tracking' feature.
> When dmeventd gets a device failure event, it will call lvconvert according to the policy set in
So the 'failures' are not short-term - the device is really lost and then
reappears?
> 1\ It will create a temporary file named by UUID of the device under /tmp file, in case of there
> are two or more failed devices and the daemons wait for the same one.
You can't use things in /tmp - you need to have some device prepared in advance
(like the _pmspare we already introduced for the repair of thin pools).
> 2\ The major:minor of the missing device probably changes when it comes back. So I put the original
> device number into metadata.(As already pointed out, it does not fit the rule.)
Devices are simply always mapped by PV UUID - never ever by major:minor - and
they are discovered by udev and stored in lvmetad - this is basically the
'vgextend --restoremissing' operation.
You also have numerous filtering rules (host & guest disks on a single box).
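A minimal Python sketch of the point about PV UUIDs, under loud assumptions: `DeviceCache` is a hypothetical stand-in for whatever holds scan results (it is not the lvmetad API), and the UUID strings and device numbers are made up. The metadata side only ever names PV UUIDs, so a device that reappears with a different major:minor is found again without anything being stored about its old number.

```python
class DeviceCache:
    """Toy cache of discovered devices, keyed by PV UUID."""

    def __init__(self):
        self._by_uuid = {}

    def device_appeared(self, pv_uuid, major, minor):
        # A rescan records wherever the PV currently lives.
        self._by_uuid[pv_uuid] = (major, minor)

    def device_removed(self, pv_uuid):
        self._by_uuid.pop(pv_uuid, None)

    def lookup(self, pv_uuid):
        """Return the current (major, minor) or None if the PV is missing."""
        return self._by_uuid.get(pv_uuid)


def missing_pvs(metadata_pv_uuids, cache):
    """List PV UUIDs named in the metadata that are not currently visible."""
    return [uuid for uuid in metadata_pv_uuids if cache.lookup(uuid) is None]
```

For example, if a PV is removed and later reappears under a new device number, `missing_pvs` goes from reporting it to reporting nothing, and `lookup` returns the new number.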
>> #1 - Never store any device major:minor in lvm2 metadata - everything is
>> strictly PV UUID oriented (there are number of daemons these days)
>>
>
> I thought about storing this info into lvmetad. But if lvmetad service is not running,
> then what should we do.
Simply forget about major:minor - you don't need them.
>> #2 - Activation layer & Command layer are 2 separate entities - so your
>> command may run on a different node than the one where the actual activation
>> happens (unless you do a local activation) - the layer separator is ATM 'lock' -
>> the code before lock and after lock do not share any data - and the 'activation'
>> layer knows only what is in written metadata on disk (just for optimization
>> purposes there is some internal mechanism of caching and reusing of some
>> existing data).
>>
>
> I don't quite understand this part. I guess it's related to the replacing table info and
> starting sync in my code. I will look deep into this part. Thanks.
There is an 'extra' interface through which a '/tools' command can manipulate the dm
table. It's currently represented by a lock, and you could imagine 'clvmd' as
an activation daemon which understands 4 simple commands:
activate, deactivate, suspend, resume
As parameters it gets the LV-UUID and a few extra bits (unfortunately we ran out of
free bits years ago and it's hard to extend the protocol without breaking
compatibility).
Nothing else gets passed through - and this activation 'side' accesses the on-disk
metadata and does the actual activation (in parallel on multiple nodes if
needed). (When it all runs in a single command there are some 'caching
methods' for speed-up.)
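The separation described here can be modelled with a small Python sketch. It is a deliberately simplified toy, not clvmd's protocol or locking code: the names `Op` and `ActivationSide`, the dictionary standing in for on-disk metadata, and the rule that resume reloads the table are all assumptions for illustration. The only inputs are an operation and an LV UUID; everything else is read from the committed metadata.

```python
from enum import Enum


class Op(Enum):
    ACTIVATE = "activate"
    DEACTIVATE = "deactivate"
    SUSPEND = "suspend"
    RESUME = "resume"


class ActivationSide:
    """Toy activation layer: it sees only (op, LV UUID) and the metadata."""

    def __init__(self, on_disk_metadata):
        self.metadata = on_disk_metadata  # {lv_uuid: table description}
        self.live = {}                    # tables currently loaded
        self.suspended = set()

    def handle(self, op, lv_uuid):
        """Apply one operation and return the loaded table, or None."""
        if op is Op.DEACTIVATE:
            self.live.pop(lv_uuid, None)
            self.suspended.discard(lv_uuid)
            return None
        if op is Op.SUSPEND:
            if lv_uuid in self.live:
                self.suspended.add(lv_uuid)
            return self.live.get(lv_uuid)
        # ACTIVATE and RESUME both (re)load the table from the metadata.
        table = self.metadata[lv_uuid]
        self.live[lv_uuid] = table
        self.suspended.discard(lv_uuid)
        return table
```

In this model the command side can only influence activation by committing new metadata between a suspend and a resume; there is no argument through which extra state, such as a file in /tmp, could be handed over.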
>
>> #3 - There is no 'hidden' data exchange channel via /tmp for activation -
>> everything goes strictly via written and committed metadata, and for every
>> such metadata state there needs to be some clear recovery path (e.g. what
>> happens after 'power-off' with each committed lvm2 metadata state)
>>
>
> You mean I should put the waiting device info into metadata?
Figuring out a proper setup for an old mirror may get complex (since the old mirror
does not support a separate tracking device for an individual leg).
So it will be something like 'pvmove'.
You 'create' another mirror layer and you pass a new 'temporary' log device to
it. But when you consider that you want a universal solution and would need to
be able to track changes for e.g. a 16-legged mirror - it may get seriously scary.
But if you really do not need parallel activation - I'd recommend switching to
raid1 mirroring - so first check whether this would resolve your problem.
(As old mirrors are seen as 'obsolete'.)
Zdenek
Thread overview: 9+ messages
2015-06-08 7:48 [RFC 0/6] Waiting for the missing device in mirror Lidong Zhong
2015-06-08 7:48 ` [RFC 1/6] Enable the keep_log feature while creating a mirror device Lidong Zhong
2015-06-08 7:48 ` [RFC 3/6] Mark a device if already being waited Lidong Zhong
2015-06-08 7:48 ` [RFC 4/6] Write the device number into metadata Lidong Zhong
2015-06-08 7:48 ` [RFC 5/6] Add another policy for the missing device -- wait Lidong Zhong
2015-06-08 7:48 ` [RFC 6/6] lvconvert: implement the wait policy Lidong Zhong
2015-06-08 8:38 ` [RFC 0/6] Waiting for the missing device in mirror Zdenek Kabelac
2015-06-09 3:19 ` Lidong Zhong
2015-06-09 7:12 ` Zdenek Kabelac