* [PATCH]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
@ 2013-10-21 22:34 Jonathan Brassow
2013-10-22 9:42 ` Zdenek Kabelac
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jonathan Brassow @ 2013-10-21 22:34 UTC (permalink / raw)
To: lvm-devel
agk,
The patch below is designed to address bug 1009700:
https://bugzilla.redhat.com/show_bug.cgi?id=1009700
It is the simplest way with the most flexibility for users that I could
think of to handle the problem outside of doing the asynchronous label
reads.
brassow
There is a problem with the way mirrors have been designed to handle
failures that is resulting in stuck LVM processes and hung I/O. When
mirrors encounter a write failure, they block I/O and notify userspace
to reconfigure the mirror to remove failed devices. This process is
open to a couple races:
1) Any LVM process other than the one that is meant to deal with the
mirror failure can attempt to read the mirror, fail, and block other
LVM commands (including the repair command) from proceeding due to
holding a lock on the volume group.
2) If there are multiple mirrors that suffer a failure in the same
volume group, a repair can block while attempting to read the LVM
label from one mirror while trying to repair the other.
Mitigation of these races has been attempted by disallowing label reading
of mirrors that are either suspended or are indicated as blocking by
the kernel. While this has closed the window of opportunity for hitting
the above problems considerably, it hasn't closed it completely. This is
because it is still possible to start an LVM command, read the status of
the mirror as healthy, and then perform the read for the label at the
moment after a the failure is discovered by the kernel.
I can see two solutions to this problem:
1) Allow users to configure whether mirrors can be candidates for LVM
labels (i.e. whether PVs can be created on mirror LVs). The default
will be to not allow this. If the user chooses to allow label scanning
of mirror LVs, it will be at the expense of a possible hang in I/O or
LVM processes during a mirror failure.
2) Instrument a way to allow asynchronous label reading - allowing
blocked label reads to be ignored while continuing to process the LVM
command. This would action would allow LVM commands to continue even
though they would have otherwise blocked trying to read a mirror. They
can then release their lock and allow a repair command to commence. In
the event of #2 above, the repair command already in progress can continue
and repair the failed mirror.
This patch brings solution #1. If solution #2 is developed later on, the
configuration option created in #1 can be negated - allowing mirrors to
be scanned for labels by default once again.
Index: lvm2/conf/example.conf.in
===================================================================
--- lvm2.orig/conf/example.conf.in
+++ lvm2/conf/example.conf.in
@@ -186,6 +186,15 @@ devices {
# in recovery situations.
ignore_suspended_devices = 0
+ # By default, LVM2 will ignore logical volumes of "mirror" segment
+ # types. This means they will not be able to be used as physical
+ # volumes for a higher ("stacked") volume group. This is done because
+ # of the possibility of reading a mirror logical volume during a device
+ # failure and blocking indefinitely on it. This is a design limitation
+ # of the "mirror" segment type that does not affect the LVM RAID types,
+ # including "raid1".
+ lvm_mirror_scanning = 0
+
# During each LVM operation errors received from each device are counted.
# If the counter of a particular device exceeds the limit set here, no
# further I/O is sent to that device for the remainder of the respective
Index: lvm2/lib/activate/dev_manager.c
===================================================================
--- lvm2.orig/lib/activate/dev_manager.c
+++ lvm2/lib/activate/dev_manager.c
@@ -397,11 +397,17 @@ static int _device_is_usable(struct devi
next = dm_get_next_target(dmt, next, &start, &length,
&target_type, ¶ms);
- if (target_type && !strcmp(target_type, "mirror") &&
- !_ignore_blocked_mirror_devices(dev, start, length, params)) {
- log_debug_activation("%s: Mirror device %s not usable.",
- dev_name(dev), name);
- goto out;
+ if (target_type && !strcmp(target_type, "mirror")) {
+ if (!lvm_mirror_scanning()) {
+ log_debug_activation("%s: Scanning mirror devices is disabled.", dev_name(dev));
+ goto out;
+ }
+ if (!_ignore_blocked_mirror_devices(dev, start,
+ length, params)) {
+ log_debug_activation("%s: Mirror device %s not usable.",
+ dev_name(dev), name);
+ goto out;
+ }
}
/*
Index: lvm2/lib/misc/lvm-globals.c
===================================================================
--- lvm2.orig/lib/misc/lvm-globals.c
+++ lvm2/lib/misc/lvm-globals.c
@@ -40,6 +40,7 @@ static int _mirror_in_sync = 0;
static int _dmeventd_monitor = DEFAULT_DMEVENTD_MONITOR;
static int _background_polling = DEFAULT_BACKGROUND_POLLING;
static int _ignore_suspended_devices = 0;
+static int _lvm_mirror_scanning = DEFAULT_LVM_MIRROR_SCANNING;
static int _error_message_produced = 0;
static unsigned _is_static = 0;
static int _udev_checking = 1;
@@ -123,6 +124,11 @@ void init_ignore_suspended_devices(int i
_ignore_suspended_devices = ignore;
}
+void init_lvm_mirror_scanning(int scan)
+{
+ _lvm_mirror_scanning = scan;
+}
+
void init_cmd_name(int status)
{
_log_cmd_name = status;
@@ -259,6 +265,11 @@ int ignore_suspended_devices(void)
return _ignore_suspended_devices;
}
+int lvm_mirror_scanning(void)
+{
+ return _lvm_mirror_scanning;
+}
+
void init_debug(int level)
{
_debug_level = level;
Index: lvm2/lib/config/config_settings.h
===================================================================
--- lvm2.orig/lib/config/config_settings.h
+++ lvm2/lib/config/config_settings.h
@@ -96,6 +96,7 @@ cfg(devices_data_alignment_detection_CFG
cfg(devices_data_alignment_CFG, "data_alignment", devices_CFG_SECTION, 0, CFG_TYPE_INT, 0, vsn(2, 2, 45), NULL)
cfg(devices_data_alignment_offset_detection_CFG, "data_alignment_offset_detection", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION, vsn(2, 2, 50), NULL)
cfg(devices_ignore_suspended_devices_CFG, "ignore_suspended_devices", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_IGNORE_SUSPENDED_DEVICES, vsn(1, 2, 19), NULL)
+cfg(lvm_mirror_scanning_CFG, "lvm_mirror_scanning", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_LVM_MIRROR_SCANNING, vsn(2, 2, 104), NULL)
cfg(devices_disable_after_error_count_CFG, "disable_after_error_count", devices_CFG_SECTION, 0, CFG_TYPE_INT, DEFAULT_DISABLE_AFTER_ERROR_COUNT, vsn(2, 2, 75), NULL)
cfg(devices_require_restorefile_with_uuid_CFG, "require_restorefile_with_uuid", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_REQUIRE_RESTOREFILE_WITH_UUID, vsn(2, 2, 73), NULL)
cfg(devices_pv_min_size_CFG, "pv_min_size", devices_CFG_SECTION, 0, CFG_TYPE_INT, DEFAULT_PV_MIN_SIZE_KB, vsn(2, 2, 85), NULL)
Index: lvm2/lib/config/defaults.h
===================================================================
--- lvm2.orig/lib/config/defaults.h
+++ lvm2/lib/config/defaults.h
@@ -33,6 +33,7 @@
#define DEFAULT_SYSFS_SCAN 1
#define DEFAULT_MD_COMPONENT_DETECTION 1
#define DEFAULT_MD_CHUNK_ALIGNMENT 1
+#define DEFAULT_LVM_MIRROR_SCANNING 0
#define DEFAULT_MULTIPATH_COMPONENT_DETECTION 1
#define DEFAULT_IGNORE_SUSPENDED_DEVICES 1
#define DEFAULT_DISABLE_AFTER_ERROR_COUNT 0
Index: lvm2/lib/misc/lvm-globals.h
===================================================================
--- lvm2.orig/lib/misc/lvm-globals.h
+++ lvm2/lib/misc/lvm-globals.h
@@ -38,6 +38,7 @@ void init_mirror_in_sync(int in_sync);
void init_dmeventd_monitor(int reg);
void init_background_polling(int polling);
void init_ignore_suspended_devices(int ignore);
+void init_lvm_mirror_scanning(int scan);
void init_error_message_produced(int produced);
void init_is_static(unsigned value);
void init_udev_checking(int checking);
@@ -66,6 +67,7 @@ int security_level(void);
int mirror_in_sync(void);
int background_polling(void);
int ignore_suspended_devices(void);
+int lvm_mirror_scanning(void);
const char *log_command_name(void);
unsigned is_static(void);
int udev_checking(void);
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
2013-10-21 22:34 [PATCH]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors Jonathan Brassow
@ 2013-10-22 9:42 ` Zdenek Kabelac
2013-10-22 13:00 ` Brassow Jonathan
2013-10-22 23:39 ` [PATCH v2]: " Jonathan Brassow
2013-10-22 23:41 ` [PATCH]: Mirror: warn when activating mirror and !ignore_lvm_mirrors Jonathan Brassow
2 siblings, 1 reply; 7+ messages in thread
From: Zdenek Kabelac @ 2013-10-22 9:42 UTC (permalink / raw)
To: lvm-devel
Dne 22.10.2013 00:34, Jonathan Brassow napsal(a):
> agk,
>
> The patch below is designed to address bug 1009700:
> https://bugzilla.redhat.com/show_bug.cgi?id=1009700
> It is the simplest way with the most flexibility for users that I could
> think of to handle the problem outside of doing the asynchronous label
> reads.
>
I believe the patch needs to go in different direction.
See the bug https://bugzilla.redhat.com/show_bug.cgi?id=905028
Lvm2 should be smarter in it's scanning code and
must avoid reading device it know it may block.
There are very similar problems i.e. with thin pools, so
it's not practical to put in various hacks for individual
targets.
The fix must be in the filter to skip known dm devices.
Zdenek
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
2013-10-22 9:42 ` Zdenek Kabelac
@ 2013-10-22 13:00 ` Brassow Jonathan
0 siblings, 0 replies; 7+ messages in thread
From: Brassow Jonathan @ 2013-10-22 13:00 UTC (permalink / raw)
To: lvm-devel
On Oct 22, 2013, at 4:42 AM, Zdenek Kabelac wrote:
> Dne 22.10.2013 00:34, Jonathan Brassow napsal(a):
>> agk,
>>
>> The patch below is designed to address bug 1009700:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1009700
>> It is the simplest way with the most flexibility for users that I could
>> think of to handle the problem outside of doing the asynchronous label
>> reads.
>>
>
>
> I believe the patch needs to go in different direction.
>
> See the bug https://bugzilla.redhat.com/show_bug.cgi?id=905028
>
> Lvm2 should be smarter in it's scanning code and
> must avoid reading device it know it may block.
>
> There are very similar problems i.e. with thin pools, so
> it's not practical to put in various hacks for individual
> targets.
>
> The fix must be in the filter to skip known dm devices.
Yes, I've thought about that already. One idea for a solution I had was to avoid reading DM devices that have UUID that indicate it is part of the VG being operated on. That was a non-starter because at the point of the problem, we do not yet know the VGs UUID and therefore cannot use it as a filter.
As far as avoiding devices that are blocked, we already do that for mirrors - which is the problem I am trying to solve. It is called lib/activate/dev_manager.c:'ignore_blocked_mirror_devices'. In the header of the patch provided, I describe why this is not sufficient to solve the problem entirely. To clean-up the code, we could put this function (and the whole of 'device_is_usable') into a filter of its own. Again though, this doesn't solve the problem.
I like what you are saying. I've been down that road. This is why I've presented the solution as I have.
brassow
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
2013-10-21 22:34 [PATCH]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors Jonathan Brassow
2013-10-22 9:42 ` Zdenek Kabelac
@ 2013-10-22 23:39 ` Jonathan Brassow
2013-10-23 8:50 ` Zdenek Kabelac
2013-10-22 23:41 ` [PATCH]: Mirror: warn when activating mirror and !ignore_lvm_mirrors Jonathan Brassow
2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Brassow @ 2013-10-22 23:39 UTC (permalink / raw)
To: lvm-devel
Changed some variable/function names and added more explanation to the
config file.
I will send a separate patch that contains a warning message if mirrors
are activated and 'ignore_lvm_mirrors' is not set... We can talk about
whether that is needed also.
brassow
Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
There is a problem with the way mirrors have been designed to handle
failures that is resulting in stuck LVM processes and hung I/O. When
mirrors encounter a write failure, they block I/O and notify userspace
to reconfigure the mirror to remove failed devices. This process is
open to a couple races:
1) Any LVM process other than the one that is meant to deal with the
mirror failure can attempt to read the mirror, fail, and block other
LVM commands (including the repair command) from proceeding due to
holding a lock on the volume group.
2) If there are multiple mirrors that suffer a failure in the same
volume group, a repair can block while attempting to read the LVM
label from one mirror while trying to repair the other.
Mitigation of these races has been attempted by disallowing label reading
of mirrors that are either suspended or are indicated as blocking by
the kernel. While this has closed the window of opportunity for hitting
the above problems considerably, it hasn't closed it completely. This is
because it is still possible to start an LVM command, read the status of
the mirror as healthy, and then perform the read for the label at the
moment after a the failure is discovered by the kernel.
I can see two solutions to this problem:
1) Allow users to configure whether mirrors can be candidates for LVM
labels (i.e. whether PVs can be created on mirror LVs). If the user
chooses to allow label scanning of mirror LVs, it will be at the expense
of a possible hang in I/O or LVM processes.
2) Instrument a way to allow asynchronous label reading - allowing
blocked label reads to be ignored while continuing to process the LVM
command. This would action would allow LVM commands to continue even
though they would have otherwise blocked trying to read a mirror. They
can then release their lock and allow a repair command to commence. In
the event of #2 above, the repair command already in progress can continue
and repair the failed mirror.
This patch brings solution #1. If solution #2 is developed later on, the
configuration option created in #1 can be negated - allowing mirrors to
be scanned for labels by default once again.
Index: lvm2/conf/example.conf.in
===================================================================
--- lvm2.orig/conf/example.conf.in
+++ lvm2/conf/example.conf.in
@@ -186,6 +186,35 @@ devices {
# in recovery situations.
ignore_suspended_devices = 0
+ # ignore_lvm_mirrors: Introduced in version 2.02.104
+ # This setting determines whether logical volumes of "mirror" segment
+ # type are scanned for LVM labels. This affects the ability of
+ # mirrors to be used as physical volumes. If 'ignore_lvm_mirrors'
+ # is set to '1', it becomes impossible to create volume groups on top
+ # of mirror logical volumes - i.e. to stack volume groups on mirrors.
+ #
+ # Allowing mirror logical volumes to be scanned (setting the value to '0')
+ # can potentially cause LVM processes and I/O to the mirror to become
+ # blocked. This is due to the way that the "mirror" segment type handles
+ # failures. In order for the hang to manifest itself, an LVM command must
+ # be run just after a failure and before the automatic LVM repair process
+ # takes place OR there must be failures in multiple mirrors in the same
+ # volume group at the same time with write failures occurring moments
+ # before a scan of the mirror's labels.
+ #
+ # Note that these scanning limitations do not apply to the LVM RAID
+ # types, like "raid1". The RAID segment types handle failures in a
+ # different way and are not subject to possible process or I/O blocking.
+ #
+ # It is encouraged that users set 'ignore_lvm_mirrors' to 1 if they
+ # are using the "mirror" segment type. Users that require volume group
+ # stacking on mirrored logical volumes should consider using the "raid1"
+ # segment type. The "raid1" segment type is not available for
+ # active/active clustered volume groups.
+ #
+ # Set to 1 to disallow stacking and thereby avoid a possible deadlock.
+ ignore_lvm_mirrors = 1
+
# During each LVM operation errors received from each device are counted.
# If the counter of a particular device exceeds the limit set here, no
# further I/O is sent to that device for the remainder of the respective
Index: lvm2/lib/activate/dev_manager.c
===================================================================
--- lvm2.orig/lib/activate/dev_manager.c
+++ lvm2/lib/activate/dev_manager.c
@@ -397,11 +397,17 @@ static int _device_is_usable(struct devi
next = dm_get_next_target(dmt, next, &start, &length,
&target_type, ¶ms);
- if (target_type && !strcmp(target_type, "mirror") &&
- !_ignore_blocked_mirror_devices(dev, start, length, params)) {
- log_debug_activation("%s: Mirror device %s not usable.",
- dev_name(dev), name);
- goto out;
+ if (target_type && !strcmp(target_type, "mirror")) {
+ if (ignore_lvm_mirrors()) {
+ log_debug_activation("%s: Scanning mirror devices is disabled.", dev_name(dev));
+ goto out;
+ }
+ if (!_ignore_blocked_mirror_devices(dev, start,
+ length, params)) {
+ log_debug_activation("%s: Mirror device %s not usable.",
+ dev_name(dev), name);
+ goto out;
+ }
}
/*
Index: lvm2/lib/misc/lvm-globals.c
===================================================================
--- lvm2.orig/lib/misc/lvm-globals.c
+++ lvm2/lib/misc/lvm-globals.c
@@ -40,6 +40,7 @@ static int _mirror_in_sync = 0;
static int _dmeventd_monitor = DEFAULT_DMEVENTD_MONITOR;
static int _background_polling = DEFAULT_BACKGROUND_POLLING;
static int _ignore_suspended_devices = 0;
+static int _ignore_lvm_mirrors = DEFAULT_IGNORE_LVM_MIRRORS;
static int _error_message_produced = 0;
static unsigned _is_static = 0;
static int _udev_checking = 1;
@@ -123,6 +124,11 @@ void init_ignore_suspended_devices(int i
_ignore_suspended_devices = ignore;
}
+void init_ignore_lvm_mirrors(int scan)
+{
+ _ignore_lvm_mirrors = scan;
+}
+
void init_cmd_name(int status)
{
_log_cmd_name = status;
@@ -259,6 +265,11 @@ int ignore_suspended_devices(void)
return _ignore_suspended_devices;
}
+int ignore_lvm_mirrors(void)
+{
+ return _ignore_lvm_mirrors;
+}
+
void init_debug(int level)
{
_debug_level = level;
Index: lvm2/lib/config/config_settings.h
===================================================================
--- lvm2.orig/lib/config/config_settings.h
+++ lvm2/lib/config/config_settings.h
@@ -96,6 +96,7 @@ cfg(devices_data_alignment_detection_CFG
cfg(devices_data_alignment_CFG, "data_alignment", devices_CFG_SECTION, 0, CFG_TYPE_INT, 0, vsn(2, 2, 45), NULL)
cfg(devices_data_alignment_offset_detection_CFG, "data_alignment_offset_detection", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION, vsn(2, 2, 50), NULL)
cfg(devices_ignore_suspended_devices_CFG, "ignore_suspended_devices", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_IGNORE_SUSPENDED_DEVICES, vsn(1, 2, 19), NULL)
+cfg(devices_ignore_lvm_mirrors_CFG, "ignore_lvm_mirrors", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_IGNORE_LVM_MIRRORS, vsn(2, 2, 104), NULL)
cfg(devices_disable_after_error_count_CFG, "disable_after_error_count", devices_CFG_SECTION, 0, CFG_TYPE_INT, DEFAULT_DISABLE_AFTER_ERROR_COUNT, vsn(2, 2, 75), NULL)
cfg(devices_require_restorefile_with_uuid_CFG, "require_restorefile_with_uuid", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_REQUIRE_RESTOREFILE_WITH_UUID, vsn(2, 2, 73), NULL)
cfg(devices_pv_min_size_CFG, "pv_min_size", devices_CFG_SECTION, 0, CFG_TYPE_INT, DEFAULT_PV_MIN_SIZE_KB, vsn(2, 2, 85), NULL)
Index: lvm2/lib/config/defaults.h
===================================================================
--- lvm2.orig/lib/config/defaults.h
+++ lvm2/lib/config/defaults.h
@@ -33,6 +33,7 @@
#define DEFAULT_SYSFS_SCAN 1
#define DEFAULT_MD_COMPONENT_DETECTION 1
#define DEFAULT_MD_CHUNK_ALIGNMENT 1
+#define DEFAULT_IGNORE_LVM_MIRRORS 1
#define DEFAULT_MULTIPATH_COMPONENT_DETECTION 1
#define DEFAULT_IGNORE_SUSPENDED_DEVICES 1
#define DEFAULT_DISABLE_AFTER_ERROR_COUNT 0
Index: lvm2/lib/misc/lvm-globals.h
===================================================================
--- lvm2.orig/lib/misc/lvm-globals.h
+++ lvm2/lib/misc/lvm-globals.h
@@ -38,6 +38,7 @@ void init_mirror_in_sync(int in_sync);
void init_dmeventd_monitor(int reg);
void init_background_polling(int polling);
void init_ignore_suspended_devices(int ignore);
+void init_ignore_lvm_mirrors(int scan);
void init_error_message_produced(int produced);
void init_is_static(unsigned value);
void init_udev_checking(int checking);
@@ -66,6 +67,7 @@ int security_level(void);
int mirror_in_sync(void);
int background_polling(void);
int ignore_suspended_devices(void);
+int ignore_lvm_mirrors(void);
const char *log_command_name(void);
unsigned is_static(void);
int udev_checking(void);
Index: lvm2/lib/commands/toolcontext.c
===================================================================
--- lvm2.orig/lib/commands/toolcontext.c
+++ lvm2/lib/commands/toolcontext.c
@@ -905,6 +905,7 @@ static int _init_filters(struct cmd_cont
goto_bad;
init_ignore_suspended_devices(find_config_tree_bool(cmd, devices_ignore_suspended_devices_CFG, NULL));
+ init_ignore_lvm_mirrors(find_config_tree_bool(cmd, devices_ignore_lvm_mirrors_CFG, NULL));
/*
* If 'cache_dir' or 'cache_file_prefix' is set, ignore 'cache'.
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH v2]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
2013-10-22 23:39 ` [PATCH v2]: " Jonathan Brassow
@ 2013-10-23 8:50 ` Zdenek Kabelac
0 siblings, 0 replies; 7+ messages in thread
From: Zdenek Kabelac @ 2013-10-23 8:50 UTC (permalink / raw)
To: lvm-devel
Dne 23.10.2013 01:39, Jonathan Brassow napsal(a):
> Changed some variable/function names and added more explanation to the
> config file.
>
> I will send a separate patch that contains a warning message if mirrors
> are activated and 'ignore_lvm_mirrors' is not set... We can talk about
> whether that is needed also.
>
> brassow
>
> Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
>
> There is a problem with the way mirrors have been designed to handle
> failures that is resulting in stuck LVM processes and hung I/O. When
> mirrors encounter a write failure, they block I/O and notify userspace
> to reconfigure the mirror to remove failed devices. This process is
> open to a couple races:
> 1) Any LVM process other than the one that is meant to deal with the
> mirror failure can attempt to read the mirror, fail, and block other
> LVM commands (including the repair command) from proceeding due to
> holding a lock on the volume group.
> 2) If there are multiple mirrors that suffer a failure in the same
> volume group, a repair can block while attempting to read the LVM
> label from one mirror while trying to repair the other.
>
> Mitigation of these races has been attempted by disallowing label reading
> of mirrors that are either suspended or are indicated as blocking by
> the kernel. While this has closed the window of opportunity for hitting
Is mirror read 'abort-able' (i.e. sigalarm()) when it's blocked ?
So our 'scan' routine could try to read mirror - which suddenly
gets 'frozen' by write error.
If we would have used sigalarm - we should be able abort() read operation
(though I'm not sure where the read gets stuck - maybe it would need change in
the kernel driver?) - after read failure we may detect mirror error
conditions through dm status - and make some reaction?
The very similar thing needs to be added for scanning of i.e. thinly
provisioned devices - which may get stuck when the pool is overfilled - so
some solution in this direction is unavoidable - IMHO we should not hide the
problem by disabling of scanning).
> 2) Instrument a way to allow asynchronous label reading - allowing
> blocked label reads to be ignored while continuing to process the LVM
> command. This would action would allow LVM commands to continue even
> though they would have otherwise blocked trying to read a mirror. They
> can then release their lock and allow a repair command to commence. In
> the event of #2 above, the repair command already in progress can continue
> and repair the failed mirror.
Async read is not the only problem here - we have other issues:
i.e. activate mirror - and wait for confirmation (dmsetup udevcomplete)
but this may also run watch rule - and also blkid may get blocked (mirror error)
So now we get into fancy states - where our command is waiting for
semaphore completion (no timeout on semaphore for now) - which doesn't happen
since master udev kills its udev scan completely - without any 'finalization'
step.
So - we would need to probably make a mirror device also 'unscannable' ??
(which makes it unusable for filesystems??)
Anyway - more troubles ahead....
Zdenek
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH]: Mirror: warn when activating mirror and !ignore_lvm_mirrors
2013-10-21 22:34 [PATCH]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors Jonathan Brassow
2013-10-22 9:42 ` Zdenek Kabelac
2013-10-22 23:39 ` [PATCH v2]: " Jonathan Brassow
@ 2013-10-22 23:41 ` Jonathan Brassow
2013-10-23 7:52 ` Zdenek Kabelac
2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Brassow @ 2013-10-22 23:41 UTC (permalink / raw)
To: lvm-devel
Not sure if I care for the static variable, but I don't want it printed
for every mirror on every command that activates...
brassow
Mirror: Print warning on activation if mirrors exists and !ignore_lvm_mirrors
Print a warning if mirrors are present and config file settings indicate
that they can be scanned for labels. (A process that has the potential
to block indefinitely if it happens just after a failure.)
Index: lvm2/lib/activate/activate.c
===================================================================
--- lvm2.orig/lib/activate/activate.c
+++ lvm2/lib/activate/activate.c
@@ -1077,9 +1077,18 @@ static int _lv_open_count(struct cmd_con
static int _lv_activate_lv(struct logical_volume *lv, struct lv_activate_opts *laopts)
{
+ static int mirror_warning_printed = 0;
int r;
struct dev_manager *dm;
+ if (lv_is_mirrored(lv) && !lv_is_raid(lv) &&
+ !mirror_warning_printed && !ignore_lvm_mirrors()) {
+ log_print("Scanning mirrors for LVM labels is enabled.\n"
+ " It is possible for this to cause I/O hangs and stuck LVM processes.\n"
+ " See 'ignore_lvm_mirrors' in the LVM configuation file for details.");
+ mirror_warning_printed = 1;
+ }
+
if (!(dm = dev_manager_create(lv->vg->cmd, lv->vg->name, (lv->status & PVMOVE) ? 0 : 1)))
return_0;
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH]: Mirror: warn when activating mirror and !ignore_lvm_mirrors
2013-10-22 23:41 ` [PATCH]: Mirror: warn when activating mirror and !ignore_lvm_mirrors Jonathan Brassow
@ 2013-10-23 7:52 ` Zdenek Kabelac
0 siblings, 0 replies; 7+ messages in thread
From: Zdenek Kabelac @ 2013-10-23 7:52 UTC (permalink / raw)
To: lvm-devel
Dne 23.10.2013 01:41, Jonathan Brassow napsal(a):
> Not sure if I care for the static variable, but I don't want it printed
> for every mirror on every command that activates...
>
> brassow
>
>
> Mirror: Print warning on activation if mirrors exists and !ignore_lvm_mirrors
>
> Print a warning if mirrors are present and config file settings indicate
> that they can be scanned for labels. (A process that has the potential
> to block indefinitely if it happens just after a failure.)
>
> Index: lvm2/lib/activate/activate.c
> ===================================================================
> --- lvm2.orig/lib/activate/activate.c
> +++ lvm2/lib/activate/activate.c
> @@ -1077,9 +1077,18 @@ static int _lv_open_count(struct cmd_con
>
> static int _lv_activate_lv(struct logical_volume *lv, struct lv_activate_opts *laopts)
> {
> + static int mirror_warning_printed = 0;
> int r;
> struct dev_manager *dm;
>
> + if (lv_is_mirrored(lv) && !lv_is_raid(lv) &&
> + !mirror_warning_printed && !ignore_lvm_mirrors()) {
> + log_print("Scanning mirrors for LVM labels is enabled.\n"
> + " It is possible for this to cause I/O hangs and stuck LVM processes.\n"
> + " See 'ignore_lvm_mirrors' in the LVM configuation file for details.");
> + mirror_warning_printed = 1;
> + }
> +
> if (!(dm = dev_manager_create(lv->vg->cmd, lv->vg->name, (lv->status & PVMOVE) ? 0 : 1)))
We should probably introduce log_print_once()
(Just like we have log_error_once())
Zdenek
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-10-23 8:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-21 22:34 [PATCH]: Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors Jonathan Brassow
2013-10-22 9:42 ` Zdenek Kabelac
2013-10-22 13:00 ` Brassow Jonathan
2013-10-22 23:39 ` [PATCH v2]: " Jonathan Brassow
2013-10-23 8:50 ` Zdenek Kabelac
2013-10-22 23:41 ` [PATCH]: Mirror: warn when activating mirror and !ignore_lvm_mirrors Jonathan Brassow
2013-10-23 7:52 ` Zdenek Kabelac
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.