* [PATCH v2 0/5] Fix checkpointing - invasive
@ 2024-01-18 10:30 Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 1/5] Remove hardcoded checkpoint interval checking Mateusz Kusiak
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Mateusz Kusiak @ 2024-01-18 10:30 UTC (permalink / raw)
To: linux-raid; +Cc: jes, mariusz.tkaczyk
This is the second half of splitted patchset "Fix checkpointing" as
asked. It contains invasive changes regarding checkpoint handling,
that should not be merged prior to release.
Note, this patchset applies on the first part "Fix checkpointing -
minors".
Mateusz Kusiak (5):
Remove hardcoded checkpoint interval checking
monitor: refactor checkpoint update
Super-intel: Fix first checkpoint restart
Grow: Move update_tail assign to Grow_reshape()
Add understanding output section in man
Grow.c | 13 ++++++-----
mdadm.8.in | 21 ++++++++++++++++-
monitor.c | 65 +++++++++++++++++++++------------------------------
super-intel.c | 3 +++
4 files changed, 57 insertions(+), 45 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 1/5] Remove hardcoded checkpoint interval checking
2024-01-18 10:30 [PATCH v2 0/5] Fix checkpointing - invasive Mateusz Kusiak
@ 2024-01-18 10:30 ` Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 2/5] monitor: refactor checkpoint update Mateusz Kusiak
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mateusz Kusiak @ 2024-01-18 10:30 UTC (permalink / raw)
To: linux-raid; +Cc: jes, mariusz.tkaczyk
Mdmon assumes that kernel marks checkpoint every 1/16 of the volume size
and that the checkpoints are equal in size. This is not true, kernel may
mark checkpoints more frequently depending on several factors, including
sync speed. This results in checkpoints reported by mdadm --examine
falling behind the one reported by kernel.
Remove hardcoded checkpoint interval checking.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
---
monitor.c | 22 ++++++----------------
1 file changed, 6 insertions(+), 16 deletions(-)
diff --git a/monitor.c b/monitor.c
index 4acec6783e6e..b8d9e8810b17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -564,22 +564,10 @@ static int read_and_act(struct active_array *a, fd_set *fds)
}
}
- /* Check for recovery checkpoint notifications. We need to be a
- * minimum distance away from the last checkpoint to prevent
- * over checkpointing. Note reshape checkpointing is handled
- * in the second branch.
+ /* Handle reshape checkpointing
*/
- if (sync_completed > a->last_checkpoint &&
- sync_completed - a->last_checkpoint > a->info.component_size >> 4 &&
- a->curr_action > reshape) {
- /* A (non-reshape) sync_action has reached a checkpoint.
- * Record the updated position in the metadata
- */
- a->last_checkpoint = sync_completed;
- a->container->ss->set_array_state(a, a->curr_state <= clean);
- } else if ((a->curr_action == idle && a->prev_action == reshape) ||
- (a->curr_action == reshape &&
- sync_completed > a->last_checkpoint)) {
+ if ((a->curr_action == idle && a->prev_action == reshape) ||
+ (a->curr_action == reshape && sync_completed > a->last_checkpoint)) {
/* Reshape has progressed or completed so we need to
* update the array state - and possibly the array size
*/
@@ -607,8 +595,10 @@ static int read_and_act(struct active_array *a, fd_set *fds)
a->last_checkpoint = sync_completed;
}
- if (sync_completed > a->last_checkpoint)
+ if (sync_completed > a->last_checkpoint) {
a->last_checkpoint = sync_completed;
+ a->container->ss->set_array_state(a, a->curr_state <= clean);
+ }
if (sync_completed >= a->info.component_size)
a->last_checkpoint = 0;
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 2/5] monitor: refactor checkpoint update
2024-01-18 10:30 [PATCH v2 0/5] Fix checkpointing - invasive Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 1/5] Remove hardcoded checkpoint interval checking Mateusz Kusiak
@ 2024-01-18 10:30 ` Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 3/5] Super-intel: Fix first checkpoint restart Mateusz Kusiak
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mateusz Kusiak @ 2024-01-18 10:30 UTC (permalink / raw)
To: linux-raid; +Cc: jes, mariusz.tkaczyk
"if" statements of checkpoint updates have too many responsibilties.
This results in unclear code flow and duplicated code.
Refactor checkpoint update code and simplify "if" statements.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
---
monitor.c | 51 +++++++++++++++++++++++++--------------------------
1 file changed, 25 insertions(+), 26 deletions(-)
diff --git a/monitor.c b/monitor.c
index b8d9e8810b17..be0bec785080 100644
--- a/monitor.c
+++ b/monitor.c
@@ -412,6 +412,7 @@ static int read_and_act(struct active_array *a, fd_set *fds)
int ret = 0;
int count = 0;
struct timeval tv;
+ bool write_checkpoint = false;
a->next_state = bad_word;
a->next_action = bad_action;
@@ -564,40 +565,38 @@ static int read_and_act(struct active_array *a, fd_set *fds)
}
}
- /* Handle reshape checkpointing
- */
- if ((a->curr_action == idle && a->prev_action == reshape) ||
- (a->curr_action == reshape && sync_completed > a->last_checkpoint)) {
- /* Reshape has progressed or completed so we need to
- * update the array state - and possibly the array size
- */
+ /* Update reshape checkpoint, depending if it finished or progressed */
+ if (a->curr_action == idle && a->prev_action == reshape) {
+ char buf[SYSFS_MAX_BUF_SIZE];
+
if (sync_completed != 0)
a->last_checkpoint = sync_completed;
- /* We might need to update last_checkpoint depending on
- * the reason that reshape finished.
- * if array reshape is really finished:
- * set check point to the end, this allows
- * set_array_state() to finalize reshape in metadata
- * if reshape if broken: do not set checkpoint to the end
- * this allows for reshape restart from checkpoint
+
+ /*
+ * If reshape really finished, set checkpoint to the end to finalize it.
+ * Do not set checkpoint if reshape is broken.
+ * Reshape will restart from last checkpoint.
*/
- if ((a->curr_action != reshape) &&
- (a->prev_action == reshape)) {
- char buf[SYSFS_MAX_BUF_SIZE];
- if ((sysfs_get_str(&a->info, NULL,
- "reshape_position",
- buf,
- sizeof(buf)) >= 0) &&
- str_is_none(buf) == true)
+ if (sysfs_get_str(&a->info, NULL, "reshape_position", buf, sizeof(buf)) >= 0)
+ if (str_is_none(buf) == true)
a->last_checkpoint = a->info.component_size;
- }
- a->container->ss->set_array_state(a, a->curr_state <= clean);
- a->last_checkpoint = sync_completed;
+
+ write_checkpoint = true;
}
- if (sync_completed > a->last_checkpoint) {
+ if (a->curr_action >= reshape && sync_completed > a->last_checkpoint) {
+ /* Update checkpoint if neither reshape nor idle action */
a->last_checkpoint = sync_completed;
+
+ write_checkpoint = true;
+ }
+
+ /* Save checkpoint */
+ if (write_checkpoint) {
a->container->ss->set_array_state(a, a->curr_state <= clean);
+
+ if (a->curr_action <= reshape)
+ a->last_checkpoint = sync_completed;
}
if (sync_completed >= a->info.component_size)
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 3/5] Super-intel: Fix first checkpoint restart
2024-01-18 10:30 [PATCH v2 0/5] Fix checkpointing - invasive Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 1/5] Remove hardcoded checkpoint interval checking Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 2/5] monitor: refactor checkpoint update Mateusz Kusiak
@ 2024-01-18 10:30 ` Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 4/5] Grow: Move update_tail assign to Grow_reshape() Mateusz Kusiak
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mateusz Kusiak @ 2024-01-18 10:30 UTC (permalink / raw)
To: linux-raid; +Cc: jes, mariusz.tkaczyk
When imsm based array is stopped after reaching first checkpoint and
then assembled, first checkpoint is reported as 0.
This behaviour is valid only for initial checkpoint, if the array was
stopped while performing some action.
Last checkpoint value is not taken from metadata but always starts
with 0 and it's incremented when sync_completed in sysfs changes.
In simplification, read_and_act() is responsible for checkpoint updates
and is executed each time sysfs checkpoint update happens. For first
checkpoint it is executed twice and due to marking checkpoint before
triggering any action on the array, it is impossible to read
sync_completed from sysfs in just two iterations.
The workaround to this is not marking any checkpoint for first
sysfs checkpoint after RAID assembly, to preserve checkpoint value
stored in metadata.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
---
super-intel.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/super-intel.c b/super-intel.c
index 6a664a2e58d3..ebf43209e75d 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8741,6 +8741,9 @@ static int imsm_set_array_state(struct active_array *a, int consistent)
super->updates_pending++;
}
+ if (a->prev_action == idle)
+ goto skip_mark_checkpoint;
+
mark_checkpoint:
/* skip checkpointing for general migration,
* it is controlled in mdadm
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 4/5] Grow: Move update_tail assign to Grow_reshape()
2024-01-18 10:30 [PATCH v2 0/5] Fix checkpointing - invasive Mateusz Kusiak
` (2 preceding siblings ...)
2024-01-18 10:30 ` [PATCH v2 3/5] Super-intel: Fix first checkpoint restart Mateusz Kusiak
@ 2024-01-18 10:30 ` Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 5/5] Add understanding output section in man Mateusz Kusiak
2024-02-20 13:07 ` [PATCH v2 0/5] Fix checkpointing - invasive Mariusz Tkaczyk
5 siblings, 0 replies; 7+ messages in thread
From: Mateusz Kusiak @ 2024-01-18 10:30 UTC (permalink / raw)
To: linux-raid; +Cc: jes, mariusz.tkaczyk
Due to e919fb0af245 ("FIX: Enable metadata updates for raid0") code
can't enter super-intel.c:3415, resulting in checkpoint not being
saved to metadata for second volume in matrix raid array.
This results in checkpoint being stuck at last value for the
first volume.
Move st->update_tail to Grow_reshape() so it is assigned for each
volume.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
---
Grow.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/Grow.c b/Grow.c
index f95dae82ef0d..5498e54fec11 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2085,9 +2085,10 @@ int Grow_reshape(char *devname, int fd,
if (!mdmon_running(st->container_devnm))
start_mdmon(st->container_devnm);
ping_monitor(container);
- if (mdmon_running(st->container_devnm) &&
- st->update_tail == NULL)
- st->update_tail = &st->updates;
+ if (mdmon_running(st->container_devnm) == false) {
+ pr_err("No mdmon found. Grow cannot continue.\n");
+ goto release;
+ }
}
if (s->size == MAX_SIZE)
@@ -3048,6 +3049,8 @@ static int reshape_array(char *container, int fd, char *devname,
dprintf("Cannot get array information.\n");
goto release;
}
+ if (st->update_tail == NULL)
+ st->update_tail = &st->updates;
if (array.level == 0 && info->component_size == 0) {
get_dev_size(fd, NULL, &array_size);
info->component_size = array_size / array.raid_disks;
@@ -5152,9 +5155,7 @@ int Grow_continue_command(char *devname, int fd,
start_mdmon(container);
ping_monitor(container);
- if (mdmon_running(container))
- st->update_tail = &st->updates;
- else {
+ if (mdmon_running(container) == false) {
pr_err("No mdmon found. Grow cannot continue.\n");
ret_val = 1;
goto Grow_continue_command_exit;
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 5/5] Add understanding output section in man
2024-01-18 10:30 [PATCH v2 0/5] Fix checkpointing - invasive Mateusz Kusiak
` (3 preceding siblings ...)
2024-01-18 10:30 ` [PATCH v2 4/5] Grow: Move update_tail assign to Grow_reshape() Mateusz Kusiak
@ 2024-01-18 10:30 ` Mateusz Kusiak
2024-02-20 13:07 ` [PATCH v2 0/5] Fix checkpointing - invasive Mariusz Tkaczyk
5 siblings, 0 replies; 7+ messages in thread
From: Mateusz Kusiak @ 2024-01-18 10:30 UTC (permalink / raw)
To: linux-raid; +Cc: jes, mariusz.tkaczyk
Add new section in man for explaining mdadm outputs.
Describe checkpoint entry.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
---
mdadm.8.in | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/mdadm.8.in b/mdadm.8.in
index 3142436f728c..e3033a957bb3 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -3179,7 +3179,7 @@ environment. This can be useful for testing or for disaster
recovery. You should be aware that interoperability may be
compromised by setting this value.
-These change can also be suppressed by adding
+These change can also be suppressed by adding
.B mdadm.imsm.test=1
to the kernel command line. This makes it easy to test IMSM
code in a virtual machine that doesn't have IMSM virtual hardware.
@@ -3454,6 +3454,25 @@ is any string. These names are supported by
since version 3.3 provided they are enabled in
.IR mdadm.conf .
+.SH UNDERSTANDING OUTPUT
+
+.TP
+EXAMINE
+
+.TP
+.B checkpoint
+Checkpoint value is reported when array is performing some action including
+resync, recovery or reshape. Checkpoints allow resuming action from certain
+point if it was interrupted.
+
+Checkpoint is reported as combination of two values: current migration unit
+and number of blocks per unit. By multiplying those values and dividing by
+array size checkpoint progress percentage can be obtained in relation to
+current progress reported in /proc/mdstat. Checkpoint is also related to (and
+sometimes based on) sysfs entry sync_completed but depending on action units
+may differ. Even if units are the same, it should not be expected that
+checkpoint and sync_completed will be exact match nor updated simultaneously.
+
.SH NOTE
.I mdadm
was previously known as
--
2.35.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 0/5] Fix checkpointing - invasive
2024-01-18 10:30 [PATCH v2 0/5] Fix checkpointing - invasive Mateusz Kusiak
` (4 preceding siblings ...)
2024-01-18 10:30 ` [PATCH v2 5/5] Add understanding output section in man Mateusz Kusiak
@ 2024-02-20 13:07 ` Mariusz Tkaczyk
5 siblings, 0 replies; 7+ messages in thread
From: Mariusz Tkaczyk @ 2024-02-20 13:07 UTC (permalink / raw)
To: Mateusz Kusiak; +Cc: linux-raid, jes
On Thu, 18 Jan 2024 11:30:14 +0100
Mateusz Kusiak <mateusz.kusiak@intel.com> wrote:
> This is the second half of splitted patchset "Fix checkpointing" as
> asked. It contains invasive changes regarding checkpoint handling,
> that should not be merged prior to release.
>
> Note, this patchset applies on the first part "Fix checkpointing -
> minors".
>
> Mateusz Kusiak (5):
> Remove hardcoded checkpoint interval checking
> monitor: refactor checkpoint update
> Super-intel: Fix first checkpoint restart
> Grow: Move update_tail assign to Grow_reshape()
> Add understanding output section in man
>
> Grow.c | 13 ++++++-----
> mdadm.8.in | 21 ++++++++++++++++-
> monitor.c | 65 +++++++++++++++++++++------------------------------
> super-intel.c | 3 +++
> 4 files changed, 57 insertions(+), 45 deletions(-)
>
Applied!
Thanks,
Mariusz
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-02-20 13:07 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-18 10:30 [PATCH v2 0/5] Fix checkpointing - invasive Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 1/5] Remove hardcoded checkpoint interval checking Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 2/5] monitor: refactor checkpoint update Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 3/5] Super-intel: Fix first checkpoint restart Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 4/5] Grow: Move update_tail assign to Grow_reshape() Mateusz Kusiak
2024-01-18 10:30 ` [PATCH v2 5/5] Add understanding output section in man Mateusz Kusiak
2024-02-20 13:07 ` [PATCH v2 0/5] Fix checkpointing - invasive Mariusz Tkaczyk
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).