[PATCH V2] mdmon: imsm: fix metadata corruption when managing new array

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array
@ 2025-02-18 18:48 Junxiao Bi
  2025-02-24 13:15 ` Blazej Kucman
  0 siblings, 1 reply; 7+ messages in thread
From: Junxiao Bi @ 2025-02-18 18:48 UTC (permalink / raw)
  To: mtkaczyk, blazej.kucman
  Cc: linux-raid, ncroxon, song, xni, yukuai, junxiao.bi

When manager thread detects new array, it will invoke manage_new().
For imsm array, it will further invoke imsm_open_new(). Since
commit bbab0940fa75("imsm: write bad block log on metadata sync"),
it preallocates bad block log when opening the array, that requires
increasing the mpb buffer size.
For that, imsm_open_new() invokes function imsm_update_metadata_locally(),
which first uses imsm_prepare_update() to allocate a larger mpb buffer
and store it at "mpb->next_buf", and then invoke imsm_process_update()
to copy the content from current mpb buffer "mpb->buf" to "mpb->next_buf",
and then free the current mpb buffer and set the new buffer as current.

There is a small race window, when monitor thread is syncing metadata,
it gets current buffer pointer in imsm_sync_metadata()->write_super_imsm(),
but before flushing the buffer to disk, manager thread does above switching
buffer which frees current buffer, then monitor thread will run into
use-after-free issue and could cause on-disk metadata corruption.
If system keeps running, further metadata update could fix the corruption,
because after switching buffer, the new buffer will contain good metadata,
but if panic/power cycle happens while disk metadata is corrupted,
the system will run into bootup failure if array is used as root,
otherwise the array can not be assembled after boot if not used as root.

This issue will not happen for imsm array with only one member array,
because the memory array has not be opened yet, monitor thread will not
do any metadata updates.
This can happen for imsm array with at lease two member array, in the
following two scenarios:
1. Restarting mdmon process with at least two member array
This will happen during system boot up or user restart mdmon after mdadm
upgrade
2. Adding new member array to exist imsm array with at least one member
array.

To fix this, delay the switching buffer operation to monitor thread.

Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
v2 <- v1:
 - address code style in manage_new()
 - make lines of git log not over 75 characters

 managemon.c   | 10 ++++++++--
 super-intel.c | 14 +++++++++++---
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/managemon.c b/managemon.c
index d79813282457..74b64bfc9613 100644
--- a/managemon.c
+++ b/managemon.c
@@ -721,11 +721,12 @@ static void manage_new(struct mdstat_ent *mdstat,
 	 * the monitor.
 	 */

+	struct metadata_update *update = NULL;
 	struct active_array *new = NULL;
 	struct mdinfo *mdi = NULL, *di;
-	int i, inst;
-	int failed = 0;
 	char buf[SYSFS_MAX_BUF_SIZE];
+	int failed = 0;
+	int i, inst;

 	/* check if array is ready to be monitored */
 	if (!mdstat->active || !mdstat->level)
@@ -824,9 +825,14 @@ static void manage_new(struct mdstat_ent *mdstat,
 	/* if everything checks out tell the metadata handler we want to
 	 * manage this instance
 	 */
+	container->update_tail = &update;
 	if (!aa_ready(new) || container->ss->open_new(container, new, inst) < 0) {
+		container->update_tail = NULL;
 		goto error;
 	} else {
+		if (update)
+			queue_metadata_update(update);
+		container->update_tail = NULL;
 		replace_array(container, victim, new);
 		if (failed) {
 			new->check_degraded = 1;
diff --git a/super-intel.c b/super-intel.c
index cab841980830..4988eef191da 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8467,12 +8467,15 @@ static int imsm_count_failed(struct intel_super *super, struct imsm_dev *dev,
 	return failed;
 }

+static int imsm_prepare_update(struct supertype *st,
+			       struct metadata_update *update);
 static int imsm_open_new(struct supertype *c, struct active_array *a,
 			 int inst)
 {
 	struct intel_super *super = c->sb;
 	struct imsm_super *mpb = super->anchor;
-	struct imsm_update_prealloc_bb_mem u;
+	struct imsm_update_prealloc_bb_mem *u;
+	struct metadata_update mu;

 	if (inst >= mpb->num_raid_devs) {
 		pr_err("subarry index %d, out of range\n", inst);
@@ -8482,8 +8485,13 @@ static int imsm_open_new(struct supertype *c, struct active_array *a,
 	dprintf("imsm: open_new %d\n", inst);
 	a->info.container_member = inst;

-	u.type = update_prealloc_badblocks_mem;
-	imsm_update_metadata_locally(c, &u, sizeof(u));
+	u = xmalloc(sizeof(*u));
+	u->type = update_prealloc_badblocks_mem;
+	mu.len = sizeof(*u);
+	mu.buf = (char *)u;
+	imsm_prepare_update(c, &mu);
+	if (c->update_tail)
+		append_metadata_update(c, u, sizeof(*u));

 	return 0;
 }
-- 
2.39.5 (Apple Git-154)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array
  2025-02-18 18:48 [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array Junxiao Bi
@ 2025-02-24 13:15 ` Blazej Kucman
  2025-02-27  6:48   ` Xiao Ni
  0 siblings, 1 reply; 7+ messages in thread
From: Blazej Kucman @ 2025-02-24 13:15 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: mtkaczyk, linux-raid, ncroxon, song, xni, yukuai

On Tue, 18 Feb 2025 10:48:31 -0800
Junxiao Bi <junxiao.bi@oracle.com> wrote:

> When manager thread detects new array, it will invoke manage_new().
> For imsm array, it will further invoke imsm_open_new(). Since
> commit bbab0940fa75("imsm: write bad block log on metadata sync"),
> it preallocates bad block log when opening the array, that requires
> increasing the mpb buffer size.
> For that, imsm_open_new() invokes function
> imsm_update_metadata_locally(), which first uses
> imsm_prepare_update() to allocate a larger mpb buffer and store it at
> "mpb->next_buf", and then invoke imsm_process_update() to copy the
> content from current mpb buffer "mpb->buf" to "mpb->next_buf", and
> then free the current mpb buffer and set the new buffer as current.
> 
> There is a small race window, when monitor thread is syncing metadata,
> it gets current buffer pointer in
> imsm_sync_metadata()->write_super_imsm(), but before flushing the
> buffer to disk, manager thread does above switching buffer which
> frees current buffer, then monitor thread will run into
> use-after-free issue and could cause on-disk metadata corruption. If
> system keeps running, further metadata update could fix the
> corruption, because after switching buffer, the new buffer will
> contain good metadata, but if panic/power cycle happens while disk
> metadata is corrupted, the system will run into bootup failure if
> array is used as root, otherwise the array can not be assembled after
> boot if not used as root.
> 
> This issue will not happen for imsm array with only one member array,
> because the memory array has not be opened yet, monitor thread will
> not do any metadata updates.
> This can happen for imsm array with at lease two member array, in the
> following two scenarios:
> 1. Restarting mdmon process with at least two member array
> This will happen during system boot up or user restart mdmon after
> mdadm upgrade
> 2. Adding new member array to exist imsm array with at least one
> member array.
> 
> To fix this, delay the switching buffer operation to monitor thread.
> 
> Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
> v2 <- v1:
>  - address code style in manage_new()
>  - make lines of git log not over 75 characters
> 
>  managemon.c   | 10 ++++++++--
>  super-intel.c | 14 +++++++++++---
>  2 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/managemon.c b/managemon.c
> index d79813282457..74b64bfc9613 100644
> --- a/managemon.c
> +++ b/managemon.c
> @@ -721,11 +721,12 @@ static void manage_new(struct mdstat_ent
> *mdstat,
>  	 * the monitor.
>  	 */
>  
> +	struct metadata_update *update = NULL;
>  	struct active_array *new = NULL;
>  	struct mdinfo *mdi = NULL, *di;
> -	int i, inst;
> -	int failed = 0;
>  	char buf[SYSFS_MAX_BUF_SIZE];
> +	int failed = 0;
> +	int i, inst;
>  
>  	/* check if array is ready to be monitored */
>  	if (!mdstat->active || !mdstat->level)
> @@ -824,9 +825,14 @@ static void manage_new(struct mdstat_ent *mdstat,
>  	/* if everything checks out tell the metadata handler we
> want to
>  	 * manage this instance
>  	 */
> +	container->update_tail = &update;
>  	if (!aa_ready(new) || container->ss->open_new(container,
> new, inst) < 0) {
> +		container->update_tail = NULL;
>  		goto error;
>  	} else {
> +		if (update)
> +			queue_metadata_update(update);
> +		container->update_tail = NULL;
>  		replace_array(container, victim, new);
>  		if (failed) {
>  			new->check_degraded = 1;
> diff --git a/super-intel.c b/super-intel.c
> index cab841980830..4988eef191da 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -8467,12 +8467,15 @@ static int imsm_count_failed(struct
> intel_super *super, struct imsm_dev *dev, return failed;
>  }
>  
> +static int imsm_prepare_update(struct supertype *st,
> +			       struct metadata_update *update);
>  static int imsm_open_new(struct supertype *c, struct active_array *a,
>  			 int inst)
>  {
>  	struct intel_super *super = c->sb;
>  	struct imsm_super *mpb = super->anchor;
> -	struct imsm_update_prealloc_bb_mem u;
> +	struct imsm_update_prealloc_bb_mem *u;
> +	struct metadata_update mu;
>  
>  	if (inst >= mpb->num_raid_devs) {
>  		pr_err("subarry index %d, out of range\n", inst);
> @@ -8482,8 +8485,13 @@ static int imsm_open_new(struct supertype *c,
> struct active_array *a, dprintf("imsm: open_new %d\n", inst);
>  	a->info.container_member = inst;
>  
> -	u.type = update_prealloc_badblocks_mem;
> -	imsm_update_metadata_locally(c, &u, sizeof(u));
> +	u = xmalloc(sizeof(*u));
> +	u->type = update_prealloc_badblocks_mem;
> +	mu.len = sizeof(*u);
> +	mu.buf = (char *)u;
> +	imsm_prepare_update(c, &mu);
> +	if (c->update_tail)
> +		append_metadata_update(c, u, sizeof(*u));
>  
>  	return 0;
>  }

Hi Junxiao,

LGTM, Approved

Thanks,
Blazej


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array
  2025-02-24 13:15 ` Blazej Kucman
@ 2025-02-27  6:48   ` Xiao Ni
  2025-02-28  9:38     ` Blazej Kucman
  0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2025-02-27  6:48 UTC (permalink / raw)
  To: Blazej Kucman; +Cc: Junxiao Bi, mtkaczyk, linux-raid, ncroxon, song, yukuai

On Mon, Feb 24, 2025 at 9:16 PM Blazej Kucman
<blazej.kucman@linux.intel.com> wrote:
>
> On Tue, 18 Feb 2025 10:48:31 -0800
> Junxiao Bi <junxiao.bi@oracle.com> wrote:
>
> > When manager thread detects new array, it will invoke manage_new().
> > For imsm array, it will further invoke imsm_open_new(). Since
> > commit bbab0940fa75("imsm: write bad block log on metadata sync"),
> > it preallocates bad block log when opening the array, that requires
> > increasing the mpb buffer size.
> > For that, imsm_open_new() invokes function
> > imsm_update_metadata_locally(), which first uses
> > imsm_prepare_update() to allocate a larger mpb buffer and store it at
> > "mpb->next_buf", and then invoke imsm_process_update() to copy the
> > content from current mpb buffer "mpb->buf" to "mpb->next_buf", and
> > then free the current mpb buffer and set the new buffer as current.
> >
> > There is a small race window, when monitor thread is syncing metadata,
> > it gets current buffer pointer in
> > imsm_sync_metadata()->write_super_imsm(), but before flushing the
> > buffer to disk, manager thread does above switching buffer which
> > frees current buffer, then monitor thread will run into
> > use-after-free issue and could cause on-disk metadata corruption. If
> > system keeps running, further metadata update could fix the
> > corruption, because after switching buffer, the new buffer will
> > contain good metadata, but if panic/power cycle happens while disk
> > metadata is corrupted, the system will run into bootup failure if
> > array is used as root, otherwise the array can not be assembled after
> > boot if not used as root.
> >
> > This issue will not happen for imsm array with only one member array,
> > because the memory array has not be opened yet, monitor thread will
> > not do any metadata updates.
> > This can happen for imsm array with at lease two member array, in the
> > following two scenarios:
> > 1. Restarting mdmon process with at least two member array
> > This will happen during system boot up or user restart mdmon after
> > mdadm upgrade
> > 2. Adding new member array to exist imsm array with at least one
> > member array.
> >
> > To fix this, delay the switching buffer operation to monitor thread.
> >
> > Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")
> > Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> > ---
> > v2 <- v1:
> >  - address code style in manage_new()
> >  - make lines of git log not over 75 characters
> >
> >  managemon.c   | 10 ++++++++--
> >  super-intel.c | 14 +++++++++++---
> >  2 files changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/managemon.c b/managemon.c
> > index d79813282457..74b64bfc9613 100644
> > --- a/managemon.c
> > +++ b/managemon.c
> > @@ -721,11 +721,12 @@ static void manage_new(struct mdstat_ent
> > *mdstat,
> >        * the monitor.
> >        */
> >
> > +     struct metadata_update *update = NULL;
> >       struct active_array *new = NULL;
> >       struct mdinfo *mdi = NULL, *di;
> > -     int i, inst;
> > -     int failed = 0;
> >       char buf[SYSFS_MAX_BUF_SIZE];
> > +     int failed = 0;
> > +     int i, inst;
> >
> >       /* check if array is ready to be monitored */
> >       if (!mdstat->active || !mdstat->level)
> > @@ -824,9 +825,14 @@ static void manage_new(struct mdstat_ent *mdstat,
> >       /* if everything checks out tell the metadata handler we
> > want to
> >        * manage this instance
> >        */
> > +     container->update_tail = &update;
> >       if (!aa_ready(new) || container->ss->open_new(container,
> > new, inst) < 0) {
> > +             container->update_tail = NULL;
> >               goto error;
> >       } else {
> > +             if (update)
> > +                     queue_metadata_update(update);
> > +             container->update_tail = NULL;
> >               replace_array(container, victim, new);
> >               if (failed) {
> >                       new->check_degraded = 1;
> > diff --git a/super-intel.c b/super-intel.c
> > index cab841980830..4988eef191da 100644
> > --- a/super-intel.c
> > +++ b/super-intel.c
> > @@ -8467,12 +8467,15 @@ static int imsm_count_failed(struct
> > intel_super *super, struct imsm_dev *dev, return failed;
> >  }
> >
> > +static int imsm_prepare_update(struct supertype *st,
> > +                            struct metadata_update *update);
> >  static int imsm_open_new(struct supertype *c, struct active_array *a,
> >                        int inst)
> >  {
> >       struct intel_super *super = c->sb;
> >       struct imsm_super *mpb = super->anchor;
> > -     struct imsm_update_prealloc_bb_mem u;
> > +     struct imsm_update_prealloc_bb_mem *u;
> > +     struct metadata_update mu;
> >
> >       if (inst >= mpb->num_raid_devs) {
> >               pr_err("subarry index %d, out of range\n", inst);
> > @@ -8482,8 +8485,13 @@ static int imsm_open_new(struct supertype *c,
> > struct active_array *a, dprintf("imsm: open_new %d\n", inst);
> >       a->info.container_member = inst;
> >
> > -     u.type = update_prealloc_badblocks_mem;
> > -     imsm_update_metadata_locally(c, &u, sizeof(u));
> > +     u = xmalloc(sizeof(*u));
> > +     u->type = update_prealloc_badblocks_mem;
> > +     mu.len = sizeof(*u);
> > +     mu.buf = (char *)u;
> > +     imsm_prepare_update(c, &mu);
> > +     if (c->update_tail)
> > +             append_metadata_update(c, u, sizeof(*u));
> >
> >       return 0;
> >  }
>
> Hi Junxiao,
>
> LGTM, Approved
>
> Thanks,
> Blazej
>

Hi Blazej

Have you updated the PR
https://github.com/md-raid-utilities/mdadm/pull/152 with this V2
version?

Regards
Xiao


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array
  2025-02-27  6:48   ` Xiao Ni
@ 2025-02-28  9:38     ` Blazej Kucman
  2025-02-28 15:56       ` Junxiao Bi
  0 siblings, 1 reply; 7+ messages in thread
From: Blazej Kucman @ 2025-02-28  9:38 UTC (permalink / raw)
  To: Xiao Ni; +Cc: Junxiao Bi, mtkaczyk, linux-raid, ncroxon, song, yukuai

On Thu, 27 Feb 2025 14:48:17 +0800
Xiao Ni <xni@redhat.com> wrote:

> On Mon, Feb 24, 2025 at 9:16 PM Blazej Kucman
> <blazej.kucman@linux.intel.com> wrote:
> >
> > On Tue, 18 Feb 2025 10:48:31 -0800
> > Junxiao Bi <junxiao.bi@oracle.com> wrote:
> >  
> > > When manager thread detects new array, it will invoke
> > > manage_new(). For imsm array, it will further invoke
> > > imsm_open_new(). Since commit bbab0940fa75("imsm: write bad block
> > > log on metadata sync"), it preallocates bad block log when
> > > opening the array, that requires increasing the mpb buffer size.
> > > For that, imsm_open_new() invokes function
> > > imsm_update_metadata_locally(), which first uses
> > > imsm_prepare_update() to allocate a larger mpb buffer and store
> > > it at "mpb->next_buf", and then invoke imsm_process_update() to
> > > copy the content from current mpb buffer "mpb->buf" to
> > > "mpb->next_buf", and then free the current mpb buffer and set the
> > > new buffer as current.
> > >
> > > There is a small race window, when monitor thread is syncing
> > > metadata, it gets current buffer pointer in
> > > imsm_sync_metadata()->write_super_imsm(), but before flushing the
> > > buffer to disk, manager thread does above switching buffer which
> > > frees current buffer, then monitor thread will run into
> > > use-after-free issue and could cause on-disk metadata corruption.
> > > If system keeps running, further metadata update could fix the
> > > corruption, because after switching buffer, the new buffer will
> > > contain good metadata, but if panic/power cycle happens while disk
> > > metadata is corrupted, the system will run into bootup failure if
> > > array is used as root, otherwise the array can not be assembled
> > > after boot if not used as root.
> > >
> > > This issue will not happen for imsm array with only one member
> > > array, because the memory array has not be opened yet, monitor
> > > thread will not do any metadata updates.
> > > This can happen for imsm array with at lease two member array, in
> > > the following two scenarios:
> > > 1. Restarting mdmon process with at least two member array
> > > This will happen during system boot up or user restart mdmon after
> > > mdadm upgrade
> > > 2. Adding new member array to exist imsm array with at least one
> > > member array.
> > >
> > > To fix this, delay the switching buffer operation to monitor
> > > thread.
> > >
> > > Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")
> > > Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> > > ---
> > > v2 <- v1:
> > >  - address code style in manage_new()
> > >  - make lines of git log not over 75 characters
> > >
> > >  managemon.c   | 10 ++++++++--
> > >  super-intel.c | 14 +++++++++++---
> > >  2 files changed, 19 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/managemon.c b/managemon.c
> > > index d79813282457..74b64bfc9613 100644
> > > --- a/managemon.c
> > > +++ b/managemon.c
> > > @@ -721,11 +721,12 @@ static void manage_new(struct mdstat_ent
> > > *mdstat,
> > >        * the monitor.
> > >        */
> > >
> > > +     struct metadata_update *update = NULL;
> > >       struct active_array *new = NULL;
> > >       struct mdinfo *mdi = NULL, *di;
> > > -     int i, inst;
> > > -     int failed = 0;
> > >       char buf[SYSFS_MAX_BUF_SIZE];
> > > +     int failed = 0;
> > > +     int i, inst;
> > >
> > >       /* check if array is ready to be monitored */
> > >       if (!mdstat->active || !mdstat->level)
> > > @@ -824,9 +825,14 @@ static void manage_new(struct mdstat_ent
> > > *mdstat, /* if everything checks out tell the metadata handler we
> > > want to
> > >        * manage this instance
> > >        */
> > > +     container->update_tail = &update;
> > >       if (!aa_ready(new) || container->ss->open_new(container,
> > > new, inst) < 0) {
> > > +             container->update_tail = NULL;
> > >               goto error;
> > >       } else {
> > > +             if (update)
> > > +                     queue_metadata_update(update);
> > > +             container->update_tail = NULL;
> > >               replace_array(container, victim, new);
> > >               if (failed) {
> > >                       new->check_degraded = 1;
> > > diff --git a/super-intel.c b/super-intel.c
> > > index cab841980830..4988eef191da 100644
> > > --- a/super-intel.c
> > > +++ b/super-intel.c
> > > @@ -8467,12 +8467,15 @@ static int imsm_count_failed(struct
> > > intel_super *super, struct imsm_dev *dev, return failed;
> > >  }
> > >
> > > +static int imsm_prepare_update(struct supertype *st,
> > > +                            struct metadata_update *update);
> > >  static int imsm_open_new(struct supertype *c, struct
> > > active_array *a, int inst)
> > >  {
> > >       struct intel_super *super = c->sb;
> > >       struct imsm_super *mpb = super->anchor;
> > > -     struct imsm_update_prealloc_bb_mem u;
> > > +     struct imsm_update_prealloc_bb_mem *u;
> > > +     struct metadata_update mu;
> > >
> > >       if (inst >= mpb->num_raid_devs) {
> > >               pr_err("subarry index %d, out of range\n", inst);
> > > @@ -8482,8 +8485,13 @@ static int imsm_open_new(struct supertype
> > > *c, struct active_array *a, dprintf("imsm: open_new %d\n", inst);
> > >       a->info.container_member = inst;
> > >
> > > -     u.type = update_prealloc_badblocks_mem;
> > > -     imsm_update_metadata_locally(c, &u, sizeof(u));
> > > +     u = xmalloc(sizeof(*u));
> > > +     u->type = update_prealloc_badblocks_mem;
> > > +     mu.len = sizeof(*u);
> > > +     mu.buf = (char *)u;
> > > +     imsm_prepare_update(c, &mu);
> > > +     if (c->update_tail)
> > > +             append_metadata_update(c, u, sizeof(*u));
> > >
> > >       return 0;
> > >  }  
> >
> > Hi Junxiao,
> >
> > LGTM, Approved
> >
> > Thanks,
> > Blazej
> >  
> 
> Hi Blazej
> 
> Have you updated the PR
> https://github.com/md-raid-utilities/mdadm/pull/152 with this V2
> version?
> 
> Regards
> Xiao
> 
Hi,

Yes, the test result is the same as V1 but there is one note from
checkpatch

WARNING:BAD_FIXES_TAG: Please use correct Fixes: style 'Fixes: <12
chars of sha1> ("<title line>")' - ie: 'Fixes: ca847037fafb ("[V2]mdmon: imsm: fix metadata corruption when managing new array")'
#42:
Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")

Regards,
Blazej

> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array
  2025-02-28  9:38     ` Blazej Kucman
@ 2025-02-28 15:56       ` Junxiao Bi
  2025-03-03 11:15         ` Blazej Kucman
  0 siblings, 1 reply; 7+ messages in thread
From: Junxiao Bi @ 2025-02-28 15:56 UTC (permalink / raw)
  To: Blazej Kucman
  Cc: Xiao Ni, mtkaczyk@kernel.org, linux-raid@vger.kernel.org,
	ncroxon@redhat.com, song@kernel.org, yukuai@kernel.org



> On Feb 28, 2025, at 1:38 AM, Blazej Kucman <blazej.kucman@linux.intel.com> wrote:
> 
> On Thu, 27 Feb 2025 14:48:17 +0800
> Xiao Ni <xni@redhat.com> wrote:
> 
>>> On Mon, Feb 24, 2025 at 9:16 PM Blazej Kucman
>>> <blazej.kucman@linux.intel.com> wrote:
>>> 
>>> On Tue, 18 Feb 2025 10:48:31 -0800
>>> Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>> 
>>>> When manager thread detects new array, it will invoke
>>>> manage_new(). For imsm array, it will further invoke
>>>> imsm_open_new(). Since commit bbab0940fa75("imsm: write bad block
>>>> log on metadata sync"), it preallocates bad block log when
>>>> opening the array, that requires increasing the mpb buffer size.
>>>> For that, imsm_open_new() invokes function
>>>> imsm_update_metadata_locally(), which first uses
>>>> imsm_prepare_update() to allocate a larger mpb buffer and store
>>>> it at "mpb->next_buf", and then invoke imsm_process_update() to
>>>> copy the content from current mpb buffer "mpb->buf" to
>>>> "mpb->next_buf", and then free the current mpb buffer and set the
>>>> new buffer as current.
>>>> 
>>>> There is a small race window, when monitor thread is syncing
>>>> metadata, it gets current buffer pointer in
>>>> imsm_sync_metadata()->write_super_imsm(), but before flushing the
>>>> buffer to disk, manager thread does above switching buffer which
>>>> frees current buffer, then monitor thread will run into
>>>> use-after-free issue and could cause on-disk metadata corruption.
>>>> If system keeps running, further metadata update could fix the
>>>> corruption, because after switching buffer, the new buffer will
>>>> contain good metadata, but if panic/power cycle happens while disk
>>>> metadata is corrupted, the system will run into bootup failure if
>>>> array is used as root, otherwise the array can not be assembled
>>>> after boot if not used as root.
>>>> 
>>>> This issue will not happen for imsm array with only one member
>>>> array, because the memory array has not be opened yet, monitor
>>>> thread will not do any metadata updates.
>>>> This can happen for imsm array with at lease two member array, in
>>>> the following two scenarios:
>>>> 1. Restarting mdmon process with at least two member array
>>>> This will happen during system boot up or user restart mdmon after
>>>> mdadm upgrade
>>>> 2. Adding new member array to exist imsm array with at least one
>>>> member array.
>>>> 
>>>> To fix this, delay the switching buffer operation to monitor
>>>> thread.
>>>> 
>>>> Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")
>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> ---
>>>> v2 <- v1:
>>>> - address code style in manage_new()
>>>> - make lines of git log not over 75 characters
>>>> 
>>>> managemon.c   | 10 ++++++++--
>>>> super-intel.c | 14 +++++++++++---
>>>> 2 files changed, 19 insertions(+), 5 deletions(-)
>>>> 
>>>> diff --git a/managemon.c b/managemon.c
>>>> index d79813282457..74b64bfc9613 100644
>>>> --- a/managemon.c
>>>> +++ b/managemon.c
>>>> @@ -721,11 +721,12 @@ static void manage_new(struct mdstat_ent
>>>> *mdstat,
>>>>       * the monitor.
>>>>       */
>>>> 
>>>> +     struct metadata_update *update = NULL;
>>>>      struct active_array *new = NULL;
>>>>      struct mdinfo *mdi = NULL, *di;
>>>> -     int i, inst;
>>>> -     int failed = 0;
>>>>      char buf[SYSFS_MAX_BUF_SIZE];
>>>> +     int failed = 0;
>>>> +     int i, inst;
>>>> 
>>>>      /* check if array is ready to be monitored */
>>>>      if (!mdstat->active || !mdstat->level)
>>>> @@ -824,9 +825,14 @@ static void manage_new(struct mdstat_ent
>>>> *mdstat, /* if everything checks out tell the metadata handler we
>>>> want to
>>>>       * manage this instance
>>>>       */
>>>> +     container->update_tail = &update;
>>>>      if (!aa_ready(new) || container->ss->open_new(container,
>>>> new, inst) < 0) {
>>>> +             container->update_tail = NULL;
>>>>              goto error;
>>>>      } else {
>>>> +             if (update)
>>>> +                     queue_metadata_update(update);
>>>> +             container->update_tail = NULL;
>>>>              replace_array(container, victim, new);
>>>>              if (failed) {
>>>>                      new->check_degraded = 1;
>>>> diff --git a/super-intel.c b/super-intel.c
>>>> index cab841980830..4988eef191da 100644
>>>> --- a/super-intel.c
>>>> +++ b/super-intel.c
>>>> @@ -8467,12 +8467,15 @@ static int imsm_count_failed(struct
>>>> intel_super *super, struct imsm_dev *dev, return failed;
>>>> }
>>>> 
>>>> +static int imsm_prepare_update(struct supertype *st,
>>>> +                            struct metadata_update *update);
>>>> static int imsm_open_new(struct supertype *c, struct
>>>> active_array *a, int inst)
>>>> {
>>>>      struct intel_super *super = c->sb;
>>>>      struct imsm_super *mpb = super->anchor;
>>>> -     struct imsm_update_prealloc_bb_mem u;
>>>> +     struct imsm_update_prealloc_bb_mem *u;
>>>> +     struct metadata_update mu;
>>>> 
>>>>      if (inst >= mpb->num_raid_devs) {
>>>>              pr_err("subarry index %d, out of range\n", inst);
>>>> @@ -8482,8 +8485,13 @@ static int imsm_open_new(struct supertype
>>>> *c, struct active_array *a, dprintf("imsm: open_new %d\n", inst);
>>>>      a->info.container_member = inst;
>>>> 
>>>> -     u.type = update_prealloc_badblocks_mem;
>>>> -     imsm_update_metadata_locally(c, &u, sizeof(u));
>>>> +     u = xmalloc(sizeof(*u));
>>>> +     u->type = update_prealloc_badblocks_mem;
>>>> +     mu.len = sizeof(*u);
>>>> +     mu.buf = (char *)u;
>>>> +     imsm_prepare_update(c, &mu);
>>>> +     if (c->update_tail)
>>>> +             append_metadata_update(c, u, sizeof(*u));
>>>> 
>>>>      return 0;
>>>> }  
>>> 
>>> Hi Junxiao,
>>> 
>>> LGTM, Approved
>>> 
>>> Thanks,
>>> Blazej
>>> 
>> 
>> Hi Blazej
>> 
>> Have you updated the PR
>> https://github.com/md-raid-utilities/mdadm/pull/152 with this V2
>> version?
>> 
>> Regards
>> Xiao
>> 
> Hi,
> 
> Yes, the test result is the same as V1 but there is one note from
> checkpatch
> 
> WARNING:BAD_FIXES_TAG: Please use correct Fixes: style 'Fixes: <12
> chars of sha1> ("<title line>")' - ie: 'Fixes: ca847037fafb ("[V2]mdmon: imsm: fix metadata corruption when managing new array")'
> #42:
> Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")
If you don’t mind, just add a space between commit id and subject should fix it, the checkpatch seemed too strict on this.

Thanks,
Junxiao
> 
> Regards,
> Blazej
> 
>> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array
  2025-02-28 15:56       ` Junxiao Bi
@ 2025-03-03 11:15         ` Blazej Kucman
  2025-03-03 14:17           ` Mariusz Tkaczyk
  0 siblings, 1 reply; 7+ messages in thread
From: Blazej Kucman @ 2025-03-03 11:15 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: Xiao Ni, mtkaczyk@kernel.org, linux-raid@vger.kernel.org,
	ncroxon@redhat.com, song@kernel.org, yukuai@kernel.org

On Fri, 28 Feb 2025 15:56:03 +0000
Junxiao Bi <junxiao.bi@oracle.com> wrote:

> > On Feb 28, 2025, at 1:38 AM, Blazej Kucman
> > <blazej.kucman@linux.intel.com> wrote:
> > 
> > On Thu, 27 Feb 2025 14:48:17 +0800
> > Xiao Ni <xni@redhat.com> wrote:
> >   
> >>> On Mon, Feb 24, 2025 at 9:16 PM Blazej Kucman
> >>> <blazej.kucman@linux.intel.com> wrote:
> >>> 
> >>> On Tue, 18 Feb 2025 10:48:31 -0800
> >>> Junxiao Bi <junxiao.bi@oracle.com> wrote:
> >>>   
> >>>> When manager thread detects new array, it will invoke
> >>>> manage_new(). For imsm array, it will further invoke
> >>>> imsm_open_new(). Since commit bbab0940fa75("imsm: write bad block
> >>>> log on metadata sync"), it preallocates bad block log when
> >>>> opening the array, that requires increasing the mpb buffer size.
> >>>> For that, imsm_open_new() invokes function
> >>>> imsm_update_metadata_locally(), which first uses
> >>>> imsm_prepare_update() to allocate a larger mpb buffer and store
> >>>> it at "mpb->next_buf", and then invoke imsm_process_update() to
> >>>> copy the content from current mpb buffer "mpb->buf" to
> >>>> "mpb->next_buf", and then free the current mpb buffer and set the
> >>>> new buffer as current.
> >>>> 
> >>>> There is a small race window, when monitor thread is syncing
> >>>> metadata, it gets current buffer pointer in
> >>>> imsm_sync_metadata()->write_super_imsm(), but before flushing the
> >>>> buffer to disk, manager thread does above switching buffer which
> >>>> frees current buffer, then monitor thread will run into
> >>>> use-after-free issue and could cause on-disk metadata corruption.
> >>>> If system keeps running, further metadata update could fix the
> >>>> corruption, because after switching buffer, the new buffer will
> >>>> contain good metadata, but if panic/power cycle happens while
> >>>> disk metadata is corrupted, the system will run into bootup
> >>>> failure if array is used as root, otherwise the array can not be
> >>>> assembled after boot if not used as root.
> >>>> 
> >>>> This issue will not happen for imsm array with only one member
> >>>> array, because the memory array has not be opened yet, monitor
> >>>> thread will not do any metadata updates.
> >>>> This can happen for imsm array with at lease two member array, in
> >>>> the following two scenarios:
> >>>> 1. Restarting mdmon process with at least two member array
> >>>> This will happen during system boot up or user restart mdmon
> >>>> after mdadm upgrade
> >>>> 2. Adding new member array to exist imsm array with at least one
> >>>> member array.
> >>>> 
> >>>> To fix this, delay the switching buffer operation to monitor
> >>>> thread.
> >>>> 
> >>>> Fixes: bbab0940fa75("imsm: write bad block log on metadata sync")
> >>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> >>>> ---
> >>>> v2 <- v1:
> >>>> - address code style in manage_new()
> >>>> - make lines of git log not over 75 characters
> >>>> 
> >>>> managemon.c   | 10 ++++++++--
> >>>> super-intel.c | 14 +++++++++++---
> >>>> 2 files changed, 19 insertions(+), 5 deletions(-)
> >>>> 
> >>>> diff --git a/managemon.c b/managemon.c
> >>>> index d79813282457..74b64bfc9613 100644
> >>>> --- a/managemon.c
> >>>> +++ b/managemon.c
> >>>> @@ -721,11 +721,12 @@ static void manage_new(struct mdstat_ent
> >>>> *mdstat,
> >>>>       * the monitor.
> >>>>       */
> >>>> 
> >>>> +     struct metadata_update *update = NULL;
> >>>>      struct active_array *new = NULL;
> >>>>      struct mdinfo *mdi = NULL, *di;
> >>>> -     int i, inst;
> >>>> -     int failed = 0;
> >>>>      char buf[SYSFS_MAX_BUF_SIZE];
> >>>> +     int failed = 0;
> >>>> +     int i, inst;
> >>>> 
> >>>>      /* check if array is ready to be monitored */
> >>>>      if (!mdstat->active || !mdstat->level)
> >>>> @@ -824,9 +825,14 @@ static void manage_new(struct mdstat_ent
> >>>> *mdstat, /* if everything checks out tell the metadata handler we
> >>>> want to
> >>>>       * manage this instance
> >>>>       */
> >>>> +     container->update_tail = &update;
> >>>>      if (!aa_ready(new) || container->ss->open_new(container,
> >>>> new, inst) < 0) {
> >>>> +             container->update_tail = NULL;
> >>>>              goto error;
> >>>>      } else {
> >>>> +             if (update)
> >>>> +                     queue_metadata_update(update);
> >>>> +             container->update_tail = NULL;
> >>>>              replace_array(container, victim, new);
> >>>>              if (failed) {
> >>>>                      new->check_degraded = 1;
> >>>> diff --git a/super-intel.c b/super-intel.c
> >>>> index cab841980830..4988eef191da 100644
> >>>> --- a/super-intel.c
> >>>> +++ b/super-intel.c
> >>>> @@ -8467,12 +8467,15 @@ static int imsm_count_failed(struct
> >>>> intel_super *super, struct imsm_dev *dev, return failed;
> >>>> }
> >>>> 
> >>>> +static int imsm_prepare_update(struct supertype *st,
> >>>> +                            struct metadata_update *update);
> >>>> static int imsm_open_new(struct supertype *c, struct
> >>>> active_array *a, int inst)
> >>>> {
> >>>>      struct intel_super *super = c->sb;
> >>>>      struct imsm_super *mpb = super->anchor;
> >>>> -     struct imsm_update_prealloc_bb_mem u;
> >>>> +     struct imsm_update_prealloc_bb_mem *u;
> >>>> +     struct metadata_update mu;
> >>>> 
> >>>>      if (inst >= mpb->num_raid_devs) {
> >>>>              pr_err("subarry index %d, out of range\n", inst);
> >>>> @@ -8482,8 +8485,13 @@ static int imsm_open_new(struct supertype
> >>>> *c, struct active_array *a, dprintf("imsm: open_new %d\n", inst);
> >>>>      a->info.container_member = inst;
> >>>> 
> >>>> -     u.type = update_prealloc_badblocks_mem;
> >>>> -     imsm_update_metadata_locally(c, &u, sizeof(u));
> >>>> +     u = xmalloc(sizeof(*u));
> >>>> +     u->type = update_prealloc_badblocks_mem;
> >>>> +     mu.len = sizeof(*u);
> >>>> +     mu.buf = (char *)u;
> >>>> +     imsm_prepare_update(c, &mu);
> >>>> +     if (c->update_tail)
> >>>> +             append_metadata_update(c, u, sizeof(*u));
> >>>> 
> >>>>      return 0;
> >>>> }    
> >>> 
> >>> Hi Junxiao,
> >>> 
> >>> LGTM, Approved
> >>> 
> >>> Thanks,
> >>> Blazej
> >>>   
> >> 
> >> Hi Blazej
> >> 
> >> Have you updated the PR
> >> https://github.com/md-raid-utilities/mdadm/pull/152 with this V2
> >> version?
> >> 
> >> Regards
> >> Xiao
> >>   
> > Hi,
> > 
> > Yes, the test result is the same as V1 but there is one note from
> > checkpatch
> > 
> > WARNING:BAD_FIXES_TAG: Please use correct Fixes: style 'Fixes: <12
> > chars of sha1> ("<title line>")' - ie: 'Fixes: ca847037fafb
> > ("[V2]mdmon: imsm: fix metadata corruption when managing new
> > array")' #42: Fixes: bbab0940fa75("imsm: write bad block log on
> > metadata sync")  
> If you don’t mind, just add a space between commit id and subject
> should fix it, the checkpatch seemed too strict on this.

Hi, 
I corrected it in PR.

Regards,
Blazej

> 
> Thanks,
> Junxiao
> > 
> > Regards,
> > Blazej
> >   
> >>   
> >   


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array
  2025-03-03 11:15         ` Blazej Kucman
@ 2025-03-03 14:17           ` Mariusz Tkaczyk
  0 siblings, 0 replies; 7+ messages in thread
From: Mariusz Tkaczyk @ 2025-03-03 14:17 UTC (permalink / raw)
  To: Blazej Kucman
  Cc: Junxiao Bi, Xiao Ni, linux-raid@vger.kernel.org,
	ncroxon@redhat.com, song@kernel.org, yukuai@kernel.org

On Mon, 3 Mar 2025 12:15:35 +0100
Blazej Kucman <blazej.kucman@linux.intel.com> wrote:

> Hi, 
> I corrected it in PR.
> 

Thanks Blazej for handling it, merged.

Mariusz

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-03-03 14:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-18 18:48 [PATCH V2] mdmon: imsm: fix metadata corruption when managing new array Junxiao Bi
2025-02-24 13:15 ` Blazej Kucman
2025-02-27  6:48   ` Xiao Ni
2025-02-28  9:38     ` Blazej Kucman
2025-02-28 15:56       ` Junxiao Bi
2025-03-03 11:15         ` Blazej Kucman
2025-03-03 14:17           ` Mariusz Tkaczyk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).