All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] multipath-tools fixes
@ 2024-11-25 14:32 Martin Wilck
  2024-11-25 14:32 ` [PATCH 1/3] libmultipath: fix handling of pp->pgindex Martin Wilck
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Martin Wilck @ 2024-11-25 14:32 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

The first patch is an attempt to fix
https://github.com/opensvc/multipath-tools/issues/105.
The second fixes an issue that occured to me while working on the first one.
The third is a minor cleanup.

Martin Wilck (3):
  libmultipath: fix handling of pp->pgindex
  libmultipath: pgcmp(): compare number of paths
  libmultipath: move pathcmp() to configure.c

 libmultipath/configure.c   | 20 ++++++++++++++++++++
 libmultipath/pgpolicies.c  |  6 ++++++
 libmultipath/structs.c     | 31 +++++++++++--------------------
 libmultipath/structs.h     |  1 -
 libmultipath/structs_vec.c |  8 ++++++++
 5 files changed, 45 insertions(+), 21 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] libmultipath: fix handling of pp->pgindex
  2024-11-25 14:32 [PATCH 0/3] multipath-tools fixes Martin Wilck
@ 2024-11-25 14:32 ` Martin Wilck
  2024-11-25 20:31   ` Benjamin Marzinski
  2024-11-25 14:32 ` [PATCH 2/3] libmultipath: pgcmp(): compare number of paths Martin Wilck
  2024-11-25 14:32 ` [PATCH 3/3] libmultipath: move pathcmp() to configure.c Martin Wilck
  2 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2024-11-25 14:32 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

pp->pgindex is set in disassemble_map() when a map is parsed.
There are various possiblities for this index to become invalid.
pp->pgindex is only used in enable_group() and followover_should_fallback(),
and both callers take no action if it is 0, which is the right
thing to do if we don't know the path's pathgroup.

Make sure pp->pgindex is reset to 0 in various places:
- when it's orphaned,
- before (re)grouping paths,
- when we detect a bad mpp assignment in update_pathvec_from_dm().

The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but
because we're looping over pg->paths in the former and over pg->pgp in
the latter, I think it's better too play safe.

Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/pgpolicies.c  |  6 ++++++
 libmultipath/structs.c     | 12 +++++++++++-
 libmultipath/structs_vec.c |  8 ++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/libmultipath/pgpolicies.c b/libmultipath/pgpolicies.c
index edc3c61..23ef2bd 100644
--- a/libmultipath/pgpolicies.c
+++ b/libmultipath/pgpolicies.c
@@ -127,6 +127,8 @@ fail:
 int group_paths(struct multipath *mp, int marginal_pathgroups)
 {
 	vector normal, marginal;
+	struct path *pp;
+	int i;
 
 	if (!mp->pg)
 		mp->pg = vector_alloc();
@@ -138,6 +140,10 @@ int group_paths(struct multipath *mp, int marginal_pathgroups)
 	if (!mp->pgpolicyfn)
 		goto fail;
 
+	/* Reset pgindex, we're going to invalidate it */
+	vector_foreach_slot(mp->paths, pp, i)
+		pp->pgindex = 0;
+
 	if (!marginal_pathgroups ||
 	    split_marginal_paths(mp->paths, &normal, &marginal) != 0) {
 		if (mp->pgpolicyfn(mp, mp->paths) != 0)
diff --git a/libmultipath/structs.c b/libmultipath/structs.c
index 61c8f32..4851725 100644
--- a/libmultipath/structs.c
+++ b/libmultipath/structs.c
@@ -239,8 +239,18 @@ free_pgvec (vector pgvec, enum free_path_mode free_paths)
 	if (!pgvec)
 		return;
 
-	vector_foreach_slot(pgvec, pgp, i)
+	vector_foreach_slot(pgvec, pgp, i) {
+
+		/* paths are going to be re-grouped, reset pgindex */
+		if (free_paths != FREE_PATHS) {
+			struct path *pp;
+			int j;
+
+			vector_foreach_slot(pgp->paths, pp, j)
+				pp->pgindex = 0;
+		}
 		free_pathgroup(pgp, free_paths);
+	}
 
 	vector_free(pgvec);
 }
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index d22056c..64c5ad5 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -139,6 +139,13 @@ static bool update_pathvec_from_dm(vector pathvec, struct multipath *mpp,
 				must_reload = true;
 				dm_fail_path(mpp->alias, pp->dev_t);
 				vector_del_slot(pgp->paths, j--);
+				/*
+				 * pp->pgindex has been set in disassemble_map(),
+				 * which has probably been called just before for
+				 * mpp. So he pgindex relates to mpp and may be
+				 * wrong for pp->mpp. Invalidate it.
+				 */
+				pp->pgindex = 0;
 				continue;
 			}
 			pp->mpp = mpp;
@@ -354,6 +361,7 @@ void orphan_path(struct path *pp, const char *reason)
 {
 	condlog(3, "%s: orphan path, %s", pp->dev, reason);
 	pp->mpp = NULL;
+	pp->pgindex = 0;
 	uninitialize_path(pp);
 }
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] libmultipath: pgcmp(): compare number of paths
  2024-11-25 14:32 [PATCH 0/3] multipath-tools fixes Martin Wilck
  2024-11-25 14:32 ` [PATCH 1/3] libmultipath: fix handling of pp->pgindex Martin Wilck
@ 2024-11-25 14:32 ` Martin Wilck
  2024-11-25 21:06   ` Benjamin Marzinski
  2024-11-25 14:32 ` [PATCH 3/3] libmultipath: move pathcmp() to configure.c Martin Wilck
  2 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2024-11-25 14:32 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

pathcmp() makes sure that all paths in pgp have a match in cpgp, but not
vice-versa. Check the number of paths, too.

Fixes: 90773ba ("libmultipath: resolve hash collisions in pgcmp()")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/configure.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index d0e9c95..55140d0 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -442,6 +442,7 @@ pgcmp (struct multipath * mpp, struct multipath * cmpp)
 
 		vector_foreach_slot (cmpp->pg, cpgp, j) {
 			if (pgp->id == cpgp->id &&
+			    VECTOR_SIZE(pgp->paths) == VECTOR_SIZE(cpgp->paths) &&
 			    !pathcmp(pgp, cpgp)) {
 				r = 0;
 				break;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] libmultipath: move pathcmp() to configure.c
  2024-11-25 14:32 [PATCH 0/3] multipath-tools fixes Martin Wilck
  2024-11-25 14:32 ` [PATCH 1/3] libmultipath: fix handling of pp->pgindex Martin Wilck
  2024-11-25 14:32 ` [PATCH 2/3] libmultipath: pgcmp(): compare number of paths Martin Wilck
@ 2024-11-25 14:32 ` Martin Wilck
  2 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2024-11-25 14:32 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

... as it has only one caller, and make it static. No functional changes.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/configure.c | 19 +++++++++++++++++++
 libmultipath/structs.c   | 19 -------------------
 libmultipath/structs.h   |  1 -
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index 55140d0..6f79a4f 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -426,6 +426,25 @@ compute_pgid(struct pathgroup * pgp)
 		pgp->id ^= (long)pp;
 }
 
+static int pathcmp(const struct pathgroup *pgp, const struct pathgroup *cpgp)
+{
+	int i, j;
+	struct path *pp, *cpp;
+	int pnum = 0, found = 0;
+
+	vector_foreach_slot(pgp->paths, pp, i) {
+		pnum++;
+		vector_foreach_slot(cpgp->paths, cpp, j) {
+			if ((long)pp == (long)cpp) {
+				found++;
+				break;
+			}
+		}
+	}
+
+	return pnum - found;
+}
+
 static int
 pgcmp (struct multipath * mpp, struct multipath * cmpp)
 {
diff --git a/libmultipath/structs.c b/libmultipath/structs.c
index 4851725..dfa547b 100644
--- a/libmultipath/structs.c
+++ b/libmultipath/structs.c
@@ -627,25 +627,6 @@ int count_active_pending_paths(const struct multipath *mpp)
 	return do_pathcount(mpp, states, 3);
 }
 
-int pathcmp(const struct pathgroup *pgp, const struct pathgroup *cpgp)
-{
-	int i, j;
-	struct path *pp, *cpp;
-	int pnum = 0, found = 0;
-
-	vector_foreach_slot(pgp->paths, pp, i) {
-		pnum++;
-		vector_foreach_slot(cpgp->paths, cpp, j) {
-			if ((long)pp == (long)cpp) {
-				found++;
-				break;
-			}
-		}
-	}
-
-	return pnum - found;
-}
-
 struct path *
 first_path (const struct multipath * mpp)
 {
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 4821f19..49d9a2f 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -598,7 +598,6 @@ struct path *mp_find_path_by_devt(const struct multipath *mpp, const char *devt)
 int pathcount (const struct multipath *, int);
 int count_active_paths(const struct multipath *);
 int count_active_pending_paths(const struct multipath *);
-int pathcmp (const struct pathgroup *, const struct pathgroup *);
 int add_feature (char **, const char *);
 int remove_feature (char **, const char *);
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] libmultipath: fix handling of pp->pgindex
  2024-11-25 14:32 ` [PATCH 1/3] libmultipath: fix handling of pp->pgindex Martin Wilck
@ 2024-11-25 20:31   ` Benjamin Marzinski
  2024-11-26 11:12     ` Martin Wilck
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Marzinski @ 2024-11-25 20:31 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, dm-devel, Martin Wilck

On Mon, Nov 25, 2024 at 03:32:22PM +0100, Martin Wilck wrote:
> pp->pgindex is set in disassemble_map() when a map is parsed.
> There are various possiblities for this index to become invalid.
> pp->pgindex is only used in enable_group() and followover_should_fallback(),
> and both callers take no action if it is 0, which is the right
> thing to do if we don't know the path's pathgroup.
> 
> Make sure pp->pgindex is reset to 0 in various places:
> - when it's orphaned,
> - before (re)grouping paths,
> - when we detect a bad mpp assignment in update_pathvec_from_dm().
> 
> The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but
> because we're looping over pg->paths in the former and over pg->pgp in
> the latter, I think it's better too play safe.

I'm not sure this will always fix issue #105. Perhaps I'm overlooking
the connection in the code and this just needs a more explicit
explanation to aid clueless reviewers, but here's what I don't get.

As far as I can see, the only change here that would effect the reported
issue is the change to update_pathvec_from_dm(). While I totally agree
that if you have a path that appears to be two multipath devices, you
can't trust pp->pgindex, I believe there's also another problem.

Say you have a multipath device with two path groups (pgp1 and pgp2)
each with one path (pp1 in pgp1 and pp2 in pgp2). In this case,
pp1->pgindex == 1 and pp2->pgindex == 2. If update_pathvec_from_dm()
discovers that pp1 is part of another multipath device, and removes it,
that will mean that pgp1 is now empty, so update_pathvec_from_dm() will
also remove that (at the delete_pg label).  But pp2->pgindex will still
be set to 2 even though there's only one path group, so it will now
point off the end of the pgp list. Right?

At any rate, updating pgindex seems finicky and perhaps we should just
drop it. It's not that much work to scan the path groups for the path in
enable_group() and since we're already reading through the path groups
in followover_should_failback(), we can just refactor the code a little
bit to avoid needing pgindex at all.

-Ben
 
> Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/pgpolicies.c  |  6 ++++++
>  libmultipath/structs.c     | 12 +++++++++++-
>  libmultipath/structs_vec.c |  8 ++++++++
>  3 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/libmultipath/pgpolicies.c b/libmultipath/pgpolicies.c
> index edc3c61..23ef2bd 100644
> --- a/libmultipath/pgpolicies.c
> +++ b/libmultipath/pgpolicies.c
> @@ -127,6 +127,8 @@ fail:
>  int group_paths(struct multipath *mp, int marginal_pathgroups)
>  {
>  	vector normal, marginal;
> +	struct path *pp;
> +	int i;
>  
>  	if (!mp->pg)
>  		mp->pg = vector_alloc();
> @@ -138,6 +140,10 @@ int group_paths(struct multipath *mp, int marginal_pathgroups)
>  	if (!mp->pgpolicyfn)
>  		goto fail;
>  
> +	/* Reset pgindex, we're going to invalidate it */
> +	vector_foreach_slot(mp->paths, pp, i)
> +		pp->pgindex = 0;
> +
>  	if (!marginal_pathgroups ||
>  	    split_marginal_paths(mp->paths, &normal, &marginal) != 0) {
>  		if (mp->pgpolicyfn(mp, mp->paths) != 0)
> diff --git a/libmultipath/structs.c b/libmultipath/structs.c
> index 61c8f32..4851725 100644
> --- a/libmultipath/structs.c
> +++ b/libmultipath/structs.c
> @@ -239,8 +239,18 @@ free_pgvec (vector pgvec, enum free_path_mode free_paths)
>  	if (!pgvec)
>  		return;
>  
> -	vector_foreach_slot(pgvec, pgp, i)
> +	vector_foreach_slot(pgvec, pgp, i) {
> +
> +		/* paths are going to be re-grouped, reset pgindex */
> +		if (free_paths != FREE_PATHS) {
> +			struct path *pp;
> +			int j;
> +
> +			vector_foreach_slot(pgp->paths, pp, j)
> +				pp->pgindex = 0;
> +		}
>  		free_pathgroup(pgp, free_paths);
> +	}
>  
>  	vector_free(pgvec);
>  }
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index d22056c..64c5ad5 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -139,6 +139,13 @@ static bool update_pathvec_from_dm(vector pathvec, struct multipath *mpp,
>  				must_reload = true;
>  				dm_fail_path(mpp->alias, pp->dev_t);
>  				vector_del_slot(pgp->paths, j--);
> +				/*
> +				 * pp->pgindex has been set in disassemble_map(),
> +				 * which has probably been called just before for
> +				 * mpp. So he pgindex relates to mpp and may be
> +				 * wrong for pp->mpp. Invalidate it.
> +				 */
> +				pp->pgindex = 0;
>  				continue;
>  			}
>  			pp->mpp = mpp;
> @@ -354,6 +361,7 @@ void orphan_path(struct path *pp, const char *reason)
>  {
>  	condlog(3, "%s: orphan path, %s", pp->dev, reason);
>  	pp->mpp = NULL;
> +	pp->pgindex = 0;
>  	uninitialize_path(pp);
>  }
>  
> -- 
> 2.47.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] libmultipath: pgcmp(): compare number of paths
  2024-11-25 14:32 ` [PATCH 2/3] libmultipath: pgcmp(): compare number of paths Martin Wilck
@ 2024-11-25 21:06   ` Benjamin Marzinski
  2024-11-26 11:15     ` Martin Wilck
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Marzinski @ 2024-11-25 21:06 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, dm-devel, Martin Wilck

On Mon, Nov 25, 2024 at 03:32:23PM +0100, Martin Wilck wrote:
> pathcmp() makes sure that all paths in pgp have a match in cpgp, but not
> vice-versa. Check the number of paths, too.

This looks fine. But looking at it made we a nervous about cpgp->id.  We
only calculate that in pgcmp() (and just for mpp, not cmpp) and in
disassemble_map(). But we clearly can have pathgroup changes after that,
as the last patch has shown. To be safe, we should either skip the whole
pgp->id thing (it's not a huge time savings) or recalculate it for all
of cmpp's path groups before we start the loops in pgcmp().

-Ben

> 
> Fixes: 90773ba ("libmultipath: resolve hash collisions in pgcmp()")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/configure.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libmultipath/configure.c b/libmultipath/configure.c
> index d0e9c95..55140d0 100644
> --- a/libmultipath/configure.c
> +++ b/libmultipath/configure.c
> @@ -442,6 +442,7 @@ pgcmp (struct multipath * mpp, struct multipath * cmpp)
>  
>  		vector_foreach_slot (cmpp->pg, cpgp, j) {
>  			if (pgp->id == cpgp->id &&
> +			    VECTOR_SIZE(pgp->paths) == VECTOR_SIZE(cpgp->paths) &&
>  			    !pathcmp(pgp, cpgp)) {
>  				r = 0;
>  				break;
> -- 
> 2.47.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] libmultipath: fix handling of pp->pgindex
  2024-11-25 20:31   ` Benjamin Marzinski
@ 2024-11-26 11:12     ` Martin Wilck
  2024-11-26 17:41       ` Benjamin Marzinski
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2024-11-26 11:12 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, dm-devel

On Mon, 2024-11-25 at 15:31 -0500, Benjamin Marzinski wrote:
> On Mon, Nov 25, 2024 at 03:32:22PM +0100, Martin Wilck wrote:
> > pp->pgindex is set in disassemble_map() when a map is parsed.
> > There are various possiblities for this index to become invalid.
> > pp->pgindex is only used in enable_group() and
> > followover_should_fallback(),
> > and both callers take no action if it is 0, which is the right
> > thing to do if we don't know the path's pathgroup.
> > 
> > Make sure pp->pgindex is reset to 0 in various places:
> > - when it's orphaned,
> > - before (re)grouping paths,
> > - when we detect a bad mpp assignment in update_pathvec_from_dm().
> > 
> > The hunk in group_paths is mostly redundant with the hunk in
> > free_pgvec(), but
> > because we're looping over pg->paths in the former and over pg->pgp
> > in
> > the latter, I think it's better too play safe.
> 
> I'm not sure this will always fix issue #105. Perhaps I'm overlooking
> the connection in the code and this just needs a more explicit
> explanation to aid clueless reviewers, but here's what I don't get.
> 
> As far as I can see, the only change here that would effect the
> reported
> issue is the change to update_pathvec_from_dm(). While I totally
> agree
> that if you have a path that appears to be two multipath devices, you
> can't trust pp->pgindex, I believe there's also another problem.
> 
> Say you have a multipath device with two path groups (pgp1 and pgp2)
> each with one path (pp1 in pgp1 and pp2 in pgp2). In this case,
> pp1->pgindex == 1 and pp2->pgindex == 2. If update_pathvec_from_dm()
> discovers that pp1 is part of another multipath device, and removes
> it,
> that will mean that pgp1 is now empty, so update_pathvec_from_dm()
> will
> also remove that (at the delete_pg label).  But pp2->pgindex will
> still
> be set to 2 even though there's only one path group, so it will now
> point off the end of the pgp list. Right?

Right, thanks for pointing this out. If we delete a pg, we need to
invalidate all pgindex values for all paths in the map. We can't
decrement them, because they must match kernel indices (see below).

> At any rate, updating pgindex seems finicky and perhaps we should
> just
> drop it. It's not that much work to scan the path groups for the path
> in
> enable_group() and since we're already reading through the path
> groups
> in followover_should_failback(), we can just refactor the code a
> little
> bit to avoid needing pgindex at all.

I had similar thoughts, but I was looking for a minimal fix for the
0.11.0 release. Probably dropping pgindex for good is the right thing
to do, but is it 0.11.0 material?

We'd still have the problem that enable_group() et al. need a pgindex
value that matches the kernel configuration. We can only be sure that
this index is correct if disassemble_map() has just set it. Any change
we apply in multipathd's data structures won't match the kernel's view
of the map.

This problem is subtle, AFAICS. It's related to the fact that if
update_pathvec_from_dm() finds any inconsistencies, we *should* reload
the map, possibly multiple times, until these inconsistencies have been
resolved. But we can't do that in all code paths in which this function
is called (or at least, we haven't been able to prove that we can
without risking breakage).

I have vague ideas how this could be solved, but they would require a
major code restructuring.

Regards,
Martin


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] libmultipath: pgcmp(): compare number of paths
  2024-11-25 21:06   ` Benjamin Marzinski
@ 2024-11-26 11:15     ` Martin Wilck
  2024-11-26 11:17       ` Martin Wilck
  2024-11-26 17:43       ` Benjamin Marzinski
  0 siblings, 2 replies; 11+ messages in thread
From: Martin Wilck @ 2024-11-26 11:15 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, dm-devel

On Mon, 2024-11-25 at 16:06 -0500, Benjamin Marzinski wrote:
> On Mon, Nov 25, 2024 at 03:32:23PM +0100, Martin Wilck wrote:
> > pathcmp() makes sure that all paths in pgp have a match in cpgp,
> > but not
> > vice-versa. Check the number of paths, too.
> 
> This looks fine. But looking at it made we a nervous about cpgp->id. 
> We
> only calculate that in pgcmp() (and just for mpp, not cmpp) and in
> disassemble_map(). But we clearly can have pathgroup changes after
> that,
> as the last patch has shown. To be safe, we should either skip the
> whole
> pgp->id thing (it's not a huge time savings) or recalculate it for
> all
> of cmpp's path groups before we start the loops in pgcmp().

Agreed. Like before, I am wondering if we should drop pgp->id in 0.11.0
already or postpone.

IMO my patch is a valid fix for a minor issue which needs redesign in a
future release.

Martin


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] libmultipath: pgcmp(): compare number of paths
  2024-11-26 11:15     ` Martin Wilck
@ 2024-11-26 11:17       ` Martin Wilck
  2024-11-26 17:43       ` Benjamin Marzinski
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2024-11-26 11:17 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, dm-devel

On Tue, 2024-11-26 at 12:15 +0100, Martin Wilck wrote:
> 
> IMO my patch is a valid fix for a minor issue which needs redesign in
> a
> future release.

I meant to say:

IMO my patch is a valid fix for a minor issue in a flawed design which
needs redesign in a future release.

Sorry,
Martin



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] libmultipath: fix handling of pp->pgindex
  2024-11-26 11:12     ` Martin Wilck
@ 2024-11-26 17:41       ` Benjamin Marzinski
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2024-11-26 17:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, dm-devel

On Tue, Nov 26, 2024 at 12:12:16PM +0100, Martin Wilck wrote:
> On Mon, 2024-11-25 at 15:31 -0500, Benjamin Marzinski wrote:
> > On Mon, Nov 25, 2024 at 03:32:22PM +0100, Martin Wilck wrote:
> > > pp->pgindex is set in disassemble_map() when a map is parsed.
> > > There are various possiblities for this index to become invalid.
> > > pp->pgindex is only used in enable_group() and
> > > followover_should_fallback(),
> > > and both callers take no action if it is 0, which is the right
> > > thing to do if we don't know the path's pathgroup.
> > > 
> > > Make sure pp->pgindex is reset to 0 in various places:
> > > - when it's orphaned,
> > > - before (re)grouping paths,
> > > - when we detect a bad mpp assignment in update_pathvec_from_dm().
> > > 
> > > The hunk in group_paths is mostly redundant with the hunk in
> > > free_pgvec(), but
> > > because we're looping over pg->paths in the former and over pg->pgp
> > > in
> > > the latter, I think it's better too play safe.
> > 
> > I'm not sure this will always fix issue #105. Perhaps I'm overlooking
> > the connection in the code and this just needs a more explicit
> > explanation to aid clueless reviewers, but here's what I don't get.
> > 
> > As far as I can see, the only change here that would effect the
> > reported
> > issue is the change to update_pathvec_from_dm(). While I totally
> > agree
> > that if you have a path that appears to be two multipath devices, you
> > can't trust pp->pgindex, I believe there's also another problem.
> > 
> > Say you have a multipath device with two path groups (pgp1 and pgp2)
> > each with one path (pp1 in pgp1 and pp2 in pgp2). In this case,
> > pp1->pgindex == 1 and pp2->pgindex == 2. If update_pathvec_from_dm()
> > discovers that pp1 is part of another multipath device, and removes
> > it,
> > that will mean that pgp1 is now empty, so update_pathvec_from_dm()
> > will
> > also remove that (at the delete_pg label).  But pp2->pgindex will
> > still
> > be set to 2 even though there's only one path group, so it will now
> > point off the end of the pgp list. Right?
> 
> Right, thanks for pointing this out. If we delete a pg, we need to
> invalidate all pgindex values for all paths in the map. We can't
> decrement them, because they must match kernel indices (see below).
> 
> > At any rate, updating pgindex seems finicky and perhaps we should
> > just
> > drop it. It's not that much work to scan the path groups for the path
> > in
> > enable_group() and since we're already reading through the path
> > groups
> > in followover_should_failback(), we can just refactor the code a
> > little
> > bit to avoid needing pgindex at all.
> 
> I had similar thoughts, but I was looking for a minimal fix for the
> 0.11.0 release. Probably dropping pgindex for good is the right thing
> to do, but is it 0.11.0 material?

I'd be fine with removing it, but it's your call. I have nothing against
your approach for 0.11.0 and then a more involved fix afterwards. 
 
> We'd still have the problem that enable_group() et al. need a pgindex
> value that matches the kernel configuration. We can only be sure that
> this index is correct if disassemble_map() has just set it. Any change
> we apply in multipathd's data structures won't match the kernel's view
> of the map.
> 
> This problem is subtle, AFAICS. It's related to the fact that if
> update_pathvec_from_dm() finds any inconsistencies, we *should* reload
> the map, possibly multiple times, until these inconsistencies have been
> resolved. But we can't do that in all code paths in which this function
> is called (or at least, we haven't been able to prove that we can
> without risking breakage).
> 
> I have vague ideas how this could be solved, but they would require a
> major code restructuring.

We could use mpp->need_reload to signal that pgindex was invalid. It
might still make sense to drop pgindex so we don't have to unset it
in cases like when a path is in two multipath devices. But if the
map needed reloading, we could just let the reload deal with adjusting
the path groups. The only problem is that we can set need_reload, but
not actually trigger that reload for a while. 

-Ben

> Regards,
> Martin


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] libmultipath: pgcmp(): compare number of paths
  2024-11-26 11:15     ` Martin Wilck
  2024-11-26 11:17       ` Martin Wilck
@ 2024-11-26 17:43       ` Benjamin Marzinski
  1 sibling, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2024-11-26 17:43 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, dm-devel

On Tue, Nov 26, 2024 at 12:15:06PM +0100, Martin Wilck wrote:
> On Mon, 2024-11-25 at 16:06 -0500, Benjamin Marzinski wrote:
> > On Mon, Nov 25, 2024 at 03:32:23PM +0100, Martin Wilck wrote:
> > > pathcmp() makes sure that all paths in pgp have a match in cpgp,
> > > but not
> > > vice-versa. Check the number of paths, too.
> > 
> > This looks fine. But looking at it made we a nervous about cpgp->id. 
> > We
> > only calculate that in pgcmp() (and just for mpp, not cmpp) and in
> > disassemble_map(). But we clearly can have pathgroup changes after
> > that,
> > as the last patch has shown. To be safe, we should either skip the
> > whole
> > pgp->id thing (it's not a huge time savings) or recalculate it for
> > all
> > of cmpp's path groups before we start the loops in pgcmp().
> 
> Agreed. Like before, I am wondering if we should drop pgp->id in 0.11.0
> already or postpone.
> 
> IMO my patch is a valid fix for a minor issue which needs redesign in a
> future release.

Yep. Like I said, your patch looks fine. I just noticed the gpg->id
issue while looking at it.

-Ben

> 
> Martin


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-11-26 17:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-25 14:32 [PATCH 0/3] multipath-tools fixes Martin Wilck
2024-11-25 14:32 ` [PATCH 1/3] libmultipath: fix handling of pp->pgindex Martin Wilck
2024-11-25 20:31   ` Benjamin Marzinski
2024-11-26 11:12     ` Martin Wilck
2024-11-26 17:41       ` Benjamin Marzinski
2024-11-25 14:32 ` [PATCH 2/3] libmultipath: pgcmp(): compare number of paths Martin Wilck
2024-11-25 21:06   ` Benjamin Marzinski
2024-11-26 11:15     ` Martin Wilck
2024-11-26 11:17       ` Martin Wilck
2024-11-26 17:43       ` Benjamin Marzinski
2024-11-25 14:32 ` [PATCH 3/3] libmultipath: move pathcmp() to configure.c Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.