public inbox for gfs2@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH 0/1] dlm_tools: support corosync3/knet multi-link
@ 2024-12-24  8:42 Heming Zhao
  2024-12-24  8:42 ` [PATCH v2 1/1] dlm_controld: " Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2024-12-24  8:42 UTC (permalink / raw)
  To: teigland, aahringo
  Cc: Heming Zhao, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2

This patch is to support corosync3/knet multi-link.

Changelog:
v1->v2: Added new code to support corosync2's 2-ring mode.

Heming Zhao (1):
  dlm_controld: support corosync3/knet multi-link

 dlm_controld/action.c     | 44 ++++++++++++++++++++++++++++++---------
 dlm_controld/dlm_daemon.h |  4 ++--
 dlm_sand/sand_internal.h  |  4 ++--
 3 files changed, 38 insertions(+), 14 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2024-12-24  8:42 [PATCH 0/1] dlm_tools: support corosync3/knet multi-link Heming Zhao
@ 2024-12-24  8:42 ` Heming Zhao
  2025-01-06 18:11   ` Alexander Aring
  0 siblings, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2024-12-24  8:42 UTC (permalink / raw)
  To: teigland, aahringo
  Cc: Heming Zhao, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2

The totem.rrp_mode config item was obsolete in corosync3. And
this patch gives dlm_controld the ability to detect multiple
links.

The corosync and dlm network protocol relationship table:

 -------------+-----------------------+---------------------
              | totem.transport=udpu  | totem.transport=udp
              +-----------------------+---------------------
 corosync 2.x |            |          |      multicast
              |   1-ring   | 2-ring   |---------------------
              |            |          |  default  | 2-ring
 -------------+------------+----------+---------------------
    dlm       |     tcp    | sctp     |   tcp     | sctp
 -------------+------------+----------+---------------------

 -------------+----------------------------+----------------------
              | totem.transport = udpu/udp | totem.transport=knet
 corosync 3.x |----------------------------+----------------------
              |      1-ring                | 1-link  | multi-links
 -------------+----------------------------+---------+-----------
    dlm       |       tcp                  |  tcp    | sctp
 -------------+----------------------------+---------+-----------

At last, this patch should be work with updated kernel dlm module.
Because the DLM_MAX_ADDR_COUNT is changed from 3 to 8.

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 dlm_controld/action.c     | 44 ++++++++++++++++++++++++++++++---------
 dlm_controld/dlm_daemon.h |  4 ++--
 dlm_sand/sand_internal.h  |  4 ++--
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/dlm_controld/action.c b/dlm_controld/action.c
index 60eb22a78c56..4b54a28707c6 100644
--- a/dlm_controld/action.c
+++ b/dlm_controld/action.c
@@ -20,12 +20,15 @@ static int comms_nodes_count;
 #define CLUSTER_DIR   "/sys/kernel/config/dlm/cluster"
 #define SPACES_DIR    "/sys/kernel/config/dlm/cluster/spaces"
 #define COMMS_DIR     "/sys/kernel/config/dlm/cluster/comms"
+#define CMAP_STR_LEN  27
 
 static int detect_protocol(void)
 {
 	cmap_handle_t handle;
 	char *str = NULL;
+	char key[CMAP_STR_LEN];
 	int rv, proto = -1;
+	int i, link = 0;
 
 	rv = cmap_initialize(&handle);
 	if (rv != CS_OK) {
@@ -33,19 +36,40 @@ static int detect_protocol(void)
 		return -1; 
 	}
 
-	rv = cmap_get_string(handle, "totem.rrp_mode", &str);
-	if (rv != CS_OK)
-		goto out;
-
-	log_debug("cmap totem.rrp_mode = '%s'", str);
+	for (i = 0; i < MAX_NODE_ADDRESSES; i++) {
+		snprintf(key, CMAP_STR_LEN, "nodelist.node.0.ring%d_addr", i);
+		rv = cmap_get_string(handle, key, &str);
+		if (rv != CS_OK) {
+			/* we only care the link number, ignore all error here */
+			log_debug("[%d] %s rv:%d", i, key, rv);
+			continue;
+		}
+		log_debug("[%d] %s : %s", i, key, str);
+		link++;
+		free(str);
+	}
 
-	if (!strcmp(str, "none"))
-		proto = PROTO_TCP;
-	else
+	if (link > 1)
 		proto = PROTO_SCTP;
- out:
-	if (str)
+	else if (link == 1)
+		proto = PROTO_TCP;
+
+	/*
+	 * Since corosync3 retains rrp_mode but allows any value,
+	 * we should precisely match the rrp_mode value (none,
+	 * active, passive) used in corosync2 env.
+	 */
+	rv = cmap_get_string(handle, "totem.rrp_mode", &str);
+	if (rv == CS_OK) {
+		log_debug("cmap totem.rrp_mode = '%s'", str);
+
+		if (!strcmp(str, "none"))
+			proto = PROTO_TCP;
+		else if (!strcmp(str, "active") || !strcmp(str, "passive"))
+			proto = PROTO_SCTP;
 		free(str);
+	}
+
 	cmap_finalize(handle);
 	return proto;
 }
diff --git a/dlm_controld/dlm_daemon.h b/dlm_controld/dlm_daemon.h
index 4a533e3451e2..3ed4e235a83c 100644
--- a/dlm_controld/dlm_daemon.h
+++ b/dlm_controld/dlm_daemon.h
@@ -176,9 +176,9 @@ EXTERN struct dlm_option dlm_options[dlm_options_max];
 #define MAX_NODES	128
 
 /* Maximum number of IP addresses per node, when using SCTP and multi-ring in
-   corosync  In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 3. */
+   corosync  In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 8. */
 
-#define MAX_NODE_ADDRESSES 4
+#define MAX_NODE_ADDRESSES 8
 
 #define PROTO_TCP  0
 #define PROTO_SCTP 1
diff --git a/dlm_sand/sand_internal.h b/dlm_sand/sand_internal.h
index 4c2fc0897051..c17287abcd47 100644
--- a/dlm_sand/sand_internal.h
+++ b/dlm_sand/sand_internal.h
@@ -138,9 +138,9 @@ EXTERN struct dlm_option dlm_options[dlm_options_max];
    Copied in libdlm.h so apps don't need to include the kernel header. */
 
 /* Maximum number of IP addresses per node, when using SCTP.
-   In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 3. */
+   In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 8. */
 
-#define MAX_NODE_ADDRESSES 4
+#define MAX_NODE_ADDRESSES 8
 
 #define PROTO_TCP  0
 #define PROTO_SCTP 1
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2024-12-24  8:42 ` [PATCH v2 1/1] dlm_controld: " Heming Zhao
@ 2025-01-06 18:11   ` Alexander Aring
  2025-01-07  4:59     ` Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Aring @ 2025-01-06 18:11 UTC (permalink / raw)
  To: Heming Zhao; +Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2

Hi Heming,

On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
>
> The totem.rrp_mode config item was obsolete in corosync3. And
> this patch gives dlm_controld the ability to detect multiple
> links.
>
> The corosync and dlm network protocol relationship table:
>
>  -------------+-----------------------+---------------------
>               | totem.transport=udpu  | totem.transport=udp
>               +-----------------------+---------------------
>  corosync 2.x |            |          |      multicast
>               |   1-ring   | 2-ring   |---------------------
>               |            |          |  default  | 2-ring
>  -------------+------------+----------+---------------------
>     dlm       |     tcp    | sctp     |   tcp     | sctp
>  -------------+------------+----------+---------------------
>
>  -------------+----------------------------+----------------------
>               | totem.transport = udpu/udp | totem.transport=knet
>  corosync 3.x |----------------------------+----------------------
>               |      1-ring                | 1-link  | multi-links
>  -------------+----------------------------+---------+-----------
>     dlm       |       tcp                  |  tcp    | sctp
>  -------------+----------------------------+---------+-----------
>
> At last, this patch should be work with updated kernel dlm module.

I am not getting why the network protocol configuration has anything
to do with the corosync configuration.
I know that we currently get the address configurations from corosync
but with this patch we are forced to use SCTP when corosync provides
more than one "ring" configuration?

Even with corosync3 it should be possible to use corosync in SCTP
(multiple rings) and the kernel dlm using TCP only, would this not be
possible with dlm_controld then?

Thanks.

- Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-06 18:11   ` Alexander Aring
@ 2025-01-07  4:59     ` Heming Zhao
  2025-01-08 15:54       ` Alexander Aring
  0 siblings, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2025-01-07  4:59 UTC (permalink / raw)
  To: Alexander Aring
  Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2,
	Roger Zhou

On 1/7/25 02:11, Alexander Aring wrote:
> Hi Heming,
> 
> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
>>
>> The totem.rrp_mode config item was obsolete in corosync3. And
>> this patch gives dlm_controld the ability to detect multiple
>> links.
>>
>> The corosync and dlm network protocol relationship table:
>>
>>   -------------+-----------------------+---------------------
>>                | totem.transport=udpu  | totem.transport=udp
>>                +-----------------------+---------------------
>>   corosync 2.x |            |          |      multicast
>>                |   1-ring   | 2-ring   |---------------------
>>                |            |          |  default  | 2-ring
>>   -------------+------------+----------+---------------------
>>      dlm       |     tcp    | sctp     |   tcp     | sctp
>>   -------------+------------+----------+---------------------
>>
>>   -------------+----------------------------+----------------------
>>                | totem.transport = udpu/udp | totem.transport=knet
>>   corosync 3.x |----------------------------+----------------------
>>                |      1-ring                | 1-link  | multi-links
>>   -------------+----------------------------+---------+-----------
>>      dlm       |       tcp                  |  tcp    | sctp
>>   -------------+----------------------------+---------+-----------
>>
>> At last, this patch should be work with updated kernel dlm module.
> 
> I am not getting why the network protocol configuration has anything
> to do with the corosync configuration.
> I know that we currently get the address configurations from corosync
> but with this patch we are forced to use SCTP when corosync provides
> more than one "ring" configuration?

Yes. this patch will force dlm to change to SCTP when corosync provides
more than one "ring".

The reason:
(without this patch) When a user sets up multi-links on corosync3
and corosync.conf with an incorrect or missing rrp_mode,
dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
an error.
Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
config item in the further. Therefore, the network protocol will
always be TCP.

> 
> Even with corosync3 it should be possible to use corosync in SCTP
> (multiple rings) and the kernel dlm using TCP only, would this not be
> possible with dlm_controld then?

Only one case for above case: corosync3 on single-link.
A new patch is needed for dlm to work over TCP when corosync3 in SCTP
(multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
-EINVAL when 'dlm_local_count > 1'.

A key point for dlm is that there is no way to get the corosync version.
This patch is compatible with corosync2 env. In corosync2, the user must
correctly config rrp_mode when using 2-ring.

i.e.:
In corosync2, change to 2-ring from 1-ring (whatever multicast mode).
There must include rrp_mode item, if not, error report:
   corosync[1284]:   [MAIN  ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none.
   corosync[1284]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1415.

Even more, since corosync3 isn't compatible with corosync2,
in my view, latest version of dlm_tools should only focus on corosync3
and drop corosync2 support. If any Linux distribution stay with
corosync2, they should choose an old version of dlm_tools.

- Heming

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-07  4:59     ` Heming Zhao
@ 2025-01-08 15:54       ` Alexander Aring
  2025-01-09  2:26         ` Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Aring @ 2025-01-08 15:54 UTC (permalink / raw)
  To: Heming Zhao
  Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2,
	Roger Zhou

Hi,

On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote:
>
> On 1/7/25 02:11, Alexander Aring wrote:
> > Hi Heming,
> >
> > On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
> >>
> >> The totem.rrp_mode config item was obsolete in corosync3. And
> >> this patch gives dlm_controld the ability to detect multiple
> >> links.
> >>
> >> The corosync and dlm network protocol relationship table:
> >>
> >>   -------------+-----------------------+---------------------
> >>                | totem.transport=udpu  | totem.transport=udp
> >>                +-----------------------+---------------------
> >>   corosync 2.x |            |          |      multicast
> >>                |   1-ring   | 2-ring   |---------------------
> >>                |            |          |  default  | 2-ring
> >>   -------------+------------+----------+---------------------
> >>      dlm       |     tcp    | sctp     |   tcp     | sctp
> >>   -------------+------------+----------+---------------------
> >>
> >>   -------------+----------------------------+----------------------
> >>                | totem.transport = udpu/udp | totem.transport=knet
> >>   corosync 3.x |----------------------------+----------------------
> >>                |      1-ring                | 1-link  | multi-links
> >>   -------------+----------------------------+---------+-----------
> >>      dlm       |       tcp                  |  tcp    | sctp
> >>   -------------+----------------------------+---------+-----------
> >>
> >> At last, this patch should be work with updated kernel dlm module.
> >
> > I am not getting why the network protocol configuration has anything
> > to do with the corosync configuration.
> > I know that we currently get the address configurations from corosync
> > but with this patch we are forced to use SCTP when corosync provides
> > more than one "ring" configuration?
>
> Yes. this patch will force dlm to change to SCTP when corosync provides
> more than one "ring".
>
> The reason:
> (without this patch) When a user sets up multi-links on corosync3
> and corosync.conf with an incorrect or missing rrp_mode,
> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
> an error.
> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
> config item in the further. Therefore, the network protocol will
> always be TCP.
>
> >
> > Even with corosync3 it should be possible to use corosync in SCTP
> > (multiple rings) and the kernel dlm using TCP only, would this not be
> > possible with dlm_controld then?
>
> Only one case for above case: corosync3 on single-link.
> A new patch is needed for dlm to work over TCP when corosync3 in SCTP
> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
> -EINVAL when 'dlm_local_count > 1'.
>

I think we should change that condition then.

> A key point for dlm is that there is no way to get the corosync version.
> This patch is compatible with corosync2 env. In corosync2, the user must
> correctly config rrp_mode when using 2-ring.
>

So far I looked into it, it is anyway for detecting a protocol
according to some Corosync functionality it should still be possible
to always force dlm_controld using a different protocol by setting the
right config values/parameters.

> i.e.:
> In corosync2, change to 2-ring from 1-ring (whatever multicast mode).
> There must include rrp_mode item, if not, error report:
>    corosync[1284]:   [MAIN  ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none.
>    corosync[1284]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1415.
>
> Even more, since corosync3 isn't compatible with corosync2,
> in my view, latest version of dlm_tools should only focus on corosync3
> and drop corosync2 support. If any Linux distribution stay with
> corosync2, they should choose an old version of dlm_tools.
>

Is dlm_tool not just a domain socket talking to dlm_controld? It does
not use any library of the Corosync project?

- Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-08 15:54       ` Alexander Aring
@ 2025-01-09  2:26         ` Heming Zhao
  2025-01-09 15:34           ` Alexander Aring
  0 siblings, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2025-01-09  2:26 UTC (permalink / raw)
  To: Alexander Aring
  Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2,
	Roger Zhou

On 1/8/25 23:54, Alexander Aring wrote:
> Hi,
> 
> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>
>> On 1/7/25 02:11, Alexander Aring wrote:
>>> Hi Heming,
>>>
>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>
>>>> The totem.rrp_mode config item was obsolete in corosync3. And
>>>> this patch gives dlm_controld the ability to detect multiple
>>>> links.
>>>>
>>>> The corosync and dlm network protocol relationship table:
>>>>
>>>>    -------------+-----------------------+---------------------
>>>>                 | totem.transport=udpu  | totem.transport=udp
>>>>                 +-----------------------+---------------------
>>>>    corosync 2.x |            |          |      multicast
>>>>                 |   1-ring   | 2-ring   |---------------------
>>>>                 |            |          |  default  | 2-ring
>>>>    -------------+------------+----------+---------------------
>>>>       dlm       |     tcp    | sctp     |   tcp     | sctp
>>>>    -------------+------------+----------+---------------------
>>>>
>>>>    -------------+----------------------------+----------------------
>>>>                 | totem.transport = udpu/udp | totem.transport=knet
>>>>    corosync 3.x |----------------------------+----------------------
>>>>                 |      1-ring                | 1-link  | multi-links
>>>>    -------------+----------------------------+---------+-----------
>>>>       dlm       |       tcp                  |  tcp    | sctp
>>>>    -------------+----------------------------+---------+-----------
>>>>
>>>> At last, this patch should be work with updated kernel dlm module.
>>>
>>> I am not getting why the network protocol configuration has anything
>>> to do with the corosync configuration.
>>> I know that we currently get the address configurations from corosync
>>> but with this patch we are forced to use SCTP when corosync provides
>>> more than one "ring" configuration?
>>
>> Yes. this patch will force dlm to change to SCTP when corosync provides
>> more than one "ring".
>>
>> The reason:
>> (without this patch) When a user sets up multi-links on corosync3
>> and corosync.conf with an incorrect or missing rrp_mode,
>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
>> an error.
>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
>> config item in the further. Therefore, the network protocol will
>> always be TCP.
>>
>>>
>>> Even with corosync3 it should be possible to use corosync in SCTP
>>> (multiple rings) and the kernel dlm using TCP only, would this not be
>>> possible with dlm_controld then?
>>
>> Only one case for above case: corosync3 on single-link.
>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP
>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
>> -EINVAL when 'dlm_local_count > 1'.
>>
> 
> I think we should change that condition then.
> 
>> A key point for dlm is that there is no way to get the corosync version.
>> This patch is compatible with corosync2 env. In corosync2, the user must
>> correctly config rrp_mode when using 2-ring.
>>
> 
> So far I looked into it, it is anyway for detecting a protocol
> according to some Corosync functionality it should still be possible
> to always force dlm_controld using a different protocol by setting the
> right config values/parameters.

Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass
the detection phase when its value is "tcp|sctp". But in general, dlm.conf
is seldom used.

Unfortunately, corosync doesn't provide the api.
ref: https://github.com/corosync/corosync/issues/771

> 
>> i.e.:
>> In corosync2, change to 2-ring from 1-ring (whatever multicast mode).
>> There must include rrp_mode item, if not, error report:
>>     corosync[1284]:   [MAIN  ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none.
>>     corosync[1284]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1415.
>>
>> Even more, since corosync3 isn't compatible with corosync2,
>> in my view, latest version of dlm_tools should only focus on corosync3
>> and drop corosync2 support. If any Linux distribution stay with
>> corosync2, they should choose an old version of dlm_tools.
>>
> 
> Is dlm_tool not just a domain socket talking to dlm_controld? It does
> not use any library of the Corosync project?

detect_protocol(@dlm_controld/action.c) uses the cmap APIs to get corosync
settings.
dlm_controld registers a callback in setup_cluster(), then corosync uses
quorum_callback/quorum_nodelist_callback to notify the dlm_controld daemon
about quorum status.

- Heming

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-09  2:26         ` Heming Zhao
@ 2025-01-09 15:34           ` Alexander Aring
  2025-01-09 15:38             ` christine caulfield
  2025-01-10 14:28             ` Heming Zhao
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Aring @ 2025-01-09 15:34 UTC (permalink / raw)
  To: Heming Zhao
  Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2,
	Roger Zhou

Hi Heming,

On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote:
>
> On 1/8/25 23:54, Alexander Aring wrote:
> > Hi,
> >
> > On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote:
> >>
> >> On 1/7/25 02:11, Alexander Aring wrote:
> >>> Hi Heming,
> >>>
> >>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
> >>>>
> >>>> The totem.rrp_mode config item was obsolete in corosync3. And
> >>>> this patch gives dlm_controld the ability to detect multiple
> >>>> links.
> >>>>
> >>>> The corosync and dlm network protocol relationship table:
> >>>>
> >>>>    -------------+-----------------------+---------------------
> >>>>                 | totem.transport=udpu  | totem.transport=udp
> >>>>                 +-----------------------+---------------------
> >>>>    corosync 2.x |            |          |      multicast
> >>>>                 |   1-ring   | 2-ring   |---------------------
> >>>>                 |            |          |  default  | 2-ring
> >>>>    -------------+------------+----------+---------------------
> >>>>       dlm       |     tcp    | sctp     |   tcp     | sctp
> >>>>    -------------+------------+----------+---------------------
> >>>>
> >>>>    -------------+----------------------------+----------------------
> >>>>                 | totem.transport = udpu/udp | totem.transport=knet
> >>>>    corosync 3.x |----------------------------+----------------------
> >>>>                 |      1-ring                | 1-link  | multi-links
> >>>>    -------------+----------------------------+---------+-----------
> >>>>       dlm       |       tcp                  |  tcp    | sctp
> >>>>    -------------+----------------------------+---------+-----------
> >>>>
> >>>> At last, this patch should be work with updated kernel dlm module.
> >>>
> >>> I am not getting why the network protocol configuration has anything
> >>> to do with the corosync configuration.
> >>> I know that we currently get the address configurations from corosync
> >>> but with this patch we are forced to use SCTP when corosync provides
> >>> more than one "ring" configuration?
> >>
> >> Yes. this patch will force dlm to change to SCTP when corosync provides
> >> more than one "ring".
> >>
> >> The reason:
> >> (without this patch) When a user sets up multi-links on corosync3
> >> and corosync.conf with an incorrect or missing rrp_mode,
> >> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
> >> an error.
> >> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
> >> config item in the further. Therefore, the network protocol will
> >> always be TCP.
> >>
> >>>
> >>> Even with corosync3 it should be possible to use corosync in SCTP
> >>> (multiple rings) and the kernel dlm using TCP only, would this not be
> >>> possible with dlm_controld then?
> >>
> >> Only one case for above case: corosync3 on single-link.
> >> A new patch is needed for dlm to work over TCP when corosync3 in SCTP
> >> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
> >> -EINVAL when 'dlm_local_count > 1'.
> >>
> >
> > I think we should change that condition then.
> >
> >> A key point for dlm is that there is no way to get the corosync version.
> >> This patch is compatible with corosync2 env. In corosync2, the user must
> >> correctly config rrp_mode when using 2-ring.
> >>
> >
> > So far I looked into it, it is anyway for detecting a protocol
> > according to some Corosync functionality it should still be possible
> > to always force dlm_controld using a different protocol by setting the
> > right config values/parameters.
>
> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass
> the detection phase when its value is "tcp|sctp". But in general, dlm.conf
> is seldom used.
>
> Unfortunately, corosync doesn't provide the api.
> ref: https://github.com/corosync/corosync/issues/771

I have the following scenario in my head with detect_protocol().

Currently, if somebody uses knet with UDP and has multiple
"nodelist.node.0.ring%d_addr" defined in Corosync but does not set
"totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as
a parameter (it will use detect_protocol()"), then the DLM kernel will
use TCP.

After your patch the behaviour will be changed and the DLM kernel will
use SCTP with the same configuration as before?

- Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-09 15:34           ` Alexander Aring
@ 2025-01-09 15:38             ` christine caulfield
  2025-01-10 14:28             ` Heming Zhao
  1 sibling, 0 replies; 16+ messages in thread
From: christine caulfield @ 2025-01-09 15:38 UTC (permalink / raw)
  To: Alexander Aring, Heming Zhao
  Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou



On 09/01/2025 15:34, Alexander Aring wrote:
> Hi Heming,
> 
> On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>
>> On 1/8/25 23:54, Alexander Aring wrote:
>>> Hi,
>>>
>>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>
>>>> On 1/7/25 02:11, Alexander Aring wrote:
>>>>> Hi Heming,
>>>>>
>>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>>>
>>>>>> The totem.rrp_mode config item was obsolete in corosync3. And
>>>>>> this patch gives dlm_controld the ability to detect multiple
>>>>>> links.
>>>>>>
>>>>>> The corosync and dlm network protocol relationship table:
>>>>>>
>>>>>>     -------------+-----------------------+---------------------
>>>>>>                  | totem.transport=udpu  | totem.transport=udp
>>>>>>                  +-----------------------+---------------------
>>>>>>     corosync 2.x |            |          |      multicast
>>>>>>                  |   1-ring   | 2-ring   |---------------------
>>>>>>                  |            |          |  default  | 2-ring
>>>>>>     -------------+------------+----------+---------------------
>>>>>>        dlm       |     tcp    | sctp     |   tcp     | sctp
>>>>>>     -------------+------------+----------+---------------------
>>>>>>
>>>>>>     -------------+----------------------------+----------------------
>>>>>>                  | totem.transport = udpu/udp | totem.transport=knet
>>>>>>     corosync 3.x |----------------------------+----------------------
>>>>>>                  |      1-ring                | 1-link  | multi-links
>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>        dlm       |       tcp                  |  tcp    | sctp
>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>
>>>>>> At last, this patch should be work with updated kernel dlm module.
>>>>>
>>>>> I am not getting why the network protocol configuration has anything
>>>>> to do with the corosync configuration.
>>>>> I know that we currently get the address configurations from corosync
>>>>> but with this patch we are forced to use SCTP when corosync provides
>>>>> more than one "ring" configuration?
>>>>
>>>> Yes. this patch will force dlm to change to SCTP when corosync provides
>>>> more than one "ring".
>>>>
>>>> The reason:
>>>> (without this patch) When a user sets up multi-links on corosync3
>>>> and corosync.conf with an incorrect or missing rrp_mode,
>>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
>>>> an error.
>>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
>>>> config item in the further. Therefore, the network protocol will
>>>> always be TCP.
>>>>
>>>>>
>>>>> Even with corosync3 it should be possible to use corosync in SCTP
>>>>> (multiple rings) and the kernel dlm using TCP only, would this not be
>>>>> possible with dlm_controld then?
>>>>
>>>> Only one case for above case: corosync3 on single-link.
>>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP
>>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
>>>> -EINVAL when 'dlm_local_count > 1'.
>>>>
>>>
>>> I think we should change that condition then.
>>>
>>>> A key point for dlm is that there is no way to get the corosync version.
>>>> This patch is compatible with corosync2 env. In corosync2, the user must
>>>> correctly config rrp_mode when using 2-ring.
>>>>
>>>
>>> So far I looked into it, it is anyway for detecting a protocol
>>> according to some Corosync functionality it should still be possible
>>> to always force dlm_controld using a different protocol by setting the
>>> right config values/parameters.
>>
>> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass
>> the detection phase when its value is "tcp|sctp". But in general, dlm.conf
>> is seldom used.
>>
>> Unfortunately, corosync doesn't provide the api.
>> ref: https://github.com/corosync/corosync/issues/771
> 
> I have the following scenario in my head with detect_protocol().
> 
> Currently, if somebody uses knet with UDP and has multiple


Just a quick intervention here. If corosync is using knet, then it 
doesn't matter if knet's protocol is UDP or SCTP, it can still provide 
up to 8 links. There is no point in you checking that value (also, 
different links can use different protocols).

Yes, check corosync's transport: key for knet/udp/udpu - THAT will tell 
you whether multi-link is available.


So, the logic goes:

multilink = false
if totem.rrp_mode == 1 then multilink = true
if totem.transport == 'knet' then multilink = true

That should cover it.

Chrissie



> "nodelist.node.0.ring%d_addr" defined in Corosync but does not set
> "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as
> a parameter (it will use detect_protocol()"), then the DLM kernel will
> use TCP.
> 
> After your patch the behaviour will be changed and the DLM kernel will
> use SCTP with the same configuration as before?
> 
> - Alex
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-09 15:34           ` Alexander Aring
  2025-01-09 15:38             ` christine caulfield
@ 2025-01-10 14:28             ` Heming Zhao
  2025-01-10 14:43               ` christine caulfield
  1 sibling, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2025-01-10 14:28 UTC (permalink / raw)
  To: Alexander Aring
  Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2,
	Roger Zhou

On 1/9/25 23:34, Alexander Aring wrote:
> Hi Heming,
> 
> On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>
>> On 1/8/25 23:54, Alexander Aring wrote:
>>> Hi,
>>>
>>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>
>>>> On 1/7/25 02:11, Alexander Aring wrote:
>>>>> Hi Heming,
>>>>>
>>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>>>
>>>>>> The totem.rrp_mode config item was obsolete in corosync3. And
>>>>>> this patch gives dlm_controld the ability to detect multiple
>>>>>> links.
>>>>>>
>>>>>> The corosync and dlm network protocol relationship table:
>>>>>>
>>>>>>     -------------+-----------------------+---------------------
>>>>>>                  | totem.transport=udpu  | totem.transport=udp
>>>>>>                  +-----------------------+---------------------
>>>>>>     corosync 2.x |            |          |      multicast
>>>>>>                  |   1-ring   | 2-ring   |---------------------
>>>>>>                  |            |          |  default  | 2-ring
>>>>>>     -------------+------------+----------+---------------------
>>>>>>        dlm       |     tcp    | sctp     |   tcp     | sctp
>>>>>>     -------------+------------+----------+---------------------
>>>>>>
>>>>>>     -------------+----------------------------+----------------------
>>>>>>                  | totem.transport = udpu/udp | totem.transport=knet
>>>>>>     corosync 3.x |----------------------------+----------------------
>>>>>>                  |      1-ring                | 1-link  | multi-links
>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>        dlm       |       tcp                  |  tcp    | sctp
>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>
>>>>>> At last, this patch should be work with updated kernel dlm module.
>>>>>
>>>>> I am not getting why the network protocol configuration has anything
>>>>> to do with the corosync configuration.
>>>>> I know that we currently get the address configurations from corosync
>>>>> but with this patch we are forced to use SCTP when corosync provides
>>>>> more than one "ring" configuration?
>>>>
>>>> Yes. this patch will force dlm to change to SCTP when corosync provides
>>>> more than one "ring".
>>>>
>>>> The reason:
>>>> (without this patch) When a user sets up multi-links on corosync3
>>>> and corosync.conf with an incorrect or missing rrp_mode,
>>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
>>>> an error.
>>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
>>>> config item in the further. Therefore, the network protocol will
>>>> always be TCP.
>>>>
>>>>>
>>>>> Even with corosync3 it should be possible to use corosync in SCTP
>>>>> (multiple rings) and the kernel dlm using TCP only, would this not be
>>>>> possible with dlm_controld then?
>>>>
>>>> Only one case for above case: corosync3 on single-link.
>>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP
>>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
>>>> -EINVAL when 'dlm_local_count > 1'.
>>>>
>>>
>>> I think we should change that condition then.
>>>
>>>> A key point for dlm is that there is no way to get the corosync version.
>>>> This patch is compatible with corosync2 env. In corosync2, the user must
>>>> correctly config rrp_mode when using 2-ring.
>>>>
>>>
>>> So far I looked into it, it is anyway for detecting a protocol
>>> according to some Corosync functionality it should still be possible
>>> to always force dlm_controld using a different protocol by setting the
>>> right config values/parameters.
>>
>> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass
>> the detection phase when its value is "tcp|sctp". But in general, dlm.conf
>> is seldom used.
>>
>> Unfortunately, corosync doesn't provide the api.
>> ref: https://github.com/corosync/corosync/issues/771
> 
> I have the following scenario in my head with detect_protocol().
> 
> Currently, if somebody uses knet with UDP and has multiple
> "nodelist.node.0.ring%d_addr" defined in Corosync but does not set
> "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as
> a parameter (it will use detect_protocol()"), then the DLM kernel will
> use TCP.

Since you wrote knet above, so the corosync version is 3.x.
For your description, there are four points/places to notice.

1. The above setting never works in the SUSE HA stack.

The reason I wrote in the previous mail is that corosync will report error:
> corosync[1284]:   [MAIN  ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none.

2. (you are right) DLM kernel will uses TCP

If corosync doesn't complain that the rrp_mode is missing.
The current code (without my patch), dlm_tools func detect_protocol()
returns '-1', which makes the DLM kernel use TCP.

3. DLM kernel module doesn't work

current code (without my patch), DLM kernel dlm_tcp_listen_validate()
will return -EINVAL when 'dlm_local_count > 1'.

4. Corosync using UDP/SCTP is transparent for dlm.

UDP/UDPU just means corosync is under single-link. this is one
rule of corosync 3.x.
knet means corosync is under multi-link. there may be only one
link present, or up to 8 links present.

> 
> After your patch the behaviour will be changed and the DLM kernel will
> use SCTP with the same configuration as before?
> 

According to the corosync/dlm behaviour in SUSE HA stack
(ref above 4 points), my patch:
- corosync 3.x env, forces the dlm to use TCP when only one link exists.
- corosync 3.x env, forces the dlm kernel to use SCTP when more than
   one link exists.
- keep the same behaviour when running corosync 2.x. (since user must
   correctly set rrp_mode).


/Heming

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-10 14:28             ` Heming Zhao
@ 2025-01-10 14:43               ` christine caulfield
  2025-01-13  3:12                 ` Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: christine caulfield @ 2025-01-10 14:43 UTC (permalink / raw)
  To: Heming Zhao, Alexander Aring
  Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou



On 10/01/2025 14:28, Heming Zhao wrote:
> On 1/9/25 23:34, Alexander Aring wrote:
>> Hi Heming,
>>
>> On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>
>>> On 1/8/25 23:54, Alexander Aring wrote:
>>>> Hi,
>>>>
>>>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> 
>>>> wrote:
>>>>>
>>>>> On 1/7/25 02:11, Alexander Aring wrote:
>>>>>> Hi Heming,
>>>>>>
>>>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> 
>>>>>> wrote:
>>>>>>>
>>>>>>> The totem.rrp_mode config item was obsolete in corosync3. And
>>>>>>> this patch gives dlm_controld the ability to detect multiple
>>>>>>> links.
>>>>>>>
>>>>>>> The corosync and dlm network protocol relationship table:
>>>>>>>
>>>>>>>     -------------+-----------------------+---------------------
>>>>>>>                  | totem.transport=udpu  | totem.transport=udp
>>>>>>>                  +-----------------------+---------------------
>>>>>>>     corosync 2.x |            |          |      multicast
>>>>>>>                  |   1-ring   | 2-ring   |---------------------
>>>>>>>                  |            |          |  default  | 2-ring
>>>>>>>     -------------+------------+----------+---------------------
>>>>>>>        dlm       |     tcp    | sctp     |   tcp     | sctp
>>>>>>>     -------------+------------+----------+---------------------
>>>>>>>
>>>>>>>     -------------+---------------------------- 
>>>>>>> +----------------------
>>>>>>>                  | totem.transport = udpu/udp | totem.transport=knet
>>>>>>>     corosync 3.x |---------------------------- 
>>>>>>> +----------------------
>>>>>>>                  |      1-ring                | 1-link  | multi- 
>>>>>>> links
>>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>>        dlm       |       tcp                  |  tcp    | sctp
>>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>>
>>>>>>> At last, this patch should be work with updated kernel dlm module.
>>>>>>
>>>>>> I am not getting why the network protocol configuration has anything
>>>>>> to do with the corosync configuration.
>>>>>> I know that we currently get the address configurations from corosync
>>>>>> but with this patch we are forced to use SCTP when corosync provides
>>>>>> more than one "ring" configuration?
>>>>>
>>>>> Yes. this patch will force dlm to change to SCTP when corosync 
>>>>> provides
>>>>> more than one "ring".
>>>>>
>>>>> The reason:
>>>>> (without this patch) When a user sets up multi-links on corosync3
>>>>> and corosync.conf with an incorrect or missing rrp_mode,
>>>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and 
>>>>> report
>>>>> an error.
>>>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read 
>>>>> this
>>>>> config item in the further. Therefore, the network protocol will
>>>>> always be TCP.
>>>>>
>>>>>>
>>>>>> Even with corosync3 it should be possible to use corosync in SCTP
>>>>>> (multiple rings) and the kernel dlm using TCP only, would this not be
>>>>>> possible with dlm_controld then?
>>>>>
>>>>> Only one case for above case: corosync3 on single-link.
>>>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP
>>>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
>>>>> -EINVAL when 'dlm_local_count > 1'.
>>>>>
>>>>
>>>> I think we should change that condition then.
>>>>
>>>>> A key point for dlm is that there is no way to get the corosync 
>>>>> version.
>>>>> This patch is compatible with corosync2 env. In corosync2, the user 
>>>>> must
>>>>> correctly config rrp_mode when using 2-ring.
>>>>>
>>>>
>>>> So far I looked into it, it is anyway for detecting a protocol
>>>> according to some Corosync functionality it should still be possible
>>>> to always force dlm_controld using a different protocol by setting the
>>>> right config values/parameters.
>>>
>>> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can 
>>> bypass
>>> the detection phase when its value is "tcp|sctp". But in general, 
>>> dlm.conf
>>> is seldom used.
>>>
>>> Unfortunately, corosync doesn't provide the api.
>>> ref: https://github.com/corosync/corosync/issues/771
>>
>> I have the following scenario in my head with detect_protocol().
>>
>> Currently, if somebody uses knet with UDP and has multiple
>> "nodelist.node.0.ring%d_addr" defined in Corosync but does not set
>> "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as
>> a parameter (it will use detect_protocol()"), then the DLM kernel will
>> use TCP.
> 
> Since you wrote knet above, so the corosync version is 3.x.
> For your description, there are four points/places to notice.
> 
> 1. The above setting never works in the SUSE HA stack.
> 
> The reason I wrote in the previous mail is that corosync will report error:
>> corosync[1284]:   [MAIN  ] parse error in config: 2 is too many 
>> configured interfaces for the rrp_mode setting none.
> 
> 2. (you are right) DLM kernel will uses TCP
> 
> If corosync doesn't complain that the rrp_mode is missing.
> The current code (without my patch), dlm_tools func detect_protocol()
> returns '-1', which makes the DLM kernel use TCP.
> 
> 3. DLM kernel module doesn't work
> 
> current code (without my patch), DLM kernel dlm_tcp_listen_validate()
> will return -EINVAL when 'dlm_local_count > 1'.
> 
> 4. Corosync using UDP/SCTP is transparent for dlm.
> 
> UDP/UDPU just means corosync is under single-link. this is one
> rule of corosync 3.x.
> knet means corosync is under multi-link. there may be only one
> link present, or up to 8 links present.
> 
>>
>> After your patch the behaviour will be changed and the DLM kernel will
>> use SCTP with the same configuration as before?
>>
> 
> According to the corosync/dlm behaviour in SUSE HA stack
> (ref above 4 points), my patch:
> - corosync 3.x env, forces the dlm to use TCP when only one link exists.

That's dangerous though, because corosync3 can dynamically add and 
remove links while running. It's quite possible (and explicitly 
supported) to create a cluster with only 1 link, and then add others later.

Chrissie

> - corosync 3.x env, forces the dlm kernel to use SCTP when more than
>    one link exists.
> - keep the same behaviour when running corosync 2.x. (since user must
>    correctly set rrp_mode).
> 
> 
> /Heming
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-10 14:43               ` christine caulfield
@ 2025-01-13  3:12                 ` Heming Zhao
  2025-01-17 15:11                   ` Alexander Aring
  0 siblings, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2025-01-13  3:12 UTC (permalink / raw)
  To: christine caulfield, Alexander Aring
  Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou

On 1/10/25 22:43, christine caulfield wrote:
> 
> 
> On 10/01/2025 14:28, Heming Zhao wrote:
>> On 1/9/25 23:34, Alexander Aring wrote:
>>> Hi Heming,
>>>
>>> On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>
>>>> On 1/8/25 23:54, Alexander Aring wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>>>
>>>>>> On 1/7/25 02:11, Alexander Aring wrote:
>>>>>>> Hi Heming,
>>>>>>>
>>>>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>>>>>
>>>>>>>> The totem.rrp_mode config item was obsolete in corosync3. And
>>>>>>>> this patch gives dlm_controld the ability to detect multiple
>>>>>>>> links.
>>>>>>>>
>>>>>>>> The corosync and dlm network protocol relationship table:
>>>>>>>>
>>>>>>>>     -------------+-----------------------+---------------------
>>>>>>>>                  | totem.transport=udpu  | totem.transport=udp
>>>>>>>>                  +-----------------------+---------------------
>>>>>>>>     corosync 2.x |            |          |      multicast
>>>>>>>>                  |   1-ring   | 2-ring   |---------------------
>>>>>>>>                  |            |          |  default  | 2-ring
>>>>>>>>     -------------+------------+----------+---------------------
>>>>>>>>        dlm       |     tcp    | sctp     |   tcp     | sctp
>>>>>>>>     -------------+------------+----------+---------------------
>>>>>>>>
>>>>>>>>     -------------+---------------------------- +----------------------
>>>>>>>>                  | totem.transport = udpu/udp | totem.transport=knet
>>>>>>>>     corosync 3.x |---------------------------- +----------------------
>>>>>>>>                  |      1-ring                | 1-link  | multi- links
>>>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>>>        dlm       |       tcp                  |  tcp    | sctp
>>>>>>>>     -------------+----------------------------+---------+-----------
>>>>>>>>
>>>>>>>> At last, this patch should be work with updated kernel dlm module.
>>>>>>>
>>>>>>> I am not getting why the network protocol configuration has anything
>>>>>>> to do with the corosync configuration.
>>>>>>> I know that we currently get the address configurations from corosync
>>>>>>> but with this patch we are forced to use SCTP when corosync provides
>>>>>>> more than one "ring" configuration?
>>>>>>
>>>>>> Yes. this patch will force dlm to change to SCTP when corosync provides
>>>>>> more than one "ring".
>>>>>>
>>>>>> The reason:
>>>>>> (without this patch) When a user sets up multi-links on corosync3
>>>>>> and corosync.conf with an incorrect or missing rrp_mode,
>>>>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
>>>>>> an error.
>>>>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
>>>>>> config item in the further. Therefore, the network protocol will
>>>>>> always be TCP.
>>>>>>
>>>>>>>
>>>>>>> Even with corosync3 it should be possible to use corosync in SCTP
>>>>>>> (multiple rings) and the kernel dlm using TCP only, would this not be
>>>>>>> possible with dlm_controld then?
>>>>>>
>>>>>> Only one case for above case: corosync3 on single-link.
>>>>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP
>>>>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return
>>>>>> -EINVAL when 'dlm_local_count > 1'.
>>>>>>
>>>>>
>>>>> I think we should change that condition then.
>>>>>
>>>>>> A key point for dlm is that there is no way to get the corosync version.
>>>>>> This patch is compatible with corosync2 env. In corosync2, the user must
>>>>>> correctly config rrp_mode when using 2-ring.
>>>>>>
>>>>>
>>>>> So far I looked into it, it is anyway for detecting a protocol
>>>>> according to some Corosync functionality it should still be possible
>>>>> to always force dlm_controld using a different protocol by setting the
>>>>> right config values/parameters.
>>>>
>>>> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass
>>>> the detection phase when its value is "tcp|sctp". But in general, dlm.conf
>>>> is seldom used.
>>>>
>>>> Unfortunately, corosync doesn't provide the api.
>>>> ref: https://github.com/corosync/corosync/issues/771
>>>
>>> I have the following scenario in my head with detect_protocol().
>>>
>>> Currently, if somebody uses knet with UDP and has multiple
>>> "nodelist.node.0.ring%d_addr" defined in Corosync but does not set
>>> "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as
>>> a parameter (it will use detect_protocol()"), then the DLM kernel will
>>> use TCP.
>>
>> Since you wrote knet above, so the corosync version is 3.x.
>> For your description, there are four points/places to notice.
>>
>> 1. The above setting never works in the SUSE HA stack.
>>
>> The reason I wrote in the previous mail is that corosync will report error:
>>> corosync[1284]:   [MAIN  ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none.
>>
>> 2. (you are right) DLM kernel will uses TCP
>>
>> If corosync doesn't complain that the rrp_mode is missing.
>> The current code (without my patch), dlm_tools func detect_protocol()
>> returns '-1', which makes the DLM kernel use TCP.
>>
>> 3. DLM kernel module doesn't work
>>
>> current code (without my patch), DLM kernel dlm_tcp_listen_validate()
>> will return -EINVAL when 'dlm_local_count > 1'.
>>
>> 4. Corosync using UDP/SCTP is transparent for dlm.
>>
>> UDP/UDPU just means corosync is under single-link. this is one
>> rule of corosync 3.x.
>> knet means corosync is under multi-link. there may be only one
>> link present, or up to 8 links present.
>>
>>>
>>> After your patch the behaviour will be changed and the DLM kernel will
>>> use SCTP with the same configuration as before?
>>>
>>
>> According to the corosync/dlm behaviour in SUSE HA stack
>> (ref above 4 points), my patch:
>> - corosync 3.x env, forces the dlm to use TCP when only one link exists.
> 
> That's dangerous though, because corosync3 can dynamically add and remove links while running. It's quite possible (and explicitly supported) to create a cluster with only 1 link, and then add others later.
> 
> Chrissie

The current dlm code design doesn't allow reconfiguring the network
protocol on the fly. In the above scenario, the dlm will maintain the
TCP connection until the next dlm_deamon restart.

In my view, it's not essential for dlm to follow the knet dynamically
multi-link style. if the user hasn't set the 'protocol' item in
dlm.conf, (with my patch, for knet env), dlm will detect the corosync
nodelist on startup, and set the appropriate protocol mode.

If the user want to keep maintain a multi-link for dlm, they should
set protocl item in dlm.conf.

On the other hand, if dlm needs to dynamically change the number of
links during runtime, it should always use the SCTP protocol.

Thanks,
Heming



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-13  3:12                 ` Heming Zhao
@ 2025-01-17 15:11                   ` Alexander Aring
  2025-01-17 15:16                     ` christine caulfield
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Aring @ 2025-01-17 15:11 UTC (permalink / raw)
  To: Heming Zhao
  Cc: christine caulfield, teigland, jfriesse, nicholas.yang, glass.su,
	gfs2, Roger Zhou

Hi,

On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote:
...
>
> The current dlm code design doesn't allow reconfiguring the network
> protocol on the fly. In the above scenario, the dlm will maintain the
> TCP connection until the next dlm_deamon restart.
>
> In my view, it's not essential for dlm to follow the knet dynamically
> multi-link style. if the user hasn't set the 'protocol' item in
> dlm.conf, (with my patch, for knet env), dlm will detect the corosync
> nodelist on startup, and set the appropriate protocol mode.
>
> If the user want to keep maintain a multi-link for dlm, they should
> set protocl item in dlm.conf.
>
> On the other hand, if dlm needs to dynamically change the number of
> links during runtime, it should always use the SCTP protocol.

to move on here, I would at first apply/ack the patches for the static
maximum address increase.

I see no problems to do that as this is required by you.

For the automatic protocol detection and corosync3, at least I need to
doing some tests as I have currently 0 ideas how corosync informs the
users about new/removed links during runtime.

- Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-17 15:11                   ` Alexander Aring
@ 2025-01-17 15:16                     ` christine caulfield
  2025-02-18 11:46                       ` Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: christine caulfield @ 2025-01-17 15:16 UTC (permalink / raw)
  To: Alexander Aring, Heming Zhao
  Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou

On 17/01/2025 15:11, Alexander Aring wrote:
> Hi,
> 
> On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote:
> ...
>>
>> The current dlm code design doesn't allow reconfiguring the network
>> protocol on the fly. In the above scenario, the dlm will maintain the
>> TCP connection until the next dlm_deamon restart.
>>
>> In my view, it's not essential for dlm to follow the knet dynamically
>> multi-link style. if the user hasn't set the 'protocol' item in
>> dlm.conf, (with my patch, for knet env), dlm will detect the corosync
>> nodelist on startup, and set the appropriate protocol mode.
>>
>> If the user want to keep maintain a multi-link for dlm, they should
>> set protocl item in dlm.conf.
>>
>> On the other hand, if dlm needs to dynamically change the number of
>> links during runtime, it should always use the SCTP protocol.
> 
> to move on here, I would at first apply/ack the patches for the static
> maximum address increase.
> 
> I see no problems to do that as this is required by you.
> 
> For the automatic protocol detection and corosync3, at least I need to
> doing some tests as I have currently 0 ideas how corosync informs the
> users about new/removed links during runtime.
> 

To be honest, it mostly doesn't - and it's maybe something we could look 
into, though I doubt it would be high priority as this is the first time 
it's been thought of :)

Currently you can watch for the cmap key config.reload_in_progress and 
after that goes from 1 back to 0 re-read the new corosync.conf.

Chrissie

> - Alex
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-01-17 15:16                     ` christine caulfield
@ 2025-02-18 11:46                       ` Heming Zhao
  2025-02-18 16:35                         ` Alexander Aring
  0 siblings, 1 reply; 16+ messages in thread
From: Heming Zhao @ 2025-02-18 11:46 UTC (permalink / raw)
  To: christine caulfield, Alexander Aring
  Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou

Hello Christine and Alexander,

On 1/17/25 23:16, christine caulfield wrote:
> On 17/01/2025 15:11, Alexander Aring wrote:
>> Hi,
>>
>> On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote:
>> ...
>>>
>>> The current dlm code design doesn't allow reconfiguring the network
>>> protocol on the fly. In the above scenario, the dlm will maintain the
>>> TCP connection until the next dlm_deamon restart.
>>>
>>> In my view, it's not essential for dlm to follow the knet dynamically
>>> multi-link style. if the user hasn't set the 'protocol' item in
>>> dlm.conf, (with my patch, for knet env), dlm will detect the corosync
>>> nodelist on startup, and set the appropriate protocol mode.
>>>
>>> If the user want to keep maintain a multi-link for dlm, they should
>>> set protocl item in dlm.conf.
>>>
>>> On the other hand, if dlm needs to dynamically change the number of
>>> links during runtime, it should always use the SCTP protocol.
>>
>> to move on here, I would at first apply/ack the patches for the static
>> maximum address increase.
>>
>> I see no problems to do that as this is required by you.
>>
>> For the automatic protocol detection and corosync3, at least I need to
>> doing some tests as I have currently 0 ideas how corosync informs the
>> users about new/removed links during runtime.
>>
> 
> To be honest, it mostly doesn't - and it's maybe something we could look into, though I doubt it would be high priority as this is the first time it's been thought of :)
> 
> Currently you can watch for the cmap key config.reload_in_progress and after that goes from 1 back to 0 re-read the new corosync.conf.
> 
> Chrissie
> 
>> - Alex
>>
> 

Sorry to raise this context.

I suspect Chrissie misunderstood Alex's writing, Alex comment is about another
patch [1], not this patch.
In my view, it's not harmful to increase the address number in kernel space.
The multiple addresses are SCTP specific, Even if some of the eight links
go down/break, with the SCTP redundant path feature, the SCTP can still work
normally.

[1]: https://lore.kernel.org/gfs2/20241220071418.16786-2-heming.zhao@suse.com/

- Heming

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-02-18 11:46                       ` Heming Zhao
@ 2025-02-18 16:35                         ` Alexander Aring
  2025-02-20  3:56                           ` Heming Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Aring @ 2025-02-18 16:35 UTC (permalink / raw)
  To: Heming Zhao
  Cc: christine caulfield, teigland, jfriesse, nicholas.yang, glass.su,
	gfs2, Roger Zhou

Hi,

On Tue, Feb 18, 2025 at 6:46 AM Heming Zhao <heming.zhao@suse.com> wrote:
>
> Hello Christine and Alexander,
>
> On 1/17/25 23:16, christine caulfield wrote:
> > On 17/01/2025 15:11, Alexander Aring wrote:
> >> Hi,
> >>
> >> On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote:
> >> ...
> >>>
> >>> The current dlm code design doesn't allow reconfiguring the network
> >>> protocol on the fly. In the above scenario, the dlm will maintain the
> >>> TCP connection until the next dlm_deamon restart.
> >>>
> >>> In my view, it's not essential for dlm to follow the knet dynamically
> >>> multi-link style. if the user hasn't set the 'protocol' item in
> >>> dlm.conf, (with my patch, for knet env), dlm will detect the corosync
> >>> nodelist on startup, and set the appropriate protocol mode.
> >>>
> >>> If the user want to keep maintain a multi-link for dlm, they should
> >>> set protocl item in dlm.conf.
> >>>
> >>> On the other hand, if dlm needs to dynamically change the number of
> >>> links during runtime, it should always use the SCTP protocol.
> >>
> >> to move on here, I would at first apply/ack the patches for the static
> >> maximum address increase.
> >>
> >> I see no problems to do that as this is required by you.
> >>
> >> For the automatic protocol detection and corosync3, at least I need to
> >> doing some tests as I have currently 0 ideas how corosync informs the
> >> users about new/removed links during runtime.
> >>
> >
> > To be honest, it mostly doesn't - and it's maybe something we could look into, though I doubt it would be high priority as this is the first time it's been thought of :)
> >
> > Currently you can watch for the cmap key config.reload_in_progress and after that goes from 1 back to 0 re-read the new corosync.conf.
> >
> > Chrissie
> >
> >> - Alex
> >>
> >
>
> Sorry to raise this context.
>
> I suspect Chrissie misunderstood Alex's writing, Alex comment is about another
> patch [1], not this patch.
> In my view, it's not harmful to increase the address number in kernel space.

good news, the patch is already upstream. [0] Yes it is indeed not
harmful to increase that value.

We need to test/discuss the user space changes with corosync v2 vs v3
related to detect_protocol()?

If you split this patch to have only the "MAX_NODE_ADDRESSES", then I
can already apply this change. Btw: should it not be 9 (kernel 8) when
previously it was 4 (kernel 3)? I feel there might be a termination
entry at the end of the array?

- Alex

[0] https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/commit/?h=next&id=a53a6336171bd722aac6e98964a79d56841c5416


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
  2025-02-18 16:35                         ` Alexander Aring
@ 2025-02-20  3:56                           ` Heming Zhao
  0 siblings, 0 replies; 16+ messages in thread
From: Heming Zhao @ 2025-02-20  3:56 UTC (permalink / raw)
  To: Alexander Aring
  Cc: christine caulfield, teigland, jfriesse, nicholas.yang, glass.su,
	gfs2, Roger Zhou

On 2/19/25 00:35, Alexander Aring wrote:
> Hi,
> 
> On Tue, Feb 18, 2025 at 6:46 AM Heming Zhao <heming.zhao@suse.com> wrote:
>>
>> Hello Christine and Alexander,
>>
>> On 1/17/25 23:16, christine caulfield wrote:
>>> On 17/01/2025 15:11, Alexander Aring wrote:
>>>> Hi,
>>>>
>>>> On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>> ...
>>>>>
>>>>> The current dlm code design doesn't allow reconfiguring the network
>>>>> protocol on the fly. In the above scenario, the dlm will maintain the
>>>>> TCP connection until the next dlm_deamon restart.
>>>>>
>>>>> In my view, it's not essential for dlm to follow the knet dynamically
>>>>> multi-link style. if the user hasn't set the 'protocol' item in
>>>>> dlm.conf, (with my patch, for knet env), dlm will detect the corosync
>>>>> nodelist on startup, and set the appropriate protocol mode.
>>>>>
>>>>> If the user want to keep maintain a multi-link for dlm, they should
>>>>> set protocl item in dlm.conf.
>>>>>
>>>>> On the other hand, if dlm needs to dynamically change the number of
>>>>> links during runtime, it should always use the SCTP protocol.
>>>>
>>>> to move on here, I would at first apply/ack the patches for the static
>>>> maximum address increase.
>>>>
>>>> I see no problems to do that as this is required by you.
>>>>
>>>> For the automatic protocol detection and corosync3, at least I need to
>>>> doing some tests as I have currently 0 ideas how corosync informs the
>>>> users about new/removed links during runtime.
>>>>
>>>
>>> To be honest, it mostly doesn't - and it's maybe something we could look into, though I doubt it would be high priority as this is the first time it's been thought of :)
>>>
>>> Currently you can watch for the cmap key config.reload_in_progress and after that goes from 1 back to 0 re-read the new corosync.conf.
>>>
>>> Chrissie
>>>
>>>> - Alex
>>>>
>>>
>>
>> Sorry to raise this context.
>>
>> I suspect Chrissie misunderstood Alex's writing, Alex comment is about another
>> patch [1], not this patch.
>> In my view, it's not harmful to increase the address number in kernel space.
> 
> good news, the patch is already upstream. [0] Yes it is indeed not
> harmful to increase that value.
> 
> We need to test/discuss the user space changes with corosync v2 vs v3
> related to detect_protocol()?

Welcome to discuss.
If we want to follow corosync/knet style to support dynamically network link changes,
more code modifications will be necessary.

> 
> If you split this patch to have only the "MAX_NODE_ADDRESSES", then I

zhm: I will send a split patch for 'MAX_NODE_ADDRESSES'.

> can already apply this change. Btw: should it not be 9 (kernel 8) when
> previously it was 4 (kernel 3)? I feel there might be a termination
> entry at the end of the array?

The macro used in both kernel-space and user-space play a termination role.
However, in my view, both values are incorrect.
The correct and workable values for Corosync 2 are:
- DLM_MAX_ADDR_COUNT (kernel-space) should be 2, not 3
- MAX_NODE_ADDRESSES (user-space) should be 2, not 4.

The original author seems to use an enough number to accommodate potential
future increases in network links.

Regarding MAX_NODE_ADDRESSES, the dlm_controld process uses corosync_cfg_get_node_addrs() to request a maximum of four addresses.
However, Corosync 2 only supports a two-ring network.

Similarly, the current DLM kernel code can accept three sets of network addresses.

I tested the modifications with DLM_MAX_ADDR_COUNT set to 2 and
MAX_NODE_ADDRESSES also set to 2, and the cluster operated normally.

So for my patches:
- in kernel-space, I define DLM_MAX_ADDR_COUNT as 8 (not 9)
- in user-space, I define MAX_NODE_ADDRESSES as 8 (not the kernel value plus 1)
- I tested the code before sending both the v1 and v2 patches.

If you very like the style of kernel value plus 1, I am OK to define
MAX_NODE_ADDRESSES as 9.

- Heming


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-02-20  3:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-24  8:42 [PATCH 0/1] dlm_tools: support corosync3/knet multi-link Heming Zhao
2024-12-24  8:42 ` [PATCH v2 1/1] dlm_controld: " Heming Zhao
2025-01-06 18:11   ` Alexander Aring
2025-01-07  4:59     ` Heming Zhao
2025-01-08 15:54       ` Alexander Aring
2025-01-09  2:26         ` Heming Zhao
2025-01-09 15:34           ` Alexander Aring
2025-01-09 15:38             ` christine caulfield
2025-01-10 14:28             ` Heming Zhao
2025-01-10 14:43               ` christine caulfield
2025-01-13  3:12                 ` Heming Zhao
2025-01-17 15:11                   ` Alexander Aring
2025-01-17 15:16                     ` christine caulfield
2025-02-18 11:46                       ` Heming Zhao
2025-02-18 16:35                         ` Alexander Aring
2025-02-20  3:56                           ` Heming Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox