* [PATCH 0/1] dlm_tools: support corosync3/knet multi-link @ 2024-12-24 8:42 Heming Zhao 2024-12-24 8:42 ` [PATCH v2 1/1] dlm_controld: " Heming Zhao 0 siblings, 1 reply; 16+ messages in thread From: Heming Zhao @ 2024-12-24 8:42 UTC (permalink / raw) To: teigland, aahringo Cc: Heming Zhao, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2 This patch is to support corosync3/knet multi-link. Changelog: v1->v2: Added new code to support corosync2's 2-ring mode. Heming Zhao (1): dlm_controld: support corosync3/knet multi-link dlm_controld/action.c | 44 ++++++++++++++++++++++++++++++--------- dlm_controld/dlm_daemon.h | 4 ++-- dlm_sand/sand_internal.h | 4 ++-- 3 files changed, 38 insertions(+), 14 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2024-12-24 8:42 [PATCH 0/1] dlm_tools: support corosync3/knet multi-link Heming Zhao @ 2024-12-24 8:42 ` Heming Zhao 2025-01-06 18:11 ` Alexander Aring 0 siblings, 1 reply; 16+ messages in thread From: Heming Zhao @ 2024-12-24 8:42 UTC (permalink / raw) To: teigland, aahringo Cc: Heming Zhao, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2 The totem.rrp_mode config item was obsolete in corosync3. And this patch gives dlm_controld the ability to detect multiple links. The corosync and dlm network protocol relationship table: -------------+-----------------------+--------------------- | totem.transport=udpu | totem.transport=udp +-----------------------+--------------------- corosync 2.x | | | multicast | 1-ring | 2-ring |--------------------- | | | default | 2-ring -------------+------------+----------+--------------------- dlm | tcp | sctp | tcp | sctp -------------+------------+----------+--------------------- -------------+----------------------------+---------------------- | totem.transport = udpu/udp | totem.transport=knet corosync 3.x |----------------------------+---------------------- | 1-ring | 1-link | multi-links -------------+----------------------------+---------+----------- dlm | tcp | tcp | sctp -------------+----------------------------+---------+----------- At last, this patch should be work with updated kernel dlm module. Because the DLM_MAX_ADDR_COUNT is changed from 3 to 8. Signed-off-by: Heming Zhao <heming.zhao@suse.com> --- dlm_controld/action.c | 44 ++++++++++++++++++++++++++++++--------- dlm_controld/dlm_daemon.h | 4 ++-- dlm_sand/sand_internal.h | 4 ++-- 3 files changed, 38 insertions(+), 14 deletions(-) diff --git a/dlm_controld/action.c b/dlm_controld/action.c index 60eb22a78c56..4b54a28707c6 100644 --- a/dlm_controld/action.c +++ b/dlm_controld/action.c @@ -20,12 +20,15 @@ static int comms_nodes_count; #define CLUSTER_DIR "/sys/kernel/config/dlm/cluster" #define SPACES_DIR "/sys/kernel/config/dlm/cluster/spaces" #define COMMS_DIR "/sys/kernel/config/dlm/cluster/comms" +#define CMAP_STR_LEN 27 static int detect_protocol(void) { cmap_handle_t handle; char *str = NULL; + char key[CMAP_STR_LEN]; int rv, proto = -1; + int i, link = 0; rv = cmap_initialize(&handle); if (rv != CS_OK) { @@ -33,19 +36,40 @@ static int detect_protocol(void) return -1; } - rv = cmap_get_string(handle, "totem.rrp_mode", &str); - if (rv != CS_OK) - goto out; - - log_debug("cmap totem.rrp_mode = '%s'", str); + for (i = 0; i < MAX_NODE_ADDRESSES; i++) { + snprintf(key, CMAP_STR_LEN, "nodelist.node.0.ring%d_addr", i); + rv = cmap_get_string(handle, key, &str); + if (rv != CS_OK) { + /* we only care the link number, ignore all error here */ + log_debug("[%d] %s rv:%d", i, key, rv); + continue; + } + log_debug("[%d] %s : %s", i, key, str); + link++; + free(str); + } - if (!strcmp(str, "none")) - proto = PROTO_TCP; - else + if (link > 1) proto = PROTO_SCTP; - out: - if (str) + else if (link == 1) + proto = PROTO_TCP; + + /* + * Since corosync3 retains rrp_mode but allows any value, + * we should precisely match the rrp_mode value (none, + * active, passive) used in corosync2 env. + */ + rv = cmap_get_string(handle, "totem.rrp_mode", &str); + if (rv == CS_OK) { + log_debug("cmap totem.rrp_mode = '%s'", str); + + if (!strcmp(str, "none")) + proto = PROTO_TCP; + else if (!strcmp(str, "active") || !strcmp(str, "passive")) + proto = PROTO_SCTP; free(str); + } + cmap_finalize(handle); return proto; } diff --git a/dlm_controld/dlm_daemon.h b/dlm_controld/dlm_daemon.h index 4a533e3451e2..3ed4e235a83c 100644 --- a/dlm_controld/dlm_daemon.h +++ b/dlm_controld/dlm_daemon.h @@ -176,9 +176,9 @@ EXTERN struct dlm_option dlm_options[dlm_options_max]; #define MAX_NODES 128 /* Maximum number of IP addresses per node, when using SCTP and multi-ring in - corosync In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 3. */ + corosync In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 8. */ -#define MAX_NODE_ADDRESSES 4 +#define MAX_NODE_ADDRESSES 8 #define PROTO_TCP 0 #define PROTO_SCTP 1 diff --git a/dlm_sand/sand_internal.h b/dlm_sand/sand_internal.h index 4c2fc0897051..c17287abcd47 100644 --- a/dlm_sand/sand_internal.h +++ b/dlm_sand/sand_internal.h @@ -138,9 +138,9 @@ EXTERN struct dlm_option dlm_options[dlm_options_max]; Copied in libdlm.h so apps don't need to include the kernel header. */ /* Maximum number of IP addresses per node, when using SCTP. - In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 3. */ + In dlm-kernel this is DLM_MAX_ADDR_COUNT, currently 8. */ -#define MAX_NODE_ADDRESSES 4 +#define MAX_NODE_ADDRESSES 8 #define PROTO_TCP 0 #define PROTO_SCTP 1 -- 2.43.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2024-12-24 8:42 ` [PATCH v2 1/1] dlm_controld: " Heming Zhao @ 2025-01-06 18:11 ` Alexander Aring 2025-01-07 4:59 ` Heming Zhao 0 siblings, 1 reply; 16+ messages in thread From: Alexander Aring @ 2025-01-06 18:11 UTC (permalink / raw) To: Heming Zhao; +Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2 Hi Heming, On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: > > The totem.rrp_mode config item was obsolete in corosync3. And > this patch gives dlm_controld the ability to detect multiple > links. > > The corosync and dlm network protocol relationship table: > > -------------+-----------------------+--------------------- > | totem.transport=udpu | totem.transport=udp > +-----------------------+--------------------- > corosync 2.x | | | multicast > | 1-ring | 2-ring |--------------------- > | | | default | 2-ring > -------------+------------+----------+--------------------- > dlm | tcp | sctp | tcp | sctp > -------------+------------+----------+--------------------- > > -------------+----------------------------+---------------------- > | totem.transport = udpu/udp | totem.transport=knet > corosync 3.x |----------------------------+---------------------- > | 1-ring | 1-link | multi-links > -------------+----------------------------+---------+----------- > dlm | tcp | tcp | sctp > -------------+----------------------------+---------+----------- > > At last, this patch should be work with updated kernel dlm module. I am not getting why the network protocol configuration has anything to do with the corosync configuration. I know that we currently get the address configurations from corosync but with this patch we are forced to use SCTP when corosync provides more than one "ring" configuration? Even with corosync3 it should be possible to use corosync in SCTP (multiple rings) and the kernel dlm using TCP only, would this not be possible with dlm_controld then? Thanks. - Alex ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-06 18:11 ` Alexander Aring @ 2025-01-07 4:59 ` Heming Zhao 2025-01-08 15:54 ` Alexander Aring 0 siblings, 1 reply; 16+ messages in thread From: Heming Zhao @ 2025-01-07 4:59 UTC (permalink / raw) To: Alexander Aring Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 1/7/25 02:11, Alexander Aring wrote: > Hi Heming, > > On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: >> >> The totem.rrp_mode config item was obsolete in corosync3. And >> this patch gives dlm_controld the ability to detect multiple >> links. >> >> The corosync and dlm network protocol relationship table: >> >> -------------+-----------------------+--------------------- >> | totem.transport=udpu | totem.transport=udp >> +-----------------------+--------------------- >> corosync 2.x | | | multicast >> | 1-ring | 2-ring |--------------------- >> | | | default | 2-ring >> -------------+------------+----------+--------------------- >> dlm | tcp | sctp | tcp | sctp >> -------------+------------+----------+--------------------- >> >> -------------+----------------------------+---------------------- >> | totem.transport = udpu/udp | totem.transport=knet >> corosync 3.x |----------------------------+---------------------- >> | 1-ring | 1-link | multi-links >> -------------+----------------------------+---------+----------- >> dlm | tcp | tcp | sctp >> -------------+----------------------------+---------+----------- >> >> At last, this patch should be work with updated kernel dlm module. > > I am not getting why the network protocol configuration has anything > to do with the corosync configuration. > I know that we currently get the address configurations from corosync > but with this patch we are forced to use SCTP when corosync provides > more than one "ring" configuration? Yes. this patch will force dlm to change to SCTP when corosync provides more than one "ring". The reason: (without this patch) When a user sets up multi-links on corosync3 and corosync.conf with an incorrect or missing rrp_mode, dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report an error. Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this config item in the further. Therefore, the network protocol will always be TCP. > > Even with corosync3 it should be possible to use corosync in SCTP > (multiple rings) and the kernel dlm using TCP only, would this not be > possible with dlm_controld then? Only one case for above case: corosync3 on single-link. A new patch is needed for dlm to work over TCP when corosync3 in SCTP (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return -EINVAL when 'dlm_local_count > 1'. A key point for dlm is that there is no way to get the corosync version. This patch is compatible with corosync2 env. In corosync2, the user must correctly config rrp_mode when using 2-ring. i.e.: In corosync2, change to 2-ring from 1-ring (whatever multicast mode). There must include rrp_mode item, if not, error report: corosync[1284]: [MAIN ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none. corosync[1284]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1415. Even more, since corosync3 isn't compatible with corosync2, in my view, latest version of dlm_tools should only focus on corosync3 and drop corosync2 support. If any Linux distribution stay with corosync2, they should choose an old version of dlm_tools. - Heming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-07 4:59 ` Heming Zhao @ 2025-01-08 15:54 ` Alexander Aring 2025-01-09 2:26 ` Heming Zhao 0 siblings, 1 reply; 16+ messages in thread From: Alexander Aring @ 2025-01-08 15:54 UTC (permalink / raw) To: Heming Zhao Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou Hi, On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote: > > On 1/7/25 02:11, Alexander Aring wrote: > > Hi Heming, > > > > On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: > >> > >> The totem.rrp_mode config item was obsolete in corosync3. And > >> this patch gives dlm_controld the ability to detect multiple > >> links. > >> > >> The corosync and dlm network protocol relationship table: > >> > >> -------------+-----------------------+--------------------- > >> | totem.transport=udpu | totem.transport=udp > >> +-----------------------+--------------------- > >> corosync 2.x | | | multicast > >> | 1-ring | 2-ring |--------------------- > >> | | | default | 2-ring > >> -------------+------------+----------+--------------------- > >> dlm | tcp | sctp | tcp | sctp > >> -------------+------------+----------+--------------------- > >> > >> -------------+----------------------------+---------------------- > >> | totem.transport = udpu/udp | totem.transport=knet > >> corosync 3.x |----------------------------+---------------------- > >> | 1-ring | 1-link | multi-links > >> -------------+----------------------------+---------+----------- > >> dlm | tcp | tcp | sctp > >> -------------+----------------------------+---------+----------- > >> > >> At last, this patch should be work with updated kernel dlm module. > > > > I am not getting why the network protocol configuration has anything > > to do with the corosync configuration. > > I know that we currently get the address configurations from corosync > > but with this patch we are forced to use SCTP when corosync provides > > more than one "ring" configuration? > > Yes. this patch will force dlm to change to SCTP when corosync provides > more than one "ring". > > The reason: > (without this patch) When a user sets up multi-links on corosync3 > and corosync.conf with an incorrect or missing rrp_mode, > dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report > an error. > Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this > config item in the further. Therefore, the network protocol will > always be TCP. > > > > > Even with corosync3 it should be possible to use corosync in SCTP > > (multiple rings) and the kernel dlm using TCP only, would this not be > > possible with dlm_controld then? > > Only one case for above case: corosync3 on single-link. > A new patch is needed for dlm to work over TCP when corosync3 in SCTP > (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return > -EINVAL when 'dlm_local_count > 1'. > I think we should change that condition then. > A key point for dlm is that there is no way to get the corosync version. > This patch is compatible with corosync2 env. In corosync2, the user must > correctly config rrp_mode when using 2-ring. > So far I looked into it, it is anyway for detecting a protocol according to some Corosync functionality it should still be possible to always force dlm_controld using a different protocol by setting the right config values/parameters. > i.e.: > In corosync2, change to 2-ring from 1-ring (whatever multicast mode). > There must include rrp_mode item, if not, error report: > corosync[1284]: [MAIN ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none. > corosync[1284]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1415. > > Even more, since corosync3 isn't compatible with corosync2, > in my view, latest version of dlm_tools should only focus on corosync3 > and drop corosync2 support. If any Linux distribution stay with > corosync2, they should choose an old version of dlm_tools. > Is dlm_tool not just a domain socket talking to dlm_controld? It does not use any library of the Corosync project? - Alex ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-08 15:54 ` Alexander Aring @ 2025-01-09 2:26 ` Heming Zhao 2025-01-09 15:34 ` Alexander Aring 0 siblings, 1 reply; 16+ messages in thread From: Heming Zhao @ 2025-01-09 2:26 UTC (permalink / raw) To: Alexander Aring Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 1/8/25 23:54, Alexander Aring wrote: > Hi, > > On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote: >> >> On 1/7/25 02:11, Alexander Aring wrote: >>> Hi Heming, >>> >>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: >>>> >>>> The totem.rrp_mode config item was obsolete in corosync3. And >>>> this patch gives dlm_controld the ability to detect multiple >>>> links. >>>> >>>> The corosync and dlm network protocol relationship table: >>>> >>>> -------------+-----------------------+--------------------- >>>> | totem.transport=udpu | totem.transport=udp >>>> +-----------------------+--------------------- >>>> corosync 2.x | | | multicast >>>> | 1-ring | 2-ring |--------------------- >>>> | | | default | 2-ring >>>> -------------+------------+----------+--------------------- >>>> dlm | tcp | sctp | tcp | sctp >>>> -------------+------------+----------+--------------------- >>>> >>>> -------------+----------------------------+---------------------- >>>> | totem.transport = udpu/udp | totem.transport=knet >>>> corosync 3.x |----------------------------+---------------------- >>>> | 1-ring | 1-link | multi-links >>>> -------------+----------------------------+---------+----------- >>>> dlm | tcp | tcp | sctp >>>> -------------+----------------------------+---------+----------- >>>> >>>> At last, this patch should be work with updated kernel dlm module. >>> >>> I am not getting why the network protocol configuration has anything >>> to do with the corosync configuration. >>> I know that we currently get the address configurations from corosync >>> but with this patch we are forced to use SCTP when corosync provides >>> more than one "ring" configuration? >> >> Yes. this patch will force dlm to change to SCTP when corosync provides >> more than one "ring". >> >> The reason: >> (without this patch) When a user sets up multi-links on corosync3 >> and corosync.conf with an incorrect or missing rrp_mode, >> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report >> an error. >> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this >> config item in the further. Therefore, the network protocol will >> always be TCP. >> >>> >>> Even with corosync3 it should be possible to use corosync in SCTP >>> (multiple rings) and the kernel dlm using TCP only, would this not be >>> possible with dlm_controld then? >> >> Only one case for above case: corosync3 on single-link. >> A new patch is needed for dlm to work over TCP when corosync3 in SCTP >> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return >> -EINVAL when 'dlm_local_count > 1'. >> > > I think we should change that condition then. > >> A key point for dlm is that there is no way to get the corosync version. >> This patch is compatible with corosync2 env. In corosync2, the user must >> correctly config rrp_mode when using 2-ring. >> > > So far I looked into it, it is anyway for detecting a protocol > according to some Corosync functionality it should still be possible > to always force dlm_controld using a different protocol by setting the > right config values/parameters. Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass the detection phase when its value is "tcp|sctp". But in general, dlm.conf is seldom used. Unfortunately, corosync doesn't provide the api. ref: https://github.com/corosync/corosync/issues/771 > >> i.e.: >> In corosync2, change to 2-ring from 1-ring (whatever multicast mode). >> There must include rrp_mode item, if not, error report: >> corosync[1284]: [MAIN ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none. >> corosync[1284]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1415. >> >> Even more, since corosync3 isn't compatible with corosync2, >> in my view, latest version of dlm_tools should only focus on corosync3 >> and drop corosync2 support. If any Linux distribution stay with >> corosync2, they should choose an old version of dlm_tools. >> > > Is dlm_tool not just a domain socket talking to dlm_controld? It does > not use any library of the Corosync project? detect_protocol(@dlm_controld/action.c) uses the cmap APIs to get corosync settings. dlm_controld registers a callback in setup_cluster(), then corosync uses quorum_callback/quorum_nodelist_callback to notify the dlm_controld daemon about quorum status. - Heming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-09 2:26 ` Heming Zhao @ 2025-01-09 15:34 ` Alexander Aring 2025-01-09 15:38 ` christine caulfield 2025-01-10 14:28 ` Heming Zhao 0 siblings, 2 replies; 16+ messages in thread From: Alexander Aring @ 2025-01-09 15:34 UTC (permalink / raw) To: Heming Zhao Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou Hi Heming, On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote: > > On 1/8/25 23:54, Alexander Aring wrote: > > Hi, > > > > On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote: > >> > >> On 1/7/25 02:11, Alexander Aring wrote: > >>> Hi Heming, > >>> > >>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: > >>>> > >>>> The totem.rrp_mode config item was obsolete in corosync3. And > >>>> this patch gives dlm_controld the ability to detect multiple > >>>> links. > >>>> > >>>> The corosync and dlm network protocol relationship table: > >>>> > >>>> -------------+-----------------------+--------------------- > >>>> | totem.transport=udpu | totem.transport=udp > >>>> +-----------------------+--------------------- > >>>> corosync 2.x | | | multicast > >>>> | 1-ring | 2-ring |--------------------- > >>>> | | | default | 2-ring > >>>> -------------+------------+----------+--------------------- > >>>> dlm | tcp | sctp | tcp | sctp > >>>> -------------+------------+----------+--------------------- > >>>> > >>>> -------------+----------------------------+---------------------- > >>>> | totem.transport = udpu/udp | totem.transport=knet > >>>> corosync 3.x |----------------------------+---------------------- > >>>> | 1-ring | 1-link | multi-links > >>>> -------------+----------------------------+---------+----------- > >>>> dlm | tcp | tcp | sctp > >>>> -------------+----------------------------+---------+----------- > >>>> > >>>> At last, this patch should be work with updated kernel dlm module. > >>> > >>> I am not getting why the network protocol configuration has anything > >>> to do with the corosync configuration. > >>> I know that we currently get the address configurations from corosync > >>> but with this patch we are forced to use SCTP when corosync provides > >>> more than one "ring" configuration? > >> > >> Yes. this patch will force dlm to change to SCTP when corosync provides > >> more than one "ring". > >> > >> The reason: > >> (without this patch) When a user sets up multi-links on corosync3 > >> and corosync.conf with an incorrect or missing rrp_mode, > >> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report > >> an error. > >> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this > >> config item in the further. Therefore, the network protocol will > >> always be TCP. > >> > >>> > >>> Even with corosync3 it should be possible to use corosync in SCTP > >>> (multiple rings) and the kernel dlm using TCP only, would this not be > >>> possible with dlm_controld then? > >> > >> Only one case for above case: corosync3 on single-link. > >> A new patch is needed for dlm to work over TCP when corosync3 in SCTP > >> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return > >> -EINVAL when 'dlm_local_count > 1'. > >> > > > > I think we should change that condition then. > > > >> A key point for dlm is that there is no way to get the corosync version. > >> This patch is compatible with corosync2 env. In corosync2, the user must > >> correctly config rrp_mode when using 2-ring. > >> > > > > So far I looked into it, it is anyway for detecting a protocol > > according to some Corosync functionality it should still be possible > > to always force dlm_controld using a different protocol by setting the > > right config values/parameters. > > Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass > the detection phase when its value is "tcp|sctp". But in general, dlm.conf > is seldom used. > > Unfortunately, corosync doesn't provide the api. > ref: https://github.com/corosync/corosync/issues/771 I have the following scenario in my head with detect_protocol(). Currently, if somebody uses knet with UDP and has multiple "nodelist.node.0.ring%d_addr" defined in Corosync but does not set "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as a parameter (it will use detect_protocol()"), then the DLM kernel will use TCP. After your patch the behaviour will be changed and the DLM kernel will use SCTP with the same configuration as before? - Alex ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-09 15:34 ` Alexander Aring @ 2025-01-09 15:38 ` christine caulfield 2025-01-10 14:28 ` Heming Zhao 1 sibling, 0 replies; 16+ messages in thread From: christine caulfield @ 2025-01-09 15:38 UTC (permalink / raw) To: Alexander Aring, Heming Zhao Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 09/01/2025 15:34, Alexander Aring wrote: > Hi Heming, > > On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote: >> >> On 1/8/25 23:54, Alexander Aring wrote: >>> Hi, >>> >>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote: >>>> >>>> On 1/7/25 02:11, Alexander Aring wrote: >>>>> Hi Heming, >>>>> >>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: >>>>>> >>>>>> The totem.rrp_mode config item was obsolete in corosync3. And >>>>>> this patch gives dlm_controld the ability to detect multiple >>>>>> links. >>>>>> >>>>>> The corosync and dlm network protocol relationship table: >>>>>> >>>>>> -------------+-----------------------+--------------------- >>>>>> | totem.transport=udpu | totem.transport=udp >>>>>> +-----------------------+--------------------- >>>>>> corosync 2.x | | | multicast >>>>>> | 1-ring | 2-ring |--------------------- >>>>>> | | | default | 2-ring >>>>>> -------------+------------+----------+--------------------- >>>>>> dlm | tcp | sctp | tcp | sctp >>>>>> -------------+------------+----------+--------------------- >>>>>> >>>>>> -------------+----------------------------+---------------------- >>>>>> | totem.transport = udpu/udp | totem.transport=knet >>>>>> corosync 3.x |----------------------------+---------------------- >>>>>> | 1-ring | 1-link | multi-links >>>>>> -------------+----------------------------+---------+----------- >>>>>> dlm | tcp | tcp | sctp >>>>>> -------------+----------------------------+---------+----------- >>>>>> >>>>>> At last, this patch should be work with updated kernel dlm module. >>>>> >>>>> I am not getting why the network protocol configuration has anything >>>>> to do with the corosync configuration. >>>>> I know that we currently get the address configurations from corosync >>>>> but with this patch we are forced to use SCTP when corosync provides >>>>> more than one "ring" configuration? >>>> >>>> Yes. this patch will force dlm to change to SCTP when corosync provides >>>> more than one "ring". >>>> >>>> The reason: >>>> (without this patch) When a user sets up multi-links on corosync3 >>>> and corosync.conf with an incorrect or missing rrp_mode, >>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report >>>> an error. >>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this >>>> config item in the further. Therefore, the network protocol will >>>> always be TCP. >>>> >>>>> >>>>> Even with corosync3 it should be possible to use corosync in SCTP >>>>> (multiple rings) and the kernel dlm using TCP only, would this not be >>>>> possible with dlm_controld then? >>>> >>>> Only one case for above case: corosync3 on single-link. >>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP >>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return >>>> -EINVAL when 'dlm_local_count > 1'. >>>> >>> >>> I think we should change that condition then. >>> >>>> A key point for dlm is that there is no way to get the corosync version. >>>> This patch is compatible with corosync2 env. In corosync2, the user must >>>> correctly config rrp_mode when using 2-ring. >>>> >>> >>> So far I looked into it, it is anyway for detecting a protocol >>> according to some Corosync functionality it should still be possible >>> to always force dlm_controld using a different protocol by setting the >>> right config values/parameters. >> >> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass >> the detection phase when its value is "tcp|sctp". But in general, dlm.conf >> is seldom used. >> >> Unfortunately, corosync doesn't provide the api. >> ref: https://github.com/corosync/corosync/issues/771 > > I have the following scenario in my head with detect_protocol(). > > Currently, if somebody uses knet with UDP and has multiple Just a quick intervention here. If corosync is using knet, then it doesn't matter if knet's protocol is UDP or SCTP, it can still provide up to 8 links. There is no point in you checking that value (also, different links can use different protocols). Yes, check corosync's transport: key for knet/udp/udpu - THAT will tell you whether multi-link is available. So, the logic goes: multilink = false if totem.rrp_mode == 1 then multilink = true if totem.transport == 'knet' then multilink = true That should cover it. Chrissie > "nodelist.node.0.ring%d_addr" defined in Corosync but does not set > "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as > a parameter (it will use detect_protocol()"), then the DLM kernel will > use TCP. > > After your patch the behaviour will be changed and the DLM kernel will > use SCTP with the same configuration as before? > > - Alex > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-09 15:34 ` Alexander Aring 2025-01-09 15:38 ` christine caulfield @ 2025-01-10 14:28 ` Heming Zhao 2025-01-10 14:43 ` christine caulfield 1 sibling, 1 reply; 16+ messages in thread From: Heming Zhao @ 2025-01-10 14:28 UTC (permalink / raw) To: Alexander Aring Cc: teigland, ccaulfie, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 1/9/25 23:34, Alexander Aring wrote: > Hi Heming, > > On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote: >> >> On 1/8/25 23:54, Alexander Aring wrote: >>> Hi, >>> >>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote: >>>> >>>> On 1/7/25 02:11, Alexander Aring wrote: >>>>> Hi Heming, >>>>> >>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: >>>>>> >>>>>> The totem.rrp_mode config item was obsolete in corosync3. And >>>>>> this patch gives dlm_controld the ability to detect multiple >>>>>> links. >>>>>> >>>>>> The corosync and dlm network protocol relationship table: >>>>>> >>>>>> -------------+-----------------------+--------------------- >>>>>> | totem.transport=udpu | totem.transport=udp >>>>>> +-----------------------+--------------------- >>>>>> corosync 2.x | | | multicast >>>>>> | 1-ring | 2-ring |--------------------- >>>>>> | | | default | 2-ring >>>>>> -------------+------------+----------+--------------------- >>>>>> dlm | tcp | sctp | tcp | sctp >>>>>> -------------+------------+----------+--------------------- >>>>>> >>>>>> -------------+----------------------------+---------------------- >>>>>> | totem.transport = udpu/udp | totem.transport=knet >>>>>> corosync 3.x |----------------------------+---------------------- >>>>>> | 1-ring | 1-link | multi-links >>>>>> -------------+----------------------------+---------+----------- >>>>>> dlm | tcp | tcp | sctp >>>>>> -------------+----------------------------+---------+----------- >>>>>> >>>>>> At last, this patch should be work with updated kernel dlm module. >>>>> >>>>> I am not getting why the network protocol configuration has anything >>>>> to do with the corosync configuration. >>>>> I know that we currently get the address configurations from corosync >>>>> but with this patch we are forced to use SCTP when corosync provides >>>>> more than one "ring" configuration? >>>> >>>> Yes. this patch will force dlm to change to SCTP when corosync provides >>>> more than one "ring". >>>> >>>> The reason: >>>> (without this patch) When a user sets up multi-links on corosync3 >>>> and corosync.conf with an incorrect or missing rrp_mode, >>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report >>>> an error. >>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this >>>> config item in the further. Therefore, the network protocol will >>>> always be TCP. >>>> >>>>> >>>>> Even with corosync3 it should be possible to use corosync in SCTP >>>>> (multiple rings) and the kernel dlm using TCP only, would this not be >>>>> possible with dlm_controld then? >>>> >>>> Only one case for above case: corosync3 on single-link. >>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP >>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return >>>> -EINVAL when 'dlm_local_count > 1'. >>>> >>> >>> I think we should change that condition then. >>> >>>> A key point for dlm is that there is no way to get the corosync version. >>>> This patch is compatible with corosync2 env. In corosync2, the user must >>>> correctly config rrp_mode when using 2-ring. >>>> >>> >>> So far I looked into it, it is anyway for detecting a protocol >>> according to some Corosync functionality it should still be possible >>> to always force dlm_controld using a different protocol by setting the >>> right config values/parameters. >> >> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass >> the detection phase when its value is "tcp|sctp". But in general, dlm.conf >> is seldom used. >> >> Unfortunately, corosync doesn't provide the api. >> ref: https://github.com/corosync/corosync/issues/771 > > I have the following scenario in my head with detect_protocol(). > > Currently, if somebody uses knet with UDP and has multiple > "nodelist.node.0.ring%d_addr" defined in Corosync but does not set > "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as > a parameter (it will use detect_protocol()"), then the DLM kernel will > use TCP. Since you wrote knet above, so the corosync version is 3.x. For your description, there are four points/places to notice. 1. The above setting never works in the SUSE HA stack. The reason I wrote in the previous mail is that corosync will report error: > corosync[1284]: [MAIN ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none. 2. (you are right) DLM kernel will uses TCP If corosync doesn't complain that the rrp_mode is missing. The current code (without my patch), dlm_tools func detect_protocol() returns '-1', which makes the DLM kernel use TCP. 3. DLM kernel module doesn't work current code (without my patch), DLM kernel dlm_tcp_listen_validate() will return -EINVAL when 'dlm_local_count > 1'. 4. Corosync using UDP/SCTP is transparent for dlm. UDP/UDPU just means corosync is under single-link. this is one rule of corosync 3.x. knet means corosync is under multi-link. there may be only one link present, or up to 8 links present. > > After your patch the behaviour will be changed and the DLM kernel will > use SCTP with the same configuration as before? > According to the corosync/dlm behaviour in SUSE HA stack (ref above 4 points), my patch: - corosync 3.x env, forces the dlm to use TCP when only one link exists. - corosync 3.x env, forces the dlm kernel to use SCTP when more than one link exists. - keep the same behaviour when running corosync 2.x. (since user must correctly set rrp_mode). /Heming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-10 14:28 ` Heming Zhao @ 2025-01-10 14:43 ` christine caulfield 2025-01-13 3:12 ` Heming Zhao 0 siblings, 1 reply; 16+ messages in thread From: christine caulfield @ 2025-01-10 14:43 UTC (permalink / raw) To: Heming Zhao, Alexander Aring Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 10/01/2025 14:28, Heming Zhao wrote: > On 1/9/25 23:34, Alexander Aring wrote: >> Hi Heming, >> >> On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote: >>> >>> On 1/8/25 23:54, Alexander Aring wrote: >>>> Hi, >>>> >>>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> >>>> wrote: >>>>> >>>>> On 1/7/25 02:11, Alexander Aring wrote: >>>>>> Hi Heming, >>>>>> >>>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> >>>>>> wrote: >>>>>>> >>>>>>> The totem.rrp_mode config item was obsolete in corosync3. And >>>>>>> this patch gives dlm_controld the ability to detect multiple >>>>>>> links. >>>>>>> >>>>>>> The corosync and dlm network protocol relationship table: >>>>>>> >>>>>>> -------------+-----------------------+--------------------- >>>>>>> | totem.transport=udpu | totem.transport=udp >>>>>>> +-----------------------+--------------------- >>>>>>> corosync 2.x | | | multicast >>>>>>> | 1-ring | 2-ring |--------------------- >>>>>>> | | | default | 2-ring >>>>>>> -------------+------------+----------+--------------------- >>>>>>> dlm | tcp | sctp | tcp | sctp >>>>>>> -------------+------------+----------+--------------------- >>>>>>> >>>>>>> -------------+---------------------------- >>>>>>> +---------------------- >>>>>>> | totem.transport = udpu/udp | totem.transport=knet >>>>>>> corosync 3.x |---------------------------- >>>>>>> +---------------------- >>>>>>> | 1-ring | 1-link | multi- >>>>>>> links >>>>>>> -------------+----------------------------+---------+----------- >>>>>>> dlm | tcp | tcp | sctp >>>>>>> -------------+----------------------------+---------+----------- >>>>>>> >>>>>>> At last, this patch should be work with updated kernel dlm module. >>>>>> >>>>>> I am not getting why the network protocol configuration has anything >>>>>> to do with the corosync configuration. >>>>>> I know that we currently get the address configurations from corosync >>>>>> but with this patch we are forced to use SCTP when corosync provides >>>>>> more than one "ring" configuration? >>>>> >>>>> Yes. this patch will force dlm to change to SCTP when corosync >>>>> provides >>>>> more than one "ring". >>>>> >>>>> The reason: >>>>> (without this patch) When a user sets up multi-links on corosync3 >>>>> and corosync.conf with an incorrect or missing rrp_mode, >>>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and >>>>> report >>>>> an error. >>>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read >>>>> this >>>>> config item in the further. Therefore, the network protocol will >>>>> always be TCP. >>>>> >>>>>> >>>>>> Even with corosync3 it should be possible to use corosync in SCTP >>>>>> (multiple rings) and the kernel dlm using TCP only, would this not be >>>>>> possible with dlm_controld then? >>>>> >>>>> Only one case for above case: corosync3 on single-link. >>>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP >>>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return >>>>> -EINVAL when 'dlm_local_count > 1'. >>>>> >>>> >>>> I think we should change that condition then. >>>> >>>>> A key point for dlm is that there is no way to get the corosync >>>>> version. >>>>> This patch is compatible with corosync2 env. In corosync2, the user >>>>> must >>>>> correctly config rrp_mode when using 2-ring. >>>>> >>>> >>>> So far I looked into it, it is anyway for detecting a protocol >>>> according to some Corosync functionality it should still be possible >>>> to always force dlm_controld using a different protocol by setting the >>>> right config values/parameters. >>> >>> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can >>> bypass >>> the detection phase when its value is "tcp|sctp". But in general, >>> dlm.conf >>> is seldom used. >>> >>> Unfortunately, corosync doesn't provide the api. >>> ref: https://github.com/corosync/corosync/issues/771 >> >> I have the following scenario in my head with detect_protocol(). >> >> Currently, if somebody uses knet with UDP and has multiple >> "nodelist.node.0.ring%d_addr" defined in Corosync but does not set >> "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as >> a parameter (it will use detect_protocol()"), then the DLM kernel will >> use TCP. > > Since you wrote knet above, so the corosync version is 3.x. > For your description, there are four points/places to notice. > > 1. The above setting never works in the SUSE HA stack. > > The reason I wrote in the previous mail is that corosync will report error: >> corosync[1284]: [MAIN ] parse error in config: 2 is too many >> configured interfaces for the rrp_mode setting none. > > 2. (you are right) DLM kernel will uses TCP > > If corosync doesn't complain that the rrp_mode is missing. > The current code (without my patch), dlm_tools func detect_protocol() > returns '-1', which makes the DLM kernel use TCP. > > 3. DLM kernel module doesn't work > > current code (without my patch), DLM kernel dlm_tcp_listen_validate() > will return -EINVAL when 'dlm_local_count > 1'. > > 4. Corosync using UDP/SCTP is transparent for dlm. > > UDP/UDPU just means corosync is under single-link. this is one > rule of corosync 3.x. > knet means corosync is under multi-link. there may be only one > link present, or up to 8 links present. > >> >> After your patch the behaviour will be changed and the DLM kernel will >> use SCTP with the same configuration as before? >> > > According to the corosync/dlm behaviour in SUSE HA stack > (ref above 4 points), my patch: > - corosync 3.x env, forces the dlm to use TCP when only one link exists. That's dangerous though, because corosync3 can dynamically add and remove links while running. It's quite possible (and explicitly supported) to create a cluster with only 1 link, and then add others later. Chrissie > - corosync 3.x env, forces the dlm kernel to use SCTP when more than > one link exists. > - keep the same behaviour when running corosync 2.x. (since user must > correctly set rrp_mode). > > > /Heming > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-10 14:43 ` christine caulfield @ 2025-01-13 3:12 ` Heming Zhao 2025-01-17 15:11 ` Alexander Aring 0 siblings, 1 reply; 16+ messages in thread From: Heming Zhao @ 2025-01-13 3:12 UTC (permalink / raw) To: christine caulfield, Alexander Aring Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 1/10/25 22:43, christine caulfield wrote: > > > On 10/01/2025 14:28, Heming Zhao wrote: >> On 1/9/25 23:34, Alexander Aring wrote: >>> Hi Heming, >>> >>> On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote: >>>> >>>> On 1/8/25 23:54, Alexander Aring wrote: >>>>> Hi, >>>>> >>>>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote: >>>>>> >>>>>> On 1/7/25 02:11, Alexander Aring wrote: >>>>>>> Hi Heming, >>>>>>> >>>>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote: >>>>>>>> >>>>>>>> The totem.rrp_mode config item was obsolete in corosync3. And >>>>>>>> this patch gives dlm_controld the ability to detect multiple >>>>>>>> links. >>>>>>>> >>>>>>>> The corosync and dlm network protocol relationship table: >>>>>>>> >>>>>>>> -------------+-----------------------+--------------------- >>>>>>>> | totem.transport=udpu | totem.transport=udp >>>>>>>> +-----------------------+--------------------- >>>>>>>> corosync 2.x | | | multicast >>>>>>>> | 1-ring | 2-ring |--------------------- >>>>>>>> | | | default | 2-ring >>>>>>>> -------------+------------+----------+--------------------- >>>>>>>> dlm | tcp | sctp | tcp | sctp >>>>>>>> -------------+------------+----------+--------------------- >>>>>>>> >>>>>>>> -------------+---------------------------- +---------------------- >>>>>>>> | totem.transport = udpu/udp | totem.transport=knet >>>>>>>> corosync 3.x |---------------------------- +---------------------- >>>>>>>> | 1-ring | 1-link | multi- links >>>>>>>> -------------+----------------------------+---------+----------- >>>>>>>> dlm | tcp | tcp | sctp >>>>>>>> -------------+----------------------------+---------+----------- >>>>>>>> >>>>>>>> At last, this patch should be work with updated kernel dlm module. >>>>>>> >>>>>>> I am not getting why the network protocol configuration has anything >>>>>>> to do with the corosync configuration. >>>>>>> I know that we currently get the address configurations from corosync >>>>>>> but with this patch we are forced to use SCTP when corosync provides >>>>>>> more than one "ring" configuration? >>>>>> >>>>>> Yes. this patch will force dlm to change to SCTP when corosync provides >>>>>> more than one "ring". >>>>>> >>>>>> The reason: >>>>>> (without this patch) When a user sets up multi-links on corosync3 >>>>>> and corosync.conf with an incorrect or missing rrp_mode, >>>>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report >>>>>> an error. >>>>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this >>>>>> config item in the further. Therefore, the network protocol will >>>>>> always be TCP. >>>>>> >>>>>>> >>>>>>> Even with corosync3 it should be possible to use corosync in SCTP >>>>>>> (multiple rings) and the kernel dlm using TCP only, would this not be >>>>>>> possible with dlm_controld then? >>>>>> >>>>>> Only one case for above case: corosync3 on single-link. >>>>>> A new patch is needed for dlm to work over TCP when corosync3 in SCTP >>>>>> (multi-link mode). i.e. dlm_tcp_listen_validate() shouldn't return >>>>>> -EINVAL when 'dlm_local_count > 1'. >>>>>> >>>>> >>>>> I think we should change that condition then. >>>>> >>>>>> A key point for dlm is that there is no way to get the corosync version. >>>>>> This patch is compatible with corosync2 env. In corosync2, the user must >>>>>> correctly config rrp_mode when using 2-ring. >>>>>> >>>>> >>>>> So far I looked into it, it is anyway for detecting a protocol >>>>> according to some Corosync functionality it should still be possible >>>>> to always force dlm_controld using a different protocol by setting the >>>>> right config values/parameters. >>>> >>>> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass >>>> the detection phase when its value is "tcp|sctp". But in general, dlm.conf >>>> is seldom used. >>>> >>>> Unfortunately, corosync doesn't provide the api. >>>> ref: https://github.com/corosync/corosync/issues/771 >>> >>> I have the following scenario in my head with detect_protocol(). >>> >>> Currently, if somebody uses knet with UDP and has multiple >>> "nodelist.node.0.ring%d_addr" defined in Corosync but does not set >>> "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as >>> a parameter (it will use detect_protocol()"), then the DLM kernel will >>> use TCP. >> >> Since you wrote knet above, so the corosync version is 3.x. >> For your description, there are four points/places to notice. >> >> 1. The above setting never works in the SUSE HA stack. >> >> The reason I wrote in the previous mail is that corosync will report error: >>> corosync[1284]: [MAIN ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none. >> >> 2. (you are right) DLM kernel will uses TCP >> >> If corosync doesn't complain that the rrp_mode is missing. >> The current code (without my patch), dlm_tools func detect_protocol() >> returns '-1', which makes the DLM kernel use TCP. >> >> 3. DLM kernel module doesn't work >> >> current code (without my patch), DLM kernel dlm_tcp_listen_validate() >> will return -EINVAL when 'dlm_local_count > 1'. >> >> 4. Corosync using UDP/SCTP is transparent for dlm. >> >> UDP/UDPU just means corosync is under single-link. this is one >> rule of corosync 3.x. >> knet means corosync is under multi-link. there may be only one >> link present, or up to 8 links present. >> >>> >>> After your patch the behaviour will be changed and the DLM kernel will >>> use SCTP with the same configuration as before? >>> >> >> According to the corosync/dlm behaviour in SUSE HA stack >> (ref above 4 points), my patch: >> - corosync 3.x env, forces the dlm to use TCP when only one link exists. > > That's dangerous though, because corosync3 can dynamically add and remove links while running. It's quite possible (and explicitly supported) to create a cluster with only 1 link, and then add others later. > > Chrissie The current dlm code design doesn't allow reconfiguring the network protocol on the fly. In the above scenario, the dlm will maintain the TCP connection until the next dlm_deamon restart. In my view, it's not essential for dlm to follow the knet dynamically multi-link style. if the user hasn't set the 'protocol' item in dlm.conf, (with my patch, for knet env), dlm will detect the corosync nodelist on startup, and set the appropriate protocol mode. If the user want to keep maintain a multi-link for dlm, they should set protocl item in dlm.conf. On the other hand, if dlm needs to dynamically change the number of links during runtime, it should always use the SCTP protocol. Thanks, Heming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-13 3:12 ` Heming Zhao @ 2025-01-17 15:11 ` Alexander Aring 2025-01-17 15:16 ` christine caulfield 0 siblings, 1 reply; 16+ messages in thread From: Alexander Aring @ 2025-01-17 15:11 UTC (permalink / raw) To: Heming Zhao Cc: christine caulfield, teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou Hi, On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote: ... > > The current dlm code design doesn't allow reconfiguring the network > protocol on the fly. In the above scenario, the dlm will maintain the > TCP connection until the next dlm_deamon restart. > > In my view, it's not essential for dlm to follow the knet dynamically > multi-link style. if the user hasn't set the 'protocol' item in > dlm.conf, (with my patch, for knet env), dlm will detect the corosync > nodelist on startup, and set the appropriate protocol mode. > > If the user want to keep maintain a multi-link for dlm, they should > set protocl item in dlm.conf. > > On the other hand, if dlm needs to dynamically change the number of > links during runtime, it should always use the SCTP protocol. to move on here, I would at first apply/ack the patches for the static maximum address increase. I see no problems to do that as this is required by you. For the automatic protocol detection and corosync3, at least I need to doing some tests as I have currently 0 ideas how corosync informs the users about new/removed links during runtime. - Alex ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-17 15:11 ` Alexander Aring @ 2025-01-17 15:16 ` christine caulfield 2025-02-18 11:46 ` Heming Zhao 0 siblings, 1 reply; 16+ messages in thread From: christine caulfield @ 2025-01-17 15:16 UTC (permalink / raw) To: Alexander Aring, Heming Zhao Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 17/01/2025 15:11, Alexander Aring wrote: > Hi, > > On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote: > ... >> >> The current dlm code design doesn't allow reconfiguring the network >> protocol on the fly. In the above scenario, the dlm will maintain the >> TCP connection until the next dlm_deamon restart. >> >> In my view, it's not essential for dlm to follow the knet dynamically >> multi-link style. if the user hasn't set the 'protocol' item in >> dlm.conf, (with my patch, for knet env), dlm will detect the corosync >> nodelist on startup, and set the appropriate protocol mode. >> >> If the user want to keep maintain a multi-link for dlm, they should >> set protocl item in dlm.conf. >> >> On the other hand, if dlm needs to dynamically change the number of >> links during runtime, it should always use the SCTP protocol. > > to move on here, I would at first apply/ack the patches for the static > maximum address increase. > > I see no problems to do that as this is required by you. > > For the automatic protocol detection and corosync3, at least I need to > doing some tests as I have currently 0 ideas how corosync informs the > users about new/removed links during runtime. > To be honest, it mostly doesn't - and it's maybe something we could look into, though I doubt it would be high priority as this is the first time it's been thought of :) Currently you can watch for the cmap key config.reload_in_progress and after that goes from 1 back to 0 re-read the new corosync.conf. Chrissie > - Alex > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-01-17 15:16 ` christine caulfield @ 2025-02-18 11:46 ` Heming Zhao 2025-02-18 16:35 ` Alexander Aring 0 siblings, 1 reply; 16+ messages in thread From: Heming Zhao @ 2025-02-18 11:46 UTC (permalink / raw) To: christine caulfield, Alexander Aring Cc: teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou Hello Christine and Alexander, On 1/17/25 23:16, christine caulfield wrote: > On 17/01/2025 15:11, Alexander Aring wrote: >> Hi, >> >> On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote: >> ... >>> >>> The current dlm code design doesn't allow reconfiguring the network >>> protocol on the fly. In the above scenario, the dlm will maintain the >>> TCP connection until the next dlm_deamon restart. >>> >>> In my view, it's not essential for dlm to follow the knet dynamically >>> multi-link style. if the user hasn't set the 'protocol' item in >>> dlm.conf, (with my patch, for knet env), dlm will detect the corosync >>> nodelist on startup, and set the appropriate protocol mode. >>> >>> If the user want to keep maintain a multi-link for dlm, they should >>> set protocl item in dlm.conf. >>> >>> On the other hand, if dlm needs to dynamically change the number of >>> links during runtime, it should always use the SCTP protocol. >> >> to move on here, I would at first apply/ack the patches for the static >> maximum address increase. >> >> I see no problems to do that as this is required by you. >> >> For the automatic protocol detection and corosync3, at least I need to >> doing some tests as I have currently 0 ideas how corosync informs the >> users about new/removed links during runtime. >> > > To be honest, it mostly doesn't - and it's maybe something we could look into, though I doubt it would be high priority as this is the first time it's been thought of :) > > Currently you can watch for the cmap key config.reload_in_progress and after that goes from 1 back to 0 re-read the new corosync.conf. > > Chrissie > >> - Alex >> > Sorry to raise this context. I suspect Chrissie misunderstood Alex's writing, Alex comment is about another patch [1], not this patch. In my view, it's not harmful to increase the address number in kernel space. The multiple addresses are SCTP specific, Even if some of the eight links go down/break, with the SCTP redundant path feature, the SCTP can still work normally. [1]: https://lore.kernel.org/gfs2/20241220071418.16786-2-heming.zhao@suse.com/ - Heming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-02-18 11:46 ` Heming Zhao @ 2025-02-18 16:35 ` Alexander Aring 2025-02-20 3:56 ` Heming Zhao 0 siblings, 1 reply; 16+ messages in thread From: Alexander Aring @ 2025-02-18 16:35 UTC (permalink / raw) To: Heming Zhao Cc: christine caulfield, teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou Hi, On Tue, Feb 18, 2025 at 6:46 AM Heming Zhao <heming.zhao@suse.com> wrote: > > Hello Christine and Alexander, > > On 1/17/25 23:16, christine caulfield wrote: > > On 17/01/2025 15:11, Alexander Aring wrote: > >> Hi, > >> > >> On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote: > >> ... > >>> > >>> The current dlm code design doesn't allow reconfiguring the network > >>> protocol on the fly. In the above scenario, the dlm will maintain the > >>> TCP connection until the next dlm_deamon restart. > >>> > >>> In my view, it's not essential for dlm to follow the knet dynamically > >>> multi-link style. if the user hasn't set the 'protocol' item in > >>> dlm.conf, (with my patch, for knet env), dlm will detect the corosync > >>> nodelist on startup, and set the appropriate protocol mode. > >>> > >>> If the user want to keep maintain a multi-link for dlm, they should > >>> set protocl item in dlm.conf. > >>> > >>> On the other hand, if dlm needs to dynamically change the number of > >>> links during runtime, it should always use the SCTP protocol. > >> > >> to move on here, I would at first apply/ack the patches for the static > >> maximum address increase. > >> > >> I see no problems to do that as this is required by you. > >> > >> For the automatic protocol detection and corosync3, at least I need to > >> doing some tests as I have currently 0 ideas how corosync informs the > >> users about new/removed links during runtime. > >> > > > > To be honest, it mostly doesn't - and it's maybe something we could look into, though I doubt it would be high priority as this is the first time it's been thought of :) > > > > Currently you can watch for the cmap key config.reload_in_progress and after that goes from 1 back to 0 re-read the new corosync.conf. > > > > Chrissie > > > >> - Alex > >> > > > > Sorry to raise this context. > > I suspect Chrissie misunderstood Alex's writing, Alex comment is about another > patch [1], not this patch. > In my view, it's not harmful to increase the address number in kernel space. good news, the patch is already upstream. [0] Yes it is indeed not harmful to increase that value. We need to test/discuss the user space changes with corosync v2 vs v3 related to detect_protocol()? If you split this patch to have only the "MAX_NODE_ADDRESSES", then I can already apply this change. Btw: should it not be 9 (kernel 8) when previously it was 4 (kernel 3)? I feel there might be a termination entry at the end of the array? - Alex [0] https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/commit/?h=next&id=a53a6336171bd722aac6e98964a79d56841c5416 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link 2025-02-18 16:35 ` Alexander Aring @ 2025-02-20 3:56 ` Heming Zhao 0 siblings, 0 replies; 16+ messages in thread From: Heming Zhao @ 2025-02-20 3:56 UTC (permalink / raw) To: Alexander Aring Cc: christine caulfield, teigland, jfriesse, nicholas.yang, glass.su, gfs2, Roger Zhou On 2/19/25 00:35, Alexander Aring wrote: > Hi, > > On Tue, Feb 18, 2025 at 6:46 AM Heming Zhao <heming.zhao@suse.com> wrote: >> >> Hello Christine and Alexander, >> >> On 1/17/25 23:16, christine caulfield wrote: >>> On 17/01/2025 15:11, Alexander Aring wrote: >>>> Hi, >>>> >>>> On Sun, Jan 12, 2025 at 10:13 PM Heming Zhao <heming.zhao@suse.com> wrote: >>>> ... >>>>> >>>>> The current dlm code design doesn't allow reconfiguring the network >>>>> protocol on the fly. In the above scenario, the dlm will maintain the >>>>> TCP connection until the next dlm_deamon restart. >>>>> >>>>> In my view, it's not essential for dlm to follow the knet dynamically >>>>> multi-link style. if the user hasn't set the 'protocol' item in >>>>> dlm.conf, (with my patch, for knet env), dlm will detect the corosync >>>>> nodelist on startup, and set the appropriate protocol mode. >>>>> >>>>> If the user want to keep maintain a multi-link for dlm, they should >>>>> set protocl item in dlm.conf. >>>>> >>>>> On the other hand, if dlm needs to dynamically change the number of >>>>> links during runtime, it should always use the SCTP protocol. >>>> >>>> to move on here, I would at first apply/ack the patches for the static >>>> maximum address increase. >>>> >>>> I see no problems to do that as this is required by you. >>>> >>>> For the automatic protocol detection and corosync3, at least I need to >>>> doing some tests as I have currently 0 ideas how corosync informs the >>>> users about new/removed links during runtime. >>>> >>> >>> To be honest, it mostly doesn't - and it's maybe something we could look into, though I doubt it would be high priority as this is the first time it's been thought of :) >>> >>> Currently you can watch for the cmap key config.reload_in_progress and after that goes from 1 back to 0 re-read the new corosync.conf. >>> >>> Chrissie >>> >>>> - Alex >>>> >>> >> >> Sorry to raise this context. >> >> I suspect Chrissie misunderstood Alex's writing, Alex comment is about another >> patch [1], not this patch. >> In my view, it's not harmful to increase the address number in kernel space. > > good news, the patch is already upstream. [0] Yes it is indeed not > harmful to increase that value. > > We need to test/discuss the user space changes with corosync v2 vs v3 > related to detect_protocol()? Welcome to discuss. If we want to follow corosync/knet style to support dynamically network link changes, more code modifications will be necessary. > > If you split this patch to have only the "MAX_NODE_ADDRESSES", then I zhm: I will send a split patch for 'MAX_NODE_ADDRESSES'. > can already apply this change. Btw: should it not be 9 (kernel 8) when > previously it was 4 (kernel 3)? I feel there might be a termination > entry at the end of the array? The macro used in both kernel-space and user-space play a termination role. However, in my view, both values are incorrect. The correct and workable values for Corosync 2 are: - DLM_MAX_ADDR_COUNT (kernel-space) should be 2, not 3 - MAX_NODE_ADDRESSES (user-space) should be 2, not 4. The original author seems to use an enough number to accommodate potential future increases in network links. Regarding MAX_NODE_ADDRESSES, the dlm_controld process uses corosync_cfg_get_node_addrs() to request a maximum of four addresses. However, Corosync 2 only supports a two-ring network. Similarly, the current DLM kernel code can accept three sets of network addresses. I tested the modifications with DLM_MAX_ADDR_COUNT set to 2 and MAX_NODE_ADDRESSES also set to 2, and the cluster operated normally. So for my patches: - in kernel-space, I define DLM_MAX_ADDR_COUNT as 8 (not 9) - in user-space, I define MAX_NODE_ADDRESSES as 8 (not the kernel value plus 1) - I tested the code before sending both the v1 and v2 patches. If you very like the style of kernel value plus 1, I am OK to define MAX_NODE_ADDRESSES as 9. - Heming ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-02-20 3:56 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-24 8:42 [PATCH 0/1] dlm_tools: support corosync3/knet multi-link Heming Zhao 2024-12-24 8:42 ` [PATCH v2 1/1] dlm_controld: " Heming Zhao 2025-01-06 18:11 ` Alexander Aring 2025-01-07 4:59 ` Heming Zhao 2025-01-08 15:54 ` Alexander Aring 2025-01-09 2:26 ` Heming Zhao 2025-01-09 15:34 ` Alexander Aring 2025-01-09 15:38 ` christine caulfield 2025-01-10 14:28 ` Heming Zhao 2025-01-10 14:43 ` christine caulfield 2025-01-13 3:12 ` Heming Zhao 2025-01-17 15:11 ` Alexander Aring 2025-01-17 15:16 ` christine caulfield 2025-02-18 11:46 ` Heming Zhao 2025-02-18 16:35 ` Alexander Aring 2025-02-20 3:56 ` Heming Zhao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox