From: Heming Zhao <heming.zhao@suse.com>
To: christine caulfield <ccaulfie@redhat.com>,
Alexander Aring <aahringo@redhat.com>
Cc: teigland@redhat.com, jfriesse@redhat.com, nicholas.yang@suse.com,
glass.su@suse.com, gfs2@lists.linux.dev,
Roger Zhou <ZZhou@suse.com>
Subject: Re: [PATCH v2 1/1] dlm_controld: support corosync3/knet multi-link
Date: Mon, 13 Jan 2025 11:12:48 +0800
Message-ID: <55a997d3-df7b-4ead-8ddd-d4819ca95cf0@suse.com>
In-Reply-To: <aaec163a-fadf-4c7d-a193-9b0eb97d584b@redhat.com>
On 1/10/25 22:43, christine caulfield wrote:
>
>
> On 10/01/2025 14:28, Heming Zhao wrote:
>> On 1/9/25 23:34, Alexander Aring wrote:
>>> Hi Heming,
>>>
>>> On Wed, Jan 8, 2025 at 9:26 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>
>>>> On 1/8/25 23:54, Alexander Aring wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Jan 6, 2025 at 11:59 PM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>>>
>>>>>> On 1/7/25 02:11, Alexander Aring wrote:
>>>>>>> Hi Heming,
>>>>>>>
>>>>>>> On Tue, Dec 24, 2024 at 3:42 AM Heming Zhao <heming.zhao@suse.com> wrote:
>>>>>>>>
>>>>>>>> The totem.rrp_mode config item is obsolete in corosync3. This
>>>>>>>> patch gives dlm_controld the ability to detect multiple
>>>>>>>> links.
>>>>>>>>
>>>>>>>> The corosync and dlm network protocol relationship table:
>>>>>>>>
>>>>>>>> -------------+-----------------------+---------------------
>>>>>>>>              | totem.transport=udpu  | totem.transport=udp
>>>>>>>>              +-----------------------+---------------------
>>>>>>>> corosync 2.x |            |          |      multicast
>>>>>>>>              |   1-ring   |  2-ring  |---------------------
>>>>>>>>              |            |          | default  | 2-ring
>>>>>>>> -------------+------------+----------+---------------------
>>>>>>>> dlm          |    tcp     |   sctp   |   tcp    |  sctp
>>>>>>>> -------------+------------+----------+---------------------
>>>>>>>>
>>>>>>>> -------------+----------------------------+---------------------
>>>>>>>>              | totem.transport = udpu/udp | totem.transport=knet
>>>>>>>> corosync 3.x |----------------------------+---------------------
>>>>>>>>              |           1-ring           | 1-link  | multi-links
>>>>>>>> -------------+----------------------------+---------+-----------
>>>>>>>> dlm          |            tcp             |   tcp   |   sctp
>>>>>>>> -------------+----------------------------+---------+-----------
>>>>>>>>
>>>>>>>> Finally, this patch should work with the updated kernel dlm module.
>>>>>>>
>>>>>>> I am not getting why the network protocol configuration has anything
>>>>>>> to do with the corosync configuration.
>>>>>>> I know that we currently get the address configurations from corosync
>>>>>>> but with this patch we are forced to use SCTP when corosync provides
>>>>>>> more than one "ring" configuration?
>>>>>>
>>>>>> Yes, this patch will force dlm to switch to SCTP when corosync provides
>>>>>> more than one "ring".
>>>>>>
>>>>>> The reason:
>>>>>> (without this patch) When a user sets up multi-links on corosync3
>>>>>> and corosync.conf with an incorrect or missing rrp_mode,
>>>>>> dlm_tcp_listen_validate() will trigger 'dlm_local_count > 1' and report
>>>>>> an error.
>>>>>> Please note, rrp_mode is obsolete; the dlm_daemon will fail to read this
>>>>>> config item in the future. Therefore, the network protocol will
>>>>>> always be TCP.
>>>>>>
>>>>>>>
>>>>>>> Even with corosync3 it should be possible to use corosync in SCTP
>>>>>>> (multiple rings) and the kernel dlm using TCP only, would this not be
>>>>>>> possible with dlm_controld then?
>>>>>>
>>>>>> That works in only one case: corosync3 on a single link.
>>>>>> A new patch is needed for dlm to work over TCP when corosync3 is in SCTP
>>>>>> (multi-link mode), i.e. dlm_tcp_listen_validate() shouldn't return
>>>>>> -EINVAL when 'dlm_local_count > 1'.
>>>>>>
>>>>>
>>>>> I think we should change that condition then.
>>>>>
>>>>>> A key point for dlm is that there is no way to get the corosync version.
>>>>>> This patch is compatible with a corosync2 env. In corosync2, the user must
>>>>>> configure rrp_mode correctly when using 2-ring.
>>>>>>
>>>>>
>>>>> As far as I have looked into it, this is only about detecting a
>>>>> protocol from some Corosync functionality; it should still be possible
>>>>> to always force dlm_controld to use a different protocol by setting the
>>>>> right config values/parameters.
>>>>
>>>> Yes, I forgot the config item 'protocol=[detect|tcp|sctp]', which can bypass
>>>> the detection phase when its value is "tcp|sctp". But in general, dlm.conf
>>>> is seldom used.
>>>>
>>>> Unfortunately, corosync doesn't provide the api.
>>>> ref: https://github.com/corosync/corosync/issues/771
>>>
>>> I have the following scenario in my head with detect_protocol().
>>>
>>> Currently, if somebody uses knet with UDP and has multiple
>>> "nodelist.node.0.ring%d_addr" defined in Corosync but does not set
>>> "totem.rrp_mode" and there is no "protocol" setting in dlm.conf or as
>>> a parameter (it will use detect_protocol()), then the DLM kernel will
>>> use TCP.
>>
>> Since you wrote knet above, the corosync version is 3.x.
>> In your description, there are four points to notice.
>>
>> 1. The above setting never works in the SUSE HA stack.
>>
>> The reason, as I wrote in the previous mail, is that corosync will report an error:
>>> corosync[1284]: [MAIN ] parse error in config: 2 is too many configured interfaces for the rrp_mode setting none.
>>
>> 2. (you are right) The DLM kernel will use TCP
>>
>> If corosync doesn't complain that rrp_mode is missing, then with the
>> current code (without my patch) the dlm_tools function
>> detect_protocol() returns '-1', which makes the DLM kernel use TCP.
>>
>> 3. DLM kernel module doesn't work
>>
>> With the current code (without my patch), the DLM kernel function
>> dlm_tcp_listen_validate() returns -EINVAL when 'dlm_local_count > 1'.
>>
>> 4. Corosync using UDP/SCTP is transparent for dlm.
>>
>> UDP/UDPU just means corosync is running on a single link; this is a
>> rule of corosync 3.x.
>> knet means corosync is multi-link capable: there may be only one
>> link present, or up to 8 links.
>>
>>>
>>> After your patch the behaviour will be changed and the DLM kernel will
>>> use SCTP with the same configuration as before?
>>>
>>
>> Given the corosync/dlm behaviour in the SUSE HA stack
>> (see the 4 points above), my patch:
>> - in a corosync 3.x env, forces dlm to use TCP when only one link exists.
>
> That's dangerous though, because corosync3 can dynamically add and remove links while running. It's quite possible (and explicitly supported) to create a cluster with only 1 link, and then add others later.
>
> Chrissie
The current dlm code design doesn't allow reconfiguring the network
protocol on the fly. In the above scenario, dlm will keep using
TCP until the next dlm_daemon restart.
In my view, it's not essential for dlm to follow knet's dynamic
multi-link style. If the user hasn't set the 'protocol' item in
dlm.conf, then (with my patch, in a knet env) dlm will inspect the
corosync nodelist on startup and set the appropriate protocol mode.
If the user wants to keep multi-link available for dlm, they should
set the protocol item in dlm.conf.
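
To make the startup decision concrete, here is a minimal sketch. It is
not the dlm_controld code: the names choose_protocol, cfg_protocol and
link_count are hypothetical, and it assumes the link count has already
been read from the corosync nodelist when the daemon starts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for illustration; not the dlm_controld source. */
enum proto { PROTO_TCP, PROTO_SCTP };

/*
 * cfg_protocol: value of the "protocol" item from dlm.conf, or NULL
 *               when the user has not set it.
 * link_count:   number of links found for the local node in the
 *               corosync nodelist at daemon startup (assumed input).
 */
static enum proto choose_protocol(const char *cfg_protocol, int link_count)
{
    assert(link_count >= 1);

    /* An explicit "tcp" or "sctp" setting bypasses detection. */
    if (cfg_protocol && strcmp(cfg_protocol, "tcp") == 0)
        return PROTO_TCP;
    if (cfg_protocol && strcmp(cfg_protocol, "sctp") == 0)
        return PROTO_SCTP;

    /* Unset or "detect": one link -> TCP, several links -> SCTP. */
    return link_count > 1 ? PROTO_SCTP : PROTO_TCP;
}
```

With this sketch, a cluster that starts on one link and has no protocol
setting gets TCP and stays on it until restart, while an explicit
"sctp" setting selects SCTP even when only one link exists at startup.
The sketch only picks a protocol; it does not model the kernel-side
check in dlm_tcp_listen_validate().
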
On the other hand, if dlm needs to dynamically change the number of
links during runtime, it should always use the SCTP protocol.
Thanks,
Heming