public inbox for linux-rdma@vger.kernel.org
From: "D. Wythe" <alibuda@linux.alibaba.com>
To: Mat Martineau <martineau@kernel.org>,
	Matthieu Baerts <matttbe@kernel.org>
Cc: kgraul@linux.ibm.com, wenjia@linux.ibm.com, jaka@linux.ibm.com,
	wintera@linux.ibm.com, guwen@linux.alibaba.com, kuba@kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org,
	tonylu@linux.alibaba.com, Paolo Abeni <pabeni@redhat.com>,
	edumazet@google.com
Subject: Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
Date: Sat, 8 Jun 2024 03:35:12 +0800
Message-ID: <e6b66001-f3cb-4367-aeaf-600fbc5f77b2@linux.alibaba.com>
In-Reply-To: <61b94bf6-a383-afff-db62-261cac7360c7@kernel.org>



On 6/8/24 12:47 AM, Mat Martineau wrote:
> On Fri, 7 Jun 2024, Matthieu Baerts wrote:
>
>> Hi D.Wythe,
>>
>> On 07/06/2024 07:09, D. Wythe wrote:
>>>
>>> On 6/7/24 5:22 AM, Mat Martineau wrote:
>>>> On Wed, 5 Jun 2024, D. Wythe wrote:
>>>>
>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>
>>>>> This patch allows creating SMC sockets via AF_INET and AF_INET6,
>>>>> as in the following code:
>>>>>
>>>>> /* create v4 smc sock */
>>>>> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>>>>>
>>>>> /* create v6 smc sock */
>>>>> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>>>>>
>>>>> There are several reasons why we believe it is appropriate here:
>>>>>
>>>>> 1. SMC sockets actually use IPv4 (AF_INET) or IPv6 (AF_INET6)
>>>>> addresses. There is no such thing as an AF_SMC address.
>>>>>
>>>>> 2. Creating SMC sockets in the AF_INET(6) path allows us to reuse
>>>>> the AF_INET(6) infrastructure, such as the common eBPF hooks.
>>>>> Otherwise, SMC would have to reimplement it in the AF_SMC path.
>>>>>
>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>>>> ---
>>>>> include/uapi/linux/in.h |   2 +
>>>>> net/smc/Makefile        |   2 +-
>>>>> net/smc/af_smc.c        |  16 ++++-
>>>>> net/smc/smc_inet.c      | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> net/smc/smc_inet.h      |  22 +++++++
>>>>> 5 files changed, 208 insertions(+), 3 deletions(-)
>>>>> create mode 100644 net/smc/smc_inet.c
>>>>> create mode 100644 net/smc/smc_inet.h
>>>>>
>>>>> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
>>>>> index e682ab6..0c6322b 100644
>>>>> --- a/include/uapi/linux/in.h
>>>>> +++ b/include/uapi/linux/in.h
>>>>> @@ -83,6 +83,8 @@ enum {
>>>>> #define IPPROTO_RAW        IPPROTO_RAW
>>>>>   IPPROTO_MPTCP = 262,        /* Multipath TCP connection */
>>>>> #define IPPROTO_MPTCP        IPPROTO_MPTCP
>>>>> +  IPPROTO_SMC = 263,        /* Shared Memory Communications        */
>>>>> +#define IPPROTO_SMC        IPPROTO_SMC
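A minimal sketch of the usage from the patch description, for readers who want to try it. It defines IPPROTO_SMC as 263 (the value proposed in this v6 patch, still under discussion below) only when the installed headers lack it, and it returns -errno instead of aborting, since socket() is expected to fail on kernels without IPPROTO_SMC support. The helper name open_smc_socket() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_SMC
/* Assumption: the value proposed in this v6 patch; it is still under
 * discussion and may differ in the final UAPI header. */
#define IPPROTO_SMC 263
#endif

/* Create an SMC stream socket that uses plain IPv4 or IPv6 addressing.
 * Returns the new fd, or -errno on failure (e.g. EPROTONOSUPPORT on a
 * kernel without IPPROTO_SMC support). */
static int open_smc_socket(int family)
{
    assert(family == AF_INET || family == AF_INET6);

    int fd = socket(family, SOCK_STREAM, IPPROTO_SMC);
    return fd >= 0 ? fd : -errno;
}
```

For example, `int v4 = open_smc_socket(AF_INET);` corresponds to the v4 call above; a non-negative result is an ordinary fd to close() when done.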
>>>>
>>>> Hello,
>>>>
>>>> It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC
>>>> value. Making IPPROTO_MAX larger does increase the size of the
>>>> inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC
>>>> without increasing IPPROTO_MAX.
>>>>
>>>> Just for background: When we added IPPROTO_MPTCP, we chose 262 because
>>>> it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits
>>>> wide so we knew we would not conflict with any future additions, and
>>>> in the case of MPTCP it was convenient that truncating the proto value
>>>> to 8 bits would match IPPROTO_TCP.
>>>>
>>>> - Mat
>>>>
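The arithmetic behind this can be checked directly. The sketch below uses local copies of the constants (PROTO_TCP and PROTO_MPTCP are hypothetical names, not the UAPI macros) and models inet_diag_table as one pointer slot per protocol value below IPPROTO_MAX; that model is an assumption made to show the order of growth, not a copy of the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the UAPI values, so this compiles with any headers. */
enum {
    PROTO_TCP   = 6,
    PROTO_MPTCP = PROTO_TCP + 0x100,   /* 262 */
};

/* IANA protocol numbers are 8 bits wide: values above 0xFF can never be
 * assigned by IANA, so they cannot collide with a future allocation. */
static int beyond_iana_range(int proto)
{
    return proto > 0xFF;
}

/* The low 8 bits of a protocol value; for MPTCP this yields TCP. */
static int low_byte(int proto)
{
    return proto & 0xFF;
}

/* Hypothetical model of a per-protocol table holding one pointer per
 * value in [0, max_proto), where max_proto plays the role of IPPROTO_MAX. */
static size_t table_bytes(int max_proto)
{
    assert(max_proto > 0);
    return (size_t)max_proto * sizeof(void *);
}

/* Extra table space needed to register new_proto, given that the
 * largest value currently in use is `largest`. */
static size_t growth_for(int new_proto, int largest)
{
    if (new_proto <= largest)
        return 0;                       /* fits below the current max */
    return table_bytes(new_proto + 1) - table_bytes(largest + 1);
}
```

Under that model, 256 and 261 cost nothing, while 263 pushes the maximum from 263 to 264 and adds one pointer (8 bytes on a typical 64-bit build), which is the "small waste of space" mentioned later in the thread.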
>>>
>>> Hi Mat,
>>>
>>> Thank you very much for your feedback. I have always been curious about
>>> the origins of IPPROTO_MPTCP, and I am glad to have learned about it.
>>>
>
> Hi D. Wythe -
>
> Sure, you're welcome!
>
>>> Regarding the size of inet_diag_table, your point does make sense.
>>> However, we would still like to keep 263, even though the rationale
>>> may not be entirely sufficient: this series has been under community
>>> review for quite some time, we haven't received any feedback about
>>> this value, and we have already been using it in some user-space
>>> tools ... 🙁
>>>
>
> It's definitely a tradeoff between the Linux UAPI that gets locked in 
> forever vs. handling a transition with your userspace tools. If you 
> change the numeric value of IPPROTO_SMC on the open source side you 
> could transition internally by carrying a kernel patch that allows 
> both the new and old value.
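Mat's suggestion is kernel-side (accept both values). A userspace tool could complement it during the transition. The following is a hypothetical sketch, not something proposed in the thread: try the new number first, and retry with the legacy 263 only when the kernel reports EPROTONOSUPPORT. SMC_PROTO_NEW = 256 is a placeholder picked from the candidates discussed later in this thread, and the socket function is passed in as a parameter so the fallback logic can be exercised with a stub.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Placeholders for illustration only: 256 is one of the candidates raised
 * in this thread (not a final value); 263 is the value from the v6 patch
 * that existing tools already use. */
#define SMC_PROTO_NEW    256
#define SMC_PROTO_LEGACY 263

/* Same shape as socket(2), so the real socket() can be passed in. */
typedef int (*socket_fn)(int domain, int type, int protocol);

/* Try the new protocol number first; retry with the legacy number only if
 * the kernel rejects the new one as unsupported. On success, store the
 * number that worked in *used_proto and return the fd; else return -errno. */
static int open_smc_compat(socket_fn create, int family, int *used_proto)
{
    assert(create != NULL && used_proto != NULL);

    int fd = create(family, SOCK_STREAM, SMC_PROTO_NEW);
    if (fd >= 0) {
        *used_proto = SMC_PROTO_NEW;
        return fd;
    }
    if (errno != EPROTONOSUPPORT)
        return -errno;              /* a different error: don't mask it */

    fd = create(family, SOCK_STREAM, SMC_PROTO_LEGACY);
    if (fd < 0)
        return -errno;
    *used_proto = SMC_PROTO_LEGACY;
    return fd;
}
```

In a real tool the call would be `open_smc_compat(socket, AF_INET, &proto)`. Other errors are returned as-is so that, for example, a permission failure is not masked by a second attempt.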
>
>>> I would like to hear what the community thinks. If everyone agrees that
>>> using 263 would be completely unacceptable, a disaster even, then we
>>> will have no choice but to change it.
>>
>> It will not be a disaster, but a small waste of space (even if
>> CONFIG_SMC is not set).
>
> Well stated Matthieu :)  I chose my "not required" wording carefully, 
> as I didn't want to demand a change here but to make you aware of some 
> of the tradeoffs to consider. And thankfully Matthieu remembered the 
> userspace issues below.
>
> Also, I see that one of the netdev maintainers flagged this v6 series 
> as "changes requested" in patchwork so that may indicate their 
> preference?
>
>>
>> Also, please note that the introduction of IPPROTO_MPTCP caused some
>> troubles in some userspace programs. That was mainly because IPPROTO_MAX
>> got updated, and they didn't expect that, e.g. a quick search on GitHub
>> gave me this:
>>
>>  https://github.com/systemd/systemd/issues/15604
>>  https://github.com/strace/strace/issues/164
>>  https://github.com/rust-lang/libc/issues/1896
>>
>> I guess these userspace programs should now be ready for a new update,
>> but still, it might be better to avoid that if there is a "simple" 
>> solution.
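As a generic illustration of this class of problem (hypothetical code; the linked issues may differ in detail), a tool that indexes a table by protocol number should bounds-check against the table it actually has, rather than assuming every value reported by the kernel fits in 8 bits or stays below the IPPROTO_MAX it was compiled with:

```c
#include <stddef.h>

/* Hypothetical name table, indexed directly by protocol number. Entries
 * not listed are zero-initialized, i.e. NULL. */
static const char *const proto_names[] = {
    [6]   = "tcp",
    [17]  = "udp",
    [262] = "mptcp",
};

#define N_PROTO_NAMES (sizeof(proto_names) / sizeof(proto_names[0]))

/* Look up a protocol name without assuming the value fits in 8 bits or
 * below whatever IPPROTO_MAX this tool was built against. Unknown or
 * out-of-range values (e.g. a protocol added in a newer kernel) yield
 * NULL instead of an out-of-bounds read. */
static const char *proto_name(int proto)
{
    if (proto < 0 || (size_t)proto >= N_PROTO_NAMES)
        return NULL;
    return proto_names[proto];
}
```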
>>
>> I understand changing your userspace tools will be annoying. (On the
>> other hand, it is still time to do that :) )
>
> Agreed!
>
>
> - Mat


Hi Mat and Matthieu,

Thanks very much for your feedback! The reasons you have both given are
quite convincing. In fact, as I mentioned earlier, I don't really have
sufficient grounds to insist on 263. It seems it's time for a change. 😉

Regarding the new value of IPPROTO_SMC, do you have any recommendations?
Which one might be better, 256 or 261?
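For comparing the candidates: both 256 and 261 fall inside the range Mat pointed out, above the 8-bit IANA space and below IPPROTO_MPTCP, so IPPROTO_MAX stays at 263. A small sketch that enumerates the free values in that range (PROTO_MPTCP, PROTO_MAX and free_candidates() are hypothetical names):

```c
#include <stddef.h>

#define PROTO_MPTCP 262               /* largest value currently in use */
#define PROTO_MAX   (PROTO_MPTCP + 1) /* today's IPPROTO_MAX: 263 */

/* Write the values in [256, PROTO_MAX) that are not in taken[] to out[],
 * i.e. the Linux-only numbers that need no IPPROTO_MAX bump. Returns the
 * number of values written, at most cap. */
static size_t free_candidates(const int *taken, size_t n_taken,
                              int *out, size_t cap)
{
    size_t n = 0;
    for (int v = 0x100; v < PROTO_MAX && n < cap; v++) {
        int used = 0;
        for (size_t i = 0; i < n_taken; i++)
            if (taken[i] == v)
                used = 1;
        if (!used)
            out[n++] = v;
    }
    return n;
}
```

With only IPPROTO_MPTCP taken this yields 256 through 261. The two endpoints differ only in the value they truncate to when masked to 8 bits (0 for 256, 5 for 261), which would matter only to code that performs such truncation.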

Best wishes,
D. Wythe



Thread overview: 11+ messages
2024-06-05 12:56 [PATCH net-next v6 0/3] Introduce IPPROTO_SMC D. Wythe
2024-06-05 12:56 ` [PATCH net-next v6 1/3] net/smc: refactoring initialization of smc sock D. Wythe
2024-06-05 12:56 ` [PATCH net-next v6 2/3] net/smc: expose smc proto operations D. Wythe
2024-06-05 12:56 ` [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC D. Wythe
2024-06-06 21:22   ` Mat Martineau
2024-06-07  5:09     ` D. Wythe
2024-06-07 14:47       ` Matthieu Baerts
2024-06-07 16:47         ` Mat Martineau
2024-06-07 19:35           ` D. Wythe [this message]
2024-06-07 20:32             ` Mat Martineau
2024-06-06 20:26 ` [PATCH net-next v6 0/3] " Wenjia Zhang
