From: Tony Lu <tonylu@linux.alibaba.com>
To: Karsten Graul <kgraul@linux.ibm.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
davem@davemloft.net, netdev@vger.kernel.org,
linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org,
jacob.qi@linux.alibaba.com, xuanzhuo@linux.alibaba.com,
guwen@linux.alibaba.com, dust.li@linux.alibaba.com
Subject: Re: [PATCH net 1/4] Revert "net/smc: don't wait for send buffer space when data was already sent"
Date: Wed, 3 Nov 2021 11:06:06 +0800 [thread overview]
Message-ID: <YYH8npT0+ww57Gg1@TonyMac-Alibaba> (raw)
In-Reply-To: <ca2a567b-915e-c4e1-96cf-2c03ff74aad5@linux.ibm.com>
On Tue, Nov 02, 2021 at 10:17:15AM +0100, Karsten Graul wrote:
> On 01/11/2021 08:04, Tony Lu wrote:
> > On Thu, Oct 28, 2021 at 07:38:27AM -0700, Jakub Kicinski wrote:
> >> On Thu, 28 Oct 2021 13:57:55 +0200 Karsten Graul wrote:
> >>> So how to deal with all of this? Is it an accepted programming error
> >>> when a user space program gets itself into this kind of situation?
> >>> Since this problem depends on internal send/recv buffer sizes such a
> >>> program might work on one system but not on other systems.
> >>
> >> It's a gray area so unless someone else has a strong opinion we can
> >> leave it as is.
> >
> > Things might be different. IMHO, the key point of this problem is to
> > implement the "standard" POSIX socket API, or TCP-socket compatible API.
> >
> >>> At the end the question might be if either such kind of a 'deadlock'
> >>> is acceptable, or if it is okay to have send() return lesser bytes
> >>> than requested.
> >>
> >> Yeah.. the thing is we have better APIs for applications to ask not to
> >> block than we do for applications to block. If someone really wants to
> >> wait for all data to come out for performance reasons they will
> >> struggle to get that behavior.
> >
> > IMO, it is better to do something to unify this behavior. Some
> > applications like netperf would be broken, and the people who want to use
> > SMC to run basic benchmark, would be confused about this, and its
> > compatibility with TCP. Maybe we could:
> > 1) correct the behavior of netperf to check the rc as we discussed.
> > 2) "copy" the behavior of TCP, and try to compatiable with TCP, though
> > it is a gray area.
>
> I have a strong opinion here, so when the question is if the user either
> encounters a deadlock or if send() returns lesser bytes than requested,
> I prefer the latter behavior.
> The second case is much easier to debug for users, they can do something
> to handle the problem (loop around send()), and this case can even be detected
> using strace. But the deadlock case is nearly not debuggable by users and there
> is nothing to prevent it when the workload pattern runs into this situation
> (except to not use blocking sends).
I agree with you. I am curious about this deadlock scene. If it was
convenient, could you provide a reproducible test case? We are also
setting up a SMC CI/CD system to find the compatible and performance
fallback problems. Maybe we could do something to make it better.
Cheers,
Tony Lu
next prev parent reply other threads:[~2021-11-03 3:06 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-27 8:52 [PATCH net 0/4] Fixes for SMC Tony Lu
2021-10-27 8:52 ` [PATCH net 1/4] Revert "net/smc: don't wait for send buffer space when data was already sent" Tony Lu
2021-10-27 10:21 ` Karsten Graul
2021-10-27 15:08 ` Jakub Kicinski
2021-10-27 15:38 ` Karsten Graul
2021-10-27 15:47 ` Jakub Kicinski
2021-10-28 6:48 ` Tony Lu
2021-10-28 11:57 ` Karsten Graul
2021-10-28 14:38 ` Jakub Kicinski
2021-11-01 7:04 ` Tony Lu
2021-11-02 9:17 ` Karsten Graul
2021-11-03 3:06 ` Tony Lu [this message]
2021-11-06 12:46 ` Karsten Graul
2021-10-27 8:52 ` [PATCH net 2/4] net/smc: Fix smc_link->llc_testlink_time overflow Tony Lu
2021-10-27 10:24 ` Karsten Graul
2021-10-28 6:52 ` Tony Lu
2021-10-27 8:52 ` [PATCH net 3/4] net/smc: Correct spelling mistake to TCPF_SYN_RECV Tony Lu
2021-10-27 10:23 ` Karsten Graul
2021-10-28 6:53 ` Tony Lu
2021-10-27 8:52 ` [PATCH net 4/4] net/smc: Fix wq mismatch issue caused by smc fallback Tony Lu
2021-10-28 14:16 ` Karsten Graul
2021-10-29 9:40 ` Karsten Graul
2021-11-01 6:15 ` Wen Gu
2021-11-02 9:25 ` Karsten Graul
2021-11-03 8:56 ` Wen Gu
2021-11-04 4:39 ` Wen Gu
2021-11-06 12:51 ` Karsten Graul
2021-11-10 12:50 ` Wen Gu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YYH8npT0+ww57Gg1@TonyMac-Alibaba \
--to=tonylu@linux.alibaba.com \
--cc=davem@davemloft.net \
--cc=dust.li@linux.alibaba.com \
--cc=guwen@linux.alibaba.com \
--cc=jacob.qi@linux.alibaba.com \
--cc=kgraul@linux.ibm.com \
--cc=kuba@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox