From: Wen Gu <guwen@linux.alibaba.com>
To: Karsten Graul <kgraul@linux.ibm.com>,
davem@davemloft.net, kuba@kernel.org
Cc: linux-s390@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, dust.li@linux.alibaba.com,
tonylu@linux.alibaba.com
Subject: Re: [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear
Date: Fri, 31 Dec 2021 17:45:27 +0800
Message-ID: <2c056f5a-0cd1-e7a6-6318-b2368946ae96@linux.alibaba.com>
In-Reply-To: <7311029c-2c56-d9c7-9ed5-87bc6a36511f@linux.ibm.com>
Thanks for your reply.
On 2021/12/29 8:51 pm, Karsten Graul wrote:
> On 28/12/2021 16:13, Wen Gu wrote:
>> We encountered some crashes caused by the race between SMC-R
>> link access and link clear triggered by link group termination
>> in abnormal case, like port error.
>
> Without to dig deeper into this, there is already a refcount for links, see smc_wr_tx_link_hold().
> In smc_wr_free_link() there are waits for the refcounts to become zero.
>
Thanks for the reminder. We also noticed link->wr_tx_refcnt when trying to fix this issue.
In my humble opinion, link->wr_tx_refcnt fixes the race between the waiters for a
tx work request buffer (mainly LLC/CDC msgs) and the link down processing that finally clears the
link.
But the issue in this patch is the race between access to the link by the connections
above it (like in listen or connect processing) and the link clear processing that memsets the link
to zero and releases its resources. So it seems that the two should not share the same reference count?
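To make the distinction concrete, here is a minimal userspace sketch of the kind of lifetime reference count discussed for connections. It is not the patch itself: struct demo_link and the demo_link_*() helpers are hypothetical names, and C11 atomics stand in for the kernel primitives. The point is only that the clear is deferred until the last holder drops its reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an SMC-R link; not the kernel's struct smc_link. */
struct demo_link {
	atomic_int refcnt;  /* lifetime references (link group + connections) */
	bool cleared;       /* set once the link resources were released */
	int state;          /* placeholder for the resources a connection reads */
};

static void demo_link_init(struct demo_link *lnk)
{
	memset(lnk, 0, sizeof(*lnk));
	lnk->state = 1;
	atomic_init(&lnk->refcnt, 1);  /* initial reference owned by the link group */
}

/* Taken by a connection before it starts using the link. */
static void demo_link_hold(struct demo_link *lnk)
{
	atomic_fetch_add(&lnk->refcnt, 1);
}

/* The actual clear: only reached when nobody references the link any more. */
static void demo_link_clear(struct demo_link *lnk)
{
	lnk->state = 0;
	lnk->cleared = true;
}

/* Dropped by a connection when done, and by link group termination. */
static void demo_link_put(struct demo_link *lnk)
{
	int old = atomic_fetch_sub(&lnk->refcnt, 1);

	assert(old > 0);  /* a put without a matching hold is a bug */
	if (old == 1)
		demo_link_clear(lnk);
}
```

With this shape, termination dropping the initial reference while a connection still holds one leaves the link contents intact until that connection calls demo_link_put().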
> Why do you need to introduce another refcounting instead of using the existing?
> And if you have a good reason, do we still need the existing refcounting with your new
> implementation?
>
Yes, we still need it.
In my humble opinion, link->wr_tx_refcnt ensures that CDC/LLC message sends won't wait on
an already cleared link. And LLC messages may be triggered by underlying events like net device
port add/error.
But this patch's implementation only ensures that access to the link by connections is safe and
that smc connections won't get something that is already freed during their life cycle, like in listen/connect
processing. It can't cover the link access by LLC messages, which may be triggered by underlying
events.
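For comparison, the sender-side counting described above can be sketched roughly as follows. This is a single-threaded userspace illustration under assumptions, not the kernel code: struct demo_wr_link and the demo_wr_*() helpers are hypothetical, and the real free path sleeps until the count reaches zero, whereas this sketch only reports whether the free could proceed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tx side of a link. */
struct demo_wr_link {
	atomic_bool sendable;     /* false once link down processing has started */
	atomic_int wr_tx_refcnt;  /* senders currently inside the tx path */
};

/*
 * A sender takes a reference only while the link is still sendable.
 * The sketch ignores the window between the check and the increment.
 */
static bool demo_wr_tx_hold(struct demo_wr_link *lnk)
{
	if (!atomic_load(&lnk->sendable))
		return false;
	atomic_fetch_add(&lnk->wr_tx_refcnt, 1);
	return true;
}

/* The sender leaves the tx path. */
static void demo_wr_tx_put(struct demo_wr_link *lnk)
{
	atomic_fetch_sub(&lnk->wr_tx_refcnt, 1);
}

/*
 * Link down side: refuse new senders, then report whether all current
 * senders have left so that the tx resources may be freed.
 */
static bool demo_wr_try_free(struct demo_wr_link *lnk)
{
	atomic_store(&lnk->sendable, false);
	return atomic_load(&lnk->wr_tx_refcnt) == 0;
}
```

This count only protects callers that go through the tx path, which is why it does not by itself cover a connection that merely dereferences the link during listen/connect processing.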
> Maybe its enough to use the existing refcounting in the other functions like smc_llc_flow_initiate()?
>
> Btw: it is interesting what kind of crashes you see, we never met them in our setup.
These crashes and the link group free crashes mentioned in the [1/2] patch can be reproduced
by frequently bringing the net device up and down during testing.
> Its great to see you evaluating SMC in a cloud environment!
Thanks! Hope that SMC will be widely used. It is an amazing protocol!
Cheers,
Wen Gu