From: Wenjia Zhang <wenjia@linux.ibm.com>
To: dust.li@linux.alibaba.com, "D. Wythe" <alibuda@linux.alibaba.com>,
kgraul@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
Date: Fri, 13 Oct 2023 13:52:09 +0200
Message-ID: <6666db42-a4de-425e-a96d-bfa899ab265e@linux.ibm.com>
In-Reply-To: <20231013053214.GT92403@linux.alibaba.com>
On 13.10.23 07:32, Dust Li wrote:
> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>
>>
>> On 12.10.23 04:37, D. Wythe wrote:
>>>
>>>
>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>
>>>>> Considering scenario:
>>>>>
>>>>>                                  smc_cdc_rx_handler_rwwi
>>>>> __smc_release
>>>>>                                      sock_set_flag
>>>>> smc_close_active()
>>>>> sock_set_flag
>>>>>
>>>>> __set_bit(DEAD)                      __set_bit(DONE)
>>>>>
>>>>> Because __set_bit() is not atomic, the DEAD or DONE flag might be
>>>>> lost. If the DEAD flag is lost, the state SMC_CLOSED will never be
>>>>> reached in smc_close_passive_work:
>>>>>
>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>>     smc_close_sent_any_close(conn)) {
>>>>> 	sk->sk_state = SMC_CLOSED;
>>>>> } else {
>>>>> 	/* just shutdown, but not yet closed locally */
>>>>> 	sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>> }
>>>>>
>>>>> Replacing sock_set_flag()'s underlying __set_bit() with set_bit()
>>>>> fixes this problem, since set_bit() is atomic.
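>>>>>
>>>>> A minimal user-space sketch of the lost-update race (names are
>>>>> hypothetical; this is an illustration, not the kernel
>>>>> implementation):
>>>>>
>>>>> unsigned long flags;                /* shared, like sk->sk_flags */
>>>>>
>>>>> void nonatomic_set_bit(int nr)      /* behaves like __set_bit()  */
>>>>> {
>>>>> 	unsigned long tmp = flags;  /* 1. load                   */
>>>>> 	tmp |= 1UL << nr;           /* 2. modify                 */
>>>>> 	flags = tmp;                /* 3. store back             */
>>>>> }
>>>>>
>>>>> If one context executes steps 1-2 for DEAD while another completes
>>>>> steps 1-3 for DONE, the first context's store writes back a value
>>>>> without DONE (or vice versa), so one flag is lost. set_bit() does
>>>>> the same read-modify-write as a single atomic operation, so both
>>>>> flags survive.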
>>>>>
>>>> I didn't really understand the scenario. What is
>>>> smc_cdc_rx_handler_rwwi()? What does it do? Doesn't it hold the
>>>> lock at runtime?
>>>>
>>>
>>> Hi Wenjia,
>>>
>>> Sorry for that; it is not smc_cdc_rx_handler_rwwi() but
>>> smc_cdc_rx_handler().
>>>
>>> Following is a more specific description of the issue:
>>>
>>>
>>> lock_sock()
>>> __smc_release
>>>
>>>             smc_cdc_rx_handler()
>>>                 smc_cdc_msg_recv()
>>>                     bh_lock_sock()
>>>                     smc_cdc_msg_recv_action()
>>>                     sock_set_flag(DONE)      sock_set_flag(DEAD)
>>>                         __set_bit                __set_bit
>>>                     bh_unlock_sock()
>>> release_sock()
>>>
>>>
>>>
>>> Note: bh_lock_sock() and lock_sock() are not mutually exclusive.
>>> They are actually used for different purposes and in different
>>> contexts.
>>>
>>>
>> Ok, it's true that bh_lock_sock() and lock_sock() are not really
>> mutually exclusive. However, since bh_lock_sock() is used, the
>> scenario you described above should not happen, because it takes
>> sk_lock.slock. Following this scenario, IMO, only the following
>> situation can happen.
>>
>> lock_sock()
>> __smc_release
>>
>>             smc_cdc_rx_handler()
>>                 smc_cdc_msg_recv()
>>                     bh_lock_sock()
>>                     smc_cdc_msg_recv_action()
>>                     sock_set_flag(DONE)
>>                     bh_unlock_sock()
>> sock_set_flag(DEAD)
>> release_sock()
>
> Hi Wenjia,
>
> I think I know what D. Wythe means now, and I think he is right on this.
>
> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
> acquires the lock before bh_lock_sock(). This is how the sock lock works.
>
> PROCESS CONTEXT                        INTERRUPT CONTEXT
> ------------------------------------------------------------------------
> lock_sock()
>     spin_lock_bh(&sk->sk_lock.slock);
>     ...
>     sk->sk_lock.owned = 1;
>     // here the spinlock is released
>     spin_unlock_bh(&sk->sk_lock.slock);
> __smc_release()
>                                        bh_lock_sock(&smc->sk);
>                                        smc_cdc_msg_recv_action(smc, cdc);
>                                        sock_set_flag(&smc->sk, SOCK_DONE);
>                                        bh_unlock_sock(&smc->sk);
>
> sock_set_flag(DEAD)  <-- can run before or after sock_set_flag(DONE)
> release_sock()
>
> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already
> released by the time lock_sock() returns. Therefore, there is actually
> no mutual exclusion between the code that runs after lock_sock() and
> before release_sock() and a bh_lock_sock()...bh_unlock_sock() section.
> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and may
> run before or after sock_set_flag(DONE).
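>
> For reference, a simplified sketch of the two primitives described
> above (paraphrased; lockdep and other details elided):
>
> void lock_sock(struct sock *sk)
> {
> 	spin_lock_bh(&sk->sk_lock.slock);
> 	if (sk->sk_lock.owned)
> 		__lock_sock(sk);            /* wait for current owner  */
> 	sk->sk_lock.owned = 1;              /* mark "owned by process" */
> 	spin_unlock_bh(&sk->sk_lock.slock); /* slock is dropped here!  */
> }
>
> bh_lock_sock() is essentially spin_lock(&sk->sk_lock.slock); it never
> checks sk_lock.owned, so it does not wait for a process-context owner.
> Callers are expected to check sock_owned_by_user() themselves.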
>
>
> Actually, in TCP, the interrupt context checks sock_owned_by_user().
> If it returns true, the softirq just defers the work to the backlog,
> which is then processed in release_sock(). This avoids the race
> between softirq and process context when accessing the 'struct sock'.
>
> tcp_v4_rcv()
>     bh_lock_sock_nested(sk);
>     tcp_segs_in(tcp_sk(sk), skb);
>     ret = 0;
>     if (!sock_owned_by_user(sk)) {
>         ret = tcp_v4_do_rcv(sk, skb);
>     } else {
>         if (tcp_add_backlog(sk, skb, &drop_reason))
>             goto discard_and_relse;
>     }
>     bh_unlock_sock(sk);
>
>
> But in SMC we don't have a backlog, which means all fields in
> 'struct sock' are potentially subject to such races; this
> sock_set_flag() is just one of the cases.
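>
> For completeness, a simplified sketch of the other half of the TCP
> pattern (paraphrased): release_sock() replays the deferred backlog
> under the lock before giving up ownership:
>
> void release_sock(struct sock *sk)
> {
> 	spin_lock_bh(&sk->sk_lock.slock);
> 	if (sk->sk_backlog.tail)
> 		__release_sock(sk);   /* process skbs queued by softirq */
> 	sk->sk_lock.owned = 0;        /* give up ownership              */
> 	spin_unlock_bh(&sk->sk_lock.slock);
> }
>
> Without a backlog, SMC has no such replay point, so softirq and
> process context can interleave freely on 'struct sock' fields.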
>
> Best regards,
> Dust
>
I agree with your description above.
Sure, the following case 1) can also happen:

case 1)
-------
lock_sock()
__smc_release
sock_set_flag(DEAD)
                            bh_lock_sock()
                            smc_cdc_msg_recv_action()
                            sock_set_flag(DONE)
                            bh_unlock_sock()
release_sock()

case 2)
-------
lock_sock()
__smc_release
                            bh_lock_sock()
                            smc_cdc_msg_recv_action()
sock_set_flag(DEAD)         sock_set_flag(DONE)
    __set_bit                   __set_bit
                            bh_unlock_sock()
release_sock()
My point here is that case 2) can never happen, i.e. that
sock_set_flag(DONE) and sock_set_flag(DEAD) cannot run concurrently.
So how could the atomic set help make sure that the DEAD flag is not
overwritten by DONE?
Maybe I'm the only one who is getting stuck on this problem. I'd
appreciate it if you could help me get out :P
Thanks,
Wenjia
>
>
>>
>>>
>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>> ---
>>>>> net/smc/af_smc.c    | 4 ++--
>>>>> net/smc/smc.h       | 5 +++++
>>>>> net/smc/smc_cdc.c   | 2 +-
>>>>> net/smc/smc_close.c | 2 +-
>>>>> 4 files changed, 9 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>>> index bacdd97..5ad2a9f 100644
>>>>> --- a/net/smc/af_smc.c
>>>>> +++ b/net/smc/af_smc.c
>>>>> @@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>>>>>
>>>>>  	if (!smc->use_fallback) {
>>>>>  		rc = smc_close_active(smc);
>>>>> -		sock_set_flag(sk, SOCK_DEAD);
>>>>> +		smc_sock_set_flag(sk, SOCK_DEAD);
>>>>>  		sk->sk_shutdown |= SHUTDOWN_MASK;
>>>>>  	} else {
>>>>>  		if (sk->sk_state != SMC_CLOSED) {
>>>>> @@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
>>>>>  		if (new_clcsock)
>>>>>  			sock_release(new_clcsock);
>>>>>  		new_sk->sk_state = SMC_CLOSED;
>>>>> -		sock_set_flag(new_sk, SOCK_DEAD);
>>>>> +		smc_sock_set_flag(new_sk, SOCK_DEAD);
>>>>>  		sock_put(new_sk); /* final */
>>>>>  		*new_smc = NULL;
>>>>>  		goto out;
>>>>> diff --git a/net/smc/smc.h b/net/smc/smc.h
>>>>> index 24745fd..e377980 100644
>>>>> --- a/net/smc/smc.h
>>>>> +++ b/net/smc/smc.h
>>>>> @@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
>>>>>  int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
>>>>>  int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
>>>>>
>>>>> +static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag)
>>>>> +{
>>>>> +	set_bit(flag, &sk->sk_flags);
>>>>> +}
>>>>> +
>>>>>  #endif /* __SMC_H */
>>>>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>>>>> index 89105e9..01bdb79 100644
>>>>> --- a/net/smc/smc_cdc.c
>>>>> +++ b/net/smc/smc_cdc.c
>>>>> @@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
>>>>>  		smc->sk.sk_shutdown |= RCV_SHUTDOWN;
>>>>>  		if (smc->clcsock && smc->clcsock->sk)
>>>>>  			smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
>>>>> -		sock_set_flag(&smc->sk, SOCK_DONE);
>>>>> +		smc_sock_set_flag(&smc->sk, SOCK_DONE);
>>>>>  		sock_hold(&smc->sk); /* sock_put in close_work */
>>>>>  		if (!queue_work(smc_close_wq, &conn->close_work))
>>>>>  			sock_put(&smc->sk);
>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>> index dbdf03e..449ef45 100644
>>>>> --- a/net/smc/smc_close.c
>>>>> +++ b/net/smc/smc_close.c
>>>>> @@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
>>>>>  		break;
>>>>>  	}
>>>>>
>>>>> -	sock_set_flag(sk, SOCK_DEAD);
>>>>> +	smc_sock_set_flag(sk, SOCK_DEAD);
>>>>>  	sk->sk_state_change(sk);
>>>>>
>>>>>  	if (release_clcsock) {
>>>