* [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-11 7:33 [PATCH net 0/5] net/smc: bugfixs for smc-r D. Wythe
@ 2023-10-11 7:33 ` D. Wythe
2023-10-11 14:00 ` Dust Li
` (2 more replies)
2023-10-11 7:33 ` [PATCH net 2/5] net/smc: fix incorrect barrier usage D. Wythe
` (4 subsequent siblings)
5 siblings, 3 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-11 7:33 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe
From: "D. Wythe" <alibuda@linux.alibaba.com>
Consider the following scenario:

__smc_release                       smc_cdc_rx_handler_rwwi
  smc_close_active()
    sock_set_flag                     sock_set_flag
      __set_bit(DEAD)                   __set_bit(DONE)
Because __set_bit() is not atomic, the DEAD or DONE flag might be lost.
If the DEAD flag is lost, the state SMC_CLOSED will never be reached
in smc_close_passive_work:

if (sock_flag(sk, SOCK_DEAD) &&
    smc_close_sent_any_close(conn)) {
	sk->sk_state = SMC_CLOSED;
} else {
	/* just shutdown, but not yet closed locally */
	sk->sk_state = SMC_APPFINCLOSEWAIT;
}

Replacing the non-atomic sock_set_flag()/__set_bit() with the atomic
set_bit() fixes this problem.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/af_smc.c | 4 ++--
net/smc/smc.h | 5 +++++
net/smc/smc_cdc.c | 2 +-
net/smc/smc_close.c | 2 +-
4 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index bacdd97..5ad2a9f 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
if (!smc->use_fallback) {
rc = smc_close_active(smc);
- sock_set_flag(sk, SOCK_DEAD);
+ smc_sock_set_flag(sk, SOCK_DEAD);
sk->sk_shutdown |= SHUTDOWN_MASK;
} else {
if (sk->sk_state != SMC_CLOSED) {
@@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
if (new_clcsock)
sock_release(new_clcsock);
new_sk->sk_state = SMC_CLOSED;
- sock_set_flag(new_sk, SOCK_DEAD);
+ smc_sock_set_flag(new_sk, SOCK_DEAD);
sock_put(new_sk); /* final */
*new_smc = NULL;
goto out;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 24745fd..e377980 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
+static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag)
+{
+ set_bit(flag, &sk->sk_flags);
+}
+
#endif /* __SMC_H */
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 89105e9..01bdb79 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
smc->sk.sk_shutdown |= RCV_SHUTDOWN;
if (smc->clcsock && smc->clcsock->sk)
smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
- sock_set_flag(&smc->sk, SOCK_DONE);
+ smc_sock_set_flag(&smc->sk, SOCK_DONE);
sock_hold(&smc->sk); /* sock_put in close_work */
if (!queue_work(smc_close_wq, &conn->close_work))
sock_put(&smc->sk);
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index dbdf03e..449ef45 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
break;
}
- sock_set_flag(sk, SOCK_DEAD);
+ smc_sock_set_flag(sk, SOCK_DEAD);
sk->sk_state_change(sk);
if (release_clcsock) {
--
1.8.3.1
^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-11 7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
@ 2023-10-11 14:00 ` Dust Li
2023-10-11 20:31 ` Wenjia Zhang
2023-10-23 20:53 ` Wenjia Zhang
2 siblings, 0 replies; 38+ messages in thread
From: Dust Li @ 2023-10-11 14:00 UTC (permalink / raw)
To: D. Wythe, kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On Wed, Oct 11, 2023 at 03:33:16PM +0800, D. Wythe wrote:
>From: "D. Wythe" <alibuda@linux.alibaba.com>
>
>Considering scenario:
>
> smc_cdc_rx_handler_rwwi
>__smc_release
> sock_set_flag
>smc_close_active()
>sock_set_flag
>
>__set_bit(DEAD) __set_bit(DONE)
If I understand correctly, both operations should hold the sock lock,
which means they should not race. Have I missed something?
>
>Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>in smc_close_passive_work:
>
>if (sock_flag(sk, SOCK_DEAD) &&
> smc_close_sent_any_close(conn)) {
> sk->sk_state = SMC_CLOSED;
>} else {
> /* just shutdown, but not yet closed locally */
> sk->sk_state = SMC_APPFINCLOSEWAIT;
>}
>
>Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>Since set_bit is atomic.
>
You should add a Fixes tag.
>Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>---
> net/smc/af_smc.c | 4 ++--
> net/smc/smc.h | 5 +++++
> net/smc/smc_cdc.c | 2 +-
> net/smc/smc_close.c | 2 +-
> 4 files changed, 9 insertions(+), 4 deletions(-)
>
>diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>index bacdd97..5ad2a9f 100644
>--- a/net/smc/af_smc.c
>+++ b/net/smc/af_smc.c
>@@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>
> if (!smc->use_fallback) {
> rc = smc_close_active(smc);
>- sock_set_flag(sk, SOCK_DEAD);
>+ smc_sock_set_flag(sk, SOCK_DEAD);
> sk->sk_shutdown |= SHUTDOWN_MASK;
> } else {
> if (sk->sk_state != SMC_CLOSED) {
>@@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
> if (new_clcsock)
> sock_release(new_clcsock);
> new_sk->sk_state = SMC_CLOSED;
>- sock_set_flag(new_sk, SOCK_DEAD);
>+ smc_sock_set_flag(new_sk, SOCK_DEAD);
> sock_put(new_sk); /* final */
> *new_smc = NULL;
> goto out;
>diff --git a/net/smc/smc.h b/net/smc/smc.h
>index 24745fd..e377980 100644
>--- a/net/smc/smc.h
>+++ b/net/smc/smc.h
>@@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
> int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
> int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
>
>+static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag)
>+{
>+ set_bit(flag, &sk->sk_flags);
>+}
>+
> #endif /* __SMC_H */
>diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>index 89105e9..01bdb79 100644
>--- a/net/smc/smc_cdc.c
>+++ b/net/smc/smc_cdc.c
>@@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
> smc->sk.sk_shutdown |= RCV_SHUTDOWN;
> if (smc->clcsock && smc->clcsock->sk)
> smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
>- sock_set_flag(&smc->sk, SOCK_DONE);
>+ smc_sock_set_flag(&smc->sk, SOCK_DONE);
> sock_hold(&smc->sk); /* sock_put in close_work */
> if (!queue_work(smc_close_wq, &conn->close_work))
> sock_put(&smc->sk);
>diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>index dbdf03e..449ef45 100644
>--- a/net/smc/smc_close.c
>+++ b/net/smc/smc_close.c
>@@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
> break;
> }
>
>- sock_set_flag(sk, SOCK_DEAD);
>+ smc_sock_set_flag(sk, SOCK_DEAD);
> sk->sk_state_change(sk);
>
> if (release_clcsock) {
>--
>1.8.3.1
^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-11 7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
2023-10-11 14:00 ` Dust Li
@ 2023-10-11 20:31 ` Wenjia Zhang
2023-10-12 2:47 ` D. Wythe
[not found] ` <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>
2023-10-23 20:53 ` Wenjia Zhang
2 siblings, 2 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-11 20:31 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Considering scenario:
>
> smc_cdc_rx_handler_rwwi
> __smc_release
> sock_set_flag
> smc_close_active()
> sock_set_flag
>
> __set_bit(DEAD) __set_bit(DONE)
>
> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
> if the DEAD flag lost, the state SMC_CLOSED will be never be reached
> in smc_close_passive_work:
>
> if (sock_flag(sk, SOCK_DEAD) &&
> smc_close_sent_any_close(conn)) {
> sk->sk_state = SMC_CLOSED;
> } else {
> /* just shutdown, but not yet closed locally */
> sk->sk_state = SMC_APPFINCLOSEWAIT;
> }
>
> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
> Since set_bit is atomic.
>
I didn't really understand the scenario. What is
smc_cdc_rx_handler_rwwi()? What does it do? Doesn't it take the lock at
runtime?
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/af_smc.c | 4 ++--
> net/smc/smc.h | 5 +++++
> net/smc/smc_cdc.c | 2 +-
> net/smc/smc_close.c | 2 +-
> 4 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index bacdd97..5ad2a9f 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>
> if (!smc->use_fallback) {
> rc = smc_close_active(smc);
> - sock_set_flag(sk, SOCK_DEAD);
> + smc_sock_set_flag(sk, SOCK_DEAD);
> sk->sk_shutdown |= SHUTDOWN_MASK;
> } else {
> if (sk->sk_state != SMC_CLOSED) {
> @@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
> if (new_clcsock)
> sock_release(new_clcsock);
> new_sk->sk_state = SMC_CLOSED;
> - sock_set_flag(new_sk, SOCK_DEAD);
> + smc_sock_set_flag(new_sk, SOCK_DEAD);
> sock_put(new_sk); /* final */
> *new_smc = NULL;
> goto out;
> diff --git a/net/smc/smc.h b/net/smc/smc.h
> index 24745fd..e377980 100644
> --- a/net/smc/smc.h
> +++ b/net/smc/smc.h
> @@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
> int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
> int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
>
> +static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag)
> +{
> + set_bit(flag, &sk->sk_flags);
> +}
> +
> #endif /* __SMC_H */
> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
> index 89105e9..01bdb79 100644
> --- a/net/smc/smc_cdc.c
> +++ b/net/smc/smc_cdc.c
> @@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
> smc->sk.sk_shutdown |= RCV_SHUTDOWN;
> if (smc->clcsock && smc->clcsock->sk)
> smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
> - sock_set_flag(&smc->sk, SOCK_DONE);
> + smc_sock_set_flag(&smc->sk, SOCK_DONE);
> sock_hold(&smc->sk); /* sock_put in close_work */
> if (!queue_work(smc_close_wq, &conn->close_work))
> sock_put(&smc->sk);
> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
> index dbdf03e..449ef45 100644
> --- a/net/smc/smc_close.c
> +++ b/net/smc/smc_close.c
> @@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
> break;
> }
>
> - sock_set_flag(sk, SOCK_DEAD);
> + smc_sock_set_flag(sk, SOCK_DEAD);
> sk->sk_state_change(sk);
>
> if (release_clcsock) {
^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-11 20:31 ` Wenjia Zhang
@ 2023-10-12 2:47 ` D. Wythe
[not found] ` <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>
1 sibling, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-12 2:47 UTC (permalink / raw)
To: Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> Considering scenario:
>>
>> smc_cdc_rx_handler_rwwi
>> __smc_release
>> sock_set_flag
>> smc_close_active()
>> sock_set_flag
>>
>> __set_bit(DEAD) __set_bit(DONE)
>>
>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>> if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>> in smc_close_passive_work:
>>
>> if (sock_flag(sk, SOCK_DEAD) &&
>> smc_close_sent_any_close(conn)) {
>> sk->sk_state = SMC_CLOSED;
>> } else {
>> /* just shutdown, but not yet closed locally */
>> sk->sk_state = SMC_APPFINCLOSEWAIT;
>> }
>>
>> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>> Since set_bit is atomic.
>>
> I didn't really understand the scenario. What is
> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
> during the runtime?
>
Hi Wenjia,
Sorry for that; it is not smc_cdc_rx_handler_rwwi() but
smc_cdc_rx_handler().

The following is a more specific description of the issue:

PROCESS CONTEXT                     INTERRUPT CONTEXT
------------------------------------------------------------------
lock_sock()
__smc_release
                                    smc_cdc_rx_handler()
                                      smc_cdc_msg_recv()
                                        bh_lock_sock()
                                        smc_cdc_msg_recv_action()
sock_set_flag(DEAD)                     sock_set_flag(DONE)
  __set_bit                               __set_bit
                                        bh_unlock_sock()
release_sock()

Note: bh_lock_sock() and lock_sock() are not mutually exclusive.
They are actually used for different purposes and in different contexts.
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>> net/smc/af_smc.c | 4 ++--
>> net/smc/smc.h | 5 +++++
>> net/smc/smc_cdc.c | 2 +-
>> net/smc/smc_close.c | 2 +-
>> 4 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index bacdd97..5ad2a9f 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>> if (!smc->use_fallback) {
>> rc = smc_close_active(smc);
>> - sock_set_flag(sk, SOCK_DEAD);
>> + smc_sock_set_flag(sk, SOCK_DEAD);
>> sk->sk_shutdown |= SHUTDOWN_MASK;
>> } else {
>> if (sk->sk_state != SMC_CLOSED) {
>> @@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct smc_sock
>> *lsmc, struct smc_sock **new_smc)
>> if (new_clcsock)
>> sock_release(new_clcsock);
>> new_sk->sk_state = SMC_CLOSED;
>> - sock_set_flag(new_sk, SOCK_DEAD);
>> + smc_sock_set_flag(new_sk, SOCK_DEAD);
>> sock_put(new_sk); /* final */
>> *new_smc = NULL;
>> goto out;
>> diff --git a/net/smc/smc.h b/net/smc/smc.h
>> index 24745fd..e377980 100644
>> --- a/net/smc/smc.h
>> +++ b/net/smc/smc.h
>> @@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
>> int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct
>> genl_info *info);
>> int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct
>> genl_info *info);
>> +static inline void smc_sock_set_flag(struct sock *sk, enum
>> sock_flags flag)
>> +{
>> + set_bit(flag, &sk->sk_flags);
>> +}
>> +
>> #endif /* __SMC_H */
>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>> index 89105e9..01bdb79 100644
>> --- a/net/smc/smc_cdc.c
>> +++ b/net/smc/smc_cdc.c
>> @@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct
>> smc_sock *smc,
>> smc->sk.sk_shutdown |= RCV_SHUTDOWN;
>> if (smc->clcsock && smc->clcsock->sk)
>> smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
>> - sock_set_flag(&smc->sk, SOCK_DONE);
>> + smc_sock_set_flag(&smc->sk, SOCK_DONE);
>> sock_hold(&smc->sk); /* sock_put in close_work */
>> if (!queue_work(smc_close_wq, &conn->close_work))
>> sock_put(&smc->sk);
>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>> index dbdf03e..449ef45 100644
>> --- a/net/smc/smc_close.c
>> +++ b/net/smc/smc_close.c
>> @@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
>> break;
>> }
>> - sock_set_flag(sk, SOCK_DEAD);
>> + smc_sock_set_flag(sk, SOCK_DEAD);
>> sk->sk_state_change(sk);
>> if (release_clcsock) {
^ permalink raw reply	[flat|nested] 38+ messages in thread

[parent not found: <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>]
* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
[not found] ` <f8089b26-bb11-f82d-8070-222b1f8c1db1@linux.alibaba.com>
@ 2023-10-12 11:51 ` Wenjia Zhang
2023-10-13 5:32 ` Dust Li
0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 11:51 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 12.10.23 04:37, D. Wythe wrote:
>
>
> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>
>>
>> On 11.10.23 09:33, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> Considering scenario:
>>>
>>> smc_cdc_rx_handler_rwwi
>>> __smc_release
>>> sock_set_flag
>>> smc_close_active()
>>> sock_set_flag
>>>
>>> __set_bit(DEAD) __set_bit(DONE)
>>>
>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>> if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>>> in smc_close_passive_work:
>>>
>>> if (sock_flag(sk, SOCK_DEAD) &&
>>> smc_close_sent_any_close(conn)) {
>>> sk->sk_state = SMC_CLOSED;
>>> } else {
>>> /* just shutdown, but not yet closed locally */
>>> sk->sk_state = SMC_APPFINCLOSEWAIT;
>>> }
>>>
>>> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>>> Since set_bit is atomic.
>>>
>> I didn't really understand the scenario. What is
>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>> during the runtime?
>>
>
> Hi Wenjia,
>
> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
> smc_cdc_rx_handler();
>
> Following is a more specific description of the issues
>
>
> lock_sock()
> __smc_release
>
> smc_cdc_rx_handler()
> smc_cdc_msg_recv()
> bh_lock_sock()
> smc_cdc_msg_recv_action()
> sock_set_flag(DONE) sock_set_flag(DEAD)
> __set_bit __set_bit
> bh_unlock_sock()
> release_sock()
>
>
>
> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
> actually used for different purposes and contexts.
>
>
Ok, it's true that bh_lock_sock() and lock_sock() are not really
mutually exclusive. However, since bh_lock_sock() is used, the scenario
you described above should not happen, because it takes
sk_lock.slock. Following this scenario, IMO, only the following
situation can happen:
lock_sock()
__smc_release
                                    smc_cdc_rx_handler()
                                      smc_cdc_msg_recv()
                                        bh_lock_sock()
                                        smc_cdc_msg_recv_action()
                                        sock_set_flag(DONE)
                                        bh_unlock_sock()
sock_set_flag(DEAD)
release_sock()
>
>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>> ---
>>> net/smc/af_smc.c | 4 ++--
>>> net/smc/smc.h | 5 +++++
>>> net/smc/smc_cdc.c | 2 +-
>>> net/smc/smc_close.c | 2 +-
>>> 4 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>> index bacdd97..5ad2a9f 100644
>>> --- a/net/smc/af_smc.c
>>> +++ b/net/smc/af_smc.c
>>> @@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>>> if (!smc->use_fallback) {
>>> rc = smc_close_active(smc);
>>> - sock_set_flag(sk, SOCK_DEAD);
>>> + smc_sock_set_flag(sk, SOCK_DEAD);
>>> sk->sk_shutdown |= SHUTDOWN_MASK;
>>> } else {
>>> if (sk->sk_state != SMC_CLOSED) {
>>> @@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct smc_sock
>>> *lsmc, struct smc_sock **new_smc)
>>> if (new_clcsock)
>>> sock_release(new_clcsock);
>>> new_sk->sk_state = SMC_CLOSED;
>>> - sock_set_flag(new_sk, SOCK_DEAD);
>>> + smc_sock_set_flag(new_sk, SOCK_DEAD);
>>> sock_put(new_sk); /* final */
>>> *new_smc = NULL;
>>> goto out;
>>> diff --git a/net/smc/smc.h b/net/smc/smc.h
>>> index 24745fd..e377980 100644
>>> --- a/net/smc/smc.h
>>> +++ b/net/smc/smc.h
>>> @@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
>>> int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct
>>> genl_info *info);
>>> int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct
>>> genl_info *info);
>>> +static inline void smc_sock_set_flag(struct sock *sk, enum
>>> sock_flags flag)
>>> +{
>>> + set_bit(flag, &sk->sk_flags);
>>> +}
>>> +
>>> #endif /* __SMC_H */
>>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>>> index 89105e9..01bdb79 100644
>>> --- a/net/smc/smc_cdc.c
>>> +++ b/net/smc/smc_cdc.c
>>> @@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct
>>> smc_sock *smc,
>>> smc->sk.sk_shutdown |= RCV_SHUTDOWN;
>>> if (smc->clcsock && smc->clcsock->sk)
>>> smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
>>> - sock_set_flag(&smc->sk, SOCK_DONE);
>>> + smc_sock_set_flag(&smc->sk, SOCK_DONE);
>>> sock_hold(&smc->sk); /* sock_put in close_work */
>>> if (!queue_work(smc_close_wq, &conn->close_work))
>>> sock_put(&smc->sk);
>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>> index dbdf03e..449ef45 100644
>>> --- a/net/smc/smc_close.c
>>> +++ b/net/smc/smc_close.c
>>> @@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
>>> break;
>>> }
>>> - sock_set_flag(sk, SOCK_DEAD);
>>> + smc_sock_set_flag(sk, SOCK_DEAD);
>>> sk->sk_state_change(sk);
>>> if (release_clcsock) {
>
^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-12 11:51 ` Wenjia Zhang
@ 2023-10-13 5:32 ` Dust Li
2023-10-13 11:52 ` Wenjia Zhang
0 siblings, 1 reply; 38+ messages in thread
From: Dust Li @ 2023-10-13 5:32 UTC (permalink / raw)
To: Wenjia Zhang, D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>
>
>On 12.10.23 04:37, D. Wythe wrote:
>>
>>
>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>> >
>> >
>> > On 11.10.23 09:33, D. Wythe wrote:
>> > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > >
>> > > Considering scenario:
>> > >
>> > > smc_cdc_rx_handler_rwwi
>> > > __smc_release
>> > > sock_set_flag
>> > > smc_close_active()
>> > > sock_set_flag
>> > >
>> > > __set_bit(DEAD) __set_bit(DONE)
>> > >
>> > > Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>> > > if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>> > > in smc_close_passive_work:
>> > >
>> > > if (sock_flag(sk, SOCK_DEAD) &&
>> > > smc_close_sent_any_close(conn)) {
>> > > sk->sk_state = SMC_CLOSED;
>> > > } else {
>> > > /* just shutdown, but not yet closed locally */
>> > > sk->sk_state = SMC_APPFINCLOSEWAIT;
>> > > }
>> > >
>> > > Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>> > > Since set_bit is atomic.
>> > >
>> > I didn't really understand the scenario. What is
>> > smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>> > during the runtime?
>> >
>>
>> Hi Wenjia,
>>
>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>> smc_cdc_rx_handler();
>>
>> Following is a more specific description of the issues
>>
>>
>> lock_sock()
>> __smc_release
>>
>> smc_cdc_rx_handler()
>> smc_cdc_msg_recv()
>> bh_lock_sock()
>> smc_cdc_msg_recv_action()
>> sock_set_flag(DONE) sock_set_flag(DEAD)
>> __set_bit __set_bit
>> bh_unlock_sock()
>> release_sock()
>>
>>
>>
>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>> actually used for different purposes and contexts.
>>
>>
>ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>exclusive. However, since bh_lock_sock() is used, this scenario you described
>above should not happen, because that gets the sk_lock.slock. Following this
>scenarios, IMO, only the following situation can happen.
>
>lock_sock()
>__smc_release
>
>smc_cdc_rx_handler()
>smc_cdc_msg_recv()
>bh_lock_sock()
>smc_cdc_msg_recv_action()
>sock_set_flag(DONE)
>bh_unlock_sock()
>sock_set_flag(DEAD)
>release_sock()
Hi wenjia,
I think I know what D. Wythe means now, and I think he is right on this.
IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
acquires the lock before bh_lock_sock(). This is how the sock lock works.
PROCESS CONTEXT                          INTERRUPT CONTEXT
------------------------------------------------------------------------
lock_sock()
  spin_lock_bh(&sk->sk_lock.slock);
  ...
  sk->sk_lock.owned = 1;
  // here the spinlock is released
  spin_unlock_bh(&sk->sk_lock.slock);
__smc_release()
                                         bh_lock_sock(&smc->sk);
                                         smc_cdc_msg_recv_action(smc, cdc);
                                         sock_set_flag(&smc->sk, SOCK_DONE);
                                         bh_unlock_sock(&smc->sk);
sock_set_flag(DEAD)  <-- can run before or after sock_set_flag(DONE)
release_sock()
The bh_lock_sock() only spins on sk->sk_lock.slock, which is already
released once lock_sock() returns. Therefore, there is actually no mutual
exclusion between the code after lock_sock() and before release_sock()
and the code between bh_lock_sock() ... bh_unlock_sock(). Thus,
sock_set_flag(DEAD) won't respect bh_lock_sock() at all and might run
before or after sock_set_flag(DONE).

Actually, in TCP, the interrupt context checks sock_owned_by_user().
If it returns true, the softirq just defers the processing to the
backlog and handles it in release_sock(), which avoids the race between
softirq and process context when accessing the 'struct sock':
tcp_v4_rcv()
	bh_lock_sock_nested(sk);
	tcp_segs_in(tcp_sk(sk), skb);
	ret = 0;
	if (!sock_owned_by_user(sk)) {
		ret = tcp_v4_do_rcv(sk, skb);
	} else {
		if (tcp_add_backlog(sk, skb, &drop_reason))
			goto discard_and_relse;
	}
	bh_unlock_sock(sk);
But in SMC we don't have a backlog, which means fields in 'struct sock'
might all be subject to such races, and this sock_set_flag() is just one
of the cases.
Best regards,
Dust
>
>>
>> > > Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> > > ---
>> > > net/smc/af_smc.c | 4 ++--
>> > > net/smc/smc.h | 5 +++++
>> > > net/smc/smc_cdc.c | 2 +-
>> > > net/smc/smc_close.c | 2 +-
>> > > 4 files changed, 9 insertions(+), 4 deletions(-)
>> > >
>> > > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> > > index bacdd97..5ad2a9f 100644
>> > > --- a/net/smc/af_smc.c
>> > > +++ b/net/smc/af_smc.c
>> > > @@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>> > > if (!smc->use_fallback) {
>> > > rc = smc_close_active(smc);
>> > > - sock_set_flag(sk, SOCK_DEAD);
>> > > + smc_sock_set_flag(sk, SOCK_DEAD);
>> > > sk->sk_shutdown |= SHUTDOWN_MASK;
>> > > } else {
>> > > if (sk->sk_state != SMC_CLOSED) {
>> > > @@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct
>> > > smc_sock *lsmc, struct smc_sock **new_smc)
>> > > if (new_clcsock)
>> > > sock_release(new_clcsock);
>> > > new_sk->sk_state = SMC_CLOSED;
>> > > - sock_set_flag(new_sk, SOCK_DEAD);
>> > > + smc_sock_set_flag(new_sk, SOCK_DEAD);
>> > > sock_put(new_sk); /* final */
>> > > *new_smc = NULL;
>> > > goto out;
>> > > diff --git a/net/smc/smc.h b/net/smc/smc.h
>> > > index 24745fd..e377980 100644
>> > > --- a/net/smc/smc.h
>> > > +++ b/net/smc/smc.h
>> > > @@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
>> > > int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct
>> > > genl_info *info);
>> > > int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct
>> > > genl_info *info);
>> > > +static inline void smc_sock_set_flag(struct sock *sk, enum
>> > > sock_flags flag)
>> > > +{
>> > > + set_bit(flag, &sk->sk_flags);
>> > > +}
>> > > +
>> > > #endif /* __SMC_H */
>> > > diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>> > > index 89105e9..01bdb79 100644
>> > > --- a/net/smc/smc_cdc.c
>> > > +++ b/net/smc/smc_cdc.c
>> > > @@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct
>> > > smc_sock *smc,
>> > > smc->sk.sk_shutdown |= RCV_SHUTDOWN;
>> > > if (smc->clcsock && smc->clcsock->sk)
>> > > smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
>> > > - sock_set_flag(&smc->sk, SOCK_DONE);
>> > > + smc_sock_set_flag(&smc->sk, SOCK_DONE);
>> > > sock_hold(&smc->sk); /* sock_put in close_work */
>> > > if (!queue_work(smc_close_wq, &conn->close_work))
>> > > sock_put(&smc->sk);
>> > > diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>> > > index dbdf03e..449ef45 100644
>> > > --- a/net/smc/smc_close.c
>> > > +++ b/net/smc/smc_close.c
>> > > @@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
>> > > break;
>> > > }
>> > > - sock_set_flag(sk, SOCK_DEAD);
>> > > + smc_sock_set_flag(sk, SOCK_DEAD);
>> > > sk->sk_state_change(sk);
>> > > if (release_clcsock) {
>>
^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-13 5:32 ` Dust Li
@ 2023-10-13 11:52 ` Wenjia Zhang
2023-10-13 12:27 ` Dust Li
0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-13 11:52 UTC (permalink / raw)
To: dust.li, D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 13.10.23 07:32, Dust Li wrote:
> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>
>>
>> On 12.10.23 04:37, D. Wythe wrote:
>>>
>>>
>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>
>>>>> Considering scenario:
>>>>>
>>>>> smc_cdc_rx_handler_rwwi
>>>>> __smc_release
>>>>> sock_set_flag
>>>>> smc_close_active()
>>>>> sock_set_flag
>>>>>
>>>>> __set_bit(DEAD) __set_bit(DONE)
>>>>>
>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>>>> if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>>>>> in smc_close_passive_work:
>>>>>
>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>> smc_close_sent_any_close(conn)) {
>>>>> sk->sk_state = SMC_CLOSED;
>>>>> } else {
>>>>> /* just shutdown, but not yet closed locally */
>>>>> sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>> }
>>>>>
>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>>>>> Since set_bit is atomic.
>>>>>
>>>> I didn't really understand the scenario. What is
>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>>>> during the runtime?
>>>>
>>>
>>> Hi Wenjia,
>>>
>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>>> smc_cdc_rx_handler();
>>>
>>> Following is a more specific description of the issues
>>>
>>>
>>> lock_sock()
>>> __smc_release
>>>
>>> smc_cdc_rx_handler()
>>> smc_cdc_msg_recv()
>>> bh_lock_sock()
>>> smc_cdc_msg_recv_action()
>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>> __set_bit __set_bit
>>> bh_unlock_sock()
>>> release_sock()
>>>
>>>
>>>
>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>>> actually used for different purposes and contexts.
>>>
>>>
>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>> exclusive. However, since bh_lock_sock() is used, this scenario you described
>> above should not happen, because that gets the sk_lock.slock. Following this
>> scenarios, IMO, only the following situation can happen.
>>
>> lock_sock()
>> __smc_release
>>
>> smc_cdc_rx_handler()
>> smc_cdc_msg_recv()
>> bh_lock_sock()
>> smc_cdc_msg_recv_action()
>> sock_set_flag(DONE)
>> bh_unlock_sock()
>> sock_set_flag(DEAD)
>> release_sock()
>
> Hi wenjia,
>
> I think I know what D. Wythe means now, and I think he is right on this.
>
> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
> acquires the lock before bh_lock_sock(). This is how the sock lock works.
>
> PROCESS CONTEXT INTERRUPT CONTEXT
> ------------------------------------------------------------------------
> lock_sock()
> spin_lock_bh(&sk->sk_lock.slock);
> ...
> sk->sk_lock.owned = 1;
> // here the spinlock is released
> spin_unlock_bh(&sk->sk_lock.slock);
> __smc_release()
> bh_lock_sock(&smc->sk);
> smc_cdc_msg_recv_action(smc, cdc);
> sock_set_flag(&smc->sk, SOCK_DONE);
> bh_unlock_sock(&smc->sk);
>
> sock_set_flag(DEAD) <-- Can be before or after sock_set_flag(DONE)
> release_sock()
>
> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
> after lock_sock() return. Therefor, there is actually no lock between
> the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
> before or after sock_set_flag(DONE).
>
>
> Actually, in TCP, the interrupt context will check sock_owned_by_user().
> If it returns true, the softirq just defer the process to backlog, and process
> that in release_sock(). Which avoid the race between softirq and process
> when visiting the 'struct sock'.
>
> tcp_v4_rcv()
> bh_lock_sock_nested(sk);
> tcp_segs_in(tcp_sk(sk), skb);
> ret = 0;
> if (!sock_owned_by_user(sk)) {
> ret = tcp_v4_do_rcv(sk, skb);
> } else {
> if (tcp_add_backlog(sk, skb, &drop_reason))
> goto discard_and_relse;
> }
> bh_unlock_sock(sk);
>
>
> But in SMC we don't have a backlog, that means fields in 'struct sock'
> might all have race, and this sock_set_flag() is just one of the cases.
>
> Best regards,
> Dust
>
I agree with your description above.
Sure, the following case 1) can also happen
case 1)
-------
lock_sock()
__smc_release
sock_set_flag(DEAD)
bh_lock_sock()
smc_cdc_msg_recv_action()
sock_set_flag(DONE)
bh_unlock_sock()
release_sock()
case 2)
-------
lock_sock()
__smc_release
bh_lock_sock()
smc_cdc_msg_recv_action()
sock_set_flag(DONE) sock_set_flag(DEAD)
__set_bit __set_bit
bh_unlock_sock()
release_sock()
My point here is that case 2) can never happen, i.e. that
sock_set_flag(DONE) and sock_set_flag(DEAD) cannot happen concurrently.
Thus, how could the atomic set help make sure that the DEAD flag would
not be overwritten with DONE?
Maybe I'm the only one who is stuck on this problem. I'd appreciate it
if you can help me get out :P
Thanks,
Wenjia
>
>
>>
>>>
>>>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>>>> ---
>>>>> net/smc/af_smc.c | 4 ++--
>>>>> net/smc/smc.h | 5 +++++
>>>>> net/smc/smc_cdc.c | 2 +-
>>>>> net/smc/smc_close.c | 2 +-
>>>>> 4 files changed, 9 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>>> index bacdd97..5ad2a9f 100644
>>>>> --- a/net/smc/af_smc.c
>>>>> +++ b/net/smc/af_smc.c
>>>>> @@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
>>>>> if (!smc->use_fallback) {
>>>>> rc = smc_close_active(smc);
>>>>> - sock_set_flag(sk, SOCK_DEAD);
>>>>> + smc_sock_set_flag(sk, SOCK_DEAD);
>>>>> sk->sk_shutdown |= SHUTDOWN_MASK;
>>>>> } else {
>>>>> if (sk->sk_state != SMC_CLOSED) {
>>>>> @@ -1742,7 +1742,7 @@ static int smc_clcsock_accept(struct
>>>>> smc_sock *lsmc, struct smc_sock **new_smc)
>>>>> if (new_clcsock)
>>>>> sock_release(new_clcsock);
>>>>> new_sk->sk_state = SMC_CLOSED;
>>>>> - sock_set_flag(new_sk, SOCK_DEAD);
>>>>> + smc_sock_set_flag(new_sk, SOCK_DEAD);
>>>>> sock_put(new_sk); /* final */
>>>>> *new_smc = NULL;
>>>>> goto out;
>>>>> diff --git a/net/smc/smc.h b/net/smc/smc.h
>>>>> index 24745fd..e377980 100644
>>>>> --- a/net/smc/smc.h
>>>>> +++ b/net/smc/smc.h
>>>>> @@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
>>>>> int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct
>>>>> genl_info *info);
>>>>> int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct
>>>>> genl_info *info);
>>>>> +static inline void smc_sock_set_flag(struct sock *sk, enum
>>>>> sock_flags flag)
>>>>> +{
>>>>> + set_bit(flag, &sk->sk_flags);
>>>>> +}
>>>>> +
>>>>> #endif /* __SMC_H */
>>>>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>>>>> index 89105e9..01bdb79 100644
>>>>> --- a/net/smc/smc_cdc.c
>>>>> +++ b/net/smc/smc_cdc.c
>>>>> @@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct
>>>>> smc_sock *smc,
>>>>> smc->sk.sk_shutdown |= RCV_SHUTDOWN;
>>>>> if (smc->clcsock && smc->clcsock->sk)
>>>>> smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
>>>>> - sock_set_flag(&smc->sk, SOCK_DONE);
>>>>> + smc_sock_set_flag(&smc->sk, SOCK_DONE);
>>>>> sock_hold(&smc->sk); /* sock_put in close_work */
>>>>> if (!queue_work(smc_close_wq, &conn->close_work))
>>>>> sock_put(&smc->sk);
>>>>> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>>>>> index dbdf03e..449ef45 100644
>>>>> --- a/net/smc/smc_close.c
>>>>> +++ b/net/smc/smc_close.c
>>>>> @@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
>>>>> break;
>>>>> }
>>>>> - sock_set_flag(sk, SOCK_DEAD);
>>>>> + smc_sock_set_flag(sk, SOCK_DEAD);
>>>>> sk->sk_state_change(sk);
>>>>> if (release_clcsock) {
>>>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-13 11:52 ` Wenjia Zhang
@ 2023-10-13 12:27 ` Dust Li
2023-10-17 2:00 ` D. Wythe
0 siblings, 1 reply; 38+ messages in thread
From: Dust Li @ 2023-10-13 12:27 UTC (permalink / raw)
To: Wenjia Zhang, D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>
>
>On 13.10.23 07:32, Dust Li wrote:
>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>> >
>> >
>> > On 12.10.23 04:37, D. Wythe wrote:
>> > >
>> > >
>> > > On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>> > > >
>> > > >
>> > > > On 11.10.23 09:33, D. Wythe wrote:
>> > > > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > > > >
>> > > > > Considering scenario:
>> > > > >
>> > > > > smc_cdc_rx_handler_rwwi
>> > > > > __smc_release
>> > > > > sock_set_flag
>> > > > > smc_close_active()
>> > > > > sock_set_flag
>> > > > >
>> > > > > __set_bit(DEAD) __set_bit(DONE)
>> > > > >
>> > > > > Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>> > > > > if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>> > > > > in smc_close_passive_work:
>> > > > >
>> > > > > if (sock_flag(sk, SOCK_DEAD) &&
>> > > > > smc_close_sent_any_close(conn)) {
>> > > > > sk->sk_state = SMC_CLOSED;
>> > > > > } else {
>> > > > > /* just shutdown, but not yet closed locally */
>> > > > > sk->sk_state = SMC_APPFINCLOSEWAIT;
>> > > > > }
>> > > > >
>> > > > > Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>> > > > > Since set_bit is atomic.
>> > > > >
>> > > > I didn't really understand the scenario. What is
>> > > > smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>> > > > during the runtime?
>> > > >
>> > >
>> > > Hi Wenjia,
>> > >
>> > > Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>> > > smc_cdc_rx_handler();
>> > >
>> > > Following is a more specific description of the issues
>> > >
>> > >
>> > > lock_sock()
>> > > __smc_release
>> > >
>> > > smc_cdc_rx_handler()
>> > > smc_cdc_msg_recv()
>> > > bh_lock_sock()
>> > > smc_cdc_msg_recv_action()
>> > > sock_set_flag(DONE) sock_set_flag(DEAD)
>> > > __set_bit __set_bit
>> > > bh_unlock_sock()
>> > > release_sock()
>> > >
>> > >
>> > >
>> > > Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>> > > actually used for different purposes and contexts.
>> > >
>> > >
>> > ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>> > exclusive. However, since bh_lock_sock() is used, this scenario you described
>> > above should not happen, because that gets the sk_lock.slock. Following this
>> > scenarios, IMO, only the following situation can happen.
>> >
>> > lock_sock()
>> > __smc_release
>> >
>> > smc_cdc_rx_handler()
>> > smc_cdc_msg_recv()
>> > bh_lock_sock()
>> > smc_cdc_msg_recv_action()
>> > sock_set_flag(DONE)
>> > bh_unlock_sock()
>> > sock_set_flag(DEAD)
>> > release_sock()
>>
>> Hi wenjia,
>>
>> I think I know what D. Wythe means now, and I think he is right on this.
>>
>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
>> acquires the lock before bh_lock_sock(). This is how the sock lock works.
>>
>> PROCESS CONTEXT INTERRUPT CONTEXT
>> ------------------------------------------------------------------------
>> lock_sock()
>> spin_lock_bh(&sk->sk_lock.slock);
>> ...
>> sk->sk_lock.owned = 1;
>> // here the spinlock is released
>> spin_unlock_bh(&sk->sk_lock.slock);
>> __smc_release()
>> bh_lock_sock(&smc->sk);
>> smc_cdc_msg_recv_action(smc, cdc);
>> sock_set_flag(&smc->sk, SOCK_DONE);
>> bh_unlock_sock(&smc->sk);
>>
>> sock_set_flag(DEAD) <-- Can be before or after sock_set_flag(DONE)
>> release_sock()
>>
>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
>> after lock_sock() return. Therefor, there is actually no lock between
>> the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
>> before or after sock_set_flag(DONE).
>>
>>
>> Actually, in TCP, the interrupt context will check sock_owned_by_user().
>> If it returns true, the softirq just defer the process to backlog, and process
>> that in release_sock(). Which avoid the race between softirq and process
>> when visiting the 'struct sock'.
>>
>> tcp_v4_rcv()
>> bh_lock_sock_nested(sk);
>> tcp_segs_in(tcp_sk(sk), skb);
>> ret = 0;
>> if (!sock_owned_by_user(sk)) {
>> ret = tcp_v4_do_rcv(sk, skb);
>> } else {
>> if (tcp_add_backlog(sk, skb, &drop_reason))
>> goto discard_and_relse;
>> }
>> bh_unlock_sock(sk);
>>
>>
>> But in SMC we don't have a backlog, that means fields in 'struct sock'
>> might all have race, and this sock_set_flag() is just one of the cases.
>>
>> Best regards,
>> Dust
>>
>I agree on your description above.
>Sure, the following case 1) can also happen
>
>case 1)
>-------
> lock_sock()
> __smc_release
>
> sock_set_flag(DEAD)
> bh_lock_sock()
> smc_cdc_msg_recv_action()
> sock_set_flag(DONE)
> bh_unlock_sock()
> release_sock()
>
>case 2)
>-------
> lock_sock()
> __smc_release
>
> bh_lock_sock()
> smc_cdc_msg_recv_action()
> sock_set_flag(DONE) sock_set_flag(DEAD)
> __set_bit __set_bit
> bh_unlock_sock()
> release_sock()
>
>My point here is that case2) can never happen. i.e that sock_set_flag(DONE)
>and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>the atomic set help make sure that the Dead flag would not be overwritten
>with DONE?
I agree with you on this. I also don't see how using atomic ops can
solve the problem of the DEAD flag being overwritten with DONE.
I think we need some mechanism to ensure that sk_flags and other
struct sock related fields are not modified simultaneously.
Best regards,
Dust
* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-13 12:27 ` Dust Li
@ 2023-10-17 2:00 ` D. Wythe
2023-10-17 8:39 ` Dust Li
2023-10-17 17:03 ` Wenjia Zhang
0 siblings, 2 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-17 2:00 UTC (permalink / raw)
To: dust.li, Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 10/13/23 8:27 PM, Dust Li wrote:
> On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>>
>> On 13.10.23 07:32, Dust Li wrote:
>>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>>>
>>>> On 12.10.23 04:37, D. Wythe wrote:
>>>>>
>>>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>>>
>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>
>>>>>>> Considering scenario:
>>>>>>>
>>>>>>> smc_cdc_rx_handler_rwwi
>>>>>>> __smc_release
>>>>>>> sock_set_flag
>>>>>>> smc_close_active()
>>>>>>> sock_set_flag
>>>>>>>
>>>>>>> __set_bit(DEAD) __set_bit(DONE)
>>>>>>>
>>>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>>>>>> if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>>>>>>> in smc_close_passive_work:
>>>>>>>
>>>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>>>> smc_close_sent_any_close(conn)) {
>>>>>>> sk->sk_state = SMC_CLOSED;
>>>>>>> } else {
>>>>>>> /* just shutdown, but not yet closed locally */
>>>>>>> sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>>>> }
>>>>>>>
>>>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>>>>>>> Since set_bit is atomic.
>>>>>>>
>>>>>> I didn't really understand the scenario. What is
>>>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>>>>>> during the runtime?
>>>>>>
>>>>> Hi Wenjia,
>>>>>
>>>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>>>>> smc_cdc_rx_handler();
>>>>>
>>>>> Following is a more specific description of the issues
>>>>>
>>>>>
>>>>> lock_sock()
>>>>> __smc_release
>>>>>
>>>>> smc_cdc_rx_handler()
>>>>> smc_cdc_msg_recv()
>>>>> bh_lock_sock()
>>>>> smc_cdc_msg_recv_action()
>>>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>>>> __set_bit __set_bit
>>>>> bh_unlock_sock()
>>>>> release_sock()
>>>>>
>>>>>
>>>>>
>>>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>>>>> actually used for different purposes and contexts.
>>>>>
>>>>>
>>>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>>>> exclusive. However, since bh_lock_sock() is used, this scenario you described
>>>> above should not happen, because that gets the sk_lock.slock. Following this
>>>> scenarios, IMO, only the following situation can happen.
>>>>
>>>> lock_sock()
>>>> __smc_release
>>>>
>>>> smc_cdc_rx_handler()
>>>> smc_cdc_msg_recv()
>>>> bh_lock_sock()
>>>> smc_cdc_msg_recv_action()
>>>> sock_set_flag(DONE)
>>>> bh_unlock_sock()
>>>> sock_set_flag(DEAD)
>>>> release_sock()
>>> Hi wenjia,
>>>
>>> I think I know what D. Wythe means now, and I think he is right on this.
>>>
>>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
>>> acquires the lock before bh_lock_sock(). This is how the sock lock works.
>>>
>>> PROCESS CONTEXT INTERRUPT CONTEXT
>>> ------------------------------------------------------------------------
>>> lock_sock()
>>> spin_lock_bh(&sk->sk_lock.slock);
>>> ...
>>> sk->sk_lock.owned = 1;
>>> // here the spinlock is released
>>> spin_unlock_bh(&sk->sk_lock.slock);
>>> __smc_release()
>>> bh_lock_sock(&smc->sk);
>>> smc_cdc_msg_recv_action(smc, cdc);
>>> sock_set_flag(&smc->sk, SOCK_DONE);
>>> bh_unlock_sock(&smc->sk);
>>>
>>> sock_set_flag(DEAD) <-- Can be before or after sock_set_flag(DONE)
>>> release_sock()
>>>
>>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
>>> after lock_sock() return. Therefor, there is actually no lock between
>>> the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
>>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
>>> before or after sock_set_flag(DONE).
>>>
>>>
>>> Actually, in TCP, the interrupt context will check sock_owned_by_user().
>>> If it returns true, the softirq just defer the process to backlog, and process
>>> that in release_sock(). Which avoid the race between softirq and process
>>> when visiting the 'struct sock'.
>>>
>>> tcp_v4_rcv()
>>> bh_lock_sock_nested(sk);
>>> tcp_segs_in(tcp_sk(sk), skb);
>>> ret = 0;
>>> if (!sock_owned_by_user(sk)) {
>>> ret = tcp_v4_do_rcv(sk, skb);
>>> } else {
>>> if (tcp_add_backlog(sk, skb, &drop_reason))
>>> goto discard_and_relse;
>>> }
>>> bh_unlock_sock(sk);
>>>
>>>
>>> But in SMC we don't have a backlog, that means fields in 'struct sock'
>>> might all have race, and this sock_set_flag() is just one of the cases.
>>>
>>> Best regards,
>>> Dust
>>>
>> I agree on your description above.
>> Sure, the following case 1) can also happen
>>
>> case 1)
>> -------
>> lock_sock()
>> __smc_release
>>
>> sock_set_flag(DEAD)
>> bh_lock_sock()
>> smc_cdc_msg_recv_action()
>> sock_set_flag(DONE)
>> bh_unlock_sock()
>> release_sock()
>>
>> case 2)
>> -------
>> lock_sock()
>> __smc_release
>>
>> bh_lock_sock()
>> smc_cdc_msg_recv_action()
>> sock_set_flag(DONE) sock_set_flag(DEAD)
>> __set_bit __set_bit
>> bh_unlock_sock()
>> release_sock()
>>
>> My point here is that case2) can never happen. i.e that sock_set_flag(DONE)
>> and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>> the atomic set help make sure that the Dead flag would not be overwritten
>> with DONE?
> I agree with you on this. I also don't see using atomic can
> solve the problem of overwriting the DEAD flag with DONE.
>
> I think we need some mechanisms to ensure that sk_flags and other
> struct sock related fields are not modified simultaneously.
>
> Best regards,
> Dust
It seems that everyone agrees that case 2 is impossible. I'm a bit
confused: why can sock_set_flag(DONE) and sock_set_flag(DEAD) not
happen concurrently? What mechanism prevents their parallel execution?
Best wishes,
D. Wythe
>
* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-17 2:00 ` D. Wythe
@ 2023-10-17 8:39 ` Dust Li
2023-10-17 17:03 ` Wenjia Zhang
1 sibling, 0 replies; 38+ messages in thread
From: Dust Li @ 2023-10-17 8:39 UTC (permalink / raw)
To: D. Wythe, Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On Tue, Oct 17, 2023 at 10:00:28AM +0800, D. Wythe wrote:
>
>
>On 10/13/23 8:27 PM, Dust Li wrote:
>> On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>> >
>> > On 13.10.23 07:32, Dust Li wrote:
>> > > On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>> > > >
>> > > > On 12.10.23 04:37, D. Wythe wrote:
>> > > > >
>> > > > > On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>> > > > > >
>> > > > > > On 11.10.23 09:33, D. Wythe wrote:
>> > > > > > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > > > > > >
>> > > > > > > Considering scenario:
>> > > > > > >
>> > > > > > > smc_cdc_rx_handler_rwwi
>> > > > > > > __smc_release
>> > > > > > > sock_set_flag
>> > > > > > > smc_close_active()
>> > > > > > > sock_set_flag
>> > > > > > >
>> > > > > > > __set_bit(DEAD) __set_bit(DONE)
>> > > > > > >
>> > > > > > > Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>> > > > > > > if the DEAD flag lost, the state SMC_CLOSED will be never be reached
>> > > > > > > in smc_close_passive_work:
>> > > > > > >
>> > > > > > > if (sock_flag(sk, SOCK_DEAD) &&
>> > > > > > > smc_close_sent_any_close(conn)) {
>> > > > > > > sk->sk_state = SMC_CLOSED;
>> > > > > > > } else {
>> > > > > > > /* just shutdown, but not yet closed locally */
>> > > > > > > sk->sk_state = SMC_APPFINCLOSEWAIT;
>> > > > > > > }
>> > > > > > >
>> > > > > > > Replace sock_set_flags or __set_bit to set_bit will fix this problem.
>> > > > > > > Since set_bit is atomic.
>> > > > > > >
>> > > > > > I didn't really understand the scenario. What is
>> > > > > > smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>> > > > > > during the runtime?
>> > > > > >
>> > > > > Hi Wenjia,
>> > > > >
>> > > > > Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>> > > > > smc_cdc_rx_handler();
>> > > > >
>> > > > > Following is a more specific description of the issues
>> > > > >
>> > > > >
>> > > > > lock_sock()
>> > > > > __smc_release
>> > > > >
>> > > > > smc_cdc_rx_handler()
>> > > > > smc_cdc_msg_recv()
>> > > > > bh_lock_sock()
>> > > > > smc_cdc_msg_recv_action()
>> > > > > sock_set_flag(DONE) sock_set_flag(DEAD)
>> > > > > __set_bit __set_bit
>> > > > > bh_unlock_sock()
>> > > > > release_sock()
>> > > > >
>> > > > >
>> > > > >
>> > > > > Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are
>> > > > > actually used for different purposes and contexts.
>> > > > >
>> > > > >
>> > > > ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually
>> > > > exclusive. However, since bh_lock_sock() is used, this scenario you described
>> > > > above should not happen, because that gets the sk_lock.slock. Following this
>> > > > scenarios, IMO, only the following situation can happen.
>> > > >
>> > > > lock_sock()
>> > > > __smc_release
>> > > >
>> > > > smc_cdc_rx_handler()
>> > > > smc_cdc_msg_recv()
>> > > > bh_lock_sock()
>> > > > smc_cdc_msg_recv_action()
>> > > > sock_set_flag(DONE)
>> > > > bh_unlock_sock()
>> > > > sock_set_flag(DEAD)
>> > > > release_sock()
>> > > Hi wenjia,
>> > >
>> > > I think I know what D. Wythe means now, and I think he is right on this.
>> > >
>> > > IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
>> > > acquires the lock before bh_lock_sock(). This is how the sock lock works.
>> > >
>> > > PROCESS CONTEXT INTERRUPT CONTEXT
>> > > ------------------------------------------------------------------------
>> > > lock_sock()
>> > > spin_lock_bh(&sk->sk_lock.slock);
>> > > ...
>> > > sk->sk_lock.owned = 1;
>> > > // here the spinlock is released
>> > > spin_unlock_bh(&sk->sk_lock.slock);
>> > > __smc_release()
>> > > bh_lock_sock(&smc->sk);
>> > > smc_cdc_msg_recv_action(smc, cdc);
>> > > sock_set_flag(&smc->sk, SOCK_DONE);
>> > > bh_unlock_sock(&smc->sk);
>> > >
>> > > sock_set_flag(DEAD) <-- Can be before or after sock_set_flag(DONE)
>> > > release_sock()
>> > >
>> > > The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
>> > > after lock_sock() return. Therefor, there is actually no lock between
>> > > the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock().
>> > > Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be
>> > > before or after sock_set_flag(DONE).
>> > >
>> > >
>> > > Actually, in TCP, the interrupt context will check sock_owned_by_user().
>> > > If it returns true, the softirq just defer the process to backlog, and process
>> > > that in release_sock(). Which avoid the race between softirq and process
>> > > when visiting the 'struct sock'.
>> > >
>> > > tcp_v4_rcv()
>> > > bh_lock_sock_nested(sk);
>> > > tcp_segs_in(tcp_sk(sk), skb);
>> > > ret = 0;
>> > > if (!sock_owned_by_user(sk)) {
>> > > ret = tcp_v4_do_rcv(sk, skb);
>> > > } else {
>> > > if (tcp_add_backlog(sk, skb, &drop_reason))
>> > > goto discard_and_relse;
>> > > }
>> > > bh_unlock_sock(sk);
>> > >
>> > >
>> > > But in SMC we don't have a backlog, that means fields in 'struct sock'
>> > > might all have race, and this sock_set_flag() is just one of the cases.
>> > >
>> > > Best regards,
>> > > Dust
>> > >
>> > I agree on your description above.
>> > Sure, the following case 1) can also happen
>> >
>> > case 1)
>> > -------
>> > lock_sock()
>> > __smc_release
>> >
>> > sock_set_flag(DEAD)
>> > bh_lock_sock()
>> > smc_cdc_msg_recv_action()
>> > sock_set_flag(DONE)
>> > bh_unlock_sock()
>> > release_sock()
>> >
>> > case 2)
>> > -------
>> > lock_sock()
>> > __smc_release
>> >
>> > bh_lock_sock()
>> > smc_cdc_msg_recv_action()
>> > sock_set_flag(DONE) sock_set_flag(DEAD)
>> > __set_bit __set_bit
>> > bh_unlock_sock()
>> > release_sock()
>> >
>> > My point here is that case2) can never happen. i.e that sock_set_flag(DONE)
>> > and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>> > the atomic set help make sure that the Dead flag would not be overwritten
>> > with DONE?
>> I agree with you on this. I also don't see using atomic can
>> solve the problem of overwriting the DEAD flag with DONE.
>>
>> I think we need some mechanisms to ensure that sk_flags and other
>> struct sock related fields are not modified simultaneously.
>>
>> Best regards,
>> Dust
>
>It seems that everyone has agrees on that case 2 is impossible. I'm a bit
>confused, why that
>sock_set_flag(DONE) and sock_set_flag(DEAD) can not happen concurrently. What
>mechanism
>prevents their parallel execution?
Upon reviewing the code again, I realize that my previous understanding
was incorrect. I mistakenly believed that the DEAD and DONE flags would
overwrite each other, without realizing that sk_flags is actually used
as a bitmap.
So I think you are right: using an atomic set_bit will ensure that the
DEAD flag is always set.
Best regards,
Dust
* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-17 2:00 ` D. Wythe
2023-10-17 8:39 ` Dust Li
@ 2023-10-17 17:03 ` Wenjia Zhang
[not found] ` <4065e94f-f7ea-7943-e2cc-0c7d3f9c788b@linux.alibaba.com>
1 sibling, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-17 17:03 UTC (permalink / raw)
To: D. Wythe, dust.li, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 17.10.23 04:00, D. Wythe wrote:
>
>
> On 10/13/23 8:27 PM, Dust Li wrote:
>> On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>>>
>>> On 13.10.23 07:32, Dust Li wrote:
>>>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>>>>>
>>>>> On 12.10.23 04:37, D. Wythe wrote:
>>>>>>
>>>>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>>>>>>>
>>>>>>> On 11.10.23 09:33, D. Wythe wrote:
>>>>>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>>>>>
>>>>>>>> Considering scenario:
>>>>>>>>
>>>>>>>> smc_cdc_rx_handler_rwwi
>>>>>>>> __smc_release
>>>>>>>> sock_set_flag
>>>>>>>> smc_close_active()
>>>>>>>> sock_set_flag
>>>>>>>>
>>>>>>>> __set_bit(DEAD) __set_bit(DONE)
>>>>>>>>
>>>>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
>>>>>>>> if the DEAD flag lost, the state SMC_CLOSED will be never be
>>>>>>>> reached
>>>>>>>> in smc_close_passive_work:
>>>>>>>>
>>>>>>>> if (sock_flag(sk, SOCK_DEAD) &&
>>>>>>>> smc_close_sent_any_close(conn)) {
>>>>>>>> sk->sk_state = SMC_CLOSED;
>>>>>>>> } else {
>>>>>>>> /* just shutdown, but not yet closed locally */
>>>>>>>> sk->sk_state = SMC_APPFINCLOSEWAIT;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this
>>>>>>>> problem.
>>>>>>>> Since set_bit is atomic.
>>>>>>>>
>>>>>>> I didn't really understand the scenario. What is
>>>>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock
>>>>>>> during the runtime?
>>>>>>>
>>>>>> Hi Wenjia,
>>>>>>
>>>>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but
>>>>>> smc_cdc_rx_handler();
>>>>>>
>>>>>> Following is a more specific description of the issues
>>>>>>
>>>>>>
>>>>>> lock_sock()
>>>>>> __smc_release
>>>>>>
>>>>>> smc_cdc_rx_handler()
>>>>>> smc_cdc_msg_recv()
>>>>>> bh_lock_sock()
>>>>>> smc_cdc_msg_recv_action()
>>>>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>>>>> __set_bit __set_bit
>>>>>> bh_unlock_sock()
>>>>>> release_sock()
>>>>>>
>>>>>>
>>>>>>
>>>>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive.
>>>>>> They are
>>>>>> actually used for different purposes and contexts.
>>>>>>
>>>>>>
>>>>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really
>>>>> mutually
>>>>> exclusive. However, since bh_lock_sock() is used, this scenario you
>>>>> described
>>>>> above should not happen, because that gets the sk_lock.slock.
>>>>> Following this
>>>>> scenarios, IMO, only the following situation can happen.
>>>>>
>>>>> lock_sock()
>>>>> __smc_release
>>>>>
>>>>> smc_cdc_rx_handler()
>>>>> smc_cdc_msg_recv()
>>>>> bh_lock_sock()
>>>>> smc_cdc_msg_recv_action()
>>>>> sock_set_flag(DONE)
>>>>> bh_unlock_sock()
>>>>> sock_set_flag(DEAD)
>>>>> release_sock()
>>>> Hi wenjia,
>>>>
>>>> I think I know what D. Wythe means now, and I think he is right on
>>>> this.
>>>>
>>>> IIUC, in process context, lock_sock() won't respect bh_lock_sock()
>>>> if it
>>>> acquires the lock before bh_lock_sock(). This is how the sock lock
>>>> works.
>>>>
>>>> PROCESS CONTEXT INTERRUPT CONTEXT
>>>> ------------------------------------------------------------------------
>>>> lock_sock()
>>>> spin_lock_bh(&sk->sk_lock.slock);
>>>> ...
>>>> sk->sk_lock.owned = 1;
>>>> // here the spinlock is released
>>>> spin_unlock_bh(&sk->sk_lock.slock);
>>>> __smc_release()
>>>>
>>>> bh_lock_sock(&smc->sk);
>>>>
>>>> smc_cdc_msg_recv_action(smc, cdc);
>>>>
>>>> sock_set_flag(&smc->sk, SOCK_DONE);
>>>>
>>>> bh_unlock_sock(&smc->sk);
>>>>
>>>> sock_set_flag(DEAD) <-- Can be before or after
>>>> sock_set_flag(DONE)
>>>> release_sock()
>>>>
>>>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already
>>>> released
>>>> after lock_sock() return. Therefor, there is actually no lock between
>>>> the code after lock_sock() and before release_sock() with
>>>> bh_lock_sock()...bh_unlock_sock().
>>>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and
>>>> might be
>>>> before or after sock_set_flag(DONE).
>>>>
>>>>
>>>> Actually, in TCP, the interrupt context will check
>>>> sock_owned_by_user().
>>>> If it returns true, the softirq just defer the process to backlog,
>>>> and process
>>>> that in release_sock(). Which avoid the race between softirq and
>>>> process
>>>> when visiting the 'struct sock'.
>>>>
>>>> tcp_v4_rcv()
>>>> bh_lock_sock_nested(sk);
>>>> tcp_segs_in(tcp_sk(sk), skb);
>>>> ret = 0;
>>>> if (!sock_owned_by_user(sk)) {
>>>> ret = tcp_v4_do_rcv(sk, skb);
>>>> } else {
>>>> if (tcp_add_backlog(sk, skb, &drop_reason))
>>>> goto discard_and_relse;
>>>> }
>>>> bh_unlock_sock(sk);
>>>>
>>>>
>>>> But in SMC we don't have a backlog, that means fields in 'struct sock'
>>>> might all have race, and this sock_set_flag() is just one of the cases.
>>>>
>>>> Best regards,
>>>> Dust
>>>>
>>> I agree on your description above.
>>> Sure, the following case 1) can also happen
>>>
>>> case 1)
>>> -------
>>> lock_sock()
>>> __smc_release
>>>
>>> sock_set_flag(DEAD)
>>> bh_lock_sock()
>>> smc_cdc_msg_recv_action()
>>> sock_set_flag(DONE)
>>> bh_unlock_sock()
>>> release_sock()
>>>
>>> case 2)
>>> -------
>>> lock_sock()
>>> __smc_release
>>>
>>> bh_lock_sock()
>>> smc_cdc_msg_recv_action()
>>> sock_set_flag(DONE) sock_set_flag(DEAD)
>>> __set_bit __set_bit
>>> bh_unlock_sock()
>>> release_sock()
>>>
>>> My point here is that case2) can never happen. i.e that
>>> sock_set_flag(DONE)
>>> and sock_set_flag(DEAD) can not happen concurrently. Thus, how could
>>> the atomic set help make sure that the Dead flag would not be
>>> overwritten
>>> with DONE?
>> I agree with you on this. I also don't see using atomic can
>> solve the problem of overwriting the DEAD flag with DONE.
>>
>> I think we need some mechanisms to ensure that sk_flags and other
>> struct sock related fields are not modified simultaneously.
>>
>> Best regards,
>> Dust
>
> It seems that everyone has agrees on that case 2 is impossible. I'm a
> bit confused, why that
> sock_set_flag(DONE) and sock_set_flag(DEAD) can not happen concurrently.
> What mechanism
> prevents their parallel execution?
>
> Best wishes,
> D. Wythe
>
>>
>
In smc_cdc_rx_handler(), if bh_lock_sock() is held, how could the
sock_set_flag(DEAD) in __smc_release() modify the flag concurrently?
As I said, this could just be a lapse in my thinking, but I still
want to clarify it.
* Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
2023-10-11 7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
2023-10-11 14:00 ` Dust Li
2023-10-11 20:31 ` Wenjia Zhang
@ 2023-10-23 20:53 ` Wenjia Zhang
2 siblings, 0 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-23 20:53 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Considering scenario:
>
> smc_cdc_rx_handler_rwwi
> __smc_release
> sock_set_flag
> smc_close_active()
> sock_set_flag
>
> __set_bit(DEAD) __set_bit(DONE)
>
> Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
> if the DEAD flag lost, the state SMC_CLOSED will be never be reached
> in smc_close_passive_work:
>
> if (sock_flag(sk, SOCK_DEAD) &&
> smc_close_sent_any_close(conn)) {
> sk->sk_state = SMC_CLOSED;
> } else {
> /* just shutdown, but not yet closed locally */
> sk->sk_state = SMC_APPFINCLOSEWAIT;
> }
>
> Replace sock_set_flags or __set_bit to set_bit will fix this problem.
> Since set_bit is atomic.
>
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/af_smc.c | 4 ++--
> net/smc/smc.h | 5 +++++
> net/smc/smc_cdc.c | 2 +-
> net/smc/smc_close.c | 2 +-
> 4 files changed, 9 insertions(+), 4 deletions(-)
>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
* [PATCH net 2/5] net/smc: fix incorrect barrier usage
2023-10-11 7:33 [PATCH net 0/5] net/smc: bugfixs for smc-r D. Wythe
2023-10-11 7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
@ 2023-10-11 7:33 ` D. Wythe
2023-10-11 8:44 ` Heiko Carstens
2023-10-11 7:33 ` [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc D. Wythe
` (3 subsequent siblings)
5 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-11 7:33 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe,
Heiko Carstens
From: "D. Wythe" <alibuda@linux.alibaba.com>
This patch adds an explicit CPU barrier to ensure memory
consistency, rather than a compiler barrier.
Besides, atomicity between READ_ONCE and cmpxchg cannot
be guaranteed, so we need to use atomic ops. The simple way
is to replace READ_ONCE with xchg.
Fixes: 475f9ff63ee8 ("net/smc: fix application data exception")
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Links: https://lore.kernel.org/netdev/1b7c95be-d3d9-53c3-3152-cd835314d37c@linux.ibm.com/T/
---
net/smc/smc_core.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d520ee6..cc7d72e 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1133,9 +1133,10 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
smc_buf_free(lgr, is_rmb, buf_desc);
} else {
- /* memzero_explicit provides potential memory barrier semantics */
- memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
- WRITE_ONCE(buf_desc->used, 0);
+ memset(buf_desc->cpu_addr, 0, buf_desc->len);
+ /* make sure buf_desc->used not be reordered ahead */
+ smp_mb__before_atomic();
+ xchg(&buf_desc->used, 0);
}
}
@@ -1146,17 +1147,21 @@ static void smc_buf_unuse(struct smc_connection *conn,
if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
} else {
- memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
- WRITE_ONCE(conn->sndbuf_desc->used, 0);
+ memset(conn->sndbuf_desc->cpu_addr, 0, conn->sndbuf_desc->len);
+ /* make sure buf_desc->used not be reordered ahead */
+ smp_mb__before_atomic();
+ xchg(&conn->sndbuf_desc->used, 0);
}
}
if (conn->rmb_desc) {
if (!lgr->is_smcd) {
smcr_buf_unuse(conn->rmb_desc, true, lgr);
} else {
- memzero_explicit(conn->rmb_desc->cpu_addr,
- conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
- WRITE_ONCE(conn->rmb_desc->used, 0);
+ memset(conn->rmb_desc->cpu_addr, 0,
+ conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
+ /* make sure buf_desc->used not be reordered ahead */
+ smp_mb__before_atomic();
+ xchg(&conn->rmb_desc->used, 0);
}
}
}
--
1.8.3.1
* Re: [PATCH net 2/5] net/smc: fix incorrect barrier usage
2023-10-11 7:33 ` [PATCH net 2/5] net/smc: fix incorrect barrier usage D. Wythe
@ 2023-10-11 8:44 ` Heiko Carstens
2023-10-11 8:57 ` D. Wythe
0 siblings, 1 reply; 38+ messages in thread
From: Heiko Carstens @ 2023-10-11 8:44 UTC (permalink / raw)
To: D. Wythe
Cc: kgraul, wenjia, jaka, wintera, kuba, davem, netdev, linux-s390,
linux-rdma
On Wed, Oct 11, 2023 at 03:33:17PM +0800, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> This patch adds an explicit CPU barrier to ensure memory
> consistency, rather than relying on a compiler barrier.
>
> Besides, atomicity between READ_ONCE and cmpxchg cannot
> be guaranteed, so we need to use atomic ops. The simplest way
> is to replace READ_ONCE with xchg.
>
> Fixes: 475f9ff63ee8 ("net/smc: fix application data exception")
> Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
^^^
I did not Co-develop this, nor did I provide an explicit Signed-off-by.
Please don't add Signed-off-by statements which have not been explicitly
agreed on.
* Re: [PATCH net 2/5] net/smc: fix incorrect barrier usage
2023-10-11 8:44 ` Heiko Carstens
@ 2023-10-11 8:57 ` D. Wythe
0 siblings, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-11 8:57 UTC (permalink / raw)
To: Heiko Carstens
Cc: kgraul, wenjia, jaka, wintera, kuba, davem, netdev, linux-s390,
linux-rdma
On 10/11/23 4:44 PM, Heiko Carstens wrote:
> I did not Co-develop this, nor did I provide an explicit Signed-off-by.
> Please don't add Signed-off-by statements which have not been explicitly
> agreed on.
Sorry for that, I used the wrong tag; Reported-by might be more
appropriate.
I will remove this in the next version. Sorry to bother you again.
Best wishes,
D. Wythe
* [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
2023-10-11 7:33 [PATCH net 0/5] net/smc: bugfixs for smc-r D. Wythe
2023-10-11 7:33 ` [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT D. Wythe
2023-10-11 7:33 ` [PATCH net 2/5] net/smc: fix incorrect barrier usage D. Wythe
@ 2023-10-11 7:33 ` D. Wythe
2023-10-11 20:37 ` Wenjia Zhang
2023-10-11 7:33 ` [PATCH net 4/5] net/smc: protect connection state transitions in listen work D. Wythe
` (2 subsequent siblings)
5 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-11 7:33 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe
From: "D. Wythe" <alibuda@linux.alibaba.com>
This patch re-fixes the issue mentioned by commit 22a825c541d7
("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
Blocking the message send does solve the issue, but it also
prevents the peer from receiving the final message. Besides, logically,
whether sndbuf_desc is NULL or not has no impact on the processing
of CDC message sending.
Hence, this patch allows the CDC message to be sent, but checks
sndbuf_desc with care in smc_cdc_tx_handler().
Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/smc_cdc.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 01bdb79..3c06625 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct smc_wr_tx_pend_priv *pnd_snd,
{
struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend *)pnd_snd;
struct smc_connection *conn = cdcpend->conn;
+ struct smc_buf_desc *sndbuf_desc;
struct smc_sock *smc;
int diff;
+ sndbuf_desc = conn->sndbuf_desc;
smc = container_of(conn, struct smc_sock, conn);
bh_lock_sock(&smc->sk);
- if (!wc_status) {
- diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
+ if (!wc_status && sndbuf_desc) {
+ diff = smc_curs_diff(sndbuf_desc->len,
&cdcpend->conn->tx_curs_fin,
&cdcpend->cursor);
/* sndbuf_space is decreased in smc_sendmsg */
@@ -114,9 +116,6 @@ int smc_cdc_msg_send(struct smc_connection *conn,
union smc_host_cursor cfed;
int rc;
- if (unlikely(!READ_ONCE(conn->sndbuf_desc)))
- return -ENOBUFS;
-
smc_cdc_add_pending_send(conn, pend);
conn->tx_cdc_seq++;
--
1.8.3.1
* Re: [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
2023-10-11 7:33 ` [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc D. Wythe
@ 2023-10-11 20:37 ` Wenjia Zhang
2023-10-12 2:49 ` D. Wythe
0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-11 20:37 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> This patch re-fixes the issue mentioned by commit 22a825c541d7
> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
>
> Blocking the message send does solve the issue, but it also
> prevents the peer from receiving the final message. Besides, logically,
> whether sndbuf_desc is NULL or not has no impact on the processing
> of CDC message sending.
>
Agree.
> Hence, this patch allows the CDC message to be sent, but checks
> sndbuf_desc with care in smc_cdc_tx_handler().
>
> Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/smc_cdc.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
> index 01bdb79..3c06625 100644
> --- a/net/smc/smc_cdc.c
> +++ b/net/smc/smc_cdc.c
> @@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct smc_wr_tx_pend_priv *pnd_snd,
> {
> struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend *)pnd_snd;
> struct smc_connection *conn = cdcpend->conn;
> + struct smc_buf_desc *sndbuf_desc;
> struct smc_sock *smc;
> int diff;
>
> + sndbuf_desc = conn->sndbuf_desc;
> smc = container_of(conn, struct smc_sock, conn);
> bh_lock_sock(&smc->sk);
> - if (!wc_status) {
> - diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
> + if (!wc_status && sndbuf_desc) {
> + diff = smc_curs_diff(sndbuf_desc->len,
How could this guarantee that the sndbuf_desc would not be NULL?
> &cdcpend->conn->tx_curs_fin,
> &cdcpend->cursor);
> /* sndbuf_space is decreased in smc_sendmsg */
> @@ -114,9 +116,6 @@ int smc_cdc_msg_send(struct smc_connection *conn,
> union smc_host_cursor cfed;
> int rc;
>
> - if (unlikely(!READ_ONCE(conn->sndbuf_desc)))
> - return -ENOBUFS;
> -
> smc_cdc_add_pending_send(conn, pend);
>
> conn->tx_cdc_seq++;
* Re: [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
2023-10-11 20:37 ` Wenjia Zhang
@ 2023-10-12 2:49 ` D. Wythe
2023-10-12 15:15 ` Wenjia Zhang
0 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-12 2:49 UTC (permalink / raw)
To: Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 10/12/23 4:37 AM, Wenjia Zhang wrote:
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch re-fixes the issue mentioned by commit 22a825c541d7
>> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
>>
>> Blocking the message send does solve the issue, but it also
>> prevents the peer from receiving the final message. Besides, logically,
>> whether sndbuf_desc is NULL or not has no impact on the processing
>> of CDC message sending.
>>
> Agree.
>
>> Hence, this patch allows the CDC message to be sent, but checks
>> sndbuf_desc with care in smc_cdc_tx_handler().
>>
>> Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in
>> smc_cdc_tx_handler()")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>> net/smc/smc_cdc.c | 9 ++++-----
>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>> index 01bdb79..3c06625 100644
>> --- a/net/smc/smc_cdc.c
>> +++ b/net/smc/smc_cdc.c
>> @@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct
>> smc_wr_tx_pend_priv *pnd_snd,
>> {
>> struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend
>> *)pnd_snd;
>> struct smc_connection *conn = cdcpend->conn;
>> + struct smc_buf_desc *sndbuf_desc;
>> struct smc_sock *smc;
>> int diff;
>> + sndbuf_desc = conn->sndbuf_desc;
>> smc = container_of(conn, struct smc_sock, conn);
>> bh_lock_sock(&smc->sk);
>> - if (!wc_status) {
>> - diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
>> + if (!wc_status && sndbuf_desc) {
>> + diff = smc_curs_diff(sndbuf_desc->len,
> How could this guarantee that the sndbuf_desc would not be NULL?
>
It cannot guarantee that sndbuf_desc is not NULL, but it prevents
smc_cdc_tx_handler() from accessing a NULL sndbuf_desc, so that we
can avoid the panic described in commit 22a825c541d7
("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
>> &cdcpend->conn->tx_curs_fin,
>> &cdcpend->cursor);
>> /* sndbuf_space is decreased in smc_sendmsg */
>> @@ -114,9 +116,6 @@ int smc_cdc_msg_send(struct smc_connection *conn,
>> union smc_host_cursor cfed;
>> int rc;
>> - if (unlikely(!READ_ONCE(conn->sndbuf_desc)))
>> - return -ENOBUFS;
>> -
>> smc_cdc_add_pending_send(conn, pend);
>> conn->tx_cdc_seq++;
* Re: [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
2023-10-12 2:49 ` D. Wythe
@ 2023-10-12 15:15 ` Wenjia Zhang
0 siblings, 0 replies; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 15:15 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 12.10.23 04:49, D. Wythe wrote:
>
>
> On 10/12/23 4:37 AM, Wenjia Zhang wrote:
>>
>>
>> On 11.10.23 09:33, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> This patch re-fixes the issue mentioned by commit 22a825c541d7
>>> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
>>>
>>> Blocking the message send does solve the issue, but it also
>>> prevents the peer from receiving the final message. Besides, logically,
>>> whether sndbuf_desc is NULL or not has no impact on the processing
>>> of CDC message sending.
>>>
>> Agree.
>>
>>> Hence, this patch allows the CDC message to be sent, but checks
>>> sndbuf_desc with care in smc_cdc_tx_handler().
>>>
>>> Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in
>>> smc_cdc_tx_handler()")
>>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>>> ---
>>> net/smc/smc_cdc.c | 9 ++++-----
>>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
>>> index 01bdb79..3c06625 100644
>>> --- a/net/smc/smc_cdc.c
>>> +++ b/net/smc/smc_cdc.c
>>> @@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct
>>> smc_wr_tx_pend_priv *pnd_snd,
>>> {
>>> struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend
>>> *)pnd_snd;
>>> struct smc_connection *conn = cdcpend->conn;
>>> + struct smc_buf_desc *sndbuf_desc;
>>> struct smc_sock *smc;
>>> int diff;
>>> + sndbuf_desc = conn->sndbuf_desc;
>>> smc = container_of(conn, struct smc_sock, conn);
>>> bh_lock_sock(&smc->sk);
>>> - if (!wc_status) {
>>> - diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
>>> + if (!wc_status && sndbuf_desc) {
>>> + diff = smc_curs_diff(sndbuf_desc->len,
>> How could this guarantee that the sndbuf_desc would not be NULL?
>>
>
> It cannot guarantee that sndbuf_desc is not NULL, but it prevents
> smc_cdc_tx_handler() from accessing a NULL sndbuf_desc, so that we
> can avoid the panic described in commit 22a825c541d7
> ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
>
got it, thanks!
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
>>> &cdcpend->conn->tx_curs_fin,
>>> &cdcpend->cursor);
>>> /* sndbuf_space is decreased in smc_sendmsg */
>>> @@ -114,9 +116,6 @@ int smc_cdc_msg_send(struct smc_connection *conn,
>>> union smc_host_cursor cfed;
>>> int rc;
>>> - if (unlikely(!READ_ONCE(conn->sndbuf_desc)))
>>> - return -ENOBUFS;
>>> -
>>> smc_cdc_add_pending_send(conn, pend);
>>> conn->tx_cdc_seq++;
>
* [PATCH net 4/5] net/smc: protect connection state transitions in listen work
2023-10-11 7:33 [PATCH net 0/5] net/smc: bugfixs for smc-r D. Wythe
` (2 preceding siblings ...)
2023-10-11 7:33 ` [PATCH net 3/5] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc D. Wythe
@ 2023-10-11 7:33 ` D. Wythe
2023-10-12 17:14 ` Wenjia Zhang
2023-10-11 7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
2023-10-12 13:43 ` [PATCH net 0/5] net/smc: bugfixs for smc-r Alexandra Winter
5 siblings, 1 reply; 38+ messages in thread
From: D. Wythe @ 2023-10-11 7:33 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe
From: "D. Wythe" <alibuda@linux.alibaba.com>
Consider the following scenario:
smc_close_passive_work
				smc_listen_out_connected
lock_sock()
if (state == SMC_INIT)
				if (state == SMC_INIT)
	state = SMC_APPCLOSEWAIT1;
				state = SMC_ACTIVE
release_sock()
This would cause the state machine of the connection to be corrupted.
Also, this issue can occur in smc_listen_out_err().
To solve this problem, we protect the state transitions under
the sock lock to avoid the collision.
Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/af_smc.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 5ad2a9f..3bb8265 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1926,8 +1926,10 @@ static void smc_listen_out_connected(struct smc_sock *new_smc)
{
struct sock *newsmcsk = &new_smc->sk;
+ lock_sock(newsmcsk);
if (newsmcsk->sk_state == SMC_INIT)
newsmcsk->sk_state = SMC_ACTIVE;
+ release_sock(newsmcsk);
smc_listen_out(new_smc);
}
@@ -1939,9 +1941,12 @@ static void smc_listen_out_err(struct smc_sock *new_smc)
struct net *net = sock_net(newsmcsk);
this_cpu_inc(net->smc.smc_stats->srv_hshake_err_cnt);
+
+ lock_sock(newsmcsk);
if (newsmcsk->sk_state == SMC_INIT)
sock_put(&new_smc->sk); /* passive closing */
newsmcsk->sk_state = SMC_CLOSED;
+ release_sock(newsmcsk);
smc_listen_out(new_smc);
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 38+ messages in thread* Re: [PATCH net 4/5] net/smc: protect connection state transitions in listen work
2023-10-11 7:33 ` [PATCH net 4/5] net/smc: protect connection state transitions in listen work D. Wythe
@ 2023-10-12 17:14 ` Wenjia Zhang
2023-10-31 3:04 ` D. Wythe
0 siblings, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 17:14 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Consider the following scenario:
>
> smc_close_passive_work
> 				smc_listen_out_connected
> lock_sock()
> if (state == SMC_INIT)
> 				if (state == SMC_INIT)
> 	state = SMC_APPCLOSEWAIT1;
> 				state = SMC_ACTIVE
> release_sock()
>
> This would cause the state machine of the connection to be corrupted.
> Also, this issue can occur in smc_listen_out_err().
>
> To solve this problem, we can protect the state transitions under
> the lock of sock to avoid collision.
>
Regarding this fix, I have to repeat the question from Alexandra:
did the scenario occur in real life, or is it just a potential
problem you found during code review?
If it is the former, could you please show us the corresponding
message, e.g. from dmesg? If it is the latter, I'd like to deal with
it more carefully. Going through this scenario, I noticed that there could
also be other similar places where we need to make sure that no race
happens. Thus, it would make more sense to find a systematic approach.
> Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/af_smc.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 5ad2a9f..3bb8265 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -1926,8 +1926,10 @@ static void smc_listen_out_connected(struct smc_sock *new_smc)
> {
> struct sock *newsmcsk = &new_smc->sk;
>
> + lock_sock(newsmcsk);
> if (newsmcsk->sk_state == SMC_INIT)
> newsmcsk->sk_state = SMC_ACTIVE;
> + release_sock(newsmcsk);
>
> smc_listen_out(new_smc);
> }
> @@ -1939,9 +1941,12 @@ static void smc_listen_out_err(struct smc_sock *new_smc)
> struct net *net = sock_net(newsmcsk);
>
> this_cpu_inc(net->smc.smc_stats->srv_hshake_err_cnt);
> +
> + lock_sock(newsmcsk);
> if (newsmcsk->sk_state == SMC_INIT)
> sock_put(&new_smc->sk); /* passive closing */
> newsmcsk->sk_state = SMC_CLOSED;
> + release_sock(newsmcsk);
>
> smc_listen_out(new_smc);
> }
* Re: [PATCH net 4/5] net/smc: protect connection state transitions in listen work
2023-10-12 17:14 ` Wenjia Zhang
@ 2023-10-31 3:04 ` D. Wythe
0 siblings, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-31 3:04 UTC (permalink / raw)
To: Wenjia Zhang, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 10/13/23 1:14 AM, Wenjia Zhang wrote:
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> Consider the following scenario:
>>
>> smc_close_passive_work
>> 				smc_listen_out_connected
>> lock_sock()
>> if (state == SMC_INIT)
>> 				if (state == SMC_INIT)
>> 	state = SMC_APPCLOSEWAIT1;
>> 				state = SMC_ACTIVE
>> release_sock()
>>
>> This would cause the state machine of the connection to be corrupted.
>> Also, this issue can occur in smc_listen_out_err().
>>
>> To solve this problem, we can protect the state transitions under
>> the lock of sock to avoid collision.
>>
> Regarding this fix, I have to repeat the question from Alexandra:
> did the scenario occur in real life, or is it just a potential
> problem you found during code review?
>
Hi Wenjia,
This is a real issue that occurred in our environment, not something
found during code review.
Unfortunately, since this issue does not cause a panic, but rather
potential reference leaks, it is difficult for me
to provide a very intuitive error message.
> If it is the former one, could you please show us the corresponding
> message, e.g. from dmesg? If it is the latter one, I'd like to deal
> with it more carefully. Going from this scenario, I noticed that there
> could also be other similar places where we need to make sure that no
> race happens. Thus, it would make more sense to find a systematic
> approach.
>
We agree that we should deal with it with more care. In fact, this issue
is very complex and we may spend a lot of time discussing it. Therefore,
I suggest that we temporarily drop it
so that we can quickly accept the patches we have already agreed on. I
will send those patches separately in the future.
Best Wishes,
D. Wythe
>> Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in
>> af_smc")
>> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
>> ---
>> net/smc/af_smc.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index 5ad2a9f..3bb8265 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -1926,8 +1926,10 @@ static void smc_listen_out_connected(struct
>> smc_sock *new_smc)
>> {
>> struct sock *newsmcsk = &new_smc->sk;
>> + lock_sock(newsmcsk);
>> if (newsmcsk->sk_state == SMC_INIT)
>> newsmcsk->sk_state = SMC_ACTIVE;
>> + release_sock(newsmcsk);
>> smc_listen_out(new_smc);
>> }
>> @@ -1939,9 +1941,12 @@ static void smc_listen_out_err(struct smc_sock
>> *new_smc)
>> struct net *net = sock_net(newsmcsk);
>> this_cpu_inc(net->smc.smc_stats->srv_hshake_err_cnt);
>> +
>> + lock_sock(newsmcsk);
>> if (newsmcsk->sk_state == SMC_INIT)
>> sock_put(&new_smc->sk); /* passive closing */
>> newsmcsk->sk_state = SMC_CLOSED;
>> + release_sock(newsmcsk);
>> smc_listen_out(new_smc);
>> }
* [PATCH net 5/5] net/smc: put sk reference if close work was canceled
2023-10-11 7:33 [PATCH net 0/5] net/smc: bugfixs for smc-r D. Wythe
` (3 preceding siblings ...)
2023-10-11 7:33 ` [PATCH net 4/5] net/smc: protect connection state transitions in listen work D. Wythe
@ 2023-10-11 7:33 ` D. Wythe
2023-10-11 14:54 ` Dust Li
2023-10-12 19:04 ` Wenjia Zhang
2023-10-12 13:43 ` [PATCH net 0/5] net/smc: bugfixs for smc-r Alexandra Winter
5 siblings, 2 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-11 7:33 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma, D. Wythe
From: "D. Wythe" <alibuda@linux.alibaba.com>
Note that we always hold a reference to sock when attempting
to submit close_work. Therefore, if we successfully
cancel a pending close_work, we MUST release that reference
to avoid potential leaks.
Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link groups")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
net/smc/smc_close.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index 449ef45..10219f5 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct smc_sock *smc)
struct sock *sk = &smc->sk;
release_sock(sk);
- cancel_work_sync(&smc->conn.close_work);
+ if (cancel_work_sync(&smc->conn.close_work))
+ sock_put(sk);
cancel_delayed_work_sync(&smc->conn.tx_work);
lock_sock(sk);
}
--
1.8.3.1
* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
2023-10-11 7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
@ 2023-10-11 14:54 ` Dust Li
2023-10-12 19:04 ` Wenjia Zhang
1 sibling, 0 replies; 38+ messages in thread
From: Dust Li @ 2023-10-11 14:54 UTC (permalink / raw)
To: D. Wythe, kgraul, wenjia, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On Wed, Oct 11, 2023 at 03:33:20PM +0800, D. Wythe wrote:
>From: "D. Wythe" <alibuda@linux.alibaba.com>
>
>Note that we always hold a reference to sock when attempting
>to submit close_work. Therefore, if we successfully
>cancel a pending close_work, we MUST release that reference
>to avoid potential leaks.
>
>Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link groups")
>Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Best regards,
Dust
>---
> net/smc/smc_close.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
>index 449ef45..10219f5 100644
>--- a/net/smc/smc_close.c
>+++ b/net/smc/smc_close.c
>@@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct smc_sock *smc)
> struct sock *sk = &smc->sk;
>
> release_sock(sk);
>- cancel_work_sync(&smc->conn.close_work);
>+ if (cancel_work_sync(&smc->conn.close_work))
>+ sock_put(sk);
> cancel_delayed_work_sync(&smc->conn.tx_work);
> lock_sock(sk);
> }
>--
>1.8.3.1
* Re: [PATCH net 5/5] net/smc: put sk reference if close work was canceled
2023-10-11 7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
2023-10-11 14:54 ` Dust Li
@ 2023-10-12 19:04 ` Wenjia Zhang
[not found] ` <ee641ca5-104b-d1ec-5b2a-e20237c5378a@linux.alibaba.com>
1 sibling, 1 reply; 38+ messages in thread
From: Wenjia Zhang @ 2023-10-12 19:04 UTC (permalink / raw)
To: D. Wythe, kgraul, jaka, wintera
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> Note that we always hold a reference to sock when attempting
> to submit close_work.
yes
> Therefore, if we successfully
> cancel a pending close_work, we MUST release that reference
> to avoid potential leaks.
>
Isn't the corresponding reference already released inside the
smc_close_passive_work()?
> Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link groups")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
> net/smc/smc_close.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
> index 449ef45..10219f5 100644
> --- a/net/smc/smc_close.c
> +++ b/net/smc/smc_close.c
> @@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct smc_sock *smc)
> struct sock *sk = &smc->sk;
>
> release_sock(sk);
> - cancel_work_sync(&smc->conn.close_work);
> + if (cancel_work_sync(&smc->conn.close_work))
> + sock_put(sk);
> cancel_delayed_work_sync(&smc->conn.tx_work);
> lock_sock(sk);
> }
* Re: [PATCH net 0/5] net/smc: bugfixs for smc-r
2023-10-11 7:33 [PATCH net 0/5] net/smc: bugfixs for smc-r D. Wythe
` (4 preceding siblings ...)
2023-10-11 7:33 ` [PATCH net 5/5] net/smc: put sk reference if close work was canceled D. Wythe
@ 2023-10-12 13:43 ` Alexandra Winter
2023-10-17 1:56 ` D. Wythe
5 siblings, 1 reply; 38+ messages in thread
From: Alexandra Winter @ 2023-10-12 13:43 UTC (permalink / raw)
To: D. Wythe, kgraul, wenjia, jaka
Cc: kuba, davem, netdev, linux-s390, linux-rdma
The subject of the thread says 'smc-r', but some of the changes affect smc-d as well,
don't they?
On 11.10.23 09:33, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
>
> This patch set contains the following bugfixes:
>
> 1. hung state
> 2. sock leak
> 3. potential panic
>
It may be helpful for the reviewers if you point out which patch fixes which problem.
Were they all found by code review?
Or did some occur in real life? If so, what were the symptoms?
A description of the symptoms is helpful for somebody who is debugging and wants to check
whether the issue was already fixed upstream.
* Re: [PATCH net 0/5] net/smc: bugfixs for smc-r
2023-10-12 13:43 ` [PATCH net 0/5] net/smc: bugfixs for smc-r Alexandra Winter
@ 2023-10-17 1:56 ` D. Wythe
0 siblings, 0 replies; 38+ messages in thread
From: D. Wythe @ 2023-10-17 1:56 UTC (permalink / raw)
To: Alexandra Winter, kgraul, wenjia, jaka
Cc: kuba, davem, netdev, linux-s390, linux-rdma
On 10/12/23 9:43 PM, Alexandra Winter wrote:
> The subject of the thread says 'smc-r', but some of the changes affect smc-d alike,
> don't they?
Yes, sorry for this mistake; it should be bugfixes for smc.
>
>
> On 11.10.23 09:33, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch set contains the following bugfixes:
>>
>> 1. hung state
>> 2. sock leak
>> 3. potential panic
>>
> It may be helpful for the reviewers if you point out which patch fixes which problem.
>
> Were they all found by code review?
> Or did some occur in real life? If so, what were the symptoms?
> A description of the symptoms is helpful for somebody who is debugging and wants to check
> whether the issue was already fixed upstream.
Hi Alexandra,
Except for the barrier issue, which came from review feedback,
all the other issues actually occurred in our environment
and have been verified through internal testing. However, most of these
issues cause reference leaks rather than panics, so it is
difficult to provide a
representative symptom. But what you said is indeed necessary, so I will
post some symptoms in the next version, such as the output of
lsmod | grep smc
or
smcss -a
With those, one can see the residual references or the residual
connections. Hope it can be helpful to you.
Thanks,
D. Wythe