From: Dust Li <dust.li@linux.alibaba.com>
To: Wenjia Zhang <wenjia@linux.ibm.com>,
	"D. Wythe" <alibuda@linux.alibaba.com>,
	kgraul@linux.ibm.com, jaka@linux.ibm.com, wintera@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
Date: Fri, 13 Oct 2023 20:27:29 +0800
Message-ID: <20231013122729.GU92403@linux.alibaba.com>
In-Reply-To: <6666db42-a4de-425e-a96d-bfa899ab265e@linux.ibm.com>

On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote:
>
>
>On 13.10.23 07:32, Dust Li wrote:
>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote:
>> > 
>> > 
>> > On 12.10.23 04:37, D. Wythe wrote:
>> > > 
>> > > 
>> > > On 10/12/23 4:31 AM, Wenjia Zhang wrote:
>> > > > 
>> > > > 
>> > > > On 11.10.23 09:33, D. Wythe wrote:
>> > > > > From: "D. Wythe" <alibuda@linux.alibaba.com>
>> > > > > 
>> > > > > Considering scenario:
>> > > > > 
>> > > > >                  smc_cdc_rx_handler_rwwi
>> > > > > __smc_release
>> > > > >                  sock_set_flag
>> > > > > smc_close_active()
>> > > > > sock_set_flag
>> > > > > 
>> > > > > __set_bit(DEAD)            __set_bit(DONE)
>> > > > > 
>> > > > > Since __set_bit is not atomic, the DEAD or DONE flag might be lost.
>> > > > > If the DEAD flag is lost, the state SMC_CLOSED will never be reached
>> > > > > in smc_close_passive_work:
>> > > > > 
>> > > > > if (sock_flag(sk, SOCK_DEAD) &&
>> > > > >      smc_close_sent_any_close(conn)) {
>> > > > >      sk->sk_state = SMC_CLOSED;
>> > > > > } else {
>> > > > >      /* just shutdown, but not yet closed locally */
>> > > > >      sk->sk_state = SMC_APPFINCLOSEWAIT;
>> > > > > }
>> > > > > 
>> > > > > Replacing sock_set_flag() or __set_bit() with set_bit() fixes this problem,
>> > > > > since set_bit() is atomic.
>> > > > > 
>> > > > I didn't really understand the scenario. What is
>> > > > smc_cdc_rx_handler_rwwi()? What does it do? Doesn't it take the lock
>> > > > while it runs?
>> > > > 
>> > > 
>> > > Hi Wenjia,
>> > > 
>> > > Sorry about that. It is not smc_cdc_rx_handler_rwwi() but
>> > > smc_cdc_rx_handler().
>> > > 
>> > > The following is a more specific description of the issue:
>> > > 
>> > > 
>> > > lock_sock()
>> > > __smc_release
>> > > 
>> > > smc_cdc_rx_handler()
>> > > smc_cdc_msg_recv()
>> > > bh_lock_sock()
>> > > smc_cdc_msg_recv_action()
>> > > sock_set_flag(DONE) sock_set_flag(DEAD)
>> > > __set_bit __set_bit
>> > > bh_unlock_sock()
>> > > release_sock()
>> > > 
>> > > 
>> > > 
>> > > Note: bh_lock_sock() and lock_sock() are not mutually exclusive. They are
>> > > actually used for different purposes and contexts.
>> > > 
>> > > 
>> > OK, it's true that bh_lock_sock() and lock_sock() are not really mutually
>> > exclusive. However, since bh_lock_sock() is used, the scenario you described
>> > above should not happen, because it takes sk_lock.slock. Following this
>> > scenario, IMO, only the following situation can happen.
>> > 
>> > lock_sock()
>> > __smc_release
>> > 
>> > smc_cdc_rx_handler()
>> > smc_cdc_msg_recv()
>> > bh_lock_sock()
>> > smc_cdc_msg_recv_action()
>> > sock_set_flag(DONE)
>> > bh_unlock_sock()
>> > sock_set_flag(DEAD)
>> > release_sock()
>> 
>> Hi Wenjia,
>> 
>> I think I know what D. Wythe means now, and I think he is right on this.
>> 
>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it
>> acquires the lock before bh_lock_sock(). This is how the sock lock works.
>> 
>>      PROCESS CONTEXT                                 INTERRUPT CONTEXT
>> ------------------------------------------------------------------------
>> lock_sock()
>>      spin_lock_bh(&sk->sk_lock.slock);
>>      ...
>>      sk->sk_lock.owned = 1;
>>      // here the spinlock is released
>>      spin_unlock_bh(&sk->sk_lock.slock);
>> __smc_release()
>>                                                     bh_lock_sock(&smc->sk);
>>                                                     smc_cdc_msg_recv_action(smc, cdc);
>>                                                         sock_set_flag(&smc->sk, SOCK_DONE);
>>                                                     bh_unlock_sock(&smc->sk);
>> 
>>      sock_set_flag(DEAD)  <-- Can be before or after sock_set_flag(DONE)
>> release_sock()
>> 
>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released
>> after lock_sock() returns. Therefore, there is actually no lock between
>> the code after lock_sock() and before release_sock() and the code in bh_lock_sock()...bh_unlock_sock().
>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might run
>> before or after sock_set_flag(DONE).
>> 
>> 
>> Actually, in TCP, the interrupt context checks sock_owned_by_user().
>> If it returns true, the softirq just defers the processing to the backlog, which
>> is then processed in release_sock(). This avoids the race between softirq and
>> process context when accessing the 'struct sock'.
>> 
>> tcp_v4_rcv()
>>           bh_lock_sock_nested(sk);
>>           tcp_segs_in(tcp_sk(sk), skb);
>>           ret = 0;
>>           if (!sock_owned_by_user(sk)) {
>>                   ret = tcp_v4_do_rcv(sk, skb);
>>           } else {
>>                   if (tcp_add_backlog(sk, skb, &drop_reason))
>>                           goto discard_and_relse;
>>           }
>>           bh_unlock_sock(sk);
>> 
>> 
>> But in SMC we don't have a backlog, which means fields in 'struct sock'
>> might all be racy, and this sock_set_flag() is just one of the cases.
>> 
>> Best regards,
>> Dust
>> 
>I agree with your description above.
>Sure, the following case 1) can also happen
>
>case 1)
>-------
> lock_sock()
> __smc_release
>
> sock_set_flag(DEAD)
> bh_lock_sock()
> smc_cdc_msg_recv_action()
> sock_set_flag(DONE)
> bh_unlock_sock()
> release_sock()
>
>case 2)
>-------
> lock_sock()
> __smc_release
>
> bh_lock_sock()
> smc_cdc_msg_recv_action()
> sock_set_flag(DONE) sock_set_flag(DEAD)
> __set_bit __set_bit
> bh_unlock_sock()
> release_sock()
>
>My point here is that case 2) can never happen, i.e., that sock_set_flag(DONE)
>and sock_set_flag(DEAD) cannot happen concurrently. Thus, how could
>the atomic set help make sure that the DEAD flag would not be overwritten
>with DONE?

I agree with you on this. I also don't see how using an atomic operation
can solve the problem of overwriting the DEAD flag with DONE.

I think we need some mechanism to ensure that sk_flags and other
struct sock related fields are not modified simultaneously.
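
As a rough illustration of one such mechanism, here is a minimal user-space
sketch of the TCP-style "owned + backlog" deferral quoted above. It is only a
model under stated assumptions: struct model_sock, the model_*() helpers and
the MODEL_* flag values are hypothetical names, not kernel APIs, and the model
is single-threaded (in the kernel, the check of the owner state would itself
run under sk_lock.slock). It is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for SOCK_DEAD / SOCK_DONE. */
enum { MODEL_DEAD = 1u << 0, MODEL_DONE = 1u << 1 };

#define MODEL_BACKLOG_MAX 8

/* User-space model of a socket; NOT the kernel's struct sock. */
struct model_sock {
	bool owned;                              /* models sk->sk_lock.owned */
	unsigned int flags;                      /* models sk->sk_flags */
	unsigned int backlog[MODEL_BACKLOG_MAX]; /* deferred flag updates */
	size_t backlog_len;
};

/* Process context: take ownership (models lock_sock()). */
static void model_lock_sock(struct model_sock *sk)
{
	assert(!sk->owned);
	sk->owned = true;
}

/*
 * Softirq context: apply the flag directly only if no process owns the
 * socket; otherwise queue it. Returns false if the backlog is full.
 */
static bool model_softirq_set_flag(struct model_sock *sk, unsigned int flag)
{
	if (!sk->owned) {
		sk->flags |= flag;
		return true;
	}
	if (sk->backlog_len == MODEL_BACKLOG_MAX)
		return false;
	sk->backlog[sk->backlog_len++] = flag;
	return true;
}

/* Process context: drain the backlog, then drop ownership
 * (models release_sock()). */
static void model_release_sock(struct model_sock *sk)
{
	for (size_t i = 0; i < sk->backlog_len; i++)
		sk->flags |= sk->backlog[i];
	sk->backlog_len = 0;
	sk->owned = false;
}
```

With this shape, a DONE update arriving while the process context owns the
socket never touches flags directly; it is applied in model_release_sock(),
after the owner has finished its own update (e.g. setting DEAD).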

Best regards,
Dust


