BPF List
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: Aditi Ghag <aditi.ghag@isovalent.com>
Cc: bpf@vger.kernel.org, kafai@fb.com, edumazet@google.com
Subject: Re: [PATCH v4 bpf-next 3/4] bpf,tcp: Avoid taking fast sock lock in iterator
Date: Fri, 24 Mar 2023 14:45:17 -0700	[thread overview]
Message-ID: <ZB4Z7cnF+RDMaKvW@google.com> (raw)
In-Reply-To: <20230323200633.3175753-4-aditi.ghag@isovalent.com>

On 03/23, Aditi Ghag wrote:
> Previously, BPF TCP iterator was acquiring fast version of sock lock that
> disables the BH. This introduced a circular dependency with code paths  
> that
> later acquire sockets hash table bucket lock.
> Replace the fast version of sock lock with slow that faciliates BPF
> programs executed from the iterator to destroy TCP listening sockets
> using the bpf_sock_destroy kfunc.

> Here is a stack trace that motivated this change:

> ```
> 1) sock_lock with BH disabled + bucket lock

> lock_acquire+0xcd/0x330
> _raw_spin_lock_bh+0x38/0x50
> inet_unhash+0x96/0xd0
> tcp_set_state+0x6a/0x210
> tcp_abort+0x12b/0x230
> bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
> bpf_iter_run_prog+0x1ff/0x340
> bpf_iter_tcp_seq_show+0xca/0x190
> bpf_seq_read+0x177/0x450
> vfs_read+0xc6/0x300
> ksys_read+0x69/0xf0
> do_syscall_64+0x3c/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc

> 2) sock lock with BH enable

> [    1.499968]   lock_acquire+0xcd/0x330
> [    1.500316]   _raw_spin_lock+0x33/0x40
> [    1.500670]   sk_clone_lock+0x146/0x520
> [    1.501030]   inet_csk_clone_lock+0x1b/0x110
> [    1.501433]   tcp_create_openreq_child+0x22/0x3f0
> [    1.501873]   tcp_v6_syn_recv_sock+0x96/0x940
> [    1.502284]   tcp_check_req+0x137/0x660
> [    1.502646]   tcp_v6_rcv+0xa63/0xe80
> [    1.502994]   ip6_protocol_deliver_rcu+0x78/0x590
> [    1.503434]   ip6_input_finish+0x72/0x140
> [    1.503818]   __netif_receive_skb_one_core+0x63/0xa0
> [    1.504281]   process_backlog+0x79/0x260
> [    1.504668]   __napi_poll.constprop.0+0x27/0x170
> [    1.505104]   net_rx_action+0x14a/0x2a0
> [    1.505469]   __do_softirq+0x165/0x510
> [    1.505842]   do_softirq+0xcd/0x100
> [    1.506172]   __local_bh_enable_ip+0xcc/0xf0
> [    1.506588]   ip6_finish_output2+0x2a8/0xb00
> [    1.506988]   ip6_finish_output+0x274/0x510
> [    1.507377]   ip6_xmit+0x319/0x9b0
> [    1.507726]   inet6_csk_xmit+0x12b/0x2b0
> [    1.508096]   __tcp_transmit_skb+0x549/0xc40
> [    1.508498]   tcp_rcv_state_process+0x362/0x1180

> ```

> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>

Acked-by: Stanislav Fomichev <sdf@google.com>

Don't need fixes because it doesn't trigger without your new
bpf_sock_destroy?


> ---
>   net/ipv4/tcp_ipv4.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)

> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index ea370afa70ed..f2d370a9450f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file  
> *seq, void *v)
>   	struct bpf_iter_meta meta;
>   	struct bpf_prog *prog;
>   	struct sock *sk = v;
> -	bool slow;
>   	uid_t uid;
>   	int ret;

> @@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file  
> *seq, void *v)
>   		return 0;

>   	if (sk_fullsock(sk))
> -		slow = lock_sock_fast(sk);
> +		lock_sock(sk);

>   	if (unlikely(sk_unhashed(sk))) {
>   		ret = SEQ_SKIP;
> @@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file  
> *seq, void *v)

>   unlock:
>   	if (sk_fullsock(sk))
> -		unlock_sock_fast(sk, slow);
> +		release_sock(sk);
>   	return ret;

>   }
> --
> 2.34.1


  reply	other threads:[~2023-03-24 21:45 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-23 20:06 [PATCH v4 bpf-next 0/5] bpf-nex: Add socket destroy capability Aditi Ghag
2023-03-23 20:06 ` [PATCH v4 bpf-next 1/4] bpf: Implement batching in UDP iterator Aditi Ghag
2023-03-24 21:56   ` Stanislav Fomichev
2023-03-27 15:52     ` Aditi Ghag
2023-03-27 16:52       ` Stanislav Fomichev
2023-03-27 22:28   ` Martin KaFai Lau
2023-03-28 17:06     ` Aditi Ghag
2023-03-28 21:33       ` Martin KaFai Lau
2023-03-29 16:20         ` Aditi Ghag
2023-03-23 20:06 ` [PATCH v4 bpf-next 2/4] bpf: Add bpf_sock_destroy kfunc Aditi Ghag
2023-03-23 23:58   ` Martin KaFai Lau
2023-03-24 21:37   ` Stanislav Fomichev
2023-03-30 14:42     ` Aditi Ghag
2023-03-30 16:32       ` Stanislav Fomichev
2023-03-30 17:30         ` Martin KaFai Lau
2023-04-03 15:58           ` Aditi Ghag
2023-03-23 20:06 ` [PATCH v4 bpf-next 3/4] bpf,tcp: Avoid taking fast sock lock in iterator Aditi Ghag
2023-03-24 21:45   ` Stanislav Fomichev [this message]
2023-03-28 15:20     ` Aditi Ghag
2023-03-27 22:34   ` Martin KaFai Lau
2023-03-23 20:06 ` [PATCH v4 bpf-next 4/4] selftests/bpf: Add tests for bpf_sock_destroy Aditi Ghag
2023-03-24 21:52   ` Stanislav Fomichev
2023-03-27 15:57     ` Aditi Ghag
2023-03-27 16:54       ` Stanislav Fomichev
2023-03-28 17:50         ` Aditi Ghag
2023-03-28 18:35           ` Stanislav Fomichev
2023-03-29 23:13             ` Aditi Ghag
2023-03-29 23:25               ` Aditi Ghag
2023-03-29 23:25               ` Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZB4Z7cnF+RDMaKvW@google.com \
    --to=sdf@google.com \
    --cc=aditi.ghag@isovalent.com \
    --cc=bpf@vger.kernel.org \
    --cc=edumazet@google.com \
    --cc=kafai@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox