From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.5 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_PASS,URIBL_BLOCKED,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C586C282C3 for ; Tue, 22 Jan 2019 17:06:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C3BD320866 for ; Tue, 22 Jan 2019 17:06:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=fomichev-me.20150623.gappssmtp.com header.i=@fomichev-me.20150623.gappssmtp.com header.b="H9Y7r4oV" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729394AbfAVRGu (ORCPT ); Tue, 22 Jan 2019 12:06:50 -0500 Received: from mail-pl1-f196.google.com ([209.85.214.196]:46123 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729316AbfAVRGu (ORCPT ); Tue, 22 Jan 2019 12:06:50 -0500 Received: by mail-pl1-f196.google.com with SMTP id t13so11753128ply.13 for ; Tue, 22 Jan 2019 09:06:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fomichev-me.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=cuYfLAME2NlhvogRDpbH0v7blYMzvbebz+IzPkZODEQ=; b=H9Y7r4oVRAl3ztI2X/WvY4mO8gZ3Urx/D1P0TDeXCFlL8v3EB0dvh2/mCVEbjNUt6x 2+74hpQl5cbYfyJV83JBwVwxl1q9Jf1aXQbVhNjpykdEK81OFO10IxClbnLNfLRm2D5c Jy4hue9PTPXxxIvE0SqCQbHKVyF5y+J/GJqgRQkkiWKmIL0tWZamYS4muf08PgUybT4d dRzG001yMCYaA46DKtFB5VgCIBmwmf7RW4s6IHbhS+lbxInt5nJEzF3OwPOUmIgCu3oY qqtY5eXuVwUf04ywaGq6NzlNsJOIloqNdSEW4MJ7QqPCGjWxpD0qUwzhXBdnlqali4Pe 3EgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=cuYfLAME2NlhvogRDpbH0v7blYMzvbebz+IzPkZODEQ=; b=iLYSMbzuKM0tNsDtoh5dJPbYS7TIGQ3ScfWi2PUMPUywjLDbM3iy5l5S5sReP4o+6q zryrYSsRcc0IzLLF+FzHjdBG+8659FkzQ7LEWOEgPIy3h0g96EqWoBx6Q7eRNh7S20Dc 3qf/zrfB/rFruoAG3At2QKuLyGGtjQQ6vKvbfPX4oxf6A1hv+1FBj96WfGKOdzK8X0ES XHZ0TcCReau6/EZBkDHzPiJL3LIehp2kaJX/2A/mC/l5frAtLJJ7d2tdscSk4j86eWMY fSC//Vty+t2Mip/+L9TSApXGzdklEDliHFaYwzzgKqgWmDgSWZ+IUB0e6CfGkaN0YxPf W+vA== X-Gm-Message-State: AJcUukdKlIY9SrlIwDUyNo5KnIPJeMWq4w53EErTlCh9lFU7Nv2Aw6rM quykOZyouYGSfrkrm+h9GoJYng== X-Google-Smtp-Source: ALg8bN7zx5bvRc+ugSovW2nDytPUyJRyQqfIfbNLO/z/5ekSqumC/zBhRXQypFDPV50B6MlDyhpFCA== X-Received: by 2002:a17:902:2e01:: with SMTP id q1mr33994814plb.97.1548176437698; Tue, 22 Jan 2019 09:00:37 -0800 (PST) Received: from localhost ([2601:646:8f00:18d9:d0fa:7a4b:764f:de48]) by smtp.gmail.com with ESMTPSA id q1sm19346520pgs.14.2019.01.22.09.00.36 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 22 Jan 2019 09:00:36 -0800 (PST) Date: Tue, 22 Jan 2019 09:00:35 -0800 From: Stanislav Fomichev To: Andrey Ignatov Cc: Stanislav Fomichev , "netdev@vger.kernel.org" , "davem@davemloft.net" , "ast@kernel.org" , "daniel@iogearbox.net" , "edumazet@google.com" Subject: Re: [PATCH bpf-next 0/5] add bpf cgroup hooks that trigger on socket close Message-ID: <20190122170035.GB26773@mini-arch> References: <20190118004106.163825-1-sdf@google.com> <20190118023654.GB8342@rdna-mbp.dhcp.thefacebook.com> <20190118165033.GA26773@mini-arch> <20190118223926.GA15896@rdna-mbp.dhcp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190118223926.GA15896@rdna-mbp.dhcp.thefacebook.com> User-Agent: Mutt/1.11.2 (2019-01-07) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 01/18, Andrey Ignatov wrote: > Stanislav Fomichev [Fri, 2019-01-18 08:50 -0800]: > > On 01/18, Andrey Ignatov wrote: > > > Stanislav Fomichev [Thu, 2019-01-17 16:41 -0800]: > > > > Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on > > > > socket creation and there is no way to know when the socket is being > > > > closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE > > > > that trigger when the socket is closed. > > > > > > > > Initial intended usecase is to cleanup statistics after POST{4,6}_BIND. > > > > Hooks have read-only access to all fields of struct bpf_sock. > > > > > > Do you need it for both TCP and UDP? > > Yes, we need both TCP and UDP. Although, UDP is tricky in general with > > the connected/unconnected cases. > > > > > I was thinking about this hook earlier but since in my case only TCP was > > > needed I ended up using TCP-BPF. E.g. be BPF_SOCK_OPS_TCP_LISTEN_CB or > > > BPF_SOCK_OPS_TCP_CONNECT_CB can be used instead of POST{4,6}_BIND to > > > enable something, and then BPF_SOCK_OPS_STATE_CB can be used instead of > > > SOCK_RELEASE to disable that something when socket transisions to > > > BPF_TCP_CLOSE (e.g. BPF_TCP_LISTEN -> BPF_TCP_CLOSE). > > > > > > That turned out to be much cleaner than POST{4,6}_BIND and also works > > > fine when socket is disconnected with AF_UNSPEC and then connected again > > > (what Eric mentioned). > > > > What if we do something like the patch below? Add pre_release hook (like we > > currently have for pre_connect) and call it from connect(AF_UNSPEC) and > > from inet_release? Any concerns here? > > That's hard to say w/o fully understanding the use-case. Could you > provide more details on it please? The use-case is a simple lightweight per-cgroup bpf audit (without involving audit subsystem). Basically record the following events: when the socket is created, when (and where) it's connected/bound and when it's torn down (plus, _maybe_ rx/tx stats). > Is my understanding correct that some statistics is needed only for > _client_ sockets (since we're talking about connect) and these client > sockets are always bound to local IP:port (or maybe IP only with > IP_BIND_ADDRESS_NO_PORT?) before sending data and that's why > POST{4,6}_BIND is used to start gathering statistics? > > And then later, there should be some event to cleanup statistics and you > chose ops->release for this? If so what's kind of statistics is needed? > Is it per-destination statistics (since AF_UNSPEC matters?)? If so then > it should be cleaned up whenever destination is changed or when socket > is disconnected the last time. > > For UDP socket connect(2) can be called many times with different > destinations even w/o AF_UNSPEC in between (in contrast to TCP). So your > patch with pre_release below won't handle it. Furthermore even if > connect(2) was used to set destination, sendmsg(2) can override it for > individual datagram. > > If per-destination statistics is needed for UDP client socket then IMO > both "start events" should be handled: BPF_CGROUP_INET{4,6}_CONNECT > (connected UDP) and BPF_CGROUP_UDP{4,6}_SENDMSG (unconnected UDP). The > former can be used to "close" previous statistics "window" (if it's not > the first connect) as well. The latter can be used to created "separate" > statistics "window" for destination of current datagram only. And the > only case left, from what I see, is when socket is stopped being used > completely. That's, again, hard to say how to handle it better w/o > understanding use-case (e.g. since it's only UDP problem it may make > sense to implement in UDP stack). > > > (I agree that TCP is probably better handled via BPF_SOCK_OPS_TCP_XYZ hooks, > > but we need something for UDP as well) > > Do you think you may handle TCP with TCP-BPF and this patch set may > focus on UDP only if there is no good attach point that covers both TCP > and UDP? That may lead to different BPF programs for TCP and UDP but > that may be hard to avoid anyway (BPF_CGROUP_UDP{4,6}_SENDMSG is a good > example here). Let me look closer into TCP-BPF hooks, it looks like that would be enough for TCP at least. I'll get back to you in case there is an issue. I'll also think about UDP use case. Maybe we should do as you suggested and support release only in UDP. (Or maybe have release for TCP+UDP and an additional 'disconnect' hook to handle connect(AF_UNSPEC), that might be more generic). But thanks for you input, really appreciate it! > > > > > -- > > > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > > index b703ad242365..ee3dc181df8f 100644 > > --- a/net/ipv4/af_inet.c > > +++ b/net/ipv4/af_inet.c > > @@ -568,8 +568,11 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, > > > > if (addr_len < sizeof(uaddr->sa_family)) > > return -EINVAL; > > - if (uaddr->sa_family == AF_UNSPEC) > > + if (uaddr->sa_family == AF_UNSPEC) { > > + if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk)) > > + sk->sk_prot->pre_release(sk); > > return sk->sk_prot->disconnect(sk, flags); > > + } > > > > if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) { > > err = sk->sk_prot->pre_connect(sk, uaddr, addr_len); > > @@ -632,6 +635,8 @@ int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, > > return -EINVAL; > > > > if (uaddr->sa_family == AF_UNSPEC) { > > + if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk)) > > + sk->sk_prot->pre_release(sk); > > err = sk->sk_prot->disconnect(sk, flags); > > sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED; > > goto out; > > > > > > First patch adds hooks, the rest of the patches add uapi and tests to make > > > > sure these hooks work. > > > > > > > > Stanislav Fomichev (5): > > > > bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE hooks > > > > tools: bpf: support BPF_CGROUP_INET{4,6}_SOCK_RELEASE in > > > > libbpf/bpftool > > > > selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to > > > > test_section_names.c > > > > selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to test_sock.c > > > > selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to > > > > test_sock_addr.c > > > > > > > > include/linux/bpf-cgroup.h | 6 + > > > > include/net/inet_common.h | 1 + > > > > include/uapi/linux/bpf.h | 2 + > > > > kernel/bpf/syscall.c | 8 ++ > > > > net/core/filter.c | 7 + > > > > net/ipv4/af_inet.c | 13 +- > > > > net/ipv6/af_inet6.c | 5 +- > > > > tools/bpf/bpftool/cgroup.c | 2 + > > > > tools/include/uapi/linux/bpf.h | 2 + > > > > tools/lib/bpf/libbpf.c | 4 + > > > > .../selftests/bpf/test_section_names.c | 10 ++ > > > > tools/testing/selftests/bpf/test_sock.c | 119 ++++++++++++++++ > > > > tools/testing/selftests/bpf/test_sock_addr.c | 131 +++++++++++++++++- > > > > 13 files changed, 307 insertions(+), 3 deletions(-) > > > > > > > > -- > > > > 2.20.1.321.g9e740568ce-goog > > > > > > > > > > -- > > > Andrey Ignatov > > -- > Andrey Ignatov