Date: Tue, 18 Apr 2023 09:47:47 -0700
References: <20230413133355.350571-1-aleksandr.mikhalitsyn@canonical.com>
 <20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com>
Subject: Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
From: Stanislav Fomichev
To: Martin KaFai Lau
Cc: Eric Dumazet, davem@davemloft.net, linux-kernel@vger.kernel.org,
 netdev@vger.kernel.org, daniel@iogearbox.net, Jakub Kicinski, Paolo Abeni,
 Leon Romanovsky, David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
 Kuniyuki Iwashima, Lennart Poettering, linux-arch@vger.kernel.org,
 Aleksandr Mikhalitsyn, bpf
X-Mailing-List: linux-arch@vger.kernel.org

On 04/17, Martin KaFai Lau wrote:
> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> > On 04/13, Stanislav Fomichev wrote:
> > > On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn wrote:
> > > >
> > > > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet wrote:
> > > > >
> > > > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn wrote:
> > > > > >
> > > > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > > > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > > > > install FDs into the process fdtable.
> > > > > >
> > > > > > After some offlist discussion it was proposed to add a blacklist of
> > > > >
> > > > > We try to replace this word by either denylist or blocklist, even in changelogs.
> > > >
> > > > Hi Eric,
> > > >
> > > > Oh, I'm sorry about that. :( Sure.
> > > >
> > > > >
> > > > > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > > > > >
> > > > >
> > > > > Can we find the appropriate Fixes: tag to help stable teams ?
> > > >
> > > > Sure, I will add next time.
> > > >
> > > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > >
> > > > I think it's better to add Stanislav Fomichev to CC.
> > >
> > > Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> > > use it for tcp zerocopy, I'm assuming it should work in this case as
> > > well?
> >
> > Jakub reminded me of the other things I wanted to ask here but forgot:
> >
> > - setsockopt is probably not needed, right? setsockopt hook triggers
> >   before the kernel and shouldn't leak anything
> > - for getsockopt, instead of bypassing bpf completely, should we instead
> >   ignore the error from the bpf program? that would still preserve
> >   the observability aspect
>
> stealing this thread to discuss the optlen issue which may make sense to
> bypass also.
>
> There has been issue with optlen. Other than this older post related to
> optlen > PAGE_SIZE:
> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> the recent one related to optlen that we have seen is
> NETLINK_LIST_MEMBERSHIPS.
> The userspace passed in optlen == 0 and the kernel
> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> intention is to learn the expected optlen. This makes
> 'ctx.optlen > max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up
> returning -EFAULT to the userspace even the bpf prog has not changed anything.

(ignoring -EFAULT issue) this seems like it needs to be

	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
		/* error */
	}

?

> Does it make sense to also bypass the bpf prog when
> 'ctx.optlen > max_optlen' for now (and this can use a separate patch which
> as usual requires a bpf selftests)?

Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
seems like the way to go. It caused too much trouble already :-(

Should I prepare a patch or do you want to take a stab at it?

> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> such that it can enforce read-only for some optname and potentially also
> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> holes but could be a seed of thought to continue polishing the idea.

Ack, let's think about it.

Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
and we have a mostly working bpf_getsockopt (after your refactoring),
I don't see why we need to continue the current scheme of triggering
after the kernel?

> > - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >   gets called whenever bpf returns an error to make sure protocols have
> >   a chance to handle that condition (and free the fd)
> >
>
>
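
To make the optlen condition discussed above concrete, here is a minimal
userspace sketch of the two checks. It is only a model, not the kernel code:
the function names are hypothetical, the "current" check is just the
'ctx.optlen > max_optlen' condition quoted in this thread, and it assumes the
caller probing for the expected length passes a NULL optval (the thread only
states that optlen == 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the getsockopt length check.
 * ctx_optlen: length reported by the kernel handler.
 * max_optlen: size of the buffer userspace passed in.
 */

/* Condition quoted in the thread: reject whenever the reported length
 * exceeds the user buffer, even if nothing was modified. */
bool optlen_rejected_current(int ctx_optlen, int max_optlen)
{
	assert(max_optlen >= 0);
	return ctx_optlen > max_optlen;
}

/* Condition suggested in the reply: only reject when a buffer was
 * actually supplied and the length is out of range. */
bool optlen_rejected_proposed(const void *optval, int ctx_optlen,
			      int max_optlen)
{
	assert(max_optlen >= 0);
	return optval != NULL &&
	       (ctx_optlen > max_optlen || ctx_optlen < 0);
}
```

Under these assumptions, a length probe (NULL optval, max_optlen 0, kernel
reports 8) is rejected by the first check but passes the second, while a
too-small or negative length with a real buffer is still rejected.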