From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74C13EB64DC for ; Fri, 23 Jun 2023 01:03:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231162AbjFWBD1 (ORCPT ); Thu, 22 Jun 2023 21:03:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231332AbjFWBDW (ORCPT ); Thu, 22 Jun 2023 21:03:22 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51DB31FC0 for ; Thu, 22 Jun 2023 18:03:21 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 326416194A for ; Fri, 23 Jun 2023 01:03:21 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 10F17C433CA; Fri, 23 Jun 2023 01:03:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1687482200; bh=iHf4azkVY38aA+dvf79V6sYf9tJiFnPVprdxbc8Qtas=; h=In-Reply-To:References:Date:From:To:Cc:Subject:From; b=RTCjj1PS6m21+k8JHcWsfhIShN0Lke46U7JMzMwr72FLa7Caj7XdXx7SM2MNOwSu0 x5cuyL8JrXPwQyMcdeuMIvuv70yl9ZtTpP73RsRN0dIjAfTU/6R/8F4boPpIeh/lAC LW8GZt11/ZXA653P48mago3Qyx2eNBXfff32HJ0eQeH6zVSUUUAUT6srMkTkbOakks n2Z4m5+qzd7BWfWpNsPW+4mmEzqUH/oksyKtZw/JH0Ys9+fGcD1aD1h0vDogsk0JV0 IYWsW0pjlYSc5tb+6FMtu3TLl1IkForF72mWLkgvPfO7t9WfcybR9m7cEDYrEtSD62 jzjxeAPlTJoPg== Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailauth.nyi.internal (Postfix) with ESMTP id EE1A327C0054; Thu, 22 Jun 2023 21:03:18 -0400 (EDT) Received: from imap48 ([10.202.2.98]) by compute3.internal (MEProxy); Thu, 22 Jun 2023 21:03:18 -0400 X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrgeegvddgfedtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepofgfggfkjghffffhvfevufgtgfesthhqredtreerjeenucfhrhhomhepfdet nhguhicunfhuthhomhhirhhskhhifdcuoehluhhtoheskhgvrhhnvghlrdhorhhgqeenuc ggtffrrghtthgvrhhnpeduveffvdegvdefhfegjeejlefgtdffueekudfgkeduvdetvddu ieeluefgjeeggfenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfh hrohhmpegrnhguhidomhgvshhmthhprghuthhhphgvrhhsohhnrghlihhthidqudduiedu keehieefvddqvdeifeduieeitdekqdhluhhtoheppehkvghrnhgvlhdrohhrgheslhhinh hugidrlhhuthhordhush X-ME-Proxy: Feedback-ID: ieff94742:Fastmail Received: by mailuser.nyi.internal (Postfix, from userid 501) id 7A6A231A0063; Thu, 22 Jun 2023 21:03:18 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.9.0-alpha0-499-gf27bbf33e2-fm-20230619.001-gf27bbf33 Mime-Version: 1.0 Message-Id: <82b79e57-a0ad-4559-abc9-858e0f51fbba@app.fastmail.com> In-Reply-To: References: <20230607235352.1723243-1-andrii@kernel.org> <5eb4264e-d491-a7a2-93c7-928b06ce264d@redhat.com> <5a75d1f0-4ed9-399c-4851-2df0755de9b5@redhat.com> Date: Thu, 22 Jun 2023 18:02:57 -0700 From: "Andy Lutomirski" To: "Andrii Nakryiko" , "Maryam Tahhan" Cc: "Andrii Nakryiko" , bpf@vger.kernel.org, linux-security-module@vger.kernel.org, "Kees Cook" , "Christian Brauner" , lennart@poettering.net, cyphar@cyphar.com, kernel-team@meta.com Subject: Re: [PATCH v2 bpf-next 00/18] BPF token Content-Type: text/plain;charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: On Thu, Jun 22, 2023, at 11:40 AM, Andrii Nakryiko wrote: > On Thu, Jun 22, 2023 at 10:38=E2=80=AFAM Maryam Tahhan wrote: > > For CAP_BPF too broad. It is broad, yes. If you have good ideas how to > break it down some more -- please propose. But this is all orthogonal, > because the blocking problem is fundamental incompatibility of user > namespaces (and their implied isolation and sandboxing of workloads) > and BPF functionality, which is global by its very nature. The latter > is unavoidable in principle. How, exactly, is BPF global by its very nature? The *implementation* has some issues with globalness. Much of it should= be fixable. > > No matter how much you break down CAP_BPF, you can't enforce that BPF > program won't interfere with applications in other containers. Or that > it won't "spy" on them. It's just not what BPF can enforce in > principle. The WHOLE POINT of the verifier is to attempt to constrain what BPF prog= rams can and can't do. There are bugs -- I get that. There are helper = functions that are fundamentally global. But, in the absence of verifie= r bugs, BPF has actual boundaries to its functionality. > > So that comes back down to a question of trust and then controlled > delegation of BPF functionality. You trust workload with BPF usage > because you reviewed the BPF code, workload, testing, etc? Grant BPF > token and let that container use limited subset of BPF. Employ BPF LSM > to further restrict it beyond what BPF token can control. > > You cannot trust an application to not do something harmful? You > shouldn't grant it either CAP_BPF in init namespace, nor BPF token in > user namespace. That's it. Pick your poison. I think what's lost here is hardening vs restricting intended functional= ity. We have access control to restrict intended functionality. We have othe= r (and generally fairly ad-hoc and awkward) ways to flip off functionali= ty because we want to reduce exposure to any bugs in it. BPF needs hardening -- this is well established. Right now, this is acc= omplished by restricting it to global root (effectively). It should hav= e access controls, too, but it doesn't. > > But all this cannot be mechanically decided or enforced. There has to > be some humans involved in making these decisions. Kernel's job is to > provide building blocks to grant and control BPF functionality to the > extent that it is technically possible. > Exactly. And it DOES NOT. bpf maps, etc do not have sensible access co= ntrols. Things that should not be global are global. I'm saying the ke= rnel should fix THAT. Once it's in a state that it's at least credible = to allow BPF in a user namespace, than come up with a way to allow it. > As for "something to isolate the pinned maps/progs by different apps > (why not DAC rules?)", there is no such thing, as I've explained > already. > > I can install sched_switch raw_tracepoint BPF program (if I'm allowed > to), and that program has system-wide observability. It cannot be > bound to an application. Great, a real example! Either: (a) don't run this in a container. Have a service for the container to = request the help of this program. (b) have a way to have root approve a particular program and expose *tha= t* program to the container, and let the program have its own access con= trols internally (e.g. only output info that belongs to that container). > then what do we do when we switch from process A in container > X to process B in container Y? Is that event belonging to container X? > Or container Y? I don't know, but you had better answer this question before you run thi= s thing in a container, not just for security but for basic functionalit= y. If you haven't defined what your program is even supposed to do in a= container, don't run it there. > Hopefully you can see where I'm going with this. And this is just one > random tiny example. We can think up tons of other cases to prove BPF > is not isolatable to any sort of "container". No. You have not come up with an example of why BPF is not isolatable t= o a container. You have come up with an example of why binding to a sch= ed_switch raw tracepoint does not make sense in a container without addi= tional mechanisms to give it well defined functionality and appropriate = security. Please stop conflating BPF (programs, maps, etc) with *attachments* of B= PF programs to systemwide things. They're both under the BPF umbrella. = They're not the same thing. Passing a token into a container that allow that container to do things = like loading its own programs *and attaching them to raw tracepoints* is= IMO a complete nonstarter. It makes no sense.