From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEF26C7EE25 for ; Fri, 9 Jun 2023 22:57:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231294AbjFIW5P (ORCPT ); Fri, 9 Jun 2023 18:57:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229587AbjFIW5O (ORCPT ); Fri, 9 Jun 2023 18:57:14 -0400 Received: from mail-ed1-x52f.google.com (mail-ed1-x52f.google.com [IPv6:2a00:1450:4864:20::52f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A20532D41; Fri, 9 Jun 2023 15:57:13 -0700 (PDT) Received: by mail-ed1-x52f.google.com with SMTP id 4fb4d7f45d1cf-51496f57e59so3276664a12.2; Fri, 09 Jun 2023 15:57:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1686351432; x=1688943432; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=fCcFQvwlT5seqLJ1ulWZ3vRZZR2lvjPV8fCYXApwbH4=; b=kH3tnEULzr4ab5xK56zG5MiR7jMvjp35Vm6yAwnXqS8KUszZB/S6ahaPjY60UDfYCx VMQxdZC7td3Ctk78Ozyp9jBWDZYEFPob5IiYwPPBzqOx8hPAT4tAl8dYiSRpq8FQy8bD LvyKi7O/OFoq+sOA2Zgwr4HW+rsRs+H6v1FWD/Hk4jYkY5fnnaQTiYhyumpFHfnpTZaw 9mw7WavU45gGYCIEOS+gxLfqlZBPtezqMoibJjxaMU5xdIzAyRMrcUExQFJVH7rFcu0R gTqodlyO13xtZ52Lkn2JxZ5YVM/TTaJQDm+WgsC9mnn/M/5o0ds+5G7Q4Jy9Ceb+tqwe YG5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686351432; x=1688943432; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fCcFQvwlT5seqLJ1ulWZ3vRZZR2lvjPV8fCYXApwbH4=; b=fZQcXX2ESalHZfpWi/0wD/bCPAsFkW1pKDNCySv11IGigCULVaQVnlzPci22vV9W6Y qJ6YRdxL7GZT63Cpdfqz9GUpTbBvEIVdpfXL24t3lbDdugOaVba4WP1GsuqYLiLkRkFJ RNWDSn0Vgvjvq2S5or1yncnE80B7Tx4bw8B4uB5/emUH3OZBMvfZsV+i8hv9lxrT9ukW x+/GAqDPglYbk9XJDvkD04getNtHfHp3LtZPri8eUF3y+Urls39sCGLEVYu/dMN90j1O nczsnS+1j7IXwX+2lP9SwOQKqRynPf8JCsC+NeJjp+yBPUsMfLDMKk5RZ/PQuKMvQMzW aAng== X-Gm-Message-State: AC+VfDwl7Z15DNRmICz+8Vj3ZuNGM8TC+UqZmVaTq+xDkA2IoFHsUKRF I2F9eljdjYLWVu0WuxOGal8ZuYU/l/fl97iF/hk= X-Google-Smtp-Source: ACHHUZ5ce2sWlE9/G3bD6e/WfDXlMstl1ra1AbTaeM5u878zGGteHgPuD1xcQJjt3oqnIc6BFeoKYz+ZIjtUvpYl1i0= X-Received: by 2002:a17:907:25c4:b0:969:9fd0:7ce7 with SMTP id ae4-20020a17090725c400b009699fd07ce7mr2564053ejc.11.1686351432030; Fri, 09 Jun 2023 15:57:12 -0700 (PDT) MIME-Version: 1.0 References: <20230607235352.1723243-1-andrii@kernel.org> In-Reply-To: From: Andrii Nakryiko Date: Fri, 9 Jun 2023 15:57:00 -0700 Message-ID: Subject: Re: [PATCH v2 bpf-next 00/18] BPF token To: Djalal Harouni Cc: Andrii Nakryiko , bpf@vger.kernel.org, linux-security-module@vger.kernel.org, keescook@chromium.org, brauner@kernel.org, lennart@poettering.net, cyphar@cyphar.com, luto@kernel.org, kernel-team@meta.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: On Fri, Jun 9, 2023 at 3:30=E2=80=AFPM Djalal Harouni wr= ote: > > Hi Andrii, > > On Thu, Jun 8, 2023 at 1:54=E2=80=AFAM Andrii Nakryiko wrote: > > > > This patch set introduces new BPF object, BPF token, which allows to de= legate > > a subset of BPF functionality from privileged system-wide daemon (e.g., > > systemd or any other container manager) to a *trusted* unprivileged > > application. Trust is the key here. This functionality is not about all= owing > > unconditional unprivileged BPF usage. Establishing trust, though, is > > completely up to the discretion of respective privileged application th= at > > would create a BPF token. > > > > The main motivation for BPF token is a desire to enable containerized > > BPF applications to be used together with user namespaces. This is curr= ently > > impossible, as CAP_BPF, required for BPF subsystem usage, cannot be nam= espaced > > or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to = BPF > > helpers like bpf_probe_read_kernel() and bpf_probe_read_user() can safe= ly read > > arbitrary memory, and it's impossible to ensure that they only read mem= ory of > > processes belonging to any given namespace. This means that it's imposs= ible to > > have namespace-aware CAP_BPF capability, and as such another mechanism = to > > allow safe usage of BPF functionality is necessary. BPF token and deleg= ation > > of it to a trusted unprivileged applications is such mechanism. Kernel = makes > > no assumption about what "trusted" constitutes in any particular case, = and > > it's up to specific privileged applications and their surrounding > > infrastructure to decide that. What kernel provides is a set of APIs to= create > > and tune BPF token, and pass it around to privileged BPF commands that = are > > creating new BPF objects like BPF programs, BPF maps, etc. > > Is there a reason for coupling this only with the userns? There is no coupling. Without userns it is at least possible to grant CAP_BPF and other capabilities from init ns. With user namespace that becomes impossible. > The "trusted unprivileged" assumed by systemd can be in init userns? It doesn't have to be systemd, but yes, BPF token can be created only when you have CAP_SYS_ADMIN in init ns. It's in line with restrictions on a bunch of other bpf() syscall commands (like GET_FD_BY_ID family of commands). > > > > Previous attempt at addressing this very same problem ([0]) attempted t= o > > utilize authoritative LSM approach, but was conclusively rejected by up= stream > > LSM maintainers. BPF token concept is not changing anything about LSM > > approach, but can be combined with LSM hooks for very fine-grained secu= rity > > policy. Some ideas about making BPF token more convenient to use with L= SM (in > > particular custom BPF LSM programs) was briefly described in recent LSF= /MM/BPF > > 2023 presentation ([1]). E.g., an ability to specify user-provided data > > (context), which in combination with BPF LSM would allow implementing a= very > > dynamic and fine-granular custom security policies on top of BPF token.= In the > > interest of minimizing API surface area discussions this is going to be > > added in follow up patches, as it's not essential to the fundamental co= ncept > > of delegatable BPF token. > > > > It should be noted that BPF token is conceptually quite similar to the = idea of > > /dev/bpf device file, proposed by Song a while ago ([2]). The biggest > > difference is the idea of using virtual anon_inode file to hold BPF tok= en and > > allowing multiple independent instances of them, each with its own set = of > > restrictions. BPF pinning solves the problem of exposing such BPF token > > through file system (BPF FS, in this case) for cases where transferring= FDs > > over Unix domain sockets is not convenient. And also, crucially, BPF to= ken > > approach is not using any special stateful task-scoped flags. Instead, = bpf() > > What's the use case for transfering over unix domain sockets? I'm not sure I understand the question. Unix domain socket (specifically its SCM_RIGHTS ancillary message) allows to transfer files between processes, which is one way to pass BPF object (like prog/map/link, and now token). BPF FS is the other one. In practice it's usually BPF FS, but there is no presumption about how file reference is transferred. > > Will BPF token translation happen if you cross the different namespaces? What does BPF token translation mean specifically? Currently it's a very simple kernel object with refcnt and a few flags, so there is nothing to translate? > > If the token is pinned into different bpffs, will the token share the > same context? So I was planning to allow a user process creating a BPF token to specify custom user-provided data (context). This is not in this patch set, but is it what you are asking about? Regardless, pinning BPF object in BPF FS is just basically bumping a refcnt and exposes that object in a way that can be looked up through file system path (using bpf() syscall's BPF_OBJ_GET command). Underlying object isn't cloned or copied, it's exactly the same object with the same shared internal state.