From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D49ECC5B577 for ; Thu, 27 Jun 2019 23:40:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A5FCF208E3 for ; Thu, 27 Jun 2019 23:40:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1561678857; bh=CDo9minwTgEePCDC4F9rTaw0sQddTAjaxt5bp/QDl2E=; h=Subject:To:Cc:References:From:Date:In-Reply-To:List-ID:From; b=OKkuzWElcfkK/+y4w5zzTCJFIr14EgO8UAkL8dIKvnlkmAJ7GRwYFe+7iTu7o5ZvM n3mE5GfOi94XiscgqJkOtQkAm36vYXOsEcvNs4+Nh5LpFEQD3gkfRKTvcLL9IMNjlS xBIN5gD6BWXZ6NxXGn3RYWoEfXjhcxIvqCGc9YzY= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726632AbfF0Xk5 (ORCPT ); Thu, 27 Jun 2019 19:40:57 -0400 Received: from mail-pl1-f193.google.com ([209.85.214.193]:46940 "EHLO mail-pl1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726567AbfF0Xk4 (ORCPT ); Thu, 27 Jun 2019 19:40:56 -0400 Received: by mail-pl1-f193.google.com with SMTP id e5so2106843pls.13 for ; Thu, 27 Jun 2019 16:40:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=Ios0tdBrbMiaNhZPetGcknsZ2m4TvlD/r/ST83vdgkU=; b=iE/G78wFyyB65U+Bi43/j/YqLe5orvilwHiOtq0Dcw9mHHgZxleJDuDbzBGmZ/oc6m d7KqNl0pj/PKYdtyKFwXJxC/PDoEUOsgDFWtC8VPJtMW5rzxa0oeySQp5uQhPzfFUBWU oSI2VTyvb8WOx7aENgftSZ1hWZgrmY1Pj/I8Aud/gGXm2s1VPQOnSKjz5GuOB7WPmCI0 4gQSIApK31v6UB7UVf//CVz9QyeHcqn6Bw8Pjrf5aY5/Ts9RqNFnQ+rKnLB/HGB1838F HlCQpBWiTvHeI2I9EhXKQof/BmdPLJA0dJeMcQBlNReEgi4y7JDozFcQ+aXdEdyJxIpO BaFw== X-Gm-Message-State: APjAAAX+NjlqFpf4skkkqxvSsZ0071DKADRqQLW0IuZDDs03A+Q9mnXj YkOTme7UWm6nIRO1WXMg1vCTww== X-Google-Smtp-Source: APXvYqwsttsvjEpOCAXUUio75uJQWVe+qFGneHy2HZGg6qjAlVubgnEg3neSdlA/KXt9cZVlYsxquw== X-Received: by 2002:a17:902:2a6b:: with SMTP id i98mr7303208plb.75.1561678856059; Thu, 27 Jun 2019 16:40:56 -0700 (PDT) Received: from ?IPv6:2601:646:c200:1ef2:3602:86ff:fef6:e86b? ([2601:646:c200:1ef2:3602:86ff:fef6:e86b]) by smtp.googlemail.com with ESMTPSA id d1sm199095pgd.50.2019.06.27.16.40.54 (version=TLS1_3 cipher=AEAD-AES128-GCM-SHA256 bits=128/128); Thu, 27 Jun 2019 16:40:55 -0700 (PDT) Subject: Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf To: Song Liu , netdev@vger.kernel.org, bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, kernel-team@fb.com, lmb@cloudflare.com, jannh@google.com, gregkh@linuxfoundation.org, linux-abi@vger.kernel.org, kees@chromium.org References: <20190627201923.2589391-1-songliubraving@fb.com> <20190627201923.2589391-2-songliubraving@fb.com> From: Andy Lutomirski Message-ID: <21894f45-70d8-dfca-8c02-044f776c5e05@kernel.org> Date: Thu, 27 Jun 2019 16:40:38 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 In-Reply-To: <20190627201923.2589391-2-songliubraving@fb.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 6/27/19 1:19 PM, Song Liu wrote: > This patch introduce unprivileged BPF access. The access control is > achieved via device /dev/bpf. Users with write access to /dev/bpf are able > to call sys_bpf(). > > Two ioctl command are added to /dev/bpf: > > The two commands enable/disable permission to call sys_bpf() for current > task. This permission is noted by bpf_permitted in task_struct. This > permission is inherited during clone(CLONE_THREAD). > > Helper function bpf_capable() is added to check whether the task has got > permission via /dev/bpf. > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 0e079b2298f8..79dc4d641cf3 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -9134,7 +9134,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, > env->insn_aux_data[i].orig_idx = i; > env->prog = *prog; > env->ops = bpf_verifier_ops[env->prog->type]; > - is_priv = capable(CAP_SYS_ADMIN); > + is_priv = bpf_capable(CAP_SYS_ADMIN); Huh? This isn't a hardening measure -- the "is_priv" verifier mode allows straight-up leaks of private kernel state to user mode. (For that matter, the pending lockdown stuff should possibly consider this a "confidentiality" issue.) I have a bigger issue with this patch, though: it's a really awkward way to pretend to have capabilities. For bpf, it seems like you could make this be a *real* capability without too much pain since there's only one syscall there. Just find a way to pass an fd to /dev/bpf into the syscall. If this means you need a new bpf_with_cap() syscall that takes an extra argument, so be it. The old bpf() syscall can just translate to bpf_with_cap(..., -1). For a while, I've considered a scheme I call "implicit rights". There would be a directory in /dev called /dev/implicit_rights. This would either be part of devtmpfs or a whole new filesystem -- it would *not* be any other filesystem. The contents would be files that can't be read or written and exist only in memory. You create them with a privileged syscall. Certain actions that are sensitive but not at the level of CAP_SYS_ADMIN (use of large-attack-surface bpf stuff, creation of user namespaces, profiling the kernel, etc) could require an "implicit right". When you do them, if you don't have CAP_SYS_ADMIN, the kernel would do a path walk for, say, /dev/implicit_rights/bpf and, if the object exists, can be opened, and actually refers to the "bpf" rights object, then the action is allowed. Otherwise it's denied. This is extensible, and it doesn't require the rather ugly per-task state of whether it's enabled. For things like creation of user namespaces, there's an existing API, and the default is that it works without privilege. Switching it to an implicit right has the benefit of not requiring code changes to programs that already work as non-root. But, for BPF in particular, this type of compatibility issue doesn't exist now. You already can't use most eBPF functionality without privilege. New bpf-using programs meant to run without privilege are *new*, so they can use a new improved API. So, rather than adding this obnoxious ioctl, just make the API explicit, please. Also, please cc: linux-abi next time.