* Re: [PATCH v2 0/5] Add support for O_MAYEXEC
From: Mickaël Salaün @ 2019-09-09 9:09 UTC (permalink / raw)
To: Aleksa Sarai, Andy Lutomirski
Cc: Steve Grubb, Florian Weimer, Mickaël Salaün,
linux-kernel, Alexei Starovoitov, Al Viro, Andy Lutomirski,
Christian Heimes, Daniel Borkmann, Eric Chiang, James Morris,
Jan Kara, Jann Horn, Jonathan Corbet, Kees Cook, Matthew Garrett,
Matthew Wilcox, Michael Kerrisk, Mimi Zohar,
Philippe Trébuchet
In-Reply-To: <20190906224410.lffd6l5lnm4z3hht@yavin.dot.cyphar.com>
On 07/09/2019 00:44, Aleksa Sarai wrote:
> On 2019-09-06, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Sep 6, 2019, at 12:07 PM, Steve Grubb <sgrubb@redhat.com> wrote:
>>>
>>>> On Friday, September 6, 2019 2:57:00 PM EDT Florian Weimer wrote:
>>>> * Steve Grubb:
>>>>> Now with LD_AUDIT
>>>>> $ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test
>>>>> 2>&1 | grep passwd openat(3, "passwd", O_RDONLY) = 4
>>>>>
>>>>> No O_CLOEXEC flag.
>>>>
>>>> I think you need to explain in detail why you consider this a problem.
Right, LD_PRELOAD and such things are definitely not part of the threat
model for O_MAYEXEC, on purpose, because this must be addressed with
other security mechanism (e.g. correct file system access-control, IMA
policy, SELinux or other LSM security policies). This is a requirement
for O_MAYEXEC to be useful.
An interpreter is just a flexible program which is generic and doesn't
have other purpose other than behaving accordingly to external rules
(i.e. scripts). If you don't trust your interpreter, it should not be
executable in the first place. O_MAYEXEC enables to restrict the use of
(some) interpreters accordingly to a *global* system security policy.
>>>
>>> Because you can strip the O_MAYEXEC flag from being passed into the kernel.
>>> Once you do that, you defeat the security mechanism because it never gets
>>> invoked. The issue is that the only thing that knows _why_ something is being
>>> opened is user space. With this mechanism, you can attempt to pass this
>>> reason to the kernel so that it may see if policy permits this. But you can
>>> just remove the flag.
>>
>> I’m with Florian here. Once you are executing code in a process, you
>> could just emulate some other unapproved code. This series is not
>> intended to provide the kind of absolute protection you’re imagining.
>
> I also agree, though I think that there is a separate argument to be
> made that there are two possible problems with O_MAYEXEC (which might
> not be really big concerns):
>
> * It's very footgun-prone if you didn't call O_MAYEXEC yourself and
> you pass the descriptor elsewhere. You need to check f_flags to see
> if it contains O_MAYEXEC. Maybe there is an argument to be made that
> passing O_MAYEXECs around isn't a valid use-case, but in that case
> there should be some warnings about that.
That could be an issue if you don't trust your system, especially if the
mount points (and the "noexec" option) can be changed by untrusted
users. As I said above, there is a requirement for basic security
properties as a meaningful file system access control, and obviously not
letting any user change mount points (which can lead to much sever
security issues anyway).
If a process A pass a FD to an interpreter B, then the interpreter B
must trust the process A. Moreover, being able to tell if the FD was
open with O_MAYEXEC and relying on it may create a wrong feeling of
security. As I said in a previous email, being able to probe for
O_MAYEXEC does not make sense because it would not be enough to
know the system policy (either this flag is enforced or not, for mount
points, based on xattr, time…). The main goal of O_MAYEXEC is to ask the
kernel, on a trusted link (hence without LD_PRELOAD-like interfering),
for a file which is allowed to be interpreted/executed by this interpreter.
To be able to correctly handle the case you pointed out (FD passing),
either an existing or a new LSM should handle this behavior according to
the origin of the FD and the chain of processes getting it.
Some advanced LSM rules could tie interpreters with scripts dedicated to
them, and have different behavior for the same scripts but with
different interpreters.
>
> * There's effectively a TOCTOU flaw (even if you are sure O_MAYEXEC is
> in f_flags) -- if the filesystem becomes re-mounted noexec (or the
> file has a-x permissions) after you've done the check you won't get
> hit with an error when you go to use the file descriptor later.
Again, the threat model needs to be appropriate to make O_MAYEXEC
useful. The security policies of the system need to be seen as a whole,
and updated as such.
As for most file system access control on Linux, it may be possible to
have TOCTOU, but the whole system should be designed to protect against
that. For example, changing file access control (e.g. mount point
options) without a reboot may lead to inconsistent security properties,
which is why such thing are discouraged by some access control systems
(e.g. SELinux).
>
> To fix both you'd need to do what you mention later:
>
>> What the kernel *could* do is prevent mmapping a non-FMODE_EXEC file
>> with PROT_EXEC, which would indeed have a real effect (in an iOS-like
>> world, for example) but would break many, many things.
>
> And I think this would be useful (with the two possible ways of
> executing .text split into FMODE_EXEC and FMODE_MAP_EXEC, as mentioned
> in a sister subthread), but would have to be opt-in for the obvious
> reason you outlined. However, we could make it the default for
> openat2(2) -- assuming we can agree on what the semantics of a
> theoretical FMODE_EXEC should be.
>
> And of course we'd need to do FMODE_UPGRADE_EXEC (which would need to
> also permit fexecve(2) though probably not PROT_EXEC -- I don't think
> you can mmap() an O_PATH descriptor).
The mmapping restriction may be interesting but it is a different use
case. This series address the interpreter/script problem. Either the
script may be mapped executable is the choice of the interpreter. In
most cases, no script are mapped as such, exactly because they are
interpreted by a process but not by the CPU.
--
Mickaël Salaün
Les données à caractère personnel recueillies et traitées dans le cadre de cet échange, le sont à seule fin d’exécution d’une relation professionnelle et s’opèrent dans cette seule finalité et pour la durée nécessaire à cette relation. Si vous souhaitez faire usage de vos droits de consultation, de rectification et de suppression de vos données, veuillez contacter contact.rgpd@sgdsn.gouv.fr. Si vous avez reçu ce message par erreur, nous vous remercions d’en informer l’expéditeur et de détruire le message. The personal data collected and processed during this exchange aims solely at completing a business relationship and is limited to the necessary duration of that relationship. If you wish to use your rights of consultation, rectification and deletion of your data, please contact: contact.rgpd@sgdsn.gouv.fr. If you have received this message in error, we thank you for informing the sender and destroying the message.
^ permalink raw reply
* Re: [PATCH v2 0/5] Add support for O_MAYEXEC
From: James Morris @ 2019-09-09 0:16 UTC (permalink / raw)
To: Mickaël Salaün
Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
Andy Lutomirski, Christian Heimes, Daniel Borkmann, Eric Chiang,
Florian Weimer, Jan Kara, Jann Horn, Jonathan Corbet, Kees Cook,
Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
Scott Shell
In-Reply-To: <20190906152455.22757-1-mic@digikod.net>
[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]
On Fri, 6 Sep 2019, Mickaël Salaün wrote:
> Furthermore, the security policy can also be delegated to an LSM, either
> a MAC system or an integrity system. For instance, the new kernel
> MAY_OPENEXEC flag closes a major IMA measurement/appraisal interpreter
> integrity gap by bringing the ability to check the use of scripts [2].
To clarify, scripts are already covered by IMA if they're executed
directly, and the gap is when invoking a script as a parameter to the
interpreter (and for any sourced files). In that case only the interpreter
is measured/appraised, unless there's a rule also covering the script
file(s).
See:
https://en.opensuse.org/SDB:Ima_evm#script-behaviour
In theory you could probably also close the gap by modifying the
interpreters to check for the execute bit on any file opened for
interpretation (as earlier suggested by Steve Grubb), and then you could
have IMA measure/appraise all files with +x. I suspect this could get
messy in terms of unwanted files being included, and the MAY_OPENEXEC flag
has cleaner semantics.
--
James Morris
<jmorris@namei.org>
^ permalink raw reply
* Re: [PATCH bpf-next v10 06/10] bpf,landlock: Add a new map type: inode
From: Al Viro @ 2019-09-08 22:19 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Alexei Starovoitov, Mickaël Salaün, linux-kernel,
Alexei Starovoitov, Andrew Morton, Andy Lutomirski,
Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
David Drysdale, David S . Miller, Eric W . Biederman,
James Morris, Jann Horn, John Johansen, Jonathan Corbet,
Kees Cook, Michael Kerrisk, Paul Moore
In-Reply-To: <894922a2-1150-366c-3f08-c8b759da0742@digikod.net>
On Mon, Sep 09, 2019 at 12:09:57AM +0200, Mickaël Salaün wrote:
> >>> + rcu_read_lock();
> >>> + ptr = htab_map_lookup_elem(map, &inode);
> >>> + iput(inode);
Wait a sec. You are doing _what_ under rcu_read_lock()?
> >>> + if (IS_ERR(ptr)) {
> >>> + ret = PTR_ERR(ptr);
> >>> + } else if (!ptr) {
> >>> + ret = -ENOENT;
> >>> + } else {
> >>> + ret = 0;
> >>> + copy_map_value(map, value, ptr);
> >>> + }
> >>> + rcu_read_unlock();
^ permalink raw reply
* Re: [PATCH bpf-next v10 06/10] bpf,landlock: Add a new map type: inode
From: Mickaël Salaün @ 2019-09-08 22:09 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Mickaël Salaün, linux-kernel, Alexander Viro,
Alexei Starovoitov, Andrew Morton, Andy Lutomirski,
Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
David Drysdale, David S . Miller, Eric W . Biederman,
James Morris, Jann Horn, John Johansen, Jonathan Corbet,
Kees Cook, Michael Kerrisk, Paul Moore
In-Reply-To: <a870c2c9-d2f7-e0fa-c8cc-35dbf8b5b87d@ssi.gouv.fr>
On 31/07/2019 20:46, Mickaël Salaün wrote:
>
> On 27/07/2019 03:40, Alexei Starovoitov wrote:
>> On Sun, Jul 21, 2019 at 11:31:12PM +0200, Mickaël Salaün wrote:
>>> FIXME: 64-bits in the doc
>
> FYI, this FIXME was fixed, just not removed from this message. :)
>
>>>
>>> This new map store arbitrary values referenced by inode keys. The map
>>> can be updated from user space with file descriptor pointing to inodes
>>> tied to a file system. From an eBPF (Landlock) program point of view,
>>> such a map is read-only and can only be used to retrieved a value tied
>>> to a given inode. This is useful to recognize an inode tagged by user
>>> space, without access right to this inode (i.e. no need to have a write
>>> access to this inode).
>>>
>>> Add dedicated BPF functions to handle this type of map:
>>> * bpf_inode_htab_map_update_elem()
>>> * bpf_inode_htab_map_lookup_elem()
>>> * bpf_inode_htab_map_delete_elem()
>>>
>>> This new map require a dedicated helper inode_map_lookup_elem() because
>>> of the key which is a pointer to an opaque data (only provided by the
>>> kernel). This act like a (physical or cryptographic) key, which is why
>>> it is also not allowed to get the next key.
>>>
>>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>>
>> there are too many things to comment on.
>> Let's do this patch.
>>
>> imo inode_map concept is interesting, but see below...
>>
>>> +
>>> + /*
>>> + * Limit number of entries in an inode map to the maximum number of
>>> + * open files for the current process. The maximum number of file
>>> + * references (including all inode maps) for a process is then
>>> + * (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
>>> + * is 0, then any entry update is forbidden.
>>> + *
>>> + * An eBPF program can inherit all the inode map FD. The worse case is
>>> + * to fill a bunch of arraymaps, create an eBPF program, close the
>>> + * inode map FDs, and start again. The maximum number of inode map
>>> + * entries can then be close to RLIMIT_NOFILE^3.
>>> + */
>>> + if (attr->max_entries > rlimit(RLIMIT_NOFILE))
>>> + return -EMFILE;
>>
>> rlimit is checked, but no fd are consumed.
>> Once created such inode map_fd can be passed to a different process.
>> map_fd can be pinned into bpffs.
>> etc.
>> what the value of the check?
>
> I was looking for the most meaningful limit for a process, and rlimit is
> the best I found. As the limit of open FD per processes, rlimit is not
> perfect, but I think the semantic is close here (e.g. a process can also
> pass FD through unix socket).
>
>>
>>> +
>>> + /* decorelate UAPI from kernel API */
>>> + attr->key_size = sizeof(struct inode *);
>>> +
>>> + return htab_map_alloc_check(attr);
>>> +}
>>> +
>>> +static void inode_htab_put_key(void *key)
>>> +{
>>> + struct inode **inode = key;
>>> +
>>> + if ((*inode)->i_state & I_FREEING)
>>> + return;
>>
>> checking the state without take a lock? isn't it racy?
>
> This should only trigger when called from security_inode_free(). I'll
> add a comment.
>
>>
>>> + iput(*inode);
>>> +}
>>> +
>>> +/* called from syscall or (never) from eBPF program */
>>> +static int map_get_next_no_key(struct bpf_map *map, void *key, void *next_key)
>>> +{
>>> + /* do not leak a file descriptor */
>>
>> what this comment suppose to mean?
>
> Because a key is a reference to an inode, a possible return value for
> this function could be a file descriptor pointing to this inode (the
> same way a file descriptor is use to add an element). For now, I don't
> want to implement a way for a process with such a map to extract such
> inode, which I compare to a possible leak (of information, not kernel
> memory nor object). This could be implemented in the future if there is
> value in it (and probably some additional safeguards), though.
>
>>
>>> + return -ENOTSUPP;
>>> +}
>>> +
>>> +/* must call iput(inode) after this call */
>>> +static struct inode *inode_from_fd(int ufd, bool check_access)
>>> +{
>>> + struct inode *ret;
>>> + struct fd f;
>>> + int deny;
>>> +
>>> + f = fdget(ufd);
>>> + if (unlikely(!f.file))
>>> + return ERR_PTR(-EBADF);
>>> + /* TODO?: add this check when called from an eBPF program too (already
>>> + * checked by the LSM parent hooks anyway) */
>>> + if (unlikely(IS_PRIVATE(file_inode(f.file)))) {
>>> + ret = ERR_PTR(-EINVAL);
>>> + goto put_fd;
>>> + }
>>> + /* check if the FD is tied to a mount point */
>>> + /* TODO?: add this check when called from an eBPF program too */
>>> + if (unlikely(f.file->f_path.mnt->mnt_flags & MNT_INTERNAL)) {
>>> + ret = ERR_PTR(-EINVAL);
>>> + goto put_fd;
>>> + }
>>
>> a bunch of TODOs do not inspire confidence.
>
> I think the current implement is good, but these TODOs are here to draw
> attention on particular points for which I would like external review
> and opinion (hence the "?").
>
>>
>>> + if (check_access) {
>>> + /*
>>> + * must be allowed to access attributes from this file to then
>>> + * be able to compare an inode to its map entry
>>> + */
>>> + deny = security_inode_getattr(&f.file->f_path);
>>> + if (deny) {
>>> + ret = ERR_PTR(deny);
>>> + goto put_fd;
>>> + }
>>> + }
>>> + ret = file_inode(f.file);
>>> + ihold(ret);
>>> +
>>> +put_fd:
>>> + fdput(f);
>>> + return ret;
>>> +}
>>> +
>>> +/*
>>> + * The key is a FD when called from a syscall, but an inode address when called
>>> + * from an eBPF program.
>>> + */
>>> +
>>> +/* called from syscall */
>>> +int bpf_inode_fd_htab_map_lookup_elem(struct bpf_map *map, int *key, void *value)
>>> +{
>>> + void *ptr;
>>> + struct inode *inode;
>>> + int ret;
>>> +
>>> + /* check inode access */
>>> + inode = inode_from_fd(*key, true);
>>> + if (IS_ERR(inode))
>>> + return PTR_ERR(inode);
>>> +
>>> + rcu_read_lock();
>>> + ptr = htab_map_lookup_elem(map, &inode);
>>> + iput(inode);
>>> + if (IS_ERR(ptr)) {
>>> + ret = PTR_ERR(ptr);
>>> + } else if (!ptr) {
>>> + ret = -ENOENT;
>>> + } else {
>>> + ret = 0;
>>> + copy_map_value(map, value, ptr);
>>> + }
>>> + rcu_read_unlock();
>>> + return ret;
>>> +}
>>> +
>>> +/* called from kernel */
>>
>> wrong comment?
>> kernel side cannot call it, right?
>
> This is called from bpf_inode_fd_htab_map_delete_elem() (code just
> beneath), and from
> kernel/bpf/syscall.c:bpf_inode_ptr_unlocked_htab_map_delet_elem() which
> can be called by security_inode_free() (hook_inode_free_security).
>
>>
>>> +int bpf_inode_ptr_locked_htab_map_delete_elem(struct bpf_map *map,
>>> + struct inode **key, bool remove_in_inode)
>>> +{
>>> + if (remove_in_inode)
>>> + landlock_inode_remove_map(*key, map);
>>> + return htab_map_delete_elem(map, key);
>>> +}
>>> +
>>> +/* called from syscall */
>>> +int bpf_inode_fd_htab_map_delete_elem(struct bpf_map *map, int *key)
>>> +{
>>> + struct inode *inode;
>>> + int ret;
>>> +
>>> + /* do not check inode access (similar to directory check) */
>>> + inode = inode_from_fd(*key, false);
>>> + if (IS_ERR(inode))
>>> + return PTR_ERR(inode);
>>> + ret = bpf_inode_ptr_locked_htab_map_delete_elem(map, &inode, true);
>>> + iput(inode);
>>> + return ret;
>>> +}
>>> +
>>> +/* called from syscall */
>>> +int bpf_inode_fd_htab_map_update_elem(struct bpf_map *map, int *key, void *value,
>>> + u64 map_flags)
>>> +{
>>> + struct inode *inode;
>>> + int ret;
>>> +
>>> + WARN_ON_ONCE(!rcu_read_lock_held());
>>> +
>>> + /* check inode access */
>>> + inode = inode_from_fd(*key, true);
>>> + if (IS_ERR(inode))
>>> + return PTR_ERR(inode);
>>> + ret = htab_map_update_elem(map, &inode, value, map_flags);
>>> + if (!ret)
>>> + ret = landlock_inode_add_map(inode, map);
>>> + iput(inode);
>>> + return ret;
>>> +}
>>> +
>>> +static void inode_htab_map_free(struct bpf_map *map)
>>> +{
>>> + struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
>>> + struct hlist_nulls_node *n;
>>> + struct hlist_nulls_head *head;
>>> + struct htab_elem *l;
>>> + int i;
>>> +
>>> + for (i = 0; i < htab->n_buckets; i++) {
>>> + head = select_bucket(htab, i);
>>> + hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
>>> + landlock_inode_remove_map(*((struct inode **)l->key), map);
>>> + }
>>> + }
>>> + htab_map_free(map);
>>> +}
>>
>> user space can delete the map.
>> that will trigger inode_htab_map_free() which will call
>> landlock_inode_remove_map().
>> which will simply itereate the list and delete from the list.
>
> landlock_inode_remove_map() removes the reference to the map (being
> freed) from the inode (with an RCU lock).
>
>>
>> While in parallel inode can be destoyed and hook_inode_free_security()
>> will be called.
>> I think nothing that protects from this race.
>
> According to security_inode_free(), the inode is effectively freed after
> the RCU grace period. However, I forgot to call bpf_map_inc() in
> landlock_inode_add_map(), which would prevent the map to be freed
> outside of the security_inode_free(). I'll fix that.
>
>>
>>> +
>>> +/*
>>> + * We need a dedicated helper to deal with inode maps because the key is a
>>> + * pointer to an opaque data, only provided by the kernel. This really act
>>> + * like a (physical or cryptographic) key, which is why it is also not allowed
>>> + * to get the next key with map_get_next_key().
>>
>> inode pointer is like cryptographic key? :)
>
> I wanted to highlight the fact that, contrary to other map key types,
> the value of this one should not be readable, only usable. A "secret
> value" is more appropriate but still confusing. I'll rephrase that.
>
>>
>>> + */
>>> +BPF_CALL_2(bpf_inode_map_lookup_elem, struct bpf_map *, map, void *, key)
>>> +{
>>> + WARN_ON_ONCE(!rcu_read_lock_held());
>>> + return (unsigned long)htab_map_lookup_elem(map, &key);
>>> +}
>>> +
>>> +const struct bpf_func_proto bpf_inode_map_lookup_elem_proto = {
>>> + .func = bpf_inode_map_lookup_elem,
>>> + .gpl_only = false,
>>> + .pkt_access = true,
>>
>> pkt_access ? :)
>
> This slipped in with this rebase, I'll remove it. :)
>
>>
>>> + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
>>> + .arg1_type = ARG_CONST_MAP_PTR,
>>> + .arg2_type = ARG_PTR_TO_INODE,
>>> +};
>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index b2a8cb14f28e..e46441c42b68 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -801,6 +801,8 @@ static int map_lookup_elem(union bpf_attr *attr)
>>> } else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
>>> map->map_type == BPF_MAP_TYPE_STACK) {
>>> err = map->ops->map_peek_elem(map, value);
>>> + } else if (map->map_type == BPF_MAP_TYPE_INODE) {
>>> + err = bpf_inode_fd_htab_map_lookup_elem(map, key, value);
>>> } else {
>>> rcu_read_lock();
>>> if (map->ops->map_lookup_elem_sys_only)
>>> @@ -951,6 +953,10 @@ static int map_update_elem(union bpf_attr *attr)
>>> } else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
>>> map->map_type == BPF_MAP_TYPE_STACK) {
>>> err = map->ops->map_push_elem(map, value, attr->flags);
>>> + } else if (map->map_type == BPF_MAP_TYPE_INODE) {
>>> + rcu_read_lock();
>>> + err = bpf_inode_fd_htab_map_update_elem(map, key, value, attr->flags);
>>> + rcu_read_unlock();
>>> } else {
>>> rcu_read_lock();
>>> err = map->ops->map_update_elem(map, key, value, attr->flags);
>>> @@ -1006,7 +1012,10 @@ static int map_delete_elem(union bpf_attr *attr)
>>> preempt_disable();
>>> __this_cpu_inc(bpf_prog_active);
>>> rcu_read_lock();
>>> - err = map->ops->map_delete_elem(map, key);
>>> + if (map->map_type == BPF_MAP_TYPE_INODE)
>>> + err = bpf_inode_fd_htab_map_delete_elem(map, key);
>>> + else
>>> + err = map->ops->map_delete_elem(map, key);
>>> rcu_read_unlock();
>>> __this_cpu_dec(bpf_prog_active);
>>> preempt_enable();
>>> @@ -1018,6 +1027,22 @@ static int map_delete_elem(union bpf_attr *attr)
>>> return err;
>>> }
>>>
>>> +int bpf_inode_ptr_unlocked_htab_map_delete_elem(struct bpf_map *map,
>>> + struct inode **key, bool remove_in_inode)
>>> +{
>>> + int err;
>>> +
>>> + preempt_disable();
>>> + __this_cpu_inc(bpf_prog_active);
>>> + rcu_read_lock();
>>> + err = bpf_inode_ptr_locked_htab_map_delete_elem(map, key, remove_in_inode);
>>> + rcu_read_unlock();
>>> + __this_cpu_dec(bpf_prog_active);
>>> + preempt_enable();
>>> + maybe_wait_bpf_programs(map);
>>
>> if that function was actually doing synchronize_rcu() the consequences
>> would have been unpleasant. Fortunately it's a nop in this case.
>> Please read the code carefully before copy-paste.
>> Also what do you think the reason of bpf_prog_active above?
>> What is the reason of rcu_read_lock above?
>
> The RCU is used as for every map modifications (usually from userspace).
> I wasn't sure about the other protections so I kept the same (generic)
> checks as in map_delete_elem() (just above) because this function follow
> the same semantic. What can I safely remove?
>
>>
>> I think the patch set needs to shrink at least in half to be reviewable.
>> The way you tie seccomp and lsm is probably the biggest obstacle
>> than any of the bugs above.
>> Can you drop seccomp ? and do it as normal lsm ?
>
> The seccomp/enforcement part is needed to have a minimum viable product,
> i.e. a process able to sandbox itself. Are you suggesting to first merge
> a version when it is only possible to create inode maps but not use them
> in an useful way (i.e. for sandboxing)? I can do it if it's OK with you,
> and I hope it will not be a problem for the security folks if it can
> help to move forward.
I talked with Kees Cook and James Morris at LSS NA, and I think the
better strategy to shrink this patch series is to tackle a much less
complex problem at first. Instead on focusing right now on file system,
the next version of this patch series will focus on memory protection,
which is also something desired. I'll then iterate with file system
support (i.e. inode maps) and other use cases once the basics of
Landlock are upstream. For this next series, the majority of the code
will be on the LSM side, while the eBPF part will mainly consist to add
a new program type. Because bpf-next is moving rapidly, I think it still
make sense to base this work on this tree (instead of linux-security).
Regards,
Mickaël
^ permalink raw reply
* Re: [PATCH v12 11/12] open: openat2(2) syscall
From: Aleksa Sarai @ 2019-09-08 16:24 UTC (permalink / raw)
To: Jeff Layton
Cc: Al Viro, J. Bruce Fields, Arnd Bergmann, David Howells,
Shuah Khan, Shuah Khan, Ingo Molnar, Peter Zijlstra,
Christian Brauner, Eric Biederman, Andy Lutomirski, Andrew Morton,
Alexei Starovoitov, Kees Cook, Jann Horn, Tycho Andersen,
David Drysdale, Chanho Min, Oleg Nesterov, Rasmus Villemoes,
Alexander
In-Reply-To: <7236f382d72130f2afbbe8940e72cc67e5c6dce0.camel@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]
On 2019-09-07, Jeff Layton <jlayton@kernel.org> wrote:
> On Thu, 2019-09-05 at 06:19 +1000, Aleksa Sarai wrote:
> > + * @flags: O_* flags.
> > + * @mode: O_CREAT/O_TMPFILE file mode.
> > + * @upgrade_mask: UPGRADE_* flags (to restrict O_PATH re-opening).
> > + * @resolve: RESOLVE_* flags.
> > + */
> > +struct open_how {
> > + __u32 flags;
> > + union {
> > + __u16 mode;
> > + __u16 upgrade_mask;
> > + };
> > + __u16 resolve;
> > +};
> > +
> > +#define OPEN_HOW_SIZE_VER0 8 /* sizeof first published struct */
> > +
>
> Hmm, there is no version field. When you want to expand this in the
> future, what is the plan? Add a new flag to indicate that it's some
> length?
The "version number" is the size of the struct. Any extensions we make
are appended to the struct (openat2 now takes a size_t argument), and
the new copy_struct_{to,from}_user() helpers handle all of the
permutations of {old,new} kernel and {old,new} user space.
This is how clone3(), sched_[gs]etattr() and perf_event_open() all
operate (all of the sigset syscalls operate similarly but don't
gracefully handle different kernel vintages -- you just get -EINVAL).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH 2/2] ipc: fix sparc64 ipc() wrapper
From: Arnd Bergmann @ 2019-09-07 19:49 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, Andrew Morton, David S. Miller,
Arnd Bergmann, Eric W. Biederman, Deepa Dinamani,
Christian Brauner, Manfred Spraul, Davidlohr Bueso
Cc: linux-arch, y2038 Mailman List, Linux API, # 3.4.x,
Dominik Brodowski, sparclinux, Matt Turner, Anatolij Gustschin
In-Reply-To: <20190905152155.1392871-2-arnd@arndb.de>
On Thu, Sep 5, 2019 at 5:24 PM Arnd Bergmann <arnd@arndb.de> wrote:
> diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
> index ccc88926bc00..5ad0494df367 100644
> --- a/arch/sparc/kernel/sys_sparc_64.c
> +++ b/arch/sparc/kernel/sys_sparc_64.c
> @@ -340,21 +340,21 @@ SYSCALL_DEFINE6(sparc_ipc, unsigned int, call, int, first, unsigned long, second
> if (call <= SEMTIMEDOP) {
> switch (call) {
> case SEMOP:
> - err = sys_semtimedop(first, ptr,
> - (unsigned int)second, NULL);
> + err = ksys_semtimedop(first, ptr,
> + (unsigned int)second, NULL);
> goto out;
The zero-day bot found a link error in sparc64 allnoconfig:
arch/sparc/kernel/sys_sparc_64.o: In function `__se_sys_sparc_ipc':
>> sys_sparc_64.c:(.text+0x724): undefined reference to `ksys_semtimedop'
>> sys_sparc_64.c:(.text+0x76c): undefined reference to `ksys_old_msgctl'
>> sys_sparc_64.c:(.text+0x7a8): undefined reference to `ksys_semget'
>> sys_sparc_64.c:(.text+0x7c8): undefined reference to `ksys_old_semctl'
>> sys_sparc_64.c:(.text+0x7e4): undefined reference to `ksys_msgsnd'
>> sys_sparc_64.c:(.text+0x7fc): undefined reference to `ksys_shmget'
>> sys_sparc_64.c:(.text+0x808): undefined reference to `ksys_shmdt'
sys_sparc_64.c:(.text+0x828): undefined reference to `ksys_semtimedop'
>> sys_sparc_64.c:(.text+0x844): undefined reference to `ksys_old_shmctl'
>> sys_sparc_64.c:(.text+0x858): undefined reference to `ksys_msgget'
>> sys_sparc_64.c:(.text+0x86c): undefined reference to `ksys_msgrcv'
I've added this hunk to my patch and plan to send both fixes to Linus
in the next few days, after I get a positive report from the bot as well:
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -336,6 +336,9 @@ SYSCALL_DEFINE6(sparc_ipc, unsigned int, call,
int, first, unsigned long, second
{
long err;
+ if (!IS_ENABLED(CONFIG_SYSVIPC))
+ return -ENOSYS;
+
/* No need for backward compatibility. We can start fresh... */
if (call <= SEMTIMEDOP) {
switch (call) {
_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038
^ permalink raw reply
* Re: [PATCH v12 11/12] open: openat2(2) syscall
From: Andy Lutomirski @ 2019-09-07 18:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Layton, Aleksa Sarai, Al Viro, J. Bruce Fields,
Arnd Bergmann, David Howells, Shuah Khan, Shuah Khan, Ingo Molnar,
Peter Zijlstra, Christian Brauner, Eric Biederman,
Andy Lutomirski, Andrew Morton, Alexei Starovoitov, Kees Cook,
Jann Horn, Tycho Andersen, David Drysdale, Chanho Min,
Oleg Nesterov
In-Reply-To: <CAHk-=whe90Ec_RRrMRLE0=bJOHNS9YmVwcytVxmrfK3oCuZF6A@mail.gmail.com>
> On Sep 7, 2019, at 10:45 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Sat, Sep 7, 2019 at 10:42 AM Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Linus, you rejected resolveat() because you wanted a *nice* API
>
> No. I rejected resoveat() because it was a completely broken garbage
> API that couldn't do even basic stuff right (like O_CREAT).
>
> We have a ton of flag space in the new openat2() model, we might as
> well leave the old flags alone that people are (a) used to and (b) we
> have code to support _anyway_.
>
> Making up a new flag namespace is only going to cause us - and users -
> more work, and more confusion. For no actual advantage. It's not going
> to be "cleaner". It's just going to be worse.
>
>
If we keep all the flag bits in the same mask with the same values, then we’re stuck with O_RDONLY=0 and everything that implies. We’ll have UPGRADE_READ that works differently from the missing plain-old-READ bit, and we can’t express execute-only-no-read-or-write. This sucks.
Can we at least split the permission bits into their own mask and make bits 0 and 1 illegal in the main set of flags in openat2?
There’s another thread going on right now about adding a bit along the lines of “MAYEXEC”, and one of the conclusions was that it should wait for openat2 so that it can have same semantics. If we’re stuck with O_RDONLY and friends, then MAYEXEC is doomed to being at least a bit nonsensical.
As an analogy, AMD64 introduced bigger PTEs but kept the same nonsense encoding of read and write permission. And then we got NX, and now we’re getting little holes in the encoding stolen by CET to mean new silly things. I don’t know if you’ve been following the various rounds of patches, but it is truly horrible. The mapping from meaning to the actual bits is *shit*, and AMD64 should have made a clean break instead.
open()’s permission bits are basically the same situation. And the kernel *already* has a non-type-safe translation layer. Please, please let openat2() at least get rid of the turd in open()’s bits 0 and 1.
^ permalink raw reply
* Re: [PATCH v12 11/12] open: openat2(2) syscall
From: Linus Torvalds @ 2019-09-07 17:45 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Jeff Layton, Aleksa Sarai, Al Viro, J. Bruce Fields,
Arnd Bergmann, David Howells, Shuah Khan, Shuah Khan, Ingo Molnar,
Peter Zijlstra, Christian Brauner, Eric Biederman,
Andy Lutomirski, Andrew Morton, Alexei Starovoitov, Kees Cook,
Jann Horn, Tycho Andersen, David Drysdale, Chanho Min,
Oleg Nesterov
In-Reply-To: <C81D6D29-F6BF-48E6-A15E-3ABCB2C992E5@amacapital.net>
On Sat, Sep 7, 2019 at 10:42 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
> Linus, you rejected resolveat() because you wanted a *nice* API
No. I rejected resoveat() because it was a completely broken garbage
API that couldn't do even basic stuff right (like O_CREAT).
We have a ton of flag space in the new openat2() model, we might as
well leave the old flags alone that people are (a) used to and (b) we
have code to support _anyway_.
Making up a new flag namespace is only going to cause us - and users -
more work, and more confusion. For no actual advantage. It's not going
to be "cleaner". It's just going to be worse.
Linus
^ permalink raw reply
* Re: [PATCH v12 11/12] open: openat2(2) syscall
From: Andy Lutomirski @ 2019-09-07 17:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Layton, Aleksa Sarai, Al Viro, J. Bruce Fields,
Arnd Bergmann, David Howells, Shuah Khan, Shuah Khan, Ingo Molnar,
Peter Zijlstra, Christian Brauner, Eric Biederman,
Andy Lutomirski, Andrew Morton, Alexei Starovoitov, Kees Cook,
Jann Horn, Tycho Andersen, David Drysdale, Chanho Min,
Oleg Nesterov
In-Reply-To: <CAHk-=whZx97Nm-gUK0ppofj2RA2LLz2vmaDUTKSSV-+yYB9q_Q@mail.gmail.com>
> On Sep 7, 2019, at 9:58 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Sat, Sep 7, 2019 at 5:40 AM Jeff Layton <jlayton@kernel.org> wrote:
>>
>> After thinking about this a bit, I wonder if we might be better served
>> with a new set of OA2_* flags instead of repurposing the O_* flags?
>
> I'd hate to have yet _another_ set of translation functions, and
> another chance of people just getting it wrong either in user space or
> the kernel.
>
> So no. Let's not make another set of flags that has no sane way to
> have type-safety to avoid more confusion.
>
> The new flags that _only_ work with openat2() might be named with a
> prefix/suffix to mark that, but I'm not sure it's a huge deal.
>
>
I agree with the philosophy, but I think it doesn’t apply in this case. Here are the flags:
O_RDONLY, O_WRONLY, O_RDWR: not even a proper bitmask. The kernel already has the FMODE_ bits to make this make sense. How about we make the openat2 permission bits consistent with the internal representation and let the O_ permission bits remain as an awful translation. The kernel already translates like this, and it already sucks.
O_CREAT, O_TMPFILE, O_NOCTTY, O_TRUNC: not modes on the fd at all. These affect the meaning of open(). Heck, for openat2, NOCTTY should be this default.
O_EXCL: hopelessly overloaded.
O_APPEND, O_DIRECT, O_SYNC, O_DSYNC, O_LARGEFILE, O_NOATIME, O_PATH, O_NONBLOCK: genuine mode bits
O_CLOEXEC: special because it affects the fd, not the struct file.
Linus, you rejected resolveat() because you wanted a *nice* API that people would use and that might even be adopted by other OSes. Let’s please not make openat2() be a giant pile of crap in the name of consistency with open(). open(), frankly, is horrible.
^ permalink raw reply
* Re: [PATCH v12 11/12] open: openat2(2) syscall
From: Linus Torvalds @ 2019-09-07 16:58 UTC (permalink / raw)
To: Jeff Layton
Cc: Aleksa Sarai, Al Viro, J. Bruce Fields, Arnd Bergmann,
David Howells, Shuah Khan, Shuah Khan, Ingo Molnar,
Peter Zijlstra, Christian Brauner, Eric Biederman,
Andy Lutomirski, Andrew Morton, Alexei Starovoitov, Kees Cook,
Jann Horn, Tycho Andersen, David Drysdale, Chanho Min,
Oleg Nesterov, Rasmus Villemoes <linux@
In-Reply-To: <7236f382d72130f2afbbe8940e72cc67e5c6dce0.camel@kernel.org>
On Sat, Sep 7, 2019 at 5:40 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> After thinking about this a bit, I wonder if we might be better served
> with a new set of OA2_* flags instead of repurposing the O_* flags?
I'd hate to have yet _another_ set of translation functions, and
another chance of people just getting it wrong either in user space or
the kernel.
So no. Let's not make another set of flags that has no sane way to
have type-safety to avoid more confusion.
The new flags that _only_ work with openat2() might be named with a
prefix/suffix to mark that, but I'm not sure it's a huge deal.
Linus
^ permalink raw reply
* Re: [PATCH v12 11/12] open: openat2(2) syscall
From: Jeff Layton @ 2019-09-07 12:40 UTC (permalink / raw)
To: Aleksa Sarai, Al Viro, J. Bruce Fields, Arnd Bergmann,
David Howells, Shuah Khan, Shuah Khan, Ingo Molnar,
Peter Zijlstra, Christian Brauner
Cc: Eric Biederman, Andy Lutomirski, Andrew Morton,
Alexei Starovoitov, Kees Cook, Jann Horn, Tycho Andersen,
David Drysdale, Chanho Min, Oleg Nesterov, Rasmus Villemoes,
Alexander Shishkin, Jiri Olsa, Namhyung Kim, Aleksa Sarai,
Linus Torvalds, containers, linux-alpha, linux-api, linux-arch,
linux-arm-kernel
In-Reply-To: <20190904201933.10736-12-cyphar@cyphar.com>
On Thu, 2019-09-05 at 06:19 +1000, Aleksa Sarai wrote:
> The most obvious syscall to add support for the new LOOKUP_* scoping
> flags would be openat(2). However, there are a few reasons why this is
> not the best course of action:
>
> * The new LOOKUP_* flags are intended to be security features, and
> openat(2) will silently ignore all unknown flags. This means that
> users would need to avoid foot-gunning themselves constantly when
> using this interface if it were part of openat(2). This can be fixed
> by having userspace libraries handle this for users[1], but should be
> avoided if possible.
>
> * Resolution scoping feels like a different operation to the existing
> O_* flags. And since openat(2) has limited flag space, it seems to be
> quite wasteful to clutter it with 5 flags that are all
> resolution-related. Arguably O_NOFOLLOW is also a resolution flag but
> its entire purpose is to error out if you encounter a trailing
> symlink -- not to scope resolution.
>
> * Other systems would be able to reimplement this syscall allowing for
> cross-OS standardisation rather than being hidden amongst O_* flags
> which may result in it not being used by all the parties that might
> want to use it (file servers, web servers, container runtimes, etc).
>
> * It gives us the opportunity to iterate on the O_PATH interface. In
> particular, the new @how->upgrade_mask field for fd re-opening is
> only possible because we have a clean slate without needing to re-use
> the ACC_MODE flag design nor the existing openat(2) @mode semantics.
>
> To this end, we introduce the openat2(2) syscall. It provides all of the
> features of openat(2) through the @how->flags argument, but also
> also provides a new @how->resolve argument which exposes RESOLVE_* flags
> that map to our new LOOKUP_* flags. It also eliminates the long-standing
> ugliness of variadic-open(2) by embedding it in a struct.
>
> In order to allow for userspace to lock down their usage of file
> descriptor re-opening, openat2(2) has the ability for users to disallow
> certain re-opening modes through @how->upgrade_mask. At the moment,
> there is no UPGRADE_NOEXEC.
>
> [1]: https://github.com/openSUSE/libpathrs
>
> Suggested-by: Christian Brauner <christian@brauner.io>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
> arch/alpha/kernel/syscalls/syscall.tbl | 1 +
> arch/arm/tools/syscall.tbl | 1 +
> arch/arm64/include/asm/unistd.h | 2 +-
> arch/arm64/include/asm/unistd32.h | 2 +
> arch/ia64/kernel/syscalls/syscall.tbl | 1 +
> arch/m68k/kernel/syscalls/syscall.tbl | 1 +
> arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
> arch/parisc/kernel/syscalls/syscall.tbl | 1 +
> arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
> arch/s390/kernel/syscalls/syscall.tbl | 1 +
> arch/sh/kernel/syscalls/syscall.tbl | 1 +
> arch/sparc/kernel/syscalls/syscall.tbl | 1 +
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
> fs/open.c | 94 ++++++++++++++++-----
> include/linux/fcntl.h | 19 ++++-
> include/linux/fs.h | 4 +-
> include/linux/syscalls.h | 14 ++-
> include/uapi/asm-generic/unistd.h | 5 +-
> include/uapi/linux/fcntl.h | 42 +++++++++
> 24 files changed, 168 insertions(+), 30 deletions(-)
>
[...]
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index 1d338357df8a..479baf2da10e 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -93,5 +93,47 @@
>
> #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
>
> +/**
> + * Arguments for how openat2(2) should open the target path. If @resolve is
> + * zero, then openat2(2) operates identically to openat(2).
> + *
> + * However, unlike openat(2), unknown bits in @flags result in -EINVAL rather
> + * than being silently ignored. In addition, @mode (or @upgrade_mask) must be
> + * zero unless one of {O_CREAT, O_TMPFILE, O_PATH} are set.
> + *
After thinking about this a bit, I wonder if we might be better served
with a new set of OA2_* flags instead of repurposing the O_* flags?
Yes, those flags are familiar, but this is an entirely new syscall. We
have a chance to make a fresh start. Does something like O_LARGEFILE
have any real place in openat2? I'd argue no.
Also, once you want to add a new flag, then we get into the mess of how
to document whether open/openat also support it. It'd be good to freeze
changes on those syscalls and aim to only introduce new functionality in
openat2.
That would also allow us to drop some flags from openat2 that we really
don't need, and maybe expand the flag space to 64 bits initially, to
allow for expansion into the future.
Thoughts?
> + * @flags: O_* flags.
> + * @mode: O_CREAT/O_TMPFILE file mode.
> + * @upgrade_mask: UPGRADE_* flags (to restrict O_PATH re-opening).
> + * @resolve: RESOLVE_* flags.
> + */
> +struct open_how {
> + __u32 flags;
> + union {
> + __u16 mode;
> + __u16 upgrade_mask;
> + };
> + __u16 resolve;
> +};
> +
> +#define OPEN_HOW_SIZE_VER0 8 /* sizeof first published struct */
> +
Hmm, there is no version field. When you want to expand this in the
future, what is the plan? Add a new flag to indicate that it's some
length?
> +/* how->resolve flags for openat2(2). */
> +#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
> + (includes bind-mounts). */
> +#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
> + "magic-links". */
> +#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
> + (implies OEXT_NO_MAGICLINKS) */
> +#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
> + "..", symlinks, and absolute
> + paths which escape the dirfd. */
> +#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
> + be scoped inside the dirfd
> + (similar to chroot(2)). */
> +
> +/* how->upgrade flags for openat2(2). */
> +/* First bit is reserved for a future UPGRADE_NOEXEC flag. */
> +#define UPGRADE_NOREAD 0x02 /* Block re-opening with MAY_READ. */
> +#define UPGRADE_NOWRITE 0x04 /* Block re-opening with MAY_WRITE. */
>
> #endif /* _UAPI_LINUX_FCNTL_H */
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply
* Re: [PATCH v3 bpf-next 2/3] bpf: implement CAP_BPF
From: kbuild test robot @ 2019-09-07 4:09 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: kbuild-all, davem, daniel, peterz, luto, netdev, bpf, kernel-team,
linux-api
In-Reply-To: <20190904184335.360074-2-ast@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 4029 bytes --]
Hi Alexei,
I love your patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Alexei-Starovoitov/capability-introduce-CAP_BPF-and-CAP_TRACING/20190906-215814
base: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-c003-201935 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from include/linux/export.h:45:0,
from include/linux/linkage.h:7,
from include/linux/kernel.h:8,
from include/linux/list.h:9,
from include/linux/timer.h:5,
from include/linux/workqueue.h:9,
from include/linux/bpf.h:9,
from kernel/bpf/syscall.c:4:
kernel/bpf/syscall.c: In function 'bpf_prog_test_run':
kernel/bpf/syscall.c:2087:6: warning: the address of 'capable_bpf_net_admin' will always evaluate as 'true' [-Waddress]
if (!capable_bpf_net_admin)
^
include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
#define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
^~~~
>> kernel/bpf/syscall.c:2087:2: note: in expansion of macro 'if'
if (!capable_bpf_net_admin)
^~
kernel/bpf/syscall.c:2087:6: warning: the address of 'capable_bpf_net_admin' will always evaluate as 'true' [-Waddress]
if (!capable_bpf_net_admin)
^
include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
#define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
^~~~
>> kernel/bpf/syscall.c:2087:2: note: in expansion of macro 'if'
if (!capable_bpf_net_admin)
^~
kernel/bpf/syscall.c:2087:6: warning: the address of 'capable_bpf_net_admin' will always evaluate as 'true' [-Waddress]
if (!capable_bpf_net_admin)
^
include/linux/compiler.h:69:3: note: in definition of macro '__trace_if_value'
(cond) ? \
^~~~
include/linux/compiler.h:56:28: note: in expansion of macro '__trace_if_var'
#define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
^~~~~~~~~~~~~~
>> kernel/bpf/syscall.c:2087:2: note: in expansion of macro 'if'
if (!capable_bpf_net_admin)
^~
vim +/if +2087 kernel/bpf/syscall.c
2080
2081 static int bpf_prog_test_run(const union bpf_attr *attr,
2082 union bpf_attr __user *uattr)
2083 {
2084 struct bpf_prog *prog;
2085 int ret = -ENOTSUPP;
2086
> 2087 if (!capable_bpf_net_admin)
2088 /* test_run callback is available for networking progs only.
2089 * Add capable_bpf_tracing() above when tracing progs become runable.
2090 */
2091 return -EPERM;
2092 if (CHECK_ATTR(BPF_PROG_TEST_RUN))
2093 return -EINVAL;
2094
2095 if ((attr->test.ctx_size_in && !attr->test.ctx_in) ||
2096 (!attr->test.ctx_size_in && attr->test.ctx_in))
2097 return -EINVAL;
2098
2099 if ((attr->test.ctx_size_out && !attr->test.ctx_out) ||
2100 (!attr->test.ctx_size_out && attr->test.ctx_out))
2101 return -EINVAL;
2102
2103 prog = bpf_prog_get(attr->test.prog_fd);
2104 if (IS_ERR(prog))
2105 return PTR_ERR(prog);
2106
2107 if (prog->aux->ops->test_run)
2108 ret = prog->aux->ops->test_run(prog, attr, uattr);
2109
2110 bpf_prog_put(prog);
2111 return ret;
2112 }
2113
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29761 bytes --]
^ permalink raw reply
* [PATCH v4 bpf-next 4/4] selftests/bpf: use CAP_BPF and CAP_TRACING in tests
From: Alexei Starovoitov @ 2019-09-06 23:10 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, luto, netdev, bpf, kernel-team, linux-api
In-Reply-To: <20190906231053.1276792-1-ast@kernel.org>
Make all test_verifier test exercise CAP_BPF and CAP_TRACING
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
tools/testing/selftests/bpf/test_verifier.c | 46 +++++++++++++++++----
1 file changed, 38 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index d27fd929abb9..0d5567962c4e 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -807,10 +807,20 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
}
}
+struct libcap {
+ struct __user_cap_header_struct hdr;
+ struct __user_cap_data_struct data[2];
+};
+
static int set_admin(bool admin)
{
cap_t caps;
- const cap_value_t cap_val = CAP_SYS_ADMIN;
+ /* need CAP_BPF to load progs and CAP_NET_ADMIN to run networking progs,
+ * and CAP_TRACING to create stackmap
+ */
+ const cap_value_t cap_net_admin = CAP_NET_ADMIN;
+ const cap_value_t cap_sys_admin = CAP_SYS_ADMIN;
+ struct libcap *cap;
int ret = -1;
caps = cap_get_proc();
@@ -818,11 +828,26 @@ static int set_admin(bool admin)
perror("cap_get_proc");
return -1;
}
- if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_val,
+ cap = (struct libcap *)caps;
+ if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_sys_admin, CAP_CLEAR)) {
+ perror("cap_set_flag clear admin");
+ goto out;
+ }
+ if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_net_admin,
admin ? CAP_SET : CAP_CLEAR)) {
- perror("cap_set_flag");
+ perror("cap_set_flag set_or_clear net");
goto out;
}
+ /* libcap is likely old and simply ignores CAP_BPF and CAP_TRACING,
+ * so update effective bits manually
+ */
+ if (admin) {
+ cap->data[1].effective |= 1 << (38 /* CAP_BPF */ - 32);
+ cap->data[1].effective |= 1 << (39 /* CAP_TRACING */ - 32);
+ } else {
+ cap->data[1].effective &= ~(1 << (38 - 32));
+ cap->data[1].effective &= ~(1 << (39 - 32));
+ }
if (cap_set_proc(caps)) {
perror("cap_set_proc");
goto out;
@@ -1051,9 +1076,11 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
static bool is_admin(void)
{
+ cap_flag_value_t net_priv = CAP_CLEAR;
+ bool tracing_priv = false;
+ bool bpf_priv = false;
+ struct libcap *cap;
cap_t caps;
- cap_flag_value_t sysadmin = CAP_CLEAR;
- const cap_value_t cap_val = CAP_SYS_ADMIN;
#ifdef CAP_IS_SUPPORTED
if (!CAP_IS_SUPPORTED(CAP_SETFCAP)) {
@@ -1066,11 +1093,14 @@ static bool is_admin(void)
perror("cap_get_proc");
return false;
}
- if (cap_get_flag(caps, cap_val, CAP_EFFECTIVE, &sysadmin))
- perror("cap_get_flag");
+ cap = (struct libcap *)caps;
+ bpf_priv = cap->data[1].effective & (1 << (38/* CAP_BPF */ - 32));
+ tracing_priv = cap->data[1].effective & (1 << (39/* CAP_TRACING */ - 32));
+ if (cap_get_flag(caps, CAP_NET_ADMIN, CAP_EFFECTIVE, &net_priv))
+ perror("cap_get_flag NET");
if (cap_free(caps))
perror("cap_free");
- return (sysadmin == CAP_SET);
+ return bpf_priv && tracing_priv && net_priv == CAP_SET;
}
static void get_unpriv_disabled()
--
2.20.0
^ permalink raw reply related
* [PATCH v4 bpf-next 3/4] perf: implement CAP_TRACING
From: Alexei Starovoitov @ 2019-09-06 23:10 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, luto, netdev, bpf, kernel-team, linux-api
In-Reply-To: <20190906231053.1276792-1-ast@kernel.org>
Implement permissions as stated in uapi/linux/capability.h
and update Documentation.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
Documentation/admin-guide/perf-security.rst | 4 ++--
Documentation/admin-guide/sysctl/kernel.rst | 10 ++++------
arch/powerpc/perf/core-book3s.c | 4 ++--
arch/x86/events/intel/bts.c | 2 +-
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/p4.c | 2 +-
kernel/events/core.c | 14 +++++++-------
kernel/events/hw_breakpoint.c | 2 +-
kernel/trace/trace_event_perf.c | 4 ++--
9 files changed, 21 insertions(+), 23 deletions(-)
diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst
index 72effa7c23b9..c84152d1dfd4 100644
--- a/Documentation/admin-guide/perf-security.rst
+++ b/Documentation/admin-guide/perf-security.rst
@@ -66,8 +66,8 @@ into distinct units, known as capabilities [6]_ , which can be
independently enabled and disabled on per-thread basis for processes and
files of unprivileged users.
-Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
-as privileged processes with respect to perf_events performance
+Unprivileged processes with enabled CAP_SYS_ADMIN or CAP_TRACING capability
+are treated as privileged processes with respect to perf_events performance
monitoring and bypass *scope* permissions checks in the kernel.
Unprivileged processes using perf_events system call API is also subject
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 032c7cd3cede..595bf2b1363f 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -720,20 +720,18 @@ allowed to execute.
====================
Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN). The default value is 2.
+users (without CAP_SYS_ADMIN and without CAP_TRACING). The default value is 2.
=== ==================================================================
-1 Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
->=0 Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
+>=0 Disallow ftrace function tracepoint and raw tracepoint
- Disallow raw tracepoint access by users without CAP_SYS_ADMIN
+>=1 Disallow CPU event access
->=1 Disallow CPU event access by users without CAP_SYS_ADMIN
-
->=2 Disallow kernel profiling by users without CAP_SYS_ADMIN
+>=2 Disallow kernel profiling
=== ==================================================================
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index ca92e01d0bd1..a204a3c6c68b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -204,7 +204,7 @@ static inline void perf_get_data_addr(struct pt_regs *regs, u64 *addrp)
if (!(mmcra & MMCRA_SAMPLE_ENABLE) || sdar_valid)
*addrp = mfspr(SPRN_SDAR);
- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
+ if (perf_paranoid_kernel() && !capable_tracing() &&
is_kernel_addr(mfspr(SPRN_SDAR)))
*addrp = 0;
}
@@ -472,7 +472,7 @@ static void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
* exporting it to userspace (avoid exposure of regions
* where we could have speculative execution)
*/
- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
+ if (perf_paranoid_kernel() && !capable_tracing() &&
is_kernel_addr(addr))
continue;
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 5ee3fed881d3..bd713b2dd7c2 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -550,7 +550,7 @@ static int bts_event_init(struct perf_event *event)
* users to profile the kernel.
*/
if (event->attr.exclude_kernel && perf_paranoid_kernel() &&
- !capable(CAP_SYS_ADMIN))
+ !capable_tracing())
return -EACCES;
if (x86_add_exclusive(x86_lbr_exclusive_bts))
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e4c2cb65ea50..a7f8c18bd82b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3307,7 +3307,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (x86_pmu.version < 3)
return -EINVAL;
- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_cpu() && !capable_tracing())
return -EACCES;
event->hw.config |= ARCH_PERFMON_EVENTSEL_ANY;
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index dee579efb2b2..f379a358c9cb 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -776,7 +776,7 @@ static int p4_validate_raw_event(struct perf_event *event)
* the user needs special permissions to be able to use it
*/
if (p4_ht_active() && p4_event_bind_map[v].shared) {
- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_cpu() && !capable_tracing())
return -EACCES;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0463c1151bae..eaba102e5d91 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4134,7 +4134,7 @@ find_get_context(struct pmu *pmu, struct task_struct *task,
if (!task) {
/* Must be root to operate on a CPU event: */
- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_cpu() && !capable_tracing())
return ERR_PTR(-EACCES);
cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
@@ -8741,7 +8741,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
if (event->attr.type != perf_kprobe.type)
return -ENOENT;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_tracing())
return -EACCES;
/*
@@ -8801,7 +8801,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
if (event->attr.type != perf_uprobe.type)
return -ENOENT;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_tracing())
return -EACCES;
/*
@@ -10588,7 +10588,7 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
}
/* privileged levels capture (kernel, hv): check permissions */
if ((mask & PERF_SAMPLE_BRANCH_PERM_PLM)
- && perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+ && perf_paranoid_kernel() && !capable_tracing())
return -EACCES;
}
@@ -10807,12 +10807,12 @@ SYSCALL_DEFINE5(perf_event_open,
return err;
if (!attr.exclude_kernel) {
- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_kernel() && !capable_tracing())
return -EACCES;
}
if (attr.namespaces) {
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_tracing())
return -EACCES;
}
@@ -10826,7 +10826,7 @@ SYSCALL_DEFINE5(perf_event_open,
/* Only privileged users can get physical addresses */
if ((attr.sample_type & PERF_SAMPLE_PHYS_ADDR) &&
- perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+ perf_paranoid_kernel() && !capable_tracing())
return -EACCES;
/*
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index c5cd852fe86b..8bc4d7d8c913 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -404,7 +404,7 @@ static int hw_breakpoint_parse(struct perf_event *bp,
* Don't let unprivileged users set a breakpoint in the trap
* path to avoid trap recursion attacks.
*/
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_tracing())
return -EPERM;
}
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0892e38ed6fb..6861307f14d6 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -46,7 +46,7 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
/* The ftrace function trace is allowed only for root. */
if (ftrace_event_is_function(tp_event)) {
- if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_tracepoint_raw() && !capable_tracing())
return -EPERM;
if (!is_sampling_event(p_event))
@@ -82,7 +82,7 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
* ...otherwise raw tracepoint data can be a severe data leak,
* only allow root to have these.
*/
- if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_tracepoint_raw() && !capable_tracing())
return -EPERM;
return 0;
--
2.20.0
^ permalink raw reply related
* [PATCH v4 bpf-next 2/4] bpf: implement CAP_BPF
From: Alexei Starovoitov @ 2019-09-06 23:10 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, luto, netdev, bpf, kernel-team, linux-api
In-Reply-To: <20190906231053.1276792-1-ast@kernel.org>
Implement permissions as stated in uapi/linux/capability.h
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/arraymap.c | 2 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/core.c | 4 ++--
kernel/bpf/hashtab.c | 4 ++--
kernel/bpf/lpm_trie.c | 2 +-
kernel/bpf/queue_stack_maps.c | 2 +-
kernel/bpf/reuseport_array.c | 2 +-
kernel/bpf/stackmap.c | 2 +-
kernel/bpf/syscall.c | 32 +++++++++++++++++++-------------
kernel/bpf/verifier.c | 2 +-
kernel/trace/bpf_trace.c | 2 +-
net/core/bpf_sk_storage.c | 2 +-
net/core/filter.c | 10 ++++++----
13 files changed, 38 insertions(+), 30 deletions(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 1c65ce0098a9..149f868a02dc 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -73,7 +73,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
int ret, numa_node = bpf_map_attr_numa_node(attr);
u32 elem_size, index_mask, max_entries;
- bool unpriv = !capable(CAP_SYS_ADMIN);
+ bool unpriv = !capable_bpf();
u64 cost, array_size, mask64;
struct bpf_map_memory mem;
struct bpf_array *array;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 6a6a154cfa7b..9c659ba5c146 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -795,7 +795,7 @@ cgroup_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
case BPF_FUNC_trace_printk:
- if (capable(CAP_SYS_ADMIN))
+ if (capable_bpf_tracing())
return bpf_get_trace_printk_proto();
/* fall through */
default:
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 66088a9e9b9e..6643099bc64b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -646,7 +646,7 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
void bpf_prog_kallsyms_add(struct bpf_prog *fp)
{
if (!bpf_prog_kallsyms_candidate(fp) ||
- !capable(CAP_SYS_ADMIN))
+ !capable_bpf())
return;
spin_lock_bh(&bpf_lock);
@@ -768,7 +768,7 @@ static int bpf_jit_charge_modmem(u32 pages)
{
if (atomic_long_add_return(pages, &bpf_jit_current) >
(bpf_jit_limit >> PAGE_SHIFT)) {
- if (!capable(CAP_SYS_ADMIN)) {
+ if (!capable_bpf()) {
atomic_long_sub(pages, &bpf_jit_current);
return -EPERM;
}
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 22066a62c8c9..0fae5c45f425 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -244,9 +244,9 @@ static int htab_map_alloc_check(union bpf_attr *attr)
BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) !=
offsetof(struct htab_elem, hash_node.pprev));
- if (lru && !capable(CAP_SYS_ADMIN))
+ if (lru && !capable_bpf())
/* LRU implementation is much complicated than other
- * maps. Hence, limit to CAP_SYS_ADMIN for now.
+ * maps. Hence, limit to CAP_BPF.
*/
return -EPERM;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 56e6c75d354d..11da3be8a4e5 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -543,7 +543,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
u64 cost = sizeof(*trie), cost_per_node;
int ret;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return ERR_PTR(-EPERM);
/* check sanity of attributes */
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index f697647ceb54..d83afac32863 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -45,7 +45,7 @@ static bool queue_stack_map_is_full(struct bpf_queue_stack *qs)
/* Called from syscall */
static int queue_stack_map_alloc_check(union bpf_attr *attr)
{
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return -EPERM;
/* check sanity of attributes */
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 50c083ba978c..b268fe4b2972 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -154,7 +154,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
struct bpf_map_memory mem;
u64 array_size;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return ERR_PTR(-EPERM);
array_size = sizeof(*array);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 052580c33d26..477063c63b27 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -90,7 +90,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
u64 cost, n_buckets;
int err;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_tracing())
return ERR_PTR(-EPERM);
if (attr->map_flags & ~STACK_CREATE_FLAG_MASK)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 82eabd4e38ad..cd2d1b21f0f5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1176,7 +1176,7 @@ static int map_freeze(const union bpf_attr *attr)
err = -EBUSY;
goto err_put;
}
- if (!capable(CAP_SYS_ADMIN)) {
+ if (!capable_bpf()) {
err = -EPERM;
goto err_put;
}
@@ -1635,7 +1635,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
(attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
- !capable(CAP_SYS_ADMIN))
+ !capable_bpf())
return -EPERM;
/* copy eBPF program license from user space */
@@ -1648,11 +1648,11 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
is_gpl = license_is_gpl_compatible(license);
if (attr->insn_cnt == 0 ||
- attr->insn_cnt > (capable(CAP_SYS_ADMIN) ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
+ attr->insn_cnt > (capable_bpf() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
return -E2BIG;
if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
type != BPF_PROG_TYPE_CGROUP_SKB &&
- !capable(CAP_SYS_ADMIN))
+ !capable_bpf())
return -EPERM;
bpf_prog_load_fixup_attach_type(attr);
@@ -1809,6 +1809,9 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
char tp_name[128];
int tp_fd, err;
+ if (!capable_bpf_tracing())
+ return -EPERM;
+
if (strncpy_from_user(tp_name, u64_to_user_ptr(attr->raw_tracepoint.name),
sizeof(tp_name) - 1) < 0)
return -EFAULT;
@@ -2087,7 +2090,10 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
struct bpf_prog *prog;
int ret = -ENOTSUPP;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf_net_admin())
+ /* test_run callback is available for networking progs only.
+ * Add capable_bpf_tracing() above when tracing progs become runable.
+ */
return -EPERM;
if (CHECK_ATTR(BPF_PROG_TEST_RUN))
return -EINVAL;
@@ -2124,7 +2130,7 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX)
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return -EPERM;
next_id++;
@@ -2150,7 +2156,7 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
if (CHECK_ATTR(BPF_PROG_GET_FD_BY_ID))
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return -EPERM;
spin_lock_bh(&prog_idr_lock);
@@ -2184,7 +2190,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
attr->open_flags & ~BPF_OBJ_FLAG_MASK)
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return -EPERM;
f_flags = bpf_get_file_flag(attr->open_flags);
@@ -2359,7 +2365,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
info.run_time_ns = stats.nsecs;
info.run_cnt = stats.cnt;
- if (!capable(CAP_SYS_ADMIN)) {
+ if (!capable_bpf()) {
info.jited_prog_len = 0;
info.xlated_prog_len = 0;
info.nr_jited_ksyms = 0;
@@ -2677,7 +2683,7 @@ static int bpf_btf_load(const union bpf_attr *attr)
if (CHECK_ATTR(BPF_BTF_LOAD))
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return -EPERM;
return btf_new_fd(attr);
@@ -2690,7 +2696,7 @@ static int bpf_btf_get_fd_by_id(const union bpf_attr *attr)
if (CHECK_ATTR(BPF_BTF_GET_FD_BY_ID))
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return -EPERM;
return btf_get_fd_by_id(attr->btf_id);
@@ -2759,7 +2765,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
if (CHECK_ATTR(BPF_TASK_FD_QUERY))
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf_tracing())
return -EPERM;
if (attr->task_fd_query.flags != 0)
@@ -2827,7 +2833,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
union bpf_attr attr = {};
int err;
- if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
+ if (sysctl_unprivileged_bpf_disabled && !capable_bpf())
return -EPERM;
err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3fb50757e812..7e519711c689 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9234,7 +9234,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
env->insn_aux_data[i].orig_idx = i;
env->prog = *prog;
env->ops = bpf_verifier_ops[env->prog->type];
- is_priv = capable(CAP_SYS_ADMIN);
+ is_priv = capable_bpf();
/* grab the mutex to protect few globals used by verifier */
if (!is_priv)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..cdf8d6c8a430 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1246,7 +1246,7 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
u32 *ids, prog_cnt, ids_len;
int ret;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf_tracing())
return -EPERM;
if (event->attr.type != PERF_TYPE_TRACEPOINT)
return -EINVAL;
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index da5639a5bd3b..aa74be21f5b6 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -616,7 +616,7 @@ static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
!attr->btf_key_type_id || !attr->btf_value_type_id)
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return -EPERM;
if (attr->value_size >= KMALLOC_MAX_SIZE -
diff --git a/net/core/filter.c b/net/core/filter.c
index ed6563622ce3..b233ed8438f1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5990,7 +5990,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
break;
}
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return NULL;
switch (func_id) {
@@ -5999,7 +5999,9 @@ bpf_base_func_proto(enum bpf_func_id func_id)
case BPF_FUNC_spin_unlock:
return &bpf_spin_unlock_proto;
case BPF_FUNC_trace_printk:
- return bpf_get_trace_printk_proto();
+ if (capable_bpf_tracing())
+ return bpf_get_trace_printk_proto();
+ /* fall through */
default:
return NULL;
}
@@ -6563,7 +6565,7 @@ static bool cg_skb_is_valid_access(int off, int size,
return false;
case bpf_ctx_range(struct __sk_buff, data):
case bpf_ctx_range(struct __sk_buff, data_end):
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return false;
break;
}
@@ -6575,7 +6577,7 @@ static bool cg_skb_is_valid_access(int off, int size,
case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
break;
case bpf_ctx_range(struct __sk_buff, tstamp):
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable_bpf())
return false;
break;
default:
--
2.20.0
^ permalink raw reply related
* [PATCH v4 bpf-next 1/4] capability: introduce CAP_BPF and CAP_TRACING
From: Alexei Starovoitov @ 2019-09-06 23:10 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, luto, netdev, bpf, kernel-team, linux-api
In-Reply-To: <20190906231053.1276792-1-ast@kernel.org>
Split BPF and perf/tracing operations that are allowed under
CAP_SYS_ADMIN into corresponding CAP_BPF and CAP_TRACING.
For backward compatibility include them in CAP_SYS_ADMIN as well.
The end result provides simple safety model for applications that use BPF:
- for tracing program types
BPF_PROG_TYPE_{KPROBE, TRACEPOINT, PERF_EVENT, RAW_TRACEPOINT, etc}
use CAP_BPF and CAP_TRACING
- for networking program types
BPF_PROG_TYPE_{SCHED_CLS, XDP, CGROUP_SKB, SK_SKB, etc}
use CAP_BPF and CAP_NET_ADMIN
There are few exceptions from this simple rule:
- bpf_trace_printk() is allowed in networking programs, but it's using
ftrace mechanism, hence this helper needs additional CAP_TRACING.
- cpumap is used by XDP programs. Currently it's kept under CAP_SYS_ADMIN,
but could be relaxed to CAP_NET_ADMIN in the future.
- BPF_F_ZERO_SEED flag for hash/lru map is allowed under CAP_SYS_ADMIN only
to discourage production use.
- BPF HW offload is allowed under CAP_SYS_ADMIN.
- cg_sysctl, cg_device, lirc program types are neither networking nor tracing.
They can be loaded under CAP_BPF, but attach is allowed under CAP_NET_ADMIN.
This will be cleaned up in the future.
userid=nobody + (CAP_TRACING | CAP_NET_ADMIN) + CAP_BPF is safer than
typical setup with userid=root and sudo by existing bpf applications.
It's not secure, since these capabilities:
- allow bpf progs access arbitrary memory
- let tasks access any bpf map
- let tasks attach/detach any bpf prog
bpftool, bpftrace, bcc tools binaries should not be installed with
cap_bpf+cap_tracing, since unpriv users will be able to read kernel secrets.
CAP_BPF, CAP_NET_ADMIN, CAP_TRACING are roughly equal in terms of
damage they can make to the system.
Example:
CAP_NET_ADMIN can stop network traffic. CAP_BPF can write into map
and if that map is used by firewall-like bpf prog the network traffic
may stop.
CAP_BPF allows many bpf prog_load commands in parallel. The verifier
may consume large amount of memory and significantly slow down the system.
CAP_TRACING allows many kprobes that can slow down the system.
In the future more fine-grained bpf permissions may be added.
Existing unprivileged BPF operations are not affected.
In particular unprivileged users are allowed to load socket_filter and cg_skb
program types and to create array, hash, prog_array, map-in-map map types.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/capability.h | 18 +++++++++++
include/uapi/linux/capability.h | 49 ++++++++++++++++++++++++++++-
security/selinux/include/classmap.h | 4 +--
3 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..13eb49c75797 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -247,6 +247,24 @@ static inline bool ns_capable_setid(struct user_namespace *ns, int cap)
return true;
}
#endif /* CONFIG_MULTIUSER */
+
+static inline bool capable_bpf(void)
+{
+ return capable(CAP_SYS_ADMIN) || capable(CAP_BPF);
+}
+static inline bool capable_tracing(void)
+{
+ return capable(CAP_SYS_ADMIN) || capable(CAP_TRACING);
+}
+static inline bool capable_bpf_tracing(void)
+{
+ return capable(CAP_SYS_ADMIN) || (capable(CAP_BPF) && capable(CAP_TRACING));
+}
+static inline bool capable_bpf_net_admin(void)
+{
+ return (capable(CAP_SYS_ADMIN) || capable(CAP_BPF)) && capable(CAP_NET_ADMIN);
+}
+
extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *inode);
extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..fe01d8235e1e 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -274,6 +274,7 @@ struct vfs_ns_cap_data {
arbitrary SCSI commands */
/* Allow setting encryption key on loopback filesystem */
/* Allow setting zone reclaim policy */
+/* Allow everything under CAP_BPF and CAP_TRACING for backward compatibility */
#define CAP_SYS_ADMIN 21
@@ -366,8 +367,54 @@ struct vfs_ns_cap_data {
#define CAP_AUDIT_READ 37
+/*
+ * CAP_BPF allows the following BPF operations:
+ * - Loading all types of BPF programs
+ * - Creating all types of BPF maps except:
+ * - stackmap that needs CAP_TRACING
+ * - devmap that needs CAP_NET_ADMIN
+ * - cpumap that needs CAP_SYS_ADMIN
+ * - Advanced verifier features
+ * - Indirect variable access
+ * - Bounded loops
+ * - BPF to BPF function calls
+ * - Scalar precision tracking
+ * - Larger complexity limits
+ * - Dead code elimination
+ * - And potentially other features
+ * - Use of pointer-to-integer conversions in BPF programs
+ * - Bypassing of speculation attack hardening measures
+ * - Loading BPF Type Format (BTF) data
+ * - Iterate system wide loaded programs, maps, BTF objects
+ * - Retrieve xlated and JITed code of BPF programs
+ * - Access maps and programs via id
+ * - Use bpf_spin_lock() helper
+ *
+ * CAP_BPF and CAP_TRACING together allow the following:
+ * - bpf_probe_read to read arbitrary kernel memory
+ * - bpf_trace_printk to print data to ftrace ring buffer
+ * - Attach to raw_tracepoint
+ * - Query association between kprobe/tracepoint and bpf program
+ *
+ * CAP_BPF and CAP_NET_ADMIN together allow the following:
+ * - Attach to cgroup-bpf hooks and query
+ * - skb, xdp, flow_dissector test_run command
+ *
+ * CAP_NET_ADMIN allows:
+ * - Attach networking bpf programs to xdp, tc, lwt, flow dissector
+ */
+#define CAP_BPF 38
+
+/*
+ * CAP_TRACING allows:
+ * - Full use of perf_event_open(), similarly to the effect of
+ * kernel.perf_event_paranoid == -1
+ * - Creation of [ku][ret]probe
+ * - Attach tracing bpf programs to perf events
+ */
+#define CAP_TRACING 39
-#define CAP_LAST_CAP CAP_AUDIT_READ
+#define CAP_LAST_CAP CAP_TRACING
#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 201f7e588a29..0b364e245163 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -26,9 +26,9 @@
"audit_control", "setfcap"
#define COMMON_CAP2_PERMS "mac_override", "mac_admin", "syslog", \
- "wake_alarm", "block_suspend", "audit_read"
+ "wake_alarm", "block_suspend", "audit_read", "bpf", "tracing"
-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_TRACING
#error New capability defined, please update COMMON_CAP2_PERMS.
#endif
--
2.20.0
^ permalink raw reply related
* [PATCH v4 bpf-next 0/4] CAP_BPF and CAP_TRACING
From: Alexei Starovoitov @ 2019-09-06 23:10 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, luto, netdev, bpf, kernel-team, linux-api
v3->v4:
- rebase and typo fixes
- split selftests into separate patch
- update perf* docs with CAP_TRACING
- add a note to commit log that existing unpriv bpf behavior is not changing
v2->v3:
- dropped ftrace and kallsyms from CAP_TRACING description.
In the future these mechanisms can start using it too.
- added CAP_SYS_ADMIN backward compatibility.
Alexei Starovoitov (4):
capability: introduce CAP_BPF and CAP_TRACING
bpf: implement CAP_BPF
perf: implement CAP_TRACING
selftests/bpf: use CAP_BPF and CAP_TRACING in tests
Documentation/admin-guide/perf-security.rst | 4 +-
Documentation/admin-guide/sysctl/kernel.rst | 10 ++---
arch/powerpc/perf/core-book3s.c | 4 +-
arch/x86/events/intel/bts.c | 2 +-
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/p4.c | 2 +-
include/linux/capability.h | 18 ++++++++
include/uapi/linux/capability.h | 49 ++++++++++++++++++++-
kernel/bpf/arraymap.c | 2 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/core.c | 4 +-
kernel/bpf/hashtab.c | 4 +-
kernel/bpf/lpm_trie.c | 2 +-
kernel/bpf/queue_stack_maps.c | 2 +-
kernel/bpf/reuseport_array.c | 2 +-
kernel/bpf/stackmap.c | 2 +-
kernel/bpf/syscall.c | 32 ++++++++------
kernel/bpf/verifier.c | 2 +-
kernel/events/core.c | 14 +++---
kernel/events/hw_breakpoint.c | 2 +-
kernel/trace/bpf_trace.c | 2 +-
kernel/trace/trace_event_perf.c | 4 +-
net/core/bpf_sk_storage.c | 2 +-
net/core/filter.c | 10 +++--
security/selinux/include/classmap.h | 4 +-
tools/testing/selftests/bpf/test_verifier.c | 46 +++++++++++++++----
26 files changed, 165 insertions(+), 64 deletions(-)
--
2.20.0
^ permalink raw reply
* Re: [PATCH v2 0/5] Add support for O_MAYEXEC
From: Aleksa Sarai @ 2019-09-06 22:44 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Steve Grubb, Florian Weimer, Mickaël Salaün,
linux-kernel, Alexei Starovoitov, Al Viro, Andy Lutomirski,
Christian Heimes, Daniel Borkmann, Eric Chiang, James Morris,
Jan Kara, Jann Horn, Jonathan Corbet, Kees Cook, Matthew Garrett,
Matthew Wilcox, Michael Kerrisk, Mickaël Salaün,
Mimi Zohar
In-Reply-To: <C95B704C-F84F-4341-BDE7-CD70C5DDBEEF@amacapital.net>
[-- Attachment #1: Type: text/plain, Size: 2905 bytes --]
On 2019-09-06, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Sep 6, 2019, at 12:07 PM, Steve Grubb <sgrubb@redhat.com> wrote:
> >
> >> On Friday, September 6, 2019 2:57:00 PM EDT Florian Weimer wrote:
> >> * Steve Grubb:
> >>> Now with LD_AUDIT
> >>> $ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test
> >>> 2>&1 | grep passwd openat(3, "passwd", O_RDONLY) = 4
> >>>
> >>> No O_CLOEXEC flag.
> >>
> >> I think you need to explain in detail why you consider this a problem.
> >
> > Because you can strip the O_MAYEXEC flag from being passed into the kernel.
> > Once you do that, you defeat the security mechanism because it never gets
> > invoked. The issue is that the only thing that knows _why_ something is being
> > opened is user space. With this mechanism, you can attempt to pass this
> > reason to the kernel so that it may see if policy permits this. But you can
> > just remove the flag.
>
> I’m with Florian here. Once you are executing code in a process, you
> could just emulate some other unapproved code. This series is not
> intended to provide the kind of absolute protection you’re imagining.
I also agree, though I think that there is a separate argument to be
made that there are two possible problems with O_MAYEXEC (which might
not be really big concerns):
* It's very footgun-prone if you didn't call O_MAYEXEC yourself and
you pass the descriptor elsewhere. You need to check f_flags to see
if it contains O_MAYEXEC. Maybe there is an argument to be made that
passing O_MAYEXECs around isn't a valid use-case, but in that case
there should be some warnings about that.
* There's effectively a TOCTOU flaw (even if you are sure O_MAYEXEC is
in f_flags) -- if the filesystem becomes re-mounted noexec (or the
file has a-x permissions) after you've done the check you won't get
hit with an error when you go to use the file descriptor later.
To fix both you'd need to do what you mention later:
> What the kernel *could* do is prevent mmapping a non-FMODE_EXEC file
> with PROT_EXEC, which would indeed have a real effect (in an iOS-like
> world, for example) but would break many, many things.
And I think this would be useful (with the two possible ways of
executing .text split into FMODE_EXEC and FMODE_MAP_EXEC, as mentioned
in a sister subthread), but would have to be opt-in for the obvious
reason you outlined. However, we could make it the default for
openat2(2) -- assuming we can agree on what the semantics of a
theoretical FMODE_EXEC should be.
And of course we'd need to do FMODE_UPGRADE_EXEC (which would need to
also permit fexecve(2) though probably not PROT_EXEC -- I don't think
you can mmap() an O_PATH descriptor).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH v2 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
From: Aleksa Sarai @ 2019-09-06 22:18 UTC (permalink / raw)
To: Jeff Layton
Cc: Mickaël Salaün, Florian Weimer,
Mickaël Salaün, linux-kernel, Alexei Starovoitov,
Al Viro, Andy Lutomirski, Christian Heimes, Daniel Borkmann,
Eric Chiang, James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
Kees Cook, Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
Mimi Zohar
In-Reply-To: <20190906220546.gkqxzm4y3ynevvtz@yavin.dot.cyphar.com>
[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]
On 2019-09-07, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
> > On Sat, 2019-09-07 at 03:13 +1000, Aleksa Sarai wrote:
> > > On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
> > > > On Fri, 2019-09-06 at 18:06 +0200, Mickaël Salaün wrote:
> > > > > On 06/09/2019 17:56, Florian Weimer wrote:
> > > > > > Let's assume I want to add support for this to the glibc dynamic loader,
> > > > > > while still being able to run on older kernels.
> > > > > >
> > > > > > Is it safe to try the open call first, with O_MAYEXEC, and if that fails
> > > > > > with EINVAL, try again without O_MAYEXEC?
> > > > >
> > > > > The kernel ignore unknown open(2) flags, so yes, it is safe even for
> > > > > older kernel to use O_MAYEXEC.
> > > > >
> > > >
> > > > Well...maybe. What about existing programs that are sending down bogus
> > > > open flags? Once you turn this on, they may break...or provide a way to
> > > > circumvent the protections this gives.
> > >
> > > It should be noted that this has been a valid concern for every new O_*
> > > flag introduced (and yet we still introduced new flags, despite the
> > > concern) -- though to be fair, O_TMPFILE actually does have a
> > > work-around with the O_DIRECTORY mask setup.
> > >
> > > The openat2() set adds O_EMPTYPATH -- though in fairness it's also
> > > backwards compatible because empty path strings have always given ENOENT
> > > (or EINVAL?) while O_EMPTYPATH is a no-op non-empty strings.
> > >
> > > > Maybe this should be a new flag that is only usable in the new openat2()
> > > > syscall that's still under discussion? That syscall will enforce that
> > > > all flags are recognized. You presumably wouldn't need the sysctl if you
> > > > went that route too.
> > >
> > > I'm also interested in whether we could add an UPGRADE_NOEXEC flag to
> > > how->upgrade_mask for the openat2(2) patchset (I reserved a flag bit for
> > > it, since I'd heard about this work through the grape-vine).
> > >
> >
> > I rather like the idea of having openat2 fds be non-executable by
> > default, and having userland request it specifically via O_MAYEXEC (or
> > some similar openat2 flag) if it's needed. Then you could add an
> > UPGRADE_EXEC flag instead?
> >
> > That seems like something reasonable to do with a brand new API, and
> > might be very helpful for preventing certain classes of attacks.
>
> In that case, maybe openat2(2) should default to not allowing any
> upgrades by default? The reason I pitched UPGRADE_NOEXEC is because
> UPGRADE_NO{READ,WRITE} are the existing @how->upgrade_mask flags.
Sorry, another issue is that there isn't a current way to really
restrict fexecve() permissions (from my [limited] understanding,
__FMODE_EXEC isn't the right thing to use) -- so we can't blanket block
exec through openat2() O_PATH descriptors and add UPGRADE_EXEC later.
We would have to implement FMODE_EXEC (and FMODE_MAP_EXEC as you
suggested) in order to implement FMODE_UPGRADE_EXEC before we could even
get a first version of openat2(2) in. Though, I do (a little
begrudgingly) agree that we should have a safe default if possible
(magical O_PATH reopening trickery is something that most people don't
know about and probably wouldn't want to happen if they did).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH v2 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
From: Aleksa Sarai @ 2019-09-06 22:12 UTC (permalink / raw)
To: Jeff Layton
Cc: Andy Lutomirski, Mickaël Salaün, Florian Weimer,
Mickaël Salaün, linux-kernel, Alexei Starovoitov,
Al Viro, Andy Lutomirski, Christian Heimes, Daniel Borkmann,
Eric Chiang, James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
Kees Cook, Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
Mimi Zohar
In-Reply-To: <8dc59d585a133e96f9adaf0a148334e7f19058b9.camel@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 5208 bytes --]
On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
> On Fri, 2019-09-06 at 13:06 -0700, Andy Lutomirski wrote:
> > > On Sep 6, 2019, at 12:43 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > > On Sat, 2019-09-07 at 03:13 +1000, Aleksa Sarai wrote:
> > > > > On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > On Fri, 2019-09-06 at 18:06 +0200, Mickaël Salaün wrote:
> > > > > > > On 06/09/2019 17:56, Florian Weimer wrote:
> > > > > > > Let's assume I want to add support for this to the glibc dynamic loader,
> > > > > > > while still being able to run on older kernels.
> > > > > > >
> > > > > > > Is it safe to try the open call first, with O_MAYEXEC, and if that fails
> > > > > > > with EINVAL, try again without O_MAYEXEC?
> > > > > >
> > > > > > The kernel ignore unknown open(2) flags, so yes, it is safe even for
> > > > > > older kernel to use O_MAYEXEC.
> > > > > >
> > > > >
> > > > > Well...maybe. What about existing programs that are sending down bogus
> > > > > open flags? Once you turn this on, they may break...or provide a way to
> > > > > circumvent the protections this gives.
> > > >
> > > > It should be noted that this has been a valid concern for every new O_*
> > > > flag introduced (and yet we still introduced new flags, despite the
> > > > concern) -- though to be fair, O_TMPFILE actually does have a
> > > > work-around with the O_DIRECTORY mask setup.
> > > >
> > > > The openat2() set adds O_EMPTYPATH -- though in fairness it's also
> > > > backwards compatible because empty path strings have always given ENOENT
> > > > (or EINVAL?) while O_EMPTYPATH is a no-op non-empty strings.
> > > >
> > > > > Maybe this should be a new flag that is only usable in the new openat2()
> > > > > syscall that's still under discussion? That syscall will enforce that
> > > > > all flags are recognized. You presumably wouldn't need the sysctl if you
> > > > > went that route too.
> > > >
> > > > I'm also interested in whether we could add an UPGRADE_NOEXEC flag to
> > > > how->upgrade_mask for the openat2(2) patchset (I reserved a flag bit for
> > > > it, since I'd heard about this work through the grape-vine).
> > > >
> > >
> > > I rather like the idea of having openat2 fds be non-executable by
> > > default, and having userland request it specifically via O_MAYEXEC (or
> > > some similar openat2 flag) if it's needed. Then you could add an
> > > UPGRADE_EXEC flag instead?
> > >
> > > That seems like something reasonable to do with a brand new API, and
> > > might be very helpful for preventing certain classes of attacks.
> > >
> > >
> >
> > There are at least four concepts of executability here:
> >
> > - Just check the file mode and any other relevant permissions. Return a normal fd. Makes sense for script interpreters, perhaps.
> >
> > - Make the fd fexecve-able.
> >
> > - Make the resulting fd mappable PROT_EXEC.
> >
> > - Make the resulting fd upgradable.
> >
> > I’m not at all convinced that the kernel needs to distinguish all these, but at least upgradability should be its own thing IMO.
>
> Good point. Upgradability is definitely orthogonal, though the idea
> there is to alter the default behavior. If the default is NOEXEC then
> UPGRADE_EXEC would make sense.
>
> In any case, I was mostly thinking about the middle two in your list
> above. After more careful reading of the patches, I now get get that
> Mickaël is more interested in the first, and that's really a different
> sort of use-case.
>
> Most opens never result in the fd being fed to fexecve or mmapped with
> PROT_EXEC, so having userland explicitly opt-in to allowing that during
> the open sounds like a reasonable thing to do.
>
> But I get that preventing execution via script interpreters of files
> that are not executable might be something nice to have.
My first glance at the patch lead me to believe that this was about
blocking at fexecve()-time (which was what my first attempt at this
problem looked like) -- hence why I mentioned the upgrade_mask stuff
(because of the dances you can do with O_PATH, if blocking at
fexecve()-time was the goal then you seriously do need the upgrade_mask
and "O_PATH mask" in order for it to be even slightly secure).
But I also agree this is useful, and we can always add FMODE_EXEC,
FMODE_MAP_EXEC, and FMODE_UPGRADE_EXEC (and the related bits) at a later
date.
> Perhaps we need two flags for openat2?
>
> OA2_MAYEXEC : test that permissions allow execution and that the file
> doesn't reside on a noexec mount before allowing the open
>
> OA2_EXECABLE : only allow fexecve or mmapping with PROT_EXEC if the fd
> was opened with this
That seems reasonable to me. The only thing is that there currently
isn't any code to restrict fexecve() or PROT_EXEC in that fashion
(doubly so when you consider binfmt_script). So if we want to make
certain things default behaviour (such as disallowing exec by default)
we'd need to get the PROT_EXEC restriction work done first.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH v2 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
From: Aleksa Sarai @ 2019-09-06 22:05 UTC (permalink / raw)
To: Jeff Layton
Cc: Mickaël Salaün, Florian Weimer,
Mickaël Salaün, linux-kernel, Alexei Starovoitov,
Al Viro, Andy Lutomirski, Christian Heimes, Daniel Borkmann,
Eric Chiang, James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
Kees Cook, Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
Mimi Zohar
In-Reply-To: <e1ac9428e6b768ac3145aafbe19b24dd6cf410b9.camel@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 2874 bytes --]
On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
> On Sat, 2019-09-07 at 03:13 +1000, Aleksa Sarai wrote:
> > On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
> > > On Fri, 2019-09-06 at 18:06 +0200, Mickaël Salaün wrote:
> > > > On 06/09/2019 17:56, Florian Weimer wrote:
> > > > > Let's assume I want to add support for this to the glibc dynamic loader,
> > > > > while still being able to run on older kernels.
> > > > >
> > > > > Is it safe to try the open call first, with O_MAYEXEC, and if that fails
> > > > > with EINVAL, try again without O_MAYEXEC?
> > > >
> > > > The kernel ignore unknown open(2) flags, so yes, it is safe even for
> > > > older kernel to use O_MAYEXEC.
> > > >
> > >
> > > Well...maybe. What about existing programs that are sending down bogus
> > > open flags? Once you turn this on, they may break...or provide a way to
> > > circumvent the protections this gives.
> >
> > It should be noted that this has been a valid concern for every new O_*
> > flag introduced (and yet we still introduced new flags, despite the
> > concern) -- though to be fair, O_TMPFILE actually does have a
> > work-around with the O_DIRECTORY mask setup.
> >
> > The openat2() set adds O_EMPTYPATH -- though in fairness it's also
> > backwards compatible because empty path strings have always given ENOENT
> > (or EINVAL?) while O_EMPTYPATH is a no-op non-empty strings.
> >
> > > Maybe this should be a new flag that is only usable in the new openat2()
> > > syscall that's still under discussion? That syscall will enforce that
> > > all flags are recognized. You presumably wouldn't need the sysctl if you
> > > went that route too.
> >
> > I'm also interested in whether we could add an UPGRADE_NOEXEC flag to
> > how->upgrade_mask for the openat2(2) patchset (I reserved a flag bit for
> > it, since I'd heard about this work through the grape-vine).
> >
>
> I rather like the idea of having openat2 fds be non-executable by
> default, and having userland request it specifically via O_MAYEXEC (or
> some similar openat2 flag) if it's needed. Then you could add an
> UPGRADE_EXEC flag instead?
>
> That seems like something reasonable to do with a brand new API, and
> might be very helpful for preventing certain classes of attacks.
In that case, maybe openat2(2) should default to not allowing any
upgrades by default? The reason I pitched UPGRADE_NOEXEC is because
UPGRADE_NO{READ,WRITE} are the existing @how->upgrade_mask flags.
However, I just noticed something else about this series -- if you do
O_PATH|O_MAYEXEC the new flag gets ignored. Given that you can do
fexecve(2) on an O_PATH (and O_PATHs have some other benefits), is this
something that we'd want to have?
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH v2 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
From: Andy Lutomirski @ 2019-09-06 21:27 UTC (permalink / raw)
To: Jeff Layton
Cc: Aleksa Sarai, Mickaël Salaün, Florian Weimer,
Mickaël Salaün, LKML, Alexei Starovoitov, Al Viro,
Andy Lutomirski, Christian Heimes, Daniel Borkmann, Eric Chiang,
James Morris, Jan Kara, Jann Horn, Jonathan Corbet, Kees Cook,
Matthew Garrett, Matthew Wilcox, Michael Kerrisk, Mimi Zohar,
Philippe Trébuchet, Scott Shell
In-Reply-To: <8dc59d585a133e96f9adaf0a148334e7f19058b9.camel@kernel.org>
> On Sep 6, 2019, at 1:51 PM, Jeff Layton <jlayton@kernel.org> wrote:
>
> On Fri, 2019-09-06 at 13:06 -0700, Andy Lutomirski wrote:
>
>> I’m not at all convinced that the kernel needs to distinguish all these, but at least upgradability should be its own thing IMO.
>
> Good point. Upgradability is definitely orthogonal, though the idea
> there is to alter the default behavior. If the default is NOEXEC then
> UPGRADE_EXEC would make sense.
>
> In any case, I was mostly thinking about the middle two in your list
> above. After more careful reading of the patches, I now get get that
> Mickaël is more interested in the first, and that's really a different
> sort of use-case.
>
> Most opens never result in the fd being fed to fexecve or mmapped with
> PROT_EXEC, so having userland explicitly opt-in to allowing that during
> the open sounds like a reasonable thing to do.
>
> But I get that preventing execution via script interpreters of files
> that are not executable might be something nice to have.
>
> Perhaps we need two flags for openat2?
>
> OA2_MAYEXEC : test that permissions allow execution and that the file
> doesn't reside on a noexec mount before allowing the open
>
> OA2_EXECABLE : only allow fexecve or mmapping with PROT_EXEC if the fd
> was opened with this
>
>
>
We could go one step farther and have three masks: check_perms,
fd_perms, and upgrade_perms. check_perms says “fail if I don’t have
these perms”. fd_perms is the permissions on the returned fd, and
upgrade_perms is the upgrade mask. (fd_perms & ~check_perms) != 0 is
an error. This makes it possible to say "I want to make sure the file
is writable, but I don't actually want to write to it", which could
plausibly be useful.
I would argue that these things should have new, sane bits, e.g.
FILE_READ, FILE_WRITE, and FILE_EXECUTE (or maybe FILE_MAP_EXEC and
FILE_EXECVE). And maybe there should be at least 16 bits for each
mask reserved. Windows has a lot more mode bits than Linux, and it's
not entirely nuts. We do *not* need any direct equivalent of O_RDWR
for openat2().
--Andy
^ permalink raw reply
* Re: Why add the general notification queue and its sources
From: David Howells @ 2019-09-06 21:19 UTC (permalink / raw)
To: Theodore Y. Ts'o
Cc: dhowells, Linus Torvalds, Steven Whitehouse, Ray Strode,
Greg Kroah-Hartman, Nicolas Dichtel, raven, keyrings, linux-usb,
linux-block, Christian Brauner, LSM List, linux-fsdevel,
Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi,
Robbie Harwood
In-Reply-To: <20190906174102.GB2819@mit.edu>
Theodore Y. Ts'o <tytso@mit.edu> wrote:
> Something else which we should consider up front is how to handle the
> case where you have multiple userspace processes that want to
> subscribe to the same notification.
I have that.
> This also implies that we'll need to have some kind of standard header
> at the beginning to specify the source of a particular notification
> message.
That too.
David
^ permalink raw reply
* Re: [PATCH v2 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
From: Jeff Layton @ 2019-09-06 20:51 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Aleksa Sarai, Mickaël Salaün, Florian Weimer,
Mickaël Salaün, linux-kernel, Alexei Starovoitov,
Al Viro, Andy Lutomirski, Christian Heimes, Daniel Borkmann,
Eric Chiang, James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
Kees Cook, Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
Mimi Zohar
In-Reply-To: <D2A57C7B-B0FD-424E-9F81-B858FFF21FF0@amacapital.net>
On Fri, 2019-09-06 at 13:06 -0700, Andy Lutomirski wrote:
> > On Sep 6, 2019, at 12:43 PM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > > On Sat, 2019-09-07 at 03:13 +1000, Aleksa Sarai wrote:
> > > > On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > On Fri, 2019-09-06 at 18:06 +0200, Mickaël Salaün wrote:
> > > > > > On 06/09/2019 17:56, Florian Weimer wrote:
> > > > > > Let's assume I want to add support for this to the glibc dynamic loader,
> > > > > > while still being able to run on older kernels.
> > > > > >
> > > > > > Is it safe to try the open call first, with O_MAYEXEC, and if that fails
> > > > > > with EINVAL, try again without O_MAYEXEC?
> > > > >
> > > > > The kernel ignore unknown open(2) flags, so yes, it is safe even for
> > > > > older kernel to use O_MAYEXEC.
> > > > >
> > > >
> > > > Well...maybe. What about existing programs that are sending down bogus
> > > > open flags? Once you turn this on, they may break...or provide a way to
> > > > circumvent the protections this gives.
> > >
> > > It should be noted that this has been a valid concern for every new O_*
> > > flag introduced (and yet we still introduced new flags, despite the
> > > concern) -- though to be fair, O_TMPFILE actually does have a
> > > work-around with the O_DIRECTORY mask setup.
> > >
> > > The openat2() set adds O_EMPTYPATH -- though in fairness it's also
> > > backwards compatible because empty path strings have always given ENOENT
> > > (or EINVAL?) while O_EMPTYPATH is a no-op non-empty strings.
> > >
> > > > Maybe this should be a new flag that is only usable in the new openat2()
> > > > syscall that's still under discussion? That syscall will enforce that
> > > > all flags are recognized. You presumably wouldn't need the sysctl if you
> > > > went that route too.
> > >
> > > I'm also interested in whether we could add an UPGRADE_NOEXEC flag to
> > > how->upgrade_mask for the openat2(2) patchset (I reserved a flag bit for
> > > it, since I'd heard about this work through the grape-vine).
> > >
> >
> > I rather like the idea of having openat2 fds be non-executable by
> > default, and having userland request it specifically via O_MAYEXEC (or
> > some similar openat2 flag) if it's needed. Then you could add an
> > UPGRADE_EXEC flag instead?
> >
> > That seems like something reasonable to do with a brand new API, and
> > might be very helpful for preventing certain classes of attacks.
> >
> >
>
> There are at least four concepts of executability here:
>
> - Just check the file mode and any other relevant permissions. Return a normal fd. Makes sense for script interpreters, perhaps.
>
> - Make the fd fexecve-able.
>
> - Make the resulting fd mappable PROT_EXEC.
>
> - Make the resulting fd upgradable.
>
> I’m not at all convinced that the kernel needs to distinguish all these, but at least upgradability should be its own thing IMO.
Good point. Upgradability is definitely orthogonal, though the idea
there is to alter the default behavior. If the default is NOEXEC then
UPGRADE_EXEC would make sense.
In any case, I was mostly thinking about the middle two in your list
above. After more careful reading of the patches, I now get get that
Mickaël is more interested in the first, and that's really a different
sort of use-case.
Most opens never result in the fd being fed to fexecve or mmapped with
PROT_EXEC, so having userland explicitly opt-in to allowing that during
the open sounds like a reasonable thing to do.
But I get that preventing execution via script interpreters of files
that are not executable might be something nice to have.
Perhaps we need two flags for openat2?
OA2_MAYEXEC : test that permissions allow execution and that the file
doesn't reside on a noexec mount before allowing the open
OA2_EXECABLE : only allow fexecve or mmapping with PROT_EXEC if the fd
was opened with this
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply
* Re: [PATCH v2 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
From: Andy Lutomirski @ 2019-09-06 20:06 UTC (permalink / raw)
To: Jeff Layton
Cc: Aleksa Sarai, Mickaël Salaün, Florian Weimer,
Mickaël Salaün, linux-kernel, Alexei Starovoitov,
Al Viro, Andy Lutomirski, Christian Heimes, Daniel Borkmann,
Eric Chiang, James Morris, Jan Kara, Jann Horn, Jonathan Corbet,
Kees Cook, Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
Mimi Zohar
In-Reply-To: <e1ac9428e6b768ac3145aafbe19b24dd6cf410b9.camel@kernel.org>
> On Sep 6, 2019, at 12:43 PM, Jeff Layton <jlayton@kernel.org> wrote:
>
>> On Sat, 2019-09-07 at 03:13 +1000, Aleksa Sarai wrote:
>>> On 2019-09-06, Jeff Layton <jlayton@kernel.org> wrote:
>>>> On Fri, 2019-09-06 at 18:06 +0200, Mickaël Salaün wrote:
>>>>> On 06/09/2019 17:56, Florian Weimer wrote:
>>>>> Let's assume I want to add support for this to the glibc dynamic loader,
>>>>> while still being able to run on older kernels.
>>>>>
>>>>> Is it safe to try the open call first, with O_MAYEXEC, and if that fails
>>>>> with EINVAL, try again without O_MAYEXEC?
>>>>
>>>> The kernel ignore unknown open(2) flags, so yes, it is safe even for
>>>> older kernel to use O_MAYEXEC.
>>>>
>>>
>>> Well...maybe. What about existing programs that are sending down bogus
>>> open flags? Once you turn this on, they may break...or provide a way to
>>> circumvent the protections this gives.
>>
>> It should be noted that this has been a valid concern for every new O_*
>> flag introduced (and yet we still introduced new flags, despite the
>> concern) -- though to be fair, O_TMPFILE actually does have a
>> work-around with the O_DIRECTORY mask setup.
>>
>> The openat2() set adds O_EMPTYPATH -- though in fairness it's also
>> backwards compatible because empty path strings have always given ENOENT
>> (or EINVAL?) while O_EMPTYPATH is a no-op non-empty strings.
>>
>>> Maybe this should be a new flag that is only usable in the new openat2()
>>> syscall that's still under discussion? That syscall will enforce that
>>> all flags are recognized. You presumably wouldn't need the sysctl if you
>>> went that route too.
>>
>> I'm also interested in whether we could add an UPGRADE_NOEXEC flag to
>> how->upgrade_mask for the openat2(2) patchset (I reserved a flag bit for
>> it, since I'd heard about this work through the grape-vine).
>>
>
> I rather like the idea of having openat2 fds be non-executable by
> default, and having userland request it specifically via O_MAYEXEC (or
> some similar openat2 flag) if it's needed. Then you could add an
> UPGRADE_EXEC flag instead?
>
> That seems like something reasonable to do with a brand new API, and
> might be very helpful for preventing certain classes of attacks.
>
>
There are at least four concepts of executability here:
- Just check the file mode and any other relevant permissions. Return a normal fd. Makes sense for script interpreters, perhaps.
- Make the fd fexecve-able.
- Make the resulting fd mappable PROT_EXEC.
- Make the resulting fd upgradable.
I’m not at all convinced that the kernel needs to distinguish all these, but at least upgradability should be its own thing IMO.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox