Date: Thu, 24 Sep 2020 16:56:52 -0700
From: Kees Cook
To: YiFei Zhu
Cc: containers@lists.linux-foundation.org, YiFei Zhu,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	Aleksa Sarai, Andrea Arcangeli, Andy Lutomirski,
	Dimitrios Skarlatos, Giuseppe Scrivano, Hubertus Franke,
	Jack Chen, Jann Horn, Josep Torrellas, Tianyin Xu,
	Tobin Feldman-Fitzthum, Tycho Andersen, Valentin Rothberg,
	Will Drewry
Subject: Re: [PATCH v2 seccomp 6/6] seccomp/cache: Report cache data through /proc/pid/seccomp_cache
Message-ID: <202009241647.2239747F0@keescook>
X-Mailing-List: bpf@vger.kernel.org

On Thu, Sep 24, 2020 at 07:44:21AM -0500, YiFei Zhu wrote:
> From: YiFei Zhu
>
> Currently the kernel does not provide an infrastructure to translate
> architecture numbers to a human-readable name. Translating syscall
> numbers to syscall names is possible through FTRACE_SYSCALL
> infrastructure but it does not provide support for compat syscalls.
>
> This will create a file for each PID as /proc/pid/seccomp_cache.
> The file will be empty when no seccomp filters are loaded, or be
> in the format of:
>
> where ALLOW means the cache is guaranteed to allow the syscall,
> and filter means the cache will pass the syscall to the BPF filter.
>
> For the docker default profile on x86_64 it looks like:
> c000003e 0 ALLOW
> c000003e 1 ALLOW
> c000003e 2 ALLOW
> c000003e 3 ALLOW
> [...]
> c000003e 132 ALLOW
> c000003e 133 ALLOW
> c000003e 134 FILTER
> c000003e 135 FILTER
> c000003e 136 FILTER
> c000003e 137 ALLOW
> c000003e 138 ALLOW
> c000003e 139 FILTER
> c000003e 140 ALLOW
> c000003e 141 ALLOW
> [...]
>
> This file is guarded by CONFIG_PROC_SECCOMP_CACHE with a default
> of N because I think certain users of seccomp might not want the
> application to know which syscalls are definitely usable.
>
> I'm not sure if adding all the "human readable names" is worthwhile,
> considering it can be easily done in userspace.

The question of permissions is my central concern here: who should see
this? Some contained processes have been intentionally blocked from
self-introspection so even the "standard" high bar of "ptrace attach
allowed?" can't always be sufficient.

My compromise about filter visibility in the past was saying that
CAP_SYS_ADMIN was required (see seccomp_get_filter()). I'm nervous to
weaken this. (There is some work that hasn't been sent upstream yet that
is looking to expose the filter _contents_ via /proc that has been
nervous too.)

Now full contents vs "allow"/"filter" are certainly different things, but
I don't feel like I've got enough evidence to show that this
introspection would help debugging enough to justify the partially
imagined safety of not exposing it to potential attackers.

I suspect it _is_ the right thing to do (just look at my own RFC's
"debug" patch), but I'd like this to be well justified in the commit
log. And yes, while it does hide behind a CONFIG, I'd still want it
justified, especially since distros have a tendency to just turn
everything on anyway. ;)

> +	for (arch = 0; arch < ARRAY_SIZE(syscall_arches); arch++) {
> +		for (nr = 0; nr < NR_syscalls; nr++) {
> +			bool cached = test_bit(nr, f->cache.syscall_ok[arch]);
> +			char *status = cached ? "ALLOW" : "FILTER";
> +
> +			seq_printf(m, "%08x %d %s\n", syscall_arches[arch],
> +				nr, status
> +			);
> +		}
> +	}

But behavior-wise, yeah, I like it; I'm fine with human-readable and
full AUDIT_ARCH values. (Though, as devil's advocate again, to repeat
Jann's own words back: do we want to add this only to have a new UAPI to
support going forward?)

-- 
Kees Cook
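For illustration only, here is a rough userspace sketch of how a consumer
might read one line of the "%08x %d %s" output quoted above and attach a
readable arch name to it. Nothing here is part of the patch: struct
cache_entry, arch_name() and parse_cache_line() are hypothetical names,
and the three arch constants are copies of the AUDIT_ARCH_X86_64,
AUDIT_ARCH_I386 and AUDIT_ARCH_AARCH64 values from <linux/audit.h>,
duplicated so the sketch builds without kernel headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies of AUDIT_ARCH_* values from <linux/audit.h> (assumption: only
 * these three are of interest to this sketch). */
#define ARCH_X86_64  0xc000003eU
#define ARCH_I386    0x40000003U
#define ARCH_AARCH64 0xc00000b7U

/* Hypothetical container for one parsed line of the proc file. */
struct cache_entry {
	unsigned int arch;	/* AUDIT_ARCH_* value, printed as %08x */
	int nr;			/* syscall number, printed as %d */
	int allow;		/* 1 for "ALLOW", 0 for "FILTER" */
};

/* Hypothetical helper: map an AUDIT_ARCH_* value to a short name. */
const char *arch_name(unsigned int arch)
{
	switch (arch) {
	case ARCH_X86_64:
		return "x86_64";
	case ARCH_I386:
		return "i386";
	case ARCH_AARCH64:
		return "aarch64";
	default:
		return "unknown";
	}
}

/*
 * Hypothetical helper: parse one "<arch> <nr> <status>" line.
 * Returns 0 on success, -1 if the line does not match the format.
 */
int parse_cache_line(const char *line, struct cache_entry *out)
{
	char status[8];

	assert(line != NULL && out != NULL);

	/* %7s leaves room for the terminating NUL in status[8]. */
	if (sscanf(line, "%8x %d %7s", &out->arch, &out->nr, status) != 3)
		return -1;

	if (strcmp(status, "ALLOW") == 0)
		out->allow = 1;
	else if (strcmp(status, "FILTER") == 0)
		out->allow = 0;
	else
		return -1;

	return 0;
}
```

A caller would feed each line read from the proc file to
parse_cache_line() and print arch_name(entry.arch) next to the syscall
number. Mapping the syscall number to a name would need a per-arch table,
which is left out here.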