Linux Security Modules development
 help / color / mirror / Atom feed
* Re: [RFC PATCH 1/7] x86/cpufeatures: add X86_FEATURE_SCI
From: hackapple @ 2020-06-30  0:08 UTC (permalink / raw)
  To: rppt
  Cc: James.Bottomley, alexandre.chartre, bp, dave.hansen, hpa, jwadams,
	keescook, linux-kernel, linux-mm, linux-security-module, luto,
	mingo, peterz, pjt, tglx, x86

What’s the version of kernel?

^ permalink raw reply

* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen
From: Eric W. Biederman @ 2020-06-30  1:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds
In-Reply-To: <20200629221231.jjc2czk3ul2roxkw@ast-mbp.dhcp.thefacebook.com>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote:
>> 
>> I have tested thes changes by booting with the code compiled in and
>> by killing "bpfilter_umh" and running iptables -vnL to restart
>> the userspace driver.
>> 
>> I have compiled tested each change with and without CONFIG_BPFILTER
>> enabled.
>
> With
> CONFIG_BPFILTER=y
> CONFIG_BPFILTER_UMH=m
> it doesn't build:
>
> ERROR: modpost: "kill_pid_info" [net/bpfilter/bpfilter.ko] undefined!
>
> I've added:
> +EXPORT_SYMBOL(kill_pid_info);
> to continue testing...
>
> And then did:
> while true; do iptables -L;rmmod bpfilter; done
>  
> Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event().
>
> I suspect patch 13 is somehow responsible:
> +	if (tgid) {
> +		kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid);
> +		wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID));
> +		bpfilter_umh_cleanup(info);
> +	}
>
> I cannot figure out why it hangs. Some sort of race ?
> Since adding short delay between kill and wait makes it work.

Thanks.  I will take a look.

Eric

^ permalink raw reply

* Re: [PATCH v2 10/15] exec: Remove do_execve_file
From: Christoph Hellwig @ 2020-06-30  5:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds
In-Reply-To: <87lfk54p0m.fsf_-_@x220.int.ebiederm.org>

FYI, this clashes badly with my exec rework.  I'd suggest you
drop everything touching exec here for now, and I can then
add the final file based exec removal to the end of my series.

^ permalink raw reply

* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen
From: Tetsuo Handa @ 2020-06-30  6:16 UTC (permalink / raw)
  To: Eric W. Biederman, Alexei Starovoitov
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook,
	Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel,
	Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin,
	Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain,
	Linus Torvalds
In-Reply-To: <87y2o51hkk.fsf@x220.int.ebiederm.org>

On 2020/06/30 10:13, Eric W. Biederman wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
>> On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote:
>>>
>>> I have tested thes changes by booting with the code compiled in and
>>> by killing "bpfilter_umh" and running iptables -vnL to restart
>>> the userspace driver.
>>>
>>> I have compiled tested each change with and without CONFIG_BPFILTER
>>> enabled.
>>
>> With
>> CONFIG_BPFILTER=y
>> CONFIG_BPFILTER_UMH=m
>> it doesn't build:
>>
>> ERROR: modpost: "kill_pid_info" [net/bpfilter/bpfilter.ko] undefined!
>>
>> I've added:
>> +EXPORT_SYMBOL(kill_pid_info);
>> to continue testing...

kill_pid() is already exported.

>>
>> And then did:
>> while true; do iptables -L;rmmod bpfilter; done
>>  
>> Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event().
>>
>> I suspect patch 13 is somehow responsible:
>> +	if (tgid) {
>> +		kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid);
>> +		wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID));
>> +		bpfilter_umh_cleanup(info);
>> +	}
>>
>> I cannot figure out why it hangs. Some sort of race ?
>> Since adding short delay between kill and wait makes it work.

Because there is a race window that detach_pid() from __unhash_process() from
__exit_signal() from release_task() from exit_notify() from do_exit() is called
some time after wake_up_all(&pid->wait_pidfd) from do_notify_pidfd() from
do_notify_parent() from exit_notify() from do_exit() was called (in other words,
we can't use pid->wait_pidfd when pid_task() is used at wait_event()) ?

Below are changes I suggest.

diff --git a/kernel/umd.c b/kernel/umd.c
index ff79fb16d738..f688813b8830 100644
--- a/kernel/umd.c
+++ b/kernel/umd.c
@@ -26,7 +26,7 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na
 	if (IS_ERR(mnt))
 		return mnt;
 
-	file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY, 0700);
+	file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY | O_EXCL, 0700);
 	if (IS_ERR(file)) {
 		mntput(mnt);
 		return ERR_CAST(file);
@@ -52,16 +52,18 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na
 
 /**
  * umd_load_blob - Remember a blob of bytes for fork_usermode_driver
- * @info: information about usermode driver
- * @data: a blob of bytes that can be executed as a file
- * @len:  The lentgh of the blob
+ * @info: information about usermode driver (shouldn't be NULL)
+ * @data: a blob of bytes that can be executed as a file (shouldn't be NULL)
+ * @len:  The lentgh of the blob (shouldn't be 0)
  *
  */
 int umd_load_blob(struct umd_info *info, const void *data, size_t len)
 {
 	struct vfsmount *mnt;
 
-	if (WARN_ON_ONCE(info->wd.dentry || info->wd.mnt))
+	if (!info || !info->driver_name || !data || !len)
+		return -EINVAL;
+	if (info->wd.dentry || info->wd.mnt)
 		return -EBUSY;
 
 	mnt = blob_to_mnt(data, len, info->driver_name);
@@ -76,15 +78,14 @@ EXPORT_SYMBOL_GPL(umd_load_blob);
 
 /**
  * umd_unload_blob - Disassociate @info from a previously loaded blob
- * @info: information about usermode driver
+ * @info: information about usermode driver (shouldn't be NULL)
  *
  */
 int umd_unload_blob(struct umd_info *info)
 {
-	if (WARN_ON_ONCE(!info->wd.mnt ||
-			 !info->wd.dentry ||
-			 info->wd.mnt->mnt_root != info->wd.dentry))
+	if (!info || !info->driver_name || !info->wd.dentry || !info->wd.mnt)
 		return -EINVAL;
+	BUG_ON(info->wd.mnt->mnt_root != info->wd.dentry);
 
 	kern_unmount(info->wd.mnt);
 	info->wd.mnt = NULL;
@@ -138,7 +139,7 @@ static void umd_cleanup(struct subprocess_info *info)
 {
 	struct umd_info *umd_info = info->data;
 
-	/* cleanup if umh_setup() was successful but exec failed */
+	/* cleanup if umd_setup() was successful but exec failed */
 	if (info->retval) {
 		fput(umd_info->pipe_to_umh);
 		fput(umd_info->pipe_from_umh);
@@ -163,7 +164,10 @@ int fork_usermode_driver(struct umd_info *info)
 	const char *argv[] = { info->driver_name, NULL };
 	int err;
 
-	if (WARN_ON_ONCE(info->tgid))
+	if (!info || !info->driver_name || !info->wd.dentry || !info->wd.mnt)
+		return -EINVAL;
+	BUG_ON(info->wd.mnt->mnt_root != info->wd.dentry);
+	if (info->tgid)
 		return -EBUSY;
 
 	err = -ENOMEM;
diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index 91474884ddb7..9dd70aacb81a 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -19,8 +19,13 @@ static void shutdown_umh(void)
 	struct pid *tgid = info->tgid;
 
 	if (tgid) {
-		kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid);
-		wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID));
+		kill_pid(tgid, SIGKILL, 1);
+		while (({ bool done;
+			  rcu_read_lock();
+			  done = !pid_task(tgid, PIDTYPE_TGID);
+			  rcu_read_unlock();
+			  done; }))
+			schedule_timeout_uninterruptible(1);
 		bpfilter_umh_cleanup(info);
 	}
 }


^ permalink raw reply related

* Re: [PATCH 00/14] Make the user mode driver code a better citizen
From: Tetsuo Handa @ 2020-06-30  6:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexei Starovoitov, Linus Torvalds, David Miller,
	Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler
In-Reply-To: <874kqt39qo.fsf@x220.int.ebiederm.org>

On 2020/06/30 5:19, Eric W. Biederman wrote:
> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> 
>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>>> But all the defensive programming kinda goes against general kernel style.
>>> I wouldn't do it. Especially pr_info() ?!
>>> Though I don't feel strongly about it.
>>
>> Honestly speaking, caller should check for errors and print appropriate
>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
>> went wrong (maybe memory corruption). But other conditions are not fatal.
>> That is, I consider even pr_info() here should be unnecessary.
> 
> They were all should never happen cases.  Which is why my patches do:
> if (WARN_ON_ONCE(...))

No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
This bug was unfortunately not found by syzkaller because this path is
not easily reachable via syscall interface.

> 
> That let's the caller know the messed up very clearly while still
> providing a change to continue.
> 
> If they were clearly corruption no ones kernel should ever continue
> BUG_ON would be appropriate.

Please use BUG_ON() (to only corruption case) like I suggested in my updated diff.


^ permalink raw reply

* Re: [PATCH] crypto: af_alg - Fix regression on empty requests
From: Naresh Kamboju @ 2020-06-30  8:48 UTC (permalink / raw)
  To: Herbert Xu, Eric Biggers
  Cc: Luis Chamberlain, LTP List, open list, linux-security-module,
	keyrings, lkft-triage, Linux Crypto Mailing List, Jan Stancek,
	chrubis, Serge E. Hallyn, James Morris, Jarkko Sakkinen,
	David Howells, David S. Miller, Sachin Sant,
	Linux Next Mailing List, linuxppc-dev, linux- stable
In-Reply-To: <20200626062948.GA25285@gondor.apana.org.au>

On Fri, 26 Jun 2020 at 12:00, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Jun 23, 2020 at 10:02:17AM -0700, Eric Biggers wrote:
> >
> > The source code for the two failing AF_ALG tests is here:
> >
> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypto/af_alg02.c
> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypto/af_alg05.c
> >
> > They use read() and write(), not send() and recv().
> >
> > af_alg02 uses read() to read from a "salsa20" request socket without writing
> > anything to it.  It is expected that this returns 0, i.e. that behaves like
> > encrypting an empty message.

Since we are on this subject,
LTP af_alg02  test case fails on stable 4.9 and stable 4.4
This is not a regression because the test case has been failing from
the beginning.

Is this test case expected to fail on stable 4.9 and 4.4 ?
or any chance to fix this on these older branches ?

Test output:
af_alg02.c:52: BROK: Timed out while reading from request socket.

ref:
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191-g082e807235d7/testrun/2884917/suite/ltp-crypto-tests/test/af_alg02/history/
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191-g082e807235d7/testrun/2884606/suite/ltp-crypto-tests/test/af_alg02/log

- Naresh

^ permalink raw reply

* Re: [PATCH v33 12/21] x86/sgx: Allow a limited use of ATTRIBUTE.PROVISIONKEY for attestation
From: Borislav Petkov @ 2020-06-30  8:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jarkko Sakkinen, x86, linux-sgx, linux-kernel,
	linux-security-module, Jethro Beekman, Andy Lutomirski, akpm,
	andriy.shevchenko, asapek, cedric.xing, chenalexchen,
	conradparker, cyhanish, dave.hansen, haitao.huang, josh,
	kai.huang, kai.svahn, kmoy, ludloff, nhorman, npmccallum,
	puiterwijk, rientjes, tglx, yaozhangx
In-Reply-To: <20200629220400.GI12312@linux.intel.com>

On Mon, Jun 29, 2020 at 03:04:00PM -0700, Sean Christopherson wrote:
> > I don't see this acronym resolved anywhere in the whole patchset.
> 
> Quoting Enclave.

Yah, pls add it somewhere.

> /dev/sgx/provision is root-only by default, the expectation is that the admin
> will configure the system to grant only specific enclaves access to the
> PROVISION_KEY.

Uuh, I don't like "the expectation is" - the reality happens to turn
differently, more often than not.

> In this series, access is fairly binary, i.e. there's no additional kernel
> infrastructure to help userspace make per-enclave decisions.  There have been
> more than a few proposals on how to extend the kernel to help provide better
> granularity, e.g. LSM hooks, but it was generally agreed to punt that stuff
> to post-upstreaming to keep things "simple" once we went far enough down
> various paths to ensure we weren't painting ourselves into a corner.

So this all sounds to me like we should not upstream /dev/sgx/provision
now but delay it until the infrastructure for that has been made more
concrete. We can always add it then. Changing it after the fact -
if we have to and for whatever reason - would be a lot harder for a
user-visible interface which someone has started using already.

So I'd leave  that out from the initial patchset.

> If you want super gory details, Intel's whitepaper on attestation in cloud
> environments is a good starting point[*], but I don't recommended doing much
> more than skimming unless you really like attestation stuff or are
> masochistic, which IMO amount to the same thing :-)

No thanks. :)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/4] bpf: Implement bpf_local_storage for inodes
From: KP Singh @ 2020-06-30 11:49 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: KP Singh, bpf, linux-security-module, Alexei Starovoitov,
	Daniel Borkmann, Paul Turner, Jann Horn
In-Reply-To: <20200619065245.t755bkffk6zleoi2@kafai-mbp.dhcp.thefacebook.com>

On 18-Jun 23:52, Martin KaFai Lau wrote:
> On Wed, Jun 17, 2020 at 10:29:39PM +0200, KP Singh wrote:
> [ ... ]
> 
> > diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> > index af74712af585..8efd7562e3de 100644
> > --- a/include/linux/bpf_lsm.h
> > +++ b/include/linux/bpf_lsm.h
> > @@ -17,9 +17,24 @@
> >  #include <linux/lsm_hook_defs.h>
> >  #undef LSM_HOOK
> >  
> > +struct bpf_storage_blob {
> > +	struct bpf_local_storage __rcu *storage;
> > +};
> > +
> > +extern struct lsm_blob_sizes bpf_lsm_blob_sizes;
> > +
> >  int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
> >  			const struct bpf_prog *prog);
> >  
> > +static inline struct bpf_storage_blob *bpf_inode(
> > +	const struct inode *inode)
> > +{
> > +	if (unlikely(!inode->i_security))
> > +		return NULL;
> > +
> > +	return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
> > +}
> > +
> >  #else /* !CONFIG_BPF_LSM */
> >  
> >  static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
> > @@ -28,6 +43,12 @@ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
> >  	return -EOPNOTSUPP;
> >  }
> >  
> > +static inline struct bpf_storage_blob *bpf_inode_storage(
> This does not seem to match the newly added "bpf_inode()"
> above for the "CONFIG_BPF_LSM" case.
> 
> A typo?  May be a good idea to test compiling with !CONFIG_BPF_LSM.

Sorry about that, yeah it was a last minute lazy rename. Will
compile test the series with !CONFIG_BPF_LSM and !CONFIG_NET. Thanks.

> 
> > +	const struct inode *inode)
> > +{
> > +	return NULL;
> > +}
> > +
> >  #endif /* CONFIG_BPF_LSM */
> >  
> >  #endif /* _LINUX_BPF_LSM_H */
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index a18ae82a298a..881e7954c956 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -101,6 +101,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP_HASH, dev_map_hash_ops)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
> > +BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops)
> sk_storage is under CONFIG_NET.
> 
> inode_storage should be CONFIG_BPF_LSM?

Thanks, updated.

- KP

> 
> >  #if defined(CONFIG_BPF_STREAM_PARSER)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)

^ permalink raw reply

* Re: [PATCH bpf-next v2 4/4] bpf: Add selftests for local_storage
From: KP Singh @ 2020-06-30 11:50 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: KP Singh, bpf, linux-security-module, Alexei Starovoitov,
	Daniel Borkmann, Paul Turner, Jann Horn
In-Reply-To: <CAEf4BzZdUWUSLzT1Y-o1Yvy3tTETkJEVU7RyZufZY_yEKzwOSg@mail.gmail.com>

On 18-Jun 11:16, Andrii Nakryiko wrote:
> On Wed, Jun 17, 2020 at 1:31 PM KP Singh <kpsingh@chromium.org> wrote:
> >
> > From: KP Singh <kpsingh@google.com>
> >
> > inode_local_storage:
> >
> > * Hook to the file_open and inode_unlink LSM hooks.
> > * Create and unlink a temporary file.
> > * Store some information in the inode's bpf_local_storage during
> >   file_open.
> > * Verify that this information exists when the file is unlinked.
> >
> > sk_local_storage:
> >
> > * Hook to the socket_post_create and socket_bind LSM hooks.
> > * Open and bind a socket and set the sk_storage in the
> >   socket_post_create hook using the start_server helper.
> > * Verify if the information is set in the socket_bind hook.
> >
> > Signed-off-by: KP Singh <kpsingh@google.com>
> > ---
> >  .../bpf/prog_tests/test_local_storage.c       |  60 ++++++++
> >  .../selftests/bpf/progs/local_storage.c       | 137 ++++++++++++++++++
> >  2 files changed, 197 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
> >
> 
> [...]
> 
> > diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/selftests/bpf/progs/local_storage.c
> > new file mode 100644
> > index 000000000000..38954e6a1edc
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/local_storage.c
> > @@ -0,0 +1,137 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Copyright 2020 Google LLC.
> > + */
> > +
> > +#include <errno.h>
> > +#include <linux/bpf.h>
> > +#include <stdbool.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +__u32 _version SEC("version") = 1;
> 
> version is anachronism, please drop it.

Removed.

> 
> Otherwise, LGTM.

Thanks

- KP

> 
> Acked-by: Andrii Nakryiko <andriin@fb.com>
> 
> [...]

^ permalink raw reply

* Re: [RFC PATCH 1/7] x86/cpufeatures: add X86_FEATURE_SCI
From: Mike Rapoport @ 2020-06-30 11:58 UTC (permalink / raw)
  To: hackapple
  Cc: James.Bottomley, alexandre.chartre, bp, dave.hansen, hpa, jwadams,
	keescook, linux-kernel, linux-mm, linux-security-module, luto,
	mingo, peterz, pjt, tglx, x86
In-Reply-To: <94E40D06-B481-45B7-A929-CC324F5B856B@qq.com>

On Tue, Jun 30, 2020 at 08:08:59AM +0800, hackapple wrote:
> What’s the version of kernel?

It was around 5.2 time frame, I think.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v2 10/15] exec: Remove do_execve_file
From: Eric W. Biederman @ 2020-06-30 12:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds
In-Reply-To: <20200630054313.GB27221@infradead.org>

Christoph Hellwig <hch@infradead.org> writes:

> FYI, this clashes badly with my exec rework.  I'd suggest you
> drop everything touching exec here for now, and I can then
> add the final file based exec removal to the end of my series.

I have looked and I haven't even seen any exec work.  Where can it be
found?

I have working and cleaning up exec for what 3 cycles now.  There is
still quite a ways to go before it becomes possible to fix some of the
deep problems in exec.  Removing all of these broken exec special cases
is quite frankly the entire point of this patchset.

Sight unseen I suggest you send me your exec work and I can merge it
into my branch if we are going to conflict badly.

Eric


^ permalink raw reply

* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen
From: Eric W. Biederman @ 2020-06-30 12:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds
In-Reply-To: <20200629221231.jjc2czk3ul2roxkw@ast-mbp.dhcp.thefacebook.com>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

2> On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote:
>> 
>> I have tested thes changes by booting with the code compiled in and
>> by killing "bpfilter_umh" and running iptables -vnL to restart
>> the userspace driver.
>> 
>> I have compiled tested each change with and without CONFIG_BPFILTER
>> enabled.
>
> With
> CONFIG_BPFILTER=y
> CONFIG_BPFILTER_UMH=m
> it doesn't build:
>
> ERROR: modpost: "kill_pid_info" [net/bpfilter/bpfilter.ko] undefined!
>
> I've added:
> +EXPORT_SYMBOL(kill_pid_info);
> to continue testing...

I am rather surprised I thought Tetsuo had already compile tested
modules.


> I suspect patch 13 is somehow responsible:
> +	if (tgid) {
> +		kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid);
> +		wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID));
> +		bpfilter_umh_cleanup(info);
> +	}
>
> I cannot figure out why it hangs. Some sort of race ?
> Since adding short delay between kill and wait makes it work.

Having had a chance to sleep kill_pid_info was a thinko, as was
!pid_task.  It should have been !pid_has_task as that takes the proper
rcu locking.

I don't know if that is going to be enough to fix the wait_event
but those are obvious bugs that need to be fixed.

diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index 91474884ddb7..3e1874030daa 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -19,8 +19,8 @@ static void shutdown_umh(void)
        struct pid *tgid = info->tgid;
 
        if (tgid) {
-               kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid);
-               wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID));
+               kill_pid(tgid, SIGKILL, 1);
+               wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID));
                bpfilter_umh_cleanup(info);
        }
 }

> And then did:
> while true; do iptables -L;rmmod bpfilter; done
>  
> Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event().

Hmm.  The wake up happens just of tgid->wait_pidfd happens just before
release_task is called so there is a race.  As it is possible to wake
up and then go back to sleep before pid_has_task becomes false.

So I think I need a friendly helper that does:

bool task_has_exited(struct pid *tgid)
{
	bool exited = false;

	rcu_read_lock();
        tsk = pid_task(tgid, PIDTYPE_TGID);
        exited = !!tsk;
        if (tsk) {
        	exited = !!tsk->exit_state;
out:
	rcu_unlock();
	return exited;
}

There should be a sensible way to do that.

Eric



^ permalink raw reply related

* Re: [PATCH 00/14] Make the user mode driver code a better citizen
From: Eric W. Biederman @ 2020-06-30 12:32 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Alexei Starovoitov, Linus Torvalds, David Miller,
	Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler
In-Reply-To: <6a9dd8be-333a-fd21-d125-ec20fb7c81df@i-love.sakura.ne.jp>

Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:

> On 2020/06/30 5:19, Eric W. Biederman wrote:
>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
>> 
>>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>>>> But all the defensive programming kinda goes against general kernel style.
>>>> I wouldn't do it. Especially pr_info() ?!
>>>> Though I don't feel strongly about it.
>>>
>>> Honestly speaking, caller should check for errors and print appropriate
>>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
>>> went wrong (maybe memory corruption). But other conditions are not fatal.
>>> That is, I consider even pr_info() here should be unnecessary.
>> 
>> They were all should never happen cases.  Which is why my patches do:
>> if (WARN_ON_ONCE(...))
>
> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
> This bug was unfortunately not found by syzkaller because this path is
> not easily reachable via syscall interface.

Absolutely yes.  These are cases that should never happen.
They should never be reachable by userspace.

It is absolutely a bug if these are hit by userspace.

Now if fuzzers want horrible cases to be even more horrible and change a
nice friendly warn into a panic that is their problem.  The issue being
do they capture the information the rest of us need to fix.

Eric



^ permalink raw reply

* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen
From: Tetsuo Handa @ 2020-06-30 13:21 UTC (permalink / raw)
  To: Eric W. Biederman, Alexei Starovoitov
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook,
	Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel,
	Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin,
	Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain,
	Linus Torvalds
In-Reply-To: <87eepwzqhd.fsf@x220.int.ebiederm.org>

On 2020/06/30 21:29, Eric W. Biederman wrote:
> Hmm.  The wake up happens just of tgid->wait_pidfd happens just before
> release_task is called so there is a race.  As it is possible to wake
> up and then go back to sleep before pid_has_task becomes false.

What is the reason we want to wait until pid_has_task() becomes false?

- wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID));
+ while (!wait_event_timeout(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID), 1));




By the way, commit 4a9d4b024a3102fc ("switch fput to task_work_add") says
that use of flush_delayed_fput() has to be careful. Al, is it safe to call
flush_delayed_fput() from blob_to_mnt() from umd_load_blob() (which might be
called from both kernel thread and from process context (e.g. init_module()
syscall by /sbin/insmod )) ?

^ permalink raw reply

* Re: [PATCH v2 10/15] exec: Remove do_execve_file
From: Christoph Hellwig @ 2020-06-30 13:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, linux-kernel, David Miller, Greg Kroah-Hartman,
	Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton,
	Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann,
	Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele,
	LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds
In-Reply-To: <87a70k21k0.fsf@x220.int.ebiederm.org>

On Tue, Jun 30, 2020 at 07:14:23AM -0500, Eric W. Biederman wrote:
> Christoph Hellwig <hch@infradead.org> writes:
> 
> > FYI, this clashes badly with my exec rework.  I'd suggest you
> > drop everything touching exec here for now, and I can then
> > add the final file based exec removal to the end of my series.
> 
> I have looked and I haven't even seen any exec work.  Where can it be
> found?
> 
> I have working and cleaning up exec for what 3 cycles now.  There is
> still quite a ways to go before it becomes possible to fix some of the
> deep problems in exec.  Removing all of these broken exec special cases
> is quite frankly the entire point of this patchset.
> 
> Sight unseen I suggest you send me your exec work and I can merge it
> into my branch if we are going to conflict badly.

https://lore.kernel.org/linux-fsdevel/20200627072704.2447163-1-hch@lst.de/T/#t

^ permalink raw reply

* Re: [PATCH v33 12/21] x86/sgx: Allow a limited use of ATTRIBUTE.PROVISIONKEY for attestation
From: Sean Christopherson @ 2020-06-30 14:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jarkko Sakkinen, x86, linux-sgx, linux-kernel,
	linux-security-module, Jethro Beekman, Andy Lutomirski, akpm,
	andriy.shevchenko, asapek, cedric.xing, chenalexchen,
	conradparker, cyhanish, dave.hansen, haitao.huang, josh,
	kai.huang, kai.svahn, kmoy, ludloff, nhorman, npmccallum,
	puiterwijk, rientjes, tglx, yaozhangx
In-Reply-To: <20200630084956.GB1093@zn.tnic>

On Tue, Jun 30, 2020 at 10:49:56AM +0200, Borislav Petkov wrote:
> On Mon, Jun 29, 2020 at 03:04:00PM -0700, Sean Christopherson wrote:
> > /dev/sgx/provision is root-only by default, the expectation is that the admin
> > will configure the system to grant only specific enclaves access to the
> > PROVISION_KEY.
> 
> Uuh, I don't like "the expectation is" - the reality happens to turn
> differently, more often than not.

Would it help if I worded it as "only root should ever be able to run an
enclave with access to PROVISION_KEY"?  We obviously can't control what
admins actually do, hence my wording of it as the expected behavior.

> > In this series, access is fairly binary, i.e. there's no additional kernel
> > infrastructure to help userspace make per-enclave decisions.  There have been
> > more than a few proposals on how to extend the kernel to help provide better
> > granularity, e.g. LSM hooks, but it was generally agreed to punt that stuff
> > to post-upstreaming to keep things "simple" once we went far enough down
> > various paths to ensure we weren't painting ourselves into a corner.
> 
> So this all sounds to me like we should not upstream /dev/sgx/provision
> now but delay it until the infrastructure for that has been made more
> concrete. We can always add it then. Changing it after the fact -
> if we have to and for whatever reason - would be a lot harder for a
> user-visible interface which someone has started using already.

The userspace and attestation infrastructure is very concrete, i.e. the
need for userspace to be able to access PROVISION_KEY is there, as is the
desire to be able to restrict access to PROVISION_KEY, e.g. I believe Andy
Lutomirski originally requested the ability to restrict access.

The additional infrastructure for per-enclave decisions is somewhat
orthogonal to the PROVISION_KEY, e.g. they won't necessarily be employed
by everyone running enclaves, and environments that do have per-enclave
policies would still likely want the extra layer of restriction for
PROVISION_KEY.  I only brought the additional policy crud to call out that
we've done enough path-finding on additional restrictions to have strong
confidence that adding /dev/sgx/provision won't prevent us from adding more
fine grained control in the future.

> So I'd leave  that out from the initial patchset.

^ permalink raw reply

* Re: [PATCH v2 10/15] exec: Remove do_execve_file
From: Eric W. Biederman @ 2020-06-30 14:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds
In-Reply-To: <20200630133802.GA30093@infradead.org>

Christoph Hellwig <hch@infradead.org> writes:

> On Tue, Jun 30, 2020 at 07:14:23AM -0500, Eric W. Biederman wrote:
>> Christoph Hellwig <hch@infradead.org> writes:
>> 
>> > FYI, this clashes badly with my exec rework.  I'd suggest you
>> > drop everything touching exec here for now, and I can then
>> > add the final file based exec removal to the end of my series.
>> 
>> I have looked and I haven't even seen any exec work.  Where can it be
>> found?
>> 
>> I have working and cleaning up exec for what 3 cycles now.  There is
>> still quite a ways to go before it becomes possible to fix some of the
>> deep problems in exec.  Removing all of these broken exec special cases
>> is quite frankly the entire point of this patchset.
>> 
>> Sight unseen I suggest you send me your exec work and I can merge it
>> into my branch if we are going to conflict badly.
>
> https://lore.kernel.org/linux-fsdevel/20200627072704.2447163-1-hch@lst.de/T/#t


Looking at your final patch I do not like the construct.

static int __do_execveat(int fd, struct filename *filename,
 		const char __user *const __user *argv,
 		const char __user *const __user *envp,
		const char *const *kernel_argv,
		const char *const *kernel_envp,
 		int flags, struct file *file);


It results in a function that is full of:
	if (kernel_argv) {
        	// For kernel_exeveat 
		...
	} else {
        	// For ordinary exeveat
        	
        }

Which while understandable.  I do not think results in good long term
maintainble code.

The current file paramter that I am getting rid of in my patchset is
a stark example of that.  Because of all of the if's no one realized
that the code had it's file reference counting wrong (amoung other
bugs).

I think this is important to address as exec has already passed
the point where people can fix all of the bugs in exec because
the code is so hairy.

I think to be maintainable and clear the code exec code is going to
need to look something like:

static int bprm_execveat(int fd, struct filename *filename,
			struct bprm *bprm, int flags);

int kernel_execve(const char *filename,
		  const char *const *argv, const char *const *envp, int flags)
{
	bprm = kzalloc(sizeof(*pbrm), GFP_KERNEL);
        bprm->argc = count_kernel_strings(argv);
        bprm->envc = count_kernel_strings(envp);
        prepare_arg_pages(bprm);
        copy_strings_kernel(bprm->envc, envp, bprm);
        copy_strings_kernel(bprm->argc, argc, bprm);
	ret = bprm_execveat(AT_FDCWD, filename, bprm);
        free_bprm(bprm);
        return ret;
}

int do_exeveat(int fd, const char *filename,
		const char __user *const __user *argv,
                const char __user *const __user *envp, int flags)
{
	bprm = kzalloc(sizeof(*pbrm), GFP_KERNEL);
        bprm->argc = count_strings(argv);
        bprm->envc = count_strings(envp);
        prepare_arg_pages(bprm);
        copy_strings(bprm->envc, envp, bprm);
        copy_strings(bprm->argc, argc, bprm);
	ret = bprm_execveat(fd, filename, bprm);
        free_bprm(bprm);
        return ret;
}

More work is required obviously to make the code above really work but
when the dust clears a structure like that doesn't have funny edge cases
that can hide bugs and make it tricky to change the code.

Eric




^ permalink raw reply

* Re: [PATCH 00/14] Make the user mode driver code a better citizen
From: Alexei Starovoitov @ 2020-06-30 16:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric W. Biederman, Linus Torvalds, David Miller,
	Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler
In-Reply-To: <6a9dd8be-333a-fd21-d125-ec20fb7c81df@i-love.sakura.ne.jp>

On Tue, Jun 30, 2020 at 03:28:49PM +0900, Tetsuo Handa wrote:
> On 2020/06/30 5:19, Eric W. Biederman wrote:
> > Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> > 
> >> On 2020/06/29 4:44, Alexei Starovoitov wrote:
> >>> But all the defensive programming kinda goes against general kernel style.
> >>> I wouldn't do it. Especially pr_info() ?!
> >>> Though I don't feel strongly about it.
> >>
> >> Honestly speaking, caller should check for errors and print appropriate
> >> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
> >> went wrong (maybe memory corruption). But other conditions are not fatal.
> >> That is, I consider even pr_info() here should be unnecessary.
> > 
> > They were all should never happen cases.  Which is why my patches do:
> > if (WARN_ON_ONCE(...))
> 
> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.

I don't believe that's true.
Please show fuzzing stack trace to prove your point.

^ permalink raw reply

* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen
From: Alexei Starovoitov @ 2020-06-30 16:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds
In-Reply-To: <87eepwzqhd.fsf@x220.int.ebiederm.org>

On Tue, Jun 30, 2020 at 07:29:34AM -0500, Eric W. Biederman wrote:
> 
> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
> index 91474884ddb7..3e1874030daa 100644
> --- a/net/bpfilter/bpfilter_kern.c
> +++ b/net/bpfilter/bpfilter_kern.c
> @@ -19,8 +19,8 @@ static void shutdown_umh(void)
>         struct pid *tgid = info->tgid;
>  
>         if (tgid) {
> -               kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid);
> -               wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID));
> +               kill_pid(tgid, SIGKILL, 1);
> +               wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID));
>                 bpfilter_umh_cleanup(info);
>         }
>  }
> 
> > And then did:
> > while true; do iptables -L;rmmod bpfilter; done
> >  
> > Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event().
> 
> Hmm.  The wake up happens just of tgid->wait_pidfd happens just before
> release_task is called so there is a race.  As it is possible to wake
> up and then go back to sleep before pid_has_task becomes false.
> 
> So I think I need a friendly helper that does:
> 
> bool task_has_exited(struct pid *tgid)
> {
> 	bool exited = false;
> 
> 	rcu_read_lock();
>         tsk = pid_task(tgid, PIDTYPE_TGID);
>         exited = !!tsk;
>         if (tsk) {
>         	exited = !!tsk->exit_state;
> out:
> 	rcu_unlock();
> 	return exited;
> }

All makes sense to me.
If I understood the race condition such helper should indeed solve it.
Are you going to add such patch to your series?
I'll proceed with my work on top of your series and will ignore this
race for now, but I think it should be fixed before we land this set
into multiple trees.

^ permalink raw reply

* Re: [PATCH v2 10/15] exec: Remove do_execve_file
From: Alexei Starovoitov @ 2020-06-30 16:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, linux-kernel, David Miller, Greg Kroah-Hartman,
	Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds
In-Reply-To: <878sg4y6f9.fsf@x220.int.ebiederm.org>

On Tue, Jun 30, 2020 at 09:28:10AM -0500, Eric W. Biederman wrote:
> Christoph Hellwig <hch@infradead.org> writes:
> 
> > On Tue, Jun 30, 2020 at 07:14:23AM -0500, Eric W. Biederman wrote:
> >> Christoph Hellwig <hch@infradead.org> writes:
> >> 
> >> > FYI, this clashes badly with my exec rework.  I'd suggest you
> >> > drop everything touching exec here for now, and I can then
> >> > add the final file based exec removal to the end of my series.
> >> 
> >> I have looked and I haven't even seen any exec work.  Where can it be
> >> found?
> >> 
> >> I have working and cleaning up exec for what 3 cycles now.  There is
> >> still quite a ways to go before it becomes possible to fix some of the
> >> deep problems in exec.  Removing all of these broken exec special cases
> >> is quite frankly the entire point of this patchset.
> >> 
> >> Sight unseen I suggest you send me your exec work and I can merge it
> >> into my branch if we are going to conflict badly.
> >
> > https://lore.kernel.org/linux-fsdevel/20200627072704.2447163-1-hch@lst.de/T/#t
> 
> 
> Looking at your final patch I do not like the construct.
> 
> static int __do_execveat(int fd, struct filename *filename,
>  		const char __user *const __user *argv,
>  		const char __user *const __user *envp,
> 		const char *const *kernel_argv,
> 		const char *const *kernel_envp,
>  		int flags, struct file *file);
> 
> 
> It results in a function that is full of:
> 	if (kernel_argv) {
>         	// For kernel_exeveat 
> 		...
> 	} else {
>         	// For ordinary exeveat
>         	
>         }
> 
> Which while understandable.  I do not think results in good long term
> maintainble code.
> 
> The current file paramter that I am getting rid of in my patchset is
> a stark example of that.  Because of all of the if's no one realized
> that the code had it's file reference counting wrong (amoung other
> bugs).
> 
> I think this is important to address as exec has already passed
> the point where people can fix all of the bugs in exec because
> the code is so hairy.
> 
> I think to be maintainable and clear the code exec code is going to
> need to look something like:
> 
> static int bprm_execveat(int fd, struct filename *filename,
> 			struct bprm *bprm, int flags);
> 
> int kernel_execve(const char *filename,
> 		  const char *const *argv, const char *const *envp, int flags)
> {
> 	bprm = kzalloc(sizeof(*pbrm), GFP_KERNEL);
>         bprm->argc = count_kernel_strings(argv);
>         bprm->envc = count_kernel_strings(envp);
>         prepare_arg_pages(bprm);
>         copy_strings_kernel(bprm->envc, envp, bprm);
>         copy_strings_kernel(bprm->argc, argc, bprm);
> 	ret = bprm_execveat(AT_FDCWD, filename, bprm);
>         free_bprm(bprm);
>         return ret;
> }
> 
> int do_exeveat(int fd, const char *filename,
> 		const char __user *const __user *argv,
>                 const char __user *const __user *envp, int flags)
> {
> 	bprm = kzalloc(sizeof(*pbrm), GFP_KERNEL);
>         bprm->argc = count_strings(argv);
>         bprm->envc = count_strings(envp);
>         prepare_arg_pages(bprm);
>         copy_strings(bprm->envc, envp, bprm);
>         copy_strings(bprm->argc, argc, bprm);
> 	ret = bprm_execveat(fd, filename, bprm);
>         free_bprm(bprm);
>         return ret;
> }
> 
> More work is required obviously to make the code above really work but
> when the dust clears a structure like that doesn't have funny edge cases
> that can hide bugs and make it tricky to change the code.

+1 to the approach.
I think Christoph's work need to be on top of Eric's.

^ permalink raw reply

* Re: [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support
From: Linus Torvalds @ 2020-06-30 16:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, David Miller, Greg Kroah-Hartman,
	Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton,
	Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann,
	Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele,
	LSM List, Casey Schaufler, Luis Chamberlain
In-Reply-To: <87imf963s6.fsf_-_@x220.int.ebiederm.org>

On Mon, Jun 29, 2020 at 1:05 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> This makes it clear which code is part of the core user mode
> helper support and which code is needed to implement user mode
> drivers.
>
>  kernel/umd.c             | 146 +++++++++++++++++++++++++++++++++++++++
>  kernel/umh.c             | 139 -------------------------------------

I certainly don't object to the split, but I hate the name.

We have uml, umd and umh for user mode {linux, drivers, helper}
respectively.And honestly, I don't see the point in using an obscure
and unreadable TLA for something like this.

I really don't think it would hurt to write out even the full name
with "usermode_driver.c" or something like that, would it?

Then "umd" could be continued to be used as a prefix for the helper
functions, by all means, but if we startv renaming files, can we do it
properly?

                   Linus

^ permalink raw reply

* Re: [PATCH v33 12/21] x86/sgx: Allow a limited use of ATTRIBUTE.PROVISIONKEY for attestation
From: Andy Lutomirski @ 2020-06-30 17:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Jarkko Sakkinen, X86 ML, linux-sgx, LKML,
	LSM List, Jethro Beekman, Andy Lutomirski, Andrew Morton,
	Andy Shevchenko, asapek, Xing, Cedric, chenalexchen, conradparker,
	cyhanish, Dave Hansen, Huang, Haitao, Josh Triplett, Huang, Kai,
	Svahn, Kai, kmoy, Christian Ludloff, Neil Horman, npmccallum,
	puiterwijk, David Rientjes, Thomas Gleixner, yaozhangx
In-Reply-To: <20200630142027.GA7733@linux.intel.com>

On Tue, Jun 30, 2020 at 7:20 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Jun 30, 2020 at 10:49:56AM +0200, Borislav Petkov wrote:
> > On Mon, Jun 29, 2020 at 03:04:00PM -0700, Sean Christopherson wrote:
> > > /dev/sgx/provision is root-only by default, the expectation is that the admin
> > > will configure the system to grant only specific enclaves access to the
> > > PROVISION_KEY.
> >
> > Uuh, I don't like "the expectation is" - the reality happens to turn
> > differently, more often than not.
>
> Would it help if I worded it as "only root should ever be able to run an
> enclave with access to PROVISION_KEY"?  We obviously can't control what
> admins actually do, hence my wording of it as the expected behavior.
>
> > > In this series, access is fairly binary, i.e. there's no additional kernel
> > > infrastructure to help userspace make per-enclave decisions.  There have been
> > > more than a few proposals on how to extend the kernel to help provide better
> > > granularity, e.g. LSM hooks, but it was generally agreed to punt that stuff
> > > to post-upstreaming to keep things "simple" once we went far enough down
> > > various paths to ensure we weren't painting ourselves into a corner.
> >
> > So this all sounds to me like we should not upstream /dev/sgx/provision
> > now but delay it until the infrastructure for that has been made more
> > concrete. We can always add it then. Changing it after the fact -
> > if we have to and for whatever reason - would be a lot harder for a
> > user-visible interface which someone has started using already.
>
> The userspace and attestation infrastructure is very concrete, i.e. the
> need for userspace to be able to access PROVISION_KEY is there, as is the
> desire to be able to restrict access to PROVISION_KEY, e.g. I believe Andy
> Lutomirski originally requested the ability to restrict access.
>
> The additional infrastructure for per-enclave decisions is somewhat
> orthogonal to the PROVISION_KEY, e.g. they won't necessarily be employed
> by everyone running enclaves, and environments that do have per-enclave
> policies would still likely want the extra layer of restriction for
> PROVISION_KEY.  I only brought the additional policy crud to call out that
> we've done enough path-finding on additional restrictions to have strong
> confidence that adding /dev/sgx/provision won't prevent us from adding more
> fine grained control in the future.

I agree.

I anticipate that most of the nasty fine-grained stuff will end up in
userspace down the road.  Systems can be configured such that
provisioning is done as root, or systems can end up with fancy SELinux
rules or daemons that pass around fds or whatever, but all of this can
be done with the kernel code in this patchset.

^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Luis Chamberlain @ 2020-06-30 17:57 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <20200626025410.GJ4332@42.do-not-panic.com>

On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 24.06.20 20:32, Christian Borntraeger wrote:
> > [...]> 
> > > So the translations look correct. But your change is actually a sematic change
> > > if(ret) will only trigger if there is an error
> > > if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> > > and we did not do it before. 
> > > 
> > 
> > So the right fix is
> > 
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index f81e8698e36e..a3a3196e84d1 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >                  * the real error code is already in sub_info->retval or
> >                  * sub_info->retval is 0 anyway, so don't mess with it then.
> >                  */
> > -               if (KWIFEXITED(ret))
> > +               if (KWEXITSTATUS(ret))
> >                         sub_info->retval = KWEXITSTATUS(ret);
> >         }
> >  
> > I think.
> 
> Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().
> I'm not able to reproduce this on x86 with a bridge. What type of bridge
> are you using on a guest, or did you mean using KVM so that the *host*
> can spawn kvm guests?
> 
> It would be good if you can try to add a bridge manually and see where
> things fail. Can you do something like this:
> 
> brctl addbr br0
> brctl addif br0 ens6 
> ip link set dev br0 up
> 
> Note that most callers are for modprobe. I'd be curious to see which
> umh is failing which breaks bridge for you. Can you trut this so we can
> see which umh call is failing?

Christian, any luck getting to test the code below to see what this
reveals?

  Luis

> 
> diff --git a/kernel/umh.c b/kernel/umh.c
> index f81e8698e36e..5ad74bc301d8 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -2,6 +2,9 @@
>  /*
>   * umh - the kernel usermode helper
>   */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/module.h>
>  #include <linux/sched.h>
>  #include <linux/sched/task.h>
> @@ -154,8 +157,12 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>  		 * the real error code is already in sub_info->retval or
>  		 * sub_info->retval is 0 anyway, so don't mess with it then.
>  		 */
> -		if (KWIFEXITED(ret))
> +		printk("== ret: %02x\n", ret);
> +		printk("== KWIFEXITED(ret): %02x\n", KWIFEXITED(ret));
> +		if (KWIFEXITED(ret)) {
> +			printk("KWEXITSTATUS(ret): %d\n", KWEXITSTATUS(ret));
>  			sub_info->retval = KWEXITSTATUS(ret);
> +		}
>  	}
>  
>  	/* Restore default kernel sig handler */
> @@ -383,6 +390,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
>  		void *data)
>  {
>  	struct subprocess_info *sub_info;
> +	unsigned int i = 0;
>  	sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
>  	if (!sub_info)
>  		goto out;
> @@ -394,6 +402,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
>  #else
>  	sub_info->path = path;
>  #endif
> +	pr_info("sub_info->path: %s\n", sub_info->path);
> +	while (argv[i])
> +		printk(KERN_INFO "%s ", argv[i++]);
> +	printk(KERN_INFO  "\n");
> +
>  	sub_info->argv = argv;
>  	sub_info->envp = envp;
>  
> 

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/4] bpf: Generalize bpf_sk_storage
From: Martin KaFai Lau @ 2020-06-30 19:34 UTC (permalink / raw)
  To: KP Singh
  Cc: bpf, linux-security-module, Alexei Starovoitov, Daniel Borkmann,
	Paul Turner, Jann Horn
In-Reply-To: <20200629160100.GA171259@google.com>

On Mon, Jun 29, 2020 at 06:01:00PM +0200, KP Singh wrote:
> Thanks for your feedback! Apologies it took some time for me
> to incorporate this into another revision.
> 
> On 18-Jun 23:43, Martin KaFai Lau wrote:
> > On Wed, Jun 17, 2020 at 10:29:38PM +0200, KP Singh wrote:
> > > From: KP Singh <kpsingh@google.com>
> > > 
> > > Refactor the functionality in bpf_sk_storage.c so that concept of
> > > storage linked to kernel objects can be extended to other objects like
> > > inode, task_struct etc.
> > > 
> > > bpf_sk_storage is updated to be bpf_local_storage with a union that
> > > contains a pointer to the owner object. The type of the
> > > bpf_local_storage can be determined using the newly added
> > > bpf_local_storage_type enum.
> > > 
> > > Each new local storage will still be a separate map and provide its own
> > > set of helpers. This allows for future object specific extensions and
> > > still share a lot of the underlying implementation.
> > Thanks for taking up this effort to refactor sk_local_storage.
> > 
> > I took a quick look.  I have some comments and would like to explore
> > some thoughts.
> > 
> > > --- a/net/core/bpf_sk_storage.c
> > > +++ b/kernel/bpf/bpf_local_storage.c
> > > @@ -1,19 +1,22 @@
> > >  // SPDX-License-Identifier: GPL-2.0
> > >  /* Copyright (c) 2019 Facebook  */
> > > +#include "linux/bpf.h"
> > > +#include "asm-generic/bug.h"
> > > +#include "linux/err.h"
> > "<" ">"
> > 
> > >  #include <linux/rculist.h>
> > >  #include <linux/list.h>
> > >  #include <linux/hash.h>
> > >  #include <linux/types.h>
> > >  #include <linux/spinlock.h>
> > >  #include <linux/bpf.h>
> > > -#include <net/bpf_sk_storage.h>
> > > +#include <linux/bpf_local_storage.h>
> > >  #include <net/sock.h>
> > >  #include <uapi/linux/sock_diag.h>
> > >  #include <uapi/linux/btf.h>
> > >  
> > >  static atomic_t cache_idx;
> > inode local storage and sk local storage probably need a separate
> > cache_idx.  An improvement on picking cache_idx has just been
> > landed also.
> 
> I see, thanks! I rebased and I now see that cache_idx is now a:
> 
>   static u64 cache_idx_usage_counts[BPF_STORAGE_CACHE_SIZE];
> 
> which tracks the free cache slots rather than using a single atomic
> cache_idx. I guess all types of local storage can share this now
> right?
I believe they have to be separated.  A sk-storage will not be cached/stored
in inode.  Caching a sk-storage at idx=0 of a sk should not stop
an inode-storage to be cached at the same idx of a inode.

> 
> > 
> > [ ... ]
> > 
> > > +struct bpf_local_storage {
> > > +	struct bpf_local_storage_data __rcu *cache[BPF_STORAGE_CACHE_SIZE];
> > >  	return NULL;
> 
> [...]
> 
> > >  }
> > >  
> > > -/* sk_storage->lock must be held and selem->sk_storage == sk_storage.
> > > +static void __unlink_local_storage(struct bpf_local_storage *local_storage,
> > > +				   bool uncharge_omem)
> > Nit. indent is off.  There are a few more cases like this.
> 
> Thanks, will fix this. (note to self: don't trust the editor's
> clang-format blindly).
> 
> > 
> > > +{
> > > +	struct sock *sk;
> > > +
> > > +	switch (local_storage->stype) {
> > Does it need a new bpf_local_storage_type?  Is map_type as good?
> > 
> > Instead of adding any new member (e.g. stype) to
> > "struct bpf_local_storage",  can the smap pointer be directly used
> > here instead?
> > 
> > For example in __unlink_local_storage() here, it should
> > have a hold to the selem which then has a hold to smap.
> 
> Good point, Updated to using the map->map_type.
> 
> > 
> > > +	case BPF_LOCAL_STORAGE_SK:
> > > +		sk = local_storage->sk;
> > > +		if (uncharge_omem)
> > > +			atomic_sub(sizeof(struct bpf_local_storage),
> > > +				   &sk->sk_omem_alloc);
> > > +
> > > +		/* After this RCU_INIT, sk may be freed and cannot be used */
> > > +		RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
> > > +		local_storage->sk = NULL;
> > > +		break;
> > > +	}
> > Another thought on the stype switch cases.
> > 
> > Instead of having multiple switches on stype in bpf_local_storage.c which may
> > not be scalable soon if we are planning to support a few more kernel objects,
> > have you considered putting them into its own "ops".  May be a few new
> > ops can be added to bpf_map_ops to do local storage unlink/update/alloc...etc.
> 
> Good idea, I was able to refactor this with the following ops:
> 
>         /* Functions called by bpf_local_storage maps */
> 	void (*map_local_storage_unlink)(struct bpf_local_storage *local_storage,
>                                          bool uncharge_omem);
> 	struct bpf_local_storage_elem *(*map_selem_alloc)(
> 		struct bpf_local_storage_map *smap, void *owner, void *value,
> 		bool charge_omem);
> 	struct bpf_local_storage_data *(*map_local_storage_update)(
> 		void  *owner, struct bpf_map *map, void *value, u64 flags);
> 	int (*map_local_storage_alloc)(void *owner,
> 				       struct bpf_local_storage_map *smap,
> 				       struct bpf_local_storage_elem *elem);
> 
> Let me know if you have any particular thoughts/suggestions about
> this.
Make sense to me.

It seems the fast-path's lookup can all be done within __local_storage_lookup().
No indirect call is needed, so should be good.

> 
> > 
> > > +}
> > > +
> > > +/* local_storage->lock must be held and selem->local_storage == local_storage.
> > >   * The caller must ensure selem->smap is still valid to be
> > >   * dereferenced for its smap->elem_size and smap->cache_idx.
> > > + *
> > > + * uncharge_omem is only relevant when:
> 
> [...]
> 
> > > +	/* bpf_local_storage_map is currently limited to CAP_SYS_ADMIN as
> > >  	 * the map_alloc_check() side also does.
> > >  	 */
> > >  	if (!bpf_capable())
> > > @@ -1025,10 +1127,10 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
> > >  }
> > >  EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_alloc);
> > Would it be cleaner to leave bpf_sk specific function, map_ops, and func_proto
> > in net/core/bpf_sk_storage.c?
> 
> Sure, I can also keep the sk_clone code their as well for now.
Just came to my mind.  For easier review purpose, may be
first do the refactoring/renaming within bpf_sk_storage.c first and
then create another patch to move the common parts to a new
file bpf_local_storage.c.

Not sure if it will be indeed easier to read the diff in practice.
I probably should be able to follow it either way.

> 
> > 
> > There is a test in map_tests/sk_storage_map.c, in case you may not notice.
> 
> I will try to make it generic as a part of this series. If it takes
> too much time, I will send a separate patch for testing
> inode_storage_map and till then we have some assurance with
> test_local_storage in test_progs.
Sure. no problem.  It is mostly for you to test sk_storage to ensure things ;)
Also give some ideas on what racing conditions have
been considered in the sk_storage test and may be the inode storage
test want to stress similar code path.

^ permalink raw reply

* Re: [PATCH 00/14] Make the user mode driver code a better citizen
From: Tetsuo Handa @ 2020-06-30 21:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric W. Biederman, Linus Torvalds, David Miller,
	Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler
In-Reply-To: <20200630164817.txa2jewfvk4stajy@ast-mbp.dhcp.thefacebook.com>

On 2020/07/01 1:48, Alexei Starovoitov wrote:
> On Tue, Jun 30, 2020 at 03:28:49PM +0900, Tetsuo Handa wrote:
>> On 2020/06/30 5:19, Eric W. Biederman wrote:
>>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
>>>
>>>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>>>>> But all the defensive programming kinda goes against general kernel style.
>>>>> I wouldn't do it. Especially pr_info() ?!
>>>>> Though I don't feel strongly about it.
>>>>
>>>> Honestly speaking, caller should check for errors and print appropriate
>>>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
>>>> went wrong (maybe memory corruption). But other conditions are not fatal.
>>>> That is, I consider even pr_info() here should be unnecessary.
>>>
>>> They were all should never happen cases.  Which is why my patches do:
>>> if (WARN_ON_ONCE(...))
>>
>> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
> 
> I don't believe that's true.
> Please show fuzzing stack trace to prove your point.
> 

Please find links containing "WARNING" from https://syzkaller.appspot.com/upstream . ;-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox