Linux userland API discussions
 help / color / mirror / Atom feed
* [PATCH V33 30/30] efi: Restrict efivar_ssdt_load when the kernel is locked down
From: Matthew Garrett @ 2019-06-21  1:19 UTC (permalink / raw)
  To: jmorris
  Cc: linux-security, linux-kernel, linux-api, Matthew Garrett,
	Matthew Garrett, Ard Biesheuvel, linux-efi
In-Reply-To: <20190621011941.186255-1-matthewgarrett@google.com>

efivar_ssdt_load allows the kernel to import arbitrary ACPI code from an
EFI variable, which gives arbitrary code execution in ring 0. Prevent
that when the kernel is locked down.

Signed-off-by: Matthew Garrett <mjg59@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: linux-efi@vger.kernel.org
---
 drivers/firmware/efi/efi.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 55b77c576c42..a9ea649e0512 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -31,6 +31,7 @@
 #include <linux/acpi.h>
 #include <linux/ucs2_string.h>
 #include <linux/memblock.h>
+#include <linux/security.h>
 
 #include <asm/early_ioremap.h>
 
@@ -242,6 +243,9 @@ static void generic_ops_unregister(void)
 static char efivar_ssdt[EFIVAR_SSDT_NAME_MAX] __initdata;
 static int __init efivar_ssdt_setup(char *str)
 {
+	if (security_is_locked_down(LOCKDOWN_ACPI_TABLES))
+		return -EPERM;
+
 	if (strlen(str) < sizeof(efivar_ssdt))
 		memcpy(efivar_ssdt, str, strlen(str));
 	else
-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply related

* Re: [PATCH v5 14/16] ext4: add basic fs-verity support
From: Eric Biggers @ 2019-06-21  3:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Theodore Y . Ts'o, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190620235938.GE5375@magnolia>

Hi Darrick,

On Thu, Jun 20, 2019 at 04:59:38PM -0700, Darrick J. Wong wrote:
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 1cb67859e0518b..5a1deea3fb3e37 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -41,6 +41,7 @@
> >  #endif
> >  
> >  #include <linux/fscrypt.h>
> > +#include <linux/fsverity.h>
> >  
> >  #include <linux/compiler.h>
> >  
> > @@ -395,6 +396,7 @@ struct flex_groups {
> >  #define EXT4_TOPDIR_FL			0x00020000 /* Top of directory hierarchies*/
> >  #define EXT4_HUGE_FILE_FL               0x00040000 /* Set to each huge file */
> >  #define EXT4_EXTENTS_FL			0x00080000 /* Inode uses extents */
> > +#define EXT4_VERITY_FL			0x00100000 /* Verity protected inode */
> 
> Hmm, a new inode flag, superblock rocompat feature flag, and
> (presumably) the Merkle tree has some sort of well defined format which
> starts at the next 64k boundary past EOF.
> 
> Would you mind updating the relevant parts of the ondisk format
> documentation in Documentation/filesystems/ext4/, please?
> 
> I saw that the Merkle tree and verity descriptor formats themselves are
> documented in the first patch, so you could simply link the ext4
> documentation to it.
> 

Sure, I'll update the ext4 documentation.

> > +/*
> > + * Read some verity metadata from the inode.  __vfs_read() can't be used because
> > + * we need to read beyond i_size.
> > + */
> > +static int pagecache_read(struct inode *inode, void *buf, size_t count,
> > +			  loff_t pos)
> > +{
> > +	while (count) {
> > +		size_t n = min_t(size_t, count,
> > +				 PAGE_SIZE - offset_in_page(pos));
> > +		struct page *page;
> > +		void *addr;
> > +
> > +		page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> > +					 NULL);
> > +		if (IS_ERR(page))
> > +			return PTR_ERR(page);
> > +
> > +		addr = kmap_atomic(page);
> > +		memcpy(buf, addr + offset_in_page(pos), n);
> > +		kunmap_atomic(addr);
> > +
> > +		put_page(page);
> > +
> > +		buf += n;
> > +		pos += n;
> > +		count -= n;
> > +	}
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Write some verity metadata to the inode for FS_IOC_ENABLE_VERITY.
> > + * kernel_write() can't be used because the file descriptor is readonly.
> > + */
> > +static int pagecache_write(struct inode *inode, const void *buf, size_t count,
> > +			   loff_t pos)
> > +{
> > +	while (count) {
> > +		size_t n = min_t(size_t, count,
> > +				 PAGE_SIZE - offset_in_page(pos));
> > +		struct page *page;
> > +		void *fsdata;
> > +		void *addr;
> > +		int res;
> > +
> > +		res = pagecache_write_begin(NULL, inode->i_mapping, pos, n, 0,
> > +					    &page, &fsdata);
> > +		if (res)
> > +			return res;
> > +
> > +		addr = kmap_atomic(page);
> > +		memcpy(addr + offset_in_page(pos), buf, n);
> > +		kunmap_atomic(addr);
> > +
> > +		res = pagecache_write_end(NULL, inode->i_mapping, pos, n, n,
> > +					  page, fsdata);
> > +		if (res < 0)
> > +			return res;
> > +		if (res != n)
> > +			return -EIO;
> > +
> > +		buf += n;
> > +		pos += n;
> > +		count -= n;
> > +	}
> > +	return 0;
> > +}
> 
> This same code is duplicated in the f2fs patch.  Is there a reason why
> they don't share this common code?  Even if you have to hide it under
> fs/verity/ ?
> 

Yes, pagecache_read() and pagecache_write() are identical between ext4 and f2fs.
I didn't put them in fs/verity/ because the "metadata past EOF" approach is a
choice of ext4 and f2fs and not intrinsic to the fs-verity feature itself, so to
avoid confusion I made the fs/verity/ support layer be completely clean of any
assumption that that's the way filesystems implement fs-verity.

Also, making the fsverity_operations call back into fs/verity/ adds a little
extra conceptual complexity about what belongs where, since then we'd have a
call stack of filesystem => fs/verity/ => filesystem => fs/verity/.

But if people would rather that ext4 and f2fs share these two functions anyway,
then sure, we could move them into fs/verity/, and other filesystems (if they
take a different approach to fs-verity) simply won't use them.

- Eric

^ permalink raw reply

* Re: [PATCH V33 01/30] security: Support early LSMs
From: Kees Cook @ 2019-06-21  3:21 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: jmorris, linux-security, linux-kernel, linux-api, Matthew Garrett
In-Reply-To: <20190621011941.186255-2-matthewgarrett@google.com>

On Thu, Jun 20, 2019 at 06:19:12PM -0700, Matthew Garrett wrote:
> The lockdown module is intended to allow for kernels to be locked down
> early in boot - sufficiently early that we don't have the ability to
> kmalloc() yet. Add support for early initialisation of some LSMs, and
> then add them to the list of names when we do full initialisation later.

So, if I'm reading correctly, these "early LSMs":

- start up before even boot parameter parsing has happened
- have their position in the LSM ordering ignored
- are initialized in boot order
- cannot use kmalloc, as well as probably lots of other things

> 
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> ---
>  include/asm-generic/vmlinux.lds.h |  8 +++++-
>  include/linux/lsm_hooks.h         |  6 ++++
>  include/linux/security.h          |  1 +
>  init/main.c                       |  1 +
>  security/security.c               | 48 +++++++++++++++++++++++++------
>  5 files changed, 54 insertions(+), 10 deletions(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index f8f6f04c4453..e1963352fdb6 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -208,8 +208,13 @@
>  			__start_lsm_info = .;				\
>  			KEEP(*(.lsm_info.init))				\
>  			__end_lsm_info = .;
> +#define EARLY_LSM_TABLE()	. = ALIGN(8);				\
> +			__start_early_lsm_info = .;			\
> +			KEEP(*(.early_lsm_info.init))			\
> +			__end_early_lsm_info = .;
>  #else
>  #define LSM_TABLE()
> +#define EARLY_LSM_TABLE()
>  #endif
>  
>  #define ___OF_TABLE(cfg, name)	_OF_TABLE_##cfg(name)
> @@ -610,7 +615,8 @@
>  	ACPI_PROBE_TABLE(irqchip)					\
>  	ACPI_PROBE_TABLE(timer)						\
>  	EARLYCON_TABLE()						\
> -	LSM_TABLE()
> +	LSM_TABLE()							\
> +	EARLY_LSM_TABLE()
>  
>  #define INIT_TEXT							\
>  	*(.init.text .init.text.*)					\
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index a240a3fc5fc4..66fd1eac7a32 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -2085,12 +2085,18 @@ struct lsm_info {
>  };
>  
>  extern struct lsm_info __start_lsm_info[], __end_lsm_info[];
> +extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[];
>  
>  #define DEFINE_LSM(lsm)							\
>  	static struct lsm_info __lsm_##lsm				\
>  		__used __section(.lsm_info.init)			\
>  		__aligned(sizeof(unsigned long))
>  
> +#define DEFINE_EARLY_LSM(lsm)						\
> +	static struct lsm_info __early_lsm_##lsm			\
> +		__used __section(.early_lsm_info.init)			\
> +		__aligned(sizeof(unsigned long))
> +
>  #ifdef CONFIG_SECURITY_SELINUX_DISABLE
>  /*
>   * Assuring the safety of deleting a security module is up to
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 49f2685324b0..1bb6fb2f1523 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -194,6 +194,7 @@ int unregister_lsm_notifier(struct notifier_block *nb);
>  
>  /* prototypes */
>  extern int security_init(void);
> +extern int early_security_init(void);
>  
>  /* Security operations */
>  int security_binder_set_context_mgr(struct task_struct *mgr);
> diff --git a/init/main.c b/init/main.c
> index 598e278b46f7..f3faeb89c75f 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -563,6 +563,7 @@ asmlinkage __visible void __init start_kernel(void)
>  	boot_cpu_init();
>  	page_address_init();
>  	pr_notice("%s", linux_banner);
> +	early_security_init();
>  	setup_arch(&command_line);
>  	/*
>  	 * Set up the the initial canary and entropy after arch
> diff --git a/security/security.c b/security/security.c
> index 23cbb1a295a3..2a6672c9e72f 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -37,6 +37,7 @@
>  
>  /* How many LSMs were built into the kernel? */
>  #define LSM_COUNT (__end_lsm_info - __start_lsm_info)
> +#define EARLY_LSM_COUNT (__end_early_lsm_info - __start_early_lsm_info)
>  
>  struct security_hook_heads security_hook_heads __lsm_ro_after_init;
>  static ATOMIC_NOTIFIER_HEAD(lsm_notifier_chain);
> @@ -281,6 +282,8 @@ static void __init ordered_lsm_parse(const char *order, const char *origin)
>  static void __init lsm_early_cred(struct cred *cred);
>  static void __init lsm_early_task(struct task_struct *task);
>  
> +static int lsm_append(const char *new, char **result);
> +
>  static void __init ordered_lsm_init(void)
>  {
>  	struct lsm_info **lsm;
> @@ -327,15 +330,11 @@ static void __init ordered_lsm_init(void)
>  	kfree(ordered_lsms);
>  }
>  
> -/**
> - * security_init - initializes the security framework
> - *
> - * This should be called early in the kernel initialization sequence.
> - */
> -int __init security_init(void)
> +int __init early_security_init(void)
>  {
>  	int i;
>  	struct hlist_head *list = (struct hlist_head *) &security_hook_heads;
> +	struct lsm_info *lsm;
>  
>  	pr_info("Security Framework initializing\n");

I'd rather this was kept in security_init() since it's the string to
search for when debugging normal LSM initialization.

>  
> @@ -343,6 +342,30 @@ int __init security_init(void)
>  	     i++)
>  		INIT_HLIST_HEAD(&list[i]);
>  
> +	for (lsm = __start_early_lsm_info; lsm < __end_early_lsm_info; lsm++) {
> +		if (!lsm->enabled)
> +			lsm->enabled = &lsm_enabled_true;
> +		initialize_lsm(lsm);
> +	}

This should call prepare_lsm() before initialize_lsm(). While not needed
for this specific LSM, it would be nice to at least do the blog size
calculations and keep everything the same as other LSMs.

> +
> +	return 0;
> +}
> +
> +/**
> + * security_init - initializes the security framework
> + *
> + * This should be called early in the kernel initialization sequence.
> + */
> +int __init security_init(void)
> +{
> +	struct lsm_info *lsm;
> +
> +	/* Append the names of the early LSM modules now */

I would clarify this comment more: "... that kmalloc() is available."

> +	for (lsm = __start_early_lsm_info; lsm < __end_early_lsm_info; lsm++) {
> +		if (lsm->enabled)
> +			lsm_append(lsm->name, &lsm_names);
> +	}
> +
>  	/* Load LSMs in specified order. */
>  	ordered_lsm_init();
>  
> @@ -388,7 +411,7 @@ static bool match_last_lsm(const char *list, const char *lsm)
>  	return !strcmp(last, lsm);
>  }
>  
> -static int lsm_append(char *new, char **result)
> +static int lsm_append(const char *new, char **result)
>  {
>  	char *cp;
>  
> @@ -426,8 +449,15 @@ void __init security_add_hooks(struct security_hook_list *hooks, int count,
>  		hooks[i].lsm = lsm;
>  		hlist_add_tail_rcu(&hooks[i].list, hooks[i].head);
>  	}
> -	if (lsm_append(lsm, &lsm_names) < 0)
> -		panic("%s - Cannot get early memory.\n", __func__);
> +
> +	/*
> +	 * Don't try to append during early_security_init(), we'll come back
> +	 * and fix this up afterwards.
> +	 */
> +	if (slab_is_available()) {
> +		if (lsm_append(lsm, &lsm_names) < 0)
> +			panic("%s - Cannot get early memory.\n", __func__);
> +	}
>  }
>  
>  int call_lsm_notifier(enum lsm_event event, void *data)
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH V33 02/30] security: Add a "locked down" LSM hook
From: Kees Cook @ 2019-06-21  3:23 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: jmorris, linux-security, linux-kernel, linux-api, Matthew Garrett
In-Reply-To: <20190621011941.186255-3-matthewgarrett@google.com>

On Thu, Jun 20, 2019 at 06:19:13PM -0700, Matthew Garrett wrote:
> Add a mechanism to allow LSMs to make a policy decision around whether
> kernel functionality that would allow tampering with or examining the
> runtime state of the kernel should be permitted.
> 
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> ---
>  include/linux/lsm_hooks.h |  2 ++
>  include/linux/security.h  | 11 +++++++++++
>  security/security.c       |  6 ++++++
>  3 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 66fd1eac7a32..df2aebc99838 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1790,6 +1790,7 @@ union security_list_options {
>  	int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
>  	void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
>  #endif /* CONFIG_BPF_SYSCALL */
> +	int (*locked_down)(enum lockdown_reason what);
>  };
>  
>  struct security_hook_heads {
> @@ -2027,6 +2028,7 @@ struct security_hook_heads {
>  	struct hlist_head bpf_prog_alloc_security;
>  	struct hlist_head bpf_prog_free_security;
>  #endif /* CONFIG_BPF_SYSCALL */
> +	struct hlist_head locked_down;
>  } __randomize_layout;
>  
>  /*
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 1bb6fb2f1523..b75941c811e6 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -76,6 +76,12 @@ enum lsm_event {
>  	LSM_POLICY_CHANGE,
>  };
>  
> +enum lockdown_reason {
> +	LOCKDOWN_NONE,
> +	LOCKDOWN_INTEGRITY_MAX,
> +	LOCKDOWN_CONFIDENTIALITY_MAX,
> +};
> +
>  /* These functions are in security/commoncap.c */
>  extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
>  		       int cap, unsigned int opts);
> @@ -389,6 +395,7 @@ void security_inode_invalidate_secctx(struct inode *inode);
>  int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
>  int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
>  int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
> +int security_is_locked_down(enum lockdown_reason what);

bikeshed: can this just be called "security_locked_down" without the
"is"?

-Kees

>  #else /* CONFIG_SECURITY */
>  
>  static inline int call_lsm_notifier(enum lsm_event event, void *data)
> @@ -1189,6 +1196,10 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32
>  {
>  	return -EOPNOTSUPP;
>  }
> +static inline int security_is_locked_down(enum lockdown_reason what)
> +{
> +	return 0;
> +}
>  #endif	/* CONFIG_SECURITY */
>  
>  #ifdef CONFIG_SECURITY_NETWORK
> diff --git a/security/security.c b/security/security.c
> index 2a6672c9e72f..17c17d4d8552 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -2378,3 +2378,9 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
>  	call_void_hook(bpf_prog_free_security, aux);
>  }
>  #endif /* CONFIG_BPF_SYSCALL */
> +
> +int security_is_locked_down(enum lockdown_reason what)
> +{
> +	return call_int_hook(locked_down, 0, what);
> +}
> +EXPORT_SYMBOL(security_is_locked_down);
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH V33 03/30] security: Add a static lockdown policy LSM
From: Kees Cook @ 2019-06-21  3:44 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: jmorris, linux-security, linux-kernel, linux-api, Matthew Garrett,
	David Howells
In-Reply-To: <20190621011941.186255-4-matthewgarrett@google.com>

On Thu, Jun 20, 2019 at 06:19:14PM -0700, Matthew Garrett wrote:
> While existing LSMs can be extended to handle lockdown policy,
> distributions generally want to be able to apply a straightforward
> static policy. This patch adds a simple LSM that can be configured to
> reject either integrity or all lockdown queries, and can be configured
> at runtime (through securityfs), boot time (via a kernel parameter) or
> build time (via a kconfig option). Based on initial code by David
> Howells.
> 
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> Cc: David Howells <dhowells@redhat.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |   9 +
>  include/linux/security.h                      |   4 +
>  security/Kconfig                              |   3 +-
>  security/Makefile                             |   2 +
>  security/lockdown/Kconfig                     |  46 +++++
>  security/lockdown/Makefile                    |   1 +
>  security/lockdown/lockdown.c                  | 168 ++++++++++++++++++
>  7 files changed, 232 insertions(+), 1 deletion(-)
>  create mode 100644 security/lockdown/Kconfig
>  create mode 100644 security/lockdown/Makefile
>  create mode 100644 security/lockdown/lockdown.c
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 2b8ee90bb644..fa336f6cd5bc 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2239,6 +2239,15 @@
>  	lockd.nlm_udpport=M	[NFS] Assign UDP port.
>  			Format: <integer>
>  
> +	lockdown=	[SECURITY]
> +			{ integrity | confidentiality }
> +			Enable the kernel lockdown feature. If set to
> +			integrity, kernel features that allow userland to
> +			modify the running kernel are disabled. If set to
> +			confidentiality, kernel features that allow userland
> +			to extract confidential information from the kernel
> +			are also disabled.
> +
>  	locktorture.nreaders_stress= [KNL]
>  			Set the number of locking read-acquisition kthreads.
>  			Defaults to being automatically set based on the
> diff --git a/include/linux/security.h b/include/linux/security.h
> index b75941c811e6..a86a7739ca24 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -76,6 +76,10 @@ enum lsm_event {
>  	LSM_POLICY_CHANGE,
>  };
>  
> +/*
> + *  If you add to this, remember to extend lockdown_reasons in
> + *  security/lockdown/lockdown.c.
> + */

Best to add something like:

BUILD_BUG_ON(ARRAY_SIZE(lockdown_reasons), LOCKDOWN_CONFIDENTIALLY_MAX);

to actually enforce this.

>  enum lockdown_reason {
>  	LOCKDOWN_NONE,
>  	LOCKDOWN_INTEGRITY_MAX,
> diff --git a/security/Kconfig b/security/Kconfig
> index 1d6463fb1450..c35aa72103df 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -236,12 +236,13 @@ source "security/apparmor/Kconfig"
>  source "security/loadpin/Kconfig"
>  source "security/yama/Kconfig"
>  source "security/safesetid/Kconfig"
> +source "security/lockdown/Kconfig"
>  
>  source "security/integrity/Kconfig"
>  
>  config LSM
>  	string "Ordered list of enabled LSMs"
> -	default "yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
> +	default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"

Is this needed? It seems like the early LSMs are totally ignored for
ordering?

>  	help
>  	  A comma-separated list of LSMs, in initialization order.
>  	  Any LSMs left off this list will be ignored. This can be
> diff --git a/security/Makefile b/security/Makefile
> index c598b904938f..be1dd9d2cb2f 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -11,6 +11,7 @@ subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
>  subdir-$(CONFIG_SECURITY_YAMA)		+= yama
>  subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
>  subdir-$(CONFIG_SECURITY_SAFESETID)    += safesetid
> +subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown
>  
>  # always enable default capabilities
>  obj-y					+= commoncap.o
> @@ -27,6 +28,7 @@ obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
>  obj-$(CONFIG_SECURITY_YAMA)		+= yama/
>  obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
>  obj-$(CONFIG_SECURITY_SAFESETID)       += safesetid/
> +obj-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown/
>  obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
>  
>  # Object integrity file lists
> diff --git a/security/lockdown/Kconfig b/security/lockdown/Kconfig
> new file mode 100644
> index 000000000000..431cd2b9a14e
> --- /dev/null
> +++ b/security/lockdown/Kconfig
> @@ -0,0 +1,46 @@
> +config SECURITY_LOCKDOWN_LSM
> +	bool "Basic module for enforcing kernel lockdown"
> +	depends on SECURITY
> +	help
> +	  Build support for an LSM that enforces a coarse kernel lockdown
> +	  behaviour.
> +
> +config SECURITY_LOCKDOWN_LSM_EARLY
> +        bool "Enable lockdown LSM early in init"

whitespace glitches?

> +	depends on SECURITY_LOCKDOWN_LSM
> +	help
> +	  Enable the lockdown LSM early in boot. This is necessary in order
> +	  to ensure that lockdown enforcement can be carried out on kernel
> +	  boot parameters that are otherwise parsed before the security
> +	  subsystem is fully initialised.
> +
> +choice
> +	prompt "Kernel default lockdown mode"
> +	default LOCK_DOWN_KERNEL_FORCE_NONE
> +	depends on SECURITY_LOCKDOWN_LSM
> +	help
> +	  The kernel can be configured to default to differing levels of
> +	  lockdown.
> +
> +config LOCK_DOWN_KERNEL_FORCE_NONE
> +       bool "None"
> +       help
> +          No lockdown functionality is enabled by default. Lockdown may be
> +	  enabled via the kernel commandline or /sys/kernel/security/lockdown.
> +
> +config LOCK_DOWN_KERNEL_FORCE_INTEGRITY
> +       bool "Integrity"
> +       help
> +         The kernel runs in integrity mode by default. Features that allow
> +	 the kernel to be modified at runtime are disabled.
> +
> +config LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY
> +       bool "Confidentiality"
> +       help
> +         The kernel runs in confidentiality mode by default. Features that
> +	 allow the kernel to be modified at runtime or that permit userland
> +	 code to read confidential material held inside the kernel are
> +	 disabled.
> +
> +endchoice
> +
> diff --git a/security/lockdown/Makefile b/security/lockdown/Makefile
> new file mode 100644
> index 000000000000..e3634b9017e7
> --- /dev/null
> +++ b/security/lockdown/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown.o
> diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
> new file mode 100644
> index 000000000000..1ecb2eecb245
> --- /dev/null
> +++ b/security/lockdown/lockdown.c
> @@ -0,0 +1,168 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Lock down the kernel
> + *
> + * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public Licence
> + * as published by the Free Software Foundation; either version
> + * 2 of the Licence, or (at your option) any later version.
> + */
> +
> +#include <linux/security.h>
> +#include <linux/export.h>
> +#include <linux/lsm_hooks.h>
> +
> +static enum lockdown_reason kernel_locked_down;

What's the use-case for runtime changing this value? (If you didn't, you
could make it __ro_after_init.)

> +
> +static char *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
> +	[LOCKDOWN_NONE] = "none",
> +	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
> +	[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
> +};
> +
> +static enum lockdown_reason lockdown_levels[] = {LOCKDOWN_NONE,
> +						 LOCKDOWN_INTEGRITY_MAX,
> +						 LOCKDOWN_CONFIDENTIALITY_MAX};
> +
> +/*
> + * Put the kernel into lock-down mode.
> + */
> +static int lock_kernel_down(const char *where, enum lockdown_reason level)
> +{
> +	if (kernel_locked_down >= level)
> +		return -EPERM;
> +
> +	kernel_locked_down = level;
> +	pr_notice("Kernel is locked down from %s; see man kernel_lockdown.7\n",
> +		  where);
> +	return 0;
> +}
> +
> +static int __init lockdown_param(char *level)
> +{
> +	if (!level)
> +		return -EINVAL;
> +
> +	if (strcmp(level, "integrity") == 0)
> +		lock_kernel_down("command line", LOCKDOWN_INTEGRITY_MAX);
> +	else if (strcmp(level, "confidentiality") == 0)
> +		lock_kernel_down("command line", LOCKDOWN_CONFIDENTIALITY_MAX);
> +	else
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +early_param("lockdown", lockdown_param);
> +
> +/**
> + * lockdown_is_locked_down - Find out if the kernel is locked down
> + * @what: Tag to use in notice generated if lockdown is in effect
> + */
> +static int lockdown_is_locked_down(enum lockdown_reason what)
> +{	
> +	if ((kernel_locked_down >= what) && lockdown_reasons[what])
> +		pr_notice("Lockdown: %s is restricted; see man kernel_lockdown.7\n",
> +			  lockdown_reasons[what]);
> +	return (kernel_locked_down >= what);
> +}
> +
> +static struct security_hook_list lockdown_hooks[] __lsm_ro_after_init = {
> +	LSM_HOOK_INIT(locked_down, lockdown_is_locked_down),
> +};
> +
> +static int __init lockdown_lsm_init(void)
> +{
> +#if defined(CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY)
> +	lock_kernel_down("Kernel configuration", LOCKDOWN_INTEGRITY_MAX);
> +#elif defined(CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY)
> +	lock_kernel_down("Kernel configuration", LOCKDOWN_CONFIDENTIALITY_MAX);
> +#endif
> +	security_add_hooks(lockdown_hooks, ARRAY_SIZE(lockdown_hooks),
> +			   "lockdown");
> +	return 0;
> +}
> +
> +static ssize_t lockdown_read(struct file *filp, char __user *buf, size_t count,
> +			     loff_t *ppos)
> +{
> +	char temp[80];
> +	int i, offset=0;
> +
> +	for (i = 0; i < ARRAY_SIZE(lockdown_levels); i++) {
> +		enum lockdown_reason level = lockdown_levels[i];
> +
> +		if (lockdown_reasons[level]) {
> +			const char *label = lockdown_reasons[level];
> +
> +			if (kernel_locked_down == level)
> +				offset += sprintf(temp+offset, "[%s] ", label);
> +			else
> +				offset += sprintf(temp+offset, "%s ", label);
> +		}
> +	}

I thought there were helpers for this kind of thing?

> +
> +	/* Convert the last space to a newline if needed. */
> +	if (offset > 0)
> +		temp[offset-1] = '\n';
> +
> +	return simple_read_from_buffer(buf, count, ppos, temp, strlen(temp));
> +}
> +
> +static ssize_t lockdown_write(struct file *file, const char __user *buf,
> +			      size_t n, loff_t *ppos)
> +{
> +	char *state;
> +	int i, len, err = -EINVAL;
> +
> +	state = memdup_user_nul(buf, n);
> +	if (IS_ERR(state))
> +		return PTR_ERR(state);
> +
> +	len = strlen(state);
> +	if (len && state[len-1] == '\n') {
> +		state[len-1] = '\0';
> +		len--;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(lockdown_levels); i++) {
> +		enum lockdown_reason level = lockdown_levels[i];
> +		const char *label = lockdown_reasons[level];
> +
> +		if (label && !strcmp(state, label))
> +			err = lock_kernel_down("securityfs", level);
> +	}
> +
> +	kfree(state);
> +	return err ? err : n;
> +}
> +
> +static const struct file_operations lockdown_ops = {
> +	.read  = lockdown_read,
> +	.write = lockdown_write,
> +};
> +
> +static int __init lockdown_secfs_init(void)
> +{
> +	struct dentry *dentry;
> +
> +	dentry = securityfs_create_file("lockdown", 0600, NULL, NULL,
> +					&lockdown_ops);
> +	if (IS_ERR(dentry))
> +		return PTR_ERR(dentry);
> +
> +	return 0;
> +}
> +
> +core_initcall(lockdown_secfs_init);
> +
> +#ifdef CONFIG_SECURITY_LOCKDOWN_LSM_EARLY
> +DEFINE_EARLY_LSM(lockdown) = {
> +#else
> +DEFINE_LSM(lockdown) = {
> +#endif

Ah, I see now: it *might* be an early LSM. What states are missed if not
early? Only parameters? I think the behavior differences need to be
spelled out in Kconfig (or somewhere...)

> +	.name = "lockdown",
> +	.init = lockdown_lsm_init,
> +};
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH V33 04/30] Enforce module signatures if the kernel is locked down
From: Kees Cook @ 2019-06-21  3:46 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: jmorris, linux-security, linux-kernel, linux-api, David Howells,
	Jessica Yu
In-Reply-To: <20190621011941.186255-5-matthewgarrett@google.com>

On Thu, Jun 20, 2019 at 06:19:15PM -0700, Matthew Garrett wrote:
> From: David Howells <dhowells@redhat.com>
> 
> If the kernel is locked down, require that all modules have valid
> signatures that we can verify.
> 
> I have adjusted the errors generated:
> 
>  (1) If there's no signature (ENODATA) or we can't check it (ENOPKG,
>      ENOKEY), then:
> 
>      (a) If signatures are enforced then EKEYREJECTED is returned.
> 
>      (b) If there's no signature or we can't check it, but the kernel is
> 	 locked down then EPERM is returned (this is then consistent with
> 	 other lockdown cases).
> 
>  (2) If the signature is unparseable (EBADMSG, EINVAL), the signature fails
>      the check (EKEYREJECTED) or a system error occurs (eg. ENOMEM), we
>      return the error we got.
> 
> Note that the X.509 code doesn't check for key expiry as the RTC might not
> be valid or might not have been transferred to the kernel's clock yet.
> 
>  [Modified by Matthew Garrett to remove the IMA integration. This will
>   be replaced with integration with the IMA architecture policy
>   patchset.]
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
> Cc: Jessica Yu <jeyu@kernel.org>
> ---
>  include/linux/security.h     |  1 +
>  kernel/module.c              | 39 +++++++++++++++++++++++++++++-------
>  security/lockdown/lockdown.c |  1 +
>  3 files changed, 34 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/security.h b/include/linux/security.h
> index a86a7739ca24..a7612b03b42a 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -82,6 +82,7 @@ enum lsm_event {
>   */
>  enum lockdown_reason {
>  	LOCKDOWN_NONE,
> +	LOCKDOWN_MODULE_SIGNATURE,
>  	LOCKDOWN_INTEGRITY_MAX,
>  	LOCKDOWN_CONFIDENTIALITY_MAX,
>  };
> diff --git a/kernel/module.c b/kernel/module.c
> index 0b9aa8ab89f0..780e9605ff88 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2763,8 +2763,9 @@ static inline void kmemleak_load_module(const struct module *mod,
>  #ifdef CONFIG_MODULE_SIG
>  static int module_sig_check(struct load_info *info, int flags)
>  {
> -	int err = -ENOKEY;
> +	int err = -ENODATA;
>  	const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
> +	const char *reason;
>  	const void *mod = info->hdr;
>  
>  	/*
> @@ -2779,16 +2780,40 @@ static int module_sig_check(struct load_info *info, int flags)
>  		err = mod_verify_sig(mod, info);
>  	}
>  
> -	if (!err) {
> +	switch (err) {
> +	case 0:
>  		info->sig_ok = true;
>  		return 0;
> -	}
>  
> -	/* Not having a signature is only an error if we're strict. */
> -	if (err == -ENOKEY && !is_module_sig_enforced())
> -		err = 0;
> +		/* We don't permit modules to be loaded into trusted kernels
> +		 * without a valid signature on them, but if we're not
> +		 * enforcing, certain errors are non-fatal.
> +		 */
> +	case -ENODATA:
> +		reason = "Loading of unsigned module";
> +		goto decide;
> +	case -ENOPKG:
> +		reason = "Loading of module with unsupported crypto";
> +		goto decide;
> +	case -ENOKEY:
> +		reason = "Loading of module with unavailable key";
> +	decide:
> +		if (is_module_sig_enforced()) {
> +			pr_notice("%s is rejected\n", reason);
> +			return -EKEYREJECTED;
> +		}
>  
> -	return err;
> +		if (security_is_locked_down(LOCKDOWN_MODULE_SIGNATURE))
> +			return -EPERM;

LSM hooks should return the desired error code. Here and in all the
other patches, I'd expect to see stuff like:

	ret = security_locked_down(LOCKDOWN_MODULE_SIGNATURE);
	if (ret)
		return ret;


> +		return 0;
> +
> +		/* All other errors are fatal, including nomem, unparseable
> +		 * signatures and signature check failures - even if signatures
> +		 * aren't required.
> +		 */
> +	default:
> +		return err;
> +	}
>  }
>  #else /* !CONFIG_MODULE_SIG */
>  static int module_sig_check(struct load_info *info, int flags)
> diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
> index 1ecb2eecb245..08abd7e6609b 100644
> --- a/security/lockdown/lockdown.c
> +++ b/security/lockdown/lockdown.c
> @@ -18,6 +18,7 @@ static enum lockdown_reason kernel_locked_down;
>  
>  static char *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
>  	[LOCKDOWN_NONE] = "none",
> +	[LOCKDOWN_MODULE_SIGNATURE] = "unsigned module loading",
>  	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
>  	[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
>  };
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH V33 27/30] lockdown: Print current->comm in restriction messages
From: Kees Cook @ 2019-06-21  4:09 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: jmorris, linux-security, linux-kernel, linux-api, David Howells,
	Matthew Garrett
In-Reply-To: <20190621011941.186255-28-matthewgarrett@google.com>

On Thu, Jun 20, 2019 at 06:19:38PM -0700, Matthew Garrett wrote:
> Print the content of current->comm in messages generated by lockdown to
> indicate a restriction that was hit.  This makes it a bit easier to find
> out what caused the message.
> 
> The message now patterned something like:
> 
>         Lockdown: <comm>: <what> is restricted; see man kernel_lockdown.7
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> ---
>  security/lockdown/lockdown.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
> index 14edc475d75c..408f0048f8a2 100644
> --- a/security/lockdown/lockdown.c
> +++ b/security/lockdown/lockdown.c
> @@ -80,8 +80,8 @@ early_param("lockdown", lockdown_param);
>  static int lockdown_is_locked_down(enum lockdown_reason what)
>  {	
>  	if ((kernel_locked_down >= what) && lockdown_reasons[what])
> -		pr_notice("Lockdown: %s is restricted; see man kernel_lockdown.7\n",
> -			  lockdown_reasons[what]);
> +		pr_notice("Lockdown: %s: %s is restricted; see man kernel_lockdown.7\n",
> +			  current->comm, lockdown_reasons[what]);
>  	return (kernel_locked_down >= what);
>  }

Maybe needs ratelimiting?

>  
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH V33 24/30] bpf: Restrict bpf when kernel lockdown is in confidentiality mode
From: Andy Lutomirski @ 2019-06-21  5:22 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: James Morris, linux-security, LKML, Linux API, David Howells,
	Alexei Starovoitov, Matthew Garrett, Network Development,
	Chun-Yi Lee, Daniel Borkmann
In-Reply-To: <20190621011941.186255-25-matthewgarrett@google.com>

On Thu, Jun 20, 2019 at 6:21 PM Matthew Garrett
<matthewgarrett@google.com> wrote:
>
> From: David Howells <dhowells@redhat.com>
>
> There are some bpf functions can be used to read kernel memory:
> bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
> private keys in kernel memory (e.g. the hibernation image signing key) to
> be read by an eBPF program and kernel memory to be altered without
> restriction. Disable them if the kernel has been locked down in
> confidentiality mode.

This patch exemplifies why I don't like this approach:

> @@ -97,6 +97,7 @@ enum lockdown_reason {
>         LOCKDOWN_INTEGRITY_MAX,
>         LOCKDOWN_KCORE,
>         LOCKDOWN_KPROBES,
> +       LOCKDOWN_BPF,
>         LOCKDOWN_CONFIDENTIALITY_MAX,

> --- a/security/lockdown/lockdown.c
> +++ b/security/lockdown/lockdown.c
> @@ -33,6 +33,7 @@ static char *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
>         [LOCKDOWN_INTEGRITY_MAX] = "integrity",
>         [LOCKDOWN_KCORE] = "/proc/kcore access",
>         [LOCKDOWN_KPROBES] = "use of kprobes",
> +       [LOCKDOWN_BPF] = "use of bpf",
>         [LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",

The text here says "use of bpf", but what this patch is *really* doing
is locking down use of BPF to read kernel memory.  If the details
change, then every LSM needs to get updated, and we risk breaking user
policies that are based on LSMs that offer excessively fine
granularity.

I'd be more comfortable if the LSM only got to see "confidentiality"
or "integrity".

--Andy

^ permalink raw reply

* Re: [PATCH V33 01/30] security: Support early LSMs
From: Andy Lutomirski @ 2019-06-21  5:23 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: James Morris, linux-security, LKML, Linux API, Matthew Garrett
In-Reply-To: <20190621011941.186255-2-matthewgarrett@google.com>

On Thu, Jun 20, 2019 at 6:22 PM Matthew Garrett
<matthewgarrett@google.com> wrote:
>
> The lockdown module is intended to allow for kernels to be locked down
> early in boot - sufficiently early that we don't have the ability to
> kmalloc() yet. Add support for early initialisation of some LSMs, and
> then add them to the list of names when we do full initialisation later.

I'm confused.  What does it even mean to lock down the kernel before
we're ready to run userspace code?  We can't possibly be attacked by
user code before there is any to attack us.

Am I missing something here?

--Andy

^ permalink raw reply

* Re: [PATCH V31 06/25] kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE
From: Dave Young @ 2019-06-21  6:34 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Jiri Bohac, linux-api-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	jmorris-gx6/JNMH7DfYtjvyW6yDsg, Matthew Garrett,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	luto-DgEjT+Ai2ygdnm+yROfE0A
In-Reply-To: <20190326182742.16950-7-matthewgarrett-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

On 03/26/19 at 11:27am, Matthew Garrett wrote:
> From: Jiri Bohac <jbohac-AlSwsSmVLrQ@public.gmane.org>
> 
> This is a preparatory patch for kexec_file_load() lockdown.  A locked down
> kernel needs to prevent unsigned kernel images from being loaded with
> kexec_file_load().  Currently, the only way to force the signature
> verification is compiling with KEXEC_VERIFY_SIG.  This prevents loading
> usigned images even when the kernel is not locked down at runtime.
> 
> This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE.
> Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG
> turns on the signature verification but allows unsigned images to be
> loaded.  KEXEC_SIG_FORCE disallows images without a valid signature.
> 
> [Modified by David Howells such that:
> 
>  (1) verify_pefile_signature() differentiates between no-signature and
>      sig-didn't-match in its returned errors.
> 
>  (2) kexec fails with EKEYREJECTED and logs an appropriate message if
>      signature checking is enforced and an signature is not found, uses
>      unsupported crypto or has no matching key.
> 
>  (3) kexec fails with EKEYREJECTED if there is a signature for which we
>      have a key, but signature doesn't match - even if in non-forcing mode.
> 
>  (4) kexec fails with EBADMSG or some other error if there is a signature
>      which cannot be parsed - even if in non-forcing mode.
> 
>  (5) kexec fails with ELIBBAD if the PE file cannot be parsed to extract
>      the signature - even if in non-forcing mode.
> 
> ]
> 
> Signed-off-by: Jiri Bohac <jbohac-AlSwsSmVLrQ@public.gmane.org>
> Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Matthew Garrett <mjg59-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Jiri Bohac <jbohac-AlSwsSmVLrQ@public.gmane.org>
> cc: kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> ---
>  arch/x86/Kconfig                       | 20 ++++++++---
>  crypto/asymmetric_keys/verify_pefile.c |  4 ++-
>  include/linux/kexec.h                  |  4 +--
>  kernel/kexec_file.c                    | 48 ++++++++++++++++++++++----
>  4 files changed, 61 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4b4a7f32b68e..735d04a4b18f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2016,20 +2016,30 @@ config KEXEC_FILE
>  config ARCH_HAS_KEXEC_PURGATORY
>  	def_bool KEXEC_FILE
>  
> -config KEXEC_VERIFY_SIG
> +config KEXEC_SIG
>  	bool "Verify kernel signature during kexec_file_load() syscall"
>  	depends on KEXEC_FILE
>  	---help---
> -	  This option makes kernel signature verification mandatory for
> -	  the kexec_file_load() syscall.
>  
> -	  In addition to that option, you need to enable signature
> +	  This option makes the kexec_file_load() syscall check for a valid
> +	  signature of the kernel image.  The image can still be loaded without
> +	  a valid signature unless you also enable KEXEC_SIG_FORCE, though if
> +	  there's a signature that we can check, then it must be valid.
> +
> +	  In addition to this option, you need to enable signature
>  	  verification for the corresponding kernel image type being
>  	  loaded in order for this to work.
>  
> +config KEXEC_SIG_FORCE
> +	bool "Require a valid signature in kexec_file_load() syscall"
> +	depends on KEXEC_SIG
> +	---help---
> +	  This option makes kernel signature verification mandatory for
> +	  the kexec_file_load() syscall.
> +
>  config KEXEC_BZIMAGE_VERIFY_SIG
>  	bool "Enable bzImage signature verification support"
> -	depends on KEXEC_VERIFY_SIG
> +	depends on KEXEC_SIG
>  	depends on SIGNED_PE_FILE_VERIFICATION
>  	select SYSTEM_TRUSTED_KEYRING
>  	---help---
> diff --git a/crypto/asymmetric_keys/verify_pefile.c b/crypto/asymmetric_keys/verify_pefile.c
> index d178650fd524..4473cea1e877 100644
> --- a/crypto/asymmetric_keys/verify_pefile.c
> +++ b/crypto/asymmetric_keys/verify_pefile.c
> @@ -100,7 +100,7 @@ static int pefile_parse_binary(const void *pebuf, unsigned int pelen,
>  
>  	if (!ddir->certs.virtual_address || !ddir->certs.size) {
>  		pr_debug("Unsigned PE binary\n");
> -		return -EKEYREJECTED;
> +		return -ENODATA;
>  	}
>  
>  	chkaddr(ctx->header_size, ddir->certs.virtual_address,
> @@ -408,6 +408,8 @@ static int pefile_digest_pe(const void *pebuf, unsigned int pelen,
>   *  (*) 0 if at least one signature chain intersects with the keys in the trust
>   *	keyring, or:
>   *
> + *  (*) -ENODATA if there is no signature present.
> + *
>   *  (*) -ENOPKG if a suitable crypto module couldn't be found for a check on a
>   *	chain.
>   *
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index b9b1bc5f9669..58b27c7bdc2b 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -125,7 +125,7 @@ typedef void *(kexec_load_t)(struct kimage *image, char *kernel_buf,
>  			     unsigned long cmdline_len);
>  typedef int (kexec_cleanup_t)(void *loader_data);
>  
> -#ifdef CONFIG_KEXEC_VERIFY_SIG
> +#ifdef CONFIG_KEXEC_SIG
>  typedef int (kexec_verify_sig_t)(const char *kernel_buf,
>  				 unsigned long kernel_len);
>  #endif
> @@ -134,7 +134,7 @@ struct kexec_file_ops {
>  	kexec_probe_t *probe;
>  	kexec_load_t *load;
>  	kexec_cleanup_t *cleanup;
> -#ifdef CONFIG_KEXEC_VERIFY_SIG
> +#ifdef CONFIG_KEXEC_SIG
>  	kexec_verify_sig_t *verify_sig;
>  #endif
>  };
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index f1d0e00a3971..67f3a866eabe 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -90,7 +90,7 @@ int __weak arch_kimage_file_post_load_cleanup(struct kimage *image)
>  	return kexec_image_post_load_cleanup_default(image);
>  }
>  
> -#ifdef CONFIG_KEXEC_VERIFY_SIG
> +#ifdef CONFIG_KEXEC_SIG
>  static int kexec_image_verify_sig_default(struct kimage *image, void *buf,
>  					  unsigned long buf_len)
>  {
> @@ -188,7 +188,8 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
>  			     const char __user *cmdline_ptr,
>  			     unsigned long cmdline_len, unsigned flags)
>  {
> -	int ret = 0;
> +	const char *reason;
> +	int ret;
>  	void *ldata;
>  	loff_t size;
>  
> @@ -207,15 +208,48 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
>  	if (ret)
>  		goto out;
>  
> -#ifdef CONFIG_KEXEC_VERIFY_SIG
> +#ifdef CONFIG_KEXEC_SIG
>  	ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
>  					   image->kernel_buf_len);
> -	if (ret) {
> -		pr_debug("kernel signature verification failed.\n");
> +#else
> +	ret = -ENODATA;
> +#endif
> +
> +	switch (ret) {
> +	case 0:
> +		break;
> +
> +		/* Certain verification errors are non-fatal if we're not
> +		 * checking errors, provided we aren't mandating that there
> +		 * must be a valid signature.
> +		 */
> +	case -ENODATA:
> +		reason = "kexec of unsigned image";
> +		goto decide;
> +	case -ENOPKG:
> +		reason = "kexec of image with unsupported crypto";
> +		goto decide;
> +	case -ENOKEY:
> +		reason = "kexec of image with unavailable key";
> +	decide:
> +		if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
> +			pr_notice("%s rejected\n", reason);
> +			ret = -EKEYREJECTED;

Force use -EKEYREJECTED is odd,  why not just use original "ret"?

> +			goto out;
> +		}
> +
> +		ret = 0;
> +		break;
> +
> +		/* All other errors are fatal, including nomem, unparseable
> +		 * signatures and signature check failures - even if signatures
> +		 * aren't required.
> +		 */
> +	default:
> +		pr_notice("kernel signature verification failed (%d).\n", ret);
>  		goto out;
>  	}
> -	pr_debug("kernel signature verification successful.\n");
> -#endif
> +
>  	/* It is possible that there no initramfs is being loaded */
>  	if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
>  		ret = kernel_read_file_from_fd(initrd_fd, &image->initrd_buf,
> -- 
> 2.21.0.392.gf8f6787159e-goog
> 
> 
> _______________________________________________
> kexec mailing list
> kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> http://lists.infradead.org/mailman/listinfo/kexec

Thanks
Dave

^ permalink raw reply

* Re: [PATCH V31 07/25] kexec_file: Restrict at runtime if the kernel is locked down
From: Dave Young @ 2019-06-21  6:43 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: jmorris, Jiri Bohac, linux-api, kexec, linux-kernel,
	Matthew Garrett, dhowells, linux-security-module, luto
In-Reply-To: <20190326182742.16950-8-matthewgarrett@google.com>

On 03/26/19 at 11:27am, Matthew Garrett wrote:
> From: Jiri Bohac <jbohac@suse.cz>
> 
> When KEXEC_SIG is not enabled, kernel should not load images through
> kexec_file systemcall if the kernel is locked down.
> 
> [Modified by David Howells to fit with modifications to the previous patch
>  and to return -EPERM if the kernel is locked down for consistency with
>  other lockdowns. Modified by Matthew Garrett to remove the IMA
>  integration, which will be replaced by integrating with the IMA
>  architecture policy patches.]
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> Reviewed-by: Jiri Bohac <jbohac@suse.cz>
> cc: kexec@lists.infradead.org
> ---
>  kernel/kexec_file.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 67f3a866eabe..a1cc37c8b43b 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -239,6 +239,12 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
>  		}
>  
>  		ret = 0;
> +
> +		if (kernel_is_locked_down(reason, LOCKDOWN_INTEGRITY)) {
> +			ret = -EPERM;
> +			goto out;
> +		}
> +

Checking here is late, it would be good to move the check to earlier
code around below code:
        /* We only trust the superuser with rebooting the system. */
        if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
                return -EPERM;

>  		break;
>  
>  		/* All other errors are fatal, including nomem, unparseable
> -- 
> 2.21.0.392.gf8f6787159e-goog
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Thanks
Dave

^ permalink raw reply

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
From: Arnd Bergmann @ 2019-06-21  9:37 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List, Linus Torvalds,
	Jann Horn, Kees Cook, Florian Weimer, Oleg Nesterov,
	David Howells, Andrew Morton, Adrian Reber, Linux API, linux-arch,
	the arch/x86 maintainers
In-Reply-To: <20190620221003.ciuov5fzqxrcaykp@brauner.io>

On Fri, Jun 21, 2019 at 12:10 AM Christian Brauner <christian@brauner.io> wrote:
> On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> > On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
>
> clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
> simply define __ARCH_WANT_SYS_CLONE and are done with it.
> Some however, such as nios2 and h8300 don't define it but instead
> provide a sys_clone stub of their own because of architectural
> requirements (or tweaks) and they are mostly written in assembly. (That
> should be left to arch maintainers for sys_clone3.)
>
> The build failures were on my radar already. I hadn't yet replied
> since I haven't pushed the fixup below.
> The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
> cond_syscall(clone3) so we catch all architectures that do not yet
> provide clone3 with a ENOSYS until maintainers have added it.
>
> diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
> index 7a39e77984ef..aa35aa5d68dc 100644
> --- a/arch/arm/include/asm/unistd.h
> +++ b/arch/arm/include/asm/unistd.h
> @@ -40,6 +40,7 @@
>  #define __ARCH_WANT_SYS_FORK
>  #define __ARCH_WANT_SYS_VFORK
>  #define __ARCH_WANT_SYS_CLONE
> +#define __ARCH_WANT_SYS_CLONE3

I never really liked having __ARCH_WANT_SYS_CLONE here
because it was the only one that a new architecture needed to
set: all the other __ARCH_WANT_* are for system calls that
are already superseded by newer ones, so a new architecture
would start out with an empty list.

Since __ARCH_WANT_SYS_CLONE3 replaces
__ARCH_WANT_SYS_CLONE for new architectures, how about
leaving __ARCH_WANT_SYS_CLONE untouched but instead
coming up with the reverse for clone3 and mark the architectures
that specifically don't want it (if any)?

       Arnd

^ permalink raw reply

* Re: [PATCH 02/25] vfs: Allow fsinfo() to query what's in an fs_context [ver #13]
From: Christian Brauner @ 2019-06-21  9:47 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, linux-api, linux-fsdevel, linux-kernel, mszeredi
In-Reply-To: <155905627927.1662.13276277442207649583.stgit@warthog.procyon.org.uk>

On Tue, May 28, 2019 at 04:11:19PM +0100, David Howells wrote:
> Allow fsinfo() to be used to query the filesystem attached to an fs_context
> once a superblock has been created or if it comes from fspick().
> 
> This is done with something like:
> 
> 	fd = fsopen("ext4", 0);
> 	...
> 	fsconfig(fd, fsconfig_cmd_create, ...);
> 	fsinfo(fd, NULL, ...);
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/fsinfo.c |   30 +++++++++++++++++++++++++++++-
>  fs/statfs.c |    2 +-
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> index f9a63410e9a2..14db881dd02d 100644
> --- a/fs/fsinfo.c
> +++ b/fs/fsinfo.c
> @@ -8,6 +8,7 @@
>  #include <linux/security.h>
>  #include <linux/uaccess.h>
>  #include <linux/fsinfo.h>
> +#include <linux/fs_context.h>
>  #include <uapi/linux/mount.h>
>  #include "internal.h"
>  
> @@ -315,13 +316,40 @@ static int vfs_fsinfo_path(int dfd, const char __user *filename,
>  	return ret;
>  }
>  
> +static int vfs_fsinfo_fscontext(struct fs_context *fc,
> +				struct fsinfo_kparams *params)
> +{
> +	int ret;
> +
> +	if (fc->ops == &legacy_fs_context_ops)
> +		return -EOPNOTSUPP;
> +
> +	ret = mutex_lock_interruptible(&fc->uapi_mutex);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = -EIO;
> +	if (fc->root) {
> +		struct path path = { .dentry = fc->root };
> +
> +		ret = vfs_fsinfo(&path, params);
> +	}
> +
> +	mutex_unlock(&fc->uapi_mutex);
> +	return ret;
> +}
> +
>  static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_kparams *params)
>  {
>  	struct fd f = fdget_raw(fd);

You're using fdget_raw() which means you want to allow O_PATH fds but
below you're checking whether the f_ops correspond to
fscontext_fops. If it's an O_PATH f_ops will be set to empty_fops so
you'll always end up in the vfs_fsinfo branch. Is that your intention?

That means the new mount api doesn't support fsinfo() without using a
non-O_PATH fd, right? Why the fallback then?

Christian

>  	int ret = -EBADF;
>  
>  	if (f.file) {
> -		ret = vfs_fsinfo(&f.file->f_path, params);
> +		if (f.file->f_op == &fscontext_fops)
> +			ret = vfs_fsinfo_fscontext(f.file->private_data,
> +						   params);
> +		else
> +			ret = vfs_fsinfo(&f.file->f_path, params);
>  		fdput(f);
>  	}
>  	return ret;
> diff --git a/fs/statfs.c b/fs/statfs.c
> index eea7af6f2f22..b9b63d9f4f24 100644
> --- a/fs/statfs.c
> +++ b/fs/statfs.c
> @@ -86,7 +86,7 @@ int vfs_statfs(const struct path *path, struct kstatfs *buf)
>  	int error;
>  
>  	error = statfs_by_dentry(path->dentry, buf);
> -	if (!error)
> +	if (!error && path->mnt)
>  		buf->f_flags = calculate_f_flags(path->mnt);
>  	return error;
>  }
> 

^ permalink raw reply

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
From: Christian Brauner @ 2019-06-21 11:18 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List, Linus Torvalds,
	Jann Horn, Kees Cook, Florian Weimer, Oleg Nesterov,
	David Howells, Andrew Morton, Adrian Reber, Linux API, linux-arch,
	the arch/x86 maintainers
In-Reply-To: <CAK8P3a2iV7=HkHBVL_puvCQN0DmdKEnVs2aG9MQV_8Q58JSfTA@mail.gmail.com>

On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 12:10 AM Christian Brauner <christian@brauner.io> wrote:
> > On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> > > On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
> >
> > clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
> > simply define __ARCH_WANT_SYS_CLONE and are done with it.
> > Some however, such as nios2 and h8300 don't define it but instead
> > provide a sys_clone stub of their own because of architectural
> > requirements (or tweaks) and they are mostly written in assembly. (That
> > should be left to arch maintainers for sys_clone3.)
> >
> > The build failures were on my radar already. I hadn't yet replied
> > since I haven't pushed the fixup below.
> > The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
> > cond_syscall(clone3) so we catch all architectures that do not yet
> > provide clone3 with a ENOSYS until maintainers have added it.
> >
> > diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
> > index 7a39e77984ef..aa35aa5d68dc 100644
> > --- a/arch/arm/include/asm/unistd.h
> > +++ b/arch/arm/include/asm/unistd.h
> > @@ -40,6 +40,7 @@
> >  #define __ARCH_WANT_SYS_FORK
> >  #define __ARCH_WANT_SYS_VFORK
> >  #define __ARCH_WANT_SYS_CLONE
> > +#define __ARCH_WANT_SYS_CLONE3
> 
> I never really liked having __ARCH_WANT_SYS_CLONE here
> because it was the only one that a new architecture needed to
> set: all the other __ARCH_WANT_* are for system calls that
> are already superseded by newer ones, so a new architecture
> would start out with an empty list.
> 
> Since __ARCH_WANT_SYS_CLONE3 replaces
> __ARCH_WANT_SYS_CLONE for new architectures, how about
> leaving __ARCH_WANT_SYS_CLONE untouched but instead

__ARCH_WANT_SYS_CLONE is left untouched. :)

> coming up with the reverse for clone3 and mark the architectures
> that specifically don't want it (if any)?

Afaict, your suggestion is more or less the same thing what is done
here. So I'm not sure it buys us anything apart from future
architectures not needing to set __ARCH_WANT_SYS_CLONE3.

I expect the macro above to be only here temporarily until all arches
have caught up and we're sure that they don't require assembly stubs
(cf. [1]). A decision I'd leave to the maintainers (since even for
nios2 we were kind of on the fence what exactly the sys_clone stub was
supposed to do).

But I'm happy to take a patch from you if it's equally or more simple
than this one right here.

In any case, linux-next should be fine on all arches with this fixup
now.

Christian


[1]: Architectures such as nios2 or h8300 simply take the asm-generic
     syscall definitions and generate their syscall table from it. But
     since they don't define __ARCH_WANT_SYS_CLONE the build would fail
     complaining about sys_clone3 missing. The reason this doesn't
     happen for legacy clone is that nios2 and h8300 provide assembly
     stubs for sys_clone but they don't for sys_clone3.

^ permalink raw reply

* Re: [PATCH 02/25] vfs: Allow fsinfo() to query what's in an fs_context [ver #13]
From: David Howells @ 2019-06-21 13:12 UTC (permalink / raw)
  To: Christian Brauner
  Cc: dhowells, viro, raven, linux-api, linux-fsdevel, linux-kernel,
	mszeredi
In-Reply-To: <20190621094757.zijugn6cfulmchnf@brauner.io>

Christian Brauner <christian@brauner.io> wrote:

> >  static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_kparams *params)
> >  {
> >  	struct fd f = fdget_raw(fd);
> 
> You're using fdget_raw() which means you want to allow O_PATH fds but
> below you're checking whether the f_ops correspond to
> fscontext_fops. If it's an O_PATH

It can't be.  The only way to get an fs_context fd is from fsopen() or
fspick() - neither of which allow O_PATH to be specified.

If you tried to go through /proc/pid/fd with open(O_PATH), I think you'd get
the symlink, not the target.

David

^ permalink raw reply

* Re: [PATCH 02/25] vfs: Allow fsinfo() to query what's in an fs_context [ver #13]
From: Christian Brauner @ 2019-06-21 13:16 UTC (permalink / raw)
  Cc: dhowells, viro, raven, linux-api, linux-fsdevel, linux-kernel,
	mszeredi
In-Reply-To: <21652.1561122763@warthog.procyon.org.uk>

On June 21, 2019 3:12:43 PM GMT+02:00, David Howells <dhowells@redhat.com> wrote:
>Christian Brauner <christian@brauner.io> wrote:
>
>> >  static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_kparams
>*params)
>> >  {
>> >  	struct fd f = fdget_raw(fd);
>> 
>> You're using fdget_raw() which means you want to allow O_PATH fds but
>> below you're checking whether the f_ops correspond to
>> fscontext_fops. If it's an O_PATH
>
>It can't be.  The only way to get an fs_context fd is from fsopen() or
>fspick() - neither of which allow O_PATH to be specified.
>
>If you tried to go through /proc/pid/fd with open(O_PATH), I think
>you'd get
>the symlink, not the target.

Then you should use fdget(), no? :)

Christian

^ permalink raw reply

* Re: [PATCH 02/25] vfs: Allow fsinfo() to query what's in an fs_context [ver #13]
From: Christian Brauner @ 2019-06-21 13:28 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, linux-api, linux-fsdevel, linux-kernel, mszeredi
In-Reply-To: <E76F5188-CED8-4472-9136-BDCDFDAF57F0@brauner.io>

On Fri, Jun 21, 2019 at 03:16:04PM +0200, Christian Brauner wrote:
> On June 21, 2019 3:12:43 PM GMT+02:00, David Howells <dhowells@redhat.com> wrote:
> >Christian Brauner <christian@brauner.io> wrote:
> >
> >> >  static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_kparams
> >*params)
> >> >  {
> >> >  	struct fd f = fdget_raw(fd);
> >> 
> >> You're using fdget_raw() which means you want to allow O_PATH fds but
> >> below you're checking whether the f_ops correspond to
> >> fscontext_fops. If it's an O_PATH
> >
> >It can't be.  The only way to get an fs_context fd is from fsopen() or
> >fspick() - neither of which allow O_PATH to be specified.
> >
> >If you tried to go through /proc/pid/fd with open(O_PATH), I think
> >you'd get
> >the symlink, not the target.
> 
> Then you should use fdget(), no? :)

That is unless you want fsinfo() to be useable on any fd and just fds
that are returned from the new mount-api syscalls. Maybe that wasn't
clear from my first mail.

Is the information returned for:

int fd = fsopen()/fspick();
fsinfo(fd);

int ofd = open("/", O_PATH);
fsinfo(ofd, ...);

the same if they refer to the same mount or would they differ?

Christian

^ permalink raw reply

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
From: Arnd Bergmann @ 2019-06-21 14:20 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List, Linus Torvalds,
	Jann Horn, Kees Cook, Florian Weimer, Oleg Nesterov,
	David Howells, Andrew Morton, Adrian Reber, Linux API, linux-arch,
	the arch/x86 maintainers, Ley Foon Tan,
	moderated list:NIOS2 ARCHITECTURE
In-Reply-To: <20190621111839.v5yqlws6iw7mx4aa@brauner.io>

On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <christian@brauner.io> wrote:
> On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> >
> > I never really liked having __ARCH_WANT_SYS_CLONE here
> > because it was the only one that a new architecture needed to
> > set: all the other __ARCH_WANT_* are for system calls that
> > are already superseded by newer ones, so a new architecture
> > would start out with an empty list.
> >
> > Since __ARCH_WANT_SYS_CLONE3 replaces
> > __ARCH_WANT_SYS_CLONE for new architectures, how about
> > leaving __ARCH_WANT_SYS_CLONE untouched but instead
>
> __ARCH_WANT_SYS_CLONE is left untouched. :)
>
> > coming up with the reverse for clone3 and mark the architectures
> > that specifically don't want it (if any)?
>
> Afaict, your suggestion is more or less the same thing what is done
> here. So I'm not sure it buys us anything apart from future
> architectures not needing to set __ARCH_WANT_SYS_CLONE3.
>
> I expect the macro above to be only here temporarily until all arches
> have caught up and we're sure that they don't require assembly stubs
> (cf. [1]). A decision I'd leave to the maintainers (since even for
> nios2 we were kind of on the fence what exactly the sys_clone stub was
> supposed to do).
>
> But I'm happy to take a patch from you if it's equally or more simple
> than this one right here.
>
> In any case, linux-next should be fine on all arches with this fixup
> now.

I've looked at bit more closely at the nios2 implementation, and I
believe this is purely an artifact of this file being copied over
from m68k, which also has an odd definition. The glibc side
of nios2 clone() is also odd in other ways, but that appears
to be unrelated to the kernel ABI.

I think the best option here would be to not have any special
cases and just hook up clone3() the same way on all
architectures, with no #ifdef at all. If it turns out to not work
on a particular architecture later, they can still disable the
syscall then.

      Arnd

^ permalink raw reply

* Re: [PATCH 02/25] vfs: Allow fsinfo() to query what's in an fs_context [ver #13]
From: David Howells @ 2019-06-21 14:50 UTC (permalink / raw)
  To: Christian Brauner
  Cc: dhowells, viro, raven, linux-api, linux-fsdevel, linux-kernel,
	mszeredi
In-Reply-To: <20190621132839.6ggsppexqfp5htpw@brauner.io>

Christian Brauner <christian@brauner.io> wrote:

> > >If you tried to go through /proc/pid/fd with open(O_PATH), I think
> > >you'd get
> > >the symlink, not the target.
> > 
> > Then you should use fdget(), no? :)
> 
> That is unless you want fsinfo() to be useable on any fd and just fds
> that are returned from the new mount-api syscalls. Maybe that wasn't
> clear from my first mail.

fsinfo(), as coded, is usable on any fd, as for fstat(), statx() and
fstatfs().

I have made it such that if you do this on the fd returned by fsopen() or
fspick(), the access is diverted to the filesystem that the fs_context refers
to since querying anon_inodes is of little value.

Now, it could be argued that it should require an AT_xxx flag to cause this
diversion to happen.

> Is the information returned for:
> 
> int fd = fsopen()/fspick();
> fsinfo(fd);
> 
> int ofd = open("/", O_PATH);
> fsinfo(ofd, ...);
> 
> the same if they refer to the same mount or would they differ?

At the moment it differs.  In the former case, there may not even be a
superblock attached to the fd to query, though invariants like filesystem
parameter types and names can be queried.

David

^ permalink raw reply

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
From: Christian Brauner @ 2019-06-21 15:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List, Linus Torvalds,
	Jann Horn, Kees Cook, Florian Weimer, Oleg Nesterov,
	David Howells, Andrew Morton, Adrian Reber, Linux API, linux-arch,
	the arch/x86 maintainers, Ley Foon Tan,
	moderated list:NIOS2 ARCHITECTURE
In-Reply-To: <CAK8P3a0T1=eg5ONbMFhHi=vmk1K5uogZ+5=wpsXvjVDzn6vS=Q@mail.gmail.com>

On Fri, Jun 21, 2019 at 04:20:15PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <christian@brauner.io> wrote:
> > On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> > >
> > > I never really liked having __ARCH_WANT_SYS_CLONE here
> > > because it was the only one that a new architecture needed to
> > > set: all the other __ARCH_WANT_* are for system calls that
> > > are already superseded by newer ones, so a new architecture
> > > would start out with an empty list.
> > >
> > > Since __ARCH_WANT_SYS_CLONE3 replaces
> > > __ARCH_WANT_SYS_CLONE for new architectures, how about
> > > leaving __ARCH_WANT_SYS_CLONE untouched but instead
> >
> > __ARCH_WANT_SYS_CLONE is left untouched. :)
> >
> > > coming up with the reverse for clone3 and mark the architectures
> > > that specifically don't want it (if any)?
> >
> > Afaict, your suggestion is more or less the same thing what is done
> > here. So I'm not sure it buys us anything apart from future
> > architectures not needing to set __ARCH_WANT_SYS_CLONE3.
> >
> > I expect the macro above to be only here temporarily until all arches
> > have caught up and we're sure that they don't require assembly stubs
> > (cf. [1]). A decision I'd leave to the maintainers (since even for
> > nios2 we were kind of on the fence what exactly the sys_clone stub was
> > supposed to do).
> >
> > But I'm happy to take a patch from you if it's equally or more simple
> > than this one right here.
> >
> > In any case, linux-next should be fine on all arches with this fixup
> > now.
> 
> I've looked at bit more closely at the nios2 implementation, and I
> believe this is purely an artifact of this file being copied over
> from m68k, which also has an odd definition. The glibc side
> of nios2 clone() is also odd in other ways, but that appears
> to be unrelated to the kernel ABI.
> 
> I think the best option here would be to not have any special
> cases and just hook up clone3() the same way on all
> architectures, with no #ifdef at all. If it turns out to not work
> on a particular architecture later, they can still disable the
> syscall then.

Hm, if you believe that this is fine and want to "vouch" for it by
whipping up a patch that replaces the wiring up done in [1] I'm happy to
take it. :) Otherwise I'd feel more comfortable not adding all arches at
once.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone

Christian

^ permalink raw reply

* Re: [PATCH] samples: make pidfd-metadata fail gracefully on older kernels
From: Dmitry V. Levin @ 2019-06-21 17:06 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jann Horn, Oleg Nesterov, Arnd Bergmann, linux-api, linux-kernel
In-Reply-To: <20190620111036.asi3mbcv4ax5ekrw@brauner.io>

[-- Attachment #1: Type: text/plain, Size: 3465 bytes --]

On Thu, Jun 20, 2019 at 01:10:37PM +0200, Christian Brauner wrote:
> On Thu, Jun 20, 2019 at 02:00:37PM +0300, Dmitry V. Levin wrote:
> > Cc'ed more people as the issue is not just with the example but
> > with the interface itself.
> > 
> > On Thu, Jun 20, 2019 at 12:31:06PM +0200, Christian Brauner wrote:
> > > On Thu, Jun 20, 2019 at 06:11:44AM +0300, Dmitry V. Levin wrote:
> > > > Initialize pidfd to an invalid descriptor, to fail gracefully on
> > > > those kernels that do not implement CLONE_PIDFD and leave pidfd
> > > > unchanged.
> > > > 
> > > > Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
> > > > ---
> > > >  samples/pidfd/pidfd-metadata.c | 8 ++++++--
> > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/samples/pidfd/pidfd-metadata.c b/samples/pidfd/pidfd-metadata.c
> > > > index 14b454448429..ff109fdac3a5 100644
> > > > --- a/samples/pidfd/pidfd-metadata.c
> > > > +++ b/samples/pidfd/pidfd-metadata.c
> > > > @@ -83,7 +83,7 @@ static int pidfd_metadata_fd(pid_t pid, int pidfd)
> > > >  
> > > >  int main(int argc, char *argv[])
> > > >  {
> > > > -	int pidfd = 0, ret = EXIT_FAILURE;
> > > > +	int pidfd = -1, ret = EXIT_FAILURE;
> > > 
> > > Hm, that currently won't work since we added a check in fork.c for
> > > pidfd == 0. If it isn't you'll get EINVAL.
> > 
> > Sorry, I must've missed that check.  But this makes things even worse.
> > 
> > > This was done to ensure that
> > > we can potentially extend CLONE_PIDFD by passing in flags through the
> > > return argument.
> > > However, I find this increasingly unlikely. Especially since the
> > > interface would be horrendous and an absolute last resort.
> > > If clone3() gets merged for 5.3 (currently in linux-next) we also have
> > > no real need anymore to extend legacy clone() this way. So either wait
> > > until (if) we merge clone3() where the check I mentioned is gone anyway,
> > > or remove the pidfd == 0 check from fork.c in a preliminary patch.
> > > Thoughts?
> > 
> > Userspace needs a reliable way to tell whether CLONE_PIDFD is supported
> > by the kernel or not.
> 
> Right, that's the general problem with legacy clone(): it ignores
> unknown flags... clone3() will EINVAL you if you pass any flag it
> doesn't know about.
> 
> For legacy clone you can pass
> 
> (CLONE_PIDFD | CLONE_DETACHED)
> 
> on all relevant kernels >= 2.6.2. CLONE_DETACHED will be silently
> ignored by the kernel if specified in flags. But if you specify both
> CLONE_PIDFD and CLONE_DETACHED on a kernel that does support CLONE_PIDFD
> you'll get EINVALed. (We did this because we wanted to have the ability
> to make CLONE_DETACHED reuseable with CLONE_PIDFD.)
> Does that help?

Yes, this is feasible, but the cost is extra syscall for new kernels
and more complicated userspace code, so...

> > If CLONE_PIDFD is not supported, then pidfd remains unchanged.
> > 
> > If CLONE_PIDFD is supported and fd 0 is closed, then mandatory pidfd == 0
> > also remains unchanged, which effectively means that userspace must ensure
> > that fd 0 is not closed when invoking CLONE_PIDFD.  This is ugly.
> > 
> > If we can assume that clone(CLONE_PIDFD) is not going to be extended,
> > then I'm for removing the pidfd == 0 check along with recommending
> > userspace to initialize pidfd with -1.
> 
> Right, I'm ok with that too.

... I'd prefer this variant.


-- 
ldv

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH V33 01/30] security: Support early LSMs
From: Matthew Garrett @ 2019-06-21 19:26 UTC (permalink / raw)
  To: Kees Cook
  Cc: James Morris, linux-security, Linux Kernel Mailing List,
	Linux API
In-Reply-To: <201906202010.49D16E03@keescook>

On Thu, Jun 20, 2019 at 8:22 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Jun 20, 2019 at 06:19:12PM -0700, Matthew Garrett wrote:
> > The lockdown module is intended to allow for kernels to be locked down
> > early in boot - sufficiently early that we don't have the ability to
> > kmalloc() yet. Add support for early initialisation of some LSMs, and
> > then add them to the list of names when we do full initialisation later.
>
> So, if I'm reading correctly, these "early LSMs":
>
> - start up before even boot parameter parsing has happened
> - have their position in the LSM ordering ignored
> - are initialized in boot order
> - cannot use kmalloc, as well as probably lots of other things

Accurate. I've expanded the description.

> >       pr_info("Security Framework initializing\n");
>
> I'd rather this was kept in security_init() since it's the string to
> search for when debugging normal LSM initialization.

Ok.

> >
> > @@ -343,6 +342,30 @@ int __init security_init(void)
> >            i++)
> >               INIT_HLIST_HEAD(&list[i]);
> >
> > +     for (lsm = __start_early_lsm_info; lsm < __end_early_lsm_info; lsm++) {
> > +             if (!lsm->enabled)
> > +                     lsm->enabled = &lsm_enabled_true;
> > +             initialize_lsm(lsm);
> > +     }
>
> This should call prepare_lsm() before initialize_lsm(). While not needed
> for this specific LSM, it would be nice to at least do the blog size
> calculations and keep everything the same as other LSMs.

Ok.

> > +
> > +     return 0;
> > +}
> > +
> > +/**
> > + * security_init - initializes the security framework
> > + *
> > + * This should be called early in the kernel initialization sequence.
> > + */
> > +int __init security_init(void)
> > +{
> > +     struct lsm_info *lsm;
> > +
> > +     /* Append the names of the early LSM modules now */
>
> I would clarify this comment more: "... that kmalloc() is available."

Ok,

^ permalink raw reply

* Re: [PATCH V33 01/30] security: Support early LSMs
From: Matthew Garrett @ 2019-06-21 19:27 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: James Morris, linux-security, LKML, Linux API
In-Reply-To: <CALCETrX87W4FE1xHF_W4=Do25Ci=LJxnvxNHMs9CTOFo4988aw@mail.gmail.com>

On Thu, Jun 20, 2019 at 10:23 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Thu, Jun 20, 2019 at 6:22 PM Matthew Garrett
> <matthewgarrett@google.com> wrote:
> >
> > The lockdown module is intended to allow for kernels to be locked down
> > early in boot - sufficiently early that we don't have the ability to
> > kmalloc() yet. Add support for early initialisation of some LSMs, and
> > then add them to the list of names when we do full initialisation later.
>
> I'm confused.  What does it even mean to lock down the kernel before
> we're ready to run userspace code?  We can't possibly be attacked by
> user code before there is any to attack us.

Certain kernel parameters can be disabled by lockdown, so we want to
have policy available before that parsing happens.

^ permalink raw reply

* Re: [PATCH V33 02/30] security: Add a "locked down" LSM hook
From: Matthew Garrett @ 2019-06-21 19:29 UTC (permalink / raw)
  To: Kees Cook; +Cc: James Morris, Linux Kernel Mailing List, Linux API
In-Reply-To: <201906202022.B09ED6E0@keescook>

On Thu, Jun 20, 2019 at 8:23 PM Kees Cook <keescook@chromium.org> wrote:

> bikeshed: can this just be called "security_locked_down" without the
> "is"?

Sure.

^ permalink raw reply

* Re: [PATCH V33 03/30] security: Add a static lockdown policy LSM
From: Matthew Garrett @ 2019-06-21 19:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: James Morris, linux-security, Linux Kernel Mailing List,
	Linux API, David Howells
In-Reply-To: <201906202028.5AB58C3@keescook>

On Thu, Jun 20, 2019 at 8:44 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Jun 20, 2019 at 06:19:14PM -0700, Matthew Garrett wrote:
> > +/*
> > + *  If you add to this, remember to extend lockdown_reasons in
> > + *  security/lockdown/lockdown.c.
> > + */
>
> Best to add something like:
>
> BUILD_BUG_ON(ARRAY_SIZE(lockdown_reasons), LOCKDOWN_CONFIDENTIALLY_MAX);
>
> to actually enforce this.

I don't think this will work - it'll only catch cases where someone
adds a new enum after LOCKDOWN_CONFIDENTIALITY_MAX.

> >  enum lockdown_reason {
> >       LOCKDOWN_NONE,
> >       LOCKDOWN_INTEGRITY_MAX,
> > diff --git a/security/Kconfig b/security/Kconfig
> > index 1d6463fb1450..c35aa72103df 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -236,12 +236,13 @@ source "security/apparmor/Kconfig"
> >  source "security/loadpin/Kconfig"
> >  source "security/yama/Kconfig"
> >  source "security/safesetid/Kconfig"
> > +source "security/lockdown/Kconfig"
> >
> >  source "security/integrity/Kconfig"
> >
> >  config LSM
> >       string "Ordered list of enabled LSMs"
> > -     default "yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
> > +     default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
>
> Is this needed? It seems like the early LSMs are totally ignored for
> ordering?

It's relevant if it's not configured as an early LSM.

> > +config SECURITY_LOCKDOWN_LSM
> > +     bool "Basic module for enforcing kernel lockdown"
> > +     depends on SECURITY
> > +     help
> > +       Build support for an LSM that enforces a coarse kernel lockdown
> > +       behaviour.
> > +
> > +config SECURITY_LOCKDOWN_LSM_EARLY
> > +        bool "Enable lockdown LSM early in init"
>
> whitespace glitches?

Fxied.

> > +static enum lockdown_reason kernel_locked_down;
>
> What's the use-case for runtime changing this value? (If you didn't, you
> could make it __ro_after_init.)

Cases where the admin wants to make the policy more strict after boot
via securityfs.

> > +     for (i = 0; i < ARRAY_SIZE(lockdown_levels); i++) {
> > +             enum lockdown_reason level = lockdown_levels[i];
> > +
> > +             if (lockdown_reasons[level]) {
> > +                     const char *label = lockdown_reasons[level];
> > +
> > +                     if (kernel_locked_down == level)
> > +                             offset += sprintf(temp+offset, "[%s] ", label);
> > +                     else
> > +                             offset += sprintf(temp+offset, "%s ", label);
> > +             }
> > +     }
>
> I thought there were helpers for this kind of thing?

I'll check, I'm bad at finding these new fangled things.

> Ah, I see now: it *might* be an early LSM. What states are missed if not
> early? Only parameters? I think the behavior differences need to be
> spelled out in Kconfig (or somewhere...)

Ok.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox