Linux Security Modules development
 help / color / mirror / Atom feed
* Re: [RFC PATCH v5 1/1] Add dm verity root hash pkcs7 sig validation.
From: Milan Broz @ 2019-06-27 12:17 UTC (permalink / raw)
  To: Jaskaran Khurana, linux-security-module, linux-kernel,
	linux-integrity, linux-fsdevel
  Cc: agk, snitzer, dm-devel, jmorris, scottsh, ebiggers, mpatocka
In-Reply-To: <20190619191048.20365-2-jaskarankhurana@linux.microsoft.com>

Hi,

I tried to test test the patch, two comments below.

On 19/06/2019 21:10, Jaskaran Khurana wrote:
> The verification is to support cases where the roothash is not secured by
> Trusted Boot, UEFI Secureboot or similar technologies.
> One of the use cases for this is for dm-verity volumes mounted after boot,
> the root hash provided during the creation of the dm-verity volume has to
> be secure and thus in-kernel validation implemented here will be used
> before we trust the root hash and allow the block device to be created.
> 
> The signature being provided for verification must verify the root hash and
> must be trusted by the builtin keyring for verification to succeed.
> 
> The hash is added as a key of type "user" and the description is passed to 
> the kernel so it can look it up and use it for verification.
> 
> Kernel commandline parameter will indicate whether to check (only if 
> specified) or force (for all dm verity volumes) roothash signature 
> verification.
> 
> Kernel commandline: dm_verity.verify_sig=1 or 2 for check/force root hash
> signature validation respectively.

1) I think the switch should be just boolean - enforce signatures for all dm-verity targets
(with default to false/off).

The rest should be handled by simple logic - if the root_hash_sig_key_desc option
is specified, the signature MUST be validated in the constructor, all errors should cause
failure (bad reference in keyring, bad signature, etc).

(Now it ignores for example bad reference to the keyring, this is quite misleading.)

If a user wants to activate a dm-verity device without a signature, just remove
optional argument referencing the signature.
(This is not possible with dm_verity.verify_sig set to true/on.)


2) All DM targets must provide the same mapping table status ("dmsetup table"
command) as initially configured.
The output of the command should be directly usable as mapping table constructor.

Your patch is missing that part, I tried to fix it, add-on patch is here
https://git.kernel.org/pub/scm/linux/kernel/git/mbroz/linux.git/commit/?h=dm-cryptsetup&id=a26c10806f5257e255b6a436713127e762935ad3
(feel free to fold it in your patch)


Thanks,
Milan

^ permalink raw reply

* LSM module for SGX?
From: Jarkko Sakkinen @ 2019-06-27 12:56 UTC (permalink / raw)
  To: linux-security-module, linux-sgx, x86
  Cc: casey.schaufler, jmorris, luto, sean.j.christopherson

Looking at the SGX-LSM discussions I haven't seen even a single email
that would have any conclusions that the new hooks are the only possible
route to limit the privileges to use SGX.

An obvious alternative to consider might be to have a small-scale LSM
that you could stack. AFAIK Casey's LSM stacking patch set has not yet
landed but I also remember that with some constraints you can still do
it. Casey explained these constraints to me few years ago but I can't
recall them anymore :-)

One example of this is Yama, which limits the use of ptrace(). You can
enable it together with any of the "big" LSMs in the kernel.

A major benefit in this approach would that it is non-intrusive when it
comes to other architectures than x86. New hooks are not only
maintenance burden for those who care about SGX but also for those who
have to deal with LSMs.

/Jarkko


^ permalink raw reply

* Re: linux-next: Tree for Jun 26 (security/integrity/ima/)
From: Mimi Zohar @ 2019-06-27 13:02 UTC (permalink / raw)
  To: Randy Dunlap, Stephen Rothwell, Linux Next Mailing List,
	David Howells
  Cc: Linux Kernel Mailing List, linux-integrity, Dmitry Kasatkin,
	linux-security-module
In-Reply-To: <ee503bc1-a588-81f5-47e0-1762f590662f@infradead.org>

[Cc'ing David Howells]

On Wed, 2019-06-26 at 11:35 -0700, Randy Dunlap wrote:
> On 6/26/19 6:16 AM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > The sparc64 builds are broken in this tree, sorry.
> > 
> > Changes since 20190625:
> > 
> 
> on x86_64:
> 
> 11 warnings like this one (in a randconfig build):
> 
>   CC      security/integrity/ima/ima_fs.o
> In file included from ../security/integrity/ima/ima.h:25:0,
>                  from ../security/integrity/ima/ima_fs.c:26:
> ../security/integrity/ima/../integrity.h:170:18: warning: ‘struct key_acl’ declared inside parameter list [enabled by default]
>            struct key_acl *acl)
>                   ^
> ../security/integrity/ima/../integrity.h:170:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]

David, CONFIG_INTEGRITY_SIGNATURE is dependent on KEYS being enabled,
but the stub functions are not.  There's now a dependency on
key_acl().

Mimi


^ permalink raw reply

* [PATCH v9 0/3] add init_on_alloc/init_on_free boot options
From: Alexander Potapenko @ 2019-06-27 13:03 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter, Kees Cook
  Cc: Alexander Potapenko, Masahiro Yamada, Michal Hocko, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, Marco Elver, Qian Cai, linux-mm,
	linux-security-module, kernel-hardening

Provide init_on_alloc and init_on_free boot options.

These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.

Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AU]B is initialized with zeroes.
SLOB allocator isn't supported at the moment, as its emulation of kmem
caches complicates handling of SLAB_TYPESAFE_BY_RCU caches correctly.

Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.

As suggested by Michal Hocko, right now we don't let the heap users to
disable initialization for certain allocations. There's not enough
evidence that doing so can speed up real-life cases, and introducing
ways to opt-out may result in things going out of control.

To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Qian Cai <cai@lca.pw>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com

Alexander Potapenko (2):
  mm: security: introduce init_on_alloc=1 and init_on_free=1 boot
    options
  mm: init: report memory auto-initialization features at boot time

 .../admin-guide/kernel-parameters.txt         |  9 +++
 drivers/infiniband/core/uverbs_ioctl.c        |  2 +-
 include/linux/mm.h                            | 24 +++++++
 init/main.c                                   | 24 +++++++
 mm/dmapool.c                                  |  4 +-
 mm/page_alloc.c                               | 71 +++++++++++++++++--
 mm/slab.c                                     | 16 ++++-
 mm/slab.h                                     | 20 ++++++
 mm/slub.c                                     | 41 +++++++++--
 net/core/sock.c                               |  2 +-
 security/Kconfig.hardening                    | 29 ++++++++
 11 files changed, 224 insertions(+), 18 deletions(-)
---
 v3: dropped __GFP_NO_AUTOINIT patches
 v5: dropped support for SLOB allocator, handle SLAB_TYPESAFE_BY_RCU
 v6: changed wording in boot-time message
 v7: dropped the test_meminit.c patch (picked by Andrew Morton already),
     minor wording changes
 v8: fixes for interoperability with other heap debugging features
 v9: added support for page/slab poisoning
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply

* [PATCH v9 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Alexander Potapenko @ 2019-06-27 13:03 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter, Kees Cook
  Cc: Alexander Potapenko, Masahiro Yamada, Michal Hocko, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, Marco Elver, Qian Cai, linux-mm,
	linux-security-module, kernel-hardening
In-Reply-To: <20190627130316.254309-1-glider@google.com>

The new options are needed to prevent possible information leaks and
make control-flow bugs that depend on uninitialized values more
deterministic.

This is expected to be on-by-default on Android and Chrome OS. And it
gives the opportunity for anyone else to use it under distros too via
the boot args. (The init_on_free feature is regularly requested by
folks where memory forensics is included in their threat models.)

init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes. Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.

init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion. This helps to ensure sensitive data
doesn't leak via use-after-free accesses.

Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory. The two exceptions are slab caches with
constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
zero-initialized to preserve their semantics.

Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.

If either SLUB poisoning or page poisoning is enabled, those options
take precedence over init_on_alloc and init_on_free: initialization is
only applied to unpoisoned allocations.

Slowdown for the new features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

The new features are also going to pave the way for hardware memory
tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
hooks to set the tags for heap objects. With MTE, tagging will have the
same cost as memory initialization.

Although init_on_free is rather costly, there are paranoid use-cases where
in-memory data lifetime is desired to be minimized. There are various
arguments for/against the realism of the associated threat models, but
given that we'll need the infrastructure for MTE anyway, and there are
people who want wipe-on-free behavior no matter what the performance cost,
it seems reasonable to include it in this series.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
To: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Qian Cai <cai@lca.pw>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 v2:
  - unconditionally initialize pages in kernel_init_free_pages()
  - comment from Randy Dunlap: drop 'default false' lines from Kconfig.hardening
 v3:
  - don't call kernel_init_free_pages() from memblock_free_pages()
  - adopted some Kees' comments for the patch description
 v4:
  - use NULL instead of 0 in slab_alloc_node() (found by kbuild test robot)
  - don't write to NULL object in slab_alloc_node() (found by Android
    testing)
 v5:
  - adjusted documentation wording as suggested by Kees
  - disable SLAB_POISON if auto-initialization is on
  - don't wipe RCU cache allocations made without __GFP_ZERO
  - dropped SLOB support
 v7:
  - rebase the patch, added the Acked-by: tag
 v8:
  - addressed comments by Michal Hocko: revert kernel/kexec_core.c and
    apply initialization in dma_pool_free()
  - disable init_on_alloc/init_on_free if slab poisoning or page
    poisoning are enabled, as requested by Qian Cai
  - skip the redzone when initializing a freed heap object, as requested
    by Qian Cai and Kees Cook
  - use s->offset to address the freeptr (suggested by Kees Cook)
  - updated the patch description, added Signed-off-by: tag
 v9:
  - picked up -mm fixes from Qian Cai and Andrew Morton (order of calls
    in free_pages_prepare(), export init_on_alloc)
  - exported init_on_free
  - allowed using init_on_alloc/init_on_free with SLUB poisoning and
    page poisoning. Poisoning supersedes zero-initialization, so some
    tests may behave differently with poisoning enabled.
---
 .../admin-guide/kernel-parameters.txt         |  9 +++
 drivers/infiniband/core/uverbs_ioctl.c        |  2 +-
 include/linux/mm.h                            | 24 +++++++
 mm/dmapool.c                                  |  4 +-
 mm/page_alloc.c                               | 71 +++++++++++++++++--
 mm/slab.c                                     | 16 ++++-
 mm/slab.h                                     | 20 ++++++
 mm/slub.c                                     | 41 +++++++++--
 net/core/sock.c                               |  2 +-
 security/Kconfig.hardening                    | 29 ++++++++
 10 files changed, 200 insertions(+), 18 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..84ee1121a2b9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1673,6 +1673,15 @@
 
 	initrd=		[BOOT] Specify the location of the initial ramdisk
 
+	init_on_alloc=	[MM] Fill newly allocated pages and heap objects with
+			zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
+
+	init_on_free=	[MM] Fill freed pages and heap objects with zeroes.
+			Format: 0 | 1
+			Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
+
 	init_pkru=	[x86] Specify the default memory protection keys rights
 			register contents for all processes.  0x55555554 by
 			default (disallow access to all but pkey 0).  Can
diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index 829b0c6944d8..61758201d9b2 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
 	pbundle->internal_used =
 		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
-	if (flags & __GFP_ZERO)
+	if (want_init_on_alloc(flags))
 		memset(res, 0, size);
 	return res;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dd0b5f4e1e45..81b582657854 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2696,6 +2696,30 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+static inline bool want_init_on_alloc(gfp_t flags)
+{
+	if (static_branch_unlikely(&init_on_alloc) &&
+	    !page_poisoning_enabled())
+		return true;
+	return flags & __GFP_ZERO;
+}
+
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DECLARE_STATIC_KEY_TRUE(init_on_free);
+#else
+DECLARE_STATIC_KEY_FALSE(init_on_free);
+#endif
+static inline bool want_init_on_free(void)
+{
+	return static_branch_unlikely(&init_on_free) &&
+	       !page_poisoning_enabled();
+}
+
 extern bool _debug_pagealloc_enabled;
 
 static inline bool debug_pagealloc_enabled(void)
diff --git a/mm/dmapool.c b/mm/dmapool.c
index 8c94c89a6f7e..fe5d33060415 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 #endif
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (mem_flags & __GFP_ZERO)
+	if (want_init_on_alloc(mem_flags))
 		memset(retval, 0, pool->size);
 
 	return retval;
@@ -428,6 +428,8 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
 	}
 
 	offset = vaddr - page->vaddr;
+	if (want_init_on_free())
+		memset(vaddr, 0, pool->size);
 #ifdef	DMAPOOL_DEBUG
 	if ((dma - page->dma) != offset) {
 		spin_unlock_irqrestore(&pool->lock, flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..c3123fa41bba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -136,6 +136,55 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
+#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_alloc);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_alloc);
+#endif
+EXPORT_SYMBOL(init_on_alloc);
+
+#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
+DEFINE_STATIC_KEY_TRUE(init_on_free);
+#else
+DEFINE_STATIC_KEY_FALSE(init_on_free);
+#endif
+EXPORT_SYMBOL(init_on_free);
+
+static int __init early_init_on_alloc(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result && IS_ENABLED(CONFIG_PAGE_POISONING))
+		pr_warn("mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_alloc\n");
+	if (bool_result)
+		static_branch_enable(&init_on_alloc);
+	else
+		static_branch_disable(&init_on_alloc);
+	return ret;
+}
+early_param("init_on_alloc", early_init_on_alloc);
+
+static int __init early_init_on_free(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &bool_result);
+	if (bool_result && IS_ENABLED(CONFIG_PAGE_POISONING))
+		pr_warn("mem auto-init: CONFIG_PAGE_POISONING is on, will take precedence over init_on_free\n");
+	if (bool_result)
+		static_branch_enable(&init_on_free);
+	else
+		static_branch_disable(&init_on_free);
+	return ret;
+}
+early_param("init_on_free", early_init_on_free);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
@@ -1090,6 +1139,14 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return ret;
 }
 
+static void kernel_init_free_pages(struct page *page, int numpages)
+{
+	int i;
+
+	for (i = 0; i < numpages; i++)
+		clear_highpage(page + i);
+}
+
 static __always_inline bool free_pages_prepare(struct page *page,
 					unsigned int order, bool check_free)
 {
@@ -1141,6 +1198,9 @@ static __always_inline bool free_pages_prepare(struct page *page,
 					   PAGE_SIZE << order);
 	}
 	arch_free_page(page, order);
+	if (want_init_on_free())
+		kernel_init_free_pages(page, 1 << order);
+
 	kernel_poison_pages(page, 1 << order, 0);
 	if (debug_pagealloc_enabled())
 		kernel_map_pages(page, 1 << order, 0);
@@ -2020,8 +2080,8 @@ static inline int check_new_page(struct page *page)
 
 static inline bool free_pages_prezeroed(void)
 {
-	return IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
-		page_poisoning_enabled();
+	return (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
+		page_poisoning_enabled()) || want_init_on_free();
 }
 
 #ifdef CONFIG_DEBUG_VM
@@ -2075,13 +2135,10 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags)
 {
-	int i;
-
 	post_alloc_hook(page, order, gfp_flags);
 
-	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
-		for (i = 0; i < (1 << order); i++)
-			clear_highpage(page + i);
+	if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags))
+		kernel_init_free_pages(page, 1 << order);
 
 	if (order && (gfp_flags & __GFP_COMP))
 		prep_compound_page(page, order);
diff --git a/mm/slab.c b/mm/slab.c
index f7117ad9b3a3..98a89d7c922d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1830,6 +1830,14 @@ static bool set_objfreelist_slab_cache(struct kmem_cache *cachep,
 
 	cachep->num = 0;
 
+	/*
+	 * If slab auto-initialization on free is enabled, store the freelist
+	 * off-slab, so that its contents don't end up in one of the allocated
+	 * objects.
+	 */
+	if (unlikely(slab_want_init_on_free(cachep)))
+		return false;
+
 	if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU)
 		return false;
 
@@ -3263,7 +3271,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
-	if (unlikely(flags & __GFP_ZERO) && ptr)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
 		memset(ptr, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
@@ -3320,7 +3328,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
 
-	if (unlikely(flags & __GFP_ZERO) && objp)
+	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
 		memset(objp, 0, cachep->object_size);
 
 	slab_post_alloc_hook(cachep, flags, 1, &objp);
@@ -3441,6 +3449,8 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
+	if (unlikely(slab_want_init_on_free(cachep)))
+		memset(objp, 0, cachep->object_size);
 	kmemleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, caller);
 
@@ -3528,7 +3538,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
 	/* Clear memory outside IRQ disabled section */
-	if (unlikely(flags & __GFP_ZERO))
+	if (unlikely(slab_want_init_on_alloc(flags, s)))
 		for (i = 0; i < size; i++)
 			memset(p[i], 0, s->object_size);
 
diff --git a/mm/slab.h b/mm/slab.h
index 43ac818b8592..d3f585e604bb 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -524,4 +524,24 @@ static inline int cache_random_seq_create(struct kmem_cache *cachep,
 static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
+static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_alloc)) {
+		if (c->ctor)
+			return false;
+		if (c->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON))
+			return flags & __GFP_ZERO;
+		return true;
+	}
+	return flags & __GFP_ZERO;
+}
+
+static inline bool slab_want_init_on_free(struct kmem_cache *c)
+{
+	if (static_branch_unlikely(&init_on_free))
+		return !(c->ctor ||
+			 (c->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)));
+	return false;
+}
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slub.c b/mm/slub.c
index cd04dbd2b5d0..3ccdab86f253 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1279,6 +1279,11 @@ static int __init setup_slub_debug(char *str)
 	if (*str == ',')
 		slub_debug_slabs = str + 1;
 out:
+	if ((static_branch_unlikely(&init_on_alloc) ||
+	     static_branch_unlikely(&init_on_free)) &&
+	    (slub_debug & SLAB_POISON)) {
+		pr_warn("mem auto-init: SLAB_POISON will take precedence over init_on_alloc/init_on_free\n");
+	}
 	return 1;
 }
 
@@ -1424,6 +1429,28 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s, void *x)
 static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 					   void **head, void **tail)
 {
+
+	void *object;
+	void *next = *head;
+	void *old_tail = *tail ? *tail : *head;
+	int rsize;
+
+	if (slab_want_init_on_free(s))
+		do {
+			object = next;
+			next = get_freepointer(s, object);
+			/*
+			 * Clear the object and the metadata, but don't touch
+			 * the redzone.
+			 */
+			memset(object, 0, s->object_size);
+			rsize = (s->flags & SLAB_RED_ZONE) ? s->red_left_pad
+							   : 0;
+			memset((char *)object + s->inuse, 0,
+			       s->size - s->inuse - rsize);
+			set_freepointer(s, object, next);
+		} while (object != old_tail);
+
 /*
  * Compiler cannot detect this function can be removed if slab_free_hook()
  * evaluates to nothing.  Thus, catch all relevant config debug options here.
@@ -1433,9 +1460,7 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
 	defined(CONFIG_DEBUG_OBJECTS_FREE) ||	\
 	defined(CONFIG_KASAN)
 
-	void *object;
-	void *next = *head;
-	void *old_tail = *tail ? *tail : *head;
+	next = *head;
 
 	/* Head and tail of the reconstructed freelist */
 	*head = NULL;
@@ -2741,8 +2766,14 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		prefetch_freepointer(s, next_object);
 		stat(s, ALLOC_FASTPATH);
 	}
+	/*
+	 * If the object has been wiped upon free, make sure it's fully
+	 * initialized by zeroing out freelist pointer.
+	 */
+	if (unlikely(slab_want_init_on_free(s)) && object)
+		memset(object + s->offset, 0, sizeof(void *));
 
-	if (unlikely(gfpflags & __GFP_ZERO) && object)
+	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(object, 0, s->object_size);
 
 	slab_post_alloc_hook(s, gfpflags, 1, &object);
@@ -3163,7 +3194,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	local_irq_enable();
 
 	/* Clear memory outside IRQ disabled fastpath loop */
-	if (unlikely(flags & __GFP_ZERO)) {
+	if (unlikely(slab_want_init_on_alloc(flags, s))) {
 		int j;
 
 		for (j = 0; j < i; j++)
diff --git a/net/core/sock.c b/net/core/sock.c
index af09a23e4822..425e97f693ce 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1596,7 +1596,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
 		if (!sk)
 			return sk;
-		if (priority & __GFP_ZERO)
+		if (want_init_on_alloc(priority))
 			sk_prot_clear_nulls(sk, prot->obj_size);
 	} else
 		sk = kmalloc(prot->obj_size, priority);
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index c6cb2d9b2905..a1ffe2eb4d5f 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -160,6 +160,35 @@ config STACKLEAK_RUNTIME_DISABLE
 	  runtime to control kernel stack erasing for kernels built with
 	  CONFIG_GCC_PLUGIN_STACKLEAK.
 
+config INIT_ON_ALLOC_DEFAULT_ON
+	bool "Enable heap memory zeroing on allocation by default"
+	help
+	  This has the effect of setting "init_on_alloc=1" on the kernel
+	  command line. This can be disabled with "init_on_alloc=0".
+	  When "init_on_alloc" is enabled, all page allocator and slab
+	  allocator memory will be zeroed when allocated, eliminating
+	  many kinds of "uninitialized heap memory" flaws, especially
+	  heap content exposures. The performance impact varies by
+	  workload, but most cases see <1% impact. Some synthetic
+	  workloads have measured as high as 7%.
+
+config INIT_ON_FREE_DEFAULT_ON
+	bool "Enable heap memory zeroing on free by default"
+	help
+	  This has the effect of setting "init_on_free=1" on the kernel
+	  command line. This can be disabled with "init_on_free=0".
+	  Similar to "init_on_alloc", when "init_on_free" is enabled,
+	  all page allocator and slab allocator memory will be zeroed
+	  when freed, eliminating many kinds of "uninitialized heap memory"
+	  flaws, especially heap content exposures. The primary difference
+	  with "init_on_free" is that data lifetime in memory is reduced,
+	  as anything freed is wiped immediately, making live forensics or
+	  cold boot memory attacks unable to recover freed memory contents.
+	  The performance impact varies by workload, but is more expensive
+	  than "init_on_alloc" due to the negative cache effects of
+	  touching "cold" memory areas. Most cases see 3-5% impact. Some
+	  synthetic workloads have measured as high as 8%.
+
 endmenu
 
 endmenu
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related

* [PATCH v9 2/2] mm: init: report memory auto-initialization features at boot time
From: Alexander Potapenko @ 2019-06-27 13:03 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter
  Cc: Alexander Potapenko, Kees Cook, Dmitry Vyukov, James Morris,
	Jann Horn, Kostya Serebryany, Laura Abbott, Mark Rutland,
	Masahiro Yamada, Matthew Wilcox, Nick Desaulniers, Randy Dunlap,
	Sandeep Patil, Serge E. Hallyn, Souptick Joarder, Marco Elver,
	Kaiwan N Billimoria, kernel-hardening, linux-mm,
	linux-security-module
In-Reply-To: <20190627130316.254309-1-glider@google.com>

Print the currently enabled stack and heap initialization modes.

Stack initialization is enabled by a config flag, while heap
initialization is configured at boot time with defaults being set
in the config. It's more convenient for the user to have all information
about these hardening measures in one place at boot, so the user can
reason about the expected behavior of the running system.

The possible options for stack are:
 - "all" for CONFIG_INIT_STACK_ALL;
 - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL;
 - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF;
 - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER;
 - "off" otherwise.

Depending on the values of init_on_alloc and init_on_free boottime
options we also report "heap alloc" and "heap free" as "on"/"off".

In the init_on_free mode initializing pages at boot time may take a
while, so print a notice about that as well. This depends on how much
memory is installed, the memory bandwidth, etc.
On a relatively modern x86 system, it takes about 0.75s/GB to wipe all
memory:

  [    0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on
  [    0.419765] mem auto-init: clearing system memory may take some time...
  [   12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved)

Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Kaiwan N Billimoria <kaiwan@kaiwantech.com>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

---
 v6:
 - update patch description, fixed message about clearing memory
 v7:
 - rebase the patch, add the Acked-by: tag;
 - more description updates as suggested by Kees;
 - make report_meminit() static.
 v8:
 - added the Signed-off-by: tag
---
 init/main.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/init/main.c b/init/main.c
index 66a196c5e4c3..ff5803b0841c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -520,6 +520,29 @@ static inline void initcall_debug_enable(void)
 }
 #endif
 
+/* Report memory auto-initialization states for this boot. */
+static void __init report_meminit(void)
+{
+	const char *stack;
+
+	if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
+		stack = "all";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
+		stack = "byref_all";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
+		stack = "byref";
+	else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
+		stack = "__user";
+	else
+		stack = "off";
+
+	pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
+		stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
+		want_init_on_free() ? "on" : "off");
+	if (want_init_on_free())
+		pr_info("mem auto-init: clearing system memory may take some time...\n");
+}
+
 /*
  * Set up kernel memory allocators
  */
@@ -530,6 +553,7 @@ static void __init mm_init(void)
 	 * bigger than MAX_ORDER unless SPARSEMEM.
 	 */
 	page_ext_init_flatmem();
+	report_meminit();
 	mem_init();
 	kmem_cache_init();
 	pgtable_init();
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related

* [PATCH] apparmor: fix unsigned len comparison with less than zero
From: Colin King @ 2019-06-27 13:09 UTC (permalink / raw)
  To: John Johansen, James Morris, Serge E . Hallyn,
	linux-security-module
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The sanity check in macro update_for_len checks to see if len
is less than zero, however, len is a size_t so it can never be
less than zero, so this sanity check is a no-op.  Fix this by
making len a ssize_t so the comparison will work and add ulen
that is a size_t copy of len so that the min() macro won't
throw warnings about comparing different types.

Addresses-Coverity: ("Macro compares unsigned to 0")
Fixes: f1bd904175e8 ("apparmor: add the base fns() for domain labels")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 security/apparmor/label.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/security/apparmor/label.c b/security/apparmor/label.c
index 59f1cc2557a7..42e849ccb4b5 100644
--- a/security/apparmor/label.c
+++ b/security/apparmor/label.c
@@ -1458,11 +1458,13 @@ static inline bool use_label_hname(struct aa_ns *ns, struct aa_label *label,
 /* helper macro for snprint routines */
 #define update_for_len(total, len, size, str)	\
 do {					\
+	size_t ulen = len;		\
+					\
 	AA_BUG(len < 0);		\
-	total += len;			\
-	len = min(len, size);		\
-	size -= len;			\
-	str += len;			\
+	total += ulen;			\
+	len = min(ulen, size);		\
+	size -= ulen;			\
+	str += ulen;			\
 } while (0)
 
 /**
@@ -1597,7 +1599,7 @@ int aa_label_snxprint(char *str, size_t size, struct aa_ns *ns,
 	struct aa_ns *prev_ns = NULL;
 	struct label_it i;
 	int count = 0, total = 0;
-	size_t len;
+	ssize_t len;
 
 	AA_BUG(!str && size != 0);
 	AA_BUG(!label);
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v8 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Alexander Potapenko @ 2019-06-27 13:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, Qian Cai,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening
In-Reply-To: <20190626154237.GZ17798@dhcp22.suse.cz>

On Wed, Jun 26, 2019 at 5:42 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 26-06-19 17:00:43, Alexander Potapenko wrote:
> > On Wed, Jun 26, 2019 at 4:49 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > > @@ -1142,6 +1200,8 @@ static __always_inline bool free_pages_prepare(struct page *page,
> > > >       }
> > > >       arch_free_page(page, order);
> > > >       kernel_poison_pages(page, 1 << order, 0);
> > > > +     if (want_init_on_free())
> > > > +             kernel_init_free_pages(page, 1 << order);
> > >
> > > same here. If you don't want to make this exclusive then you have to
> > > zero before poisoning otherwise you are going to blow up on the poison
> > > check, right?
> > Note that we disable initialization if page poisoning is on.
>
> Ohh, right. Missed that in the init code.
>
> > As I mentioned on another thread we can eventually merge this code
> > with page poisoning, but right now it's better to make the user decide
> > which of the features they want instead of letting them guess how the
> > combination of the two is going to work.
>
> Strictly speaking zeroying is a subset of poisoning. If somebody asks
> for both the poisoning surely satisfies any data leak guarantees
> zeroying would give. So I am not sure we have to really make them
> exclusive wrt. to the configuraion. I will leave that to you but it
> would be better if the code didn't break subtly once the early init
> restriction is removed for one way or another. So either always make
> sure that zeroying is done _before_ poisoning or that you do not zero
> when poisoning. The later sounds the best wrt. the code quality from my
> POV.
I somewhat liked the idea of always having zero-initialized page/heap
memory if init_on_{alloc,free} is on.
But in production mode we won't have page or slab poisoning anyway,
and for debugging this doesn't really matter much.
I've sent v9 with poisoning support added.
> --
> Michal Hocko
> SUSE Labs



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply

* Re: [PATCH v8 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Alexander Potapenko @ 2019-06-27 13:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Kees Cook, Masahiro Yamada, Michal Hocko,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, Qian Cai,
	Linux Memory Management List, linux-security-module,
	Kernel Hardening
In-Reply-To: <20190626162835.0947684d36ef01639f969232@linux-foundation.org>

On Thu, Jun 27, 2019 at 1:28 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 26 Jun 2019 14:19:42 +0200 Alexander Potapenko <glider@google.com> wrote:
>
> >  v8:
> >   - addressed comments by Michal Hocko: revert kernel/kexec_core.c and
> >     apply initialization in dma_pool_free()
> >   - disable init_on_alloc/init_on_free if slab poisoning or page
> >     poisoning are enabled, as requested by Qian Cai
> >   - skip the redzone when initializing a freed heap object, as requested
> >     by Qian Cai and Kees Cook
> >   - use s->offset to address the freeptr (suggested by Kees Cook)
> >   - updated the patch description, added Signed-off-by: tag
>
> v8 failed to incorporate
>
> https://ozlabs.org/~akpm/mmots/broken-out/mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options-fix.patch
> and
> https://ozlabs.org/~akpm/mmots/broken-out/mm-security-introduce-init_on_alloc=1-and-init_on_free=1-boot-options-fix-2.patch
>
> it's conventional to incorporate such fixes when preparing a new
> version of a patch.
>
v9 contains these patches (I've also exported init_on_free), so should
now be fine to drop them.

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply

* Re: [PATCH v9 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Qian Cai @ 2019-06-27 13:25 UTC (permalink / raw)
  To: Alexander Potapenko, Andrew Morton, Christoph Lameter, Kees Cook
  Cc: Masahiro Yamada, Michal Hocko, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov, Sandeep Patil,
	Laura Abbott, Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	linux-mm, linux-security-module, kernel-hardening
In-Reply-To: <20190627130316.254309-2-glider@google.com>

On Thu, 2019-06-27 at 15:03 +0200, Alexander Potapenko wrote:
> The new options are needed to prevent possible information leaks and
> make control-flow bugs that depend on uninitialized values more
> deterministic.
> 
> This is expected to be on-by-default on Android and Chrome OS. And it
> gives the opportunity for anyone else to use it under distros too via
> the boot args. (The init_on_free feature is regularly requested by
> folks where memory forensics is included in their threat models.)
> 
> init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> objects with zeroes. Initialization is done at allocation time at the
> places where checks for __GFP_ZERO are performed.
> 
> init_on_free=1 makes the kernel initialize freed pages and heap objects
> with zeroes upon their deletion. This helps to ensure sensitive data
> doesn't leak via use-after-free accesses.
> 
> Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
> returns zeroed memory. The two exceptions are slab caches with
> constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
> zero-initialized to preserve their semantics.
> 
> Both init_on_alloc and init_on_free default to zero, but those defaults
> can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
> CONFIG_INIT_ON_FREE_DEFAULT_ON.
> 
> If either SLUB poisoning or page poisoning is enabled, those options
> take precedence over init_on_alloc and init_on_free: initialization is
> only applied to unpoisoned allocations.
> 
> Slowdown for the new features compared to init_on_free=0,
> init_on_alloc=0:
> 
> hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
> 
> Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
> 
> The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> baseline is within the standard error.
> 
> The new features are also going to pave the way for hardware memory
> tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
> hooks to set the tags for heap objects. With MTE, tagging will have the
> same cost as memory initialization.
> 
> Although init_on_free is rather costly, there are paranoid use-cases where
> in-memory data lifetime is desired to be minimized. There are various
> arguments for/against the realism of the associated threat models, but
> given that we'll need the infrastructure for MTE anyway, and there are
> people who want wipe-on-free behavior no matter what the performance cost,
> it seems reasonable to include it in this series.
> 
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Acked-by: Kees Cook <keescook@chromium.org>
> To: Andrew Morton <akpm@linux-foundation.org>
> To: Christoph Lameter <cl@linux.com>
> To: Kees Cook <keescook@chromium.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Marco Elver <elver@google.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>  v2:
>   - unconditionally initialize pages in kernel_init_free_pages()
>   - comment from Randy Dunlap: drop 'default false' lines from
> Kconfig.hardening
>  v3:
>   - don't call kernel_init_free_pages() from memblock_free_pages()
>   - adopted some Kees' comments for the patch description
>  v4:
>   - use NULL instead of 0 in slab_alloc_node() (found by kbuild test robot)
>   - don't write to NULL object in slab_alloc_node() (found by Android
>     testing)
>  v5:
>   - adjusted documentation wording as suggested by Kees
>   - disable SLAB_POISON if auto-initialization is on
>   - don't wipe RCU cache allocations made without __GFP_ZERO
>   - dropped SLOB support
>  v7:
>   - rebase the patch, added the Acked-by: tag
>  v8:
>   - addressed comments by Michal Hocko: revert kernel/kexec_core.c and
>     apply initialization in dma_pool_free()
>   - disable init_on_alloc/init_on_free if slab poisoning or page
>     poisoning are enabled, as requested by Qian Cai
>   - skip the redzone when initializing a freed heap object, as requested
>     by Qian Cai and Kees Cook
>   - use s->offset to address the freeptr (suggested by Kees Cook)
>   - updated the patch description, added Signed-off-by: tag
>  v9:
>   - picked up -mm fixes from Qian Cai and Andrew Morton (order of calls
>     in free_pages_prepare(), export init_on_alloc)
>   - exported init_on_free
>   - allowed using init_on_alloc/init_on_free with SLUB poisoning and
>     page poisoning. Poisoning supersedes zero-initialization, so some
>     tests may behave differently with poisoning enabled.
> ---
>  .../admin-guide/kernel-parameters.txt         |  9 +++
>  drivers/infiniband/core/uverbs_ioctl.c        |  2 +-
>  include/linux/mm.h                            | 24 +++++++
>  mm/dmapool.c                                  |  4 +-
>  mm/page_alloc.c                               | 71 +++++++++++++++++--
>  mm/slab.c                                     | 16 ++++-
>  mm/slab.h                                     | 20 ++++++
>  mm/slub.c                                     | 41 +++++++++--
>  net/core/sock.c                               |  2 +-
>  security/Kconfig.hardening                    | 29 ++++++++
>  10 files changed, 200 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> b/Documentation/admin-guide/kernel-parameters.txt
> index 138f6664b2e2..84ee1121a2b9 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1673,6 +1673,15 @@
>  
>  	initrd=		[BOOT] Specify the location of the initial
> ramdisk
>  
> +	init_on_alloc=	[MM] Fill newly allocated pages and heap
> objects with
> +			zeroes.
> +			Format: 0 | 1
> +			Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
> +
> +	init_on_free=	[MM] Fill freed pages and heap objects with
> zeroes.
> +			Format: 0 | 1
> +			Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
> +
>  	init_pkru=	[x86] Specify the default memory protection keys
> rights
>  			register contents for all processes.  0x55555554 by
>  			default (disallow access to all but pkey 0).  Can
> diff --git a/drivers/infiniband/core/uverbs_ioctl.c
> b/drivers/infiniband/core/uverbs_ioctl.c
> index 829b0c6944d8..61758201d9b2 100644
> --- a/drivers/infiniband/core/uverbs_ioctl.c
> +++ b/drivers/infiniband/core/uverbs_ioctl.c
> @@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle
> *bundle, size_t size,
>  	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
>  	pbundle->internal_used =
>  		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
> -	if (flags & __GFP_ZERO)
> +	if (want_init_on_alloc(flags))
>  		memset(res, 0, size);
>  	return res;
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index dd0b5f4e1e45..81b582657854 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2696,6 +2696,30 @@ static inline void kernel_poison_pages(struct page
> *page, int numpages,
>  					int enable) { }
>  #endif
>  
> +#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> +DECLARE_STATIC_KEY_TRUE(init_on_alloc);
> +#else
> +DECLARE_STATIC_KEY_FALSE(init_on_alloc);
> +#endif
> +static inline bool want_init_on_alloc(gfp_t flags)
> +{
> +	if (static_branch_unlikely(&init_on_alloc) &&
> +	    !page_poisoning_enabled())
> +		return true;
> +	return flags & __GFP_ZERO;
> +}
> +
> +#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
> +DECLARE_STATIC_KEY_TRUE(init_on_free);
> +#else
> +DECLARE_STATIC_KEY_FALSE(init_on_free);
> +#endif
> +static inline bool want_init_on_free(void)
> +{
> +	return static_branch_unlikely(&init_on_free) &&
> +	       !page_poisoning_enabled();
> +}
> +
>  extern bool _debug_pagealloc_enabled;
>  
>  static inline bool debug_pagealloc_enabled(void)
> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index 8c94c89a6f7e..fe5d33060415 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t
> mem_flags,
>  #endif
>  	spin_unlock_irqrestore(&pool->lock, flags);
>  
> -	if (mem_flags & __GFP_ZERO)
> +	if (want_init_on_alloc(mem_flags))
>  		memset(retval, 0, pool->size);
>  
>  	return retval;
> @@ -428,6 +428,8 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr,
> dma_addr_t dma)
>  	}
>  
>  	offset = vaddr - page->vaddr;
> +	if (want_init_on_free())
> +		memset(vaddr, 0, pool->size);
>  #ifdef	DMAPOOL_DEBUG
>  	if ((dma - page->dma) != offset) {
>  		spin_unlock_irqrestore(&pool->lock, flags);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8abe0af..c3123fa41bba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -136,6 +136,55 @@ unsigned long totalcma_pages __read_mostly;
>  
>  int percpu_pagelist_fraction;
>  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> +#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> +DEFINE_STATIC_KEY_TRUE(init_on_alloc);
> +#else
> +DEFINE_STATIC_KEY_FALSE(init_on_alloc);
> +#endif
> +EXPORT_SYMBOL(init_on_alloc);
> +
> +#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
> +DEFINE_STATIC_KEY_TRUE(init_on_free);
> +#else
> +DEFINE_STATIC_KEY_FALSE(init_on_free);
> +#endif
> +EXPORT_SYMBOL(init_on_free);
> +
> +static int __init early_init_on_alloc(char *buf)
> +{
> +	int ret;
> +	bool bool_result;
> +
> +	if (!buf)
> +		return -EINVAL;
> +	ret = kstrtobool(buf, &bool_result);
> +	if (bool_result && IS_ENABLED(CONFIG_PAGE_POISONING))
> +		pr_warn("mem auto-init: CONFIG_PAGE_POISONING is on, will
> take precedence over init_on_alloc\n");

I don't like the warning here. It makes people think it is bug that need to be
fixed, but actually it is just information. People could enable both in a debug
kernel.

> +	if (bool_result)
> +		static_branch_enable(&init_on_alloc);
> +	else
> +		static_branch_disable(&init_on_alloc);
> +	return ret;
> +}
> +early_param("init_on_alloc", early_init_on_alloc);
> +
> +static int __init early_init_on_free(char *buf)
> +{
> +	int ret;
> +	bool bool_result;
> +
> +	if (!buf)
> +		return -EINVAL;
> +	ret = kstrtobool(buf, &bool_result);
> +	if (bool_result && IS_ENABLED(CONFIG_PAGE_POISONING))
> +		pr_warn("mem auto-init: CONFIG_PAGE_POISONING is on, will
> take precedence over init_on_free\n");

Ditto.

> +	if (bool_result)
> +		static_branch_enable(&init_on_free);
> +	else
> +		static_branch_disable(&init_on_free);
> +	return ret;
> +}
> +early_param("init_on_free", early_init_on_free);
>  
>  /*
>   * A cached value of the page's pageblock's migratetype, used when the page
> is
> @@ -1090,6 +1139,14 @@ static int free_tail_pages_check(struct page
> *head_page, struct page *page)
>  	return ret;
>  }
>  
> +static void kernel_init_free_pages(struct page *page, int numpages)
> +{
> +	int i;
> +
> +	for (i = 0; i < numpages; i++)
> +		clear_highpage(page + i);
> +}
> +
>  static __always_inline bool free_pages_prepare(struct page *page,
>  					unsigned int order, bool check_free)
>  {
> @@ -1141,6 +1198,9 @@ static __always_inline bool free_pages_prepare(struct
> page *page,
>  					   PAGE_SIZE << order);
>  	}
>  	arch_free_page(page, order);
> +	if (want_init_on_free())
> +		kernel_init_free_pages(page, 1 << order);
> +
>  	kernel_poison_pages(page, 1 << order, 0);
>  	if (debug_pagealloc_enabled())
>  		kernel_map_pages(page, 1 << order, 0);
> @@ -2020,8 +2080,8 @@ static inline int check_new_page(struct page *page)
>  
>  static inline bool free_pages_prezeroed(void)
>  {
> -	return IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
> -		page_poisoning_enabled();
> +	return (IS_ENABLED(CONFIG_PAGE_POISONING_ZERO) &&
> +		page_poisoning_enabled()) || want_init_on_free();
>  }
>  
>  #ifdef CONFIG_DEBUG_VM
> @@ -2075,13 +2135,10 @@ inline void post_alloc_hook(struct page *page,
> unsigned int order,
>  static void prep_new_page(struct page *page, unsigned int order, gfp_t
> gfp_flags,
>  							unsigned int
> alloc_flags)
>  {
> -	int i;
> -
>  	post_alloc_hook(page, order, gfp_flags);
>  
> -	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
> -		for (i = 0; i < (1 << order); i++)
> -			clear_highpage(page + i);
> +	if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags))
> +		kernel_init_free_pages(page, 1 << order);
>  
>  	if (order && (gfp_flags & __GFP_COMP))
>  		prep_compound_page(page, order);
> diff --git a/mm/slab.c b/mm/slab.c
> index f7117ad9b3a3..98a89d7c922d 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1830,6 +1830,14 @@ static bool set_objfreelist_slab_cache(struct
> kmem_cache *cachep,
>  
>  	cachep->num = 0;
>  
> +	/*
> +	 * If slab auto-initialization on free is enabled, store the freelist
> +	 * off-slab, so that its contents don't end up in one of the
> allocated
> +	 * objects.
> +	 */
> +	if (unlikely(slab_want_init_on_free(cachep)))
> +		return false;
> +
>  	if (cachep->ctor || flags & SLAB_TYPESAFE_BY_RCU)
>  		return false;
>  
> @@ -3263,7 +3271,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags,
> int nodeid,
>  	local_irq_restore(save_flags);
>  	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
>  
> -	if (unlikely(flags & __GFP_ZERO) && ptr)
> +	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
>  		memset(ptr, 0, cachep->object_size);
>  
>  	slab_post_alloc_hook(cachep, flags, 1, &ptr);
> @@ -3320,7 +3328,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags,
> unsigned long caller)
>  	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
>  	prefetchw(objp);
>  
> -	if (unlikely(flags & __GFP_ZERO) && objp)
> +	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
>  		memset(objp, 0, cachep->object_size);
>  
>  	slab_post_alloc_hook(cachep, flags, 1, &objp);
> @@ -3441,6 +3449,8 @@ void ___cache_free(struct kmem_cache *cachep, void
> *objp,
>  	struct array_cache *ac = cpu_cache_get(cachep);
>  
>  	check_irq_off();
> +	if (unlikely(slab_want_init_on_free(cachep)))
> +		memset(objp, 0, cachep->object_size);
>  	kmemleak_free_recursive(objp, cachep->flags);
>  	objp = cache_free_debugcheck(cachep, objp, caller);
>  
> @@ -3528,7 +3538,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t
> flags, size_t size,
>  	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
>  
>  	/* Clear memory outside IRQ disabled section */
> -	if (unlikely(flags & __GFP_ZERO))
> +	if (unlikely(slab_want_init_on_alloc(flags, s)))
>  		for (i = 0; i < size; i++)
>  			memset(p[i], 0, s->object_size);
>  
> diff --git a/mm/slab.h b/mm/slab.h
> index 43ac818b8592..d3f585e604bb 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -524,4 +524,24 @@ static inline int cache_random_seq_create(struct
> kmem_cache *cachep,
>  static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
>  #endif /* CONFIG_SLAB_FREELIST_RANDOM */
>  
> +static inline bool slab_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
> +{
> +	if (static_branch_unlikely(&init_on_alloc)) {
> +		if (c->ctor)
> +			return false;
> +		if (c->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON))
> +			return flags & __GFP_ZERO;
> +		return true;
> +	}
> +	return flags & __GFP_ZERO;
> +}
> +
> +static inline bool slab_want_init_on_free(struct kmem_cache *c)
> +{
> +	if (static_branch_unlikely(&init_on_free))
> +		return !(c->ctor ||
> +			 (c->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)));
> +	return false;
> +}
> +
>  #endif /* MM_SLAB_H */
> diff --git a/mm/slub.c b/mm/slub.c
> index cd04dbd2b5d0..3ccdab86f253 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1279,6 +1279,11 @@ static int __init setup_slub_debug(char *str)
>  	if (*str == ',')
>  		slub_debug_slabs = str + 1;
>  out:
> +	if ((static_branch_unlikely(&init_on_alloc) ||
> +	     static_branch_unlikely(&init_on_free)) &&
> +	    (slub_debug & SLAB_POISON)) {
> +		pr_warn("mem auto-init: SLAB_POISON will take precedence over
> init_on_alloc/init_on_free\n");
> +	}

Ditto

^ permalink raw reply

* Re: linux-next: Tree for Jun 26 (security/integrity/ima/)
From: David Howells @ 2019-06-27 13:29 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: dhowells, Randy Dunlap, Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, linux-integrity, Dmitry Kasatkin,
	linux-security-module
In-Reply-To: <1561640534.4101.124.camel@linux.ibm.com>

Mimi Zohar <zohar@linux.ibm.com> wrote:

> >   CC      security/integrity/ima/ima_fs.o
> > In file included from ../security/integrity/ima/ima.h:25:0,
> >                  from ../security/integrity/ima/ima_fs.c:26:
> > ../security/integrity/ima/../integrity.h:170:18: warning: ‘struct key_acl’ declared inside parameter list [enabled by default]
> >            struct key_acl *acl)
> >                   ^
> > ../security/integrity/ima/../integrity.h:170:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
> 
> David, CONFIG_INTEGRITY_SIGNATURE is dependent on KEYS being enabled,
> but the stub functions are not.  There's now a dependency on
> key_acl().

I added a forward declaration for struct key_acl into
security/integrity/integrity.h as you can see here:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/diff/security/integrity/integrity.h?h=keys-acl&id=75ce113a1d56880e5abd37fa664ea9af399d2bcd

which might not have made it into linux-next before you used it.

David

^ permalink raw reply

* Re: LSM module for SGX?
From: Stephen Smalley @ 2019-06-27 13:41 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-security-module, linux-sgx, x86
  Cc: casey.schaufler, jmorris, luto, sean.j.christopherson
In-Reply-To: <320da9183c7e9ae2c63982d5e124054a615c4b99.camel@linux.intel.com>

On 6/27/19 8:56 AM, Jarkko Sakkinen wrote:
> Looking at the SGX-LSM discussions I haven't seen even a single email
> that would have any conclusions that the new hooks are the only possible
> route to limit the privileges to use SGX.
> 
> An obvious alternative to consider might be to have a small-scale LSM
> that you could stack. AFAIK Casey's LSM stacking patch set has not yet
> landed but I also remember that with some constraints you can still do
> it. Casey explained these constraints to me few years ago but I can't
> recall them anymore :-)
> 
> One example of this is Yama, which limits the use of ptrace(). You can
> enable it together with any of the "big" LSMs in the kernel.
> 
> A major benefit in this approach would that it is non-intrusive when it
> comes to other architectures than x86. New hooks are not only
> maintenance burden for those who care about SGX but also for those who
> have to deal with LSMs.

Regardless of whether or not you or anyone else creates such a 
small-scale LSM, we would still want to be able to control the loading 
of enclaves and the creation of executable enclave mappings via SELinux 
policy, so the hooks would be necessary anyway.


^ permalink raw reply

* Re: [PATCH v9 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Michal Hocko @ 2019-06-27 13:57 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andrew Morton, Christoph Lameter, Kees Cook, Masahiro Yamada,
	James Morris, Serge E. Hallyn, Nick Desaulniers,
	Kostya Serebryany, Dmitry Vyukov, Sandeep Patil, Laura Abbott,
	Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver, Qian Cai,
	linux-mm, linux-security-module, kernel-hardening
In-Reply-To: <20190627130316.254309-2-glider@google.com>

On Thu 27-06-19 15:03:15, Alexander Potapenko wrote:
> The new options are needed to prevent possible information leaks and
> make control-flow bugs that depend on uninitialized values more
> deterministic.
> 
> This is expected to be on-by-default on Android and Chrome OS. And it
> gives the opportunity for anyone else to use it under distros too via
> the boot args. (The init_on_free feature is regularly requested by
> folks where memory forensics is included in their threat models.)
> 
> init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
> objects with zeroes. Initialization is done at allocation time at the
> places where checks for __GFP_ZERO are performed.
> 
> init_on_free=1 makes the kernel initialize freed pages and heap objects
> with zeroes upon their deletion. This helps to ensure sensitive data
> doesn't leak via use-after-free accesses.
> 
> Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
> returns zeroed memory. The two exceptions are slab caches with
> constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
> zero-initialized to preserve their semantics.
> 
> Both init_on_alloc and init_on_free default to zero, but those defaults
> can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
> CONFIG_INIT_ON_FREE_DEFAULT_ON.
> 
> If either SLUB poisoning or page poisoning is enabled, those options
> take precedence over init_on_alloc and init_on_free: initialization is
> only applied to unpoisoned allocations.
> 
> Slowdown for the new features compared to init_on_free=0,
> init_on_alloc=0:
> 
> hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
> hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
> 
> Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
> Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
> Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
> Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
> 
> The slowdown for init_on_free=0, init_on_alloc=0 compared to the
> baseline is within the standard error.
> 
> The new features are also going to pave the way for hardware memory
> tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
> hooks to set the tags for heap objects. With MTE, tagging will have the
> same cost as memory initialization.
> 
> Although init_on_free is rather costly, there are paranoid use-cases where
> in-memory data lifetime is desired to be minimized. There are various
> arguments for/against the realism of the associated threat models, but
> given that we'll need the infrastructure for MTE anyway, and there are
> people who want wipe-on-free behavior no matter what the performance cost,
> it seems reasonable to include it in this series.
> 
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Acked-by: Kees Cook <keescook@chromium.org>
> To: Andrew Morton <akpm@linux-foundation.org>
> To: Christoph Lameter <cl@linux.com>
> To: Kees Cook <keescook@chromium.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Marco Elver <elver@google.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Acked-by: Michal Hocko <mhocko@suse.com> # page and dmapool parts
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH V33 24/30] bpf: Restrict bpf when kernel lockdown is in confidentiality mode
From: Stephen Smalley @ 2019-06-27 14:35 UTC (permalink / raw)
  To: Andy Lutomirski, James Morris
  Cc: Andy Lutomirski, Matthew Garrett, linux-security, LKML, Linux API,
	David Howells, Alexei Starovoitov, Matthew Garrett,
	Network Development, Chun-Yi Lee, Daniel Borkmann,
	linux-security-module
In-Reply-To: <6E53376F-01BB-4795-BC02-24F9CAE00001@amacapital.net>

On 6/26/19 8:57 PM, Andy Lutomirski wrote:
> 
> 
>> On Jun 26, 2019, at 1:22 PM, James Morris <jmorris@namei.org> wrote:
>>
>> [Adding the LSM mailing list: missed this patchset initially]
>>
>>> On Thu, 20 Jun 2019, Andy Lutomirski wrote:
>>>
>>> This patch exemplifies why I don't like this approach:
>>>
>>>> @@ -97,6 +97,7 @@ enum lockdown_reason {
>>>>         LOCKDOWN_INTEGRITY_MAX,
>>>>         LOCKDOWN_KCORE,
>>>>         LOCKDOWN_KPROBES,
>>>> +       LOCKDOWN_BPF,
>>>>         LOCKDOWN_CONFIDENTIALITY_MAX,
>>>
>>>> --- a/security/lockdown/lockdown.c
>>>> +++ b/security/lockdown/lockdown.c
>>>> @@ -33,6 +33,7 @@ static char *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
>>>>         [LOCKDOWN_INTEGRITY_MAX] = "integrity",
>>>>         [LOCKDOWN_KCORE] = "/proc/kcore access",
>>>>         [LOCKDOWN_KPROBES] = "use of kprobes",
>>>> +       [LOCKDOWN_BPF] = "use of bpf",
>>>>         [LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
>>>
>>> The text here says "use of bpf", but what this patch is *really* doing
>>> is locking down use of BPF to read kernel memory.  If the details
>>> change, then every LSM needs to get updated, and we risk breaking user
>>> policies that are based on LSMs that offer excessively fine
>>> granularity.
>>
>> Can you give an example of how the details might change?
>>
>>> I'd be more comfortable if the LSM only got to see "confidentiality"
>>> or "integrity".
>>
>> These are not sufficient for creating a useful policy for the SELinux
>> case.
>>
>>
> 
> I may have misunderstood, but I thought that SELinux mainly needed to allow certain privileged programs to bypass the policy.  Is there a real example of what SELinux wants to do that can’t be done in the simplified model?
> 
> The think that specifically makes me uneasy about exposing all of these precise actions to LSMs is that they will get exposed to userspace in a way that forces us to treat them as stable ABIs.

There are two scenarios where finer-grained distinctions make sense:

- Users may need to enable specific functionality that falls under the 
umbrella of "confidentiality" or "integrity" lockdown.  Finer-grained 
lockdown reasons free them from having to make an all-or-nothing choice 
between lost functionality or no lockdown at all. This can be supported 
directly by the lockdown module without any help from SELinux or other 
security modules; we just need the ability to specify these 
finer-grained lockdown levels via the boot parameters and securityfs nodes.

- Different processes/programs may need to use different sets of 
functionality restricted via lockdown confidentiality or integrity 
categories.  If we have to allow all-or-none for the set of 
interfaces/functionality covered by the generic confidentiality or 
integrity categories, then we'll end up having to choose between lost 
functionality or overprivileged processes, neither of which is optimal.

Is it truly the case that everything under the "confidentiality" 
category poses the same level of risk to kernel confidentiality, and 
similarly for everything under the "integrity" category?  If not, then 
being able to distinguish them definitely has benefit.

I'm still not clear though on how/if this will compose with or be 
overridden by other security modules.  We would need some means for 
another security module to take over lockdown decisions once it has 
initialized (including policy load), and to be able to access state that 
is currently private to the lockdown module, like the level.

^ permalink raw reply

* Re: linux-next: Tree for Jun 26 (security/integrity/ima/)
From: Randy Dunlap @ 2019-06-27 15:08 UTC (permalink / raw)
  To: David Howells, Mimi Zohar
  Cc: Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, linux-integrity, Dmitry Kasatkin,
	linux-security-module
In-Reply-To: <9446.1561642145@warthog.procyon.org.uk>

On 6/27/19 6:29 AM, David Howells wrote:
> Mimi Zohar <zohar@linux.ibm.com> wrote:
> 
>>>   CC      security/integrity/ima/ima_fs.o
>>> In file included from ../security/integrity/ima/ima.h:25:0,
>>>                  from ../security/integrity/ima/ima_fs.c:26:
>>> ../security/integrity/ima/../integrity.h:170:18: warning: ‘struct key_acl’ declared inside parameter list [enabled by default]
>>>            struct key_acl *acl)
>>>                   ^
>>> ../security/integrity/ima/../integrity.h:170:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
>>
>> David, CONFIG_INTEGRITY_SIGNATURE is dependent on KEYS being enabled,
>> but the stub functions are not.  There's now a dependency on
>> key_acl().
> 
> I added a forward declaration for struct key_acl into
> security/integrity/integrity.h as you can see here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/diff/security/integrity/integrity.h?h=keys-acl&id=75ce113a1d56880e5abd37fa664ea9af399d2bcd
> 
> which might not have made it into linux-next before you used it.

No problem in linux-next 20190627.

Thanks.

-- 
~Randy

^ permalink raw reply

* Re: [PATCH V10 2/3] IMA: Define a new template field buf
From: Mimi Zohar @ 2019-06-27 15:08 UTC (permalink / raw)
  To: Thiago Jung Bauermann, Prakhar Srivastava
  Cc: linux-integrity, linux-security-module, linux-kernel,
	roberto.sassu, vgoyal
In-Reply-To: <87ftnyk5e0.fsf@morokweng.localdomain>

On Mon, 2019-06-24 at 19:03 -0300, Thiago Jung Bauermann wrote:
> Hello Prakhar,
> 
> Prakhar Srivastava <prsriva02@gmail.com> writes:
> 
> > diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c
> > index 00dd5a434689..a01a17e5c581 100644
> > --- a/security/integrity/ima/ima_template.c
> > +++ b/security/integrity/ima/ima_template.c
> > @@ -26,6 +26,7 @@ static struct ima_template_desc builtin_templates[] = {
> >  	{.name = IMA_TEMPLATE_IMA_NAME, .fmt = IMA_TEMPLATE_IMA_FMT},
> >  	{.name = "ima-ng", .fmt = "d-ng|n-ng"},
> >  	{.name = "ima-sig", .fmt = "d-ng|n-ng|sig"},
> > +	{.name = "ima-buf", .fmt = "d-ng|n-ng|buf"},
> >  	{.name = "", .fmt = ""},	/* placeholder for a custom format */
> >  };
> >
> > @@ -43,6 +44,8 @@ static const struct ima_template_field supported_fields[] = {
> >  	 .field_show = ima_show_template_string},
> >  	{.field_id = "sig", .field_init = ima_eventsig_init,
> >  	 .field_show = ima_show_template_sig},
> > +	{.field_id = "buf", .field_init = ima_eventbuf_init,
> > +	 .field_show = ima_show_template_buf},
> >  };
> >  #define MAX_TEMPLATE_NAME_LEN 15
> 
> Currently, MAX_TEMPLATE_NAME_LEN is the length of a template that
> contains all valid fields. It may make sense to increase it since
> there's a new field being added.
> 
> I suggest using a sizeof() to show where the number comes from (and
> which can be visually shown to be correct):
> 
> #define MAX_TEMPLATE_NAME_LEN sizeof("d|n|d-ng|n-ng|sig|buf")
> 
> The sizeof() is calculated at compile time.

MAX_TEMPLATE_NAME_LEN is used when restoring measurements carried over
from a kexec.  'd' and 'd-ng' should not both be defined in the
template description, nor should 'n' and 'n-ng'.  Even without the
duplication, the MAX_TEPLATE_NAME_LEN is greater than the current 15.

Thiago, could you address this as a separate patch?

thanks!

Mimi




^ permalink raw reply

* Re: [PATCH V34 09/29] kexec_file: Restrict at runtime if the kernel is locked down
From: Matthew Garrett @ 2019-06-27 15:28 UTC (permalink / raw)
  To: James Morris
  Cc: LSM List, Linux Kernel Mailing List, Linux API, Jiri Bohac,
	David Howells, kexec
In-Reply-To: <alpine.LRH.2.21.1906271423070.16512@namei.org>

On Wed, Jun 26, 2019 at 9:59 PM James Morris <jmorris@namei.org> wrote:
> This is not a criticism of the patch but a related issue which I haven't
> seen discussed (apologies if it has).
>
> If signed code is loaded into ring 0, verified by the kernel, then
> executed, you still lose your secure/trusted/verified boot state. If the
> currently running kernel has been runtime-compromised, any signature
> verification performed by the kernel cannot be trusted.
>
> This problem is out of scope for the lockdown threat model (which
> naturally cannot include a compromised kernel), but folk should be aware
> that signature-verified kexec does not provide equivalent assurance to a
> full reboot on a secure-boot system.

By that metric, on a secure boot system how do we determine that code
running in the firmware environment wasn't compromised before it
launched the initial signed kernel?

^ permalink raw reply

* Re: [PATCH V34 19/29] Lock down module params that specify hardware parameters (eg. ioport)
From: Matthew Garrett @ 2019-06-27 15:30 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: James Morris, LSM List, Linux Kernel Mailing List, Linux API,
	David Howells, Alan Cox
In-Reply-To: <87ef3f3ihh.fsf@dja-thinkpad.axtens.net>

On Wed, Jun 26, 2019 at 6:49 PM Daniel Axtens <dja@axtens.net> wrote:
>
> Matthew Garrett <matthewgarrett@google.com> writes:
> > +     if (kp->flags & KERNEL_PARAM_FL_HWPARAM &&
> > +         security_locked_down(LOCKDOWN_MODULE_PARAMETERS))
> > +             return false;
> > +     return true;
> >  }
>
> Should this test occur before tainting the kernel?

Seems reasonable.

^ permalink raw reply

* Re: [PATCH v8 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Kees Cook @ 2019-06-27 16:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Qian Cai, Catalin Marinas, Ard Biesheuvel, Alexander Potapenko,
	Andrew Morton, Christoph Lameter, Masahiro Yamada, James Morris,
	Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
	Dmitry Vyukov, Sandeep Patil, Laura Abbott, Randy Dunlap,
	Jann Horn, Mark Rutland, Marco Elver, linux-mm,
	linux-security-module, kernel-hardening, clang-built-linux
In-Reply-To: <20190627061534.GA17798@dhcp22.suse.cz>

On Thu, Jun 27, 2019 at 08:15:34AM +0200, Michal Hocko wrote:
> This sounds familiar: http://lkml.kernel.org/r/CABXOdTd-cqHM_feAO1tvwn4Z=kM6WHKYAbDJ7LGfMvRPRPG7GA@mail.gmail.com

Your memory is better than mine! I entirely forgot about this and it was
only 2 months ago. Whoops. :P

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH bpf-next v9 05/10] bpf,landlock: Add a new map type: inode
From: Mickaël Salaün @ 2019-06-27 16:18 UTC (permalink / raw)
  To: Al Viro, Mickaël Salaün
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Andrew Morton,
	Andy Lutomirski, Arnaldo Carvalho de Melo, Casey Schaufler,
	Daniel Borkmann, David Drysdale, David S . Miller,
	Eric W . Biederman, James Morris, Jann Horn, John Johansen,
	Jonathan Corbet, Kees Cook, Michael Kerrisk, Paul Moore,
	Sargun Dhillon, Serge E . Hallyn, Shuah Khan, Stephen Smalley,
	Tejun Heo, Tetsuo Handa, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-fsdevel, linux-security-module,
	netdev
In-Reply-To: <20190625225201.GJ17978@ZenIV.linux.org.uk>


On 26/06/2019 00:52, Al Viro wrote:
> On Tue, Jun 25, 2019 at 11:52:34PM +0200, Mickaël Salaün wrote:
>> +/* must call iput(inode) after this call */
>> +static struct inode *inode_from_fd(int ufd, bool check_access)
>> +{
>> +    struct inode *ret;
>> +    struct fd f;
>> +    int deny;
>> +
>> +    f = fdget(ufd);
>> +    if (unlikely(!f.file || !file_inode(f.file))) {
>> +            ret = ERR_PTR(-EBADF);
>> +            goto put_fd;
>> +    }
>
> Just when does one get a NULL file_inode()?  The reason I'm asking is
> that arseloads of code would break if one managed to create such
> a beast...

I didn't find any API documentation about this guarantee, so I followed
a defensive programming approach. I'll remove the file_inode() check.

>
> Incidentally, that should be return ERR_PTR(-EBADF); fdput() is wrong there.

Right, I'll fix that.

>
>> +    }
>> +    /* check if the FD is tied to a mount point */
>> +    /* TODO: add this check when called from an eBPF program too */
>> +    if (unlikely(!f.file->f_path.mnt
>
> Again, the same question - when the hell can that happen?

Defensive programming again, I'll remove it.

> If you are
> sitting on an exploitable roothole, do share it...
>
>  || f.file->f_path.mnt->mnt_flags &
>> +                            MNT_INTERNAL)) {
>> +            ret = ERR_PTR(-EINVAL);
>> +            goto put_fd;
>
> What does it have to do with mountpoints, anyway?

I want to only manage inodes tied to a userspace-visible file system
(this check may not be enough though). It doesn't make sense to be able
to add inodes which are not mounted, to this kind of map.

>
>> +/* called from syscall */
>> +static int sys_inode_map_delete_elem(struct bpf_map *map, struct inode *key)
>> +{
>> +    struct inode_array *array = container_of(map, struct inode_array, map);
>> +    struct inode *inode;
>> +    int i;
>> +
>> +    WARN_ON_ONCE(!rcu_read_lock_held());
>> +    for (i = 0; i < array->map.max_entries; i++) {
>> +            if (array->elems[i].inode == key) {
>> +                    inode = xchg(&array->elems[i].inode, NULL);
>> +                    array->nb_entries--;
>
> Umm...  Is that intended to be atomic in any sense?

nb_entries is not used as a bound check but to avoid walking uselessly
through the (pre-allocated) array when adding a new element, but I'll
use an atomic to avoid inconsistencies anyway.

>
>> +                    iput(inode);
>> +                    return 0;
>> +            }
>> +    }
>> +    return -ENOENT;
>> +}
>> +
>> +/* called from syscall */
>> +int bpf_inode_map_delete_elem(struct bpf_map *map, int *key)
>> +{
>> +    struct inode *inode;
>> +    int err;
>> +
>> +    inode = inode_from_fd(*key, false);
>> +    if (IS_ERR(inode))
>> +            return PTR_ERR(inode);
>> +    err = sys_inode_map_delete_elem(map, inode);
>> +    iput(inode);
>> +    return err;
>> +}
>
> Wait a sec...  So we have those beasties that can have long-term
> references to arbitrary inodes stuck in them?  What will happen
> if you get umount(2) called while such a thing exists?

I though an umount would be denied but no, we get a self-destructed busy
inode and a bug!
What about wrapping the inode's superblock->s_op->destroy_inode() to
first remove the element from the map and then call the real
destroy_inode(), if any?
Or I could update fs/inode.c:destroy_inode() to call inode->free_inode()
if it is set, and set it when such inode is referenced by a map?
Or maybe I could hold the referencing file in the map and then wrap its
f_op?


--
Mickaël Salaün
ANSSI/SDE/ST/LAM

Les données à caractère personnel recueillies et traitées dans le cadre de cet échange, le sont à seule fin d’exécution d’une relation professionnelle et s’opèrent dans cette seule finalité et pour la durée nécessaire à cette relation. Si vous souhaitez faire usage de vos droits de consultation, de rectification et de suppression de vos données, veuillez contacter contact.rgpd@sgdsn.gouv.fr. Si vous avez reçu ce message par erreur, nous vous remercions d’en informer l’expéditeur et de détruire le message. The personal data collected and processed during this exchange aims solely at completing a business relationship and is limited to the necessary duration of that relationship. If you wish to use your rights of consultation, rectification and deletion of your data, please contact: contact.rgpd@sgdsn.gouv.fr. If you have received this message in error, we thank you for informing the sender and destroying the message.

^ permalink raw reply

* Re: [PATCH v9 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Kees Cook @ 2019-06-27 16:29 UTC (permalink / raw)
  To: Qian Cai
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Masahiro Yamada, Michal Hocko, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov, Sandeep Patil,
	Laura Abbott, Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	linux-mm, linux-security-module, kernel-hardening
In-Reply-To: <1561641911.5154.85.camel@lca.pw>

On Thu, Jun 27, 2019 at 09:25:11AM -0400, Qian Cai wrote:
> On Thu, 2019-06-27 at 15:03 +0200, Alexander Potapenko wrote:
> > +static int __init early_init_on_alloc(char *buf)
> > +{
> > +	int ret;
> > +	bool bool_result;
> > +
> > +	if (!buf)
> > +		return -EINVAL;
> > +	ret = kstrtobool(buf, &bool_result);
> > +	if (bool_result && IS_ENABLED(CONFIG_PAGE_POISONING))
> > +		pr_warn("mem auto-init: CONFIG_PAGE_POISONING is on, will
> > take precedence over init_on_alloc\n");
> 
> I don't like the warning here. It makes people think it is bug that need to be
> fixed, but actually it is just information. People could enable both in a debug
> kernel.

How would you suggest it be adjusted? Should it be silent, or be
switched to pr_info()?

Also, doesn't this need to check "want_page_poisoning", not just
CONFIG_PAGE_POISONING? Perhaps just leave the warning out entirely?

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v4 17/23] LSM: Use lsmcontext in security_secid_to_secctx
From: Casey Schaufler @ 2019-06-27 16:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: casey.schaufler, jmorris, linux-security-module, selinux,
	john.johansen, penguin-kernel, paul, sds, casey
In-Reply-To: <201906262051.59B064019F@keescook>

On 6/26/2019 8:53 PM, Kees Cook wrote:
> On Wed, Jun 26, 2019 at 12:22:28PM -0700, Casey Schaufler wrote:
>> Replace the (secctx,seclen) pointer pair with a single
>> lsmcontext pointer to allow return of the LSM identifier
>> along with the context and context length. This allows
>> security_release_secctx() to know how to release the
>> context. Callers have been modified to use or save the
>> returned data from the new structure.
>>
>> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
>> ---
>>  drivers/android/binder.c                | 24 ++++++---------
>>  include/linux/security.h                |  4 +--
>>  include/net/scm.h                       |  9 ++----
>>  kernel/audit.c                          | 29 +++++++-----------
>>  kernel/auditsc.c                        | 31 +++++++------------
>>  net/ipv4/ip_sockglue.c                  |  7 ++---
>>  net/netfilter/nf_conntrack_netlink.c    | 14 +++++----
>>  net/netfilter/nf_conntrack_standalone.c |  7 ++---
>>  net/netfilter/nfnetlink_queue.c         |  5 +++-
>>  net/netlabel/netlabel_unlabeled.c       | 40 ++++++++-----------------
>>  net/netlabel/netlabel_user.c            |  7 ++---
>>  security/security.c                     |  9 ++++--
>>  12 files changed, 72 insertions(+), 114 deletions(-)
>>
>> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
>> index 89e574be34cc..5d417a7b9bb3 100644
>> --- a/drivers/android/binder.c
>> +++ b/drivers/android/binder.c
>> @@ -2874,9 +2874,7 @@ static void binder_transaction(struct binder_proc *proc,
>>  	binder_size_t last_fixup_min_off = 0;
>>  	struct binder_context *context = proc->context;
>>  	int t_debug_id = atomic_inc_return(&binder_last_id);
>> -	char *secctx = NULL;
>> -	u32 secctx_sz = 0;
>> -	struct lsmcontext scaff; /* scaffolding */
>> +	struct lsmcontext lsmctx;
> As James found, this needs to be zero initialized:
>
> struct lsmcontext lsmctx = { };

Thanks! I'll incorporate this in v5. It's great to
have y'all checking it out.
??


^ permalink raw reply

* Re: [PATCH v9 1/2] mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
From: Qian Cai @ 2019-06-27 16:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander Potapenko, Andrew Morton, Christoph Lameter,
	Masahiro Yamada, Michal Hocko, James Morris, Serge E. Hallyn,
	Nick Desaulniers, Kostya Serebryany, Dmitry Vyukov, Sandeep Patil,
	Laura Abbott, Randy Dunlap, Jann Horn, Mark Rutland, Marco Elver,
	linux-mm, linux-security-module, kernel-hardening
In-Reply-To: <201906270926.02AAEE93@keescook>

On Thu, 2019-06-27 at 09:29 -0700, Kees Cook wrote:
> On Thu, Jun 27, 2019 at 09:25:11AM -0400, Qian Cai wrote:
> > On Thu, 2019-06-27 at 15:03 +0200, Alexander Potapenko wrote:
> > > +static int __init early_init_on_alloc(char *buf)
> > > +{
> > > +	int ret;
> > > +	bool bool_result;
> > > +
> > > +	if (!buf)
> > > +		return -EINVAL;
> > > +	ret = kstrtobool(buf, &bool_result);
> > > +	if (bool_result && IS_ENABLED(CONFIG_PAGE_POISONING))
> > > +		pr_warn("mem auto-init: CONFIG_PAGE_POISONING is on, will
> > > take precedence over init_on_alloc\n");
> > 
> > I don't like the warning here. It makes people think it is bug that need to
> > be
> > fixed, but actually it is just information. People could enable both in a
> > debug
> > kernel.
> 
> How would you suggest it be adjusted? Should it be silent, or be
> switched to pr_info()?

pr_info() sounds more reasonable to me, so people don't need to guess the
correct behavior. Ideally, CONFIG_PAGE_POISONING should be  renamed to something
like CONFIG_INIT_ON_FREE_CHECK, and it only does the checking part if enabled,
and init_on_free will gain an ability to poison a pattern other than 0.

Also, there might be some rooms to consolidate with SLAB_POSION as well.

> 
> Also, doesn't this need to check "want_page_poisoning", not just
> CONFIG_PAGE_POISONING? Perhaps just leave the warning out entirely?
> 

Yes, only checking CONFIG_PAGE_POISONING is not enough, and need to check
page_poisoning_enabled().

^ permalink raw reply

* Re: [PATCH bpf-next v9 05/10] bpf,landlock: Add a new map type: inode
From: Al Viro @ 2019-06-27 16:56 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Mickaël Salaün, linux-kernel, Aleksa Sarai,
	Alexei Starovoitov, Andrew Morton, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, John Johansen, Jonathan Corbet,
	Kees Cook, Michael Kerrisk, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Stephen Smalley, Tejun Heo,
	Tetsuo Handa, Thomas Graf, Tycho Andersen, Will Drewry,
	kernel-hardening, linux-api, linux-fsdevel, linux-security-module,
	netdev
In-Reply-To: <79bac827-4092-8a4d-9dc6-6019419b2486@ssi.gouv.fr>

On Thu, Jun 27, 2019 at 06:18:12PM +0200, Mickaël Salaün wrote:

> >> +/* called from syscall */
> >> +static int sys_inode_map_delete_elem(struct bpf_map *map, struct inode *key)
> >> +{
> >> +    struct inode_array *array = container_of(map, struct inode_array, map);
> >> +    struct inode *inode;
> >> +    int i;
> >> +
> >> +    WARN_ON_ONCE(!rcu_read_lock_held());
> >> +    for (i = 0; i < array->map.max_entries; i++) {
> >> +            if (array->elems[i].inode == key) {
> >> +                    inode = xchg(&array->elems[i].inode, NULL);
> >> +                    array->nb_entries--;
> >
> > Umm...  Is that intended to be atomic in any sense?
> 
> nb_entries is not used as a bound check but to avoid walking uselessly
> through the (pre-allocated) array when adding a new element, but I'll
> use an atomic to avoid inconsistencies anyway.


> > Wait a sec...  So we have those beasties that can have long-term
> > references to arbitrary inodes stuck in them?  What will happen
> > if you get umount(2) called while such a thing exists?
> 
> I though an umount would be denied but no, we get a self-destructed busy
> inode and a bug!
> What about wrapping the inode's superblock->s_op->destroy_inode() to
> first remove the element from the map and then call the real
> destroy_inode(), if any?

What do you mean, _the_ map?  I don't see anything to prevent insertion
of references to the same inode into any number of those...

> Or I could update fs/inode.c:destroy_inode() to call inode->free_inode()
> if it is set, and set it when such inode is referenced by a map?
> Or maybe I could hold the referencing file in the map and then wrap its
> f_op?

First of all, anything including the word "wrap" is a non-starter.
We really don't need the headache associated with the locking needed
to replace the method tables on the fly, or with the code checking that
->f_op points to given method table, etc.  That's not going to fly,
especially since you'd end up _chaining_ those (again, the same reference
can go in more than once).

Nothing is allowed to change the method tables of live objects, period.
Once a struct file is opened, its ->f_op is never going to change and
it entirely belongs to the device driver or filesystem it resides on.
Nothing else (not VFS, not VM, not some LSM module, etc.) has any business
touching that.  The same goes for inodes, dentries, etc.

What kind of behaviour do you want there?  Do you want the inodes you've
referenced there to be forgotten on e.g. memory pressure?  The thing is,
I don't see how "it's getting freed" could map onto any semantics that
might be useful for you - it looks like the wrong event for that.

^ permalink raw reply

* Re: [PATCH v4 00/23] LSM: Module stacking for AppArmor
From: Kees Cook @ 2019-06-27 17:07 UTC (permalink / raw)
  To: James Morris
  Cc: Casey Schaufler, casey.schaufler, linux-security-module, selinux,
	John Johansen, Tetsuo Handa, Paul Moore, Stephen Smalley
In-Reply-To: <alpine.LRH.2.21.1906271409460.16512@namei.org>

On Thu, Jun 27, 2019 at 02:10:18PM +1000, James Morris wrote:
> On Thu, 27 Jun 2019, James Morris wrote:
> 
> > Confirming there's no oops when Tomoyo is un-selected in the kernel 
> > config.
> 
> n/m, the problem is still there.

Were you able to test my fix for this? I wonder if what I found was just
a coincidence.

-- 
Kees Cook

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox