Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [RFC PATCH 0/7] x86: introduce system calls addess space isolation
From: 黄金海 @ 2020-07-01 14:05 UTC (permalink / raw)
  To: rppt
  Cc: James.Bottomley, alexandre.chartre, bp, dave.hansen, hpa, jwadams,
	keescook, linux-kernel, linux-mm, linux-security-module, luto,
	mingo, peterz, pjt, tglx, x86

How about performance when running with ASI?

^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Luis Chamberlain @ 2020-07-01 13:53 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Christian Borntraeger, Christoph Hellwig, ast, axboe, bfields,
	bridge, chainsaw, christian.brauner, chuck.lever, davem, dhowells,
	gregkh, jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <a6792135-3285-0861-014e-3db85ea251dc@i-love.sakura.ne.jp>

On Wed, Jul 01, 2020 at 10:24:29PM +0900, Tetsuo Handa wrote:
> On 2020/07/01 19:08, Christian Borntraeger wrote:
> > 
> > 
> > On 30.06.20 19:57, Luis Chamberlain wrote:
> >> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
> >>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 24.06.20 20:32, Christian Borntraeger wrote:
> >>>> [...]> 
> >>>>> So the translations look correct. But your change is actually a sematic change
> >>>>> if(ret) will only trigger if there is an error
> >>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> >>>>> and we did not do it before. 
> >>>>>
> >>>>
> >>>> So the right fix is
> >>>>
> >>>> diff --git a/kernel/umh.c b/kernel/umh.c
> >>>> index f81e8698e36e..a3a3196e84d1 100644
> >>>> --- a/kernel/umh.c
> >>>> +++ b/kernel/umh.c
> >>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >>>>                  * the real error code is already in sub_info->retval or
> >>>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
> >>>>                  */
> >>>> -               if (KWIFEXITED(ret))
> >>>> +               if (KWEXITSTATUS(ret))
> >>>>                         sub_info->retval = KWEXITSTATUS(ret);
> 
> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
> processed error when UMH_WAIT_PROC is used" will be the correct behavior).

br_stp_start() doesn't check for the raw value, it just checks for err
or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
used" propagates the correct error now.

Christian, can you try removing the binary temporarily and seeing if
you get your bridge working?

  Luis

^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Luis Chamberlain @ 2020-07-01 13:46 UTC (permalink / raw)
  To: Christian Borntraeger, Jessica Yu, Tetsuo Handa
  Cc: Christoph Hellwig, Eric W. Biederman, Andrew Morton, ast, axboe,
	bfields, bridge, chainsaw, christian.brauner, chuck.lever, davem,
	dhowells, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390
In-Reply-To: <b24d8dae-1872-ba2c-acd4-ed46c0781317@de.ibm.com>

On Wed, Jul 01, 2020 at 12:08:11PM +0200, Christian Borntraeger wrote:
> dmesg attached
> [   14.438482] virbr0: port 1(virbr0-nic) entered blocking state
> [   14.438485] virbr0: port 1(virbr0-nic) entered disabled state
> [   14.438635] device virbr0-nic entered promiscuous mode
> [   14.439654] umh: sub_info->path: /sbin/bridge-stp
> [   14.439656] /sbin/bridge-stp 
> [   14.439656] virbr0 
> [   14.439656] start 

OK so what we seem to want to debug is the umh call for:

/sbin/bridge-stp virbr0 start

> [   14.439734] == ret: 00
> [   14.439735] == KWIFEXITED(ret): 01
> [   14.439736] KWEXITSTATUS(ret): 0

Its not clear if this is the respective return value, but now
that we have a clue that this is the the only non-modprobe
call, we should have a clearer certainty that this is the
issue.

Indeed my patch "umh: fix processed error when UMH_WAIT_PROC is used"
did modify bridge-stp in the following way:

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index ba55851fe132..bdd94b45396b 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -133,14 +133,8 @@ static int br_stp_call_user(struct net_bridge *br, char *arg)
 
 	/* call userspace STP and report program errors */
 	rc = call_usermodehelper(BR_STP_PROG, argv, envp, UMH_WAIT_PROC);
-	if (rc > 0) {
-		if (rc & 0xff)
-			br_debug(br, BR_STP_PROG " received signal %d\n",
-				 rc & 0x7f);
-		else
-			br_debug(br, BR_STP_PROG " exited with code %d\n",
-				 (rc >> 8) & 0xff);
-	}
+	if (rc != 0)
+		br_debug(br, BR_STP_PROG " failed with exit code %d\n", rc);
 
 	return rc;
 }

If you look at this carefully though you'll notice that the change just
modifies *when* we issue the debug print. The more important relevant
part of the patch however was that we now do return a correct error
value when the call fails.

More importantly, depending on if an error or not we run the kernel STP
or userspace STP later:

static void br_stp_start(struct net_bridge *br)
{
	int err = -ENOENT;

	if (net_eq(dev_net(br->dev), &init_net))
		err = br_stp_call_user(br, "start");

	if (err && err != -ENOENT)
		br_err(br, "failed to start userspace STP (%d)\n", err);

	spin_lock_bh(&br->lock);

	if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY)
		__br_set_forward_delay(br, BR_MIN_FORWARD_DELAY);
	else if (br->bridge_forward_delay > BR_MAX_FORWARD_DELAY)
		__br_set_forward_delay(br, BR_MAX_FORWARD_DELAY);

--------------------->  can you enable debug print for this to see what
--------------------->  you end up using?
	if (!err) {
		br->stp_enabled = BR_USER_STP;
		br_debug(br, "userspace STP started\n");
	} else {
		br->stp_enabled = BR_KERNEL_STP;
		br_debug(br, "using kernel STP\n");

		/* To start timers on any ports left in blocking */
		if (br->dev->flags & IFF_UP)
			mod_timer(&br->hello_timer, jiffies + br->hello_time);
		br_port_state_selection(br);
	}
----------------->

	spin_unlock_bh(&br->lock);
}


^ permalink raw reply related

* [PATCH] integrity: remove redundant initialization of variable ret
From: Colin King @ 2020-07-01 13:46 UTC (permalink / raw)
  To: James Morris, Serge E . Hallyn, linux-security-module
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The variable ret is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 security/integrity/digsig_asymmetric.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/digsig_asymmetric.c b/security/integrity/digsig_asymmetric.c
index 4e0d6778277e..cfa4127d0518 100644
--- a/security/integrity/digsig_asymmetric.c
+++ b/security/integrity/digsig_asymmetric.c
@@ -79,7 +79,7 @@ int asymmetric_verify(struct key *keyring, const char *sig,
 	struct public_key_signature pks;
 	struct signature_v2_hdr *hdr = (struct signature_v2_hdr *)sig;
 	struct key *key;
-	int ret = -ENOMEM;
+	int ret;

 	if (siglen <= sizeof(*hdr))
 		return -EBADMSG;
-- 
2.27.0

^ permalink raw reply related

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Tetsuo Handa @ 2020-07-01 13:24 UTC (permalink / raw)
  To: Christian Borntraeger, Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <b24d8dae-1872-ba2c-acd4-ed46c0781317@de.ibm.com>

On 2020/07/01 19:08, Christian Borntraeger wrote:
> 
> 
> On 30.06.20 19:57, Luis Chamberlain wrote:
>> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
>>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 24.06.20 20:32, Christian Borntraeger wrote:
>>>> [...]> 
>>>>> So the translations look correct. But your change is actually a sematic change
>>>>> if(ret) will only trigger if there is an error
>>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>>>>> and we did not do it before. 
>>>>>
>>>>
>>>> So the right fix is
>>>>
>>>> diff --git a/kernel/umh.c b/kernel/umh.c
>>>> index f81e8698e36e..a3a3196e84d1 100644
>>>> --- a/kernel/umh.c
>>>> +++ b/kernel/umh.c
>>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>>>                  * the real error code is already in sub_info->retval or
>>>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>>>>                  */
>>>> -               if (KWIFEXITED(ret))
>>>> +               if (KWEXITSTATUS(ret))
>>>>                         sub_info->retval = KWEXITSTATUS(ret);

Well, it is not br_stp_call_user() but br_stp_start() which is expecting
to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
processed error when UMH_WAIT_PROC is used" will be the correct behavior).


^ permalink raw reply

* Re: INFO: task hung in request_key_tag
From: Tetsuo Handa @ 2020-07-01 12:49 UTC (permalink / raw)
  To: Luis Chamberlain, Andrew Morton, Eric W. Biederman,
	Naresh Kamboju, Herbert Xu, Christoph Hellwig
  Cc: syzbot, dhowells, jarkko.sakkinen, jmorris, keyrings,
	linux-kernel, linux-security-module, serge, syzkaller-bugs
In-Reply-To: <20200701122030.GP4332@42.do-not-panic.com>

On 2020/07/01 21:20, Luis Chamberlain wrote:
> On Wed, Jul 01, 2020 at 07:04:15PM +0900, Tetsuo Handa wrote:
>> I suspect commit 9e9b47d6bbe9df65 ("umh: fix processed error when UMH_WAIT_PROC is used").
>> Maybe the change in kernel/umh.c and/or security/keys/request_key.c made by that commit is
>> affecting call_usermodehelper_keys() == 0 case when complete_request_key() is called.
> 
> That patch has been dropped for now due to another reported issue
> bisected to it and even though we have not root caused that issue [0].
> 
> It would be good to have a reproducer for this reported issue as well,

Reproducer is not available yet.
But at least this patch is changing the behavior of

  call_usermodehelper_keys() == 0 && (test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0)

case from key_negate_and_link() to key_revoke() in complete_request_key().
Since test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) might be called from
key_reject_and_link() from key_negate_and_link(), this change might be the cause of
this hung task report.


^ permalink raw reply

* Re: INFO: task hung in request_key_tag
From: Luis Chamberlain @ 2020-07-01 12:20 UTC (permalink / raw)
  To: Tetsuo Handa, Andrew Morton, Eric W. Biederman, Naresh Kamboju,
	Herbert Xu, Christoph Hellwig, mcgrof
  Cc: syzbot, dhowells, jarkko.sakkinen, jmorris, keyrings,
	linux-kernel, linux-security-module, serge, syzkaller-bugs
In-Reply-To: <728915db-592b-997f-6970-464cc59441d7@i-love.sakura.ne.jp>

On Wed, Jul 01, 2020 at 07:04:15PM +0900, Tetsuo Handa wrote:
> I suspect commit 9e9b47d6bbe9df65 ("umh: fix processed error when UMH_WAIT_PROC is used").
> Maybe the change in kernel/umh.c and/or security/keys/request_key.c made by that commit is
> affecting call_usermodehelper_keys() == 0 case when complete_request_key() is called.

That patch has been dropped for now due to another reported issue
bisected to it and even though we have not root caused that issue [0].

It would be good to have a reproducer for this reported issue as well,
but I should also point out a regression which Naresh reported is fixed
recently by a posted patch [1] by Herbert Xu. It would be good if you
can test with that patch posted by Herbert.

[0] https://lkml.kernel.org/r/20200623141157.5409-1-borntraeger@de.ibm.com          
[1] https://lkml.kernel.org/r/20200626062948.GA25285@gondor.apana.org.au

  Luis

> 
> On 2020/07/01 16:53, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:    c28e58ee Add linux-next specific files for 20200629
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17925a9d100000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=dcd26bbca17dd1db
> > dashboard link: https://syzkaller.appspot.com/bug?extid=46c77dc7e98c732de754
> > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > 
> > Unfortunately, I don't have any reproducer for this crash yet.
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+46c77dc7e98c732de754@syzkaller.appspotmail.com

^ permalink raw reply

* Re: [fs] 140402bab8: stress-ng.splice.ops_per_sec -100.0% regression
From: Christoph Hellwig @ 2020-07-01 12:13 UTC (permalink / raw)
  To: kernel test robot
  Cc: Christoph Hellwig, Al Viro, Linus Torvalds, Ian Kent,
	David Howells, linux-kernel, linux-fsdevel, linux-security-module,
	netfilter-devel, lkp
In-Reply-To: <20200701091943.GC3874@shao2-debian>

FYI, this is because stress-nh tests splice using /dev/null.  Which
happens to actually have the iter ops, but doesn't have explicit
splice_read operation.

^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Christian Borntraeger @ 2020-07-01 10:08 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <20200630175704.GO13911@42.do-not-panic.com>

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]



On 30.06.20 19:57, Luis Chamberlain wrote:
> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 24.06.20 20:32, Christian Borntraeger wrote:
>>> [...]> 
>>>> So the translations look correct. But your change is actually a sematic change
>>>> if(ret) will only trigger if there is an error
>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>>>> and we did not do it before. 
>>>>
>>>
>>> So the right fix is
>>>
>>> diff --git a/kernel/umh.c b/kernel/umh.c
>>> index f81e8698e36e..a3a3196e84d1 100644
>>> --- a/kernel/umh.c
>>> +++ b/kernel/umh.c
>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>>                  * the real error code is already in sub_info->retval or
>>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>>>                  */
>>> -               if (KWIFEXITED(ret))
>>> +               if (KWEXITSTATUS(ret))
>>>                         sub_info->retval = KWEXITSTATUS(ret);
>>>         }
>>>  
>>> I think.
>>
>> Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().
>> I'm not able to reproduce this on x86 with a bridge. What type of bridge
>> are you using on a guest, or did you mean using KVM so that the *host*
>> can spawn kvm guests?
>>
>> It would be good if you can try to add a bridge manually and see where
>> things fail. Can you do something like this:
>>
>> brctl addbr br0
>> brctl addif br0 ens6 
>> ip link set dev br0 up
>>
>> Note that most callers are for modprobe. I'd be curious to see which
>> umh is failing which breaks bridge for you. Can you trut this so we can
>> see which umh call is failing?
> 
> Christian, any luck getting to test the code below to see what this
> reveals?
> 
>   Luis

dmesg attached


[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 41598 bytes --]

[    0.137762] Linux version 5.8.0-rc2-next-20200623-00002-g5a5b173022ac-dirty (cborntra@m83lp52.lnxne.boe) (gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1), GNU ld version 2.32-31.fc31) #60 SMP Wed Jul 1 09:14:56 CEST 2020
[    0.137764] setup: Linux is running natively in 64-bit mode
[    0.137812] setup: The maximum memory size is 131072MB
[    0.137817] setup: Reserving 1024MB of memory at 130048MB for crashkernel (System RAM: 130048MB)
[    0.137901] cpu: 64 configured CPUs, 0 standby CPUs
[    0.137966] cpu: The CPU configuration topology of the machine is: 0 0 4 2 3 10 / 4
[    0.138533] Write protected kernel read-only data: 13660k
[    0.139239] Zone ranges:
[    0.139240]   DMA      [mem 0x0000000000000000-0x000000007fffffff]
[    0.139241]   Normal   [mem 0x0000000080000000-0x0000001fffffffff]
[    0.139243] Movable zone start for each node
[    0.139243] Early memory node ranges
[    0.139244]   node   0: [mem 0x0000000000000000-0x0000001fffffffff]
[    0.139251] Initmem setup node 0 [mem 0x0000000000000000-0x0000001fffffffff]
[    0.139252] On node 0 totalpages: 33554432
[    0.139253]   DMA zone: 8192 pages used for memmap
[    0.139253]   DMA zone: 0 pages reserved
[    0.139254]   DMA zone: 524288 pages, LIFO batch:63
[    0.152116]   Normal zone: 516096 pages used for memmap
[    0.152117]   Normal zone: 33030144 pages, LIFO batch:63
[    0.168380] percpu: Embedded 33 pages/cpu s97536 r8192 d29440 u135168
[    0.168388] pcpu-alloc: s97536 r8192 d29440 u135168 alloc=33*4096
[    0.168388] pcpu-alloc: [0] 000 [0] 001 [0] 002 [0] 003 
[    0.168390] pcpu-alloc: [0] 004 [0] 005 [0] 006 [0] 007 
[    0.168391] pcpu-alloc: [0] 008 [0] 009 [0] 010 [0] 011 
[    0.168392] pcpu-alloc: [0] 012 [0] 013 [0] 014 [0] 015 
[    0.168393] pcpu-alloc: [0] 016 [0] 017 [0] 018 [0] 019 
[    0.168394] pcpu-alloc: [0] 020 [0] 021 [0] 022 [0] 023 
[    0.168395] pcpu-alloc: [0] 024 [0] 025 [0] 026 [0] 027 
[    0.168396] pcpu-alloc: [0] 028 [0] 029 [0] 030 [0] 031 
[    0.168397] pcpu-alloc: [0] 032 [0] 033 [0] 034 [0] 035 
[    0.168398] pcpu-alloc: [0] 036 [0] 037 [0] 038 [0] 039 
[    0.168399] pcpu-alloc: [0] 040 [0] 041 [0] 042 [0] 043 
[    0.168400] pcpu-alloc: [0] 044 [0] 045 [0] 046 [0] 047 
[    0.168401] pcpu-alloc: [0] 048 [0] 049 [0] 050 [0] 051 
[    0.168403] pcpu-alloc: [0] 052 [0] 053 [0] 054 [0] 055 
[    0.168404] pcpu-alloc: [0] 056 [0] 057 [0] 058 [0] 059 
[    0.168405] pcpu-alloc: [0] 060 [0] 061 [0] 062 [0] 063 
[    0.168406] pcpu-alloc: [0] 064 [0] 065 [0] 066 [0] 067 
[    0.168407] pcpu-alloc: [0] 068 [0] 069 [0] 070 [0] 071 
[    0.168408] pcpu-alloc: [0] 072 [0] 073 [0] 074 [0] 075 
[    0.168409] pcpu-alloc: [0] 076 [0] 077 [0] 078 [0] 079 
[    0.168410] pcpu-alloc: [0] 080 [0] 081 [0] 082 [0] 083 
[    0.168411] pcpu-alloc: [0] 084 [0] 085 [0] 086 [0] 087 
[    0.168413] pcpu-alloc: [0] 088 [0] 089 [0] 090 [0] 091 
[    0.168414] pcpu-alloc: [0] 092 [0] 093 [0] 094 [0] 095 
[    0.168415] pcpu-alloc: [0] 096 [0] 097 [0] 098 [0] 099 
[    0.168416] pcpu-alloc: [0] 100 [0] 101 [0] 102 [0] 103 
[    0.168417] pcpu-alloc: [0] 104 [0] 105 [0] 106 [0] 107 
[    0.168418] pcpu-alloc: [0] 108 [0] 109 [0] 110 [0] 111 
[    0.168419] pcpu-alloc: [0] 112 [0] 113 [0] 114 [0] 115 
[    0.168420] pcpu-alloc: [0] 116 [0] 117 [0] 118 [0] 119 
[    0.168421] pcpu-alloc: [0] 120 [0] 121 [0] 122 [0] 123 
[    0.168422] pcpu-alloc: [0] 124 [0] 125 [0] 126 [0] 127 
[    0.168424] pcpu-alloc: [0] 128 [0] 129 [0] 130 [0] 131 
[    0.168425] pcpu-alloc: [0] 132 [0] 133 [0] 134 [0] 135 
[    0.168426] pcpu-alloc: [0] 136 [0] 137 [0] 138 [0] 139 
[    0.168427] pcpu-alloc: [0] 140 [0] 141 [0] 142 [0] 143 
[    0.168428] pcpu-alloc: [0] 144 [0] 145 [0] 146 [0] 147 
[    0.168429] pcpu-alloc: [0] 148 [0] 149 [0] 150 [0] 151 
[    0.168430] pcpu-alloc: [0] 152 [0] 153 [0] 154 [0] 155 
[    0.168431] pcpu-alloc: [0] 156 [0] 157 [0] 158 [0] 159 
[    0.168432] pcpu-alloc: [0] 160 [0] 161 [0] 162 [0] 163 
[    0.168433] pcpu-alloc: [0] 164 [0] 165 [0] 166 [0] 167 
[    0.168434] pcpu-alloc: [0] 168 [0] 169 [0] 170 [0] 171 
[    0.168435] pcpu-alloc: [0] 172 [0] 173 [0] 174 [0] 175 
[    0.168436] pcpu-alloc: [0] 176 [0] 177 [0] 178 [0] 179 
[    0.168437] pcpu-alloc: [0] 180 [0] 181 [0] 182 [0] 183 
[    0.168438] pcpu-alloc: [0] 184 [0] 185 [0] 186 [0] 187 
[    0.168439] pcpu-alloc: [0] 188 [0] 189 [0] 190 [0] 191 
[    0.168440] pcpu-alloc: [0] 192 [0] 193 [0] 194 [0] 195 
[    0.168442] pcpu-alloc: [0] 196 [0] 197 [0] 198 [0] 199 
[    0.168443] pcpu-alloc: [0] 200 [0] 201 [0] 202 [0] 203 
[    0.168444] pcpu-alloc: [0] 204 [0] 205 [0] 206 [0] 207 
[    0.168445] pcpu-alloc: [0] 208 [0] 209 [0] 210 [0] 211 
[    0.168446] pcpu-alloc: [0] 212 [0] 213 [0] 214 [0] 215 
[    0.168447] pcpu-alloc: [0] 216 [0] 217 [0] 218 [0] 219 
[    0.168448] pcpu-alloc: [0] 220 [0] 221 [0] 222 [0] 223 
[    0.168449] pcpu-alloc: [0] 224 [0] 225 [0] 226 [0] 227 
[    0.168450] pcpu-alloc: [0] 228 [0] 229 [0] 230 [0] 231 
[    0.168451] pcpu-alloc: [0] 232 [0] 233 [0] 234 [0] 235 
[    0.168452] pcpu-alloc: [0] 236 [0] 237 [0] 238 [0] 239 
[    0.168453] pcpu-alloc: [0] 240 [0] 241 [0] 242 [0] 243 
[    0.168454] pcpu-alloc: [0] 244 [0] 245 [0] 246 [0] 247 
[    0.168455] pcpu-alloc: [0] 248 [0] 249 [0] 250 [0] 251 
[    0.168456] pcpu-alloc: [0] 252 [0] 253 [0] 254 [0] 255 
[    0.168457] pcpu-alloc: [0] 256 [0] 257 [0] 258 [0] 259 
[    0.168459] pcpu-alloc: [0] 260 [0] 261 [0] 262 [0] 263 
[    0.168460] pcpu-alloc: [0] 264 [0] 265 [0] 266 [0] 267 
[    0.168461] pcpu-alloc: [0] 268 [0] 269 [0] 270 [0] 271 
[    0.168462] pcpu-alloc: [0] 272 [0] 273 [0] 274 [0] 275 
[    0.168463] pcpu-alloc: [0] 276 [0] 277 [0] 278 [0] 279 
[    0.168464] pcpu-alloc: [0] 280 [0] 281 [0] 282 [0] 283 
[    0.168465] pcpu-alloc: [0] 284 [0] 285 [0] 286 [0] 287 
[    0.168466] pcpu-alloc: [0] 288 [0] 289 [0] 290 [0] 291 
[    0.168467] pcpu-alloc: [0] 292 [0] 293 [0] 294 [0] 295 
[    0.168468] pcpu-alloc: [0] 296 [0] 297 [0] 298 [0] 299 
[    0.168469] pcpu-alloc: [0] 300 [0] 301 [0] 302 [0] 303 
[    0.168470] pcpu-alloc: [0] 304 [0] 305 [0] 306 [0] 307 
[    0.168471] pcpu-alloc: [0] 308 [0] 309 [0] 310 [0] 311 
[    0.168472] pcpu-alloc: [0] 312 [0] 313 [0] 314 [0] 315 
[    0.168473] pcpu-alloc: [0] 316 [0] 317 [0] 318 [0] 319 
[    0.168474] pcpu-alloc: [0] 320 [0] 321 [0] 322 [0] 323 
[    0.168475] pcpu-alloc: [0] 324 [0] 325 [0] 326 [0] 327 
[    0.168476] pcpu-alloc: [0] 328 [0] 329 [0] 330 [0] 331 
[    0.168477] pcpu-alloc: [0] 332 [0] 333 [0] 334 [0] 335 
[    0.168478] pcpu-alloc: [0] 336 [0] 337 [0] 338 [0] 339 
[    0.168501] Built 1 zonelists, mobility grouping on.  Total pages: 33030144
[    0.168502] Policy zone: Normal
[    0.168503] Kernel command line: root=/dev/disk/by-path/ccw-0.0.3318-part1 rd.dasd=0.0.3318 cio_ignore=all,!condev rd.znet=qeth,0.0.bd00,0.0.bd01,0.0.bd02,layer2=1,portno=0,portname=OSAPORT zfcp.allow_lun_scan=0 BOOT_IMAGE=0 crashkernel=1G brd.rd_size=8000000 brd.rd_nr=8 BOOT_IMAGE=0
[    0.169679] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.169680] printk: log_buf_len total cpu_extra contributions: 1388544 bytes
[    0.169680] printk: log_buf_len min size: 131072 bytes
[    0.169973] printk: log_buf_len: 2097152 bytes
[    0.169974] printk: early log buf free: 123812(94%)
[    0.179095] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes, linear)
[    0.183664] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes, linear)
[    0.183681] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.212977] Memory: 2312420K/134217728K available (10512K kernel code, 2084K rwdata, 3148K rodata, 4008K init, 856K bss, 3358660K reserved, 0K cma-reserved)
[    0.213438] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=340, Nodes=1
[    0.213476] ftrace: allocating 31392 entries in 123 pages
[    0.218045] ftrace: allocated 123 pages with 6 groups
[    0.219167] rcu: Hierarchical RCU implementation.
[    0.219167] rcu: 	RCU event tracing is enabled.
[    0.219168] rcu: 	RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=340.
[    0.219169] 	Trampoline variant of Tasks RCU enabled.
[    0.219169] 	Rude variant of Tasks RCU enabled.
[    0.219169] 	Tracing variant of Tasks RCU enabled.
[    0.219170] rcu: RCU calculated value of scheduler-enlistment delay is 11 jiffies.
[    0.219171] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=340
[    0.222162] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3
[    0.222313] clocksource: tod: mask: 0xffffffffffffffff max_cycles: 0x3b0a9be803b0a9, max_idle_ns: 1805497147909793 ns
[    0.222534] Console: colour dummy device 80x25
[    0.323253] printk: console [ttyS0] enabled
[    0.457563] Calibrating delay loop (skipped)... 21881.00 BogoMIPS preset
[    0.457564] pid_max: default: 348160 minimum: 2720
[    0.457736] LSM: Security Framework initializing
[    0.457767] SELinux:  Initializing.
[    0.458030] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.458174] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.459174] rcu: Hierarchical SRCU implementation.
[    0.462173] smp: Bringing up secondary CPUs ...
[    0.485719] smp: Brought up 1 node, 64 CPUs
[    1.072386] random: fast init done
[    1.413328] node 0 deferred pages initialised in 930ms
[    1.442064] devtmpfs: initialized
[    1.442586] random: get_random_u32 called from bucket_table_alloc.isra.0+0x82/0x120 with crng_init=1
[    1.442863] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    1.443545] futex hash table entries: 131072 (order: 13, 33554432 bytes, vmalloc)
[    1.447717] xor: automatically using best checksumming function   xc        
[    1.447917] NET: Registered protocol family 16
[    1.447956] audit: initializing netlink subsys (disabled)
[    1.448023] audit: type=2000 audit(1593597944.331:1): state=initialized audit_enabled=0 res=1
[    1.448147] Spectre V2 mitigation: etokens
[    1.456276] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages
[    1.702379] raid6: vx128x8  gen() 21910 MB/s
[    1.872319] raid6: vx128x8  xor() 13356 MB/s
[    1.872320] raid6: using algorithm vx128x8 gen() 21910 MB/s
[    1.872321] raid6: .... xor() 13356 MB/s, rmw enabled
[    1.872322] raid6: using s390xc recovery algorithm
[    1.872601] iommu: Default domain type: Translated 
[    1.872748] SCSI subsystem initialized
[    1.894528] PCI host bridge to bus 0000:00
[    1.894533] pci_bus 0000:00: root bus resource [bus 00]
[    1.894535] pci_bus 0000:00: root bus resource [mem 0x8000000000000000-0x8000000007ffffff 64bit pref]
[    1.894600] pci 0000:00:00.0: [1014:044b] type 00 class 0x120000
[    1.894650] pci 0000:00:00.0: reg 0x10: [mem 0xffffd80008000000-0xffffd8000fffffff 64bit pref]
[    1.894918] pci 0000:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0000:00:00.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[    1.894951] pci 0000:00:00.0: Adding to iommu group 0
[    1.896880] PCI host bridge to bus 0001:00
[    1.896882] pci_bus 0001:00: root bus resource [bus 00]
[    1.896883] pci_bus 0001:00: root bus resource [mem 0x8001000000000000-0x80010000000fffff 64bit pref]
[    1.896971] pci 0001:00:00.0: [15b3:1016] type 00 class 0x020000
[    1.897066] pci 0001:00:00.0: reg 0x10: [mem 0xffffd40002000000-0xffffd400020fffff 64bit pref]
[    1.897216] pci 0001:00:00.0: enabling Extended Tags
[    1.897702] pci 0001:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0001:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[    1.897740] pci 0001:00:00.0: Adding to iommu group 1
[    1.899688] PCI host bridge to bus 0002:00
[    1.899690] pci_bus 0002:00: root bus resource [bus 00]
[    1.899692] pci_bus 0002:00: root bus resource [mem 0x8002000000000000-0x80020000000fffff 64bit pref]
[    1.899779] pci 0002:00:00.0: [15b3:1016] type 00 class 0x020000
[    1.899877] pci 0002:00:00.0: reg 0x10: [mem 0xffffd40008000000-0xffffd400080fffff 64bit pref]
[    1.900031] pci 0002:00:00.0: enabling Extended Tags
[    1.900530] pci 0002:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0002:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[    1.900563] pci 0002:00:00.0: Adding to iommu group 2
[    2.424592] VFS: Disk quotas dquot_6.6.0
[    2.424648] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    2.425726] random: crng init done
[    2.426438] NET: Registered protocol family 2
[    2.427232] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[    2.427784] TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc)
[    2.429806] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[    2.430265] TCP: Hash tables configured (established 524288 bind 65536)
[    2.430641] UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)
[    2.431636] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)
[    2.433139] NET: Registered protocol family 1
[    2.433180] Trying to unpack rootfs image as initramfs...
[    3.042521] Freeing initrd memory: 46272K
[    3.042920] alg: No test for crc32be (crc32be-vx)
[    3.045455] Initialise system trusted keyrings
[    3.045514] workingset: timestamp_bits=45 max_order=25 bucket_order=0
[    3.046303] zbud: loaded
[    3.046685] fuse: init (API version 7.31)
[    3.046739] SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
[    3.049940] Key type asymmetric registered
[    3.049942] Asymmetric key parser 'x509' registered
[    3.049948] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    3.050150] io scheduler mq-deadline registered
[    3.050152] io scheduler kyber registered
[    3.050171] io scheduler bfq registered
[    3.051187] atomic64_test: passed
[    3.051236] hvc_iucv: The z/VM IUCV HVC device driver cannot be used without z/VM
[    3.055982] brd: module loaded
[    3.056482] NET: Registered protocol family 10
[    3.057354] Segment Routing with IPv6
[    3.057366] NET: Registered protocol family 17
[    3.057370] Key type dns_resolver registered
[    3.057524] cio: Channel measurement facility initialized using format extended (mode autodetected)
[    3.057802] Discipline DIAG cannot be used without z/VM
[    5.446198] sclp_sd: No data is available for the config data entity
[    5.954014] qeth: loading core functions
[    5.954063] qeth: register layer 2 discipline
[    5.954064] qeth: register layer 3 discipline
[    5.954126] registered taskstats version 1
[    5.954131] Loading compiled-in X.509 certificates
[    5.993277] Loaded X.509 cert 'Build time autogenerated kernel key: c46ba92ee388c82c5891ee836c9c20b752cdfac5'
[    5.995428] zswap: loaded using pool lzo/zbud
[    5.996258] Key type ._fscrypt registered
[    5.996258] Key type .fscrypt registered
[    5.996259] Key type fscrypt-provisioning registered
[    5.996570] Btrfs loaded, crc32c=crc32c-vx
[    5.996583] ima: No TPM chip found, activating TPM-bypass!
[    5.996586] ima: Allocated hash algorithm: sha256
[    5.996596] ima: No architecture policies found
[    5.997561] Freeing unused kernel memory: 4008K
[    6.042376] Write protected read-only-after-init data: 80k
[    6.042381] Run /init as init process
[    6.042383]   with arguments:
[    6.042383]     /init
[    6.042383]   with environment:
[    6.042384]     HOME=/
[    6.042384]     TERM=linux
[    6.042384]     BOOT_IMAGE=0
[    6.042385]     crashkernel=1G
[    6.057157] systemd[1]: Inserted module 'autofs4'
[    6.058276] systemd[1]: systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[    6.059083] systemd[1]: Detected architecture s390x.
[    6.059085] systemd[1]: Running in initial RAM disk.
[    6.059147] systemd[1]: Set hostname to <m83lp52.lnxne.boe>.
[    6.098479] systemd[1]: Reached target Local File Systems.
[    6.098536] systemd[1]: Reached target Slices.
[    6.098555] systemd[1]: Reached target Swap.
[    6.098570] systemd[1]: Reached target Timers.
[    6.098663] systemd[1]: Listening on Journal Audit Socket.
[    6.098712] systemd[1]: Listening on Journal Socket (/dev/log).
[    6.145682] umh: sub_info->path: /sbin/modprobe
[    6.145683] /sbin/modprobe 
[    6.145684] -q 
[    6.145685] -- 
[    6.145685] sch_fq_codel 

[    6.146408] == ret: 100
[    6.146410] == KWIFEXITED(ret): 01
[    6.146411] KWEXITSTATUS(ret): 1
[    6.365672] audit: type=1130 audit(1593597949.251:2): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    6.372997] audit: type=1130 audit(1593597949.261:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.294273] audit: type=1130 audit(1593597950.181:4): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-cmdline comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.310777] audit: type=1130 audit(1593597950.191:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-pre-udev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.311257] audit: type=1334 audit(1593597950.191:6): prog-id=6 op=LOAD
[    7.311287] audit: type=1334 audit(1593597950.191:7): prog-id=7 op=LOAD
[    7.533584] audit: type=1130 audit(1593597950.421:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.538462] qeth 0.0.bd00: portname is deprecated and is ignored
[    7.539958] dasd-eckd 0.0.3318: A channel path to the device has become operational
[    7.540242] dasd-eckd 0.0.3319: A channel path to the device has become operational
[    7.540743] dasd-eckd 0.0.331a: A channel path to the device has become operational
[    7.540985] dasd-eckd 0.0.331b: A channel path to the device has become operational
[    7.543658] qdio: 0.0.bd02 OSA on SC 159b using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
[    7.550125] dasd-eckd 0.0.3319: New DASD 3390/0E (CU 3990/01) with 262668 cylinders, 15 heads, 224 sectors
[    7.552945] dasd-eckd 0.0.3319: DASD with 4 KB/block, 189120960 KB total size, 48 KB/track, compatible disk layout
[    7.554058]  dasdb:VOL1/  0X3319: dasdb1
[    7.554603] dasd-eckd 0.0.3318: New DASD 3390/0E (CU 3990/01) with 262668 cylinders, 15 heads, 224 sectors
[    7.557378] dasd-eckd 0.0.3318: DASD with 4 KB/block, 189120960 KB total size, 48 KB/track, compatible disk layout
[    7.558349]  dasda:VOL1/  0X3318: dasda1
[    7.558949] dasd-eckd 0.0.331a: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    7.561612] dasd-eckd 0.0.331a: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    7.563390] dasd-eckd 0.0.331b: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    7.563427]  dasdc:VOL1/  0X3333: dasdc1 dasdc2 dasdc3
[    7.566603] dasd-eckd 0.0.331b: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    7.567763]  dasdd:VOL1/  0X3333: dasdd1 dasdd2 dasdd3
[    7.575390] qeth 0.0.bd00: QDIO data connection isolation is deactivated
[    7.575863] qeth 0.0.bd00: The device represents a Bridge Capable Port
[    7.576279] audit: type=1130 audit(1593597950.461:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.579230] qeth 0.0.bd00: MAC address b6:c4:0f:21:ab:4e successfully registered
[    7.579749] qeth 0.0.bd00: Device is a OSD Express card (level: 0202)
               with link type OSD_10GIG.
[    7.590981] audit: type=1130 audit(1593597950.471:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.621333] qeth 0.0.bd00: MAC address de:45:d7:61:c4:13 successfully registered
[    7.623582] qeth 0.0.bd00 encbd00: renamed from eth0
[    7.812522] mlx5_core 0001:00:00.0: enabling device (0000 -> 0002)
[    7.812608] mlx5_core 0001:00:00.0: firmware version: 14.23.1020
[    8.035163] dasdconf.sh Warning: 0.0.3319 is already online, not configuring
[    8.037162] dasdconf.sh Warning: 0.0.331b is already online, not configuring
[    8.037885] dasdconf.sh Warning: 0.0.3318 is already online, not configuring
[    8.065688] dasdconf.sh Warning: 0.0.331a is already online, not configuring
[    8.254855] umh: sub_info->path: /sbin/modprobe
[    8.254857] /sbin/modprobe 
[    8.254858] -q 
[    8.254858] -- 
[    8.254859] mlx5_ib 

[    8.255376] mlx5_core 0002:00:00.0: enabling device (0000 -> 0002)
[    8.255460] mlx5_core 0002:00:00.0: firmware version: 14.23.1020
[    8.706511] umh: sub_info->path: /sbin/modprobe
[    8.706542] /sbin/modprobe 
[    8.706546] -q 
[    8.706555] -- 
[    8.706563] mlx5_ib 

[    8.708963] mlx5_core 0001:00:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[    8.853364] mlx5_core 0002:00:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[    9.023031] mlx5_core 0002:00:00.0 enP2s564np0: renamed from eth1
[    9.223067] mlx5_core 0001:00:00.0 enP1s519np0: renamed from eth0
[    9.553829] audit: type=1130 audit(1593597952.441:11): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    9.570174] audit: type=1130 audit(1593597952.451:12): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    9.578552] EXT4-fs (dasda1): mounted filesystem with ordered data mode. Opts: (null)
[    9.586599] audit: type=1334 audit(1593597952.471:13): prog-id=5 op=UNLOAD
[    9.937200] systemd-journald[484]: Received SIGTERM from PID 1 (systemd).
[    9.954691] printk: systemd: 19 output lines suppressed due to ratelimiting
[   10.278153] SELinux:  Permission watch in class filesystem not defined in policy.
[   10.278160] SELinux:  Permission watch in class file not defined in policy.
[   10.278161] SELinux:  Permission watch_mount in class file not defined in policy.
[   10.278162] SELinux:  Permission watch_sb in class file not defined in policy.
[   10.278163] SELinux:  Permission watch_with_perm in class file not defined in policy.
[   10.278164] SELinux:  Permission watch_reads in class file not defined in policy.
[   10.278167] SELinux:  Permission watch in class dir not defined in policy.
[   10.278168] SELinux:  Permission watch_mount in class dir not defined in policy.
[   10.278169] SELinux:  Permission watch_sb in class dir not defined in policy.
[   10.278170] SELinux:  Permission watch_with_perm in class dir not defined in policy.
[   10.278171] SELinux:  Permission watch_reads in class dir not defined in policy.
[   10.278174] SELinux:  Permission watch in class lnk_file not defined in policy.
[   10.278175] SELinux:  Permission watch_mount in class lnk_file not defined in policy.
[   10.278176] SELinux:  Permission watch_sb in class lnk_file not defined in policy.
[   10.278177] SELinux:  Permission watch_with_perm in class lnk_file not defined in policy.
[   10.278178] SELinux:  Permission watch_reads in class lnk_file not defined in policy.
[   10.278180] SELinux:  Permission watch in class chr_file not defined in policy.
[   10.278198] SELinux:  Permission watch_mount in class chr_file not defined in policy.
[   10.278199] SELinux:  Permission watch_sb in class chr_file not defined in policy.
[   10.278200] SELinux:  Permission watch_with_perm in class chr_file not defined in policy.
[   10.278201] SELinux:  Permission watch_reads in class chr_file not defined in policy.
[   10.278203] SELinux:  Permission watch in class blk_file not defined in policy.
[   10.278204] SELinux:  Permission watch_mount in class blk_file not defined in policy.
[   10.278205] SELinux:  Permission watch_sb in class blk_file not defined in policy.
[   10.278206] SELinux:  Permission watch_with_perm in class blk_file not defined in policy.
[   10.278207] SELinux:  Permission watch_reads in class blk_file not defined in policy.
[   10.278209] SELinux:  Permission watch in class sock_file not defined in policy.
[   10.278211] SELinux:  Permission watch_mount in class sock_file not defined in policy.
[   10.278212] SELinux:  Permission watch_sb in class sock_file not defined in policy.
[   10.278212] SELinux:  Permission watch_with_perm in class sock_file not defined in policy.
[   10.278213] SELinux:  Permission watch_reads in class sock_file not defined in policy.
[   10.278216] SELinux:  Permission watch in class fifo_file not defined in policy.
[   10.278217] SELinux:  Permission watch_mount in class fifo_file not defined in policy.
[   10.278218] SELinux:  Permission watch_sb in class fifo_file not defined in policy.
[   10.278219] SELinux:  Permission watch_with_perm in class fifo_file not defined in policy.
[   10.278220] SELinux:  Permission watch_reads in class fifo_file not defined in policy.
[   10.278262] SELinux:  Permission perfmon in class capability2 not defined in policy.
[   10.278263] SELinux:  Permission bpf in class capability2 not defined in policy.
[   10.278268] SELinux:  Permission perfmon in class cap2_userns not defined in policy.
[   10.278269] SELinux:  Permission bpf in class cap2_userns not defined in policy.
[   10.278304] SELinux:  Class perf_event not defined in policy.
[   10.278305] SELinux:  Class lockdown not defined in policy.
[   10.278306] SELinux: the above unknown classes and permissions will be allowed
[   10.278311] SELinux:  policy capability network_peer_controls=1
[   10.278311] SELinux:  policy capability open_perms=1
[   10.278312] SELinux:  policy capability extended_socket_class=1
[   10.278313] SELinux:  policy capability always_check_network=0
[   10.278314] SELinux:  policy capability cgroup_seclabel=1
[   10.278314] SELinux:  policy capability nnp_nosuid_transition=1
[   10.278315] SELinux:  policy capability genfs_seclabel_symlinks=0
[   10.389095] systemd[1]: Successfully loaded SELinux policy in 339.175ms.
[   10.534855] systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 13.169ms.
[   10.537104] systemd[1]: systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[   10.537930] systemd[1]: Detected architecture s390x.
[   10.540018] systemd[1]: Set hostname to <m83lp52.lnxne.boe>.
[   10.716298] systemd[1]: /usr/lib/systemd/system/sssd.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/sssd.pid → /run/sssd.pid; please update the unit file accordingly.
[   10.721240] systemd[1]: /usr/lib/systemd/system/iscsid.service:11: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsid.pid → /run/iscsid.pid; please update the unit file accordingly.
[   10.721443] systemd[1]: /usr/lib/systemd/system/iscsiuio.service:13: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsiuio.pid → /run/iscsiuio.pid; please update the unit file accordingly.
[   10.748479] systemd[1]: /usr/lib/systemd/system/sssd-kcm.socket:7: ListenStream= references a path below legacy directory /var/run/, updating /var/run/.heim_org.h5l.kcm-socket → /run/.heim_org.h5l.kcm-socket; please update the unit file accordingly.
[   10.781859] systemd[1]: initrd-switch-root.service: Succeeded.
[   10.781985] systemd[1]: Stopped Switch Root.
[   10.782251] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[   10.810779] EXT4-fs (dasda1): re-mounted. Opts: (null)
[   10.841271] umh: sub_info->path: /sbin/modprobe
[   10.841275] /sbin/modprobe 
[   10.841275] -q 
[   10.841276] -- 
[   10.841277] sch_fq_codel 

[   10.889454] == ret: 00
[   10.889456] == KWIFEXITED(ret): 01
[   10.889457] KWEXITSTATUS(ret): 0
[   11.118962] systemd-journald[990]: Received client request to flush runtime journal.
[   11.392641] VFIO - User Level meta-driver version: 0.3
[   11.490782] genwqe 0000:00:00.0: enabling device (0000 -> 0002)
[   11.506302] dasdconf.sh Warning: 0.0.3319 is already online, not configuring
[   11.507431] dasdconf.sh Warning: 0.0.331a is already online, not configuring
[   11.507992] dasdconf.sh Warning: 0.0.331b is already online, not configuring
[   11.508762] dasdconf.sh Warning: 0.0.3318 is already online, not configuring
[   11.625686] umh: sub_info->path: /sbin/modprobe
[   11.625690] /sbin/modprobe 
[   11.625690] -q 
[   11.625691] -- 
[   11.625692] crypto-ecb(aes) 

[   11.627270] == ret: 100
[   11.627272] == KWIFEXITED(ret): 01
[   11.627274] KWEXITSTATUS(ret): 1
[   11.627678] umh: sub_info->path: /sbin/modprobe
[   11.627681] /sbin/modprobe 
[   11.627682] -q 
[   11.627682] -- 
[   11.627683] crypto-cbc(aes) 

[   11.628695] == ret: 100
[   11.628699] == KWIFEXITED(ret): 01
[   11.628699] KWEXITSTATUS(ret): 1
[   11.629108] umh: sub_info->path: /sbin/modprobe
[   11.629110] /sbin/modprobe 
[   11.629110] -q 
[   11.629111] -- 
[   11.629112] crypto-xts(aes) 

[   11.630032] == ret: 100
[   11.630034] == KWIFEXITED(ret): 01
[   11.630035] KWEXITSTATUS(ret): 1
[   11.630098] umh: sub_info->path: /sbin/modprobe
[   11.630100] /sbin/modprobe 
[   11.630100] -q 
[   11.630101] -- 
[   11.630102] crypto-aes 

[   11.630979] == ret: 100
[   11.630980] == KWIFEXITED(ret): 01
[   11.631000] KWEXITSTATUS(ret): 1
[   11.631010] umh: sub_info->path: /sbin/modprobe
[   11.631012] /sbin/modprobe 
[   11.631012] -q 
[   11.631012] -- 
[   11.631013] cryptomgr 

[   11.631844] == ret: 00
[   11.631845] == KWIFEXITED(ret): 01
[   11.631846] KWEXITSTATUS(ret): 0
[   11.632291] umh: sub_info->path: /sbin/modprobe
[   11.632293] /sbin/modprobe 
[   11.632293] -q 
[   11.632294] -- 
[   11.632295] crypto-ctr(aes) 

[   11.633134] == ret: 100
[   11.633136] == KWIFEXITED(ret): 01
[   11.633136] KWEXITSTATUS(ret): 1
[   11.633211] umh: sub_info->path: /sbin/modprobe
[   11.633213] /sbin/modprobe 
[   11.633213] -q 
[   11.633214] -- 
[   11.633215] crypto-ctr 

[   11.639862] == ret: 00
[   11.639863] == KWIFEXITED(ret): 01
[   11.639864] KWEXITSTATUS(ret): 0
[   12.291736] device-mapper: uevent: version 1.0.3
[   12.291869] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
[   12.811519] kauditd_printk_skb: 81 callbacks suppressed
[   12.811520] audit: type=1130 audit(1593597955.691:93): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.811526] audit: type=1131 audit(1593597955.691:94): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.827013] audit: type=1130 audit(1593597955.711:95): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2dpath-ccw\x2d0.0.3319\x2dpart1 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.832192] XFS (dasdb1): Mounting V5 Filesystem
[   12.842643] RPC: Registered named UNIX socket transport module.
[   12.842645] RPC: Registered udp transport module.
[   12.842646] RPC: Registered tcp transport module.
[   12.842647] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   12.855633] XFS (dasdb1): Ending clean mount
[   12.857782] xfs filesystem being mounted at /home supports timestamps until 2038 (0x7fffffff)
[   12.876314] audit: type=1130 audit(1593597955.761:96): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.896111] audit: type=1130 audit(1593597955.781:97): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-read-write comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.896302] audit: type=1130 audit(1593597955.781:98): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.899463] RPC: Registered rdma transport module.
[   12.899466] RPC: Registered rdma backchannel transport module.
[   12.900337] audit: type=1130 audit(1593597955.781:99): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rdma-load-modules@rdma comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.916905] audit: type=1130 audit(1593597955.801:100): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rdma comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.952843] audit: type=1130 audit(1593597955.841:101): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.962072] audit: type=1305 audit(1593597955.841:102): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[   13.444427] umh: sub_info->path: /sbin/modprobe
[   13.444430] /sbin/modprobe 
[   13.444430] -q 
[   13.444431] -- 
[   13.444432] iptable_filter 

[   13.485947] == ret: 00
[   13.485949] == KWIFEXITED(ret): 01
[   13.485950] KWEXITSTATUS(ret): 0
[   13.525155] umh: sub_info->path: /sbin/modprobe
[   13.525156] /sbin/modprobe 
[   13.525157] -q 
[   13.525157] -- 
[   13.525158] ip6table_filter 

[   13.575540] == ret: 00
[   13.575542] == KWIFEXITED(ret): 01
[   13.575543] KWEXITSTATUS(ret): 0
[   13.590423] umh: sub_info->path: /sbin/modprobe
[   13.590425] /sbin/modprobe 
[   13.590425] -q 
[   13.590426] -- 
[   13.590427] net-pf-16-proto-12 

[   13.617106] == ret: 00
[   13.617108] == KWIFEXITED(ret): 01
[   13.617109] KWEXITSTATUS(ret): 0
[   13.617147] umh: sub_info->path: /sbin/modprobe
[   13.617148] /sbin/modprobe 
[   13.617149] -q 
[   13.617149] -- 
[   13.617150] nfnetlink-subsys-6 

[   13.638728] == ret: 00
[   13.638729] == KWIFEXITED(ret): 01
[   13.638730] KWEXITSTATUS(ret): 0
[   13.831375] umh: sub_info->path: /sbin/modprobe
[   13.831377] /sbin/modprobe 
[   13.831378] -q 
[   13.831379] -- 
[   13.831379] iptable_security 

[   13.846372] == ret: 00
[   13.846374] == KWIFEXITED(ret): 01
[   13.846375] KWEXITSTATUS(ret): 0
[   13.847932] umh: sub_info->path: /sbin/modprobe
[   13.847934] /sbin/modprobe 
[   13.847934] -q 
[   13.847935] -- 
[   13.847935] iptable_raw 

[   13.854549] == ret: 00
[   13.854550] == KWIFEXITED(ret): 01
[   13.854551] KWEXITSTATUS(ret): 0
[   13.855856] umh: sub_info->path: /sbin/modprobe
[   13.855857] /sbin/modprobe 
[   13.855857] -q 
[   13.855858] -- 
[   13.855858] iptable_mangle 

[   13.860364] == ret: 00
[   13.860365] == KWIFEXITED(ret): 01
[   13.860366] KWEXITSTATUS(ret): 0
[   13.861761] umh: sub_info->path: /sbin/modprobe
[   13.861762] /sbin/modprobe 
[   13.861763] -q 
[   13.861763] -- 
[   13.861764] iptable_nat 

[   13.911825] == ret: 00
[   13.911826] == KWIFEXITED(ret): 01
[   13.911827] KWEXITSTATUS(ret): 0
[   13.914686] umh: sub_info->path: /sbin/modprobe
[   13.914688] /sbin/modprobe 
[   13.914688] -q 
[   13.914689] -- 
[   13.914689] ip6table_security 

[   13.945628] == ret: 00
[   13.945630] == KWIFEXITED(ret): 01
[   13.945630] KWEXITSTATUS(ret): 0
[   13.947019] umh: sub_info->path: /sbin/modprobe
[   13.947020] /sbin/modprobe 
[   13.947020] -q 
[   13.947021] -- 
[   13.947022] ip6table_raw 

[   13.969857] == ret: 00
[   13.969859] == KWIFEXITED(ret): 01
[   13.969860] KWEXITSTATUS(ret): 0
[   13.971549] umh: sub_info->path: /sbin/modprobe
[   13.971551] /sbin/modprobe 
[   13.971552] -q 
[   13.971553] -- 
[   13.971553] ip6table_mangle 

[   13.977724] == ret: 00
[   13.977725] == KWIFEXITED(ret): 01
[   13.977726] KWEXITSTATUS(ret): 0
[   13.979392] umh: sub_info->path: /sbin/modprobe
[   13.979393] /sbin/modprobe 
[   13.979394] -q 
[   13.979394] -- 
[   13.979395] ip6table_nat 

[   13.985103] == ret: 00
[   13.985104] == KWIFEXITED(ret): 01
[   13.985105] KWEXITSTATUS(ret): 0
[   14.014760] umh: sub_info->path: /sbin/modprobe
[   14.014762] /sbin/modprobe 
[   14.014762] -q 
[   14.014762] -- 
[   14.014763] ipt_conntrack 

[   14.039306] mlx5_core 0001:00:00.0 enP1s519np0: Link up
[   14.041835] IPv6: ADDRCONF(NETDEV_CHANGE): enP1s519np0: link becomes ready
[   14.047723] == ret: 00
[   14.047725] == KWIFEXITED(ret): 01
[   14.047726] KWEXITSTATUS(ret): 0
[   14.048012] umh: sub_info->path: /sbin/modprobe
[   14.048014] /sbin/modprobe 
[   14.048014] -q 
[   14.048015] -- 
[   14.048015] ipt_REJECT 

[   14.073846] == ret: 00
[   14.073848] == KWIFEXITED(ret): 01
[   14.073848] KWEXITSTATUS(ret): 0
[   14.077239] umh: sub_info->path: /sbin/modprobe
[   14.077241] /sbin/modprobe 
[   14.077241] -q 
[   14.077242] -- 
[   14.077242] ip6t_rpfilter 

[   14.099370] == ret: 00
[   14.099372] == KWIFEXITED(ret): 01
[   14.099373] KWEXITSTATUS(ret): 0
[   14.100300] umh: sub_info->path: /sbin/modprobe
[   14.100301] /sbin/modprobe 
[   14.100302] -q 
[   14.100302] -- 
[   14.100303] ip6t_REJECT 

[   14.122176] == ret: 00
[   14.122177] == KWIFEXITED(ret): 01
[   14.122178] KWEXITSTATUS(ret): 0
[   14.128047] umh: sub_info->path: /sbin/modprobe
[   14.128048] /sbin/modprobe 
[   14.128048] -q 
[   14.128049] -- 
[   14.128049] ip6t_tcp 

[   14.149302] mlx5_core 0002:00:00.0 enP2s564np0: Link up
[   14.173411] == ret: 00
[   14.173413] == KWIFEXITED(ret): 01
[   14.173414] KWEXITSTATUS(ret): 0
[   14.322406] umh: sub_info->path: /sbin/modprobe
[   14.322409] /sbin/modprobe 
[   14.322410] -q 
[   14.322410] -- 
[   14.322411] rtnl-link-bridge 

[   14.389015] == ret: 00
[   14.389017] == KWIFEXITED(ret): 01
[   14.389018] KWEXITSTATUS(ret): 0
[   14.389335] umh: sub_info->path: /sbin/modprobe
[   14.389336] /sbin/modprobe 
[   14.389336] -q 
[   14.389337] -- 
[   14.389338] char-major-10-200 

[   14.437420] tun: Universal TUN/TAP device driver, 1.6
[   14.437604] == ret: 00
[   14.437606] == KWIFEXITED(ret): 01
[   14.437606] KWEXITSTATUS(ret): 0
[   14.438482] virbr0: port 1(virbr0-nic) entered blocking state
[   14.438485] virbr0: port 1(virbr0-nic) entered disabled state
[   14.438635] device virbr0-nic entered promiscuous mode
[   14.439654] umh: sub_info->path: /sbin/bridge-stp
[   14.439656] /sbin/bridge-stp 
[   14.439656] virbr0 
[   14.439656] start 

[   14.439734] == ret: 00
[   14.439735] == KWIFEXITED(ret): 01
[   14.439736] KWEXITSTATUS(ret): 0
[   14.567072] umh: sub_info->path: /sbin/modprobe
[   14.567074] /sbin/modprobe 
[   14.567075] -q 
[   14.567075] -- 
[   14.567076] ipt_CT 

[   14.576352] == ret: 00
[   14.576353] == KWIFEXITED(ret): 01
[   14.576354] KWEXITSTATUS(ret): 0
[   14.576443] umh: sub_info->path: /sbin/modprobe
[   14.576444] /sbin/modprobe 
[   14.576444] -q 
[   14.576445] -- 
[   14.576445] nfct-helper-tftp 

[   14.605182] == ret: 00
[   14.605183] == KWIFEXITED(ret): 01
[   14.605184] KWEXITSTATUS(ret): 0
[   14.675162] umh: sub_info->path: /sbin/modprobe
[   14.675164] /sbin/modprobe 
[   14.675164] -q 
[   14.675165] -- 
[   14.675165] ipt_MASQUERADE 

[   14.684223] == ret: 00
[   14.684224] == KWIFEXITED(ret): 01
[   14.684225] KWEXITSTATUS(ret): 0
[   14.701460] umh: sub_info->path: /sbin/modprobe
[   14.701462] /sbin/modprobe 
[   14.701462] -q 
[   14.701462] -- 
[   14.701463] ipt_CHECKSUM 

[   14.707019] == ret: 00
[   14.707021] == KWIFEXITED(ret): 01
[   14.707022] KWEXITSTATUS(ret): 0
[   14.707948] virbr0: port 1(virbr0-nic) entered blocking state
[   14.731752] virbr0: port 1(virbr0-nic) entered disabled state
[   14.981871] umh: sub_info->path: /sbin/modprobe
[   14.981873] /sbin/modprobe 
[   14.981873] -q 
[   14.981874] -- 
[   14.981874] char-major-10-232 

[   15.014374] == ret: 00
[   15.014375] == KWIFEXITED(ret): 01
[   15.014376] KWEXITSTATUS(ret): 0
[   15.102392] IPv6: ADDRCONF(NETDEV_CHANGE): enP2s564np0: link becomes ready
[   20.341424] virbr0: port 2(vnet0) entered blocking state
[   20.341429] virbr0: port 2(vnet0) entered disabled state
[   20.341514] device vnet0 entered promiscuous mode
[   20.341714] virbr0: port 2(vnet0) entered blocking state
[   20.341988] umh: sub_info->path: /sbin/modprobe
[   20.341989] /sbin/modprobe 
[   20.341989] -q 
[   20.341990] -- 
[   20.341991] char-major-10-238 

[   20.408823] == ret: 00
[   20.408825] == KWIFEXITED(ret): 01
[   20.408826] KWEXITSTATUS(ret): 0

^ permalink raw reply

* Re: INFO: task hung in request_key_tag
From: Tetsuo Handa @ 2020-07-01 10:04 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: syzbot, dhowells, jarkko.sakkinen, jmorris, keyrings,
	linux-kernel, linux-security-module, serge, syzkaller-bugs
In-Reply-To: <000000000000961dea05a95c9558@google.com>

I suspect commit 9e9b47d6bbe9df65 ("umh: fix processed error when UMH_WAIT_PROC is used").
Maybe the change in kernel/umh.c and/or security/keys/request_key.c made by that commit is
affecting call_usermodehelper_keys() == 0 case when complete_request_key() is called.

On 2020/07/01 16:53, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    c28e58ee Add linux-next specific files for 20200629
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=17925a9d100000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=dcd26bbca17dd1db
> dashboard link: https://syzkaller.appspot.com/bug?extid=46c77dc7e98c732de754
> compiler:       gcc (GCC) 10.1.0-syz 20200507
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+46c77dc7e98c732de754@syzkaller.appspotmail.com

^ permalink raw reply

* Re: [PATCH v4 3/3] prctl: Allow ptrace capable processes to change /proc/self/exe
From: Christian Brauner @ 2020-07-01  8:55 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Eric Biederman, Pavel Emelyanov, Oleg Nesterov, Dmitry Safonov,
	Andrei Vagin, Nicolas Viennot, Michał Cłapiński,
	Kamil Yurtsever, Dirk Petersen, Christine Flood, Casey Schaufler,
	Mike Rapoport, Radostin Stoyanov, Cyrill Gorcunov, Serge Hallyn,
	Stephen Smalley, Sargun Dhillon, Arnd Bergmann,
	linux-security-module, linux-kernel, selinux, Eric Paris,
	Jann Horn, linux-fsdevel
In-Reply-To: <20200701064906.323185-4-areber@redhat.com>

On Wed, Jul 01, 2020 at 08:49:06AM +0200, Adrian Reber wrote:
> From: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
> 
> Previously, the current process could only change the /proc/self/exe
> link with local CAP_SYS_ADMIN.
> This commit relaxes this restriction by permitting such change with
> CAP_CHECKPOINT_RESTORE, and the ability to use ptrace.
> 
> With access to ptrace facilities, a process can do the following: fork a
> child, execve() the target executable, and have the child use ptrace()
> to replace the memory content of the current process. This technique
> makes it possible to masquerade an arbitrary program as any executable,
> even setuid ones.
> 
> Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
> Signed-off-by: Adrian Reber <areber@redhat.com>
> ---
>  include/linux/lsm_hook_defs.h |  1 +
>  include/linux/security.h      |  6 ++++++
>  kernel/sys.c                  | 12 ++++--------
>  security/commoncap.c          | 26 ++++++++++++++++++++++++++
>  security/security.c           |  5 +++++
>  security/selinux/hooks.c      | 14 ++++++++++++++
>  6 files changed, 56 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 0098852bb56a..90e51d5e093b 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -211,6 +211,7 @@ LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
>  	 int sig, const struct cred *cred)
>  LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
>  	 unsigned long arg3, unsigned long arg4, unsigned long arg5)
> +LSM_HOOK(int, 0, prctl_set_mm_exe_file, struct file *exe_file)
>  LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
>  	 struct inode *inode)
>  LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 2797e7f6418e..0f594eb7e766 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -412,6 +412,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
>  			int sig, const struct cred *cred);
>  int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>  			unsigned long arg4, unsigned long arg5);
> +int security_prctl_set_mm_exe_file(struct file *exe_file);
>  void security_task_to_inode(struct task_struct *p, struct inode *inode);
>  int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
>  void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid);
> @@ -1124,6 +1125,11 @@ static inline int security_task_prctl(int option, unsigned long arg2,
>  	return cap_task_prctl(option, arg2, arg3, arg4, arg5);
>  }
>  
> +static inline int security_prctl_set_mm_exe_file(struct file *exe_file)
> +{
> +	return cap_prctl_set_mm_exe_file(exe_file);
> +}
> +
>  static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
>  { }
>  
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 00a96746e28a..bb53e8408c63 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1851,6 +1851,10 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
>  	if (err)
>  		goto exit;
>  
> +	err = security_prctl_set_mm_exe_file(exe.file);
> +	if (err)
> +		goto exit;
> +
>  	/*
>  	 * Forbid mm->exe_file change if old file still mapped.
>  	 */
> @@ -2006,14 +2010,6 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  	}
>  
>  	if (prctl_map.exe_fd != (u32)-1) {
> -		/*
> -		 * Make sure the caller has the rights to
> -		 * change /proc/pid/exe link: only local sys admin should
> -		 * be allowed to.
> -		 */
> -		if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> -			return -EINVAL;
> -
>  		error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
>  		if (error)
>  			return error;
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 59bf3c1674c8..663d00fe2ecc 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -1291,6 +1291,31 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>  	}
>  }
>  
> +/**
> + * cap_prctl_set_mm_exe_file - Determine whether /proc/self/exe can be changed
> + * by the current process.
> + * @exe_file: The new exe file
> + * Returns 0 if permission is granted, -ve if denied.
> + *
> + * The current process is permitted to change its /proc/self/exe link via two policies:
> + * 1) The current user can do checkpoint/restore. At the time of this writing,
> + *    this means CAP_SYS_ADMIN or CAP_CHECKPOINT_RESTORE capable.
> + * 2) The current user can use ptrace.
> + *
> + * With access to ptrace facilities, a process can do the following:
> + * fork a child, execve() the target executable, and have the child use
> + * ptrace() to replace the memory content of the current process.
> + * This technique makes it possible to masquerade an arbitrary program as the
> + * target executable, even if it is setuid.
> + */
> +int cap_prctl_set_mm_exe_file(struct file *exe_file)
> +{
> +	if (checkpoint_restore_ns_capable(current_user_ns()))
> +		return 0;
> +
> +	return security_ptrace_access_check(current, PTRACE_MODE_ATTACH_REALCREDS);
> +}

What is the reason for having this be a new security hook? Doesn't look
like it needs to be unless I'm missing something. This just seems more
complex than it needs to be.

I might be wrong here but if you look at the callsites for
security_ptrace_access_check() right now, you'll see that it's only
called from kernel/ptrace.c in __ptrace_may_access() and that function
checks right at the top:

	/* Don't let security modules deny introspection */
	if (same_thread_group(task, current))
		return 0;

since you're passing in same_thread_group(current, current) you're
passing this check and never even hitting
security_ptrace_access_check(). So the contract seems to be (as is
obvious from the comment) that a task can't be denied ptrace
introspection. But if you're using security_ptrace_access_check(current)
here and _if_ there would be any lsm that would deny ptrace
introspection to current you'd suddenly introduce a callsite where
ptrace introspection is denied. That seems wrong. So either you meant to
do something else here or you really just want:

checkpoint_restore_ns_capable(current_user_ns())

and none of the rest. But I might be missing the big picture in this
patch.

> +	if (checkpoint_restore_ns_capable(current_user_ns()))
> +		return 0;
> +
>  /**
>   * cap_vm_enough_memory - Determine whether a new virtual mapping is permitted
>   * @mm: The VM space in which the new mapping is to be made
> @@ -1356,6 +1381,7 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(mmap_file, cap_mmap_file),
>  	LSM_HOOK_INIT(task_fix_setuid, cap_task_fix_setuid),
>  	LSM_HOOK_INIT(task_prctl, cap_task_prctl),
> +	LSM_HOOK_INIT(prctl_set_mm_exe_file, cap_prctl_set_mm_exe_file),
>  	LSM_HOOK_INIT(task_setscheduler, cap_task_setscheduler),
>  	LSM_HOOK_INIT(task_setioprio, cap_task_setioprio),
>  	LSM_HOOK_INIT(task_setnice, cap_task_setnice),
> diff --git a/security/security.c b/security/security.c
> index 2bb912496232..13a1ed32f9e3 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1790,6 +1790,11 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>  	return rc;
>  }
>  
> +int security_prctl_set_mm_exe_file(struct file *exe_file)
> +{
> +	return call_int_hook(prctl_set_mm_exe_file, 0, exe_file);
> +}
> +
>  void security_task_to_inode(struct task_struct *p, struct inode *inode)
>  {
>  	call_void_hook(task_to_inode, p, inode);
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index ca901025802a..fca5581392b8 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4156,6 +4156,19 @@ static int selinux_task_kill(struct task_struct *p, struct kernel_siginfo *info,
>  			    secid, task_sid(p), SECCLASS_PROCESS, perm, NULL);
>  }
>  
> +static int selinux_prctl_set_mm_exe_file(struct file *exe_file)
> +{
> +	u32 sid = current_sid();
> +
> +	struct common_audit_data ad = {
> +		.type = LSM_AUDIT_DATA_FILE,
> +		.u.file = exe_file,
> +	};
> +
> +	return avc_has_perm(&selinux_state, sid, sid,
> +			    SECCLASS_FILE, FILE__EXECUTE_NO_TRANS, &ad);
> +}
> +
>  static void selinux_task_to_inode(struct task_struct *p,
>  				  struct inode *inode)
>  {
> @@ -7057,6 +7070,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(task_getscheduler, selinux_task_getscheduler),
>  	LSM_HOOK_INIT(task_movememory, selinux_task_movememory),
>  	LSM_HOOK_INIT(task_kill, selinux_task_kill),
> +	LSM_HOOK_INIT(prctl_set_mm_exe_file, selinux_prctl_set_mm_exe_file),
>  	LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
>  
>  	LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [PATCH v4 1/3] capabilities: Introduce CAP_CHECKPOINT_RESTORE
From: Christian Brauner @ 2020-07-01  8:27 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Eric Biederman, Pavel Emelyanov, Oleg Nesterov, Dmitry Safonov,
	Andrei Vagin, Nicolas Viennot, Michał Cłapiński,
	Kamil Yurtsever, Dirk Petersen, Christine Flood, Casey Schaufler,
	Mike Rapoport, Radostin Stoyanov, Cyrill Gorcunov, Serge Hallyn,
	Stephen Smalley, Sargun Dhillon, Arnd Bergmann,
	linux-security-module, linux-kernel, selinux, Eric Paris,
	Jann Horn, linux-fsdevel
In-Reply-To: <20200701064906.323185-2-areber@redhat.com>

On Wed, Jul 01, 2020 at 08:49:04AM +0200, Adrian Reber wrote:
> This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
> checkpoint/restore for non-root users.
> 
> Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has been
> asked numerous times if it is possible to checkpoint/restore a process as
> non-root. The answer usually was: 'almost'.
> 
> The main blocker to restore a process as non-root was to control the PID of the
> restored process. This feature available via the clone3 system call, or via
> /proc/sys/kernel/ns_last_pid is unfortunately guarded by CAP_SYS_ADMIN.
> 
> In the past two years, requests for non-root checkpoint/restore have increased
> due to the following use cases:
> * Checkpoint/Restore in an HPC environment in combination with a resource
>   manager distributing jobs where users are always running as non-root.
>   There is a desire to provide a way to checkpoint and restore long running
>   jobs.
> * Container migration as non-root
> * We have been in contact with JVM developers who are integrating
>   CRIU into a Java VM to decrease the startup time. These checkpoint/restore
>   applications are not meant to be running with CAP_SYS_ADMIN.
> 
> We have seen the following workarounds:
> * Use a setuid wrapper around CRIU:
>   See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
> * Use a setuid helper that writes to ns_last_pid.
>   Unfortunately, this helper delegation technique is impossible to use with
>   clone3, and is thus prone to races.
>   See https://github.com/twosigma/set_ns_last_pid
> * Cycle through PIDs with fork() until the desired PID is reached:
>   This has been demonstrated to work with cycling rates of 100,000 PIDs/s
>   See https://github.com/twosigma/set_ns_last_pid
> * Patch out the CAP_SYS_ADMIN check from the kernel
> * Run the desired application in a new user and PID namespace to provide
>   a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited use in
>   typical container environments (e.g., Kubernetes) as /proc is
>   typically protected with read-only layers (e.g., /proc/sys) for hardening
>   purposes. Read-only layers prevent additional /proc mounts (due to proc's
>   SB_I_USERNS_VISIBLE property), making the use of new PID namespaces limited as
>   certain applications need access to /proc matching their PID namespace.
> 
> The introduced capability allows to:
> * Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
>   for the corresponding PID namespace via ns_last_pid/clone3.
> * Open files in /proc/pid/map_files when the current user is
>   CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for recovering
>   files that are unreachable via the file system such as deleted files, or memfd
>   files.
> 
> See corresponding selftest for an example with clone3().
> 
> Signed-off-by: Adrian Reber <areber@redhat.com>
> Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
> ---

I think that now looks reasonable. A few comments.

Before we proceed, please split the addition of
checkpoint_restore_ns_capable() out into a separate patch.
In fact, I think the cleanest way of doing this would be:
- 0/n capability: add CAP_CHECKPOINT_RESTORE
- 1/n pid: use checkpoint_restore_ns_capable() for set_tid
- 2/n pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid
- 3/n: proc: require checkpoint_restore_ns_capable() in init userns for map_files

(commit subjects up to you of course) and a nice commit message for each
time we relax a permissions on something so we have a clear separate
track record for each change in case we need to revert something. Then
the rest of the patches in this series. Testing patches probably last.

>  fs/proc/base.c                      | 8 ++++----
>  include/linux/capability.h          | 6 ++++++
>  include/uapi/linux/capability.h     | 9 ++++++++-
>  kernel/pid.c                        | 2 +-
>  kernel/pid_namespace.c              | 2 +-
>  security/selinux/include/classmap.h | 5 +++--
>  6 files changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index d86c0afc8a85..ad806069c778 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2189,16 +2189,16 @@ struct map_files_info {
>  };
>  
>  /*
> - * Only allow CAP_SYS_ADMIN to follow the links, due to concerns about how the
> - * symlinks may be used to bypass permissions on ancestor directories in the
> - * path to the file in question.
> + * Only allow CAP_SYS_ADMIN and CAP_CHECKPOINT_RESTORE to follow the links, due
> + * to concerns about how the symlinks may be used to bypass permissions on
> + * ancestor directories in the path to the file in question.
>   */
>  static const char *
>  proc_map_files_get_link(struct dentry *dentry,
>  			struct inode *inode,
>  		        struct delayed_call *done)
>  {
> -	if (!capable(CAP_SYS_ADMIN))
> +	if (!capable(CAP_SYS_ADMIN) && !capable(CAP_CHECKPOINT_RESTORE))
>  		return ERR_PTR(-EPERM);

I think it's clearer if you just use:
checkpoint_restore_ns_capable(&init_user_ns)

> +static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)

>  
>  	return proc_pid_get_link(dentry, inode, done);
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index b4345b38a6be..1e7fe311cabe 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -261,6 +261,12 @@ static inline bool bpf_capable(void)
>  	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
>  }
>  
> +static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)
> +{
> +	return ns_capable(ns, CAP_CHECKPOINT_RESTORE) ||
> +		ns_capable(ns, CAP_SYS_ADMIN);
> +}
> +
>  /* audit system wants to get cap info from files as well */
>  extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
>  
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 48ff0757ae5e..395dd0df8d08 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -408,7 +408,14 @@ struct vfs_ns_cap_data {
>   */
>  #define CAP_BPF			39
>  
> -#define CAP_LAST_CAP         CAP_BPF
> +
> +/* Allow checkpoint/restore related operations */
> +/* Allow PID selection during clone3() */
> +/* Allow writing to ns_last_pid */
> +
> +#define CAP_CHECKPOINT_RESTORE	40
> +
> +#define CAP_LAST_CAP         CAP_CHECKPOINT_RESTORE
>  
>  #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>  
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 5799ae54b89e..2d0a97b7ed7a 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -198,7 +198,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
>  			if (tid != 1 && !tmp->child_reaper)
>  				goto out_free;
>  			retval = -EPERM;
> -			if (!ns_capable(tmp->user_ns, CAP_SYS_ADMIN))
> +			if (!checkpoint_restore_ns_capable(tmp->user_ns))
>  				goto out_free;
>  			set_tid_size--;
>  		}
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 0e5ac162c3a8..ac135bd600eb 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -269,7 +269,7 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
>  	struct ctl_table tmp = *table;
>  	int ret, next;
>  
> -	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
> +	if (write && !checkpoint_restore_ns_capable(pid_ns->user_ns))
>  		return -EPERM;
>  
>  	/*
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index 98e1513b608a..40cebde62856 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -27,9 +27,10 @@
>  	    "audit_control", "setfcap"
>  
>  #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
> -		"wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf"
> +		"wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf", \
> +		"checkpoint_restore"
>  
> -#if CAP_LAST_CAP > CAP_BPF
> +#if CAP_LAST_CAP > CAP_CHECKPOINT_RESTORE
>  #error New capability defined, please update COMMON_CAP2_PERMS.
>  #endif
>  
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [PATCH v2 11/11] ima: Support additional conditionals in the KEXEC_CMDLINE hook function
From: Dave Young @ 2020-07-01  8:04 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: Mimi Zohar, Dmitry Kasatkin, Prakhar Srivastava, kexec,
	James Morris, linux-kernel, Lakshmi Ramasubramanian,
	linux-security-module, Eric Biederman, linux-integrity,
	Serge E . Hallyn
In-Reply-To: <20200626223900.253615-12-tyhicks@linux.microsoft.com>

Hi,
On 06/26/20 at 05:39pm, Tyler Hicks wrote:
> Take the properties of the kexec kernel's inode and the current task
> ownership into consideration when matching a KEXEC_CMDLINE operation to
> the rules in the IMA policy. This allows for some uniformity when
> writing IMA policy rules for KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK,
> and KEXEC_CMDLINE operations.
> 
> Prior to this patch, it was not possible to write a set of rules like
> this:
> 
>  dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t
>  dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t
>  dont_measure func=KEXEC_CMDLINE obj_type=foo_t
>  measure func=KEXEC_KERNEL_CHECK
>  measure func=KEXEC_INITRAMFS_CHECK
>  measure func=KEXEC_CMDLINE
> 
> The inode information associated with the kernel being loaded by a
> kexec_kernel_load(2) syscall can now be included in the decision to
> measure or not
> 
> Additonally, the uid, euid, and subj_* conditionals can also now be
> used in KEXEC_CMDLINE rules. There was no technical reason as to why
> those conditionals weren't being considered previously other than
> ima_match_rules() didn't have a valid inode to use so it immediately
> bailed out for KEXEC_CMDLINE operations rather than going through the
> full list of conditional comparisons.
> 
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: kexec@lists.infradead.org
> ---
> 
> * v2
>   - Moved the inode parameter of process_buffer_measurement() to be the
>     first parameter so that it more closely matches process_masurement()
> 
>  include/linux/ima.h                          |  4 ++--
>  kernel/kexec_file.c                          |  2 +-
>  security/integrity/ima/ima.h                 |  2 +-
>  security/integrity/ima/ima_api.c             |  2 +-
>  security/integrity/ima/ima_appraise.c        |  2 +-
>  security/integrity/ima/ima_asymmetric_keys.c |  2 +-
>  security/integrity/ima/ima_main.c            | 23 +++++++++++++++-----
>  security/integrity/ima/ima_policy.c          | 17 +++++----------
>  security/integrity/ima/ima_queue_keys.c      |  2 +-
>  9 files changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/ima.h b/include/linux/ima.h
> index 9164e1534ec9..d15100de6cdd 100644
> --- a/include/linux/ima.h
> +++ b/include/linux/ima.h
> @@ -25,7 +25,7 @@ extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
>  			      enum kernel_read_file_id id);
>  extern void ima_post_path_mknod(struct dentry *dentry);
>  extern int ima_file_hash(struct file *file, char *buf, size_t buf_size);
> -extern void ima_kexec_cmdline(const void *buf, int size);
> +extern void ima_kexec_cmdline(int kernel_fd, const void *buf, int size);
>  
>  #ifdef CONFIG_IMA_KEXEC
>  extern void ima_add_kexec_buffer(struct kimage *image);
> @@ -103,7 +103,7 @@ static inline int ima_file_hash(struct file *file, char *buf, size_t buf_size)
>  	return -EOPNOTSUPP;
>  }
>  
> -static inline void ima_kexec_cmdline(const void *buf, int size) {}
> +static inline void ima_kexec_cmdline(int kernel_fd, const void *buf, int size) {}
>  #endif /* CONFIG_IMA */
>  
>  #ifndef CONFIG_IMA_KEXEC
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index bb05fd52de85..07df431c1f21 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -287,7 +287,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
>  			goto out;
>  		}
>  
> -		ima_kexec_cmdline(image->cmdline_buf,
> +		ima_kexec_cmdline(kernel_fd, image->cmdline_buf,
>  				  image->cmdline_buf_len - 1);
>  	}
>  
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index 59ec28f5c117..ff2bf57ff0c7 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -265,7 +265,7 @@ void ima_store_measurement(struct integrity_iint_cache *iint, struct file *file,
>  			   struct evm_ima_xattr_data *xattr_value,
>  			   int xattr_len, const struct modsig *modsig, int pcr,
>  			   struct ima_template_desc *template_desc);
> -void process_buffer_measurement(const void *buf, int size,
> +void process_buffer_measurement(struct inode *inode, const void *buf, int size,
>  				const char *eventname, enum ima_hooks func,
>  				int pcr, const char *keyring);
>  void ima_audit_measurement(struct integrity_iint_cache *iint,
> diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
> index bf22de8b7ce0..4f39fb93f278 100644
> --- a/security/integrity/ima/ima_api.c
> +++ b/security/integrity/ima/ima_api.c
> @@ -162,7 +162,7 @@ void ima_add_violation(struct file *file, const unsigned char *filename,
>  
>  /**
>   * ima_get_action - appraise & measure decision based on policy.
> - * @inode: pointer to inode to measure
> + * @inode: pointer to the inode associated with the object being validated
>   * @cred: pointer to credentials structure to validate
>   * @secid: secid of the task being validated
>   * @mask: contains the permission mask (MAY_READ, MAY_WRITE, MAY_EXEC,
> diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
> index a9649b04b9f1..6c52bf7ea7f0 100644
> --- a/security/integrity/ima/ima_appraise.c
> +++ b/security/integrity/ima/ima_appraise.c
> @@ -328,7 +328,7 @@ int ima_check_blacklist(struct integrity_iint_cache *iint,
>  
>  		rc = is_binary_blacklisted(digest, digestsize);
>  		if ((rc == -EPERM) && (iint->flags & IMA_MEASURE))
> -			process_buffer_measurement(digest, digestsize,
> +			process_buffer_measurement(NULL, digest, digestsize,
>  						   "blacklisted-hash", NONE,
>  						   pcr, NULL);
>  	}
> diff --git a/security/integrity/ima/ima_asymmetric_keys.c b/security/integrity/ima/ima_asymmetric_keys.c
> index aaae80c4e376..1c68c500c26f 100644
> --- a/security/integrity/ima/ima_asymmetric_keys.c
> +++ b/security/integrity/ima/ima_asymmetric_keys.c
> @@ -58,7 +58,7 @@ void ima_post_key_create_or_update(struct key *keyring, struct key *key,
>  	 * if the IMA policy is configured to measure a key linked
>  	 * to the given keyring.
>  	 */
> -	process_buffer_measurement(payload, payload_len,
> +	process_buffer_measurement(NULL, payload, payload_len,
>  				   keyring->description, KEY_CHECK, 0,
>  				   keyring->description);
>  }
> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
> index 8351b2fd48e0..8a91711ca79b 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -726,6 +726,7 @@ int ima_load_data(enum kernel_load_data_id id)
>  
>  /*
>   * process_buffer_measurement - Measure the buffer to ima log.
> + * @inode: inode associated with the object being measured (NULL for KEY_CHECK)
>   * @buf: pointer to the buffer that needs to be added to the log.
>   * @size: size of buffer(in bytes).
>   * @eventname: event name to be used for the buffer entry.
> @@ -735,7 +736,7 @@ int ima_load_data(enum kernel_load_data_id id)
>   *
>   * Based on policy, the buffer is measured into the ima log.
>   */
> -void process_buffer_measurement(const void *buf, int size,
> +void process_buffer_measurement(struct inode *inode, const void *buf, int size,
>  				const char *eventname, enum ima_hooks func,
>  				int pcr, const char *keyring)
>  {
> @@ -768,7 +769,7 @@ void process_buffer_measurement(const void *buf, int size,
>  	 */
>  	if (func) {
>  		security_task_getsecid(current, &secid);
> -		action = ima_get_action(NULL, current_cred(), secid, 0, func,
> +		action = ima_get_action(inode, current_cred(), secid, 0, func,
>  					&pcr, &template, keyring);
>  		if (!(action & IMA_MEASURE))
>  			return;
> @@ -823,16 +824,26 @@ void process_buffer_measurement(const void *buf, int size,
>  
>  /**
>   * ima_kexec_cmdline - measure kexec cmdline boot args
> + * @kernel_fd: file descriptor of the kexec kernel being loaded
>   * @buf: pointer to buffer
>   * @size: size of buffer
>   *
>   * Buffers can only be measured, not appraised.
>   */
> -void ima_kexec_cmdline(const void *buf, int size)
> +void ima_kexec_cmdline(int kernel_fd, const void *buf, int size)
>  {
> -	if (buf && size != 0)
> -		process_buffer_measurement(buf, size, "kexec-cmdline",
> -					   KEXEC_CMDLINE, 0, NULL);
> +	struct fd f;
> +
> +	if (!buf || !size)
> +		return;
> +
> +	f = fdget(kernel_fd);
> +	if (!f.file)
> +		return;
> +
> +	process_buffer_measurement(file_inode(f.file), buf, size,
> +				   "kexec-cmdline", KEXEC_CMDLINE, 0, NULL);
> +	fdput(f);
>  }
>  
>  static int __init init_ima(void)
> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
> index 5eb14b567a31..294323b36d06 100644
> --- a/security/integrity/ima/ima_policy.c
> +++ b/security/integrity/ima/ima_policy.c
> @@ -443,13 +443,9 @@ static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode,
>  {
>  	int i;
>  
> -	if ((func == KEXEC_CMDLINE) || (func == KEY_CHECK)) {
> -		if ((rule->flags & IMA_FUNC) && (rule->func == func)) {
> -			if (func == KEY_CHECK)
> -				return ima_match_keyring(rule, keyring, cred);
> -			return true;
> -		}
> -		return false;
> +	if (func == KEY_CHECK) {
> +		return (rule->flags & IMA_FUNC) && (rule->func == func) &&
> +		       ima_match_keyring(rule, keyring, cred);
>  	}
>  	if ((rule->flags & IMA_FUNC) &&
>  	    (rule->func != func && func != POST_SETATTR))
> @@ -1007,10 +1003,9 @@ static bool ima_validate_rule(struct ima_rule_entry *entry)
>  			if (entry->action & ~(MEASURE | DONT_MEASURE))
>  				return false;
>  
> -			if (entry->flags & ~(IMA_FUNC | IMA_PCR))
> -				return false;
> -
> -			if (ima_rule_contains_lsm_cond(entry))
> +			if (entry->flags & ~(IMA_FUNC | IMA_FSMAGIC | IMA_UID |
> +					     IMA_FOWNER | IMA_FSUUID |
> +					     IMA_EUID | IMA_PCR | IMA_FSNAME))
>  				return false;
>  
>  			break;
> diff --git a/security/integrity/ima/ima_queue_keys.c b/security/integrity/ima/ima_queue_keys.c
> index 56ce24a18b66..69a8626a35c0 100644
> --- a/security/integrity/ima/ima_queue_keys.c
> +++ b/security/integrity/ima/ima_queue_keys.c
> @@ -158,7 +158,7 @@ void ima_process_queued_keys(void)
>  
>  	list_for_each_entry_safe(entry, tmp, &ima_keys, list) {
>  		if (!timer_expired)
> -			process_buffer_measurement(entry->payload,
> +			process_buffer_measurement(NULL, entry->payload,
>  						   entry->payload_len,
>  						   entry->keyring_name,
>  						   KEY_CHECK, 0,
> -- 
> 2.25.1
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 

Although I still do not understand the deep knowledge of IMA, I
still wonder to know what is the effect to the behavior changes end user
visible.   Does it work with a kernel built-in commandline? eg no
cmdlien passed at all.

Thanks
Dave


^ permalink raw reply

* INFO: task hung in request_key_tag
From: syzbot @ 2020-07-01  7:53 UTC (permalink / raw)
  To: dhowells, jarkko.sakkinen, jmorris, keyrings, linux-kernel,
	linux-security-module, serge, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    c28e58ee Add linux-next specific files for 20200629
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17925a9d100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=dcd26bbca17dd1db
dashboard link: https://syzkaller.appspot.com/bug?extid=46c77dc7e98c732de754
compiler:       gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+46c77dc7e98c732de754@syzkaller.appspotmail.com

INFO: task syz-executor.3:23879 can't die for more than 143 seconds.
syz-executor.3  D27880 23879   7210 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007f17e9e62c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000240
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007f17e9e636d4
INFO: task syz-executor.3:23879 blocked for more than 143 seconds.
      Not tainted 5.8.0-rc3-next-20200629-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.3  D27880 23879   7210 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007f17e9e62c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000240
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007f17e9e636d4
INFO: task syz-executor.3:23899 can't die for more than 144 seconds.
syz-executor.3  D29304 23899   7210 0x00000006
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007f17e9e20c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000240
RBP: 000000000078c040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007f17e9e216d4
INFO: task syz-executor.3:23899 blocked for more than 144 seconds.
      Not tainted 5.8.0-rc3-next-20200629-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.3  D29304 23899   7210 0x00000006
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007f17e9e20c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000240
RBP: 000000000078c040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007f17e9e216d4
INFO: task syz-executor.1:23880 can't die for more than 144 seconds.
syz-executor.1  D27912 23880   6964 0x00000006
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007fe6e7011c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000080
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007fe6e70126d4
INFO: task syz-executor.1:23880 blocked for more than 145 seconds.
      Not tainted 5.8.0-rc3-next-20200629-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.1  D27912 23880   6964 0x00000006
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007fe6e7011c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000080
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007fe6e70126d4
INFO: task syz-executor.1:23885 can't die for more than 145 seconds.
syz-executor.1  D27880 23885   6964 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007fe6e6ff0c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000080
RBP: 000000000078bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007fe6e6ff16d4
INFO: task syz-executor.1:23885 blocked for more than 145 seconds.
      Not tainted 5.8.0-rc3-next-20200629-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.1  D27880 23885   6964 0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3445 [inline]
 __schedule+0x8b4/0x1e80 kernel/sched/core.c:4169
 schedule+0xd0/0x2a0 kernel/sched/core.c:4244
 bit_wait+0x12/0xa0 kernel/sched/wait_bit.c:199
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:76 [inline]
 wait_for_key_construction+0x10b/0x140 security/keys/request_key.c:664
 request_key_tag+0x7a/0xb0 security/keys/request_key.c:705
 dns_query+0x257/0x6c3 net/dns_resolver/dns_query.c:128
 ceph_dns_resolve_name net/ceph/messenger.c:1887 [inline]
 ceph_parse_server_name net/ceph/messenger.c:1922 [inline]
 ceph_parse_ips+0x77f/0x8c0 net/ceph/messenger.c:1949
 ceph_parse_mon_ips+0x59/0xc0 net/ceph/ceph_common.c:411
 ceph_parse_source fs/ceph/super.c:271 [inline]
 ceph_parse_mount_param+0x1239/0x17e0 fs/ceph/super.c:322
 vfs_parse_fs_param fs/fs_context.c:117 [inline]
 vfs_parse_fs_param+0x203/0x550 fs/fs_context.c:98
 vfs_parse_fs_string+0xe6/0x150 fs/fs_context.c:161
 do_new_mount fs/namespace.c:2905 [inline]
 do_mount+0x1222/0x1df0 fs/namespace.c:3237
 __do_sys_mount fs/namespace.c:3447 [inline]
 __se_sys_mount fs/namespace.c:3424 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3424
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007fe6e6ff0c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000004f72e0 RCX: 000000000045cb29
RDX: 0000000020000040 RSI: 0000000020000600 RDI: 0000000020000080
RBP: 000000000078bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000772 R14: 00000000004ca761 R15: 00007fe6e6ff16d4

Showing all locks held in the system:
1 lock held by khungtaskd/1151:
 #0: ffffffff89bc3000 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:5779
1 lock held by in:imklog/6487:
 #0: ffff8880934365f0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:928
2 locks held by agetty/6717:
 #0: ffff8880a00c5098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
 #1: ffffc90000f942e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x223/0x1a30 drivers/tty/n_tty.c:2156
2 locks held by agetty/6721:
 #0: ffff888093f4e098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:267
 #1: ffffc90000f642e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x223/0x1a30 drivers/tty/n_tty.c:2156

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1151 Comm: khungtaskd Not tainted 5.8.0-rc3-next-20200629-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:147 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:253 [inline]
 watchdog+0xd89/0xf30 kernel/hung_task.c:339
 kthread+0x3b5/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3865 Comm: systemd-journal Not tainted 5.8.0-rc3-next-20200629-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:107 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:134 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:165 [inline]
RIP: 0010:check_memory_region_inline mm/kasan/generic.c:183 [inline]
RIP: 0010:check_memory_region+0x59/0x180 mm/kasan/generic.c:192
Code: 00 49 83 e9 01 48 89 fd 48 b8 00 00 00 00 00 fc ff df 4d 89 ca 48 c1 ed 03 49 c1 ea 03 48 01 c5 49 01 c2 48 89 e8 49 8d 5a 01 <48> 89 da 48 29 ea 48 83 fa 10 7e 63 41 89 eb 41 83 e3 07 75 74 4c
RSP: 0018:ffffc90001657970 EFLAGS: 00000086
RAX: fffffbfff18b3b44 RBX: fffffbfff18b3b45 RCX: ffffffff8159e633
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff8c59da20
RBP: fffffbfff18b3b44 R08: 0000000000000000 R09: ffffffff8c59da27
R10: fffffbfff18b3b44 R11: 0000000000000000 R12: ffff888093ca2080
R13: 0000000000000000 R14: cf6300484b50d8b0 R15: 0000000000000000
FS:  00007fec0e4d08c0(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fec0bd2e000 CR3: 0000000093db6000 CR4: 00000000001526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 instrument_atomic_read include/linux/instrumented.h:56 [inline]
 test_bit include/asm-generic/bitops/instrumented-non-atomic.h:110 [inline]
 hlock_class kernel/locking/lockdep.c:179 [inline]
 lookup_chain_cache_add kernel/locking/lockdep.c:3127 [inline]
 validate_chain kernel/locking/lockdep.c:3183 [inline]
 __lock_acquire+0x16e3/0x56e0 kernel/locking/lockdep.c:4380
 lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x8c/0xc0 kernel/locking/spinlock.c:159
 __debug_check_no_obj_freed lib/debugobjects.c:955 [inline]
 debug_check_no_obj_freed+0xc7/0x41c lib/debugobjects.c:998
 free_pages_prepare mm/page_alloc.c:1215 [inline]
 __free_pages_ok+0x20b/0xc90 mm/page_alloc.c:1467
 slab_destroy mm/slab.c:1625 [inline]
 slabs_destroy+0x89/0xc0 mm/slab.c:1641
 cache_flusharray mm/slab.c:3409 [inline]
 ___cache_free+0x516/0x750 mm/slab.c:3459
 qlink_free mm/kasan/quarantine.c:148 [inline]
 qlist_free_all+0x79/0x140 mm/kasan/quarantine.c:167
 quarantine_reduce+0x17e/0x200 mm/kasan/quarantine.c:260
 __kasan_kmalloc.constprop.0+0x9e/0xd0 mm/kasan/common.c:475
 slab_post_alloc_hook mm/slab.h:535 [inline]
 slab_alloc mm/slab.c:3316 [inline]
 kmem_cache_alloc+0x148/0x550 mm/slab.c:3486
 prepare_creds+0x39/0x6c0 kernel/cred.c:258
 access_override_creds fs/open.c:353 [inline]
 do_faccessat+0x3d7/0x820 fs/open.c:417
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fec0d78c9c7
Code: Bad RIP value.
RSP: 002b:00007ffd86b9dd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
RAX: ffffffffffffffda RBX: 00007ffd86ba0c90 RCX: 00007fec0d78c9c7
RDX: 00007fec0e1fda00 RSI: 0000000000000000 RDI: 0000557c536289a3
RBP: 00007ffd86b9ddb0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000069 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffd86ba0c90 R15: 00007ffd86b9e2a0


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* [PATCH v4 3/3] prctl: Allow ptrace capable processes to change /proc/self/exe
From: Adrian Reber @ 2020-07-01  6:49 UTC (permalink / raw)
  To: Christian Brauner, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
	Dmitry Safonov, Andrei Vagin, Nicolas Viennot,
	Michał Cłapiński, Kamil Yurtsever, Dirk Petersen,
	Christine Flood, Casey Schaufler
  Cc: Mike Rapoport, Radostin Stoyanov, Adrian Reber, Cyrill Gorcunov,
	Serge Hallyn, Stephen Smalley, Sargun Dhillon, Arnd Bergmann,
	linux-security-module, linux-kernel, selinux, Eric Paris,
	Jann Horn, linux-fsdevel
In-Reply-To: <20200701064906.323185-1-areber@redhat.com>

From: Nicolas Viennot <Nicolas.Viennot@twosigma.com>

Previously, the current process could only change the /proc/self/exe
link with local CAP_SYS_ADMIN.
This commit relaxes this restriction by permitting such change with
CAP_CHECKPOINT_RESTORE, and the ability to use ptrace.

With access to ptrace facilities, a process can do the following: fork a
child, execve() the target executable, and have the child use ptrace()
to replace the memory content of the current process. This technique
makes it possible to masquerade an arbitrary program as any executable,
even setuid ones.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
---
 include/linux/lsm_hook_defs.h |  1 +
 include/linux/security.h      |  6 ++++++
 kernel/sys.c                  | 12 ++++--------
 security/commoncap.c          | 26 ++++++++++++++++++++++++++
 security/security.c           |  5 +++++
 security/selinux/hooks.c      | 14 ++++++++++++++
 6 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 0098852bb56a..90e51d5e093b 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -211,6 +211,7 @@ LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
 	 int sig, const struct cred *cred)
 LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
 	 unsigned long arg3, unsigned long arg4, unsigned long arg5)
+LSM_HOOK(int, 0, prctl_set_mm_exe_file, struct file *exe_file)
 LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
 	 struct inode *inode)
 LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
diff --git a/include/linux/security.h b/include/linux/security.h
index 2797e7f6418e..0f594eb7e766 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -412,6 +412,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
 			int sig, const struct cred *cred);
 int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			unsigned long arg4, unsigned long arg5);
+int security_prctl_set_mm_exe_file(struct file *exe_file);
 void security_task_to_inode(struct task_struct *p, struct inode *inode);
 int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
 void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid);
@@ -1124,6 +1125,11 @@ static inline int security_task_prctl(int option, unsigned long arg2,
 	return cap_task_prctl(option, arg2, arg3, arg4, arg5);
 }
 
+static inline int security_prctl_set_mm_exe_file(struct file *exe_file)
+{
+	return cap_prctl_set_mm_exe_file(exe_file);
+}
+
 static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
 { }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 00a96746e28a..bb53e8408c63 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1851,6 +1851,10 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	if (err)
 		goto exit;
 
+	err = security_prctl_set_mm_exe_file(exe.file);
+	if (err)
+		goto exit;
+
 	/*
 	 * Forbid mm->exe_file change if old file still mapped.
 	 */
@@ -2006,14 +2010,6 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	}
 
 	if (prctl_map.exe_fd != (u32)-1) {
-		/*
-		 * Make sure the caller has the rights to
-		 * change /proc/pid/exe link: only local sys admin should
-		 * be allowed to.
-		 */
-		if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
-			return -EINVAL;
-
 		error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
 		if (error)
 			return error;
diff --git a/security/commoncap.c b/security/commoncap.c
index 59bf3c1674c8..663d00fe2ecc 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1291,6 +1291,31 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 	}
 }
 
+/**
+ * cap_prctl_set_mm_exe_file - Determine whether /proc/self/exe can be changed
+ * by the current process.
+ * @exe_file: The new exe file
+ * Returns 0 if permission is granted, -ve if denied.
+ *
+ * The current process is permitted to change its /proc/self/exe link via two policies:
+ * 1) The current user can do checkpoint/restore. At the time of this writing,
+ *    this means CAP_SYS_ADMIN or CAP_CHECKPOINT_RESTORE capable.
+ * 2) The current user can use ptrace.
+ *
+ * With access to ptrace facilities, a process can do the following:
+ * fork a child, execve() the target executable, and have the child use
+ * ptrace() to replace the memory content of the current process.
+ * This technique makes it possible to masquerade an arbitrary program as the
+ * target executable, even if it is setuid.
+ */
+int cap_prctl_set_mm_exe_file(struct file *exe_file)
+{
+	if (checkpoint_restore_ns_capable(current_user_ns()))
+		return 0;
+
+	return security_ptrace_access_check(current, PTRACE_MODE_ATTACH_REALCREDS);
+}
+
 /**
  * cap_vm_enough_memory - Determine whether a new virtual mapping is permitted
  * @mm: The VM space in which the new mapping is to be made
@@ -1356,6 +1381,7 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(mmap_file, cap_mmap_file),
 	LSM_HOOK_INIT(task_fix_setuid, cap_task_fix_setuid),
 	LSM_HOOK_INIT(task_prctl, cap_task_prctl),
+	LSM_HOOK_INIT(prctl_set_mm_exe_file, cap_prctl_set_mm_exe_file),
 	LSM_HOOK_INIT(task_setscheduler, cap_task_setscheduler),
 	LSM_HOOK_INIT(task_setioprio, cap_task_setioprio),
 	LSM_HOOK_INIT(task_setnice, cap_task_setnice),
diff --git a/security/security.c b/security/security.c
index 2bb912496232..13a1ed32f9e3 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1790,6 +1790,11 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 	return rc;
 }
 
+int security_prctl_set_mm_exe_file(struct file *exe_file)
+{
+	return call_int_hook(prctl_set_mm_exe_file, 0, exe_file);
+}
+
 void security_task_to_inode(struct task_struct *p, struct inode *inode)
 {
 	call_void_hook(task_to_inode, p, inode);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ca901025802a..fca5581392b8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4156,6 +4156,19 @@ static int selinux_task_kill(struct task_struct *p, struct kernel_siginfo *info,
 			    secid, task_sid(p), SECCLASS_PROCESS, perm, NULL);
 }
 
+static int selinux_prctl_set_mm_exe_file(struct file *exe_file)
+{
+	u32 sid = current_sid();
+
+	struct common_audit_data ad = {
+		.type = LSM_AUDIT_DATA_FILE,
+		.u.file = exe_file,
+	};
+
+	return avc_has_perm(&selinux_state, sid, sid,
+			    SECCLASS_FILE, FILE__EXECUTE_NO_TRANS, &ad);
+}
+
 static void selinux_task_to_inode(struct task_struct *p,
 				  struct inode *inode)
 {
@@ -7057,6 +7070,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(task_getscheduler, selinux_task_getscheduler),
 	LSM_HOOK_INIT(task_movememory, selinux_task_movememory),
 	LSM_HOOK_INIT(task_kill, selinux_task_kill),
+	LSM_HOOK_INIT(prctl_set_mm_exe_file, selinux_prctl_set_mm_exe_file),
 	LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
 
 	LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 2/3] selftests: add clone3() CAP_CHECKPOINT_RESTORE test
From: Adrian Reber @ 2020-07-01  6:49 UTC (permalink / raw)
  To: Christian Brauner, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
	Dmitry Safonov, Andrei Vagin, Nicolas Viennot,
	Michał Cłapiński, Kamil Yurtsever, Dirk Petersen,
	Christine Flood, Casey Schaufler
  Cc: Mike Rapoport, Radostin Stoyanov, Adrian Reber, Cyrill Gorcunov,
	Serge Hallyn, Stephen Smalley, Sargun Dhillon, Arnd Bergmann,
	linux-security-module, linux-kernel, selinux, Eric Paris,
	Jann Horn, linux-fsdevel
In-Reply-To: <20200701064906.323185-1-areber@redhat.com>

This adds a test that changes its UID, uses capabilities to
get CAP_CHECKPOINT_RESTORE and uses clone3() with set_tid to
create a process with a given PID as non-root.

Signed-off-by: Adrian Reber <areber@redhat.com>
---
 tools/testing/selftests/clone3/Makefile       |   4 +-
 .../clone3/clone3_cap_checkpoint_restore.c    | 203 ++++++++++++++++++
 2 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c

diff --git a/tools/testing/selftests/clone3/Makefile b/tools/testing/selftests/clone3/Makefile
index cf976c732906..ef7564cb7abe 100644
--- a/tools/testing/selftests/clone3/Makefile
+++ b/tools/testing/selftests/clone3/Makefile
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 CFLAGS += -g -I../../../../usr/include/
+LDLIBS += -lcap
 
-TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid
+TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid \
+	clone3_cap_checkpoint_restore
 
 include ../lib.mk
diff --git a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
new file mode 100644
index 000000000000..2cc3d57b91f2
--- /dev/null
+++ b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Based on Christian Brauner's clone3() example.
+ * These tests are assuming to be running in the host's
+ * PID namespace.
+ */
+
+/* capabilities related code based on selftests/bpf/test_verifier.c */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <linux/types.h>
+#include <linux/sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <sys/capability.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/un.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <sched.h>
+
+#include "../kselftest.h"
+#include "clone3_selftests.h"
+
+#ifndef MAX_PID_NS_LEVEL
+#define MAX_PID_NS_LEVEL 32
+#endif
+
+static void child_exit(int ret)
+{
+	fflush(stdout);
+	fflush(stderr);
+	_exit(ret);
+}
+
+static int call_clone3_set_tid(pid_t * set_tid, size_t set_tid_size)
+{
+	int status;
+	pid_t pid = -1;
+
+	struct clone_args args = {
+		.exit_signal = SIGCHLD,
+		.set_tid = ptr_to_u64(set_tid),
+		.set_tid_size = set_tid_size,
+	};
+
+	pid = sys_clone3(&args, sizeof(struct clone_args));
+	if (pid < 0) {
+		ksft_print_msg("%s - Failed to create new process\n",
+			       strerror(errno));
+		return -errno;
+	}
+
+	if (pid == 0) {
+		int ret;
+		char tmp = 0;
+
+		ksft_print_msg
+		    ("I am the child, my PID is %d (expected %d)\n",
+		     getpid(), set_tid[0]);
+
+		if (set_tid[0] != getpid())
+			child_exit(EXIT_FAILURE);
+		child_exit(EXIT_SUCCESS);
+	}
+
+	ksft_print_msg("I am the parent (%d). My child's pid is %d\n",
+		       getpid(), pid);
+
+	if (waitpid(pid, &status, 0) < 0) {
+		ksft_print_msg("Child returned %s\n", strerror(errno));
+		return -errno;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static int test_clone3_set_tid(pid_t * set_tid,
+			       size_t set_tid_size, int expected)
+{
+	int ret;
+
+	ksft_print_msg("[%d] Trying clone3() with CLONE_SET_TID to %d\n",
+		       getpid(), set_tid[0]);
+	ret = call_clone3_set_tid(set_tid, set_tid_size);
+
+	ksft_print_msg
+	    ("[%d] clone3() with CLONE_SET_TID %d says :%d - expected %d\n",
+	     getpid(), set_tid[0], ret, expected);
+	if (ret != expected) {
+		ksft_test_result_fail
+		    ("[%d] Result (%d) is different than expected (%d)\n",
+		     getpid(), ret, expected);
+		return -1;
+	}
+	ksft_test_result_pass
+	    ("[%d] Result (%d) matches expectation (%d)\n", getpid(), ret,
+	     expected);
+
+	return 0;
+}
+
+struct libcap {
+	struct __user_cap_header_struct hdr;
+	struct __user_cap_data_struct data[2];
+};
+
+static int set_capability()
+{
+	cap_value_t cap_values[] = { CAP_SETUID, CAP_SETGID };
+	struct libcap *cap;
+	int ret = -1;
+	cap_t caps;
+
+	caps = cap_get_proc();
+	if (!caps) {
+		perror("cap_get_proc");
+		return -1;
+	}
+
+	/* Drop all capabilities */
+	if (cap_clear(caps)) {
+		perror("cap_clear");
+		goto out;
+	}
+
+	cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_values, CAP_SET);
+	cap_set_flag(caps, CAP_PERMITTED, 2, cap_values, CAP_SET);
+
+	cap = (struct libcap *) caps;
+
+	/* 40 -> CAP_CHECKPOINT_RESTORE */
+	cap->data[1].effective |= 1 << (40 - 32);
+	cap->data[1].permitted |= 1 << (40 - 32);
+
+	if (cap_set_proc(caps)) {
+		perror("cap_set_proc");
+		goto out;
+	}
+	ret = 0;
+out:
+	if (cap_free(caps))
+		perror("cap_free");
+	return ret;
+}
+
+int main(int argc, char *argv[])
+{
+	pid_t pid;
+	int status;
+	int ret = 0;
+	pid_t set_tid[1];
+	uid_t uid = getuid();
+
+	ksft_print_header();
+	test_clone3_supported();
+	ksft_set_plan(2);
+
+	if (uid != 0) {
+		ksft_cnt.ksft_xskip = ksft_plan;
+		ksft_print_msg("Skipping all tests as non-root\n");
+		return ksft_exit_pass();
+	}
+
+	memset(&set_tid, 0, sizeof(set_tid));
+
+	/* Find the current active PID */
+	pid = fork();
+	if (pid == 0) {
+		ksft_print_msg("Child has PID %d\n", getpid());
+		child_exit(EXIT_SUCCESS);
+	}
+	if (waitpid(pid, &status, 0) < 0)
+		ksft_exit_fail_msg("Waiting for child %d failed", pid);
+
+	/* After the child has finished, its PID should be free. */
+	set_tid[0] = pid;
+
+	if (set_capability())
+		ksft_test_result_fail
+		    ("Could not set CAP_CHECKPOINT_RESTORE\n");
+	prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
+	/* This would fail without CAP_CHECKPOINT_RESTORE */
+	setgid(1000);
+	setuid(1000);
+	set_tid[0] = pid;
+	ret |= test_clone3_set_tid(set_tid, 1, -EPERM);
+	if (set_capability())
+		ksft_test_result_fail
+		    ("Could not set CAP_CHECKPOINT_RESTORE\n");
+	/* This should work as we have CAP_CHECKPOINT_RESTORE as non-root */
+	ret |= test_clone3_set_tid(set_tid, 1, 0);
+
+	return !ret ? ksft_exit_pass() : ksft_exit_fail();
+}
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 1/3] capabilities: Introduce CAP_CHECKPOINT_RESTORE
From: Adrian Reber @ 2020-07-01  6:49 UTC (permalink / raw)
  To: Christian Brauner, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
	Dmitry Safonov, Andrei Vagin, Nicolas Viennot,
	Michał Cłapiński, Kamil Yurtsever, Dirk Petersen,
	Christine Flood, Casey Schaufler
  Cc: Mike Rapoport, Radostin Stoyanov, Adrian Reber, Cyrill Gorcunov,
	Serge Hallyn, Stephen Smalley, Sargun Dhillon, Arnd Bergmann,
	linux-security-module, linux-kernel, selinux, Eric Paris,
	Jann Horn, linux-fsdevel
In-Reply-To: <20200701064906.323185-1-areber@redhat.com>

This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
checkpoint/restore for non-root users.

Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has been
asked numerous times if it is possible to checkpoint/restore a process as
non-root. The answer usually was: 'almost'.

The main blocker to restore a process as non-root was to control the PID of the
restored process. This feature available via the clone3 system call, or via
/proc/sys/kernel/ns_last_pid is unfortunately guarded by CAP_SYS_ADMIN.

In the past two years, requests for non-root checkpoint/restore have increased
due to the following use cases:
* Checkpoint/Restore in an HPC environment in combination with a resource
  manager distributing jobs where users are always running as non-root.
  There is a desire to provide a way to checkpoint and restore long running
  jobs.
* Container migration as non-root
* We have been in contact with JVM developers who are integrating
  CRIU into a Java VM to decrease the startup time. These checkpoint/restore
  applications are not meant to be running with CAP_SYS_ADMIN.

We have seen the following workarounds:
* Use a setuid wrapper around CRIU:
  See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
* Use a setuid helper that writes to ns_last_pid.
  Unfortunately, this helper delegation technique is impossible to use with
  clone3, and is thus prone to races.
  See https://github.com/twosigma/set_ns_last_pid
* Cycle through PIDs with fork() until the desired PID is reached:
  This has been demonstrated to work with cycling rates of 100,000 PIDs/s
  See https://github.com/twosigma/set_ns_last_pid
* Patch out the CAP_SYS_ADMIN check from the kernel
* Run the desired application in a new user and PID namespace to provide
  a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited use in
  typical container environments (e.g., Kubernetes) as /proc is
  typically protected with read-only layers (e.g., /proc/sys) for hardening
  purposes. Read-only layers prevent additional /proc mounts (due to proc's
  SB_I_USERNS_VISIBLE property), making the use of new PID namespaces limited as
  certain applications need access to /proc matching their PID namespace.

The introduced capability allows to:
* Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
  for the corresponding PID namespace via ns_last_pid/clone3.
* Open files in /proc/pid/map_files when the current user is
  CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for recovering
  files that are unreachable via the file system such as deleted files, or memfd
  files.

See corresponding selftest for an example with clone3().

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
---
 fs/proc/base.c                      | 8 ++++----
 include/linux/capability.h          | 6 ++++++
 include/uapi/linux/capability.h     | 9 ++++++++-
 kernel/pid.c                        | 2 +-
 kernel/pid_namespace.c              | 2 +-
 security/selinux/include/classmap.h | 5 +++--
 6 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d86c0afc8a85..ad806069c778 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2189,16 +2189,16 @@ struct map_files_info {
 };

 /*
- * Only allow CAP_SYS_ADMIN to follow the links, due to concerns about how the
- * symlinks may be used to bypass permissions on ancestor directories in the
- * path to the file in question.
+ * Only allow CAP_SYS_ADMIN and CAP_CHECKPOINT_RESTORE to follow the links, due
+ * to concerns about how the symlinks may be used to bypass permissions on
+ * ancestor directories in the path to the file in question.
  */
 static const char *
 proc_map_files_get_link(struct dentry *dentry,
 			struct inode *inode,
 		        struct delayed_call *done)
 {
-	if (!capable(CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN) && !capable(CAP_CHECKPOINT_RESTORE))
 		return ERR_PTR(-EPERM);

 	return proc_pid_get_link(dentry, inode, done);
diff --git a/include/linux/capability.h b/include/linux/capability.h
index b4345b38a6be..1e7fe311cabe 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -261,6 +261,12 @@ static inline bool bpf_capable(void)
 	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
 }

+static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)
+{
+	return ns_capable(ns, CAP_CHECKPOINT_RESTORE) ||
+		ns_capable(ns, CAP_SYS_ADMIN);
+}
+
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);

diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 48ff0757ae5e..395dd0df8d08 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -408,7 +408,14 @@ struct vfs_ns_cap_data {
  */
 #define CAP_BPF			39

-#define CAP_LAST_CAP         CAP_BPF
+
+/* Allow checkpoint/restore related operations */
+/* Allow PID selection during clone3() */
+/* Allow writing to ns_last_pid */
+
+#define CAP_CHECKPOINT_RESTORE	40
+
+#define CAP_LAST_CAP         CAP_CHECKPOINT_RESTORE

 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

diff --git a/kernel/pid.c b/kernel/pid.c
index 5799ae54b89e..2d0a97b7ed7a 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -198,7 +198,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			if (tid != 1 && !tmp->child_reaper)
 				goto out_free;
 			retval = -EPERM;
-			if (!ns_capable(tmp->user_ns, CAP_SYS_ADMIN))
+			if (!checkpoint_restore_ns_capable(tmp->user_ns))
 				goto out_free;
 			set_tid_size--;
 		}
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 0e5ac162c3a8..ac135bd600eb 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -269,7 +269,7 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
 	struct ctl_table tmp = *table;
 	int ret, next;

-	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
+	if (write && !checkpoint_restore_ns_capable(pid_ns->user_ns))
 		return -EPERM;

 	/*
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 98e1513b608a..40cebde62856 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,10 @@
 	    "audit_control", "setfcap"

 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf"
+		"wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf", \
+		"checkpoint_restore"

-#if CAP_LAST_CAP > CAP_BPF
+#if CAP_LAST_CAP > CAP_CHECKPOINT_RESTORE
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif

-- 
2.26.2

^ permalink raw reply related

* [PATCH v4 0/3] capabilities: Introduce CAP_CHECKPOINT_RESTORE
From: Adrian Reber @ 2020-07-01  6:49 UTC (permalink / raw)
  To: Christian Brauner, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
	Dmitry Safonov, Andrei Vagin, Nicolas Viennot,
	Michał Cłapiński, Kamil Yurtsever, Dirk Petersen,
	Christine Flood, Casey Schaufler
  Cc: Mike Rapoport, Radostin Stoyanov, Adrian Reber, Cyrill Gorcunov,
	Serge Hallyn, Stephen Smalley, Sargun Dhillon, Arnd Bergmann,
	linux-security-module, linux-kernel, selinux, Eric Paris,
	Jann Horn, linux-fsdevel

This is v4 of the 'Introduce CAP_CHECKPOINT_RESTORE' patchset. There
is only one change from v3 to address Jann's comment on patch 3/3

 (That is not necessarily true in the presence of LSMs like SELinux:
 You'd have to be able to FILE__EXECUTE_NO_TRANS the target executable
 according to the system's security policy.)

Nicolas updated the last patch (3/3). The first two patches are
unchanged from v3.

Adrian Reber (2):
  capabilities: Introduce CAP_CHECKPOINT_RESTORE
  selftests: add clone3() CAP_CHECKPOINT_RESTORE test

Nicolas Viennot (1):
  prctl: Allow ptrace capable processes to change /proc/self/exe

 fs/proc/base.c                                |   8 +-
 include/linux/capability.h                    |   6 +
 include/linux/lsm_hook_defs.h                 |   1 +
 include/linux/security.h                      |   6 +
 include/uapi/linux/capability.h               |   9 +-
 kernel/pid.c                                  |   2 +-
 kernel/pid_namespace.c                        |   2 +-
 kernel/sys.c                                  |  12 +-
 security/commoncap.c                          |  26 +++
 security/security.c                           |   5 +
 security/selinux/hooks.c                      |  14 ++
 security/selinux/include/classmap.h           |   5 +-
 tools/testing/selftests/clone3/Makefile       |   4 +-
 .../clone3/clone3_cap_checkpoint_restore.c    | 203 ++++++++++++++++++
 14 files changed, 285 insertions(+), 18 deletions(-)
 create mode 100644 tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c


base-commit: f2b92b14533e646e434523abdbafddb727c23898
-- 
2.26.2


^ permalink raw reply

* Re: [PATCH v2 00/11] ima: Fix rule parsing bugs and extend KEXEC_CMDLINE rule support
From: Mimi Zohar @ 2020-07-01  0:29 UTC (permalink / raw)
  To: Tyler Hicks, Dmitry Kasatkin
  Cc: James Morris, Serge E . Hallyn, Lakshmi Ramasubramanian,
	Prakhar Srivastava, linux-kernel, linux-integrity,
	linux-security-module, Janne Karhunen, Eric Biederman, kexec,
	Casey Schaufler
In-Reply-To: <20200626223900.253615-1-tyhicks@linux.microsoft.com>

On Fri, 2020-06-26 at 17:38 -0500, Tyler Hicks wrote:
> This series ultimately extends the supported IMA rule conditionals for
> the KEXEC_CMDLINE hook function. As of today, there's an imbalance in
> IMA language conditional support for KEXEC_CMDLINE rules in comparison
> to KEXEC_KERNEL_CHECK and KEXEC_INITRAMFS_CHECK rules. The KEXEC_CMDLINE
> rules do not support *any* conditionals so you cannot have a sequence of
> rules like this:
> 
>  dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t
>  dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t
>  dont_measure func=KEXEC_CMDLINE obj_type=foo_t
>  measure func=KEXEC_KERNEL_CHECK
>  measure func=KEXEC_INITRAMFS_CHECK
>  measure func=KEXEC_CMDLINE
> 
> Instead, KEXEC_CMDLINE rules can only be measured or not measured and
> there's no additional flexibility in today's implementation of the
> KEXEC_CMDLINE hook function.
> 
> With this series, the above sequence of rules becomes valid and any
> calls to kexec_file_load() with a kernel and initramfs inode type of
> foo_t will not be measured (that includes the kernel cmdline buffer)
> while all other objects given to a kexec_file_load() syscall will be
> measured. There's obviously not an inode directly associated with the
> kernel cmdline buffer but this patch series ties the inode based
> decision making for KEXEC_CMDLINE to the kernel's inode. I think this
> will be intuitive to policy authors.
> 
> While reading IMA code and preparing to make this change, I realized
> that the buffer based hook functions (KEXEC_CMDLINE and KEY_CHECK) are
> quite special in comparison to longer standing hook functions. These
> buffer based hook functions can only support measure actions and there
> are some restrictions on the conditionals that they support. However,
> the rule parser isn't enforcing any of those restrictions and IMA policy
> authors wouldn't have any immediate way of knowing that the policy that
> they wrote is invalid. For example, the sequence of rules above parses
> successfully in today's kernel but the
> "dont_measure func=KEXEC_CMDLINE ..." rule is incorrectly handled in
> ima_match_rules(). The dont_measure rule is *always* considered to be a
> match so, surprisingly, no KEXEC_CMDLINE measurements are made.
> 
> While making the rule parser more strict, I realized that the parser
> does not correctly free all of the allocated memory associated with an
> ima_rule_entry when going down some error paths. Invalid policy loaded
> by the policy administrator could result in small memory leaks.
> 
> I envision patches 1-6 going to stable. The series is ordered in a way
> that has all the fixes up front, followed by cleanups, followed by the
> feature patch. The breakdown of patches looks like so:
> 
>  Memory leak fixes: 1-3
>  Parser strictness fixes: 4-6
>  Code cleanups made possible by the fixes: 7-10
>  Extend KEXEC_CMDLINE rule support: 11
> 
> Perhaps the most logical ordering for code review is:
> 
>  1, 2, 3, 7, 8, 4, 5, 6, 9, 10, 11
> 
> If you'd like me to re-order or split up the series, just let me know.
> Thanks for considering these patches!
> 
> * Series-wide v2 changes
>   - Rebased onto next-integrity-testing
>   - Squashed patches 2 and 3 from v1
>     + Updated this cover letter to account for changes to patch index
>       changes
>     + See patch 2 for specific code changes

Other than the comment on 9/11 the patch set looks good.

thanks!

Mimi

^ permalink raw reply

* Re: [PATCH v3 1/1] fs: move kernel_read_file* to its own include file
From: Scott Branden @ 2020-06-30 23:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Luis Chamberlain, Alexander Viro, Eric Biederman, Jessica Yu,
	BCM Kernel Feedback, Greg Kroah-Hartman, Rafael J . Wysocki,
	James Morris, Serge E . Hallyn, Mimi Zohar, Dmitry Kasatkin,
	Kees Cook, Paul Moore, Stephen Smalley, Eric Paris, linux-kernel,
	linux-fsdevel, kexec, linux-security-module, linux-integrity,
	selinux
In-Reply-To: <20200624075516.GA20553@lst.de>

Hi Al (Viro),

Are you able to take this patch into your tree or does someone else?

On 2020-06-24 12:55 a.m., Christoph Hellwig wrote:
> Looks good,
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH v2 09/11] ima: Move validation of the keyrings conditional into ima_validate_rule()
From: Mimi Zohar @ 2020-06-30 23:07 UTC (permalink / raw)
  To: Tyler Hicks, Dmitry Kasatkin
  Cc: James Morris, Serge E . Hallyn, Lakshmi Ramasubramanian,
	Prakhar Srivastava, linux-kernel, linux-integrity,
	linux-security-module
In-Reply-To: <20200626223900.253615-10-tyhicks@linux.microsoft.com>

On Fri, 2020-06-26 at 17:38 -0500, Tyler Hicks wrote:
> Use ima_validate_rule() to ensure that the combination of a hook
> function and the keyrings conditional is valid and that the keyrings
> conditional is not specified without an explicit KEY_CHECK func
> conditional. This is a code cleanup and has no user-facing change.
> 
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> ---
> 
> * v2
>   - Allowed IMA_DIGSIG_REQUIRED, IMA_PERMIT_DIRECTIO,
>     IMA_MODSIG_ALLOWED, and IMA_CHECK_BLACKLIST conditionals to be
>     present in the rule entry flags for non-buffer hook functions.
> 
>  security/integrity/ima/ima_policy.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
> index 8cdca2399d59..43d49ad958fb 100644
> --- a/security/integrity/ima/ima_policy.c
> +++ b/security/integrity/ima/ima_policy.c
> @@ -1000,6 +1000,15 @@ static bool ima_validate_rule(struct ima_rule_entry *entry)
>  		case KEXEC_KERNEL_CHECK:
>  		case KEXEC_INITRAMFS_CHECK:
>  		case POLICY_CHECK:
> +			if (entry->flags & ~(IMA_FUNC | IMA_MASK | IMA_FSMAGIC |
> +					     IMA_UID | IMA_FOWNER | IMA_FSUUID |
> +					     IMA_INMASK | IMA_EUID | IMA_PCR |
> +					     IMA_FSNAME | IMA_DIGSIG_REQUIRED |
> +					     IMA_PERMIT_DIRECTIO |
> +					     IMA_MODSIG_ALLOWED |
> +					     IMA_CHECK_BLACKLIST))

Other than KEYRINGS, this patch should continue to behave the same.
 However, this list gives the impressions that all of these flags are
permitted on all of the above flags, which isn't true.

For example, both IMA_MODSIG_ALLOWED & IMA_CHECK_BLACKLIST are limited
to appended signatures, meaning KERNEL_CHECK and KEXEC_KERNEL_CHECK.
 Both should only be allowed on APPRAISE action rules.

IMA_PCR should be limited to MEASURE action rules.

IMA_DIGSIG_REQUIRED should be limited to APPRAISE action rules.

> +				return false;
> +
>  			break;
>  		case KEXEC_CMDLINE:
>  			if (entry->action & ~(MEASURE | DONT_MEASURE))
> @@ -1027,7 +1036,8 @@ static bool ima_validate_rule(struct ima_rule_entry *entry)
>  		default:
>  			return false;
>  		}
> -	}
> +	} else if (entry->flags & IMA_KEYRINGS)
> +		return false;

IMA_MODSIG_ALLOWED and IMA_CHECK_BLACKLIST need to be added here as
well.

Mimi

>  
>  	return true;
>  }
> @@ -1209,7 +1219,6 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
>  			keyrings_len = strlen(args[0].from) + 1;
>  
>  			if ((entry->keyrings) ||
> -			    (entry->func != KEY_CHECK) ||
>  			    (keyrings_len < 2)) {
>  				result = -EINVAL;
>  				break;


^ permalink raw reply

* Re: [PATCH 00/14] Make the user mode driver code a better citizen
From: Tetsuo Handa @ 2020-06-30 22:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric W. Biederman, Linus Torvalds, David Miller,
	Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler
In-Reply-To: <CAADnVQKrRpjQpc9-xMizCPr1E12_jXrvH-kaKwxBmvQ03n_uiw@mail.gmail.com>

On 2020/07/01 6:57, Alexei Starovoitov wrote:
>>>>> They were all should never happen cases.  Which is why my patches do:
>>>>> if (WARN_ON_ONCE(...))
>>>>
>>>> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
>>>
>>> I don't believe that's true.
>>> Please show fuzzing stack trace to prove your point.
>>>
>>
>> Please find links containing "WARNING" from https://syzkaller.appspot.com/upstream . ;-)
> 
> Is it a joke? Do you understand how syzbot works?
> If so, please explain how it can invoke umd_* interface.
> 

Currently syzkaller can't invoke umd_* interface because this interface is used by only
bpfilter_umh module. But I can imagine that someone starts using this interface in a way
syzkaller can somehow invoke. Thus, how can it be a joke? I don't understand your question.

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/4] bpf: Generalize bpf_sk_storage
From: KP Singh @ 2020-06-30 22:00 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Linux Security Module list, Alexei Starovoitov,
	Daniel Borkmann, Paul Turner, Jann Horn
In-Reply-To: <20200630193441.kdwnkestulg5erii@kafai-mbp.dhcp.thefacebook.com>



On Tue, Jun 30, 2020 at 9:35 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Mon, Jun 29, 2020 at 06:01:00PM +0200, KP Singh wrote:
> > > >

[...]

> > > >  static atomic_t cache_idx;
> > > inode local storage and sk local storage probably need a separate
> > > cache_idx.  An improvement on picking cache_idx has just been
> > > landed also.
> >
> > I see, thanks! I rebased and I now see that cache_idx is now a:
> >
> >   static u64 cache_idx_usage_counts[BPF_STORAGE_CACHE_SIZE];
> >
> > which tracks the free cache slots rather than using a single atomic
> > cache_idx. I guess all types of local storage can share this now
> > right?
> I believe they have to be separated.  A sk-storage will not be cached/stored
> in inode.  Caching a sk-storage at idx=0 of a sk should not stop
> an inode-storage to be cached at the same idx of a inode.

Ah yes, I see.

I came up with some macros to solve this. Let me know what you think:
(this is on top of the refactoring I did, so some function names may seem new,
but it should, hopefully, convey the general idea).

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 3067774cc640..1dc2e6d72091 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -79,6 +79,26 @@ struct bpf_local_storage_elem {
 #define SDATA(_SELEM) (&(_SELEM)->sdata)
 #define BPF_STORAGE_CACHE_SIZE	16
 
+u16 bpf_ls_cache_idx_get(spinlock_t *cache_idx_lock,
+			   u64 *cache_idx_usage_count);
+
+void bpf_ls_cache_idx_free(spinlock_t *cache_idx_lock,
+			   u64 *cache_idx_usage_counts, u16 idx);
+
+#define DEFINE_BPF_STORAGE_CACHE(type)					\
+static DEFINE_SPINLOCK(cache_idx_lock_##type);				\
+static u64 cache_idx_usage_counts_##type[BPF_STORAGE_CACHE_SIZE];	\
+static u16 cache_idx_get_##type(void)					\
+{									\
+	return bpf_ls_cache_idx_get(&cache_idx_lock_##type,		\
+				    cache_idx_usage_counts_##type);	\
+}									\
+static void cache_idx_free_##type(u16 idx)				\
+{									\
+	return bpf_ls_cache_idx_free(&cache_idx_lock_##type,		\
+				     cache_idx_usage_counts_##type,	\
+				     idx);				\
+}
 
 /* U16_MAX is much more than enough for sk local storage
  * considering a tcp_sock is ~2k.
@@ -105,13 +125,14 @@ struct bpf_local_storage {
 
 /* Helper functions for bpf_local_storage */
 int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
-struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
+struct bpf_local_storage_map *
+bpf_local_storage_map_alloc(union bpf_attr *attr);
 
 struct bpf_local_storage_data *
 bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
 	struct bpf_local_storage_map *smap, bool cacheit_lockit);
 
-void bpf_local_storage_map_free(struct bpf_map *map);
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
 
 int bpf_local_storage_map_check_btf(const struct bpf_map *map,
 	const struct btf *btf, const struct btf_type *key_type,
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index fb589a5715f5..2bc04f8d1e35 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -17,9 +17,6 @@
 	container_of((_SDATA), struct bpf_local_storage_elem, sdata)
 #define SDATA(_SELEM) (&(_SELEM)->sdata)
 
-static DEFINE_SPINLOCK(cache_idx_lock);
-static u64 cache_idx_usage_counts[BPF_STORAGE_CACHE_SIZE];
-
 static struct bucket *select_bucket(struct bpf_local_storage_map *smap,
 				    struct bpf_local_storage_elem *selem)
 {
@@ -460,12 +457,13 @@ static struct bpf_local_storage_data *inode_storage_update(
 					map_flags);
 }
 
-static u16 cache_idx_get(void)
+u16 bpf_ls_cache_idx_get(spinlock_t *cache_idx_lock,
+			 u64 *cache_idx_usage_counts)
 {
 	u64 min_usage = U64_MAX;
 	u16 i, res = 0;
 
-	spin_lock(&cache_idx_lock);
+	spin_lock(cache_idx_lock);
 
 	for (i = 0; i < BPF_STORAGE_CACHE_SIZE; i++) {
 		if (cache_idx_usage_counts[i] < min_usage) {
@@ -479,16 +477,17 @@ static u16 cache_idx_get(void)
 	}
 	cache_idx_usage_counts[res]++;
 
-	spin_unlock(&cache_idx_lock);
+	spin_unlock(cache_idx_lock);
 
 	return res;
 }
 
-static void cache_idx_free(u16 idx)
+void bpf_ls_cache_idx_free(spinlock_t *cache_idx_lock,
+			   u64 *cache_idx_usage_counts, u16 idx)
 {
-	spin_lock(&cache_idx_lock);
+	spin_lock(cache_idx_lock);
 	cache_idx_usage_counts[idx]--;
-	spin_unlock(&cache_idx_lock);
+	spin_unlock(cache_idx_lock);
 }
 
 static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
@@ -552,17 +551,12 @@ void bpf_inode_storage_free(struct inode *inode)
 		kfree_rcu(local_storage, rcu);
 }
 
-void bpf_local_storage_map_free(struct bpf_map *map)
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
 {
 	struct bpf_local_storage_elem *selem;
-	struct bpf_local_storage_map *smap;
 	struct bucket *b;
 	unsigned int i;
 
-	smap = (struct bpf_local_storage_map *)map;
-
-	cache_idx_free(smap->cache_idx);
-
 	/* Note that this map might be concurrently cloned from
 	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
 	 * RCU read section to finish before proceeding. New RCU
@@ -607,7 +601,7 @@ void bpf_local_storage_map_free(struct bpf_map *map)
 	synchronize_rcu();
 
 	kvfree(smap->buckets);
-	kfree(map);
+	kfree(smap);
 }
 
 int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
@@ -629,8 +623,7 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
 	return 0;
 }
 
-
-struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_local_storage_map *smap;
 	unsigned int i;
@@ -670,9 +663,8 @@ struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 
 	smap->elem_size =
 		sizeof(struct bpf_local_storage_elem) + attr->value_size;
-	smap->cache_idx = cache_idx_get();
 
-	return &smap->map;
+	return smap;
 }
 
 int bpf_local_storage_map_check_btf(const struct bpf_map *map,
@@ -768,11 +760,34 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
 	return -ENOTSUPP;
 }
 
+DEFINE_BPF_STORAGE_CACHE(inode);
+
+struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = bpf_local_storage_map_alloc(attr);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
+
+	smap->cache_idx = cache_idx_get_inode();
+	return &smap->map;
+}
+
+void inode_storage_map_free(struct bpf_map *map)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = (struct bpf_local_storage_map *)map;
+	cache_idx_free_inode(smap->cache_idx);
+	bpf_local_storage_map_free(smap);
+}
+
 static int inode_storage_map_btf_id;
 const struct bpf_map_ops inode_storage_map_ops = {
 	.map_alloc_check = bpf_local_storage_map_alloc_check,
-	.map_alloc = bpf_local_storage_map_alloc,
-	.map_free = bpf_local_storage_map_free,
+	.map_alloc = inode_storage_map_alloc,
+	.map_free = inode_storage_map_free,
 	.map_get_next_key = notsupp_get_next_key,
 	.map_lookup_elem = bpf_inode_storage_lookup_elem,
 	.map_update_elem = bpf_inode_storage_update_elem,
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 0ec44e819bfe..add0340e9ad3 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -396,11 +396,34 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
 	return -ENOTSUPP;
 }
 
+DEFINE_BPF_STORAGE_CACHE(sk);
+
+struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = bpf_local_storage_map_alloc(attr);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
+
+	smap->cache_idx = cache_idx_get_sk();
+	return &smap->map;
+}
+
+void sk_storage_map_free(struct bpf_map *map)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = (struct bpf_local_storage_map *)map;
+	cache_idx_free_sk(smap->cache_idx);
+	bpf_local_storage_map_free(smap);
+}
+
 static int sk_storage_map_btf_id;
 const struct bpf_map_ops sk_storage_map_ops = {
 	.map_alloc_check = bpf_local_storage_map_alloc_check,
-	.map_alloc = bpf_local_storage_map_alloc,
-	.map_free = bpf_local_storage_map_free,
+	.map_alloc = sk_storage_map_alloc,
+	.map_free = sk_storage_map_free,
 	.map_get_next_key = notsupp_get_next_key,
 	.map_lookup_elem = bpf_sk_storage_lookup_elem,
 	.map_update_elem = bpf_sk_storage_update_elem,

>

[...]

> >
> > Sure, I can also keep the sk_clone code their as well for now.
> Just came to my mind.  For easier review purpose, may be
> first do the refactoring/renaming within bpf_sk_storage.c first and
> then create another patch to move the common parts to a new
> file bpf_local_storage.c.
>
> Not sure if it will be indeed easier to read the diff in practice.
> I probably should be able to follow it either way.

Since I already refactored it. I will send the next version with the refactoring
and split done as a part of the Generalize bpf_sk_storage patch.
If it becomes too hard to review, please let me know and I can split it. :)


>
> >
> > >
> > > There is a test in map_tests/sk_storage_map.c, in case you may not notice.
> >
> > I will try to make it generic as a part of this series. If it takes
> > too much time, I will send a separate patch for testing
> > inode_storage_map and till then we have some assurance with
> > test_local_storage in test_progs.
> Sure. no problem.  It is mostly for you to test sk_storage to ensure things ;)
> Also give some ideas on what racing conditions have
> been considered in the sk_storage test and may be the inode storage
> test want to stress similar code path.

Thanks! I ran test_maps and it passes. I will send a separate patch
that generalizes the sk_storage_map.c.

- KP

^ permalink raw reply related

* Re: [PATCH 00/14] Make the user mode driver code a better citizen
From: Alexei Starovoitov @ 2020-06-30 21:57 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric W. Biederman, Linus Torvalds, David Miller,
	Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler
In-Reply-To: <c7d4df91-d78e-5134-2161-192426fc51cd@i-love.sakura.ne.jp>

On Tue, Jun 30, 2020 at 2:55 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/07/01 1:48, Alexei Starovoitov wrote:
> > On Tue, Jun 30, 2020 at 03:28:49PM +0900, Tetsuo Handa wrote:
> >> On 2020/06/30 5:19, Eric W. Biederman wrote:
> >>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> >>>
> >>>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
> >>>>> But all the defensive programming kinda goes against general kernel style.
> >>>>> I wouldn't do it. Especially pr_info() ?!
> >>>>> Though I don't feel strongly about it.
> >>>>
> >>>> Honestly speaking, caller should check for errors and print appropriate
> >>>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
> >>>> went wrong (maybe memory corruption). But other conditions are not fatal.
> >>>> That is, I consider even pr_info() here should be unnecessary.
> >>>
> >>> They were all should never happen cases.  Which is why my patches do:
> >>> if (WARN_ON_ONCE(...))
> >>
> >> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
> >
> > I don't believe that's true.
> > Please show fuzzing stack trace to prove your point.
> >
>
> Please find links containing "WARNING" from https://syzkaller.appspot.com/upstream . ;-)

Is it a joke? Do you understand how syzbot works?
If so, please explain how it can invoke umd_* interface.

^ permalink raw reply

* Re: [PATCH 00/14] Make the user mode driver code a better citizen
From: Tetsuo Handa @ 2020-06-30 21:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric W. Biederman, Linus Torvalds, David Miller,
	Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler
In-Reply-To: <20200630164817.txa2jewfvk4stajy@ast-mbp.dhcp.thefacebook.com>

On 2020/07/01 1:48, Alexei Starovoitov wrote:
> On Tue, Jun 30, 2020 at 03:28:49PM +0900, Tetsuo Handa wrote:
>> On 2020/06/30 5:19, Eric W. Biederman wrote:
>>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
>>>
>>>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>>>>> But all the defensive programming kinda goes against general kernel style.
>>>>> I wouldn't do it. Especially pr_info() ?!
>>>>> Though I don't feel strongly about it.
>>>>
>>>> Honestly speaking, caller should check for errors and print appropriate
>>>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
>>>> went wrong (maybe memory corruption). But other conditions are not fatal.
>>>> That is, I consider even pr_info() here should be unnecessary.
>>>
>>> They were all should never happen cases.  Which is why my patches do:
>>> if (WARN_ON_ONCE(...))
>>
>> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
> 
> I don't believe that's true.
> Please show fuzzing stack trace to prove your point.
> 

Please find links containing "WARNING" from https://syzkaller.appspot.com/upstream . ;-)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox