Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* [PATCH] ima: fix deadlock within "ima_match_policy" function.
From: liqiong @ 2021-08-24  8:57 UTC (permalink / raw)
  To: Simon.THOBY, zohar
  Cc: dmitry.kasatkin, jmorris, serge, linux-integrity,
	linux-security-module, linux-kernel, liqiong
In-Reply-To: <20210819101529.28001-1-liqiong@nfschina.com>

When "ima_match_policy" is looping while "ima_update_policy" changs
the variable "ima_rules", then "ima_match_policy" may can't exit
loop, Finally cause RCU CPU Stall Warnings: "rcu_sched detected
stall on CPU ...".

The problem is limited to transitioning from the builtin policy to
the custom policy. Eg. At boot time, systemd-services are being
checked within "ima_match_policy", at the same time, the variable
"ima_rules" is changed by another service.

Signed-off-by: liqiong <liqiong@nfschina.com>
---
 security/integrity/ima/ima_policy.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index fd5d46e511f1..e92b197bfd3c 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -662,12 +662,14 @@ int ima_match_policy(struct user_namespace *mnt_userns, struct inode *inode,
 {
 	struct ima_rule_entry *entry;
 	int action = 0, actmask = flags | (flags << 1);
+	struct list_head *ima_rules_tmp;
 
 	if (template_desc && !*template_desc)
 		*template_desc = ima_template_desc_current();
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(entry, ima_rules, list) {
+	ima_rules_tmp = rcu_dereference(ima_rules);
+	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
 
 		if (!(entry->action & actmask))
 			continue;
@@ -919,8 +921,8 @@ void ima_update_policy(void)
 
 	if (ima_rules != policy) {
 		ima_policy_flag = 0;
-		ima_rules = policy;
 
+		rcu_assign_pointer(ima_rules, policy);
 		/*
 		 * IMA architecture specific policy rules are specified
 		 * as strings and converted to an array of ima_entry_rules
@@ -1649,9 +1651,11 @@ void *ima_policy_start(struct seq_file *m, loff_t *pos)
 {
 	loff_t l = *pos;
 	struct ima_rule_entry *entry;
+	struct list_head *ima_rules_tmp;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(entry, ima_rules, list) {
+	ima_rules_tmp = rcu_dereference(ima_rules);
+	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
 		if (!l--) {
 			rcu_read_unlock();
 			return entry;
@@ -1670,7 +1674,8 @@ void *ima_policy_next(struct seq_file *m, void *v, loff_t *pos)
 	rcu_read_unlock();
 	(*pos)++;
 
-	return (&entry->list == ima_rules) ? NULL : entry;
+	return (&entry->list == &ima_default_rules ||
+		&entry->list == &ima_policy_rules) ? NULL : entry;
 }
 
 void ima_policy_stop(struct seq_file *m, void *v)
@@ -1872,6 +1877,7 @@ bool ima_appraise_signature(enum kernel_read_file_id id)
 	struct ima_rule_entry *entry;
 	bool found = false;
 	enum ima_hooks func;
+	struct list_head *ima_rules_tmp;
 
 	if (id >= READING_MAX_ID)
 		return false;
@@ -1879,7 +1885,8 @@ bool ima_appraise_signature(enum kernel_read_file_id id)
 	func = read_idmap[id] ?: FILE_CHECK;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(entry, ima_rules, list) {
+	ima_rules_tmp = rcu_dereference(ima_rules);
+	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
 		if (entry->action != APPRAISE)
 			continue;
 
-- 
2.11.0


^ permalink raw reply related

* Writes to /proc/self/mem and file_mprotect() LSM hook
From: Igor Zhbanov @ 2021-08-24  9:25 UTC (permalink / raw)
  To: linux-security-module, linux-integrity

Hello,

There are several ways to write data to write-protected page. For example,
a process can write to /proc/self/mem to change read-only or even executable
pages: https://offlinemark.com/2021/05/12/an-obscure-quirk-of-proc/

In this case, the kernel code will map the physical page with another access
mode and change the data (FOLL_FORCE flag will ignore the access check). The
problem is that no security hooks are called in this case. For example, the
file_mprotect() LSM hook was designed to intercept process' attempts to
remap memory pages. Particularly SELinux and IMA controlling, if a process
is trying to make a code page writable. And this method allows to bypass it.

Therefore, my question is, should all page modifications that ignores the
protection mode call LSM hook prior to temporarily remapping the page?

Thanks.

^ permalink raw reply

* Re: [PATCH] security: remove unneeded subdir-$(CONFIG_...)
From: Masahiro Yamada @ 2021-08-24  9:27 UTC (permalink / raw)
  To: James Morris, Serge E . Hallyn
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	John Fastabend, KP Singh, Martin KaFai Lau,
	Mickaël Salaün, Song Liu, Yonghong Song, bpf,
	Linux Kernel Mailing List, linux-security-module, Networking
In-Reply-To: <CAK7LNAQ5x55oCYRQbbC6fCE6qP5cp1Jdw+9SH-BNFuN=bqntFw@mail.gmail.com>

On Sat, Jul 31, 2021 at 3:02 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Sat, May 29, 2021 at 3:02 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> >
> > All of these are unneeded. The directories to descend are specified
> > by obj-$(CONFIG_...).
> >
> > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
>
>
> Ping?
>



Applied to linux-kbuild.




>
>
> > ---
> >
> >  security/Makefile | 11 -----------
> >  1 file changed, 11 deletions(-)
> >
> > diff --git a/security/Makefile b/security/Makefile
> > index 47e432900e24..18121f8f85cd 100644
> > --- a/security/Makefile
> > +++ b/security/Makefile
> > @@ -4,16 +4,6 @@
> >  #
> >
> >  obj-$(CONFIG_KEYS)                     += keys/
> > -subdir-$(CONFIG_SECURITY_SELINUX)      += selinux
> > -subdir-$(CONFIG_SECURITY_SMACK)                += smack
> > -subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
> > -subdir-$(CONFIG_SECURITY_APPARMOR)     += apparmor
> > -subdir-$(CONFIG_SECURITY_YAMA)         += yama
> > -subdir-$(CONFIG_SECURITY_LOADPIN)      += loadpin
> > -subdir-$(CONFIG_SECURITY_SAFESETID)    += safesetid
> > -subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown
> > -subdir-$(CONFIG_BPF_LSM)               += bpf
> > -subdir-$(CONFIG_SECURITY_LANDLOCK)     += landlock
> >
> >  # always enable default capabilities
> >  obj-y                                  += commoncap.o
> > @@ -36,5 +26,4 @@ obj-$(CONFIG_BPF_LSM)                 += bpf/
> >  obj-$(CONFIG_SECURITY_LANDLOCK)                += landlock/
> >
> >  # Object integrity file lists
> > -subdir-$(CONFIG_INTEGRITY)             += integrity
> >  obj-$(CONFIG_INTEGRITY)                        += integrity/
> > --
> > 2.27.0
> >
>
>
> --
> Best Regards
> Masahiro Yamada



-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* Re: [PATCH] ima: fix deadlock within "ima_match_policy" function.
From: THOBY Simon @ 2021-08-24  9:50 UTC (permalink / raw)
  To: liqiong, zohar@linux.ibm.com
  Cc: dmitry.kasatkin@gmail.com, jmorris@namei.org, serge@hallyn.com,
	linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20210824085747.23604-1-liqiong@nfschina.com>

Hi liqiong,

On 8/24/21 10:57 AM, liqiong wrote:
> When "ima_match_policy" is looping while "ima_update_policy" changs

Small typo: "changes"/"updates"

> the variable "ima_rules", then "ima_match_policy" may can't exit
> loop, Finally cause RCU CPU Stall Warnings: "rcu_sched detected
> stall on CPU ...".

This could perhaps be rephrased to something like:
"""
ima_match_policy() can loop on the policy ruleset while
ima_update_policy() updates the variable "ima_rules".
This can lead to a situation where ima_match_policy()
can't exit the 'list_for_each_entry_rcu' loop, causing
RCU stalls ("rcu_sched detected stall on CPU ...").
"""


> 
> The problem is limited to transitioning from the builtin policy to
> the custom policy. Eg. At boot time, systemd-services are being
> checked within "ima_match_policy", at the same time, the variable
> "ima_rules" is changed by another service.

For the second sentence, consider something in the likes of:
"This problem can happen in practice: updating the IMA policy
in the boot process while systemd-services are being checked
have been observed to trigger this issue.".


Your commit message is also supposed to explain what you are doing,
using the imperative form ((see 'Documentation/process/submitting-patches.rst'):
"""
Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.
"""

Maybe add a paragraph with something like "Fix locking by introducing ...."?


> 
> Signed-off-by: liqiong <liqiong@nfschina.com>
> ---
>  security/integrity/ima/ima_policy.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
> index fd5d46e511f1..e92b197bfd3c 100644
> --- a/security/integrity/ima/ima_policy.c
> +++ b/security/integrity/ima/ima_policy.c
> @@ -662,12 +662,14 @@ int ima_match_policy(struct user_namespace *mnt_userns, struct inode *inode,
>  {
>  	struct ima_rule_entry *entry;
>  	int action = 0, actmask = flags | (flags << 1);
> +	struct list_head *ima_rules_tmp;
>  
>  	if (template_desc && !*template_desc)
>  		*template_desc = ima_template_desc_current();
>  
>  	rcu_read_lock();
> -	list_for_each_entry_rcu(entry, ima_rules, list) {
> +	ima_rules_tmp = rcu_dereference(ima_rules);
> +	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
>  
>  		if (!(entry->action & actmask))
>  			continue;
> @@ -919,8 +921,8 @@ void ima_update_policy(void)
>  
>  	if (ima_rules != policy) {
>  		ima_policy_flag = 0;
> -		ima_rules = policy;
>  
> +		rcu_assign_pointer(ima_rules, policy);
>  		/*
>  		 * IMA architecture specific policy rules are specified
>  		 * as strings and converted to an array of ima_entry_rules
> @@ -1649,9 +1651,11 @@ void *ima_policy_start(struct seq_file *m, loff_t *pos)
>  {
>  	loff_t l = *pos;
>  	struct ima_rule_entry *entry;
> +	struct list_head *ima_rules_tmp;
>  
>  	rcu_read_lock();
> -	list_for_each_entry_rcu(entry, ima_rules, list) {
> +	ima_rules_tmp = rcu_dereference(ima_rules);
> +	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
>  		if (!l--) {
>  			rcu_read_unlock();
>  			return entry;
> @@ -1670,7 +1674,8 @@ void *ima_policy_next(struct seq_file *m, void *v, loff_t *pos)
>  	rcu_read_unlock();
>  	(*pos)++;
>  
> -	return (&entry->list == ima_rules) ? NULL : entry;
> +	return (&entry->list == &ima_default_rules ||
> +		&entry->list == &ima_policy_rules) ? NULL : entry;
>  }
>  
>  void ima_policy_stop(struct seq_file *m, void *v)
> @@ -1872,6 +1877,7 @@ bool ima_appraise_signature(enum kernel_read_file_id id)
>  	struct ima_rule_entry *entry;
>  	bool found = false;
>  	enum ima_hooks func;
> +	struct list_head *ima_rules_tmp;
>  
>  	if (id >= READING_MAX_ID)
>  		return false;
> @@ -1879,7 +1885,8 @@ bool ima_appraise_signature(enum kernel_read_file_id id)
>  	func = read_idmap[id] ?: FILE_CHECK;
>  
>  	rcu_read_lock();
> -	list_for_each_entry_rcu(entry, ima_rules, list) {
> +	ima_rules_tmp = rcu_dereference(ima_rules);
> +	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
>  		if (entry->action != APPRAISE)
>  			continue;
>  
> 

I haven't tested the patch myself, but the code diff looks fine to me.

Thanks,
Simon

^ permalink raw reply

* Re: [PATCH] ima: fix deadlock within "ima_match_policy" function.
From: liqiong @ 2021-08-24 12:09 UTC (permalink / raw)
  To: THOBY Simon, zohar@linux.ibm.com
  Cc: dmitry.kasatkin@gmail.com, jmorris@namei.org, serge@hallyn.com,
	linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <e720e88e-ebfa-56df-6048-f2da0b8fa2a0@viveris.fr>

Hi Simon :

ima: fix deadlock within RCU list of ima_rules.

ima_match_policy() is looping on the policy ruleset while
ima_update_policy() updates the variable "ima_rules". This can
lead to a situation where ima_match_policy() can't exit the
'list_for_each_entry_rcu' loop, causing RCU stalls
("rcu_sched detected stall on CPU ...").

This problem can happen in practice: updating the IMA policy
in the boot process while systemd-services are being checked.

In addition to ima_match_policy(), other function with 
"list_for_each_entry_rcu" should happen too. Fix locking by 
introducing a duplicate of "ima_rules" for each 
"list_for_each_entry_rcu".


How about this commit message ?

I have tested this patch in lab, we can reproduced this error case, 
have done reboot test many times. This patch should work. 


在 2021年08月24日 17:50, THOBY Simon 写道:
> Hi liqiong,
>
> On 8/24/21 10:57 AM, liqiong wrote:
>> When "ima_match_policy" is looping while "ima_update_policy" changs
> Small typo: "changes"/"updates"
>
>> the variable "ima_rules", then "ima_match_policy" may can't exit
>> loop, Finally cause RCU CPU Stall Warnings: "rcu_sched detected
>> stall on CPU ...".
> This could perhaps be rephrased to something like:
> """
> ima_match_policy() can loop on the policy ruleset while
> ima_update_policy() updates the variable "ima_rules".
> This can lead to a situation where ima_match_policy()
> can't exit the 'list_for_each_entry_rcu' loop, causing
> RCU stalls ("rcu_sched detected stall on CPU ...").
> """
>
>
>> The problem is limited to transitioning from the builtin policy to
>> the custom policy. Eg. At boot time, systemd-services are being
>> checked within "ima_match_policy", at the same time, the variable
>> "ima_rules" is changed by another service.
> For the second sentence, consider something in the likes of:
> "This problem can happen in practice: updating the IMA policy
> in the boot process while systemd-services are being checked
> have been observed to trigger this issue.".
>
>
> Your commit message is also supposed to explain what you are doing,
> using the imperative form ((see 'Documentation/process/submitting-patches.rst'):
> """
> Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> to do frotz", as if you are giving orders to the codebase to change
> its behaviour.
> """
>
> Maybe add a paragraph with something like "Fix locking by introducing ...."?
>
>
>> Signed-off-by: liqiong <liqiong@nfschina.com>
>> ---
>>  security/integrity/ima/ima_policy.c | 17 ++++++++++++-----
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
>> index fd5d46e511f1..e92b197bfd3c 100644
>> --- a/security/integrity/ima/ima_policy.c
>> +++ b/security/integrity/ima/ima_policy.c
>> @@ -662,12 +662,14 @@ int ima_match_policy(struct user_namespace *mnt_userns, struct inode *inode,
>>  {
>>  	struct ima_rule_entry *entry;
>>  	int action = 0, actmask = flags | (flags << 1);
>> +	struct list_head *ima_rules_tmp;
>>  
>>  	if (template_desc && !*template_desc)
>>  		*template_desc = ima_template_desc_current();
>>  
>>  	rcu_read_lock();
>> -	list_for_each_entry_rcu(entry, ima_rules, list) {
>> +	ima_rules_tmp = rcu_dereference(ima_rules);
>> +	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
>>  
>>  		if (!(entry->action & actmask))
>>  			continue;
>> @@ -919,8 +921,8 @@ void ima_update_policy(void)
>>  
>>  	if (ima_rules != policy) {
>>  		ima_policy_flag = 0;
>> -		ima_rules = policy;
>>  
>> +		rcu_assign_pointer(ima_rules, policy);
>>  		/*
>>  		 * IMA architecture specific policy rules are specified
>>  		 * as strings and converted to an array of ima_entry_rules
>> @@ -1649,9 +1651,11 @@ void *ima_policy_start(struct seq_file *m, loff_t *pos)
>>  {
>>  	loff_t l = *pos;
>>  	struct ima_rule_entry *entry;
>> +	struct list_head *ima_rules_tmp;
>>  
>>  	rcu_read_lock();
>> -	list_for_each_entry_rcu(entry, ima_rules, list) {
>> +	ima_rules_tmp = rcu_dereference(ima_rules);
>> +	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
>>  		if (!l--) {
>>  			rcu_read_unlock();
>>  			return entry;
>> @@ -1670,7 +1674,8 @@ void *ima_policy_next(struct seq_file *m, void *v, loff_t *pos)
>>  	rcu_read_unlock();
>>  	(*pos)++;
>>  
>> -	return (&entry->list == ima_rules) ? NULL : entry;
>> +	return (&entry->list == &ima_default_rules ||
>> +		&entry->list == &ima_policy_rules) ? NULL : entry;
>>  }
>>  
>>  void ima_policy_stop(struct seq_file *m, void *v)
>> @@ -1872,6 +1877,7 @@ bool ima_appraise_signature(enum kernel_read_file_id id)
>>  	struct ima_rule_entry *entry;
>>  	bool found = false;
>>  	enum ima_hooks func;
>> +	struct list_head *ima_rules_tmp;
>>  
>>  	if (id >= READING_MAX_ID)
>>  		return false;
>> @@ -1879,7 +1885,8 @@ bool ima_appraise_signature(enum kernel_read_file_id id)
>>  	func = read_idmap[id] ?: FILE_CHECK;
>>  
>>  	rcu_read_lock();
>> -	list_for_each_entry_rcu(entry, ima_rules, list) {
>> +	ima_rules_tmp = rcu_dereference(ima_rules);
>> +	list_for_each_entry_rcu(entry, ima_rules_tmp, list) {
>>  		if (entry->action != APPRAISE)
>>  			continue;
>>  
>>
> I haven't tested the patch myself, but the code diff looks fine to me.
>
> Thanks,
> Simon


^ permalink raw reply

* Re: [PATCH] ima: fix deadlock within "ima_match_policy" function.
From: Mimi Zohar @ 2021-08-24 12:38 UTC (permalink / raw)
  To: liqiong, THOBY Simon
  Cc: dmitry.kasatkin@gmail.com, jmorris@namei.org, serge@hallyn.com,
	linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <3ba4da9d-fa7b-c486-0c48-67cee4d5de6d@nfschina.com>

On Tue, 2021-08-24 at 20:09 +0800, liqiong wrote:
> Hi Simon :
> 
> ima: fix deadlock within RCU list of ima_rules.
> 

Before the following paragraph, an introductory sentence is needed. 
Try adding a sentence to the affect that "ima_rules" initially points
to the "ima_default_rules", but after loading a custom policy points to
the "ima_policy_rules".   Then describe the bug at a high level,
something like - transitioning to the "ima_policy_rules" isn't being
done safely.

Followed by the details.

> ima_match_policy() is looping on the policy ruleset while
> ima_update_policy() updates the variable "ima_rules". This can
> lead to a situation where ima_match_policy() can't exit the
> 'list_for_each_entry_rcu' loop, causing RCU stalls
> ("rcu_sched detected stall on CPU ...").
> 
> This problem can happen in practice: updating the IMA policy
> in the boot process while systemd-services are being checked.
> 
> In addition to ima_match_policy(), other function with 
> "list_for_each_entry_rcu" should happen too. Fix locking by 
> introducing a duplicate of "ima_rules" for each 
> "list_for_each_entry_rcu".
> 
> 
> How about this commit message ?
> 
> I have tested this patch in lab, we can reproduced this error case, 
> have done reboot test many times. This patch should work. 

The above comment doesn't belong in the commit message, but is a
message to the reviewers/maintainers and goes after the patch
descriptions three dashes line.

thanks,

Mimi


^ permalink raw reply

* Re: [PATCH v4 00/12] Enroll kernel keys thru MOK
From: Mimi Zohar @ 2021-08-24 14:34 UTC (permalink / raw)
  To: Nayna, Jarkko Sakkinen, Eric Snowberg, David Howells
  Cc: keyrings, linux-integrity, David Woodhouse, Herbert Xu,
	David S . Miller, James Morris, Serge E . Hallyn, keescook,
	gregkh, torvalds, scott.branden, weiyongjun1, nayna, ebiggers,
	ardb, Lakshmi Ramasubramanian, lszubowi, linux-kernel,
	linux-crypto, linux-security-module, James Bottomley, pjones,
	konrad.wilk@oracle.com, Patrick Uiterwijk
In-Reply-To: <bffb33a3-d5b5-f376-9d7d-706d38357d1a@linux.vnet.ibm.com>


> >> Jarkko, I think the emphasis should not be on "machine" from Machine
> >> Owner Key (MOK), but on "owner".  Whereas Nayna is focusing more on the
> >> "_ca" aspect of the name.   Perhaps consider naming it
> >> "system_owner_ca" or something along those lines.

> > What do you gain such overly long identifier? Makes no sense. What
> > is "ca aspect of the name" anyway?
> 
> As I mentioned previously, the main usage of this new keyring is that it 
> should contain only CA keys which can be later used to vouch for user 
> keys loaded onto secondary or IMA keyring at runtime. Having ca in the 
> name like .xxxx_ca, would make the keyring name self-describing. Since 
> you preferred .system, we can call it .system_ca.

Sounds good to me.  Jarkko?

thanks,

Mimi


^ permalink raw reply

* Re: [PATCH v28 22/25] Audit: Add record for multiple process LSM attributes
From: Paul Moore @ 2021-08-24 14:45 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: casey.schaufler, James Morris, linux-security-module, selinux,
	linux-audit, keescook, john.johansen, penguin-kernel,
	Stephen Smalley
In-Reply-To: <be20e3c8-a068-4aa2-be52-8601cf2d30a6@schaufler-ca.com>

On Fri, Aug 20, 2021 at 7:48 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> > On 8/20/2021 12:06 PM, Paul Moore wrote:
> >> Unless you explicitly enable audit on the kernel cmdline, e.g.
> >> "audit=1", processes started before userspace enables audit will not
> >> have a properly allocated audit_context; see the "if
> >> (likely(!audit_ever_enabled))" check at the top of audit_alloc() for
> >> the reason why.
>
> I found a hack-around that no one will like. I changed that check to be
>
> (likely(!audit_ever_enabled) && !lsm_multiple_contexts())
>
> It probably introduces a memory leak and/or performance degradation,
> but it has the desired affect.

I can't speak for everyone, but I know I don't like that as a solution
;)  I imagine such a change would also draw the ire of the never-audit
crowd and the distros themselves.  However, I understand the need to
just get *something* in place so you can continue to test/develop;
it's fine to keep that for now, but I'm going to be very disappointed
if that line finds its way into the next posted patchset revision.

I'm very much open to ideas but my gut feeling is that the end
solution is going to be changes to audit_log_start() and
audit_log_end().  In my mind the primary reason for this hunch is that
support for multiple LSMs[*] needs to be transparent to the various
callers in the kernel; this means the existing audit pattern of ...

  audit_log_start(...);
  audit_log_format(...);
  audit_log_end(...);

... should be preserved and be unchanged from what it is now.  We've
already talked in some general terms about what such changes might
look like, but to summarize the previous discussions, I think we would
need to think about the following things:

* Adding a timestamp/serial field to the audit_buffer struct, it could
even be in a union with the audit_context pointer as we would never
need both at the same time: if the audit_context ptr is non-NULL you
use that, otherwise you use the timestamp.  The audit_buffer timestamp
would not eliminate the need for the timestamp info in the
audit_context struct for what I hope are obvious reasons.

* Additional logic in audit_log_end() to generate additional ancillary
records for LSM labels.  This will likely mean storing the message
"type" passed to audit_log_start() in the audit_record struct and
using that information in audit_log_end() to decide if a LSM ancillary
record is needed.  Logistically this would likely mean converting the
existing audit_log_end() function into a static/private
__audit_log_end() and converting audit_log_end() into something like
this:

  void audit_log_end(ab)
  {
    __audit_log_end(ab); // rm the ab cleanup from __audit_log_end
    if (ab->anc_mlsm) {
      // generate the multiple lsm record
    }
    audit_buffer_free(ab);
  }

* Something else that I'm surely forgetting :)

At the end of all this we may find that the "local" audit_context
concept is no longer needed.  It was originally created to deal with
grouping ancillary records with non-syscall records into a single
event; while what we are talking about above is different, I believe
that our likely solution will also work to solve the original "local"
audit_context use case as well.  We'll have to see how this goes.

[*] I expect that the audit container ID work will have similar issues
and need a similar solution, I'm surprised it hasn't come up yet.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH v28 22/25] Audit: Add record for multiple process LSM attributes
From: Casey Schaufler @ 2021-08-24 15:20 UTC (permalink / raw)
  To: Paul Moore
  Cc: casey.schaufler, James Morris, linux-security-module, selinux,
	linux-audit, keescook, john.johansen, penguin-kernel,
	Stephen Smalley, Casey Schaufler
In-Reply-To: <CAHC9VhT-MfsU-azbV4QQ-asQFqdCG8fAeB-BOV3MKAdtSOW8Nw@mail.gmail.com>

On 8/24/2021 7:45 AM, Paul Moore wrote:
> On Fri, Aug 20, 2021 at 7:48 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> On 8/20/2021 12:06 PM, Paul Moore wrote:
>>>> Unless you explicitly enable audit on the kernel cmdline, e.g.
>>>> "audit=1", processes started before userspace enables audit will not
>>>> have a properly allocated audit_context; see the "if
>>>> (likely(!audit_ever_enabled))" check at the top of audit_alloc() for
>>>> the reason why.
>> I found a hack-around that no one will like. I changed that check to be
>>
>> (likely(!audit_ever_enabled) && !lsm_multiple_contexts())
>>
>> It probably introduces a memory leak and/or performance degradation,
>> but it has the desired affect.
> I can't speak for everyone, but I know I don't like that as a solution
> ;)  I imagine such a change would also draw the ire of the never-audit
> crowd and the distros themselves.  However, I understand the need to
> just get *something* in place so you can continue to test/develop;
> it's fine to keep that for now, but I'm going to be very disappointed
> if that line finds its way into the next posted patchset revision.

As I said, it's a hack-around that demonstrates the scope of the
problem. Had you expressed enthusiastic approval for it I'd have
been very surprised.

> I'm very much open to ideas but my gut feeling is that the end
> solution is going to be changes to audit_log_start() and
> audit_log_end().  In my mind the primary reason for this hunch is that
> support for multiple LSMs[*] needs to be transparent to the various
> callers in the kernel; this means the existing audit pattern of ...
>
>   audit_log_start(...);
>   audit_log_format(...);
>   audit_log_end(...);
>
> ... should be preserved and be unchanged from what it is now.  We've
> already talked in some general terms about what such changes might
> look like, but to summarize the previous discussions, I think we would
> need to think about the following things:

I will give this a shot.

>
> * Adding a timestamp/serial field to the audit_buffer struct, it could
> even be in a union with the audit_context pointer as we would never
> need both at the same time: if the audit_context ptr is non-NULL you
> use that, otherwise you use the timestamp.  The audit_buffer timestamp
> would not eliminate the need for the timestamp info in the
> audit_context struct for what I hope are obvious reasons.
>
> * Additional logic in audit_log_end() to generate additional ancillary
> records for LSM labels.  This will likely mean storing the message
> "type" passed to audit_log_start() in the audit_record struct and
> using that information in audit_log_end() to decide if a LSM ancillary
> record is needed.  Logistically this would likely mean converting the
> existing audit_log_end() function into a static/private
> __audit_log_end() and converting audit_log_end() into something like
> this:
>
>   void audit_log_end(ab)
>   {
>     __audit_log_end(ab); // rm the ab cleanup from __audit_log_end
>     if (ab->anc_mlsm) {
>       // generate the multiple lsm record
>     }
>     audit_buffer_free(ab);
>   }
>
> * Something else that I'm surely forgetting :)
>
> At the end of all this we may find that the "local" audit_context
> concept is no longer needed.  It was originally created to deal with
> grouping ancillary records with non-syscall records into a single
> event; while what we are talking about above is different, I believe
> that our likely solution will also work to solve the original "local"
> audit_context use case as well.  We'll have to see how this goes.

I'll keep that in mind as I fiddle about.

> [*] I expect that the audit container ID work will have similar issues
> and need a similar solution, I'm surprised it hasn't come up yet.

Hmm. That effort has been quiet lately. Too quiet.



^ permalink raw reply

* Re: [PATCH 0/4] KEYS: trusted: Introduce support for NXP CAAM-based trusted keys
From: Tim Harvey @ 2021-08-24 15:23 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: David Gstir, Aymen Sghaier, Mimi Zohar, Jan Luebbe, keyrings,
	Steffen Trumtrar, linux-security-module, Udit Agarwal, Herbert Xu,
	Horia Geantă, Richard Weinberger, James Morris, Eric Biggers,
	Serge E. Hallyn, Sumit Garg, James Bottomley, Franck LENORMAND,
	David Howells, open list, Jarkko Sakkinen, linux-crypto,
	Sascha Hauer, linux-integrity, David S. Miller
In-Reply-To: <8b559c9c-a4c0-d335-5e54-40b9acc08707@pengutronix.de>

On Tue, Aug 24, 2021 at 12:33 AM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>
> On 23.08.21 19:50, Tim Harvey wrote:
> > On Mon, Aug 23, 2021 at 6:29 AM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
> >> On 20.08.21 23:19, Tim Harvey wrote:
> >>> On Fri, Aug 20, 2021 at 1:36 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
> >>>> On 20.08.21 22:20, Tim Harvey wrote:
> >>> It works for a user keyring but not a session keyring... does that
> >>> explain anything?
> >>> # keyctl add trusted mykey 'new 32' @u
> >>> 941210782
> >>> # keyctl print 941210782
> >>> 83b7845cb45216496aead9ee2c6a406f587d64aad47bddc539d8947a247e618798d9306b36398b5dc2722a4c3f220a3a763ee175f6bd64758fdd49ca4db597e8ce328121b60edbba9b8d8d55056be896
> >>> # keyctl add trusted mykey 'new 32' @s
> >>> 310571960
> >>> # keyctl print 310571960
> >>> keyctl_read_alloc: Unknown error 126
> >>
> >> Both sequences work for me.
> >>
> >> My getty is started by systemd. I think systemd allocates a new session
> >> keyring for the getty that's inherited by the shell and the commands I run
> >> it in. If you don't do that, each command will get its own session key.
> >>
> >>> Sorry, I'm still trying to wrap my head around the differences in
> >>> keyrings and trusted vs user keys.
> >>
> >> No problem. HTH.
> >
> > Ahmad,
> >
> > Ok that explains it - my testing is using a very basic buildroot
> > ramdisk rootfs. If I do a 'keyctl new_session' first I can use the
> > system keyring fine as well.
>
> Great. Does this mean I can get your Tested-by: ? :)
>

Absolutely,

For the series:

I tested this series on top of v5.14.rc-7 on a Gateworks
imx8mm-venice-gw73xx board with kernel param trusted.source=caam and
keyutils-1.6:
# keyctl new_session
22544757
# keyctl add trusted mykey 'new 32' @s
160701809
# keyctl print 160701809
990e03aa4515aee420eede17e26a58d0c5568c8bd2c9c2ee2f22a0583181d20d4f65cf9cb1f944a3cc92c0e3184a44a29a7e511f0a55a6af11a70ac2b2924514002475e73ae09820042896b9ee00a5ec

Tested-By: Tim Harvey <tharvey@gateworks.com>

One more question: I've got a user that wants to blob/deblob generic
data. They can use the caam_encap_blob/caam_decap_blob functions in
kernel code but could you give me a suggestion for how they could use
this in:
a) userspace code (using the keyctl syscall I assume)
b) userspace cmdline (via keyutils I assume)

Many thanks,

Tim

^ permalink raw reply

* Re: [PATCH v28 22/25] Audit: Add record for multiple process LSM attributes
From: Paul Moore @ 2021-08-24 16:14 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: casey.schaufler, James Morris, linux-security-module, selinux,
	linux-audit, keescook, john.johansen, penguin-kernel,
	Stephen Smalley
In-Reply-To: <a44252d1-6a96-def2-e84c-2faec643f5c1@schaufler-ca.com>

On Tue, Aug 24, 2021 at 11:20 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 8/24/2021 7:45 AM, Paul Moore wrote:
> > On Fri, Aug 20, 2021 at 7:48 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>> On 8/20/2021 12:06 PM, Paul Moore wrote:
> >>>> Unless you explicitly enable audit on the kernel cmdline, e.g.
> >>>> "audit=1", processes started before userspace enables audit will not
> >>>> have a properly allocated audit_context; see the "if
> >>>> (likely(!audit_ever_enabled))" check at the top of audit_alloc() for
> >>>> the reason why.
> >> I found a hack-around that no one will like. I changed that check to be
> >>
> >> (likely(!audit_ever_enabled) && !lsm_multiple_contexts())
> >>
> >> It probably introduces a memory leak and/or performance degradation,
> >> but it has the desired affect.
> > I can't speak for everyone, but I know I don't like that as a solution
> > ;)  I imagine such a change would also draw the ire of the never-audit
> > crowd and the distros themselves.  However, I understand the need to
> > just get *something* in place so you can continue to test/develop;
> > it's fine to keep that for now, but I'm going to be very disappointed
> > if that line finds its way into the next posted patchset revision.
>
> As I said, it's a hack-around that demonstrates the scope of the
> problem. Had you expressed enthusiastic approval for it I'd have
> been very surprised.

That's okay, you can admit you were trying to catch me not paying attention ;)

> > I'm very much open to ideas but my gut feeling is that the end
> > solution is going to be changes to audit_log_start() and
> > audit_log_end().  In my mind the primary reason for this hunch is that
> > support for multiple LSMs[*] needs to be transparent to the various
> > callers in the kernel; this means the existing audit pattern of ...
> >
> >   audit_log_start(...);
> >   audit_log_format(...);
> >   audit_log_end(...);
> >
> > ... should be preserved and be unchanged from what it is now.  We've
> > already talked in some general terms about what such changes might
> > look like, but to summarize the previous discussions, I think we would
> > need to think about the following things:
>
> I will give this a shot.

Thanks.  I'm sure I'm probably missing some detail, but if you get
stuck let me know and I'll try to lend a hand.

> > [*] I expect that the audit container ID work will have similar issues
> > and need a similar solution, I'm surprised it hasn't come up yet.
>
> Hmm. That effort has been quiet lately. Too quiet.

The current delay is intentional and is related to the io_uring work.

When Richard and I first became aware of the io_uring issue Richard
was in the process of readying his latest revision to the audit
container ID patchset and some of the changes he was incorporating, in
my opinion, hinted at the io_uring issue, or at least drew more
attention to that than I was comfortable seeing posted publicly.
Richard discussed this with his management and security response team,
and they felt differently.  I told Richard that I didn't want to block
him posting an update to the patchset, but that I felt it would be The
Wrong Thing To Do and if he did post the patchset I would likely
ignore it until after the io_uring fix had been posted so as to not
draw additional attention to his changes.  I can't speak for Richard's
mindset, but he appeared anxious to post his changes regardless of my
concerns, and he did so shortly afterwards.

That's why you haven't seen much progress on this for a while, and why
you will see me comment on the latest patchset after the io_uring
patches land in -next after the next merge window closes.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC PATCH v2 0/9] Add LSM access controls and auditing to io_uring
From: Richard Guy Briggs @ 2021-08-24 20:57 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module, selinux, linux-audit, io-uring,
	linux-fsdevel, Kumar Kartikeya Dwivedi, Jens Axboe,
	Pavel Begunkov
In-Reply-To: <162871480969.63873.9434591871437326374.stgit@olly>

On 2021-08-11 16:48, Paul Moore wrote:
> Draft #2 of the patchset which brings auditing and proper LSM access
> controls to the io_uring subsystem.  The original patchset was posted
> in late May and can be found via lore using the link below:
> 
> https://lore.kernel.org/linux-security-module/162163367115.8379.8459012634106035341.stgit@sifl/
> 
> This draft should incorporate all of the feedback from the original
> posting as well as a few smaller things I noticed while playing
> further with the code.  The big change is of course the selective
> auditing in the io_uring op servicing, but that has already been
> discussed quite a bit in the original thread so I won't go into
> detail here; the important part is that we found a way to move
> forward and this draft captures that.  For those of you looking to
> play with these patches, they are based on Linus' v5.14-rc5 tag and
> on my test system they boot and appear to function without problem;
> they pass the selinux-testsuite and audit-testsuite and I have not
> noticed any regressions in the normal use of the system.  If you want
> to get a copy of these patches straight from git you can use the
> "working-io_uring" branch in the repo below:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git
> https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git
> 
> Beyond the existing test suite tests mentioned above, I've cobbled
> together some very basic, very crude tests to exercise some of the
> things I care about from a LSM/audit perspective.  These tests are
> pretty awful (I'm not kidding), but they might be helpful for the
> other LSM/audit developers who want to test things:
> 
> https://drop.paul-moore.com/90.kUgq
> 
> There are currently two tests: 'iouring.2' and 'iouring.3';
> 'iouring.1' was lost in a misguided and overzealous 'rm' command.
> The first test is standalone and basically tests the SQPOLL
> functionality while the second tests sharing io_urings across process
> boundaries and the credential/personality sharing mechanism.  The
> console output of both tests isn't particularly useful, the more
> interesting bits are in the audit and LSM specific logs.  The
> 'iouring.2' command requires no special arguments to run but the
> 'iouring.3' test is split into a "server" and "client"; the server
> should be run without argument:
> 
>   % ./iouring.3s
>   >>> server started, pid = 11678
>   >>> memfd created, fd = 3
>   >>> io_uring created; fd = 5, creds = 1
> 
> ... while the client should be run with two arguments: the first is
> the PID of the server process, the second is the "memfd" fd number:
> 
>   % ./iouring.3c 11678 3
>   >>> client started, server_pid = 11678 server_memfd = 3
>   >>> io_urings = 5 (server) / 5 (client)
>   >>> io_uring ops using creds = 1
>   >>> async op result: 36
>   >>> async op result: 36
>   >>> async op result: 36
>   >>> async op result: 36
>   >>> START file contents
>   What is this life if, full of care,
>   we have no time to stand and stare.
>   >>> END file contents
> 
> The tests were hacked together from various sources online,
> attribution and links to additional info can be found in the test
> sources, but I expect these tests to die a fiery death in the not
> to distant future as I work to add some proper tests to the SELinux
> and audit test suites.
> 
> As I believe these patches should spend a full -rcX cycle in
> linux-next, my current plan is to continue to solicit feedback on
> these patches while they undergo additional testing (next up is
> verification of the audit filter code for io_uring).  Assuming no
> critical issues are found on the mailing lists or during testing, I
> will post a proper patchset later with the idea of merging it into
> selinux/next after the upcoming merge window closes.
> 
> Any comments, feedback, etc. are welcome.

Thanks for the tests.  I have a bunch of userspace patches to add to the
last set I posted and these tests will help exercise them.  I also have
one more kernel patch to post...  I'll dive back into that now.  I had
wanted to post them before now but got distracted with AUDIT_TRIM
breakage.

> ---
> 
> Casey Schaufler (1):
>       Smack: Brutalist io_uring support with debug
> 
> Paul Moore (8):
>       audit: prepare audit_context for use in calling contexts beyond
>              syscalls
>       audit,io_uring,io-wq: add some basic audit support to io_uring
>       audit: dev/test patch to force io_uring auditing
>       audit: add filtering for io_uring records
>       fs: add anon_inode_getfile_secure() similar to
>           anon_inode_getfd_secure()
>       io_uring: convert io_uring to the secure anon inode interface
>       lsm,io_uring: add LSM hooks to io_uring
>       selinux: add support for the io_uring access controls
> 
> 
>  fs/anon_inodes.c                    |  29 ++
>  fs/io-wq.c                          |   4 +
>  fs/io_uring.c                       |  69 +++-
>  include/linux/anon_inodes.h         |   4 +
>  include/linux/audit.h               |  26 ++
>  include/linux/lsm_hook_defs.h       |   5 +
>  include/linux/lsm_hooks.h           |  13 +
>  include/linux/security.h            |  16 +
>  include/uapi/linux/audit.h          |   4 +-
>  kernel/audit.h                      |   7 +-
>  kernel/audit_tree.c                 |   3 +-
>  kernel/audit_watch.c                |   3 +-
>  kernel/auditfilter.c                |  15 +-
>  kernel/auditsc.c                    | 483 +++++++++++++++++++-----
>  security/security.c                 |  12 +
>  security/selinux/hooks.c            |  34 ++
>  security/selinux/include/classmap.h |   2 +
>  security/smack/smack_lsm.c          |  64 ++++
>  18 files changed, 678 insertions(+), 115 deletions(-)
> 

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


^ permalink raw reply

* Re: [RFC PATCH v2 0/9] Add LSM access controls and auditing to io_uring
From: Paul Moore @ 2021-08-24 22:27 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-security-module, selinux, linux-audit, io-uring,
	linux-fsdevel, Kumar Kartikeya Dwivedi, Jens Axboe,
	Pavel Begunkov
In-Reply-To: <20210824205724.GB490529@madcap2.tricolour.ca>

[-- Attachment #1: Type: text/plain, Size: 1345 bytes --]

On Tue, Aug 24, 2021 at 4:57 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> Thanks for the tests.  I have a bunch of userspace patches to add to the
> last set I posted and these tests will help exercise them.  I also have
> one more kernel patch to post...  I'll dive back into that now.  I had
> wanted to post them before now but got distracted with AUDIT_TRIM
> breakage.

If it helps, last week I started working on a little test tool for the
audit-testsuite and selinux-testsuite (see attached).  It may not be
final, but I don't expect too many changes to it before I post the
test suite patches; it is definitely usable now.  It's inspired by the
previous tests, but it uses a much more test suite friendly fork/exec
model for testing the sharing of io_urings across process boundaries.

Would you mind sharing your latest userspace patches, if not publicly
I would be okay with privately off-list; I'm putting together the test
suite patches this week and it would be good to make sure I'm using
your latest take on the userspace changes.

Also, what is the kernel patch?  Did you find a bug or is this some
new functionality you think might be useful?  Both can be important,
but the bug is *really* important; even if you don't have a fix for
that, just a description of the problem would be good.

-- 
paul moore
www.paul-moore.com

[-- Attachment #2: iouring.4.c --]
[-- Type: text/x-csrc, Size: 15303 bytes --]

/*
 * io_uring test tool to exercise LSM/SELinux and audit kernel code paths
 * Author: Paul Moore <paul@paul-moore.com>
 *
 * Copyright 2021 Microsoft Corporation
 *
 * At the time this code was written the best, and most current, source of info
 * on io_uring seemed to be the liburing sources themselves (link below).  The
 * code below is based on the lessons learned from looking at the liburing
 * code.
 *
 * -> https://github.com/axboe/liburing
 *
 * The liburing LICENSE file contains the following:
 *
 * Copyright 2020 Jens Axboe
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
 * deal in the Software without restriction, including without limitation the
 * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
 * sell copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 *  The above copyright notice and this permission notice shall be included in
 *  all copies or substantial portions of the Software.
 *
 *  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 *  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 *  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 *  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 *  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 *  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 *  DEALINGS IN THE SOFTWARE.
 *
 */

/*
 * BUILDING:
 *
 * gcc -o <binary> -g -O0 -luring -lrt <source>
 *
 * RUNNING:
 *
 * The program can be run using the following command lines:
 *
 *  % prog sqpoll
 * ... this invocation runs the io_uring SQPOLL test.
 *
 *  % prog t1
 * ... this invocation runs the parent/child io_uring sharing test.
 *
 *  % prog t1 <domain>
 * ... this invocation runs the parent/child io_uring sharing test with the
 * child process run in the specified SELinux domain.
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>

#include <liburing.h>

struct urt_config {
	struct io_uring ring;
	struct io_uring_params ring_params;
	int ring_creds;
};

#define URING_ENTRIES				8
#define URING_SHM_NAME				"/iouring_test_4"

int selinux_state = -1;
#define SELINUX_CTX_MAX				512
char selinux_ctx[SELINUX_CTX_MAX] = "\0";

/**
 * Display an error message and exit
 * @param msg the error message
 *
 * Output @msg to stderr and exit with errno as the exit value.
 */
void fatal(const char *msg)
{
	const char *str = (msg ? msg : "unknown");

	if (!errno) {
		errno = 1;
		fprintf(stderr, "%s: unknown error\n", msg);
	} else
		perror(str);

	if (errno < 0)
		exit(-errno);
	exit(errno);
}

/**
 * Determine if SELinux is enabled and set the internal state
 *
 * Attempt to read from /proc/self/attr/current and determine if SELinux is
 * enabled, store the current context/domain in @selinux_ctx if SELinux is
 * enabled.  We avoid using the libselinux API in order to increase portability
 * and make it easier for other LSMs to adopt this test.
 */
int selinux_enabled(void)
{
	int fd = -1;
	ssize_t ctx_len;
	char ctx[SELINUX_CTX_MAX];

	if (selinux_state >= 0)
		return selinux_state;

	/* attempt to get the current context */
	fd = open("/proc/self/attr/current", O_RDONLY);
	if (fd < 0)
		goto err;
	ctx_len = read(fd, ctx, SELINUX_CTX_MAX - 1);
	if (ctx_len <= 0)
		goto err;
	close(fd);

	/* save the current context */
	ctx[ctx_len] = '\0';
	strcpy(selinux_ctx, ctx);

	selinux_state = 1;
	return selinux_state;

err:
	if (fd >= 0)
		close(fd);

	selinux_state = 0;
	return selinux_state;
}

/**
 * Return the current SELinux domain or "DISABLED" if SELinux is not enabled
 *
 * The returned string should not be free()'d.
 */
const char *selinux_current(void)
{
	int rc;

	rc = selinux_enabled();
	if (!rc)
		return "DISABLED";

	return selinux_ctx;
}

/**
 * Set the SELinux domain for the next exec()'d process
 * @param ctx the SELinux domain
 *
 * This is similar to the setexeccon() libselinux API but we do it manually to
 * help increase portability and make it easier for other LSMs to adopt this
 * test.
 */
int selinux_exec(const char *ctx)
{
	int fd = -1;
	ssize_t len;

	if (!ctx)
		return -EINVAL;

	fd = open("/proc/self/attr/exec", O_WRONLY);
	if (fd < 0)
		return -errno;
	len = write(fd, ctx, strlen(ctx) + 1);
	close(fd);

	return len;
}

/**
 * Setup the io_uring
 * @param ring the io_uring pointer
 * @param params the io_uring parameters
 * @param creds pointer to the current process' registered io_uring personality
 *
 * Create a new io_uring using @params and return it in @ring with the
 * registered personality returned in @creds.  Returns 0 on success, negative
 * values on failure.
 */
int uring_setup(struct io_uring *ring,
		struct io_uring_params *params, int *creds)
{
	int rc;

	/* call into liburing to do the setup heavy lifting */
	rc = io_uring_queue_init_params(URING_ENTRIES, ring, params);
	if (rc < 0)
		fatal("io_uring_queue_init_params");

	/* register our creds/personality */
	rc = io_uring_register_personality(ring);
	if (rc < 0)
		fatal("io_uring_register_personality()");
	*creds = rc;
	rc = 0;

	printf(">>> io_uring created; fd = %d, personality = %d\n",
	       ring->ring_fd, *creds);

	return rc;
}

/**
 * Import an existing io_uring based on the given file descriptor
 * @param fd the io_uring's file descriptor
 * @param ring the io_uring pointer
 * @param params the io_uring parameters
 *
 * This function takes an io_uring file descriptor in @fd as well as the
 * io_uring parameters in @params and creates a valid io_uring in @ring.
 * Returns 0 on success, negative values on failure.
 */
int uring_import(int fd, struct io_uring *ring, struct io_uring_params *params)
{
	int rc;

	memset(ring, 0, sizeof(*ring));
	ring->flags = params->flags;
	ring->features = params->features;
	ring->ring_fd = fd;

	ring->sq.ring_sz = params->sq_off.array +
			   params->sq_entries * sizeof(unsigned);
	ring->cq.ring_sz = params->cq_off.cqes +
			   params->cq_entries * sizeof(struct io_uring_cqe);

	ring->sq.ring_ptr = mmap(NULL, ring->sq.ring_sz, PROT_READ | PROT_WRITE,
				 MAP_SHARED | MAP_POPULATE, fd,
				 IORING_OFF_SQ_RING);
	if (ring->sq.ring_ptr == MAP_FAILED)
		fatal("import mmap(ring)");

	ring->cq.ring_ptr = mmap(0, ring->cq.ring_sz, PROT_READ | PROT_WRITE,
				 MAP_SHARED | MAP_POPULATE,
				 fd, IORING_OFF_CQ_RING);
	if (ring->cq.ring_ptr == MAP_FAILED) {
		ring->cq.ring_ptr = NULL;
		goto err;
	}

	ring->sq.khead = ring->sq.ring_ptr + params->sq_off.head;
	ring->sq.ktail = ring->sq.ring_ptr + params->sq_off.tail;
	ring->sq.kring_mask = ring->sq.ring_ptr + params->sq_off.ring_mask;
	ring->sq.kring_entries = ring->sq.ring_ptr +
				 params->sq_off.ring_entries;
	ring->sq.kflags = ring->sq.ring_ptr + params->sq_off.flags;
	ring->sq.kdropped = ring->sq.ring_ptr + params->sq_off.dropped;
	ring->sq.array = ring->sq.ring_ptr + params->sq_off.array;

	ring->sq.sqes = mmap(NULL,
			     params->sq_entries * sizeof(struct io_uring_sqe),
			     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
			     fd, IORING_OFF_SQES);
	if (ring->sq.sqes == MAP_FAILED)
		goto err;

	ring->cq.khead = ring->cq.ring_ptr + params->cq_off.head;
	ring->cq.ktail = ring->cq.ring_ptr + params->cq_off.tail;
	ring->cq.kring_mask = ring->cq.ring_ptr + params->cq_off.ring_mask;
	ring->cq.kring_entries = ring->cq.ring_ptr +
				 params->cq_off.ring_entries;
	ring->cq.koverflow = ring->cq.ring_ptr + params->cq_off.overflow;
	ring->cq.cqes = ring->cq.ring_ptr + params->cq_off.cqes;
	if (params->cq_off.flags)
		ring->cq.kflags = ring->cq.ring_ptr + params->cq_off.flags;

	return 0;

err:
	if (ring->sq.ring_ptr)
		munmap(ring->sq.ring_ptr, ring->sq.ring_sz);
	if (ring->cq.ring_ptr);
		munmap(ring->cq.ring_ptr, ring->cq.ring_sz);
	fatal("import mmap");
}

void uring_shutdown(struct io_uring *ring)
{
	if (!ring)
		return;
	io_uring_queue_exit(ring);
}

/**
 * An io_uring test
 * @param ring the io_uring pointer
 * @param personality the registered personality to use or 0
 * @param path the file path to use for the test
 *
 * This function executes an io_uring test, see the function body for more
 * details.  Returns 0 on success, negative values on failure.
 */
int uring_op_a(struct io_uring *ring, int personality, const char *path)
{

#define __OP_A_BSIZE		512
#define __OP_A_STR		"Lorem ipsum dolor sit amet.\n"

	int rc;
	int fds[1];
	char buf1[__OP_A_BSIZE];
	char buf2[__OP_A_BSIZE];
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int str_sz = strlen(__OP_A_STR);

	memset(buf1, 0, __OP_A_BSIZE);
	memset(buf2, 0, __OP_A_BSIZE);
	strncpy(buf1, __OP_A_STR, str_sz);

	if (personality > 0)
		printf(">>> io_uring ops using personality = %d\n",
		       personality);

	/*
	 * open
	 */

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		fatal("io_uring_get_sqe(open)");
	io_uring_prep_openat(sqe, AT_FDCWD, path,
			     O_RDWR | O_TRUNC | O_CREAT, 0644);
	if (personality > 0)
		sqe->personality = personality;

	rc = io_uring_submit(ring);
	if (rc < 0)
		fatal("io_uring_submit(open)");

	rc = io_uring_wait_cqe(ring, &cqe);
	fds[0] = cqe->res;
	if (rc < 0)
		fatal("io_uring_wait_cqe(open)");
	if (fds[0] < 0)
		fatal("uring_open");
	io_uring_cqe_seen(ring, cqe);

	rc = io_uring_register_files(ring, fds, 1);
	if(rc)
		fatal("io_uring_register_files");

	printf(">>> io_uring open(): OK\n");

	/*
	 * write
	 */

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		fatal("io_uring_get_sqe(write1)");
	io_uring_prep_write(sqe, 0, buf1, str_sz, 0);
	io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
	if (personality > 0)
		sqe->personality = personality;

	rc = io_uring_submit(ring);
	if (rc < 0)
		fatal("io_uring_submit(write)");

	rc = io_uring_wait_cqe(ring, &cqe);
	if (rc < 0)
		fatal("io_uring_wait_cqe(write)");
	if (cqe->res < 0)
		fatal("uring_write");
	if (cqe->res != str_sz)
		fatal("uring_write(length)");
	io_uring_cqe_seen(ring, cqe);

	printf(">>> io_uring write(): OK\n");

	/*
	 * read
	 */

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		fatal("io_uring_get_sqe(read1)");
	io_uring_prep_read(sqe, 0, buf2,__OP_A_BSIZE, 0);
	io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
	if (personality > 0)
		sqe->personality = personality;

	rc = io_uring_submit(ring);
	if (rc < 0)
		fatal("io_uring_submit(read)");

	rc = io_uring_wait_cqe(ring, &cqe);
	if (rc < 0)
		fatal("io_uring_wait_cqe(read)");
	if (cqe->res < 0)
		fatal("uring_read");
	if (cqe->res != str_sz)
		fatal("uring_read(length)");
	io_uring_cqe_seen(ring, cqe);

	if (strncmp(buf1, buf2, str_sz))
		fatal("strncmp(buf1,buf2)");

	printf(">>> io_uring read(): OK\n");

	/*
	 * close
	 */

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		fatal("io_uring_get_sqe(close)");
	io_uring_prep_close(sqe, 0);
	if (personality > 0)
		sqe->personality = personality;

	rc = io_uring_submit(ring);
	if (rc < 0)
		fatal("io_uring_submit(close)");

	rc = io_uring_wait_cqe(ring, &cqe);
	if (rc < 0)
		fatal("io_uring_wait_cqe(close)");
	if (cqe->res < 0)
		fatal("uring_close");
	io_uring_cqe_seen(ring, cqe);

	rc = io_uring_unregister_files(ring);
	if (rc < 0)
		fatal("io_uring_unregister_files");

	printf(">>> io_uring close(): OK\n");

	return 0;
}

/**
 * The main entrypoint to the test program
 * @param argc number of command line options
 * @param argv the command line options array
 */
int main(int argc, char *argv[])
{
	int rc = 1;
	int ring_shm_fd;
	struct io_uring ring_storage, *ring;
	struct urt_config *cfg_p;

	enum { TST_UNKNOWN,
	       TST_SQPOLL,
	       TST_T1_PARENT, TST_T1_CHILD } tst_method;

	/* parse the command line and do some sanity checks */
	tst_method = TST_UNKNOWN;
	if (argc >= 2) {
		if (!strcmp(argv[1], "sqpoll"))
			tst_method = TST_SQPOLL;
		else if (!strcmp(argv[1], "t1") ||
			 !strcmp(argv[1], "t1_parent"))
			tst_method = TST_T1_PARENT;
		else if (!strcmp(argv[1], "t1_child"))
			tst_method = TST_T1_CHILD;
	}
	if (tst_method == TST_UNKNOWN) {
		fprintf(stderr, "usage: %s <method> ... \n", argv[0]);
		exit(EINVAL);
	}

	/* simple header */
	printf(">>> running as PID = %d\n", getpid());
	printf(">>> LSM/SELinux = %s\n", selinux_current());

	/*
	 * test setup (if necessary)
	 */
	if (tst_method == TST_SQPOLL || tst_method == TST_T1_PARENT) {
		 /* create an io_uring and prepare it for optional sharing */
		int flags;

		/* create a shm segment to hold the io_uring info */
		ring_shm_fd = shm_open(URING_SHM_NAME, O_CREAT | O_RDWR,
				       S_IRUSR | S_IWUSR);
		if (ring_shm_fd < 0)
			fatal("shm_open(create)");

		rc = ftruncate(ring_shm_fd, sizeof(struct urt_config));
		if (rc < 0)
			fatal("ftruncate(shm)");

		cfg_p = mmap(NULL, sizeof(*cfg_p), PROT_READ | PROT_WRITE,
			     MAP_SHARED, ring_shm_fd, 0);
		if (!cfg_p)
			fatal("mmap(shm)");

		/* create the io_uring */
		memset(&cfg_p->ring, 0, sizeof(cfg_p->ring));
		memset(&cfg_p->ring_params, 0, sizeof(cfg_p->ring_params));
		if (tst_method == TST_SQPOLL)
			cfg_p->ring_params.flags |= IORING_SETUP_SQPOLL;
		rc = uring_setup(&cfg_p->ring, &cfg_p->ring_params,
				 &cfg_p->ring_creds);
		if (rc)
			fatal("uring_setup");
		ring = &cfg_p->ring;

		/* explicitly clear FD_CLOEXEC on the io_uring */
		flags = fcntl(cfg_p->ring.ring_fd, F_GETFD, 0);
		if (flags < 0)
			fatal("fcntl(ring_shm_fd,getfd)");
		flags &= ~FD_CLOEXEC;
		rc = fcntl(cfg_p->ring.ring_fd, F_SETFD, flags);
		if (rc)
			fatal("fcntl(ring_shm_fd,setfd)");
	} else if (tst_method = TST_T1_CHILD) {
		/* import a previously created and shared io_uring */

		/* open the existing shm segment with the io_uring info */
		ring_shm_fd = shm_open(URING_SHM_NAME, O_RDWR, 0);
		if (ring_shm_fd < 0)
			fatal("shm_open(existing)");
		cfg_p = mmap(NULL, sizeof(*cfg_p), PROT_READ | PROT_WRITE,
			     MAP_SHARED, ring_shm_fd, 0);
		if (!cfg_p)
			fatal("mmap(shm)");

		/* import the io_uring */
		ring = &ring_storage;
		rc = uring_import(cfg_p->ring.ring_fd,
				  ring, &cfg_p->ring_params);
		if (rc < 0)
			fatal("uring_import");
	}

	/*
	 * fork/exec a child process (if necessary)
	 */
	if (tst_method == TST_T1_PARENT) {
		pid_t pid;

		/* set the ctx for the next exec */
		if (argc >= 3) {
			printf(">>> set LSM/SELinux exec: %s\n",
			       (selinux_exec(argv[2]) > 0 ? "OK" : "FAILED"));
		}

		/* fork/exec */
		pid = fork();
		if (!pid) {
			/* start the child */
			rc = execl(argv[0], argv[0], "t1_child", (char *)NULL);
			if (rc < 0)
				fatal("exec");
		} else {
			/* wait for the child to exit */
			int status;
			waitpid(pid, &status, 0);
			if (WIFEXITED(status))
				rc = WEXITSTATUS(status);
		}
	}

	/*
	 * run test(s)
	 */
	if (tst_method == TST_SQPOLL || tst_method == TST_T1_CHILD) {
		rc = uring_op_a(ring, cfg_p->ring_creds, "/tmp/iouring.4.txt");
		if (rc < 0)
			fatal("uring_op_a(\"/tmp/iouring.4.txt\")");
	}

	/*
	 * cleanup
	 */
	if (tst_method == TST_SQPOLL || tst_method == TST_T1_PARENT) {
		printf(">>> shutdown\n");
		uring_shutdown(&cfg_p->ring);
		shm_unlink(URING_SHM_NAME);
	} else if (tst_method == TST_T1_CHILD) {
		shm_unlink(URING_SHM_NAME);
	}

	return rc;
}

^ permalink raw reply

* Re: [RFC PATCH 2/9] audit, io_uring, io-wq: add some basic audit support to io_uring
From: Richard Guy Briggs @ 2021-08-25  1:21 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module, selinux, linux-audit, io-uring,
	linux-fsdevel, Kumar Kartikeya Dwivedi, Jens Axboe,
	Alexander Viro
In-Reply-To: <CAHC9VhS0sy_Y8yx4uiZeJhAf_a94ipt1EbE16BOVv6tXtWkgMg@mail.gmail.com>

On 2021-06-02 13:46, Paul Moore wrote:
> On Wed, Jun 2, 2021 at 1:29 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2021-05-21 17:49, Paul Moore wrote:
> > > WARNING - This is a work in progress and should not be merged
> > > anywhere important.  It is almost surely not complete, and while it
> > > probably compiles it likely hasn't been booted and will do terrible
> > > things.  You have been warned.
> > >
> > > This patch adds basic auditing to io_uring operations, regardless of
> > > their context.  This is accomplished by allocating audit_context
> > > structures for the io-wq worker and io_uring SQPOLL kernel threads
> > > as well as explicitly auditing the io_uring operations in
> > > io_issue_sqe().  The io_uring operations are audited using a new
> > > AUDIT_URINGOP record, an example is shown below:
> > >
> > >   % <TODO - insert AUDIT_URINGOP record example>
> > >
> > > Thanks to Richard Guy Briggs for review and feedback.
> > >
> > > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > > ---
> > >  fs/io-wq.c                 |    4 +
> > >  fs/io_uring.c              |   11 +++
> > >  include/linux/audit.h      |   17 ++++
> > >  include/uapi/linux/audit.h |    1
> > >  kernel/audit.h             |    2 +
> > >  kernel/auditsc.c           |  173 ++++++++++++++++++++++++++++++++++++++++++++
> > >  6 files changed, 208 insertions(+)
> 
> ...
> 
> > > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > > index e481ac8a757a..e9941d1ad8fd 100644
> > > --- a/fs/io_uring.c
> > > +++ b/fs/io_uring.c
> > > @@ -78,6 +78,7 @@
> > >  #include <linux/task_work.h>
> > >  #include <linux/pagemap.h>
> > >  #include <linux/io_uring.h>
> > > +#include <linux/audit.h>
> > >
> > >  #define CREATE_TRACE_POINTS
> > >  #include <trace/events/io_uring.h>
> > > @@ -6105,6 +6106,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> > >       if (req->work.creds && req->work.creds != current_cred())
> > >               creds = override_creds(req->work.creds);
> > >
> > > +     if (req->opcode < IORING_OP_LAST)
> > > +             audit_uring_entry(req->opcode);
> 
> Note well the override_creds() call right above the audit code that is
> being added, it will be important later in this email.
> 
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index cc89e9f9a753..729849d41631 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -1536,6 +1562,52 @@ static void audit_log_proctitle(void)
> > >       audit_log_end(ab);
> > >  }
> > >
> > > +/**
> > > + * audit_log_uring - generate a AUDIT_URINGOP record
> > > + * @ctx: the audit context
> > > + */
> > > +static void audit_log_uring(struct audit_context *ctx)
> > > +{
> > > +     struct audit_buffer *ab;
> > > +     const struct cred *cred;
> > > +
> > > +     /*
> > > +      * TODO: What do we log here?  I'm tossing in a few things to start the
> > > +      *       conversation, but additional thought needs to go into this.
> > > +      */
> > > +
> > > +     ab = audit_log_start(ctx, GFP_KERNEL, AUDIT_URINGOP);
> > > +     if (!ab)
> > > +             return;
> > > +     cred = current_cred();
> >
> > This may need to be req->work.creds.  I haven't been following if the
> > io_uring thread inherited the user task's creds (and below, comm and
> > exe).
> 
> Nope, we're good.  See the existing code in io_issue_sqe() :)
> 
> > > +     audit_log_format(ab, "uring_op=%d", ctx->uring_op);
> >
> > arch is stored below in __audit_uring_entry() and never used in the
> > AUDIT_CTX_URING case.  That assignment can either be dropped or printed
> > before uring_op similar to the SYSCALL record.
> 
> Good point, I'll drop the code that records the arch from _entry(); it
> is really only useful to give the appropriate context if needed for
> other things in the audit stream, and that isn't the case like it is
> with syscalls.
> 
> > There aren't really any arg[0-3] to print.
> 
> Which is why I didn't print them.
> 
> > io_uring_register and io_uring_setup() args are better covered by other
> > records.  io_uring_enter() has 6 args and the last two aren't covered by
> > SYSCALL anyways.
> 
>  ???
> 
> I think you are confusing the io_uring ops with syscalls; they are
> very different things from an audit perspective and the io_uring
> auditing is not intended to replace syscall records.  The
> io_uring_setup() and io_uring_enter() syscalls will be auditing just
> as any other syscalls would be using the existing syscall audit code.
> 
> > > +     if (ctx->return_valid != AUDITSC_INVALID)
> > > +             audit_log_format(ab, " success=%s exit=%ld",
> > > +                              (ctx->return_valid == AUDITSC_SUCCESS ?
> > > +                               "yes" : "no"),
> > > +                              ctx->return_code);
> > > +     audit_log_format(ab,
> > > +                      " items=%d"
> > > +                      " ppid=%d pid=%d auid=%u uid=%u gid=%u"
> > > +                      " euid=%u suid=%u fsuid=%u"
> > > +                      " egid=%u sgid=%u fsgid=%u",
> > > +                      ctx->name_count,
> > > +                      task_ppid_nr(current),
> > > +                      task_tgid_nr(current),
> > > +                      from_kuid(&init_user_ns, audit_get_loginuid(current)),
> > > +                      from_kuid(&init_user_ns, cred->uid),
> > > +                      from_kgid(&init_user_ns, cred->gid),
> > > +                      from_kuid(&init_user_ns, cred->euid),
> > > +                      from_kuid(&init_user_ns, cred->suid),
> > > +                      from_kuid(&init_user_ns, cred->fsuid),
> > > +                      from_kgid(&init_user_ns, cred->egid),
> > > +                      from_kgid(&init_user_ns, cred->sgid),
> > > +                      from_kgid(&init_user_ns, cred->fsgid));
> >
> > The audit session ID is still important, relevant and qualifies auid.
> > In keeping with the SYSCALL record format, I think we want to keep
> > ses=audit_get_sessionid(current) in here.
> 
> This might be another case of syscall/io_uring confusion.  An io_uring
> op doesn't necessarily have an audit session ID or an audit UID in the
> conventional sense; for example think about SQPOLL works, shared
> rings, etc.

Right, but those syscalls are what instigate io_uring operations, so
whatever process starts that operation, or gets handed that handle
should be tracked with auid and sessionid (the two work together to
track) unless we can easily track io_uring ops to connect them to a
previous setup syscall.  If we see a need to keep the auid, then the
sessionid goes with it.

> > I'm pretty sure we also want to keep comm= and exe= too, but may have to
> > reach into req->task to get it.  There are two values for comm possible,
> > one from the original task and second "iou-sqp-<pid>" set at the top of
> > io_sq_thread().
> 
> I think this is more syscall/io_uring confusion.

I wouldn't call them confusion but rather parallels, attributing a
particular subject to an action.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


^ permalink raw reply

* Re: [RFC PATCH v2 0/9] Add LSM access controls and auditing to io_uring
From: Richard Guy Briggs @ 2021-08-25  1:36 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module, selinux, linux-audit, io-uring,
	linux-fsdevel, Kumar Kartikeya Dwivedi, Jens Axboe,
	Pavel Begunkov
In-Reply-To: <CAHC9VhRoHYG8247SvNgxHe8YCduoMi_oEvZqhyYH=faZUAC=CQ@mail.gmail.com>

On 2021-08-24 18:27, Paul Moore wrote:
> On Tue, Aug 24, 2021 at 4:57 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > Thanks for the tests.  I have a bunch of userspace patches to add to the
> > last set I posted and these tests will help exercise them.  I also have
> > one more kernel patch to post...  I'll dive back into that now.  I had
> > wanted to post them before now but got distracted with AUDIT_TRIM
> > breakage.
> 
> If it helps, last week I started working on a little test tool for the
> audit-testsuite and selinux-testsuite (see attached).  It may not be
> final, but I don't expect too many changes to it before I post the
> test suite patches; it is definitely usable now.  It's inspired by the
> previous tests, but it uses a much more test suite friendly fork/exec
> model for testing the sharing of io_urings across process boundaries.
> 
> Would you mind sharing your latest userspace patches, if not publicly
> I would be okay with privately off-list; I'm putting together the test
> suite patches this week and it would be good to make sure I'm using
> your latest take on the userspace changes.

I intend to publish them but they need squashing and some documentation
first.  And a run through with io_uring specific tests would be good to
catch anything obvious...

> Also, what is the kernel patch?  Did you find a bug or is this some
> new functionality you think might be useful?  Both can be important,
> but the bug is *really* important; even if you don't have a fix for
> that, just a description of the problem would be good.

It was a very small patch that I realize I had already talked about and
you justified not including sessionid along with auid.  That was
addressed in a reply tacked on to your v1 patchset just now.

> paul moore

> /*
>  * io_uring test tool to exercise LSM/SELinux and audit kernel code paths
>  * Author: Paul Moore <paul@paul-moore.com>
>  *
>  * Copyright 2021 Microsoft Corporation
>  *
>  * At the time this code was written the best, and most current, source of info
>  * on io_uring seemed to be the liburing sources themselves (link below).  The
>  * code below is based on the lessons learned from looking at the liburing
>  * code.
>  *
>  * -> https://github.com/axboe/liburing
>  *
>  * The liburing LICENSE file contains the following:
>  *
>  * Copyright 2020 Jens Axboe
>  *
>  * Permission is hereby granted, free of charge, to any person obtaining a copy
>  * of this software and associated documentation files (the "Software"), to
>  * deal in the Software without restriction, including without limitation the
>  * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
>  * sell copies of the Software, and to permit persons to whom the Software is
>  * furnished to do so, subject to the following conditions:
>  *
>  *  The above copyright notice and this permission notice shall be included in
>  *  all copies or substantial portions of the Software.
>  *
>  *  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>  *  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>  *  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>  *  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>  *  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>  *  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>  *  DEALINGS IN THE SOFTWARE.
>  *
>  */
> 
> /*
>  * BUILDING:
>  *
>  * gcc -o <binary> -g -O0 -luring -lrt <source>
>  *
>  * RUNNING:
>  *
>  * The program can be run using the following command lines:
>  *
>  *  % prog sqpoll
>  * ... this invocation runs the io_uring SQPOLL test.
>  *
>  *  % prog t1
>  * ... this invocation runs the parent/child io_uring sharing test.
>  *
>  *  % prog t1 <domain>
>  * ... this invocation runs the parent/child io_uring sharing test with the
>  * child process run in the specified SELinux domain.
>  *
>  */
> 
> #include <stdlib.h>
> #include <stdio.h>
> #include <errno.h>
> #include <string.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/wait.h>
> 
> #include <liburing.h>
> 
> struct urt_config {
> 	struct io_uring ring;
> 	struct io_uring_params ring_params;
> 	int ring_creds;
> };
> 
> #define URING_ENTRIES				8
> #define URING_SHM_NAME				"/iouring_test_4"
> 
> int selinux_state = -1;
> #define SELINUX_CTX_MAX				512
> char selinux_ctx[SELINUX_CTX_MAX] = "\0";
> 
> /**
>  * Display an error message and exit
>  * @param msg the error message
>  *
>  * Output @msg to stderr and exit with errno as the exit value.
>  */
> void fatal(const char *msg)
> {
> 	const char *str = (msg ? msg : "unknown");
> 
> 	if (!errno) {
> 		errno = 1;
> 		fprintf(stderr, "%s: unknown error\n", msg);
> 	} else
> 		perror(str);
> 
> 	if (errno < 0)
> 		exit(-errno);
> 	exit(errno);
> }
> 
> /**
>  * Determine if SELinux is enabled and set the internal state
>  *
>  * Attempt to read from /proc/self/attr/current and determine if SELinux is
>  * enabled, store the current context/domain in @selinux_ctx if SELinux is
>  * enabled.  We avoid using the libselinux API in order to increase portability
>  * and make it easier for other LSMs to adopt this test.
>  */
> int selinux_enabled(void)
> {
> 	int fd = -1;
> 	ssize_t ctx_len;
> 	char ctx[SELINUX_CTX_MAX];
> 
> 	if (selinux_state >= 0)
> 		return selinux_state;
> 
> 	/* attempt to get the current context */
> 	fd = open("/proc/self/attr/current", O_RDONLY);
> 	if (fd < 0)
> 		goto err;
> 	ctx_len = read(fd, ctx, SELINUX_CTX_MAX - 1);
> 	if (ctx_len <= 0)
> 		goto err;
> 	close(fd);
> 
> 	/* save the current context */
> 	ctx[ctx_len] = '\0';
> 	strcpy(selinux_ctx, ctx);
> 
> 	selinux_state = 1;
> 	return selinux_state;
> 
> err:
> 	if (fd >= 0)
> 		close(fd);
> 
> 	selinux_state = 0;
> 	return selinux_state;
> }
> 
> /**
>  * Return the current SELinux domain or "DISABLED" if SELinux is not enabled
>  *
>  * The returned string should not be free()'d.
>  */
> const char *selinux_current(void)
> {
> 	int rc;
> 
> 	rc = selinux_enabled();
> 	if (!rc)
> 		return "DISABLED";
> 
> 	return selinux_ctx;
> }
> 
> /**
>  * Set the SELinux domain for the next exec()'d process
>  * @param ctx the SELinux domain
>  *
>  * This is similar to the setexeccon() libselinux API but we do it manually to
>  * help increase portability and make it easier for other LSMs to adopt this
>  * test.
>  */
> int selinux_exec(const char *ctx)
> {
> 	int fd = -1;
> 	ssize_t len;
> 
> 	if (!ctx)
> 		return -EINVAL;
> 
> 	fd = open("/proc/self/attr/exec", O_WRONLY);
> 	if (fd < 0)
> 		return -errno;
> 	len = write(fd, ctx, strlen(ctx) + 1);
> 	close(fd);
> 
> 	return len;
> }
> 
> /**
>  * Setup the io_uring
>  * @param ring the io_uring pointer
>  * @param params the io_uring parameters
>  * @param creds pointer to the current process' registered io_uring personality
>  *
>  * Create a new io_uring using @params and return it in @ring with the
>  * registered personality returned in @creds.  Returns 0 on success, negative
>  * values on failure.
>  */
> int uring_setup(struct io_uring *ring,
> 		struct io_uring_params *params, int *creds)
> {
> 	int rc;
> 
> 	/* call into liburing to do the setup heavy lifting */
> 	rc = io_uring_queue_init_params(URING_ENTRIES, ring, params);
> 	if (rc < 0)
> 		fatal("io_uring_queue_init_params");
> 
> 	/* register our creds/personality */
> 	rc = io_uring_register_personality(ring);
> 	if (rc < 0)
> 		fatal("io_uring_register_personality()");
> 	*creds = rc;
> 	rc = 0;
> 
> 	printf(">>> io_uring created; fd = %d, personality = %d\n",
> 	       ring->ring_fd, *creds);
> 
> 	return rc;
> }
> 
> /**
>  * Import an existing io_uring based on the given file descriptor
>  * @param fd the io_uring's file descriptor
>  * @param ring the io_uring pointer
>  * @param params the io_uring parameters
>  *
>  * This function takes an io_uring file descriptor in @fd as well as the
>  * io_uring parameters in @params and creates a valid io_uring in @ring.
>  * Returns 0 on success, negative values on failure.
>  */
> int uring_import(int fd, struct io_uring *ring, struct io_uring_params *params)
> {
> 	int rc;
> 
> 	memset(ring, 0, sizeof(*ring));
> 	ring->flags = params->flags;
> 	ring->features = params->features;
> 	ring->ring_fd = fd;
> 
> 	ring->sq.ring_sz = params->sq_off.array +
> 			   params->sq_entries * sizeof(unsigned);
> 	ring->cq.ring_sz = params->cq_off.cqes +
> 			   params->cq_entries * sizeof(struct io_uring_cqe);
> 
> 	ring->sq.ring_ptr = mmap(NULL, ring->sq.ring_sz, PROT_READ | PROT_WRITE,
> 				 MAP_SHARED | MAP_POPULATE, fd,
> 				 IORING_OFF_SQ_RING);
> 	if (ring->sq.ring_ptr == MAP_FAILED)
> 		fatal("import mmap(ring)");
> 
> 	ring->cq.ring_ptr = mmap(0, ring->cq.ring_sz, PROT_READ | PROT_WRITE,
> 				 MAP_SHARED | MAP_POPULATE,
> 				 fd, IORING_OFF_CQ_RING);
> 	if (ring->cq.ring_ptr == MAP_FAILED) {
> 		ring->cq.ring_ptr = NULL;
> 		goto err;
> 	}
> 
> 	ring->sq.khead = ring->sq.ring_ptr + params->sq_off.head;
> 	ring->sq.ktail = ring->sq.ring_ptr + params->sq_off.tail;
> 	ring->sq.kring_mask = ring->sq.ring_ptr + params->sq_off.ring_mask;
> 	ring->sq.kring_entries = ring->sq.ring_ptr +
> 				 params->sq_off.ring_entries;
> 	ring->sq.kflags = ring->sq.ring_ptr + params->sq_off.flags;
> 	ring->sq.kdropped = ring->sq.ring_ptr + params->sq_off.dropped;
> 	ring->sq.array = ring->sq.ring_ptr + params->sq_off.array;
> 
> 	ring->sq.sqes = mmap(NULL,
> 			     params->sq_entries * sizeof(struct io_uring_sqe),
> 			     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
> 			     fd, IORING_OFF_SQES);
> 	if (ring->sq.sqes == MAP_FAILED)
> 		goto err;
> 
> 	ring->cq.khead = ring->cq.ring_ptr + params->cq_off.head;
> 	ring->cq.ktail = ring->cq.ring_ptr + params->cq_off.tail;
> 	ring->cq.kring_mask = ring->cq.ring_ptr + params->cq_off.ring_mask;
> 	ring->cq.kring_entries = ring->cq.ring_ptr +
> 				 params->cq_off.ring_entries;
> 	ring->cq.koverflow = ring->cq.ring_ptr + params->cq_off.overflow;
> 	ring->cq.cqes = ring->cq.ring_ptr + params->cq_off.cqes;
> 	if (params->cq_off.flags)
> 		ring->cq.kflags = ring->cq.ring_ptr + params->cq_off.flags;
> 
> 	return 0;
> 
> err:
> 	if (ring->sq.ring_ptr)
> 		munmap(ring->sq.ring_ptr, ring->sq.ring_sz);
> 	if (ring->cq.ring_ptr);
> 		munmap(ring->cq.ring_ptr, ring->cq.ring_sz);
> 	fatal("import mmap");
> }
> 
> void uring_shutdown(struct io_uring *ring)
> {
> 	if (!ring)
> 		return;
> 	io_uring_queue_exit(ring);
> }
> 
> /**
>  * An io_uring test
>  * @param ring the io_uring pointer
>  * @param personality the registered personality to use or 0
>  * @param path the file path to use for the test
>  *
>  * This function executes an io_uring test, see the function body for more
>  * details.  Returns 0 on success, negative values on failure.
>  */
> int uring_op_a(struct io_uring *ring, int personality, const char *path)
> {
> 
> #define __OP_A_BSIZE		512
> #define __OP_A_STR		"Lorem ipsum dolor sit amet.\n"
> 
> 	int rc;
> 	int fds[1];
> 	char buf1[__OP_A_BSIZE];
> 	char buf2[__OP_A_BSIZE];
> 	struct io_uring_sqe *sqe;
> 	struct io_uring_cqe *cqe;
> 	int str_sz = strlen(__OP_A_STR);
> 
> 	memset(buf1, 0, __OP_A_BSIZE);
> 	memset(buf2, 0, __OP_A_BSIZE);
> 	strncpy(buf1, __OP_A_STR, str_sz);
> 
> 	if (personality > 0)
> 		printf(">>> io_uring ops using personality = %d\n",
> 		       personality);
> 
> 	/*
> 	 * open
> 	 */
> 
> 	sqe = io_uring_get_sqe(ring);
> 	if (!sqe)
> 		fatal("io_uring_get_sqe(open)");
> 	io_uring_prep_openat(sqe, AT_FDCWD, path,
> 			     O_RDWR | O_TRUNC | O_CREAT, 0644);
> 	if (personality > 0)
> 		sqe->personality = personality;
> 
> 	rc = io_uring_submit(ring);
> 	if (rc < 0)
> 		fatal("io_uring_submit(open)");
> 
> 	rc = io_uring_wait_cqe(ring, &cqe);
> 	fds[0] = cqe->res;
> 	if (rc < 0)
> 		fatal("io_uring_wait_cqe(open)");
> 	if (fds[0] < 0)
> 		fatal("uring_open");
> 	io_uring_cqe_seen(ring, cqe);
> 
> 	rc = io_uring_register_files(ring, fds, 1);
> 	if(rc)
> 		fatal("io_uring_register_files");
> 
> 	printf(">>> io_uring open(): OK\n");
> 
> 	/*
> 	 * write
> 	 */
> 
> 	sqe = io_uring_get_sqe(ring);
> 	if (!sqe)
> 		fatal("io_uring_get_sqe(write1)");
> 	io_uring_prep_write(sqe, 0, buf1, str_sz, 0);
> 	io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
> 	if (personality > 0)
> 		sqe->personality = personality;
> 
> 	rc = io_uring_submit(ring);
> 	if (rc < 0)
> 		fatal("io_uring_submit(write)");
> 
> 	rc = io_uring_wait_cqe(ring, &cqe);
> 	if (rc < 0)
> 		fatal("io_uring_wait_cqe(write)");
> 	if (cqe->res < 0)
> 		fatal("uring_write");
> 	if (cqe->res != str_sz)
> 		fatal("uring_write(length)");
> 	io_uring_cqe_seen(ring, cqe);
> 
> 	printf(">>> io_uring write(): OK\n");
> 
> 	/*
> 	 * read
> 	 */
> 
> 	sqe = io_uring_get_sqe(ring);
> 	if (!sqe)
> 		fatal("io_uring_get_sqe(read1)");
> 	io_uring_prep_read(sqe, 0, buf2,__OP_A_BSIZE, 0);
> 	io_uring_sqe_set_flags(sqe, IOSQE_FIXED_FILE);
> 	if (personality > 0)
> 		sqe->personality = personality;
> 
> 	rc = io_uring_submit(ring);
> 	if (rc < 0)
> 		fatal("io_uring_submit(read)");
> 
> 	rc = io_uring_wait_cqe(ring, &cqe);
> 	if (rc < 0)
> 		fatal("io_uring_wait_cqe(read)");
> 	if (cqe->res < 0)
> 		fatal("uring_read");
> 	if (cqe->res != str_sz)
> 		fatal("uring_read(length)");
> 	io_uring_cqe_seen(ring, cqe);
> 
> 	if (strncmp(buf1, buf2, str_sz))
> 		fatal("strncmp(buf1,buf2)");
> 
> 	printf(">>> io_uring read(): OK\n");
> 
> 	/*
> 	 * close
> 	 */
> 
> 	sqe = io_uring_get_sqe(ring);
> 	if (!sqe)
> 		fatal("io_uring_get_sqe(close)");
> 	io_uring_prep_close(sqe, 0);
> 	if (personality > 0)
> 		sqe->personality = personality;
> 
> 	rc = io_uring_submit(ring);
> 	if (rc < 0)
> 		fatal("io_uring_submit(close)");
> 
> 	rc = io_uring_wait_cqe(ring, &cqe);
> 	if (rc < 0)
> 		fatal("io_uring_wait_cqe(close)");
> 	if (cqe->res < 0)
> 		fatal("uring_close");
> 	io_uring_cqe_seen(ring, cqe);
> 
> 	rc = io_uring_unregister_files(ring);
> 	if (rc < 0)
> 		fatal("io_uring_unregister_files");
> 
> 	printf(">>> io_uring close(): OK\n");
> 
> 	return 0;
> }
> 
> /**
>  * The main entrypoint to the test program
>  * @param argc number of command line options
>  * @param argv the command line options array
>  */
> int main(int argc, char *argv[])
> {
> 	int rc = 1;
> 	int ring_shm_fd;
> 	struct io_uring ring_storage, *ring;
> 	struct urt_config *cfg_p;
> 
> 	enum { TST_UNKNOWN,
> 	       TST_SQPOLL,
> 	       TST_T1_PARENT, TST_T1_CHILD } tst_method;
> 
> 	/* parse the command line and do some sanity checks */
> 	tst_method = TST_UNKNOWN;
> 	if (argc >= 2) {
> 		if (!strcmp(argv[1], "sqpoll"))
> 			tst_method = TST_SQPOLL;
> 		else if (!strcmp(argv[1], "t1") ||
> 			 !strcmp(argv[1], "t1_parent"))
> 			tst_method = TST_T1_PARENT;
> 		else if (!strcmp(argv[1], "t1_child"))
> 			tst_method = TST_T1_CHILD;
> 	}
> 	if (tst_method == TST_UNKNOWN) {
> 		fprintf(stderr, "usage: %s <method> ... \n", argv[0]);
> 		exit(EINVAL);
> 	}
> 
> 	/* simple header */
> 	printf(">>> running as PID = %d\n", getpid());
> 	printf(">>> LSM/SELinux = %s\n", selinux_current());
> 
> 	/*
> 	 * test setup (if necessary)
> 	 */
> 	if (tst_method == TST_SQPOLL || tst_method == TST_T1_PARENT) {
> 		 /* create an io_uring and prepare it for optional sharing */
> 		int flags;
> 
> 		/* create a shm segment to hold the io_uring info */
> 		ring_shm_fd = shm_open(URING_SHM_NAME, O_CREAT | O_RDWR,
> 				       S_IRUSR | S_IWUSR);
> 		if (ring_shm_fd < 0)
> 			fatal("shm_open(create)");
> 
> 		rc = ftruncate(ring_shm_fd, sizeof(struct urt_config));
> 		if (rc < 0)
> 			fatal("ftruncate(shm)");
> 
> 		cfg_p = mmap(NULL, sizeof(*cfg_p), PROT_READ | PROT_WRITE,
> 			     MAP_SHARED, ring_shm_fd, 0);
> 		if (!cfg_p)
> 			fatal("mmap(shm)");
> 
> 		/* create the io_uring */
> 		memset(&cfg_p->ring, 0, sizeof(cfg_p->ring));
> 		memset(&cfg_p->ring_params, 0, sizeof(cfg_p->ring_params));
> 		if (tst_method == TST_SQPOLL)
> 			cfg_p->ring_params.flags |= IORING_SETUP_SQPOLL;
> 		rc = uring_setup(&cfg_p->ring, &cfg_p->ring_params,
> 				 &cfg_p->ring_creds);
> 		if (rc)
> 			fatal("uring_setup");
> 		ring = &cfg_p->ring;
> 
> 		/* explicitly clear FD_CLOEXEC on the io_uring */
> 		flags = fcntl(cfg_p->ring.ring_fd, F_GETFD, 0);
> 		if (flags < 0)
> 			fatal("fcntl(ring_shm_fd,getfd)");
> 		flags &= ~FD_CLOEXEC;
> 		rc = fcntl(cfg_p->ring.ring_fd, F_SETFD, flags);
> 		if (rc)
> 			fatal("fcntl(ring_shm_fd,setfd)");
> 	} else if (tst_method = TST_T1_CHILD) {
> 		/* import a previously created and shared io_uring */
> 
> 		/* open the existing shm segment with the io_uring info */
> 		ring_shm_fd = shm_open(URING_SHM_NAME, O_RDWR, 0);
> 		if (ring_shm_fd < 0)
> 			fatal("shm_open(existing)");
> 		cfg_p = mmap(NULL, sizeof(*cfg_p), PROT_READ | PROT_WRITE,
> 			     MAP_SHARED, ring_shm_fd, 0);
> 		if (!cfg_p)
> 			fatal("mmap(shm)");
> 
> 		/* import the io_uring */
> 		ring = &ring_storage;
> 		rc = uring_import(cfg_p->ring.ring_fd,
> 				  ring, &cfg_p->ring_params);
> 		if (rc < 0)
> 			fatal("uring_import");
> 	}
> 
> 	/*
> 	 * fork/exec a child process (if necessary)
> 	 */
> 	if (tst_method == TST_T1_PARENT) {
> 		pid_t pid;
> 
> 		/* set the ctx for the next exec */
> 		if (argc >= 3) {
> 			printf(">>> set LSM/SELinux exec: %s\n",
> 			       (selinux_exec(argv[2]) > 0 ? "OK" : "FAILED"));
> 		}
> 
> 		/* fork/exec */
> 		pid = fork();
> 		if (!pid) {
> 			/* start the child */
> 			rc = execl(argv[0], argv[0], "t1_child", (char *)NULL);
> 			if (rc < 0)
> 				fatal("exec");
> 		} else {
> 			/* wait for the child to exit */
> 			int status;
> 			waitpid(pid, &status, 0);
> 			if (WIFEXITED(status))
> 				rc = WEXITSTATUS(status);
> 		}
> 	}
> 
> 	/*
> 	 * run test(s)
> 	 */
> 	if (tst_method == TST_SQPOLL || tst_method == TST_T1_CHILD) {
> 		rc = uring_op_a(ring, cfg_p->ring_creds, "/tmp/iouring.4.txt");
> 		if (rc < 0)
> 			fatal("uring_op_a(\"/tmp/iouring.4.txt\")");
> 	}
> 
> 	/*
> 	 * cleanup
> 	 */
> 	if (tst_method == TST_SQPOLL || tst_method == TST_T1_PARENT) {
> 		printf(">>> shutdown\n");
> 		uring_shutdown(&cfg_p->ring);
> 		shm_unlink(URING_SHM_NAME);
> 	} else if (tst_method == TST_T1_CHILD) {
> 		shm_unlink(URING_SHM_NAME);
> 	}
> 
> 	return rc;
> }


- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


^ permalink raw reply

* Re: [PATCH] ima: fix deadlock within RCU list of ima_rules
From: liqiong @ 2021-08-25  7:05 UTC (permalink / raw)
  To: Mimi Zohar, THOBY Simon
  Cc: dmitry.kasatkin@gmail.com, jmorris@namei.org, serge@hallyn.com,
	linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <2c4f61ff68544b2627fc4a38ad1e4109184ec68a.camel@linux.ibm.com>

Hi Mimi,

Thanks for the advice，maybe i should trim the message,
here is a new copy:


subject: ima: fix deadlock when iterating over the init "ima_rules" list.

The init "ima_rules" list can't traverse back to head, if "ima_rules"
is being updated to "ima_policy_rules". It causes soft lockup and RCU stalls.
So we can introduce a duplicate of "ima_rules" for each "ima_rules" list loop.

Signed-off-by: liqiong <liqiong@nfschina.com>
---
 This problem can happen in practice: updating the IMA policy
 in the boot process while systemd-services are being checked.

 security/integrity/ima/ima_policy.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index fd5d46e511f1..e92b197bfd3c 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c


Regards,

liqiong

在 2021年08月24日 20:38, Mimi Zohar 写道:
> On Tue, 2021-08-24 at 20:09 +0800, liqiong wrote:
>> Hi Simon :
>>
>> ima: fix deadlock within RCU list of ima_rules.
>>
> Before the following paragraph, an introductory sentence is needed. 
> Try adding a sentence to the affect that "ima_rules" initially points
> to the "ima_default_rules", but after loading a custom policy points to
> the "ima_policy_rules".   Then describe the bug at a high level,
> something like - transitioning to the "ima_policy_rules" isn't being
> done safely.
>
> Followed by the details.
>
>> ima_match_policy() is looping on the policy ruleset while
>> ima_update_policy() updates the variable "ima_rules". This can
>> lead to a situation where ima_match_policy() can't exit the
>> 'list_for_each_entry_rcu' loop, causing RCU stalls
>> ("rcu_sched detected stall on CPU ...").
>>
>> This problem can happen in practice: updating the IMA policy
>> in the boot process while systemd-services are being checked.
>>
>> In addition to ima_match_policy(), other function with 
>> "list_for_each_entry_rcu" should happen too. Fix locking by 
>> introducing a duplicate of "ima_rules" for each 
>> "list_for_each_entry_rcu".
>>
>>
>> How about this commit message ?
>>
>> I have tested this patch in lab, we can reproduced this error case, 
>> have done reboot test many times. This patch should work. 
> The above comment doesn't belong in the commit message, but is a
> message to the reviewers/maintainers and goes after the patch
> descriptions three dashes line.
>
> thanks,
>
> Mimi
>
>


^ permalink raw reply

* Re: [PATCH 0/4] KEYS: trusted: Introduce support for NXP CAAM-based trusted keys
From: Ahmad Fatoum @ 2021-08-25  9:34 UTC (permalink / raw)
  To: Tim Harvey
  Cc: David Gstir, Aymen Sghaier, Mimi Zohar, Jan Luebbe, keyrings,
	Steffen Trumtrar, linux-security-module, Udit Agarwal, Herbert Xu,
	Horia Geantă, Richard Weinberger, James Morris, Eric Biggers,
	Serge E. Hallyn, Sumit Garg, James Bottomley, Franck LENORMAND,
	David Howells, open list, Jarkko Sakkinen, linux-crypto,
	Sascha Hauer, linux-integrity, David S. Miller
In-Reply-To: <CAJ+vNU2q_KCi8nNv56s0ip7CZaAE=YgObwFUyzuGa_T1Ywp-wQ@mail.gmail.com>

On 24.08.21 17:23, Tim Harvey wrote:
> On Tue, Aug 24, 2021 at 12:33 AM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>
>> On 23.08.21 19:50, Tim Harvey wrote:
>>> On Mon, Aug 23, 2021 at 6:29 AM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>>> On 20.08.21 23:19, Tim Harvey wrote:
>>>>> On Fri, Aug 20, 2021 at 1:36 PM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>>>>> On 20.08.21 22:20, Tim Harvey wrote:
>>>>> It works for a user keyring but not a session keyring... does that
>>>>> explain anything?
>>>>> # keyctl add trusted mykey 'new 32' @u
>>>>> 941210782
>>>>> # keyctl print 941210782
>>>>> 83b7845cb45216496aead9ee2c6a406f587d64aad47bddc539d8947a247e618798d9306b36398b5dc2722a4c3f220a3a763ee175f6bd64758fdd49ca4db597e8ce328121b60edbba9b8d8d55056be896
>>>>> # keyctl add trusted mykey 'new 32' @s
>>>>> 310571960
>>>>> # keyctl print 310571960
>>>>> keyctl_read_alloc: Unknown error 126
>>>>
>>>> Both sequences work for me.
>>>>
>>>> My getty is started by systemd. I think systemd allocates a new session
>>>> keyring for the getty that's inherited by the shell and the commands I run
>>>> it in. If you don't do that, each command will get its own session key.
>>>>
>>>>> Sorry, I'm still trying to wrap my head around the differences in
>>>>> keyrings and trusted vs user keys.
>>>>
>>>> No problem. HTH.
>>>
>>> Ahmad,
>>>
>>> Ok that explains it - my testing is using a very basic buildroot
>>> ramdisk rootfs. If I do a 'keyctl new_session' first I can use the
>>> system keyring fine as well.
>>
>> Great. Does this mean I can get your Tested-by: ? :)
>>
> 
> Absolutely,
> 
> For the series:
> 
> I tested this series on top of v5.14.rc-7 on a Gateworks
> imx8mm-venice-gw73xx board with kernel param trusted.source=caam and
> keyutils-1.6:
> # keyctl new_session
> 22544757
> # keyctl add trusted mykey 'new 32' @s
> 160701809
> # keyctl print 160701809
> 990e03aa4515aee420eede17e26a58d0c5568c8bd2c9c2ee2f22a0583181d20d4f65cf9cb1f944a3cc92c0e3184a44a29a7e511f0a55a6af11a70ac2b2924514002475e73ae09820042896b9ee00a5ec
> 
> Tested-By: Tim Harvey <tharvey@gateworks.com>

Thanks. I'll apply it to the whole series then.

> One more question: I've got a user that wants to blob/deblob generic
> data. They can use the caam_encap_blob/caam_decap_blob functions in
> kernel code but could you give me a suggestion for how they could use
> this in:
> a) userspace code (using the keyctl syscall I assume)
> b) userspace cmdline (via keyutils I assume)

Trusted keys aren't disclosed to userspace in plain text, only in sealed
form (bar vulnerabilities of course).

Cheers,
Ahmad

> 
> Many thanks,
> 
> Tim
> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH] ima: fix deadlock within RCU list of ima_rules
From: liqiong @ 2021-08-25 11:45 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: THOBY Simon, dmitry.kasatkin@gmail.com, jmorris@namei.org,
	serge@hallyn.com, linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <d502623a-7a49-04f8-1672-6521ceef260b@nfschina.com>

Hi Mimi,

This copy may be better.


subject: ima: fix deadlock when iterating over the init "ima_rules" list.



When traversing back to head, the init "ima_rules" list can't exit
iterating if "ima_rules" has been updated to "ima_policy_rules".
It causes soft lockup and RCU stalls. So we can introduce a duplicate
of "ima_rules" for each "ima_rules" list loop.

Signed-off-by: liqiong <liqiong@nfschina.com>
---
 This problem can happen in practice: updating the IMA policy
 in the boot process while systemd-services are being checked.

 security/integrity/ima/ima_policy.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index fd5d46e511f1..e92b197bfd3c 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c


Regards,

liqiong



在 2021年08月25日 15:05, liqiong 写道:
> Hi Mimi,
>
> Thanks for the advice，maybe i should trim the message,
> here is a new copy:
>
>
> subject: ima: fix deadlock when iterating over the init "ima_rules" list.
>
> The init "ima_rules" list can't traverse back to head, if "ima_rules"
> is being updated to "ima_policy_rules". It causes soft lockup and RCU stalls.
> So we can introduce a duplicate of "ima_rules" for each "ima_rules" list loop.
>
> Signed-off-by: liqiong <liqiong@nfschina.com>
> ---
>  This problem can happen in practice: updating the IMA policy
>  in the boot process while systemd-services are being checked.
>
>  security/integrity/ima/ima_policy.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
> index fd5d46e511f1..e92b197bfd3c 100644
> --- a/security/integrity/ima/ima_policy.c
> +++ b/security/integrity/ima/ima_policy.c
>
>
> Regards,
>
> liqiong
>
> 在 2021年08月24日 20:38, Mimi Zohar 写道:
>> On Tue, 2021-08-24 at 20:09 +0800, liqiong wrote:
>>> Hi Simon :
>>>
>>> ima: fix deadlock within RCU list of ima_rules.
>>>
>> Before the following paragraph, an introductory sentence is needed. 
>> Try adding a sentence to the affect that "ima_rules" initially points
>> to the "ima_default_rules", but after loading a custom policy points to
>> the "ima_policy_rules".   Then describe the bug at a high level,
>> something like - transitioning to the "ima_policy_rules" isn't being
>> done safely.
>>
>> Followed by the details.
>>
>>> ima_match_policy() is looping on the policy ruleset while
>>> ima_update_policy() updates the variable "ima_rules". This can
>>> lead to a situation where ima_match_policy() can't exit the
>>> 'list_for_each_entry_rcu' loop, causing RCU stalls
>>> ("rcu_sched detected stall on CPU ...").
>>>
>>> This problem can happen in practice: updating the IMA policy
>>> in the boot process while systemd-services are being checked.
>>>
>>> In addition to ima_match_policy(), other function with 
>>> "list_for_each_entry_rcu" should happen too. Fix locking by 
>>> introducing a duplicate of "ima_rules" for each 
>>> "list_for_each_entry_rcu".
>>>
>>>
>>> How about this commit message ?
>>>
>>> I have tested this patch in lab, we can reproduced this error case, 
>>> have done reboot test many times. This patch should work. 
>> The above comment doesn't belong in the commit message, but is a
>> message to the reviewers/maintainers and goes after the patch
>> descriptions three dashes line.
>>
>> thanks,
>>
>> Mimi
>>
>>
>


^ permalink raw reply

* Re: [PATCH] ima: fix deadlock within RCU list of ima_rules
From: THOBY Simon @ 2021-08-25 12:03 UTC (permalink / raw)
  To: liqiong, Mimi Zohar
  Cc: dmitry.kasatkin@gmail.com, jmorris@namei.org, serge@hallyn.com,
	linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <5a032a1b-f763-a0e4-8ea2-803872bd7174@nfschina.com>

Hi Liqiong,

On 8/25/21 1:45 PM, liqiong wrote:
> Hi Mimi,
> 
> This copy may be better.
> 
> 
> subject: ima: fix deadlock when iterating over the init "ima_rules" list.
> 
> 

As Mimi said, consider adding an introducing paragraph, like:
'The current IMA ruleset is identified by the variable "ima_rules",
and the pointer starts pointing at the list "ima_default_rules". When
loading a custom policy for the first time, the variable is
updated to point to the list "ima_policy_rules" instead. That update
isn't RCU-safe, and deadlocks are possible.'

> 
> When traversing back to head, the init "ima_rules" list can't exit
> iterating if "ima_rules" has been updated to "ima_policy_rules".
> It causes soft lockup and RCU stalls. So we can introduce a duplicate
> of "ima_rules" for each "ima_rules" list loop.

Per the process (see 'Documentation/process/submitting-patches.rst'),
please prefer an imperative style (written in another paragraph):
'Introduce a temporary value for "ima_rules" when iterating over the ruleset.'


Thanks,
Simon

^ permalink raw reply

* [syzbot] KASAN: use-after-free Read in netlbl_catmap_walk
From: syzbot @ 2021-08-25 17:21 UTC (permalink / raw)
  To: davem, kuba, linux-kernel, linux-security-module, netdev, paul,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    6e764bcd1cf7 Merge tag 'for-linus' of git://git.kernel.org..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=124e77c5300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2fd902af77ff1e56
dashboard link: https://syzkaller.appspot.com/bug?extid=3f91de0b813cc3d19a80
compiler:       Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13f72f16300000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=133e338d300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3f91de0b813cc3d19a80@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:564 [inline]
BUG: KASAN: use-after-free in netlbl_catmap_walk+0x28b/0x2e0 net/netlabel/netlabel_kapi.c:615
Read of size 4 at addr ffff8880161c9800 by task syz-executor742/8768

CPU: 0 PID: 8768 Comm: syz-executor742 Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1ae/0x29f lib/dump_stack.c:105
 print_address_description+0x66/0x3b0 mm/kasan/report.c:233
 __kasan_report mm/kasan/report.c:419 [inline]
 kasan_report+0x163/0x210 mm/kasan/report.c:436
 _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:564 [inline]
 netlbl_catmap_walk+0x28b/0x2e0 net/netlabel/netlabel_kapi.c:615
 cipso_seq_show+0x15f/0x280 security/smack/smackfs.c:789
 traverse+0x1dc/0x530 fs/seq_file.c:111
 seq_lseek+0x12b/0x240 fs/seq_file.c:323
 vfs_llseek fs/read_write.c:300 [inline]
 ksys_lseek fs/read_write.c:313 [inline]
 __do_sys_lseek fs/read_write.c:324 [inline]
 __se_sys_lseek fs/read_write.c:322 [inline]
 __x64_sys_lseek+0x15b/0x1e0 fs/read_write.c:322
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x445889
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9a9cff0318 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
RAX: ffffffffffffffda RBX: 00000000004ca408 RCX: 0000000000445889
RDX: 0000000000000000 RSI: 0000000100000001 RDI: 0000000000000003
RBP: 00000000004ca400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ca40c
R13: 00007ffead4e31bf R14: 00007f9a9cff0400 R15: 0000000000022000

Allocated by task 8768:
 kasan_save_stack mm/kasan/common.c:38 [inline]
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 ____kasan_kmalloc+0xcd/0x100 mm/kasan/common.c:513
 kasan_kmalloc include/linux/kasan.h:264 [inline]
 kmem_cache_alloc_trace+0x96/0x340 mm/slub.c:2986
 kmalloc include/linux/slab.h:591 [inline]
 kzalloc include/linux/slab.h:721 [inline]
 netlbl_catmap_alloc include/net/netlabel.h:317 [inline]
 _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:582 [inline]
 netlbl_catmap_setbit+0x1cb/0x3f0 net/netlabel/netlabel_kapi.c:782
 smk_netlbl_mls+0x103/0x5e0 security/smack/smack_access.c:505
 smk_set_cipso+0x621/0x810 security/smack/smackfs.c:921
 vfs_write+0x289/0xc90 fs/read_write.c:603
 ksys_write+0x171/0x2a0 fs/read_write.c:658
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 8769:
 kasan_save_stack mm/kasan/common.c:38 [inline]
 kasan_set_track+0x3d/0x70 mm/kasan/common.c:46
 kasan_set_free_info+0x1f/0x40 mm/kasan/generic.c:360
 ____kasan_slab_free+0x10d/0x150 mm/kasan/common.c:366
 kasan_slab_free include/linux/kasan.h:230 [inline]
 slab_free_hook mm/slub.c:1628 [inline]
 slab_free_freelist_hook+0x1e8/0x2a0 mm/slub.c:1653
 slab_free mm/slub.c:3213 [inline]
 kfree+0xcf/0x2e0 mm/slub.c:4267
 netlbl_catmap_free include/net/netlabel.h:335 [inline]
 smk_set_cipso+0x682/0x810 security/smack/smackfs.c:923
 vfs_write+0x289/0xc90 fs/read_write.c:603
 ksys_write+0x171/0x2a0 fs/read_write.c:658
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff8880161c9800
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 64-byte region [ffff8880161c9800, ffff8880161c9840)
The buggy address belongs to the page:
page:ffffea0000587240 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x161c9
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 0000000000000000 0000000b00000001 ffff888011041640
raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 6484, ts 27339779214, free_ts 27310944203
 prep_new_page mm/page_alloc.c:2436 [inline]
 get_page_from_freelist+0x779/0xa30 mm/page_alloc.c:4168
 __alloc_pages+0x26c/0x5f0 mm/page_alloc.c:5390
 alloc_slab_page mm/slub.c:1691 [inline]
 allocate_slab+0xf1/0x540 mm/slub.c:1831
 new_slab mm/slub.c:1894 [inline]
 new_slab_objects mm/slub.c:2640 [inline]
 ___slab_alloc+0x1cf/0x350 mm/slub.c:2803
 __slab_alloc mm/slub.c:2843 [inline]
 slab_alloc_node mm/slub.c:2925 [inline]
 __kmalloc_node+0x310/0x430 mm/slub.c:4159
 kmalloc_node include/linux/slab.h:614 [inline]
 kvmalloc_node+0x81/0xf0 mm/util.c:587
 kvmalloc include/linux/mm.h:806 [inline]
 simple_xattr_alloc+0x3f/0xa0 fs/xattr.c:951
 shmem_initxattrs+0x91/0x1e0 mm/shmem.c:3142
 security_inode_init_security+0x37a/0x3c0 security/security.c:1099
 shmem_mknod+0xb0/0x1b0 mm/shmem.c:2822
 lookup_open fs/namei.c:3228 [inline]
 open_last_lookups fs/namei.c:3298 [inline]
 path_openat+0x13b7/0x36b0 fs/namei.c:3504
 do_filp_open+0x253/0x4d0 fs/namei.c:3534
 do_sys_openat2+0x124/0x460 fs/open.c:1204
 do_sys_open fs/open.c:1220 [inline]
 __do_sys_open fs/open.c:1228 [inline]
 __se_sys_open fs/open.c:1224 [inline]
 __x64_sys_open+0x221/0x270 fs/open.c:1224
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1346 [inline]
 free_pcp_prepare+0xc29/0xd20 mm/page_alloc.c:1397
 free_unref_page_prepare mm/page_alloc.c:3332 [inline]
 free_unref_page+0x7e/0x550 mm/page_alloc.c:3411
 __vunmap+0x926/0xa70 mm/vmalloc.c:2587
 free_work+0x66/0x90 mm/vmalloc.c:82
 process_one_work+0x833/0x10c0 kernel/workqueue.c:2276
 worker_thread+0xac1/0x1320 kernel/workqueue.c:2422
 kthread+0x453/0x480 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Memory state around the buggy address:
 ffff8880161c9700: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
 ffff8880161c9780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8880161c9800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff8880161c9880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff8880161c9900: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [syzbot] KASAN: use-after-free Read in netlbl_catmap_walk
From: Paul Moore @ 2021-08-25 18:31 UTC (permalink / raw)
  To: casey
  Cc: davem, kuba, linux-kernel, linux-security-module, netdev,
	syzkaller-bugs, syzbot
In-Reply-To: <000000000000a814c505ca657a4e@google.com>

On Wed, Aug 25, 2021 at 1:21 PM syzbot
<syzbot+3f91de0b813cc3d19a80@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    6e764bcd1cf7 Merge tag 'for-linus' of git://git.kernel.org..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=124e77c5300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2fd902af77ff1e56
> dashboard link: https://syzkaller.appspot.com/bug?extid=3f91de0b813cc3d19a80
> compiler:       Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13f72f16300000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=133e338d300000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3f91de0b813cc3d19a80@syzkaller.appspotmail.com
>
> ==================================================================
> BUG: KASAN: use-after-free in _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:564 [inline]
> BUG: KASAN: use-after-free in netlbl_catmap_walk+0x28b/0x2e0 net/netlabel/netlabel_kapi.c:615
> Read of size 4 at addr ffff8880161c9800 by task syz-executor742/8768
>
> CPU: 0 PID: 8768 Comm: syz-executor742 Not tainted 5.14.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0x1ae/0x29f lib/dump_stack.c:105
>  print_address_description+0x66/0x3b0 mm/kasan/report.c:233
>  __kasan_report mm/kasan/report.c:419 [inline]
>  kasan_report+0x163/0x210 mm/kasan/report.c:436
>  _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:564 [inline]
>  netlbl_catmap_walk+0x28b/0x2e0 net/netlabel/netlabel_kapi.c:615
>  cipso_seq_show+0x15f/0x280 security/smack/smackfs.c:789
>  traverse+0x1dc/0x530 fs/seq_file.c:111
>  seq_lseek+0x12b/0x240 fs/seq_file.c:323
>  vfs_llseek fs/read_write.c:300 [inline]
>  ksys_lseek fs/read_write.c:313 [inline]
>  __do_sys_lseek fs/read_write.c:324 [inline]
>  __se_sys_lseek fs/read_write.c:322 [inline]
>  __x64_sys_lseek+0x15b/0x1e0 fs/read_write.c:322
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x445889
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f9a9cff0318 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
> RAX: ffffffffffffffda RBX: 00000000004ca408 RCX: 0000000000445889
> RDX: 0000000000000000 RSI: 0000000100000001 RDI: 0000000000000003
> RBP: 00000000004ca400 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ca40c
> R13: 00007ffead4e31bf R14: 00007f9a9cff0400 R15: 0000000000022000
>
> Allocated by task 8768:
>  kasan_save_stack mm/kasan/common.c:38 [inline]
>  kasan_set_track mm/kasan/common.c:46 [inline]
>  set_alloc_info mm/kasan/common.c:434 [inline]
>  ____kasan_kmalloc+0xcd/0x100 mm/kasan/common.c:513
>  kasan_kmalloc include/linux/kasan.h:264 [inline]
>  kmem_cache_alloc_trace+0x96/0x340 mm/slub.c:2986
>  kmalloc include/linux/slab.h:591 [inline]
>  kzalloc include/linux/slab.h:721 [inline]
>  netlbl_catmap_alloc include/net/netlabel.h:317 [inline]
>  _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:582 [inline]
>  netlbl_catmap_setbit+0x1cb/0x3f0 net/netlabel/netlabel_kapi.c:782
>  smk_netlbl_mls+0x103/0x5e0 security/smack/smack_access.c:505
>  smk_set_cipso+0x621/0x810 security/smack/smackfs.c:921
>  vfs_write+0x289/0xc90 fs/read_write.c:603
>  ksys_write+0x171/0x2a0 fs/read_write.c:658
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Freed by task 8769:
>  kasan_save_stack mm/kasan/common.c:38 [inline]
>  kasan_set_track+0x3d/0x70 mm/kasan/common.c:46
>  kasan_set_free_info+0x1f/0x40 mm/kasan/generic.c:360
>  ____kasan_slab_free+0x10d/0x150 mm/kasan/common.c:366
>  kasan_slab_free include/linux/kasan.h:230 [inline]
>  slab_free_hook mm/slub.c:1628 [inline]
>  slab_free_freelist_hook+0x1e8/0x2a0 mm/slub.c:1653
>  slab_free mm/slub.c:3213 [inline]
>  kfree+0xcf/0x2e0 mm/slub.c:4267
>  netlbl_catmap_free include/net/netlabel.h:335 [inline]
>  smk_set_cipso+0x682/0x810 security/smack/smackfs.c:923
>  vfs_write+0x289/0xc90 fs/read_write.c:603
>  ksys_write+0x171/0x2a0 fs/read_write.c:658
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The buggy address belongs to the object at ffff8880161c9800
>  which belongs to the cache kmalloc-64 of size 64
> The buggy address is located 0 bytes inside of
>  64-byte region [ffff8880161c9800, ffff8880161c9840)
> The buggy address belongs to the page:
> page:ffffea0000587240 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x161c9
> flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
> raw: 00fff00000000200 0000000000000000 0000000b00000001 ffff888011041640
> raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> page_owner tracks the page as allocated
> page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 6484, ts 27339779214, free_ts 27310944203
>  prep_new_page mm/page_alloc.c:2436 [inline]
>  get_page_from_freelist+0x779/0xa30 mm/page_alloc.c:4168
>  __alloc_pages+0x26c/0x5f0 mm/page_alloc.c:5390
>  alloc_slab_page mm/slub.c:1691 [inline]
>  allocate_slab+0xf1/0x540 mm/slub.c:1831
>  new_slab mm/slub.c:1894 [inline]
>  new_slab_objects mm/slub.c:2640 [inline]
>  ___slab_alloc+0x1cf/0x350 mm/slub.c:2803
>  __slab_alloc mm/slub.c:2843 [inline]
>  slab_alloc_node mm/slub.c:2925 [inline]
>  __kmalloc_node+0x310/0x430 mm/slub.c:4159
>  kmalloc_node include/linux/slab.h:614 [inline]
>  kvmalloc_node+0x81/0xf0 mm/util.c:587
>  kvmalloc include/linux/mm.h:806 [inline]
>  simple_xattr_alloc+0x3f/0xa0 fs/xattr.c:951
>  shmem_initxattrs+0x91/0x1e0 mm/shmem.c:3142
>  security_inode_init_security+0x37a/0x3c0 security/security.c:1099
>  shmem_mknod+0xb0/0x1b0 mm/shmem.c:2822
>  lookup_open fs/namei.c:3228 [inline]
>  open_last_lookups fs/namei.c:3298 [inline]
>  path_openat+0x13b7/0x36b0 fs/namei.c:3504
>  do_filp_open+0x253/0x4d0 fs/namei.c:3534
>  do_sys_openat2+0x124/0x460 fs/open.c:1204
>  do_sys_open fs/open.c:1220 [inline]
>  __do_sys_open fs/open.c:1228 [inline]
>  __se_sys_open fs/open.c:1224 [inline]
>  __x64_sys_open+0x221/0x270 fs/open.c:1224
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> page last free stack trace:
>  reset_page_owner include/linux/page_owner.h:24 [inline]
>  free_pages_prepare mm/page_alloc.c:1346 [inline]
>  free_pcp_prepare+0xc29/0xd20 mm/page_alloc.c:1397
>  free_unref_page_prepare mm/page_alloc.c:3332 [inline]
>  free_unref_page+0x7e/0x550 mm/page_alloc.c:3411
>  __vunmap+0x926/0xa70 mm/vmalloc.c:2587
>  free_work+0x66/0x90 mm/vmalloc.c:82
>  process_one_work+0x833/0x10c0 kernel/workqueue.c:2276
>  worker_thread+0xac1/0x1320 kernel/workqueue.c:2422
>  kthread+0x453/0x480 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>
> Memory state around the buggy address:
>  ffff8880161c9700: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
>  ffff8880161c9780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> >ffff8880161c9800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>                    ^
>  ffff8880161c9880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>  ffff8880161c9900: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> ==================================================================
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

Adding Casey to the To/CC line as I think this issue lies with Smack
and not NetLabel, although if someone believes otherwise please let me
know.

It looks like the problem is near the bottom of smk_set_cipso(), right
after smk_netlbl_mls() successfully creates the netlabel category
bitmap and attempts to free any old existing category bitmaps before
assigning the new value.  As I see it the problem is that the
smack_known pointer, @skp, which is host to the bitmap is located from
a RCU protected list meaning that it is possible for other tasks to be
accessing the category bitmap while it is being freed, or directly
afterwards given they may still be pointing at the old/freed data.

Casey obviously knows Smack much better than I do so I'll refrain from
going to far with a solution here in the likelihood that I'm off the
mark, but I suspect the right solution here would be to either
duplicate and replace the entry in the smack_known list using the
normal RCU list manipulation approach (easiest?), manage the
smack_known->smk_netlabel field as it's own RCU protected pointer
(less easy?), or something else entirely (wildcard!).  I'm not sure
this is a problem we can, or want to, solve at the NetLabel layer.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [syzbot] KASAN: use-after-free Read in netlbl_catmap_walk
From: Casey Schaufler @ 2021-08-25 18:41 UTC (permalink / raw)
  To: Paul Moore
  Cc: davem, kuba, linux-kernel, linux-security-module, netdev,
	syzkaller-bugs, syzbot, Casey Schaufler
In-Reply-To: <CAHC9VhSUnq-+bMYCBvc4yyw9bkMNTChdn1m2b4Vn2i0e54AxxQ@mail.gmail.com>

On 8/25/2021 11:31 AM, Paul Moore wrote:
> On Wed, Aug 25, 2021 at 1:21 PM syzbot
> <syzbot+3f91de0b813cc3d19a80@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    6e764bcd1cf7 Merge tag 'for-linus' of git://git.kernel.org..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=124e77c5300000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=2fd902af77ff1e56
>> dashboard link: https://syzkaller.appspot.com/bug?extid=3f91de0b813cc3d19a80
>> compiler:       Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13f72f16300000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=133e338d300000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+3f91de0b813cc3d19a80@syzkaller.appspotmail.com
>>
>> ==================================================================
>> BUG: KASAN: use-after-free in _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:564 [inline]
>> BUG: KASAN: use-after-free in netlbl_catmap_walk+0x28b/0x2e0 net/netlabel/netlabel_kapi.c:615
>> Read of size 4 at addr ffff8880161c9800 by task syz-executor742/8768
>>
>> CPU: 0 PID: 8768 Comm: syz-executor742 Not tainted 5.14.0-rc7-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:88 [inline]
>>  dump_stack_lvl+0x1ae/0x29f lib/dump_stack.c:105
>>  print_address_description+0x66/0x3b0 mm/kasan/report.c:233
>>  __kasan_report mm/kasan/report.c:419 [inline]
>>  kasan_report+0x163/0x210 mm/kasan/report.c:436
>>  _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:564 [inline]
>>  netlbl_catmap_walk+0x28b/0x2e0 net/netlabel/netlabel_kapi.c:615
>>  cipso_seq_show+0x15f/0x280 security/smack/smackfs.c:789
>>  traverse+0x1dc/0x530 fs/seq_file.c:111
>>  seq_lseek+0x12b/0x240 fs/seq_file.c:323
>>  vfs_llseek fs/read_write.c:300 [inline]
>>  ksys_lseek fs/read_write.c:313 [inline]
>>  __do_sys_lseek fs/read_write.c:324 [inline]
>>  __se_sys_lseek fs/read_write.c:322 [inline]
>>  __x64_sys_lseek+0x15b/0x1e0 fs/read_write.c:322
>>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> RIP: 0033:0x445889
>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007f9a9cff0318 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
>> RAX: ffffffffffffffda RBX: 00000000004ca408 RCX: 0000000000445889
>> RDX: 0000000000000000 RSI: 0000000100000001 RDI: 0000000000000003
>> RBP: 00000000004ca400 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ca40c
>> R13: 00007ffead4e31bf R14: 00007f9a9cff0400 R15: 0000000000022000
>>
>> Allocated by task 8768:
>>  kasan_save_stack mm/kasan/common.c:38 [inline]
>>  kasan_set_track mm/kasan/common.c:46 [inline]
>>  set_alloc_info mm/kasan/common.c:434 [inline]
>>  ____kasan_kmalloc+0xcd/0x100 mm/kasan/common.c:513
>>  kasan_kmalloc include/linux/kasan.h:264 [inline]
>>  kmem_cache_alloc_trace+0x96/0x340 mm/slub.c:2986
>>  kmalloc include/linux/slab.h:591 [inline]
>>  kzalloc include/linux/slab.h:721 [inline]
>>  netlbl_catmap_alloc include/net/netlabel.h:317 [inline]
>>  _netlbl_catmap_getnode net/netlabel/netlabel_kapi.c:582 [inline]
>>  netlbl_catmap_setbit+0x1cb/0x3f0 net/netlabel/netlabel_kapi.c:782
>>  smk_netlbl_mls+0x103/0x5e0 security/smack/smack_access.c:505
>>  smk_set_cipso+0x621/0x810 security/smack/smackfs.c:921
>>  vfs_write+0x289/0xc90 fs/read_write.c:603
>>  ksys_write+0x171/0x2a0 fs/read_write.c:658
>>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> Freed by task 8769:
>>  kasan_save_stack mm/kasan/common.c:38 [inline]
>>  kasan_set_track+0x3d/0x70 mm/kasan/common.c:46
>>  kasan_set_free_info+0x1f/0x40 mm/kasan/generic.c:360
>>  ____kasan_slab_free+0x10d/0x150 mm/kasan/common.c:366
>>  kasan_slab_free include/linux/kasan.h:230 [inline]
>>  slab_free_hook mm/slub.c:1628 [inline]
>>  slab_free_freelist_hook+0x1e8/0x2a0 mm/slub.c:1653
>>  slab_free mm/slub.c:3213 [inline]
>>  kfree+0xcf/0x2e0 mm/slub.c:4267
>>  netlbl_catmap_free include/net/netlabel.h:335 [inline]
>>  smk_set_cipso+0x682/0x810 security/smack/smackfs.c:923
>>  vfs_write+0x289/0xc90 fs/read_write.c:603
>>  ksys_write+0x171/0x2a0 fs/read_write.c:658
>>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> The buggy address belongs to the object at ffff8880161c9800
>>  which belongs to the cache kmalloc-64 of size 64
>> The buggy address is located 0 bytes inside of
>>  64-byte region [ffff8880161c9800, ffff8880161c9840)
>> The buggy address belongs to the page:
>> page:ffffea0000587240 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x161c9
>> flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
>> raw: 00fff00000000200 0000000000000000 0000000b00000001 ffff888011041640
>> raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>> page_owner tracks the page as allocated
>> page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 6484, ts 27339779214, free_ts 27310944203
>>  prep_new_page mm/page_alloc.c:2436 [inline]
>>  get_page_from_freelist+0x779/0xa30 mm/page_alloc.c:4168
>>  __alloc_pages+0x26c/0x5f0 mm/page_alloc.c:5390
>>  alloc_slab_page mm/slub.c:1691 [inline]
>>  allocate_slab+0xf1/0x540 mm/slub.c:1831
>>  new_slab mm/slub.c:1894 [inline]
>>  new_slab_objects mm/slub.c:2640 [inline]
>>  ___slab_alloc+0x1cf/0x350 mm/slub.c:2803
>>  __slab_alloc mm/slub.c:2843 [inline]
>>  slab_alloc_node mm/slub.c:2925 [inline]
>>  __kmalloc_node+0x310/0x430 mm/slub.c:4159
>>  kmalloc_node include/linux/slab.h:614 [inline]
>>  kvmalloc_node+0x81/0xf0 mm/util.c:587
>>  kvmalloc include/linux/mm.h:806 [inline]
>>  simple_xattr_alloc+0x3f/0xa0 fs/xattr.c:951
>>  shmem_initxattrs+0x91/0x1e0 mm/shmem.c:3142
>>  security_inode_init_security+0x37a/0x3c0 security/security.c:1099
>>  shmem_mknod+0xb0/0x1b0 mm/shmem.c:2822
>>  lookup_open fs/namei.c:3228 [inline]
>>  open_last_lookups fs/namei.c:3298 [inline]
>>  path_openat+0x13b7/0x36b0 fs/namei.c:3504
>>  do_filp_open+0x253/0x4d0 fs/namei.c:3534
>>  do_sys_openat2+0x124/0x460 fs/open.c:1204
>>  do_sys_open fs/open.c:1220 [inline]
>>  __do_sys_open fs/open.c:1228 [inline]
>>  __se_sys_open fs/open.c:1224 [inline]
>>  __x64_sys_open+0x221/0x270 fs/open.c:1224
>>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> page last free stack trace:
>>  reset_page_owner include/linux/page_owner.h:24 [inline]
>>  free_pages_prepare mm/page_alloc.c:1346 [inline]
>>  free_pcp_prepare+0xc29/0xd20 mm/page_alloc.c:1397
>>  free_unref_page_prepare mm/page_alloc.c:3332 [inline]
>>  free_unref_page+0x7e/0x550 mm/page_alloc.c:3411
>>  __vunmap+0x926/0xa70 mm/vmalloc.c:2587
>>  free_work+0x66/0x90 mm/vmalloc.c:82
>>  process_one_work+0x833/0x10c0 kernel/workqueue.c:2276
>>  worker_thread+0xac1/0x1320 kernel/workqueue.c:2422
>>  kthread+0x453/0x480 kernel/kthread.c:319
>>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>>
>> Memory state around the buggy address:
>>  ffff8880161c9700: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
>>  ffff8880161c9780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>>> ffff8880161c9800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>>                    ^
>>  ffff8880161c9880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>>  ffff8880161c9900: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>> ==================================================================
>>
>>
>> ---
>> This report is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>
>> syzbot will keep track of this issue. See:
>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>> syzbot can test patches for this issue, for details see:
>> https://goo.gl/tpsmEJ#testing-patches
> Adding Casey to the To/CC line as I think this issue lies with Smack
> and not NetLabel, although if someone believes otherwise please let me
> know.

This does appear to be a Smack issue. Smack issues should come to me.

> It looks like the problem is near the bottom of smk_set_cipso(), right
> after smk_netlbl_mls() successfully creates the netlabel category
> bitmap and attempts to free any old existing category bitmaps before
> assigning the new value.  As I see it the problem is that the
> smack_known pointer, @skp, which is host to the bitmap is located from
> a RCU protected list meaning that it is possible for other tasks to be
> accessing the category bitmap while it is being freed, or directly
> afterwards given they may still be pointing at the old/freed data.
>
> Casey obviously knows Smack much better than I do so I'll refrain from
> going to far with a solution here in the likelihood that I'm off the
> mark, but I suspect the right solution here would be to either
> duplicate and replace the entry in the smack_known list using the
> normal RCU list manipulation approach (easiest?), manage the
> smack_known->smk_netlabel field as it's own RCU protected pointer
> (less easy?), or something else entirely (wildcard!).  I'm not sure
> this is a problem we can, or want to, solve at the NetLabel layer.

Agreed. Not a problem for netlabel. 


^ permalink raw reply

* Re: [RFC PATCH 2/9] audit, io_uring, io-wq: add some basic audit support to io_uring
From: Paul Moore @ 2021-08-25 19:41 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-security-module, selinux, linux-audit, io-uring,
	linux-fsdevel, Kumar Kartikeya Dwivedi, Jens Axboe,
	Alexander Viro
In-Reply-To: <20210825012102.GC490529@madcap2.tricolour.ca>

On Tue, Aug 24, 2021 at 9:21 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> On 2021-06-02 13:46, Paul Moore wrote:
> > On Wed, Jun 2, 2021 at 1:29 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2021-05-21 17:49, Paul Moore wrote:
> > > > WARNING - This is a work in progress and should not be merged
> > > > anywhere important.  It is almost surely not complete, and while it
> > > > probably compiles it likely hasn't been booted and will do terrible
> > > > things.  You have been warned.
> > > >
> > > > This patch adds basic auditing to io_uring operations, regardless of
> > > > their context.  This is accomplished by allocating audit_context
> > > > structures for the io-wq worker and io_uring SQPOLL kernel threads
> > > > as well as explicitly auditing the io_uring operations in
> > > > io_issue_sqe().  The io_uring operations are audited using a new
> > > > AUDIT_URINGOP record, an example is shown below:
> > > >
> > > >   % <TODO - insert AUDIT_URINGOP record example>
> > > >
> > > > Thanks to Richard Guy Briggs for review and feedback.
> > > >
> > > > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > > > ---
> > > >  fs/io-wq.c                 |    4 +
> > > >  fs/io_uring.c              |   11 +++
> > > >  include/linux/audit.h      |   17 ++++
> > > >  include/uapi/linux/audit.h |    1
> > > >  kernel/audit.h             |    2 +
> > > >  kernel/auditsc.c           |  173 ++++++++++++++++++++++++++++++++++++++++++++
> > > >  6 files changed, 208 insertions(+)

...

> > > > +     if (ctx->return_valid != AUDITSC_INVALID)
> > > > +             audit_log_format(ab, " success=%s exit=%ld",
> > > > +                              (ctx->return_valid == AUDITSC_SUCCESS ?
> > > > +                               "yes" : "no"),
> > > > +                              ctx->return_code);
> > > > +     audit_log_format(ab,
> > > > +                      " items=%d"
> > > > +                      " ppid=%d pid=%d auid=%u uid=%u gid=%u"
> > > > +                      " euid=%u suid=%u fsuid=%u"
> > > > +                      " egid=%u sgid=%u fsgid=%u",
> > > > +                      ctx->name_count,
> > > > +                      task_ppid_nr(current),
> > > > +                      task_tgid_nr(current),
> > > > +                      from_kuid(&init_user_ns, audit_get_loginuid(current)),
> > > > +                      from_kuid(&init_user_ns, cred->uid),
> > > > +                      from_kgid(&init_user_ns, cred->gid),
> > > > +                      from_kuid(&init_user_ns, cred->euid),
> > > > +                      from_kuid(&init_user_ns, cred->suid),
> > > > +                      from_kuid(&init_user_ns, cred->fsuid),
> > > > +                      from_kgid(&init_user_ns, cred->egid),
> > > > +                      from_kgid(&init_user_ns, cred->sgid),
> > > > +                      from_kgid(&init_user_ns, cred->fsgid));
> > >
> > > The audit session ID is still important, relevant and qualifies auid.
> > > In keeping with the SYSCALL record format, I think we want to keep
> > > ses=audit_get_sessionid(current) in here.
> >
> > This might be another case of syscall/io_uring confusion.  An io_uring
> > op doesn't necessarily have an audit session ID or an audit UID in the
> > conventional sense; for example think about SQPOLL works, shared
> > rings, etc.
>
> Right, but those syscalls are what instigate io_uring operations, so
> whatever process starts that operation, or gets handed that handle
> should be tracked with auid and sessionid (the two work together to
> track) unless we can easily track io_uring ops to connect them to a
> previous setup syscall.  If we see a need to keep the auid, then the
> sessionid goes with it.

As a reminder, once the io_uring is created appropriately one can
issue io_uring operations without making a syscall.  Further, sharing
a io_uring across process boundaries means that both the audit session
ID and audit login UID used to create the io_uring might not be the
same as the subject which issues operations to the io_uring.

Any io_uring operations that happen synchronously as the result of a
syscall should be associated with the SYSCALL record so the session ID
and login UID will be part of the event.  Asynchronous operations will
not have that information because we don't have a way to get it.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: There is an interesting conversation going on about virtiofsd and SELinux
From: Casey Schaufler @ 2021-08-25 21:06 UTC (permalink / raw)
  To: Stephen Smalley, dwalsh
  Cc: Vivek Goyal, selinux@vger.kernel.org, Ondrej Mosnacek, Paul Moore,
	Casey Schaufler, Linux Security Module list
In-Reply-To: <CAEjxPJ61LJK5tOX2Fr4F2Aubjfo-rkW1EAe9YDG_G6CQ-gr=-w@mail.gmail.com>

On 8/25/2021 1:26 PM, Stephen Smalley wrote:
> On Wed, Aug 25, 2021 at 3:19 PM Daniel Walsh <dwalsh@redhat.com> wrote:
>> The problem is how do we handle a VM with SELinux enabled and the rootfs
>> handled by virtiofsd? Or if the SELinux inside of the VM wanted to write
>> multiple different labels on a volume mounted in via virtiofsd.
>>
>> Here is how Vivek defines it:
>>
>> ```
>> How do we manage SELinux host and guest policy with passthrough
>> filesystems like virtiofs. Both guest and host might have different
>> SELinux policies and can conflict with each other in various ways. And
>> this problem exacerbates where there are nested VM guests. Now there
>> are 3 entities with possibly 3 different SELinux policies and still
>> all the real security.selinux xattr is created and stored on host
>> filesystem.
>>
>> One possible proposal is to remap guest SELinux xattrs to some other
>> name, say user.virtiofs.security.selinux. That way host will enforce
>> SELinux policy based on security.selinux xattr stored on file. Guest
>> will get labels stored in user.virtiofs.security.selinux and guest
>> kernel will enforce that. This in theory provides both guest and
>> host policies to co-exist and work. And this can be extended to
>> nested guest where its attrs are prefixed by one more level of
>> user.virtiofs. IOW, user.virtiofs.user.virtiofs.security.selinux
>> will be the xattr on host which will show up as security.selinux
>> xattr when nested guest does getxattr().
>>
>> Virtiofsd currently has capability to do this remapping. One problem
>> we have is that we are using "user" xattr namespace and one can
>> not use "user" xattr on symlinks and special files. So when guest
>> tries relabeling or chcon on these files, it fails. May be this is
>> fixable. I have done an RFC propsal upstream.
>>
>> https://lore.kernel.org/linux-fsdevel/20210708175738.360757-1-vgoyal@redhat.com/
>>
>> Looking for thoughts how to fix the issue of SELinux with passthrough
>> filesystems. What's the best way to solve this issue.
>>
>> Thanks
>> Vivek
>>
>> ```
>>
>> We used to talk about this way back when, but never found a good solution. Theoretically
>> labeled NFS has the same issue, but I don't believe there are any NFS rootfs using SELinux.
> The early labeled NFS work included a notion of a label
> domain-of-interpretation (DOI) field and label translation as part of
> the infrastructure but I don't think that made it into mainline?
> It is however part of the NFSv4.2 spec I believe (called a label
> format specifier or LFS).
> At present I believe the assumption is that either the NFS server is
> just a "dumb" server that is not enforcing any SELinux policy at all
> (just storing the labels for use by clients) or is enforcing the same
> policy as the clients.
>
> A while ago James Morris proposed an approach to namespaced selinux
> xattrs to support selinux namespaces,
> https://lore.kernel.org/linux-security-module/alpine.LFD.2.20.1711212009330.6297@localhost/
> .
> However, that keeps them in the security.* namespace which on the one
> hand better protects them from tampering but runs afoul of virtiofsd's
> goal of avoiding the need to run with capabilities.
>
> If we use the user. namespace we need a way to control which processes
> can change these attributes (or remove them) on the host; optimally we
> could limit it to only virtiofsd itself and no other processes not
> even
> root processes on the host. That presumably requires some kind of LSM
> hook or hook call; the current SELinux setxattr and removexattr hooks
> don't care about user.* beyond a general setattr permission check.

	"Wow, if only there was a round thing we could attach to this
	 wagon so it wouldn't be so hard to pull."

I must be missing something really obvious. User namespaces map uids
and gids to their parent namespace. An xattr namespace could do exactly
the same thing, mapping security.selinux=foo in the child namespace to
security.selinux=bar in the parent namespace. We know how to do this for
uids, and have found and addressed a bunch of issues. Why do something
different? There's no reason to make this LSM specific, or even something
that's only useful for security attributes. I know that we've got partial
"solutions", including virtiofs and various LSM specific namespace schemes.
Nothing I've seen wouldn't be better served by an xattr namespace.

Warning: I am not a fan of namespaces. I am less a fan of having dozens
of "solutions", none of which work with any of the others, all of which
have to be made to play nice together by user-space services.



^ permalink raw reply

* Re: There is an interesting conversation going on about virtiofsd and SELinux
From: Stephen Smalley @ 2021-08-25 21:24 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: dwalsh, Vivek Goyal, selinux@vger.kernel.org, Ondrej Mosnacek,
	Paul Moore, Linux Security Module list
In-Reply-To: <5001fd3a-7abc-9163-c912-c7d975c2fab3@schaufler-ca.com>

On Wed, Aug 25, 2021 at 5:06 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 8/25/2021 1:26 PM, Stephen Smalley wrote:
> > On Wed, Aug 25, 2021 at 3:19 PM Daniel Walsh <dwalsh@redhat.com> wrote:
> >> The problem is how do we handle a VM with SELinux enabled and the rootfs
> >> handled by virtiofsd? Or if the SELinux inside of the VM wanted to write
> >> multiple different labels on a volume mounted in via virtiofsd.
> >>
> >> Here is how Vivek defines it:
> >>
> >> ```
> >> How do we manage SELinux host and guest policy with passthrough
> >> filesystems like virtiofs. Both guest and host might have different
> >> SELinux policies and can conflict with each other in various ways. And
> >> this problem exacerbates where there are nested VM guests. Now there
> >> are 3 entities with possibly 3 different SELinux policies and still
> >> all the real security.selinux xattr is created and stored on host
> >> filesystem.
> >>
> >> One possible proposal is to remap guest SELinux xattrs to some other
> >> name, say user.virtiofs.security.selinux. That way host will enforce
> >> SELinux policy based on security.selinux xattr stored on file. Guest
> >> will get labels stored in user.virtiofs.security.selinux and guest
> >> kernel will enforce that. This in theory provides both guest and
> >> host policies to co-exist and work. And this can be extended to
> >> nested guest where its attrs are prefixed by one more level of
> >> user.virtiofs. IOW, user.virtiofs.user.virtiofs.security.selinux
> >> will be the xattr on host which will show up as security.selinux
> >> xattr when nested guest does getxattr().
> >>
> >> Virtiofsd currently has capability to do this remapping. One problem
> >> we have is that we are using "user" xattr namespace and one can
> >> not use "user" xattr on symlinks and special files. So when guest
> >> tries relabeling or chcon on these files, it fails. May be this is
> >> fixable. I have done an RFC propsal upstream.
> >>
> >> https://lore.kernel.org/linux-fsdevel/20210708175738.360757-1-vgoyal@redhat.com/
> >>
> >> Looking for thoughts how to fix the issue of SELinux with passthrough
> >> filesystems. What's the best way to solve this issue.
> >>
> >> Thanks
> >> Vivek
> >>
> >> ```
> >>
> >> We used to talk about this way back when, but never found a good solution. Theoretically
> >> labeled NFS has the same issue, but I don't believe there are any NFS rootfs using SELinux.
> > The early labeled NFS work included a notion of a label
> > domain-of-interpretation (DOI) field and label translation as part of
> > the infrastructure but I don't think that made it into mainline?
> > It is however part of the NFSv4.2 spec I believe (called a label
> > format specifier or LFS).
> > At present I believe the assumption is that either the NFS server is
> > just a "dumb" server that is not enforcing any SELinux policy at all
> > (just storing the labels for use by clients) or is enforcing the same
> > policy as the clients.
> >
> > A while ago James Morris proposed an approach to namespaced selinux
> > xattrs to support selinux namespaces,
> > https://lore.kernel.org/linux-security-module/alpine.LFD.2.20.1711212009330.6297@localhost/
> > .
> > However, that keeps them in the security.* namespace which on the one
> > hand better protects them from tampering but runs afoul of virtiofsd's
> > goal of avoiding the need to run with capabilities.
> >
> > If we use the user. namespace we need a way to control which processes
> > can change these attributes (or remove them) on the host; optimally we
> > could limit it to only virtiofsd itself and no other processes not
> > even
> > root processes on the host. That presumably requires some kind of LSM
> > hook or hook call; the current SELinux setxattr and removexattr hooks
> > don't care about user.* beyond a general setattr permission check.
>
>         "Wow, if only there was a round thing we could attach to this
>          wagon so it wouldn't be so hard to pull."
>
> I must be missing something really obvious. User namespaces map uids
> and gids to their parent namespace. An xattr namespace could do exactly
> the same thing, mapping security.selinux=foo in the child namespace to
> security.selinux=bar in the parent namespace. We know how to do this for
> uids, and have found and addressed a bunch of issues. Why do something
> different? There's no reason to make this LSM specific, or even something
> that's only useful for security attributes. I know that we've got partial
> "solutions", including virtiofs and various LSM specific namespace schemes.
> Nothing I've seen wouldn't be better served by an xattr namespace.
>
> Warning: I am not a fan of namespaces. I am less a fan of having dozens
> of "solutions", none of which work with any of the others, all of which
> have to be made to play nice together by user-space services.

The uid mappings for user namespaces are simplified by the fact that
uids are totally ordered so you can express the mapping compactly
using ranges.
Even for MLS/BLP, labels are only partially ordered and in TE, there
is no order at all among types.
Defining a complete mapping of all SELinux labels (to include all
possible category sets with all possible sensitivities with all
possible types...) would be huge.
There may not even be any one label in any of the policies in question
that can serve as the "ground truth" label that can be stored in the
filesystem xattr and mapped to a label in every other policy.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox