Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [PATCH v3 2/6] IMA: change process_buffer_measurement return type from void to int
From: Tushar Sugandhi @ 2020-09-11 16:22 UTC (permalink / raw)
  To: Mimi Zohar, stephen.smalley.work, casey, agk, snitzer, gmazyland
  Cc: tyhicks, sashal, jmorris, nramas, linux-integrity, selinux,
	linux-security-module, linux-kernel, dm-devel
In-Reply-To: <e76bdb18c6045702156441470e50380445e6e218.camel@linux.ibm.com>



On 2020-08-31 4:36 a.m., Mimi Zohar wrote:
> On Thu, 2020-08-27 at 18:57 -0700, Tushar Sugandhi wrote:
>> process_buffer_measurement() does not return the result of the operation.
>> Therefore, the consumers of this function cannot act on it, if needed.
>>
>> Update return type of process_buffer_measurement() from void to int.
> 
> Failure to measure may be audited, but should never fail.  This is one
> of the main differences between secure and trusted boot concepts.
> Notice in process_measurement() that -EACCES is only returned for
> appraisal.
> 
> Returning a failure from process_buffer_measurement() in itself isn't a
> problem, as long as the failure isn't returned to the LSM/IMA hook.
> However,  just as the callers of  process_measurement() originally
> processed the result, that processing was moved into
> process_measurement() [1].
> 
> Mimi
> 
> [1] 750943a30714 ima: remove enforce checking duplication
> 
I can ignore the result of process_buffer_measurement() in
ima_measure_critical_data(), and make  ima_measure_critical_data()
return type "void".

But currently ima_measure_critical_data() is the only place where the
results of p_b_m() are being used.
So I might as well just revert back the return type of p_b_m() to
the original "void".



^ permalink raw reply

* Re: [PATCH v3 1/6] IMA: generalize keyring specific measurement constructs
From: Tushar Sugandhi @ 2020-09-11 16:19 UTC (permalink / raw)
  To: Mimi Zohar, stephen.smalley.work, casey, agk, snitzer, gmazyland
  Cc: tyhicks, sashal, jmorris, nramas, linux-integrity, selinux,
	linux-security-module, linux-kernel, dm-devel
In-Reply-To: <4802c73c2ed22c64ea4f315d3115ead919c3205e.camel@linux.ibm.com>



On 2020-08-31 4:55 a.m., Mimi Zohar wrote:
> On Thu, 2020-08-27 at 18:56 -0700, Tushar Sugandhi wrote:
>> IMA functions such as ima_match_keyring(), process_buffer_measurement(),
>> ima_match_policy() etc. handle data specific to keyrings. Currently,
>> these constructs are not generic to handle any func specific data.
>> This makes it harder to extend without code duplication.
>>
>> Refactor the keyring specific measurement constructs to be generic and
>> reusable in other measurement scenarios.
> 
> Mostly this patch changes the variable name from keyring to func_data,
> which is good.  Other changes should be minimized.
> 
The only other change in this patch is introduction of
bool allow_empty_opt_list, which is needed as per my comment below.

Maybe I can move "allow_empty_opt_list" to a new patch after this one in
this series.

>>
>> Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
>> ---
> 
> <snip>
> 
>> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
>> index fe1df373c113..8866e84d0062 100644
>> --- a/security/integrity/ima/ima_policy.c
>> +++ b/security/integrity/ima/ima_policy.c
>> @@ -451,15 +451,21 @@ int ima_lsm_policy_change(struct notifier_block *nb, unsigned long event,
>>   }
>>   
>>   /**
>> - * ima_match_keyring - determine whether the keyring matches the measure rule
>> - * @rule: a pointer to a rule
>> - * @keyring: name of the keyring to match against the measure rule
>> + * ima_match_rule_data - determine whether the given func_data matches
>> + *			 the measure rule data
>> + * @rule: IMA policy rule
>> + * @opt_list: rule data to match func_data against
>> + * @func_data: data to match against the measure rule data
>> + * @allow_empty_opt_list: If true matches all func_data
>>    * @cred: a pointer to a credentials structure for user validation
>>    *
>> - * Returns true if keyring matches one in the rule, false otherwise.
>> + * Returns true if func_data matches one in the rule, false otherwise.
>>    */
>> -static bool ima_match_keyring(struct ima_rule_entry *rule,
>> -			      const char *keyring, const struct cred *cred)
>> +static bool ima_match_rule_data(struct ima_rule_entry *rule,
>> +				const struct ima_rule_opt_list *opt_list,
> 
> Ok
> 
>> +				const char *func_data,
>> +				bool allow_empty_opt_list,
> 
> As the policy is loaded, shouldn't the rules should be checked, not
> here on usage?
> 
> Mimi

Since "keyrings=" is optional, I cannot check the rule at load time for
keyrings. func=KEY_CHECK may or may not have "keyrings=", and both are
valid scenarios.

However "critical_kernel_data_sources=" is mandatory for 
func=CRITICAL_DATA.

So I am already making that check at policy load time.

See patch 5/6 – function ima_match_rules(), where I check for
IMA_DATA_SOURCES.

+       case CRITICAL_DATA:
<snip>
+               if (!(entry->flags & IMA_DATA_SOURCES) ||
<snip>
+                       return false;
+

Since ima_match_rule_data (this function) handles both func=KEY_CHECK 
and func=CRITICAL_DATA, we have to use the bool "allow_empty_opt_list"
to differentiate between the two scenarios – whether the rule is
optional or not for a given func.

> 
>> +				const struct cred *cred)
>>   {
>>   	bool matched = false;
>>   	size_t i;
>>

^ permalink raw reply

* Re: [PATCH] security: keys: Use kvfree_sensitive in a few places
From: Alex Dewar @ 2020-09-11 16:09 UTC (permalink / raw)
  To: efremov
  Cc: David Howells, Jarkko Sakkinen, James Morris, Serge E. Hallyn,
	keyrings, linux-security-module, linux-kernel
In-Reply-To: <f97076d2-de8a-4600-ee81-4cf4fcdc3ec1@linux.com>

On 2020-09-11 17:05, Denis Efremov wrote:
> Hi,
>
> same patch
>
> https://lkml.org/lkml/2020/8/27/168
>
> Thanks,
> Denis
Ah ok. Sorry for the noise!
>
> On 9/11/20 2:44 PM, Alex Dewar wrote:
>> In big_key.c, there are a few places where memzero_explicit + kvfree is
>> used. It is better to use kvfree_sensitive instead, which is more
>> readable and also prevents the compiler from eliding the call to
>> memzero_explicit. Fix this.
>>
>> Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
>> ---
>>   security/keys/big_key.c | 9 +++------
>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/security/keys/big_key.c b/security/keys/big_key.c
>> index 691347dea3c1..d17e5f09eeb8 100644
>> --- a/security/keys/big_key.c
>> +++ b/security/keys/big_key.c
>> @@ -121,8 +121,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
>>   		*path = file->f_path;
>>   		path_get(path);
>>   		fput(file);
>> -		memzero_explicit(buf, enclen);
>> -		kvfree(buf);
>> +		kvfree_sensitive(buf, enclen);
>>   	} else {
>>   		/* Just store the data in a buffer */
>>   		void *data = kmalloc(datalen, GFP_KERNEL);
>> @@ -140,8 +139,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
>>   err_enckey:
>>   	kfree_sensitive(enckey);
>>   error:
>> -	memzero_explicit(buf, enclen);
>> -	kvfree(buf);
>> +	kvfree_sensitive(buf, enclen);
>>   	return ret;
>>   }
>>   
>> @@ -273,8 +271,7 @@ long big_key_read(const struct key *key, char *buffer, size_t buflen)
>>   err_fput:
>>   		fput(file);
>>   error:
>> -		memzero_explicit(buf, enclen);
>> -		kvfree(buf);
>> +		kvfree_sensitive(buf, enclen);
>>   	} else {
>>   		ret = datalen;
>>   		memcpy(buffer, key->payload.data[big_key_data], datalen);
>>


^ permalink raw reply

* Re: [PATCH] security: keys: Use kvfree_sensitive in a few places
From: Denis Efremov @ 2020-09-11 16:05 UTC (permalink / raw)
  To: Alex Dewar
  Cc: David Howells, Jarkko Sakkinen, James Morris, Serge E. Hallyn,
	keyrings, linux-security-module, linux-kernel
In-Reply-To: <20200911114400.82207-1-alex.dewar90@gmail.com>

Hi,

same patch

https://lkml.org/lkml/2020/8/27/168

Thanks,
Denis

On 9/11/20 2:44 PM, Alex Dewar wrote:
> In big_key.c, there are a few places where memzero_explicit + kvfree is
> used. It is better to use kvfree_sensitive instead, which is more
> readable and also prevents the compiler from eliding the call to
> memzero_explicit. Fix this.
> 
> Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
> ---
>  security/keys/big_key.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/security/keys/big_key.c b/security/keys/big_key.c
> index 691347dea3c1..d17e5f09eeb8 100644
> --- a/security/keys/big_key.c
> +++ b/security/keys/big_key.c
> @@ -121,8 +121,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
>  		*path = file->f_path;
>  		path_get(path);
>  		fput(file);
> -		memzero_explicit(buf, enclen);
> -		kvfree(buf);
> +		kvfree_sensitive(buf, enclen);
>  	} else {
>  		/* Just store the data in a buffer */
>  		void *data = kmalloc(datalen, GFP_KERNEL);
> @@ -140,8 +139,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
>  err_enckey:
>  	kfree_sensitive(enckey);
>  error:
> -	memzero_explicit(buf, enclen);
> -	kvfree(buf);
> +	kvfree_sensitive(buf, enclen);
>  	return ret;
>  }
>  
> @@ -273,8 +271,7 @@ long big_key_read(const struct key *key, char *buffer, size_t buflen)
>  err_fput:
>  		fput(file);
>  error:
> -		memzero_explicit(buf, enclen);
> -		kvfree(buf);
> +		kvfree_sensitive(buf, enclen);
>  	} else {
>  		ret = datalen;
>  		memcpy(buffer, key->payload.data[big_key_data], datalen);
> 

^ permalink raw reply

* [PATCH v2] security: keys: Use kvfree_sensitive in a few places
From: Alex Dewar @ 2020-09-11 16:06 UTC (permalink / raw)
  Cc: Alex Dewar, David Howells, Jarkko Sakkinen, James Morris,
	Serge E. Hallyn, keyrings, linux-security-module, linux-kernel
In-Reply-To: <20200911160009.n2drvcjpzrsloxcj@lenovo-laptop>

In big_key.c, there are a few places where memzero_explicit + kvfree is
used. Replace these with a single call to kvfree_sensitive, to make the
code more readable.

Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
---
v2 - Update commit message as per James Bottomley's suggestion

 security/keys/big_key.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/security/keys/big_key.c b/security/keys/big_key.c
index 691347dea3c1..d17e5f09eeb8 100644
--- a/security/keys/big_key.c
+++ b/security/keys/big_key.c
@@ -121,8 +121,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 		*path = file->f_path;
 		path_get(path);
 		fput(file);
-		memzero_explicit(buf, enclen);
-		kvfree(buf);
+		kvfree_sensitive(buf, enclen);
 	} else {
 		/* Just store the data in a buffer */
 		void *data = kmalloc(datalen, GFP_KERNEL);
@@ -140,8 +139,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 err_enckey:
 	kfree_sensitive(enckey);
 error:
-	memzero_explicit(buf, enclen);
-	kvfree(buf);
+	kvfree_sensitive(buf, enclen);
 	return ret;
 }
 
@@ -273,8 +271,7 @@ long big_key_read(const struct key *key, char *buffer, size_t buflen)
 err_fput:
 		fput(file);
 error:
-		memzero_explicit(buf, enclen);
-		kvfree(buf);
+		kvfree_sensitive(buf, enclen);
 	} else {
 		ret = datalen;
 		memcpy(buffer, key->payload.data[big_key_data], datalen);
-- 
2.28.0


^ permalink raw reply related

* Re: [PATCH V2 0/3] integrity: Load certs from EFI MOK config table
From: Mimi Zohar @ 2020-09-11 16:01 UTC (permalink / raw)
  To: Ard Biesheuvel, Lenny Szubowicz
  Cc: Linux Kernel Mailing List, linux-efi, platform-driver-x86,
	linux-security-module, andy.shevchenko, James Morris, serge,
	Kees Cook, Borislav Petkov, Peter Jones, David Howells, prarit
In-Reply-To: <CAMj1kXHOcGiwOT_sNTQRA=G7GCQSKLk2HSNoS2vEQYPzQpn0nw@mail.gmail.com>

On Fri, 2020-09-11 at 18:17 +0300, Ard Biesheuvel wrote:
> On Sat, 5 Sep 2020 at 04:31, Lenny Szubowicz <lszubowi@redhat.com> wrote:
> >
> > Because of system-specific EFI firmware limitations, EFI volatile
> > variables may not be capable of holding the required contents of
> > the Machine Owner Key (MOK) certificate store when the certificate
> > list grows above some size. Therefore, an EFI boot loader may pass
> > the MOK certs via a EFI configuration table created specifically for
> > this purpose to avoid this firmware limitation.
> >
> > An EFI configuration table is a simpler and more robust mechanism
> > compared to EFI variables and is well suited for one-way passage
> > of static information from a pre-OS environment to the kernel.
> >
> > Entries in the MOK variable configuration table are named key/value
> > pairs. Therefore the shim boot loader can create a MokListRT named
> > entry in the MOK configuration table that contains exactly the same
> > data as the MokListRT UEFI variable does or would otherwise contain.
> > As such, the kernel can load certs from the data in the MokListRT
> > configuration table entry data in the same way that it loads certs
> > from the data returned by the EFI GetVariable() runtime call for the
> > MokListRT variable.
> >
> > This patch set does not remove the support for loading certs from the
> > EFI MOK variables into the platform key ring. However, if both the EFI
> > MOK configuration table and corresponding EFI MOK variables are present,
> > the MOK table is used as the source of MOK certs.
> >
> > The contents of the individual named MOK config table entries are
> > made available to user space as individual sysfs binary files,
> > which are read-only to root, under:
> >
> >         /sys/firmware/efi/mok-variables/
> >
> > This enables an updated mokutil to provide support for:
> >
> >         mokutil --list-enrolled
> >
> > such that it can provide accurate information regardless of whether
> > the MOK configuration table or MOK EFI variables were the source
> > for certs. Note that all modifications of MOK related state are still
> > initiated by mokutil via EFI variables.
> >
> > V2: Incorporate feedback from V1
> >   Patch 01: efi: Support for MOK variable config table
> >   - Minor update to change log; no code changes
> >   Patch 02: integrity: Move import of MokListRT certs to a separate routine
> >   - Clean up code flow in code moved to load_moklist_certs()
> >   - Remove some unnecessary initialization of variables
> >   Patch 03: integrity: Load certs from the EFI MOK config table
> >   - Update required due to changes in patch 02.
> >   - Remove unnecessary init of mokvar_entry in load_moklist_certs()
> >
> > V1:
> >   https://lore.kernel.org/lkml/20200826034455.28707-1-lszubowi@redhat.com/
> >
> > Lenny Szubowicz (3):
> >   efi: Support for MOK variable config table
> >   integrity: Move import of MokListRT certs to a separate routine
> >   integrity: Load certs from the EFI MOK config table
> >
> >  arch/x86/kernel/setup.c                       |   1 +
> >  arch/x86/platform/efi/efi.c                   |   3 +
> >  drivers/firmware/efi/Makefile                 |   1 +
> >  drivers/firmware/efi/arm-init.c               |   1 +
> >  drivers/firmware/efi/efi.c                    |   6 +
> >  drivers/firmware/efi/mokvar-table.c           | 360 ++++++++++++++++++
> >  include/linux/efi.h                           |  34 ++
> >  security/integrity/platform_certs/load_uefi.c |  85 ++++-
> >  8 files changed, 472 insertions(+), 19 deletions(-)
> >  create mode 100644 drivers/firmware/efi/mokvar-table.c
> >
> 
> Thanks. I have tentatively queued these up in efi/next.
> 
> Mimi, please let me know if you have any thoughts on 3/3, and whether
> your R-b on 2/3 [v1] implies that you are ok with the series going
> through the EFI tree.

Yes, Ard, that was the intent.  I haven't reviewed the most recent
version.

Mimi


^ permalink raw reply

* Re: [PATCH] security: keys: Use kvfree_sensitive in a few places
From: Alex Dewar @ 2020-09-11 16:03 UTC (permalink / raw)
  To: Alex Dewar
  Cc: James Bottomley, David Howells, Jarkko Sakkinen, James Morris,
	Serge E. Hallyn, keyrings, linux-security-module
In-Reply-To: <20200911160009.n2drvcjpzrsloxcj@lenovo-laptop>

On Fri, Sep 11, 2020 at 05:00:09PM +0100, Alex Dewar wrote:
> On Fri, Sep 11, 2020 at 08:04:24AM -0700, James Bottomley wrote:
> > On Fri, 2020-09-11 at 12:44 +0100, Alex Dewar wrote:
> > > In big_key.c, there are a few places where memzero_explicit + kvfree
> > > is used. It is better to use kvfree_sensitive instead, which is more
> > > readable and also prevents the compiler from eliding the call to
> > > memzero_explicit. Fix this.
> > 
> > That last bit is untrue: the compiler can't elide memzero_explicit ...
> > that's why it has the explicit suffix.
> > 
> > The original problem was a lot of people do memset(.., 0, ..); kfree()
> > which the compiler can elide if it understands the memory is going out
> > of scope.  Or the even more problematic memset(..., 0, ...) on a stack
> > variable before it goes out of scope.
> > 
> > We can argue about readability but there's no secret leak here.
> 
> Ahh, my mistake. Thanks for the explanation.
> 
> I'll send a v2 with an updated commit message. I think it would still
> make sense to use kfree_sensitive here as on next-20200911 this is the
> last use of kzfree in the tree and it would be nice to excise it
> altogether.

Ignore this! I thought we were talking about a different patch :-/

I'll send a respin with a better commit message anyways.

Cheers :-)

> 
> Best,
> Alex
> 
> > 
> > James
> > 

^ permalink raw reply

* Re: [PATCH] security: keys: Use kvfree_sensitive in a few places
From: Alex Dewar @ 2020-09-11 16:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Alex Dewar, David Howells, Jarkko Sakkinen, James Morris,
	Serge E. Hallyn, keyrings, linux-security-module
In-Reply-To: <1599836664.4041.21.camel@HansenPartnership.com>

On Fri, Sep 11, 2020 at 08:04:24AM -0700, James Bottomley wrote:
> On Fri, 2020-09-11 at 12:44 +0100, Alex Dewar wrote:
> > In big_key.c, there are a few places where memzero_explicit + kvfree
> > is used. It is better to use kvfree_sensitive instead, which is more
> > readable and also prevents the compiler from eliding the call to
> > memzero_explicit. Fix this.
> 
> That last bit is untrue: the compiler can't elide memzero_explicit ...
> that's why it has the explicit suffix.
> 
> The original problem was a lot of people do memset(.., 0, ..); kfree()
> which the compiler can elide if it understands the memory is going out
> of scope.  Or the even more problematic memset(..., 0, ...) on a stack
> variable before it goes out of scope.
> 
> We can argue about readability but there's no secret leak here.

Ahh, my mistake. Thanks for the explanation.

I'll send a v2 with an updated commit message. I think it would still
make sense to use kfree_sensitive here as on next-20200911 this is the
last use of kzfree in the tree and it would be nice to excise it
altogether.

Best,
Alex

> 
> James
> 

^ permalink raw reply

* Re: [PATCH V2 2/3] integrity: Move import of MokListRT certs to a separate routine
From: Mimi Zohar @ 2020-09-11 15:59 UTC (permalink / raw)
  To: Lenny Szubowicz, Ard Biesheuvel
  Cc: Linux Kernel Mailing List, linux-efi, platform-driver-x86,
	linux-security-module, andy.shevchenko, James Morris, serge,
	Kees Cook, Borislav Petkov, Peter Jones, David Howells, prarit
In-Reply-To: <f0a079b1-5f02-8618-fdfe-aea2278113c9@redhat.com>

On Fri, 2020-09-11 at 11:54 -0400, Lenny Szubowicz wrote:
> On 9/11/20 11:02 AM, Ard Biesheuvel wrote:
> > On Sat, 5 Sep 2020 at 04:31, Lenny Szubowicz <lszubowi@redhat.com> wrote:
> >>
> >> Move the loading of certs from the UEFI MokListRT into a separate
> >> routine to facilitate additional MokList functionality.
> >>
> >> There is no visible functional change as a result of this patch.
> >> Although the UEFI dbx certs are now loaded before the MokList certs,
> >> they are loaded onto different key rings. So the order of the keys
> >> on their respective key rings is the same.
> >>
> >> Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
> > 
> > Why did you drop Mimi's reviewed-by from this patch?
> 
> It was not intentional. I was just not aware that I needed to propagate
> Mimi Zohar's reviewed-by from V1 of the patch to V2.
> 
> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
> 
> V2 includes changes in that patch to incorporate suggestions from
> Andy Shevchenko. My assumption was that the maintainer would
> gather up the reviewed-by and add any signed-off-by as appropriate,
> but it sounds like my assumption was incorrect. In retrospect, I
> could see that having the maintainer dig through prior versions
> of a patch set for prior reviewed-by tags could be burdensome.

As much as possible moving code should be done without making changes,
simpler for code review.   Then as a separate patch you make changes.  
That way you could also have retained my Reviewed-by.

Mimi

> 
> Advice on the expected handling of this would be appreciated.



^ permalink raw reply

* Re: [PATCH V2 2/3] integrity: Move import of MokListRT certs to a separate routine
From: Lenny Szubowicz @ 2020-09-11 15:54 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux Kernel Mailing List, linux-efi, platform-driver-x86,
	linux-security-module, andy.shevchenko, James Morris, serge,
	Kees Cook, zohar, Borislav Petkov, Peter Jones, David Howells,
	prarit
In-Reply-To: <CAMj1kXEdkdeE8VSZqEzhd__Kb7_ZmG2af6iBpbY3=nsj1-phYw@mail.gmail.com>

On 9/11/20 11:02 AM, Ard Biesheuvel wrote:
> On Sat, 5 Sep 2020 at 04:31, Lenny Szubowicz <lszubowi@redhat.com> wrote:
>>
>> Move the loading of certs from the UEFI MokListRT into a separate
>> routine to facilitate additional MokList functionality.
>>
>> There is no visible functional change as a result of this patch.
>> Although the UEFI dbx certs are now loaded before the MokList certs,
>> they are loaded onto different key rings. So the order of the keys
>> on their respective key rings is the same.
>>
>> Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
> 
> Why did you drop Mimi's reviewed-by from this patch?

It was not intentional. I was just not aware that I needed to propagate
Mimi Zohar's reviewed-by from V1 of the patch to V2.

Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>

V2 includes changes in that patch to incorporate suggestions from
Andy Shevchenko. My assumption was that the maintainer would
gather up the reviewed-by and add any signed-off-by as appropriate,
but it sounds like my assumption was incorrect. In retrospect, I
could see that having the maintainer dig through prior versions
of a patch set for prior reviewed-by tags could be burdensome.

Advice on the expected handling of this would be appreciated.

                     -Lenny.

> 
>> ---
>>   security/integrity/platform_certs/load_uefi.c | 63 +++++++++++++------
>>   1 file changed, 44 insertions(+), 19 deletions(-)
>>
>> diff --git a/security/integrity/platform_certs/load_uefi.c b/security/integrity/platform_certs/load_uefi.c
>> index 253fb9a7fc98..c1c622b4dc78 100644
>> --- a/security/integrity/platform_certs/load_uefi.c
>> +++ b/security/integrity/platform_certs/load_uefi.c
>> @@ -66,6 +66,43 @@ static __init void *get_cert_list(efi_char16_t *name, efi_guid_t *guid,
>>   }
>>
>>   /*
>> + * load_moklist_certs() - Load MokList certs
>> + *
>> + * Load the certs contained in the UEFI MokListRT database into the
>> + * platform trusted keyring.
>> + *
>> + * Return:     Status
>> + */
>> +static int __init load_moklist_certs(void)
>> +{
>> +       efi_guid_t mok_var = EFI_SHIM_LOCK_GUID;
>> +       void *mok;
>> +       unsigned long moksize;
>> +       efi_status_t status;
>> +       int rc;
>> +
>> +       /* Get MokListRT. It might not exist, so it isn't an error
>> +        * if we can't get it.
>> +        */
>> +       mok = get_cert_list(L"MokListRT", &mok_var, &moksize, &status);
>> +       if (mok) {
>> +               rc = parse_efi_signature_list("UEFI:MokListRT",
>> +                                             mok, moksize, get_handler_for_db);
>> +               kfree(mok);
>> +               if (rc)
>> +                       pr_err("Couldn't parse MokListRT signatures: %d\n", rc);
>> +               return rc;
>> +       }
>> +       if (status == EFI_NOT_FOUND)
>> +               pr_debug("MokListRT variable wasn't found\n");
>> +       else
>> +               pr_info("Couldn't get UEFI MokListRT\n");
>> +       return 0;
>> +}
>> +
>> +/*
>> + * load_uefi_certs() - Load certs from UEFI sources
>> + *
>>    * Load the certs contained in the UEFI databases into the platform trusted
>>    * keyring and the UEFI blacklisted X.509 cert SHA256 hashes into the blacklist
>>    * keyring.
>> @@ -73,17 +110,16 @@ static __init void *get_cert_list(efi_char16_t *name, efi_guid_t *guid,
>>   static int __init load_uefi_certs(void)
>>   {
>>          efi_guid_t secure_var = EFI_IMAGE_SECURITY_DATABASE_GUID;
>> -       efi_guid_t mok_var = EFI_SHIM_LOCK_GUID;
>> -       void *db = NULL, *dbx = NULL, *mok = NULL;
>> -       unsigned long dbsize = 0, dbxsize = 0, moksize = 0;
>> +       void *db = NULL, *dbx = NULL;
>> +       unsigned long dbsize = 0, dbxsize = 0;
>>          efi_status_t status;
>>          int rc = 0;
>>
>>          if (!efi_rt_services_supported(EFI_RT_SUPPORTED_GET_VARIABLE))
>>                  return false;
>>
>> -       /* Get db, MokListRT, and dbx.  They might not exist, so it isn't
>> -        * an error if we can't get them.
>> +       /* Get db and dbx.  They might not exist, so it isn't an error
>> +        * if we can't get them.
>>           */
>>          if (!uefi_check_ignore_db()) {
>>                  db = get_cert_list(L"db", &secure_var, &dbsize, &status);
>> @@ -102,20 +138,6 @@ static int __init load_uefi_certs(void)
>>                  }
>>          }
>>
>> -       mok = get_cert_list(L"MokListRT", &mok_var, &moksize, &status);
>> -       if (!mok) {
>> -               if (status == EFI_NOT_FOUND)
>> -                       pr_debug("MokListRT variable wasn't found\n");
>> -               else
>> -                       pr_info("Couldn't get UEFI MokListRT\n");
>> -       } else {
>> -               rc = parse_efi_signature_list("UEFI:MokListRT",
>> -                                             mok, moksize, get_handler_for_db);
>> -               if (rc)
>> -                       pr_err("Couldn't parse MokListRT signatures: %d\n", rc);
>> -               kfree(mok);
>> -       }
>> -
>>          dbx = get_cert_list(L"dbx", &secure_var, &dbxsize, &status);
>>          if (!dbx) {
>>                  if (status == EFI_NOT_FOUND)
>> @@ -131,6 +153,9 @@ static int __init load_uefi_certs(void)
>>                  kfree(dbx);
>>          }
>>
>> +       /* Load the MokListRT certs */
>> +       rc = load_moklist_certs();
>> +
>>          return rc;
>>   }
>>   late_initcall(load_uefi_certs);
>> --
>> 2.27.0
>>
> 


^ permalink raw reply

* [PATCH v37 11/24] x86/sgx: Add SGX enclave driver
From: Jarkko Sakkinen @ 2020-09-11 12:40 UTC (permalink / raw)
  To: x86, linux-sgx
  Cc: linux-kernel, Jarkko Sakkinen, linux-security-module, linux-mm,
	Andrew Morton, Matthew Wilcox, Jethro Beekman, Haitao Huang,
	Chunyang Hui, Jordan Hand, Nathaniel McCallum, Seth Moore,
	Darren Kenny, Sean Christopherson, Suresh Siddha,
	andriy.shevchenko, asapek, bp, cedric.xing, chenalexchen,
	conradparker, cyhanish, dave.hansen, haitao.huang, josh,
	kai.huang, kai.svahn, kmoy, ludloff, luto, nhorman, puiterwijk,
	rientjes, tglx, yaozhangx
In-Reply-To: <20200911124019.42178-1-jarkko.sakkinen@linux.intel.com>

Intel Software Guard eXtensions (SGX) is a set of CPU instructions that can
be used by applications to set aside private regions of code and data. The
code outside the SGX hosted software entity is prevented from accessing the
memory inside the enclave by the CPU. We call these entities as enclaves.

Add a driver that provides an ioctl API to construct and run enclaves.
Enclaves are constructed from pages residing in reserved physical memory
areas. The contents of these pages can only be accessed when they are
mapped as part of an enclave, by a hardware thread running inside the
enclave.

The starting state of an enclave consists of a fixed measured set of
pages that are copied to the EPC during the construction process by
using the opcode ENCLS leaf functions and Software Enclave Control
Structure (SECS) that defines the enclave properties.

Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and
EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
the EPC and EINIT checks a given signed measurement and moves the enclave
into a state ready for execution.

An initialized enclave can only be accessed through special Thread Control
Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER.  This leaf
function converts a thread into enclave mode and continues the execution in
the offset defined by the TCS provided to EENTER. An enclave is exited
through syscall, exception, interrupts or by explicitly calling another
ENCLU leaf EEXIT.

The mmap() permissions are capped by the contained enclave page
permissions. The mapped areas must also be populated, i.e. each page
address must contain a page. This logic is implemented in
sgx_encl_may_map().

Cc: linux-security-module@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Haitao Huang <haitao.huang@linux.intel.com>
Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>
Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
Tested-by: Seth Moore <sethmo@google.com>
Tested-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/Makefile |   2 +
 arch/x86/kernel/cpu/sgx/driver.c | 173 ++++++++++++++++
 arch/x86/kernel/cpu/sgx/driver.h |  29 +++
 arch/x86/kernel/cpu/sgx/encl.c   | 331 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h   |  85 ++++++++
 arch/x86/kernel/cpu/sgx/main.c   |  11 +
 6 files changed, 631 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.h

diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 79510ce01b3b..3fc451120735 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,2 +1,4 @@
 obj-y += \
+	driver.o \
+	encl.o \
 	main.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
new file mode 100644
index 000000000000..f54da5f19c2b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/acpi.h>
+#include <linux/miscdevice.h>
+#include <linux/mman.h>
+#include <linux/security.h>
+#include <linux/suspend.h>
+#include <asm/traps.h>
+#include "driver.h"
+#include "encl.h"
+
+u64 sgx_encl_size_max_32;
+u64 sgx_encl_size_max_64;
+u32 sgx_misc_reserved_mask;
+u64 sgx_attributes_reserved_mask;
+u64 sgx_xfrm_reserved_mask = ~0x3;
+u32 sgx_xsave_size_tbl[64];
+
+static int sgx_open(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl;
+	int ret;
+
+	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
+	if (!encl)
+		return -ENOMEM;
+
+	atomic_set(&encl->flags, 0);
+	kref_init(&encl->refcount);
+	xa_init(&encl->page_array);
+	mutex_init(&encl->lock);
+	INIT_LIST_HEAD(&encl->mm_list);
+	spin_lock_init(&encl->mm_lock);
+
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}
+
+	file->private_data = encl;
+
+	return 0;
+}
+
+static int sgx_release(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl = file->private_data;
+	struct sgx_encl_mm *encl_mm;
+
+	for ( ; ; )  {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
+		}
+
+		spin_unlock(&encl->mm_lock);
+
+		/* The list is empty, ready to go. */
+		if (!encl_mm)
+			break;
+
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
+	}
+
+	mutex_lock(&encl->lock);
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+	mutex_unlock(&encl->lock);
+
+	kref_put(&encl->refcount, sgx_encl_release);
+	return 0;
+}
+
+static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = file->private_data;
+	int ret;
+
+	ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end, vma->vm_flags);
+	if (ret)
+		return ret;
+
+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;
+
+	vma->vm_ops = &sgx_vm_ops;
+	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+	vma->vm_private_data = encl;
+
+	return 0;
+}
+
+static unsigned long sgx_get_unmapped_area(struct file *file,
+					   unsigned long addr,
+					   unsigned long len,
+					   unsigned long pgoff,
+					   unsigned long flags)
+{
+	if ((flags & MAP_TYPE) == MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
+static const struct file_operations sgx_encl_fops = {
+	.owner			= THIS_MODULE,
+	.open			= sgx_open,
+	.release		= sgx_release,
+	.mmap			= sgx_mmap,
+	.get_unmapped_area	= sgx_get_unmapped_area,
+};
+
+static struct miscdevice sgx_dev_enclave = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "enclave",
+	.nodename = "sgx/enclave",
+	.fops = &sgx_encl_fops,
+};
+
+int __init sgx_drv_init(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	u64 attr_mask, xfrm_mask;
+	int ret;
+	int i;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		pr_info("The public key MSRs are not writable.\n");
+		return -ENODEV;
+	}
+
+	cpuid_count(SGX_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	sgx_misc_reserved_mask = ~ebx | SGX_MISC_RESERVED_MASK;
+	sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
+	sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
+
+	cpuid_count(SGX_CPUID, 1, &eax, &ebx, &ecx, &edx);
+
+	attr_mask = (((u64)ebx) << 32) + (u64)eax;
+	sgx_attributes_reserved_mask = ~attr_mask | SGX_ATTR_RESERVED_MASK;
+
+	if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+		xfrm_mask = (((u64)edx) << 32) + (u64)ecx;
+
+		for (i = 2; i < 64; i++) {
+			cpuid_count(0x0D, i, &eax, &ebx, &ecx, &edx);
+			if ((1UL << i) & xfrm_mask)
+				sgx_xsave_size_tbl[i] = eax + ebx;
+		}
+
+		sgx_xfrm_reserved_mask = ~xfrm_mask;
+	}
+
+	ret = misc_register(&sgx_dev_enclave);
+	if (ret) {
+		pr_err("Creating /dev/sgx/enclave failed with %d.\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
new file mode 100644
index 000000000000..f7ce40dedc91
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef __ARCH_SGX_DRIVER_H__
+#define __ARCH_SGX_DRIVER_H__
+
+#include <crypto/hash.h>
+#include <linux/kref.h>
+#include <linux/mmu_notifier.h>
+#include <linux/radix-tree.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include "sgx.h"
+
+#define SGX_EINIT_SPIN_COUNT	20
+#define SGX_EINIT_SLEEP_COUNT	50
+#define SGX_EINIT_SLEEP_TIME	20
+
+extern u64 sgx_encl_size_max_32;
+extern u64 sgx_encl_size_max_64;
+extern u32 sgx_misc_reserved_mask;
+extern u64 sgx_attributes_reserved_mask;
+extern u64 sgx_xfrm_reserved_mask;
+extern u32 sgx_xsave_size_tbl[64];
+
+long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
+
+int sgx_drv_init(void);
+
+#endif /* __ARCH_X86_SGX_DRIVER_H__ */
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
new file mode 100644
index 000000000000..c2c4a77af36b
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/lockdep.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/shmem_fs.h>
+#include <linux/suspend.h>
+#include <linux/sched/mm.h>
+#include "arch.h"
+#include "encl.h"
+#include "encls.h"
+#include "sgx.h"
+
+static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+						unsigned long addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned int flags;
+
+	/* If process was forked, VMA is still there but vm_private_data is set
+	 * to NULL.
+	 */
+	if (!encl)
+		return ERR_PTR(-EFAULT);
+
+	flags = atomic_read(&encl->flags);
+	if ((flags & SGX_ENCL_DEAD) || !(flags & SGX_ENCL_INITIALIZED))
+		return ERR_PTR(-EFAULT);
+
+	entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+	if (!entry)
+		return ERR_PTR(-EFAULT);
+
+	/* Page is already resident in the EPC. */
+	if (entry->epc_page)
+		return entry;
+
+	return ERR_PTR(-EFAULT);
+}
+
+static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
+	struct sgx_encl_mm *tmp = NULL;
+
+	/*
+	 * The enclave itself can remove encl_mm.  Note, objects can't be moved
+	 * off an RCU protected list, but deletion is ok.
+	 */
+	spin_lock(&encl_mm->encl->mm_lock);
+	list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
+		if (tmp == encl_mm) {
+			list_del_rcu(&encl_mm->list);
+			break;
+		}
+	}
+	spin_unlock(&encl_mm->encl->mm_lock);
+
+	if (tmp == encl_mm) {
+		synchronize_srcu(&encl_mm->encl->srcu);
+		mmu_notifier_put(mn);
+	}
+}
+
+static void sgx_mmu_notifier_free(struct mmu_notifier *mn)
+{
+	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
+
+	kfree(encl_mm);
+}
+
+static const struct mmu_notifier_ops sgx_mmu_notifier_ops = {
+	.release		= sgx_mmu_notifier_release,
+	.free_notifier		= sgx_mmu_notifier_free,
+};
+
+static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
+					    struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = NULL;
+	struct sgx_encl_mm *tmp;
+	int idx;
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
+		if (tmp->mm == mm) {
+			encl_mm = tmp;
+			break;
+		}
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return encl_mm;
+}
+
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm;
+	int ret;
+
+	/* mm_list can be accessed only by a single thread at a time. */
+	mmap_assert_write_locked(mm);
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
+		return -EINVAL;
+
+	/*
+	 * mm_structs are kept on mm_list until the mm or the enclave dies,
+	 * i.e. once an mm is off the list, it's gone for good, therefore it's
+	 * impossible to get a false positive on @mm due to a stale mm_list.
+	 */
+	if (sgx_encl_find_mm(encl, mm))
+		return 0;
+
+	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
+	if (!encl_mm)
+		return -ENOMEM;
+
+	encl_mm->encl = encl;
+	encl_mm->mm = mm;
+	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
+
+	ret = __mmu_notifier_register(&encl_mm->mmu_notifier, mm);
+	if (ret) {
+		kfree(encl_mm);
+		return ret;
+	}
+
+	spin_lock(&encl->mm_lock);
+	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	spin_unlock(&encl->mm_lock);
+
+	return 0;
+}
+
+static void sgx_vma_open(struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = vma->vm_private_data;
+
+	if (!encl)
+		return;
+
+	if (sgx_encl_mm_add(encl, vma->vm_mm))
+		vma->vm_private_data = NULL;
+}
+
+static unsigned int sgx_vma_fault(struct vm_fault *vmf)
+{
+	unsigned long addr = (unsigned long)vmf->address;
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_encl *encl = vma->vm_private_data;
+	struct sgx_encl_page *entry;
+	int ret = VM_FAULT_NOPAGE;
+	unsigned long pfn;
+
+	if (!encl)
+		return VM_FAULT_SIGBUS;
+
+	mutex_lock(&encl->lock);
+
+	entry = sgx_encl_load_page(encl, addr);
+	if (IS_ERR(entry)) {
+		if (unlikely(PTR_ERR(entry) != -EBUSY))
+			ret = VM_FAULT_SIGBUS;
+
+		goto out;
+	}
+
+	if (!follow_pfn(vma, addr, &pfn))
+		goto out;
+
+	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(entry->epc_page->desc));
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
+/**
+ * sgx_encl_may_map() - Check if a requested VMA mapping is allowed
+ * @encl:		an enclave pointer
+ * @start:		lower bound of the address range, inclusive
+ * @end:		upper bound of the address range, exclusive
+ * @vm_prot_bits:	requested protections of the address range
+ *
+ * Iterate through the enclave pages contained within [@start, @end) to verify
+ * the permissions requested by @vm_prot_bits do not exceed that of any enclave
+ * page to be mapped.
+ *
+ * Return:
+ *   0 on success,
+ *   -EACCES if VMA permissions exceed enclave page permissions
+ */
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_flags)
+{
+	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+	unsigned long idx_start = PFN_DOWN(start);
+	unsigned long idx_end = PFN_DOWN(end - 1);
+	struct sgx_encl_page *page;
+
+	XA_STATE(xas, &encl->page_array, idx_start);
+
+	/*
+	 * Disallow READ_IMPLIES_EXEC tasks as their VMA permissions might
+	 * conflict with the enclave page permissions.
+	 */
+	if (current->personality & READ_IMPLIES_EXEC)
+		return -EACCES;
+
+	xas_for_each(&xas, page, idx_end)
+		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
+			return -EACCES;
+
+	return 0;
+}
+
+static int sgx_vma_mprotect(struct vm_area_struct *vma,
+			    struct vm_area_struct **pprev, unsigned long start,
+			    unsigned long end, unsigned long newflags)
+{
+	int ret;
+
+	ret = sgx_encl_may_map(vma->vm_private_data, start, end, newflags);
+	if (ret)
+		return ret;
+
+	return mprotect_fixup(vma, pprev, start, end, newflags);
+}
+
+const struct vm_operations_struct sgx_vm_ops = {
+	.open = sgx_vma_open,
+	.fault = sgx_vma_fault,
+	.mprotect = sgx_vma_mprotect,
+};
+
+/**
+ * sgx_encl_find - find an enclave
+ * @mm:		mm struct of the current process
+ * @addr:	address in the ELRANGE
+ * @vma:	the resulting VMA
+ *
+ * Find an enclave identified by the given address. Give back a VMA that is
+ * part of the enclave and located in that address. The VMA is given back if it
+ * is a proper enclave VMA even if an &sgx_encl instance does not exist yet
+ * (enclave creation has not been performed).
+ *
+ * Return:
+ *   0 on success,
+ *   -EINVAL if an enclave was not found,
+ *   -ENOENT if the enclave has not been created yet
+ */
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma)
+{
+	struct vm_area_struct *result;
+	struct sgx_encl *encl;
+
+	result = find_vma(mm, addr);
+	if (!result || result->vm_ops != &sgx_vm_ops || addr < result->vm_start)
+		return -EINVAL;
+
+	encl = result->vm_private_data;
+	*vma = result;
+
+	return encl ? 0 : -ENOENT;
+}
+
+/**
+ * sgx_encl_destroy() - destroy enclave resources
+ * @encl:	an enclave pointer
+ */
+void sgx_encl_destroy(struct sgx_encl *encl)
+{
+	struct sgx_encl_page *entry;
+	unsigned long index;
+
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+
+	xa_for_each(&encl->page_array, index, entry) {
+		if (entry->epc_page) {
+			sgx_free_epc_page(entry->epc_page);
+			encl->secs_child_cnt--;
+			entry->epc_page = NULL;
+		}
+
+		kfree(entry);
+	}
+
+	xa_destroy(&encl->page_array);
+
+	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_free_epc_page(encl->secs.epc_page);
+		encl->secs.epc_page = NULL;
+	}
+}
+
+/**
+ * sgx_encl_release - Destroy an enclave instance
+ * @kref:	address of a kref inside &sgx_encl
+ *
+ * Used together with kref_put(). Frees all the resources associated with the
+ * enclave and the instance itself.
+ */
+void sgx_encl_release(struct kref *ref)
+{
+	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
+
+	sgx_encl_destroy(encl);
+
+	if (encl->backing)
+		fput(encl->backing);
+
+	cleanup_srcu_struct(&encl->srcu);
+
+	WARN_ON_ONCE(!list_empty(&encl->mm_list));
+
+	/* Detect EPC page leak's. */
+	WARN_ON_ONCE(encl->secs_child_cnt);
+	WARN_ON_ONCE(encl->secs.epc_page);
+
+	kfree(encl);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
new file mode 100644
index 000000000000..8ff445476657
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+#ifndef _X86_ENCL_H
+#define _X86_ENCL_H
+
+#include <linux/cpumask.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mm_types.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/srcu.h>
+#include <linux/workqueue.h>
+#include <linux/xarray.h>
+#include "sgx.h"
+
+/**
+ * enum sgx_encl_page_desc - defines bits for an enclave page's descriptor
+ * %SGX_ENCL_PAGE_ADDR_MASK:		Holds the virtual address of the page.
+ *
+ * The page address for SECS is zero and is used by the subsystem to recognize
+ * the SECS page.
+ */
+enum sgx_encl_page_desc {
+	/* Bits 11:3 are available when the page is not swapped. */
+	SGX_ENCL_PAGE_ADDR_MASK		= PAGE_MASK,
+};
+
+#define SGX_ENCL_PAGE_ADDR(page) \
+	((page)->desc & SGX_ENCL_PAGE_ADDR_MASK)
+
+struct sgx_encl_page {
+	unsigned long desc;
+	unsigned long vm_max_prot_bits;
+	struct sgx_epc_page *epc_page;
+	struct sgx_encl *encl;
+};
+
+enum sgx_encl_flags {
+	SGX_ENCL_CREATED	= BIT(0),
+	SGX_ENCL_INITIALIZED	= BIT(1),
+	SGX_ENCL_DEBUG		= BIT(2),
+	SGX_ENCL_DEAD		= BIT(3),
+	SGX_ENCL_IOCTL		= BIT(4),
+};
+
+struct sgx_encl_mm {
+	struct sgx_encl *encl;
+	struct mm_struct *mm;
+	struct list_head list;
+	struct mmu_notifier mmu_notifier;
+};
+
+struct sgx_encl {
+	atomic_t flags;
+	unsigned int page_cnt;
+	unsigned int secs_child_cnt;
+	struct mutex lock;
+	struct list_head mm_list;
+	spinlock_t mm_lock;
+	struct file *backing;
+	struct kref refcount;
+	struct srcu_struct srcu;
+	unsigned long base;
+	unsigned long size;
+	unsigned long ssaframesize;
+	struct xarray page_array;
+	struct sgx_encl_page secs;
+	cpumask_t cpumask;
+};
+
+extern const struct vm_operations_struct sgx_vm_ops;
+
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma);
+void sgx_encl_destroy(struct sgx_encl *encl);
+void sgx_encl_release(struct kref *ref);
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_flags);
+
+#endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 97c6895fb6c9..4137254fb29e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -9,6 +9,8 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include "driver.h"
+#include "encl.h"
 #include "encls.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
@@ -260,6 +262,8 @@ static bool __init sgx_page_cache_init(void)
 
 static void __init sgx_init(void)
 {
+	int ret;
+
 	if (!boot_cpu_has(X86_FEATURE_SGX))
 		return;
 
@@ -269,8 +273,15 @@ static void __init sgx_init(void)
 	if (!sgx_page_reclaimer_init())
 		goto err_page_cache;
 
+	ret = sgx_drv_init();
+	if (ret)
+		goto err_kthread;
+
 	return;
 
+err_kthread:
+	kthread_stop(ksgxswapd_tsk);
+
 err_page_cache:
 	sgx_page_cache_teardown();
 }
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] socket.7,unix.7: add initial description for SO_PEERSEC
From: Stephen Smalley @ 2020-09-11 12:20 UTC (permalink / raw)
  To: Simon McVittie, James Morris, Serge E. Hallyn
  Cc: Michael Kerrisk, linux-man, LSM List, SElinux list
In-Reply-To: <20200911092320.GA1302235@horizon>

On Fri, Sep 11, 2020 at 5:23 AM Simon McVittie <smcv@collabora.com> wrote:
>
> On Thu, 10 Sep 2020 at 17:00:59 -0400, Stephen Smalley wrote:
> > +For SELinux, the security context string is a null-terminated
> > +string and the returned length includes the terminating null.
> > +Other security modules may differ.
>
> We discussed this interface a while ago when I was setting up dbus to
> use SO_PEERSEC. It would be really useful if the man page documented
> what callers can and can't expect from an unknown LSM, so that the
> author of the next D-Bus-equivalent doesn't have to turn up on the
> linux-security-module list and annoy maintainers like I did.
>
> Perhaps something like this?
>
>     The security context string may include a terminating null character
>     in the returned length, but is not guaranteed to do so:
>     a security context "foo" might be represented as either {'f','o','o'}
>     of length 3 or {'f','o','o','\0'} of length 4, which are considered
>     to be interchangeable. It is printable, does not contain non-terminating
>     null characters, and is in an unspecified encoding (in particular it is
>     not guaranteed to be ASCII or UTF-8).

Works for me.  Do the security subsystem maintainers concur?

^ permalink raw reply

* Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)
From: Mickaël Salaün @ 2020-09-11 12:16 UTC (permalink / raw)
  To: Matthew Wilcox, Al Viro
  Cc: Mimi Zohar, linux-kernel, Aleksa Sarai, Alexei Starovoitov,
	Andrew Morton, Andy Lutomirski, Arnd Bergmann, Casey Schaufler,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Kees Cook, Lakshmi Ramasubramanian,
	Matthew Garrett, Michael Kerrisk, Miklos Szeredi,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <20200910200543.GY6583@casper.infradead.org>


On 10/09/2020 22:05, Matthew Wilcox wrote:
> On Thu, Sep 10, 2020 at 09:00:10PM +0100, Al Viro wrote:
>> On Thu, Sep 10, 2020 at 07:40:33PM +0100, Matthew Wilcox wrote:
>>> On Thu, Sep 10, 2020 at 08:38:21PM +0200, Mickaël Salaün wrote:
>>>> There is also the use case of noexec mounts and file permissions. From
>>>> user space point of view, it doesn't matter which kernel component is in
>>>> charge of defining the policy. The syscall should then not be tied with
>>>> a verification/integrity/signature/appraisal vocabulary, but simply an
>>>> access control one.
>>>
>>> permission()?
>>
>> int lsm(int fd, const char *how, char *error, int size);
>>
>> Seriously, this is "ask LSM to apply special policy to file"; let's
>> _not_ mess with flags, etc. for that; give it decent bandwidth
>> and since it's completely opaque for the rest of the kernel,
>> just a pass a string to be parsed by LSM as it sees fit.

Well, I don't know why you're so angry against LSM, but as noticed by
Matthew, the main focus of this patch series is not about LSM (no hook,
no security/* code, only file permission and mount option checks,
nothing fancy). Moreover, the syscall you're proposing doesn't make
sense, but I guess it's yet another sarcastic reply. Please, cool down.
We asked for constructive comments and already followed your previous
requests (even if we didn't get answers for some questions), but
seriously, this one is nonsense.

> 
> Hang on, it does have some things which aren't BD^W^WLSM.  It lets
> the interpreter honour the mount -o noexec option.  I presume it's
> not easily defeated by
> 	cat /home/salaun/bin/bad.pl | perl -
> 

Funny. I know there is a lot of text and links but please read the
commit messages before further comments.

^ permalink raw reply

* [PATCH] security: keys: Use kvfree_sensitive in a few places
From: Alex Dewar @ 2020-09-11 11:44 UTC (permalink / raw)
  Cc: Alex Dewar, David Howells, Jarkko Sakkinen, James Morris,
	Serge E. Hallyn, keyrings, linux-security-module, linux-kernel

In big_key.c, there are a few places where memzero_explicit + kvfree is
used. It is better to use kvfree_sensitive instead, which is more
readable and also prevents the compiler from eliding the call to
memzero_explicit. Fix this.

Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
---
 security/keys/big_key.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/security/keys/big_key.c b/security/keys/big_key.c
index 691347dea3c1..d17e5f09eeb8 100644
--- a/security/keys/big_key.c
+++ b/security/keys/big_key.c
@@ -121,8 +121,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 		*path = file->f_path;
 		path_get(path);
 		fput(file);
-		memzero_explicit(buf, enclen);
-		kvfree(buf);
+		kvfree_sensitive(buf, enclen);
 	} else {
 		/* Just store the data in a buffer */
 		void *data = kmalloc(datalen, GFP_KERNEL);
@@ -140,8 +139,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 err_enckey:
 	kfree_sensitive(enckey);
 error:
-	memzero_explicit(buf, enclen);
-	kvfree(buf);
+	kvfree_sensitive(buf, enclen);
 	return ret;
 }
 
@@ -273,8 +271,7 @@ long big_key_read(const struct key *key, char *buffer, size_t buflen)
 err_fput:
 		fput(file);
 error:
-		memzero_explicit(buf, enclen);
-		kvfree(buf);
+		kvfree_sensitive(buf, enclen);
 	} else {
 		ret = datalen;
 		memcpy(buffer, key->payload.data[big_key_data], datalen);
-- 
2.28.0


^ permalink raw reply related

* Re: [PATCH] socket.7,unix.7: add initial description for SO_PEERSEC
From: Simon McVittie @ 2020-09-11  9:23 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: mtk.manpages, linux-man, linux-security-module, selinux
In-Reply-To: <20200910210059.34759-1-stephen.smalley.work@gmail.com>

On Thu, 10 Sep 2020 at 17:00:59 -0400, Stephen Smalley wrote:
> +For SELinux, the security context string is a null-terminated
> +string and the returned length includes the terminating null.
> +Other security modules may differ.

We discussed this interface a while ago when I was setting up dbus to
use SO_PEERSEC. It would be really useful if the man page documented
what callers can and can't expect from an unknown LSM, so that the
author of the next D-Bus-equivalent doesn't have to turn up on the
linux-security-module list and annoy maintainers like I did.

Perhaps something like this?

    The security context string may include a terminating null character
    in the returned length, but is not guaranteed to do so:
    a security context "foo" might be represented as either {'f','o','o'}
    of length 3 or {'f','o','o','\0'} of length 4, which are considered
    to be interchangeable. It is printable, does not contain non-terminating
    null characters, and is in an unspecified encoding (in particular it is
    not guaranteed to be ASCII or UTF-8).

Thanks,
    smcv

^ permalink raw reply

* Re: [PATCH v6 0/8] crpyto: introduce OSCCA certificate and SM2 asymmetric algorithm
From: Herbert Xu @ 2020-09-11  4:24 UTC (permalink / raw)
  To: Tianjia Zhang
  Cc: David S. Miller, David Howells, Maxime Coquelin, Alexandre Torgue,
	James Morris, Serge E. Hallyn, Stephan Mueller,
	Marcelo Henrique Cerri, Steven Rostedt (VMware), Masahiro Yamada,
	Brendan Higgins, Andrew Morton, Johannes Weiner, Waiman Long,
	Mimi Zohar, Lakshmi Ramasubramanian, Colin Ian King,
	Tushar Sugandhi, Vitaly Chikunov, Gilad Ben-Yossef,
	Pascal van Leeuwen, linux-crypto, linux-kernel, keyrings,
	linux-stm32, linux-arm-kernel, linux-security-module,
	Xufeng Zhang, Jia Zhang
In-Reply-To: <20200903131242.128665-1-tianjia.zhang@linux.alibaba.com>

On Thu, Sep 03, 2020 at 09:12:34PM +0800, Tianjia Zhang wrote:
>
> ---
> v6 changes:
>   1. remove mpi_sub_ui function from mpi library.
>   2. rebase on mainline.

This series is still missing acks for patches 6-8.  Without them
it cannot proceed.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [RFC PATCH 6/6] security/fbfam: Mitigate a fork brute force attack
From: Jann Horn @ 2020-09-11  0:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kernel Hardening, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, kernel list, linux-fsdevel,
	linux-security-module
In-Reply-To: <202009101649.2A0BF95@keescook>

On Fri, Sep 11, 2020 at 1:56 AM Kees Cook <keescook@chromium.org> wrote:
> On Thu, Sep 10, 2020 at 01:21:07PM -0700, Kees Cook wrote:
> > From: John Wood <john.wood@gmx.com>
> >
> > In order to mitigate a fork brute force attack it is necessary to kill
> > all the offending tasks. This tasks are all the ones that share the
> > statistical data with the current task (the task that has crashed).
> >
> > Since the attack detection is done in the function fbfam_handle_attack()
> > that is called every time a core dump is triggered, only is needed to
> > kill the others tasks that share the same statistical data, not the
> > current one as this is in the path to be killed.
[...]
> > +     for_each_process(p) {
> > +             if (p == current || p->fbfam_stats != stats)
> > +                     continue;
> > +
> > +             do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> > +             pr_warn("fbfam: Offending process with PID %d killed\n",
> > +                     p->pid);
[...]
> > +
> > +             killed += 1;
> > +             if (killed >= to_kill)
> > +                     break;
> > +     }
> > +
> > +     rcu_read_unlock();
>
> Can't newly created processes escape this RCU read lock? I think this
> need alternate locking, or something in the task_alloc hook that will
> block any new process from being created within the stats group.

Good point; the proper way to deal with this would probably be to take
the tasklist_lock in read mode around this loop (with
read_lock(&tasklist_lock) / read_unlock(&tasklist_lock)), which pairs
with the write_lock_irq(&tasklist_lock) in copy_process(). Thanks to
the fatal_signal_pending() check while holding the lock in
copy_process(), that would be race-free - any fork() that has not yet
inserted the new task into the global task list would wait for us to
drop the tasklist_lock, then bail out at the fatal_signal_pending()
check.

^ permalink raw reply

* Re: [RFC PATCH 5/6] security/fbfam: Detect a fork brute force attack
From: Jann Horn @ 2020-09-11  0:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kernel Hardening, John Wood, Matthew Wilcox, Jonathan Corbet,
	Alexander Viro, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Iurii Zaikin, James Morris,
	Serge E. Hallyn, linux-doc, kernel list, linux-fsdevel,
	linux-security-module
In-Reply-To: <202009101634.52ED6751AD@keescook>

On Fri, Sep 11, 2020 at 1:49 AM Kees Cook <keescook@chromium.org> wrote:
> On Thu, Sep 10, 2020 at 01:21:06PM -0700, Kees Cook wrote:
> > From: John Wood <john.wood@gmx.com>
> >
> > To detect a fork brute force attack it is necessary to compute the
> > crashing rate of the application. This calculation is performed in each
> > fatal fail of a task, or in other words, when a core dump is triggered.
> > If this rate shows that the application is crashing quickly, there is a
> > clear signal that an attack is happening.
> >
> > Since the crashing rate is computed in milliseconds per fault, if this
> > rate goes under a certain threshold a warning is triggered.
> >
> > Signed-off-by: John Wood <john.wood@gmx.com>
> > ---
> >  fs/coredump.c          |  2 ++
> >  include/fbfam/fbfam.h  |  2 ++
> >  security/fbfam/fbfam.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 43 insertions(+)
> >
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index 76e7c10edfc0..d4ba4e1828d5 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -51,6 +51,7 @@
> >  #include "internal.h"
> >
> >  #include <trace/events/sched.h>
> > +#include <fbfam/fbfam.h>
> >
> >  int core_uses_pid;
> >  unsigned int core_pipe_limit;
> > @@ -825,6 +826,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> >  fail_creds:
> >       put_cred(cred);
> >  fail:
> > +     fbfam_handle_attack(siginfo->si_signo);
>
> I don't think this is the right place for detecting a crash -- isn't
> this only for the "dumping core" condition? In other words, don't you
> want to do this in get_signal()'s "fatal" block? (i.e. very close to the
> do_coredump, but without the "should I dump?" check?)
>
> Hmm, but maybe I'm wrong? It looks like you're looking at noticing the
> process taking a signal from SIG_KERNEL_COREDUMP_MASK ?
>
> (Better yet: what are fatal conditions that do NOT match
> SIG_KERNEL_COREDUMP_MASK, and should those be covered?)
>
> Regardless, *this* looks like the only place without an LSM hook. And it
> doesn't seem unreasonable to add one here. I assume it would probably
> just take the siginfo pointer, which is also what you're checking.

Good point, making this an LSM might be a good idea.

> e.g. for include/linux/lsm_hook_defs.h:
>
> LSM_HOOK(int, 0, task_coredump, const kernel_siginfo_t *siginfo);

I guess it should probably be an LSM_RET_VOID hook? And since, as you
said, it's not really semantically about core dumping, maybe it should
be named task_fatal_signal or something like that.

^ permalink raw reply

* Re: [RESEND][RFC PATCH 0/6] Fork brute force attack mitigation (fbfam)
From: Kees Cook @ 2020-09-10 23:58 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-1-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:01PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> The goal of this patch serie is to detect and mitigate a fork brute force
> attack.

Thanks for this RFC! I'm excited to get this problem finally handled in
the kernel. Hopefully the feedback is useful. :)

-- 
Kees Cook

^ permalink raw reply

* Re: [RFC PATCH 6/6] security/fbfam: Mitigate a fork brute force attack
From: Kees Cook @ 2020-09-10 23:56 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-7-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:07PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> In order to mitigate a fork brute force attack it is necessary to kill
> all the offending tasks. This tasks are all the ones that share the
> statistical data with the current task (the task that has crashed).
> 
> Since the attack detection is done in the function fbfam_handle_attack()
> that is called every time a core dump is triggered, only is needed to
> kill the others tasks that share the same statistical data, not the
> current one as this is in the path to be killed.
> 
> When the SIGKILL signal is sent to the offending tasks from the function
> fbfam_kill_tasks(), this one will be called again during the core dump
> due to the shared statistical data shows a quickly crashing rate. So, to
> avoid kill again the same tasks due to a recursive call of this
> function, it is necessary to disable the attack detection.
> 
> To disable this attack detection, add a condition in the function
> fbfam_handle_attack() to not compute the crashing rate when the jiffies
> stored in the statistical data are set to zero.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>
> ---
>  security/fbfam/fbfam.c | 76 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
> index 3aa669e4ea51..173a6122390f 100644
> --- a/security/fbfam/fbfam.c
> +++ b/security/fbfam/fbfam.c
> @@ -4,8 +4,11 @@
>  #include <linux/errno.h>
>  #include <linux/gfp.h>
>  #include <linux/jiffies.h>
> +#include <linux/pid.h>
>  #include <linux/printk.h>
> +#include <linux/rcupdate.h>
>  #include <linux/refcount.h>
> +#include <linux/sched/signal.h>
>  #include <linux/signal.h>
>  #include <linux/slab.h>
>  
> @@ -24,7 +27,8 @@ unsigned long sysctl_crashing_rate_threshold = 30000;
>   * struct fbfam_stats - Fork brute force attack mitigation statistics.
>   * @refc: Reference counter.
>   * @faults: Number of crashes since jiffies.
> - * @jiffies: First fork or execve timestamp.
> + * @jiffies: First fork or execve timestamp. If zero, the attack detection is
> + *           disabled.
>   *
>   * The purpose of this structure is to manage all the necessary information to
>   * compute the crashing rate of an application. So, it holds a first fork or
> @@ -175,13 +179,69 @@ int fbfam_exit(void)
>  }
>  
>  /**
> - * fbfam_handle_attack() - Fork brute force attack detection.
> + * fbfam_kill_tasks() - Kill the offending tasks
> + *
> + * When a fork brute force attack is detected it is necessary to kill all the
> + * offending tasks. Since this function is called from fbfam_handle_attack(),
> + * and so, every time a core dump is triggered, only is needed to kill the
> + * others tasks that share the same statistical data, not the current one as
> + * this is in the path to be killed.
> + *
> + * When the SIGKILL signal is sent to the offending tasks, this function will be
> + * called again during the core dump due to the shared statistical data shows a
> + * quickly crashing rate. So, to avoid kill again the same tasks due to a
> + * recursive call of this function, it is necessary to disable the attack
> + * detection setting the jiffies to zero.
> + *
> + * To improve the for_each_process loop it is possible to end it when all the
> + * tasks that shared the same statistics are found.
> + *
> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +static int fbfam_kill_tasks(void)
> +{
> +	struct fbfam_stats *stats = current->fbfam_stats;
> +	struct task_struct *p;
> +	unsigned int to_kill, killed = 0;
> +
> +	if (!stats)
> +		return -EFAULT;
> +
> +	to_kill = refcount_read(&stats->refc) - 1;
> +	if (!to_kill)
> +		return 0;
> +
> +	/* Disable the attack detection */
> +	stats->jiffies = 0;
> +	rcu_read_lock();
> +
> +	for_each_process(p) {
> +		if (p == current || p->fbfam_stats != stats)
> +			continue;
> +
> +		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> +		pr_warn("fbfam: Offending process with PID %d killed\n",
> +			p->pid);

I'd make this ratelimited (along with Jann's suggestions). Also, instead
of the explicit "fbfam:" prefix, use the regular prefixing method:

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt


> +
> +		killed += 1;
> +		if (killed >= to_kill)
> +			break;
> +	}
> +
> +	rcu_read_unlock();

Can't newly created processes escape this RCU read lock? I think this
need alternate locking, or something in the task_alloc hook that will
block any new process from being created within the stats group.

> +	return 0;
> +}
> +
> +/**
> + * fbfam_handle_attack() - Fork brute force attack detection and mitigation.
>   * @signal: Signal number that causes the core dump.
>   *
>   * The crashing rate of an application is computed in milliseconds per fault in
>   * each crash. So, if this rate goes under a certain threshold there is a clear
>   * signal that the application is crashing quickly. At this moment, a fork brute
> - * force attack is happening.
> + * force attack is happening. Under this scenario it is necessary to kill all
> + * the offending tasks in order to mitigate the attack.
>   *
>   * Return: -EFAULT if the current task doesn't have statistical data. Zero
>   *         otherwise.
> @@ -195,6 +255,10 @@ int fbfam_handle_attack(int signal)
>  	if (!stats)
>  		return -EFAULT;
>  
> +	/* The attack detection is disabled */
> +	if (!stats->jiffies)
> +		return 0;
> +
>  	if (!(signal == SIGILL || signal == SIGBUS || signal == SIGKILL ||
>  	      signal == SIGSEGV || signal == SIGSYS))
>  		return 0;
> @@ -205,9 +269,11 @@ int fbfam_handle_attack(int signal)
>  	delta_time = jiffies64_to_msecs(delta_jiffies);
>  	crashing_rate = delta_time / (u64)stats->faults;
>  
> -	if (crashing_rate < (u64)sysctl_crashing_rate_threshold)
> -		pr_warn("fbfam: Fork brute force attack detected\n");
> +	if (crashing_rate >= (u64)sysctl_crashing_rate_threshold)
> +		return 0;
>  
> +	pr_warn("fbfam: Fork brute force attack detected\n");
> +	fbfam_kill_tasks();
>  	return 0;
>  }
>  
> -- 
> 2.25.1
> 

-- 
Kees Cook

^ permalink raw reply

* Re: [RFC PATCH 5/6] security/fbfam: Detect a fork brute force attack
From: Kees Cook @ 2020-09-10 23:49 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-6-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:06PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> To detect a fork brute force attack it is necessary to compute the
> crashing rate of the application. This calculation is performed in each
> fatal fail of a task, or in other words, when a core dump is triggered.
> If this rate shows that the application is crashing quickly, there is a
> clear signal that an attack is happening.
> 
> Since the crashing rate is computed in milliseconds per fault, if this
> rate goes under a certain threshold a warning is triggered.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>
> ---
>  fs/coredump.c          |  2 ++
>  include/fbfam/fbfam.h  |  2 ++
>  security/fbfam/fbfam.c | 39 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 43 insertions(+)
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 76e7c10edfc0..d4ba4e1828d5 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -51,6 +51,7 @@
>  #include "internal.h"
>  
>  #include <trace/events/sched.h>
> +#include <fbfam/fbfam.h>
>  
>  int core_uses_pid;
>  unsigned int core_pipe_limit;
> @@ -825,6 +826,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>  fail_creds:
>  	put_cred(cred);
>  fail:
> +	fbfam_handle_attack(siginfo->si_signo);

I don't think this is the right place for detecting a crash -- isn't
this only for the "dumping core" condition? In other words, don't you
want to do this in get_signal()'s "fatal" block? (i.e. very close to the
do_coredump, but without the "should I dump?" check?)

Hmm, but maybe I'm wrong? It looks like you're looking at noticing the
process taking a signal from SIG_KERNEL_COREDUMP_MASK ?

(Better yet: what are fatal conditions that do NOT match
SIG_KERNEL_COREDUMP_MASK, and should those be covered?)

Regardless, *this* looks like the only place without an LSM hook. And it
doesn't seem unreasonable to add one here. I assume it would probably
just take the siginfo pointer, which is also what you're checking.

e.g. for include/linux/lsm_hook_defs.h:

LSM_HOOK(int, 0, task_coredump, const kernel_siginfo_t *siginfo);


>  	return;
>  }
>  
> diff --git a/include/fbfam/fbfam.h b/include/fbfam/fbfam.h
> index 2cfe51d2b0d5..9ac8e33d8291 100644
> --- a/include/fbfam/fbfam.h
> +++ b/include/fbfam/fbfam.h
> @@ -12,10 +12,12 @@ extern struct ctl_table fbfam_sysctls[];
>  int fbfam_fork(struct task_struct *child);
>  int fbfam_execve(void);
>  int fbfam_exit(void);
> +int fbfam_handle_attack(int signal);
>  #else
>  static inline int fbfam_fork(struct task_struct *child) { return 0; }
>  static inline int fbfam_execve(void) { return 0; }
>  static inline int fbfam_exit(void) { return 0; }
> +static inline int fbfam_handle_attack(int signal) { return 0; }
>  #endif
>  
>  #endif /* _FBFAM_H_ */
> diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
> index 9be4639b72eb..3aa669e4ea51 100644
> --- a/security/fbfam/fbfam.c
> +++ b/security/fbfam/fbfam.c
> @@ -4,7 +4,9 @@
>  #include <linux/errno.h>
>  #include <linux/gfp.h>
>  #include <linux/jiffies.h>
> +#include <linux/printk.h>
>  #include <linux/refcount.h>
> +#include <linux/signal.h>
>  #include <linux/slab.h>
>  
>  /**
> @@ -172,3 +174,40 @@ int fbfam_exit(void)
>  	return 0;
>  }
>  
> +/**
> + * fbfam_handle_attack() - Fork brute force attack detection.
> + * @signal: Signal number that causes the core dump.
> + *
> + * The crashing rate of an application is computed in milliseconds per fault in
> + * each crash. So, if this rate goes under a certain threshold there is a clear
> + * signal that the application is crashing quickly. At this moment, a fork brute
> + * force attack is happening.
> + *
> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +int fbfam_handle_attack(int signal)
> +{
> +	struct fbfam_stats *stats = current->fbfam_stats;
> +	u64 delta_jiffies, delta_time;
> +	u64 crashing_rate;
> +
> +	if (!stats)
> +		return -EFAULT;
> +
> +	if (!(signal == SIGILL || signal == SIGBUS || signal == SIGKILL ||
> +	      signal == SIGSEGV || signal == SIGSYS))
> +		return 0;

This will only be called for:

#define SIG_KERNEL_COREDUMP_MASK (\
        rt_sigmask(SIGQUIT)   |  rt_sigmask(SIGILL)    | \
        rt_sigmask(SIGTRAP)   |  rt_sigmask(SIGABRT)   | \
        rt_sigmask(SIGFPE)    |  rt_sigmask(SIGSEGV)   | \
        rt_sigmask(SIGBUS)    |  rt_sigmask(SIGSYS)    | \
        rt_sigmask(SIGXCPU)   |  rt_sigmask(SIGXFSZ)   | \
        SIGEMT_MASK                                    )

So you're skipping:

	SIGQUIT
	SIGTRAP
	SIGABRT
	SIGFPE
	SIGXCPU
	SIGXFSZ
	SIGEMT_MASK

I would include SIGABRT (e.g. glibc will call abort() for stack
canary, malloc, etc failures, which may indicate a mitigation has
fired).

-- 
Kees Cook

^ permalink raw reply

* Re: [RFC PATCH 3/6] security/fbfam: Use the api to manage statistics
From: Kees Cook @ 2020-09-10 23:33 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-4-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:04PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> Use the previous defined api to manage statistics calling it accordingly
> when a task forks, calls execve or exits.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>
> ---
>  fs/exec.c     | 2 ++
>  kernel/exit.c | 2 ++
>  kernel/fork.c | 4 ++++
>  3 files changed, 8 insertions(+)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index a91003e28eaa..b30118674d32 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -71,6 +71,7 @@
>  #include "internal.h"
>  
>  #include <trace/events/sched.h>
> +#include <fbfam/fbfam.h>
>  
>  static int bprm_creds_from_file(struct linux_binprm *bprm);
>  
> @@ -1940,6 +1941,7 @@ static int bprm_execve(struct linux_binprm *bprm,
>  	task_numa_free(current, false);
>  	if (displaced)
>  		put_files_struct(displaced);
> +	fbfam_execve();

As mentioned in the other emails, I think this could trivially be
converted into an LSM: all the hooks are available AFAICT. If you only
want to introspect execve _happening_, you can use bprm_creds_for_exec
which is called a few lines above. Otherwise, my prior suggestion ("the
exec has happened" hook via brpm_cred_committing, etc).

>  	return retval;
>  
>  out:
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 733e80f334e7..39a6139dcf31 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -67,6 +67,7 @@
>  #include <linux/uaccess.h>
>  #include <asm/unistd.h>
>  #include <asm/mmu_context.h>
> +#include <fbfam/fbfam.h>
>  
>  static void __unhash_process(struct task_struct *p, bool group_dead)
>  {
> @@ -852,6 +853,7 @@ void __noreturn do_exit(long code)
>  		__this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
>  	exit_rcu();
>  	exit_tasks_rcu_finish();
> +	fbfam_exit();
>  
>  	lockdep_free_task(tsk);
>  	do_task_dead();

The place for this would be put_task_struct, and the LSM hook is
task_free. :) (The only caveat with task_free hook is that it may be
called in non-process context due to being freed during RCU, etc. In
practice, this is unlikely to cause problems.)

> diff --git a/kernel/fork.c b/kernel/fork.c
> index 49677d668de4..c933838450a8 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -107,6 +107,8 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/task.h>
>  
> +#include <fbfam/fbfam.h>
> +
>  /*
>   * Minimum number of threads to boot the kernel
>   */
> @@ -941,6 +943,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  #ifdef CONFIG_MEMCG
>  	tsk->active_memcg = NULL;
>  #endif
> +
> +	fbfam_fork(tsk);
>  	return tsk;

Since you don't need "orig", this is also trivially an LSM hook.
dup_task_struct() is called by copy_process(), which will also call the
task_alloc LSM hook later on.

>  
>  free_stack:
> -- 
> 2.25.1
> 

-- 
Kees Cook

^ permalink raw reply

* Re: [RFC PATCH 2/6] security/fbfam: Add the api to manage statistics
From: Kees Cook @ 2020-09-10 23:23 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-3-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:03PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> Create a statistical data structure to hold all the necessary
> information involve in a fork brute force attack. This info is a
> timestamp for the first fork or execve and the number of crashes since
> then. Moreover, due to this statitistical data will be shared between
> different tasks, a reference counter it is necessary.
> 
> For a correct management of an attack it is also necessary that all the
> tasks hold statistical data. The same statistical data needs to be
> shared between all the tasks that hold the same memory contents or in
> other words, between all the tasks that have been forked without any
> execve call.
> 
> When a forked task calls the execve system call, the memory contents are
> set with new values. So, in this scenario the parent's statistical data
> no need to be share. Instead, a new statistical data structure must be
> allocated to start a new cycle.
> 
> The statistical data that every task holds needs to be clear when a task
> exits. Due to this data is shared across multiples tasks, the reference
> counter is useful to free the previous allocated data only when there
> are not other pointers to the same data. Or in other words, when the
> reference counter reaches zero.
> 
> So, based in all the previous information add the api to manage all the
> commented cases.
> 
> Also, add to the struct task_struct a new field to point to the
> statitistical data related to an attack. This way, all the tasks will
> have access to this information.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>

I think patch 1 should be merged into this one since the former doesn't
really *do* anything. ;)

> ---
>  include/fbfam/fbfam.h   |  18 +++++
>  include/linux/sched.h   |   4 +
>  security/Makefile       |   4 +
>  security/fbfam/Makefile |   2 +
>  security/fbfam/fbfam.c  | 163 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 191 insertions(+)
>  create mode 100644 include/fbfam/fbfam.h
>  create mode 100644 security/fbfam/Makefile
>  create mode 100644 security/fbfam/fbfam.c
> 
> diff --git a/include/fbfam/fbfam.h b/include/fbfam/fbfam.h
> new file mode 100644
> index 000000000000..b5b7d1127a52
> --- /dev/null
> +++ b/include/fbfam/fbfam.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _FBFAM_H_
> +#define _FBFAM_H_
> +
> +#include <linux/sched.h>
> +
> +#ifdef CONFIG_FBFAM
> +int fbfam_fork(struct task_struct *child);
> +int fbfam_execve(void);
> +int fbfam_exit(void);
> +#else
> +static inline int fbfam_fork(struct task_struct *child) { return 0; }

This appears to map well to LSM hook "task_alloc".

> +static inline int fbfam_execve(void) { return 0; }

This appears to map well to LSM hook "bprm_committing_creds".

> +static inline int fbfam_exit(void) { return 0; }

This appears to map well to LSM hook "task_free".

> +#endif
> +
> +#endif /* _FBFAM_H_ */
> +
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index afe01e232935..00e1aa5e00cd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1315,6 +1315,10 @@ struct task_struct {
>  	struct callback_head		mce_kill_me;
>  #endif
>  
> +#ifdef CONFIG_FBFAM
> +	struct fbfam_stats		*fbfam_stats;
> +#endif

This could be part of the task_struct security blob used by LSMs.

> +
>  	/*
>  	 * New fields for task_struct should be added above here, so that
>  	 * they are included in the randomized portion of task_struct.
> diff --git a/security/Makefile b/security/Makefile
> index 3baf435de541..36dc4b536349 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_LSM)			+= bpf/
>  # Object integrity file lists
>  subdir-$(CONFIG_INTEGRITY)		+= integrity
>  obj-$(CONFIG_INTEGRITY)			+= integrity/
> +
> +# Object fbfam file lists
> +subdir-$(CONFIG_FBFAM)			+= fbfam
> +obj-$(CONFIG_FBFAM)			+= fbfam/
> diff --git a/security/fbfam/Makefile b/security/fbfam/Makefile
> new file mode 100644
> index 000000000000..f4b9f0b19c44
> --- /dev/null
> +++ b/security/fbfam/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_FBFAM) += fbfam.o
> diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
> new file mode 100644
> index 000000000000..0387f95f6408
> --- /dev/null
> +++ b/security/fbfam/fbfam.c
> @@ -0,0 +1,163 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <asm/current.h>
> +#include <fbfam/fbfam.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/jiffies.h>
> +#include <linux/refcount.h>
> +#include <linux/slab.h>
> +
> +/**
> + * struct fbfam_stats - Fork brute force attack mitigation statistics.
> + * @refc: Reference counter.
> + * @faults: Number of crashes since jiffies.
> + * @jiffies: First fork or execve timestamp.
> + *
> + * The purpose of this structure is to manage all the necessary information to
> + * compute the crashing rate of an application. So, it holds a first fork or
> + * execve timestamp and a number of crashes since then. This way the crashing
> + * rate in milliseconds per fault can be compute when necessary with the
> + * following formula:
> + *
> + * u64 delta_jiffies = get_jiffies_64() - fbfam_stats::jiffies;
> + * u64 delta_time = jiffies64_to_msecs(delta_jiffies);
> + * u64 crashing_rate = delta_time / (u64)fbfam_stats::faults;
> + *
> + * If the fbfam_stats::faults is zero, the above formula can't be used. In this
> + * case, the crashing rate is zero.
> + *
> + * Moreover, since the same allocated structure will be used in every fork
> + * since the first one or execve, it's also necessary a reference counter.
> + */
> +struct fbfam_stats {
> +	refcount_t refc;
> +	unsigned int faults;
> +	u64 jiffies;
> +};
> +
> +/**
> + * fbfam_new_stats() - Allocation of new statistics structure.
> + *
> + * If the allocation is successful the reference counter is set to one to
> + * indicate that there will be one task that points to this structure. The
> + * faults field is initialize to zero and the timestamp for this moment is set.
> + *
> + * Return: NULL if the allocation fails. A pointer to the new allocated
> + *         statistics structure if it success.
> + */
> +static struct fbfam_stats *fbfam_new_stats(void)
> +{
> +	struct fbfam_stats *stats = kmalloc(sizeof(struct fbfam_stats),
> +					    GFP_KERNEL);
> +
> +	if (stats) {
> +		refcount_set(&stats->refc, 1);
> +		stats->faults = 0;
> +		stats->jiffies = get_jiffies_64();
> +	}
> +
> +	return stats;
> +}
> +
> +/*
> + * fbfam_fork() - Fork management.
> + * @child: Pointer to the child task that will be created with the fork system
> + *         call.
> + *
> + * For a correct management of a fork brute force attack it is necessary that
> + * all the tasks hold statistical data. The same statistical data needs to be
> + * shared between all the tasks that hold the same memory contents or in other
> + * words, between all the tasks that have been forked without any execve call.
> + *
> + * To ensure this, if the current task doesn't have statistical data when forks
> + * (only possible in the first fork of the zero task), it is mandatory to
> + * allocate a new one. This way, the child task always will share the statistics
> + * with its parent.
> + *
> + * Return: -ENOMEN if the allocation of the new statistics structure fails.
> + *         Zero otherwise.
> + */
> +int fbfam_fork(struct task_struct *child)
> +{
> +	struct fbfam_stats **stats = &current->fbfam_stats;
> +
> +	if (!*stats) {
> +		*stats = fbfam_new_stats();
> +		if (!*stats)
> +			return -ENOMEM;
> +	}
> +
> +	refcount_inc(&(*stats)->refc);
> +	child->fbfam_stats = *stats;
> +	return 0;
> +}
> +
> +/**
> + * fbfam_execve() - Execve management.
> + *
> + * When a forked task calls the execve system call, the memory contents are set
> + * with new values. So, in this scenario the parent's statistical data no need
> + * to be share. Instead, a new statistical data structure must be allocated to
> + * start a new cycle. This condition is detected when the statistics reference
> + * counter holds a value greater than or equal to two (a fork always sets the
> + * statistics reference counter to two since the parent and the child task are
> + * sharing the same data).
> + *
> + * However, if the execve function is called immediately after another execve
> + * call, althought the memory contents are reset, there is no need to allocate
> + * a new statistical data structure. This is possible because at this moment
> + * only one task (the task that calls the execve function) points to the data.
> + * In this case, the previous allocation is used and only the faults and time
> + * fields are reset.
> + *
> + * Return: -ENOMEN if the allocation of the new statistics structure fails.
> + *         -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +int fbfam_execve(void)
> +{
> +	struct fbfam_stats **stats = &current->fbfam_stats;
> +
> +	if (!*stats)
> +		return -EFAULT;
> +
> +	if (!refcount_dec_not_one(&(*stats)->refc)) {
> +		/* execve call after an execve call */
> +		(*stats)->faults = 0;
> +		(*stats)->jiffies = get_jiffies_64();
> +		return 0;
> +	}
> +
> +	/* execve call after a fork call */
> +	*stats = fbfam_new_stats();
> +	if (!*stats)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +/**
> + * fbfam_exit() - Exit management.
> + *
> + * The statistical data that every task holds needs to be clear when a task
> + * exits. Due to this data is shared across multiples tasks, the reference
> + * counter is useful to free the previous allocated data only when there are
> + * not other pointers to the same data. Or in other words, when the reference
> + * counter reaches zero.
> + *
> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + *         otherwise.
> + */
> +int fbfam_exit(void)
> +{
> +	struct fbfam_stats *stats = current->fbfam_stats;
> +
> +	if (!stats)
> +		return -EFAULT;
> +
> +	if (refcount_dec_and_test(&stats->refc))
> +		kfree(stats);
> +
> +	return 0;
> +}
> +
> -- 
> 2.25.1
> 

Jann mentions some concerns about races, and I'd agree: this doesn't
feel right to me, but I've not had a chance to study it yet. I'm
concerned about the lifetime management of the stats vs the task
hierarchy.

-- 
Kees Cook

^ permalink raw reply

* Re: [RFC PATCH 1/6] security/fbfam: Add a Kconfig to enable the fbfam feature
From: Kees Cook @ 2020-09-10 23:18 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-2-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:02PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> Add a menu entry under "Security options" to enable the "Fork brute
> force attack mitigation" feature.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>
> ---
>  security/Kconfig       |  1 +
>  security/fbfam/Kconfig | 10 ++++++++++
>  2 files changed, 11 insertions(+)
>  create mode 100644 security/fbfam/Kconfig
> 
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..00a90e25b8d5 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -290,6 +290,7 @@ config LSM
>  	  If unsure, leave this as the default.
>  
>  source "security/Kconfig.hardening"
> +source "security/fbfam/Kconfig"

Given the layout you've chosen and the interface you've got, I think
this should just be treated like a regular LSM.

>  
>  endmenu
>  
> diff --git a/security/fbfam/Kconfig b/security/fbfam/Kconfig
> new file mode 100644
> index 000000000000..bbe7f6aad369
> --- /dev/null
> +++ b/security/fbfam/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0
> +config FBFAM

To jump on the bikeshed: how about just calling this
FORK_BRUTE_FORCE_DETECTION or FORK_BRUTE, and the directory could be
"brute", etc. "fbfam" doesn't tell anyone anything.

-- 
Kees Cook

^ permalink raw reply

* Re: [RFC PATCH 4/6] security/fbfam: Add a new sysctl to control the crashing rate threshold
From: Kees Cook @ 2020-09-10 23:14 UTC (permalink / raw)
  To: kernel-hardening
  Cc: John Wood, Matthew Wilcox, Jonathan Corbet, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Iurii Zaikin, James Morris, Serge E. Hallyn,
	linux-doc, linux-kernel, linux-fsdevel, linux-security-module
In-Reply-To: <20200910202107.3799376-5-keescook@chromium.org>

On Thu, Sep 10, 2020 at 01:21:05PM -0700, Kees Cook wrote:
> From: John Wood <john.wood@gmx.com>
> 
> This is a previous step to add the detection feature.
> 
> A fork brute force attack will be detected when an application crashes
> quickly. Since, a rate can be defined as a time per fault, add a new
> sysctl to control the crashing rate threshold.
> 
> This way, each system can tune the detection's sensibility adjusting the
> milliseconds per fault. So, if the application's crashing rate falls
> under this threshold an attack will be detected. So, the higher this
> value, the faster an attack will be detected.
> 
> Signed-off-by: John Wood <john.wood@gmx.com>
> ---
>  include/fbfam/fbfam.h   |  4 ++++
>  kernel/sysctl.c         |  9 +++++++++
>  security/fbfam/Makefile |  1 +
>  security/fbfam/fbfam.c  | 11 +++++++++++
>  security/fbfam/sysctl.c | 20 ++++++++++++++++++++
>  5 files changed, 45 insertions(+)
>  create mode 100644 security/fbfam/sysctl.c
> 
> diff --git a/include/fbfam/fbfam.h b/include/fbfam/fbfam.h
> index b5b7d1127a52..2cfe51d2b0d5 100644
> --- a/include/fbfam/fbfam.h
> +++ b/include/fbfam/fbfam.h
> @@ -3,8 +3,12 @@
>  #define _FBFAM_H_
>  
>  #include <linux/sched.h>
> +#include <linux/sysctl.h>
>  
>  #ifdef CONFIG_FBFAM
> +#ifdef CONFIG_SYSCTL
> +extern struct ctl_table fbfam_sysctls[];
> +#endif

Instead of doing the extern and adding to sysctl.c, this can all be done
directly (dynamically) from the fbfam.c file instead.

>  int fbfam_fork(struct task_struct *child);
>  int fbfam_execve(void);
>  int fbfam_exit(void);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 09e70ee2332e..c3b4d737bef3 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -77,6 +77,8 @@
>  #include <linux/uaccess.h>
>  #include <asm/processor.h>
>  
> +#include <fbfam/fbfam.h>
> +
>  #ifdef CONFIG_X86
>  #include <asm/nmi.h>
>  #include <asm/stacktrace.h>
> @@ -2660,6 +2662,13 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> +#endif
> +#ifdef CONFIG_FBFAM
> +	{
> +		.procname	= "fbfam",
> +		.mode		= 0555,
> +		.child		= fbfam_sysctls,
> +	},
>  #endif
>  	{ }
>  };
> diff --git a/security/fbfam/Makefile b/security/fbfam/Makefile
> index f4b9f0b19c44..b8d5751ecea4 100644
> --- a/security/fbfam/Makefile
> +++ b/security/fbfam/Makefile
> @@ -1,2 +1,3 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_FBFAM) += fbfam.o
> +obj-$(CONFIG_SYSCTL) += sysctl.o
> diff --git a/security/fbfam/fbfam.c b/security/fbfam/fbfam.c
> index 0387f95f6408..9be4639b72eb 100644
> --- a/security/fbfam/fbfam.c
> +++ b/security/fbfam/fbfam.c
> @@ -7,6 +7,17 @@
>  #include <linux/refcount.h>
>  #include <linux/slab.h>
>  
> +/**
> + * sysctl_crashing_rate_threshold - Crashing rate threshold.
> + *
> + * The rate's units are in milliseconds per fault.
> + *
> + * A fork brute force attack will be detected if the application's crashing rate
> + * falls under this threshold. So, the higher this value, the faster an attack
> + * will be detected.
> + */
> +unsigned long sysctl_crashing_rate_threshold = 30000;

I would move the sysctls here, instead. (Also, the above should be
const.)

> +
>  /**
>   * struct fbfam_stats - Fork brute force attack mitigation statistics.
>   * @refc: Reference counter.
> diff --git a/security/fbfam/sysctl.c b/security/fbfam/sysctl.c
> new file mode 100644
> index 000000000000..430323ad8e9f
> --- /dev/null
> +++ b/security/fbfam/sysctl.c
> @@ -0,0 +1,20 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/sysctl.h>
> +
> +extern unsigned long sysctl_crashing_rate_threshold;
> +static unsigned long ulong_one = 1;
> +static unsigned long ulong_max = ULONG_MAX;
> +
> +struct ctl_table fbfam_sysctls[] = {
> +	{
> +		.procname	= "crashing_rate_threshold",
> +		.data		= &sysctl_crashing_rate_threshold,
> +		.maxlen		= sizeof(sysctl_crashing_rate_threshold),
> +		.mode		= 0644,
> +		.proc_handler	= proc_doulongvec_minmax,
> +		.extra1		= &ulong_one,
> +		.extra2		= &ulong_max,
> +	},
> +	{ }
> +};

I wouldn't bother splitting this into a separate file. (Just leave it in
fbfam.c)

-- 
Kees Cook

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox