Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [RFC PATCH v3 3/4] X86/sgx: Introduce EMA as a new LSM module
From: Casey Schaufler @ 2019-07-08 23:53 UTC (permalink / raw)
  To: Xing, Cedric, linux-sgx, linux-security-module, selinux, casey
In-Reply-To: <ce4dcce2-88fb-ccec-f173-fc567d9ca006@intel.com>

On 7/8/2019 10:16 AM, Xing, Cedric wrote:
> On 7/8/2019 9:26 AM, Casey Schaufler wrote:
>> In this scheme you use an ema LSM to manage your ema data.
>> A quick sketch looks like:
>>
>>     sgx_something_in() calls
>>         security_enclave_load() calls
>>             ema_enclave_load()
>>             selinux_enclave_load()
>>             otherlsm_enclave_load()
>>
>> Why is this better than:
>>
>>     sgx_something_in() calls
>>         ema_enclave_load()
>>         security_enclave_load() calls
>>             selinux_enclave_load()
>>             otherlsm_enclave_load()
>
> Are you talking about moving EMA somewhere outside LSM?

Yes. That's what I've been saying all along.

> If so, where?

I tried to make it obvious. Put the call to your EMA code
on the line before you call security_enclave_load().

>
>>
>> If you did really want ema to behave like an LSM
>> you would put the file data that SELinux is managing
>> into the ema portion of the blob and provide interfaces
>> for the SELinux (or whoever) to use that. Also, it's
>> an abomination (as I've stated before) for ema to
>> rely on SELinux to provide a file_free() hook for
>> ema's data. If you continue down the LSM route, you
>> need to provide an ema_file_free() hook. You can't
>> count on SELinux to do it for you. If there are multiple
>> LSMs (coming soon!) that use the ema data, they'll all
>> try to free it, and then Bad Things can happen.
>
> I'm afraid you have misunderstood the code. What is kept open and gets closed in selinux_file_free() is the sigstruct file. SELinux uses it to determine the page permissions for enclave pages from anonymous sources. It is a policy choice made inside SELinux and has nothing to do with EMA.

OK.

>
> There's indeed an ema_file_free_security() to free the EMA map for enclaves being closed. EMA does *NOT* rely on any other LSMs to free data for it. The only exception is when an LSM fails enclave_load(), it has to call ema_remove_range() to remove the range being added, which was *not* required originally in v2.

OK.

>
>>> +/**
>>> + * ema - Enclave Memory Area structure for LSM modules
>>
>> LSM modules is redundant. "LSM" or "LSMs" would be better.
>
> Noted
>
>>> diff --git a/security/Makefile b/security/Makefile
>>> index c598b904938f..b66d03a94853 100644
>>> --- a/security/Makefile
>>> +++ b/security/Makefile
>>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA)        += yama/
>>>   obj-$(CONFIG_SECURITY_LOADPIN)        += loadpin/
>>>   obj-$(CONFIG_SECURITY_SAFESETID)       += safesetid/
>>>   obj-$(CONFIG_CGROUP_DEVICE)        += device_cgroup.o
>>> +obj-$(CONFIG_INTEL_SGX)            += commonema.o
>>
>> The config option and the file name ought to match,
>> or at least be closer.
>
> Just trying to match file names as "capability" uses commoncap.c.

Fine, then you should be using CONFIG_SECURITY_EMA.

>
> Like I said, this feature could potentially be used by TEEs other than SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. I can rename it to ema.c or enclave.c. Do you have a preference?

Make
	CONFIG_SECURITY_EMA
	depends on CONFIG_INTEL_SGX

When another TEE (maybe MIPS_SSRPQ) comes along you can have

	CONFIG_SECURITY_EMA
	depends on CONFIG_INTEL_SGX || CONFIG_MIPS_SSRPQ
 

>
>>> diff --git a/security/commonema.c b/security/commonema.c
>>
>> Put this in a subdirectory. Please.
>
> Then why is commoncap.c located in this directory? I'm just trying to match the existing convention.

commoncap is not optional. It is a base part of the
security subsystem. ema is optional.


>
>>> +static struct lsm_blob_sizes ema_blob_sizes __lsm_ro_after_init = {
>>> +    .lbs_file = sizeof(atomic_long_t),
>>> +};
>>
>> If this is ema's data ema must manage it. You *must* have
>> a file_free().
>
> There is one indeed - ema_file_free_security().

I see it now.

>
>>
>>> +
>>> +static atomic_long_t *_map_file(struct file *encl)
>>> +{
>>> +    return (void *)((char *)(encl->f_security) + ema_blob_sizes.lbs_file);
>>
>> I don't trust all the casting going on here, especially since
>> you don't end up with the type you should be returning.
>
> Will change.
>
>>> +}
>>> +
>>> +static struct ema_map *_alloc_map(void)
>>
>> Function header comments, please.
>
> Will add.


^ permalink raw reply

* Re: [RFC PATCH v3 0/4] security/x86/sgx: SGX specific LSM hooks
From: Xing, Cedric @ 2019-07-08 22:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-sgx, linux-security-module, selinux, casey.schaufler,
	jmorris, luto, jethro, greg, sds, jarkko.sakkinen
In-Reply-To: <20190708184955.GC20791@linux.intel.com>

Hi Sean,

What's in my cover letter is my assessment on what's in your series. You 
may disagree. But I don't think it productive until you can prove your 
points in code.

The key points I'm making are:
(1) The impact to user mode code due to UAPI change is more significant 
than you have envisioned.
(2) Your series has implemented less than required in practice.

For #1, regular shared objects don't carry info like whether it contains 
self-modifying code or generates code on the fly. So your requirement of 
"maximal protection" is new, and you should at least put together a 
story to show everyone how it could be met, especially without changing 
build tools.

For #2, SGX2 is imminent, and the upcoming ICX server will support 512GB 
of EPC. So the problems in mprotect() performance and EAUG-on-#PF must 
be solved, let alone other problems. I guess you have to code them up so 
everyone will be able to evaluate whether your approach is really as 
simple as you have claimed.

-Cedric

On 7/8/2019 11:49 AM, Sean Christopherson wrote:
> On Mon, Jul 08, 2019 at 10:49:59AM -0700, Xing, Cedric wrote:
>> On 7/8/2019 8:55 AM, Sean Christopherson wrote:
>>> On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:
>> True only for SGX1.
>>>>      “maximal protection” at page load time, but such information is NOT always
>>>>      available. For example, Graphene containers may run different applications
>>>>      comprised of different set of executables and/or shared objects. Some of
>>>>      them may contain self-modifying code (or text relocation) while others
>>>>      don’t. The generic enclave loader usually doesn’t have such information so
>>>>      wouldn’t be able to provide it ahead of time.
>>>
>>> I'm unconvinced that it would be remotely difficult to teach an enclave
>>> loader that an enclave or hosted application employs SMC, relocation or
>>> any other behavior that would require declaring RWX on all pages.
>>
>> You've been talking as if "enclave loader" is tailored to the enclave it is
>> loading. But in reality "enclave loader" is usually a library knowing
>> usually nothing about the enclave. How could it know if an enclave contains
>> self-modifying code?
> 
> Given the rarity of SMC, require enclaves to declare "I do SMC"...  The
> Intel SDK already requires the enclave developer to declare heap size,
> stack size, thread affinity, etc...  I have a very hard time believing
> that it can't support SMC and relocation flags.
>   
>>>>    · Inefficient Auditing – Audit logs are supposed to help system
>>>>      administrators to determine the set of minimally needed permissions and to
>>>>      detect abnormal behaviors. But consider the “maximal protection” model, if
>>>>      “maximal protection” is set to be too permissive, then audit log wouldn’t
>>>>      be able to detect anomalies;
>>>
>>> Huh?  Declaring overly permissive protections is only problematic if an
>>> LSM denies the permission, in which case it will generate an accurate
>>> audit log.
>>>
>>> If the enclave/loader "requires" a permission it doesn't actually need,
>>> e.g. EXECDIRTY, then it's a software bug that should be fixed.  I don't
>>> see how this scenario is any different than an application that uses
>>> assembly code without 'noexecstack' and inadvertantly "requires"
>>> EXECSTACK due to triggering "read implies exec".  In both cases the
>>> denied permission is unnecessary due to a userspace application bug.
>>
>> You see, you've been assuming "enclave loader" knows everything and tailored
>> to what it loads in a particular application. But the reality is the loader
>> is generic and probably shared by multiple applications.
> 
> No, I'm assuming that an enclave can communicate its basic needs without
> undue pain.
> 
>> It needs some generic way to figure out the "maximal protection". An
>> implementation could use information embedded in the enclave file, or could
>> just be "configurable". In the former case, you put extra burdens on the build
>> tools, while in the latter case, your audit logs cannot help generating an
>> appropriate configuration.
> 
> I'm contending the "extra burdens" is minimal.
> 
>    if (do_smc || do_relocation)
>            max_prot = RWX;
>    else
>            max_prot = SECINFO.FLAGS;
> 
>>>>      or if “maximal protection” is too restrictive,
>>>>      then audit log cannot identify the file violating the policy.
>>>
>>> Maximal protections that are too restrictive are completely orthogonal to
>>> LSMs as the enclave would fail to run irrespective of LSMs.  This is no
>>> different than specifying the wrong RWX flags in SECINFO, or opening a
>>> file as RO instead of RW.
>>
>> Say loader is configurable. By looking at the log, can an administrator tell
>> which file has too restrictive "maximal protection"?
> 
> Again, this fails irrespective of LSMs.  So the answer is "no", because
> there is no log.  But the admin will never have to deal with this issue
> because the enclave will *never* run, i.e. would unconditionally fail to
> run during initial development.  And the developer has bigger problems if
> they can't debug their own code.
> 
>>>> In either case the audit log cannot fulfill its purposes.

^ permalink raw reply

* Re: [PATCH] KEYS: trusted: allow module init if TPM is inactive or deactivated
From: James Bottomley @ 2019-07-08 20:34 UTC (permalink / raw)
  To: Roberto Sassu, jarkko.sakkinen, zohar, jgg
  Cc: linux-integrity, linux-security-module, keyrings, linux-kernel,
	crazyt2019+lml, tyhicks, nayna, silviu.vlasceanu
In-Reply-To: <20190705163735.11539-1-roberto.sassu@huawei.com>

On Fri, 2019-07-05 at 18:37 +0200, Roberto Sassu wrote:
> Commit c78719203fc6 ("KEYS: trusted: allow trusted.ko to initialize
> w/o a
> TPM") allows the trusted module to be loaded even a TPM is not found
> to
> avoid module dependency problems.
> 
> Unfortunately, this does not completely solve the issue, as there
> could be
> a case where a TPM is found but is not functional (the TPM commands
> return
> an error). Specifically, after the tpm_chip structure is returned by
> tpm_default_chip() in init_trusted(), the execution terminates after
> init_digests() returns -EFAULT (due to the fact that tpm_get_random()
> returns a positive value, but less than TPM_MAX_DIGEST_SIZE).
> 
> This patch fixes the issue by ignoring the TPM_ERR_DEACTIVATED and
> TPM_ERR_DISABLED errors.
> 
> Fixes: 240730437deb ("KEYS: trusted: explicitly use tpm_chip
> structure...")
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
>  drivers/char/tpm/tpm.h  | 2 --
>  include/linux/tpm.h     | 3 +++
>  security/keys/trusted.c | 6 +++++-
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index e503ffc3aa39..a216ac396711 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -54,8 +54,6 @@ enum tpm_addr {
>  
>  #define TPM_WARN_RETRY          0x800
>  #define TPM_WARN_DOING_SELFTEST 0x802
> -#define TPM_ERR_DEACTIVATED     0x6
> -#define TPM_ERR_DISABLED        0x7
>  #define TPM_ERR_INVALID_POSTINIT 38
>  
>  #define TPM_HEADER_SIZE		10
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index 53c0ea9ec9df..efd3ccbb6aee 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -26,6 +26,9 @@
>  #define TPM_DIGEST_SIZE 20	/* Max TPM v1.2 PCR size */
>  #define TPM_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE
>  
> +#define TPM_ERR_DEACTIVATED     0x6
> +#define TPM_ERR_DISABLED        0x7
> +
>  struct tpm_chip;
>  struct trusted_key_payload;
>  struct trusted_key_options;
> diff --git a/security/keys/trusted.c b/security/keys/trusted.c
> index 9a94672e7adc..430d85090b3b 100644
> --- a/security/keys/trusted.c
> +++ b/security/keys/trusted.c
> @@ -389,6 +389,10 @@ static int pcrlock(const int pcrnum)
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> +	/* This can happen if the TPM is inactive. */
> +	if (!digests)
> +		return -EINVAL;
> +
>  	return tpm_pcr_extend(chip, pcrnum, digests) ? -EINVAL : 0;
>  }
>  
> @@ -1233,7 +1237,7 @@ static int __init init_digests(void)
>  	int i;
>  
>  	ret = tpm_get_random(chip, digest, TPM_MAX_DIGEST_SIZE);

Not a criticism of your patch, but can we please stop doing this. 
Single random number sources are horrendously bad practice because it
gives an attacker a single target to subvert.  We should ensure the TPM
is plugged into the kernel RNG as a source and then take randomness
from the mixed pool so it's harder for an attacker because they have to
subvert all our sources to predict what came out.

James


^ permalink raw reply

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
From: Al Viro @ 2019-07-08 20:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module
In-Reply-To: <20190708180132.GU17978@ZenIV.linux.org.uk>

On Mon, Jul 08, 2019 at 07:01:32PM +0100, Al Viro wrote:
> On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote:
> 
> > Al you do realize that the TOCTOU you are talking about comes the system
> > call API.  TOMOYO can only be faulted for not playing in their own
> > sandbox and not reaching out and fixing the vfs implementation details.

PS: the fact that mount(2) has been overloaded to hell and back (including
MS_MOVE, which goes back to v2.5.0.5) predates the introduction of ->sb_mount()
and LSM in general (2.5.27).  MS_BIND is 2.4.0-test9pre2.

In all the years since the introduction of ->sb_mount() I've seen zero
questions from LSM folks regarding a sane place for those checks.  What I have
seen was "we want it immediately upon the syscall entry, let the module
figure out what to do" in reply to several times I tried to tell them "folks,
it's called in a bad place; you want the checks applied to objects, not to
raw string arguments".

As it is, we have easily bypassable checks on mount(2) (by way of ->sb_mount();
there are other hooks also in the game for remounts and new mounts).

I see no point whatsoever trying to duplicate ->sb_mount() on the top level
of move_mount(2).  When and if sane checks are agreed upon for that thing,
they certainly should be used both for MS_MOVE case of mount(2) and for
move_mount(2).  And that'll come for free from calling those in do_move_mount().
They won't be the first thing called in mount(2) - we demultiplex first,
decide that we have a move and do pathname resolution on source.  And that's
precisely what we need to avoid the TOCTOU there.

I'm sorry, but this "run the hook at the very beginning, the modules know
better what they want, just give them as close to syscall arguments as
possible before even looking at the flags" model is wrong, plain and simple.

As for the MS_MOVE checks, the arguments are clear enough (two struct path *,
same as what we pass to do_move_mount()).  AFAICS, only tomoyo and
apparmor are trying to do anything for MS_MOVE in ->sb_mount(), and both
look like it should be easy enough to implement what said checks intend
to do (probably - assuming that aa_move_mount() doesn't depend upon
having its kern_path() inside the __begin_current_label_crit_section()/
__end_current_label_crit_section() pair; looks like it shouldn't be,
but that's for apparmor folks to decide).

That's really for LSM folks, though - I've given up on convincing
(or even getting a rationale out of) them on anything related to hook
semantics years ago.

^ permalink raw reply

* Re: [PATCH] KEYS: trusted: allow module init if TPM is inactive or deactivated
From: Tyler Hicks @ 2019-07-08 19:55 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: jarkko.sakkinen, jejb, zohar, jgg, linux-integrity,
	linux-security-module, keyrings, linux-kernel, crazyt2019+lml,
	nayna, silviu.vlasceanu
In-Reply-To: <20190705163735.11539-1-roberto.sassu@huawei.com>

On 2019-07-05 18:37:35, Roberto Sassu wrote:
> Commit c78719203fc6 ("KEYS: trusted: allow trusted.ko to initialize w/o a
> TPM") allows the trusted module to be loaded even a TPM is not found to
> avoid module dependency problems.
> 
> Unfortunately, this does not completely solve the issue, as there could be
> a case where a TPM is found but is not functional (the TPM commands return
> an error). Specifically, after the tpm_chip structure is returned by
> tpm_default_chip() in init_trusted(), the execution terminates after
> init_digests() returns -EFAULT (due to the fact that tpm_get_random()
> returns a positive value, but less than TPM_MAX_DIGEST_SIZE).
> 
> This patch fixes the issue by ignoring the TPM_ERR_DEACTIVATED and
> TPM_ERR_DISABLED errors.
> 
> Fixes: 240730437deb ("KEYS: trusted: explicitly use tpm_chip structure...")
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
>  drivers/char/tpm/tpm.h  | 2 --
>  include/linux/tpm.h     | 3 +++
>  security/keys/trusted.c | 6 +++++-
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index e503ffc3aa39..a216ac396711 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -54,8 +54,6 @@ enum tpm_addr {
>  
>  #define TPM_WARN_RETRY          0x800
>  #define TPM_WARN_DOING_SELFTEST 0x802
> -#define TPM_ERR_DEACTIVATED     0x6
> -#define TPM_ERR_DISABLED        0x7
>  #define TPM_ERR_INVALID_POSTINIT 38
>  
>  #define TPM_HEADER_SIZE		10
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index 53c0ea9ec9df..efd3ccbb6aee 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -26,6 +26,9 @@
>  #define TPM_DIGEST_SIZE 20	/* Max TPM v1.2 PCR size */
>  #define TPM_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE
>  
> +#define TPM_ERR_DEACTIVATED     0x6
> +#define TPM_ERR_DISABLED        0x7
> +
>  struct tpm_chip;
>  struct trusted_key_payload;
>  struct trusted_key_options;
> diff --git a/security/keys/trusted.c b/security/keys/trusted.c
> index 9a94672e7adc..430d85090b3b 100644
> --- a/security/keys/trusted.c
> +++ b/security/keys/trusted.c
> @@ -389,6 +389,10 @@ static int pcrlock(const int pcrnum)
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> +	/* This can happen if the TPM is inactive. */
> +	if (!digests)
> +		return -EINVAL;
> +
>  	return tpm_pcr_extend(chip, pcrnum, digests) ? -EINVAL : 0;
>  }
>  
> @@ -1233,7 +1237,7 @@ static int __init init_digests(void)
>  	int i;
>  
>  	ret = tpm_get_random(chip, digest, TPM_MAX_DIGEST_SIZE);
> -	if (ret < 0)
> +	if (ret < 0 || ret == TPM_ERR_DEACTIVATED || ret == TPM_ERR_DISABLED)
>  		return ret;

As someone who hasn't looked at much of the TPM code, I would have
expected tpm_get_random() to return a positive value that only ever
indicates the number of random bytes saved to the buffer. From the
function documentation:

  Return: number of random bytes read or a negative error value.

Despite the function documentation and as your patch suggests, I can
see that it is possible for tpm_transmit_cmd() to return
a positive value that's also returned by tpm_get_random() even though it
may not have filled the buffer when the TPM is in an
inactive/deactivated state.

I think there are other callers which are not prepared for positive
return values that indicate a failure to fill the buffer with random
data. For instance, the way that tpm_hwrng_read() is calling
tpm_get_random() looks a little worrisome.

This patch would likely fix the bug reported against eCryptfs
(https://bugzilla.kernel.org/show_bug.cgi?id=203953) but I can't help to
think that callers of tpm_get_random() would benefit from a more
consolidated approach of handling TPM_ERR_* return values rather than
handling them at this single call site.

Tyler

>  	if (ret < TPM_MAX_DIGEST_SIZE)
>  		return -EFAULT;
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [PATCH 3/9] security: Add a hook for the point of notification insertion [ver #5]
From: Stephen Smalley @ 2019-07-08 19:13 UTC (permalink / raw)
  To: David Howells, viro
  Cc: Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven,
	Christian Brauner, keyrings, linux-usb, linux-security-module,
	linux-fsdevel, linux-api, linux-block, linux-kernel
In-Reply-To: <156173694190.15137.8939274212328721351.stgit@warthog.procyon.org.uk>

On 6/28/19 11:49 AM, David Howells wrote:
> Add a security hook that allows an LSM to rule on whether a notification
> message is allowed to be inserted into a particular watch queue.
> 
> The hook is given the following information:
> 
>   (1) The credentials of the triggerer (which may be init_cred for a system
>       notification, eg. a hardware error).
> 
>   (2) The credentials of the whoever set the watch.
> 
>   (3) The notification message.

As with the other proposed hooks, it is difficult to evaluate this hook 
without at least one implementation of the hook.  Since Casey is the 
only one requesting this hook, a Smack implementation would be the 
natural choice; I do not intend to implement this hook for SELinux. 
However, by providing this hook, you are in effect taking a position wrt 
the earlier controversy over it, i.e. that application developers must 
deal with the possibility that notifications can be dropped if a 
security module does not permit the triggerer to post the notification 
to the watcher, without any indication to either the triggerer or the 
watcher.  This is a choice you are making by providing this hook.  The 
alternative is to require that permission to set a watch imply the 
ability to receive all notifications for the watched object.  Aside from 
friendliness to application developers, the latter also yields stable, 
sane policy and better performance.

> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Casey Schaufler <casey@schaufler-ca.com>
> cc: Stephen Smalley <sds@tycho.nsa.gov>
> cc: linux-security-module@vger.kernel.org
> ---
> 
>   include/linux/lsm_hooks.h |   10 ++++++++++
>   include/linux/security.h  |   10 ++++++++++
>   security/security.c       |    6 ++++++
>   3 files changed, 26 insertions(+)
> 
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index f9d31f6445e4..fd4b2b14e7d0 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1426,6 +1426,12 @@
>    *	from devices (as a global set).
>    *	@watch: The watch object
>    *
> + * @post_notification:
> + *	Check to see if a watch notification can be posted to a particular
> + *	queue.
> + *	@w_cred: The credentials of the whoever set the watch.
> + *	@cred: The event-triggerer's credentials
> + *	@n: The notification being posted
>    *
>    * Security hooks for using the eBPF maps and programs functionalities through
>    * eBPF syscalls.
> @@ -1705,6 +1711,9 @@ union security_list_options {
>   #ifdef CONFIG_WATCH_QUEUE
>   	int (*watch_key)(struct watch *watch, struct key *key);
>   	int (*watch_devices)(struct watch *watch);
> +	int (*post_notification)(const struct cred *w_cred,
> +				 const struct cred *cred,
> +				 struct watch_notification *n);
>   #endif /* CONFIG_WATCH_QUEUE */
>   
>   #ifdef CONFIG_SECURITY_NETWORK
> @@ -1985,6 +1994,7 @@ struct security_hook_heads {
>   #ifdef CONFIG_WATCH_QUEUE
>   	struct hlist_head watch_key;
>   	struct hlist_head watch_devices;
> +	struct hlist_head post_notification;
>   #endif /* CONFIG_WATCH_QUEUE */
>   #ifdef CONFIG_SECURITY_NETWORK
>   	struct hlist_head unix_stream_connect;
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 540863678355..5c074bf18bea 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -58,6 +58,7 @@ struct fs_context;
>   struct fs_parameter;
>   enum fs_value_type;
>   struct watch;
> +struct watch_notification;
>   
>   /* Default (no) options for the capable function */
>   #define CAP_OPT_NONE 0x0
> @@ -396,6 +397,9 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
>   #ifdef CONFIG_WATCH_QUEUE
>   int security_watch_key(struct watch *watch, struct key *key);
>   int security_watch_devices(struct watch *watch);
> +int security_post_notification(const struct cred *w_cred,
> +			       const struct cred *cred,
> +			       struct watch_notification *n);
>   #endif /* CONFIG_WATCH_QUEUE */
>   #else /* CONFIG_SECURITY */
>   
> @@ -1218,6 +1222,12 @@ static inline int security_watch_devices(struct watch *watch)
>   {
>   	return 0;
>   }
> +static inline int security_post_notification(const struct cred *w_cred,
> +					     const struct cred *cred,
> +					     struct watch_notification *n)
> +{
> +	return 0;
> +}
>   #endif /* CONFIG_WATCH_QUEUE */
>   #endif	/* CONFIG_SECURITY */
>   
> diff --git a/security/security.c b/security/security.c
> index 2c9919226ad1..459e87d55ac9 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1928,6 +1928,12 @@ int security_watch_devices(struct watch *watch)
>   	return call_int_hook(watch_devices, 0, watch);
>   }
>   
> +int security_post_notification(const struct cred *w_cred,
> +			       const struct cred *cred,
> +			       struct watch_notification *n)
> +{
> +	return call_int_hook(post_notification, 0, w_cred, cred, n);
> +}
>   #endif /* CONFIG_WATCH_QUEUE */
>   
>   #ifdef CONFIG_SECURITY_NETWORK
> 


^ permalink raw reply

* Re: [RFC PATCH v3 0/4] security/x86/sgx: SGX specific LSM hooks
From: Sean Christopherson @ 2019-07-08 18:49 UTC (permalink / raw)
  To: Xing, Cedric
  Cc: linux-sgx, linux-security-module, selinux, casey.schaufler,
	jmorris, luto, jethro, greg, sds, jarkko.sakkinen
In-Reply-To: <8ba6c32d-cedc-53fd-9e86-f78be0067ac3@intel.com>

On Mon, Jul 08, 2019 at 10:49:59AM -0700, Xing, Cedric wrote:
> On 7/8/2019 8:55 AM, Sean Christopherson wrote:
> >On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:
> True only for SGX1.
> >>     “maximal protection” at page load time, but such information is NOT always
> >>     available. For example, Graphene containers may run different applications
> >>     comprised of different set of executables and/or shared objects. Some of
> >>     them may contain self-modifying code (or text relocation) while others
> >>     don’t. The generic enclave loader usually doesn’t have such information so
> >>     wouldn’t be able to provide it ahead of time.
> >
> >I'm unconvinced that it would be remotely difficult to teach an enclave
> >loader that an enclave or hosted application employs SMC, relocation or
> >any other behavior that would require declaring RWX on all pages.
> 
> You've been talking as if "enclave loader" is tailored to the enclave it is
> loading. But in reality "enclave loader" is usually a library knowing
> usually nothing about the enclave. How could it know if an enclave contains
> self-modifying code?

Given the rarity of SMC, require enclaves to declare "I do SMC"...  The
Intel SDK already requires the enclave developer to declare heap size,
stack size, thread affinity, etc...  I have a very hard time believing
that it can't support SMC and relocation flags.
 
> >>   · Inefficient Auditing – Audit logs are supposed to help system
> >>     administrators to determine the set of minimally needed permissions and to
> >>     detect abnormal behaviors. But consider the “maximal protection” model, if
> >>     “maximal protection” is set to be too permissive, then audit log wouldn’t
> >>     be able to detect anomalies;
> >
> >Huh?  Declaring overly permissive protections is only problematic if an
> >LSM denies the permission, in which case it will generate an accurate
> >audit log.
> >
> >If the enclave/loader "requires" a permission it doesn't actually need,
> >e.g. EXECDIRTY, then it's a software bug that should be fixed.  I don't
> >see how this scenario is any different than an application that uses
> >assembly code without 'noexecstack' and inadvertantly "requires"
> >EXECSTACK due to triggering "read implies exec".  In both cases the
> >denied permission is unnecessary due to a userspace application bug.
> 
> You see, you've been assuming "enclave loader" knows everything and tailored
> to what it loads in a particular application. But the reality is the loader
> is generic and probably shared by multiple applications. 

No, I'm assuming that an enclave can communicate its basic needs without
undue pain.

> It needs some generic way to figure out the "maximal protection". An
> implementation could use information embedded in the enclave file, or could
> just be "configurable". In the former case, you put extra burdens on the build
> tools, while in the latter case, your audit logs cannot help generating an
> appropriate configuration.

I'm contending the "extra burdens" is minimal.

  if (do_smc || do_relocation)
          max_prot = RWX;
  else
          max_prot = SECINFO.FLAGS;

> >>     or if “maximal protection” is too restrictive,
> >>     then audit log cannot identify the file violating the policy.
> >
> >Maximal protections that are too restrictive are completely orthogonal to
> >LSMs as the enclave would fail to run irrespective of LSMs.  This is no
> >different than specifying the wrong RWX flags in SECINFO, or opening a
> >file as RO instead of RW.
> 
> Say loader is configurable. By looking at the log, can an administrator tell
> which file has too restrictive "maximal protection"?

Again, this fails irrespective of LSMs.  So the answer is "no", because
there is no log.  But the admin will never have to deal with this issue
because the enclave will *never* run, i.e. would unconditionally fail to
run during initial development.  And the developer has bigger problems if
they can't debug their own code.

> >>In either case the audit log cannot fulfill its purposes.

^ permalink raw reply

* Re: [PATCH 2/9] security: Add hooks to rule on setting a watch [ver #5]
From: Stephen Smalley @ 2019-07-08 18:46 UTC (permalink / raw)
  To: David Howells, viro
  Cc: Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven,
	Christian Brauner, keyrings, linux-usb, linux-security-module,
	linux-fsdevel, linux-api, linux-block, linux-kernel
In-Reply-To: <156173692760.15137.9636883182556029747.stgit@warthog.procyon.org.uk>

On 6/28/19 11:48 AM, David Howells wrote:
> Add security hooks that will allow an LSM to rule on whether or not a watch
> may be set.  More than one hook is required as the watches watch different
> types of object.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Casey Schaufler <casey@schaufler-ca.com>
> cc: Stephen Smalley <sds@tycho.nsa.gov>
> cc: linux-security-module@vger.kernel.org
> ---
> 
>   include/linux/lsm_hooks.h |   22 ++++++++++++++++++++++
>   include/linux/security.h  |   15 +++++++++++++++
>   security/security.c       |   13 +++++++++++++
>   3 files changed, 50 insertions(+)
> 
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 47f58cfb6a19..f9d31f6445e4 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1413,6 +1413,20 @@
>    *	@ctx is a pointer in which to place the allocated security context.
>    *	@ctxlen points to the place to put the length of @ctx.
>    *
> + * Security hooks for the general notification queue:
> + *
> + * @watch_key:
> + *	Check to see if a process is allowed to watch for event notifications
> + *	from a key or keyring.
> + *	@watch: The watch object
> + *	@key: The key to watch.
> + *
> + * @watch_devices:
> + *	Check to see if a process is allowed to watch for event notifications
> + *	from devices (as a global set).
> + *	@watch: The watch object

It is difficult to evaluate these without at least one implementation of 
each hook.  I am unclear as to how any security module would use the 
watch argument, since it has no security field/blob and does not appear 
to contain any information that would be relevant to deciding whether or 
not to permit the watch to be set.

> + *
> + *
>    * Security hooks for using the eBPF maps and programs functionalities through
>    * eBPF syscalls.
>    *
> @@ -1688,6 +1702,10 @@ union security_list_options {
>   	int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen);
>   	int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen);
>   	int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen);
> +#ifdef CONFIG_WATCH_QUEUE
> +	int (*watch_key)(struct watch *watch, struct key *key);
> +	int (*watch_devices)(struct watch *watch);
> +#endif /* CONFIG_WATCH_QUEUE */
>   
>   #ifdef CONFIG_SECURITY_NETWORK
>   	int (*unix_stream_connect)(struct sock *sock, struct sock *other,
> @@ -1964,6 +1982,10 @@ struct security_hook_heads {
>   	struct hlist_head inode_notifysecctx;
>   	struct hlist_head inode_setsecctx;
>   	struct hlist_head inode_getsecctx;
> +#ifdef CONFIG_WATCH_QUEUE
> +	struct hlist_head watch_key;
> +	struct hlist_head watch_devices;
> +#endif /* CONFIG_WATCH_QUEUE */
>   #ifdef CONFIG_SECURITY_NETWORK
>   	struct hlist_head unix_stream_connect;
>   	struct hlist_head unix_may_send;
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 659071c2e57c..540863678355 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -57,6 +57,7 @@ struct mm_struct;
>   struct fs_context;
>   struct fs_parameter;
>   enum fs_value_type;
> +struct watch;
>   
>   /* Default (no) options for the capable function */
>   #define CAP_OPT_NONE 0x0
> @@ -392,6 +393,10 @@ void security_inode_invalidate_secctx(struct inode *inode);
>   int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
>   int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
>   int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
> +#ifdef CONFIG_WATCH_QUEUE
> +int security_watch_key(struct watch *watch, struct key *key);
> +int security_watch_devices(struct watch *watch);
> +#endif /* CONFIG_WATCH_QUEUE */
>   #else /* CONFIG_SECURITY */
>   
>   static inline int call_lsm_notifier(enum lsm_event event, void *data)
> @@ -1204,6 +1209,16 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32
>   {
>   	return -EOPNOTSUPP;
>   }
> +#ifdef CONFIG_WATCH_QUEUE
> +static inline int security_watch_key(struct watch *watch, struct key *key)
> +{
> +	return 0;
> +}
> +static inline int security_watch_devices(struct watch *watch)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_WATCH_QUEUE */
>   #endif	/* CONFIG_SECURITY */
>   
>   #ifdef CONFIG_SECURITY_NETWORK
> diff --git a/security/security.c b/security/security.c
> index 613a5c00e602..2c9919226ad1 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1917,6 +1917,19 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
>   }
>   EXPORT_SYMBOL(security_inode_getsecctx);
>   
> +#ifdef CONFIG_WATCH_QUEUE
> +int security_watch_key(struct watch *watch, struct key *key)
> +{
> +	return call_int_hook(watch_key, 0, watch, key);
> +}
> +
> +int security_watch_devices(struct watch *watch)
> +{
> +	return call_int_hook(watch_devices, 0, watch);
> +}
> +
> +#endif /* CONFIG_WATCH_QUEUE */
> +
>   #ifdef CONFIG_SECURITY_NETWORK
>   
>   int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk)
> 


^ permalink raw reply

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
From: Al Viro @ 2019-07-08 18:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module
In-Reply-To: <20190708180132.GU17978@ZenIV.linux.org.uk>

On Mon, Jul 08, 2019 at 07:01:32PM +0100, Al Viro wrote:

> Right now anyone relying upon DAC enforced for MS_MOVE has worse problems

That'd be MAC, of course.  Sorry.

^ permalink raw reply

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
From: Al Viro @ 2019-07-08 18:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module
In-Reply-To: <874l3wo3gq.fsf@xmission.com>

On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote:

> Al you do realize that the TOCTOU you are talking about comes the system
> call API.  TOMOYO can only be faulted for not playing in their own
> sandbox and not reaching out and fixing the vfs implementation details.
>
> Userspace has always had to very careful to only mount filesystems
> on paths that root completely controls and won't change.

That has nothing whatsoever to do with the path where you are mounting
something.  _That_ is actually looked up before ->sb_mount() gets called;
no TOCTOU there.

The thing where ->sb_mount() is fucked by design is its handling of
	* device name
	* old tree in mount --bind
	* old tree in mount --move
	* things like journal name (not that any of the instances had tried
to do anything with that)

All of those *do* have TOCTOU, and that's an inevitable result of the
idiotic hook fetishism of LSM design.  Instead of "we want something
to happen when such-and-such predicate is about to change", it's
"lemme run my code, the earlier the better, I don't care about any
damn predicates, it's all too complicated anyway, whaddya mean
racy?"

Any time you have pathname resolution done twice, it's a built-in race.
If you want *ALL* checks on mount(2) to be done before the mean, nasty
kernel code gets to decide anything (bind/move/mount/etc. all squashed
together, just let us have at the syscall arguments, mmkay?) - that's
precisely what you get.

And no, that TOCTOU is not in syscall API.  "open() of an untrusted
pathname may end up trying to open hell knows what" is one thing;
"open() of an untrusted pathname may apply MAC checks to one object
and open something entirely different" is another.  The former is
inherent to syscall API.  The latter would be a badly fucked up
implementation (we don't have that issue on open(2), thankfully).

To make it clear, TOMOYO is not at fault here; LSM "architecture" is.
Note, BTW, that TOMOYO checks there do *NOT* limit the input pathname
at all - only the destination of the first pathwalk.  Repeating it
may easily lead to an entirely different place.

Canonicalized pathname is derived from pathwalk result; having concluded
that it's perfectly fine for the operation requested is pure security
theatre - it
	* says nothing about the trustedness of the original pathname
	* may have nothing whatsoever to the object yielded by the
second pathwalk, which is what'll end up actually used.
It's not even "this thing walks through /proc, and thus not to be trusted
to be stable" - the checks won't notice where the damn thing had been.

When somebody proposes _useful_ MAC for mount --move (and that really
can't be done at the level of syscall entry - we need to have already
figured out that with given combination of flags the 1st argument of
mount(2) will be a pathname *and* already looked it up), sure - it
will be added to do_move_mount(), which is where we have all lookups
done, and apply both for mount() and move_mount().

Right now anyone relying upon DAC enforced for MS_MOVE has worse problems
than "attacker will use move_mount(2) and bypass my policy" - the same
attacker can bloody well bypass those with nothing more exotic than
clone(2) and dup2(2) (and mount(2), of course).

And it's not just MS_MOVE (or MS_BIND).  Anyone trying to prevent
mounting e.g. ext2 from untrusted device and do that on the level of
->sb_mount() *is* *bloody* *well* *fucked*.  ->sb_mount() is simply
the wrong place for that.

^ permalink raw reply

* Re: [RFC PATCH v3 0/4] security/x86/sgx: SGX specific LSM hooks
From: Xing, Cedric @ 2019-07-08 17:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-sgx, linux-security-module, selinux, casey.schaufler,
	jmorris, luto, jethro, greg, sds, jarkko.sakkinen
In-Reply-To: <20190708155524.GD20433@linux.intel.com>

On 7/8/2019 8:55 AM, Sean Christopherson wrote:
> On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:
> ...
> 
>> different FSMs to govern page protection transitions. Implementation wise, his
>> model also imposes unwanted restrictions specifically to SGX2, such as:
>>    · Complicated/Restricted UAPI – Enclave loaders are required to provide
> 
> I don't think "complicated" is a fair assessment.  For SGX1 enclaves it's
> literally a direct propagation of the SECINFO RWX flags.

True only for SGX1.
>>      “maximal protection” at page load time, but such information is NOT always
>>      available. For example, Graphene containers may run different applications
>>      comprised of different set of executables and/or shared objects. Some of
>>      them may contain self-modifying code (or text relocation) while others
>>      don’t. The generic enclave loader usually doesn’t have such information so
>>      wouldn’t be able to provide it ahead of time.
> 
> I'm unconvinced that it would be remotely difficult to teach an enclave
> loader that an enclave or hosted application employs SMC, relocation or
> any other behavior that would require declaring RWX on all pages.

You've been talking as if "enclave loader" is tailored to the enclave it 
is loading. But in reality "enclave loader" is usually a library knowing 
usually nothing about the enclave. How could it know if an enclave 
contains self-modifying code?

>>    · Inefficient Auditing – Audit logs are supposed to help system
>>      administrators to determine the set of minimally needed permissions and to
>>      detect abnormal behaviors. But consider the “maximal protection” model, if
>>      “maximal protection” is set to be too permissive, then audit log wouldn’t
>>      be able to detect anomalies;
> 
> Huh?  Declaring overly permissive protections is only problematic if an
> LSM denies the permission, in which case it will generate an accurate
> audit log.
> 
> If the enclave/loader "requires" a permission it doesn't actually need,
> e.g. EXECDIRTY, then it's a software bug that should be fixed.  I don't
> see how this scenario is any different than an application that uses
> assembly code without 'noexecstack' and inadvertantly "requires"
> EXECSTACK due to triggering "read implies exec".  In both cases the
> denied permission is unnecessary due to a userspace application bug.

You see, you've been assuming "enclave loader" knows everything and 
tailored to what it loads in a particular application. But the reality 
is the loader is generic and probably shared by multiple applications. 
It needs some generic way to figure out the "maximal protection". An 
implementation could use information embedded in the enclave file, or 
could just be "configurable". In the former case, you put extra burdens 
on the build tools, while in the latter case, your audit logs cannot 
help generating an appropriate configuration.

>>      or if “maximal protection” is too restrictive,
>>      then audit log cannot identify the file violating the policy.
> 
> Maximal protections that are too restrictive are completely orthogonal to
> LSMs as the enclave would fail to run irrespective of LSMs.  This is no
> different than specifying the wrong RWX flags in SECINFO, or opening a
> file as RO instead of RW.

Say loader is configurable. By looking at the log, can an administrator 
tell which file has too restrictive "maximal protection"?

>> In either case the audit log cannot fulfill its purposes.
>>    · Inability to support #PF driven EPC allocation in SGX2 – For those
>>      unfamiliar with SGX2 software flows, an SGX2 enclave requests a page by
>>      issuing EACCEPT on the address that a new page is wanted, and the resulted
>>      #PF is expected to be handled by the kernel by EAUG’ing an EPC page at the
>>      fault address, and then the enclave would be resumed and the faulting
>>      EACCEPT retried, and succeed. The key requirement is to allow mmap()’ing
>>      non-existing enclave pages so that the SGX module/subsystem could respond
>>      to #PFs by EAUG’ing new pages. Sean’s implementation doesn’t allow
>>      mmap()’ing non-existing pages for variety of reasons and therefore blocks
>>      this major SGX2 usage.
> 
> This is simply wrong.  The key requirement in the theoretical EAUG scheme
> is to mmap() pages that have not been added to the _hardware_ maintained
> enclave.  The pages (or some optimized representation of a range of pages)
> would exist in the kernel's software mode of the enclave.

You are right. Code can always change. My assessment was made on what's 
in your patch.

The key point here is your patch is more complicated than it seems 
because you've been hand-waving on imminent requirements.

^ permalink raw reply

* Re: [PATCH v5 04/12] S.A.R.A.: generic DFA for string matching
From: Jann Horn @ 2019-07-08 17:37 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: kernel list, Kernel Hardening, Linux-MM, linux-security-module,
	Alexander Viro, Brad Spengler, Casey Schaufler, Christoph Hellwig,
	Kees Cook, PaX Team, Serge E. Hallyn, Thomas Gleixner,
	James Morris, John Johansen
In-Reply-To: <CAJHCu1+35GhGJY8jDMPEU8meYhJTVgvzY5sJgVCuLrxCoGgHEg@mail.gmail.com>

On Sun, Jul 7, 2019 at 6:01 PM Salvatore Mesoraca
<s.mesoraca16@gmail.com> wrote:
> Jann Horn <jannh@google.com> wrote:
> > Throughout the series, you are adding files that both add an SPDX
> > identifier and have a description of the license in the comment block
> > at the top. The SPDX identifier already identifies the license.
>
> I added the license description because I thought it was required anyway.
> IANAL, if you tell me that SPDX it's enough I'll remove the description.

IANAL too, but Documentation/process/license-rules.rst says:

====
The common way of expressing the license of a source file is to add the
matching boilerplate text into the top comment of the file.  Due to
formatting, typos etc. these "boilerplates" are hard to validate for
tools which are used in the context of license compliance.

An alternative to boilerplate text is the use of Software Package Data
Exchange (SPDX) license identifiers in each source file.  SPDX license
identifiers are machine parsable and precise shorthands for the license
under which the content of the file is contributed.  SPDX license
identifiers are managed by the SPDX Workgroup at the Linux Foundation and
have been agreed on by partners throughout the industry, tool vendors, and
legal teams.  For further information see https://spdx.org/

The Linux kernel requires the precise SPDX identifier in all source files.
The valid identifiers used in the kernel are explained in the section
`License identifiers`_ and have been retrieved from the official SPDX
license list at https://spdx.org/licenses/ along with the license texts.
====

and there have been lots of conversion patches to replace license
boilerplate headers with SPDX identifiers, see e.g. all the "treewide:
Replace GPLv2 boilerplate/reference with SPDX" patches.

^ permalink raw reply

* Re: [RFC PATCH v4 00/12] security: x86/sgx: SGX vs. LSM
From: Xing, Cedric @ 2019-07-08 17:33 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen
  Cc: linux-sgx, linux-security-module, selinux, Bill Roberts,
	Casey Schaufler, James Morris, Dave Hansen, Andy Lutomirski,
	Jethro Beekman, Dr . Greg Wettstein, Stephen Smalley
In-Reply-To: <20190708172930.GA20791@linux.intel.com>

On 7/8/2019 10:29 AM, Sean Christopherson wrote:
> On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
>> On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
>>
>> I still don't get why we need this whole mess and do not simply admit
>> that there are two distinct roles:
>>
>> 1. Creator
>> 2. User
> 
> Because SELinux has existing concepts of EXECMEM and EXECMOD.
> 
>> In the SELinux context Creator needs FILE__WRITE and FILE__EXECUTE but
>> User does not. It just gets the fd from the Creator. I'm sure that all
>> the SGX2 related functionality can be solved somehow in this role
>> playing game.
>>
>> An example would be the usual case where enclave is actually a loader
>> that loads the actual piece of software that one wants to run. Things
>> simply need to be designed in a way the Creator runs the loader part.
>> These are non-trivial problems but oddball security model is not going
>> to make them disappear - on the contrary it will make designing user
>> space only more complicated.
>>
>> I think this is classical example of when something overly complicated
>> is invented in the kernel only to realize that it should be solved in
>> the user space.

Are you talking about changing enclave loaders in user mode? That'd 
break all existing code. I don't think we shall ever consider this approach.

>>
>> It would not be like the only use case where some kind of privileged
>> daemon is used for managing some a kernel provided resource.
>>
>> I think a really good conclusion from this discussion that has taken two
>> months is to realize that nothing needs to be done in this area (except
>> *maybe* noexec check).
> 
> Hmm, IMO we need to support at least equivalents to EXECMEM and EXECMOD.
> 
> That being said, we can do so without functional changes to the SGX uapi,
> e.g. add reserved fields so that the initial uapi can be extended *if* we
> decide to go with the "userspace provides maximal protections" path, and
> use the EPCM permissions as the maximal protections for the initial
> upstreaming.
> 
> That'd give us a minimal implemenation for initial upstreaming and would
> eliminate Cedric's blocking complaint.  The "whole mess" of whitelisting,
> blacklisting and SGX2 support would be deferred until post-upstreaming.
> 

^ permalink raw reply

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
From: Eric W. Biederman @ 2019-07-08 17:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module
In-Reply-To: <20190708131831.GT17978@ZenIV.linux.org.uk>

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Mon, Jul 08, 2019 at 09:02:10PM +0900, Tetsuo Handa wrote:
>> Hello, David Howells.
>> 
>> I realized via https://lwn.net/Articles/792622/ that a new set of
>> system calls for filesystem mounting has been added to Linux 5.2. But
>> I feel that LSM modules are not ready to support these system calls.
>> 
>> An example is move_mount() added by this patch. This patch added
>> security_move_mount() LSM hook but none of in-tree LSM modules are
>> providing "LSM_HOOK_INIT(move_mount, ...)" entry. Therefore, currently
>> security_move_mount() is a no-op. At least for TOMOYO, I want to check
>> mount manipulations caused by system calls because allowing mounts on
>> arbitrary location is not acceptable for pathname based access control.
>> What happened? I want TOMOYO to perform similar checks like mount() does.
>
> That would be like tomoyo_check_mount_acl(), right?
>         if (need_dev) {
>                 /* Get mount point or device file. */
>                 if (!dev_name || kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
>                         error = -ENOENT;
>                         goto out;
>                 }
>                 obj.path1 = path;
>                 requested_dev_name = tomoyo_realpath_from_path(&path);
>                 if (!requested_dev_name) {
>                         error = -ENOENT;
>                         goto out;
>                 }
>         } else {
> is an obvious crap for *ALL* cases.  You are doing pathname resolution,
> followed by normalization and checks.  Then the result of said pathname
> resolution is thrown out and it's redone (usually by something in fs/super.c).
> Results of _that_ get used.
>
> Could you spell TOCTOU?  And yes, exploiting that takes a lot less than
> being able to do mount(2) in the first place - just pass it
> /proc/self/fd/69/<some acceptable path>/. with descriptor refering to
> opened root directory.  With ~/<some acceptable path> being a symlink
> to whatever you actually want to hit.  And descriptor 42 being your
> opened homedir.  Now have that call of mount(2) overlap with dup2(42, 69)
> from another thread sharing your descriptor table.  It doesn't take much
> to get the timing right, especially if you can arrange for some other
> activity frequently hitting namespace_sem at least shared (e.g. reading
> /proc/mounts in a loop from another process); that's likely to stall
> mount(2) at the point of lock_mount(), which comes *AFTER* the point
> where LSM hook is stuck into.
>
> Again, *ANY* checks on "dev_name" in ->sb_mount() instances are so much
> snake oil.  Always had been.

Al you do realize that the TOCTOU you are talking about comes the system
call API.  TOMOYO can only be faulted for not playing in their own
sandbox and not reaching out and fixing the vfs implementation details.
Userspace has always had to very careful to only mount filesystems
on paths that root completely controls and won't change.

Further system calls for manipulating the mount tree have historically
done a crap job of validating their inputs.  Relying on the fact that
only root can call them.  So the idea of guarding against root doing
something silly was silly.

So I figure at the end of the day if the new security hooks for the new
mount system calls don't make it possible to remove the TOCTOU that is
on you and Dave.  You two touched that code last after all.

Not updating the new security hooks to at least do as good a job as the
old security hooks is the definition of regression.

So Al.  Please simmer down and take the valid criticism.

Eric

^ permalink raw reply

* Re: [RFC PATCH v4 00/12] security: x86/sgx: SGX vs. LSM
From: Sean Christopherson @ 2019-07-08 17:29 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, linux-security-module, selinux, Bill Roberts,
	Casey Schaufler, James Morris, Dave Hansen, Cedric Xing,
	Andy Lutomirski, Jethro Beekman, Dr . Greg Wettstein,
	Stephen Smalley
In-Reply-To: <20190705160549.tzsck5ho5kvtdhit@linux.intel.com>

On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
> 
> I still don't get why we need this whole mess and do not simply admit
> that there are two distinct roles:
> 
> 1. Creator
> 2. User

Because SELinux has existing concepts of EXECMEM and EXECMOD.

> In the SELinux context Creator needs FILE__WRITE and FILE__EXECUTE but
> User does not. It just gets the fd from the Creator. I'm sure that all
> the SGX2 related functionality can be solved somehow in this role
> playing game.
> 
> An example would be the usual case where enclave is actually a loader
> that loads the actual piece of software that one wants to run. Things
> simply need to be designed in a way the Creator runs the loader part.
> These are non-trivial problems but oddball security model is not going
> to make them disappear - on the contrary it will make designing user
> space only more complicated.
> 
> I think this is classical example of when something overly complicated
> is invented in the kernel only to realize that it should be solved in
> the user space.
> 
> It would not be like the only use case where some kind of privileged
> daemon is used for managing some a kernel provided resource.
> 
> I think a really good conclusion from this discussion that has taken two
> months is to realize that nothing needs to be done in this area (except
> *maybe* noexec check).

Hmm, IMO we need to support at least equivalents to EXECMEM and EXECMOD.

That being said, we can do so without functional changes to the SGX uapi,
e.g. add reserved fields so that the initial uapi can be extended *if* we
decide to go with the "userspace provides maximal protections" path, and
use the EPCM permissions as the maximal protections for the initial
upstreaming.

That'd give us a minimal implemenation for initial upstreaming and would
eliminate Cedric's blocking complaint.  The "whole mess" of whitelisting,
blacklisting and SGX2 support would be deferred until post-upstreaming.

^ permalink raw reply

* Re: [RFC PATCH v4 04/12] x86/sgx: Require userspace to define enclave pages' protection bits
From: Xing, Cedric @ 2019-07-08 17:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jarkko Sakkinen, linux-sgx@vger.kernel.org,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	Roberts, William C, Schaufler, Casey, James Morris, Hansen, Dave,
	Andy Lutomirski, Jethro Beekman, Dr . Greg Wettstein,
	Stephen Smalley
In-Reply-To: <20190708163431.GF20433@linux.intel.com>

On 7/8/2019 9:34 AM, Sean Christopherson wrote:
> On Fri, Jun 21, 2019 at 09:42:54AM -0700, Xing, Cedric wrote:
>>> From: Christopherson, Sean J
>>> Sent: Wednesday, June 19, 2019 3:24 PM
>>>
>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h index
>>> 6dba9f282232..67a3babbb24d 100644
>>> --- a/arch/x86/include/uapi/asm/sgx.h
>>> +++ b/arch/x86/include/uapi/asm/sgx.h
>>> @@ -35,15 +35,17 @@ struct sgx_enclave_create  {
>>>    * @src:	address for the page data
>>>    * @secinfo:	address for the SECINFO data
>>>    * @mrmask:	bitmask for the measured 256 byte chunks
>>> + * @prot:	maximal PROT_{READ,WRITE,EXEC} protections for the page
>>>    */
>>>   struct sgx_enclave_add_page {
>>>   	__u64	addr;
>>>   	__u64	src;
>>>   	__u64	secinfo;
>>> -	__u64	mrmask;
>>> +	__u16	mrmask;
>>> +	__u8	prot;
>>> +	__u8	pad;
>>>   };
>>
>> Given EPCM permissions cannot change in SGX1, these maximal PROT_* flags can
>> be the same as EPCM permissions, so don't have to be specified by user code
>> until SGX2. Given we don't have a clear picture on how SGX2 will work yet, I
>> think we shall take "prot" off until it is proven necessary.
> 
> I'm ok with deriving the maximal protections from SECINFO, so long as we
> acknowledge that we're preventing userspace from utilizing EMODPE (until
> the kernel supports SGX2).

I think that's alright.

>>> diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c
>>> b/arch/x86/kernel/cpu/sgx/driver/main.c
>>> index 29384cdd0842..dabfe2a7245a 100644
>>> --- a/arch/x86/kernel/cpu/sgx/driver/main.c
>>> +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
>>> @@ -93,15 +93,64 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,  }
>>> #endif
>>>
>>> +/*
>>> + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages
>>> + * covered by the specific VMA.  A non-existent (or yet to be added)
>>> +enclave
>>> + * page is considered to have no RWX permissions, i.e. is inaccessible.
>>> + */
>>> +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
>>> +				     struct vm_area_struct *vma)
>>> +{
>>> +	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
>>> +	unsigned long idx, idx_start, idx_end;
>>> +	struct sgx_encl_page *page;
>>> +
>>> +	idx_start = PFN_DOWN(vma->vm_start);
>>> +	idx_end = PFN_DOWN(vma->vm_end - 1);
>>> +
>>> +	for (idx = idx_start; idx <= idx_end; ++idx) {
>>> +		/*
>>> +		 * No need to take encl->lock, vm_prot_bits is set prior to
>>> +		 * insertion and never changes, and racing with adding pages is
>>> +		 * a userspace bug.
>>> +		 */
>>> +		rcu_read_lock();
>>> +		page = radix_tree_lookup(&encl->page_tree, idx);
>>> +		rcu_read_unlock();
>>
>> This loop iterates through every page in the range, which could be very slow
>> if the range is large.
> 
> At this point I'm shooting for functional correctness and minimal code
> changes.  Optimizations will be in order at some point, just not now.

I was trying to point out in this thread that your approach isn't as 
simple as it looks lik

>>> +
>>> +		/* Do not allow R|W|X to a non-existent page. */
>>> +		if (!page)
>>> +			allowed_rwx = 0;
>>> +		else
>>> +			allowed_rwx &= page->vm_prot_bits;
>>> +		if (!allowed_rwx)
>>> +			break;
>>> +	}
>>> +
>>> +	return allowed_rwx;
>>> +}
>>> +
>>>   static int sgx_mmap(struct file *file, struct vm_area_struct *vma)  {
>>>   	struct sgx_encl *encl = file->private_data;
>>> +	unsigned long allowed_rwx;
>>>   	int ret;
>>>
>>> +	allowed_rwx = sgx_allowed_rwx(encl, vma);
>>> +	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
>>> +		return -EACCES;
>>> +
>>>   	ret = sgx_encl_mm_add(encl, vma->vm_mm);
>>>   	if (ret)
>>>   		return ret;
>>>
>>> +	if (!(allowed_rwx & VM_READ))
>>> +		vma->vm_flags &= ~VM_MAYREAD;
>>> +	if (!(allowed_rwx & VM_WRITE))
>>> +		vma->vm_flags &= ~VM_MAYWRITE;
>>> +	if (!(allowed_rwx & VM_EXEC))
>>> +		vma->vm_flags &= ~VM_MAYEXEC;
>>> +
>>
>> Say a range comprised of a RW sub-range and a RX sub-range is being mmap()'ed
>> as R here. It'd succeed but mprotect(<RW sub-range>, RW) afterwards will fail
>> because VM_MAYWRITE is cleared here. However, if those two sub-ranges are
>> mapped by separate mmap() calls then the same mprotect() would succeed. The
>> inconsistence here is unexpected and unprecedented.
> 
> Boo, I thought I was being super clever.
> 
>>>   	vma->vm_ops = &sgx_vm_ops;
>>>   	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
>>>   	vma->vm_private_data = encl;
>>

^ permalink raw reply

* Re: [RFC PATCH v3 3/4] X86/sgx: Introduce EMA as a new LSM module
From: Xing, Cedric @ 2019-07-08 17:16 UTC (permalink / raw)
  To: Casey Schaufler, linux-sgx, linux-security-module, selinux
In-Reply-To: <fb4352a4-7ef4-6824-a2ab-21e5fcb208ae@schaufler-ca.com>

On 7/8/2019 9:26 AM, Casey Schaufler wrote:
> In this scheme you use an ema LSM to manage your ema data.
> A quick sketch looks like:
> 
> 	sgx_something_in() calls
> 		security_enclave_load() calls
> 			ema_enclave_load()
> 			selinux_enclave_load()
> 			otherlsm_enclave_load()
> 
> Why is this better than:
> 
> 	sgx_something_in() calls
> 		ema_enclave_load()
> 		security_enclave_load() calls
> 			selinux_enclave_load()
> 			otherlsm_enclave_load()

Are you talking about moving EMA somewhere outside LSM? If so, where?

> 
> 
> If you did really want ema to behave like an LSM
> you would put the file data that SELinux is managing
> into the ema portion of the blob and provide interfaces
> for the SELinux (or whoever) to use that. Also, it's
> an abomination (as I've stated before) for ema to
> rely on SELinux to provide a file_free() hook for
> ema's data. If you continue down the LSM route, you
> need to provide an ema_file_free() hook. You can't
> count on SELinux to do it for you. If there are multiple
> LSMs (coming soon!) that use the ema data, they'll all
> try to free it, and then Bad Things can happen.

I'm afraid you have misunderstood the code. What is kept open and gets 
closed in selinux_file_free() is the sigstruct file. SELinux uses it to 
determine the page permissions for enclave pages from anonymous sources. 
It is a policy choice made inside SELinux and has nothing to do with EMA.

There's indeed an ema_file_free_security() to free the EMA map for 
enclaves being closed. EMA does *NOT* rely on any other LSMs to free 
data for it. The only exception is when an LSM fails enclave_load(), it 
has to call ema_remove_range() to remove the range being added, which 
was *not* required originally in v2.

>> +/**
>> + * ema - Enclave Memory Area structure for LSM modules
> 
> LSM modules is redundant. "LSM" or "LSMs" would be better.

Noted

>> diff --git a/security/Makefile b/security/Makefile
>> index c598b904938f..b66d03a94853 100644
>> --- a/security/Makefile
>> +++ b/security/Makefile
>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA)		+= yama/
>>   obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
>>   obj-$(CONFIG_SECURITY_SAFESETID)       += safesetid/
>>   obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
>> +obj-$(CONFIG_INTEL_SGX)			+= commonema.o
> 
> The config option and the file name ought to match,
> or at least be closer.

Just trying to match file names as "capability" uses commoncap.c.

Like I said, this feature could potentially be used by TEEs other than 
SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. I 
can rename it to ema.c or enclave.c. Do you have a preference?

>> diff --git a/security/commonema.c b/security/commonema.c
> 
> Put this in a subdirectory. Please.

Then why is commoncap.c located in this directory? I'm just trying to 
match the existing convention.

>> +static struct lsm_blob_sizes ema_blob_sizes __lsm_ro_after_init = {
>> +	.lbs_file = sizeof(atomic_long_t),
>> +};
> 
> If this is ema's data ema must manage it. You *must* have
> a file_free().

There is one indeed - ema_file_free_security().

> 
>> +
>> +static atomic_long_t *_map_file(struct file *encl)
>> +{
>> +	return (void *)((char *)(encl->f_security) + ema_blob_sizes.lbs_file);
> 
> I don't trust all the casting going on here, especially since
> you don't end up with the type you should be returning.

Will change.

>> +}
>> +
>> +static struct ema_map *_alloc_map(void)
> 
> Function header comments, please.

Will add.

^ permalink raw reply

* Re: [RFC PATCH v4 04/12] x86/sgx: Require userspace to define enclave pages' protection bits
From: Sean Christopherson @ 2019-07-08 16:34 UTC (permalink / raw)
  To: Xing, Cedric
  Cc: Jarkko Sakkinen, linux-sgx@vger.kernel.org,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	Roberts, William C, Schaufler, Casey, James Morris, Hansen, Dave,
	Andy Lutomirski, Jethro Beekman, Dr . Greg Wettstein,
	Stephen Smalley
In-Reply-To: <960B34DE67B9E140824F1DCDEC400C0F65518451@ORSMSX116.amr.corp.intel.com>

On Fri, Jun 21, 2019 at 09:42:54AM -0700, Xing, Cedric wrote:
> > From: Christopherson, Sean J
> > Sent: Wednesday, June 19, 2019 3:24 PM
> > 
> > diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h index
> > 6dba9f282232..67a3babbb24d 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -35,15 +35,17 @@ struct sgx_enclave_create  {
> >   * @src:	address for the page data
> >   * @secinfo:	address for the SECINFO data
> >   * @mrmask:	bitmask for the measured 256 byte chunks
> > + * @prot:	maximal PROT_{READ,WRITE,EXEC} protections for the page
> >   */
> >  struct sgx_enclave_add_page {
> >  	__u64	addr;
> >  	__u64	src;
> >  	__u64	secinfo;
> > -	__u64	mrmask;
> > +	__u16	mrmask;
> > +	__u8	prot;
> > +	__u8	pad;
> >  };
> 
> Given EPCM permissions cannot change in SGX1, these maximal PROT_* flags can
> be the same as EPCM permissions, so don't have to be specified by user code
> until SGX2. Given we don't have a clear picture on how SGX2 will work yet, I
> think we shall take "prot" off until it is proven necessary.

I'm ok with deriving the maximal protections from SECINFO, so long as we
acknowledge that we're preventing userspace from utilizing EMODPE (until
the kernel supports SGX2).

> > diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c
> > b/arch/x86/kernel/cpu/sgx/driver/main.c
> > index 29384cdd0842..dabfe2a7245a 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> > @@ -93,15 +93,64 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,  }
> > #endif
> > 
> > +/*
> > + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages
> > + * covered by the specific VMA.  A non-existent (or yet to be added)
> > +enclave
> > + * page is considered to have no RWX permissions, i.e. is inaccessible.
> > + */
> > +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> > +				     struct vm_area_struct *vma)
> > +{
> > +	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
> > +	unsigned long idx, idx_start, idx_end;
> > +	struct sgx_encl_page *page;
> > +
> > +	idx_start = PFN_DOWN(vma->vm_start);
> > +	idx_end = PFN_DOWN(vma->vm_end - 1);
> > +
> > +	for (idx = idx_start; idx <= idx_end; ++idx) {
> > +		/*
> > +		 * No need to take encl->lock, vm_prot_bits is set prior to
> > +		 * insertion and never changes, and racing with adding pages is
> > +		 * a userspace bug.
> > +		 */
> > +		rcu_read_lock();
> > +		page = radix_tree_lookup(&encl->page_tree, idx);
> > +		rcu_read_unlock();
> 
> This loop iterates through every page in the range, which could be very slow
> if the range is large.

At this point I'm shooting for functional correctness and minimal code
changes.  Optimizations will be in order at some point, just not now.

> > +
> > +		/* Do not allow R|W|X to a non-existent page. */
> > +		if (!page)
> > +			allowed_rwx = 0;
> > +		else
> > +			allowed_rwx &= page->vm_prot_bits;
> > +		if (!allowed_rwx)
> > +			break;
> > +	}
> > +
> > +	return allowed_rwx;
> > +}
> > +
> >  static int sgx_mmap(struct file *file, struct vm_area_struct *vma)  {
> >  	struct sgx_encl *encl = file->private_data;
> > +	unsigned long allowed_rwx;
> >  	int ret;
> > 
> > +	allowed_rwx = sgx_allowed_rwx(encl, vma);
> > +	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
> > +		return -EACCES;
> > +
> >  	ret = sgx_encl_mm_add(encl, vma->vm_mm);
> >  	if (ret)
> >  		return ret;
> > 
> > +	if (!(allowed_rwx & VM_READ))
> > +		vma->vm_flags &= ~VM_MAYREAD;
> > +	if (!(allowed_rwx & VM_WRITE))
> > +		vma->vm_flags &= ~VM_MAYWRITE;
> > +	if (!(allowed_rwx & VM_EXEC))
> > +		vma->vm_flags &= ~VM_MAYEXEC;
> > +
> 
> Say a range comprised of a RW sub-range and a RX sub-range is being mmap()'ed
> as R here. It'd succeed but mprotect(<RW sub-range>, RW) afterwards will fail
> because VM_MAYWRITE is cleared here. However, if those two sub-ranges are
> mapped by separate mmap() calls then the same mprotect() would succeed. The
> inconsistence here is unexpected and unprecedented.

Boo, I thought I was being super clever.

> >  	vma->vm_ops = &sgx_vm_ops;
> >  	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
> >  	vma->vm_private_data = encl;
> 

^ permalink raw reply

* Re: [RFC 0/7] Introduce TEE based Trusted Keys support
From: Jens Wiklander @ 2019-07-08 16:31 UTC (permalink / raw)
  To: Sumit Garg
  Cc: corbet, dhowells, jejb, Jarkko Sakkinen, Mimi Zohar, jmorris,
	serge, Ard Biesheuvel, Daniel Thompson, linux-doc,
	Linux Kernel Mailing List, tee-dev, keyrings,
	linux-security-module, linux-integrity
In-Reply-To: <CAFA6WYPn3HB6BRocKmKTR+ZPE=Fav5w1TUdRgmLp-NkYobp3rw@mail.gmail.com>

Hi Sumit,

On Mon, Jul 08, 2019 at 06:11:39PM +0530, Sumit Garg wrote:
> Hi Jens,
> 
> On Thu, 13 Jun 2019 at 16:01, Sumit Garg <sumit.garg@linaro.org> wrote:
> >
> > Add support for TEE based trusted keys where TEE provides the functionality
> > to seal and unseal trusted keys using hardware unique key. Also, this is
> > an alternative in case platform doesn't possess a TPM device.
> >
> > This series also adds some TEE features like:
> >
> > Patch #1, #2 enables support for registered kernel shared memory with TEE.
> >
> 
> Would you like to pick up Patch #1, #2 separately? I think both these
> patches add independent functionality and also got reviewed-by tags
> too.

I think it makes more sense to keep them together in the same patch
series or could end up with dependencies between trees.

If you don't think dependencies will be an issue then I don't mind
picking them up, in that case they'd likely sit in an arm-soc branch
until next merge window. However, I think that #3 (support for private
kernel login method) should be included too and that one isn't ready
yet.

Thanks,
Jens

> 
> 
> -Sumit
> 
> > Patch #3 enables support for private kernel login method required for
> > cases like trusted keys where we don't wan't user-space to directly access
> > TEE service to retrieve trusted key contents.
> >
> > Rest of the patches from #4 to #7 adds support for TEE based trusted keys.
> >
> > This patch-set has been tested with OP-TEE based pseudo TA which can be
> > found here [1].
> >
> > Looking forward to your valuable feedback/suggestions.
> >
> > [1] https://github.com/OP-TEE/optee_os/pull/3082
> >
> > Sumit Garg (7):
> >   tee: optee: allow kernel pages to register as shm
> >   tee: enable support to register kernel memory
> >   tee: add private login method for kernel clients
> >   KEYS: trusted: Introduce TEE based Trusted Keys
> >   KEYS: encrypted: Allow TEE based trusted master keys
> >   doc: keys: Document usage of TEE based Trusted Keys
> >   MAINTAINERS: Add entry for TEE based Trusted Keys
> >
> >  Documentation/security/keys/tee-trusted.rst      |  93 +++++
> >  MAINTAINERS                                      |   9 +
> >  drivers/tee/optee/call.c                         |   7 +
> >  drivers/tee/tee_core.c                           |   6 +
> >  drivers/tee/tee_shm.c                            |  16 +-
> >  include/keys/tee_trusted.h                       |  84 ++++
> >  include/keys/trusted-type.h                      |   1 +
> >  include/linux/tee_drv.h                          |   1 +
> >  include/uapi/linux/tee.h                         |   2 +
> >  security/keys/Kconfig                            |   3 +
> >  security/keys/Makefile                           |   3 +
> >  security/keys/encrypted-keys/masterkey_trusted.c |  10 +-
> >  security/keys/tee_trusted.c                      | 506 +++++++++++++++++++++++
> >  13 files changed, 737 insertions(+), 4 deletions(-)
> >  create mode 100644 Documentation/security/keys/tee-trusted.rst
> >  create mode 100644 include/keys/tee_trusted.h
> >  create mode 100644 security/keys/tee_trusted.c
> >
> > --
> > 2.7.4
> >

^ permalink raw reply

* Re: [RFC PATCH v3 3/4] X86/sgx: Introduce EMA as a new LSM module
From: Casey Schaufler @ 2019-07-08 16:26 UTC (permalink / raw)
  To: Cedric Xing, linux-sgx, linux-security-module, selinux
In-Reply-To: <41e1a1a2f66226d88d45675434eb34dde5d0f50d.1562542383.git.cedric.xing@intel.com>

On 7/7/2019 4:41 PM, Cedric Xing wrote:
> As enclave pages have different lifespan than the existing MAP_PRIVATE and
> MAP_SHARED pages, a new data structure is required outside of VMA to track
> their protections and/or origins. Enclave Memory Area (or EMA for short) has
> been introduced to address the need. EMAs are maintained by a new LSM module
> named “ema”, which is similar to the idea of the “capability” LSM module.

First off, I'll say that this is an improvement over
the LSM integrated version that preceded it. I still
have some issues with the naming, but I'll address that
inline. I do have a suggestion that I think will make
this more conventional.

In this scheme you use an ema LSM to manage your ema data.
A quick sketch looks like:

	sgx_something_in() calls
		security_enclave_load() calls
			ema_enclave_load()
			selinux_enclave_load()
			otherlsm_enclave_load()

Why is this better than:

	sgx_something_in() calls
		ema_enclave_load()
		security_enclave_load() calls
			selinux_enclave_load()
			otherlsm_enclave_load()


If you did really want ema to behave like an LSM
you would put the file data that SELinux is managing
into the ema portion of the blob and provide interfaces
for the SELinux (or whoever) to use that. Also, it's
an abomination (as I've stated before) for ema to
rely on SELinux to provide a file_free() hook for
ema's data. If you continue down the LSM route, you
need to provide an ema_file_free() hook. You can't
count on SELinux to do it for you. If there are multiple
LSMs (coming soon!) that use the ema data, they'll all
try to free it, and then Bad Things can happen.

>
> This new “ema” module has LSM_ORDER_FIRST so will always be dispatched before
> other LSM_ORDER_MUTABLE modules (e.g. selinux, apparmor, etc.). It is
> responsible for initializing EMA maps, and inserting and freeing EMA nodes, and
> offers APIs for other LSM modules to query/update EMAs. Details could be found
> in include/linux/lsm_ema.h and security/commonema.c.
>
> Signed-off-by: Cedric Xing <cedric.xing@intel.com>
> ---
>  include/linux/lsm_ema.h |  97 ++++++++++++++

I still think this should be enclave.h (or commonema.h)
as it is not LSM code.

>  security/Makefile       |   1 +
>  security/commonema.c    | 277 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 375 insertions(+)
>  create mode 100644 include/linux/lsm_ema.h
>  create mode 100644 security/commonema.c
>
> diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h
> new file mode 100644
> index 000000000000..59fc4ea6fa78
> --- /dev/null
> +++ b/include/linux/lsm_ema.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> +/**
> + * Enclave Memory Area interface for LSM modules
> + *
> + * Copyright(c) 2016-19 Intel Corporation.
> + */
> +
> +#ifndef _LSM_EMA_H_
> +#define _LSM_EMA_H_
> +
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +
> +/**
> + * ema - Enclave Memory Area structure for LSM modules

LSM modules is redundant. "LSM" or "LSMs" would be better.

> + *
> + * Data structure to track origins of enclave pages
> + *
> + * @link:
> + *	Link to adjacent EMAs. EMAs are sorted by their addresses in ascending
> + *	order
> + * @start:
> + *	Starting address
> + * @end:
> + *	Ending address
> + * @source:
> + *	File from which this range was loaded from, or NULL if not loaded from
> + *	any files
> + */
> +struct ema {
> +	struct list_head	link;
> +	size_t			start;
> +	size_t			end;
> +	struct file		*source;
> +};
> +
> +#define ema_data(ema, offset)	\
> +	((void *)((char *)((struct ema *)(ema) + 1) + offset))
> +
> +/**
> + * ema_map - LSM Enclave Memory Map structure for LSM modules

As above.

> + *
> + * Container for EMAs of an enclave
> + *
> + * @list:
> + *	Head of a list of sorted EMAs
> + * @lock:
> + *	Acquire before querying/updateing the list EMAs
> + */
> +struct ema_map {
> +	struct list_head	list;
> +	struct mutex		lock;
> +};
> +
> +size_t __init ema_request_blob(size_t blob_size);
> +struct ema_map *ema_get_map(struct file *encl);
> +int ema_apply_to_range(struct ema_map *map, size_t start, size_t end,
> +		       int (*cb)(struct ema *ema, void *arg), void *arg);
> +void ema_remove_range(struct ema_map *map, size_t start, size_t end);
> +
> +static inline int __must_check ema_lock_map(struct ema_map *map)
> +{
> +	return mutex_lock_interruptible(&map->lock);
> +}
> +
> +static inline void ema_unlock_map(struct ema_map *map)
> +{
> +	mutex_unlock(&map->lock);
> +}
> +
> +static inline int ema_lock_apply_to_range(struct ema_map *map,
> +					  size_t start, size_t end,
> +					  int (*cb)(struct ema *, void *),
> +					  void *arg)
> +{
> +	int rc = ema_lock_map(map);
> +	if (!rc) {
> +		rc = ema_apply_to_range(map, start, end, cb, arg);
> +		ema_unlock_map(map);
> +	}
> +	return rc;
> +}
> +
> +static inline int ema_lock_remove_range(struct ema_map *map,
> +					size_t start, size_t end)
> +{
> +	int rc = ema_lock_map(map);
> +	if (!rc) {
> +		ema_remove_range(map, start, end);
> +		ema_unlock_map(map);
> +	}
> +	return rc;
> +}
> +
> +#endif /* _LSM_EMA_H_ */
> diff --git a/security/Makefile b/security/Makefile
> index c598b904938f..b66d03a94853 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA)		+= yama/
>  obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
>  obj-$(CONFIG_SECURITY_SAFESETID)       += safesetid/
>  obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
> +obj-$(CONFIG_INTEL_SGX)			+= commonema.o

The config option and the file name ought to match,
or at least be closer.

>  
>  # Object integrity file lists
>  subdir-$(CONFIG_INTEGRITY)		+= integrity
> diff --git a/security/commonema.c b/security/commonema.c

Put this in a subdirectory. Please.

> new file mode 100644
> index 000000000000..c5b0bdfdc013
> --- /dev/null
> +++ b/security/commonema.c
> @@ -0,0 +1,277 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2016-18 Intel Corporation.
> +
> +#include <linux/lsm_ema.h>
> +#include <linux/lsm_hooks.h>
> +#include <linux/slab.h>
> +
> +static struct kmem_cache *_map_cache;
> +static struct kmem_cache *_node_cache;
> +static size_t _data_size __lsm_ro_after_init;
> +
> +static struct lsm_blob_sizes ema_blob_sizes __lsm_ro_after_init = {
> +	.lbs_file = sizeof(atomic_long_t),
> +};

If this is ema's data ema must manage it. You *must* have
a file_free().

> +
> +static atomic_long_t *_map_file(struct file *encl)
> +{
> +	return (void *)((char *)(encl->f_security) + ema_blob_sizes.lbs_file);

I don't trust all the casting going on here, especially since
you don't end up with the type you should be returning.

> +}
> +
> +static struct ema_map *_alloc_map(void)

Function header comments, please.

> +{
> +	struct ema_map *m;
> +
> +	m = kmem_cache_zalloc(_map_cache, GFP_KERNEL);
> +	if (likely(m)) {
> +		INIT_LIST_HEAD(&m->list);
> +		mutex_init(&m->lock);
> +	}
> +	return m;
> +}
> +
> +static struct ema *_new_ema(size_t start, size_t end, struct file *src)
> +{
> +	struct ema *ema;
> +
> +	if (unlikely(!_node_cache)) {
> +		struct kmem_cache *c;
> +
> +		c = kmem_cache_create("lsm-ema", sizeof(*ema) + _data_size,
> +				      __alignof__(typeof(*ema)), SLAB_PANIC,
> +				      NULL);
> +		if (atomic_long_cmpxchg((atomic_long_t *)&_node_cache, 0,
> +					(long)c))
> +			kmem_cache_destroy(c);
> +	}
> +
> +	ema = kmem_cache_zalloc(_node_cache, GFP_KERNEL);
> +	if (likely(ema)) {
> +		INIT_LIST_HEAD(&ema->link);
> +		ema->start = start;
> +		ema->end = end;
> +		if (src)
> +			ema->source = get_file(src);
> +	}
> +	return ema;
> +}
> +
> +static void _free_ema(struct ema *ema)
> +{
> +	if (ema->source)
> +		fput(ema->source);
> +	kmem_cache_free(_node_cache, ema);
> +}
> +
> +static void _free_map(struct ema_map *map)
> +{
> +	struct ema *p, *n;
> +
> +	WARN_ON(mutex_is_locked(&map->lock));
> +	list_for_each_entry_safe(p, n, &map->list, link)
> +		_free_ema(p);
> +	kmem_cache_free(_map_cache, map);
> +}
> +
> +static struct ema_map *_init_map(struct file *encl)
> +{
> +	struct ema_map *m = ema_get_map(encl);
> +	if (!m) {
> +		m = _alloc_map();
> +		if (atomic_long_cmpxchg(_map_file(encl), 0, (long)m)) {
> +			_free_map(m);
> +			m = ema_get_map(encl);
> +		}
> +	}
> +	return m;
> +}
> +
> +static inline struct ema *_next_ema(struct ema *p, struct ema_map *map)
> +{
> +	p = list_next_entry(p, link);
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline struct ema *_find_ema(struct ema_map *map, size_t a)
> +{
> +	struct ema *p;
> +
> +	WARN_ON(!mutex_is_locked(&map->lock));
> +
> +	list_for_each_entry(p, &map->list, link)
> +		if (a < p->end)
> +			break;
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static struct ema *_split_ema(struct ema *p, size_t at)
> +{
> +	typeof(p) n;
> +
> +	if (at <= p->start || at >= p->end)
> +		return p;
> +
> +	n = _new_ema(p->start, at, p->source);
> +	if (likely(n)) {
> +		memcpy(n + 1, p + 1, _data_size);
> +		p->start = at;
> +		list_add_tail(&n->link, &p->link);
> +	}
> +	return n;
> +}
> +
> +static int _merge_ema(struct ema *p, struct ema_map *map)
> +{
> +	typeof(p) prev = list_prev_entry(p, link);
> +
> +	WARN_ON(!mutex_is_locked(&map->lock));
> +
> +	if (&prev->link == &map->list || prev->end != p->start ||
> +	    prev->source != p->source || memcmp(prev + 1, p + 1, _data_size))
> +		return 0;
> +
> +	p->start = prev->start;
> +	fput(prev->source);
> +	_free_ema(prev);
> +	return 1;
> +}
> +
> +static inline int _insert_ema(struct ema_map *map, struct ema *n)
> +{
> +	typeof(n) p = _find_ema(map, n->start);
> +
> +	if (!p)
> +		list_add_tail(&n->link, &map->list);
> +	else if (n->end <= p->start)
> +		list_add_tail(&n->link, &p->link);
> +	else
> +		return -EEXIST;
> +
> +	_merge_ema(n, map);
> +	if (p)
> +		_merge_ema(p, map);
> +	return 0;
> +}
> +
> +static void ema_file_free_security(struct file *encl)
> +{
> +	struct ema_map *m = ema_get_map(encl);
> +	if (m)
> +		_free_map(m);
> +}
> +
> +static int ema_enclave_load(struct file *encl, size_t start, size_t end,
> +			    size_t flags, struct vm_area_struct *vma)
> +{
> +	struct ema_map *m;
> +	struct ema *ema;
> +	int rc;
> +
> +	m = _init_map(encl);
> +	if (unlikely(!m))
> +		return -ENOMEM;
> +
> +	ema = _new_ema(start, end, vma ? vma->vm_file : NULL);
> +	if (unlikely(!ema))
> +		return -ENOMEM;
> +
> +	rc = ema_lock_map(m);
> +	if (!rc) {
> +		rc = _insert_ema(m, ema);
> +		ema_unlock_map(m);
> +	}
> +	if (rc)
> +		_free_ema(ema);
> +	return rc;
> +}
> +
> +static int ema_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct,
> +			    struct vm_area_struct *vma)
> +{
> +	if (unlikely(!_init_map(encl)))
> +		return -ENOMEM;
> +	return 0;
> +}
> +
> +static struct security_hook_list ema_hooks[] __lsm_ro_after_init = {
> +	LSM_HOOK_INIT(file_free_security, ema_file_free_security),
> +	LSM_HOOK_INIT(enclave_load, ema_enclave_load),
> +	LSM_HOOK_INIT(enclave_init, ema_enclave_init),
> +};
> +
> +static int __init ema_init(void)
> +{
> +	_map_cache = kmem_cache_create("lsm-ema_map", sizeof(struct ema_map),
> +				       __alignof__(struct ema_map), SLAB_PANIC,
> +				       NULL);
> +	security_add_hooks(ema_hooks, ARRAY_SIZE(ema_hooks), "ema");
> +	return 0;
> +}
> +
> +DEFINE_LSM(ema) = {
> +	.name = "ema",
> +	.order = LSM_ORDER_FIRST,
> +	.init = ema_init,
> +	.blobs = &ema_blob_sizes,
> +};
> +
> +/* ema_request_blob shall only be called from LSM module init function */
> +size_t __init ema_request_blob(size_t size)
> +{
> +	typeof(_data_size) offset = _data_size;
> +	_data_size += size;
> +	return offset;
> +}
> +
> +struct ema_map *ema_get_map(struct file *encl)
> +{
> +	return (struct ema_map *)atomic_long_read(_map_file(encl));
> +}
> +
> +/**
> + * Invoke a callback function on every EMA falls within range, split EMAs as
> + * needed
> + */
> +int ema_apply_to_range(struct ema_map *map, size_t start, size_t end,
> +		       int (*cb)(struct ema *, void *), void *arg)
> +{
> +	struct ema *ema;
> +	int rc;
> +
> +	ema = _find_ema(map, start);
> +	while (ema && end > ema->start) {
> +		if (start > ema->start)
> +			_split_ema(ema, start);
> +		if (end < ema->end)
> +			ema = _split_ema(ema, end);
> +
> +		rc = (*cb)(ema, arg);
> +		_merge_ema(ema, map);
> +		if (rc)
> +			return rc;
> +
> +		ema = _next_ema(ema, map);
> +	}
> +
> +	if (ema)
> +		_merge_ema(ema, map);
> +	return 0;
> +}
> +
> +/* Remove all EMAs falling within range, split EMAs as needed */
> +void ema_remove_range(struct ema_map *map, size_t start, size_t end)
> +{
> +	struct ema *ema, *n;
> +
> +	ema = _find_ema(map, start);
> +	while (ema && end > ema->start) {
> +		if (start > ema->start)
> +			_split_ema(ema, start);
> +		if (end < ema->end)
> +			ema = _split_ema(ema, end);
> +
> +		n = _next_ema(ema, map);
> +		_free_ema(ema);
> +		ema = n;
> +	}
> +}


^ permalink raw reply

* Re: [RFC PATCH v3 0/4] security/x86/sgx: SGX specific LSM hooks
From: Sean Christopherson @ 2019-07-08 15:55 UTC (permalink / raw)
  To: Cedric Xing
  Cc: linux-sgx, linux-security-module, selinux, casey.schaufler,
	jmorris, luto, jethro, greg, sds, jarkko.sakkinen
In-Reply-To: <cover.1562542383.git.cedric.xing@intel.com>

On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:
...

> different FSMs to govern page protection transitions. Implementation wise, his
> model also imposes unwanted restrictions specifically to SGX2, such as:
>   · Complicated/Restricted UAPI – Enclave loaders are required to provide

I don't think "complicated" is a fair assessment.  For SGX1 enclaves it's
literally a direct propagation of the SECINFO RWX flags.

>     “maximal protection” at page load time, but such information is NOT always
>     available. For example, Graphene containers may run different applications
>     comprised of different set of executables and/or shared objects. Some of
>     them may contain self-modifying code (or text relocation) while others
>     don’t. The generic enclave loader usually doesn’t have such information so
>     wouldn’t be able to provide it ahead of time.

I'm unconvinced that it would be remotely difficult to teach an enclave
loader that an enclave or hosted application employs SMC, relocation or
any other behavior that would require declaring RWX on all pages.

>   · Inefficient Auditing – Audit logs are supposed to help system
>     administrators to determine the set of minimally needed permissions and to
>     detect abnormal behaviors. But consider the “maximal protection” model, if
>     “maximal protection” is set to be too permissive, then audit log wouldn’t
>     be able to detect anomalies;

Huh?  Declaring overly permissive protections is only problematic if an
LSM denies the permission, in which case it will generate an accurate
audit log.

If the enclave/loader "requires" a permission it doesn't actually need,
e.g. EXECDIRTY, then it's a software bug that should be fixed.  I don't
see how this scenario is any different than an application that uses
assembly code without 'noexecstack' and inadvertantly "requires"
EXECSTACK due to triggering "read implies exec".  In both cases the
denied permission is unnecessary due to a userspace application bug.

>     or if “maximal protection” is too restrictive,
>     then audit log cannot identify the file violating the policy.

Maximal protections that are too restrictive are completely orthogonal to
LSMs as the enclave would fail to run irrespective of LSMs.  This is no
different than specifying the wrong RWX flags in SECINFO, or opening a
file as RO instead of RW.

> In either case the audit log cannot fulfill its purposes.
>   · Inability to support #PF driven EPC allocation in SGX2 – For those
>     unfamiliar with SGX2 software flows, an SGX2 enclave requests a page by
>     issuing EACCEPT on the address that a new page is wanted, and the resulted
>     #PF is expected to be handled by the kernel by EAUG’ing an EPC page at the
>     fault address, and then the enclave would be resumed and the faulting
>     EACCEPT retried, and succeed. The key requirement is to allow mmap()’ing
>     non-existing enclave pages so that the SGX module/subsystem could respond
>     to #PFs by EAUG’ing new pages. Sean’s implementation doesn’t allow
>     mmap()’ing non-existing pages for variety of reasons and therefore blocks
>     this major SGX2 usage.

This is simply wrong.  The key requirement in the theoretical EAUG scheme
is to mmap() pages that have not been added to the _hardware_ maintained
enclave.  The pages (or some optimized representation of a range of pages)
would exist in the kernel's software mode of the enclave.

^ permalink raw reply

* Re: [RFC 3/7] tee: add private login method for kernel clients
From: Jens Wiklander @ 2019-07-08 15:39 UTC (permalink / raw)
  To: Sumit Garg
  Cc: keyrings, linux-integrity, linux-security-module, corbet,
	dhowells, jejb, jarkko.sakkinen, zohar, jmorris, serge,
	ard.biesheuvel, daniel.thompson, linux-doc, linux-kernel, tee-dev
In-Reply-To: <1560421833-27414-4-git-send-email-sumit.garg@linaro.org>

Hi Sumit,

On Thu, Jun 13, 2019 at 04:00:29PM +0530, Sumit Garg wrote:
> There are use-cases where user-space shouldn't be allowed to communicate
> directly with a TEE device which is dedicated to provide a specific
> service for a kernel client. So add a private login method for kernel
> clients and disallow user-space to open-session using this login method.
> 
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> ---
>  drivers/tee/tee_core.c   | 6 ++++++
>  include/uapi/linux/tee.h | 2 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> index 0f16d9f..4581bd1 100644
> --- a/drivers/tee/tee_core.c
> +++ b/drivers/tee/tee_core.c
> @@ -334,6 +334,12 @@ static int tee_ioctl_open_session(struct tee_context *ctx,
>  			goto out;
>  	}
>  
> +	if (arg.clnt_login == TEE_IOCTL_LOGIN_REE_KERNEL) {
TEE_IOCTL_LOGIN_REE_KERNEL is defined as 0x80000000 which is in the
range specified and implementation defined by the GP spec. I wonder if
we shouldn't filter the entire implementation defined range instead of
just this value.

> +		pr_err("login method not allowed for user-space client\n");
pr_debug(), if it's needed at all.

> +		rc = -EPERM;
> +		goto out;
> +	}
> +
>  	rc = ctx->teedev->desc->ops->open_session(ctx, &arg, params);
>  	if (rc)
>  		goto out;
> diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
> index 4b9eb06..f33c69c 100644
> --- a/include/uapi/linux/tee.h
> +++ b/include/uapi/linux/tee.h
> @@ -172,6 +172,8 @@ struct tee_ioctl_buf_data {
>  #define TEE_IOCTL_LOGIN_APPLICATION		4
>  #define TEE_IOCTL_LOGIN_USER_APPLICATION	5
>  #define TEE_IOCTL_LOGIN_GROUP_APPLICATION	6
> +/* Private login method for REE kernel clients */
It's worth noting that this is filtered by the TEE framework, compared
to everything else which is treated opaquely.

> +#define TEE_IOCTL_LOGIN_REE_KERNEL		0x80000000
>  
>  /**
>   * struct tee_ioctl_param - parameter
> -- 
> 2.7.4
> 

Thanks,
Jens

^ permalink raw reply

* Re: [RFC PATCH v4 01/12] x86/sgx: Use mmu_notifier.release() instead of per-vma refcounting
From: Sean Christopherson @ 2019-07-08 14:57 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, linux-security-module, selinux, Bill Roberts,
	Casey Schaufler, James Morris, Dave Hansen, Cedric Xing,
	Andy Lutomirski, Jethro Beekman, Dr . Greg Wettstein,
	Stephen Smalley
In-Reply-To: <20190620210324.GF15383@linux.intel.com>

On Fri, Jun 21, 2019 at 12:03:36AM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 19, 2019 at 03:23:50PM -0700, Sean Christopherson wrote:
> > Using per-vma refcounting to track mm_structs associated with an enclave
> > requires hooking .vm_close(), which in turn prevents the mm from merging
> > vmas (precisely to allow refcounting).
> 
> Why having sgx_vma_close() prevents that? I do not understand the
> problem statement.

vmas that define .vm_close() cannot be merged.

  /*
   * If the vma has a ->close operation then the driver probably needs to release
   * per-vma resources, so we don't attempt to merge those.
   */
  static inline int is_mergeable_vma(struct vm_area_struct *vma,
				struct file *file, unsigned long vm_flags,
				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
  {
	...

	if (vma->vm_ops && vma->vm_ops->close)
		return 0;
	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
		return 0;
	return 1;
  }


> 
> > Avoid refcounting encl_mm altogether by registering an mmu_notifier at
> > .mmap(), removing the dying encl_mm at mmu_notifier.release() and
> > protecting mm_list during reclaim via a per-enclave SRCU.
> 
> Right, there is the potential collision with my changes:
> 
> 1. Your patch: enclave life-cycle equals life-cycle of all processes
>    that are associated with the enclave.
> 2. My (yet be sent) patch: enclave life-cycle equals the life cycle.
> 
> I won't rush with my patch. I rather merge neither at this point and
> you can review mine after you come back from your vacation.
> 
> > Removing refcounting/vm_close() allows merging of enclave vmas, at the
> > cost of delaying removal of encl_mm structs from mm_list, i.e. an mm is
> > disassociated from an enclave when the mm exits or the enclave dies, as
> > opposed to when the last vma (in a given mm) is closed.
> > 
> > The impact of delying encl_mm removal is its memory footprint and
> > whatever overhead is incurred during EPC reclaim (to walk an mm's vmas).
> > Practically speaking, a stale encl_mm will exist for a meaningful amount
> > of time if and only if the enclave is mapped in a long-lived process and
> > then passed off to another long-lived process.  It is expected that the
> > vast majority of use cases will not encounter this condition, e.g. even
> > using a daemon to build enclaves should not result in a stale encl_mm as
> > the builder should never need to mmap() the enclave.
> 
> This paragraph speaks only about "well behaving" software.

Malicious software isn't all that interesting as there are far easier ways
to waste system resources.  That being said, the encl_mm allocation can
use GFP_KERNEL_ACCOUNT.

> > Even if there are scenarios that lead to defunct encl_mms, the cost is
> > likely far outweighed by the benefits of reducing the number of vmas
> > across all enclaves.
> > 
> > Note, using SRCU to protect mm_list is not strictly necessary, i.e. the
> > existing walker with encl_mm refcounting could be massaged to work with
> > mmu_notifier.release(), but the resulting code is subtle and fragile (I
> > never actually got it working).  The primary issue is that an encl_mm
> > can't be moved off the list until its refcount goes to zero, otherwise
> > the custom walker goes off into the weeds.  The refcount requirement
> > then prevents using mm_list to identify if an mmu_notifier.release()
> > has fired, i.e. another mechanism is needed to guard against races
> > between exit_mmap() and sgx_release().
> 
> Is it really impossible to send a separate SRCU patch?

I can split out the SRCU as a precursor.  It'll likely take me a few days
to get it sent.

> I fully agree with the SRCU whereas rest of this patch is still
> under debate.
> 
> If you could do that, I can merge it in no time. It is a small
> step into better direction.
> 
> > Cc: Dave Hansen <dave.hansen@intel.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Needs to be rebased because the master missing your earlier bug fix.

...

> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 9566eb72d417..c6436bbd4a68 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -132,103 +132,125 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >  	return entry;
> >  }
> >  
> > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> > -				    struct mm_struct *mm)
> > +static void sgx_encl_mm_release_wq(struct work_struct *work)
> > +{
> > +	struct sgx_encl_mm *encl_mm =
> > +		container_of(work, struct sgx_encl_mm, release_work);
> > +
> > +	sgx_encl_mm_release(encl_mm);
> > +}
> > +
> > +/*
> > + * Being a call_srcu() callback, this needs to be short, and sgx_encl_release()
> > + * is anything but short.  Do the final freeing in yet another async callback.
> > + */
> > +static void sgx_encl_mm_release_delayed(struct rcu_head *rcu)
> 
> Would rename this either as *_tail() or *_deferred().

Deferred works for me.

> > +{
> > +	struct sgx_encl_mm *encl_mm =
> > +		container_of(rcu, struct sgx_encl_mm, rcu);
> > +
> > +	INIT_WORK(&encl_mm->release_work, sgx_encl_mm_release_wq);
> > +	schedule_work(&encl_mm->release_work);
> > +}
> > +

...

> > @@ -118,11 +123,13 @@ void sgx_encl_destroy(struct sgx_encl *encl);
> >  void sgx_encl_release(struct kref *ref);
> >  pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page);
> >  struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index);
> > -struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
> > -				     struct sgx_encl_mm *encl_mm, int *iter);
> > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> > -				    struct mm_struct *mm);
> > -void sgx_encl_mm_release(struct kref *ref);
> > +int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
> > +static inline void sgx_encl_mm_release(struct sgx_encl_mm *encl_mm)
> > +{
> > +	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
> > +
> > +	kfree(encl_mm);
> > +}
> 
> Please just open code this to the two call sites. Makes the code hard to
> follow.

Heh, I waffled between a helper and open coding.  I chose poorly :-)

> Right now I did not find anything else questionable from the code
> changes. Repeating myself but if it is by any means possible before
> going away, can you construct a pure SRCU patch.
>
> I could then reconstruct my changes on top off that, which would
> make evalution of both heck a lot easier.
> 
> /Jarkko

^ permalink raw reply

* Re: [RFC PATCH v2 0/3] security/x86/sgx: SGX specific LSM hooks
From: Jarkko Sakkinen @ 2019-07-08 14:46 UTC (permalink / raw)
  To: Xing, Cedric
  Cc: linux-sgx, linux-security-module, selinux, casey.schaufler,
	jmorris, luto, jethro, greg, sds, sean.j.christopherson
In-Reply-To: <415f8ae7-93d4-129e-4169-ffc7059398e5@intel.com>

On Fri, 2019-07-05 at 22:04 -0700, Xing, Cedric wrote:
> On 7/3/2019 4:16 PM, Jarkko Sakkinen wrote:
> > On Thu, Jun 27, 2019 at 11:56:18AM -0700, Cedric Xing wrote:
> > 
> > I think it is fine to have these patch sets as a discussion starters but
> > it does not make any sense to me to upstream LSM changes with the SGX
> > foundations.
> 
> Guess LSM is a gating factor, because otherwise SGX could be abused to 
> make executable EPC from pages that are otherwise not allowed to be 
> executable. Am I missing anything?

No, but what was the point? LSM is always additional gating factor.

Does not make a case for any of the proposed LSM changes.

/Jarrko


^ permalink raw reply

* Re: [RFC PATCH v4 12/12] LSM: x86/sgx: Show line of sight to LSM support SGX2's EAUG
From: Sean Christopherson @ 2019-07-08 14:34 UTC (permalink / raw)
  To: Xing, Cedric
  Cc: Jarkko Sakkinen, linux-sgx@vger.kernel.org,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	Roberts, William C, Schaufler, Casey, James Morris, Hansen, Dave,
	Andy Lutomirski, Jethro Beekman, Dr . Greg Wettstein,
	Stephen Smalley
In-Reply-To: <960B34DE67B9E140824F1DCDEC400C0F655184EC@ORSMSX116.amr.corp.intel.com>

On Fri, Jun 21, 2019 at 10:18:55AM -0700, Xing, Cedric wrote:
> > From: Christopherson, Sean J
> > Sent: Wednesday, June 19, 2019 3:24 PM
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
> > index 4379a2fb1f82..b478c0f45279 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> > @@ -99,7 +99,8 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
> >   * page is considered to have no RWX permissions, i.e. is inaccessible.
> >   */
> >  static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> > -				     struct vm_area_struct *vma)
> > +				     struct vm_area_struct *vma,
> > +				     bool *eaug)
> >  {
> >  	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
> >  	unsigned long idx, idx_start, idx_end; @@ -123,6 +124,8 @@ static unsigned long
> > sgx_allowed_rwx(struct sgx_encl *encl,
> >  			allowed_rwx = 0;
> >  		else
> >  			allowed_rwx &= page->vm_prot_bits;
> > +		if (page->vm_prot_bits & SGX_VM_EAUG)
> > +			*eaug = true;
> >  		if (!allowed_rwx)
> >  			break;
> >  	}
> > @@ -134,16 +137,17 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> >  	struct sgx_encl *encl = file->private_data;
> >  	unsigned long allowed_rwx, prot;
> > +	bool eaug = false;
> >  	int ret;
> > 
> > -	allowed_rwx = sgx_allowed_rwx(encl, vma);
> > +	allowed_rwx = sgx_allowed_rwx(encl, vma, &eaug);
> >  	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
> >  		return -EACCES;
> 
> IIUC, "eaug range" has to be mapped PROT_NONE, then vm_ops->fault() won't be
> invoked. Am I correct? Then how to EAUG on #PF?

Pages tagged SGX_VM_EAUG also have maximal permissions and can be mapped
PROT_{READ,WRITE,EXEC} accordingly.

> 
> > 
> >  	prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) |
> >  	       _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) |
> >  	       _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC);
> > -	ret = security_enclave_map(prot);
> > +	ret = security_enclave_map(prot, eaug);
> >  	if (ret)
> >  		return ret;
> > 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox