Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: David Howells @ 2019-06-10 22:07 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: dhowells, Andy Lutomirski, Stephen Smalley, Al Viro, USB list,
	LSM List, Greg Kroah-Hartman, raven, Linux FS Devel, Linux API,
	linux-block, keyrings, LKML, Paul Moore
In-Reply-To: <25d88489-9850-f092-205e-0a4fc292f41b@schaufler-ca.com>

Casey Schaufler <casey@schaufler-ca.com> wrote:

> Process A and process B both open /dev/null.
> A and B can write and read to their hearts content
> to/from /dev/null without ever once communicating.
> The mutual accessibility of /dev/null in no way implies that
> A and B can communicate. If A can set a watch on /dev/null,
> and B triggers an event, there still has to be an access
> check on the delivery of the event because delivering an event
> to A is not an action on /dev/null, but on A.

If a process has the privilege, it appears that fanotify() allows that process
to see others accessing /dev/null (FAN_ACCESS, FAN_ACCESS_PERM).  There don't
seem to be any LSM checks there either.

On the other hand, the privilege required is CAP_SYS_ADMIN,

> > The mount tree can't be modified by unprivileged users, unless a
> > privileged user very carefully configured it as such.
> 
> "Unless" means *is* possible. In which case access control is
> required. I will admit to being less then expert on the extent
> to which mounts can be done without privilege.

Automounts in network filesystems, for example.

The initial mount of the network filesystem requires local privilege, but then
mountpoints are managed with remote privilege as granted by things like
kerberos tickets.  The local kernel has no control.

If you have CONFIG_AFS_FS enabled in your kernel, for example, and you install
the keyutils package (dnf, rpm, apt, etc.), then you should be able to do:

	mount -t afs none /mnt -o dyn
	ls /afs/grand.central.org/software/

for example.  That will go through a couple of automount points.  Assuming you
don't have a kerberos login on those servers, however, you shouldn't be able
to add new mountpoints.

Someone watching the mount topology can see events when an automount is
enacted and when it expires, the latter being an event with the system as the
subject since the expiry is done on a timeout set by the kernel.

David

^ permalink raw reply

* RE: [RFC PATCH v2 1/5] mm: Introduce vm_ops->may_mprotect()
From: Xing, Cedric @ 2019-06-10 22:06 UTC (permalink / raw)
  To: Christopherson, Sean J
  Cc: Jarkko Sakkinen, Andy Lutomirski, Stephen Smalley, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris,
	selinux@vger.kernel.org, Jethro Beekman, Hansen, Dave,
	Thomas Gleixner, Linus Torvalds, LKML, X86 ML,
	linux-sgx@vger.kernel.org, Andrew Morton, nhorman@redhat.com,
	npmccallum@redhat.com, Ayoun, Serge, Katz-zamir, Shay,
	Huang, Haitao, Andy Shevchenko, Svahn, Kai, Borislav Petkov,
	Josh Triplett, Huang, Kai, David Rientjes, Roberts, William C,
	Tricca, Philip B
In-Reply-To: <20190610194941.GK15995@linux.intel.com>

> From: Christopherson, Sean J
> Sent: Monday, June 10, 2019 12:50 PM
> 
> On Mon, Jun 10, 2019 at 10:47:52AM -0700, Xing, Cedric wrote:
> > > From: Christopherson, Sean J
> > > Sent: Monday, June 10, 2019 8:56 AM
> > >
> > > > > As a result, LSM policies cannot be meaningfully applied, e.g.
> > > > > an LSM can deny access to the EPC as a whole, but can't deny
> > > > > PROT_EXEC on page that originated in a non-EXECUTE file (which
> > > > > is long gone by the time
> > > > > mprotect() is called).
> > > >
> > > > I have hard time following what is paragraph is trying to say.
> > > >
> > > > > By hooking mprotect(), SGX can make explicit LSM upcalls while
> > > > > an enclave is being built, i.e. when the kernel has a handle to
> > > > > origin of each enclave page, and enforce the result of the LSM
> > > > > policy whenever userspace maps the enclave page in the future.
> > > >
> > > > "LSM policy whenever calls mprotect()"? I'm no sure why you mean
> > > > by mapping here and if there is any need to talk about future.
> > > > Isn't this needed now?
> > >
> > > Future is referring to the timeline of a running kernel, not the
> > > future of the kernel code.
> > >
> > > Rather than trying to explain all of the above with words, I'll
> > > provide code examples to show how ->may_protect() will be used by
> > > SGX and why it is the preferred solution.
> >
> > The LSM concept is to separate security policy enforcement from the
> > rest of the kernel. For modules, the "official" way is to use VM_MAY*
> > flags to limit allowable permissions, while LSM uses
> security_file_mprotect().
> > I guess that's why we didn't have .may_mprotect() in the first place.
> 
> Heh, so I've typed up about five different responses to this comment.
> In doing so, I think I've convinced myself that ->may_mprotect() is
> unnecessary.  Rther than hook mprotect(), simply update the VM_MAY*
> flags during mmap(), with all bits cleared if there isn't an associated
> enclave page.  IIRC, the need to add ->may_protect() came about when we
> were exploring more dynamic interplay between SGX and LSMs.
> 
> > What you are doing is enforcing some security policy outside of LSM,
> > which is dirty from architecture perspective.
> 
> No, the enclave page protections are enforced regardless of LSM policy,
> and in v2 those protections are immutable.  Yes, the explicit enclave
> page protection bits are being added primarily for LSMs, but they don't
> impact functionality other than at the security_enclave_load()
> touchpoint.

Disagreed.

You can say you want to enforce "something" without LSM. But what's the purpose of that "something" without LSM? Why doesn't the original mprotect() enforce that "something"?

It *does* affect functionality because user mode code has to figure out an "explicit protection" to make sure the enclave would work with *and also* without LSM. That said, the "explicit protection" can neither be too restrictive (or enclave wouldn't work) nor be too permissive (or LSM policies are violated). But what if the user mode code doesn't have appropriate "explicit protection" ahead of time as it is just going to mprotect() as the enclave requests at runtime?

And your restrictions on mmap()'ing non-existing pages also have great impacts to SGX2 support.

I think some reasonable answers are needed to the above questions before we can call this proposal viable. 

^ permalink raw reply

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: Casey Schaufler @ 2019-06-10 21:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stephen Smalley, David Howells, Al Viro, USB list, LSM List,
	Greg Kroah-Hartman, raven, Linux FS Devel, Linux API, linux-block,
	keyrings, LKML, Paul Moore
In-Reply-To: <CALCETrU+PKVbrKQJoXj9x_5y+vTZENMczHqyM_Xb85ca5YDZuA@mail.gmail.com>

On 6/10/2019 12:53 PM, Andy Lutomirski wrote:
> On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>> I think you really need to give an example of a coherent policy that
>>>>> needs this.
>>>> I keep telling you, and you keep ignoring what I say.
>>>>
>>>>>  As it stands, your analogy seems confusing.
>>>> It's pretty simple. I have given both the abstract
>>>> and examples.
>>> You gave the /dev/null example, which is inapplicable to this patchset.
>> That addressed an explicit objection, and pointed out
>> an exception to a generality you had asserted, which was
>> not true. It's also a red herring regarding the current
>> discussion.
> This argument is pointless.
>
> Please humor me and just give me an example.  If you think you have
> already done so, feel free to repeat yourself.  If you have no
> example, then please just say so.

To repeat the /dev/null example:

Process A and process B both open /dev/null.
A and B can write and read to their hearts content
to/from /dev/null without ever once communicating.
The mutual accessibility of /dev/null in no way implies that
A and B can communicate. If A can set a watch on /dev/null,
and B triggers an event, there still has to be an access
check on the delivery of the event because delivering an event
to A is not an action on /dev/null, but on A.


>
>>>>>  If someone
>>>>> changes the system clock, we don't restrict who is allowed to be
>>>>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
>>>>> was changed based on who changed the clock.
>>>> That's right. The system clock is not an object that
>>>> unprivileged processes can modify. In fact, it is not
>>>> an object at all. If you care to look, you will see that
>>>> Smack does nothing with the clock.
>>> And this is different from the mount tree how?
>> The mount tree can be modified by unprivileged users.
>> If nothing that unprivileged users can do to the mount
>> tree can trigger a notification you are correct, the
>> mount tree is very like the system clock. Is that the
>> case?
> The mount tree can't be modified by unprivileged users, unless a
> privileged user very carefully configured it as such.

"Unless" means *is* possible. In which case access control is
required. I will admit to being less then expert on the extent
to which mounts can be done without privilege.

>   An unprivileged
> user can create a new userns and a new mount ns, but then they're
> modifying a whole different mount tree.

Within those namespaces you can still have multiple users,
constrained be system access control policy.

>
>>>>>  Similarly, if someone
>>>>> tries to receive a packet on a socket, we check whether they have the
>>>>> right to receive on that socket (from the endpoint in question) and,
>>>>> if the sender is local, whether the sender can send to that socket.
>>>>> We do not check whether the sender can send to the receiver.
>>>> Bzzzt! Smack sure does.
>>> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.
>> Process A sends a packet to process B.
>> If A has access to TopSecret data and B is not
>> allowed to see TopSecret data, the delivery should
>> be prevented. Is that nonsensical?
> It makes sense.  As I see it, the way that a sensible policy should do
> this is by making sure that there are no sockets, pipes, etc that
> Process A can write and that Process B can read.

You can't explain UDP controls without doing the access check
on packet delivery. The sendmsg() succeeds when the packet leaves
the sender. There doesn't even have to be a socket bound to the
port. The only opportunity you have for control is on packet
delivery, which is the only point at which you can have the
information required.

> If you really want to prevent a malicious process with TopSecret data
> from sending it to a different process, then you can't use Linux on
> x86 or ARM.  Maybe that will be fixed some day, but you're going to
> need to use an extremely tight sandbox to make this work.

I won't be commenting on that.

>
>>>>> The signal example is inapplicable.
>>>> From a modeling viewpoint the actions are identical.
>>> This seems incorrect to me
>> What would be correct then? Some convoluted combination
>> of system entities that aren't owned or controlled by
>> any mechanism?
>>
> POSIX signal restrictions aren't there to prevent two processes from
> communicating.  They're there to prevent the sender from manipulating
> or crashing the receiver without appropriate privilege.

POSIX signal restrictions have a long history. In the P10031e/2c
debates both communication and manipulation where seriously
considered. I would say both are true.

>>>  and, I think, to most everyone else reading this.
>> That's quite the assertion. You may even be correct.
>>
>>>  Can you explain?
>>>
>>> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file.  When you send a signal to a process, the object is the target process.
>> YES!!!!!!!!!!!!
>>
>> And when a process triggers a notification it is the subject
>> and the watching process is the object!
>>
>> Subject == active entity
>> Object  == passive entity
>>
>> Triggering an event is, like calling kill(), an action!
>>
> And here is where I disagree with your interpretation.  Triggering an
> event is a side effect of writing to the file.  There are *two*
> security relevant actions, not one, and they are:
>
> First, the write:
>
> Subject == the writer
> Action == write
> Object == the file
>
> Then the event, which could be modeled in a couple of ways:
>
> Subject == the file

Files   are   not   subjects. They are passive entities.

> Action == notify
> Object == the recipient
>
> or
>
> Subject == the recipient
> Action == watch
> Object == the file
>
> By conflating these two actions into one, you've made the modeling
> very hard, and you start running into all these nasty questions like
> "who actually closed this open file"

No, I've made the code more difficult. You can not call
the file a subject. That is just wrong. It's not a valid
model.



^ permalink raw reply

* Re: [RFC PATCH v3 1/1] Add dm verity root hash pkcs7 sig validation
From: Jaskaran Singh Khurana @ 2019-06-10 21:22 UTC (permalink / raw)
  To: Milan Broz
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-fsdevel, agk, snitzer, dm-devel, jmorris, scottsh, ebiggers,
	Mikulas Patocka
In-Reply-To: <54170d18-31c7-463d-10b5-9af8b666df0f@gmail.com>

On Sat, 8 Jun 2019, Milan Broz wrote:

> On 08/06/2019 00:31, Jaskaran Khurana wrote:
> Why is this different from existing FEC extension?
> FEC uses ifdefs in header to blind functions if config is not set.
>
> ifeq ($(CONFIG_DM_VERITY_FEC),y)
> dm-verity-objs                  += dm-verity-fec.o
> endif
>
> ...
>

The reasoning for doing it this way is that there might be scripts that 
create a device mapper device and then mount and use it, with the 
signature verification enabled in kernel the scripts would be passing the 
signature like:

veritysetup open params... --roothash-sig=<sig.p7>

If later due to some reason the DM_VERITY_VERIFY_ROOTHASH_SIG is disabled 
if we do not recognize the parameter then the scripts need to be changed 
or else they will fail with INVALID argument,
in current implementation the parameter for signature is always parsed but
enforced based on the config being set, so the scripts need not be 
changed.
Let me know if you still feel I should be changing this and I will be 
happy to make the change, just wanted to share my reasoning for this.

>
> Thanks,
> Milan
>
Regards,
Jaskaran

^ permalink raw reply

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: Andy Lutomirski @ 2019-06-10 19:53 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Andy Lutomirski, Stephen Smalley, David Howells, Al Viro,
	USB list, LSM List, Greg Kroah-Hartman, raven, Linux FS Devel,
	Linux API, linux-block, keyrings, LKML, Paul Moore
In-Reply-To: <4b7d02b2-2434-8a7c-66cc-7dbebc37efbc@schaufler-ca.com>

On Mon, Jun 10, 2019 at 12:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>> I think you really need to give an example of a coherent policy that
> >>> needs this.
> >> I keep telling you, and you keep ignoring what I say.
> >>
> >>>  As it stands, your analogy seems confusing.
> >> It's pretty simple. I have given both the abstract
> >> and examples.
> > You gave the /dev/null example, which is inapplicable to this patchset.
>
> That addressed an explicit objection, and pointed out
> an exception to a generality you had asserted, which was
> not true. It's also a red herring regarding the current
> discussion.

This argument is pointless.

Please humor me and just give me an example.  If you think you have
already done so, feel free to repeat yourself.  If you have no
example, then please just say so.

>
> >>>  If someone
> >>> changes the system clock, we don't restrict who is allowed to be
> >>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
> >>> was changed based on who changed the clock.
> >> That's right. The system clock is not an object that
> >> unprivileged processes can modify. In fact, it is not
> >> an object at all. If you care to look, you will see that
> >> Smack does nothing with the clock.
> > And this is different from the mount tree how?
>
> The mount tree can be modified by unprivileged users.
> If nothing that unprivileged users can do to the mount
> tree can trigger a notification you are correct, the
> mount tree is very like the system clock. Is that the
> case?

The mount tree can't be modified by unprivileged users, unless a
privileged user very carefully configured it as such.  An unprivileged
user can create a new userns and a new mount ns, but then they're
modifying a whole different mount tree.

>
> >>>  Similarly, if someone
> >>> tries to receive a packet on a socket, we check whether they have the
> >>> right to receive on that socket (from the endpoint in question) and,
> >>> if the sender is local, whether the sender can send to that socket.
> >>> We do not check whether the sender can send to the receiver.
> >> Bzzzt! Smack sure does.
> > This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.
>
> Process A sends a packet to process B.
> If A has access to TopSecret data and B is not
> allowed to see TopSecret data, the delivery should
> be prevented. Is that nonsensical?

It makes sense.  As I see it, the way that a sensible policy should do
this is by making sure that there are no sockets, pipes, etc that
Process A can write and that Process B can read.

If you really want to prevent a malicious process with TopSecret data
from sending it to a different process, then you can't use Linux on
x86 or ARM.  Maybe that will be fixed some day, but you're going to
need to use an extremely tight sandbox to make this work.

>
> >>> The signal example is inapplicable.
> >> From a modeling viewpoint the actions are identical.
> > This seems incorrect to me
>
> What would be correct then? Some convoluted combination
> of system entities that aren't owned or controlled by
> any mechanism?
>

POSIX signal restrictions aren't there to prevent two processes from
communicating.  They're there to prevent the sender from manipulating
or crashing the receiver without appropriate privilege.


> >  and, I think, to most everyone else reading this.
>
> That's quite the assertion. You may even be correct.
>
> >  Can you explain?
> >
> > In SELinux-ese, when you write to a file, the subject is the writer and the object is the file.  When you send a signal to a process, the object is the target process.
>
> YES!!!!!!!!!!!!
>
> And when a process triggers a notification it is the subject
> and the watching process is the object!
>
> Subject == active entity
> Object  == passive entity
>
> Triggering an event is, like calling kill(), an action!
>

And here is where I disagree with your interpretation.  Triggering an
event is a side effect of writing to the file.  There are *two*
security relevant actions, not one, and they are:

First, the write:

Subject == the writer
Action == write
Object == the file

Then the event, which could be modeled in a couple of ways:

Subject == the file
Action == notify
Object == the recipient

or

Subject == the recipient
Action == watch
Object == the file

By conflating these two actions into one, you've made the modeling
very hard, and you start running into all these nasty questions like
"who actually closed this open file"

^ permalink raw reply

* Re: [RFC PATCH v2 1/5] mm: Introduce vm_ops->may_mprotect()
From: Sean Christopherson @ 2019-06-10 19:49 UTC (permalink / raw)
  To: Xing, Cedric
  Cc: Jarkko Sakkinen, Andy Lutomirski, Stephen Smalley, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris,
	selinux@vger.kernel.org, Jethro Beekman, Hansen, Dave,
	Thomas Gleixner, Linus Torvalds, LKML, X86 ML,
	linux-sgx@vger.kernel.org, Andrew Morton, nhorman@redhat.com,
	npmccallum@redhat.com, Ayoun, Serge, Katz-zamir, Shay,
	Huang, Haitao, Andy Shevchenko, Svahn, Kai, Borislav Petkov,
	Josh Triplett, Huang, Kai, David Rientjes, Roberts, William C,
	Tricca, Philip B
In-Reply-To: <960B34DE67B9E140824F1DCDEC400C0F654FFD59@ORSMSX116.amr.corp.intel.com>

On Mon, Jun 10, 2019 at 10:47:52AM -0700, Xing, Cedric wrote:
> > From: Christopherson, Sean J
> > Sent: Monday, June 10, 2019 8:56 AM
> > 
> > > > As a result, LSM policies cannot be meaningfully applied, e.g. an
> > > > LSM can deny access to the EPC as a whole, but can't deny PROT_EXEC
> > > > on page that originated in a non-EXECUTE file (which is long gone by
> > > > the time
> > > > mprotect() is called).
> > >
> > > I have hard time following what is paragraph is trying to say.
> > >
> > > > By hooking mprotect(), SGX can make explicit LSM upcalls while an
> > > > enclave is being built, i.e. when the kernel has a handle to origin
> > > > of each enclave page, and enforce the result of the LSM policy
> > > > whenever userspace maps the enclave page in the future.
> > >
> > > "LSM policy whenever calls mprotect()"? I'm no sure why you mean by
> > > mapping here and if there is any need to talk about future. Isn't this
> > > needed now?
> > 
> > Future is referring to the timeline of a running kernel, not the future
> > of the kernel code.
> > 
> > Rather than trying to explain all of the above with words, I'll provide
> > code examples to show how ->may_protect() will be used by SGX and why it
> > is the preferred solution.
> 
> The LSM concept is to separate security policy enforcement from the rest of
> the kernel. For modules, the "official" way is to use VM_MAY* flags to limit
> allowable permissions, while LSM uses security_file_mprotect().
> I guess that's why we didn't have .may_mprotect() in the first place.

Heh, so I've typed up about five different responses to this comment.  In
doing so, I think I've convinced myself that ->may_mprotect() is
unnecessary.  Rther than hook mprotect(), simply update the VM_MAY* flags
during mmap(), with all bits cleared if there isn't an associated enclave
page.  IIRC, the need to add ->may_protect() came about when we were
exploring more dynamic interplay between SGX and LSMs.

> What you are doing is enforcing some security policy outside of LSM, which
> is dirty from architecture perspective.

No, the enclave page protections are enforced regardless of LSM policy,
and in v2 those protections are immutable.  Yes, the explicit enclave
page protection bits are being added primarily for LSMs, but they don't
impact functionality other than at the security_enclave_load() touchpoint.

^ permalink raw reply

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: Casey Schaufler @ 2019-06-10 19:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Stephen Smalley, David Howells, Al Viro,
	USB list, LSM List, Greg Kroah-Hartman, raven, Linux FS Devel,
	Linux API, linux-block, keyrings, LKML, Paul Moore, casey
In-Reply-To: <E0925E1F-E5F2-4457-8704-47B6E64FE3F3@amacapital.net>

On 6/10/2019 11:22 AM, Andy Lutomirski wrote:
>> On Jun 10, 2019, at 11:01 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>
>>> On 6/10/2019 9:42 AM, Andy Lutomirski wrote:
>>>> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
>>>>>> On 6/7/19 10:17 AM, David Howells wrote:
>>>>>> Hi Al,
>>>>>>
>>>>>> Here's a set of patches to add a general variable-length notification queue
>>>>>> concept and to add sources of events for:
>>>>>>
>>>>>>  (1) Mount topology events, such as mounting, unmounting, mount expiry,
>>>>>>      mount reconfiguration.
>>>>>>
>>>>>>  (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
>>>>>>      errors (not complete yet).
>>>>>>
>>>>>>  (3) Key/keyring events, such as creating, linking and removal of keys.
>>>>>>
>>>>>>  (4) General device events (single common queue) including:
>>>>>>
>>>>>>      - Block layer events, such as device errors
>>>>>>
>>>>>>      - USB subsystem events, such as device/bus attach/remove, device
>>>>>>        reset, device errors.
>>>>>>
>>>>>> One of the reasons for this is so that we can remove the issue of processes
>>>>>> having to repeatedly and regularly scan /proc/mounts, which has proven to
>>>>>> be a system performance problem.  To further aid this, the fsinfo() syscall
>>>>>> on which this patch series depends, provides a way to access superblock and
>>>>>> mount information in binary form without the need to parse /proc/mounts.
>>>>>>
>>>>>>
>>>>>> LSM support is included, but controversial:
>>>>>>
>>>>>>  (1) The creds of the process that did the fput() that reduced the refcount
>>>>>>      to zero are cached in the file struct.
>>>>>>
>>>>>>  (2) __fput() overrides the current creds with the creds from (1) whilst
>>>>>>      doing the cleanup, thereby making sure that the creds seen by the
>>>>>>      destruction notification generated by mntput() appears to come from
>>>>>>      the last fputter.
>>>>>>
>>>>>>  (3) security_post_notification() is called for each queue that we might
>>>>>>      want to post a notification into, thereby allowing the LSM to prevent
>>>>>>      covert communications.
>>>>>>
>>>>>>  (?) Do I need to add security_set_watch(), say, to rule on whether a watch
>>>>>>      may be set in the first place?  I might need to add a variant per
>>>>>>      watch-type.
>>>>>>
>>>>>>  (?) Do I really need to keep track of the process creds in which an
>>>>>>      implicit object destruction happened?  For example, imagine you create
>>>>>>      an fd with fsopen()/fsmount().  It is marked to dissolve the mount it
>>>>>>      refers to on close unless move_mount() clears that flag.  Now, imagine
>>>>>>      someone looking at that fd through procfs at the same time as you exit
>>>>>>      due to an error.  The LSM sees the destruction notification come from
>>>>>>      the looker if they happen to do their fput() after yours.
>>>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>>>>>
>>>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>>>>>
>>>>> I wouldn't expect SELinux to implement security_post_notification() at all.  I can't see how one can construct a meaningful, stable policy for it.  I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>>>> I cannot agree. There is an explicit action by a subject that results
>>>> in information being delivered to an object. Just like a signal or a
>>>> UDP packet delivery. Smack handles this kind of thing just fine. The
>>>> internal mechanism that results in the access is irrelevant from
>>>> this viewpoint. I can understand how a mechanism like SELinux that
>>>> works on finer granularity might view it differently.
>>> I think you really need to give an example of a coherent policy that
>>> needs this.
>> I keep telling you, and you keep ignoring what I say.
>>
>>>  As it stands, your analogy seems confusing.
>> It's pretty simple. I have given both the abstract
>> and examples.
> You gave the /dev/null example, which is inapplicable to this patchset.

That addressed an explicit objection, and pointed out
an exception to a generality you had asserted, which was
not true. It's also a red herring regarding the current
discussion.

>>>  If someone
>>> changes the system clock, we don't restrict who is allowed to be
>>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
>>> was changed based on who changed the clock.
>> That's right. The system clock is not an object that
>> unprivileged processes can modify. In fact, it is not
>> an object at all. If you care to look, you will see that
>> Smack does nothing with the clock.
> And this is different from the mount tree how?

The mount tree can be modified by unprivileged users.
If nothing that unprivileged users can do to the mount
tree can trigger a notification you are correct, the
mount tree is very like the system clock. Is that the
case? 

>>>  Similarly, if someone
>>> tries to receive a packet on a socket, we check whether they have the
>>> right to receive on that socket (from the endpoint in question) and,
>>> if the sender is local, whether the sender can send to that socket.
>>> We do not check whether the sender can send to the receiver.
>> Bzzzt! Smack sure does.
> This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.

Process A sends a packet to process B.
If A has access to TopSecret data and B is not
allowed to see TopSecret data, the delivery should
be prevented. Is that nonsensical?

>>> The signal example is inapplicable.
>> From a modeling viewpoint the actions are identical.
> This seems incorrect to me

What would be correct then? Some convoluted combination
of system entities that aren't owned or controlled by
any mechanism? 

>  and, I think, to most everyone else reading this.

That's quite the assertion. You may even be correct.

>  Can you explain?
>
> In SELinux-ese, when you write to a file, the subject is the writer and the object is the file.  When you send a signal to a process, the object is the target process.

YES!!!!!!!!!!!!

And when a process triggers a notification it is the subject
and the watching process is the object!

Subject == active entity
Object  == passive entity

Triggering an event is, like calling kill(), an action!



^ permalink raw reply

* Re: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits
From: Andy Lutomirski @ 2019-06-10 19:15 UTC (permalink / raw)
  To: Xing, Cedric
  Cc: Christopherson, Sean J, Jarkko Sakkinen, Andy Lutomirski,
	Stephen Smalley, James Morris, Serge E . Hallyn, LSM List,
	Paul Moore, Eric Paris, selinux@vger.kernel.org, Jethro Beekman,
	Hansen, Dave, Thomas Gleixner, Linus Torvalds, LKML, X86 ML,
	linux-sgx@vger.kernel.org, Andrew Morton, nhorman@redhat.com,
	npmccallum@redhat.com, Ayoun, Serge, Katz-zamir, Shay,
	Huang, Haitao, Andy Shevchenko, Svahn, Kai, Borislav Petkov,
	Josh Triplett, Huang, Kai, David Rientjes, Roberts, William C,
	Tricca, Philip B
In-Reply-To: <960B34DE67B9E140824F1DCDEC400C0F65500E13@ORSMSX116.amr.corp.intel.com>

On Mon, Jun 10, 2019 at 11:29 AM Xing, Cedric <cedric.xing@intel.com> wrote:
>
> > From: Christopherson, Sean J
> > Sent: Wednesday, June 05, 2019 7:12 PM
> >
> > +/**
> > + * sgx_map_allowed - check vma protections against the associated
> > enclave page
> > + * @encl:    an enclave
> > + * @start:   start address of the mapping (inclusive)
> > + * @end:     end address of the mapping (exclusive)
> > + * @prot:    protection bits of the mapping
> > + *
> > + * Verify a userspace mapping to an enclave page would not violate the
> > +security
> > + * requirements of the *kernel*.  Note, this is in no way related to
> > +the
> > + * page protections enforced by hardware via the EPCM.  The EPCM
> > +protections
> > + * can be directly extended by the enclave, i.e. cannot be relied upon
> > +by the
> > + * kernel for security guarantees of any kind.
> > + *
> > + * Return:
> > + *   0 on success,
> > + *   -EACCES if the mapping is disallowed
> > + */
> > +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
> > +                 unsigned long end, unsigned long prot) {
> > +     struct sgx_encl_page *page;
> > +     unsigned long addr;
> > +
> > +     prot &= (VM_READ | VM_WRITE | VM_EXEC);
> > +     if (!prot || !encl)
> > +             return 0;
> > +
> > +     mutex_lock(&encl->lock);
> > +
> > +     for (addr = start; addr < end; addr += PAGE_SIZE) {
> > +             page = radix_tree_lookup(&encl->page_tree, addr >>
> > PAGE_SHIFT);
> > +
> > +             /*
> > +              * Do not allow R|W|X to a non-existent page, or protections
> > +              * beyond those of the existing enclave page.
> > +              */
> > +             if (!page || (prot & ~page->prot))
> > +                     return -EACCES;
>
> In SGX2, pages will be "mapped" before being populated.
>
> Here's a brief summary for those who don't have enough background on how new EPC pages could be added to a running enclave in SGX2:
>   - There are 2 new instructions - EACCEPT and EAUG.
>   - EAUG is used by SGX module to add (augment) a new page to an existing enclave. The newly added page is *inaccessible* until the enclave *accepts* it.
>   - EACCEPT is the instruction for an enclave to accept a new page.
>
> And the s/w flow for an enclave to request new EPC pages is expected to be something like the following:
>   - The enclave issues EACCEPT at the linear address that it would like a new page.
>   - EACCEPT results in #PF, as there's no page at the linear address above.
>   - SGX module is notified about the #PF, in form of its vma->vm_ops->fault() being called by kernel.
>   - SGX module EAUGs a new EPC page at the fault address, and resumes the enclave.
>   - EACCEPT is reattempted, and succeeds at this time.

This seems like an odd workflow.  Shouldn't the #PF return back to
untrusted userspace so that the untrusted user code can make its own
decision as to whether it wants to EAUG a page there as opposed to,
say, killing the enclave or waiting to keep resource usage under
control?

^ permalink raw reply

* [IMA] Re: possible deadlock in get_user_pages_unlocked (2)
From: Eric Biggers @ 2019-06-10 19:06 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: syzbot, akpm, aneesh.kumar, dan.j.williams, ira.weiny, jack,
	jhubbard, jmorris, keith.busch, linux-kernel, linux-mm,
	linux-security-module, richard.weiyang, rppt, serge, sfr,
	syzkaller-bugs, willy
In-Reply-To: <0000000000001d42b5058a895703@google.com>

On Tue, Jun 04, 2019 at 06:16:00PM -0700, syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit 69d61f577d147b396be0991b2ac6f65057f7d445
> Author: Mimi Zohar <zohar@linux.ibm.com>
> Date:   Wed Apr 3 21:47:46 2019 +0000
> 
>     ima: verify mprotect change is consistent with mmap policy
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1055a2f2a00000
> start commit:   56b697c6 Add linux-next specific files for 20190604
> git tree:       linux-next
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=1255a2f2a00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=1455a2f2a00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4248d6bc70076f7d
> dashboard link: https://syzkaller.appspot.com/bug?extid=e1374b2ec8f6a25ab2e5
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=165757eea00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10dd3e86a00000
> 
> Reported-by: syzbot+e1374b2ec8f6a25ab2e5@syzkaller.appspotmail.com
> Fixes: 69d61f577d14 ("ima: verify mprotect change is consistent with mmap
> policy")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> 

Hi Mimi, it seems your change to call ima_file_mmap() from
security_file_mprotect() violates the locking order by taking i_rwsem while
mmap_sem is held.

- Eric

^ permalink raw reply

* RE: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits
From: Xing, Cedric @ 2019-06-10 18:29 UTC (permalink / raw)
  To: Christopherson, Sean J, Jarkko Sakkinen
  Cc: Andy Lutomirski, Stephen Smalley, James Morris, Serge E . Hallyn,
	LSM List, Paul Moore, Eric Paris, selinux@vger.kernel.org,
	Jethro Beekman, Hansen, Dave, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx@vger.kernel.org, Andrew Morton,
	nhorman@redhat.com, npmccallum@redhat.com, Ayoun, Serge,
	Katz-zamir, Shay, Huang, Haitao, Andy Shevchenko, Svahn, Kai,
	Borislav Petkov, Josh Triplett, Huang, Kai, David Rientjes,
	Roberts, William C, Tricca, Philip B
In-Reply-To: <20190606021145.12604-3-sean.j.christopherson@intel.com>

> From: Christopherson, Sean J
> Sent: Wednesday, June 05, 2019 7:12 PM
> 
> +/**
> + * sgx_map_allowed - check vma protections against the associated
> enclave page
> + * @encl:	an enclave
> + * @start:	start address of the mapping (inclusive)
> + * @end:	end address of the mapping (exclusive)
> + * @prot:	protection bits of the mapping
> + *
> + * Verify a userspace mapping to an enclave page would not violate the
> +security
> + * requirements of the *kernel*.  Note, this is in no way related to
> +the
> + * page protections enforced by hardware via the EPCM.  The EPCM
> +protections
> + * can be directly extended by the enclave, i.e. cannot be relied upon
> +by the
> + * kernel for security guarantees of any kind.
> + *
> + * Return:
> + *   0 on success,
> + *   -EACCES if the mapping is disallowed
> + */
> +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
> +		    unsigned long end, unsigned long prot) {
> +	struct sgx_encl_page *page;
> +	unsigned long addr;
> +
> +	prot &= (VM_READ | VM_WRITE | VM_EXEC);
> +	if (!prot || !encl)
> +		return 0;
> +
> +	mutex_lock(&encl->lock);
> +
> +	for (addr = start; addr < end; addr += PAGE_SIZE) {
> +		page = radix_tree_lookup(&encl->page_tree, addr >>
> PAGE_SHIFT);
> +
> +		/*
> +		 * Do not allow R|W|X to a non-existent page, or protections
> +		 * beyond those of the existing enclave page.
> +		 */
> +		if (!page || (prot & ~page->prot))
> +			return -EACCES;

In SGX2, pages will be "mapped" before being populated.

Here's a brief summary for those who don't have enough background on how new EPC pages could be added to a running enclave in SGX2:
  - There are 2 new instructions - EACCEPT and EAUG.
  - EAUG is used by SGX module to add (augment) a new page to an existing enclave. The newly added page is *inaccessible* until the enclave *accepts* it.
  - EACCEPT is the instruction for an enclave to accept a new page.

And the s/w flow for an enclave to request new EPC pages is expected to be something like the following:
  - The enclave issues EACCEPT at the linear address that it would like a new page.
  - EACCEPT results in #PF, as there's no page at the linear address above.
  - SGX module is notified about the #PF, in form of its vma->vm_ops->fault() being called by kernel.
  - SGX module EAUGs a new EPC page at the fault address, and resumes the enclave.
  - EACCEPT is reattempted, and succeeds at this time.

But with the above check in sgx_map_allowed(), I'm not sure how this will work out with SGX2.

> +	}
> +
> +	mutex_unlock(&encl->lock);
> +
> +	return 0;
> +}
> +

^ permalink raw reply

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: Andy Lutomirski @ 2019-06-10 18:22 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Andy Lutomirski, Stephen Smalley, David Howells, Al Viro,
	USB list, LSM List, Greg Kroah-Hartman, raven, Linux FS Devel,
	Linux API, linux-block, keyrings, LKML, Paul Moore
In-Reply-To: <dac74580-5b48-86e4-8222-cac29a9f541d@schaufler-ca.com>


> On Jun 10, 2019, at 11:01 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> 
>> On 6/10/2019 9:42 AM, Andy Lutomirski wrote:
>>> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
>>>>> On 6/7/19 10:17 AM, David Howells wrote:
>>>>> Hi Al,
>>>>> 
>>>>> Here's a set of patches to add a general variable-length notification queue
>>>>> concept and to add sources of events for:
>>>>> 
>>>>>  (1) Mount topology events, such as mounting, unmounting, mount expiry,
>>>>>      mount reconfiguration.
>>>>> 
>>>>>  (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
>>>>>      errors (not complete yet).
>>>>> 
>>>>>  (3) Key/keyring events, such as creating, linking and removal of keys.
>>>>> 
>>>>>  (4) General device events (single common queue) including:
>>>>> 
>>>>>      - Block layer events, such as device errors
>>>>> 
>>>>>      - USB subsystem events, such as device/bus attach/remove, device
>>>>>        reset, device errors.
>>>>> 
>>>>> One of the reasons for this is so that we can remove the issue of processes
>>>>> having to repeatedly and regularly scan /proc/mounts, which has proven to
>>>>> be a system performance problem.  To further aid this, the fsinfo() syscall
>>>>> on which this patch series depends, provides a way to access superblock and
>>>>> mount information in binary form without the need to parse /proc/mounts.
>>>>> 
>>>>> 
>>>>> LSM support is included, but controversial:
>>>>> 
>>>>>  (1) The creds of the process that did the fput() that reduced the refcount
>>>>>      to zero are cached in the file struct.
>>>>> 
>>>>>  (2) __fput() overrides the current creds with the creds from (1) whilst
>>>>>      doing the cleanup, thereby making sure that the creds seen by the
>>>>>      destruction notification generated by mntput() appears to come from
>>>>>      the last fputter.
>>>>> 
>>>>>  (3) security_post_notification() is called for each queue that we might
>>>>>      want to post a notification into, thereby allowing the LSM to prevent
>>>>>      covert communications.
>>>>> 
>>>>>  (?) Do I need to add security_set_watch(), say, to rule on whether a watch
>>>>>      may be set in the first place?  I might need to add a variant per
>>>>>      watch-type.
>>>>> 
>>>>>  (?) Do I really need to keep track of the process creds in which an
>>>>>      implicit object destruction happened?  For example, imagine you create
>>>>>      an fd with fsopen()/fsmount().  It is marked to dissolve the mount it
>>>>>      refers to on close unless move_mount() clears that flag.  Now, imagine
>>>>>      someone looking at that fd through procfs at the same time as you exit
>>>>>      due to an error.  The LSM sees the destruction notification come from
>>>>>      the looker if they happen to do their fput() after yours.
>>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>>>> 
>>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>>>> 
>>>> I wouldn't expect SELinux to implement security_post_notification() at all.  I can't see how one can construct a meaningful, stable policy for it.  I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>>> I cannot agree. There is an explicit action by a subject that results
>>> in information being delivered to an object. Just like a signal or a
>>> UDP packet delivery. Smack handles this kind of thing just fine. The
>>> internal mechanism that results in the access is irrelevant from
>>> this viewpoint. I can understand how a mechanism like SELinux that
>>> works on finer granularity might view it differently.
>> I think you really need to give an example of a coherent policy that
>> needs this.
> 
> I keep telling you, and you keep ignoring what I say.
> 
>>  As it stands, your analogy seems confusing.
> 
> It's pretty simple. I have given both the abstract
> and examples.

You gave the /dev/null example, which is inapplicable to this patchset.

> 
>>  If someone
>> changes the system clock, we don't restrict who is allowed to be
>> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
>> was changed based on who changed the clock.
> 
> That's right. The system clock is not an object that
> unprivileged processes can modify. In fact, it is not
> an object at all. If you care to look, you will see that
> Smack does nothing with the clock.

And this is different from the mount tree how?

> 
>>  Similarly, if someone
>> tries to receive a packet on a socket, we check whether they have the
>> right to receive on that socket (from the endpoint in question) and,
>> if the sender is local, whether the sender can send to that socket.
>> We do not check whether the sender can send to the receiver.
> 
> Bzzzt! Smack sure does.

This seems dubious. I’m still trying to get you to explain to a non-Smack person why this makes sense.

> 
>> The signal example is inapplicable.
> 
> From a modeling viewpoint the actions are identical.

This seems incorrect to me and, I think, to most everyone else reading this. Can you explain?

In SELinux-ese, when you write to a file, the subject is the writer and the object is the file.  When you send a signal to a process, the object is the target process.

^ permalink raw reply

* Re: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits
From: Sean Christopherson @ 2019-06-10 18:17 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Andy Lutomirski, Cedric Xing, Stephen Smalley, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris, selinux,
	Jethro Beekman, Dave Hansen, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx, Andrew Morton, nhorman, npmccallum,
	Serge Ayoun, Shay Katz-zamir, Haitao Huang, Andy Shevchenko,
	Kai Svahn, Borislav Petkov, Josh Triplett, Kai Huang,
	David Rientjes, William Roberts, Philip Tricca
In-Reply-To: <20190610174506.GB13732@linux.intel.com>

On Mon, Jun 10, 2019 at 08:45:06PM +0300, Jarkko Sakkinen wrote:
> On Mon, Jun 10, 2019 at 09:15:33AM -0700, Sean Christopherson wrote:
> > > 'flags' should would renamed as 'secinfo_flags_mask' even if the name is
> > > longish. It would use the same values as the SECINFO flags. The field in
> > > struct sgx_encl_page should have the same name. That would express
> > > exactly relation between SECINFO and the new field. I would have never
> > > asked on last iteration why SECINFO is not enough with a better naming.
> > 
> > No, these flags do not impact the EPCM protections in any way.  Userspace
> > can extend the EPCM protections without going through the kernel.  The
> > protection flags for an enclave page impact VMA/PTE protection bits.
> > 
> > IMO, it is best to treat the EPCM as being completely separate from the
> > kernel's EPC management.
> 
> It is a clumsy API if permissions are not taken in the same format for
> everything. There is no reason not to do it. The way mprotect() callback
> just interprets the field is as VMA permissions.

They are two entirely different things.  The explicit protection bits are
consumed by the kernel, while SECINFO.flags is consumed by the CPU.  The
intent is to have the protection flags be analogous to mprotect(), the
fact that they have a similar/identical format to SECINFO is irrelevant.

Calling the field secinfo_flags_mask is straight up wrong on SGX2, as 
userspace can use EMODPE to set SECINFO after the page is added.  It's
also wrong on SGX1 when adding TCS pages since SECINFO.RWX bits for TCS
pages are forced to zero by hardware.

> It would also be more future-proof just to have a mask covering all bits
> of the SECINFO flags field.

This simply doesn't work, e.g. the PENDING, MODIFIED and PR flags in the
SECINFO are read-only from a software perspective.

^ permalink raw reply

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: Casey Schaufler @ 2019-06-10 18:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stephen Smalley, David Howells, Al Viro, USB list, LSM List,
	Greg Kroah-Hartman, raven, Linux FS Devel, Linux API, linux-block,
	keyrings, LKML, Paul Moore, casey
In-Reply-To: <CALCETrX5O18q2=dUeC=hEtK2=t5KQpGBy9XveHxFw36OqkbNOg@mail.gmail.com>

On 6/10/2019 9:42 AM, Andy Lutomirski wrote:
> On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
>>> On 6/7/19 10:17 AM, David Howells wrote:
>>>> Hi Al,
>>>>
>>>> Here's a set of patches to add a general variable-length notification queue
>>>> concept and to add sources of events for:
>>>>
>>>>   (1) Mount topology events, such as mounting, unmounting, mount expiry,
>>>>       mount reconfiguration.
>>>>
>>>>   (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
>>>>       errors (not complete yet).
>>>>
>>>>   (3) Key/keyring events, such as creating, linking and removal of keys.
>>>>
>>>>   (4) General device events (single common queue) including:
>>>>
>>>>       - Block layer events, such as device errors
>>>>
>>>>       - USB subsystem events, such as device/bus attach/remove, device
>>>>         reset, device errors.
>>>>
>>>> One of the reasons for this is so that we can remove the issue of processes
>>>> having to repeatedly and regularly scan /proc/mounts, which has proven to
>>>> be a system performance problem.  To further aid this, the fsinfo() syscall
>>>> on which this patch series depends, provides a way to access superblock and
>>>> mount information in binary form without the need to parse /proc/mounts.
>>>>
>>>>
>>>> LSM support is included, but controversial:
>>>>
>>>>   (1) The creds of the process that did the fput() that reduced the refcount
>>>>       to zero are cached in the file struct.
>>>>
>>>>   (2) __fput() overrides the current creds with the creds from (1) whilst
>>>>       doing the cleanup, thereby making sure that the creds seen by the
>>>>       destruction notification generated by mntput() appears to come from
>>>>       the last fputter.
>>>>
>>>>   (3) security_post_notification() is called for each queue that we might
>>>>       want to post a notification into, thereby allowing the LSM to prevent
>>>>       covert communications.
>>>>
>>>>   (?) Do I need to add security_set_watch(), say, to rule on whether a watch
>>>>       may be set in the first place?  I might need to add a variant per
>>>>       watch-type.
>>>>
>>>>   (?) Do I really need to keep track of the process creds in which an
>>>>       implicit object destruction happened?  For example, imagine you create
>>>>       an fd with fsopen()/fsmount().  It is marked to dissolve the mount it
>>>>       refers to on close unless move_mount() clears that flag.  Now, imagine
>>>>       someone looking at that fd through procfs at the same time as you exit
>>>>       due to an error.  The LSM sees the destruction notification come from
>>>>       the looker if they happen to do their fput() after yours.
>>> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>>>
>>> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>>>
>>> I wouldn't expect SELinux to implement security_post_notification() at all.  I can't see how one can construct a meaningful, stable policy for it.  I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>> I cannot agree. There is an explicit action by a subject that results
>> in information being delivered to an object. Just like a signal or a
>> UDP packet delivery. Smack handles this kind of thing just fine. The
>> internal mechanism that results in the access is irrelevant from
>> this viewpoint. I can understand how a mechanism like SELinux that
>> works on finer granularity might view it differently.
> I think you really need to give an example of a coherent policy that
> needs this.

I keep telling you, and you keep ignoring what I say.

>   As it stands, your analogy seems confusing.

It's pretty simple. I have given both the abstract
and examples.

>   If someone
> changes the system clock, we don't restrict who is allowed to be
> notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
> was changed based on who changed the clock.

That's right. The system clock is not an object that
unprivileged processes can modify. In fact, it is not
an object at all. If you care to look, you will see that
Smack does nothing with the clock.

>   Similarly, if someone
> tries to receive a packet on a socket, we check whether they have the
> right to receive on that socket (from the endpoint in question) and,
> if the sender is local, whether the sender can send to that socket.
> We do not check whether the sender can send to the receiver.

Bzzzt! Smack sure does.

> The signal example is inapplicable.

From a modeling viewpoint the actions are identical.

>   Sending a signal to a process is
> an explicit action done to that process, and it can easily adversely
> affect the target.  Of course it requires permission.
>
> --Andy


^ permalink raw reply

* RE: [RFC PATCH v2 1/5] mm: Introduce vm_ops->may_mprotect()
From: Xing, Cedric @ 2019-06-10 17:47 UTC (permalink / raw)
  To: Christopherson, Sean J, Jarkko Sakkinen
  Cc: Andy Lutomirski, Stephen Smalley, James Morris, Serge E . Hallyn,
	LSM List, Paul Moore, Eric Paris, selinux@vger.kernel.org,
	Jethro Beekman, Hansen, Dave, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx@vger.kernel.org, Andrew Morton,
	nhorman@redhat.com, npmccallum@redhat.com, Ayoun, Serge,
	Katz-zamir, Shay, Huang, Haitao, Andy Shevchenko, Svahn, Kai,
	Borislav Petkov, Josh Triplett, Huang, Kai, David Rientjes,
	Roberts, William C, Tricca, Philip B
In-Reply-To: <20190610155549.GB15995@linux.intel.com>

> From: Christopherson, Sean J
> Sent: Monday, June 10, 2019 8:56 AM
> 
> > > As a result, LSM policies cannot be meaningfully applied, e.g. an
> > > LSM can deny access to the EPC as a whole, but can't deny PROT_EXEC
> > > on page that originated in a non-EXECUTE file (which is long gone by
> > > the time
> > > mprotect() is called).
> >
> > I have hard time following what is paragraph is trying to say.
> >
> > > By hooking mprotect(), SGX can make explicit LSM upcalls while an
> > > enclave is being built, i.e. when the kernel has a handle to origin
> > > of each enclave page, and enforce the result of the LSM policy
> > > whenever userspace maps the enclave page in the future.
> >
> > "LSM policy whenever calls mprotect()"? I'm no sure why you mean by
> > mapping here and if there is any need to talk about future. Isn't this
> > needed now?
> 
> Future is referring to the timeline of a running kernel, not the future
> of the kernel code.
> 
> Rather than trying to explain all of the above with words, I'll provide
> code examples to show how ->may_protect() will be used by SGX and why it
> is the preferred solution.

The LSM concept is to separate security policy enforcement from the rest of the kernel. For modules, the "official" way is to use VM_MAY* flags to limit allowable permissions, while LSM uses security_file_mprotect(). I guess that's why we didn't have .may_mprotect() in the first place. What you are doing is enforcing some security policy outside of LSM, which is dirty from architecture perspective.

^ permalink raw reply

* Re: [PATCH 06/13] keys: Add a notification facility [ver #4]
From: David Howells @ 2019-06-10 17:47 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: dhowells, viro, raven, linux-fsdevel, linux-api, linux-block,
	keyrings, linux-security-module, linux-kernel
In-Reply-To: <20190610111110.72468326@lwn.net>

Jonathan Corbet <corbet@lwn.net> wrote:

> > 	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
> 
> One little nit: it seems that keyctl_watch_key is actually spelled
> keyctl(KEYCTL_WATCH_KEY, ...).

Yeah - but there'll be a wrapper for it in -lkeyutils.  The syscalls I added
in other patches are, technically, referred to syscall(__NR_xxx, ...) at the
moment too.

David

^ permalink raw reply

* Re: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits
From: Jarkko Sakkinen @ 2019-06-10 17:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Cedric Xing, Stephen Smalley, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris, selinux,
	Jethro Beekman, Dave Hansen, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx, Andrew Morton, nhorman, npmccallum,
	Serge Ayoun, Shay Katz-zamir, Haitao Huang, Andy Shevchenko,
	Kai Svahn, Borislav Petkov, Josh Triplett, Kai Huang,
	David Rientjes, William Roberts, Philip Tricca
In-Reply-To: <20190610161532.GC15995@linux.intel.com>

On Mon, Jun 10, 2019 at 09:15:33AM -0700, Sean Christopherson wrote:
> > 'flags' should would renamed as 'secinfo_flags_mask' even if the name is
> > longish. It would use the same values as the SECINFO flags. The field in
> > struct sgx_encl_page should have the same name. That would express
> > exactly relation between SECINFO and the new field. I would have never
> > asked on last iteration why SECINFO is not enough with a better naming.
> 
> No, these flags do not impact the EPCM protections in any way.  Userspace
> can extend the EPCM protections without going through the kernel.  The
> protection flags for an enclave page impact VMA/PTE protection bits.
> 
> IMO, it is best to treat the EPCM as being completely separate from the
> kernel's EPC management.

It is a clumsy API if permissions are not taken in the same format for
everything. There is no reason not to do it. The way mprotect() callback
just interprets the field is as VMA permissions.

It would also be more future-proof just to have a mask covering all bits
of the SECINFO flags field.

/Jarkko

^ permalink raw reply

* Re: [RFC PATCH v1 0/3] security/x86/sgx: SGX specific LSM hooks
From: Jarkko Sakkinen @ 2019-06-10 17:36 UTC (permalink / raw)
  To: Cedric Xing
  Cc: linux-security-module, selinux, linux-kernel, linux-sgx, luto,
	sds, jmorris, serge, paul, eparis, jethro, dave.hansen, tglx,
	torvalds, akpm, nhorman, pmccallum, serge.ayoun, shay.katz-zamir,
	haitao.huang, andriy.shevchenko, kai.svahn, bp, josh, kai.huang,
	rientjes, william.c.roberts, philip.b.tricca
In-Reply-To: <cover.1560131039.git.cedric.xing@intel.com>

On Mon, Jun 10, 2019 at 12:03:03AM -0700, Cedric Xing wrote:
> This series intends to make the new SGX subsystem and the existing LSM
> architecture work together smoothly so that, say, SGX cannot be abused to work
> around restrictions set forth by LSM. This series applies on top of Jarkko
> Sakkinen's SGX series v20 (https://lkml.org/lkml/2019/4/17/344), where abundant
> details of this SGX/LSM problem could be found.
> 
> This series is an alternative to Sean Christopherson's recent RFC series
> (https://lkml.org/lkml/2019/6/5/1070) that was trying to solve the same
> problem. The key problem is for LSM to determine the "maximal (most permissive)
> protection" allowed for individual enclave pages. Sean's approach is to take
> that from user mode code as a parameter of the EADD ioctl, validate it with LSM
> ahead of time, and then enforce it inside the SGX subsystem. The major
> disadvantage IMHO is that a priori knowledge of "maximal protection" is needed,
> but it isn't always available in certain use cases. In fact, it is an unusual
> approach to take "maximal protection" from user code, as what SELinux is doing
> today is to determine "maximal protection" of a vma using attributes associated
> with vma->vm_file instead. When it comes to enclaves, vma->vm_file always
> points /dev/sgx/enclave, so what's missing is a new way for LSM modules to
> remember origins of enclave pages so that they don't solely depend on
> vma->vm_file to determine "maximal protection".
> 
> This series takes advantage of the fact that enclave pages cannot be remapped
> (to different linear address), therefore the pair of { vma->vm_file,
> linear_address } can be used to uniquely identify an enclave page. Then by
> notifying LSM on creation of every enclave page (via a new LSM hook -
> security_enclave_load), LSM modules would be able to track origin and
> protection changes of every page, hence be able to judge correctly upon
> mmap/mprotect requests.
> 
> Cedric Xing (3):
>   LSM/x86/sgx: Add SGX specific LSM hooks
>   LSM/x86/sgx: Implement SGX specific hooks in SELinux
>   LSM/x86/sgx: Call new LSM hooks from SGX subsystem

A patch set containing direct LSM changes should consider all LSMs.
This will allow all the LSM maintainers to consider the changes. Now we
have a limited audience and we are favoring one LSM.

There is no good reason why direct LSM changes cannot be done
post-upstreaming like we do for virtualization.

Looking at Sean's patches, overally 1/5-3/5 make perfect sense.

/Jarkko

^ permalink raw reply

* Re: [PATCH 06/13] keys: Add a notification facility [ver #4]
From: Jonathan Corbet @ 2019-06-10 17:11 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, linux-fsdevel, linux-api, linux-block, keyrings,
	linux-security-module, linux-kernel
In-Reply-To: <155991709983.15579.13232123365803197237.stgit@warthog.procyon.org.uk>

On Fri, 07 Jun 2019 15:18:19 +0100
David Howells <dhowells@redhat.com> wrote:

> Add a key/keyring change notification facility whereby notifications about
> changes in key and keyring content and attributes can be received.
> 
> Firstly, an event queue needs to be created:
> 
> 	fd = open("/dev/event_queue", O_RDWR);
> 	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);
> 
> then a notification can be set up to report notifications via that queue:
> 
> 	struct watch_notification_filter filter = {
> 		.nr_filters = 1,
> 		.filters = {
> 			[0] = {
> 				.type = WATCH_TYPE_KEY_NOTIFY,
> 				.subtype_filter[0] = UINT_MAX,
> 			},
> 		},
> 	};
> 	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
> 	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

One little nit: it seems that keyctl_watch_key is actually spelled
keyctl(KEYCTL_WATCH_KEY, ...).

jon

^ permalink raw reply

* Re: [RFC PATCH v2 5/5] security/selinux: Add enclave_load() implementation
From: Sean Christopherson @ 2019-06-10 16:46 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Jarkko Sakkinen, Andy Lutomirski, Cedric Xing, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris, selinux,
	Jethro Beekman, Dave Hansen, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx, Andrew Morton, nhorman, npmccallum,
	Serge Ayoun, Shay Katz-zamir, Haitao Huang, Andy Shevchenko,
	Kai Svahn, Borislav Petkov, Josh Triplett, Kai Huang,
	David Rientjes, William Roberts, Philip Tricca
In-Reply-To: <ca77e597-69de-9db4-f88a-26881ab53680@tycho.nsa.gov>

On Fri, Jun 07, 2019 at 05:16:01PM -0400, Stephen Smalley wrote:
> On 6/5/19 10:11 PM, Sean Christopherson wrote:
> >The goal of selinux_enclave_load() is to provide a facsimile of the
> >existing selinux_file_mprotect() and file_map_prot_check() policies,
> >but tailored to the unique properties of SGX.
> >
> >For example, an enclave page is technically backed by a MAP_SHARED file,
> >but the "file" is essentially shared memory that is never persisted
> >anywhere and also requires execute permissions (for some pages).
> >
> >The basic concept is to require appropriate execute permissions on the
> >source of the enclave for pages that are requesting PROT_EXEC, e.g. if
> >an enclave page is being loaded from a regular file, require
> >FILE__EXECUTE and/or FILE__EXECMOND, and if it's coming from an
> >anonymous/private mapping, require PROCESS__EXECMEM since the process
> >is essentially executing from the mapping, albeit in a roundabout way.
> >
> >Note, FILE__READ and FILE__WRITE are intentionally not required even if
> >the source page is backed by a regular file.  Writes to the enclave page
> >are contained to the EPC, i.e. never hit the original file, and read
> >permissions have already been vetted (or the VMA doesn't have PROT_READ,
> >in which case loading the page into the enclave will fail).
> >
> >Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >---
> >  security/selinux/hooks.c | 69 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 69 insertions(+)
> >
> >diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> >index 3ec702cf46ca..3c5418edf51c 100644
> >--- a/security/selinux/hooks.c
> >+++ b/security/selinux/hooks.c
> >@@ -6726,6 +6726,71 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> >  }
> >  #endif
> >+#ifdef CONFIG_INTEL_SGX
> >+int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot)
> >+{
> >+	const struct cred *cred = current_cred();
> >+	u32 sid = cred_sid(cred);
> >+	int ret;
> >+
> >+	/* SGX is supported only in 64-bit kernels. */
> >+	WARN_ON_ONCE(!default_noexec);
> >+
> >+	/* Only executable enclave pages are restricted in any way. */
> >+	if (!(prot & PROT_EXEC))
> >+		return 0;
> 
> prot/PROT_EXEC or vmflags/VM_EXEC
> 
> >+
> >+	/*
> >+	 * The source page is exectuable, i.e. has already passed SELinux's
> 
> executable
> 
> >+	 * checks, and userspace is not requesting RW->RX capabilities.
> 
> Is it requesting W->X or WX?

Hmm, good point.  I'll reword the "requesting RW->RX" and "RW->RX intent"
phrases to make it clear that we don't actually know whether userspace
intends to do W->X or WX, and I'll also expand the "Note, this hybrid
EXECMOD and EXECMEM behavior" comment to explain that existing checks
won't prevent WX.

> >+	 */
> >+	if ((vma->vm_flags & VM_EXEC) && !(prot & PROT_WRITE))
> >+		return 0;
> >+
> >+	/*
> >+	 * The source page is not executable, or userspace is requesting the
> >+	 * ability to do a RW->RX conversion.  Permissions are required as
> >+	 * follows, in order of increasing privelege:
> >+	 *
> >+	 * EXECUTE - Load an executable enclave page without RW->RX intent from
> >+	 *           a non-executable vma that is backed by a shared mapping to
> >+	 *           a regular file that has not undergone COW.
> 
> Shared mapping or unmodified private file mapping

Doh, messed that up.  Thanks!

> >+	 *
> >+	 * EXECMOD - Load an executable enclave page without RW->RX intent from
> >+	 *           a non-executable vma that is backed by a shared mapping to
> >+	 *           a regular file that *has* undergone COW.
> 
> modified private file mapping (write to shared mapping won't trigger COW; it
> would have been checked by FILE__WRITE earlier)

Same mental error.  Will fix.

> >+	 *
> >+	 *         - Load an enclave page *with* RW->RX intent from a shared
> >+	 *           mapping to a regular file.
> >+	 *
> >+	 * EXECMEM - Load an exectuable enclave page from an anonymous mapping.
> 
> executable
> 
> >+	 *
> >+	 *         - Load an exectuable enclave page from a private file, e.g.
> 
> executable

At least I'm consistent.

> >+	 *           from a shared mapping to a hugetlbfs file.
> >+	 *
> >+	 *         - Load an enclave page *with* RW->RX intent from a private
> 
> W->X or WX?
>
> >+	 *           mapping to a regular file.
> >+	 *
> >+	 * Note, this hybrid EXECMOD and EXECMEM behavior is intentional and
> >+	 * reflects the nature of enclaves and the EPC, e.g. EPC is effectively
> >+	 * a non-persistent shared file, but each enclave is a private domain
> >+	 * within that shared file, so delegate to the source of the enclave.
> >+	 */
> >+	if (vma->vm_file && !IS_PRIVATE(file_inode(vma->vm_file) &&
> >+	    ((vma->vm_flags & VM_SHARED) || !(prot & PROT_WRITE)))) {
> >+		if (!vma->anon_vma && !(prot & PROT_WRITE))
> >+			ret = file_has_perm(cred, vma->vm_file, FILE__EXECUTE);
> >+		else
> >+			ret = file_has_perm(cred, vma->vm_file, FILE__EXECMOD);
> >+	} else {
> >+		ret = avc_has_perm(&selinux_state,
> >+				   sid, sid, SECCLASS_PROCESS,
> >+				   PROCESS__EXECMEM, NULL);
> >+	}
> >+	return ret;
> >+}
> >+#endif
> >+
> >  struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = {
> >  	.lbs_cred = sizeof(struct task_security_struct),
> >  	.lbs_file = sizeof(struct file_security_struct),
> >@@ -6968,6 +7033,10 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
> >  	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
> >  	LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
> >  #endif
> >+
> >+#ifdef CONFIG_INTEL_SGX
> >+	LSM_HOOK_INIT(enclave_load, selinux_enclave_load),
> >+#endif
> >  };
> >  static __init int selinux_init(void)
> >
> 

^ permalink raw reply

* Re: [RFC PATCH v2 3/5] x86/sgx: Enforce noexec filesystem restriction for enclaves
From: Andy Lutomirski @ 2019-06-10 16:44 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, Andy Lutomirski, Cedric Xing,
	Stephen Smalley, James Morris, Serge E . Hallyn, LSM List,
	Paul Moore, Eric Paris, selinux, Jethro Beekman, Dave Hansen,
	Thomas Gleixner, Linus Torvalds, LKML, X86 ML, linux-sgx,
	Andrew Morton, nhorman, npmccallum, Serge Ayoun, Shay Katz-zamir,
	Haitao Huang, Andy Shevchenko, Kai Svahn, Borislav Petkov,
	Josh Triplett, Kai Huang, David Rientjes, William Roberts,
	Philip Tricca
In-Reply-To: <20190610160005.GC3752@linux.intel.com>

On Mon, Jun 10, 2019 at 9:00 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Wed, Jun 05, 2019 at 07:11:43PM -0700, Sean Christopherson wrote:
> > +             goto out;
> > +     }
> > +
> > +     /*
> > +      * Query VM_MAYEXEC as an indirect path_noexec() check (see do_mmap()),
> > +      * but with some future proofing against other cases that may deny
> > +      * execute permissions.
> > +      */
> > +     if (!(vma->vm_flags & VM_MAYEXEC)) {
> > +             ret = -EACCES;
> > +             goto out;
> > +     }
> > +
> > +     if (copy_from_user(dst, (void __user *)src, PAGE_SIZE))
> > +             ret = -EFAULT;
> > +     else
> > +             ret = 0;
> > +
> > +out:
> > +     up_read(&current->mm->mmap_sem);
> > +
> > +     return ret;
> > +}
>
> I would suggest to express the above instead like this for clarity
> and consistency:
>
>                 goto err_map_sem;
>         }
>
>         /* Query VM_MAYEXEC as an indirect path_noexec() check
>          * (see do_mmap()).
>          */
>         if (!(vma->vm_flags & VM_MAYEXEC)) {
>                 ret = -EACCES;
>                 goto err_mmap_sem;
>         }
>
>         if (copy_from_user(dst, (void __user *)src, PAGE_SIZE)) {
>                 ret = -EFAULT;
>                 goto err_mmap_sem;
>         }
>
>         return 0;
>
> err_mmap_sem:
>         up_read(&current->mm->mmap_sem);
>         return ret;
> }
>
> The comment about future proofing is unnecessary.
>

I'm also torn as to whether this patch is needed at all.  If we ever
get O_MAYEXEC, then enclave loaders should use it to enforce noexec in
userspace.  Otherwise I'm unconvinced it's that special.

^ permalink raw reply

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: Andy Lutomirski @ 2019-06-10 16:42 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Stephen Smalley, David Howells, Al Viro, USB list, LSM List,
	Greg Kroah-Hartman, raven, Linux FS Devel, Linux API, linux-block,
	keyrings, LKML, Paul Moore
In-Reply-To: <0cf7a49d-85f6-fba9-62ec-a378e0b76adf@schaufler-ca.com>

On Mon, Jun 10, 2019 at 9:34 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 6/10/2019 8:21 AM, Stephen Smalley wrote:
> > On 6/7/19 10:17 AM, David Howells wrote:
> >>
> >> Hi Al,
> >>
> >> Here's a set of patches to add a general variable-length notification queue
> >> concept and to add sources of events for:
> >>
> >>   (1) Mount topology events, such as mounting, unmounting, mount expiry,
> >>       mount reconfiguration.
> >>
> >>   (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
> >>       errors (not complete yet).
> >>
> >>   (3) Key/keyring events, such as creating, linking and removal of keys.
> >>
> >>   (4) General device events (single common queue) including:
> >>
> >>       - Block layer events, such as device errors
> >>
> >>       - USB subsystem events, such as device/bus attach/remove, device
> >>         reset, device errors.
> >>
> >> One of the reasons for this is so that we can remove the issue of processes
> >> having to repeatedly and regularly scan /proc/mounts, which has proven to
> >> be a system performance problem.  To further aid this, the fsinfo() syscall
> >> on which this patch series depends, provides a way to access superblock and
> >> mount information in binary form without the need to parse /proc/mounts.
> >>
> >>
> >> LSM support is included, but controversial:
> >>
> >>   (1) The creds of the process that did the fput() that reduced the refcount
> >>       to zero are cached in the file struct.
> >>
> >>   (2) __fput() overrides the current creds with the creds from (1) whilst
> >>       doing the cleanup, thereby making sure that the creds seen by the
> >>       destruction notification generated by mntput() appears to come from
> >>       the last fputter.
> >>
> >>   (3) security_post_notification() is called for each queue that we might
> >>       want to post a notification into, thereby allowing the LSM to prevent
> >>       covert communications.
> >>
> >>   (?) Do I need to add security_set_watch(), say, to rule on whether a watch
> >>       may be set in the first place?  I might need to add a variant per
> >>       watch-type.
> >>
> >>   (?) Do I really need to keep track of the process creds in which an
> >>       implicit object destruction happened?  For example, imagine you create
> >>       an fd with fsopen()/fsmount().  It is marked to dissolve the mount it
> >>       refers to on close unless move_mount() clears that flag.  Now, imagine
> >>       someone looking at that fd through procfs at the same time as you exit
> >>       due to an error.  The LSM sees the destruction notification come from
> >>       the looker if they happen to do their fput() after yours.
> >
> > I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
> >
> > For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
> >
> > I wouldn't expect SELinux to implement security_post_notification() at all.  I can't see how one can construct a meaningful, stable policy for it.  I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.
>
> I cannot agree. There is an explicit action by a subject that results
> in information being delivered to an object. Just like a signal or a
> UDP packet delivery. Smack handles this kind of thing just fine. The
> internal mechanism that results in the access is irrelevant from
> this viewpoint. I can understand how a mechanism like SELinux that
> works on finer granularity might view it differently.

I think you really need to give an example of a coherent policy that
needs this.  As it stands, your analogy seems confusing.  If someone
changes the system clock, we don't restrict who is allowed to be
notified (via, for example, TFD_TIMER_CANCEL_ON_SET) that the clock
was changed based on who changed the clock.  Similarly, if someone
tries to receive a packet on a socket, we check whether they have the
right to receive on that socket (from the endpoint in question) and,
if the sender is local, whether the sender can send to that socket.
We do not check whether the sender can send to the receiver.

The signal example is inapplicable.  Sending a signal to a process is
an explicit action done to that process, and it can easily adversely
affect the target.  Of course it requires permission.

--Andy

^ permalink raw reply

* Re: [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4]
From: Casey Schaufler @ 2019-06-10 16:33 UTC (permalink / raw)
  To: Stephen Smalley, David Howells, viro
  Cc: linux-usb, linux-security-module, Greg Kroah-Hartman, raven,
	linux-fsdevel, linux-api, linux-block, keyrings, linux-kernel,
	Paul Moore, casey
In-Reply-To: <be966d9c-e38d-7a30-8d80-fad5f25ab230@tycho.nsa.gov>

On 6/10/2019 8:21 AM, Stephen Smalley wrote:
> On 6/7/19 10:17 AM, David Howells wrote:
>>
>> Hi Al,
>>
>> Here's a set of patches to add a general variable-length notification queue
>> concept and to add sources of events for:
>>
>>   (1) Mount topology events, such as mounting, unmounting, mount expiry,
>>       mount reconfiguration.
>>
>>   (2) Superblock events, such as R/W<->R/O changes, quota overrun and I/O
>>       errors (not complete yet).
>>
>>   (3) Key/keyring events, such as creating, linking and removal of keys.
>>
>>   (4) General device events (single common queue) including:
>>
>>       - Block layer events, such as device errors
>>
>>       - USB subsystem events, such as device/bus attach/remove, device
>>         reset, device errors.
>>
>> One of the reasons for this is so that we can remove the issue of processes
>> having to repeatedly and regularly scan /proc/mounts, which has proven to
>> be a system performance problem.  To further aid this, the fsinfo() syscall
>> on which this patch series depends, provides a way to access superblock and
>> mount information in binary form without the need to parse /proc/mounts.
>>
>>
>> LSM support is included, but controversial:
>>
>>   (1) The creds of the process that did the fput() that reduced the refcount
>>       to zero are cached in the file struct.
>>
>>   (2) __fput() overrides the current creds with the creds from (1) whilst
>>       doing the cleanup, thereby making sure that the creds seen by the
>>       destruction notification generated by mntput() appears to come from
>>       the last fputter.
>>
>>   (3) security_post_notification() is called for each queue that we might
>>       want to post a notification into, thereby allowing the LSM to prevent
>>       covert communications.
>>
>>   (?) Do I need to add security_set_watch(), say, to rule on whether a watch
>>       may be set in the first place?  I might need to add a variant per
>>       watch-type.
>>
>>   (?) Do I really need to keep track of the process creds in which an
>>       implicit object destruction happened?  For example, imagine you create
>>       an fd with fsopen()/fsmount().  It is marked to dissolve the mount it
>>       refers to on close unless move_mount() clears that flag.  Now, imagine
>>       someone looking at that fd through procfs at the same time as you exit
>>       due to an error.  The LSM sees the destruction notification come from
>>       the looker if they happen to do their fput() after yours.
>
> I remain unconvinced that (1), (2), (3), and the final (?) above are a good idea.
>
> For SELinux, I would expect that one would implement a collection of per watch-type WATCH permission checks on the target object (or to some well-defined object label like the kernel SID if there is no object) that allow receipt of all notifications of that watch-type for objects related to the target object, where "related to" is defined per watch-type.
>
> I wouldn't expect SELinux to implement security_post_notification() at all.  I can't see how one can construct a meaningful, stable policy for it.  I'd argue that the triggering process is not posting the notification; the kernel is posting the notification and the watcher has been authorized to receive it.

I cannot agree. There is an explicit action by a subject that results
in information being delivered to an object. Just like a signal or a
UDP packet delivery. Smack handles this kind of thing just fine. The
internal mechanism that results in the access is irrelevant from
this viewpoint. I can understand how a mechanism like SELinux that
works on finer granularity might view it differently.



>
>>
>>
>> Design decisions:
>>
>>   (1) A misc chardev is used to create and open a ring buffer:
>>
>>     fd = open("/dev/watch_queue", O_RDWR);
>>
>>       which is then configured and mmap'd into userspace:
>>
>>     ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
>>     ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
>>     buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
>>            MAP_SHARED, fd, 0);
>>
>>       The fd cannot be read or written (though there is a facility to use
>>       write to inject records for debugging) and userspace just pulls data
>>       directly out of the buffer.
>>
>>   (2) The ring index pointers are stored inside the ring and are thus
>>       accessible to userspace.  Userspace should only update the tail
>>       pointer and never the head pointer or risk breaking the buffer.  The
>>       kernel checks that the pointers appear valid before trying to use
>>       them.  A 'skip' record is maintained around the pointers.
>>
>>   (3) poll() can be used to wait for data to appear in the buffer.
>>
>>   (4) Records in the buffer are binary, typed and have a length so that they
>>       can be of varying size.
>>
>>       This means that multiple heterogeneous sources can share a common
>>       buffer.  Tags may be specified when a watchpoint is created to help
>>       distinguish the sources.
>>
>>   (5) The queue is reusable as there are 16 million types available, of
>>       which I've used 4, so there is scope for others to be used.
>>
>>   (6) Records are filterable as types have up to 256 subtypes that can be
>>       individually filtered.  Other filtration is also available.
>>
>>   (7) Each time the buffer is opened, a new buffer is created - this means
>>       that there's no interference between watchers.
>>
>>   (8) When recording a notification, the kernel will not sleep, but will
>>       rather mark a queue as overrun if there's insufficient space, thereby
>>       avoiding userspace causing the kernel to hang.
>>
>>   (9) The 'watchpoint' should be specific where possible, meaning that you
>>       specify the object that you want to watch.
>>
>> (10) The buffer is created and then watchpoints are attached to it, using
>>       one of:
>>
>>     keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
>>     mount_notify(AT_FDCWD, "/", 0, fd, 0x02);
>>     sb_notify(AT_FDCWD, "/mnt", 0, fd, 0x03);
>>
>>       where in all three cases, fd indicates the queue and the number after
>>       is a tag between 0 and 255.
>>
>> (11) The watch must be removed if either the watch buffer is destroyed or
>>       the watched object is destroyed.
>>
>>
>> Things I want to avoid:
>>
>>   (1) Introducing features that make the core VFS dependent on the network
>>       stack or networking namespaces (ie. usage of netlink).
>>
>>   (2) Dumping all this stuff into dmesg and having a daemon that sits there
>>       parsing the output and distributing it as this then puts the
>>       responsibility for security into userspace and makes handling
>>       namespaces tricky.  Further, dmesg might not exist or might be
>>       inaccessible inside a container.
>>
>>   (3) Letting users see events they shouldn't be able to see.
>>
>>
>> Further things that could be considered:
>>
>>   (1) Adding a keyctl call to allow a watch on a keyring to be extended to
>>       "children" of that keyring, such that the watch is removed from the
>>       child if it is unlinked from the keyring.
>>
>>   (2) Adding global superblock event queue.
>>
>>   (3) Propagating watches to child superblock over automounts.
>>
>>
>> The patches can be found here also:
>>
>>     http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications
>>
>> Changes:
>>
>>   v4: Split the basic UAPI bits out into their own patch and then split the
>>       LSM hooks out into an intermediate patch.  Add LSM hooks for setting
>>       watches.
>>
>>       Rename the *_notify() system calls to watch_*() for consistency.
>>
>>   v3: I've added a USB notification source and reformulated the block
>>       notification source so that there's now a common watch list, for which
>>       the system call is now device_notify().
>>
>>       I've assigned a pair of unused ioctl numbers in the 'W' series to the
>>       ioctls added by this series.
>>
>>       I've also added a description of the kernel API to the documentation.
>>
>>   v2: I've fixed various issues raised by Jann Horn and GregKH and moved to
>>       krefs for refcounting.  I've added some security features to try and
>>       give Casey Schaufler the LSM control he wants.
>>
>> David
>> ---
>> David Howells (13):
>>        security: Override creds in __fput() with last fputter's creds
>>        uapi: General notification ring definitions
>>        security: Add hooks to rule on setting a watch
>>        security: Add a hook for the point of notification insertion
>>        General notification queue with user mmap()'able ring buffer
>>        keys: Add a notification facility
>>        vfs: Add a mount-notification facility
>>        vfs: Add superblock notifications
>>        fsinfo: Export superblock notification counter
>>        Add a general, global device notification watch list
>>        block: Add block layer notifications
>>        usb: Add USB subsystem notifications
>>        Add sample notification program
>>
>>
>>   Documentation/ioctl/ioctl-number.txt   |    1
>>   Documentation/security/keys/core.rst   |   58 ++
>>   Documentation/watch_queue.rst          |  492 ++++++++++++++++++
>>   arch/x86/entry/syscalls/syscall_32.tbl |    3
>>   arch/x86/entry/syscalls/syscall_64.tbl |    3
>>   block/Kconfig                          |    9
>>   block/blk-core.c                       |   29 +
>>   drivers/base/Kconfig                   |    9
>>   drivers/base/Makefile                  |    1
>>   drivers/base/watch.c                   |   89 +++
>>   drivers/misc/Kconfig                   |   13
>>   drivers/misc/Makefile                  |    1
>>   drivers/misc/watch_queue.c             |  889 ++++++++++++++++++++++++++++++++
>>   drivers/usb/core/Kconfig               |   10
>>   drivers/usb/core/devio.c               |   55 ++
>>   drivers/usb/core/hub.c                 |    3
>>   fs/Kconfig                             |   21 +
>>   fs/Makefile                            |    1
>>   fs/file_table.c                        |   12
>>   fs/fsinfo.c                            |   12
>>   fs/mount.h                             |   33 +
>>   fs/mount_notify.c                      |  187 +++++++
>>   fs/namespace.c                         |    9
>>   fs/super.c                             |  122 ++++
>>   include/linux/blkdev.h                 |   15 +
>>   include/linux/dcache.h                 |    1
>>   include/linux/device.h                 |    7
>>   include/linux/fs.h                     |   79 +++
>>   include/linux/key.h                    |    4
>>   include/linux/lsm_hooks.h              |   48 ++
>>   include/linux/security.h               |   35 +
>>   include/linux/syscalls.h               |    5
>>   include/linux/usb.h                    |   19 +
>>   include/linux/watch_queue.h            |   87 +++
>>   include/uapi/linux/fsinfo.h            |   10
>>   include/uapi/linux/keyctl.h            |    1
>>   include/uapi/linux/watch_queue.h       |  213 ++++++++
>>   kernel/sys_ni.c                        |    7
>>   samples/Kconfig                        |    6
>>   samples/Makefile                       |    1
>>   samples/vfs/test-fsinfo.c              |   13
>>   samples/watch_queue/Makefile           |    9
>>   samples/watch_queue/watch_test.c       |  308 +++++++++++
>>   security/keys/Kconfig                  |   10
>>   security/keys/compat.c                 |    2
>>   security/keys/gc.c                     |    5
>>   security/keys/internal.h               |   30 +
>>   security/keys/key.c                    |   37 +
>>   security/keys/keyctl.c                 |   95 +++
>>   security/keys/keyring.c                |   17 -
>>   security/keys/request_key.c            |    4
>>   security/security.c                    |   29 +
>>   52 files changed, 3121 insertions(+), 38 deletions(-)
>>   create mode 100644 Documentation/watch_queue.rst
>>   create mode 100644 drivers/base/watch.c
>>   create mode 100644 drivers/misc/watch_queue.c
>>   create mode 100644 fs/mount_notify.c
>>   create mode 100644 include/linux/watch_queue.h
>>   create mode 100644 include/uapi/linux/watch_queue.h
>>   create mode 100644 samples/watch_queue/Makefile
>>   create mode 100644 samples/watch_queue/watch_test.c
>>
>

^ permalink raw reply

* Re: [RFC PATCH v2 4/5] LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
From: Sean Christopherson @ 2019-06-10 16:21 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Jarkko Sakkinen, Andy Lutomirski, Cedric Xing, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris, selinux,
	Jethro Beekman, Dave Hansen, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx, Andrew Morton, nhorman, npmccallum,
	Serge Ayoun, Shay Katz-zamir, Haitao Huang, Andy Shevchenko,
	Kai Svahn, Borislav Petkov, Josh Triplett, Kai Huang,
	David Rientjes, William Roberts, Philip Tricca
In-Reply-To: <5706a7ec-5497-c560-92fa-91c9751b9096@tycho.nsa.gov>

On Fri, Jun 07, 2019 at 03:58:34PM -0400, Stephen Smalley wrote:
> On 6/5/19 10:11 PM, Sean Christopherson wrote:
> >enclave_load() is roughly analogous to the existing file_mprotect().
> >
> >Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave
> >VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be
> >MAP_SHARED.  Furthermore, all enclaves need read, write and execute
> >VMAs.  As a result, the existing/standard call to file_mprotect() does
> >not provide any meaningful security for enclaves since an LSM can only
> >deny/grant access to the EPC as a whole.
> >
> >security_enclave_load() is called when SGX is first loading an enclave
> >page, i.e. copying a page from normal memory into the EPC.  Although
> >the prototype for enclave_load() is similar to file_mprotect(), e.g.
> >SGX could theoretically use file_mprotect() and set reqprot=prot, a
> >separate hook is desirable as the semantics of an enclave's protection
> >bits are different than those of vmas, e.g. an enclave page tracks the
> >maximal set of protections, whereas file_mprotect() operates on the
> >actual protections being provided.  In other words, LSMs will likely
> >want to implement different policies for enclave page protections.
> >
> >Note, extensive discussion yielded no sane alternative to some form of
> >SGX specific LSM hook[1].
> >
> >[1] https://lkml.kernel.org/r/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com
> >
> >Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >---
> >  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 12 ++++++------
> >  include/linux/lsm_hooks.h              | 13 +++++++++++++
> >  include/linux/security.h               | 12 ++++++++++++
> >  security/security.c                    |  7 +++++++
> >  4 files changed, 38 insertions(+), 6 deletions(-)
> >
> >diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> >index 44b2d73de7c3..29c0df672250 100644
> >--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> >+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> >@@ -8,6 +8,7 @@
> >  #include <linux/highmem.h>
> >  #include <linux/ratelimit.h>
> >  #include <linux/sched/signal.h>
> >+#include <linux/security.h>
> >  #include <linux/shmem_fs.h>
> >  #include <linux/slab.h>
> >  #include <linux/suspend.h>
> >@@ -582,9 +583,6 @@ static int sgx_encl_page_copy(void *dst, unsigned long src, unsigned long prot)
> >  	struct vm_area_struct *vma;
> >  	int ret;
> >-	if (!(prot & VM_EXEC))
> >-		return 0;
> >-
> 
> Is there a real use case where LSM will want to be called if !(prot &
> VM_EXEC)?

I don't think so?  I have no objection to conditioning the LSM calls on
the page being executable.  I actually had the code written that way in
the first RFC, but it felt weird for SGX to be making assumptions about
LSM use cases.

> Also, you seem to be mixing prot and PROT_EXEC with vm_flags and
> VM_EXEC; other code does not appear to assume they are identical and
> explicitly converts, e.g. calc_vm_prot_bits().

Argh, I'll clean that up.
 
> >  	/* Hold mmap_sem across copy_from_user() to avoid a TOCTOU race. */
> >  	down_read(&current->mm->mmap_sem);

^ permalink raw reply

* Re: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits
From: Sean Christopherson @ 2019-06-10 16:15 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Andy Lutomirski, Cedric Xing, Stephen Smalley, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris, selinux,
	Jethro Beekman, Dave Hansen, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx, Andrew Morton, nhorman, npmccallum,
	Serge Ayoun, Shay Katz-zamir, Haitao Huang, Andy Shevchenko,
	Kai Svahn, Borislav Petkov, Josh Triplett, Kai Huang,
	David Rientjes, William Roberts, Philip Tricca
In-Reply-To: <20190610152717.GB3752@linux.intel.com>

On Mon, Jun 10, 2019 at 06:27:17PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 05, 2019 at 07:11:42PM -0700, Sean Christopherson wrote:
> > [SNAP]
> 
> Same general criticism as for the previous patch: try to say things as
> they are without anything extra.
> 
> > A third alternative would be to pull the protection bits from the page's
> > SECINFO, i.e. make decisions based on the protections enforced by
> > hardware.  However, with SGX2, userspace can extend the hardware-
> > enforced protections via ENCLU[EMODPE], e.g. can add a page as RW and
> > later convert it to RX.  With SGX2, making a decision based on the
> > initial protections would either create a security hole or force SGX to
> > dynamically track "dirty" pages (see first alternative above).
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> 'flags' should would renamed as 'secinfo_flags_mask' even if the name is
> longish. It would use the same values as the SECINFO flags. The field in
> struct sgx_encl_page should have the same name. That would express
> exactly relation between SECINFO and the new field. I would have never
> asked on last iteration why SECINFO is not enough with a better naming.

No, these flags do not impact the EPCM protections in any way.  Userspace
can extend the EPCM protections without going through the kernel.  The
protection flags for an enclave page impact VMA/PTE protection bits.

IMO, it is best to treat the EPCM as being completely separate from the
kernel's EPC management.

> The same field can be also used to cage page type to a subset of values.
> 
> /Jarkko

^ permalink raw reply

* Re: [RFC PATCH v2 4/5] LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
From: Jarkko Sakkinen @ 2019-06-10 16:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Cedric Xing, Stephen Smalley, James Morris,
	Serge E . Hallyn, LSM List, Paul Moore, Eric Paris, selinux,
	Jethro Beekman, Dave Hansen, Thomas Gleixner, Linus Torvalds,
	LKML, X86 ML, linux-sgx, Andrew Morton, nhorman, npmccallum,
	Serge Ayoun, Shay Katz-zamir, Haitao Huang, Andy Shevchenko,
	Kai Svahn, Borislav Petkov, Josh Triplett, Kai Huang,
	David Rientjes, William Roberts, Philip Tricca
In-Reply-To: <20190606021145.12604-5-sean.j.christopherson@intel.com>

On Wed, Jun 05, 2019 at 07:11:44PM -0700, Sean Christopherson wrote:
> enclave_load() is roughly analogous to the existing file_mprotect().
> 
> Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave
> VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be
> MAP_SHARED.  Furthermore, all enclaves need read, write and execute
> VMAs.  As a result, the existing/standard call to file_mprotect() does
> not provide any meaningful security for enclaves since an LSM can only
> deny/grant access to the EPC as a whole.
> 
> security_enclave_load() is called when SGX is first loading an enclave
> page, i.e. copying a page from normal memory into the EPC.  Although
> the prototype for enclave_load() is similar to file_mprotect(), e.g.
> SGX could theoretically use file_mprotect() and set reqprot=prot, a
> separate hook is desirable as the semantics of an enclave's protection
> bits are different than those of vmas, e.g. an enclave page tracks the
> maximal set of protections, whereas file_mprotect() operates on the
> actual protections being provided.  In other words, LSMs will likely
> want to implement different policies for enclave page protections.
> 
> Note, extensive discussion yielded no sane alternative to some form of
> SGX specific LSM hook[1].
> 
> [1] https://lkml.kernel.org/r/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

4/5 and 5/5 should only be added after upstreaming SGX.

/Jarkko

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox