Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: file metadata via fs API
From: David Howells @ 2020-08-12 13:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Karel Zak, Steven Whitehouse, Linus Torvalds,
	linux-fsdevel, Al Viro, Jeff Layton, Miklos Szeredi,
	Nicolas Dichtel, Christian Brauner, Lennart Poettering, Linux API,
	Ian Kent, LSM, Linux Kernel Mailing List
In-Reply-To: <CAJfpegv4sC2zm+N5tvEmYaEFvvWJRHfdGqXUoBzbeKj81uNCvQ@mail.gmail.com>

Miklos Szeredi <miklos@szeredi.hu> wrote:

> That presumably means the mount ID <-> mount path mapping already
> exists, which means it's just possible to use the open(mount_path,
> O_PATH) to obtain the base fd.

No, you can't.  A path more correspond to multiple mounts stacked on top of
each other, e.g.:

	mount -t tmpfs none /mnt
	mount -t tmpfs none /mnt
	mount -t tmpfs none /mnt

Now you have three co-located mounts and you can't use the path to
differentiate them.  I think this might be an issue in autofs, but Ian would
need to comment on that.

David


^ permalink raw reply

* Re: file metadata via fs API
From: Miklos Szeredi @ 2020-08-12 12:43 UTC (permalink / raw)
  To: Karel Zak
  Cc: Steven Whitehouse, David Howells, Linus Torvalds, linux-fsdevel,
	Al Viro, Jeff Layton, Miklos Szeredi, Nicolas Dichtel,
	Christian Brauner, Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <20200812112825.b52tqeuro2lquxlw@ws.net.home>

On Wed, Aug 12, 2020 at 1:28 PM Karel Zak <kzak@redhat.com> wrote:

> The proposal is based on paths and open(), how do you plan to deal
> with mount IDs? David's fsinfo() allows to ask for mount info by mount
> ID and it works well with mount notification where you get the ID. The
> collaboration with notification interface is critical for our use-cases.

One would use the notification to keep an up to date set of attributes
for each watched mount, right?

That presumably means the mount ID <-> mount path mapping already
exists, which means it's just possible to use the open(mount_path,
O_PATH) to obtain the base fd.

If that assumption is not true, we could add a new interface for
opening the root of the mount by ID.  Fsinfo uses the dfd as a root
for checking connectivity and the filename as the mount ID + a special
flag indicating that it's not "dfd + path" we are dealing with but
"rootfd + mntid".  That sort of semantic multiplexing is quite ugly
and I wouldn't suggest doing that with openat(2).

A new syscall that returns an fd pointing to the root of the mount
might be the best solution:

   int open_mount(int root_fd, u64 mntid, int flags);

Yeah, yeah this is adding just another syscall interface, but notice how:

 a) it does one simple thing, no multiplexing at all

 b) is general purpose, and could be used for example in conjunction
with open_by_handle_at(2), that also requires an fd pointing to a
mount.

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API
From: Karel Zak @ 2020-08-12 11:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Steven Whitehouse, David Howells, Linus Torvalds, linux-fsdevel,
	Al Viro, Jeff Layton, Miklos Szeredi, Nicolas Dichtel,
	Christian Brauner, Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpeguMjU+n-JXE6aUQQGeMpCS4bsy4HQ37NHJ8aD8Aeg2qhA@mail.gmail.com>

On Wed, Aug 12, 2020 at 12:04:14PM +0200, Miklos Szeredi wrote:
> On Wed, Aug 12, 2020 at 11:43 AM Steven Whitehouse <swhiteho@redhat.com> wrote:
> >
> > Hi,
> >
> > On 12/08/2020 09:37, Miklos Szeredi wrote:
> > [snip]
> > >
> > > b) The awarded performance boost is not warranted for the use cases it
> > > is designed for.
> 
> >
> > This is a key point. One of the main drivers for this work is the
> > efficiency improvement for large numbers of mounts. Ian and Karel have
> > already provided performance measurements showing a significant benefit
> > compared with what we have today. If you want to propose this
> > alternative interface then you need to show that it can sustain similar
> > levels of performance, otherwise it doesn't solve the problem. So
> > performance numbers here would be helpful.
> 
> Definitely.   Will measure performance with the interface which Linus proposed.

The proposal is based on paths and open(), how do you plan to deal
with mount IDs? David's fsinfo() allows to ask for mount info by mount
ID and it works well with mount notification where you get the ID. The
collaboration with notification interface is critical for our use-cases.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Karel Zak @ 2020-08-12 10:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, linux-fsdevel, David Howells, Al Viro,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wjzLmMRf=QG-n+1HnxWCx4KTQn9+OhVvUSJ=ZCQd6Y1WA@mail.gmail.com>

On Tue, Aug 11, 2020 at 08:20:24AM -0700, Linus Torvalds wrote:
> IOW, if you do something more along the lines of
> 
>        fd = open(""foo/bar", O_PATH);
>        metadatafd = openat(fd, "metadataname", O_ALT);
> 
> it might be workable.

I have thought we want to replace mountinfo to reduce overhead. If I
understand your idea than we will need openat()+read()+close() for
each attribute? Sounds like a pretty expensive interface.

The question is also how consistent results you get if you will read
information about the same mountpoint by multiple openat()+read()+close()
calls. 

For example,  by fsinfo(FSINFO_ATTR_MOUNT_TOPOLOGY) you get all
mountpoint propagation setting and relations by one syscall, with
your idea you will read parent, slave and flags by multiple read() and
without any lock. Sounds like you can get a mess if someone moves or
reconfigure the mountpoint or so.

openat(O_ALT) seems elegant at first glance, but it will be necessary to 
provide a richer (complex) answers by read() to reduce overhead and 
to make it more consistent for userspace. 

It would be also nice to avoid some strings formatting and separators
like we use in the current mountinfo.

I can imagine multiple values separated by binary header (like we already 
have for watch_notification, inotify, etc):

  fd = openat(fd, "mountinfo", O_ALT);

  sz = read(fd, buf, BUFSZ);
  p = buf;

  while (sz) {
     struct alt_metadata *alt = (struct alt_metadata *) p;

     char *varname = alt->name;
     char *data = alt->data;
     int len  = alt->len;

     sz -= len;
     p += len;
  }

  Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply

* Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: Mark Rutland @ 2020-08-12 10:06 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86
In-Reply-To: <b3368692-afe6-89b5-d634-12f4f0a601f8@linux.microsoft.com>

On Thu, Aug 06, 2020 at 12:26:02PM -0500, Madhavan T. Venkataraman wrote:
> Thanks for the lively discussion. I have tried to answer some of the
> comments below.
> 
> On 8/4/20 9:30 AM, Mark Rutland wrote:
> >
> >> So, the context is - if security settings in a system disallow a page to have
> >> both write and execute permissions, how do you allow the execution of
> >> genuine trampolines that are runtime generated and placed in a data
> >> page or a stack page?
> > There are options today, e.g.
> >
> > a) If the restriction is only per-alias, you can have distinct aliases
> >    where one is writable and another is executable, and you can make it
> >    hard to find the relationship between the two.
> >
> > b) If the restriction is only temporal, you can write instructions into
> >    an RW- buffer, transition the buffer to R--, verify the buffer
> >    contents, then transition it to --X.
> >
> > c) You can have two processes A and B where A generates instrucitons into
> >    a buffer that (only) B can execute (where B may be restricted from
> >    making syscalls like write, mprotect, etc).
> 
> The general principle of the mitigation is W^X. I would argue that
> the above options are violations of the W^X principle. If they are
> allowed today, they must be fixed. And they will be. So, we cannot
> rely on them.

Hold on.

Contemporary W^X means that a given virtual alias cannot be writeable
and executeable simultaneously, permitting (a) and (b). If you read the
references on the Wikipedia page for W^X you'll see the OpenBSD 3.3
release notes and related presentation make this clear, and further they
expect (b) to occur with JITS flipping W/X with mprotect().

Please don't conflate your assumed stronger semantics with the general
principle. It not matching you expectations does not necessarily mean
that it is wrong.

If you want a stronger W^X semantics, please refer to this specifically
with a distinct name.

> a) This requires a remap operation. Two mappings point to the same
>      physical page. One mapping has W and the other one has X. This
>      is a violation of W^X.
> 
> b) This is again a violation. The kernel should refuse to give execute
>      permission to a page that was writeable in the past and refuse to
>      give write permission to a page that was executable in the past.
> 
> c) This is just a variation of (a).

As above, this is not true.

If you have a rationale for why this is desirable or necessary, please
justify that before using this as justification for additional features.

> In general, the problem with user-level methods to map and execute
> dynamic code is that the kernel cannot tell if a genuine application is
> using them or an attacker is using them or piggy-backing on them.

Yes, and as I pointed out the same is true for trampfd unless you can
somehow authenticate the calls are legitimate (in both callsite and the
set of arguments), and I don't see any reasonable way of doing that.

If you relax your threat model to an attacker not being able to make
arbitrary syscalls, then your suggestion that userspace can perorm
chceks between syscalls may be sufficient, but as I pointed out that's
equally true for a sealed memfd or similar.

> Off the top of my head, I have tried to identify some examples
> where we can have more trust on dynamic code and have the kernel
> permit its execution.
> 
> 1. If the kernel can do the job, then that is one safe way. Here, the kernel
>     is the code. There is no code generation involved. This is what I
>     have presented in the patch series as the first cut.

This is sleight-of-hand; it doesn't matter where the logic is performed
if the power is identical. Practically speaking this is equivalent to
some dynamic code generation.

I think that it's misleading to say that because the kernel emulates
something it is safe when the provenance of the syscall arguments cannot
be verified.

[...]

> Anyway, these are just examples. The principle is - if we can identify
> dynamic code that has a certain measure of trust, can the kernel
> permit their execution?

My point generally is that the kernel cannot identify this, and if
usrspace code is trusted to dynamically generate trampfd arguments it
can equally be trusted to dyncamilly generate code.

[...]

> As I have mentioned above, I intend to have the kernel generate code
> only if the code generation is simple enough. For more complicated cases,
> I plan to use a user-level code generator that is for exclusive kernel use.
> I have yet to work out the details on how this would work. Need time.

This reads to me like trampfd is only dealing with a few special cases
and we know that we need a more general solution.

I hope I am mistaken, but I get the strong impression that you're trying
to justify your existing solution rather than trying to understand the
problem space.

To be clear, my strong opinion is that we should not be trying to do
this sort of emulation or code generation within the kernel. I do think
it's worthwhile to look at mechanisms to make it harder to subvert
dynamic userspace code generation, but I think the code generation
itself needs to live in userspace (e.g. for ABI reasons I previously
mentioned).

Mark.

^ permalink raw reply

* Re: file metadata via fs API
From: Miklos Szeredi @ 2020-08-12 10:04 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: David Howells, Linus Torvalds, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <d2d179c7-9b60-ca1a-0c9f-d308fc7af5ce@redhat.com>

On Wed, Aug 12, 2020 at 11:43 AM Steven Whitehouse <swhiteho@redhat.com> wrote:
>
> Hi,
>
> On 12/08/2020 09:37, Miklos Szeredi wrote:
> [snip]
> >
> > b) The awarded performance boost is not warranted for the use cases it
> > is designed for.

>
> This is a key point. One of the main drivers for this work is the
> efficiency improvement for large numbers of mounts. Ian and Karel have
> already provided performance measurements showing a significant benefit
> compared with what we have today. If you want to propose this
> alternative interface then you need to show that it can sustain similar
> levels of performance, otherwise it doesn't solve the problem. So
> performance numbers here would be helpful.

Definitely.   Will measure performance with the interface which Linus proposed.

I'm not worried, though; the problem with the previous interface was
that it resulted in the complete mount table being re-parsed on each
individual event resulting in quadratic behavior.  This doesn't affect
any interface that can query individual mount/superblock objects.

> Also - I may have missed this earlier in the discussion, what are the
> atomicity guarantees with this proposal? This is the other key point for
> the API, so it would be good to see that clearly stated (i.e. how does
> one use it in combination with the notifications to provide an up to
> date, consistent view of the kernel's mounts)

fsinfo(2) provides version counters on mount and superblock objects to
verify consistency of returned data, since not all data is returned in
a single call.  Same method could be used with the open/read based
interface to verify consistency in case multiple attributes/attribute
groups need to be queried.

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API
From: Steven Whitehouse @ 2020-08-12  9:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, Linus Torvalds, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegsVJo9e=pHf3YGWkE16fT0QaNGhgkUdq4KUQypXaD=OgQ@mail.gmail.com>

Hi,

On 12/08/2020 09:37, Miklos Szeredi wrote:
[snip]
>
> b) The awarded performance boost is not warranted for the use cases it
> is designed for.
>
> Thanks,
> Miklos
>

This is a key point. One of the main drivers for this work is the 
efficiency improvement for large numbers of mounts. Ian and Karel have 
already provided performance measurements showing a significant benefit 
compared with what we have today. If you want to propose this 
alternative interface then you need to show that it can sustain similar 
levels of performance, otherwise it doesn't solve the problem. So 
performance numbers here would be helpful.

Also - I may have missed this earlier in the discussion, what are the 
atomicity guarantees with this proposal? This is the other key point for 
the API, so it would be good to see that clearly stated (i.e. how does 
one use it in combination with the notifications to provide an up to 
date, consistent view of the kernel's mounts)

Steve.

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12  8:37 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, linux-fsdevel, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <98802.1597220949@warthog.procyon.org.uk>

On Wed, Aug 12, 2020 at 10:29 AM David Howells <dhowells@redhat.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > Worried about performance?  Io-uring will allow you to do all those
> > five syscalls (or many more) with just one I/O submission.
>
> io_uring isn't going to help here.  We're talking about synchronous reads.
> AIUI, you're adding a couple more syscalls to the list and running stuff in a
> side thread to save the effort of going in and out of the kernel five times.
> But you still have to pay the set up/tear down costs on the fds and do the
> pathwalks.  io_uring doesn't magically make that cost disappear.
>
> io_uring also requires resources such as a kernel accessible ring buffer to
> make it work.
>
> You're proposing making everything else more messy just to avoid a dedicated
> syscall.  Could you please set out your reasoning for that?

a) A dedicated syscall with a complex binary API is a non-trivial
maintenance burden.

b) The awarded performance boost is not warranted for the use cases it
is designed for.

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: David Howells @ 2020-08-12  8:29 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Linus Torvalds, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegt=cQ159kEH9zCYVHV7R_08jwMxF0jKrSUV5E=uBg4Lzw@mail.gmail.com>

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Worried about performance?  Io-uring will allow you to do all those
> five syscalls (or many more) with just one I/O submission.

io_uring isn't going to help here.  We're talking about synchronous reads.
AIUI, you're adding a couple more syscalls to the list and running stuff in a
side thread to save the effort of going in and out of the kernel five times.
But you still have to pay the set up/tear down costs on the fds and do the
pathwalks.  io_uring doesn't magically make that cost disappear.

io_uring also requires resources such as a kernel accessible ring buffer to
make it work.

You're proposing making everything else more messy just to avoid a dedicated
syscall.  Could you please set out your reasoning for that?

David

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12  7:55 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, linux-fsdevel, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <52483.1597190733@warthog.procyon.org.uk>

On Wed, Aug 12, 2020 at 2:05 AM David Howells <dhowells@redhat.com> wrote:

> >   {
> >      int fd, attrfd;
> >
> >      fd = open(path, O_PATH);
> >      attrfd = openat(fd, name, O_ALT);
> >      close(fd);
> >      read(attrfd, value, size);
> >      close(attrfd);
> >   }
>
> Please don't go down this path.  You're proposing five syscalls - including
> creating two file descriptors - to do what fsinfo() does in one.

So what?  People argued against readfile() for exactly the opposite of
reasons, even though that's a lot less specialized than fsinfo().

Worried about performance?  Io-uring will allow you to do all those
five syscalls (or many more) with just one I/O submission.

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12  7:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jann Horn, Casey Schaufler, Andy Lutomirski, linux-fsdevel,
	David Howells, Al Viro, Karel Zak, Jeff Layton, Miklos Szeredi,
	Nicolas Dichtel, Christian Brauner, Lennart Poettering, Linux API,
	Ian Kent, LSM, Linux Kernel Mailing List
In-Reply-To: <CAHk-=whE42mFLi8CfNcdB6Jc40tXsG3sR+ThWAFihhBwfUbczA@mail.gmail.com>

On Tue, Aug 11, 2020 at 11:19 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Aug 11, 2020 at 1:56 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > So that's where O_ALT comes in.   If the application is consenting,
> > then that should prevent exploits.   Or?
>
> If the application is consenting AND GETS IT RIGHT it should prevent exploits.
>
> But that's a big deal.
>
> Why not just do it the way I suggested? Then you don't have any of these issues.

Will do.

I just want to understand the reasons why a unified namespace is
completely out of the question.   And I won't accept "it's just fugly"
or "it's the way it's always been done, so don't change it".  Those
are not good reasons.

Oh, I'm used to these "fights", had them all along.  In hindsight I
should have accepted others' advice in some of the cases, but in
others that big argument turned out to be a complete non-issue.   One
such being inode and dentry duplication in the overlayfs case vs.
in-built stacking in the union-mount case.  There were a lot of issues
with overlayfs, that's true, but dcache/icache size has NEVER actually
been reported as a problem.

While Al has a lot of experience, it's hard to accept all that
anecdotal evidence just because he says it.   Your worries are also
just those: worries.  They may turn out to be an issue or they may
not.

Anyway, starting with just introducing the alt namespace without
unification seems to be a good first step. If that turns out to be
workable, we can revisit unification later.

Thanks,
Miklos

^ permalink raw reply

* Re: [PATCH 0/3] Smack: Use the netlbl incoming cache
From: Paul Moore @ 2020-08-12  2:10 UTC (permalink / raw)
  To: Casey Schaufler; +Cc: SMACK-discuss, casey.schaufler, linux-security-module
In-Reply-To: <20200812003943.3036-1-casey@schaufler-ca.com>

On Tue, Aug 11, 2020 at 8:39 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> Update the Smack security module to use the Netlabel cache
> mechanism to speed the processing of incoming labeled packets.
> There is some refactoring of the existing code that makes it
> simpler, and reduces duplication. The outbound packet labeling
> is also optimized to track the labeling state of the socket.
> Prior to this the socket label was redundantly set on each
> packet send.
>
> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
> ---
>  security/smack/smack.h        |  19 ++--
>  security/smack/smack_access.c |  55 ++++++----
>  security/smack/smack_lsm.c    | 245 ++++++++++++++++++++++++------------------
>  security/smack/smackfs.c      |  23 ++--
>  4 files changed, 193 insertions(+), 149 deletions(-)

FWIW, I gave this a cursory look just now and the NetLabel usage
seemed reasonable.  Out of curiosity, have you done any before/after
performance tests?  It was quite significant when we adopted it in
SELinux, but that was some time ago, it would be nice to know that it
is still working well and hasn't been invalidated by some other,
unrelated change.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Ian Kent @ 2020-08-12  0:53 UTC (permalink / raw)
  To: Christian Brauner, Linus Torvalds
  Cc: Miklos Szeredi, linux-fsdevel, David Howells, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, LSM, Linux Kernel Mailing List
In-Reply-To: <20200811193916.zcwebstmbyvushau@wittgenstein>

On Tue, 2020-08-11 at 21:39 +0200, Christian Brauner wrote:
> On Tue, Aug 11, 2020 at 09:05:22AM -0700, Linus Torvalds wrote:
> > On Tue, Aug 11, 2020 at 8:30 AM Miklos Szeredi <miklos@szeredi.hu>
> > wrote:
> > > What's the disadvantage of doing it with a single lookup WITH an
> > > enabling flag?
> > > 
> > > It's definitely not going to break anything, so no backward
> > > compatibility issues whatsoever.
> > 
> > No backwards compatibility issues for existing programs, no.
> > 
> > But your suggestion is fundamentally ambiguous, and you most
> > definitely *can* hit that if people start using this in new
> > programs.
> > 
> > Where does that "unified" pathname come from? It will be generated
> > from "base filename + metadata name" in user space, and
> > 
> >  (a) the base filename might have double or triple slashes in it
> > for
> > whatever reasons.
> > 
> > This is not some "made-up gotcha" thing - I see double slashes
> > *all*
> > the time when we have things like Makefiles doing
> > 
> >     srctree=../../src/
> > 
> > and then people do "$(srctree)/". If you haven't seen that kind of
> > pattern where the pathname has two (or sometimes more!) slashes in
> > the
> > middle, you've led a very sheltered life.
> > 
> >  (b) even if the new user space were to think about that, and
> > remove
> > those (hah! when have you ever seen user space do that?), as Al
> > mentioned, the user *filesystem* might have pathnames with double
> > slashes as part of symlinks.
> > 
> > So now we'd have to make sure that when we traverse symlinks, that
> > O_ALT gets cleared. Which means that it's not a unified namespace
> > after all, because you can't make symlinks point to metadata.
> > 
> > Or we'd retroactively change the semantics of a symlink, and that
> > _is_
> > a backwards compatibility issue. Not with old software, no, but it
> > changes the meaning of old symlinks!
> > 
> > So no, I don't think a unified namespace ends up working.
> > 
> > And I say that as somebody who actually loves the concept. Ask Al:
> > I
> > have a few times pushed for "let's allow directory behavior on
> > regular
> > files", so that you could do things like a tar-filesystem, and
> > access
> > the contents of a tar-file by just doing
> > 
> >     cat my-file.tar/inside/the/archive.c
> > 
> > or similar.
> > 
> > Al has convinced me it's a horrible idea (and there you have a
> > non-ambiguous marker: the slash at the end of a pathname that
> > otherwise looks and acts as a non-directory)
> > 
> 
> Putting my kernel hat down, putting my userspace hat on.
> 
> I'm looking at this from a potential user of this interface.
> I'm not a huge fan of the metadata fd approach I'd much rather have a
> dedicated system call rather than opening a side-channel metadata fd
> that I can read binary data from. Maybe I'm alone in this but I was
> under the impression that other users including Ian, Lennart, and
> Karel
> have said on-list in some form that they would prefer this approach.
> There are even patches for systemd and libmount, I thought?

Not quite sure what you mean here.

Karel (with some contributions by me) has implemented the interfaces
for David's mount notifications and fsinfo() call in libmount. We
still have a little more to do on that.

I also have a systemd implementation that uses these libmount features
for mount table handling that works quite well, with a couple more
things to do to complete it, that Lennart has done an initial review
for.

It's no secret that I don't like the proc file system in general
but it is really useful for many things, that's just the way it
is.

Ian


^ permalink raw reply

* [PATCH 3/3] Smack: Use the netlabel cache
From: Casey Schaufler @ 2020-08-12  0:39 UTC (permalink / raw)
  To: SMACK-discuss, casey.schaufler, linux-security-module; +Cc: paul, casey
In-Reply-To: <20200812003943.3036-1-casey@schaufler-ca.com>

Utilize the Netlabel cache mechanism for incoming packet matching.
Refactor the initialization of secattr structures, as it was being
done in two places.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
 security/smack/smack.h        |  1 +
 security/smack/smack_access.c | 55 +++++++++++++++++++++++------------
 security/smack/smack_lsm.c    | 27 +++++++++++++----
 security/smack/smackfs.c      | 23 ++++++---------
 4 files changed, 68 insertions(+), 38 deletions(-)

diff --git a/security/smack/smack.h b/security/smack/smack.h
index c5d745a3ada8..a9768b12716b 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -297,6 +297,7 @@ struct smack_known *smk_find_entry(const char *);
 bool smack_privileged(int cap);
 bool smack_privileged_cred(int cap, const struct cred *cred);
 void smk_destroy_label_list(struct list_head *list);
+int smack_populate_secattr(struct smack_known *skp);
 
 /*
  * Shared data.
diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
index 38ac3da4e791..efe2406a3960 100644
--- a/security/smack/smack_access.c
+++ b/security/smack/smack_access.c
@@ -510,6 +510,42 @@ int smk_netlbl_mls(int level, char *catset, struct netlbl_lsm_secattr *sap,
 	return 0;
 }
 
+/**
+ * smack_populate_secattr - fill in the smack_known netlabel information
+ * @skp: pointer to the structure to fill
+ *
+ * Populate the netlabel secattr structure for a Smack label.
+ *
+ * Returns 0 unless creating the category mapping fails
+ */
+int smack_populate_secattr(struct smack_known *skp)
+{
+	int slen;
+
+	skp->smk_netlabel.attr.secid = skp->smk_secid;
+	skp->smk_netlabel.domain = skp->smk_known;
+	skp->smk_netlabel.cache = netlbl_secattr_cache_alloc(GFP_ATOMIC);
+	if (skp->smk_netlabel.cache != NULL) {
+		skp->smk_netlabel.flags |= NETLBL_SECATTR_CACHE;
+		skp->smk_netlabel.cache->free = NULL;
+		skp->smk_netlabel.cache->data = skp;
+	}
+	skp->smk_netlabel.flags |= NETLBL_SECATTR_SECID |
+				   NETLBL_SECATTR_MLS_LVL |
+				   NETLBL_SECATTR_DOMAIN;
+	/*
+	 * If direct labeling works use it.
+	 * Otherwise use mapped labeling.
+	 */
+	slen = strlen(skp->smk_known);
+	if (slen < SMK_CIPSOLEN)
+		return smk_netlbl_mls(smack_cipso_direct, skp->smk_known,
+				      &skp->smk_netlabel, slen);
+
+	return smk_netlbl_mls(smack_cipso_mapped, (char *)&skp->smk_secid,
+			      &skp->smk_netlabel, sizeof(skp->smk_secid));
+}
+
 /**
  * smk_import_entry - import a label, return the list entry
  * @string: a text string that might be a Smack label
@@ -523,7 +559,6 @@ struct smack_known *smk_import_entry(const char *string, int len)
 {
 	struct smack_known *skp;
 	char *smack;
-	int slen;
 	int rc;
 
 	smack = smk_parse_smack(string, len);
@@ -544,21 +579,8 @@ struct smack_known *smk_import_entry(const char *string, int len)
 
 	skp->smk_known = smack;
 	skp->smk_secid = smack_next_secid++;
-	skp->smk_netlabel.domain = skp->smk_known;
-	skp->smk_netlabel.flags =
-		NETLBL_SECATTR_DOMAIN | NETLBL_SECATTR_MLS_LVL;
-	/*
-	 * If direct labeling works use it.
-	 * Otherwise use mapped labeling.
-	 */
-	slen = strlen(smack);
-	if (slen < SMK_CIPSOLEN)
-		rc = smk_netlbl_mls(smack_cipso_direct, skp->smk_known,
-			       &skp->smk_netlabel, slen);
-	else
-		rc = smk_netlbl_mls(smack_cipso_mapped, (char *)&skp->smk_secid,
-			       &skp->smk_netlabel, sizeof(skp->smk_secid));
 
+	rc = smack_populate_secattr(skp);
 	if (rc >= 0) {
 		INIT_LIST_HEAD(&skp->smk_rules);
 		mutex_init(&skp->smk_rules_lock);
@@ -569,9 +591,6 @@ struct smack_known *smk_import_entry(const char *string, int len)
 		smk_insert_entry(skp);
 		goto unlockout;
 	}
-	/*
-	 * smk_netlbl_mls failed.
-	 */
 	kfree(skp);
 	skp = ERR_PTR(rc);
 freeout:
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 7a79ddb39e94..86db667ce319 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -3715,6 +3715,18 @@ static struct smack_known *smack_from_secattr(struct netlbl_lsm_secattr *sap,
 	int acat;
 	int kcat;
 
+	/*
+	 * Netlabel found it in the cache.
+	 */
+	if ((sap->flags & NETLBL_SECATTR_CACHE) != 0)
+		return (struct smack_known *)sap->cache->data;
+
+	if ((sap->flags & NETLBL_SECATTR_SECID) != 0)
+		/*
+		 * Looks like a fallback, which gives us a secid.
+		 */
+		return smack_from_secid(sap->attr.secid);
+
 	if ((sap->flags & NETLBL_SECATTR_MLS_LVL) != 0) {
 		/*
 		 * Looks like a CIPSO packet.
@@ -3762,11 +3774,6 @@ static struct smack_known *smack_from_secattr(struct netlbl_lsm_secattr *sap,
 			return &smack_known_web;
 		return &smack_known_star;
 	}
-	if ((sap->flags & NETLBL_SECATTR_SECID) != 0)
-		/*
-		 * Looks like a fallback, which gives us a secid.
-		 */
-		return smack_from_secid(sap->attr.secid);
 	/*
 	 * Without guidance regarding the smack value
 	 * for the packet fall back on the network
@@ -3845,6 +3852,9 @@ static struct smack_known *smack_from_skb(struct sk_buff *skb)
  * @family: address family
  * @skb: packet
  *
+ * Find the Smack label in the IP options. If it hasn't been
+ * added to the netlabel cache, add it here.
+ *
  * Returns smack_known of the IP options or NULL if that won't work.
  */
 static struct smack_known *smack_from_netlbl(struct sock *sk, u16 family,
@@ -3853,13 +3863,18 @@ static struct smack_known *smack_from_netlbl(struct sock *sk, u16 family,
 	struct netlbl_lsm_secattr secattr;
 	struct socket_smack *ssp = NULL;
 	struct smack_known *skp = NULL;
+	int rc = 0;
 
 	netlbl_secattr_init(&secattr);
 
 	if (sk)
 		ssp = sk->sk_security;
-	if (netlbl_skbuff_getattr(skb, family, &secattr) == 0)
+
+	if (netlbl_skbuff_getattr(skb, family, &secattr) == 0) {
 		skp = smack_from_secattr(&secattr, ssp);
+		if (secattr.flags & NETLBL_SECATTR_CACHEABLE)
+			rc = netlbl_cache_add(skb, family, &skp->smk_netlabel);
+	}
 
 	netlbl_secattr_destroy(&secattr);
 
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 9c4308077574..e567b4baf3a0 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -922,6 +922,10 @@ static ssize_t smk_set_cipso(struct file *file, const char __user *buf,
 		skp->smk_netlabel.attr.mls.cat = ncats.attr.mls.cat;
 		skp->smk_netlabel.attr.mls.lvl = ncats.attr.mls.lvl;
 		rc = count;
+		/*
+		 * This mapping may have been cached, so clear the cache.
+		 */
+		netlbl_cache_invalidate();
 	}
 
 out:
@@ -2950,15 +2954,6 @@ static struct file_system_type smk_fs_type = {
 
 static struct vfsmount *smackfs_mount;
 
-static int __init smk_preset_netlabel(struct smack_known *skp)
-{
-	skp->smk_netlabel.domain = skp->smk_known;
-	skp->smk_netlabel.flags =
-		NETLBL_SECATTR_DOMAIN | NETLBL_SECATTR_MLS_LVL;
-	return smk_netlbl_mls(smack_cipso_direct, skp->smk_known,
-				&skp->smk_netlabel, strlen(skp->smk_known));
-}
-
 /**
  * init_smk_fs - get the smackfs superblock
  *
@@ -2997,19 +2992,19 @@ static int __init init_smk_fs(void)
 	smk_cipso_doi();
 	smk_unlbl_ambient(NULL);
 
-	rc = smk_preset_netlabel(&smack_known_floor);
+	rc = smack_populate_secattr(&smack_known_floor);
 	if (err == 0 && rc < 0)
 		err = rc;
-	rc = smk_preset_netlabel(&smack_known_hat);
+	rc = smack_populate_secattr(&smack_known_hat);
 	if (err == 0 && rc < 0)
 		err = rc;
-	rc = smk_preset_netlabel(&smack_known_huh);
+	rc = smack_populate_secattr(&smack_known_huh);
 	if (err == 0 && rc < 0)
 		err = rc;
-	rc = smk_preset_netlabel(&smack_known_star);
+	rc = smack_populate_secattr(&smack_known_star);
 	if (err == 0 && rc < 0)
 		err = rc;
-	rc = smk_preset_netlabel(&smack_known_web);
+	rc = smack_populate_secattr(&smack_known_web);
 	if (err == 0 && rc < 0)
 		err = rc;
 
-- 
2.19.1


^ permalink raw reply related

* [PATCH 2/3] Smack: Set socket labels only once
From: Casey Schaufler @ 2020-08-12  0:39 UTC (permalink / raw)
  To: SMACK-discuss, casey.schaufler, linux-security-module; +Cc: paul, casey
In-Reply-To: <20200812003943.3036-1-casey@schaufler-ca.com>

Refactor the IP send checks so that the netlabel value
is set only when necessary, not on every send. Some functions
get renamed as the changes made the old name misleading.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
 security/smack/smack.h     |  18 ++--
 security/smack/smack_lsm.c | 169 ++++++++++++++++++++-----------------
 2 files changed, 98 insertions(+), 89 deletions(-)

diff --git a/security/smack/smack.h b/security/smack/smack.h
index e9e817d09785..c5d745a3ada8 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -100,7 +100,12 @@ struct socket_smack {
 	struct smack_known	*smk_out;	/* outbound label */
 	struct smack_known	*smk_in;	/* inbound label */
 	struct smack_known	*smk_packet;	/* TCP peer label */
+	int			smk_state;	/* netlabel socket states */
 };
+#define	SMK_NETLBL_UNSET	0
+#define	SMK_NETLBL_UNLABELED	1
+#define	SMK_NETLBL_LABELED	2
+#define	SMK_NETLBL_REQSKB	3
 
 /*
  * Inode smack data
@@ -196,19 +201,6 @@ enum {
 #define SMACK_DELETE_OPTION	"-DELETE"
 #define SMACK_CIPSO_OPTION 	"-CIPSO"
 
-/*
- * How communications on this socket are treated.
- * Usually it's determined by the underlying netlabel code
- * but there are certain cases, including single label hosts
- * and potentially single label interfaces for which the
- * treatment can not be known in advance.
- *
- * The possibility of additional labeling schemes being
- * introduced in the future exists as well.
- */
-#define SMACK_UNLABELED_SOCKET	0
-#define SMACK_CIPSO_SOCKET	1
-
 /*
  * CIPSO defaults.
  */
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 3402ac4aa28e..7a79ddb39e94 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2383,38 +2383,31 @@ static struct smack_known *smack_ipv6host_label(struct sockaddr_in6 *sip)
 }
 
 /**
- * smack_netlabel - Set the secattr on a socket
+ * smack_netlbl_add - Set the secattr on a socket
  * @sk: the socket
- * @labeled: socket label scheme
  *
- * Convert the outbound smack value (smk_out) to a
- * secattr and attach it to the socket.
+ * Attach the outbound smack value (smk_out) to the socket.
  *
  * Returns 0 on success or an error code
  */
-static int smack_netlabel(struct sock *sk, int labeled)
+static int smack_netlbl_add(struct sock *sk)
 {
-	struct smack_known *skp;
 	struct socket_smack *ssp = sk->sk_security;
-	int rc = 0;
+	struct smack_known *skp = ssp->smk_out;
+	int rc;
 
-	/*
-	 * Usually the netlabel code will handle changing the
-	 * packet labeling based on the label.
-	 * The case of a single label host is different, because
-	 * a single label host should never get a labeled packet
-	 * even though the label is usually associated with a packet
-	 * label.
-	 */
 	local_bh_disable();
 	bh_lock_sock_nested(sk);
 
-	if (ssp->smk_out == smack_net_ambient ||
-	    labeled == SMACK_UNLABELED_SOCKET)
-		netlbl_sock_delattr(sk);
-	else {
-		skp = ssp->smk_out;
-		rc = netlbl_sock_setattr(sk, sk->sk_family, &skp->smk_netlabel);
+	rc = netlbl_sock_setattr(sk, sk->sk_family, &skp->smk_netlabel);
+	switch (rc) {
+	case 0:
+		ssp->smk_state = SMK_NETLBL_LABELED;
+		break;
+	case -EDESTADDRREQ:
+		ssp->smk_state = SMK_NETLBL_REQSKB;
+		rc = 0;
+		break;
 	}
 
 	bh_unlock_sock(sk);
@@ -2424,7 +2417,31 @@ static int smack_netlabel(struct sock *sk, int labeled)
 }
 
 /**
- * smack_netlbel_send - Set the secattr on a socket and perform access checks
+ * smack_netlbl_delete - Remove the secattr from a socket
+ * @sk: the socket
+ *
+ * Remove the outbound smack value from a socket
+ */
+static void smack_netlbl_delete(struct sock *sk)
+{
+	struct socket_smack *ssp = sk->sk_security;
+
+	/*
+	 * Take the label off the socket if one is set.
+	 */
+	if (ssp->smk_state != SMK_NETLBL_LABELED)
+		return;
+
+	local_bh_disable();
+	bh_lock_sock_nested(sk);
+	netlbl_sock_delattr(sk);
+	bh_unlock_sock(sk);
+	local_bh_enable();
+	ssp->smk_state = SMK_NETLBL_UNLABELED;
+}
+
+/**
+ * smk_ipv4_check - Perform IPv4 host access checks
  * @sk: the socket
  * @sap: the destination address
  *
@@ -2434,11 +2451,10 @@ static int smack_netlabel(struct sock *sk, int labeled)
  * Returns 0 on success or an error code.
  *
  */
-static int smack_netlabel_send(struct sock *sk, struct sockaddr_in *sap)
+static int smk_ipv4_check(struct sock *sk, struct sockaddr_in *sap)
 {
 	struct smack_known *skp;
-	int rc;
-	int sk_lbl;
+	int rc = 0;
 	struct smack_known *hkp;
 	struct socket_smack *ssp = sk->sk_security;
 	struct smk_audit_info ad;
@@ -2454,19 +2470,18 @@ static int smack_netlabel_send(struct sock *sk, struct sockaddr_in *sap)
 		ad.a.u.net->dport = sap->sin_port;
 		ad.a.u.net->v4info.daddr = sap->sin_addr.s_addr;
 #endif
-		sk_lbl = SMACK_UNLABELED_SOCKET;
 		skp = ssp->smk_out;
 		rc = smk_access(skp, hkp, MAY_WRITE, &ad);
 		rc = smk_bu_note("IPv4 host check", skp, hkp, MAY_WRITE, rc);
-	} else {
-		sk_lbl = SMACK_CIPSO_SOCKET;
-		rc = 0;
+		/*
+		 * Clear the socket netlabel if it's set.
+		 */
+		if (!rc)
+			smack_netlbl_delete(sk);
 	}
 	rcu_read_unlock();
-	if (rc != 0)
-		return rc;
 
-	return smack_netlabel(sk, sk_lbl);
+	return rc;
 }
 
 /**
@@ -2703,7 +2718,7 @@ static int smack_inode_setsecurity(struct inode *inode, const char *name,
 	else if (strcmp(name, XATTR_SMACK_IPOUT) == 0) {
 		ssp->smk_out = skp;
 		if (sock->sk->sk_family == PF_INET) {
-			rc = smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET);
+			rc = smack_netlbl_add(sock->sk);
 			if (rc != 0)
 				printk(KERN_WARNING
 					"Smack: \"%s\" netlbl error %d.\n",
@@ -2754,7 +2769,7 @@ static int smack_socket_post_create(struct socket *sock, int family,
 	/*
 	 * Set the outbound netlbl.
 	 */
-	return smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET);
+	return smack_netlbl_add(sock->sk);
 }
 
 /**
@@ -2845,7 +2860,7 @@ static int smack_socket_connect(struct socket *sock, struct sockaddr *sap,
 	}
 	if (sap->sa_family != AF_INET || addrlen < sizeof(struct sockaddr_in))
 		return 0;
-	rc = smack_netlabel_send(sock->sk, (struct sockaddr_in *)sap);
+	rc = smk_ipv4_check(sock->sk, (struct sockaddr_in *)sap);
 	return rc;
 }
 
@@ -3663,7 +3678,7 @@ static int smack_socket_sendmsg(struct socket *sock, struct msghdr *msg,
 		if (msg->msg_namelen < sizeof(struct sockaddr_in) ||
 		    sip->sin_family != AF_INET)
 			return -EINVAL;
-		rc = smack_netlabel_send(sock->sk, sip);
+		rc = smk_ipv4_check(sock->sk, sip);
 		break;
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
@@ -3824,6 +3839,33 @@ static struct smack_known *smack_from_skb(struct sk_buff *skb)
 	return smack_from_secid(skb->secmark);
 }
 
+/**
+ * smack_from_netlbl - Smack data from the IP options in an skb
+ * @sk: socket data came in on
+ * @family: address family
+ * @skb: packet
+ *
+ * Returns smack_known of the IP options or NULL if that won't work.
+ */
+static struct smack_known *smack_from_netlbl(struct sock *sk, u16 family,
+					     struct sk_buff *skb)
+{
+	struct netlbl_lsm_secattr secattr;
+	struct socket_smack *ssp = NULL;
+	struct smack_known *skp = NULL;
+
+	netlbl_secattr_init(&secattr);
+
+	if (sk)
+		ssp = sk->sk_security;
+	if (netlbl_skbuff_getattr(skb, family, &secattr) == 0)
+		skp = smack_from_secattr(&secattr, ssp);
+
+	netlbl_secattr_destroy(&secattr);
+
+	return skp;
+}
+
 /**
  * smack_socket_sock_rcv_skb - Smack packet delivery access check
  * @sk: socket
@@ -3833,7 +3875,6 @@ static struct smack_known *smack_from_skb(struct sk_buff *skb)
  */
 static int smack_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
-	struct netlbl_lsm_secattr secattr;
 	struct socket_smack *ssp = sk->sk_security;
 	struct smack_known *skp = NULL;
 	int rc = 0;
@@ -3858,22 +3899,11 @@ static int smack_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		 * The secmark is assumed to reflect policy better.
 		 */
 		skp = smack_from_skb(skb);
-		if (skp)
-			goto access_check;
-		/*
-		 * Translate what netlabel gave us.
-		 */
-		netlbl_secattr_init(&secattr);
-
-		rc = netlbl_skbuff_getattr(skb, family, &secattr);
-		if (rc == 0)
-			skp = smack_from_secattr(&secattr, ssp);
-		else
-			skp = smack_net_ambient;
-
-		netlbl_secattr_destroy(&secattr);
-
-access_check:
+		if (skp == NULL) {
+			skp = smack_from_netlbl(sk, family, skb);
+			if (skp == NULL)
+				skp = smack_net_ambient;
+		}
 
 #ifdef CONFIG_AUDIT
 		smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
@@ -3979,12 +4009,11 @@ static int smack_socket_getpeersec_dgram(struct socket *sock,
 					 struct sk_buff *skb, u32 *secid)
 
 {
-	struct netlbl_lsm_secattr secattr;
 	struct socket_smack *ssp = NULL;
 	struct smack_known *skp;
+	struct sock *sk = NULL;
 	int family = PF_UNSPEC;
 	u32 s = 0;	/* 0 is the invalid secid */
-	int rc;
 
 	if (skb != NULL) {
 		if (skb->protocol == htons(ETH_P_IP))
@@ -4011,15 +4040,11 @@ static int smack_socket_getpeersec_dgram(struct socket *sock,
 		/*
 		 * Translate what netlabel gave us.
 		 */
-		if (sock != NULL && sock->sk != NULL)
-			ssp = sock->sk->sk_security;
-		netlbl_secattr_init(&secattr);
-		rc = netlbl_skbuff_getattr(skb, family, &secattr);
-		if (rc == 0) {
-			skp = smack_from_secattr(&secattr, ssp);
+		if (sock != NULL)
+			sk = sock->sk;
+		skp = smack_from_netlbl(sk, family, skb);
+		if (skp != NULL)
 			s = skp->smk_secid;
-		}
-		netlbl_secattr_destroy(&secattr);
 		break;
 	case PF_INET6:
 #ifdef SMACK_IPV6_SECMARK_LABELING
@@ -4073,7 +4098,6 @@ static int smack_inet_conn_request(struct sock *sk, struct sk_buff *skb,
 	u16 family = sk->sk_family;
 	struct smack_known *skp;
 	struct socket_smack *ssp = sk->sk_security;
-	struct netlbl_lsm_secattr secattr;
 	struct sockaddr_in addr;
 	struct iphdr *hdr;
 	struct smack_known *hskp;
@@ -4103,18 +4127,11 @@ static int smack_inet_conn_request(struct sock *sk, struct sk_buff *skb,
 	 * The secmark is assumed to reflect policy better.
 	 */
 	skp = smack_from_skb(skb);
-	if (skp)
-		goto access_check;
-
-	netlbl_secattr_init(&secattr);
-	rc = netlbl_skbuff_getattr(skb, family, &secattr);
-	if (rc == 0)
-		skp = smack_from_secattr(&secattr, ssp);
-	else
-		skp = &smack_known_huh;
-	netlbl_secattr_destroy(&secattr);
-
-access_check:
+	if (skp == NULL) {
+		skp = smack_from_netlbl(sk, family, skb);
+		if (skp == NULL)
+			skp = &smack_known_huh;
+	}
 
 #ifdef CONFIG_AUDIT
 	smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
-- 
2.19.1


^ permalink raw reply related

* [PATCH 1/3] Smack: Consolidate uses of secmark into a function
From: Casey Schaufler @ 2020-08-12  0:39 UTC (permalink / raw)
  To: SMACK-discuss, casey.schaufler, linux-security-module; +Cc: paul, casey
In-Reply-To: <20200812003943.3036-1-casey@schaufler-ca.com>

Add a function smack_from_skb() that returns the Smack label
identified by a network secmark. Replace the explicit uses of
the secmark with this function.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
 security/smack/smack_lsm.c | 61 +++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 8ffbf951b7ed..3402ac4aa28e 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -3810,6 +3810,20 @@ static int smk_skb_to_addr_ipv6(struct sk_buff *skb, struct sockaddr_in6 *sip)
 }
 #endif /* CONFIG_IPV6 */
 
+/**
+ * smack_from_skb - Smack data from the secmark in an skb
+ * @skb: packet
+ *
+ * Returns smack_known of the secmark or NULL if that won't work.
+ */
+static struct smack_known *smack_from_skb(struct sk_buff *skb)
+{
+	if (skb == NULL || skb->secmark == 0)
+		return NULL;
+
+	return smack_from_secid(skb->secmark);
+}
+
 /**
  * smack_socket_sock_rcv_skb - Smack packet delivery access check
  * @sk: socket
@@ -3838,17 +3852,14 @@ static int smack_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
 	switch (family) {
 	case PF_INET:
-#ifdef CONFIG_SECURITY_SMACK_NETFILTER
 		/*
 		 * If there is a secmark use it rather than the CIPSO label.
 		 * If there is no secmark fall back to CIPSO.
 		 * The secmark is assumed to reflect policy better.
 		 */
-		if (skb && skb->secmark != 0) {
-			skp = smack_from_secid(skb->secmark);
+		skp = smack_from_skb(skb);
+		if (skp)
 			goto access_check;
-		}
-#endif /* CONFIG_SECURITY_SMACK_NETFILTER */
 		/*
 		 * Translate what netlabel gave us.
 		 */
@@ -3862,9 +3873,8 @@ static int smack_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
 		netlbl_secattr_destroy(&secattr);
 
-#ifdef CONFIG_SECURITY_SMACK_NETFILTER
 access_check:
-#endif
+
 #ifdef CONFIG_AUDIT
 		smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
 		ad.a.u.net->family = family;
@@ -3890,16 +3900,14 @@ static int smack_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		    proto != IPPROTO_TCP && proto != IPPROTO_DCCP)
 			break;
 #ifdef SMACK_IPV6_SECMARK_LABELING
-		if (skb && skb->secmark != 0)
-			skp = smack_from_secid(skb->secmark);
-		else if (smk_ipv6_localhost(&sadd))
-			break;
-		else
+		skp = smack_from_skb(skb);
+		if (skp == NULL) {
+			if (smk_ipv6_localhost(&sadd))
+				break;
 			skp = smack_ipv6host_label(&sadd);
-		if (skp == NULL)
-			skp = smack_net_ambient;
-		if (skb == NULL)
-			break;
+			if (skp == NULL)
+				skp = smack_net_ambient;
+		}
 #ifdef CONFIG_AUDIT
 		smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
 		ad.a.u.net->family = family;
@@ -3995,11 +4003,11 @@ static int smack_socket_getpeersec_dgram(struct socket *sock,
 		s = ssp->smk_out->smk_secid;
 		break;
 	case PF_INET:
-#ifdef CONFIG_SECURITY_SMACK_NETFILTER
-		s = skb->secmark;
-		if (s != 0)
+		skp = smack_from_skb(skb);
+		if (skp) {
+			s = skp->smk_secid;
 			break;
-#endif
+		}
 		/*
 		 * Translate what netlabel gave us.
 		 */
@@ -4015,7 +4023,9 @@ static int smack_socket_getpeersec_dgram(struct socket *sock,
 		break;
 	case PF_INET6:
 #ifdef SMACK_IPV6_SECMARK_LABELING
-		s = skb->secmark;
+		skp = smack_from_skb(skb);
+		if (skp)
+			s = skp->smk_secid;
 #endif
 		break;
 	}
@@ -4087,17 +4097,14 @@ static int smack_inet_conn_request(struct sock *sk, struct sk_buff *skb,
 	}
 #endif /* CONFIG_IPV6 */
 
-#ifdef CONFIG_SECURITY_SMACK_NETFILTER
 	/*
 	 * If there is a secmark use it rather than the CIPSO label.
 	 * If there is no secmark fall back to CIPSO.
 	 * The secmark is assumed to reflect policy better.
 	 */
-	if (skb && skb->secmark != 0) {
-		skp = smack_from_secid(skb->secmark);
+	skp = smack_from_skb(skb);
+	if (skp)
 		goto access_check;
-	}
-#endif /* CONFIG_SECURITY_SMACK_NETFILTER */
 
 	netlbl_secattr_init(&secattr);
 	rc = netlbl_skbuff_getattr(skb, family, &secattr);
@@ -4107,9 +4114,7 @@ static int smack_inet_conn_request(struct sock *sk, struct sk_buff *skb,
 		skp = &smack_known_huh;
 	netlbl_secattr_destroy(&secattr);
 
-#ifdef CONFIG_SECURITY_SMACK_NETFILTER
 access_check:
-#endif
 
 #ifdef CONFIG_AUDIT
 	smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
-- 
2.19.1


^ permalink raw reply related

* [PATCH 0/3] Smack: Use the netlbl incoming cache
From: Casey Schaufler @ 2020-08-12  0:39 UTC (permalink / raw)
  To: SMACK-discuss, casey.schaufler, linux-security-module; +Cc: paul, casey
In-Reply-To: <20200812003943.3036-1-casey.ref@schaufler-ca.com>

Update the Smack security module to use the Netlabel cache
mechanism to speed the processing of incoming labeled packets.
There is some refactoring of the existing code that makes it
simpler, and reduces duplication. The outbound packet labeling
is also optimized to track the labeling state of the socket.
Prior to this the socket label was redundantly set on each
packet send.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
 security/smack/smack.h        |  19 ++--
 security/smack/smack_access.c |  55 ++++++----
 security/smack/smack_lsm.c    | 245 ++++++++++++++++++++++++------------------
 security/smack/smackfs.c      |  23 ++--
 4 files changed, 193 insertions(+), 149 deletions(-)

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: David Howells @ 2020-08-12  0:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Miklos Szeredi, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wjzLmMRf=QG-n+1HnxWCx4KTQn9+OhVvUSJ=ZCQd6Y1WA@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> [ I missed the beginning of this discussion, so maybe this was already
> suggested ]

Well, the start of it was my proposal of an fsinfo() system call.  That at its
simplest takes an object reference (eg. a path) and an integer attribute ID (it
could use a string instead, I suppose, but it would mean a bunch of strcmps
instead of integer comparisons) and returns the value of the attribute.  But I
allow you to do slightly more interesting things than that too.

Miklós seems dead-set against adding a system call specifically for this -
though he's proposed extending open in various ways and also proposed an
additional syscall, readfile(), that does the open+read+close all in one step.

I think also at some point, he (or maybe James?) proposed adding a new magic
filesystem mounted somewhere on proc (reflecting an open fd) that then had a
bunch of symlinks to somewhere in sysfs (reflecting a mount).  The idea being
that you did something like:

	fd = open("/path/to/object", O_PATH);
	sprintf(name, "/proc/self/fds/%u/attr1", fd);
	attrfd = open(name, O_RDONLY);
	read(attrfd, buf1, sizeof(buf1));
	close(attrfd);
	sprintf(name, "/proc/self/fds/%u/attr2", fd);
	attrfd = open(name, O_RDONLY);
	read(attrfd, buf2, sizeof(buf2));
	close(attrfd);

or:

	sprintf(name, "/proc/self/fds/%u/attr1", fd);
	readfile(name, buf1, sizeof(buf1));
	sprintf(name, "/proc/self/fds/%u/attr2", fd);
	readfile(name, buf2, sizeof(buf2));

and then "/proc/self/fds/12/attr2" might then be a symlink to, say,
"/sys/mounts/615/mount_attr".

Miklós's justification for this was that it could then be operated from a shell
script without the need for a utility - except that bash, at least, can't do
O_PATH opens.

James has proposed making fsconfig() able to retrieve attributes (though I'd
prefer to give it a sibling syscall that does the retrieval rather than making
fsconfig() do that too).

>   {
>      int fd, attrfd;
>
>      fd = open(path, O_PATH);
>      attrfd = openat(fd, name, O_ALT);
>      close(fd);
>      read(attrfd, value, size);
>      close(attrfd);
>   }

Please don't go down this path.  You're proposing five syscalls - including
creating two file descriptors - to do what fsinfo() does in one.

Do you have a particular objection to adding a syscall specifically for
retrieving filesystem/VFS information?

-~-

Anyway, in case you're interested in what I want to get out of this - which is
the reason for it being posted in the first place:

 (*) The ability to retrieve various attributes of a filesystem/superblock,
     including information on:

	- Filesystem features: Does it support things like hard links, user
          quotas, direct I/O.

	- Filesystem limits: What's the maximum size of a file, an xattr, a
          directory; how many files can it support.

	- Supported API features: What FS_IOC_GETFLAGS does it support?  Which
          can be set?  Does it have Windows file attributes available?  What
          statx attributes are supported?  What do the timestamps support?
          What sort of case handling is done on filenames?

     Note that for a lot of cases, this stuff is fixed and can just be memcpy'd
     from rodata.  Some of this is variable, however, in things like ext4 and
     xfs, depending on, say, mkfs configuration.  The situation is even more
     complex with network filesystems as this may depend on the server they're
     talking to.

     But note also that some of this stuff might change file-to-file, even
     within a superblock.

 (*) The ability to retrieve attributes of a mount point, including information
     on the flags, propagation settings and child lists.

 (*) The ability to quickly retrieve a list of accessible mount point IDs,
     with change event counters to permit userspace (eg. systemd) to quickly
     determine if anything changed in the even of an overrun.

 (*) The ability to find mounts/superblocks by mount ID.  Paths are not unique
     identifiers for mountpoints.  You can stack multiple mounts on the same
     directory, but a path only sees the top one.

 (*) The ability to look inside a different mount namespace - one to which you
     have a reference fd.  This would allow a container manager to look inside
     the container it is managing.

 (*) The ability to expose filesystem-specific attributes.  Network filesystems
     can expose lists of servers and server addresses, for instance.

 (*) The ability to use the object referenced to determine the namespace
     (particularly the network namespace) to look in.  The problem with looking
     in, say, /proc/net/... is that it looks at current's net namespace -
     whether or not the object of interest is in the same one.

 (*) The ability to query the context attached to the fd obtained from
     fsopen().  Such a context may not have a superblock attached to it yet or
     may not be mounted yet.

     The aim is to allow a container manager to supervise a mount being made in
     a container.  It kind of pairs with fsconfig() in that respect.

 (*) The ability to query mount and superblock event counters to help a
     watching process handle overrun in the notifications queue.

What I've done with fsinfo() is:

 (*) Provided a number of ways to refer to the object to be queried (path,
     dirfd+path, fd, mount ID - with others planned).

 (*) Made it so that attibutes are referenced by a numeric ID to keep search
     time minimal.  Numeric IDs must be declared in uapi/linux/fsinfo.h.

 (*) Made it so that the core does most of the work.  Filesystems are given an
     in-kernel buffer to copy into and don't get to see any userspace pointers.

 (*) Made it so that values are not, by and large, encoded as text if it can be
     avoided.  Backward and forward compatibility on binary structs is handled
     by the core.  The filesystem just fills in the values in the UAPI struct
     in the buffer.  The core will zero-pad or truncate the data to match what
     userspace asked for.

     The UAPI struct must be declared in uapi/linux/fsinfo.h.

 (*) Made it so that, for some attributes, the core will fill in the data as
     best it can from what's available in the superblock, mount struct or mount
     namespace.  The filesystem can then amend this if it wants to.

 (*) Made it so that attributes are typed.  The types are few: string, struct,
     list of struct, opaque.  Structs are extensible: the length is the
     version, a new version is required to be a superset of the old version and
     excess requestage is simply cleared by the kernel.

     Information about the type of an attribute can be queried by fsinfo().

What I want to avoid:

 (*) Adding another magic filesystem.

 (*) Adding symlinks from proc to sysfs.

 (*) Having to use open to get an attribute.

 (*) Having to use multiple opens to get an attribute.

 (*) Having to pathwalk to get to the attribute from the object being queried.

 (*) Allocating another O_ open flag for this.

 (*) Avoidable text encoding and decoding.

 (*) Letting the filesystem access the userspace buffer.

Note that I'm not against splitting fsinfo() into a set of sibling syscalls if
that makes it more palatable, or even against using strings for the attribute
IDs, though I'd prefer to avoid the strcmps.

David

^ permalink raw reply

* Re: [GIT PULL] Security subsystem updates for v5.9
From: pr-tracker-bot @ 2020-08-11 21:59 UTC (permalink / raw)
  To: James Morris; +Cc: Linus Torvalds, linux-kernel, linux-security-module
In-Reply-To: <alpine.LRH.2.21.2008110454190.26986@namei.org>

The pull request you sent on Tue, 11 Aug 2020 04:55:33 +1000 (AEST):

> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git tags/for-v5.9

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ce13266d97b198934e86166491bfa4938e96508f

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Casey Schaufler @ 2020-08-11 21:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andy Lutomirski, Linus Torvalds, linux-fsdevel, David Howells,
	Al Viro, Karel Zak, Jeff Layton, Miklos Szeredi, Nicolas Dichtel,
	Christian Brauner, Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List, Casey Schaufler
In-Reply-To: <CAJfpegvUBpb+C2Ab=CLAwWffOaeCedr-b7ZZKZnKvF4ph1nJrw@mail.gmail.com>

On 8/11/2020 1:28 PM, Miklos Szeredi wrote:
> On Tue, Aug 11, 2020 at 6:17 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
>> Since a////////b has known meaning, and lots of applications
>> play loose with '/', its really dangerous to treat the string as
>> special. We only get away with '.' and '..' because their behavior
>> was defined before many of y'all were born.
> So the founding fathers have set things in stone and now we can't
> change it.   Right?

The founders did lots of things that, in retrospect, weren't
such great ideas, but that we have to live with.

> Well that's how it looks... but let's think a little; we have '/' and
> '\0' that can't be used in filenames.  Also '.' and '..' are
> prohibited names. It's not a trivial limitation, so applications are
> probably not used to dumping binary data into file names.

Hee Hee. Back in the early days of UNIX (the 1970s) there was command
dsw(1) "delete from switches" because files with untypeible names where
unfortunately common. I would question the assertion that "applications
are not used to dumping binary data into file names", based on how
often I've wished we still had dsw(1).

>   And that
> means it's probably possible to find a fairly short combination that
> is never used in practice (probably containing the "/." sequence).

You'd think, but you'd be wrong. In the UNIX days we tried everything
from "..." to ".NO_HID." and there always arose a problem or two. Not
the least of which is that a "magic" pathname generated on an old system,
then mounted on a new system will never give you the results you want.

> Why couldn't we reserve such a combination now?
>
> I have no idea how to find such it, but other than that, I see no
> theoretical problem with extending the list of reserved filenames.

You need a sequence that is never used in any language, and
that has never been used as a magic shell sequence. If you want
a fun story to tell over beers, look up how using the "@" as the
erase character on a TTY33 lead to it being used in email addresses.

> Thanks,
> Miklos

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-11 21:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Casey Schaufler, Andy Lutomirski, Linus Torvalds, linux-fsdevel,
	David Howells, Karel Zak, Jeff Layton, Miklos Szeredi,
	Nicolas Dichtel, Christian Brauner, Lennart Poettering, Linux API,
	Ian Kent, LSM, Linux Kernel Mailing List
In-Reply-To: <CAJfpegvUBpb+C2Ab=CLAwWffOaeCedr-b7ZZKZnKvF4ph1nJrw@mail.gmail.com>

On Tue, Aug 11, 2020 at 10:28:31PM +0200, Miklos Szeredi wrote:
> On Tue, Aug 11, 2020 at 6:17 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> 
> > Since a////////b has known meaning, and lots of applications
> > play loose with '/', its really dangerous to treat the string as
> > special. We only get away with '.' and '..' because their behavior
> > was defined before many of y'all were born.
> 
> So the founding fathers have set things in stone and now we can't
> change it.   Right?

Right.

> Well that's how it looks... but let's think a little; we have '/' and
> '\0' that can't be used in filenames.  Also '.' and '..' are
> prohibited names. It's not a trivial limitation, so applications are
> probably not used to dumping binary data into file names.  And that
> means it's probably possible to find a fairly short combination that
> is never used in practice (probably containing the "/." sequence).

No, it is not.  Miklos, get real - you will end up with obscure
pathname produced once in a while by a script fragment from hell
spewed out by crusty piece of awk buried in a piece of shit
makefile from hell (and you are lucky if it won't be an automake
output, while we are at it).  Exercised only when some shipped
turd needs to be regenerated.  Have you _ever_ tried to debug e.g.
gcc build problems?  I have, and it's extremely unpleasant.  Failures
tend to be obscure as hell, backtracking them through the makefiles
is a massive PITA and figuring out why said piece of awk produces
what it does...

I know what I would've done if the likely 5 hours of cursing everything
would have ended up with discovery that some luser had assumed that
surely, no sane software would ever generate this sequence of characters
in anything used as a pathname, and that for this reason I'm looking
forward to several more hours of playing with utterly revolting crap
to convince it to stay away from that sequence...

> Why couldn't we reserve such a combination now?
> 
> I have no idea how to find such it, but other than that, I see no
> theoretical problem with extending the list of reserved filenames.

"not breaking userland", for one.

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Linus Torvalds @ 2020-08-11 21:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jann Horn, Casey Schaufler, Andy Lutomirski, linux-fsdevel,
	David Howells, Al Viro, Karel Zak, Jeff Layton, Miklos Szeredi,
	Nicolas Dichtel, Christian Brauner, Lennart Poettering, Linux API,
	Ian Kent, LSM, Linux Kernel Mailing List
In-Reply-To: <CAJfpeguh5VaDBdVkV3FJtRsMAvXHWUcBfEpQrYPEuX9wYzg9dA@mail.gmail.com>

On Tue, Aug 11, 2020 at 1:56 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> So that's where O_ALT comes in.   If the application is consenting,
> then that should prevent exploits.   Or?

If the application is consenting AND GETS IT RIGHT it should prevent exploits.

But that's a big deal.

Why not just do it the way I suggested? Then you don't have any of these issues.

              Linus

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Andy Lutomirski @ 2020-08-11 21:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jann Horn, Casey Schaufler, Linus Torvalds, linux-fsdevel,
	David Howells, Al Viro, Karel Zak, Jeff Layton, Miklos Szeredi,
	Nicolas Dichtel, Christian Brauner, Lennart Poettering, Linux API,
	Ian Kent, LSM, Linux Kernel Mailing List
In-Reply-To: <CAJfpeguh5VaDBdVkV3FJtRsMAvXHWUcBfEpQrYPEuX9wYzg9dA@mail.gmail.com>

On Tue, Aug 11, 2020 at 1:56 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, Aug 11, 2020 at 10:37 PM Jann Horn <jannh@google.com> wrote:
> > If you change the semantics of path strings, you'd have to be
> > confident that the new semantics fit nicely with all the path
> > validation routines that exist scattered across userspace, and don't
> > expose new interfaces through file server software and setuid binaries
> > and so on.
>
> So that's where O_ALT comes in.   If the application is consenting,
> then that should prevent exploits.   Or?

We're going to be at risk from libraries that want to use the new
O_ALT mechanism but are invoked by old code that passes traditional
Linux paths.  Each library will have to sanitize paths, and some will
screw it up.

I much prefer Linus' variant where the final part of the extended path
is passed as a separate parameter.

^ permalink raw reply

* Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: Madhavan T. Venkataraman @ 2020-08-11 21:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kernel Hardening, Linux API, linux-arm-kernel, Linux FS Devel,
	linux-integrity, LKML, LSM List, Oleg Nesterov, X86 ML
In-Reply-To: <5f4e024b-cc14-8fe9-dc4a-df09da2a98ae@linux.microsoft.com>

I am working on version 2 of trampfd. Will send it out soon.

Thanks for all the comments so far!

Madhavan

On 8/10/20 12:34 PM, Madhavan T. Venkataraman wrote:
> Resending because of mailer problems. Some of the recipients did not receive
> my email. I apologize. Sigh.
> 
> Here is a redefinition of trampfd based on review comments.
> 
> I wanted to address dynamic code in 3 different ways:
> 
>     Remove the need for dynamic code where possible
>     --------------------------------------------------------------------
> 
>     If the kernel itself can perform the work of some dynamic code, then
>     the code can be replaced by the kernel.
> 
>     This is what I implemented in the patchset. But reviewers objected
>     to the performance impact. One trip to the kernel was needed for each
>     trampoline invocation. So, I have decided to defer this approach.
> 
>     Convert dynamic code to static code where possible
>     ----------------------------------------------------------------------
> 
>     This is possible with help from the kernel. This has no performance
>     impact and can be used in libffi, GCC nested functions, etc. I have
>     described the approach below.
> 
>     Deal with code generation
>     -----------------------------------
> 
>     For cases like generating JIT code from Java byte code, I wanted to
>     establish a framework. However, reviewers felt that details are missing.
> 
>     Should the kernel generate code or should it use a user-level code generator?
>     How do you make sure that a user level code generator can be trusted?
>     How would the communication work? ABI details? Architecture support?
>     Support for different types - JIT, DBT, etc?
> 
>     I have come to the conclusion that this is best done separately.
> 
> My main interest is to provide a way to convert dynamic code such as
> trampolines to static code without any special architecture support.
> This can be done with the kernel's help. Any code that gets written in
> the future can conform to this as well.
> 
> So, in version 2 of the Trampfd RFC, I would like to simplify trampfd and
> just address item 2. I will reimplement the support in libffi and present it.
> 
> Convert dynamic code to static code
> ------------------------------------------------
> 
> One problem with dynamic code is that it cannot be verified or authenticated
> by the kernel. The kernel cannot tell the difference between genuine dynamic
> code and an attacker's code. Where possible, dynamic code should be converted
> to static code and placed in the text segment of a binary file. This allows
> the kernel to verify the code by verifying the signature of the file.
> 
> The other problem is using user-level methods to load and execute dynamic code
> can potentially be exploited by an attacker to inject his code and have it be
> executed. To prevent this, a system may enforce W^X. If W^X is enforced
> properly, genuine dynamic code will not be able to run. This is another
> reason to convert dynamic code to static code.
> 
> The issue in converting dynamic code to static code is that the data is
> dynamic. The code does not know before hand where the data is going to be
> at runtime.
> 
> Some architectures support PC-relative data references. So, if you co-locate
> code and data, then the code can find the data at runtime. But this is not
> supported on all architectures. When supported, there may be limitations to
> deal with. Plus you have to take the trouble to co-locate code and data.
> And, to deal with W^X, code and data need to be in different pages.
> 
> All architectures must be supported without any limitations. Fortunately,
> the kernel can solve this problem quite easily. I suggest the following:
> 
> Convert dynamic code to static code like this:
> 
>     - Decide which register should point to the data that the code needs.
>       Call it register R.
> 
>     - Write the static code assuming that R already points to the data.
> 
>     - Use trampfd and pass the following to the kernel:
> 
>         - pointers to the code and data
>         - the name of the register R
> 
> The kernel will write the following instructions in a trampoline page
> mapped into the caller's address space with R-X.
> 
>     - Load the data address in register R
>     - Jump to the static code
> 
> Basically, the kernel provides a trampoline to jump to the user's code
> and returns the kernel-provided trampoline's address to the user.
> 
> It is trivial to implement a trampoline table in the trampoline page to
> conserve memory.
> 
> Issues raised previously
> -------------------------------
> 
> I believe that the following issues that were raised by reviewers is not
> a problem in this scheme. Please rereview.
> 
>     - Florian mentioned the libffi trampoline table. Trampoline tables can be
>       implemented in this scheme easily.
> 
>     - Florian mentioned stack unwinders. I am not an expert on unwinders.
>       But I don't see an issue with unwinders.
> 
>     - Mark Rutland mentioned Intel's CET and CFI. Don't see a problem there.
> 
>     - Mark Rutland mentioned PAC+BTI on ARM64. Don't see a problem there.
> 
> If I have missed addressing any previously raised issue, I apologize.
> Please let me know.
> 
> Thanks!
> 
> Madhavan
> 
> 

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: James Morris @ 2020-08-11 21:03 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Mimi Zohar, James Bottomley, Deven Bowers, Pavel Machek,
	Sasha Levin, snitzer, dm-devel, tyhicks, agk, Paul Moore,
	Jonathan Corbet, nramas, serge, pasha.tatashin, Jann Horn,
	linux-block, Al Viro, Jens Axboe, mdsakib, open list, eparis,
	linux-security-module, linux-audit, linux-fsdevel,
	linux-integrity, jaskarankhurana
In-Reply-To: <329E8DBA-049E-4959-AFD4-9D118DEB176E@gmail.com>

On Sat, 8 Aug 2020, Chuck Lever wrote:

> My interest is in code integrity enforcement for executables stored
> in NFS files.
> 
> My struggle with IPE is that due to its dependence on dm-verity, it
> does not seem to able to protect content that is stored separately
> from its execution environment and accessed via a file access
> protocol (FUSE, SMB, NFS, etc).

It's not dependent on DM-Verity, that's just one possible integrity 
verification mechanism, and one of two supported in this initial 
version. The other is 'boot_verified' for a verified or otherwise trusted 
rootfs. Future versions will support FS-Verity, at least.

IPE was designed to be extensible in this way, with a strong separation of 
mechanism and policy.

Whatever is implemented for NFS should be able to plug in to IPE pretty 
easily.

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox