Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Linus Torvalds @ 2020-08-11 16:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, David Howells, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegtWai+5Tzxi1_G+R2wEZz0q66uaOFndNE0YEQSDjq0f_A@mail.gmail.com>

On Tue, Aug 11, 2020 at 8:30 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> What's the disadvantage of doing it with a single lookup WITH an enabling flag?
>
> It's definitely not going to break anything, so no backward
> compatibility issues whatsoever.

No backwards compatibility issues for existing programs, no.

But your suggestion is fundamentally ambiguous, and you most
definitely *can* hit that if people start using this in new programs.

Where does that "unified" pathname come from? It will be generated
from "base filename + metadata name" in user space, and

 (a) the base filename might have double or triple slashes in it for
whatever reasons.

This is not some "made-up gotcha" thing - I see double slashes *all*
the time when we have things like Makefiles doing

    srctree=../../src/

and then people do "$(srctree)/". If you haven't seen that kind of
pattern where the pathname has two (or sometimes more!) slashes in the
middle, you've led a very sheltered life.

 (b) even if the new user space were to think about that, and remove
those (hah! when have you ever seen user space do that?), as Al
mentioned, the user *filesystem* might have pathnames with double
slashes as part of symlinks.

So now we'd have to make sure that when we traverse symlinks, that
O_ALT gets cleared. Which means that it's not a unified namespace
after all, because you can't make symlinks point to metadata.

Or we'd retroactively change the semantics of a symlink, and that _is_
a backwards compatibility issue. Not with old software, no, but it
changes the meaning of old symlinks!

So no, I don't think a unified namespace ends up working.

And I say that as somebody who actually loves the concept. Ask Al: I
have a few times pushed for "let's allow directory behavior on regular
files", so that you could do things like a tar-filesystem, and access
the contents of a tar-file by just doing

    cat my-file.tar/inside/the/archive.c

or similar.

Al has convinced me it's a horrible idea (and there you have a
non-ambiguous marker: the slash at the end of a pathname that
otherwise looks and acts as a non-directory)

               Linus

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-11 16:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, linux-fsdevel, David Howells, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wjzLmMRf=QG-n+1HnxWCx4KTQn9+OhVvUSJ=ZCQd6Y1WA@mail.gmail.com>

On Tue, Aug 11, 2020 at 08:20:24AM -0700, Linus Torvalds wrote:

> I don't think this works for the reasons Al says, but a slight
> modification might.
> 
> IOW, if you do something more along the lines of
> 
>        fd = open(""foo/bar", O_PATH);
>        metadatafd = openat(fd, "metadataname", O_ALT);
> 
> it might be workable.
> 
> So you couldn't do it with _one_ pathname, because that is always
> fundamentally going to hit pathname lookup rules.
> 
> But if you start a new path lookup with new rules, that's fine.

Except that you suddenly see non-directory dentries get children.
And a lot of dcache-related logics needs to be changed if that
becomes possible.

I agree that xattrs are garbage, but this approach won't be
a straightforward solution.  Can those suckers be passed to
...at() as starting points?  Can they be bound in namespace?
Can something be bound *on* them?  What do they have for inodes
and what maintains their inumbers (and st_dev, while we are at
it)?  Can _they_ have secondaries like that (sensu Swift)?
Is that a flat space, or can they be directories?

Only a part of the problems is implementation-related (and those are
not trivial at all); most the fun comes from semantics of those things.
And answers to the implementation questions are seriously dependent upon
that...

^ permalink raw reply

* Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: Madhavan T. Venkataraman @ 2020-08-11 15:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Mark Rutland, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86
In-Reply-To: <20200811130837.hi6wllv6g67j5wds@duo.ucw.cz>



On 8/11/20 8:08 AM, Pavel Machek wrote:
> Hi!
> 
>>>> Thanks for the lively discussion. I have tried to answer some of the
>>>> comments below.
>>>
>>>>> There are options today, e.g.
>>>>>
>>>>> a) If the restriction is only per-alias, you can have distinct aliases
>>>>>    where one is writable and another is executable, and you can make it
>>>>>    hard to find the relationship between the two.
>>>>>
>>>>> b) If the restriction is only temporal, you can write instructions into
>>>>>    an RW- buffer, transition the buffer to R--, verify the buffer
>>>>>    contents, then transition it to --X.
>>>>>
>>>>> c) You can have two processes A and B where A generates instrucitons into
>>>>>    a buffer that (only) B can execute (where B may be restricted from
>>>>>    making syscalls like write, mprotect, etc).
>>>>
>>>> The general principle of the mitigation is W^X. I would argue that
>>>> the above options are violations of the W^X principle. If they are
>>>> allowed today, they must be fixed. And they will be. So, we cannot
>>>> rely on them.
>>>
>>> Would you mind describing your threat model?
>>>
>>> Because I believe you are using model different from everyone else.
>>>
>>> In particular, I don't believe b) is a problem or should be fixed.
>>
>> It is a problem because a kernel that implements W^X properly
>> will not allow it. It has no idea what has been done in userland.
>> It has no idea that the user has checked and verified the buffer
>> contents after transitioning the page to R--.
> 
> No, it is not a problem. W^X is designed to protect from attackers
> doing buffer overflows, not attackers doing arbitrary syscalls.
> 

Hey Pavel,

You are correct. The W^X implementation today still has some holes.
IIUC, the principle of W^X is - user should not be able to (W) write code
into a page and use some trick to get it to (X) execute. So, what I
was trying to say was that the W^X principle is not implemented
completely today.

Mark Rutland mentioned some other tricks as well which are being used
today.

For instance, Microsoft has submitted this proposal:

 https://microsoft.github.io/ipe/

IPE is an LSM. In this proposal, only mappings that are backed by a
signature verified file can have execute permissions. This means that
all anonymous page based tricks will fail. And, file mapping based
tricks will fail as well when temporary files are used to load code
and mmap(). That is the intent.

Thanks!

Madhavan

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: James Bottomley @ 2020-08-11 15:53 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <16C3BF97-A7D3-488A-9D26-7C9B18AD2084@gmail.com>

On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
> > On Aug 11, 2020, at 1:43 AM, James Bottomley <James.Bottomley@Hanse
> > nPartnership.com> wrote:
> > 
> > On Mon, 2020-08-10 at 19:36 -0400, Chuck Lever wrote:
> > > > On Aug 10, 2020, at 11:35 AM, James Bottomley
> > > > <James.Bottomley@HansenPartnership.com> wrote:
[...]
> > > > The first basic is that a merkle tree allows unit at a time
> > > > verification. First of all we should agree on the unit.  Since
> > > > we always fault a page at a time, I think our merkle tree unit
> > > > should be a page not a block.
> > > 
> > > Remote filesystems will need to agree that the size of that unit
> > > is the same everywhere, or the unit size could be stored in the
> > > per-filemetadata.
> > > 
> > > 
> > > > Next, we should agree where the check gates for the per page
> > > > accesses should be ... definitely somewhere in readpage, I
> > > > suspect and finally we should agree how the merkle tree is
> > > > presented at the gate.  I think there are three ways:
> > > > 
> > > >  1. Ahead of time transfer:  The merkle tree is transferred and
> > > > verified
> > > >     at some time before the accesses begin, so we already have
> > > > a
> > > >     verified copy and can compare against the lower leaf.
> > > >  2. Async transfer:  We provide an async mechanism to transfer
> > > > the
> > > >     necessary components, so when presented with a unit, we
> > > > check the
> > > >     log n components required to get to the root
> > > >  3. The protocol actually provides the capability of 2 (like
> > > > the SCSI
> > > >     DIF/DIX), so to IMA all the pieces get presented instead of
> > > > IMA
> > > >     having to manage the tree
> > > 
> > > A Merkle tree is potentially large enough that it cannot be
> > > stored in an extended attribute. In addition, an extended
> > > attribute is not a byte stream that you can seek into or read
> > > small parts of, it is retrieved in a single shot.
> > 
> > Well you wouldn't store the tree would you, just the head
> > hash.  The rest of the tree can be derived from the data.  You need
> > to distinguish between what you *must* have to verify integrity
> > (the head hash, possibly signed)
> 
> We're dealing with an untrusted storage device, and for a remote
> filesystem, an untrusted network.
> 
> Mimi's earlier point is that any IMA metadata format that involves
> unsigned digests is exposed to an alteration attack at rest or in
> transit, thus will not provide a robust end-to-end integrity
> guarantee.
> 
> Therefore, tree root digests must be cryptographically signed to be
> properly protected in these environments. Verifying that signature
> should be done infrequently relative to reading a file's content.

I'm not disagreeing there has to be a way for the relying party to
trust the root hash.

> > and what is nice to have to speed up the verification
> > process.  The choice for the latter is cache or reconstruct
> > depending on the resources available.  If the tree gets cached on
> > the server, that would be a server implementation detail invisible
> > to the client.
> 
> We assume that storage targets (for block or file) are not trusted.
> Therefore storage clients cannot rely on intermediate results (eg,
> middle nodes in a Merkle tree) unless those results are generated
> within the client's trust envelope.

Yes, they can ... because supplied nodes can be verified.  That's the
whole point of a merkle tree.  As long as I'm sure of the root hash I
can verify all the rest even if supplied by an untrusted source.  If
you consider a simple merkle tree covering 4 blocks:

       R
     /   \
  H11     H12
  / \     / \
H21 H22 H23 H24
 |    |   |   |
B1   B2  B3  B4

Assume I have the verified root hash R.  If you supply B3 you also
supply H24 and H11 as proof.  I verify by hashing B3 to produce H23
then hash H23 and H24 to produce H12 and if H12 and your supplied H11
hash to R the tree is correct and the B3 you supplied must likewise be
correct.

> So: if the storage target is considered inside the client's trust
> envelope, it can cache or store durably any intermediate parts of
> the verification process. If not, the network and file storage is
> considered untrusted, and the client has to rely on nothing but the
> signed digest of the tree root.
> 
> We could build a scheme around, say, fscache, that might save the
> intermediate results durably and locally.

I agree we want caching on the client, but we can always page in from
the remote as long as we page enough to verify up to R, so we're always
sure the remote supplied genuine information.

> > > For this reason, the idea was to save only the signature of the
> > > tree's root on durable storage. The client would retrieve that
> > > signature possibly at open time, and reconstruct the tree at that
> > > time.
> > 
> > Right that's the integrity data you must have.
> > 
> > > Or the tree could be partially constructed on-demand at the time
> > > each unit is to be checked (say, as part of 2. above).
> > 
> > Whether it's reconstructed or cached can be an implementation
> > detail. You clearly have to reconstruct once, but whether you have
> > to do it again depends on the memory available for caching and all
> > the other resource calls in the system.
> > 
> > > The client would have to reconstruct that tree again if memory
> > > pressure caused some or all of the tree to be evicted, so perhaps
> > > an on-demand mechanism is preferable.
> > 
> > Right, but I think that's implementation detail.  Probably what we
> > need is a way to get the log(N) verification hashes from the server
> > and it's up to the client whether it caches them or not.
> 
> Agreed, these are implementation details. But see above about the
> trustworthiness of the intermediate hashes. If they are conveyed
> on an untrusted network, then they can't be trusted either.

Yes, they can, provided enough of them are asked for to verify.  If you
look at the simple example above, suppose I have cached H11 and H12,
but I've lost the entire H2X layer.  I want to verify B3 so I also ask
you for your copy of H24.  Then I generate H23 from B3 and Hash H23 and
H24.  If this doesn't hash to H12 I know either you supplied me the
wrong block or lied about H24.  However, if it all hashes correctly I
know you supplied me with both the correct B3 and the correct H24.

> > > > There are also a load of minor things like how we get the head
> > > > hash, which must be presented and verified ahead of time for
> > > > each of the above 3.
> > > 
> > > Also, changes to a file's content and its tree signature are not
> > > atomic. If a file is mutable, then there is the period between
> > > when the file content has changed and when the signature is
> > > updated. Some discussion of how a client is to behave in those
> > > situations will be necessary.
> > 
> > For IMA, if you write to a checked file, it gets rechecked the next
> > time the gate (open/exec/mmap) is triggered.  This means you must
> > complete the update and have the new integrity data in-place before
> > triggering the check.  I think this could apply equally to a merkel
> > tree based system.  It's a sort of Doctor, Doctor it hurts when I
> > do this situation.
> 
> I imagine it's a common situation where a "yum update" process is
> modifying executables while clients are running them. To prevent
> a read from pulling refreshed content before the new tree root is
> available, it would have to block temporarily until the verification
> process succeeds with the updated tree root.

No ... it's not.  Yum specifically worries about that today because if
you update running binaries, it causes a crash.  Yum constructs the
entire new file then atomically links it into place and deletes the old
inode to prevent these crashes.  It never allows you to get into the
situation where you can execute something that will be modified. 
That's also why you have to restart stuff after a yum update because if
you didn't it would still be attached to the deleted inode.

James


^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Andy Lutomirski @ 2020-08-11 15:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, linux-fsdevel, David Howells, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wjzLmMRf=QG-n+1HnxWCx4KTQn9+OhVvUSJ=ZCQd6Y1WA@mail.gmail.com>



> On Aug 11, 2020, at 8:20 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> [ I missed the beginning of this discussion, so maybe this was already
> suggested ]
> 
>> On Tue, Aug 11, 2020 at 6:54 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>> 
>>> 
>>> E.g.
>>>  openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);
>> 
>> Proof of concept patch and test program below.
> 
> I don't think this works for the reasons Al says, but a slight
> modification might.
> 
> IOW, if you do something more along the lines of
> 
>       fd = open(""foo/bar", O_PATH);
>       metadatafd = openat(fd, "metadataname", O_ALT);
> 
> it might be workable.
> 
> So you couldn't do it with _one_ pathname, because that is always
> fundamentally going to hit pathname lookup rules.
> 
> But if you start a new path lookup with new rules, that's fine.
> 
> This is what I think xattrs should always have done, because they are
> broken garbage.
> 
> In fact, if we do it right, I think we could have "getxattr()" be 100%
> equivalent to (modulo all the error handling that this doesn't do, of
> course):
> 
>  ssize_t getxattr(const char *path, const char *name,
>                        void *value, size_t size)
>  {
>     int fd, attrfd;
> 
>     fd = open(path, O_PATH);
>     attrfd = openat(fd, name, O_ALT);
>     close(fd);
>     read(attrfd, value, size);
>     close(attrfd);
>  }
> 
> and you'd still use getxattr() and friends as a shorthand (and for
> POSIX compatibility), but internally in the kernel we'd have a
> interface around that "xattrs are just file handles" model.
> 
> 

This is a lot like a less nutty version of NTFS streams, whereas the /// idea is kind of like an extra-nutty version of NTFS streams.

I am personally not a fan of the in-band signaling implications of overloading /.  For example, there is plenty of code out there that thinks that (a + “/“ + b) concatenates paths. With /// overloaded, this stops being true.

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: James Bottomley @ 2020-08-11 15:32 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <16C3BF97-A7D3-488A-9D26-7C9B18AD2084@gmail.com>

On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
> > On Aug 11, 2020, at 1:43 AM, James Bottomley
> > <James.Bottomley@HansenPartnership.com> wrote:
> > On Mon, 2020-08-10 at 19:36 -0400, Chuck Lever wrote:
[...]
> > > Thanks for the help! I just want to emphasize that documentation
> > > (eg, a specification) will be critical for remote filesystems.
> > > 
> > > If any of this is to be supported by a remote filesystem, then we
> > > need an unencumbered description of the new metadata format
> > > rather than code. GPL-encumbered formats cannot be contributed to
> > > the NFS standard, and are probably difficult for other
> > > filesystems that are not Linux-native, like SMB, as well.
> > 
> > I don't understand what you mean by GPL encumbered formats.  The
> > GPL is a code licence not a data or document licence.
> 
> IETF contributions occur under a BSD-style license incompatible
> with the GPL.
> 
> https://trustee.ietf.org/trust-legal-provisions.html
> 
> Non-Linux implementers (of OEM storage devices) rely on such
> standards processes to indemnify them against licensing claims.

Well, that simply means we won't be contributing the Linux
implementation, right? However, IETF doesn't require BSD for all
implementations, so that's OK.

> Today, there is no specification for existing IMA metadata formats,
> there is only code. My lawyer tells me that because the code that
> implements these formats is under GPL, the formats themselves cannot
> be contributed to, say, the IETF without express permission from the
> authors of that code. There are a lot of authors of the Linux IMA
> code, so this is proving to be an impediment to contribution. That
> blocks the ability to provide a fully-specified NFS protocol
> extension to support IMA metadata formats.

Well, let me put the counterpoint: I can write a book about how linux
device drivers work (which includes describing the data formats), for
instance, without having to get permission from all the authors ... or
is your lawyer taking the view we should be suing Jonathan Corbet,
Alessandro Rubini, and Greg Kroah-Hartman for licence infringement?  In
fact do they think we now have a huge class action possibility against
O'Reilly  and a host of other publishers ...

> > The way the spec process works in Linux is that we implement or
> > evolve a data format under a GPL implementaiton, but that
> > implementation doesn't implicate the later standardisation of the
> > data format and people are free to reimplement under any licence
> > they choose.
> 
> That technology transfer can happen only if all the authors of that
> prototype agree to contribute to a standard. That's much easier if
> that agreement comes before an implementation is done. The current
> IMA code base is more than a decade old, and there are more than a
> hundred authors who have contributed to that base.
> 
> Thus IMO we want an unencumbered description of any IMA metadata
> format that is to be contributed to an open standards body (as it
> would have to be to extend, say, the NFS protocol).

Fine, good grief, people who take a sensible view of this can write the
data format down and publish it under any licence you like then you can
pick it up again safely.  Would CC0 be OK? ... neither GPL nor BSD are
document licences and we shouldn't perpetuate bad practice by licensing
documentation under them.

James


^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-11 15:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, David Howells, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wjzLmMRf=QG-n+1HnxWCx4KTQn9+OhVvUSJ=ZCQd6Y1WA@mail.gmail.com>

On Tue, Aug 11, 2020 at 5:20 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> [ I missed the beginning of this discussion, so maybe this was already
> suggested ]
>
> On Tue, Aug 11, 2020 at 6:54 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > >
> > > E.g.
> > >   openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);
> >
> > Proof of concept patch and test program below.
>
> I don't think this works for the reasons Al says, but a slight
> modification might.
>
> IOW, if you do something more along the lines of
>
>        fd = open(""foo/bar", O_PATH);
>        metadatafd = openat(fd, "metadataname", O_ALT);
>
> it might be workable.

That would have been my backup suggestion, in case the unified
namespace doesn't work out.

I wouldn't think the normal lookup rules really get in the way if we
explicitly enable alternative path lookup with a flag.  The rules just
need to be documented.

What's the disadvantage of doing it with a single lookup WITH an enabling flag?

It's definitely not going to break anything, so no backward
compatibility issues whatsoever.

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Linus Torvalds @ 2020-08-11 15:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, David Howells, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <20200811135419.GA1263716@miu.piliscsaba.redhat.com>

[ I missed the beginning of this discussion, so maybe this was already
suggested ]

On Tue, Aug 11, 2020 at 6:54 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> >
> > E.g.
> >   openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);
>
> Proof of concept patch and test program below.

I don't think this works for the reasons Al says, but a slight
modification might.

IOW, if you do something more along the lines of

       fd = open(""foo/bar", O_PATH);
       metadatafd = openat(fd, "metadataname", O_ALT);

it might be workable.

So you couldn't do it with _one_ pathname, because that is always
fundamentally going to hit pathname lookup rules.

But if you start a new path lookup with new rules, that's fine.

This is what I think xattrs should always have done, because they are
broken garbage.

In fact, if we do it right, I think we could have "getxattr()" be 100%
equivalent to (modulo all the error handling that this doesn't do, of
course):

  ssize_t getxattr(const char *path, const char *name,
                        void *value, size_t size)
  {
     int fd, attrfd;

     fd = open(path, O_PATH);
     attrfd = openat(fd, name, O_ALT);
     close(fd);
     read(attrfd, value, size);
     close(attrfd);
  }

and you'd still use getxattr() and friends as a shorthand (and for
POSIX compatibility), but internally in the kernel we'd have a
interface around that "xattrs are just file handles" model.

               Linus

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: Chuck Lever @ 2020-08-11 14:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <1597124623.30793.14.camel@HansenPartnership.com>

> On Aug 11, 2020, at 1:43 AM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Mon, 2020-08-10 at 19:36 -0400, Chuck Lever wrote:
>>> On Aug 10, 2020, at 11:35 AM, James Bottomley
>>> <James.Bottomley@HansenPartnership.com> wrote:
>>> On Sun, 2020-08-09 at 13:16 -0400, Mimi Zohar wrote:
>>>> On Sat, 2020-08-08 at 13:47 -0400, Chuck Lever wrote:
> [...]
>>>>> The first priority (for me, anyway) therefore is getting the
>>>>> ability to move IMA metadata between NFS clients and servers
>>>>> shoveled into the NFS protocol, but that's been blocked for
>>>>> various legal reasons.
>>>> 
>>>> Up to now, verifying remote filesystem file integrity has been
>>>> out of scope for IMA.   With fs-verity file signatures I can at
>>>> least grasp how remote file integrity could possibly work.  I
>>>> don't understand how remote file integrity with existing IMA
>>>> formats could be supported. You might want to consider writing a
>>>> whitepaper, which could later be used as the basis for a patch
>>>> set cover letter.
>>> 
>>> I think, before this, we can help with the basics (and perhaps we
>>> should sort them out before we start documenting what we'll do).
>> 
>> Thanks for the help! I just want to emphasize that documentation
>> (eg, a specification) will be critical for remote filesystems.
>> 
>> If any of this is to be supported by a remote filesystem, then we
>> need an unencumbered description of the new metadata format rather
>> than code. GPL-encumbered formats cannot be contributed to the NFS
>> standard, and are probably difficult for other filesystems that are
>> not Linux-native, like SMB, as well.
> 
> I don't understand what you mean by GPL encumbered formats.  The GPL is
> a code licence not a data or document licence.

IETF contributions occur under a BSD-style license incompatible
with the GPL.

https://trustee.ietf.org/trust-legal-provisions.html

Non-Linux implementers (of OEM storage devices) rely on such standards
processes to indemnify them against licensing claims.

Today, there is no specification for existing IMA metadata formats,
there is only code. My lawyer tells me that because the code that
implements these formats is under GPL, the formats themselves cannot
be contributed to, say, the IETF without express permission from the
authors of that code. There are a lot of authors of the Linux IMA
code, so this is proving to be an impediment to contribution. That
blocks the ability to provide a fully-specified NFS protocol
extension to support IMA metadata formats.

> The way the spec
> process works in Linux is that we implement or evolve a data format
> under a GPL implementaiton, but that implementation doesn't implicate
> the later standardisation of the data format and people are free to
> reimplement under any licence they choose.

That technology transfer can happen only if all the authors of that
prototype agree to contribute to a standard. That's much easier if
that agreement comes before an implementation is done. The current
IMA code base is more than a decade old, and there are more than a
hundred authors who have contributed to that base.

Thus IMO we want an unencumbered description of any IMA metadata
format that is to be contributed to an open standards body (as it
would have to be to extend, say, the NFS protocol).

I'm happy to write that specification, as long as any contributions
here are unencumbered and acknowledged. In fact, I have been working
on a (so far, flawed) NFS extension:

https://datatracker.ietf.org/doc/draft-ietf-nfsv4-integrity-measurement/

Now, for example, the components of a putative Merkle-based IMA
metadata format are all already open:

- The unit size is just an integer

- A certificate fingerprint is a de facto standard, and the
fingerprint digest algorithms are all standardized

- Merkle trees are public domain, I believe, and we're not adding
any special sauce here

- Digital signing algorithms are all standardized

Wondering if we want to hash and save the file's mtime and size too.

>>> The first basic is that a merkle tree allows unit at a time
>>> verification. First of all we should agree on the unit.  Since we
>>> always fault a page at a time, I think our merkle tree unit should
>>> be a page not a block.
>> 
>> Remote filesystems will need to agree that the size of that unit is
>> the same everywhere, or the unit size could be stored in the per-file
>> metadata.
>> 
>> 
>>> Next, we should agree where the check gates for the per page
>>> accesses should be ... definitely somewhere in readpage, I suspect
>>> and finally we should agree how the merkle tree is presented at the
>>> gate.  I think there are three ways:
>>> 
>>>  1. Ahead of time transfer:  The merkle tree is transferred and
>>> verified
>>>     at some time before the accesses begin, so we already have a
>>>     verified copy and can compare against the lower leaf.
>>>  2. Async transfer:  We provide an async mechanism to transfer the
>>>     necessary components, so when presented with a unit, we check
>>> the
>>>     log n components required to get to the root
>>>  3. The protocol actually provides the capability of 2 (like the
>>> SCSI
>>>     DIF/DIX), so to IMA all the pieces get presented instead of
>>> IMA
>>>     having to manage the tree
>> 
>> A Merkle tree is potentially large enough that it cannot be stored in
>> an extended attribute. In addition, an extended attribute is not a
>> byte stream that you can seek into or read small parts of, it is
>> retrieved in a single shot.
> 
> Well you wouldn't store the tree would you, just the head hash.  The
> rest of the tree can be derived from the data.  You need to distinguish
> between what you *must* have to verify integrity (the head hash,
> possibly signed)

We're dealing with an untrusted storage device, and for a remote
filesystem, an untrusted network.

Mimi's earlier point is that any IMA metadata format that involves
unsigned digests is exposed to an alteration attack at rest or in
transit, thus will not provide a robust end-to-end integrity
guarantee.

Therefore, tree root digests must be cryptographically signed to be
properly protected in these environments. Verifying that signature
should be done infrequently relative to reading a file's content.

> and what is nice to have to speed up the verification
> process.  The choice for the latter is cache or reconstruct depending
> on the resources available.  If the tree gets cached on the server,
> that would be a server implementation detail invisible to the client.

We assume that storage targets (for block or file) are not trusted.
Therefore storage clients cannot rely on intermediate results (eg,
middle nodes in a Merkle tree) unless those results are generated
within the client's trust envelope.

So: if the storage target is considered inside the client's trust
envelope, it can cache or store durably any intermediate parts of
the verification process. If not, the network and file storage is
considered untrusted, and the client has to rely on nothing but the
signed digest of the tree root.

We could build a scheme around, say, fscache, that might save the
intermediate results durably and locally.

>> For this reason, the idea was to save only the signature of the
>> tree's root on durable storage. The client would retrieve that
>> signature possibly at open time, and reconstruct the tree at that
>> time.
> 
> Right that's the integrity data you must have.
> 
>> Or the tree could be partially constructed on-demand at the time each
>> unit is to be checked (say, as part of 2. above).
> 
> Whether it's reconstructed or cached can be an implementation detail.
> You clearly have to reconstruct once, but whether you have to do it
> again depends on the memory available for caching and all the other
> resource calls in the system.
> 
>> The client would have to reconstruct that tree again if memory
>> pressure caused some or all of the tree to be evicted, so perhaps an
>> on-demand mechanism is preferable.
> 
> Right, but I think that's implementation detail.  Probably what we need
> is a way to get the log(N) verification hashes from the server and it's
> up to the client whether it caches them or not.

Agreed, these are implementation details. But see above about the
trustworthiness of the intermediate hashes. If they are conveyed
on an untrusted network, then they can't be trusted either.

>>> There are also a load of minor things like how we get the head
>>> hash, which must be presented and verified ahead of time for each
>>> of the above 3.
>> 
>> Also, changes to a file's content and its tree signature are not
>> atomic. If a file is mutable, then there is the period between when
>> the file content has changed and when the signature is updated.
>> Some discussion of how a client is to behave in those situations will
>> be necessary.
> 
> For IMA, if you write to a checked file, it gets rechecked the next
> time the gate (open/exec/mmap) is triggered.  This means you must
> complete the update and have the new integrity data in-place before
> triggering the check.  I think this could apply equally to a merkel
> tree based system.  It's a sort of Doctor, Doctor it hurts when I do
> this situation.

I imagine it's a common situation where a "yum update" process is
modifying executables while clients are running them. To prevent
a read from pulling refreshed content before the new tree root is
available, it would have to block temporarily until the verification
process succeeds with the updated tree root.

--
Chuck Lever
chucklever@gmail.com

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-11 14:47 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, David Howells, Linus Torvalds, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM, linux-kernel
In-Reply-To: <20200811144247.GK1236603@ZenIV.linux.org.uk>

On Tue, Aug 11, 2020 at 4:42 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Aug 11, 2020 at 04:36:32PM +0200, Miklos Szeredi wrote:
>
> > > >  - strip off trailing part after first instance of ///
> > > >  - perform path lookup as normal
> > > >  - resolve meta path after /// on result of normal lookup
> > >
> > > ... and interpolation of relative symlink body into the pathname does change
> > > behaviour now, *including* the cases when said symlink body does not contain
> > > that triple-X^Hslash garbage.  Wonderful...
> >
> > Can you please explain?
>
> Currently substituting the body of a relative symlink in place of its name
> results in equivalent pathname.

Except proc symlinks, that is.

>  With your patch that is not just no longer
> true, it's no longer true even when the symlink body does not contain that
> /// kludge - it can come in part from the symlink body and in part from the
> rest of pathname.  I.e. you can't even tell if substitution is an equivalent
> replacement by looking at the symlink body alone.

Yes, that's true not just for symlink bodies but any concatenation of
two path segments.

That's why it's enabled with RESOLVE_ALT.  I've said that I plan to
experiment with turning this on globally, but that doesn't mean it's
necessarily a good idea.  The posted patch contains nothing of that
sort.

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-11 14:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, David Howells, Linus Torvalds, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM, linux-kernel
In-Reply-To: <CAJfpegvAVOYg03q7n24d6faX33rd16WWi5+LLDdfwj_gRYaJLQ@mail.gmail.com>

On Tue, Aug 11, 2020 at 04:36:32PM +0200, Miklos Szeredi wrote:

> > >  - strip off trailing part after first instance of ///
> > >  - perform path lookup as normal
> > >  - resolve meta path after /// on result of normal lookup
> >
> > ... and interpolation of relative symlink body into the pathname does change
> > behaviour now, *including* the cases when said symlink body does not contain
> > that triple-X^Hslash garbage.  Wonderful...
> 
> Can you please explain?

Currently substituting the body of a relative symlink in place of its name
results in equivalent pathname.  With your patch that is not just no longer
true, it's no longer true even when the symlink body does not contain that
/// kludge - it can come in part from the symlink body and in part from the
rest of pathname.  I.e. you can't even tell if substitution is an equivalent
replacement by looking at the symlink body alone.

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-11 14:36 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, David Howells, Linus Torvalds, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM, linux-kernel
In-Reply-To: <20200811143107.GI1236603@ZenIV.linux.org.uk>

On Tue, Aug 11, 2020 at 4:31 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Aug 11, 2020 at 04:22:19PM +0200, Miklos Szeredi wrote:
> > On Tue, Aug 11, 2020 at 4:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Tue, Aug 11, 2020 at 03:54:19PM +0200, Miklos Szeredi wrote:
> > > > On Wed, Aug 05, 2020 at 10:24:23AM +0200, Miklos Szeredi wrote:
> > > > > On Tue, Aug 4, 2020 at 4:36 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > > >
> > > > > > I think we already lost that with the xattr API, that should have been
> > > > > > done in a way that fits this philosophy.  But given that we  have "/"
> > > > > > as the only special purpose char in filenames, and even repetitions
> > > > > > are allowed, it's hard to think of a good way to do that.  Pity.
> > > > >
> > > > > One way this could be solved is to allow opting into an alternative
> > > > > path resolution mode.
> > > > >
> > > > > E.g.
> > > > >   openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);
> > > >
> > > > Proof of concept patch and test program below.
> > > >
> > > > Opted for triple slash in the hope that just maybe we could add a global
> > > > /proc/sys/fs/resolve_alt knob to optionally turn on alternative (non-POSIX) path
> > > > resolution without breaking too many things.  Will try that later...
> > > >
> > > > Comments?
> > >
> > > Hell, NO.  This is unspeakably tasteless.  And full of lovely corner cases wrt
> > > symlink bodies, etc.
> >
> > It's disabled inside symlink body resolution.
> >
> > Rules are simple:
> >
> >  - strip off trailing part after first instance of ///
> >  - perform path lookup as normal
> >  - resolve meta path after /// on result of normal lookup
>
> ... and interpolation of relative symlink body into the pathname does change
> behaviour now, *including* the cases when said symlink body does not contain
> that triple-X^Hslash garbage.  Wonderful...

Can you please explain?

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-11 14:36 UTC (permalink / raw)
  To: Tang Jiye
  Cc: Miklos Szeredi, linux-fsdevel, David Howells, Linus Torvalds,
	Karel Zak, Jeff Layton, Miklos Szeredi, Nicolas Dichtel,
	Christian Brauner, Lennart Poettering, Linux API, Ian Kent, LSM,
	linux-kernel
In-Reply-To: <CAAgocE07=vVKpQhG+rjEGO=NEBKZ02gjg4TRPxECAc+RKrzn=Q@mail.gmail.com>

On Tue, Aug 11, 2020 at 10:33:59AM -0400, Tang Jiye wrote:
> anyone knows how to post a question?

Generally the way you just have, except that you generally
put it *after* the relevant parts of the quoted text (and
removes the irrelevant ones).

^ permalink raw reply

* Re: [PATCH v7 0/7] Add support for O_MAYEXEC
From: Mimi Zohar @ 2020-08-11 14:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mickaël Salaün, Jann Horn, Kees Cook, Deven Bowers,
	Al Viro, Andrew Morton, kernel list, Aleksa Sarai,
	Alexei Starovoitov, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jonathan Corbet, Lakshmi Ramasubramanian, Matthew Garrett,
	Michael Kerrisk, Philippe Trébuchet, Scott Shell,
	Sean Christopherson, Shuah Khan, Steve Dower, Steve Grubb,
	Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	Kernel Hardening, Linux API, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200811140203.GQ17456@casper.infradead.org>

On Tue, 2020-08-11 at 15:02 +0100, Matthew Wilcox wrote:
> On Tue, Aug 11, 2020 at 09:56:50AM -0400, Mimi Zohar wrote:
> > On Tue, 2020-08-11 at 10:48 +0200, Mickaël Salaün wrote:
> > > On 11/08/2020 01:03, Jann Horn wrote:
> > > > On Tue, Aug 11, 2020 at 12:43 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > On 10/08/2020 22:21, Al Viro wrote:
> > > > > > On Mon, Aug 10, 2020 at 10:11:53PM +0200, Mickaël Salaün wrote:
> > > > > > > It seems that there is no more complains nor questions. Do you want me
> > > > > > > to send another series to fix the order of the S-o-b in patch 7?
> > > > > > 
> > > > > > There is a major question regarding the API design and the choice of
> > > > > > hooking that stuff on open().  And I have not heard anything resembling
> > > > > > a coherent answer.
> > > > > 
> > > > > Hooking on open is a simple design that enables processes to check files
> > > > > they intend to open, before they open them. From an API point of view,
> > > > > this series extends openat2(2) with one simple flag: O_MAYEXEC. The
> > > > > enforcement is then subject to the system policy (e.g. mount points,
> > > > > file access rights, IMA, etc.).
> > > > > 
> > > > > Checking on open enables to not open a file if it does not meet some
> > > > > requirements, the same way as if the path doesn't exist or (for whatever
> > > > > reasons, including execution permission) if access is denied.
> > > > 
> > > > You can do exactly the same thing if you do the check in a separate
> > > > syscall though.
> > > > 
> > > > And it provides a greater degree of flexibility; for example, you can
> > > > use it in combination with fopen() without having to modify the
> > > > internals of fopen() or having to use fdopen().
> > > > 
> > > > > It is a
> > > > > good practice to check as soon as possible such properties, and it may
> > > > > enables to avoid (user space) time-of-check to time-of-use (TOCTOU)
> > > > > attacks (i.e. misuse of already open resources).
> > > > 
> > > > The assumption that security checks should happen as early as possible
> > > > can actually cause security problems. For example, because seccomp was
> > > > designed to do its checks as early as possible, including before
> > > > ptrace, we had an issue for a long time where the ptrace API could be
> > > > abused to bypass seccomp filters.
> > > > 
> > > > Please don't decide that a check must be ordered first _just_ because
> > > > it is a security check. While that can be good for limiting attack
> > > > surface, it can also create issues when the idea is applied too
> > > > broadly.
> > > 
> > > I'd be interested with such security issue examples.
> > > 
> > > I hope that delaying checks will not be an issue for mechanisms such as
> > > IMA or IPE:
> > > https://lore.kernel.org/lkml/1544699060.6703.11.camel@linux.ibm.com/
> > > 
> > > Any though Mimi, Deven, Chrome OS folks?
> > 
> > One of the major gaps, defining a system wide policy requiring all code
> > being executed to be signed, is interpreters.  The kernel has no
> > context for the interpreter's opening the file.  From an IMA
> > perspective, this information needs to be conveyed to the kernel prior
> > to ima_file_check(), which would allow IMA policy rules to be defined
> > in terms of O_MAYEXEC.
> 
> This is kind of evading the point -- Mickaël is proposing a new flag
> to open() to tell IMA to measure the file being opened before the fd
> is returned to userspace, and Al is suggesting a new syscall to allow
> a previously-obtained fd to be measured.
> 
> I think what you're saying is that you don't see any reason to prefer
> one over the other.

Being able to define IMA appraise and measure file open
(func=FILE_CHECK) policy rules  to prevent the interpreter from
executing unsigned files would be really nice.  Otherwise, the file
would be measured and appraised multiple times, once on file open and
again at the point of this new syscall.

Mimi


^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-11 14:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, David Howells, Linus Torvalds, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM, linux-kernel
In-Reply-To: <CAJfpegsNj55pTXe97qE_i-=zFwca1=2W_NqFdn=rHqc_AJjr8Q@mail.gmail.com>

On Tue, Aug 11, 2020 at 04:22:19PM +0200, Miklos Szeredi wrote:
> On Tue, Aug 11, 2020 at 4:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Aug 11, 2020 at 03:54:19PM +0200, Miklos Szeredi wrote:
> > > On Wed, Aug 05, 2020 at 10:24:23AM +0200, Miklos Szeredi wrote:
> > > > On Tue, Aug 4, 2020 at 4:36 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > >
> > > > > I think we already lost that with the xattr API, that should have been
> > > > > done in a way that fits this philosophy.  But given that we  have "/"
> > > > > as the only special purpose char in filenames, and even repetitions
> > > > > are allowed, it's hard to think of a good way to do that.  Pity.
> > > >
> > > > One way this could be solved is to allow opting into an alternative
> > > > path resolution mode.
> > > >
> > > > E.g.
> > > >   openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);
> > >
> > > Proof of concept patch and test program below.
> > >
> > > Opted for triple slash in the hope that just maybe we could add a global
> > > /proc/sys/fs/resolve_alt knob to optionally turn on alternative (non-POSIX) path
> > > resolution without breaking too many things.  Will try that later...
> > >
> > > Comments?
> >
> > Hell, NO.  This is unspeakably tasteless.  And full of lovely corner cases wrt
> > symlink bodies, etc.
> 
> It's disabled inside symlink body resolution.
> 
> Rules are simple:
> 
>  - strip off trailing part after first instance of ///
>  - perform path lookup as normal
>  - resolve meta path after /// on result of normal lookup

... and interpolation of relative symlink body into the pathname does change
behaviour now, *including* the cases when said symlink body does not contain
that triple-X^Hslash garbage.  Wonderful...

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-11 14:22 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, David Howells, Linus Torvalds, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM, linux-kernel
In-Reply-To: <20200811140833.GH1236603@ZenIV.linux.org.uk>

On Tue, Aug 11, 2020 at 4:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Aug 11, 2020 at 03:54:19PM +0200, Miklos Szeredi wrote:
> > On Wed, Aug 05, 2020 at 10:24:23AM +0200, Miklos Szeredi wrote:
> > > On Tue, Aug 4, 2020 at 4:36 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > > I think we already lost that with the xattr API, that should have been
> > > > done in a way that fits this philosophy.  But given that we  have "/"
> > > > as the only special purpose char in filenames, and even repetitions
> > > > are allowed, it's hard to think of a good way to do that.  Pity.
> > >
> > > One way this could be solved is to allow opting into an alternative
> > > path resolution mode.
> > >
> > > E.g.
> > >   openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);
> >
> > Proof of concept patch and test program below.
> >
> > Opted for triple slash in the hope that just maybe we could add a global
> > /proc/sys/fs/resolve_alt knob to optionally turn on alternative (non-POSIX) path
> > resolution without breaking too many things.  Will try that later...
> >
> > Comments?
>
> Hell, NO.  This is unspeakably tasteless.  And full of lovely corner cases wrt
> symlink bodies, etc.

It's disabled inside symlink body resolution.

Rules are simple:

 - strip off trailing part after first instance of ///
 - perform path lookup as normal
 - resolve meta path after /// on result of normal lookup

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API  (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-11 14:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, David Howells, Linus Torvalds, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM, linux-kernel
In-Reply-To: <20200811135419.GA1263716@miu.piliscsaba.redhat.com>

On Tue, Aug 11, 2020 at 03:54:19PM +0200, Miklos Szeredi wrote:
> On Wed, Aug 05, 2020 at 10:24:23AM +0200, Miklos Szeredi wrote:
> > On Tue, Aug 4, 2020 at 4:36 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > 
> > > I think we already lost that with the xattr API, that should have been
> > > done in a way that fits this philosophy.  But given that we  have "/"
> > > as the only special purpose char in filenames, and even repetitions
> > > are allowed, it's hard to think of a good way to do that.  Pity.
> > 
> > One way this could be solved is to allow opting into an alternative
> > path resolution mode.
> > 
> > E.g.
> >   openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);
> 
> Proof of concept patch and test program below.
> 
> Opted for triple slash in the hope that just maybe we could add a global
> /proc/sys/fs/resolve_alt knob to optionally turn on alternative (non-POSIX) path
> resolution without breaking too many things.  Will try that later...
> 
> Comments?

Hell, NO.  This is unspeakably tasteless.  And full of lovely corner cases wrt
symlink bodies, etc.

Consider that one NAKed.  I'm seriously unhappy with the entire fsinfo thing
in general, but this one is really over the top.

^ permalink raw reply

* Re: [PATCH v7 0/7] Add support for O_MAYEXEC
From: Matthew Wilcox @ 2020-08-11 14:02 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Mickaël Salaün, Jann Horn, Kees Cook, Deven Bowers,
	Al Viro, Andrew Morton, kernel list, Aleksa Sarai,
	Alexei Starovoitov, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jonathan Corbet, Lakshmi Ramasubramanian, Matthew Garrett,
	Michael Kerrisk, Philippe Trébuchet, Scott Shell,
	Sean Christopherson, Shuah Khan, Steve Dower, Steve Grubb,
	Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	Kernel Hardening, Linux API, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <5db0ef9cb5e7e1569a5a1f7a0594937023f7290b.camel@linux.ibm.com>

On Tue, Aug 11, 2020 at 09:56:50AM -0400, Mimi Zohar wrote:
> On Tue, 2020-08-11 at 10:48 +0200, Mickaël Salaün wrote:
> > On 11/08/2020 01:03, Jann Horn wrote:
> > > On Tue, Aug 11, 2020 at 12:43 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > On 10/08/2020 22:21, Al Viro wrote:
> > > > > On Mon, Aug 10, 2020 at 10:11:53PM +0200, Mickaël Salaün wrote:
> > > > > > It seems that there is no more complains nor questions. Do you want me
> > > > > > to send another series to fix the order of the S-o-b in patch 7?
> > > > > 
> > > > > There is a major question regarding the API design and the choice of
> > > > > hooking that stuff on open().  And I have not heard anything resembling
> > > > > a coherent answer.
> > > > 
> > > > Hooking on open is a simple design that enables processes to check files
> > > > they intend to open, before they open them. From an API point of view,
> > > > this series extends openat2(2) with one simple flag: O_MAYEXEC. The
> > > > enforcement is then subject to the system policy (e.g. mount points,
> > > > file access rights, IMA, etc.).
> > > > 
> > > > Checking on open enables to not open a file if it does not meet some
> > > > requirements, the same way as if the path doesn't exist or (for whatever
> > > > reasons, including execution permission) if access is denied.
> > > 
> > > You can do exactly the same thing if you do the check in a separate
> > > syscall though.
> > > 
> > > And it provides a greater degree of flexibility; for example, you can
> > > use it in combination with fopen() without having to modify the
> > > internals of fopen() or having to use fdopen().
> > > 
> > > > It is a
> > > > good practice to check as soon as possible such properties, and it may
> > > > enables to avoid (user space) time-of-check to time-of-use (TOCTOU)
> > > > attacks (i.e. misuse of already open resources).
> > > 
> > > The assumption that security checks should happen as early as possible
> > > can actually cause security problems. For example, because seccomp was
> > > designed to do its checks as early as possible, including before
> > > ptrace, we had an issue for a long time where the ptrace API could be
> > > abused to bypass seccomp filters.
> > > 
> > > Please don't decide that a check must be ordered first _just_ because
> > > it is a security check. While that can be good for limiting attack
> > > surface, it can also create issues when the idea is applied too
> > > broadly.
> > 
> > I'd be interested with such security issue examples.
> > 
> > I hope that delaying checks will not be an issue for mechanisms such as
> > IMA or IPE:
> > https://lore.kernel.org/lkml/1544699060.6703.11.camel@linux.ibm.com/
> > 
> > Any though Mimi, Deven, Chrome OS folks?
> 
> One of the major gaps, defining a system wide policy requiring all code
> being executed to be signed, is interpreters.  The kernel has no
> context for the interpreter's opening the file.  From an IMA
> perspective, this information needs to be conveyed to the kernel prior
> to ima_file_check(), which would allow IMA policy rules to be defined
> in terms of O_MAYEXEC.

This is kind of evading the point -- Mickaël is proposing a new flag
to open() to tell IMA to measure the file being opened before the fd
is returned to userspace, and Al is suggesting a new syscall to allow
a previously-obtained fd to be measured.

I think what you're saying is that you don't see any reason to prefer
one over the other.

^ permalink raw reply

* Re: [PATCH v7 0/7] Add support for O_MAYEXEC
From: Mimi Zohar @ 2020-08-11 13:56 UTC (permalink / raw)
  To: Mickaël Salaün, Jann Horn, Kees Cook, Deven Bowers
  Cc: Al Viro, Andrew Morton, kernel list, Aleksa Sarai,
	Alexei Starovoitov, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jonathan Corbet, Lakshmi Ramasubramanian, Matthew Garrett,
	Matthew Wilcox, Michael Kerrisk, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	Kernel Hardening, Linux API, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <0cc94c91-afd3-27cd-b831-8ea16ca8ca93@digikod.net>

On Tue, 2020-08-11 at 10:48 +0200, Mickaël Salaün wrote:
> On 11/08/2020 01:03, Jann Horn wrote:
> > On Tue, Aug 11, 2020 at 12:43 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > On 10/08/2020 22:21, Al Viro wrote:
> > > > On Mon, Aug 10, 2020 at 10:11:53PM +0200, Mickaël Salaün wrote:
> > > > > It seems that there is no more complains nor questions. Do you want me
> > > > > to send another series to fix the order of the S-o-b in patch 7?
> > > > 
> > > > There is a major question regarding the API design and the choice of
> > > > hooking that stuff on open().  And I have not heard anything resembling
> > > > a coherent answer.
> > > 
> > > Hooking on open is a simple design that enables processes to check files
> > > they intend to open, before they open them. From an API point of view,
> > > this series extends openat2(2) with one simple flag: O_MAYEXEC. The
> > > enforcement is then subject to the system policy (e.g. mount points,
> > > file access rights, IMA, etc.).
> > > 
> > > Checking on open enables to not open a file if it does not meet some
> > > requirements, the same way as if the path doesn't exist or (for whatever
> > > reasons, including execution permission) if access is denied.
> > 
> > You can do exactly the same thing if you do the check in a separate
> > syscall though.
> > 
> > And it provides a greater degree of flexibility; for example, you can
> > use it in combination with fopen() without having to modify the
> > internals of fopen() or having to use fdopen().
> > 
> > > It is a
> > > good practice to check as soon as possible such properties, and it may
> > > enables to avoid (user space) time-of-check to time-of-use (TOCTOU)
> > > attacks (i.e. misuse of already open resources).
> > 
> > The assumption that security checks should happen as early as possible
> > can actually cause security problems. For example, because seccomp was
> > designed to do its checks as early as possible, including before
> > ptrace, we had an issue for a long time where the ptrace API could be
> > abused to bypass seccomp filters.
> > 
> > Please don't decide that a check must be ordered first _just_ because
> > it is a security check. While that can be good for limiting attack
> > surface, it can also create issues when the idea is applied too
> > broadly.
> 
> I'd be interested with such security issue examples.
> 
> I hope that delaying checks will not be an issue for mechanisms such as
> IMA or IPE:
> https://lore.kernel.org/lkml/1544699060.6703.11.camel@linux.ibm.com/
> 
> Any though Mimi, Deven, Chrome OS folks?

One of the major gaps, defining a system wide policy requiring all code
being executed to be signed, is interpreters.  The kernel has no
context for the interpreter's opening the file.  From an IMA
perspective, this information needs to be conveyed to the kernel prior
to ima_file_check(), which would allow IMA policy rules to be defined
in terms of O_MAYEXEC.

> 
> > I don't see how TOCTOU issues are relevant in any way here. If someone
> > can turn a script that is considered a trusted file into an untrusted
> > file and then maliciously change its contents, you're going to have
> > issues either way because the modifications could still happen after
> > openat(); if this was possible, the whole thing would kind of fall
> > apart. And if that isn't possible, I don't see any TOCTOU.
> 
> Sure, and if the scripts are not protected in some way there is no point
> to check anything.

The interpreter itself would be signed.

Mimi

> 
> > > It is important to keep
> > > in mind that the use cases we are addressing consider that the (user
> > > space) script interpreters (or linkers) are trusted and unaltered (i.e.
> > > integrity/authenticity checked). These are similar sought defensive
> > > properties as for SUID/SGID binaries: attackers can still launch them
> > > with malicious inputs (e.g. file paths, file descriptors, environment
> > > variables, etc.), but the binaries can then have a way to check if they
> > > can extend their trust to some file paths.
> > > 
> > > Checking file descriptors may help in some use cases, but not the ones
> > > motivating this series.
> > 
> > It actually provides a superset of the functionality that your
> > existing patches provide.
> 
> It also brings new issues with multiple file descriptor origins (e.g.
> memfd_create).
> 
> > > Checking (already) opened resources could be a
> > > *complementary* way to check execute permission, but it is not in the
> > > scope of this series.



^ permalink raw reply

* Re: file metadata via fs API  (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-11 13:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: David Howells, Linus Torvalds, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM, linux-kernel
In-Reply-To: <CAJfpegtNP8rQSS4Z14Ja4x-TOnejdhDRTsmmDD-Cccy2pkfVVw@mail.gmail.com>

On Wed, Aug 05, 2020 at 10:24:23AM +0200, Miklos Szeredi wrote:
> On Tue, Aug 4, 2020 at 4:36 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > I think we already lost that with the xattr API, that should have been
> > done in a way that fits this philosophy.  But given that we  have "/"
> > as the only special purpose char in filenames, and even repetitions
> > are allowed, it's hard to think of a good way to do that.  Pity.
> 
> One way this could be solved is to allow opting into an alternative
> path resolution mode.
> 
> E.g.
>   openat(AT_FDCWD, "foo/bar//mnt/info", O_RDONLY | O_ALT);

Proof of concept patch and test program below.

Opted for triple slash in the hope that just maybe we could add a global
/proc/sys/fs/resolve_alt knob to optionally turn on alternative (non-POSIX) path
resolution without breaking too many things.  Will try that later...

Comments?

Thanks,
Miklos

cat_alt.c:
-------- >8 --------
#define _GNU_SOURCE
#include <err.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>
#include <linux/unistd.h>
#include <linux/openat2.h>

#define RESOLVE_ALT		0x20 /* Alternative path walk mode where
					multiple slashes have special meaning */

int main(int argc, char *argv[])
{
	struct open_how how = {
		.flags = O_RDONLY,
		.resolve = RESOLVE_ALT,
	};
	int fd, res, i;
	char buf[65536], *end;
	const char *path = argv[1];
	int dfd = AT_FDCWD;

	if (argc < 2 || argc > 4)
		errx(1, "usage: %s path [dirfd] [--nofollow]", argv[0]);


	for (i = 2; i < argc; i++) {
		if (strcmp(argv[i], "--nofollow") == 0) {
			how.flags |= O_NOFOLLOW;
		} else {
			dfd = strtoul(argv[i], &end, 0);
			if (end == argv[i] || *end)
				errx(1, "invalid dirfd: %s", argv[i]);
		}
	}

	fd = syscall(__NR_openat2, dfd, path, &how, sizeof(how));
	if (fd == -1)
		err(1, "failed to open %s", argv[1]);

	while (1) {
		res = read(fd, buf, sizeof(buf));
		if (res == -1)
			err(1, "failed to read file");
		if (res == 0)
			break;

		write(1, buf, res);
	}
	close(fd);
	return 0;
}
-------- >8 --------

---
 fs/Makefile                  |    2 
 fs/file_table.c              |   70 ++++++++++++++--------
 fs/fsmeta.c                  |  135 +++++++++++++++++++++++++++++++++++++++++++
 fs/internal.h                |    9 ++
 fs/mount.h                   |    4 +
 fs/namei.c                   |   77 +++++++++++++++++++++---
 fs/namespace.c               |   12 +++
 fs/open.c                    |    2 
 fs/proc_namespace.c          |    2 
 include/linux/fcntl.h        |    2 
 include/linux/namei.h        |    3 
 include/uapi/linux/magic.h   |    1 
 include/uapi/linux/openat2.h |    2 
 13 files changed, 282 insertions(+), 39 deletions(-)

--- a/fs/Makefile
+++ b/fs/Makefile
@@ -13,7 +13,7 @@ obj-y :=	open.o read_write.o file_table.
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o splice.o sync.o utimes.o d_path.o \
 		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
-		fs_types.o fs_context.o fs_parser.o fsopen.o
+		fs_types.o fs_context.o fs_parser.o fsopen.o fsmeta.o \
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -178,22 +178,9 @@ struct file *alloc_empty_file_noaccount(
 	return f;
 }
 
-/**
- * alloc_file - allocate and initialize a 'struct file'
- *
- * @path: the (dentry, vfsmount) pair for the new file
- * @flags: O_... flags with which the new file will be opened
- * @fop: the 'struct file_operations' for the new file
- */
-static struct file *alloc_file(const struct path *path, int flags,
-		const struct file_operations *fop)
+static void init_file(struct file *file, const struct path *path, int flags,
+		      const struct file_operations *fop)
 {
-	struct file *file;
-
-	file = alloc_empty_file(flags, current_cred());
-	if (IS_ERR(file))
-		return file;
-
 	file->f_path = *path;
 	file->f_inode = path->dentry->d_inode;
 	file->f_mapping = path->dentry->d_inode->i_mapping;
@@ -209,31 +196,66 @@ static struct file *alloc_file(const str
 	file->f_op = fop;
 	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
 		i_readcount_inc(path->dentry->d_inode);
+}
+
+/**
+ * alloc_file - allocate and initialize a 'struct file'
+ *
+ * @path: the (dentry, vfsmount) pair for the new file
+ * @flags: O_... flags with which the new file will be opened
+ * @fop: the 'struct file_operations' for the new file
+ */
+static struct file *alloc_file(const struct path *path, int flags,
+		const struct file_operations *fop)
+{
+	struct file *file;
+
+	file = alloc_empty_file(flags, current_cred());
+	if (IS_ERR(file))
+		return file;
+
+	init_file(file, path, flags, fop);
+
 	return file;
 }
 
-struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
-				const char *name, int flags,
-				const struct file_operations *fops)
+int init_file_pseudo(struct file *file, struct inode *inode,
+		     struct vfsmount *mnt, const char *name, int flags,
+		     const struct file_operations *fops)
 {
 	static const struct dentry_operations anon_ops = {
 		.d_dname = simple_dname
 	};
 	struct qstr this = QSTR_INIT(name, strlen(name));
 	struct path path;
-	struct file *file;
 
 	path.dentry = d_alloc_pseudo(mnt->mnt_sb, &this);
 	if (!path.dentry)
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
 	if (!mnt->mnt_sb->s_d_op)
 		d_set_d_op(path.dentry, &anon_ops);
 	path.mnt = mntget(mnt);
 	d_instantiate(path.dentry, inode);
-	file = alloc_file(&path, flags, fops);
-	if (IS_ERR(file)) {
-		ihold(inode);
-		path_put(&path);
+	init_file(file, &path, flags, fops);
+
+	return 0;
+}
+
+struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
+				const char *name, int flags,
+				const struct file_operations *fops)
+{
+	struct file *file;
+	int err;
+
+	file = alloc_empty_file(flags, current_cred());
+	if (IS_ERR(file))
+		return file;
+
+	err = init_file_pseudo(file, inode, mnt, name, flags, fops);
+	if (err) {
+		fput(file);
+		file = ERR_PTR(err);
 	}
 	return file;
 }
--- /dev/null
+++ b/fs/fsmeta.c
@@ -0,0 +1,135 @@
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/magic.h>
+#include <linux/seq_file.h>
+#include <linux/fs_struct.h>
+#include <linux/pseudo_fs.h>
+
+#include "mount.h"
+#include "internal.h"
+
+static struct vfsmount *fsmeta_mnt;
+static struct inode *fsmeta_inode;
+
+
+static struct vfsmount *fsmeta_mnt_info_get_mnt(struct seq_file *seq)
+{
+	struct proc_mounts *p = seq->private;
+
+	return &list_entry(p->cursor.mnt_list.next, struct mount, mnt_list)->mnt;
+}
+
+static void *fsmeta_mnt_info_start(struct seq_file *seq, loff_t *pos)
+{
+	mnt_namespace_lock_read();
+	return *pos == 0 ? fsmeta_mnt_info_get_mnt(seq) : NULL;
+}
+
+static void *fsmeta_mnt_info_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	++*pos;
+	return NULL;
+}
+
+static void fsmeta_mnt_info_stop(struct seq_file *seq, void *v)
+{
+	mnt_namespace_unlock_read();
+}
+
+static int fsmeta_mnt_info_show(struct seq_file *seq, void *v)
+{
+	return show_mountinfo(seq, v);
+}
+
+static const struct seq_operations fsmeta_mnt_info_sops = {
+	.start = fsmeta_mnt_info_start,
+	.next = fsmeta_mnt_info_next,
+	.stop = fsmeta_mnt_info_stop,
+	.show = fsmeta_mnt_info_show,
+};
+
+static int fsmeta_mnt_info_release(struct inode *inode, struct file *file)
+{
+	if (file->private_data) {
+		struct seq_file *seq = file->private_data;
+		struct proc_mounts *p = seq->private;
+
+		mntput(fsmeta_mnt_info_get_mnt(seq));
+		path_put(&p->root);
+
+		return seq_release_private(inode, file);
+	}
+	return 0;
+}
+
+static const struct file_operations fsmeta_mnt_info_fops = {
+	.release = fsmeta_mnt_info_release,
+	.read = seq_read,
+	.llseek = no_llseek,
+};
+
+static int fsmeta_mnt_info_open(struct file *file, const struct path *path,
+				const struct open_flags *op)
+{
+	struct proc_mounts *p;
+	int err;
+
+	err = init_file_pseudo(file, fsmeta_inode, fsmeta_mnt, "[mnt.info]",
+			       op->open_flag, &fsmeta_mnt_info_fops);
+	if (err)
+		return err;
+	/*
+	 * This reference is now sunk in file->f_path.dentry->d_inode and will
+	 * be released by fput()
+	 */
+	ihold(fsmeta_inode);
+
+	err = seq_open_private(file, &fsmeta_mnt_info_sops, sizeof(*p));
+	if (err)
+		return err;
+
+	p = ((struct seq_file *)file->private_data)->private;
+	get_fs_root(current->fs, &p->root);
+	p->cursor.mnt_list.next = &real_mount(mntget(path->mnt))->mnt_list;
+
+	return 0;
+}
+
+int fsmeta_open(const char *meta_name, const struct path *path,
+	      struct file *file, const struct open_flags *op)
+{
+	if (op->open_flag & ~(O_LARGEFILE | O_CLOEXEC | O_NOFOLLOW))
+		return -EINVAL;
+
+	if (strcmp(meta_name, "mnt/info") == 0)
+		return fsmeta_mnt_info_open(file, path, op);
+
+	pr_info("invalid fsmeta file <%s> on %pd4\n", meta_name, path->dentry);
+	return -EINVAL;
+}
+
+static int fsmeta_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, FSMETA_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type fsmeta_fs_type = {
+	.name		= "fsmeta",
+	.init_fs_context = fsmeta_init_fs_context,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init fsmeta_init(void)
+{
+	fsmeta_mnt = kern_mount(&fsmeta_fs_type);
+	if (IS_ERR(fsmeta_mnt))
+		panic("fsmeta_init() kernel mount failed (%ld)\n", PTR_ERR(fsmeta_mnt));
+
+	fsmeta_inode = alloc_anon_inode(fsmeta_mnt->mnt_sb);
+	if (IS_ERR(fsmeta_inode))
+		panic("fsmeta_init() inode allocation failed (%ld)\n", PTR_ERR(fsmeta_inode));
+
+	return 0;
+}
+fs_initcall(fsmeta_init);
+
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -99,6 +99,9 @@ extern void chroot_fs_refs(const struct
  */
 extern struct file *alloc_empty_file(int, const struct cred *);
 extern struct file *alloc_empty_file_noaccount(int, const struct cred *);
+extern int init_file_pseudo(struct file *file, struct inode *inode,
+			    struct vfsmount *mnt, const char *name, int flags,
+			    const struct file_operations *fops);
 
 /*
  * super.c
@@ -185,3 +188,9 @@ int sb_init_dio_done_wq(struct super_blo
  */
 int do_statx(int dfd, const char __user *filename, unsigned flags,
 	     unsigned int mask, struct statx __user *buffer);
+
+/*
+ * fs/fsmeta.c
+ */
+int fsmeta_open(const char *meta_name, const struct path *path,
+		struct file *file, const struct open_flags *op);
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -159,3 +159,7 @@ static inline bool is_anon_ns(struct mnt
 }
 
 extern void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor);
+
+void mnt_namespace_lock_read(void);
+void mnt_namespace_unlock_read(void);
+int show_mountinfo(struct seq_file *m, struct vfsmount *mnt);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2094,6 +2094,30 @@ static inline u64 hash_name(const void *
 
 #endif
 
+static int lookup_alt(const char *name, struct nameidata *nd)
+{
+	if ((nd->flags & LOOKUP_RCU) && unlazy_walk(nd) != 0)
+		return -ECHILD;
+
+	nd->last.name = name + 3;
+	nd->last_type = LAST_META;
+
+	return 0;
+}
+
+static bool is_alt(const char *name, struct nameidata *nd, int depth)
+{
+	if (!(nd->flags & LOOKUP_ALT))
+		return false;
+
+	/* no alternative lookup inside symlinks */
+	if (depth)
+		return false;
+
+	/* name[0] has already been verified to be a slash */
+	return name[1] == '/' && name[2] == '/' && name[3] != '/';
+}
+
 /*
  * Name resolution.
  * This is the basic name resolution function, turning a pathname into
@@ -2111,8 +2135,13 @@ static int link_path_walk(const char *na
 	nd->flags |= LOOKUP_PARENT;
 	if (IS_ERR(name))
 		return PTR_ERR(name);
-	while (*name=='/')
-		name++;
+	if (*name == '/') {
+		if (!is_alt(name, nd, depth)) {
+			do {
+				name++;
+			} while (*name == '/');
+		}
+	}
 	if (!*name)
 		return 0;
 
@@ -2122,6 +2151,9 @@ static int link_path_walk(const char *na
 		u64 hash_len;
 		int type;
 
+		if (*name == '/')
+			return lookup_alt(name, nd);
+
 		err = may_lookup(nd);
 		if (err)
 			return err;
@@ -2163,6 +2195,13 @@ static int link_path_walk(const char *na
 		 * If it wasn't NUL, we know it was '/'. Skip that
 		 * slash, and continue until no more slashes.
 		 */
+		if (is_alt(name, nd, depth)) {
+			link = walk_component(nd, WALK_TRAILING);
+			if (unlikely(link))
+				goto LINK;
+
+			return lookup_alt(name, nd);
+		}
 		do {
 			name++;
 		} while (unlikely(*name == '/'));
@@ -2183,6 +2222,7 @@ static int link_path_walk(const char *na
 			link = walk_component(nd, WALK_MORE);
 		}
 		if (unlikely(link)) {
+LINK:
 			if (IS_ERR(link))
 				return PTR_ERR(link);
 			/* a symlink to follow */
@@ -2239,11 +2279,11 @@ static const char *path_init(struct name
 	nd->path.dentry = NULL;
 
 	/* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */
-	if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) {
+	if (*s == '/' && !is_alt(s, nd, 0) && !(flags & LOOKUP_IN_ROOT)) {
 		error = nd_jump_root(nd);
 		if (unlikely(error))
 			return ERR_PTR(error);
-		return s;
+		return s + 1;
 	}
 
 	/* Relative pathname -- get the starting-point it is relative to. */
@@ -2272,7 +2312,8 @@ static const char *path_init(struct name
 
 		dentry = f.file->f_path.dentry;
 
-		if (*s && unlikely(!d_can_lookup(dentry))) {
+		if (*s && unlikely(!d_can_lookup(dentry)) &&
+		    !is_alt(s, nd, 0)) {
 			fdput(f);
 			return ERR_PTR(-ENOTDIR);
 		}
@@ -2303,6 +2344,9 @@ static const char *path_init(struct name
 
 static inline const char *lookup_last(struct nameidata *nd)
 {
+	if (nd->last_type == LAST_META)
+		return ERR_PTR(-EINVAL);
+
 	if (nd->last_type == LAST_NORM && nd->last.name[nd->last.len])
 		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
 
@@ -2331,7 +2375,7 @@ static int path_lookupat(struct nameidat
 
 	while (!(err = link_path_walk(s, nd)) &&
 	       (s = lookup_last(nd)) != NULL)
-		;
+		nd->flags &= ~LOOKUP_ALT;
 	if (!err)
 		err = complete_walk(nd);
 
@@ -2410,9 +2454,15 @@ static struct filename *filename_parenta
 	if (unlikely(retval == -ESTALE))
 		retval = path_parentat(&nd, flags | LOOKUP_REVAL, parent);
 	if (likely(!retval)) {
-		*last = nd.last;
-		*type = nd.last_type;
-		audit_inode(name, parent->dentry, AUDIT_INODE_PARENT);
+		if (nd.last_type == LAST_META) {
+			path_put(parent);
+			putname(name);
+			name = ERR_PTR(-EINVAL);
+		} else {
+			*last = nd.last;
+			*type = nd.last_type;
+			audit_inode(name, parent->dentry, AUDIT_INODE_PARENT);
+		}
 	} else {
 		putname(name);
 		name = ERR_PTR(retval);
@@ -3123,6 +3173,10 @@ static const char *open_last_lookups(str
 	nd->flags |= op->intent;
 
 	if (nd->last_type != LAST_NORM) {
+		if (nd->last_type == LAST_META) {
+			return ERR_PTR(fsmeta_open(nd->last.name, &nd->path,
+						   file, op));
+		}
 		if (nd->depth)
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
@@ -3206,6 +3260,9 @@ static int do_open(struct nameidata *nd,
 	int acc_mode;
 	int error;
 
+	if (nd->last_type == LAST_META)
+		return 0;
+
 	if (!(file->f_mode & (FMODE_OPENED | FMODE_CREATED))) {
 		error = complete_walk(nd);
 		if (error)
@@ -3355,7 +3412,7 @@ static struct file *path_openat(struct n
 		const char *s = path_init(nd, flags);
 		while (!(error = link_path_walk(s, nd)) &&
 		       (s = open_last_lookups(nd, file, op)) != NULL)
-			;
+			nd->flags &= ~LOOKUP_ALT;
 		if (!error)
 			error = do_open(nd, file, op);
 		terminate_walk(nd);
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -69,7 +69,7 @@ static DEFINE_IDA(mnt_group_ida);
 static struct hlist_head *mount_hashtable __read_mostly;
 static struct hlist_head *mountpoint_hashtable __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
-static DECLARE_RWSEM(namespace_sem);
+DECLARE_RWSEM(namespace_sem);
 static HLIST_HEAD(unmounted);	/* protected by namespace_sem */
 static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
 
@@ -1435,6 +1435,16 @@ static inline void namespace_lock(void)
 	down_write(&namespace_sem);
 }
 
+void mnt_namespace_lock_read(void)
+{
+	down_read(&namespace_sem);
+}
+
+void mnt_namespace_unlock_read(void)
+{
+	up_read(&namespace_sem);
+}
+
 enum umount_tree_flags {
 	UMOUNT_SYNC = 1,
 	UMOUNT_PROPAGATE = 2,
--- a/fs/open.c
+++ b/fs/open.c
@@ -1098,6 +1098,8 @@ inline int build_open_flags(const struct
 		lookup_flags |= LOOKUP_BENEATH;
 	if (how->resolve & RESOLVE_IN_ROOT)
 		lookup_flags |= LOOKUP_IN_ROOT;
+	if (how->resolve & RESOLVE_ALT)
+		lookup_flags |= LOOKUP_ALT;
 
 	op->lookup_flags = lookup_flags;
 	return 0;
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -128,7 +128,7 @@ static int show_vfsmnt(struct seq_file *
 	return err;
 }
 
-static int show_mountinfo(struct seq_file *m, struct vfsmount *mnt)
+int show_mountinfo(struct seq_file *m, struct vfsmount *mnt)
 {
 	struct proc_mounts *p = m->private;
 	struct mount *r = real_mount(mnt);
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -19,7 +19,7 @@
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
 	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
-	 RESOLVE_BENEATH | RESOLVE_IN_ROOT)
+	 RESOLVE_BENEATH | RESOLVE_IN_ROOT | RESOLVE_ALT)
 
 /* List of all open_how "versions". */
 #define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -15,7 +15,7 @@ enum { MAX_NESTED_LINKS = 8 };
 /*
  * Type of the last component on LOOKUP_PARENT
  */
-enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
+enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_META};
 
 /* pathwalk mode */
 #define LOOKUP_FOLLOW		0x0001	/* follow links at the end */
@@ -27,6 +27,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LA
 
 #define LOOKUP_REVAL		0x0020	/* tell ->d_revalidate() to trust no cache */
 #define LOOKUP_RCU		0x0040	/* RCU pathwalk mode; semi-internal */
+#define LOOKUP_ALT		0x200000 /* Alternative path walk mode */
 
 /* These tell filesystem methods that we are dealing with the final component... */
 #define LOOKUP_OPEN		0x0100	/* ... in open */
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -88,6 +88,7 @@
 #define BPF_FS_MAGIC		0xcafe4a11
 #define AAFS_MAGIC		0x5a3c69f0
 #define ZONEFS_MAGIC		0x5a4f4653
+#define FSMETA_MAGIC		0x9f8ea387
 
 /* Since UDF 2.01 is ISO 13346 based... */
 #define UDF_SUPER_MAGIC		0x15013346
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -35,5 +35,7 @@ struct open_how {
 #define RESOLVE_IN_ROOT		0x10 /* Make all jumps to "/" and ".."
 					be scoped inside the dirfd
 					(similar to chroot(2)). */
+#define RESOLVE_ALT		0x20 /* Alternative path walk mode where
+					multiple slashes have special meaning */
 
 #endif /* _UAPI_LINUX_OPENAT2_H */

^ permalink raw reply

* Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: Pavel Machek @ 2020-08-11 13:08 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mark Rutland, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86
In-Reply-To: <6cca8eac-f767-b891-dc92-eaa7504a0e8b@linux.microsoft.com>

[-- Attachment #1: Type: text/plain, Size: 1813 bytes --]

Hi!

> >> Thanks for the lively discussion. I have tried to answer some of the
> >> comments below.
> > 
> >>> There are options today, e.g.
> >>>
> >>> a) If the restriction is only per-alias, you can have distinct aliases
> >>>    where one is writable and another is executable, and you can make it
> >>>    hard to find the relationship between the two.
> >>>
> >>> b) If the restriction is only temporal, you can write instructions into
> >>>    an RW- buffer, transition the buffer to R--, verify the buffer
> >>>    contents, then transition it to --X.
> >>>
> >>> c) You can have two processes A and B where A generates instrucitons into
> >>>    a buffer that (only) B can execute (where B may be restricted from
> >>>    making syscalls like write, mprotect, etc).
> >>
> >> The general principle of the mitigation is W^X. I would argue that
> >> the above options are violations of the W^X principle. If they are
> >> allowed today, they must be fixed. And they will be. So, we cannot
> >> rely on them.
> > 
> > Would you mind describing your threat model?
> > 
> > Because I believe you are using model different from everyone else.
> > 
> > In particular, I don't believe b) is a problem or should be fixed.
> 
> It is a problem because a kernel that implements W^X properly
> will not allow it. It has no idea what has been done in userland.
> It has no idea that the user has checked and verified the buffer
> contents after transitioning the page to R--.

No, it is not a problem. W^X is designed to protect from attackers
doing buffer overflows, not attackers doing arbitrary syscalls.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply

* Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: Madhavan T. Venkataraman @ 2020-08-11 12:41 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Mark Rutland, kernel-hardening, linux-api, linux-arm-kernel,
	linux-fsdevel, linux-integrity, linux-kernel,
	linux-security-module, oleg, x86
In-Reply-To: <20200808221748.GA1020@bug>



On 8/8/20 5:17 PM, Pavel Machek wrote:
> Hi!
> 
>> Thanks for the lively discussion. I have tried to answer some of the
>> comments below.
> 
>>> There are options today, e.g.
>>>
>>> a) If the restriction is only per-alias, you can have distinct aliases
>>>    where one is writable and another is executable, and you can make it
>>>    hard to find the relationship between the two.
>>>
>>> b) If the restriction is only temporal, you can write instructions into
>>>    an RW- buffer, transition the buffer to R--, verify the buffer
>>>    contents, then transition it to --X.
>>>
>>> c) You can have two processes A and B where A generates instrucitons into
>>>    a buffer that (only) B can execute (where B may be restricted from
>>>    making syscalls like write, mprotect, etc).
>>
>> The general principle of the mitigation is W^X. I would argue that
>> the above options are violations of the W^X principle. If they are
>> allowed today, they must be fixed. And they will be. So, we cannot
>> rely on them.
> 
> Would you mind describing your threat model?
> 
> Because I believe you are using model different from everyone else.
> 
> In particular, I don't believe b) is a problem or should be fixed.

It is a problem because a kernel that implements W^X properly
will not allow it. It has no idea what has been done in userland.
It has no idea that the user has checked and verified the buffer
contents after transitioning the page to R--.


> 
> I'll add d), application mmaps a file(R--), and uses write syscall to change
> trampolines in it.
> 

No matter how you do it, these are all user-level methods that can be
hacked. The kernel cannot be sure that an attacker's code has
not found its way into the file.

>> b) This is again a violation. The kernel should refuse to give execute
>> ???????? permission to a page that was writeable in the past and refuse to
>> ???????? give write permission to a page that was executable in the past.
> 
> Why?

I don't know about the latter part. I guess I need to think about it.
But the former is valid. When a page is RW-, a hacker could hack the
page. Then it does not matter that the page is transitioned to R--.
Again, the kernel cannot be sure that the user has verified the contents
after R--.

IMO, W^X needs to be enforced temporally as well.

Madhavan

^ permalink raw reply

* Re: [PATCH v7 0/7] Add support for O_MAYEXEC
From: Mickaël Salaün @ 2020-08-11  8:50 UTC (permalink / raw)
  To: David Laight, Al Viro
  Cc: Kees Cook, Andrew Morton, linux-kernel@vger.kernel.org,
	Aleksa Sarai, Alexei Starovoitov, Andy Lutomirski,
	Christian Brauner, Christian Heimes, Daniel Borkmann,
	Deven Bowers, Dmitry Vyukov, Eric Biggers, Eric Chiang,
	Florian Weimer, James Morris, Jan Kara, Jann Horn,
	Jonathan Corbet, Lakshmi Ramasubramanian, Matthew Garrett,
	Matthew Wilcox, Michael Kerrisk, Mimi Zohar,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel,
	kernel-hardening@lists.openwall.com, linux-api@vger.kernel.org,
	linux-integrity@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
In-Reply-To: <26a4a8378f3b4ad28eaa476853092716@AcuMS.aculab.com>


On 11/08/2020 10:09, David Laight wrote:
>> On 11/08/2020 00:28, Al Viro wrote:
>>> On Mon, Aug 10, 2020 at 10:09:09PM +0000, David Laight wrote:
>>>>> On Mon, Aug 10, 2020 at 10:11:53PM +0200, Mickaël Salaün wrote:
>>>>>> It seems that there is no more complains nor questions. Do you want me
>>>>>> to send another series to fix the order of the S-o-b in patch 7?
>>>>>
>>>>> There is a major question regarding the API design and the choice of
>>>>> hooking that stuff on open().  And I have not heard anything resembling
>>>>> a coherent answer.
>>>>
>>>> To me O_MAYEXEC is just the wrong name.
>>>> The bit would be (something like) O_INTERPRET to indicate
>>>> what you want to do with the contents.
>>
>> The properties is "execute permission". This can then be checked by
>> interpreters or other applications, then the generic O_MAYEXEC name.
> 
> The english sense of MAYEXEC is just wrong for what you are trying
> to check.

We think it reflects exactly what it's purpose is.

> 
>>> ... which does not answer the question - name of constant is the least of
>>> the worries here.  Why the hell is "apply some unspecified checks to
>>> file" combined with opening it, rather than being an independent primitive
>>> you apply to an already opened file?  Just in case - "'cuz that's how we'd
>>> done it" does not make a good answer...
> 
> Maybe an access_ok() that acts on an open fd would be more
> appropriate.
> Which might end up being an fcntrl() action.
> That would give you a full 32bit mask of options.

I previously talk about fcntl(2):
https://lore.kernel.org/lkml/eaf5bc42-e086-740b-a90c-93e67c535eee@digikod.net/

^ permalink raw reply

* Re: [PATCH v7 0/7] Add support for O_MAYEXEC
From: Mickaël Salaün @ 2020-08-11  8:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Kees Cook, Andrew Morton, linux-kernel, Aleksa Sarai,
	Alexei Starovoitov, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk, Mimi Zohar,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <20200810230521.GG1236603@ZenIV.linux.org.uk>


On 11/08/2020 01:05, Al Viro wrote:
> On Tue, Aug 11, 2020 at 12:43:52AM +0200, Mickaël Salaün wrote:
> 
>> Hooking on open is a simple design that enables processes to check files
>> they intend to open, before they open them.
> 
> Which is a good thing, because...?
> 
>> From an API point of view,
>> this series extends openat2(2) with one simple flag: O_MAYEXEC. The
>> enforcement is then subject to the system policy (e.g. mount points,
>> file access rights, IMA, etc.).
> 
> That's what "unspecified" means - as far as the kernel concerned, it's
> "something completely opaque, will let these hooks to play, semantics is
> entirely up to them".

I see it as an access controls mechanism; access may be granted or
denied, as for O_RDONLY, O_WRONLY or (non-Linux) O_EXEC. Even for common
access controls, there are capabilities to bypass them (i.e.
CAP_DAC_OVERRIDE), but multiple layers may enforce different
complementary policies.

>  
>> Checking on open enables to not open a file if it does not meet some
>> requirements, the same way as if the path doesn't exist or (for whatever
>> reasons, including execution permission) if access is denied. It is a
>> good practice to check as soon as possible such properties, and it may
>> enables to avoid (user space) time-of-check to time-of-use (TOCTOU)
>> attacks (i.e. misuse of already open resources).
> 
> ?????  You explicitly assume a cooperating caller.

As said in the below (removed) reply, no, quite the contrary.

>  If it can't be trusted
> to issue the check between open and use, or can be manipulated (ptraced,
> etc.) into not doing so, how can you rely upon the flag having been passed
> in the first place?  And TOCTOU window is definitely not wider that way.

OK, I guess it would be considered a bug in the application (e.g. buggy
resource management between threads).

> 
> If you want to have it done immediately after open(), bloody well do it
> immediately after open.  If attacker has subverted your control flow to the
> extent that allows them to hit descriptor table in the interval between
> these two syscalls, you have already lost - they'll simply prevent that
> flag from being passed.
> 
> What's the point of burying it inside openat2()?  A convenient multiplexor
> to hook into?  We already have one - it's called do_syscall_...
> 

To check as soon as possible without opening something that should not
be opened in the first place.

Isn't a dedicated syscall a bit too much for this feature? What about
adding a new command/flag to fcntl(2)?

^ permalink raw reply

* Re: [PATCH v7 0/7] Add support for O_MAYEXEC
From: Mickaël Salaün @ 2020-08-11  8:48 UTC (permalink / raw)
  To: Jann Horn, Kees Cook, Deven Bowers, Mimi Zohar
  Cc: Al Viro, Andrew Morton, kernel list, Aleksa Sarai,
	Alexei Starovoitov, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Dmitry Vyukov, Eric Biggers,
	Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jonathan Corbet, Lakshmi Ramasubramanian, Matthew Garrett,
	Matthew Wilcox, Michael Kerrisk, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	Kernel Hardening, Linux API, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <CAG48ez0NAV5gPgmbDaSjo=zzE=FgnYz=-OHuXwu0Vts=B5gesA@mail.gmail.com>


On 11/08/2020 01:03, Jann Horn wrote:
> On Tue, Aug 11, 2020 at 12:43 AM Mickaël Salaün <mic@digikod.net> wrote:
>> On 10/08/2020 22:21, Al Viro wrote:
>>> On Mon, Aug 10, 2020 at 10:11:53PM +0200, Mickaël Salaün wrote:
>>>> It seems that there is no more complains nor questions. Do you want me
>>>> to send another series to fix the order of the S-o-b in patch 7?
>>>
>>> There is a major question regarding the API design and the choice of
>>> hooking that stuff on open().  And I have not heard anything resembling
>>> a coherent answer.
>>
>> Hooking on open is a simple design that enables processes to check files
>> they intend to open, before they open them. From an API point of view,
>> this series extends openat2(2) with one simple flag: O_MAYEXEC. The
>> enforcement is then subject to the system policy (e.g. mount points,
>> file access rights, IMA, etc.).
>>
>> Checking on open enables to not open a file if it does not meet some
>> requirements, the same way as if the path doesn't exist or (for whatever
>> reasons, including execution permission) if access is denied.
> 
> You can do exactly the same thing if you do the check in a separate
> syscall though.
> 
> And it provides a greater degree of flexibility; for example, you can
> use it in combination with fopen() without having to modify the
> internals of fopen() or having to use fdopen().
> 
>> It is a
>> good practice to check as soon as possible such properties, and it may
>> enables to avoid (user space) time-of-check to time-of-use (TOCTOU)
>> attacks (i.e. misuse of already open resources).
> 
> The assumption that security checks should happen as early as possible
> can actually cause security problems. For example, because seccomp was
> designed to do its checks as early as possible, including before
> ptrace, we had an issue for a long time where the ptrace API could be
> abused to bypass seccomp filters.
> 
> Please don't decide that a check must be ordered first _just_ because
> it is a security check. While that can be good for limiting attack
> surface, it can also create issues when the idea is applied too
> broadly.

I'd be interested with such security issue examples.

I hope that delaying checks will not be an issue for mechanisms such as
IMA or IPE:
https://lore.kernel.org/lkml/1544699060.6703.11.camel@linux.ibm.com/

Any though Mimi, Deven, Chrome OS folks?

> 
> I don't see how TOCTOU issues are relevant in any way here. If someone
> can turn a script that is considered a trusted file into an untrusted
> file and then maliciously change its contents, you're going to have
> issues either way because the modifications could still happen after
> openat(); if this was possible, the whole thing would kind of fall
> apart. And if that isn't possible, I don't see any TOCTOU.

Sure, and if the scripts are not protected in some way there is no point
to check anything.

> 
>> It is important to keep
>> in mind that the use cases we are addressing consider that the (user
>> space) script interpreters (or linkers) are trusted and unaltered (i.e.
>> integrity/authenticity checked). These are similar sought defensive
>> properties as for SUID/SGID binaries: attackers can still launch them
>> with malicious inputs (e.g. file paths, file descriptors, environment
>> variables, etc.), but the binaries can then have a way to check if they
>> can extend their trust to some file paths.
>>
>> Checking file descriptors may help in some use cases, but not the ones
>> motivating this series.
> 
> It actually provides a superset of the functionality that your
> existing patches provide.

It also brings new issues with multiple file descriptor origins (e.g.
memfd_create).

> 
>> Checking (already) opened resources could be a
>> *complementary* way to check execute permission, but it is not in the
>> scope of this series.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox