Linux Security Modules development


* Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor
From: Madhavan T. Venkataraman @ 2020-08-12 18:47 UTC
  To: Mark Rutland
  Cc: kernel-hardening, linux-api, linux-arm-kernel, linux-fsdevel,
	linux-integrity, linux-kernel, linux-security-module, oleg, x86
In-Reply-To: <20200812100650.GB28154@C02TD0UTHF1T.local>

On 8/12/20 5:06 AM, Mark Rutland wrote:
> [..]
>>
>> The general principle of the mitigation is W^X. I would argue that
>> the above options are violations of the W^X principle. If they are
>> allowed today, they must be fixed. And they will be. So, we cannot
>> rely on them.
> 
> Hold on.
> 
> Contemporary W^X means that a given virtual alias cannot be writable
> and executable simultaneously, permitting (a) and (b). If you read the
> references on the Wikipedia page for W^X you'll see the OpenBSD 3.3
> release notes and related presentation make this clear, and further they
> expect (b) to occur with JITs flipping W/X with mprotect().
> 
> Please don't conflate your assumed stronger semantics with the general
> principle. It not matching your expectations does not necessarily mean
> that it is wrong.
> 
> If you want a stronger W^X semantics, please refer to this specifically
> with a distinct name.

OK. Fair enough. We can give a different name to the stronger requirement.
Just for the sake of this discussion, and for want of a better name,
let us call it WX2.
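
For reference, the mprotect() flip that option (b) refers to can be
sketched as below. This is only an illustration, not code from the patch
series: jit_style_flip() is a made-up name, the bytes copied in are a
placeholder that is never executed, and a kernel or LSM enforcing
something like WX2 would be expected to fail the second step.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * One mapping that is writable first and executable later, but never both
 * at the same time. Returns 0 if both phases succeed, -1 otherwise.
 */
int jit_style_flip(const unsigned char *code, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(code != NULL && len > 0);
	if (page <= 0 || len > (size_t)page)
		return -1;

	/* Phase 1: readable and writable, not executable. */
	unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memcpy(buf, code, len);		/* stand-in for emitting code */

	/* Phase 2: drop W and add X in a single mprotect() call. */
	int rc = mprotect(buf, (size_t)page, PROT_READ | PROT_EXEC);

	munmap(buf, (size_t)page);
	return rc == 0 ? 0 : -1;
}
```

Under contemporary W^X both phases are legal, because the page is never
W and X simultaneously.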

> 
>> a) This requires a remap operation. Two mappings point to the same
>>      physical page. One mapping has W and the other one has X. This
>>      is a violation of W^X.
>>
>> b) This is again a violation. The kernel should refuse to give execute
>>      permission to a page that was writeable in the past and refuse to
>>      give write permission to a page that was executable in the past.
>>
>> c) This is just a variation of (a).
> 
> As above, this is not true.
> 
> If you have a rationale for why this is desirable or necessary, please
> justify that before using this as justification for additional features.
> 

I already supplied the justification. Any user-level method can potentially
be hijacked by an attacker for his own purposes.

W^X does not prevent all of these methods. We need WX2.
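
To make the difference between the two policies concrete, here is a
minimal sketch of the permission checks as pure predicates. It models
only case (b), the history of a single mapping; the aliasing in (a) and
(c) is not covered. struct page_history and all three function names are
hypothetical and do not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-page record: was it ever writable / ever executable? */
struct page_history {
	bool was_writable;
	bool was_executable;
};

/* Contemporary W^X: only W and X at the same time is refused. */
bool wx_permits(int prot)
{
	return !((prot & PROT_WRITE) && (prot & PROT_EXEC));
}

/* WX2 as described in this thread: also refuse X after W, and W after X. */
bool wx2_permits(const struct page_history *h, int prot)
{
	assert(h != NULL);
	if (!wx_permits(prot))
		return false;
	if ((prot & PROT_EXEC) && h->was_writable)
		return false;
	if ((prot & PROT_WRITE) && h->was_executable)
		return false;
	return true;
}

/* Record a granted request so that later checks see the page's history. */
void page_history_record(struct page_history *h, int prot)
{
	assert(h != NULL);
	if (prot & PROT_WRITE)
		h->was_writable = true;
	if (prot & PROT_EXEC)
		h->was_executable = true;
}
```

With these definitions a JIT-style RW to RX flip passes wx_permits() at
every step, but wx2_permits() refuses the second step once the page has
been recorded as writable.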

>> In general, the problem with user-level methods to map and execute
>> dynamic code is that the kernel cannot tell if a genuine application is
>> using them or an attacker is using them or piggy-backing on them.
> 
> Yes, and as I pointed out the same is true for trampfd unless you can
> somehow authenticate the calls are legitimate (in both callsite and the
> set of arguments), and I don't see any reasonable way of doing that.
> 

I am afraid I am not in agreement with this. If WX2 is not implemented,
an attacker can hack both code and data. If WX2 is implemented, an attacker
can only attack data. The attack surface is reduced.

Also, trampfd calls coming from code from a signed file can be authenticated.
trampfd calls coming from an attacker's generated code cannot be authenticated.

> If you relax your threat model to an attacker not being able to make
> arbitrary syscalls, then your suggestion that userspace can perform
> checks between syscalls may be sufficient, but as I pointed out that's
> equally true for a sealed memfd or similar.
> 

Actually, I did not suggest that userspace can perform checks. I said that
the kernel can perform checks.

User space cannot reliably perform checks between calls. A clever hacker
can cover his tracks.

In any case, the kernel has no knowledge of these checks. So, when execute
permissions are requested for a page, a properly implemented WX2 can refuse.

>> Off the top of my head, I have tried to identify some examples
>> where we can have more trust in dynamic code and have the kernel
>> permit its execution.
>>
>> 1. If the kernel can do the job, then that is one safe way. Here, the kernel
>>     is the code. There is no code generation involved. This is what I
>>     have presented in the patch series as the first cut.
> 
> This is sleight-of-hand; it doesn't matter where the logic is performed
> if the power is identical. Practically speaking this is equivalent to
> some dynamic code generation.
> 
> I think that it's misleading to say that because the kernel emulates
> something it is safe when the provenance of the syscall arguments cannot
> be verified.

I submit that there are two aspects - code and data. In one case, both
code and data can be hacked. So, an attacker can modify both code
and data. In the other case, the attacker can only modify data.
The power is not identical. The attack surface is not the same.

Most of the time, security measures are mitigations. They are not 100%
effective. This approach of not allowing the user to do certain things that
can be exploited, and having the kernel do them instead, increases our
confidence. From that perspective, the two approaches are different and it
is worth pursuing a kernel-based mitigation.

> 
> [...]
> 
>> Anyway, these are just examples. The principle is - if we can identify
>> dynamic code that has a certain measure of trust, can the kernel
>> permit their execution?
> 
> My point generally is that the kernel cannot identify this, and if
> userspace code is trusted to dynamically generate trampfd arguments it
> can equally be trusted to dynamically generate code.

I am afraid not. See my previous response. Ability to hack only data
gives an attacker fewer options as compared to the ability to hack
both code and data.

> 
> [...]
> 
>> As I have mentioned above, I intend to have the kernel generate code
>> only if the code generation is simple enough. For more complicated cases,
>> I plan to use a user-level code generator that is for exclusive kernel use.
>> I have yet to work out the details on how this would work. Need time.
> 
> This reads to me like trampfd is only dealing with a few special cases
> and we know that we need a more general solution.
> 
> I hope I am mistaken, but I get the strong impression that you're trying
> to justify your existing solution rather than trying to understand the
> problem space.
> 

I do understand the problem space. I wanted to address dynamic code in 3
different ways in separate phases starting from the easiest and working
my way up to the more difficult ones.

1. Remove dynamic code where possible

   If the kernel can replace user level dynamic code, then do it.
   This is what I did in version 1.

2. Replace dynamic code with static code

   Where you cannot do (1), replace dynamic code with static code with
   the kernel's help. I wanted to do this later. But I have decided to
   do this in version 2. This, combined with signature verification of
   files, adds a measure of trust in the code.

3. Deal with JIT, DBT, etc

   In (1) and (2), we deal with machine code. In (3), there is some source
   from which dynamic code needs to be generated using a code generator.
   E.g., JIT code from Java byte code. Here, the solution I had in mind
   had two parts:

       - Make the source more trustworthy by requiring it to be part
         of a signed file
       - Design a code generator trusted and used exclusively by the kernel

In this patchset, I wanted to lay a foundation for all 3 and attempt to
solve (1) first. Once this was in place, I wanted to do (2) and then (3).

In retrospect, I should have probably started with the big picture first
instead of starting with just item (1). But I always had the big picture
in mind. That said, I did not necessarily have all the details fleshed
out for all the phases. (3) is complex.

My focus was to define the API in a generic enough fashion so that all
3 phases can be implemented. But I realize that it is a hard sell at this
point to convince people that the API is adequate for phase 3. So,
I have decided to do (1) and (2). (3) has to be done separately with
more thought and details put into it.

Also, it may be the case that there are some examples of dynamic code
out there that can never be addressed. My goal is to try to address a
majority of the dynamic code out there.

> To be clear, my strong opinion is that we should not be trying to do
> this sort of emulation or code generation within the kernel. I do think
> it's worthwhile to look at mechanisms to make it harder to subvert
> dynamic userspace code generation, but I think the code generation
> itself needs to live in userspace (e.g. for ABI reasons I previously
> mentioned).
> 

I completely agree that the kernel should not deal with the complexities
of code generation and ABI details. My version 1 did not have any code
generation. But since a performance issue was raised, I explored the idea
of kernel code generation. To be honest, I was not really that
comfortable with the idea.

That is why I have decided to implement the second piece I had in
my plan now. This piece does not have the code generation complexities
or ABI issues. This piece can be used to solve libffi, GCC, etc.
I will still write the code in such a way that I can use the first
approach in the future if I really need it. But it will not involve any
code generation from the kernel. It will only be used for cases that
don't mind the extra trip to the kernel.

Madhavan


* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-12 18:33 UTC
  To: Miklos Szeredi
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <20200812173911.GT1236603@ZenIV.linux.org.uk>

On Wed, Aug 12, 2020 at 06:39:11PM +0100, Al Viro wrote:
> On Wed, Aug 12, 2020 at 07:16:37PM +0200, Miklos Szeredi wrote:
> > On Wed, Aug 12, 2020 at 6:33 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Wed, Aug 12, 2020 at 05:13:14PM +0200, Miklos Szeredi wrote:
> > 
> > > > Why does it have to have a struct mount?  It does not have to use
> > > > dentry/mount based path lookup.
> > >
> > > What the fuck?  So we suddenly get an additional class of objects
> > > serving as kinda-sorta analogues of dentries *AND* now struct file
> > > might refer to that instead of a dentry/mount pair - all on the VFS
> > > level?  And so do all the syscalls you want to allow for such "pathnames"?
> > 
> > The only syscall I'd want to allow is open, everything else would be
> > on the open files themselves.
> > 
> > file->f_path can refer to an anon mount/inode, the real object is
> > referred to by file->private_data.
> > 
> > The change to namei.c would be on the order of ~10 lines.  No other
> > parts of the VFS would be affected.
> 
> If some of the things you open are directories (and you *have* said that
> directories will be among those just upthread, and used references to
> readdir() as argument in favour of your approach elsewhere in the thread),
> you will have to do something about fchdir().  And that's the least of
> the issues.

BTW, what would such opened files look like from /proc/*/fd/* POV?  And
what would happen if you walk _through_ that symlink, with e.g. ".."
following it?  Or with names of those attributes, for that matter...
What about a normal open() of such a sucker?  It won't know where to
look for your ->private_data...

FWIW, you keep referring to regularity of this stuff from the syscall
POV, but it looks like you have no real idea of what subset of the
things available for normal descriptors will be available for those.


* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Linus Torvalds @ 2020-08-12 18:18 UTC
  To: David Howells
  Cc: Miklos Szeredi, linux-fsdevel, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <52483.1597190733@warthog.procyon.org.uk>

On Tue, Aug 11, 2020 at 5:05 PM David Howells <dhowells@redhat.com> wrote:
>
> Well, the start of it was my proposal of an fsinfo() system call.

Ugh. Ok, it's that thing.

This all seems *WAY* over-designed - both your fsinfo and Miklos' version.

What's wrong with fstatfs()? All the extra magic metadata seems to not
really be anything people really care about.

What people are actually asking for seems to be some unique mount ID,
and we have 16 bytes of spare information in 'struct statfs64'.
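
For readers who have not used it, a minimal fstatfs() call looks roughly
like the sketch below. fs_magic_of() is a made-up helper name. The sketch
only reads f_type; struct statfs also carries f_fsid and a reserved
f_spare array, and nothing here assumes that a mount ID is available in
either of them today.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

/*
 * Return the filesystem magic number (f_type) of the filesystem that
 * contains `path`, or -1 if the path cannot be opened or queried.
 */
long fs_magic_of(const char *path)
{
	struct statfs st;
	int fd, rc;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	rc = fstatfs(fd, &st);	/* one syscall, fixed-size result */
	close(fd);

	return rc == 0 ? (long)st.f_type : -1;
}
```

A caller would compare the result against the constants in
<linux/magic.h> to identify the filesystem type.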

All the other fancy fsinfo stuff seems to be "just because", and like
complete overdesign.

Let's not add system calls just because we can.

             Linus


* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-12 17:39 UTC
  To: Miklos Szeredi
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegv8MTnO9YAiFUJPjr3ryeT82=KWHUpLFmgRNOcQfeS17w@mail.gmail.com>

On Wed, Aug 12, 2020 at 07:16:37PM +0200, Miklos Szeredi wrote:
> On Wed, Aug 12, 2020 at 6:33 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Wed, Aug 12, 2020 at 05:13:14PM +0200, Miklos Szeredi wrote:
> 
> > > Why does it have to have a struct mount?  It does not have to use
> > > dentry/mount based path lookup.
> >
> > What the fuck?  So we suddenly get an additional class of objects
> > serving as kinda-sorta analogues of dentries *AND* now struct file
> > might refer to that instead of a dentry/mount pair - all on the VFS
> > level?  And so do all the syscalls you want to allow for such "pathnames"?
> 
> The only syscall I'd want to allow is open, everything else would be
> on the open files themselves.
> 
> file->f_path can refer to an anon mount/inode, the real object is
> referred to by file->private_data.
> 
> The change to namei.c would be on the order of ~10 lines.  No other
> parts of the VFS would be affected.

If some of the things you open are directories (and you *have* said that
directories will be among those just upthread, and used references to
readdir() as argument in favour of your approach elsewhere in the thread),
you will have to do something about fchdir().  And that's the least of
the issues.

>   Maybe I'm optimistic; we'll
> see...


> Now off to something completely different.  Back on Tuesday.

... after the window closes.  You know, it's really starting to look
like rather nasty tactical games...


* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12 17:16 UTC
  To: Al Viro
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <20200812163347.GS1236603@ZenIV.linux.org.uk>

On Wed, Aug 12, 2020 at 6:33 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Aug 12, 2020 at 05:13:14PM +0200, Miklos Szeredi wrote:

> > Why does it have to have a struct mount?  It does not have to use
> > dentry/mount based path lookup.
>
> What the fuck?  So we suddenly get an additional class of objects
> serving as kinda-sorta analogues of dentries *AND* now struct file
> might refer to that instead of a dentry/mount pair - all on the VFS
> level?  And so do all the syscalls you want to allow for such "pathnames"?

The only syscall I'd want to allow is open, everything else would be
on the open files themselves.

file->f_path can refer to an anon mount/inode, the real object is
referred to by file->private_data.

The change to namei.c would be on the order of ~10 lines.  No other
parts of the VFS would be affected.   Maybe I'm optimistic; we'll
see...

Now off to something completely different.  Back on Tuesday.

Thanks,
Miklos


* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: Deven Bowers @ 2020-08-12 17:07 UTC
  To: Chuck Lever, James Morris
  Cc: Mimi Zohar, James Bottomley, Pavel Machek, Sasha Levin, snitzer,
	dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet, nramas,
	serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <70603A4E-A548-4ECB-97D4-D3102CE77701@gmail.com>

On 8/12/2020 7:18 AM, Chuck Lever wrote:
> 
> 
>> On Aug 11, 2020, at 5:03 PM, James Morris <jmorris@namei.org> wrote:
>>
>> On Sat, 8 Aug 2020, Chuck Lever wrote:
>>
>>> My interest is in code integrity enforcement for executables stored
>>> in NFS files.
>>>
>>> My struggle with IPE is that due to its dependence on dm-verity, it
>>> does not seem to be able to protect content that is stored separately
>>> from its execution environment and accessed via a file access
>>> protocol (FUSE, SMB, NFS, etc).
>>
>> It's not dependent on DM-Verity, that's just one possible integrity
>> verification mechanism, and one of two supported in this initial
>> version. The other is 'boot_verified' for a verified or otherwise trusted
>> rootfs. Future versions will support FS-Verity, at least.
>>
>> IPE was designed to be extensible in this way, with a strong separation of
>> mechanism and policy.
> 
> I got that, but it looked to me like the whole system relied on having
> access to the block device under the filesystem. That's not possible
> for a remote filesystem like Ceph or NFS.

Block device structure, no (though that's what is currently used, to be
fair). It really has a hard dependency on the file structure,
specifically the ability to determine whether that file structure can be
used to navigate back to the integrity claim provided by the mechanism.

In the current world of IPE, the integrity claim is the root-hash or 
root-hash-signature on the block device, provided by dm-verity's 
setsecurity hooks (also introduced in this series).

> 
> I'm happy to take a closer look if someone can point me the right way.
> 

Sure, if you look at the 2nd patch, you want to look at the file 
"security/ipe/ipe-property.h", it defines what methods are required to
be implemented by a mechanism to work with IPE. It passes the engine
context which is defined as:

  struct ipe_engine_ctx {
  	enum ipe_op op;
  	enum ipe_hook hook;
  	const struct file *file;
  	const char *audit_pathname;
  	const struct ipe_bdev_blob *sec_bdev;
  };

Now, if the security blob existed for the block_device, it would be
in sec_bdev, but that may be NULL as well, to be fair.
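
As a rough illustration of what the NULL case means for a mechanism, a
property check has to be prepared for a context with no block-device
blob at all. The sketch below is userspace-compilable and heavily
simplified: the enum values, the two fields of struct ipe_bdev_blob and
the helper ctx_has_bdev_claim() are placeholders invented for this
example and are not taken from the patches. Only struct ipe_engine_ctx
is copied from above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles outside the kernel; NOT the real types. */
enum ipe_op { IPE_OP_PLACEHOLDER };
enum ipe_hook { IPE_HOOK_PLACEHOLDER };
struct file;					/* opaque here */
struct ipe_bdev_blob {
	const unsigned char *root_hash;		/* hypothetical field */
	size_t root_hash_len;			/* hypothetical field */
};

/* As quoted in this message. */
struct ipe_engine_ctx {
	enum ipe_op op;
	enum ipe_hook hook;
	const struct file *file;
	const char *audit_pathname;
	const struct ipe_bdev_blob *sec_bdev;
};

/*
 * Hypothetical helper: true only if the context carries a block-device
 * integrity claim. A file with no backing block device (or one without
 * a blob) yields sec_bdev == NULL and therefore "no claim".
 */
bool ctx_has_bdev_claim(const struct ipe_engine_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->sec_bdev != NULL &&
	       ctx->sec_bdev->root_hash != NULL &&
	       ctx->sec_bdev->root_hash_len > 0;
}
```

A mechanism written this way simply fails to match when there is no
blob, rather than dereferencing a NULL pointer.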

If you want a more worked example of how integration works, patches 8
and 10 introduce the dm-verity properties mentioned in this patch.


* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-12 16:33 UTC
  To: Miklos Szeredi
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegsQF1aN4XJ_8j977rnQESxc=Kcn7Z2C+LnVDWXo4PKhTQ@mail.gmail.com>

On Wed, Aug 12, 2020 at 05:13:14PM +0200, Miklos Szeredi wrote:

> > Lovely.  And what of fchdir() to those?
> 
> Not allowed.

Not allowed _how_?  Existing check is "is it a directory"; what do you
propose?  IIRC, you've mentioned using readdir() in that context, so
it's not that you only allow to open the leaves there.
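
The existing behaviour is easy to observe from userspace. The sketch
below is illustrative only: fchdir_result() is a made-up name, and it
shows what fchdir() does today, not how any proposed attribute
descriptors would behave. It restores the caller's working directory
before returning.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for fchdir()/O_DIRECTORY under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open `path` and try to fchdir() to it.
 * Returns 0 on success, or the errno value of the first failing step.
 */
int fchdir_result(const char *path)
{
	int saved, fd, err = 0;

	assert(path != NULL);

	saved = open(".", O_RDONLY | O_DIRECTORY);	/* remember the cwd */
	if (saved < 0)
		return errno;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		err = errno;
		close(saved);
		return err;
	}

	if (fchdir(fd) != 0)
		err = errno;		/* ENOTDIR for a non-directory */
	else if (fchdir(saved) != 0)	/* put the caller back where it was */
		err = errno;

	close(fd);
	close(saved);
	return err;
}
```

For a directory such as "/" this returns 0; for a non-directory such as
/dev/null it returns ENOTDIR, which is the only gate there is.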

> > > > Is that a flat space, or can they be directories?"
> > >
> > > Yes it has a directory tree.   But you can't mkdir, rename, link,
> > > symlink, etc on anything in there.
> >
> > That kills the "shared inode" part - you'll get deadlocks from
> > hell that way.
> 
> No.  The shared inode is not for lookup, just for the open file.

Bloody hell...  So what inodes are you using for lookups?  And that
thing you would be passing to readdir() - what inode will _that_ have?

> > Next: what will that tree be attached to?  As in, "what's the parent
> > of its root"?  And while we are at it, what will be the struct mount
> > used with those - same as the original file, something different
> > attached to it, something created on the fly for each pathwalk and
> > lazy-umounted?  And see above re fchdir() - if they can be directories,
> > it's very much in the game.
> 
> Why does it have to have a struct mount?  It does not have to use
> dentry/mount based path lookup.

What the fuck?  So we suddenly get an additional class of objects
serving as kinda-sorta analogues of dentries *AND* now struct file
might refer to that instead of a dentry/mount pair - all on the VFS
level?  And so do all the syscalls you want to allow for such "pathnames"?

Sure, that avoids all questions about dcache interactions - by growing
a replacement layer and making just about everything in fs/namei.c,
fs/open.c, etc. special-case the handling of that crap.

But yes, the syscall-level interface will be simple.  Wonderful.

I really hope that's not what you have in mind, though.


* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: James Bottomley @ 2020-08-12 15:51 UTC
  To: Chuck Lever
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <02D551EF-C975-4B91-86CA-356FA0FF515C@gmail.com>

On Wed, 2020-08-12 at 10:15 -0400, Chuck Lever wrote:
> > On Aug 11, 2020, at 11:53 AM, James Bottomley
> > <James.Bottomley@HansenPartnership.com> wrote:
> > 
> > On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
[...]
> > > > 
> > > > and what is nice to have to speed up the verification
> > > > process.  The choice for the latter is cache or reconstruct
> > > > depending on the resources available.  If the tree gets cached
> > > > on the server, that would be a server implementation detail
> > > > invisible to the client.
> > > 
> > > We assume that storage targets (for block or file) are not
> > > trusted. Therefore storage clients cannot rely on intermediate
> > > results (eg, middle nodes in a Merkle tree) unless those results
> > > are generated within the client's trust envelope.
> > 
> > Yes, they can ... because supplied nodes can be verified.  That's
> > the whole point of a merkle tree.  As long as I'm sure of the root
> > hash I can verify all the rest even if supplied by an untrusted
> > source.  If you consider a simple merkle tree covering 4 blocks:
> > 
> >       R
> >     /   \
> >  H11     H12
> >  / \     / \
> > H21 H22 H23 H24
> >  |   |   |   |
> > B1  B2  B3  B4
> > 
> > Assume I have the verified root hash R.  If you supply B3 you also
> > supply H24 and H11 as proof.  I verify by hashing B3 to produce H23
> > then hash H23 and H24 to produce H12 and if H12 and your supplied
> > H11 hash to R the tree is correct and the B3 you supplied must
> > likewise be correct.
> 
> I'm not sure what you are proving here. Obviously this has to work
> in order for a client to reconstruct the file's Merkle tree given
> only R and the file content.

You implied the server can't be trusted to generate the Merkle tree.
I'm showing above that it can, because of the tree-path-based verification.
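
The path check for the four-block tree above can be sketched as follows.
This is a toy under stated assumptions: the hash is FNV-1a, which is NOT
cryptographic and merely stands in for a real digest, and the function
names and the "B1".."B4" block contents are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_SEED 0xcbf29ce484222325ULL

/* FNV-1a: a toy, NON-cryptographic stand-in for a real digest. */
static uint64_t toy_hash(const void *data, size_t len, uint64_t h)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

uint64_t hash_block(const char *block)
{
	return toy_hash(block, strlen(block), FNV_SEED);
}

uint64_t hash_pair(uint64_t left, uint64_t right)
{
	uint64_t h = toy_hash(&left, sizeof(left), FNV_SEED);

	return toy_hash(&right, sizeof(right), h);
}

/*
 * Walk from leaf `index` (0-based) up to the root. siblings[0] is the
 * neighbour at the leaf level, siblings[depth - 1] the one below the root.
 */
bool verify_block(const char *block, size_t index,
		  const uint64_t *siblings, size_t depth, uint64_t root)
{
	assert(block != NULL && (depth == 0 || siblings != NULL));

	uint64_t h = hash_block(block);

	for (size_t i = 0; i < depth; i++, index >>= 1)
		h = (index & 1) ? hash_pair(siblings[i], h)
				: hash_pair(h, siblings[i]);
	return h == root;
}

/* The 4-block tree: does the claimed B3 verify against R with {H24, H11}? */
bool demo_verify_b3(const char *claimed_b3)
{
	uint64_t h21 = hash_block("B1"), h22 = hash_block("B2");
	uint64_t h23 = hash_block("B3"), h24 = hash_block("B4");
	uint64_t h11 = hash_pair(h21, h22), h12 = hash_pair(h23, h24);
	uint64_t root = hash_pair(h11, h12);	/* the one trusted value */
	uint64_t proof[] = { h24, h11 };	/* supplied by the server */

	return verify_block(claimed_b3, 2, proof, 2, root);
}
```

Only the root needs to be trusted; the two proof hashes come from the
untrusted side and are checked by the walk itself.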

> It's the construction of the tree and verification of the hashes that
> are potentially expensive. The point of caching intermediate hashes
> is so that the client verifies them as few times as possible.  I
> don't see value in caching those hashes on an untrusted server --
> the client will have to reverify them anyway, and there will be no
> savings.

I'm not making any claim about server caching, I'm just saying the
client can request pieces of the tree from the server without having to
reconstruct the whole thing itself because it can verify their
correctness.

> Cache once, as close as you can to where the data will be used.
> 
> 
> > > So: if the storage target is considered inside the client's trust
> > > envelope, it can cache or store durably any intermediate parts of
> > > the verification process. If not, the network and file storage is
> > > considered untrusted, and the client has to rely on nothing but
> > > the signed digest of the tree root.
> > > 
> > > We could build a scheme around, say, fscache, that might save the
> > > intermediate results durably and locally.
> > 
> > I agree we want caching on the client, but we can always page in
> > from the remote as long as we page enough to verify up to R, so
> > we're always sure the remote supplied genuine information.
> 
> Agreed.
> 
> 
> > > > > For this reason, the idea was to save only the signature of
> > > > > the tree's root on durable storage. The client would retrieve
> > > > > that signature possibly at open time, and reconstruct the
> > > > > tree at that time.
> > > > 
> > > > Right that's the integrity data you must have.
> > > > 
> > > > > Or the tree could be partially constructed on-demand at the
> > > > > time each unit is to be checked (say, as part of 2. above).
> > > > 
> > > > Whether it's reconstructed or cached can be an implementation
> > > > detail. You clearly have to reconstruct once, but whether you
> > > > have to do it again depends on the memory available for caching
> > > > and all the other resource calls in the system.
> > > > 
> > > > > The client would have to reconstruct that tree again if
> > > > > memory pressure caused some or all of the tree to be evicted,
> > > > > so perhaps an on-demand mechanism is preferable.
> > > > 
> > > > Right, but I think that's implementation detail.  Probably what
> > > > we need is a way to get the log(N) verification hashes from the
> > > > server and it's up to the client whether it caches them or not.
> > > 
> > > Agreed, these are implementation details. But see above about the
> > > trustworthiness of the intermediate hashes. If they are conveyed
> > > on an untrusted network, then they can't be trusted either.
> > 
> > Yes, they can, provided enough of them are asked for to verify.  If
> > you look at the simple example above, suppose I have cached H11 and
> > H12, but I've lost the entire H2X layer.  I want to verify B3 so I
> > also ask you for your copy of H24.  Then I generate H23 from B3 and
> > Hash H23 and H24.  If this doesn't hash to H12 I know either you
> > supplied me the wrong block or lied about H24.  However, if it all
> > hashes correctly I know you supplied me with both the correct B3
> > and the correct H24.
> 
> My point is there is a difference between a trusted cache and an
> untrusted cache. I argue there is not much value in a cache where
> the hashes have to be verified again.

And my point isn't about caching, it's about where the tree comes from.
I claim and you agree the client can get the tree from the server a
piece at a time (because it can path verify it) and doesn't have to
generate it itself.  How much of the tree the client has to store and
whether the server caches, reads it in from somewhere or reconstructs
it is an implementation detail.
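
The cached-H12 case from the quoted example can be sketched the same
way. Assumptions, stated loudly: the hash is a toy FNV-1a mix and is NOT
cryptographic, struct verified_cache stands for interior nodes the
client has already checked against R, and every name below is invented
for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEED 0xcbf29ce484222325ULL

/* Toy FNV-1a mixing; a NON-cryptographic stand-in for a real digest. */
static uint64_t mix(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++)
		h = (h ^ p[i]) * 0x100000001b3ULL;
	return h;
}

static uint64_t leaf_hash(const char *block)
{
	return mix(SEED, block, strlen(block));
}

static uint64_t node_hash(uint64_t l, uint64_t r)
{
	return mix(mix(SEED, &l, sizeof(l)), &r, sizeof(r));
}

/* Interior nodes the client verified against R earlier and kept. */
struct verified_cache {
	uint64_t h11, h12;
};

/* Check a server-supplied (B3, H24) pair against the cached H12 only. */
bool check_b3_against_cache(const struct verified_cache *c,
			    const char *b3, uint64_t h24_from_server)
{
	assert(c != NULL && b3 != NULL);
	return node_hash(leaf_hash(b3), h24_from_server) == c->h12;
}

/* What an untrusted server might send back. */
enum server_lie { HONEST, WRONG_BLOCK, WRONG_H24 };

bool demo_cached_check(enum server_lie lie)
{
	struct verified_cache c = {
		.h11 = node_hash(leaf_hash("B1"), leaf_hash("B2")),
		.h12 = node_hash(leaf_hash("B3"), leaf_hash("B4")),
	};
	const char *b3 = (lie == WRONG_BLOCK) ? "not B3" : "B3";
	uint64_t h24 = leaf_hash("B4") ^ (lie == WRONG_H24 ? 1 : 0);

	return check_b3_against_cache(&c, b3, h24);
}
```

The client never recomputes anything above H12 here, which is the
saving that a trusted local cache buys.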

James



* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: James Bottomley @ 2020-08-12 15:42 UTC
  To: Chuck Lever
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <2CA41152-6445-4716-B5EE-2D14E5C59368@gmail.com>

On Wed, 2020-08-12 at 09:56 -0400, Chuck Lever wrote:
> > On Aug 11, 2020, at 2:28 PM, James Bottomley
> > <James.Bottomley@HansenPartnership.com> wrote:
> > 
> > On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
> > > Mimi's earlier point is that any IMA metadata format that
> > > involves unsigned digests is exposed to an alteration attack at
> > > rest or in transit, thus will not provide a robust end-to-end
> > > integrity guarantee.
> > 
> > I don't believe that is Mimi's point, because it's mostly not
> > correct: the xattr mechanism does provide this today.  The point is
> > the mechanism we use for storing IMA hashes and signatures today is
> > xattrs because they have robust security properties for local
> > filesystems that the kernel enforces.  This use goes beyond IMA,
> > selinux labels for instance use this property as well.
> 
> I don't buy this for a second. If storing a security label in a
> local xattr is so secure, we wouldn't have any need for EVM.

What don't you buy?  Security xattrs can only be updated by local root.
If you trust local root, the xattr mechanism is fine ... it's the only
one a lot of LSMs use, for instance.  If you don't trust local root or
worry about offline backups, you use EVM.  A thing isn't secure or
insecure, it depends on the threat model.  However, if you don't trust
the NFS server it doesn't matter whether you do or don't trust local
root, you can't believe the contents of the xattr.

> > What I think you're saying is that NFS can't provide the robust
> > security for xattrs we've been relying on, so you need some other
> > mechanism for storing them.
> 
> For NFS, there's a network traversal which is an attack surface.
> 
> A local xattr can be attacked as well: a device or bus malfunction
> can corrupt the content of an xattr, or a privileged user can modify
> it.
> 
> How does that metadata get from the software provider to the end
> user? It's got to go over a network, stored in various ways, some
> of which will not be trusted. To attain an unbroken chain of
> provenance, that metadata has to be signed.
> 
> I don't think the question is the storage mechanism, but rather the
> protection mechanism. Signing the metadata protects it in all of
> these cases.

I think we're saying about the same thing.  For most people the
security mechanism of local xattrs is sufficient.  If you're paranoid,
you don't believe it is and you use EVM.

> > I think Mimi's other point is actually that IMA uses a flat hash
> > which we derive by reading the entire file and then watching for
> > mutations. Since you cannot guarantee we get notice of mutation
> > with NFS, the entire IMA mechanism can't really be applied in its
> > current form and we have to resort to chunk at a time verifications
> > that a Merkle tree would provide.
> 
> I'm not sure what you mean by this. An NFS client relies on
> notification of mutation to maintain the integrity of its cache of
> NFS file content, and it's done that since the 1980s.

Mutation detection is part of the current IMA security model.  If IMA
sees a file mutate it has to be rehashed the next time it passes the
gate.  If we can't trust the NFS server, we can't trust the NFS
mutation notification and we have to have a different mechanism to
check the file.

> In addition to examining a file's mtime and ctime as maintained by
> the NFS server, a client can rely on the file's NFSv4 change
> attribute or an NFSv4 delegation.

And that's secure in the face of a malicious or compromised server?

The bottom line is still, I think we can't use linear hashes with an
open/exec/mmap gate with NFS and we have to move to chunk at a time
verification like that provided by a Merkle tree.

James



* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: David Howells @ 2020-08-12 15:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Al Viro, Linus Torvalds, Jann Horn, Casey Schaufler,
	Andy Lutomirski, linux-fsdevel, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegsQF1aN4XJ_8j977rnQESxc=Kcn7Z2C+LnVDWXo4PKhTQ@mail.gmail.com>

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Why does it have to have a struct mount?  It does not have to use
> dentry/mount based path lookup.

file->f_path.mnt

David


^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12 15:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <20200812150807.GR1236603@ZenIV.linux.org.uk>

On Wed, Aug 12, 2020 at 5:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Aug 12, 2020 at 04:46:20PM +0200, Miklos Szeredi wrote:
>
> > > "Can those suckers be passed to
> > > ...at() as starting points?
> >
> > No.
>
> Lovely.  And what of fchdir() to those?

Not allowed.

> Are they all non-directories?
> Because the starting point of ...at() can be simulated that way...
>
> > >  Can they be bound in namespace?
> >
> > No.
> >
> > > Can something be bound *on* them?
> >
> > No.
> >
> > >  What do they have for inodes
> > > and what maintains their inumbers (and st_dev, while we are at
> > > it)?
> >
> > Irrelevant.  Can be some anon dev + shared inode.
> >
> > The only attribute of an attribute that I can think of that makes
> > sense would be st_size, but even that is probably unimportant.
> >
> > >  Can _they_ have secondaries like that (sensu Swift)?
> >
> > Reference?
>
> http://www.online-literature.com/swift/3515/
>         So, naturalists observe, a flea
>         Has smaller fleas that on him prey;
>         And these have smaller still to bite 'em,
>         And so proceed ad infinitum.
> of course ;-)
> IOW, can the things in those trees have secondary trees on them, etc.?
> Not "will they have it in your originally intended use?" - "do we need
> the architecture of the entire thing to be capable to deal with that?"

No.

>
> > > Is that a flat space, or can they be directories?"
> >
> > Yes it has a directory tree.   But you can't mkdir, rename, link,
> > symlink, etc on anything in there.
>
> That kills the "shared inode" part - you'll get deadlocks from
> hell that way.

No.  The shared inode is not for lookup, just for the open file.

>  "Can't mkdir" doesn't save you from that.  BTW,
> what of unlink()?  If the tree shape is not a hardwired constant,
> you get to decide how it's initially populated...
>
> Next: what will that tree be attached to?  As in, "what's the parent
> of its root"?  And while we are at it, what will be the struct mount
> used with those - same as the original file, something different
> attached to it, something created on the fly for each pathwalk and
> lazy-umounted?  And see above re fchdir() - if they can be directories,
> it's very much in the game.

Why does it have to have a struct mount?  It does not have to use
dentry/mount based path lookup.

Thanks,
Miklos

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-12 15:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegvFBdp3v9VcCp-wNDjZnQF3q6cufb-8PJieaGDz14sbBg@mail.gmail.com>

On Wed, Aug 12, 2020 at 04:46:20PM +0200, Miklos Szeredi wrote:

> > "Can those suckers be passed to
> > ...at() as starting points?
> 
> No.

Lovely.  And what of fchdir() to those?  Are they all non-directories?
Because the starting point of ...at() can be simulated that way...
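
The equivalence between an ...at() starting point and fchdir() can be
sketched with an ordinary directory.  This only illustrates the
existing openat()/fchdir() semantics; it says nothing about how the
proposed alt namespace would behave, and the helper names are made up
for the example.

```python
import os
import tempfile


def open_relative(dirfd, name):
    """openat(2)-style lookup: resolve name relative to dirfd."""
    return os.open(name, os.O_RDONLY, dir_fd=dirfd)


def open_via_fchdir(dirfd, name):
    """The same lookup simulated with fchdir(2) plus a plain relative open."""
    saved = os.open(".", os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fchdir(dirfd)
        return os.open(name, os.O_RDONLY)
    finally:
        os.fchdir(saved)  # restore the caller's working directory
        os.close(saved)


# Demo on a scratch directory holding one file.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "attr"), "w") as f:
    f.write("value")
dirfd = os.open(workdir, os.O_RDONLY | os.O_DIRECTORY)
fd_a = open_relative(dirfd, "attr")
fd_b = open_via_fchdir(dirfd, "attr")
same_file = os.path.samestat(os.fstat(fd_a), os.fstat(fd_b))
```

Both descriptors end up on the same inode, which is why forbidding
...at() alone is not enough if fchdir() to such an object is possible.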

> >  Can they be bound in namespace?
> 
> No.
> 
> > Can something be bound *on* them?
> 
> No.
> 
> >  What do they have for inodes
> > and what maintains their inumbers (and st_dev, while we are at
> > it)?
> 
> Irrelevant.  Can be some anon dev + shared inode.
> 
> The only attribute of an attribute that I can think of that makes
> sense would be st_size, but even that is probably unimportant.
> 
> >  Can _they_ have secondaries like that (sensu Swift)?
> 
> Reference?

http://www.online-literature.com/swift/3515/
	So, naturalists observe, a flea
	Has smaller fleas that on him prey;
	And these have smaller still to bite 'em,
	And so proceed ad infinitum.
of course ;-)
IOW, can the things in those trees have secondary trees on them, etc.?
Not "will they have it in your originally intended use?" - "do we need
the architecture of the entire thing to be capable to deal with that?"

> > Is that a flat space, or can they be directories?"
> 
> Yes it has a directory tree.   But you can't mkdir, rename, link,
> symlink, etc on anything in there.

That kills the "shared inode" part - you'll get deadlocks from
hell that way.  "Can't mkdir" doesn't save you from that.  BTW,
what of unlink()?  If the tree shape is not a hardwired constant,
you get to decide how it's initially populated...

Next: what will that tree be attached to?  As in, "what's the parent
of its root"?  And while we are at it, what will be the struct mount
used with those - same as the original file, something different
attached to it, something created on the fly for each pathwalk and
lazy-umounted?  And see above re fchdir() - if they can be directories,
it's very much in the game.

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12 14:46 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <20200812143957.GQ1236603@ZenIV.linux.org.uk>

On Wed, Aug 12, 2020 at 4:40 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Aug 12, 2020 at 09:23:23AM +0200, Miklos Szeredi wrote:
>
> > Anyway, starting with just introducing the alt namespace without
> > unification seems to be a good first step. If that turns out to be
> > workable, we can revisit unification later.
>
> Start with coming up with answers to the questions on semantics
> upthread.  To spare you the joy of digging through the branches
> of that thread, how's that for starters?
>
> "Can those suckers be passed to
> ...at() as starting points?

No.

>  Can they be bound in namespace?

No.

> Can something be bound *on* them?

No.

>  What do they have for inodes
> and what maintains their inumbers (and st_dev, while we are at
> it)?

Irrelevant.  Can be some anon dev + shared inode.

The only attribute of an attribute that I can think of that makes
sense would be st_size, but even that is probably unimportant.

>  Can _they_ have secondaries like that (sensu Swift)?

Reference?

> Is that a flat space, or can they be directories?"

Yes it has a directory tree.   But you can't mkdir, rename, link,
symlink, etc on anything in there.

Thanks,
Miklos

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: Chuck Lever @ 2020-08-12 14:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <1597159969.4325.21.camel@HansenPartnership.com>

> On Aug 11, 2020, at 11:32 AM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
>>> On Aug 11, 2020, at 1:43 AM, James Bottomley
>>> <James.Bottomley@HansenPartnership.com> wrote:
>>> On Mon, 2020-08-10 at 19:36 -0400, Chuck Lever wrote:
> [...]
>>>> Thanks for the help! I just want to emphasize that documentation
>>>> (eg, a specification) will be critical for remote filesystems.
>>>> 
>>>> If any of this is to be supported by a remote filesystem, then we
>>>> need an unencumbered description of the new metadata format
>>>> rather than code. GPL-encumbered formats cannot be contributed to
>>>> the NFS standard, and are probably difficult for other
>>>> filesystems that are not Linux-native, like SMB, as well.
>>> 
>>> I don't understand what you mean by GPL encumbered formats.  The
>>> GPL is a code licence not a data or document licence.
>> 
>> IETF contributions occur under a BSD-style license incompatible
>> with the GPL.
>> 
>> https://trustee.ietf.org/trust-legal-provisions.html
>> 
>> Non-Linux implementers (of OEM storage devices) rely on such
>> standards processes to indemnify them against licensing claims.
> 
> Well, that simply means we won't be contributing the Linux
> implementation, right?

At the present time, there is nothing but the Linux implementation.
There's no English description, there's no specification of the
formats, the format is described only by source code.

The only way to contribute current IMA metadata formats to an open
standards body like the IETF is to look at encumbered code first.
We would effectively be contributing an implementation in this case.

(I'm not saying the current formats should or should not be
contributed; merely that there is a legal stumbling block to doing
so that can be avoided for newly defined formats).

> Well, let me put the counterpoint: I can write a book about how linux
> device drivers work (which includes describing the data formats)

Our position is that someone who reads that book and implements those
formats under a non-GPL-compatible license would be in breach of the
GPL.

The point of the standards process is to indemnify implementing
and distributing under _any_ license what has been published by the
standards body. That legally enables everyone to use the published
protocol/format in their own code no matter how it happens to be
licensed.

> Fine, good grief, people who take a sensible view of this can write the
> data format down and publish it under any licence you like then you can
> pick it up again safely.

That's what I proposed. Write it down under the IETF Trust legal
provisions license. And I volunteered to do that.

All I'm saying is that description needs to come before code.

--
Chuck Lever
chucklever@gmail.com

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Al Viro @ 2020-08-12 14:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Jann Horn, Casey Schaufler, Andy Lutomirski,
	linux-fsdevel, David Howells, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegtXtj2Q1wsR-3eUNA0S=_skzHF0CEmcK_Krd8dtKkWkGA@mail.gmail.com>

On Wed, Aug 12, 2020 at 09:23:23AM +0200, Miklos Szeredi wrote:

> Anyway, starting with just introducing the alt namespace without
> unification seems to be a good first step. If that turns out to be
> workable, we can revisit unification later.

Start with coming up with answers to the questions on semantics
upthread.  To spare you the joy of digging through the branches
of that thread, how's that for starters?

"Can those suckers be passed to
...at() as starting points?  Can they be bound in namespace?
Can something be bound *on* them?  What do they have for inodes
and what maintains their inumbers (and st_dev, while we are at
it)?  Can _they_ have secondaries like that (sensu Swift)?
Is that a flat space, or can they be directories?"

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: David Howells @ 2020-08-12 14:23 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Linus Torvalds, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegvLaoQHZTm1-QKorzsL3ZDnTOcHpcAJn36yF=n-YymCow@mail.gmail.com>

Miklos Szeredi <miklos@szeredi.hu> wrote:

> The point is that generic operations already exist and there is no
> need to add new, specialized ones to access metadata.

open and read already exist, yes, but the metadata isn't currently in
convenient inodes and dentries that you can just walk through.  So you're
going to end up with a specialised filesystem instead, I suspect.  Basically,
it's the same as your do-everything-through-/proc/self/fds/ approach.

And it's going to be heavier.  I don't know if you're planning on creating a
superblock each time you do an O_ALT open, but you will end up creating some
inodes, dentries and a file - even before you get to the reading bit.

David

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: Chuck Lever @ 2020-08-12 14:18 UTC (permalink / raw)
  To: James Morris
  Cc: Mimi Zohar, James Bottomley, Deven Bowers, Pavel Machek,
	Sasha Levin, snitzer, dm-devel, tyhicks, agk, Paul Moore,
	Jonathan Corbet, nramas, serge, pasha.tatashin, Jann Horn,
	linux-block, Al Viro, Jens Axboe, mdsakib, open list, eparis,
	linux-security-module, linux-audit, linux-fsdevel,
	linux-integrity, jaskarankhurana
In-Reply-To: <alpine.LRH.2.21.2008120643370.10591@namei.org>



> On Aug 11, 2020, at 5:03 PM, James Morris <jmorris@namei.org> wrote:
> 
> On Sat, 8 Aug 2020, Chuck Lever wrote:
> 
>> My interest is in code integrity enforcement for executables stored
>> in NFS files.
>> 
>> My struggle with IPE is that due to its dependence on dm-verity, it
>> does not seem to be able to protect content that is stored separately
>> from its execution environment and accessed via a file access
>> protocol (FUSE, SMB, NFS, etc).
> 
> It's not dependent on DM-Verity, that's just one possible integrity 
> verification mechanism, and one of two supported in this initial 
> version. The other is 'boot_verified' for a verified or otherwise trusted 
> rootfs. Future versions will support FS-Verity, at least.
> 
> IPE was designed to be extensible in this way, with a strong separation of 
> mechanism and policy.

I got that, but it looked to me like the whole system relied on having
access to the block device under the filesystem. That's not possible
for a remote filesystem like Ceph or NFS.

I'm happy to take a closer look if someone can point me the right way.


--
Chuck Lever
chucklever@gmail.com




^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: Chuck Lever @ 2020-08-12 14:15 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <1597161218.4325.38.camel@HansenPartnership.com>



> On Aug 11, 2020, at 11:53 AM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
>>> On Aug 11, 2020, at 1:43 AM, James Bottomley
>>> <James.Bottomley@HansenPartnership.com> wrote:
>>> 
>>> On Mon, 2020-08-10 at 19:36 -0400, Chuck Lever wrote:
>>>>> On Aug 10, 2020, at 11:35 AM, James Bottomley
>>>>> <James.Bottomley@HansenPartnership.com> wrote:
> [...]
>>>>> The first basic is that a merkle tree allows unit at a time
>>>>> verification. First of all we should agree on the unit.  Since
>>>>> we always fault a page at a time, I think our merkle tree unit
>>>>> should be a page not a block.
>>>> 
>>>> Remote filesystems will need to agree that the size of that unit
>>>> is the same everywhere, or the unit size could be stored in the
>>>> per-file metadata.
>>>> 
>>>> 
>>>>> Next, we should agree where the check gates for the per page
>>>>> accesses should be ... definitely somewhere in readpage, I
>>>>> suspect and finally we should agree how the merkle tree is
>>>>> presented at the gate.  I think there are three ways:
>>>>> 
>>>>> 1. Ahead of time transfer:  The merkle tree is transferred and
>>>>>    verified at some time before the accesses begin, so we already
>>>>>    have a verified copy and can compare against the lower leaf.
>>>>> 2. Async transfer:  We provide an async mechanism to transfer the
>>>>>    necessary components, so when presented with a unit, we check
>>>>>    the log n components required to get to the root
>>>>> 3. The protocol actually provides the capability of 2 (like the
>>>>>    SCSI DIF/DIX), so to IMA all the pieces get presented instead
>>>>>    of IMA having to manage the tree
>>>> 
>>>> A Merkle tree is potentially large enough that it cannot be
>>>> stored in an extended attribute. In addition, an extended
>>>> attribute is not a byte stream that you can seek into or read
>>>> small parts of, it is retrieved in a single shot.
>>> 
>>> Well you wouldn't store the tree would you, just the head
>>> hash.  The rest of the tree can be derived from the data.  You need
>>> to distinguish between what you *must* have to verify integrity
>>> (the head hash, possibly signed)
>> 
>> We're dealing with an untrusted storage device, and for a remote
>> filesystem, an untrusted network.
>> 
>> Mimi's earlier point is that any IMA metadata format that involves
>> unsigned digests is exposed to an alteration attack at rest or in
>> transit, thus will not provide a robust end-to-end integrity
>> guarantee.
>> 
>> Therefore, tree root digests must be cryptographically signed to be
>> properly protected in these environments. Verifying that signature
>> should be done infrequently relative to reading a file's content.
> 
> I'm not disagreeing there has to be a way for the relying party to
> trust the root hash.
> 
>>> and what is nice to have to speed up the verification
>>> process.  The choice for the latter is cache or reconstruct
>>> depending on the resources available.  If the tree gets cached on
>>> the server, that would be a server implementation detail invisible
>>> to the client.
>> 
>> We assume that storage targets (for block or file) are not trusted.
>> Therefore storage clients cannot rely on intermediate results (eg,
>> middle nodes in a Merkle tree) unless those results are generated
>> within the client's trust envelope.
> 
> Yes, they can ... because supplied nodes can be verified.  That's the
> whole point of a merkle tree.  As long as I'm sure of the root hash I
> can verify all the rest even if supplied by an untrusted source.  If
> you consider a simple merkle tree covering 4 blocks:
> 
>       R
>     /   \
>  H11     H12
>  / \     / \
> H21 H22 H23 H24
> |    |   |   |
> B1   B2  B3  B4
> 
> Assume I have the verified root hash R.  If you supply B3 you also
> supply H24 and H11 as proof.  I verify by hashing B3 to produce H23
> then hash H23 and H24 to produce H12 and if H12 and your supplied H11
> hash to R the tree is correct and the B3 you supplied must likewise be
> correct.
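
The check described in the quoted example can be sketched as follows.
The node names follow the diagram; SHA-256, plain concatenation of the
two child hashes, and the (sibling, sibling_is_left) proof encoding are
assumptions made for the sketch, not any existing on-disk or wire
format.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def verify(block, proof, root):
    """Walk from the leaf to the root.

    proof is a list of (sibling_hash, sibling_is_left) pairs, ordered
    from the leaf level upward.
    """
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Build the four-block tree from the diagram.
B1, B2, B3, B4 = b"block1", b"block2", b"block3", b"block4"
H21, H22, H23, H24 = h(B1), h(B2), h(B3), h(B4)
H11, H12 = h(H21 + H22), h(H23 + H24)
R = h(H11 + H12)

# For B3 the untrusted side supplies H24 (right sibling) and H11 (left).
proof_for_B3 = [(H24, False), (H11, True)]
ok = verify(B3, proof_for_B3, R)
```

Only R has to be trusted in advance; a wrong block or a wrong supplied
hash makes the recomputed root differ from R.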

I'm not sure what you are proving here. Obviously this has to work
in order for a client to reconstruct the file's Merkle tree given
only R and the file content.

It's the construction of the tree and verification of the hashes that
are potentially expensive. The point of caching intermediate hashes
is so that the client verifies them as few times as possible.  I
don't see value in caching those hashes on an untrusted server --
the client will have to reverify them anyway, and there will be no
savings.

Cache once, as close as you can to where the data will be used.


>> So: if the storage target is considered inside the client's trust
>> envelope, it can cache or store durably any intermediate parts of
>> the verification process. If not, the network and file storage is
>> considered untrusted, and the client has to rely on nothing but the
>> signed digest of the tree root.
>> 
>> We could build a scheme around, say, fscache, that might save the
>> intermediate results durably and locally.
> 
> I agree we want caching on the client, but we can always page in from
> the remote as long as we page enough to verify up to R, so we're always
> sure the remote supplied genuine information.

Agreed.


>>>> For this reason, the idea was to save only the signature of the
>>>> tree's root on durable storage. The client would retrieve that
>>>> signature possibly at open time, and reconstruct the tree at that
>>>> time.
>>> 
>>> Right that's the integrity data you must have.
>>> 
>>>> Or the tree could be partially constructed on-demand at the time
>>>> each unit is to be checked (say, as part of 2. above).
>>> 
>>> Whether it's reconstructed or cached can be an implementation
>>> detail. You clearly have to reconstruct once, but whether you have
>>> to do it again depends on the memory available for caching and all
>>> the other resource calls in the system.
>>> 
>>>> The client would have to reconstruct that tree again if memory
>>>> pressure caused some or all of the tree to be evicted, so perhaps
>>>> an on-demand mechanism is preferable.
>>> 
>>> Right, but I think that's implementation detail.  Probably what we
>>> need is a way to get the log(N) verification hashes from the server
>>> and it's up to the client whether it caches them or not.
>> 
>> Agreed, these are implementation details. But see above about the
>> trustworthiness of the intermediate hashes. If they are conveyed
>> on an untrusted network, then they can't be trusted either.
> 
> Yes, they can, provided enough of them are asked for to verify.  If you
> look at the simple example above, suppose I have cached H11 and H12,
> but I've lost the entire H2X layer.  I want to verify B3 so I also ask
> you for your copy of H24.  Then I generate H23 from B3 and Hash H23 and
> H24.  If this doesn't hash to H12 I know either you supplied me the
> wrong block or lied about H24.  However, if it all hashes correctly I
> know you supplied me with both the correct B3 and the correct H24.
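
The quoted case, where the client still holds a verified H12 but has
lost the H2X layer, can be sketched like this.  As before, SHA-256 and
the concatenation order are assumptions, and the function name is made
up for the example.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def verify_against_cached_parent(block, sibling, sibling_is_left, cached_parent):
    """Check one block against a parent hash the client already verified.

    cached_parent must be the cached hash for this block's position in
    the tree; the walk stops there instead of continuing up to the root.
    """
    node = h(block)
    parent = h(sibling + node) if sibling_is_left else h(node + sibling)
    return parent == cached_parent


# Same four-block tree as in the diagram.
B3, B4 = b"block3", b"block4"
H23, H24 = h(B3), h(B4)
H12 = h(H23 + H24)  # verified earlier and kept in the client's cache

# The server resends B3 together with its copy of H24.
ok = verify_against_cached_parent(B3, H24, False, H12)
```

One supplied hash is enough here, compared with two when the walk has
to go all the way up to R.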

My point is there is a difference between a trusted cache and an
untrusted cache. I argue there is not much value in a cache where
the hashes have to be verified again.


>>>>> There are also a load of minor things like how we get the head
>>>>> hash, which must be presented and verified ahead of time for
>>>>> each of the above 3.
>>>> 
>>>> Also, changes to a file's content and its tree signature are not
>>>> atomic. If a file is mutable, then there is the period between
>>>> when the file content has changed and when the signature is
>>>> updated. Some discussion of how a client is to behave in those
>>>> situations will be necessary.
>>> 
>>> For IMA, if you write to a checked file, it gets rechecked the next
>>> time the gate (open/exec/mmap) is triggered.  This means you must
>>> complete the update and have the new integrity data in-place before
>>> triggering the check.  I think this could apply equally to a Merkle
>>> tree based system.  It's a sort of Doctor, Doctor it hurts when I
>>> do this situation.
>> 
>> I imagine it's a common situation where a "yum update" process is
>> modifying executables while clients are running them. To prevent
>> a read from pulling refreshed content before the new tree root is
>> available, it would have to block temporarily until the verification
>> process succeeds with the updated tree root.
> 
> No ... it's not.  Yum specifically worries about that today because if
> you update running binaries, it causes a crash.  Yum constructs the
> entire new file then atomically links it into place and deletes the old
> inode to prevent these crashes.  It never allows you to get into the
> situation where you can execute something that will be modified. 
> That's also why you have to restart stuff after a yum update because if
> you didn't it would still be attached to the deleted inode.

Fair enough.
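
The write-then-rename update pattern described in the quoted text can
be sketched as follows.  This is an illustration of the general
technique using rename(2) via os.replace(), not yum's or rpm's actual
code; the helper name and the file contents are made up.

```python
import os
import tempfile


def atomic_replace(path, new_content):
    """Build the new file beside the target, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX: readers see old or new
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


# Demo: a reader that opened the file before the update.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "binary")
with open(path, "wb") as f:
    f.write(b"old version")
running = open(path, "rb")  # stands in for a process using the old file
atomic_replace(path, b"new version")
seen_by_running = running.read()
with open(path, "rb") as f:
    seen_by_new_open = f.read()
running.close()
```

The already-open descriptor stays attached to the old, now unlinked
inode, while any new open gets the complete new file; no reader ever
sees a half-written one.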

--
Chuck Lever
chucklever@gmail.com




^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12 14:10 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, linux-fsdevel, Al Viro, Karel Zak, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <135551.1597240486@warthog.procyon.org.uk>

On Wed, Aug 12, 2020 at 3:54 PM David Howells <dhowells@redhat.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > IOW, if you do something more along the lines of
> >
> >        fd = open("foo/bar", O_PATH);
> >        metadatafd = openat(fd, "metadataname", O_ALT);
> >
> > it might be workable.
>
> What is it going to walk through?  You need to end up with an inode and dentry
> from somewhere.
>
> It sounds like this would have to open up a procfs-like magic filesystem, and
> walk into it.  But how would that actually work?  Would you create a new
> superblock each time you do this, labelled with the starting object (say the
> dentry for "foo/bar" in this case), and then walk from the root?
>
> An alternative, maybe, could be to make a new dentry type, say, and include it
> in the superblock of the object being queried - and let the filesystems deal
> with it.  That would mean that non-dir dentries would then have virtual
> children.  You could then even use this to implement resource forks...
>
> Another alternative would be to note O_ALT and then skip pathwalk entirely,
> but just use the name as a key to the attribute, creating an anonfd to read
> it.  But then why use openat() at all?  You could instead do:
>
>         metadatafd = openmeta(fd, "metadataname");
>
> and save the page flag.  You could even merge the two opens and do:
>
>         metadatafd = openmeta("foo/bar", "metadataname");
>
> Why not even combine this with Miklos's readfile() idea:
>
>         readmeta(AT_FDCWD, "foo/bar", "metadataname", buf, sizeof(buf));

And writemeta() and createmeta() and readdirmeta() and ...

The point is that generic operations already exist and there is no
need to add new, specialized ones to access metadata.

Thanks,
Miklos

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: Chuck Lever @ 2020-08-12 13:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <1597170509.4325.55.camel@HansenPartnership.com>

> On Aug 11, 2020, at 2:28 PM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
>> Mimi's earlier point is that any IMA metadata format that involves
>> unsigned digests is exposed to an alteration attack at rest or in
>> transit, thus will not provide a robust end-to-end integrity
>> guarantee.
> 
> I don't believe that is Mimi's point, because it's mostly not correct:
> the xattr mechanism does provide this today.  The point is the
> mechanism we use for storing IMA hashes and signatures today is xattrs
> because they have robust security properties for local filesystems that
> the kernel enforces.  This use goes beyond IMA, selinux labels for
> instance use this property as well.

I don't buy this for a second. If storing a security label in a
local xattr is so secure, we wouldn't have any need for EVM.

> What I think you're saying is that NFS can't provide the robust
> security for xattrs we've been relying on, so you need some other
> mechanism for storing them.

For NFS, there's a network traversal which is an attack surface.

A local xattr can be attacked as well: a device or bus malfunction
can corrupt the content of an xattr, or a privileged user can modify
it.

How does that metadata get from the software provider to the end
user? It's got to go over a network, stored in various ways, some
of which will not be trusted. To attain an unbroken chain of
provenance, that metadata has to be signed.

I don't think the question is the storage mechanism, but rather the
protection mechanism. Signing the metadata protects it in all of
these cases.

> I think Mimi's other point is actually that IMA uses a flat hash which
> we derive by reading the entire file and then watching for mutations. 
> Since you cannot guarantee we get notice of mutation with NFS, the
> entire IMA mechanism can't really be applied in its current form and we
> have to resort to chunk at a time verifications that a Merkle tree
> would provide.

I'm not sure what you mean by this. An NFS client relies on notification
of mutation to maintain the integrity of its cache of NFS file content,
and it's done that since the 1980s.

In addition to examining a file's mtime and ctime as maintained by
the NFS server, a client can rely on the file's NFSv4 change attribute
or an NFSv4 delegation.

> Doesn't this make moot any thinking about
> standardisation in NFS for the current IMA flat hash mechanism because
> we simply can't use it ... If I were to construct a prototype I'd have
> to work out and securely cache the hash of every chunk when verifying
> the flat hash so I could recheck on every chunk read.  I think that's
> infeasible for large files.
> 
> James
> 

--
Chuck Lever
chucklever@gmail.com

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12 13:54 UTC (permalink / raw)
  To: David Howells
  Cc: Karel Zak, Linus Torvalds, linux-fsdevel, Al Viro, Jeff Layton,
	Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <133508.1597239193@warthog.procyon.org.uk>

On Wed, Aug 12, 2020 at 3:33 PM David Howells <dhowells@redhat.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > You said yourself, that what's really needed is e.g. consistent
> > snapshot of a complete mount tree topology.  And to get the complete
> > topology FSINFO_ATTR_MOUNT_TOPOLOGY and FSINFO_ATTR_MOUNT_CHILDREN are
> > needed for *each* individual mount.
>
> That's not entirely true.
>
> FSINFO_ATTR_MOUNT_ALL can be used instead of FSINFO_ATTR_MOUNT_CHILDREN if you
> want to scan an entire subtree in one go.  It returns the same record type.
>
> The result from ALL/CHILDREN includes sufficient information to build the
> tree.  That only requires the parent ID.  All the rest of the information
> TOPOLOGY exposes is to do with propagation.
>
> Now, granted, I didn't include all of the topology info in the records
> returned by ALL/CHILDREN because I don't expect it to change very often.  But
> you can check the event counter supplied with each record to see if it might
> have changed - and then call TOPOLOGY on the ones that changed.

IDGI, you have all these interfaces but how will they be used?

E.g. one wants to build a consistent topology together with
propagation and attributes.  That would start with
FSINFO_ATTR_MOUNT_ALL, then iterate the given mounts calling
FSINFO_ATTR_MOUNT_INFO and FSINFO_ATTR_MOUNT_TOPOLOGY for each.  Then
when done, check the subtree notification counter with
FSINFO_ATTR_MOUNT_INFO on the top one to see if anything has changed
in the meantime.  If it has, the whole process needs to be restarted
to see what has changed (unless notification is also enabled).
How does the atomicity of FSINFO_ATTR_MOUNT_ALL help with that?  The
same could be done with just FSINFO_ATTR_MOUNT_CHILDREN.
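
The scan-then-recheck loop described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions: the
structures and function names are hypothetical stand-ins, not the fsinfo()
API from the patch set, and the "mount source" is plain memory rather than
the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MOUNTS 8

/* Hypothetical stand-in for the kernel side; NOT the fsinfo() API. */
struct mount_source {
	unsigned int change_counter;	/* bumped on every topology change */
	int mount_ids[MAX_MOUNTS];
	size_t nr_mounts;
};

struct mount_snapshot {
	int mount_ids[MAX_MOUNTS];
	size_t nr_mounts;
};

/* Optional hook so a caller can simulate a change arriving mid-scan. */
typedef void (*scan_hook_t)(struct mount_source *src, int attempt);

/*
 * Copy the mount list, then re-read the counter.  If the counter moved,
 * the copy may be inconsistent, so discard it and start over.
 * Returns the number of attempts used, or -1 if max_attempts ran out.
 */
static int snapshot_mounts(struct mount_source *src,
			   struct mount_snapshot *out,
			   int max_attempts, scan_hook_t hook)
{
	assert(src->nr_mounts <= MAX_MOUNTS);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		unsigned int before = src->change_counter;

		out->nr_mounts = src->nr_mounts;
		for (size_t i = 0; i < src->nr_mounts; i++)
			out->mount_ids[i] = src->mount_ids[i];

		if (hook)
			hook(src, attempt);

		if (src->change_counter == before)
			return attempt;
	}
	return -1;
}

/* Demo hook: a new mount (made-up ID 99) appears during the first scan. */
static void add_mount_during_first_scan(struct mount_source *src, int attempt)
{
	if (attempt == 1 && src->nr_mounts < MAX_MOUNTS) {
		src->mount_ids[src->nr_mounts++] = 99;
		src->change_counter++;
	}
}
```

Note that the loop has the same shape whether the list comes from one
atomic call or from a walk over children: the counter check after the scan
is what detects an inconsistent result.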

And more importantly does the level of consistency matter at all?  There's
no such thing for directory trees, why are mount trees different in
this respect?

> Text interfaces are also a PITA, especially when you may get multiple pieces
> of information returned in one buffer and especially when you throw in
> character escaping.  Of course, we can do it - and we do do it all over - but
> that doesn't make it efficient.

Agreed.  The format of text interfaces matters very much.

Thanks,
Miklos

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: David Howells @ 2020-08-12 13:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Miklos Szeredi, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wjzLmMRf=QG-n+1HnxWCx4KTQn9+OhVvUSJ=ZCQd6Y1WA@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> IOW, if you do something more along the lines of
> 
>        fd = open("foo/bar", O_PATH);
>        metadatafd = openat(fd, "metadataname", O_ALT);
> 
> it might be workable.

What is it going to walk through?  You need to end up with an inode and dentry
from somewhere.

It sounds like this would have to open up a procfs-like magic filesystem, and
walk into it.  But how would that actually work?  Would you create a new
superblock each time you do this, labelled with the starting object (say the
dentry for "foo/bar" in this case), and then walk from the root?

An alternative, maybe, could be to make a new dentry type, say, and include it
in the superblock of the object being queried - and let the filesystems deal
with it.  That would mean that non-dir dentries would then have virtual
children.  You could then even use this to implement resource forks...

Another alternative would be to note O_ALT and then skip pathwalk entirely,
but just use the name as a key to the attribute, creating an anonfd to read
it.  But then why use openat() at all?  You could instead do:

	metadatafd = openmeta(fd, "metadataname");

and save the open flag.  You could even merge the two opens and do:

	metadatafd = openmeta("foo/bar", "metadataname");

Why not even combine this with Miklos's readfile() idea:

	readmeta(AT_FDCWD, "foo/bar", "metadataname", buf, sizeof(buf));

and we're now down to one syscall and no fds and you don't even need a magic
filesystem to make it work.

There's another consideration too: Paths are not unique handles to mounts.
It's entirely possible to have colocated mounts.  We need to be able to query
all the mounts on a mountpoint.

David

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: David Howells @ 2020-08-12 13:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Karel Zak, Linus Torvalds, linux-fsdevel, Al Viro,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAJfpegs4gzvJMBz=su8KgXXxX41tv8tVhO88Eap9pDeHRaSDPA@mail.gmail.com>

Miklos Szeredi <miklos@szeredi.hu> wrote:

> You said yourself, that what's really needed is e.g. consistent
> snapshot of a complete mount tree topology.  And to get the complete
> topology FSINFO_ATTR_MOUNT_TOPOLOGY and FSINFO_ATTR_MOUNT_CHILDREN are
> needed for *each* individual mount.

That's not entirely true.

FSINFO_ATTR_MOUNT_ALL can be used instead of FSINFO_ATTR_MOUNT_CHILDREN if you
want to scan an entire subtree in one go.  It returns the same record type.

The result from ALL/CHILDREN includes sufficient information to build the
tree.  That only requires the parent ID.  All the rest of the information
TOPOLOGY exposes is to do with propagation.
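
As a rough illustration of why the parent ID alone is enough, here is a
minimal sketch that recovers child counts and depths from a flat list of
records.  The record layout is hypothetical (it is not the structure the
patch set returns), and the root record is assumed to name itself as its
own parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: only the two fields needed to rebuild the tree. */
struct mount_rec {
	int id;
	int parent_id;
};

/* Count the direct children of `id` in a flat record array. */
static size_t count_children(const struct mount_rec *recs, size_t n, int id)
{
	size_t count = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (recs[i].parent_id == id && recs[i].id != id)
			count++;
	return count;
}

/*
 * Depth of `id` below `root_id`, found by following parent links.
 * Returns -1 if `id` is unknown or is not underneath `root_id`.
 */
static int mount_depth(const struct mount_rec *recs, size_t n,
		       int root_id, int id)
{
	int depth = 0;

	while (id != root_id) {
		size_t i;

		for (i = 0; i < n; i++)
			if (recs[i].id == id)
				break;
		/* Unknown ID, self-parented non-root, or a cycle: give up. */
		if (i == n || recs[i].parent_id == id || depth > (int)n)
			return -1;
		id = recs[i].parent_id;
		depth++;
	}
	return depth;
}
```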

Now, granted, I didn't include all of the topology info in the records
returned by ALL/CHILDREN because I don't expect it to change very often.  But
you can check the event counter supplied with each record to see if it might
have changed - and then call TOPOLOGY on the ones that changed.

If it simplifies life, I could add the propagation info into ALL/CHILDREN so
that you only need to call ALL to scan everything.  It requires larger
buffers, however.

> Adding a few generic binary interfaces is okay.   Adding many
> specialized binary interfaces is a PITA.

Text interfaces are also a PITA, especially when you may get multiple pieces
of information returned in one buffer and especially when you throw in
character escaping.  Of course, we can do it - and we do do it all over - but
that doesn't make it efficient.
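
To make the escaping cost concrete: /proc/self/mountinfo encodes space,
tab, newline and backslash in paths as three-digit octal escapes (for
example "\040" for a space), so every reader has to undo that.  A minimal
sketch of such a decoder, with a hypothetical function name, might look
like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode "\ooo" octal escapes in place and return the decoded length.
 * Anything that is not a backslash followed by three octal digits
 * (first digit 0-3) is copied through unchanged.
 */
static size_t unescape_octal(char *s)
{
	char *out = s;

	assert(s != NULL);
	for (const char *in = s; *in; ) {
		if (in[0] == '\\' &&
		    in[1] >= '0' && in[1] <= '3' &&
		    in[2] >= '0' && in[2] <= '7' &&
		    in[3] >= '0' && in[3] <= '7') {
			*out++ = (char)(((in[1] - '0') << 6) |
					((in[2] - '0') << 3) |
					(in[3] - '0'));
			in += 4;
		} else {
			*out++ = *in++;
		}
	}
	*out = '\0';
	return (size_t)(out - s);
}
```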

David

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Miklos Szeredi @ 2020-08-12 13:09 UTC (permalink / raw)
  To: Karel Zak
  Cc: Linus Torvalds, linux-fsdevel, David Howells, Al Viro,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <20200812101405.brquf7xxt2q22dd3@ws.net.home>

On Wed, Aug 12, 2020 at 12:14 PM Karel Zak <kzak@redhat.com> wrote:

> For example,  by fsinfo(FSINFO_ATTR_MOUNT_TOPOLOGY) you get all
> mountpoint propagation setting and relations by one syscall,

That's just an arbitrary grouping of attributes.

You said yourself, that what's really needed is e.g. consistent
snapshot of a complete mount tree topology.  And to get the complete
topology FSINFO_ATTR_MOUNT_TOPOLOGY and FSINFO_ATTR_MOUNT_CHILDREN are
needed for *each* individual mount.  The topology can obviously change
between those calls.

So there's no fundamental difference between getting individual
attributes or getting attribute groups in this respect.

> It would be also nice to avoid some strings formatting and separators
> like we use in the current mountinfo.

I think quoting non-printable characters is okay.

> I can imagine multiple values separated by binary header (like we already
> have for watch_notification, inotify, etc):

Adding a few generic binary interfaces is okay.   Adding many
specialized binary interfaces is a PITA.

Thanks,
Miklos

* Re: file metadata via fs API
From: David Howells @ 2020-08-12 13:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Karel Zak, Steven Whitehouse, Linus Torvalds,
	linux-fsdevel, Al Viro, Jeff Layton, Miklos Szeredi,
	Nicolas Dichtel, Christian Brauner, Lennart Poettering, Linux API,
	Ian Kent, LSM, Linux Kernel Mailing List
In-Reply-To: <CAJfpegv4sC2zm+N5tvEmYaEFvvWJRHfdGqXUoBzbeKj81uNCvQ@mail.gmail.com>

Miklos Szeredi <miklos@szeredi.hu> wrote:

> That presumably means the mount ID <-> mount path mapping already
> exists, which means it's just possible to use the open(mount_path,
> O_PATH) to obtain the base fd.

No, you can't.  A path may correspond to multiple mounts stacked on top of
each other, e.g.:

	mount -t tmpfs none /mnt
	mount -t tmpfs none /mnt
	mount -t tmpfs none /mnt

Now you have three co-located mounts and you can't use the path to
differentiate them.  I think this might be an issue in autofs, but Ian would
need to comment on that.
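
One way to see the ambiguity from userspace is to count how many
/proc/self/mountinfo entries share a mount point.  The sketch below is an
illustration only: the helper name is hypothetical, it compares the raw
(still escaped) mount-point field, and it takes the lines as an array so it
can be tried on sample text instead of a live system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Count mountinfo lines whose mount-point field equals `path`.
 * Each line begins: mount-ID parent-ID major:minor root mount-point ...
 * Malformed lines are skipped.
 */
static size_t count_mounts_at(const char *const *lines, size_t n,
			      const char *path)
{
	size_t count = 0;

	assert(lines != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int id, parent;
		char mount_point[256];

		/* %*s skips major:minor and root; 3 fields are stored. */
		if (sscanf(lines[i], "%d %d %*s %*s %255s",
			   &id, &parent, mount_point) == 3 &&
		    strcmp(mount_point, path) == 0)
			count++;
	}
	return count;
}
```

Fed three made-up lines that all name /mnt, this reports three mounts for
one path; only the mount IDs in the first field tell them apart.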

David

