fanotify and LSM path hooks

Linux Security Modules development
 help / color / mirror / Atom feed

* fanotify and LSM path hooks
@ 2019-04-14 16:04 Amir Goldstein
  2019-04-14 16:39 ` Al Viro
  2019-04-16 15:45 ` Jan Kara
  0 siblings, 2 replies; 12+ messages in thread
From: Amir Goldstein @ 2019-04-14 16:04 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, LSM List, Serge E. Hallyn, James Morris, Al Viro,
	Miklos Szeredi, Matthew Bobrowski, Kentaro Takeda, Tetsuo Handa,
	John Johansen

Hi Jan,

I started to look at directory pre-modification "permission" hooks
that were discussed on last year's LSFMM:
https://lwn.net/Articles/755277/

The two existing fanotify_perm() hooks are called
from security_file_permission() and security_file_open()
and depend on build time CONFIG_SECURITY.
If you look at how the fsnotify_perm() hooks are planted inside the
generic security hooks, one might wonder, why are fanotify permission
hooks getting a special treatment and are not registering as LSM hooks?

One benefit from an fanotify LSM, besides more generic code, would be
that fanotify permission hooks could be disabled with boot parameters.

I only bring this up because security hooks seems like the most natural
place to add pre-modify fanotify events for the purpose of maintaining
a filesystem change journal. It would be ugly to spray more fsnotify hooks
inside security hooks instead of registering an fanotify LSM, but maybe
there are downsides of registering fanotify as LSM that I am not aware of?

Another observation relates to the security_path_ hooks.
Let's take rename as an example.
LSM could implement security_path_rename() and/or security_inode_rename()
hooks and rename syscalls will end up calling both hooks.
The security_path_ hooks are more attractive for fanotify, because the path
information could be used to setup pre-modification permission mask on
mount marks and not only on filesystem/inode marks.

One problem with security_path_ hooks is that they require an extra
build time CONFIG_SECURITY_PATH.
Another problem is that they seem to be bypassed by several subsystems.
cachefiles, ecryptfs, overlayfs and nfsd all call the vfs_rename() helper, but
only cachefiles bothers to call the security_path_rename() hook.
This is of course true for all other security_path_ hooks.
I think that is something that requires fixing regardless of the fanotify pre
modification hooks. I wonder if tomoyo and apparmor developers
(LSM that implement security_path_ hooks) are aware of those missing
hooks?

Would love to get feedback about whether or not fanotify LSM sounds
like a good or bad idea and about the security_path_ hooks questions.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-14 16:04 fanotify and LSM path hooks Amir Goldstein
@ 2019-04-14 16:39 ` Al Viro
  2019-04-14 18:51   ` Amir Goldstein
  2019-04-16 15:45 ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2019-04-14 16:39 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, LSM List, Serge E. Hallyn, James Morris,
	Miklos Szeredi, Matthew Bobrowski, Kentaro Takeda, Tetsuo Handa,
	John Johansen

On Sun, Apr 14, 2019 at 07:04:14PM +0300, Amir Goldstein wrote:

> Another problem is that they seem to be bypassed by several subsystems.
> cachefiles, ecryptfs, overlayfs and nfsd all call the vfs_rename() helper, but
> only cachefiles bothers to call the security_path_rename() hook.
> This is of course true for all other security_path_ hooks.
> I think that is something that requires fixing regardless of the fanotify pre
> modification hooks. I wonder if tomoyo and apparmor developers
> (LSM that implement security_path_ hooks) are aware of those missing
> hooks?

First of all, _what_ path?  You do realize that there is no such thing
as *the* pathname of dentry, right?  The same filesystem may be mounted
in any number of places, some of which might be visible in a given
namespace (including "none of them" - and you are not even guaranteed
that they are visible in any namespace at all).

It's not "bypassed", it's "inapplicable and deeply flawed in general".

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-14 16:39 ` Al Viro
@ 2019-04-14 18:51   ` Amir Goldstein
  2019-04-14 19:26     ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2019-04-14 18:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, linux-fsdevel, LSM List, Serge E. Hallyn, James Morris,
	Miklos Szeredi, Matthew Bobrowski, Kentaro Takeda, Tetsuo Handa,
	John Johansen

On Sun, Apr 14, 2019 at 7:40 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Apr 14, 2019 at 07:04:14PM +0300, Amir Goldstein wrote:
>
> > Another problem is that they seem to be bypassed by several subsystems.
> > cachefiles, ecryptfs, overlayfs and nfsd all call the vfs_rename() helper, but
> > only cachefiles bothers to call the security_path_rename() hook.
> > This is of course true for all other security_path_ hooks.
> > I think that is something that requires fixing regardless of the fanotify pre
> > modification hooks. I wonder if tomoyo and apparmor developers
> > (LSM that implement security_path_ hooks) are aware of those missing
> > hooks?
>
> First of all, _what_ path?  You do realize that there is no such thing
> as *the* pathname of dentry, right?

Yap.

> The same filesystem may be mounted
> in any number of places, some of which might be visible in a given
> namespace (including "none of them" - and you are not even guaranteed
> that they are visible in any namespace at all).
>
> It's not "bypassed", it's "inapplicable and deeply flawed in general".

I am assuming that you are referring to the security_path_ hooks in
general. I cannot speak on behalf of Tomoyo and Apparmor, but I think
I understand why you view path based security policy as flawed.

From fanotify POV, passing the path that was used to operate on
dentries to fanotify lets the users choose if they want to get events
for all operations on an inode, all operations on a specific filesystem or
all operations where the inode was accessed via a specific mount
(FAN_MARK_MOUNT).

When looking at userspace applications of kernel filesystem change
notifications, like https://facebook.github.io/watchman/
I believe what users really want is a subtree watch.
So for example, you may want to monitor your git workspace(s) for changes
on a large source tree(s) to save time on git index updates.
If you bind mount your git work tree and watch the mount for changes,
chances are nobody is messing with your source files from another
mount and that you do not have hardlinks pointing outside the git
workspace root.

But the truth is I would much rather that users have a way to mark
a subtree root and ask fanotify for events under that subtree.
As a matter of fact, I have some private POC patches that allow users to
setup a mark on a "subtree root" dentry, which really marks the super block
and keep a reference to the dentry. Than every event on that super block
is filtered with is_subdir() against the marked dentry.
I am just not convinced that is the most efficient way to implement a
subtree filter... another though I had was to filter not by subtree, but by
project id and let users worry about populating their subtrees of interest
with appropriate projids.

More comments about ideas that are deeply flawed are welcome.
Those comments would hopefully help me navigate towards the
mildly flawed ideas ;-)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-14 18:51   ` Amir Goldstein
@ 2019-04-14 19:26     ` Al Viro
  2019-04-14 20:28       ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2019-04-14 19:26 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, LSM List, Serge E. Hallyn, James Morris,
	Miklos Szeredi, Matthew Bobrowski, Kentaro Takeda, Tetsuo Handa,
	John Johansen

On Sun, Apr 14, 2019 at 09:51:38PM +0300, Amir Goldstein wrote:

> But the truth is I would much rather that users have a way to mark
> a subtree root and ask fanotify for events under that subtree.
> As a matter of fact, I have some private POC patches that allow users to
> setup a mark on a "subtree root" dentry, which really marks the super block
> and keep a reference to the dentry. Than every event on that super block
> is filtered with is_subdir() against the marked dentry.

And that is_subdir() is protected by what, exactly?  And what happens
if you have many such dentries?

Or, for that matter, what happens if that dentry gets invalidated?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-14 19:26     ` Al Viro
@ 2019-04-14 20:28       ` Amir Goldstein
  0 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2019-04-14 20:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, linux-fsdevel, LSM List, Serge E. Hallyn, James Morris,
	Miklos Szeredi, Matthew Bobrowski, Kentaro Takeda, Tetsuo Handa,
	John Johansen

On Sun, Apr 14, 2019 at 10:26 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Apr 14, 2019 at 09:51:38PM +0300, Amir Goldstein wrote:
>
> > But the truth is I would much rather that users have a way to mark
> > a subtree root and ask fanotify for events under that subtree.
> > As a matter of fact, I have some private POC patches that allow users to
> > setup a mark on a "subtree root" dentry, which really marks the super block
> > and keep a reference to the dentry. Than every event on that super block
> > is filtered with is_subdir() against the marked dentry.
>
> And that is_subdir() is protected by what, exactly?  And what happens
> if you have many such dentries?
>
> Or, for that matter, what happens if that dentry gets invalidated?

Well, as I said, its just a POC, it only supports filtering by a single dentry
and I didn't think that was the way to go forward. I am looking for the best way
to go forward.

That's why I was looking for an API to "fence" renaming objects in/out of
of a subtree root. It sounds useful to have something like this for
containers and chroots.

Let's look at the options users have today.
Users can use recursive inotify that is racy and pins too many inodes in cache.
Users can use the new fanotify sb mark to get all events on filesystem and
filter them by path is userspace (also racy w.r.t ancestry).

I donno, maybe filtering by projid or another inherited persistent
inode property
is good enough for the existing use cases out there - this seems to be the way
ext4 is going with encrypted subtrees and case insensitive subtrees.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-14 16:04 fanotify and LSM path hooks Amir Goldstein
  2019-04-14 16:39 ` Al Viro
@ 2019-04-16 15:45 ` Jan Kara
  2019-04-16 18:24   ` Amir Goldstein
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Kara @ 2019-04-16 15:45 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, LSM List, Serge E. Hallyn, James Morris,
	Al Viro, Miklos Szeredi, Matthew Bobrowski, Kentaro Takeda,
	Tetsuo Handa, John Johansen

Hi Amir!

On Sun 14-04-19 19:04:14, Amir Goldstein wrote:
> I started to look at directory pre-modification "permission" hooks
> that were discussed on last year's LSFMM:
> https://lwn.net/Articles/755277/
> 
> The two existing fanotify_perm() hooks are called
> from security_file_permission() and security_file_open()
> and depend on build time CONFIG_SECURITY.
> If you look at how the fsnotify_perm() hooks are planted inside the
> generic security hooks, one might wonder, why are fanotify permission
> hooks getting a special treatment and are not registering as LSM hooks?
> 
> One benefit from an fanotify LSM, besides more generic code, would be
> that fanotify permission hooks could be disabled with boot parameters.
> 
> I only bring this up because security hooks seems like the most natural
> place to add pre-modify fanotify events for the purpose of maintaining
> a filesystem change journal. It would be ugly to spray more fsnotify hooks
> inside security hooks instead of registering an fanotify LSM, but maybe
> there are downsides of registering fanotify as LSM that I am not aware of?

I kind of like the idea of generating fanotify permission events from
special LSM hooks.

I'm not so sure about directory pre-modification hooks. Given the amount of
problems we face with applications using fanotify permission events and
deadlocking the system, I'm not very fond of expanding that API... AFAIU
you want to use such hooks for recording (and persisting) that some change
is going to happen and provide crash-consistency guarantees for such
journal?

> Another observation relates to the security_path_ hooks.
> Let's take rename as an example.
> LSM could implement security_path_rename() and/or security_inode_rename()
> hooks and rename syscalls will end up calling both hooks.
> The security_path_ hooks are more attractive for fanotify, because the path
> information could be used to setup pre-modification permission mask on
> mount marks and not only on filesystem/inode marks.
> 
> One problem with security_path_ hooks is that they require an extra
> build time CONFIG_SECURITY_PATH.
> Another problem is that they seem to be bypassed by several subsystems.
> cachefiles, ecryptfs, overlayfs and nfsd all call the vfs_rename() helper, but
> only cachefiles bothers to call the security_path_rename() hook.
> This is of course true for all other security_path_ hooks.
> I think that is something that requires fixing regardless of the fanotify pre
> modification hooks. I wonder if tomoyo and apparmor developers
> (LSM that implement security_path_ hooks) are aware of those missing
> hooks?
> 
> Would love to get feedback about whether or not fanotify LSM sounds
> like a good or bad idea and about the security_path_ hooks questions.

I don't have strong opinion on using security_path_ hooks. I guess if
they're not used everywhere then it's just easier to avoid them.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-16 15:45 ` Jan Kara
@ 2019-04-16 18:24   ` Amir Goldstein
  2019-04-17 11:30     ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2019-04-16 18:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Al Viro, Miklos Szeredi, Matthew Bobrowski,
	LSM List, overlayfs

On Tue, Apr 16, 2019 at 6:45 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi Amir!
>
> On Sun 14-04-19 19:04:14, Amir Goldstein wrote:
> > I started to look at directory pre-modification "permission" hooks
> > that were discussed on last year's LSFMM:
> > https://lwn.net/Articles/755277/
> >
> > The two existing fanotify_perm() hooks are called
> > from security_file_permission() and security_file_open()
> > and depend on build time CONFIG_SECURITY.
> > If you look at how the fsnotify_perm() hooks are planted inside the
> > generic security hooks, one might wonder, why are fanotify permission
> > hooks getting a special treatment and are not registering as LSM hooks?
> >
> > One benefit from an fanotify LSM, besides more generic code, would be
> > that fanotify permission hooks could be disabled with boot parameters.
> >
> > I only bring this up because security hooks seems like the most natural
> > place to add pre-modify fanotify events for the purpose of maintaining
> > a filesystem change journal. It would be ugly to spray more fsnotify hooks
> > inside security hooks instead of registering an fanotify LSM, but maybe
> > there are downsides of registering fanotify as LSM that I am not aware of?
>
> I kind of like the idea of generating fanotify permission events from
> special LSM hooks.
>

Cool, I think that all we really need to do is call security_add_hooks().
[reducing LSM CC list]

> I'm not so sure about directory pre-modification hooks. Given the amount of
> problems we face with applications using fanotify permission events and
> deadlocking the system, I'm not very fond of expanding that API... AFAIU
> you want to use such hooks for recording (and persisting) that some change
> is going to happen and provide crash-consistency guarantees for such
> journal?
>

That's the general idea.
I have two use cases for pre-modification hooks:
1. VFS level snapshots
2. persistent change tracking

TBH, I did not consider implementing any of the above in userspace,
so I do not have a specific interest in extending the fanotify API.
I am actually interested in pre-modify fsnotify hooks (not fanotify),
that a snapshot or change tracking subsystem can register with.
An in-kernel fsnotify event handler can set a flag in current task
struct to circumvent system deadlocks on nested filesystem access.

My current implementation of overlayfs snapshots [1] uses a stackable
filesystem (a.k.a. snapshot fs) as a means for pre-modify hooks.
This approach has some advantages and some disadvantages
compared to fsnotify pre-modify hooks.

With fsnotify pre-modify hooks it would be possible to protect
the underlying filesystem from un-monitored changes even when
filesystem is accessed from another mount point or another namespace.

As a matter of fact, the protection to underlying filesystem changes
needed for overlayfs snapshots is also useful for standard overlayfs -
Modification to underlying overlayfs layers are strongly discouraged,
but there is nothing preventing the user from making such modifications.
If overlayfs were to register for fsnotify pre-modify hooks on underlying
filesystem, it could prevent users from modifying underlying layers.

And not only that - because security_inode_rename() hook is called
with s_vfs_rename_mutex held, it is safe to use d_ancestor() to
prevent renames in and out of overlay layer subtrees.
With that protection in place, it is safe (?) to use is_subdir() in other
hooks to check if an object is under an overlayfs layer subtree,
because the entire ancestry below the layers roots is stable.

Will see if I can sketch a POC.

Thanks,
Amir.

[1] https://github.com/amir73il/overlayfs/wiki/Overlayfs-snapshots

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-16 18:24   ` Amir Goldstein
@ 2019-04-17 11:30     ` Jan Kara
  2019-04-17 12:14       ` Miklos Szeredi
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2019-04-17 11:30 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Al Viro, Miklos Szeredi,
	Matthew Bobrowski, LSM List, overlayfs

On Tue 16-04-19 21:24:44, Amir Goldstein wrote:
> > I'm not so sure about directory pre-modification hooks. Given the amount of
> > problems we face with applications using fanotify permission events and
> > deadlocking the system, I'm not very fond of expanding that API... AFAIU
> > you want to use such hooks for recording (and persisting) that some change
> > is going to happen and provide crash-consistency guarantees for such
> > journal?
> >
> 
> That's the general idea.
> I have two use cases for pre-modification hooks:
> 1. VFS level snapshots
> 2. persistent change tracking
> 
> TBH, I did not consider implementing any of the above in userspace,
> so I do not have a specific interest in extending the fanotify API.
> I am actually interested in pre-modify fsnotify hooks (not fanotify),
> that a snapshot or change tracking subsystem can register with.
> An in-kernel fsnotify event handler can set a flag in current task
> struct to circumvent system deadlocks on nested filesystem access.

OK, I'm not opposed to fsnotify pre-modify hooks as such. As long as
handlers stay within the kernel, I'm fine with that. After all this is what
LSMs are already doing. Just exposing this to userspace for arbitration is
what I have a problem with.

> My current implementation of overlayfs snapshots [1] uses a stackable
> filesystem (a.k.a. snapshot fs) as a means for pre-modify hooks.
> This approach has some advantages and some disadvantages
> compared to fsnotify pre-modify hooks.
> 
> With fsnotify pre-modify hooks it would be possible to protect
> the underlying filesystem from un-monitored changes even when
> filesystem is accessed from another mount point or another namespace.
> 
> As a matter of fact, the protection to underlying filesystem changes
> needed for overlayfs snapshots is also useful for standard overlayfs -
> Modification to underlying overlayfs layers are strongly discouraged,
> but there is nothing preventing the user from making such modifications.
> If overlayfs were to register for fsnotify pre-modify hooks on underlying
> filesystem, it could prevent users from modifying underlying layers.
> 
> And not only that - because security_inode_rename() hook is called
> with s_vfs_rename_mutex held, it is safe to use d_ancestor() to
> prevent renames in and out of overlay layer subtrees.
> With that protection in place, it is safe (?) to use is_subdir() in other
> hooks to check if an object is under an overlayfs layer subtree,
> because the entire ancestry below the layers roots is stable.

Uf, OK, should be. But it looks subtle. Not sure what Al will say about it
;).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-17 11:30     ` Jan Kara
@ 2019-04-17 12:14       ` Miklos Szeredi
  2019-04-17 14:05         ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2019-04-17 12:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, linux-fsdevel, Al Viro, Matthew Bobrowski,
	LSM List, overlayfs

On Wed, Apr 17, 2019 at 1:30 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 16-04-19 21:24:44, Amir Goldstein wrote:
> > > I'm not so sure about directory pre-modification hooks. Given the amount of
> > > problems we face with applications using fanotify permission events and
> > > deadlocking the system, I'm not very fond of expanding that API... AFAIU
> > > you want to use such hooks for recording (and persisting) that some change
> > > is going to happen and provide crash-consistency guarantees for such
> > > journal?
> > >
> >
> > That's the general idea.
> > I have two use cases for pre-modification hooks:
> > 1. VFS level snapshots
> > 2. persistent change tracking
> >
> > TBH, I did not consider implementing any of the above in userspace,
> > so I do not have a specific interest in extending the fanotify API.
> > I am actually interested in pre-modify fsnotify hooks (not fanotify),
> > that a snapshot or change tracking subsystem can register with.
> > An in-kernel fsnotify event handler can set a flag in current task
> > struct to circumvent system deadlocks on nested filesystem access.
>
> OK, I'm not opposed to fsnotify pre-modify hooks as such. As long as
> handlers stay within the kernel, I'm fine with that. After all this is what
> LSMs are already doing. Just exposing this to userspace for arbitration is
> what I have a problem with.

There's one more usecase that I'd like to explore: providing coherent
view of host filesystem in virtualized environments.  This requires
that guest is synchronously notified when the host filesystem changes.
  I do agree, however, that adding sync hooks to userspace is
problematic.

One idea would be to use shared memory instead of a procedural
notification.  I.e. application (hypervisor) registers a pointer to a
version number that the kernel associates with the given inode.  When
the inode is changed, then the version number is incremented.  The
guest kernel can then look at the version number when verifying cache
validity.   That way perfect coherency is guaranteed between host and
guest filesystems without allowing a broken guest or even a broken
hypervisor to DoS the host.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-17 12:14       ` Miklos Szeredi
@ 2019-04-17 14:05         ` Jan Kara
  2019-04-17 14:14           ` Miklos Szeredi
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2019-04-17 14:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, Al Viro,
	Matthew Bobrowski, LSM List, overlayfs

On Wed 17-04-19 14:14:58, Miklos Szeredi wrote:
> On Wed, Apr 17, 2019 at 1:30 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 16-04-19 21:24:44, Amir Goldstein wrote:
> > > > I'm not so sure about directory pre-modification hooks. Given the amount of
> > > > problems we face with applications using fanotify permission events and
> > > > deadlocking the system, I'm not very fond of expanding that API... AFAIU
> > > > you want to use such hooks for recording (and persisting) that some change
> > > > is going to happen and provide crash-consistency guarantees for such
> > > > journal?
> > > >
> > >
> > > That's the general idea.
> > > I have two use cases for pre-modification hooks:
> > > 1. VFS level snapshots
> > > 2. persistent change tracking
> > >
> > > TBH, I did not consider implementing any of the above in userspace,
> > > so I do not have a specific interest in extending the fanotify API.
> > > I am actually interested in pre-modify fsnotify hooks (not fanotify),
> > > that a snapshot or change tracking subsystem can register with.
> > > An in-kernel fsnotify event handler can set a flag in current task
> > > struct to circumvent system deadlocks on nested filesystem access.
> >
> > OK, I'm not opposed to fsnotify pre-modify hooks as such. As long as
> > handlers stay within the kernel, I'm fine with that. After all this is what
> > LSMs are already doing. Just exposing this to userspace for arbitration is
> > what I have a problem with.
> 
> There's one more usecase that I'd like to explore: providing coherent
> view of host filesystem in virtualized environments.  This requires
> that guest is synchronously notified when the host filesystem changes.
>   I do agree, however, that adding sync hooks to userspace is
> problematic.
> 
> One idea would be to use shared memory instead of a procedural
> notification.  I.e. application (hypervisor) registers a pointer to a
> version number that the kernel associates with the given inode.  When
> the inode is changed, then the version number is incremented.  The
> guest kernel can then look at the version number when verifying cache
> validity.   That way perfect coherency is guaranteed between host and
> guest filesystems without allowing a broken guest or even a broken
> hypervisor to DoS the host.

Well, statx() and looking at i_version can do this for you. So I guess
that's too slow for your purposes? Also how many inodes do you want to
monitor like this?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-17 14:05         ` Jan Kara
@ 2019-04-17 14:14           ` Miklos Szeredi
  2019-04-18 10:53             ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2019-04-17 14:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, linux-fsdevel, Al Viro, Matthew Bobrowski,
	LSM List, overlayfs

On Wed, Apr 17, 2019 at 4:06 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 17-04-19 14:14:58, Miklos Szeredi wrote:
> > On Wed, Apr 17, 2019 at 1:30 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Tue 16-04-19 21:24:44, Amir Goldstein wrote:
> > > > > I'm not so sure about directory pre-modification hooks. Given the amount of
> > > > > problems we face with applications using fanotify permission events and
> > > > > deadlocking the system, I'm not very fond of expanding that API... AFAIU
> > > > > you want to use such hooks for recording (and persisting) that some change
> > > > > is going to happen and provide crash-consistency guarantees for such
> > > > > journal?
> > > > >
> > > >
> > > > That's the general idea.
> > > > I have two use cases for pre-modification hooks:
> > > > 1. VFS level snapshots
> > > > 2. persistent change tracking
> > > >
> > > > TBH, I did not consider implementing any of the above in userspace,
> > > > so I do not have a specific interest in extending the fanotify API.
> > > > I am actually interested in pre-modify fsnotify hooks (not fanotify),
> > > > that a snapshot or change tracking subsystem can register with.
> > > > An in-kernel fsnotify event handler can set a flag in current task
> > > > struct to circumvent system deadlocks on nested filesystem access.
> > >
> > > OK, I'm not opposed to fsnotify pre-modify hooks as such. As long as
> > > handlers stay within the kernel, I'm fine with that. After all this is what
> > > LSMs are already doing. Just exposing this to userspace for arbitration is
> > > what I have a problem with.
> >
> > There's one more usecase that I'd like to explore: providing coherent
> > view of host filesystem in virtualized environments.  This requires
> > that guest is synchronously notified when the host filesystem changes.
> >   I do agree, however, that adding sync hooks to userspace is
> > problematic.
> >
> > One idea would be to use shared memory instead of a procedural
> > notification.  I.e. application (hypervisor) registers a pointer to a
> > version number that the kernel associates with the given inode.  When
> > the inode is changed, then the version number is incremented.  The
> > guest kernel can then look at the version number when verifying cache
> > validity.   That way perfect coherency is guaranteed between host and
> > guest filesystems without allowing a broken guest or even a broken
> > hypervisor to DoS the host.
>
> Well, statx() and looking at i_version can do this for you. So I guess
> that's too slow for your purposes?

Okay, missing piece of information: we want to make use of the dcache
and icache in the guest kernel, otherwise lookup/stat will be
painfully slow.  That would preclude doing statx() or anything else
that requires a synchronous round trip to the host for the likely case
of a valid cache.

> Also how many inodes do you want to
> monitor like this?

Everything that's in the guest caches.  Which means: a lot.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fanotify and LSM path hooks
  2019-04-17 14:14           ` Miklos Szeredi
@ 2019-04-18 10:53             ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2019-04-18 10:53 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, Al Viro,
	Matthew Bobrowski, LSM List, overlayfs

On Wed 17-04-19 16:14:32, Miklos Szeredi wrote:
> On Wed, Apr 17, 2019 at 4:06 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 17-04-19 14:14:58, Miklos Szeredi wrote:
> > > On Wed, Apr 17, 2019 at 1:30 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Tue 16-04-19 21:24:44, Amir Goldstein wrote:
> > > > > > I'm not so sure about directory pre-modification hooks. Given the amount of
> > > > > > problems we face with applications using fanotify permission events and
> > > > > > deadlocking the system, I'm not very fond of expanding that API... AFAIU
> > > > > > you want to use such hooks for recording (and persisting) that some change
> > > > > > is going to happen and provide crash-consistency guarantees for such
> > > > > > journal?
> > > > > >
> > > > >
> > > > > That's the general idea.
> > > > > I have two use cases for pre-modification hooks:
> > > > > 1. VFS level snapshots
> > > > > 2. persistent change tracking
> > > > >
> > > > > TBH, I did not consider implementing any of the above in userspace,
> > > > > so I do not have a specific interest in extending the fanotify API.
> > > > > I am actually interested in pre-modify fsnotify hooks (not fanotify),
> > > > > that a snapshot or change tracking subsystem can register with.
> > > > > An in-kernel fsnotify event handler can set a flag in current task
> > > > > struct to circumvent system deadlocks on nested filesystem access.
> > > >
> > > > OK, I'm not opposed to fsnotify pre-modify hooks as such. As long as
> > > > handlers stay within the kernel, I'm fine with that. After all this is what
> > > > LSMs are already doing. Just exposing this to userspace for arbitration is
> > > > what I have a problem with.
> > >
> > > There's one more usecase that I'd like to explore: providing coherent
> > > view of host filesystem in virtualized environments.  This requires
> > > that guest is synchronously notified when the host filesystem changes.
> > >   I do agree, however, that adding sync hooks to userspace is
> > > problematic.
> > >
> > > One idea would be to use shared memory instead of a procedural
> > > notification.  I.e. application (hypervisor) registers a pointer to a
> > > version number that the kernel associates with the given inode.  When
> > > the inode is changed, then the version number is incremented.  The
> > > guest kernel can then look at the version number when verifying cache
> > > validity.   That way perfect coherency is guaranteed between host and
> > > guest filesystems without allowing a broken guest or even a broken
> > > hypervisor to DoS the host.
> >
> > Well, statx() and looking at i_version can do this for you. So I guess
> > that's too slow for your purposes?
> 
> Okay, missing piece of information: we want to make use of the dcache
> and icache in the guest kernel, otherwise lookup/stat will be
> painfully slow.  That would preclude doing statx() or anything else
> that requires a synchronous round trip to the host for the likely case
> of a valid cache.

Ok, understood.
 
> > Also how many inodes do you want to
> > monitor like this?
> 
> Everything that's in the guest caches.  Which means: a lot.

Yeah, but that would mean also non-trivial amount of memory pinned for this
communication channel... And AFAIU the cost of invalidation going to the
guest isn't so critical as that isn't expected to be that frequent. It is
only the cost of 'is_valid' check that needs to be low to make the caching
in the guest useful. So won't it be better if we just had some kind of ring
buffer for invalidation events going from host to guest, host could just
queue event there (i.e., event processing from the host would have well
bounded time like processing normal fsnotify events has), guest would be
consuming events and updating its validity information. If the buffer
overflows, it means "invalidate all" which is expensive for the guest but
if buffers are reasonably sized, it should not happen frequently...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-04-18 10:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-04-14 16:04 fanotify and LSM path hooks Amir Goldstein
2019-04-14 16:39 ` Al Viro
2019-04-14 18:51   ` Amir Goldstein
2019-04-14 19:26     ` Al Viro
2019-04-14 20:28       ` Amir Goldstein
2019-04-16 15:45 ` Jan Kara
2019-04-16 18:24   ` Amir Goldstein
2019-04-17 11:30     ` Jan Kara
2019-04-17 12:14       ` Miklos Szeredi
2019-04-17 14:05         ` Jan Kara
2019-04-17 14:14           ` Miklos Szeredi
2019-04-18 10:53             ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox