Re: [GIT PULL] notification tree: directory events

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [GIT PULL] notification tree: directory events
       [not found]   ` <1282230012.21419.1566.camel@acb20005.ipt.aol.com>
@ 2010-08-19 23:41     ` Andreas Gruenbacher
  2010-08-20  3:38       ` Eric Paris
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-19 23:41 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

[Adding linux-fsdevel to the list as this really is a filesystem discussion.]

On Thursday 19 August 2010 17:00:12 Eric Paris wrote:
> On Thu, 2010-08-19 at 14:44 +0200, Andreas Gruenbacher wrote:
> > On Wednesday 18 August 2010 17:59:06 Eric Paris wrote:
> > 
> > The half-thought-out directory events are not part of this subset though.
> > They are not useful in their own right and only generate overheads, and
> > much worse, they could even get in the way when proper directory event
> > support is eventually added.  So that part should really be removed
> > ASAP.
> 
> They aren't something I specifically added so you can call them
> zero-thought-out.  fanotify is just a userspace interface on top of the
> notification infrastructure in the kernel.  The events the
> infrastructure delivers are the events it sends.

Apparently the infrastructure delivers a number of directory events like file 
creation, deletion, and renames through inotify that are not delivered through 
fanotify, and fanotify only sees a subset of all directory events.  The result 
is that directory events for inotify are useful because they allow to watch a 
directory for changes, and the directory events in fanotify are not useful for 
that right now.

>[...]
>> Your changes could be as simple as defining a new flag and then add an
> if (flag && S_IFDIR(i_mode)) to the fanotify_should_send_event handler.
> I'd love to hear objections, failings, short comings, or suggestions if
> you think the interface is inadequate to address these or other needs.

I don't think the events that fanotify delivers for directories (open_perm, 
open, access_perm, close) are useful at all, or that we should allow perm 
events for directories.

So let's please get rid of those directory events unconditionally now, and add 
them back in their proper, final form later, after 2.6.36.

> > We are not talking about Eric's own private syscalls here.  Things we
> > screw up now may be very hard or impossible to fix later; syscalls don't
> > just change from release to release.
> 
> Which is why there was so much discussion and reworking of the
> interface.  It went through many iterations to end up here.  What all
> did we have in those discussions?  2 magic proc files, ioctls on a char
> dev, netlink, a socket with get/setsockopt and eventually we landed on 2
> syscalls that noone found fault with.

Unfortunately there wasn't a lot of discussion about which events should be 
generated when.  Fanotify was merged before turning into a superset of 
inotify, and there was even less discussion about how fanotify should look if 
it isn't a superset of inotify.

> > This also applies to the error reporting mess I have commented on.  At
> > least the interface should be changed so that it can report a valid file
> > descriptor and an error condition at the same time; otherwise, we will
> > end up with a weakness in the interface that we won't be able to fix.
> 
> Can you describe your problem for me again.  I'm not sure I understand
> your request.  I don't understand how we have an error and a valid fd at
> the same time.

Yes.  Right now, struct fanotify_event_metadata contains a file descriptor 
which is the only information we have about the object the event refers to, or 
a negative error code if a file descriptor could not be opened with 
dentry_open() or some other error occurred.

Being able to identify the object that an event refers to is important.  There 
are two ways to do that:

	(1) Include fields like st_dev and st_ino in struct
	    fanotify_event_metadata.  This is not ideal because the listener
	    won't be able to find out any additional information about the file
	    (like an idea about its name, for example).

		 For example, if inode->i_generation is ever exposed to user-space
	    through stat(), that information would still be missing in struct
	    fanotify_event_metadata.

	(2) Construct a file descriptor that refers to the file that could not be
	    dentry_open()ed, but which cannot be used for any I/O, and pass the
	    error condition in a separate field.  The kernel has all the
	    information needed for doing that, and it shouldn't be hard to
	    implement.

	    That way, the listener always has a file descriptor to poke around
	    with.

Failing to do (2) right now, I think it still makes sense to separate the file 
descriptor from the error code in struct fanotify_event_metadata; this would 
enable us to do (2) later if we decide to.

> I understand you want new features but I'm not seeing failures of the
> interface to be forward looking or failures in the feature set that is
> provided.

I do see failures of being forward looking, inconsistencies in the feature set 
which might lead to trouble as we extend fanotify in the future, and bugs that 
would have been shaken out in a proper review (OOMing the system when a 
listener stops listening or blocking access to files when a listener dies; I 
have reported that).

Sorry to say, but this code really should not have been merged yet.  (And mind 
I'm not saying this because I want to block fanotify from making progress -- 
quite the contrary.)

Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
       [not found]   ` <201008171009.51737.agruen@suse.de>
@ 2010-08-20  0:00     ` Andreas Gruenbacher
  0 siblings, 0 replies; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20  0:00 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

[Adding linux-fsdevel here as well.]

On Tuesday 17 August 2010 10:09:50 Andreas Gruenbacher wrote:
> > > Q: What prevents the system from going out of memory when a listener
> > > decides to stop reading events or simply can't keep up?  There doesn't
> > > seem to be a limit on the queue depth.  Listeners currently need
> > > CAP_SYS_ADMIN, but somehow limiting the queue depth and throttling when
> > > things start to go bad still sounds like a reasonable thing to do,
> > > right?
> > 
> > It's an interesting question and obviously one that I've thought about.
> > You remember when we talked previously I said the hardest part left was
> > allowing non-root users to use the interface.  It gets especially
> > difficult when thinking about perm-events.  I was specifically told not
> > to timeout or drop those.  But when dealing with non-root users using
> > perm events?   As for pure notification we can do something like inotify
> > does quite easily.
> > 
> > I'm not certain exactly what the best semantics are for non trusted
> > users, so I didn't push any patches that way.  Suggestions welcome   :)
> 
> The system will happily go OOM for trusted users and non-perm events if the
> listener doesn't keep up, so some throttling, dropping, or both needs to
> happen for non-perm events.  This is the critical case.  Doing what inotify
> does (queue an overflow event and drop further events) seems to make sense
> here.
> 
> The situation with perm-events is less severe because the number of
> outstanding perm events is bounded by the number of running processes. 
> This may be enough of a limit.
> 
> I don't think we need to worry about perm-events for untrusted users.  We
> can start supporting some kinds of non-perm-events for untrusted users
> later; this won't change the existing interface.

Another case where fanotify fails to generate useful events is when a listener 
runs out of file descriptors; events will simply end up with fd == -EMFILE in 
that case.  I don't think this behavior is useful; instead, reading from the 
fanotify file descriptor (he one returned by fanotify_init()) should fail to 
give the listener a chance to react.

Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-19 23:41     ` [GIT PULL] notification tree: directory events Andreas Gruenbacher
@ 2010-08-20  3:38       ` Eric Paris
  2010-08-20  5:19         ` Andreas Dilger
                           ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Eric Paris @ 2010-08-20  3:38 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
> [Adding linux-fsdevel to the list as this really is a filesystem discussion.]

> Yes.  Right now, struct fanotify_event_metadata contains a file descriptor 
> which is the only information we have about the object the event refers to, or 
> a negative error code if a file descriptor could not be opened with 
> dentry_open() or some other error occurred.
> 
> Being able to identify the object that an event refers to is important.  There 
> are two ways to do that:
> 
> 	(1) Include fields like st_dev and st_ino in struct
> 	    fanotify_event_metadata.  This is not ideal because the listener
> 	    won't be able to find out any additional information about the file
> 	    (like an idea about its name, for example).
> 
> 		 For example, if inode->i_generation is ever exposed to user-space
> 	    through stat(), that information would still be missing in struct
> 	    fanotify_event_metadata.
> 
> 	(2) Construct a file descriptor that refers to the file that could not be
> 	    dentry_open()ed, but which cannot be used for any I/O, and pass the
> 	    error condition in a separate field.  The kernel has all the
> 	    information needed for doing that, and it shouldn't be hard to
> 	    implement.
> 
> 	    That way, the listener always has a file descriptor to poke around
> 	    with.
> 
> Failing to do (2) right now, I think it still makes sense to separate the file 
> descriptor from the error code in struct fanotify_event_metadata; this would 
> enable us to do (2) later if we decide to.

In reference to (2), I don't even understand what an fd is that can't be
used for anything.  I'll let Al or Christoph respond if they feel like
it, but it sounds crazy to me.  You want to just magic up some struct
file and attach a struct path to it even though dentry_open() failed?
So you can do what with it?

On a more realistic note, I'm not opposed to (1), however, your
arguments would lead one to reject inotify as the IN_OVERFLOW or oom
conditions will result in silently lost events or events which provide
no useful information since the notification system has broken down.
When the appropriate use of notification is impossible I'm certainly not
opposed to patches which add best effort information, but you are
already outside the bounds of a reasonably functional system and there
is no good solution.

> bugs that 
> would have been shaken out in a proper review (OOMing the system when a 
> listener stops listening or blocking access to files when a listener dies; I 
> have reported that).

So the actual bugs you have reported, I see two.

The (nearly) unbounded number of potential outstanding notifications
events is a known situation, pointed out in previous discussions well
before this commit and is one of the (numerous) reasons why fanotify is
at this time CAP_SYS_ADMIN only.  It is something that is difficult to
address while still making fanotify useful for permissions gating.  But
the issue is clearly noted.

As to when a listener dies:  You have to define 'dies'.  If the process
just stops respond it is supposed to lock forever.  I was explicitly ask
to remove timeouts (even though I've already been ask off list to put
them back)  In Linus' tree there is a race in which both processes (the
listener and the process doing an action waiting on the listener) can
deadlock and hang forever.  A (much less racy but not right) patch to
fix this has already been posted.  Do you have comments on that patch?
I have a better version which has worked well for me today which I will
likely post tomorrow morning after I look over it again.  I'd love your
feedback.

-Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38       ` Eric Paris
@ 2010-08-20  5:19         ` Andreas Dilger
  2010-08-20  9:21           ` Christoph Hellwig
  2010-08-20  9:09         ` Tvrtko Ursulin
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Andreas Dilger @ 2010-08-20  5:19 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk, linux-fsdevel

On 2010-08-19, at 21:38, Eric Paris wrote:
> On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
>> Being able to identify the object that an event refers to is important.  There 
>> are two ways to do that:
>> 
>> 	(1) Include fields like st_dev and st_ino in struct
>> 	    fanotify_event_metadata.
> 
> On a more realistic note, I'm not opposed to (1), however, your
> arguments would lead one to reject inotify as the IN_OVERFLOW or oom
> conditions will result in silently lost events or events which provide
> no useful information since the notification system has broken down.
> When the appropriate use of notification is impossible I'm certainly not
> opposed to patches which add best effort information, but you are
> already outside the bounds of a reasonably functional system and there
> is no good solution.

What about unifying the file identification here with the file handle used for open_by_handle()?  That keeps a consistent identifier for the inode between multiple system calls (always a good thing), and allows the inode to be opened again via open_by_handle() if needed.

Cheers, Andreas






^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38       ` Eric Paris
  2010-08-20  5:19         ` Andreas Dilger
@ 2010-08-20  9:09         ` Tvrtko Ursulin
  2010-08-20 11:07         ` Andreas Gruenbacher
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2010-08-20  9:09 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
	Michael Kerrisk, linux-fsdevel@vger.kernel.org

On Friday 20 Aug 2010 04:38:17 Eric Paris wrote:
> On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
> >     (2) Construct a file descriptor that refers to the file that could not
> >     be
> >
> >         dentry_open()ed, but which cannot be used for any I/O, and pass
the
> >         error condition in a separate field.  The kernel has all the
> >         information needed for doing that, and it shouldn't be hard to
> >         implement.
> >
> >         That way, the listener always has a file descriptor to poke around
> >         with.
> >
> > Failing to do (2) right now, I think it still makes sense to separate the
> > file descriptor from the error code in struct fanotify_event_metadata;
> > this would enable us to do (2) later if we decide to.
>
> In reference to (2), I don't even understand what an fd is that can't be
> used for anything.  I'll let Al or Christoph respond if they feel like
> it, but it sounds crazy to me.  You want to just magic up some struct
> file and attach a struct path to it even though dentry_open() failed?
> So you can do what with it?

I think Andreas would like to get a path even if open failed so it could be
for that use? If creating such file object will be impossible or too hacky,
but the general idea accepted, it may be better just to attach the path to the
event.

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  5:19         ` Andreas Dilger
@ 2010-08-20  9:21           ` Christoph Hellwig
  2010-08-20 15:29             ` Andreas Gruenbacher
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2010-08-20  9:21 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Eric Paris, Andreas Gruenbacher, Christoph Hellwig, Matt Helsley,
	torvalds, linux-kernel, viro, akpm, Michael Kerrisk,
	linux-fsdevel

On Thu, Aug 19, 2010 at 11:19:07PM -0600, Andreas Dilger wrote:
> What about unifying the file identification here with the file handle used for open_by_handle()?  That keeps a consistent identifier for the inode between multiple system calls (always a good thing), and allows the inode to be opened again via open_by_handle() if needed.

That's what the dmapi callouts that are intendeded to do just about the
same as fanotify always did.  I'm pretty sure I brought this up earlier
in the discussion.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38       ` Eric Paris
  2010-08-20  5:19         ` Andreas Dilger
  2010-08-20  9:09         ` Tvrtko Ursulin
@ 2010-08-20 11:07         ` Andreas Gruenbacher
  2010-08-20 11:25         ` Andreas Gruenbacher
  2010-08-20 12:16         ` Andreas Gruenbacher
  4 siblings, 0 replies; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 11:07 UTC (permalink / raw)
  To: Eric Paris, David Howells
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 05:38:17 Eric Paris wrote:
> On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
> > 	(2) Construct a file descriptor that refers to the file that could not be
> > 	    dentry_open()ed, but which cannot be used for any I/O, and pass the
> > 	    error condition in a separate field.  The kernel has all the
> > 	    information needed for doing that, and it shouldn't be hard to
> > 	    implement.
> > 	    
> > 	    That way, the listener always has a file descriptor to poke around
> > 	    with.
> > 
> > Failing to do (2) right now, I think it still makes sense to separate the
> > file descriptor from the error code in struct fanotify_event_metadata;
> > this would enable us to do (2) later if we decide to.
> 
> In reference to (2), I don't even understand what an fd is that can't be
> used for anything.  I'll let Al or Christoph respond if they feel like
> it, but it sounds crazy to me.  You want to just magic up some struct
> file and attach a struct path to it even though dentry_open() failed?
> So you can do what with it?

I've never said the fd can't be used for anything.  The idea would be to
allow the file to be stat()ed, to poke around in /proc/fd/ in the hope to
get a somewhat useful pathname out, or to use something like xstat() on the
fd one day (David Howells is going to be looking into that).

It would be a good thing to be able to use the same mechanisms for
identifying the file that are available for a "normal" file descriptor.

> On a more realistic note, I'm not opposed to (1), however, your
> arguments would lead one to reject inotify as the IN_OVERFLOW or oom
> conditions will result in silently lost events or events which provide
> no useful information since the notification system has broken down.

Nonsense.  An overflow event obviously is not associated with any one file,
so that would be the (only) case where you should get an event that doesn't
refer to a file.

Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38       ` Eric Paris
                           ` (2 preceding siblings ...)
  2010-08-20 11:07         ` Andreas Gruenbacher
@ 2010-08-20 11:25         ` Andreas Gruenbacher
  2010-08-20 12:16         ` Andreas Gruenbacher
  4 siblings, 0 replies; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 11:25 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 05:38:17 Eric Paris wrote:
> So the actual bugs you have reported, I see two.
> 
> The (nearly) unbounded number of potential outstanding notifications
> events is a known situation, pointed out in previous discussions well
> before this commit and is one of the (numerous) reasons why fanotify is
> at this time CAP_SYS_ADMIN only.  It is something that is difficult to
> address while still making fanotify useful for permissions gating.  But
> the issue is clearly noted.

Clearly noting and blissfully ignoring the problem is not enough
unfortunately; this needs to be addressed now.  I have already pointed out 
(quoted below) that permission gating is not the worst problem here; the worst 
problem are listeners whose fanotify event queue just grows and grows.  There 
is no throttling, and no guarantee that even a listener which simply reads and 
completely ignores all events will manage to keep up.  The system will run out 
of memory eventually.

Here is a quote from a previous message about this problem that only went to 
linux-kernel:

On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > Q: What prevents the system from going out of memory when a listener
> > decides to stop reading events or simply can't keep up?  There doesn't
> > seem to be a limit on the queue depth.  Listeners currently need
> > CAP_SYS_ADMIN, but somehow limiting the queue depth and throttling when
> > things start to go bad still sounds like a reasonable thing to do,
> > right?
> 
> It's an interesting question and obviously one that I've thought about.
> You remember when we talked previously I said the hardest part left was
> allowing non-root users to use the interface.  It gets especially
> difficult when thinking about perm-events.  I was specifically told not
> to timeout or drop those.  But when dealing with non-root users using
> perm events?   As for pure notification we can do something like inotify
> does quite easily.
> 
> I'm not certain exactly what the best semantics are for non trusted
> users, so I didn't push any patches that way.  Suggestions welcome   :)

The system will happily go OOM for trusted users and non-perm events if the 
listener doesn't keep up, so some throttling, dropping, or both needs to 
happen for non-perm events.  This is the critical case.  Doing what inotify 
does (queue an overflow event and drop further events) seems to make sense 
here.

The situation with perm-events is less severe because the number of 
outstanding perm events is bounded by the number of running processes.  This 
may be enough of a limit.

I don't think we need to worry about perm-events for untrusted users.  We can 
start supporting some kinds of non-perm-events for untrusted users later; this 
won't change the existing interface.

Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38       ` Eric Paris
                           ` (3 preceding siblings ...)
  2010-08-20 11:25         ` Andreas Gruenbacher
@ 2010-08-20 12:16         ` Andreas Gruenbacher
  4 siblings, 0 replies; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 12:16 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 05:38:17 Eric Paris wrote:
> As to when a listener dies:  You have to define 'dies'.

I meant a process that gets killed or simply exits while there are outstanding 
perm events.  Nothing in the code would wake up the processes blocked on the 
perm event; they will simply be stuck forever.  (They cannot even be killed 
with SIGKILL.)

This is easy to reproduce with a listener that is killed while processing a 
perm event: just run the fanotify example [1] with the new -s option and hit 
^C while it is sleeping.

Bonus points would be awarded to make a process blocked on a listener killable 
with SIGKILL or other lethal signals.

[1] http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git

The listener will also hang forever in that case; not sure why that is the 
case.

I already told you about this before (http://lkml.org/lkml/2010/8/16/349); not 
sure why everything needs to repeated multiple times until it sinks in.

> If the process just stops respond it is supposed to lock forever.

I agree.

> I was explicitly ask to remove timeouts (even though I've already been ask
> off list to put them back)

I disagree with putting timeouts back in.

> In Linus' tree there is a race in which both processes (the
> listener and the process doing an action waiting on the listener) can
> deadlock and hang forever.  A (much less racy but not right) patch to
> fix this has already been posted.
>
> I have a better version which has worked well for me today which I will
> likely post tomorrow morning after I look over it again.  I'd love your
> feedback.

Do you have a pointer to that?

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
       [not found]   ` <1282276236.21419.2101.camel@acb20005.ipt.aol.com>
@ 2010-08-20 12:38     ` Andreas Gruenbacher
  2010-08-23 16:46       ` Eric Paris
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 12:38 UTC (permalink / raw)
  To: Eric Paris, linux-fsdevel
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Friday 20 August 2010 05:50:36 Eric Paris wrote:
> We must be doing something different...  What kernel?  what kconfig?
> What exact FS setup?  What exact steps are you taking?  What programs
> are you using to test east side?

I'm runnning 2.6.36-rc1, with CONFIG_FANOTIFY and 
CONFIG_FANOTIFY_ACCESS_PERMISSIONS on apparently.  I am watching the same 
directory with inotify and fanotify at the same time, that is, with both an 
inotify and an fanotify listener running in two separate processes.  The 
inotify listener is code I cannot send so easily, but I've shown the resulting 
strace.  The fanotify listener is the one from [1].

	[1] http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git

Together with the traces I've provided this should give you way enough clues 
to be able to look up in the code why listening for fanotify events apparently 
causes a concurrent inotify listener to return an inotify event with struct 
inotify_event->mask == 0 for each fanotify perm event.

Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  9:21           ` Christoph Hellwig
@ 2010-08-20 15:29             ` Andreas Gruenbacher
  2010-08-20 20:39               ` Andreas Dilger
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 15:29 UTC (permalink / raw)
  To: Christoph Hellwig, Andreas Dilger
  Cc: Eric Paris, Matt Helsley, torvalds, linux-kernel, viro, akpm,
	Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 11:21:27 Christoph Hellwig wrote:
> On Thu, Aug 19, 2010 at 11:19:07PM -0600, Andreas Dilger wrote:
> > What about unifying the file identification here with the file handle
> > used for open_by_handle()?  That keeps a consistent identifier for the
> > inode between multiple system calls (always a good thing), and allows
> > the inode to be opened again via open_by_handle() if needed.
> 
> That's what the dmapi callouts that are intendeded to do just about the
> same as fanotify always did.  I'm pretty sure I brought this up earlier
> in the discussion.

I remember you bringing this up.

The persistent handles require CAP_DAC_READ_SEARCH for using open_by_handle() 
to prevent an unprivileged process from forging handles and bypassing 
directory permission checks.  This would be okay for users of fanotify which 
can listen to all events in their namespace (and which requires CAP_SYS_ADMIN 
anyway).

I don't see how open_by_handle() could be made safe to use by arbitrary 
processes; I don't think we can make the handles cryptographically strong, for 
example.  But I may be overlooking something here.

[Side note: checking for CAP_DAC_READ_SEARCH in fanotify would probably be 
enough when no perm events are involved because dentry_open() performs tail 
permission checks anyway.]

In the future it will make sense to extend fanotify to allow unprivileged 
processes to listen to "their own" events, for example, like inotify does 
today.  (You know that providing a better inotify replacement was one of the 
goals of fanotify before it got merged anyway.)  Unprivileged processes 
wouldn't be allowed to use open_by_handle() anymore though, and so file 
handles still look like a better choice for fanotify to me.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20 15:29             ` Andreas Gruenbacher
@ 2010-08-20 20:39               ` Andreas Dilger
  0 siblings, 0 replies; 14+ messages in thread
From: Andreas Dilger @ 2010-08-20 20:39 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Eric Paris, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk, linux-fsdevel

On 2010-08-20, at 09:29, Andreas Gruenbacher wrote:
> On Friday 20 August 2010 11:21:27 Christoph Hellwig wrote:
>> On Thu, Aug 19, 2010 at 11:19:07PM -0600, Andreas Dilger wrote:
>>> What about unifying the file identification here with the file handle
>>> used for open_by_handle()?  That keeps a consistent identifier for the
>>> inode between multiple system calls (always a good thing), and allows
>>> the inode to be opened again via open_by_handle() if needed.
>> 
>> That's what the dmapi callouts that are intended to do just about the
>> same as fanotify always did.  I'm pretty sure I brought this up earlier
>> in the discussion.
> 
> I remember you bringing this up.
> 
> The persistent handles require CAP_DAC_READ_SEARCH for using open_by_handle() 
> to prevent an unprivileged process from forging handles and bypassing 
> directory permission checks.  This would be okay for users of fanotify which 
> can listen to all events in their namespace (and which requires CAP_SYS_ADMIN 
> anyway).
> 
> I don't see how open_by_handle() could be made safe to use by arbitrary 
> processes; I don't think we can make the handles cryptographically strong, for 
> example.  But I may be overlooking something here.

If the file handles only need to have a limited lifetime for unprivileged processes, then they can contain a random salt that is kept on the in-core inode.  For me and my intended HPC use case this would be a useful addition for open_by_handle() to allow unprivileged process access.  At worst, if the inode is evicted from memory the process would redo the name_to_handle(), or do the slower open-by-path().

I suspect it would also be possible to have an array of per-superblock (or global) crypto keys that are hashed with the file handle data.  That avoids the per-inode memory, and allows a well-defined lifetime for the handle (minutes, hours, days) only as a function of how quickly the crypto key needs to rotate (based on key length and attack speed) and the size of the array that is kept.

> In the future it will make sense to extend fanotify to allow unprivileged 
> processes to listen to "their own" events, for example, like inotify does 
> today.  (You know that providing a better inotify replacement was one of the 
> goals of fanotify before it got merged anyway.)  Unprivileged processes 
> wouldn't be allowed to use open_by_handle() anymore though, and so file 
> handles still look like a better choice for fanotify to me.


Cheers, Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-20 12:38     ` Andreas Gruenbacher
@ 2010-08-23 16:46       ` Eric Paris
  2010-08-23 22:38         ` Andreas Gruenbacher
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Paris @ 2010-08-23 16:46 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: linux-fsdevel, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Fri, 2010-08-20 at 14:38 +0200, Andreas Gruenbacher wrote:
> On Friday 20 August 2010 05:50:36 Eric Paris wrote:
> > We must be doing something different...  What kernel?  what kconfig?
> > What exact FS setup?  What exact steps are you taking?  What programs
> > are you using to test east side?
> 
> I'm runnning 2.6.36-rc1, with CONFIG_FANOTIFY and 
> CONFIG_FANOTIFY_ACCESS_PERMISSIONS on apparently.  I am watching the same 
> directory with inotify and fanotify at the same time, that is, with both an 
> inotify and an fanotify listener running in two separate processes.  The 
> inotify listener is code I cannot send so easily, but I've shown the resulting 
> strace.  The fanotify listener is the one from [1].
> 
> 	[1] http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git
> 
> Together with the traces I've provided this should give you way enough clues 
> to be able to look up in the code why listening for fanotify events apparently 
> causes a concurrent inotify listener to return an inotify event with struct 
> inotify_event->mask == 0 for each fanotify perm event.

Spent a bit of the weekend trying to figure out what you were doing and
couldn't reproduce it or find it in the code because I had already fixed
it (albeit for slightly different reasons).  The patch in question was:

http://marc.info/?l=linux-kernel&m=128214903125780&w=2

the vfsmount_test_mask was always initialized but since no vfsmount
marks were found it was never cleared.  This left the code thinking that
the given (inode) mark was interested in the event.  I think you could
reproduce it differently

inotifywait -m -e open /mnt/tmp
inotifywait -m -e close /mnt/tmp

and you would get both event types for both watches.

-Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-23 16:46       ` Eric Paris
@ 2010-08-23 22:38         ` Andreas Gruenbacher
  0 siblings, 0 replies; 14+ messages in thread
From: Andreas Gruenbacher @ 2010-08-23 22:38 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-fsdevel, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Monday 23 August 2010 18:46:13 Eric Paris wrote:
> Spent a bit of the weekend trying to figure out what you were doing and
> couldn't reproduce it or find it in the code because I had already fixed
> it (albeit for slightly different reasons).  The patch in question was:
> 
> http://marc.info/?l=linux-kernel&m=128214903125780&w=2
> 
> the vfsmount_test_mask was always initialized but since no vfsmount
> marks were found it was never cleared.  This left the code thinking that
> the given (inode) mark was interested in the event.

That seems to explain it.  I can no longer reproduce the bug with your current 
for-linus tree (3ba67c8a) using my previous method.

Sorry this bug was difficult for you to track down; I was not even aware of 
this fix until today.

> I think you could reproduce it differently
> 
> inotifywait -m -e open /mnt/tmp
> inotifywait -m -e close /mnt/tmp
> 
> and you would get both event types for both watches.

No, the open listener gets:
	d/ OPEN,ISDIR

and the close listener gets:
	d/ CLOSE_NOWRITE,CLOSE,ISDIR

I tried to reproduce the previously failing case with inotifywait. I can see 
that inotifywait receives five inotify events at the syscall level, but it 
will not report events with mask == 0.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-08-23 22:38 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1281110319.17812.21.camel@dhcp231-200.rdu.redhat.com>
     [not found] ` <201008191444.08966.agruen@suse.de>
     [not found]   ` <1282230012.21419.1566.camel@acb20005.ipt.aol.com>
2010-08-19 23:41     ` [GIT PULL] notification tree: directory events Andreas Gruenbacher
2010-08-20  3:38       ` Eric Paris
2010-08-20  5:19         ` Andreas Dilger
2010-08-20  9:21           ` Christoph Hellwig
2010-08-20 15:29             ` Andreas Gruenbacher
2010-08-20 20:39               ` Andreas Dilger
2010-08-20  9:09         ` Tvrtko Ursulin
2010-08-20 11:07         ` Andreas Gruenbacher
2010-08-20 11:25         ` Andreas Gruenbacher
2010-08-20 12:16         ` Andreas Gruenbacher
     [not found] ` <1282016387.21419.113.camel@acb20005.ipt.aol.com>
     [not found]   ` <201008171009.51737.agruen@suse.de>
2010-08-20  0:00     ` [GIT PULL] notification tree - try 37! Andreas Gruenbacher
     [not found] ` <201008192307.32526.agruen@suse.de>
     [not found]   ` <1282276236.21419.2101.camel@acb20005.ipt.aol.com>
2010-08-20 12:38     ` Andreas Gruenbacher
2010-08-23 16:46       ` Eric Paris
2010-08-23 22:38         ` Andreas Gruenbacher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).