netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jamie Lokier <jamie@shareable.org>
To: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Evgeniy Polyakov <zbr@ioremap.net>,
	David Miller <davem@davemloft.net>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	netdev@vger.kernel.org, viro@zeniv.linux.org.uk,
	alan@linux.intel.com, hch@infradead.org
Subject: Re: fanotify as syscalls
Date: Wed, 16 Sep 2009 08:52:19 +0100	[thread overview]
Message-ID: <20090916075219.GA22024@shareable.org> (raw)
In-Reply-To: <1253064391.5213.37.camel@dhcp231-106.rdu.redhat.com>

Eric Paris wrote:
> On Tue, 2009-09-15 at 16:49 -0700, Linus Torvalds wrote:
> > And btw, I still want to know what's so wonderful about fanotify that we 
> > would actually want yet-another-filesystem-notification-interface. So I'm 
> > not sayying that I'll take a system call interface.
> 
> The real thing that fanotify provides is an open fd with the event
> rather than some arbitrary 'watch descriptor' that userspace must
> somehow magically map back to data on disk.  This means that it could be
> used to provide subtree notification, which inotify is completely
> incapable of doing.

That's a bit of a spurious claim.

- fanotify does not provide subtree notification in it's
  present form.  When it is extended to do that, why wouldn't
  inotify be as well?  That's an fsnotify feature, common to both.

- fanotify does not provide notification at all for some events that
  you get with inotify.  It is not a superset, so you can't use
  fanotify to provide a subtree-capable equivalent to inotify.  What
  a mess when you need the combination of both features!

- fanotify requires you call readlink(/proc/fd/N) for every event to
  get the path.  It's not a particularly efficient way to get it,
  especially when an apps wants to know if it's something in it's
  region of interest but doesn't care about the actual path.
  When an apps knows it needs the map back to to path, why make it
  slow to get it?  That "extensible data format" is being
  underutilised...

- fanotify's descriptor may be race-prone as a way to get the subtree
  used for access, because any of the parent directories could have
  moved and even been deleted before the app calls
  readlink(/proc/fd/N).  I don't know if a _reliable_ way to track
  changes in a subtree can be built on it.  Maybe it can but it
  appears this hasn't been analysed.  It depends on
  readlink(/proc/fd/N)'s behaviour when the dentry's have been
  changed, among other things.

- Does the descriptor cause umount to fail when user does "do some
  stuff in baz; umount baz", or does it serialise nicely?  That's one
  of inotify's nice features - it doesn't cause umounts to fail.

> And it can be used to provide system wide notification.  We all know
> who wants that.

People who want to break out of chroot/namespace jails using the
conveniently provided open file descriptor? :-)

Seriously, what does system-wide fanotify do when run from a
chroot/namespace/cgroup, and a file outside them is accessed?

If the event is delivered with file desciptor, that's a security hole.
If it's not delivered, that sounds like working subtree support?

I'd expect anti-malware to want to be run inside VMs quite often...

Note that there's no such thing as "the real system root" any more.

> It provides an extensible data format which allows growth impossible in
> inotify.  I don't know if anyone remember the inotify patches which
> wanted to overload the inotify cookie field for some other information,
> but inotify information extension is not reasonable or backwards
> compatible.

I agree with this (although that's what flags are for -- see clone).

I don't have a problem with the next interface being fanotify (despite
arguing a lot); I just want to see the next one being useful for the
things I would otherwise be proposing my own yet-another-interface
for.  So we don't need a fourth one soon after the third due to
easily foreseen limitations.

> I've got private commitments for two very large anti malware companies,
> both of which unprotect and hack syscall tables in their customer's
> kernels, that they would like to move to an fanotify interface.  Both
> Red Hat and Suse have expressed interest in these patches and have
> contributed to the patch set.
> 
> The patch set is actually rather small (entire set of about 20 patches
> is 1800 lines) as it builds on the fsnotify work already in 2.6.31 to
> reuse code from inotify rather than reimplement the same things over and
> over (like we previously had with inotify and dnotify)

I don't have any problem with either of these, and _fs_notify
generally seems like an improvement.  I don't have a problem with
fanotify either.  For what it does, it's ok.

> Don't know what else to say.....

Answer questions about use-cases that you're not interested in?  Why
block them?  What about Evigny's request for an event without an open
fd - because he needs the pid information (inotify doesn't provide)
but not the fd?

Sorry to be so harsh.  I'm really trying to make sure we don't repeat
the mistakes of dnotify and inotify, and end up with a third interface
which also is too restrictive (because it's good enough for your
anti-malware and HSM customers) so that a fourth interface will be
needed soon after.

I'd like to be able to use it from some applications to accelerate
userspace caching of things (faster Make, faster Samba) without
penalising all other applications touching unrelated parts of the
filesystem.  The attitude "you can live with 10% slowdown" worries me.
I'm sure that can be fixed with a bit of care.

If the intention is to maintain fanotify and inotify side-by-side for
different uses (because fanotify returns open descriptors and blocks
the accessing process until acked), that's ok with me.  It makes
sense.  But then it's messy that neither offers a superset of the
other regarding which files and events are tracked.

If it's right that inotify has no room for extensibility (I'm not sure
about this), than it appears we already made a mess with dnotify and
inotify, so it would be a shame to repeat the same mistakes again.
Let's get the next one right, even it takes a bit longer, ok?

-- Jamie

  reply	other threads:[~2009-09-16  7:52 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-11  5:25 [PATCH 1/8] networking/fanotify: declare fanotify socket numbers Eric Paris
2009-09-11  5:26 ` [PATCH 2/8] vfs: introduce FMODE_NONOTIFY Eric Paris
2009-09-11  5:26 ` [PATCH 3/8] fanotify: fscking all notification system Eric Paris
2009-09-11  5:26 ` [PATCH 4/8] fanotify:drop notification if they exist in the outgoing queue Eric Paris
2009-09-11  5:26 ` [PATCH 5/8] fanotify: merge notification events with different masks Eric Paris
2009-09-11  5:26 ` [PATCH 6/8] fanotify: userspace socket Eric Paris
2009-09-11  5:26 ` [PATCH 7/8] fanotify: userspace can add and remove fsnotify inode marks Eric Paris
2009-09-11  5:26 ` [PATCH 8/8] fanotify: send events to userspace over socket reads Eric Paris
2009-09-11 14:08   ` Daniel Walker
2009-09-11 14:15     ` Eric Paris
2009-09-11 14:22       ` Daniel Walker
2009-09-11 14:32       ` Daniel Walker
2009-09-11 14:32 ` [PATCH 1/8] networking/fanotify: declare fanotify socket numbers Andreas Gruenbacher
2009-09-11 16:04   ` Eric Paris
2009-09-11 18:46 ` David Miller
2009-09-11 19:33   ` Eric Paris
2009-09-11 20:46     ` Jamie Lokier
2009-09-11 21:13       ` Eric Paris
2009-09-11 21:27         ` Jamie Lokier
2009-09-11 21:51           ` Eric Paris
2009-09-12  9:41             ` Evgeniy Polyakov
2009-09-14  0:17               ` Jamie Lokier
2009-09-14 14:07                 ` Evgeniy Polyakov
2009-09-14 19:08                   ` fanotify as syscalls Eric Paris
2009-09-15 20:16                     ` Evgeniy Polyakov
2009-09-15 21:54                       ` Eric Paris
2009-09-15 23:49                         ` Linus Torvalds
2009-09-16  1:26                           ` Eric Paris
2009-09-16  7:52                             ` Jamie Lokier [this message]
2009-09-16  9:48                               ` Eric Paris
2009-09-16 12:17                                 ` Jamie Lokier
2009-09-17 20:07                                   ` Andreas Gruenbacher
2009-09-18 20:52                                     ` Eric Paris
2009-09-18 22:00                                       ` Andreas Gruenbacher
2009-09-19  3:04                                         ` Eric Paris
2009-09-21 20:04                                           ` Andreas Gruenbacher
2009-09-21 20:28                                             ` Jamie Lokier
2009-09-21 21:27                                               ` Andreas Gruenbacher
2009-09-21 22:00                                                 ` Jamie Lokier
2009-09-21 23:09                                                   ` Andreas Gruenbacher
2009-09-21 23:56                                                     ` Jamie Lokier
2009-09-21 22:18                                               ` Davide Libenzi
2009-09-21 23:12                                                 ` Jamie Lokier
2009-09-22 14:51                                                   ` Davide Libenzi
2009-09-22 15:31                                                     ` Andreas Gruenbacher
2009-09-22 16:04                                                       ` Davide Libenzi
2009-09-23  8:39                                                         ` Tvrtko Ursulin
2009-09-23 11:20                                                           ` hch
2009-09-23 15:35                                                             ` Davide Libenzi
2009-09-23 21:58                                                               ` hch
2009-09-23 11:32                                                           ` Arjan van de Ven
2009-09-23 15:42                                                             ` Tvrtko Ursulin
2009-09-23 15:51                                                             ` Eric Paris
2009-09-23 21:56                                                               ` hch
2009-09-23 15:26                                                           ` Davide Libenzi
2009-09-23 15:45                                                             ` Tvrtko Ursulin
2009-09-23 17:31                                                               ` Davide Libenzi
2009-09-22 16:11                                                       ` Eric Paris
2009-09-22 16:27                                                         ` Jamie Lokier
2009-09-22 23:43                                                           ` Davide Libenzi
2009-09-22 21:06                                             ` Eric Paris
2009-09-22 21:38                                               ` Andreas Gruenbacher
2009-09-16 10:41                               ` Alan Cox
2009-09-16 11:41                                 ` Jamie Lokier
2009-09-16 12:01                                   ` Alan Cox
2009-09-16 12:56                                     ` Jamie Lokier
2009-09-16 15:53                                       ` Eric Paris
2009-09-16 21:49                                         ` Jamie Lokier
2009-09-16 22:33                                           ` Eric Paris
2009-09-16 11:30                         ` Arnd Bergmann
2009-09-16 12:05                         ` Evgeniy Polyakov
2009-09-16 12:27                           ` Jamie Lokier
2009-09-17 16:40                             ` Linus Torvalds
2009-09-17 17:35                               ` Arjan van de Ven
2009-09-17 18:53                               ` Eric Paris
2009-09-22  0:15                               ` Eric W. Biederman
2009-09-22  0:22                                 ` Randy Dunlap
2009-09-11 21:21     ` [PATCH 1/8] networking/fanotify: declare fanotify socket numbers jamal
2009-09-11 21:42       ` Jamie Lokier
2009-09-11 22:52         ` jamal
2009-09-14  0:03           ` Jamie Lokier
2009-09-14  1:26             ` Eric Paris
2009-09-14 13:15             ` jamal
2009-09-12  9:47         ` Evgeniy Polyakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090916075219.GA22024@shareable.org \
    --to=jamie@shareable.org \
    --cc=alan@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=eparis@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).