linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jamie Lokier <jamie@shareable.org>
To: Alan Cox <alan@linux.intel.com>
Cc: Eric Paris <eparis@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Evgeniy Polyakov <zbr@ioremap.net>,
	David Miller <davem@davemloft.net>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	netdev@vger.kernel.org, viro@zeniv.linux.org.uk,
	hch@infradead.org
Subject: Re: fanotify as syscalls
Date: Wed, 16 Sep 2009 12:41:07 +0100	[thread overview]
Message-ID: <20090916114107.GB29359@shareable.org> (raw)
In-Reply-To: <20090916114111.2228f0fc@linux.intel.com>

Alan Cox wrote:
> > - fanotify does not provide subtree notification in it's
> >   present form.  When it is extended to do that, why wouldn't
> >   inotify be as well?  That's an fsnotify feature, common to both.
> 
> Because inotify gives you no reliable access to the object monitored as
> the name passed back is not an object reference and is racy. Inotify is
> fine for making pretty icons pop up on desktops and making file
> selectors update, but it is somewhat inadequate for indexers and
> completely useless for stuff like HSM.

That was my point.  (Why do people keep not getting it?)

You can't rely on the name being non-racy, but you _can_ reliably
invalidate application-level caches from the sequence of events
including file writes, creates, renames, links, unlinks, mounts.  And
revalidate such caches by the absence of pending events.

(There is one obscure case which inotify is missing, though, which
means it cannot detect file changes in certain cases with hard links.
I intend to fix that one.)

For that, an inode isn't useful, a descriptor isn't useful, a
directory descriptor/inode and pathname isn't useful, and file write
events by themselves aren't useful.  None of them quite do it by
themselves.

But with the correct combination of events, you can maintain very
efficient application-level caching of file data / directory listing
and lookups / stat results you have previously read from the
filesystem.  That's because the information you have previously
depended upon, including path lookups, are all notified as one sort of
inotify event or another when changed.

Which doesn't sound all that special until you realise you can very
quickly revalidate application-caches of any data structure calculated
from reading things from the filesystem, no matter how many
prerequisites or how complex the data structures, in a single system
call.  Amortised over many revalidations if you have them in parallel.

That can apply to things like git, make, ccache, samba, rsync, httpd
path walks, and virtually any "web templating" framework.  Of course
it takes userspace support as well, but that's where I'm coming from
regarding "acceleration" and the essential kernel infrastructure.

Clearly, I'm going to have to explain with working code :-)

> but it is somewhat inadequate for indexers

For indexers, the real inadequacy is the need to attach inotify
watches to every directory at system startup, and to stat() everything
to check it hasn't changed since the indexer was last running.  Both
are very slow on a large directory tree.  The former can be dealt with
using subtree watches (yes, even with hard links - I have proposed an
algorithm for this but I think nobody understood it ;-).  The latter
needs filesystem support for a persistent change attribute.

> > - fanotify requires you call readlink(/proc/fd/N) for every event to
> >   get the path.  It's not a particularly efficient way to get it,
> 
> IFF you want the path, but the path isn't usually the most valuable bit.
> Plus you'll find the readlink is extremely quick anyway.

I agree, you don't usually want the whole path.

So what was the point about fanotify making subtree tracking possible
with it's file descriptor, if not by readlink(/proc/fd/N)?
Descriptors don't tell you which subtree a file is in any better than
inotify watches.  I.e. they do, if you track them and their containing
directories all individually.

> > People who want to break out of chroot/namespace jails using the
> > conveniently provided open file descriptor? :-)
> 
> chroot isn't a security model. You can already do this with AF_UNIX
> sockets (and there are apps that intentionally use fchdir that way)

Ah, no.  AF_UNIX works with explicit sender cooperation.

fanotify gives you access to files without sender cooperation, as it
intercepts every open().

> > I'd expect anti-malware to want to be run inside VMs quite often...
> 
> Inside of containers - unlikely.

Why not?  Some people run entire distributions in containiners, and
present them as VMs to the world for other people to admin.

> > the accessing process until acked), that's ok with me.  It makes
> > sense.  But then it's messy that neither offers a superset of the
> > other regarding which files and events are tracked.
> 
> Agreed.

In the end this is my main gripe.

-- Jamie


  reply	other threads:[~2009-09-16 11:41 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-11  5:25 [PATCH 1/8] networking/fanotify: declare fanotify socket numbers Eric Paris
2009-09-11  5:26 ` [PATCH 2/8] vfs: introduce FMODE_NONOTIFY Eric Paris
2009-09-11  5:26 ` [PATCH 3/8] fanotify: fscking all notification system Eric Paris
2009-09-11  5:26 ` [PATCH 4/8] fanotify:drop notification if they exist in the outgoing queue Eric Paris
2009-09-11  5:26 ` [PATCH 5/8] fanotify: merge notification events with different masks Eric Paris
2009-09-11  5:26 ` [PATCH 6/8] fanotify: userspace socket Eric Paris
2009-09-11  5:26 ` [PATCH 7/8] fanotify: userspace can add and remove fsnotify inode marks Eric Paris
2009-09-11  5:26 ` [PATCH 8/8] fanotify: send events to userspace over socket reads Eric Paris
2009-09-11 14:08   ` Daniel Walker
2009-09-11 14:15     ` Eric Paris
2009-09-11 14:22       ` Daniel Walker
2009-09-11 14:32       ` Daniel Walker
2009-09-11 14:32 ` [PATCH 1/8] networking/fanotify: declare fanotify socket numbers Andreas Gruenbacher
2009-09-11 16:04   ` Eric Paris
2009-09-11 18:46 ` David Miller
2009-09-11 19:33   ` Eric Paris
2009-09-11 20:46     ` Jamie Lokier
2009-09-11 21:13       ` Eric Paris
2009-09-11 21:27         ` Jamie Lokier
2009-09-11 21:51           ` Eric Paris
2009-09-12  9:41             ` Evgeniy Polyakov
2009-09-14  0:17               ` Jamie Lokier
2009-09-14 14:07                 ` Evgeniy Polyakov
2009-09-14 19:08                   ` fanotify as syscalls Eric Paris
2009-09-15 20:16                     ` Evgeniy Polyakov
2009-09-15 21:54                       ` Eric Paris
2009-09-15 23:49                         ` Linus Torvalds
2009-09-16  1:26                           ` Eric Paris
2009-09-16  7:52                             ` Jamie Lokier
2009-09-16  9:48                               ` Eric Paris
2009-09-16 12:17                                 ` Jamie Lokier
2009-09-17 20:07                                   ` Andreas Gruenbacher
2009-09-18 20:52                                     ` Eric Paris
2009-09-18 22:00                                       ` Andreas Gruenbacher
2009-09-19  3:04                                         ` Eric Paris
2009-09-21 20:04                                           ` Andreas Gruenbacher
2009-09-21 20:28                                             ` Jamie Lokier
2009-09-21 21:27                                               ` Andreas Gruenbacher
2009-09-21 22:00                                                 ` Jamie Lokier
2009-09-21 23:09                                                   ` Andreas Gruenbacher
2009-09-21 23:56                                                     ` Jamie Lokier
2009-09-21 22:18                                               ` Davide Libenzi
2009-09-21 23:12                                                 ` Jamie Lokier
2009-09-22 14:51                                                   ` Davide Libenzi
2009-09-22 15:31                                                     ` Andreas Gruenbacher
2009-09-22 16:04                                                       ` Davide Libenzi
2009-09-23  8:39                                                         ` Tvrtko Ursulin
2009-09-23 11:20                                                           ` hch
2009-09-23 15:35                                                             ` Davide Libenzi
2009-09-23 21:58                                                               ` hch
2009-09-23 11:32                                                           ` Arjan van de Ven
2009-09-23 15:42                                                             ` Tvrtko Ursulin
2009-09-23 15:51                                                             ` Eric Paris
2009-09-23 21:56                                                               ` hch
2009-09-23 15:26                                                           ` Davide Libenzi
2009-09-23 15:45                                                             ` Tvrtko Ursulin
2009-09-23 17:31                                                               ` Davide Libenzi
2009-09-22 16:11                                                       ` Eric Paris
2009-09-22 16:27                                                         ` Jamie Lokier
2009-09-22 23:43                                                           ` Davide Libenzi
2009-09-22 21:06                                             ` Eric Paris
2009-09-22 21:38                                               ` Andreas Gruenbacher
2009-09-16 10:41                               ` Alan Cox
2009-09-16 11:41                                 ` Jamie Lokier [this message]
2009-09-16 12:01                                   ` Alan Cox
2009-09-16 12:56                                     ` Jamie Lokier
2009-09-16 15:53                                       ` Eric Paris
2009-09-16 21:49                                         ` Jamie Lokier
2009-09-16 22:33                                           ` Eric Paris
2009-09-16 11:30                         ` Arnd Bergmann
2009-09-16 12:05                         ` Evgeniy Polyakov
2009-09-16 12:27                           ` Jamie Lokier
2009-09-17 16:40                             ` Linus Torvalds
2009-09-17 17:35                               ` Arjan van de Ven
2009-09-17 18:53                               ` Eric Paris
2009-09-22  0:15                               ` Eric W. Biederman
2009-09-22  0:22                                 ` Randy Dunlap
2009-09-11 21:21     ` [PATCH 1/8] networking/fanotify: declare fanotify socket numbers jamal
2009-09-11 21:42       ` Jamie Lokier
2009-09-11 22:52         ` jamal
2009-09-14  0:03           ` Jamie Lokier
2009-09-14  1:26             ` Eric Paris
2009-09-14 13:15             ` jamal
2009-09-12  9:47         ` Evgeniy Polyakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090916114107.GB29359@shareable.org \
    --to=jamie@shareable.org \
    --cc=alan@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=eparis@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).