From: Jamie Lokier <jamie@shareable.org>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Alan Cox <alan@linux.intel.com>, Eric Paris <eparis@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Evgeniy Polyakov <zbr@ioremap.net>,
David Miller <davem@davemloft.net>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
netdev@vger.kernel.org, viro@zeniv.linux.org.uk,
hch@infradead.org
Subject: Re: fanotify as syscalls
Date: Wed, 16 Sep 2009 13:56:58 +0100
Message-ID: <20090916125658.GF29359@shareable.org>
In-Reply-To: <20090916130127.439c9222@lxorguk.ukuu.org.uk>

Alan Cox wrote:
> > You can't rely on the name being non-racy, but you _can_ reliably
> > invalidate application-level caches from the sequence of events
> > including file writes, creates, renames, links, unlinks, mounts. And
> > revalidate such caches by the absence of pending events.
>
> You can't however create the caches reliably because you've no idea if
> you are referencing the right object in the first place - which is why
> you want a handle in these cases. I see fanotify as a handle producing
> addition to inotify, not as a replacement (plus some other bits around
> open blocking for HSM etc)

There are two sets of events getting mixed up here. Inode events -
reads, writes, truncates, chmods; and directory events - renames,
links, creates, unlinks.

Inode events alone are _not enough_ to maintain caches, and here's why.

With a file descriptor for an _inode_ event, that's fine. If you have
{ int fd1 = open("/foo/bar", O_RDONLY), fd2 = open("/foo/baz", O_RDONLY); }
early in your program, and later call cached_file_read(fd1) and
cached_file_read(fd2), you have to recognise the inode number and
invalidate both.

You have to call fstat() on the event's descriptor and then look up a
device+inode number in your own table. (The inotify way doesn't need
the fstat() but is otherwise the same.)

That's fine for files you're keeping open, when you only want to know
whether the content _of an open file_ changes.

But that's not so useful.

More often, you want to validate cached_file_read("/foo/bar"). That
is, validate what you'd get if you opened that path _now_ and read it.
Same for cached_stat("/foo/bar") to cache permissions, and other
things like that.

That needs to validate the path lookup _and_ the inode state.

For that, we need directory events, and they must include the name in
the directory that's affected. If you receive a directory event
involving name "bar" in directory (identified by inode) "/foo", you
invalidate cached_file_read("/foo/bar") and cached_stat("/foo/bar").
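
As a rough sketch, a path cache for this can be keyed on the
directory's device+inode plus the name within it. Everything here
(path_cache_add(), path_cache_on_dir_event(), path_cache_is_valid(),
the table sizes) is hypothetical, and the event is assumed to arrive
already decoded into a directory identity and a name; how it is
delivered is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_NAME         256
#define MAX_PATH_ENTRIES 64

/* One cached lookup: name "bar" inside the directory with this identity. */
struct path_entry {
    dev_t dir_dev;
    ino_t dir_ino;
    char  name[MAX_NAME];
    bool  valid;  /* covers both the cached read and the cached stat */
};

static struct path_entry entries[MAX_PATH_ENTRIES];
static size_t n_entries;

/* Start caching dir_path/name. Returns 0 on success, -1 on error. */
int path_cache_add(const char *dir_path, const char *name)
{
    struct stat st;

    if (n_entries == MAX_PATH_ENTRIES || strlen(name) >= MAX_NAME)
        return -1;
    if (stat(dir_path, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;

    struct path_entry *e = &entries[n_entries++];
    e->dir_dev = st.st_dev;
    e->dir_ino = st.st_ino;
    strcpy(e->name, name);
    e->valid = true;
    return 0;
}

/*
 * A directory event named `name` in the directory (dir_dev, dir_ino):
 * invalidate whatever was cached for that name in that directory.
 * Returns the number of entries invalidated.
 */
int path_cache_on_dir_event(dev_t dir_dev, ino_t dir_ino, const char *name)
{
    int hits = 0;

    for (size_t i = 0; i < n_entries; i++) {
        struct path_entry *e = &entries[i];
        if (e->dir_dev == dir_dev && e->dir_ino == dir_ino &&
            strcmp(e->name, name) == 0) {
            e->valid = false;
            hits++;
        }
    }
    return hits;
}

/* True if dir_path/name is cached and has not been invalidated. */
bool path_cache_is_valid(const char *dir_path, const char *name)
{
    struct stat st;

    if (stat(dir_path, &st) != 0)
        return false;
    for (size_t i = 0; i < n_entries; i++) {
        const struct path_entry *e = &entries[i];
        if (e->dir_dev == st.st_dev && e->dir_ino == st.st_ino &&
            strcmp(e->name, name) == 0)
            return e->valid;
    }
    return false;
}
```

An event for "bar" in one directory leaves "baz" in the same directory,
and "bar" in any other directory, untouched.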

Oh, but wait, how do we know the inode for the directory in our event
still refers to "/foo"? Answer: We're also watching its parent
directory "/". Assuming no reordering of certain events, that's ok.

That way, by watching "/", "/foo" and "/foo/bar", when you receive no
events you validate the results of cached_file_read("/foo/bar") and
cached_stat("/foo/bar"). A lot to set up, but fast to check. Worth
it if you're checking a lot of things that rarely change.
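
The set of watches for one cached path can be sketched as follows.
This is only an illustration: watch_chain() is a hypothetical helper,
it handles plain absolute paths only (no "." or ".." components, no
symlinks), and it does not install any watches itself.

```c
#include <stddef.h>
#include <string.h>

#define WATCH_PATH_MAX 256

/*
 * Fill out[] with every prefix of the absolute path that needs a watch:
 * "/" first, then each deeper directory, then the path itself.
 * Returns the number of prefixes written, or -1 if the path is not
 * absolute, is too long, or out[] is too small.
 */
int watch_chain(const char *path, char out[][WATCH_PATH_MAX], size_t max_out)
{
    size_t len = strlen(path);
    size_t n = 0;

    if (path[0] != '/' || len >= WATCH_PATH_MAX || max_out == 0)
        return -1;

    /* The root directory is always the first watch. */
    strcpy(out[n++], "/");

    for (size_t i = 1; i <= len; i++) {
        /* A prefix ends at each '/' separator and at the end of the string. */
        if (path[i] != '/' && path[i] != '\0')
            continue;
        /* Skip empty components such as "//" or a trailing "/". */
        if (path[i - 1] == '/')
            continue;
        if (n == max_out)
            return -1;
        memcpy(out[n], path, i);
        out[n][i] = '\0';
        n++;
    }
    return (int)n;
}
```

Each returned prefix would then be handed to whatever installs the
watch, for example inotify_add_watch(2). The cached results stay valid
for as long as none of those watches has a pending event.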

If you receive inode events while watching the parent directory of the
path used to access the inode, then you can avoid watching "/foo/bar",
and just watch the chain of parent directories. That typically saves
an order of magnitude of watches. fanotify offers something similar,
and in this case the event is probably more useful than inotify's.
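
To make the saving concrete, here is a rough count of distinct watches
for a set of cached paths, with and without a watch on each file.
count_watches() and its string set are hypothetical helpers, and the
paths are assumed to be normalised absolute paths (no trailing or
doubled slashes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SET_MAX 128
#define SET_STR 256

/* Tiny string set with linear search; enough for a demonstration. */
struct str_set {
    char   items[SET_MAX][SET_STR];
    size_t len;
};

/* Add the first n bytes of s to the set unless already present. */
static void set_add(struct str_set *set, const char *s, size_t n)
{
    if (n >= SET_STR || set->len == SET_MAX)
        return;  /* sketch only: overflow is silently ignored */
    for (size_t i = 0; i < set->len; i++)
        if (strlen(set->items[i]) == n && memcmp(set->items[i], s, n) == 0)
            return;
    memcpy(set->items[set->len], s, n);
    set->items[set->len][n] = '\0';
    set->len++;
}

/*
 * Count the distinct watches needed for the given paths.
 * With watch_leaf set, each file gets its own watch as well as its
 * parent directories; without it, only the directories are watched.
 */
size_t count_watches(const char *const *paths, size_t n_paths, bool watch_leaf)
{
    static struct str_set set;

    set.len = 0;
    for (size_t p = 0; p < n_paths; p++) {
        const char *path = paths[p];
        size_t len = strlen(path);

        set_add(&set, "/", 1);
        /* Every '/' after the first ends a parent directory prefix. */
        for (size_t i = 1; i < len; i++)
            if (path[i] == '/')
                set_add(&set, path, i);
        if (watch_leaf)
            set_add(&set, path, len);
    }
    return set.len;
}
```

For "/foo/a", "/foo/b" and "/foo/c" this gives 5 watches when each file
is watched and 2 when only "/" and "/foo" are. The gap widens as
directories hold more cached files.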

(The above is even hard-link-safe, if you do it right. I won't
complicate the explanation with details.)

-- Jamie