From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: mtk.manpages@gmail.com, John McCutchan <john@johnmccutchan.com>,
Robert Love <rlove@rlove.org>, Eric Paris <eparis@redhat.com>,
Lennart Poettering <lennart@poettering.net>,
radu.voicilas@gmail.com, daniel@veillard.com,
Christoph Hellwig <hch@infradead.org>,
Vegard Nossum <vegard.nossum@oracle.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
linux-man <linux-man@vger.kernel.org>,
gamin-list@gnome.org, lkml <linux-kernel@vger.kernel.org>,
inotify-tools-general@lists.sourceforge.net
Subject: Re: Things I wish I'd known about Inotify
Date: Fri, 04 Apr 2014 09:35:50 +0200
Message-ID: <533E60D6.2000704@gmail.com>
In-Reply-To: <20140403205236.GE14107@quack.suse.cz>
On 04/03/2014 10:52 PM, Jan Kara wrote:
> On Thu 03-04-14 08:34:44, Michael Kerrisk (man-pages) wrote:
>> Limitations and caveats
>> The inotify API provides no information about the user or process
>> that triggered the inotify event. In particular, there is no
>> easy way for a process that is monitoring events via inotify to
>> distinguish events that it triggers itself from those that are
>> triggered by other processes.
>>
>> The inotify API identifies affected files by filename. However,
>> by the time an application processes an inotify event, the file‐
>> name may already have been deleted or renamed.
>>
>> The inotify API identifies events via watch descriptors. It is
>> the application's responsibility to cache a mapping (if one is
>> needed) between watch descriptors and pathnames. Be aware that
>> directory renamings may affect multiple cached pathnames.
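
A minimal sketch of such a cache, for illustration only. The names
(wd_cache, cache_add, cache_lookup, cache_rename_dir) and the fixed-size
table are assumptions made for the example and are not part of the
inotify API. It shows why a directory rename can touch several entries:
every cached pathname at or below the renamed directory has to be
rewritten.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WATCHES 64   /* illustrative capacity, not a kernel limit */
#define PATH_LEN    256  /* illustrative maximum cached path length */

struct wd_entry { int wd; char path[PATH_LEN]; };
struct wd_cache { struct wd_entry e[MAX_WATCHES]; int n; };

/* Remember that watch descriptor wd refers to path. Returns 0 or -1. */
int cache_add(struct wd_cache *c, int wd, const char *path)
{
    if (c->n >= MAX_WATCHES || strlen(path) >= PATH_LEN)
        return -1;
    c->e[c->n].wd = wd;
    strcpy(c->e[c->n].path, path);
    c->n++;
    return 0;
}

/* Return the cached path for wd, or NULL if wd is unknown. */
const char *cache_lookup(const struct wd_cache *c, int wd)
{
    for (int i = 0; i < c->n; i++)
        if (c->e[i].wd == wd)
            return c->e[i].path;
    return NULL;
}

/* Rewrite every cached path that is old_dir itself or lies below it.
   Returns the number of entries changed. */
int cache_rename_dir(struct wd_cache *c, const char *old_dir,
                     const char *new_dir)
{
    size_t old_len = strlen(old_dir);
    int changed = 0;

    for (int i = 0; i < c->n; i++) {
        char *p = c->e[i].path;
        char tmp[PATH_LEN];
        int len;

        if (strncmp(p, old_dir, old_len) != 0)
            continue;
        /* Require a component boundary so "/a" does not match "/ab". */
        if (p[old_len] != '\0' && p[old_len] != '/')
            continue;
        len = snprintf(tmp, sizeof(tmp), "%s%s", new_dir, p + old_len);
        if (len < 0 || (size_t)len >= sizeof(tmp))
            continue;   /* new path too long: entry left unchanged */
        strcpy(p, tmp);
        changed++;
    }
    return changed;
}
```

The watch descriptors themselves remain valid across the rename; only
the cached strings go stale.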
>>
>> Inotify monitoring of directories is not recursive: to monitor
>> subdirectories under a directory, additional watches must be
>> created. This can take a significant amount of time for large
>> directory trees.
> And also there's a problem with the limit on the number of watches a user
> can have.
What is the problem exactly (given that the limit is configurable)?
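
For reference, the limit in question is exposed in
/proc/sys/fs/inotify/max_user_watches. The following is only a sketch of
how an application might read it before setting up a large number of
watches; the function name read_max_user_watches is made up for the
example.

```c
#include <stdio.h>

/* Return the per-user limit on inotify watches, or -1 if it cannot
   be read (for example, if /proc is not mounted). */
long read_max_user_watches(void)
{
    FILE *fp = fopen("/proc/sys/fs/inotify/max_user_watches", "r");
    long limit = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &limit) != 1)
        limit = -1;
    fclose(fp);
    return limit;
}
```

When the limit is reached, inotify_add_watch(2) fails with ENOSPC, so
an application has to handle that error even if it checked the limit
beforehand.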
>> If monitoring an entire directory subtree, and a new subdirectory
>> is created in that tree or an existing directory is renamed into
>> that tree, be aware that by the time you create a watch for the
>> new subdirectory, new files (and subdirectories) may already
>> exist inside the subdirectory. Therefore, you may want to scan
>> the contents of the subdirectory immediately after adding the
>> watch (and, if desired, recursively add watches for any subdirec‐
>> tories that it contains).
>>
>> Note that the event queue can overflow. In this case, events are
>> lost. Robust applications should handle the possibility of lost
>> events gracefully. For example, it may be necessary to rebuild
>> part or all of the application cache. (One simple, but possibly
>> expensive, approach is to close the inotify file descriptor,
>> empty the cache, create a new inotify file descriptor, and then
>> re-create watches and cache entries for the objects to be moni‐
>> tored.)
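
Overflow is reported as an event with IN_Q_OVERFLOW set in its mask
(and wd set to -1). A rough sketch of checking a buffer returned by
read(2) for that event; the helper name buffer_has_overflow is invented
for the example.

```c
#include <stddef.h>
#include <string.h>
#include <sys/inotify.h>

/* Return 1 if the buffer (as filled by read(2) on an inotify file
   descriptor) contains an IN_Q_OVERFLOW event, otherwise 0. */
int buffer_has_overflow(const char *buf, size_t len)
{
    size_t off = 0;

    while (off + sizeof(struct inotify_event) <= len) {
        struct inotify_event ev;

        /* Copy the fixed-size header so that the caller's buffer
           does not need any particular alignment. */
        memcpy(&ev, buf + off, sizeof(ev));
        if (ev.mask & IN_Q_OVERFLOW)
            return 1;
        off += sizeof(ev) + ev.len;   /* skip header plus name field */
    }
    return 0;
}
```

If this returns 1, the application cannot know which events were
dropped, which is why the text above suggests rebuilding part or all of
the cache.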
>>
>> Dealing with rename() events
>> The IN_MOVED_FROM and IN_MOVED_TO events that are generated by
>> rename(2) are usually available as consecutive events when read‐
>> ing from the inotify file descriptor. However, this is not guar‐
>> anteed. If multiple processes are triggering events for moni‐
>> tored objects, then (on rare occasions) an arbitrary number of
>> other events may appear between the IN_MOVED_FROM and IN_MOVED_TO
>> events.
>>
>> Matching up the IN_MOVED_FROM and IN_MOVED_TO event pair gener‐
>> ated by rename(2) is thus inherently racy. (Don't forget that if
>> an object is renamed outside of a monitored directory, there may
>> not even be an IN_MOVED_TO event.) Heuristic approaches (e.g.,
>> assume the events are always consecutive) can be used to ensure a
>> match in most cases, but will inevitably miss some cases, causing
>> the application to perceive the IN_MOVED_FROM and IN_MOVED_TO
>> events as being unrelated. If watch descriptors are destroyed
>> and re-created as a result, then those watch descriptors will be
>> inconsistent with the watch descriptors in any pending events.
>> (Re-creating the inotify file descriptor and rebuilding the cache
>> may be useful to deal with this scenario.)
> Well, but there's 'cookie' value meant exactly for matching up
> IN_MOVED_FROM and IN_MOVED_TO events. And 'cookie' is guaranteed to be
> unique at least within the inotify instance (in fact currently it is unique
> within the whole system but I don't think we want to give that promise).
Yes, that's already assumed by my discussion above (it's described elsewhere
in the page). But your comment makes me think I should add a few words to
remind the reader of that fact. I'll do that.
But, the point is that even with the cookie, matching the events is
nontrivial, since:
  * There may not even be an IN_MOVED_FROM event
  * There may be an arbitrary number of other events in between the
    IN_MOVED_FROM and the IN_MOVED_TO.
Therefore, one has to use heuristic approaches such as "allow at least
N milliseconds" or "check the next N events" to see if there is an
IN_MOVED_FROM that matches the IN_MOVED_TO. I can't see any way around
that being inherently racy. (It's unfortunate that the kernel can't
provide a guarantee that the two events are always consecutive, since
that would simplify user space's life considerably.)
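
To make the "check the next N events" heuristic concrete, here is one
possible sketch. Everything in it (the names, the table size, and the
value of N) is an assumption chosen for illustration. It pairs events
by cookie and gives up on an IN_MOVED_FROM after N further events,
which is exactly where the race lies: a late IN_MOVED_TO will then be
treated as unrelated.

```c
#include <stdint.h>

#define MAX_PENDING   16  /* assumed bound on unmatched renames */
#define MAX_EVENT_GAP 8   /* the "N" of the heuristic; arbitrary value */

struct pending_move {
    uint32_t cookie;  /* cookie from the IN_MOVED_FROM event */
    int wd;           /* watch descriptor of the source directory */
    int age;          /* events seen since the IN_MOVED_FROM */
    int in_use;
};

static struct pending_move pending[MAX_PENDING];

/* Record an IN_MOVED_FROM. Returns 0, or -1 if the table is full. */
int note_moved_from(uint32_t cookie, int wd)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i] = (struct pending_move){ cookie, wd, 0, 1 };
            return 0;
        }
    }
    return -1;
}

/* On IN_MOVED_TO: return the source wd of the matching IN_MOVED_FROM
   and forget it, or return -1 if there is none (the object was moved
   in from an unmonitored location, or the entry already expired). */
int match_moved_to(uint32_t cookie)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && pending[i].cookie == cookie) {
            pending[i].in_use = 0;
            return pending[i].wd;
        }
    }
    return -1;
}

/* Call once for every other event. Entries older than MAX_EVENT_GAP
   are dropped, and the caller should treat them as "moved out of the
   monitored tree". Returns the number of entries dropped. */
int age_pending(void)
{
    int expired = 0;

    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && ++pending[i].age > MAX_EVENT_GAP) {
            pending[i].in_use = 0;
            expired++;
        }
    }
    return expired;
}
```

Because the table lives outside the read loop, it also covers the case
where the IN_MOVED_FROM is the last event of one read(2) and the
IN_MOVED_TO arrives with the next. A time-based variant would store a
timestamp instead of an event count.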
Cheers,
Michael
>> Applications should also allow for the possibility that the
>> IN_MOVED_FROM event was the last event that could fit in the buf‐
>> fer returned by the current call to read(2), and the accompanying
>> IN_MOVED_TO event might be fetched only on the next read(2).
>
> Honza
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/