* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing [not found] <20101018223540.GA20730@gradator.net> @ 2011-08-19 23:03 ` Sylvain Rochet [not found] ` <20110819230344.GA24784-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org> 2011-08-21 22:29 ` NeilBrown 0 siblings, 2 replies; 15+ messages in thread From: Sylvain Rochet @ 2011-08-19 23:03 UTC (permalink / raw) To: linux-kernel, linux-fsdevel, linux-nfs; +Cc: Sylvain Rochet [-- Attachment #1: Type: text/plain, Size: 3039 bytes --] Hi, On Tue, Oct 19, 2010 at 12:35:40AM +0200, Sylvain Rochet wrote: > > ... upgraded to 2.6.33.5, then 2.6.33.7, finally to 2.6.35.7, and I > always end up with the same ending, it seems inotify can miss some VFS > events from time to time. I finally find out why. The NFS server does not always know the name of the modified file, if the modified inode was cleared from the VFS cache fsnotify does not know as well the filename then inotify child events on directories are silently tossed. Easy way to reproduce: Add a few printk debug (here it only works if /data is the NFS export): --- begin//fs/nfsd/vfs.c 2011-07-22 04:17:23.000000000 +0200 +++ linux-3.0/fs/nfsd/vfs.c 2011-07-30 03:18:17.837560809 +0200 @@ -975,6 +975,8 @@ inode = dentry->d_inode; exp = fhp->fh_export; + printk("nfsd write inode=%ld name=%s\n", inode->i_ino, dentry->d_name.name); + /* * Request sync writes if * - the sync export option has been set, or diff -Nru begin//include/linux/fsnotify.h linux-3.0/include/linux/fsnotify.h --- begin//include/linux/fsnotify.h 2011-07-22 04:17:23.000000000 +0200 +++ linux-3.0/include/linux/fsnotify.h 2011-07-30 03:07:00.330239062 +0200 @@ -216,8 +232,15 @@ mask |= FS_ISDIR; if (!(file->f_mode & FMODE_NONOTIFY)) { + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") ) + printk("fsnotify modify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name); fsnotify_parent(path, NULL, mask); fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0); + } else { + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") ) + printk("fsnotify modify-nonotify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name); } } On the NFS client, open a fd and send some data: # exec 1> test # ls -la # On the NFS server, check the kern log: Aug 20 00:57:44 inotifydebug kernel: nfsd write inode=13 name=test Aug 20 00:57:44 inotifydebug kernel: fsnotify modify inode=13 name=test Everything goes well. Now, clear the VFS cache on the NFS server: # echo 3 > /proc/sys/vm/drop_caches On the NFS client, send some data to the fd: # ls -la # On the NFS server, check the kern log: Aug 20 00:58:56 inotifydebug kernel: nfsd write inode=13 name= Aug 20 00:58:56 inotifydebug kernel: fsnotify modify inode=13 name= The filename is lost, fsnotify does not know the filename anymore, therefore inotify cannot send event about a modified file in a watched directory. End of the story. I guess this is almost impossible to fix this fsnotify bug, this is due by the fact that NFS use inode as file identifiers, so in some case this is impossible to know the modified filepath, and therefore impossible to match the file event to the directory watch. Kind regards, Sylvain [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20110819230344.GA24784-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org>]
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing [not found] ` <20110819230344.GA24784-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org> @ 2011-08-19 23:37 ` Jamie Lokier [not found] ` <20110819233756.GI11512-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Jamie Lokier @ 2011-08-19 23:37 UTC (permalink / raw) To: Sylvain Rochet Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA Sylvain Rochet wrote: > Hi, > > On Tue, Oct 19, 2010 at 12:35:40AM +0200, Sylvain Rochet wrote: > > > > ... upgraded to 2.6.33.5, then 2.6.33.7, finally to 2.6.35.7, and I > > always end up with the same ending, it seems inotify can miss some VFS > > events from time to time. > > I finally find out why. > > The NFS server does not always know the name of the modified file, if > the modified inode was cleared from the VFS cache fsnotify does not know > as well the filename then inotify child events on directories are > silently tossed. > > Easy way to reproduce: > > Add a few printk debug (here it only works if /data is the NFS export): > > --- begin//fs/nfsd/vfs.c 2011-07-22 04:17:23.000000000 +0200 > +++ linux-3.0/fs/nfsd/vfs.c 2011-07-30 03:18:17.837560809 +0200 > @@ -975,6 +975,8 @@ > inode = dentry->d_inode; > exp = fhp->fh_export; > > + printk("nfsd write inode=%ld name=%s\n", inode->i_ino, dentry->d_name.name); > + > /* > * Request sync writes if > * - the sync export option has been set, or > > diff -Nru begin//include/linux/fsnotify.h linux-3.0/include/linux/fsnotify.h > --- begin//include/linux/fsnotify.h 2011-07-22 04:17:23.000000000 +0200 > +++ linux-3.0/include/linux/fsnotify.h 2011-07-30 03:07:00.330239062 +0200 > @@ -216,8 +232,15 @@ > mask |= FS_ISDIR; > > if (!(file->f_mode & FMODE_NONOTIFY)) { > + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") ) > + printk("fsnotify modify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name); > fsnotify_parent(path, NULL, mask); > fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0); > + } else { > + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") ) > + printk("fsnotify modify-nonotify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name); > } > } > > > On the NFS client, open a fd and send some data: > > # exec 1> test > # ls -la > # > > On the NFS server, check the kern log: > > Aug 20 00:57:44 inotifydebug kernel: nfsd write inode=13 name=test > Aug 20 00:57:44 inotifydebug kernel: fsnotify modify inode=13 name=test > > Everything goes well. > > Now, clear the VFS cache on the NFS server: > > # echo 3 > /proc/sys/vm/drop_caches > > On the NFS client, send some data to the fd: > > # ls -la > # > > On the NFS server, check the kern log: > > Aug 20 00:58:56 inotifydebug kernel: nfsd write inode=13 name= > Aug 20 00:58:56 inotifydebug kernel: fsnotify modify inode=13 name= > > The filename is lost, fsnotify does not know the filename anymore, > therefore inotify cannot send event about a modified file in a watched > directory. > > End of the story. > > I guess this is almost impossible to fix this fsnotify bug, this is due > by the fact that NFS use inode as file identifiers, so in some case this > is impossible to know the modified filepath, and therefore impossible to > match the file event to the directory watch. Oh dear, that's a security hole, if something is using inotify/dnotify to watch and assumes that file contents (on the same machine, i.e. server in this case) do not change if there's no event received. It also breaks cache applications which make the same assumption. Is a solution to open inotify watches on every file individually? If so that seems quite severe. I do quite like the idea of using it to break past fanotify security restrictions though ;-) Can it also be bypassed with sys_open_by_handle_at? Possible solution: One way to look at this as like NFS having a secret hard link to the file, which does not show up in st_nlink. Hard links are already a bit tricky with fsnotify and directory watches. You can monitor a directory, but a file in it can change contents through another path. However, you can track changes of hard-linked files accurately by either putting a watch directly on all files whose st_nlink >= 2, and/or making sure you have watches on enough distinct directories that they contain st_nlink entries for the same file between them, because at least one of those directories will get an event. This is quite practical: You watch the files directly, until such time as you have found all its links (if you ever do), then you can remove the direct file watches. That gives me an idea to help with the NFS no-name watching: It looks like when a file is referenced by inode without a path, the problem is there's no path, so no directory inode to receive the event? Then this can be solved, in principle (if there's no better way), by watching a "virtual directory" that gets all events for when the access doesn't have a parent directory. There needs to be some way to watch it, and some way to get the appropriate file from the event (as there is no real directory. Or maybe there could be a virtual filesystem (like /proc, /sys etc.) containing a magic directory that receives these inode-only events, such that lookups in that directory yield the affected file. Exactly as if the directory contains a hard link to every file, perhaps a text encoding of the handles passed through sys_open_by_handle_at. -- Jamie -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20110819233756.GI11512-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org>]
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing [not found] ` <20110819233756.GI11512-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> @ 2011-08-20 0:47 ` Sylvain Rochet 2011-08-20 3:21 ` Jamie Lokier 0 siblings, 1 reply; 15+ messages in thread From: Sylvain Rochet @ 2011-08-20 0:47 UTC (permalink / raw) To: Jamie Lokier Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Sylvain Rochet [-- Attachment #1: Type: text/plain, Size: 4246 bytes --] Hi Jamie, On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote: > > Oh dear, that's a security hole, if something is using inotify/dnotify > to watch and assumes that file contents (on the same machine, > i.e. server in this case) do not change if there's no event received. > > It also breaks cache applications which make the same assumption. > > I do quite like the idea of using it to break past fanotify security > restrictions though ;-) It also probably means that fanotify misses some events when a filesystem is modified over NFS. If fanotify is used the way it is designed, i.e. with an antiviruse software, this may be an interesting way to skip the antiviruse check. Here we go: NFS server, run the fanotify example tool: ~/fanotify-example# ./fanotify -m /data/ NFS client, open a fd then do some I/O: # exec 1> test # ls -la # NFS server log: /data/test: pid=1235 modify close(writable) NFS server, cache clearing: # echo 3 > /proc/sys/vm/drop_caches NFS client, more I/O: # ls -la NFS server log: /data: pid=1234 modify close(writable) We receive an event... which is obviously wrong. This is even worse than no event at all, we receive an event about the wrong inode, the parent inode of the modified file actually. > Is a solution to open inotify watches on every file individually? If > so that seems quite severe. This is what I am going to do, at least temporarily, I only need to watch about a million file (and slowly counting). The startup time to watch an entire filesystem using inotify already require a full filesystem walk, watching all files and directories instead of directories only will not change much because most of the time is spent waiting I/O operations. This may however require a lot more memory both on kernel side and userland side. > Can it also be bypassed with sys_open_by_handle_at? No clue, this should be checked but there is no evident reason that it cannot be bypassed this way as well. > Possible solution: > > One way to look at this as like NFS having a secret hard link to the > file, which does not show up in st_nlink. > > Hard links are already a bit tricky with fsnotify and directory > watches. You can monitor a directory, but a file in it can change > contents through another path. > > However, you can track changes of hard-linked files accurately by > either putting a watch directly on all files whose st_nlink >= 2, > and/or making sure you have watches on enough distinct directories > that they contain st_nlink entries for the same file between them, > because at least one of those directories will get an event. This is > quite practical: You watch the files directly, until such time as you > have found all its links (if you ever do), then you can remove the > direct file watches. Yup, I agree. > That gives me an idea to help with the NFS no-name watching: > > It looks like when a file is referenced by inode without a path, the > problem is there's no path, so no directory inode to receive the > event? No filepath and no filename at all actually. There is no way to find the (or "a" if the file is linked to more than one directory) filename other than walking among all the directory tree to find where the inode is linked. We need a directory entry (dirent) to send an event about a modified file inside a watched directory. > Then this can be solved, in principle (if there's no better way), by > watching a "virtual directory" that gets all events for when the > access doesn't have a parent directory. There needs to be some way to > watch it, and some way to get the appropriate file from the event (as > there is no real directory. Or maybe there could be a virtual > filesystem (like /proc, /sys etc.) containing a magic directory that > receives these inode-only events, such that lookups in that directory > yield the affected file. Exactly as if the directory contains a hard > link to every file, perhaps a text encoding of the handles passed > through sys_open_by_handle_at. By doing that, we'll only get the inode nb as we cannot fetch the filename. Sylvain [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-20 0:47 ` Sylvain Rochet @ 2011-08-20 3:21 ` Jamie Lokier 0 siblings, 0 replies; 15+ messages in thread From: Jamie Lokier @ 2011-08-20 3:21 UTC (permalink / raw) To: Sylvain Rochet; +Cc: linux-kernel, linux-fsdevel, linux-nfs Sylvain Rochet wrote: > Hi Jamie, > > > On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote: > > > > Oh dear, that's a security hole, if something is using inotify/dnotify > > to watch and assumes that file contents (on the same machine, > > i.e. server in this case) do not change if there's no event received. > > > > It also breaks cache applications which make the same assumption. > > > > I do quite like the idea of using it to break past fanotify security > > restrictions though ;-) > > It also probably means that fanotify misses some events when a filesystem > is modified over NFS. If fanotify is used the way it is designed, i.e. > with an antiviruse software, this may be an interesting way to skip the > antiviruse check. > > Here we go: > > NFS server, run the fanotify example tool: > > ~/fanotify-example# ./fanotify -m /data/ > > NFS client, open a fd then do some I/O: > > # exec 1> test > # ls -la > # > > NFS server log: > > /data/test: pid=1235 modify close(writable) > > NFS server, cache clearing: > > # echo 3 > /proc/sys/vm/drop_caches > > NFS client, more I/O: > > # ls -la > > NFS server log: > > /data: pid=1234 modify close(writable) > > We receive an event... which is obviously wrong. This is even worse than > no event at all, we receive an event about the wrong inode, the parent > inode of the modified file actually. That sounds like a proper bug, maybe it can be fixed at least? > > Is a solution to open inotify watches on every file individually? If > > so that seems quite severe. > > This is what I am going to do, at least temporarily, I only need to > watch about a million file (and slowly counting). > > The startup time to watch an entire filesystem using inotify already > require a full filesystem walk, watching all files and directories > instead of directories only will not change much because most of the > time is spent waiting I/O operations. This may however require a lot > more memory both on kernel side and userland side. Watching an entire filesystem entails reading all the directories, but you don't have to fetch the inodes of files. But still, it's very slow (takes about 15 minutes on my /home from cold cache, just to read the million or so directories). There was some work on propagating events upwards so that efficient recursive watches could be established, in the context of fanotify but it would make sense to be available to all fsnotify users. I wonder how that went. > > Then this can be solved, in principle (if there's no better way), by > > watching a "virtual directory" that gets all events for when the > > access doesn't have a parent directory. There needs to be some way to > > watch it, and some way to get the appropriate file from the event (as > > there is no real directory. Or maybe there could be a virtual > > filesystem (like /proc, /sys etc.) containing a magic directory that > > receives these inode-only events, such that lookups in that directory > > yield the affected file. Exactly as if the directory contains a hard > > link to every file, perhaps a text encoding of the handles passed > > through sys_open_by_handle_at. > > By doing that, we'll only get the inode nb as we cannot fetch the filename. Yes... That's ok if it's one we are tracking inode->multiple-paths in userspace anyway (for hard links). But it's quite demanding if we hoped to avoid fetching and storing that in userspace for st_nlink == 1 files. In that case it is still better to get a notification "something unknown on this FS has changed", rather than no notification. Userspace would react by flushing all of its cached knowledge of things under directory watches that don't have direct watches. But at least that's reliable and correct behaviour, and if it happens often, userspace heuristics can react by watching priority inodes more directly. If that's the common case, then these nameless, pathless events could just trigger a simple event with catch-all IN_NO_PATH flag set, referring to the filesytem but no more detail than that. inotify would accept that flag when adding a watch, ignore the inode given but remember the filesystem, and send all events with no path to the watch(es) created with that flag on that filesystem. It's a flag because the event type is still useful. All the best, -- Jamie ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-19 23:37 ` Jamie Lokier [not found] ` <20110819233756.GI11512-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> @ 2011-08-20 1:29 ` Al Viro 2011-08-20 1:43 ` Randy Dunlap [not found] ` <20110820012943.GD2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2011-08-25 21:47 ` Sylvain Rochet 2 siblings, 2 replies; 15+ messages in thread From: Al Viro @ 2011-08-20 1:29 UTC (permalink / raw) To: Jamie Lokier; +Cc: Sylvain Rochet, linux-kernel, linux-fsdevel, linux-nfs On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote: > Possible solution: > Then this can be solved, in principle (if there's no better way), by > watching a "virtual directory" that gets all events for when the > access doesn't have a parent directory. There needs to be some way to > watch it, and some way to get the appropriate file from the event (as > there is no real directory. Or maybe there could be a virtual > filesystem (like /proc, /sys etc.) containing a magic directory that > receives these inode-only events, such that lookups in that directory > yield the affected file. Exactly as if the directory contains a hard > link to every file, perhaps a text encoding of the handles passed > through sys_open_by_handle_at. There is a better way - stop using idiotify... It has always been a mistake, driven down our throats by filemangler and desktop crowd. Broken in many, _many_ respects... Deprecate that crap, remove it completely in a couple of revisions, let these clowns cope. Yeah, yeah, I know... Not going to happen ;-/ One can dream, though... ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-20 1:29 ` Al Viro @ 2011-08-20 1:43 ` Randy Dunlap 2011-08-20 2:01 ` Al Viro [not found] ` <20110820012943.GD2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 1 sibling, 1 reply; 15+ messages in thread From: Randy Dunlap @ 2011-08-20 1:43 UTC (permalink / raw) To: Al Viro Cc: Jamie Lokier, Sylvain Rochet, linux-kernel, linux-fsdevel, linux-nfs On Sat, 20 Aug 2011 02:29:43 +0100 Al Viro wrote: > On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote: > > > Possible solution: > > > Then this can be solved, in principle (if there's no better way), by > > watching a "virtual directory" that gets all events for when the > > access doesn't have a parent directory. There needs to be some way to > > watch it, and some way to get the appropriate file from the event (as > > there is no real directory. Or maybe there could be a virtual > > filesystem (like /proc, /sys etc.) containing a magic directory that > > receives these inode-only events, such that lookups in that directory > > yield the affected file. Exactly as if the directory contains a hard > > link to every file, perhaps a text encoding of the handles passed > > through sys_open_by_handle_at. > > There is a better way - stop using idiotify... It has always been a > mistake, driven down our throats by filemangler and desktop crowd. > Broken in many, _many_ respects... Deprecate that crap, remove it > completely in a couple of revisions, let these clowns cope. Yeah, > yeah, I know... Not going to happen ;-/ One can dream, though... What do you suggest as a functioning alternative to inotify? --- ~Randy ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-20 1:43 ` Randy Dunlap @ 2011-08-20 2:01 ` Al Viro 0 siblings, 0 replies; 15+ messages in thread From: Al Viro @ 2011-08-20 2:01 UTC (permalink / raw) To: Randy Dunlap Cc: Jamie Lokier, Sylvain Rochet, linux-kernel, linux-fsdevel, linux-nfs On Fri, Aug 19, 2011 at 06:43:11PM -0700, Randy Dunlap wrote: > What do you suggest as a functioning alternative to inotify? Depends on what particular (ab)use do you have in mind... ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20110820012943.GD2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>]
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing [not found] ` <20110820012943.GD2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2011-08-20 3:03 ` Jamie Lokier [not found] ` <20110820030335.GA14899-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Jamie Lokier @ 2011-08-20 3:03 UTC (permalink / raw) To: Al Viro Cc: Sylvain Rochet, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA Al Viro wrote: > On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote: > > > Possible solution: > > > Then this can be solved, in principle (if there's no better way), by > > watching a "virtual directory" that gets all events for when the > > access doesn't have a parent directory. There needs to be some way to > > watch it, and some way to get the appropriate file from the event (as > > there is no real directory. Or maybe there could be a virtual > > filesystem (like /proc, /sys etc.) containing a magic directory that > > receives these inode-only events, such that lookups in that directory > > yield the affected file. Exactly as if the directory contains a hard > > link to every file, perhaps a text encoding of the handles passed > > through sys_open_by_handle_at. > > There is a better way - stop using idiotify... It has always been a > mistake, driven down our throats by filemangler and desktop crowd. Well you still have your sense of humour... I've never understood why you think it's about the file manager / desktop, or why you so strongly dislike the feature. It originated there historically, but that is not it's primary use. The implementation, sure, but you seem to dislike the very *principle* of subscribing to changes. Every interesting use of inotify that I've seen is for some kind of cache support, to eliminate the majority of stat() calls, to remove disk I/O (no stat means no inode), to ensure correctness (st_mtime is coarse and unreliable), and to avoid having to modify every application which might affect any file from which cached items are derived to explicitly notify all the applications which might use any of those files. You like high performance, reliable and correct behaviour, and high scalability. So I have never understood why you dislike the change-subscription principle so strongly, because it is a natural ally to those properties. All the best, -- Jamie -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20110820030335.GA14899-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org>]
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing [not found] ` <20110820030335.GA14899-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> @ 2011-08-21 17:07 ` J. Bruce Fields [not found] ` <20110821170714.GB9296-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: J. Bruce Fields @ 2011-08-21 17:07 UTC (permalink / raw) To: Jamie Lokier Cc: Al Viro, Sylvain Rochet, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA On Sat, Aug 20, 2011 at 04:03:35AM +0100, Jamie Lokier wrote: > Well you still have your sense of humour... > > I've never understood why you think it's about the file manager / > desktop, or why you so strongly dislike the feature. It originated > there historically, but that is not it's primary use. > > The implementation, sure, but you seem to dislike the very *principle* > of subscribing to changes. > > Every interesting use of inotify that I've seen is for some kind of > cache support, to eliminate the majority of stat() calls, to remove > disk I/O (no stat means no inode), to ensure correctness (st_mtime is > coarse and unreliable), It seems rather fragile as an mtime replacement unless it's also got some sort of logging built in at a pretty low level so that you don't lose events while you're not listening. And of course events have to be defined very carefully to avoid problems such as this one. > and to avoid having to modify every > application which might affect any file from which cached items are > derived to explicitly notify all the applications which might use any > of those files. > > You like high performance, reliable and correct behaviour, and high > scalability. So I have never understood why you dislike the > change-subscription principle so strongly, because it is a natural > ally to those properties. I don't think we've seen a design that does all of that yet. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20110821170714.GB9296-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing [not found] ` <20110821170714.GB9296-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2011-08-21 20:20 ` Jamie Lokier 2011-08-21 23:07 ` NeilBrown 0 siblings, 1 reply; 15+ messages in thread From: Jamie Lokier @ 2011-08-21 20:20 UTC (permalink / raw) To: J. Bruce Fields Cc: Al Viro, Sylvain Rochet, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA J. Bruce Fields wrote: > On Sat, Aug 20, 2011 at 04:03:35AM +0100, Jamie Lokier wrote: > > Well you still have your sense of humour... > > > > I've never understood why you think it's about the file manager / > > desktop, or why you so strongly dislike the feature. It originated > > there historically, but that is not it's primary use. > > > > The implementation, sure, but you seem to dislike the very *principle* > > of subscribing to changes. > > > > Every interesting use of inotify that I've seen is for some kind of > > cache support, to eliminate the majority of stat() calls, to remove > > disk I/O (no stat means no inode), to ensure correctness (st_mtime is > > coarse and unreliable), > > It seems rather fragile as an mtime replacement unless it's also got > some sort of logging built in at a pretty low level so that you don't > lose events while you're not listening. It mainly serves as an accelerator for existing stat/mtime checks, though it does improve change detection in the last second or so since a previous change, which with mtime you have to make pessimistic or sometimes-incorrect assumptions. Quite a few programs use inotify now because it saves a little power, and is a bit more responsive than, say, polling config files with stat(). For reliable filesystem tracking across times when not listening, especially if you don't trust the clock to have no backward steps (and you should not), a lazy change count file attribute would do. It's been discussed but never implemented. > And of course events have to be defined very carefully to avoid problems > such as this one. This thread has revealed quite a big hole, I agree. Apps cannot even use their normal filesystem-type whitelisting to catch this. This is bad! It is not the first hole that was found in inotify/dnotify, but it's the first one I'm aware of that wasn't pointed out long ago and then quietly ignored :-/ > > and to avoid having to modify every > > application which might affect any file from which cached items are > > derived to explicitly notify all the applications which might use any > > of those files. > > > > You like high performance, reliable and correct behaviour, and high > > scalability. So I have never understood why you dislike the > > change-subscription principle so strongly, because it is a natural > > ally to those properties. > > I don't think we've seen a design that does all of that yet. Designs get discussed from time to time, over the decades. I think one of the reasons it doesn't go further is Al's well-known objection -- why put the effort in if you know it will be rejected. And a widespread view that it's just unimportant GUI file manager fluff. The latter also means dependability issues have tended not to be taken seriously. -- Jamie -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-21 20:20 ` Jamie Lokier @ 2011-08-21 23:07 ` NeilBrown 2011-08-22 17:22 ` J. Bruce Fields 0 siblings, 1 reply; 15+ messages in thread From: NeilBrown @ 2011-08-21 23:07 UTC (permalink / raw) To: Jamie Lokier Cc: J. Bruce Fields, Al Viro, Sylvain Rochet, linux-kernel, linux-fsdevel, linux-nfs On Sun, 21 Aug 2011 21:20:58 +0100 Jamie Lokier <jamie@shareable.org> wrote: > J. Bruce Fields wrote: > > On Sat, Aug 20, 2011 at 04:03:35AM +0100, Jamie Lokier wrote: > > > Well you still have your sense of humour... > > > > > > I've never understood why you think it's about the file manager / > > > desktop, or why you so strongly dislike the feature. It originated > > > there historically, but that is not it's primary use. > > > > > > The implementation, sure, but you seem to dislike the very *principle* > > > of subscribing to changes. > > > > > > Every interesting use of inotify that I've seen is for some kind of > > > cache support, to eliminate the majority of stat() calls, to remove > > > disk I/O (no stat means no inode), to ensure correctness (st_mtime is > > > coarse and unreliable), > > > > It seems rather fragile as an mtime replacement unless it's also got > > some sort of logging built in at a pretty low level so that you don't > > lose events while you're not listening. > > It mainly serves as an accelerator for existing stat/mtime checks, > though it does improve change detection in the last second or so since > a previous change, which with mtime you have to make pessimistic or > sometimes-incorrect assumptions. > > Quite a few programs use inotify now because it saves a little power, > and is a bit more responsive than, say, polling config files with stat(). > > For reliable filesystem tracking across times when not listening, > especially if you don't trust the clock to have no backward steps (and > you should not), a lazy change count file attribute would do. It's > been discussed but never implemented. > > > And of course events have to be defined very carefully to avoid problems > > such as this one. > > This thread has revealed quite a big hole, I agree. Apps cannot even > use their normal filesystem-type whitelisting to catch this. This is bad! > > It is not the first hole that was found in inotify/dnotify, but it's > the first one I'm aware of that wasn't pointed out long ago and > then quietly ignored :-/ > > > > and to avoid having to modify every > > > application which might affect any file from which cached items are > > > derived to explicitly notify all the applications which might use any > > > of those files. > > > > > > You like high performance, reliable and correct behaviour, and high > > > scalability. So I have never understood why you dislike the > > > change-subscription principle so strongly, because it is a natural > > > ally to those properties. > > > > I don't think we've seen a design that does all of that yet. > > Designs get discussed from time to time, over the decades. > > I think one of the reasons it doesn't go further is Al's well-known > objection -- why put the effort in if you know it will be rejected. > And a widespread view that it's just unimportant GUI file manager fluff. > The latter also means dependability issues have tended not to be taken > seriously. I know you weren't asking for design suggestions, but somehow I just couldn't help myself :-) The (or "a") problem with {d,i,fa}notify is that it makes a core assumption that is flawed. i.e. that a file is in some directory. It might be nice if that were a reliable fact but thanks to our founding fathers, it is not. If a file only ever had one name - never more nor less - and could not have that name changed while it were open, then quite a lot of things would be a lot easier. And probably a lot of things would be a lot harder. But we don't live in that world (others do - I think you know where it is). So we must drop this assumption. Getting notification on an fd when the opened file changes makes perfect sense. Some /proc and /sys files already provide this functionality and we can expect that more will. Adding that to regular filesystems may not be out of the question. This would be useful, but of limited use. You could find out when a given file changed - either an mtime-like change or a ctime-like change. By monitoring a directory you could find out when a name was added or removed. But to find out when "any file in a directory changes" you would need to open and monitor every file, which is expensive. The other ("another") problem is the lack of recursion. You can find out when a file in a directory changes, but not a file in a directory tree. This significantly reduces the value. We really want to know about directory trees. However a "directory tree" - much like "all the files in a directory" isn't really a very well defined concept - at least from the perspective of providing notifications. You cannot easily answer "is this file in that tree?" or "which tree(s) is this file in?". However there are well defined sets of files such that we could reliably generate notifications if any file in the set were changed, or if a file were added-to or remove-from the set. We should be looking for these sorts of sets and seeing which are useful. e.g. - all files in a given filesystem. Generating notification for any change in a given filesystem is a well defined task. It might generate too much noise, but it would still have a place. - all files with a given uid (or gid). - all directories. or all regular files - all setuid, setgid, or world-writable files Each of these are strongly defined and we can map from file to set quite easily. We could obviously intersect the sets to, so I could get events when any directory owned by me on a particular filesystem was changed. It would even be reasonable for the events to contain a newly opened fd from which I can extract dev/inode info and possibly extract a path name. However this still might not be fine-grained enough. While a "directory tree" is not really a well defined concept, it is in my mind. e.g. it seems reasonable to want to find out about all changes in $HOME/.config I can see two approaches to this - though there might be others. All of them must in some way create a strong concept of a directory tree. One is to use bind mounts. i.e. I effectively do mount --bind $HOME/.config $HOME/.config and ask for events from the newly created vfsmnt. This will not catch changes made through file descriptors that were opened before I did the mount, or through hard links from some other directory tree. But for a particular use-case that might not be a problem. The other requires support from the filesystem and so cannot be provided universally. It could possibly be imposed generically for filesystems that support extended attributes .... but I feel dirty even suggesting that (Dobby must now go and iron his hands!) The filesystem could support the concept of a 'directory tree' much like BTRFS allows subvolumes which a like independent filesystems within the one big filesystem. However for this purpose the 'directory tree' would be a very light weight concept (it wouldn't need its own inode number space). For example, each inode could store an extra number which is the inode number of the root of its directory tree. This would be inherited from parent during create. Renaming or linking a file would fail if the target had a different directory tree number. (renaming a file with only one link might succeed and change the directory tree number). An empty directory could be told to become a root somehow. Then you would have a strong concept of a directory tree that could be used for notifications. Obviously this approach could not be used to solve any immediate problems. But if new filesystems started supporting light-weight-directory-trees as a well defined set of files, then in 5-10 years we might have a nice working solution. [of course then you need to layer any design you come up with on NFS ... but that can probably be done in user-space with libraries and daemons]. NeilBrown ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-21 23:07 ` NeilBrown @ 2011-08-22 17:22 ` J. Bruce Fields [not found] ` <20110822172239.GA15960-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: J. Bruce Fields @ 2011-08-22 17:22 UTC (permalink / raw) To: NeilBrown Cc: Jamie Lokier, Al Viro, Sylvain Rochet, linux-kernel, linux-fsdevel, linux-nfs On Mon, Aug 22, 2011 at 09:07:51AM +1000, NeilBrown wrote: > One is to use bind mounts. i.e. I effectively do > mount --bind $HOME/.config $HOME/.config > and ask for events from the newly created vfsmnt. > This will not catch changes made through file descriptors that were opened > before I did the mount, or through hard links from some other directory > tree. But for a particular use-case that might not be a problem. I'm missing what the extra vfsmount gets you here. The problems seem just the same as if you don't have one. Oh, wait, I see, it's that the file descriptors are associated with vfsmounts, not just dentries. Hm. --b. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20110822172239.GA15960-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing [not found] ` <20110822172239.GA15960-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2011-08-22 23:21 ` NeilBrown 0 siblings, 0 replies; 15+ messages in thread From: NeilBrown @ 2011-08-22 23:21 UTC (permalink / raw) To: J. Bruce Fields Cc: Jamie Lokier, Al Viro, Sylvain Rochet, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA On Mon, 22 Aug 2011 13:22:39 -0400 "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Mon, Aug 22, 2011 at 09:07:51AM +1000, NeilBrown wrote: > > One is to use bind mounts. i.e. I effectively do > > mount --bind $HOME/.config $HOME/.config > > and ask for events from the newly created vfsmnt. > > This will not catch changes made through file descriptors that were opened > > before I did the mount, or through hard links from some other directory > > tree. But for a particular use-case that might not be a problem. > > I'm missing what the extra vfsmount gets you here. The problems seem > just the same as if you don't have one. > > Oh, wait, I see, it's that the file descriptors are associated with > vfsmounts, not just dentries. Hm. "Hm" might be right. writes have a 'struct file', but mkdir and rename etc just get an inode. So to be able to notify based on vfsmnt, mkdirat - for examine - would need to pass the path to vfs_mkdir rather than path.dentry->d_inode, and vfs_mkdir would have to pass that path to fsnotify_mkdir. So more intrusive that I imagined, but still quite do-able.... if it were thought to be useful. NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-19 23:37 ` Jamie Lokier [not found] ` <20110819233756.GI11512-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> 2011-08-20 1:29 ` Al Viro @ 2011-08-25 21:47 ` Sylvain Rochet 2 siblings, 0 replies; 15+ messages in thread From: Sylvain Rochet @ 2011-08-25 21:47 UTC (permalink / raw) To: Jamie Lokier; +Cc: linux-kernel, linux-fsdevel, linux-nfs [-- Attachment #1.1: Type: text/plain, Size: 371 bytes --] Hi, On Sat, Aug 20, 2011 at 12:37:56AM +0100, Jamie Lokier wrote: > > Can it also be bypassed with sys_open_by_handle_at? Just checked that and... as planned it can be bypassed with name_to_handle_at/open_by_handle_at as well. (I added my quick'n'dirty test code, maybe someone will be interested by a working fhandle example on Linux 3.0.x). Sylvain [-- Attachment #1.2: fhandle3.c --] [-- Type: text/x-csrc, Size: 892 bytes --] #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <unistd.h> #include <errno.h> #include <string.h> struct file_handle { int handle_size; int handle_type; void *f_handle; }; int main(int argc, char *argv[]) { int ret; int fd; int mntfd; struct file_handle fh; fh.handle_type = 0; fh.f_handle = malloc(100); fh.handle_size = 100/sizeof(int); errno = 0; printf("%s\n", argv[1]); int mntid = 20; ret = syscall(303, 0, argv[1], &fh, &mntid, 0); if (ret) { perror("303 Error:"); exit(1); } printf("%d - %d\n", mntid, fh.handle_size); printf("sleep\n"); sleep(60); mntfd = open("/data", O_RDONLY); printf("dir = %d\n", mntfd ); fd = syscall(304, mntfd, &fh, O_WRONLY); if (fd < 0 ) { perror("304 Error:"); exit(1); } printf("fd = %d\n", fd); write(fd, "toto", 4); if( close(fd) ) { printf("close failed\n"); } return 0; } [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PROBLEM: 2.6.35.7 to 3.0 Inotify events missing 2011-08-19 23:03 ` PROBLEM: 2.6.35.7 to 3.0 Inotify events missing Sylvain Rochet [not found] ` <20110819230344.GA24784-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org> @ 2011-08-21 22:29 ` NeilBrown 1 sibling, 0 replies; 15+ messages in thread From: NeilBrown @ 2011-08-21 22:29 UTC (permalink / raw) To: Sylvain Rochet; +Cc: linux-kernel, linux-fsdevel, linux-nfs On Sat, 20 Aug 2011 01:03:44 +0200 Sylvain Rochet <gradator@gradator.net> wrote: > Hi, > > On Tue, Oct 19, 2010 at 12:35:40AM +0200, Sylvain Rochet wrote: > > > > ... upgraded to 2.6.33.5, then 2.6.33.7, finally to 2.6.35.7, and I > > always end up with the same ending, it seems inotify can miss some VFS > > events from time to time. > > I finally find out why. > > The NFS server does not always know the name of the modified file, if > the modified inode was cleared from the VFS cache fsnotify does not know > as well the filename then inotify child events on directories are > silently tossed. This may not be helpful but if you export the filesystem with the "subtree_check" option then the NFS server will always know a name for any file that it modifies. This is not without cost though, which is why it is no longer the default. If the client opens a file, the server moves it to a new directory and then the server cache is flushed, then the client will get an ESTALE error. This is rarely a problem in practice but when it is it can be quite annoying... NeilBrown > > Easy way to reproduce: > > Add a few printk debug (here it only works if /data is the NFS export): > > --- begin//fs/nfsd/vfs.c 2011-07-22 04:17:23.000000000 +0200 > +++ linux-3.0/fs/nfsd/vfs.c 2011-07-30 03:18:17.837560809 +0200 > @@ -975,6 +975,8 @@ > inode = dentry->d_inode; > exp = fhp->fh_export; > > + printk("nfsd write inode=%ld name=%s\n", inode->i_ino, dentry->d_name.name); > + > /* > * Request sync writes if > * - the sync export option has been set, or > > diff -Nru begin//include/linux/fsnotify.h linux-3.0/include/linux/fsnotify.h > --- begin//include/linux/fsnotify.h 2011-07-22 04:17:23.000000000 +0200 > +++ linux-3.0/include/linux/fsnotify.h 2011-07-30 03:07:00.330239062 +0200 > @@ -216,8 +232,15 @@ > mask |= FS_ISDIR; > > if (!(file->f_mode & FMODE_NONOTIFY)) { > + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") ) > + printk("fsnotify modify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name); > fsnotify_parent(path, NULL, mask); > fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0); > + } else { > + if( !strcmp(path->mnt->mnt_mountpoint->d_name.name, "data") ) > + printk("fsnotify modify-nonotify inode=%ld name=%s\n", inode->i_ino, file->f_dentry->d_name.name); > } > } > > > On the NFS client, open a fd and send some data: > > # exec 1> test > # ls -la > # > > On the NFS server, check the kern log: > > Aug 20 00:57:44 inotifydebug kernel: nfsd write inode=13 name=test > Aug 20 00:57:44 inotifydebug kernel: fsnotify modify inode=13 name=test > > Everything goes well. > > Now, clear the VFS cache on the NFS server: > > # echo 3 > /proc/sys/vm/drop_caches > > On the NFS client, send some data to the fd: > > # ls -la > # > > On the NFS server, check the kern log: > > Aug 20 00:58:56 inotifydebug kernel: nfsd write inode=13 name= > Aug 20 00:58:56 inotifydebug kernel: fsnotify modify inode=13 name= > > The filename is lost, fsnotify does not know the filename anymore, > therefore inotify cannot send event about a modified file in a watched > directory. > > End of the story. > > I guess this is almost impossible to fix this fsnotify bug, this is due > by the fact that NFS use inode as file identifiers, so in some case this > is impossible to know the modified filepath, and therefore impossible to > match the file event to the directory watch. > > Kind regards, > Sylvain ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-08-25 21:47 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20101018223540.GA20730@gradator.net> 2011-08-19 23:03 ` PROBLEM: 2.6.35.7 to 3.0 Inotify events missing Sylvain Rochet [not found] ` <20110819230344.GA24784-XWGZPxRNpGHk1uMJSBkQmQ@public.gmane.org> 2011-08-19 23:37 ` Jamie Lokier [not found] ` <20110819233756.GI11512-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> 2011-08-20 0:47 ` Sylvain Rochet 2011-08-20 3:21 ` Jamie Lokier 2011-08-20 1:29 ` Al Viro 2011-08-20 1:43 ` Randy Dunlap 2011-08-20 2:01 ` Al Viro [not found] ` <20110820012943.GD2203-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2011-08-20 3:03 ` Jamie Lokier [not found] ` <20110820030335.GA14899-DqlFc3psUjeg7Qil/0GVWOc42C6kRsbE@public.gmane.org> 2011-08-21 17:07 ` J. Bruce Fields [not found] ` <20110821170714.GB9296-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2011-08-21 20:20 ` Jamie Lokier 2011-08-21 23:07 ` NeilBrown 2011-08-22 17:22 ` J. Bruce Fields [not found] ` <20110822172239.GA15960-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2011-08-22 23:21 ` NeilBrown 2011-08-25 21:47 ` Sylvain Rochet 2011-08-21 22:29 ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).