public inbox for linux-kernel@vger.kernel.org
* [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend
@ 2008-11-25 17:20 Eric Paris
  2008-11-25 17:20 ` [PATCH -v3 1/8] filesystem notification: create fs/notify to contain all fs notification Eric Paris
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:20 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

This patch series implements fsnotify, a filesystem notification backend which
should serve as the basis for dnotify, inotify, and eventually fanotify.

This series only reimplements dnotify on top of the new fsnotify backend.  If
accepted, I will do the work to port inotify as well.  Currently struct inode
goes from:

#ifdef CONFIG_DNOTIFY
       unsigned long           i_dnotify_mask; /* Directory notify events */
       struct dnotify_struct   *i_dnotify; /* for directory notifications */
#endif

to:
#ifdef CONFIG_FSNOTIFY
       unsigned long           i_fsnotify_mask; /* all events this inode cares about */
       struct list_head        i_fsnotify_mark_entries; /* fsnotify mark entries */
       spinlock_t              i_fsnotify_lock; /* protect the entries list */
#endif

so struct inode still grows, but the inotify fields will be dropped as well,
resulting in a smaller struct inode overall.  These are also all the fields
fanotify will want.

rwlocks have been dropped entirely in favor of spinlocks with even smaller
critical sections.  i_lock is no longer used to protect dnotify information;
the more specific i_fsnotify_lock is used instead.
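A pseudocode sketch of the delivery path the new fields imply (helper names
are hypothetical, inferred from the fields above; the real entry points are
in the patches below).  The i_fsnotify_lock critical section only walks the
mark list, which is why rwlocks could be dropped:

```
/* Pseudocode sketch -- not the actual code in this series. */
fsnotify(inode, event_mask):
	if (!(inode->i_fsnotify_mask & event_mask))
		return				/* lockless fast path */
	spin_lock(&inode->i_fsnotify_lock)
	for each entry on inode->i_fsnotify_mark_entries:
		if (entry->mask & event_mask)
			note entry's group as interested
	spin_unlock(&inode->i_fsnotify_lock)
	deliver event to interested groups	/* outside the lock */
```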
---

Eric Paris (8):
      dnotify: reimplement dnotify using fsnotify
      fsnotify: add in inode fsnotify markings
      fsnotify: add group priorities
      fsnotify: unified filesystem notification backend
      fsnotify: use the new open-exec hook for inotify and dnotify
      fsnotify: sys_execve and sys_uselib do not call into fsnotify
      fsnotify: pass a file instead of an inode to open, read, and write
      filesystem notification: create fs/notify to contain all fs notification


 fs/Kconfig                       |   39 --
 fs/Makefile                      |    5 
 fs/compat.c                      |    5 
 fs/dnotify.c                     |  194 --------
 fs/exec.c                        |    5 
 fs/inode.c                       |    7 
 fs/inotify.c                     |  911 --------------------------------------
 fs/inotify_user.c                |  778 --------------------------------
 fs/nfsd/vfs.c                    |    4 
 fs/notify/Kconfig                |   14 +
 fs/notify/Makefile               |    4 
 fs/notify/dnotify/Kconfig        |   11 
 fs/notify/dnotify/Makefile       |    1 
 fs/notify/dnotify/dnotify.c      |  400 +++++++++++++++++
 fs/notify/fsnotify.c             |  104 ++++
 fs/notify/fsnotify.h             |   99 ++++
 fs/notify/group.c                |  150 ++++++
 fs/notify/inode_mark.c           |  226 +++++++++
 fs/notify/inotify/Kconfig        |   27 +
 fs/notify/inotify/Makefile       |    2 
 fs/notify/inotify/inotify.c      |  911 ++++++++++++++++++++++++++++++++++++++
 fs/notify/inotify/inotify_user.c |  778 ++++++++++++++++++++++++++++++++
 fs/notify/notification.c         |  188 ++++++++
 fs/open.c                        |    2 
 fs/read_write.c                  |    8 
 include/linux/dnotify.h          |   21 -
 include/linux/fs.h               |    7 
 include/linux/fsnotify.h         |   73 ++-
 include/linux/fsnotify_backend.h |  103 ++++
 29 files changed, 3100 insertions(+), 1977 deletions(-)
 delete mode 100644 fs/dnotify.c
 delete mode 100644 fs/inotify.c
 delete mode 100644 fs/inotify_user.c
 create mode 100644 fs/notify/Kconfig
 create mode 100644 fs/notify/Makefile
 create mode 100644 fs/notify/dnotify/Kconfig
 create mode 100644 fs/notify/dnotify/Makefile
 create mode 100644 fs/notify/dnotify/dnotify.c
 create mode 100644 fs/notify/fsnotify.c
 create mode 100644 fs/notify/fsnotify.h
 create mode 100644 fs/notify/group.c
 create mode 100644 fs/notify/inode_mark.c
 create mode 100644 fs/notify/inotify/Kconfig
 create mode 100644 fs/notify/inotify/Makefile
 create mode 100644 fs/notify/inotify/inotify.c
 create mode 100644 fs/notify/inotify/inotify_user.c
 create mode 100644 fs/notify/notification.c
 create mode 100644 include/linux/fsnotify_backend.h


* [PATCH -v3 1/8] filesystem notification: create fs/notify to contain all fs notification
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
@ 2008-11-25 17:20 ` Eric Paris
  2008-11-28  5:24   ` Al Viro
  2008-11-25 17:21 ` [PATCH -v3 2/8] fsnotify: pass a file instead of an inode to open, read, and write Eric Paris
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:20 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

Since we are adding yet another filesystem notification system, it seemed
like a good idea to clean up fs/ by creating fs/notify and moving all the
notification code there.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/Kconfig                       |   39 --
 fs/Makefile                      |    5 
 fs/dnotify.c                     |  194 --------
 fs/inotify.c                     |  911 --------------------------------------
 fs/inotify_user.c                |  778 --------------------------------
 fs/notify/Kconfig                |    2 
 fs/notify/Makefile               |    2 
 fs/notify/dnotify/Kconfig        |   10 
 fs/notify/dnotify/Makefile       |    1 
 fs/notify/dnotify/dnotify.c      |  194 ++++++++
 fs/notify/inotify/Kconfig        |   27 +
 fs/notify/inotify/Makefile       |    2 
 fs/notify/inotify/inotify.c      |  911 ++++++++++++++++++++++++++++++++++++++
 fs/notify/inotify/inotify_user.c |  778 ++++++++++++++++++++++++++++++++
 14 files changed, 1929 insertions(+), 1925 deletions(-)
 delete mode 100644 fs/dnotify.c
 delete mode 100644 fs/inotify.c
 delete mode 100644 fs/inotify_user.c
 create mode 100644 fs/notify/Kconfig
 create mode 100644 fs/notify/Makefile
 create mode 100644 fs/notify/dnotify/Kconfig
 create mode 100644 fs/notify/dnotify/Makefile
 create mode 100644 fs/notify/dnotify/dnotify.c
 create mode 100644 fs/notify/inotify/Kconfig
 create mode 100644 fs/notify/inotify/Makefile
 create mode 100644 fs/notify/inotify/inotify.c
 create mode 100644 fs/notify/inotify/inotify_user.c

diff --git a/fs/Kconfig b/fs/Kconfig
index 522469a..ff0e819 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -270,44 +270,7 @@ config OCFS2_COMPAT_JBD
 
 endif # BLOCK
 
-config DNOTIFY
-	bool "Dnotify support"
-	default y
-	help
-	  Dnotify is a directory-based per-fd file change notification system
-	  that uses signals to communicate events to user-space.  There exist
-	  superior alternatives, but some applications may still rely on
-	  dnotify.
-
-	  If unsure, say Y.
-
-config INOTIFY
-	bool "Inotify file change notification support"
-	default y
-	---help---
-	  Say Y here to enable inotify support.  Inotify is a file change
-	  notification system and a replacement for dnotify.  Inotify fixes
-	  numerous shortcomings in dnotify and introduces several new features
-	  including multiple file events, one-shot support, and unmount
-	  notification.
-
-	  For more information, see <file:Documentation/filesystems/inotify.txt>
-
-	  If unsure, say Y.
-
-config INOTIFY_USER
-	bool "Inotify support for userspace"
-	depends on INOTIFY
-	default y
-	---help---
-	  Say Y here to enable inotify support for userspace, including the
-	  associated system calls.  Inotify allows monitoring of both files and
-	  directories via a single open fd.  Events are read from the file
-	  descriptor, which is also select()- and poll()-able.
-
-	  For more information, see <file:Documentation/filesystems/inotify.txt>
-
-	  If unsure, say Y.
+source "fs/notify/Kconfig"
 
 config QUOTA
 	bool "Quota support"
diff --git a/fs/Makefile b/fs/Makefile
index d9f8afe..e6f423d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -20,8 +20,7 @@ obj-y +=	no-block.o
 endif
 
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o
-obj-$(CONFIG_INOTIFY)		+= inotify.o
-obj-$(CONFIG_INOTIFY_USER)	+= inotify_user.o
+obj-y				+= notify/
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
 obj-$(CONFIG_ANON_INODES)	+= anon_inodes.o
 obj-$(CONFIG_SIGNALFD)		+= signalfd.o
@@ -57,8 +56,6 @@ obj-$(CONFIG_QFMT_V1)		+= quota_v1.o
 obj-$(CONFIG_QFMT_V2)		+= quota_v2.o
 obj-$(CONFIG_QUOTACTL)		+= quota.o
 
-obj-$(CONFIG_DNOTIFY)		+= dnotify.o
-
 obj-$(CONFIG_PROC_FS)		+= proc/
 obj-y				+= partitions/
 obj-$(CONFIG_SYSFS)		+= sysfs/
diff --git a/fs/dnotify.c b/fs/dnotify.c
deleted file mode 100644
index 676073b..0000000
--- a/fs/dnotify.c
+++ /dev/null
@@ -1,194 +0,0 @@
-/*
- * Directory notifications for Linux.
- *
- * Copyright (C) 2000,2001,2002 Stephen Rothwell
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation; either version 2, or (at your option) any
- * later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- */
-#include <linux/fs.h>
-#include <linux/module.h>
-#include <linux/sched.h>
-#include <linux/dnotify.h>
-#include <linux/init.h>
-#include <linux/spinlock.h>
-#include <linux/slab.h>
-#include <linux/fdtable.h>
-
-int dir_notify_enable __read_mostly = 1;
-
-static struct kmem_cache *dn_cache __read_mostly;
-
-static void redo_inode_mask(struct inode *inode)
-{
-	unsigned long new_mask;
-	struct dnotify_struct *dn;
-
-	new_mask = 0;
-	for (dn = inode->i_dnotify; dn != NULL; dn = dn->dn_next)
-		new_mask |= dn->dn_mask & ~DN_MULTISHOT;
-	inode->i_dnotify_mask = new_mask;
-}
-
-void dnotify_flush(struct file *filp, fl_owner_t id)
-{
-	struct dnotify_struct *dn;
-	struct dnotify_struct **prev;
-	struct inode *inode;
-
-	inode = filp->f_path.dentry->d_inode;
-	if (!S_ISDIR(inode->i_mode))
-		return;
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
-	while ((dn = *prev) != NULL) {
-		if ((dn->dn_owner == id) && (dn->dn_filp == filp)) {
-			*prev = dn->dn_next;
-			redo_inode_mask(inode);
-			kmem_cache_free(dn_cache, dn);
-			break;
-		}
-		prev = &dn->dn_next;
-	}
-	spin_unlock(&inode->i_lock);
-}
-
-int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
-{
-	struct dnotify_struct *dn;
-	struct dnotify_struct *odn;
-	struct dnotify_struct **prev;
-	struct inode *inode;
-	fl_owner_t id = current->files;
-	struct file *f;
-	int error = 0;
-
-	if ((arg & ~DN_MULTISHOT) == 0) {
-		dnotify_flush(filp, id);
-		return 0;
-	}
-	if (!dir_notify_enable)
-		return -EINVAL;
-	inode = filp->f_path.dentry->d_inode;
-	if (!S_ISDIR(inode->i_mode))
-		return -ENOTDIR;
-	dn = kmem_cache_alloc(dn_cache, GFP_KERNEL);
-	if (dn == NULL)
-		return -ENOMEM;
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
-	while ((odn = *prev) != NULL) {
-		if ((odn->dn_owner == id) && (odn->dn_filp == filp)) {
-			odn->dn_fd = fd;
-			odn->dn_mask |= arg;
-			inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
-			goto out_free;
-		}
-		prev = &odn->dn_next;
-	}
-
-	rcu_read_lock();
-	f = fcheck(fd);
-	rcu_read_unlock();
-	/* we'd lost the race with close(), sod off silently */
-	/* note that inode->i_lock prevents reordering problems
-	 * between accesses to descriptor table and ->i_dnotify */
-	if (f != filp)
-		goto out_free;
-
-	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
-	if (error)
-		goto out_free;
-
-	dn->dn_mask = arg;
-	dn->dn_fd = fd;
-	dn->dn_filp = filp;
-	dn->dn_owner = id;
-	inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
-	dn->dn_next = inode->i_dnotify;
-	inode->i_dnotify = dn;
-	spin_unlock(&inode->i_lock);
-
-	if (filp->f_op && filp->f_op->dir_notify)
-		return filp->f_op->dir_notify(filp, arg);
-	return 0;
-
-out_free:
-	spin_unlock(&inode->i_lock);
-	kmem_cache_free(dn_cache, dn);
-	return error;
-}
-
-void __inode_dir_notify(struct inode *inode, unsigned long event)
-{
-	struct dnotify_struct *	dn;
-	struct dnotify_struct **prev;
-	struct fown_struct *	fown;
-	int			changed = 0;
-
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
-	while ((dn = *prev) != NULL) {
-		if ((dn->dn_mask & event) == 0) {
-			prev = &dn->dn_next;
-			continue;
-		}
-		fown = &dn->dn_filp->f_owner;
-		send_sigio(fown, dn->dn_fd, POLL_MSG);
-		if (dn->dn_mask & DN_MULTISHOT)
-			prev = &dn->dn_next;
-		else {
-			*prev = dn->dn_next;
-			changed = 1;
-			kmem_cache_free(dn_cache, dn);
-		}
-	}
-	if (changed)
-		redo_inode_mask(inode);
-	spin_unlock(&inode->i_lock);
-}
-
-EXPORT_SYMBOL(__inode_dir_notify);
-
-/*
- * This is hopelessly wrong, but unfixable without API changes.  At
- * least it doesn't oops the kernel...
- *
- * To safely access ->d_parent we need to keep d_move away from it.  Use the
- * dentry's d_lock for this.
- */
-void dnotify_parent(struct dentry *dentry, unsigned long event)
-{
-	struct dentry *parent;
-
-	if (!dir_notify_enable)
-		return;
-
-	spin_lock(&dentry->d_lock);
-	parent = dentry->d_parent;
-	if (parent->d_inode->i_dnotify_mask & event) {
-		dget(parent);
-		spin_unlock(&dentry->d_lock);
-		__inode_dir_notify(parent->d_inode, event);
-		dput(parent);
-	} else {
-		spin_unlock(&dentry->d_lock);
-	}
-}
-EXPORT_SYMBOL_GPL(dnotify_parent);
-
-static int __init dnotify_init(void)
-{
-	dn_cache = kmem_cache_create("dnotify_cache",
-		sizeof(struct dnotify_struct), 0, SLAB_PANIC, NULL);
-	return 0;
-}
-
-module_init(dnotify_init)
diff --git a/fs/inotify.c b/fs/inotify.c
deleted file mode 100644
index 7bbed1b..0000000
--- a/fs/inotify.c
+++ /dev/null
@@ -1,911 +0,0 @@
-/*
- * fs/inotify.c - inode-based file event notifications
- *
- * Authors:
- *	John McCutchan	<ttb@tentacle.dhs.org>
- *	Robert Love	<rml@novell.com>
- *
- * Kernel API added by: Amy Griffis <amy.griffis@hp.com>
- *
- * Copyright (C) 2005 John McCutchan
- * Copyright 2006 Hewlett-Packard Development Company, L.P.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation; either version 2, or (at your option) any
- * later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- */
-
-#include <linux/module.h>
-#include <linux/kernel.h>
-#include <linux/spinlock.h>
-#include <linux/idr.h>
-#include <linux/slab.h>
-#include <linux/fs.h>
-#include <linux/sched.h>
-#include <linux/init.h>
-#include <linux/list.h>
-#include <linux/writeback.h>
-#include <linux/inotify.h>
-
-static atomic_t inotify_cookie;
-
-/*
- * Lock ordering:
- *
- * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
- * iprune_mutex (synchronize shrink_icache_memory())
- * 	inode_lock (protects the super_block->s_inodes list)
- * 	inode->inotify_mutex (protects inode->inotify_watches and watches->i_list)
- * 		inotify_handle->mutex (protects inotify_handle and watches->h_list)
- *
- * The inode->inotify_mutex and inotify_handle->mutex are held during execution
- * of a caller's event handler.  Thus, the caller must not hold any locks
- * taken in their event handler while calling any of the published inotify
- * interfaces.
- */
-
-/*
- * Lifetimes of the three main data structures--inotify_handle, inode, and
- * inotify_watch--are managed by reference count.
- *
- * inotify_handle: Lifetime is from inotify_init() to inotify_destroy().
- * Additional references can bump the count via get_inotify_handle() and drop
- * the count via put_inotify_handle().
- *
- * inotify_watch: for inotify's purposes, lifetime is from inotify_add_watch()
- * to remove_watch_no_event().  Additional references can bump the count via
- * get_inotify_watch() and drop the count via put_inotify_watch().  The caller
- * is responsible for the final put after receiving IN_IGNORED, or when using
- * IN_ONESHOT after receiving the first event.  Inotify does the final put if
- * inotify_destroy() is called.
- *
- * inode: Pinned so long as the inode is associated with a watch, from
- * inotify_add_watch() to the final put_inotify_watch().
- */
-
-/*
- * struct inotify_handle - represents an inotify instance
- *
- * This structure is protected by the mutex 'mutex'.
- */
-struct inotify_handle {
-	struct idr		idr;		/* idr mapping wd -> watch */
-	struct mutex		mutex;		/* protects this bad boy */
-	struct list_head	watches;	/* list of watches */
-	atomic_t		count;		/* reference count */
-	u32			last_wd;	/* the last wd allocated */
-	const struct inotify_operations *in_ops; /* inotify caller operations */
-};
-
-static inline void get_inotify_handle(struct inotify_handle *ih)
-{
-	atomic_inc(&ih->count);
-}
-
-static inline void put_inotify_handle(struct inotify_handle *ih)
-{
-	if (atomic_dec_and_test(&ih->count)) {
-		idr_destroy(&ih->idr);
-		kfree(ih);
-	}
-}
-
-/**
- * get_inotify_watch - grab a reference to an inotify_watch
- * @watch: watch to grab
- */
-void get_inotify_watch(struct inotify_watch *watch)
-{
-	atomic_inc(&watch->count);
-}
-EXPORT_SYMBOL_GPL(get_inotify_watch);
-
-int pin_inotify_watch(struct inotify_watch *watch)
-{
-	struct super_block *sb = watch->inode->i_sb;
-	spin_lock(&sb_lock);
-	if (sb->s_count >= S_BIAS) {
-		atomic_inc(&sb->s_active);
-		spin_unlock(&sb_lock);
-		atomic_inc(&watch->count);
-		return 1;
-	}
-	spin_unlock(&sb_lock);
-	return 0;
-}
-
-/**
- * put_inotify_watch - decrements the ref count on a given watch.  cleans up
- * watch references if the count reaches zero.  inotify_watch is freed by
- * inotify callers via the destroy_watch() op.
- * @watch: watch to release
- */
-void put_inotify_watch(struct inotify_watch *watch)
-{
-	if (atomic_dec_and_test(&watch->count)) {
-		struct inotify_handle *ih = watch->ih;
-
-		iput(watch->inode);
-		ih->in_ops->destroy_watch(watch);
-		put_inotify_handle(ih);
-	}
-}
-EXPORT_SYMBOL_GPL(put_inotify_watch);
-
-void unpin_inotify_watch(struct inotify_watch *watch)
-{
-	struct super_block *sb = watch->inode->i_sb;
-	put_inotify_watch(watch);
-	deactivate_super(sb);
-}
-
-/*
- * inotify_handle_get_wd - returns the next WD for use by the given handle
- *
- * Callers must hold ih->mutex.  This function can sleep.
- */
-static int inotify_handle_get_wd(struct inotify_handle *ih,
-				 struct inotify_watch *watch)
-{
-	int ret;
-
-	do {
-		if (unlikely(!idr_pre_get(&ih->idr, GFP_KERNEL)))
-			return -ENOSPC;
-		ret = idr_get_new_above(&ih->idr, watch, ih->last_wd+1, &watch->wd);
-	} while (ret == -EAGAIN);
-
-	if (likely(!ret))
-		ih->last_wd = watch->wd;
-
-	return ret;
-}
-
-/*
- * inotify_inode_watched - returns nonzero if there are watches on this inode
- * and zero otherwise.  We call this lockless, we do not care if we race.
- */
-static inline int inotify_inode_watched(struct inode *inode)
-{
-	return !list_empty(&inode->inotify_watches);
-}
-
-/*
- * Get child dentry flag into sync with parent inode.
- * Flag should always be clear for negative dentries.
- */
-static void set_dentry_child_flags(struct inode *inode, int watched)
-{
-	struct dentry *alias;
-
-	spin_lock(&dcache_lock);
-	list_for_each_entry(alias, &inode->i_dentry, d_alias) {
-		struct dentry *child;
-
-		list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
-			if (!child->d_inode)
-				continue;
-
-			spin_lock(&child->d_lock);
-			if (watched)
-				child->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
-			else
-				child->d_flags &=~DCACHE_INOTIFY_PARENT_WATCHED;
-			spin_unlock(&child->d_lock);
-		}
-	}
-	spin_unlock(&dcache_lock);
-}
-
-/*
- * inotify_find_handle - find the watch associated with the given inode and
- * handle
- *
- * Callers must hold inode->inotify_mutex.
- */
-static struct inotify_watch *inode_find_handle(struct inode *inode,
-					       struct inotify_handle *ih)
-{
-	struct inotify_watch *watch;
-
-	list_for_each_entry(watch, &inode->inotify_watches, i_list) {
-		if (watch->ih == ih)
-			return watch;
-	}
-
-	return NULL;
-}
-
-/*
- * remove_watch_no_event - remove watch without the IN_IGNORED event.
- *
- * Callers must hold both inode->inotify_mutex and ih->mutex.
- */
-static void remove_watch_no_event(struct inotify_watch *watch,
-				  struct inotify_handle *ih)
-{
-	list_del(&watch->i_list);
-	list_del(&watch->h_list);
-
-	if (!inotify_inode_watched(watch->inode))
-		set_dentry_child_flags(watch->inode, 0);
-
-	idr_remove(&ih->idr, watch->wd);
-}
-
-/**
- * inotify_remove_watch_locked - Remove a watch from both the handle and the
- * inode.  Sends the IN_IGNORED event signifying that the inode is no longer
- * watched.  May be invoked from a caller's event handler.
- * @ih: inotify handle associated with watch
- * @watch: watch to remove
- *
- * Callers must hold both inode->inotify_mutex and ih->mutex.
- */
-void inotify_remove_watch_locked(struct inotify_handle *ih,
-				 struct inotify_watch *watch)
-{
-	remove_watch_no_event(watch, ih);
-	ih->in_ops->handle_event(watch, watch->wd, IN_IGNORED, 0, NULL, NULL);
-}
-EXPORT_SYMBOL_GPL(inotify_remove_watch_locked);
-
-/* Kernel API for producing events */
-
-/*
- * inotify_d_instantiate - instantiate dcache entry for inode
- */
-void inotify_d_instantiate(struct dentry *entry, struct inode *inode)
-{
-	struct dentry *parent;
-
-	if (!inode)
-		return;
-
-	spin_lock(&entry->d_lock);
-	parent = entry->d_parent;
-	if (parent->d_inode && inotify_inode_watched(parent->d_inode))
-		entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
-	spin_unlock(&entry->d_lock);
-}
-
-/*
- * inotify_d_move - dcache entry has been moved
- */
-void inotify_d_move(struct dentry *entry)
-{
-	struct dentry *parent;
-
-	parent = entry->d_parent;
-	if (inotify_inode_watched(parent->d_inode))
-		entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
-	else
-		entry->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
-}
-
-/**
- * inotify_inode_queue_event - queue an event to all watches on this inode
- * @inode: inode event is originating from
- * @mask: event mask describing this event
- * @cookie: cookie for synchronization, or zero
- * @name: filename, if any
- * @n_inode: inode associated with name
- */
-void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
-			       const char *name, struct inode *n_inode)
-{
-	struct inotify_watch *watch, *next;
-
-	if (!inotify_inode_watched(inode))
-		return;
-
-	mutex_lock(&inode->inotify_mutex);
-	list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
-		u32 watch_mask = watch->mask;
-		if (watch_mask & mask) {
-			struct inotify_handle *ih= watch->ih;
-			mutex_lock(&ih->mutex);
-			if (watch_mask & IN_ONESHOT)
-				remove_watch_no_event(watch, ih);
-			ih->in_ops->handle_event(watch, watch->wd, mask, cookie,
-						 name, n_inode);
-			mutex_unlock(&ih->mutex);
-		}
-	}
-	mutex_unlock(&inode->inotify_mutex);
-}
-EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
-
-/**
- * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
- * @dentry: the dentry in question, we queue against this dentry's parent
- * @mask: event mask describing this event
- * @cookie: cookie for synchronization, or zero
- * @name: filename, if any
- */
-void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
-				       u32 cookie, const char *name)
-{
-	struct dentry *parent;
-	struct inode *inode;
-
-	if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
-		return;
-
-	spin_lock(&dentry->d_lock);
-	parent = dentry->d_parent;
-	inode = parent->d_inode;
-
-	if (inotify_inode_watched(inode)) {
-		dget(parent);
-		spin_unlock(&dentry->d_lock);
-		inotify_inode_queue_event(inode, mask, cookie, name,
-					  dentry->d_inode);
-		dput(parent);
-	} else
-		spin_unlock(&dentry->d_lock);
-}
-EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
-
-/**
- * inotify_get_cookie - return a unique cookie for use in synchronizing events.
- */
-u32 inotify_get_cookie(void)
-{
-	return atomic_inc_return(&inotify_cookie);
-}
-EXPORT_SYMBOL_GPL(inotify_get_cookie);
-
-/**
- * inotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
- * @list: list of inodes being unmounted (sb->s_inodes)
- *
- * Called with inode_lock held, protecting the unmounting super block's list
- * of inodes, and with iprune_mutex held, keeping shrink_icache_memory() at bay.
- * We temporarily drop inode_lock, however, and CAN block.
- */
-void inotify_unmount_inodes(struct list_head *list)
-{
-	struct inode *inode, *next_i, *need_iput = NULL;
-
-	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
-		struct inotify_watch *watch, *next_w;
-		struct inode *need_iput_tmp;
-		struct list_head *watches;
-
-		/*
-		 * If i_count is zero, the inode cannot have any watches and
-		 * doing an __iget/iput with MS_ACTIVE clear would actually
-		 * evict all inodes with zero i_count from icache which is
-		 * unnecessarily violent and may in fact be illegal to do.
-		 */
-		if (!atomic_read(&inode->i_count))
-			continue;
-
-		/*
-		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
-		 * I_WILL_FREE which is fine because by that point the inode
-		 * cannot have any associated watches.
-		 */
-		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
-			continue;
-
-		need_iput_tmp = need_iput;
-		need_iput = NULL;
-		/* In case inotify_remove_watch_locked() drops a reference. */
-		if (inode != need_iput_tmp)
-			__iget(inode);
-		else
-			need_iput_tmp = NULL;
-		/* In case the dropping of a reference would nuke next_i. */
-		if ((&next_i->i_sb_list != list) &&
-				atomic_read(&next_i->i_count) &&
-				!(next_i->i_state & (I_CLEAR | I_FREEING |
-					I_WILL_FREE))) {
-			__iget(next_i);
-			need_iput = next_i;
-		}
-
-		/*
-		 * We can safely drop inode_lock here because we hold
-		 * references on both inode and next_i.  Also no new inodes
-		 * will be added since the umount has begun.  Finally,
-		 * iprune_mutex keeps shrink_icache_memory() away.
-		 */
-		spin_unlock(&inode_lock);
-
-		if (need_iput_tmp)
-			iput(need_iput_tmp);
-
-		/* for each watch, send IN_UNMOUNT and then remove it */
-		mutex_lock(&inode->inotify_mutex);
-		watches = &inode->inotify_watches;
-		list_for_each_entry_safe(watch, next_w, watches, i_list) {
-			struct inotify_handle *ih= watch->ih;
-			mutex_lock(&ih->mutex);
-			ih->in_ops->handle_event(watch, watch->wd, IN_UNMOUNT, 0,
-						 NULL, NULL);
-			inotify_remove_watch_locked(ih, watch);
-			mutex_unlock(&ih->mutex);
-		}
-		mutex_unlock(&inode->inotify_mutex);
-		iput(inode);		
-
-		spin_lock(&inode_lock);
-	}
-}
-EXPORT_SYMBOL_GPL(inotify_unmount_inodes);
-
-/**
- * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
- * @inode: inode that is about to be removed
- */
-void inotify_inode_is_dead(struct inode *inode)
-{
-	struct inotify_watch *watch, *next;
-
-	mutex_lock(&inode->inotify_mutex);
-	list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
-		struct inotify_handle *ih = watch->ih;
-		mutex_lock(&ih->mutex);
-		inotify_remove_watch_locked(ih, watch);
-		mutex_unlock(&ih->mutex);
-	}
-	mutex_unlock(&inode->inotify_mutex);
-}
-EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
-
-/* Kernel Consumer API */
-
-/**
- * inotify_init - allocate and initialize an inotify instance
- * @ops: caller's inotify operations
- */
-struct inotify_handle *inotify_init(const struct inotify_operations *ops)
-{
-	struct inotify_handle *ih;
-
-	ih = kmalloc(sizeof(struct inotify_handle), GFP_KERNEL);
-	if (unlikely(!ih))
-		return ERR_PTR(-ENOMEM);
-
-	idr_init(&ih->idr);
-	INIT_LIST_HEAD(&ih->watches);
-	mutex_init(&ih->mutex);
-	ih->last_wd = 0;
-	ih->in_ops = ops;
-	atomic_set(&ih->count, 0);
-	get_inotify_handle(ih);
-
-	return ih;
-}
-EXPORT_SYMBOL_GPL(inotify_init);
-
-/**
- * inotify_init_watch - initialize an inotify watch
- * @watch: watch to initialize
- */
-void inotify_init_watch(struct inotify_watch *watch)
-{
-	INIT_LIST_HEAD(&watch->h_list);
-	INIT_LIST_HEAD(&watch->i_list);
-	atomic_set(&watch->count, 0);
-	get_inotify_watch(watch); /* initial get */
-}
-EXPORT_SYMBOL_GPL(inotify_init_watch);
-
-/*
- * Watch removals suck violently.  To kick the watch out we need (in this
- * order) inode->inotify_mutex and ih->mutex.  That's fine if we have
- * a hold on inode; however, for all other cases we need to make damn sure
- * we don't race with umount.  We can *NOT* just grab a reference to a
- * watch - inotify_unmount_inodes() will happily sail past it and we'll end
- * with reference to inode potentially outliving its superblock.  Ideally
- * we just want to grab an active reference to superblock if we can; that
- * will make sure we won't go into inotify_umount_inodes() until we are
- * done.  Cleanup is just deactivate_super().  However, that leaves a messy
- * case - what if we *are* racing with umount() and active references to
- * superblock can't be acquired anymore?  We can bump ->s_count, grab
- * ->s_umount, which will almost certainly wait until the superblock is shut
- * down and the watch in question is pining for fjords.  That's fine, but
- * there is a problem - we might have hit the window between ->s_active
- * getting to 0 / ->s_count - below S_BIAS (i.e. the moment when superblock
- * is past the point of no return and is heading for shutdown) and the
- * moment when deactivate_super() acquires ->s_umount.  We could just do
- * drop_super() yield() and retry, but that's rather antisocial and this
- * stuff is luser-triggerable.  OTOH, having grabbed ->s_umount and having
- * found that we'd got there first (i.e. that ->s_root is non-NULL) we know
- * that we won't race with inotify_umount_inodes().  So we could grab a
- * reference to watch and do the rest as above, just with drop_super() instead
- * of deactivate_super(), right?  Wrong.  We had to drop ih->mutex before we
- * could grab ->s_umount.  So the watch could've been gone already.
- *
- * That still can be dealt with - we need to save watch->wd, do idr_find()
- * and compare its result with our pointer.  If they match, we either have
- * the damn thing still alive or we'd lost not one but two races at once,
- * the watch had been killed and a new one got created with the same ->wd
- * at the same address.  That couldn't have happened in inotify_destroy(),
- * but inotify_rm_wd() could run into that.  Still, "new one got created"
- * is not a problem - we have every right to kill it or leave it alone,
- * whatever's more convenient.
- *
- * So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
- * "grab it and kill it" check.  If it's been our original watch, we are
- * fine, if it's a newcomer - nevermind, just pretend that we'd won the
- * race and kill the fscker anyway; we are safe since we know that its
- * superblock won't be going away.
- *
- * And yes, this is far beyond mere "not very pretty"; so's the entire
- * concept of inotify to start with.
- */
-
-/**
- * pin_to_kill - pin the watch down for removal
- * @ih: inotify handle
- * @watch: watch to kill
- *
- * Called with ih->mutex held, drops it.  Possible return values:
- * 0 - nothing to do, it has died
- * 1 - remove it, drop the reference and deactivate_super()
- * 2 - remove it, drop the reference and drop_super(); we tried hard to avoid
- * that variant, since it involved a lot of PITA, but that's the best that
- * could've been done.
- */
-static int pin_to_kill(struct inotify_handle *ih, struct inotify_watch *watch)
-{
-	struct super_block *sb = watch->inode->i_sb;
-	s32 wd = watch->wd;
-
-	spin_lock(&sb_lock);
-	if (sb->s_count >= S_BIAS) {
-		atomic_inc(&sb->s_active);
-		spin_unlock(&sb_lock);
-		get_inotify_watch(watch);
-		mutex_unlock(&ih->mutex);
-		return 1;	/* the best outcome */
-	}
-	sb->s_count++;
-	spin_unlock(&sb_lock);
-	mutex_unlock(&ih->mutex); /* can't grab ->s_umount under it */
-	down_read(&sb->s_umount);
-	if (likely(!sb->s_root)) {
-		/* fs is already shut down; the watch is dead */
-		drop_super(sb);
-		return 0;
-	}
-	/* raced with the final deactivate_super() */
-	mutex_lock(&ih->mutex);
-	if (idr_find(&ih->idr, wd) != watch || watch->inode->i_sb != sb) {
-		/* the watch is dead */
-		mutex_unlock(&ih->mutex);
-		drop_super(sb);
-		return 0;
-	}
-	/* still alive or freed and reused with the same sb and wd; kill */
-	get_inotify_watch(watch);
-	mutex_unlock(&ih->mutex);
-	return 2;
-}
-
-static void unpin_and_kill(struct inotify_watch *watch, int how)
-{
-	struct super_block *sb = watch->inode->i_sb;
-	put_inotify_watch(watch);
-	switch (how) {
-	case 1:
-		deactivate_super(sb);
-		break;
-	case 2:
-		drop_super(sb);
-	}
-}
-
-/**
- * inotify_destroy - clean up and destroy an inotify instance
- * @ih: inotify handle
- */
-void inotify_destroy(struct inotify_handle *ih)
-{
-	/*
-	 * Destroy all of the watches for this handle. Unfortunately, not very
-	 * pretty.  We cannot do a simple iteration over the list, because we
-	 * do not know the inode until we iterate to the watch.  But we need to
-	 * hold inode->inotify_mutex before ih->mutex.  The following works.
-	 *
-	 * AV: it had to become even uglier to start working ;-/
-	 */
-	while (1) {
-		struct inotify_watch *watch;
-		struct list_head *watches;
-		struct super_block *sb;
-		struct inode *inode;
-		int how;
-
-		mutex_lock(&ih->mutex);
-		watches = &ih->watches;
-		if (list_empty(watches)) {
-			mutex_unlock(&ih->mutex);
-			break;
-		}
-		watch = list_first_entry(watches, struct inotify_watch, h_list);
-		sb = watch->inode->i_sb;
-		how = pin_to_kill(ih, watch);
-		if (!how)
-			continue;
-
-		inode = watch->inode;
-		mutex_lock(&inode->inotify_mutex);
-		mutex_lock(&ih->mutex);
-
-		/* make sure we didn't race with another list removal */
-		if (likely(idr_find(&ih->idr, watch->wd))) {
-			remove_watch_no_event(watch, ih);
-			put_inotify_watch(watch);
-		}
-
-		mutex_unlock(&ih->mutex);
-		mutex_unlock(&inode->inotify_mutex);
-		unpin_and_kill(watch, how);
-	}
-
-	/* free this handle: the put matching the get in inotify_init() */
-	put_inotify_handle(ih);
-}
-EXPORT_SYMBOL_GPL(inotify_destroy);
-
-/**
- * inotify_find_watch - find an existing watch for an (ih,inode) pair
- * @ih: inotify handle
- * @inode: inode to watch
- * @watchp: pointer to existing inotify_watch
- *
- * Caller must pin given inode (via nameidata).
- */
-s32 inotify_find_watch(struct inotify_handle *ih, struct inode *inode,
-		       struct inotify_watch **watchp)
-{
-	struct inotify_watch *old;
-	int ret = -ENOENT;
-
-	mutex_lock(&inode->inotify_mutex);
-	mutex_lock(&ih->mutex);
-
-	old = inode_find_handle(inode, ih);
-	if (unlikely(old)) {
-		get_inotify_watch(old); /* caller must put watch */
-		*watchp = old;
-		ret = old->wd;
-	}
-
-	mutex_unlock(&ih->mutex);
-	mutex_unlock(&inode->inotify_mutex);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(inotify_find_watch);
-
-/**
- * inotify_find_update_watch - find and update the mask of an existing watch
- * @ih: inotify handle
- * @inode: inode's watch to update
- * @mask: mask of events to watch
- *
- * Caller must pin given inode (via nameidata).
- */
-s32 inotify_find_update_watch(struct inotify_handle *ih, struct inode *inode,
-			      u32 mask)
-{
-	struct inotify_watch *old;
-	int mask_add = 0;
-	int ret;
-
-	if (mask & IN_MASK_ADD)
-		mask_add = 1;
-
-	/* don't allow invalid bits: we don't want flags set */
-	mask &= IN_ALL_EVENTS | IN_ONESHOT;
-	if (unlikely(!mask))
-		return -EINVAL;
-
-	mutex_lock(&inode->inotify_mutex);
-	mutex_lock(&ih->mutex);
-
-	/*
-	 * Handle the case of re-adding a watch on an (inode,ih) pair that we
-	 * are already watching.  We just update the mask and return its wd.
-	 */
-	old = inode_find_handle(inode, ih);
-	if (unlikely(!old)) {
-		ret = -ENOENT;
-		goto out;
-	}
-
-	if (mask_add)
-		old->mask |= mask;
-	else
-		old->mask = mask;
-	ret = old->wd;
-out:
-	mutex_unlock(&ih->mutex);
-	mutex_unlock(&inode->inotify_mutex);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(inotify_find_update_watch);
-
-/**
- * inotify_add_watch - add a watch to an inotify instance
- * @ih: inotify handle
- * @watch: caller allocated watch structure
- * @inode: inode to watch
- * @mask: mask of events to watch
- *
- * Caller must pin given inode (via nameidata).
- * Caller must ensure it only calls inotify_add_watch() once per watch.
- * Calls inotify_handle_get_wd() so may sleep.
- */
-s32 inotify_add_watch(struct inotify_handle *ih, struct inotify_watch *watch,
-		      struct inode *inode, u32 mask)
-{
-	int ret = 0;
-	int newly_watched;
-
-	/* don't allow invalid bits: we don't want flags set */
-	mask &= IN_ALL_EVENTS | IN_ONESHOT;
-	if (unlikely(!mask))
-		return -EINVAL;
-	watch->mask = mask;
-
-	mutex_lock(&inode->inotify_mutex);
-	mutex_lock(&ih->mutex);
-
-	/* Initialize a new watch */
-	ret = inotify_handle_get_wd(ih, watch);
-	if (unlikely(ret))
-		goto out;
-	ret = watch->wd;
-
-	/* save a reference to handle and bump the count to make it official */
-	get_inotify_handle(ih);
-	watch->ih = ih;
-
-	/*
-	 * Save a reference to the inode and bump the ref count to make it
-	 * official.  We hold a reference to nameidata, which makes this safe.
-	 */
-	watch->inode = igrab(inode);
-
-	/* Add the watch to the handle's and the inode's list */
-	newly_watched = !inotify_inode_watched(inode);
-	list_add(&watch->h_list, &ih->watches);
-	list_add(&watch->i_list, &inode->inotify_watches);
-	/*
-	 * Set child flags _after_ adding the watch, so there is no race
-	 * window in which newly instantiated children could miss their
-	 * parent's watched flag.
-	 */
-	if (newly_watched)
-		set_dentry_child_flags(inode, 1);
-
-out:
-	mutex_unlock(&ih->mutex);
-	mutex_unlock(&inode->inotify_mutex);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(inotify_add_watch);
-
-/**
- * inotify_clone_watch - put the watch next to existing one
- * @old: already installed watch
- * @new: new watch
- *
- * Caller must hold the inotify_mutex of inode we are dealing with;
- * it is expected to remove the old watch before unlocking the inode.
- */
-s32 inotify_clone_watch(struct inotify_watch *old, struct inotify_watch *new)
-{
-	struct inotify_handle *ih = old->ih;
-	int ret = 0;
-
-	new->mask = old->mask;
-	new->ih = ih;
-
-	mutex_lock(&ih->mutex);
-
-	/* Initialize a new watch */
-	ret = inotify_handle_get_wd(ih, new);
-	if (unlikely(ret))
-		goto out;
-	ret = new->wd;
-
-	get_inotify_handle(ih);
-
-	new->inode = igrab(old->inode);
-
-	list_add(&new->h_list, &ih->watches);
-	list_add(&new->i_list, &old->inode->inotify_watches);
-out:
-	mutex_unlock(&ih->mutex);
-	return ret;
-}
-
-void inotify_evict_watch(struct inotify_watch *watch)
-{
-	get_inotify_watch(watch);
-	mutex_lock(&watch->ih->mutex);
-	inotify_remove_watch_locked(watch->ih, watch);
-	mutex_unlock(&watch->ih->mutex);
-}
-
-/**
- * inotify_rm_wd - remove a watch from an inotify instance
- * @ih: inotify handle
- * @wd: watch descriptor to remove
- *
- * Can sleep.
- */
-int inotify_rm_wd(struct inotify_handle *ih, u32 wd)
-{
-	struct inotify_watch *watch;
-	struct super_block *sb;
-	struct inode *inode;
-	int how;
-
-	mutex_lock(&ih->mutex);
-	watch = idr_find(&ih->idr, wd);
-	if (unlikely(!watch)) {
-		mutex_unlock(&ih->mutex);
-		return -EINVAL;
-	}
-	sb = watch->inode->i_sb;
-	how = pin_to_kill(ih, watch);
-	if (!how)
-		return 0;
-
-	inode = watch->inode;
-
-	mutex_lock(&inode->inotify_mutex);
-	mutex_lock(&ih->mutex);
-
-	/* make sure that we did not race */
-	if (likely(idr_find(&ih->idr, wd) == watch))
-		inotify_remove_watch_locked(ih, watch);
-
-	mutex_unlock(&ih->mutex);
-	mutex_unlock(&inode->inotify_mutex);
-	unpin_and_kill(watch, how);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(inotify_rm_wd);
-
-/**
- * inotify_rm_watch - remove a watch from an inotify instance
- * @ih: inotify handle
- * @watch: watch to remove
- *
- * Can sleep.
- */
-int inotify_rm_watch(struct inotify_handle *ih,
-		     struct inotify_watch *watch)
-{
-	return inotify_rm_wd(ih, watch->wd);
-}
-EXPORT_SYMBOL_GPL(inotify_rm_watch);
-
-/*
- * inotify_setup - core initialization function
- */
-static int __init inotify_setup(void)
-{
-	atomic_set(&inotify_cookie, 0);
-
-	return 0;
-}
-
-module_init(inotify_setup);
diff --git a/fs/inotify_user.c b/fs/inotify_user.c
deleted file mode 100644
index d367e9b..0000000
--- a/fs/inotify_user.c
+++ /dev/null
@@ -1,778 +0,0 @@
-/*
- * fs/inotify_user.c - inotify support for userspace
- *
- * Authors:
- *	John McCutchan	<ttb@tentacle.dhs.org>
- *	Robert Love	<rml@novell.com>
- *
- * Copyright (C) 2005 John McCutchan
- * Copyright 2006 Hewlett-Packard Development Company, L.P.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation; either version 2, or (at your option) any
- * later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- */
-
-#include <linux/kernel.h>
-#include <linux/sched.h>
-#include <linux/slab.h>
-#include <linux/fs.h>
-#include <linux/file.h>
-#include <linux/mount.h>
-#include <linux/namei.h>
-#include <linux/poll.h>
-#include <linux/init.h>
-#include <linux/list.h>
-#include <linux/inotify.h>
-#include <linux/syscalls.h>
-#include <linux/magic.h>
-
-#include <asm/ioctls.h>
-
-static struct kmem_cache *watch_cachep __read_mostly;
-static struct kmem_cache *event_cachep __read_mostly;
-
-static struct vfsmount *inotify_mnt __read_mostly;
-
-/* these are configurable via /proc/sys/fs/inotify/ */
-static int inotify_max_user_instances __read_mostly;
-static int inotify_max_user_watches __read_mostly;
-static int inotify_max_queued_events __read_mostly;
-
-/*
- * Lock ordering:
- *
- * inotify_dev->up_mutex (ensures we don't re-add the same watch)
- * 	inode->inotify_mutex (protects inode's watch list)
- * 		inotify_handle->mutex (protects inotify_handle's watch list)
- * 			inotify_dev->ev_mutex (protects device's event queue)
- */
-
-/*
- * Lifetimes of the main data structures:
- *
- * inotify_device: Lifetime is managed by reference count, from
- * sys_inotify_init() until release.  Additional references can bump the count
- * via get_inotify_dev() and drop the count via put_inotify_dev().
- *
- * inotify_user_watch: Lifetime is from create_watch() to the receipt of an
- * IN_IGNORED event from inotify, or when using IN_ONESHOT, to receipt of the
- * first event, or to inotify_destroy().
- */
-
-/*
- * struct inotify_device - represents an inotify instance
- *
- * This structure is protected by the mutex 'mutex'.
- */
-struct inotify_device {
-	wait_queue_head_t 	wq;		/* wait queue for i/o */
-	struct mutex		ev_mutex;	/* protects event queue */
-	struct mutex		up_mutex;	/* synchronizes watch updates */
-	struct list_head 	events;		/* list of queued events */
-	atomic_t		count;		/* reference count */
-	struct user_struct	*user;		/* user who opened this dev */
-	struct inotify_handle	*ih;		/* inotify handle */
-	struct fasync_struct    *fa;            /* async notification */
-	unsigned int		queue_size;	/* size of the queue (bytes) */
-	unsigned int		event_count;	/* number of pending events */
-	unsigned int		max_events;	/* maximum number of events */
-};
-
-/*
- * struct inotify_kernel_event - An inotify event, originating from a watch and
- * queued for user-space.  A list of these is attached to each instance of the
- * device.  In read(), this list is walked and all events that can fit in the
- * buffer are returned.
- *
- * Protected by dev->ev_mutex of the device in which we are queued.
- */
-struct inotify_kernel_event {
-	struct inotify_event	event;	/* the user-space event */
-	struct list_head        list;	/* entry in inotify_device's list */
-	char			*name;	/* filename, if any */
-};
-
-/*
- * struct inotify_user_watch - our version of an inotify_watch; we add
- * a reference to the associated inotify_device.
- */
-struct inotify_user_watch {
-	struct inotify_device	*dev;	/* associated device */
-	struct inotify_watch	wdata;	/* inotify watch data */
-};
-
-#ifdef CONFIG_SYSCTL
-
-#include <linux/sysctl.h>
-
-static int zero;
-
-ctl_table inotify_table[] = {
-	{
-		.ctl_name	= INOTIFY_MAX_USER_INSTANCES,
-		.procname	= "max_user_instances",
-		.data		= &inotify_max_user_instances,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec_minmax,
-		.strategy	= &sysctl_intvec,
-		.extra1		= &zero,
-	},
-	{
-		.ctl_name	= INOTIFY_MAX_USER_WATCHES,
-		.procname	= "max_user_watches",
-		.data		= &inotify_max_user_watches,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec_minmax,
-		.strategy	= &sysctl_intvec,
-		.extra1		= &zero,
-	},
-	{
-		.ctl_name	= INOTIFY_MAX_QUEUED_EVENTS,
-		.procname	= "max_queued_events",
-		.data		= &inotify_max_queued_events,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec_minmax,
-		.strategy	= &sysctl_intvec,
-		.extra1		= &zero
-	},
-	{ .ctl_name = 0 }
-};
-#endif /* CONFIG_SYSCTL */
-
-static inline void get_inotify_dev(struct inotify_device *dev)
-{
-	atomic_inc(&dev->count);
-}
-
-static inline void put_inotify_dev(struct inotify_device *dev)
-{
-	if (atomic_dec_and_test(&dev->count)) {
-		atomic_dec(&dev->user->inotify_devs);
-		free_uid(dev->user);
-		kfree(dev);
-	}
-}
-
-/*
- * free_inotify_user_watch - cleans up the watch and its references
- */
-static void free_inotify_user_watch(struct inotify_watch *w)
-{
-	struct inotify_user_watch *watch;
-	struct inotify_device *dev;
-
-	watch = container_of(w, struct inotify_user_watch, wdata);
-	dev = watch->dev;
-
-	atomic_dec(&dev->user->inotify_watches);
-	put_inotify_dev(dev);
-	kmem_cache_free(watch_cachep, watch);
-}
-
-/*
- * kernel_event - create a new kernel event with the given parameters
- *
- * This function can sleep.
- */
-static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
-						  const char *name)
-{
-	struct inotify_kernel_event *kevent;
-
-	kevent = kmem_cache_alloc(event_cachep, GFP_NOFS);
-	if (unlikely(!kevent))
-		return NULL;
-
-	/* we hand this out to user-space, so zero it just in case */
-	memset(&kevent->event, 0, sizeof(struct inotify_event));
-
-	kevent->event.wd = wd;
-	kevent->event.mask = mask;
-	kevent->event.cookie = cookie;
-
-	INIT_LIST_HEAD(&kevent->list);
-
-	if (name) {
-		size_t len, rem, event_size = sizeof(struct inotify_event);
-
-		/*
-		 * We need to pad the filename so as to properly align an
-		 * array of inotify_event structures.  Because the structure is
-		 * small and the common case is a small filename, we just round
-		 * up to the next multiple of the structure's sizeof.  This is
-		 * simple and safe for all architectures.
-		 */
-		len = strlen(name) + 1;
-		rem = event_size - len;
-		if (len > event_size) {
-			rem = event_size - (len % event_size);
-			if (len % event_size == 0)
-				rem = 0;
-		}
-
-		kevent->name = kmalloc(len + rem, GFP_KERNEL);
-		if (unlikely(!kevent->name)) {
-			kmem_cache_free(event_cachep, kevent);
-			return NULL;
-		}
-		memcpy(kevent->name, name, len);
-		if (rem)
-			memset(kevent->name + len, 0, rem);
-		kevent->event.len = len + rem;
-	} else {
-		kevent->event.len = 0;
-		kevent->name = NULL;
-	}
-
-	return kevent;
-}
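The padding arithmetic in kernel_event() above can be checked in plain userspace C. This is a sketch, not kernel code: EVENT_SIZE and padded_name_len() are illustrative stand-ins, with EVENT_SIZE assumed to be the 16-byte sizeof(struct inotify_event) found on common ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for sizeof(struct inotify_event): 16 bytes on
 * common ABIs (wd, mask, cookie, len: four 32-bit fields). */
#define EVENT_SIZE 16

/* Mirror of the rem computation in kernel_event(): round the
 * NUL-terminated name length up to the next multiple of EVENT_SIZE so
 * an array of events stays aligned. */
static size_t padded_name_len(const char *name)
{
	size_t len = strlen(name) + 1;	/* include the NUL */
	size_t rem = EVENT_SIZE - len;	/* valid when len <= EVENT_SIZE */

	if (len > EVENT_SIZE) {
		rem = EVENT_SIZE - (len % EVENT_SIZE);
		if (len % EVENT_SIZE == 0)
			rem = 0;
	}
	return len + rem;		/* what event.len ends up as */
}
```

Note that a name whose length (with NUL) is already a multiple of EVENT_SIZE gets no padding at all, which both branches above handle.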
-
-/*
- * inotify_dev_get_event - return the next event in the given dev's queue
- *
- * Caller must hold dev->ev_mutex.
- */
-static inline struct inotify_kernel_event *
-inotify_dev_get_event(struct inotify_device *dev)
-{
-	return list_entry(dev->events.next, struct inotify_kernel_event, list);
-}
-
-/*
- * inotify_dev_get_last_event - return the last event in the given dev's queue
- *
- * Caller must hold dev->ev_mutex.
- */
-static inline struct inotify_kernel_event *
-inotify_dev_get_last_event(struct inotify_device *dev)
-{
-	if (list_empty(&dev->events))
-		return NULL;
-	return list_entry(dev->events.prev, struct inotify_kernel_event, list);
-}
-
-/*
- * inotify_dev_queue_event - event handler registered with core inotify, adds
- * a new event to the given device
- *
- * Can sleep (calls kernel_event()).
- */
-static void inotify_dev_queue_event(struct inotify_watch *w, u32 wd, u32 mask,
-				    u32 cookie, const char *name,
-				    struct inode *ignored)
-{
-	struct inotify_user_watch *watch;
-	struct inotify_device *dev;
-	struct inotify_kernel_event *kevent, *last;
-
-	watch = container_of(w, struct inotify_user_watch, wdata);
-	dev = watch->dev;
-
-	mutex_lock(&dev->ev_mutex);
-
-	/* we can safely put the watch as we don't reference it while
-	 * generating the event
-	 */
-	if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
-		put_inotify_watch(w); /* final put */
-
-	/* coalescing: drop this event if it is a dupe of the previous */
-	last = inotify_dev_get_last_event(dev);
-	if (last && last->event.mask == mask && last->event.wd == wd &&
-			last->event.cookie == cookie) {
-		const char *lastname = last->name;
-
-		if (!name && !lastname)
-			goto out;
-		if (name && lastname && !strcmp(lastname, name))
-			goto out;
-	}
-
-	/* the queue overflowed and we already sent the Q_OVERFLOW event */
-	if (unlikely(dev->event_count > dev->max_events))
-		goto out;
-
-	/* if the queue overflows, we need to notify user space */
-	if (unlikely(dev->event_count == dev->max_events))
-		kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
-	else
-		kevent = kernel_event(wd, mask, cookie, name);
-
-	if (unlikely(!kevent))
-		goto out;
-
-	/* queue the event and wake up anyone waiting */
-	dev->event_count++;
-	dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
-	list_add_tail(&kevent->list, &dev->events);
-	wake_up_interruptible(&dev->wq);
-	kill_fasync(&dev->fa, SIGIO, POLL_IN);
-
-out:
-	mutex_unlock(&dev->ev_mutex);
-}
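The coalescing test at the top of inotify_dev_queue_event() can be isolated as a pure predicate. The sketch below is userspace-only; struct kevent and is_dupe() are illustrative names, and the sample event values are made up for demonstration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified picture of a queued event; the fields mirror struct
 * inotify_event, but the names here are illustrative. */
struct kevent {
	int32_t wd;
	uint32_t mask;
	uint32_t cookie;
	const char *name;	/* NULL when the event has no filename */
};

/* A sample "last queued event" for the checks below. */
static const struct kevent sample = { 3, 0x100, 0, "foo" };

/* Mirror of the coalescing test in inotify_dev_queue_event(): a new
 * event is a duplicate only when wd, mask, cookie and name all match
 * the tail of the queue. */
static int is_dupe(const struct kevent *last, int32_t wd, uint32_t mask,
		   uint32_t cookie, const char *name)
{
	if (!last || last->mask != mask || last->wd != wd ||
	    last->cookie != cookie)
		return 0;
	if (!name && !last->name)
		return 1;
	return name && last->name && !strcmp(last->name, name);
}
```

An empty queue (last == NULL) is never a duplicate, so the first event after a drain always queues.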
-
-/*
- * remove_kevent - cleans up the given kevent
- *
- * Caller must hold dev->ev_mutex.
- */
-static void remove_kevent(struct inotify_device *dev,
-			  struct inotify_kernel_event *kevent)
-{
-	list_del(&kevent->list);
-
-	dev->event_count--;
-	dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
-}
-
-/*
- * free_kevent - frees the given kevent.
- */
-static void free_kevent(struct inotify_kernel_event *kevent)
-{
-	kfree(kevent->name);
-	kmem_cache_free(event_cachep, kevent);
-}
-
-/*
- * inotify_dev_event_dequeue - destroy an event on the given device
- *
- * Caller must hold dev->ev_mutex.
- */
-static void inotify_dev_event_dequeue(struct inotify_device *dev)
-{
-	if (!list_empty(&dev->events)) {
-		struct inotify_kernel_event *kevent;
-		kevent = inotify_dev_get_event(dev);
-		remove_kevent(dev, kevent);
-		free_kevent(kevent);
-	}
-}
-
-/*
- * find_inode - resolve a user-given path to a specific inode
- */
-static int find_inode(const char __user *dirname, struct path *path,
-		      unsigned flags)
-{
-	int error;
-
-	error = user_path_at(AT_FDCWD, dirname, flags, path);
-	if (error)
-		return error;
-	/* you can only watch an inode if you have read permissions on it */
-	error = inode_permission(path->dentry->d_inode, MAY_READ);
-	if (error)
-		path_put(path);
-	return error;
-}
-
-/*
- * create_watch - creates a watch on the given device.
- *
- * Callers must hold dev->up_mutex.
- */
-static int create_watch(struct inotify_device *dev, struct inode *inode,
-			u32 mask)
-{
-	struct inotify_user_watch *watch;
-	int ret;
-
-	if (atomic_read(&dev->user->inotify_watches) >=
-			inotify_max_user_watches)
-		return -ENOSPC;
-
-	watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
-	if (unlikely(!watch))
-		return -ENOMEM;
-
-	/* save a reference to device and bump the count to make it official */
-	get_inotify_dev(dev);
-	watch->dev = dev;
-
-	atomic_inc(&dev->user->inotify_watches);
-
-	inotify_init_watch(&watch->wdata);
-	ret = inotify_add_watch(dev->ih, &watch->wdata, inode, mask);
-	if (ret < 0)
-		free_inotify_user_watch(&watch->wdata);
-
-	return ret;
-}
-
-/* Device Interface */
-
-static unsigned int inotify_poll(struct file *file, poll_table *wait)
-{
-	struct inotify_device *dev = file->private_data;
-	int ret = 0;
-
-	poll_wait(file, &dev->wq, wait);
-	mutex_lock(&dev->ev_mutex);
-	if (!list_empty(&dev->events))
-		ret = POLLIN | POLLRDNORM;
-	mutex_unlock(&dev->ev_mutex);
-
-	return ret;
-}
-
-static ssize_t inotify_read(struct file *file, char __user *buf,
-			    size_t count, loff_t *pos)
-{
-	size_t event_size = sizeof (struct inotify_event);
-	struct inotify_device *dev;
-	char __user *start;
-	int ret;
-	DEFINE_WAIT(wait);
-
-	start = buf;
-	dev = file->private_data;
-
-	while (1) {
-
-		prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
-
-		mutex_lock(&dev->ev_mutex);
-		if (!list_empty(&dev->events)) {
-			ret = 0;
-			break;
-		}
-		mutex_unlock(&dev->ev_mutex);
-
-		if (file->f_flags & O_NONBLOCK) {
-			ret = -EAGAIN;
-			break;
-		}
-
-		if (signal_pending(current)) {
-			ret = -EINTR;
-			break;
-		}
-
-		schedule();
-	}
-
-	finish_wait(&dev->wq, &wait);
-	if (ret)
-		return ret;
-
-	while (1) {
-		struct inotify_kernel_event *kevent;
-
-		ret = buf - start;
-		if (list_empty(&dev->events))
-			break;
-
-		kevent = inotify_dev_get_event(dev);
-		if (event_size + kevent->event.len > count) {
-			if (ret == 0 && count > 0) {
-				/*
-				 * could not get a single event because we
-				 * didn't have enough buffer space.
-				 */
-				ret = -EINVAL;
-			}
-			break;
-		}
-		remove_kevent(dev, kevent);
-
-		/*
-		 * Must perform the copy_to_user outside the mutex in order
-		 * to avoid a lock order reversal with mmap_sem.
-		 */
-		mutex_unlock(&dev->ev_mutex);
-
-		if (copy_to_user(buf, &kevent->event, event_size)) {
-			ret = -EFAULT;
-			break;
-		}
-		buf += event_size;
-		count -= event_size;
-
-		if (kevent->name) {
-			if (copy_to_user(buf, kevent->name, kevent->event.len)){
-				ret = -EFAULT;
-				break;
-			}
-			buf += kevent->event.len;
-			count -= kevent->event.len;
-		}
-
-		free_kevent(kevent);
-
-		mutex_lock(&dev->ev_mutex);
-	}
-	mutex_unlock(&dev->ev_mutex);
-
-	return ret;
-}
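The buffer accounting in inotify_read() - each event costs a fixed header plus its padded name, and a buffer too small for even one event yields -EINVAL - can be modeled in userspace. events_that_fit() below is an illustrative analogue, not the kernel function; -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

#define EVENT_SIZE 16	/* stand-in for sizeof(struct inotify_event) */

/* Userspace model of the buffer check in inotify_read(): each event
 * costs the fixed header plus its padded name length.  Returns how
 * many events fit in `count` bytes, or -1 (standing in for -EINVAL)
 * when the buffer cannot hold even the first pending event. */
static int events_that_fit(const size_t *name_lens, int n, size_t count)
{
	int i, fit = 0;

	for (i = 0; i < n; i++) {
		size_t need = EVENT_SIZE + name_lens[i];

		if (need > count)
			break;
		count -= need;
		fit++;
	}
	if (fit == 0 && n > 0 && count > 0)
		return -1;	/* buffer too small for a single event */
	return fit;
}

static int demo_partial_drain(void)
{
	size_t lens[] = { 16, 32 };	/* padded name lengths */

	return events_that_fit(lens, 2, 40);	/* only the first fits */
}

static int demo_buffer_too_small(void)
{
	size_t lens[] = { 16 };

	return events_that_fit(lens, 1, 16);	/* header + name is 32 */
}
```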
-
-static int inotify_fasync(int fd, struct file *file, int on)
-{
-	struct inotify_device *dev = file->private_data;
-
-	return fasync_helper(fd, file, on, &dev->fa) >= 0 ? 0 : -EIO;
-}
-
-static int inotify_release(struct inode *ignored, struct file *file)
-{
-	struct inotify_device *dev = file->private_data;
-
-	inotify_destroy(dev->ih);
-
-	/* destroy all of the events on this device */
-	mutex_lock(&dev->ev_mutex);
-	while (!list_empty(&dev->events))
-		inotify_dev_event_dequeue(dev);
-	mutex_unlock(&dev->ev_mutex);
-
-	/* free this device: the put matching the get in inotify_init() */
-	put_inotify_dev(dev);
-
-	return 0;
-}
-
-static long inotify_ioctl(struct file *file, unsigned int cmd,
-			  unsigned long arg)
-{
-	struct inotify_device *dev;
-	void __user *p;
-	int ret = -ENOTTY;
-
-	dev = file->private_data;
-	p = (void __user *) arg;
-
-	switch (cmd) {
-	case FIONREAD:
-		ret = put_user(dev->queue_size, (int __user *) p);
-		break;
-	}
-
-	return ret;
-}
-
-static const struct file_operations inotify_fops = {
-	.poll           = inotify_poll,
-	.read           = inotify_read,
-	.fasync         = inotify_fasync,
-	.release        = inotify_release,
-	.unlocked_ioctl = inotify_ioctl,
-	.compat_ioctl	= inotify_ioctl,
-};
-
-static const struct inotify_operations inotify_user_ops = {
-	.handle_event	= inotify_dev_queue_event,
-	.destroy_watch	= free_inotify_user_watch,
-};
-
-asmlinkage long sys_inotify_init1(int flags)
-{
-	struct inotify_device *dev;
-	struct inotify_handle *ih;
-	struct user_struct *user;
-	struct file *filp;
-	int fd, ret;
-
-	/* Check the IN_* constants for consistency.  */
-	BUILD_BUG_ON(IN_CLOEXEC != O_CLOEXEC);
-	BUILD_BUG_ON(IN_NONBLOCK != O_NONBLOCK);
-
-	if (flags & ~(IN_CLOEXEC | IN_NONBLOCK))
-		return -EINVAL;
-
-	fd = get_unused_fd_flags(flags & O_CLOEXEC);
-	if (fd < 0)
-		return fd;
-
-	filp = get_empty_filp();
-	if (!filp) {
-		ret = -ENFILE;
-		goto out_put_fd;
-	}
-
-	user = get_uid(current->user);
-	if (unlikely(atomic_read(&user->inotify_devs) >=
-			inotify_max_user_instances)) {
-		ret = -EMFILE;
-		goto out_free_uid;
-	}
-
-	dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
-	if (unlikely(!dev)) {
-		ret = -ENOMEM;
-		goto out_free_uid;
-	}
-
-	ih = inotify_init(&inotify_user_ops);
-	if (IS_ERR(ih)) {
-		ret = PTR_ERR(ih);
-		goto out_free_dev;
-	}
-	dev->ih = ih;
-	dev->fa = NULL;
-
-	filp->f_op = &inotify_fops;
-	filp->f_path.mnt = mntget(inotify_mnt);
-	filp->f_path.dentry = dget(inotify_mnt->mnt_root);
-	filp->f_mapping = filp->f_path.dentry->d_inode->i_mapping;
-	filp->f_mode = FMODE_READ;
-	filp->f_flags = O_RDONLY | (flags & O_NONBLOCK);
-	filp->private_data = dev;
-
-	INIT_LIST_HEAD(&dev->events);
-	init_waitqueue_head(&dev->wq);
-	mutex_init(&dev->ev_mutex);
-	mutex_init(&dev->up_mutex);
-	dev->event_count = 0;
-	dev->queue_size = 0;
-	dev->max_events = inotify_max_queued_events;
-	dev->user = user;
-	atomic_set(&dev->count, 0);
-
-	get_inotify_dev(dev);
-	atomic_inc(&user->inotify_devs);
-	fd_install(fd, filp);
-
-	return fd;
-out_free_dev:
-	kfree(dev);
-out_free_uid:
-	free_uid(user);
-	put_filp(filp);
-out_put_fd:
-	put_unused_fd(fd);
-	return ret;
-}
-
-asmlinkage long sys_inotify_init(void)
-{
-	return sys_inotify_init1(0);
-}
-
-asmlinkage long sys_inotify_add_watch(int fd, const char __user *pathname, u32 mask)
-{
-	struct inode *inode;
-	struct inotify_device *dev;
-	struct path path;
-	struct file *filp;
-	int ret, fput_needed;
-	unsigned flags = 0;
-
-	filp = fget_light(fd, &fput_needed);
-	if (unlikely(!filp))
-		return -EBADF;
-
-	/* verify that this is indeed an inotify instance */
-	if (unlikely(filp->f_op != &inotify_fops)) {
-		ret = -EINVAL;
-		goto fput_and_out;
-	}
-
-	if (!(mask & IN_DONT_FOLLOW))
-		flags |= LOOKUP_FOLLOW;
-	if (mask & IN_ONLYDIR)
-		flags |= LOOKUP_DIRECTORY;
-
-	ret = find_inode(pathname, &path, flags);
-	if (unlikely(ret))
-		goto fput_and_out;
-
-	/* inode held in place by reference to path; dev by fget on fd */
-	inode = path.dentry->d_inode;
-	dev = filp->private_data;
-
-	mutex_lock(&dev->up_mutex);
-	ret = inotify_find_update_watch(dev->ih, inode, mask);
-	if (ret == -ENOENT)
-		ret = create_watch(dev, inode, mask);
-	mutex_unlock(&dev->up_mutex);
-
-	path_put(&path);
-fput_and_out:
-	fput_light(filp, fput_needed);
-	return ret;
-}
-
-asmlinkage long sys_inotify_rm_watch(int fd, u32 wd)
-{
-	struct file *filp;
-	struct inotify_device *dev;
-	int ret, fput_needed;
-
-	filp = fget_light(fd, &fput_needed);
-	if (unlikely(!filp))
-		return -EBADF;
-
-	/* verify that this is indeed an inotify instance */
-	if (unlikely(filp->f_op != &inotify_fops)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	dev = filp->private_data;
-
-	/* we free our watch data when we get IN_IGNORED */
-	ret = inotify_rm_wd(dev->ih, wd);
-
-out:
-	fput_light(filp, fput_needed);
-	return ret;
-}
-
-static int
-inotify_get_sb(struct file_system_type *fs_type, int flags,
-	       const char *dev_name, void *data, struct vfsmount *mnt)
-{
-	return get_sb_pseudo(fs_type, "inotify", NULL,
-			INOTIFYFS_SUPER_MAGIC, mnt);
-}
-
-static struct file_system_type inotify_fs_type = {
-    .name           = "inotifyfs",
-    .get_sb         = inotify_get_sb,
-    .kill_sb        = kill_anon_super,
-};
-
-/*
- * inotify_user_setup - Our initialization function.  Note that we cannot return
- * error because we have compiled-in VFS hooks.  So an (unlikely) failure here
- * must result in panic().
- */
-static int __init inotify_user_setup(void)
-{
-	int ret;
-
-	ret = register_filesystem(&inotify_fs_type);
-	if (unlikely(ret))
-		panic("inotify: register_filesystem returned %d!\n", ret);
-
-	inotify_mnt = kern_mount(&inotify_fs_type);
-	if (IS_ERR(inotify_mnt))
-		panic("inotify: kern_mount ret %ld!\n", PTR_ERR(inotify_mnt));
-
-	inotify_max_queued_events = 16384;
-	inotify_max_user_instances = 128;
-	inotify_max_user_watches = 8192;
-
-	watch_cachep = kmem_cache_create("inotify_watch_cache",
-					 sizeof(struct inotify_user_watch),
-					 0, SLAB_PANIC, NULL);
-	event_cachep = kmem_cache_create("inotify_event_cache",
-					 sizeof(struct inotify_kernel_event),
-					 0, SLAB_PANIC, NULL);
-
-	return 0;
-}
-
-module_init(inotify_user_setup);
diff --git a/fs/notify/Kconfig b/fs/notify/Kconfig
new file mode 100644
index 0000000..50914d7
--- /dev/null
+++ b/fs/notify/Kconfig
@@ -0,0 +1,2 @@
+source "fs/notify/dnotify/Kconfig"
+source "fs/notify/inotify/Kconfig"
diff --git a/fs/notify/Makefile b/fs/notify/Makefile
new file mode 100644
index 0000000..5a95b60
--- /dev/null
+++ b/fs/notify/Makefile
@@ -0,0 +1,2 @@
+obj-y			+= dnotify/
+obj-y			+= inotify/
diff --git a/fs/notify/dnotify/Kconfig b/fs/notify/dnotify/Kconfig
new file mode 100644
index 0000000..26adf5d
--- /dev/null
+++ b/fs/notify/dnotify/Kconfig
@@ -0,0 +1,10 @@
+config DNOTIFY
+	bool "Dnotify support"
+	default y
+	help
+	  Dnotify is a directory-based per-fd file change notification system
+	  that uses signals to communicate events to user-space.  There exist
+	  superior alternatives, but some applications may still rely on
+	  dnotify.
+
+	  If unsure, say Y.
diff --git a/fs/notify/dnotify/Makefile b/fs/notify/dnotify/Makefile
new file mode 100644
index 0000000..f145251
--- /dev/null
+++ b/fs/notify/dnotify/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_DNOTIFY)		+= dnotify.o
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
new file mode 100644
index 0000000..676073b
--- /dev/null
+++ b/fs/notify/dnotify/dnotify.c
@@ -0,0 +1,194 @@
+/*
+ * Directory notifications for Linux.
+ *
+ * Copyright (C) 2000,2001,2002 Stephen Rothwell
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/dnotify.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/fdtable.h>
+
+int dir_notify_enable __read_mostly = 1;
+
+static struct kmem_cache *dn_cache __read_mostly;
+
+static void redo_inode_mask(struct inode *inode)
+{
+	unsigned long new_mask;
+	struct dnotify_struct *dn;
+
+	new_mask = 0;
+	for (dn = inode->i_dnotify; dn != NULL; dn = dn->dn_next)
+		new_mask |= dn->dn_mask & ~DN_MULTISHOT;
+	inode->i_dnotify_mask = new_mask;
+}
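The recomputation in redo_inode_mask() is a simple union over the remaining watchers with the re-arm flag masked off. A userspace sketch, using the DN_MULTISHOT value from <linux/fcntl.h> and otherwise illustrative names:

```c
#include <assert.h>
#include <stddef.h>

#define DN_MULTISHOT 0x80000000UL	/* value from <linux/fcntl.h> */

struct dn_entry {
	unsigned long mask;
	struct dn_entry *next;
};

/* Mirror of redo_inode_mask(): the inode-wide mask is the union of
 * every watcher's mask; DN_MULTISHOT only controls re-arming, so it
 * is filtered out of the event-selection mask. */
static unsigned long recompute_mask(const struct dn_entry *head)
{
	unsigned long new_mask = 0;
	const struct dn_entry *dn;

	for (dn = head; dn != NULL; dn = dn->next)
		new_mask |= dn->mask & ~DN_MULTISHOT;
	return new_mask;
}

/* Two watchers on one inode; the second asked for DN_MULTISHOT. */
static struct dn_entry dn_b = { 0x2 | DN_MULTISHOT, NULL };
static struct dn_entry dn_a = { 0x1, &dn_b };
```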
+
+void dnotify_flush(struct file *filp, fl_owner_t id)
+{
+	struct dnotify_struct *dn;
+	struct dnotify_struct **prev;
+	struct inode *inode;
+
+	inode = filp->f_path.dentry->d_inode;
+	if (!S_ISDIR(inode->i_mode))
+		return;
+	spin_lock(&inode->i_lock);
+	prev = &inode->i_dnotify;
+	while ((dn = *prev) != NULL) {
+		if ((dn->dn_owner == id) && (dn->dn_filp == filp)) {
+			*prev = dn->dn_next;
+			redo_inode_mask(inode);
+			kmem_cache_free(dn_cache, dn);
+			break;
+		}
+		prev = &dn->dn_next;
+	}
+	spin_unlock(&inode->i_lock);
+}
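dnotify_flush() above walks the singly linked watcher list with the classic pointer-to-pointer idiom, which lets it unlink the matching entry (including the head) with a single assignment. A userspace sketch with illustrative types:

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int owner;
	struct node *next;
};

/* The pointer-to-pointer walk dnotify_flush() uses: *prev is always
 * the link that reaches the current node, so unlinking is one
 * assignment and the list head needs no special case. */
static void remove_owner(struct node **prev, int owner)
{
	struct node *n;

	while ((n = *prev) != NULL) {
		if (n->owner == owner) {
			*prev = n->next;	/* unlink; caller frees n */
			break;
		}
		prev = &n->next;
	}
}

static int demo_unlink_head(void)
{
	struct node second = { 2, NULL };
	struct node first = { 1, &second };
	struct node *head = &first;

	remove_owner(&head, 1);
	return head->owner;
}

static int demo_unlink_middle(void)
{
	struct node third = { 3, NULL };
	struct node second = { 2, &third };
	struct node first = { 1, &second };
	struct node *head = &first;

	remove_owner(&head, 2);
	return head->next->owner;
}
```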
+
+int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
+{
+	struct dnotify_struct *dn;
+	struct dnotify_struct *odn;
+	struct dnotify_struct **prev;
+	struct inode *inode;
+	fl_owner_t id = current->files;
+	struct file *f;
+	int error = 0;
+
+	if ((arg & ~DN_MULTISHOT) == 0) {
+		dnotify_flush(filp, id);
+		return 0;
+	}
+	if (!dir_notify_enable)
+		return -EINVAL;
+	inode = filp->f_path.dentry->d_inode;
+	if (!S_ISDIR(inode->i_mode))
+		return -ENOTDIR;
+	dn = kmem_cache_alloc(dn_cache, GFP_KERNEL);
+	if (dn == NULL)
+		return -ENOMEM;
+	spin_lock(&inode->i_lock);
+	prev = &inode->i_dnotify;
+	while ((odn = *prev) != NULL) {
+		if ((odn->dn_owner == id) && (odn->dn_filp == filp)) {
+			odn->dn_fd = fd;
+			odn->dn_mask |= arg;
+			inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
+			goto out_free;
+		}
+		prev = &odn->dn_next;
+	}
+
+	rcu_read_lock();
+	f = fcheck(fd);
+	rcu_read_unlock();
+	/* we'd lost the race with close(), sod off silently */
+	/* note that inode->i_lock prevents reordering problems
+	 * between accesses to descriptor table and ->i_dnotify */
+	if (f != filp)
+		goto out_free;
+
+	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
+	if (error)
+		goto out_free;
+
+	dn->dn_mask = arg;
+	dn->dn_fd = fd;
+	dn->dn_filp = filp;
+	dn->dn_owner = id;
+	inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
+	dn->dn_next = inode->i_dnotify;
+	inode->i_dnotify = dn;
+	spin_unlock(&inode->i_lock);
+
+	if (filp->f_op && filp->f_op->dir_notify)
+		return filp->f_op->dir_notify(filp, arg);
+	return 0;
+
+out_free:
+	spin_unlock(&inode->i_lock);
+	kmem_cache_free(dn_cache, dn);
+	return error;
+}
+
+void __inode_dir_notify(struct inode *inode, unsigned long event)
+{
+	struct dnotify_struct *	dn;
+	struct dnotify_struct **prev;
+	struct fown_struct *	fown;
+	int			changed = 0;
+
+	spin_lock(&inode->i_lock);
+	prev = &inode->i_dnotify;
+	while ((dn = *prev) != NULL) {
+		if ((dn->dn_mask & event) == 0) {
+			prev = &dn->dn_next;
+			continue;
+		}
+		fown = &dn->dn_filp->f_owner;
+		send_sigio(fown, dn->dn_fd, POLL_MSG);
+		if (dn->dn_mask & DN_MULTISHOT)
+			prev = &dn->dn_next;
+		else {
+			*prev = dn->dn_next;
+			changed = 1;
+			kmem_cache_free(dn_cache, dn);
+		}
+	}
+	if (changed)
+		redo_inode_mask(inode);
+	spin_unlock(&inode->i_lock);
+}
+
+EXPORT_SYMBOL(__inode_dir_notify);
+
+/*
+ * This is hopelessly wrong, but unfixable without API changes.  At
+ * least it doesn't oops the kernel...
+ *
+ * To safely access ->d_parent we need to keep d_move away from it.  Use the
+ * dentry's d_lock for this.
+ */
+void dnotify_parent(struct dentry *dentry, unsigned long event)
+{
+	struct dentry *parent;
+
+	if (!dir_notify_enable)
+		return;
+
+	spin_lock(&dentry->d_lock);
+	parent = dentry->d_parent;
+	if (parent->d_inode->i_dnotify_mask & event) {
+		dget(parent);
+		spin_unlock(&dentry->d_lock);
+		__inode_dir_notify(parent->d_inode, event);
+		dput(parent);
+	} else {
+		spin_unlock(&dentry->d_lock);
+	}
+}
+EXPORT_SYMBOL_GPL(dnotify_parent);
+
+static int __init dnotify_init(void)
+{
+	dn_cache = kmem_cache_create("dnotify_cache",
+		sizeof(struct dnotify_struct), 0, SLAB_PANIC, NULL);
+	return 0;
+}
+
+module_init(dnotify_init)
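
[Not part of the patch; illustration only.] redo_inode_mask() above rebuilds the
inode's aggregate event mask by OR-ing every watcher's requested events with the
DN_MULTISHOT control bit stripped, so the control bit never leaks into the
per-inode mask. A minimal userspace sketch of that aggregation, where struct
dn_sketch and recompute_mask() are hypothetical stand-ins for
struct dnotify_struct and redo_inode_mask():

```c
/*
 * Userspace sketch of the mask aggregation done by redo_inode_mask():
 * the per-inode mask is the OR of every watcher's requested events,
 * with the DN_MULTISHOT control bit masked out.
 */
#define _GNU_SOURCE		/* for the DN_* constants in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* simplified stand-in for struct dnotify_struct */
struct dn_sketch {
	unsigned long dn_mask;
	struct dn_sketch *dn_next;
};

/* mirrors redo_inode_mask(): walk the watcher list, strip DN_MULTISHOT */
static unsigned long recompute_mask(const struct dn_sketch *head)
{
	unsigned long new_mask = 0;
	const struct dn_sketch *dn;

	for (dn = head; dn != NULL; dn = dn->dn_next)
		new_mask |= dn->dn_mask & ~DN_MULTISHOT;
	return new_mask;
}
```

A watcher registered with DN_CREATE | DN_MULTISHOT thus contributes only
DN_CREATE to the inode's mask.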
diff --git a/fs/notify/inotify/Kconfig b/fs/notify/inotify/Kconfig
new file mode 100644
index 0000000..4467928
--- /dev/null
+++ b/fs/notify/inotify/Kconfig
@@ -0,0 +1,27 @@
+config INOTIFY
+	bool "Inotify file change notification support"
+	default y
+	---help---
+	  Say Y here to enable inotify support.  Inotify is a file change
+	  notification system and a replacement for dnotify.  Inotify fixes
+	  numerous shortcomings in dnotify and introduces several new features
+	  including multiple file events, one-shot support, and unmount
+	  notification.
+
+	  For more information, see <file:Documentation/filesystems/inotify.txt>
+
+	  If unsure, say Y.
+
+config INOTIFY_USER
+	bool "Inotify support for userspace"
+	depends on INOTIFY
+	default y
+	---help---
+	  Say Y here to enable inotify support for userspace, including the
+	  associated system calls.  Inotify allows monitoring of both files and
+	  directories via a single open fd.  Events are read from the file
+	  descriptor, which is also select()- and poll()-able.
+
+	  For more information, see <file:Documentation/filesystems/inotify.txt>
+
+	  If unsure, say Y.
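
[Not part of the patch; illustration only.] The INOTIFY_USER help text notes
that the inotify file descriptor is select()- and poll()-able. A small
userspace sketch of that behaviour, where inotify_ready() is a hypothetical
helper: the fd polls readable only once at least one event has been queued.

```c
/*
 * Userspace sketch of the select()/poll() behaviour described in the
 * INOTIFY_USER help text: POLLIN is reported only when an event is queued.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Returns 1 if the inotify fd has an event ready, 0 on timeout, -1 on error. */
static int inotify_ready(int ifd, int timeout_ms)
{
	struct pollfd pfd = { .fd = ifd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	if (n < 0)
		return -1;
	return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Immediately after inotify_add_watch() nothing is queued, so a zero-timeout
poll returns 0; as soon as a watched event fires, the fd becomes readable.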
diff --git a/fs/notify/inotify/Makefile b/fs/notify/inotify/Makefile
new file mode 100644
index 0000000..e290f3b
--- /dev/null
+++ b/fs/notify/inotify/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_INOTIFY)		+= inotify.o
+obj-$(CONFIG_INOTIFY_USER)	+= inotify_user.o
diff --git a/fs/notify/inotify/inotify.c b/fs/notify/inotify/inotify.c
new file mode 100644
index 0000000..7bbed1b
--- /dev/null
+++ b/fs/notify/inotify/inotify.c
@@ -0,0 +1,911 @@
+/*
+ * fs/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ *	John McCutchan	<ttb@tentacle.dhs.org>
+ *	Robert Love	<rml@novell.com>
+ *
+ * Kernel API added by: Amy Griffis <amy.griffis@hp.com>
+ *
+ * Copyright (C) 2005 John McCutchan
+ * Copyright 2006 Hewlett-Packard Development Company, L.P.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+static atomic_t inotify_cookie;
+
+/*
+ * Lock ordering:
+ *
+ * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
+ * iprune_mutex (synchronizes shrink_icache_memory())
+ * 	inode_lock (protects the super_block->s_inodes list)
+ * 	inode->inotify_mutex (protects inode->inotify_watches and watches->i_list)
+ * 		inotify_handle->mutex (protects inotify_handle and watches->h_list)
+ *
+ * The inode->inotify_mutex and inotify_handle->mutex are held during execution
+ * of a caller's event handler.  Thus, the caller must not hold any locks
+ * taken in its event handler while calling any of the published inotify
+ * interfaces.
+ */
+
+/*
+ * Lifetimes of the three main data structures--inotify_handle, inode, and
+ * inotify_watch--are managed by reference count.
+ *
+ * inotify_handle: Lifetime is from inotify_init() to inotify_destroy().
+ * Additional references can bump the count via get_inotify_handle() and drop
+ * the count via put_inotify_handle().
+ *
+ * inotify_watch: for inotify's purposes, lifetime is from inotify_add_watch()
+ * to remove_watch_no_event().  Additional references can bump the count via
+ * get_inotify_watch() and drop the count via put_inotify_watch().  The caller
+ * is responsible for the final put after receiving IN_IGNORED, or when using
+ * IN_ONESHOT after receiving the first event.  Inotify does the final put if
+ * inotify_destroy() is called.
+ *
+ * inode: Pinned so long as the inode is associated with a watch, from
+ * inotify_add_watch() to the final put_inotify_watch().
+ */
+
+/*
+ * struct inotify_handle - represents an inotify instance
+ *
+ * This structure is protected by the mutex 'mutex'.
+ */
+struct inotify_handle {
+	struct idr		idr;		/* idr mapping wd -> watch */
+	struct mutex		mutex;		/* protects this bad boy */
+	struct list_head	watches;	/* list of watches */
+	atomic_t		count;		/* reference count */
+	u32			last_wd;	/* the last wd allocated */
+	const struct inotify_operations *in_ops; /* inotify caller operations */
+};
+
+static inline void get_inotify_handle(struct inotify_handle *ih)
+{
+	atomic_inc(&ih->count);
+}
+
+static inline void put_inotify_handle(struct inotify_handle *ih)
+{
+	if (atomic_dec_and_test(&ih->count)) {
+		idr_destroy(&ih->idr);
+		kfree(ih);
+	}
+}
+
+/**
+ * get_inotify_watch - grab a reference to an inotify_watch
+ * @watch: watch to grab
+ */
+void get_inotify_watch(struct inotify_watch *watch)
+{
+	atomic_inc(&watch->count);
+}
+EXPORT_SYMBOL_GPL(get_inotify_watch);
+
+int pin_inotify_watch(struct inotify_watch *watch)
+{
+	struct super_block *sb = watch->inode->i_sb;
+	spin_lock(&sb_lock);
+	if (sb->s_count >= S_BIAS) {
+		atomic_inc(&sb->s_active);
+		spin_unlock(&sb_lock);
+		atomic_inc(&watch->count);
+		return 1;
+	}
+	spin_unlock(&sb_lock);
+	return 0;
+}
+
+/**
+ * put_inotify_watch - decrements the ref count on a given watch.  cleans up
+ * watch references if the count reaches zero.  inotify_watch is freed by
+ * inotify callers via the destroy_watch() op.
+ * @watch: watch to release
+ */
+void put_inotify_watch(struct inotify_watch *watch)
+{
+	if (atomic_dec_and_test(&watch->count)) {
+		struct inotify_handle *ih = watch->ih;
+
+		iput(watch->inode);
+		ih->in_ops->destroy_watch(watch);
+		put_inotify_handle(ih);
+	}
+}
+EXPORT_SYMBOL_GPL(put_inotify_watch);
+
+void unpin_inotify_watch(struct inotify_watch *watch)
+{
+	struct super_block *sb = watch->inode->i_sb;
+	put_inotify_watch(watch);
+	deactivate_super(sb);
+}
+
+/*
+ * inotify_handle_get_wd - returns the next WD for use by the given handle
+ *
+ * Callers must hold ih->mutex.  This function can sleep.
+ */
+static int inotify_handle_get_wd(struct inotify_handle *ih,
+				 struct inotify_watch *watch)
+{
+	int ret;
+
+	do {
+		if (unlikely(!idr_pre_get(&ih->idr, GFP_KERNEL)))
+			return -ENOSPC;
+		ret = idr_get_new_above(&ih->idr, watch, ih->last_wd+1, &watch->wd);
+	} while (ret == -EAGAIN);
+
+	if (likely(!ret))
+		ih->last_wd = watch->wd;
+
+	return ret;
+}
+
+/*
+ * inotify_inode_watched - returns nonzero if there are watches on this inode
+ * and zero otherwise.  We call this lockless; we do not care if we race.
+ */
+static inline int inotify_inode_watched(struct inode *inode)
+{
+	return !list_empty(&inode->inotify_watches);
+}
+
+/*
+ * Bring the child dentry flags into sync with the parent inode.
+ * The flag should always be clear for negative dentries.
+ */
+static void set_dentry_child_flags(struct inode *inode, int watched)
+{
+	struct dentry *alias;
+
+	spin_lock(&dcache_lock);
+	list_for_each_entry(alias, &inode->i_dentry, d_alias) {
+		struct dentry *child;
+
+		list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
+			if (!child->d_inode)
+				continue;
+
+			spin_lock(&child->d_lock);
+			if (watched)
+				child->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+			else
+				child->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
+			spin_unlock(&child->d_lock);
+		}
+	}
+	spin_unlock(&dcache_lock);
+}
+
+/*
+ * inode_find_handle - find the watch associated with the given inode and
+ * handle
+ *
+ * Callers must hold inode->inotify_mutex.
+ */
+static struct inotify_watch *inode_find_handle(struct inode *inode,
+					       struct inotify_handle *ih)
+{
+	struct inotify_watch *watch;
+
+	list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+		if (watch->ih == ih)
+			return watch;
+	}
+
+	return NULL;
+}
+
+/*
+ * remove_watch_no_event - remove watch without the IN_IGNORED event.
+ *
+ * Callers must hold both inode->inotify_mutex and ih->mutex.
+ */
+static void remove_watch_no_event(struct inotify_watch *watch,
+				  struct inotify_handle *ih)
+{
+	list_del(&watch->i_list);
+	list_del(&watch->h_list);
+
+	if (!inotify_inode_watched(watch->inode))
+		set_dentry_child_flags(watch->inode, 0);
+
+	idr_remove(&ih->idr, watch->wd);
+}
+
+/**
+ * inotify_remove_watch_locked - Remove a watch from both the handle and the
+ * inode.  Sends the IN_IGNORED event signifying that the inode is no longer
+ * watched.  May be invoked from a caller's event handler.
+ * @ih: inotify handle associated with watch
+ * @watch: watch to remove
+ *
+ * Callers must hold both inode->inotify_mutex and ih->mutex.
+ */
+void inotify_remove_watch_locked(struct inotify_handle *ih,
+				 struct inotify_watch *watch)
+{
+	remove_watch_no_event(watch, ih);
+	ih->in_ops->handle_event(watch, watch->wd, IN_IGNORED, 0, NULL, NULL);
+}
+EXPORT_SYMBOL_GPL(inotify_remove_watch_locked);
+
+/* Kernel API for producing events */
+
+/*
+ * inotify_d_instantiate - instantiate dcache entry for inode
+ */
+void inotify_d_instantiate(struct dentry *entry, struct inode *inode)
+{
+	struct dentry *parent;
+
+	if (!inode)
+		return;
+
+	spin_lock(&entry->d_lock);
+	parent = entry->d_parent;
+	if (parent->d_inode && inotify_inode_watched(parent->d_inode))
+		entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+	spin_unlock(&entry->d_lock);
+}
+
+/*
+ * inotify_d_move - dcache entry has been moved
+ */
+void inotify_d_move(struct dentry *entry)
+{
+	struct dentry *parent;
+
+	parent = entry->d_parent;
+	if (inotify_inode_watched(parent->d_inode))
+		entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+	else
+		entry->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
+}
+
+/**
+ * inotify_inode_queue_event - queue an event to all watches on this inode
+ * @inode: inode event is originating from
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ * @n_inode: inode associated with name
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+			       const char *name, struct inode *n_inode)
+{
+	struct inotify_watch *watch, *next;
+
+	if (!inotify_inode_watched(inode))
+		return;
+
+	mutex_lock(&inode->inotify_mutex);
+	list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+		u32 watch_mask = watch->mask;
+		if (watch_mask & mask) {
+			struct inotify_handle *ih = watch->ih;
+			mutex_lock(&ih->mutex);
+			if (watch_mask & IN_ONESHOT)
+				remove_watch_no_event(watch, ih);
+			ih->in_ops->handle_event(watch, watch->wd, mask, cookie,
+						 name, n_inode);
+			mutex_unlock(&ih->mutex);
+		}
+	}
+	mutex_unlock(&inode->inotify_mutex);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+/**
+ * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
+ * @dentry: the dentry in question, we queue against this dentry's parent
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+				       u32 cookie, const char *name)
+{
+	struct dentry *parent;
+	struct inode *inode;
+
+	if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
+		return;
+
+	spin_lock(&dentry->d_lock);
+	parent = dentry->d_parent;
+	inode = parent->d_inode;
+
+	if (inotify_inode_watched(inode)) {
+		dget(parent);
+		spin_unlock(&dentry->d_lock);
+		inotify_inode_queue_event(inode, mask, cookie, name,
+					  dentry->d_inode);
+		dput(parent);
+	} else
+		spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+/**
+ * inotify_get_cookie - return a unique cookie for use in synchronizing events.
+ */
+u32 inotify_get_cookie(void)
+{
+	return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/**
+ * inotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
+ * @list: list of inodes being unmounted (sb->s_inodes)
+ *
+ * Called with inode_lock held, protecting the unmounting super block's list
+ * of inodes, and with iprune_mutex held, keeping shrink_icache_memory() at bay.
+ * We temporarily drop inode_lock, however, and CAN block.
+ */
+void inotify_unmount_inodes(struct list_head *list)
+{
+	struct inode *inode, *next_i, *need_iput = NULL;
+
+	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
+		struct inotify_watch *watch, *next_w;
+		struct inode *need_iput_tmp;
+		struct list_head *watches;
+
+		/*
+		 * If i_count is zero, the inode cannot have any watches and
+		 * doing an __iget/iput with MS_ACTIVE clear would actually
+		 * evict all inodes with zero i_count from icache which is
+		 * unnecessarily violent and may in fact be illegal to do.
+		 */
+		if (!atomic_read(&inode->i_count))
+			continue;
+
+		/*
+		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
+		 * I_WILL_FREE which is fine because by that point the inode
+		 * cannot have any associated watches.
+		 */
+		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
+			continue;
+
+		need_iput_tmp = need_iput;
+		need_iput = NULL;
+		/* In case inotify_remove_watch_locked() drops a reference. */
+		if (inode != need_iput_tmp)
+			__iget(inode);
+		else
+			need_iput_tmp = NULL;
+		/* In case the dropping of a reference would nuke next_i. */
+		if ((&next_i->i_sb_list != list) &&
+				atomic_read(&next_i->i_count) &&
+				!(next_i->i_state & (I_CLEAR | I_FREEING |
+					I_WILL_FREE))) {
+			__iget(next_i);
+			need_iput = next_i;
+		}
+
+		/*
+		 * We can safely drop inode_lock here because we hold
+		 * references on both inode and next_i.  Also no new inodes
+		 * will be added since the umount has begun.  Finally,
+		 * iprune_mutex keeps shrink_icache_memory() away.
+		 */
+		spin_unlock(&inode_lock);
+
+		if (need_iput_tmp)
+			iput(need_iput_tmp);
+
+		/* for each watch, send IN_UNMOUNT and then remove it */
+		mutex_lock(&inode->inotify_mutex);
+		watches = &inode->inotify_watches;
+		list_for_each_entry_safe(watch, next_w, watches, i_list) {
+			struct inotify_handle *ih = watch->ih;
+			mutex_lock(&ih->mutex);
+			ih->in_ops->handle_event(watch, watch->wd, IN_UNMOUNT, 0,
+						 NULL, NULL);
+			inotify_remove_watch_locked(ih, watch);
+			mutex_unlock(&ih->mutex);
+		}
+		mutex_unlock(&inode->inotify_mutex);
+		iput(inode);
+
+		spin_lock(&inode_lock);
+	}
+}
+EXPORT_SYMBOL_GPL(inotify_unmount_inodes);
+
+/**
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ * @inode: inode that is about to be removed
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+	struct inotify_watch *watch, *next;
+
+	mutex_lock(&inode->inotify_mutex);
+	list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+		struct inotify_handle *ih = watch->ih;
+		mutex_lock(&ih->mutex);
+		inotify_remove_watch_locked(ih, watch);
+		mutex_unlock(&ih->mutex);
+	}
+	mutex_unlock(&inode->inotify_mutex);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* Kernel Consumer API */
+
+/**
+ * inotify_init - allocate and initialize an inotify instance
+ * @ops: caller's inotify operations
+ */
+struct inotify_handle *inotify_init(const struct inotify_operations *ops)
+{
+	struct inotify_handle *ih;
+
+	ih = kmalloc(sizeof(struct inotify_handle), GFP_KERNEL);
+	if (unlikely(!ih))
+		return ERR_PTR(-ENOMEM);
+
+	idr_init(&ih->idr);
+	INIT_LIST_HEAD(&ih->watches);
+	mutex_init(&ih->mutex);
+	ih->last_wd = 0;
+	ih->in_ops = ops;
+	atomic_set(&ih->count, 0);
+	get_inotify_handle(ih);
+
+	return ih;
+}
+EXPORT_SYMBOL_GPL(inotify_init);
+
+/**
+ * inotify_init_watch - initialize an inotify watch
+ * @watch: watch to initialize
+ */
+void inotify_init_watch(struct inotify_watch *watch)
+{
+	INIT_LIST_HEAD(&watch->h_list);
+	INIT_LIST_HEAD(&watch->i_list);
+	atomic_set(&watch->count, 0);
+	get_inotify_watch(watch); /* initial get */
+}
+EXPORT_SYMBOL_GPL(inotify_init_watch);
+
+/*
+ * Watch removals suck violently.  To kick the watch out we need (in this
+ * order) inode->inotify_mutex and ih->mutex.  That's fine if we have
+ * a hold on inode; however, for all other cases we need to make damn sure
+ * we don't race with umount.  We can *NOT* just grab a reference to a
+ * watch - inotify_unmount_inodes() will happily sail past it and we'll end up
+ * with a reference to an inode potentially outliving its superblock.  Ideally
+ * we just want to grab an active reference to superblock if we can; that
+ * will make sure we won't go into inotify_unmount_inodes() until we are
+ * done.  Cleanup is just deactivate_super().  However, that leaves a messy
+ * case - what if we *are* racing with umount() and active references to
+ * superblock can't be acquired anymore?  We can bump ->s_count, grab
+ * ->s_umount, which will almost certainly wait until the superblock is shut
+ * down and the watch in question is pining for fjords.  That's fine, but
+ * there is a problem - we might have hit the window between ->s_active
+ * getting to 0 / ->s_count - below S_BIAS (i.e. the moment when superblock
+ * is past the point of no return and is heading for shutdown) and the
+ * moment when deactivate_super() acquires ->s_umount.  We could just do
+ * drop_super(), yield() and retry, but that's rather antisocial and this
+ * stuff is luser-triggerable.  OTOH, having grabbed ->s_umount and having
+ * found that we'd got there first (i.e. that ->s_root is non-NULL) we know
+ * that we won't race with inotify_unmount_inodes().  So we could grab a
+ * reference to watch and do the rest as above, just with drop_super() instead
+ * of deactivate_super(), right?  Wrong.  We had to drop ih->mutex before we
+ * could grab ->s_umount.  So the watch could've been gone already.
+ *
+ * That still can be dealt with - we need to save watch->wd, do idr_find()
+ * and compare its result with our pointer.  If they match, we either have
+ * the damn thing still alive or we'd lost not one but two races at once,
+ * the watch had been killed and a new one got created with the same ->wd
+ * at the same address.  That couldn't have happened in inotify_destroy(),
+ * but inotify_rm_wd() could run into that.  Still, "new one got created"
+ * is not a problem - we have every right to kill it or leave it alone,
+ * whatever's more convenient.
+ *
+ * So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
+ * "grab it and kill it" check.  If it's been our original watch, we are
+ * fine, if it's a newcomer - nevermind, just pretend that we'd won the
+ * race and kill the fscker anyway; we are safe since we know that its
+ * superblock won't be going away.
+ *
+ * And yes, this is far beyond mere "not very pretty"; so's the entire
+ * concept of inotify to start with.
+ */
+
+/**
+ * pin_to_kill - pin the watch down for removal
+ * @ih: inotify handle
+ * @watch: watch to kill
+ *
+ * Called with ih->mutex held, drops it.  Possible return values:
+ * 0 - nothing to do, it has died
+ * 1 - remove it, drop the reference and deactivate_super()
+ * 2 - remove it, drop the reference and drop_super(); we tried hard to avoid
+ * that variant, since it involved a lot of PITA, but that's the best that
+ * could've been done.
+ */
+static int pin_to_kill(struct inotify_handle *ih, struct inotify_watch *watch)
+{
+	struct super_block *sb = watch->inode->i_sb;
+	s32 wd = watch->wd;
+
+	spin_lock(&sb_lock);
+	if (sb->s_count >= S_BIAS) {
+		atomic_inc(&sb->s_active);
+		spin_unlock(&sb_lock);
+		get_inotify_watch(watch);
+		mutex_unlock(&ih->mutex);
+		return 1;	/* the best outcome */
+	}
+	sb->s_count++;
+	spin_unlock(&sb_lock);
+	mutex_unlock(&ih->mutex); /* can't grab ->s_umount under it */
+	down_read(&sb->s_umount);
+	if (likely(!sb->s_root)) {
+		/* fs is already shut down; the watch is dead */
+		drop_super(sb);
+		return 0;
+	}
+	/* raced with the final deactivate_super() */
+	mutex_lock(&ih->mutex);
+	if (idr_find(&ih->idr, wd) != watch || watch->inode->i_sb != sb) {
+		/* the watch is dead */
+		mutex_unlock(&ih->mutex);
+		drop_super(sb);
+		return 0;
+	}
+	/* still alive or freed and reused with the same sb and wd; kill */
+	get_inotify_watch(watch);
+	mutex_unlock(&ih->mutex);
+	return 2;
+}
+
+static void unpin_and_kill(struct inotify_watch *watch, int how)
+{
+	struct super_block *sb = watch->inode->i_sb;
+	put_inotify_watch(watch);
+	switch (how) {
+	case 1:
+		deactivate_super(sb);
+		break;
+	case 2:
+		drop_super(sb);
+	}
+}
+
+/**
+ * inotify_destroy - clean up and destroy an inotify instance
+ * @ih: inotify handle
+ */
+void inotify_destroy(struct inotify_handle *ih)
+{
+	/*
+	 * Destroy all of the watches for this handle. Unfortunately, not very
+	 * pretty.  We cannot do a simple iteration over the list, because we
+	 * do not know the inode until we iterate to the watch.  But we need to
+	 * hold inode->inotify_mutex before ih->mutex.  The following works.
+	 *
+	 * AV: it had to become even uglier to start working ;-/
+	 */
+	while (1) {
+		struct inotify_watch *watch;
+		struct list_head *watches;
+		struct super_block *sb;
+		struct inode *inode;
+		int how;
+
+		mutex_lock(&ih->mutex);
+		watches = &ih->watches;
+		if (list_empty(watches)) {
+			mutex_unlock(&ih->mutex);
+			break;
+		}
+		watch = list_first_entry(watches, struct inotify_watch, h_list);
+		sb = watch->inode->i_sb;
+		how = pin_to_kill(ih, watch);
+		if (!how)
+			continue;
+
+		inode = watch->inode;
+		mutex_lock(&inode->inotify_mutex);
+		mutex_lock(&ih->mutex);
+
+		/* make sure we didn't race with another list removal */
+		if (likely(idr_find(&ih->idr, watch->wd))) {
+			remove_watch_no_event(watch, ih);
+			put_inotify_watch(watch);
+		}
+
+		mutex_unlock(&ih->mutex);
+		mutex_unlock(&inode->inotify_mutex);
+		unpin_and_kill(watch, how);
+	}
+
+	/* free this handle: the put matching the get in inotify_init() */
+	put_inotify_handle(ih);
+}
+EXPORT_SYMBOL_GPL(inotify_destroy);
+
+/**
+ * inotify_find_watch - find an existing watch for an (ih,inode) pair
+ * @ih: inotify handle
+ * @inode: inode to watch
+ * @watchp: pointer to existing inotify_watch
+ *
+ * Caller must pin given inode (via nameidata).
+ */
+s32 inotify_find_watch(struct inotify_handle *ih, struct inode *inode,
+		       struct inotify_watch **watchp)
+{
+	struct inotify_watch *old;
+	int ret = -ENOENT;
+
+	mutex_lock(&inode->inotify_mutex);
+	mutex_lock(&ih->mutex);
+
+	old = inode_find_handle(inode, ih);
+	if (unlikely(old)) {
+		get_inotify_watch(old); /* caller must put watch */
+		*watchp = old;
+		ret = old->wd;
+	}
+
+	mutex_unlock(&ih->mutex);
+	mutex_unlock(&inode->inotify_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(inotify_find_watch);
+
+/**
+ * inotify_find_update_watch - find and update the mask of an existing watch
+ * @ih: inotify handle
+ * @inode: inode's watch to update
+ * @mask: mask of events to watch
+ *
+ * Caller must pin given inode (via nameidata).
+ */
+s32 inotify_find_update_watch(struct inotify_handle *ih, struct inode *inode,
+			      u32 mask)
+{
+	struct inotify_watch *old;
+	int mask_add = 0;
+	int ret;
+
+	if (mask & IN_MASK_ADD)
+		mask_add = 1;
+
+	/* don't allow invalid bits: we don't want flags set */
+	mask &= IN_ALL_EVENTS | IN_ONESHOT;
+	if (unlikely(!mask))
+		return -EINVAL;
+
+	mutex_lock(&inode->inotify_mutex);
+	mutex_lock(&ih->mutex);
+
+	/*
+	 * Handle the case of re-adding a watch on an (inode,ih) pair that we
+	 * are already watching.  We just update the mask and return its wd.
+	 */
+	old = inode_find_handle(inode, ih);
+	if (unlikely(!old)) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	if (mask_add)
+		old->mask |= mask;
+	else
+		old->mask = mask;
+	ret = old->wd;
+out:
+	mutex_unlock(&ih->mutex);
+	mutex_unlock(&inode->inotify_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(inotify_find_update_watch);
+
+/**
+ * inotify_add_watch - add a watch to an inotify instance
+ * @ih: inotify handle
+ * @watch: caller allocated watch structure
+ * @inode: inode to watch
+ * @mask: mask of events to watch
+ *
+ * Caller must pin given inode (via nameidata).
+ * Caller must ensure it only calls inotify_add_watch() once per watch.
+ * Calls inotify_handle_get_wd() so may sleep.
+ */
+s32 inotify_add_watch(struct inotify_handle *ih, struct inotify_watch *watch,
+		      struct inode *inode, u32 mask)
+{
+	int ret = 0;
+	int newly_watched;
+
+	/* don't allow invalid bits: we don't want flags set */
+	mask &= IN_ALL_EVENTS | IN_ONESHOT;
+	if (unlikely(!mask))
+		return -EINVAL;
+	watch->mask = mask;
+
+	mutex_lock(&inode->inotify_mutex);
+	mutex_lock(&ih->mutex);
+
+	/* Initialize a new watch */
+	ret = inotify_handle_get_wd(ih, watch);
+	if (unlikely(ret))
+		goto out;
+	ret = watch->wd;
+
+	/* save a reference to handle and bump the count to make it official */
+	get_inotify_handle(ih);
+	watch->ih = ih;
+
+	/*
+	 * Save a reference to the inode and bump the ref count to make it
+	 * official.  We hold a reference to nameidata, which makes this safe.
+	 */
+	watch->inode = igrab(inode);
+
+	/* Add the watch to the handle's and the inode's list */
+	newly_watched = !inotify_inode_watched(inode);
+	list_add(&watch->h_list, &ih->watches);
+	list_add(&watch->i_list, &inode->inotify_watches);
+	/*
+	 * Set child flags _after_ adding the watch, so there is no race
+	 * window where newly instantiated children could miss their parent's
+	 * watched flag.
+	 */
+	if (newly_watched)
+		set_dentry_child_flags(inode, 1);
+
+out:
+	mutex_unlock(&ih->mutex);
+	mutex_unlock(&inode->inotify_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(inotify_add_watch);
+
+/**
+ * inotify_clone_watch - put the watch next to existing one
+ * @old: already installed watch
+ * @new: new watch
+ *
+ * Caller must hold the inotify_mutex of inode we are dealing with;
+ * it is expected to remove the old watch before unlocking the inode.
+ */
+s32 inotify_clone_watch(struct inotify_watch *old, struct inotify_watch *new)
+{
+	struct inotify_handle *ih = old->ih;
+	int ret = 0;
+
+	new->mask = old->mask;
+	new->ih = ih;
+
+	mutex_lock(&ih->mutex);
+
+	/* Initialize a new watch */
+	ret = inotify_handle_get_wd(ih, new);
+	if (unlikely(ret))
+		goto out;
+	ret = new->wd;
+
+	get_inotify_handle(ih);
+
+	new->inode = igrab(old->inode);
+
+	list_add(&new->h_list, &ih->watches);
+	list_add(&new->i_list, &old->inode->inotify_watches);
+out:
+	mutex_unlock(&ih->mutex);
+	return ret;
+}
+
+void inotify_evict_watch(struct inotify_watch *watch)
+{
+	get_inotify_watch(watch);
+	mutex_lock(&watch->ih->mutex);
+	inotify_remove_watch_locked(watch->ih, watch);
+	mutex_unlock(&watch->ih->mutex);
+}
+
+/**
+ * inotify_rm_wd - remove a watch from an inotify instance
+ * @ih: inotify handle
+ * @wd: watch descriptor to remove
+ *
+ * Can sleep.
+ */
+int inotify_rm_wd(struct inotify_handle *ih, u32 wd)
+{
+	struct inotify_watch *watch;
+	struct super_block *sb;
+	struct inode *inode;
+	int how;
+
+	mutex_lock(&ih->mutex);
+	watch = idr_find(&ih->idr, wd);
+	if (unlikely(!watch)) {
+		mutex_unlock(&ih->mutex);
+		return -EINVAL;
+	}
+	sb = watch->inode->i_sb;
+	how = pin_to_kill(ih, watch);
+	if (!how)
+		return 0;
+
+	inode = watch->inode;
+
+	mutex_lock(&inode->inotify_mutex);
+	mutex_lock(&ih->mutex);
+
+	/* make sure that we did not race */
+	if (likely(idr_find(&ih->idr, wd) == watch))
+		inotify_remove_watch_locked(ih, watch);
+
+	mutex_unlock(&ih->mutex);
+	mutex_unlock(&inode->inotify_mutex);
+	unpin_and_kill(watch, how);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(inotify_rm_wd);
+
+/**
+ * inotify_rm_watch - remove a watch from an inotify instance
+ * @ih: inotify handle
+ * @watch: watch to remove
+ *
+ * Can sleep.
+ */
+int inotify_rm_watch(struct inotify_handle *ih,
+		     struct inotify_watch *watch)
+{
+	return inotify_rm_wd(ih, watch->wd);
+}
+EXPORT_SYMBOL_GPL(inotify_rm_watch);
+
+/*
+ * inotify_setup - core initialization function
+ */
+static int __init inotify_setup(void)
+{
+	atomic_set(&inotify_cookie, 0);
+
+	return 0;
+}
+
+module_init(inotify_setup);
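
[Not part of the patch; illustration only.] Everything above is the
kernel-internal inotify API; the userspace-facing side that inotify_user.c
builds on it reduces to inotify_init(), inotify_add_watch(), and read()-ing
packed struct inotify_event records from the fd. A minimal sketch, where
read_one_event() is a hypothetical helper:

```c
/*
 * Userspace sketch of the inotify flow that fs/notify/inotify/inotify_user.c
 * implements: create an instance, watch a directory, read one event record.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Blocks until one event is read from @ifd; returns its mask, or 0 on error. */
static unsigned int read_one_event(int ifd)
{
	/* long-aligned buffer large enough for one event plus a name */
	long buf[(sizeof(struct inotify_event) + NAME_MAX + 1) / sizeof(long) + 1];
	ssize_t len = read(ifd, buf, sizeof(buf));
	const struct inotify_event *ev = (const struct inotify_event *)buf;

	if (len < (ssize_t)sizeof(struct inotify_event))
		return 0;
	return ev->mask;
}
```

Creating a file inside a directory watched with IN_CREATE queues an event whose
mask carries the IN_CREATE bit; a real reader would loop over the buffer, since
one read() can return several variable-length records back to back.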
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
new file mode 100644
index 0000000..d367e9b
--- /dev/null
+++ b/fs/notify/inotify/inotify_user.c
@@ -0,0 +1,778 @@
+/*
+ * fs/inotify_user.c - inotify support for userspace
+ *
+ * Authors:
+ *	John McCutchan	<ttb@tentacle.dhs.org>
+ *	Robert Love	<rml@novell.com>
+ *
+ * Copyright (C) 2005 John McCutchan
+ * Copyright 2006 Hewlett-Packard Development Company, L.P.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/inotify.h>
+#include <linux/syscalls.h>
+#include <linux/magic.h>
+
+#include <asm/ioctls.h>
+
+static struct kmem_cache *watch_cachep __read_mostly;
+static struct kmem_cache *event_cachep __read_mostly;
+
+static struct vfsmount *inotify_mnt __read_mostly;
+
+/* these are configurable via /proc/sys/fs/inotify/ */
+static int inotify_max_user_instances __read_mostly;
+static int inotify_max_user_watches __read_mostly;
+static int inotify_max_queued_events __read_mostly;
+
+/*
+ * Lock ordering:
+ *
+ * inotify_dev->up_mutex (ensures we don't re-add the same watch)
+ * 	inode->inotify_mutex (protects inode's watch list)
+ * 		inotify_handle->mutex (protects inotify_handle's watch list)
+ * 			inotify_dev->ev_mutex (protects device's event queue)
+ */
+
+/*
+ * Lifetimes of the main data structures:
+ *
+ * inotify_device: Lifetime is managed by reference count, from
+ * sys_inotify_init() until release.  Additional references can bump the count
+ * via get_inotify_dev() and drop the count via put_inotify_dev().
+ *
+ * inotify_user_watch: Lifetime is from create_watch() to the receipt of an
+ * IN_IGNORED event from inotify, or when using IN_ONESHOT, to receipt of the
+ * first event, or to inotify_destroy().
+ */
+
+/*
+ * struct inotify_device - represents an inotify instance
+ *
+ * This structure is protected by the mutex 'mutex'.
+ */
+struct inotify_device {
+	wait_queue_head_t 	wq;		/* wait queue for i/o */
+	struct mutex		ev_mutex;	/* protects event queue */
+	struct mutex		up_mutex;	/* synchronizes watch updates */
+	struct list_head 	events;		/* list of queued events */
+	atomic_t		count;		/* reference count */
+	struct user_struct	*user;		/* user who opened this dev */
+	struct inotify_handle	*ih;		/* inotify handle */
+	struct fasync_struct    *fa;            /* async notification */
+	unsigned int		queue_size;	/* size of the queue (bytes) */
+	unsigned int		event_count;	/* number of pending events */
+	unsigned int		max_events;	/* maximum number of events */
+};
+
+/*
+ * struct inotify_kernel_event - An inotify event, originating from a watch and
+ * queued for user-space.  A list of these is attached to each instance of the
+ * device.  In read(), this list is walked and all events that can fit in the
+ * buffer are returned.
+ *
+ * Protected by dev->ev_mutex of the device in which we are queued.
+ */
+struct inotify_kernel_event {
+	struct inotify_event	event;	/* the user-space event */
+	struct list_head        list;	/* entry in inotify_device's list */
+	char			*name;	/* filename, if any */
+};
+
+/*
+ * struct inotify_user_watch - our version of an inotify_watch, we add
+ * a reference to the associated inotify_device.
+ */
+struct inotify_user_watch {
+	struct inotify_device	*dev;	/* associated device */
+	struct inotify_watch	wdata;	/* inotify watch data */
+};
+
+#ifdef CONFIG_SYSCTL
+
+#include <linux/sysctl.h>
+
+static int zero;
+
+ctl_table inotify_table[] = {
+	{
+		.ctl_name	= INOTIFY_MAX_USER_INSTANCES,
+		.procname	= "max_user_instances",
+		.data		= &inotify_max_user_instances,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
+	{
+		.ctl_name	= INOTIFY_MAX_USER_WATCHES,
+		.procname	= "max_user_watches",
+		.data		= &inotify_max_user_watches,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
+	{
+		.ctl_name	= INOTIFY_MAX_QUEUED_EVENTS,
+		.procname	= "max_queued_events",
+		.data		= &inotify_max_queued_events,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero
+	},
+	{ .ctl_name = 0 }
+};
+#endif /* CONFIG_SYSCTL */
+
+static inline void get_inotify_dev(struct inotify_device *dev)
+{
+	atomic_inc(&dev->count);
+}
+
+static inline void put_inotify_dev(struct inotify_device *dev)
+{
+	if (atomic_dec_and_test(&dev->count)) {
+		atomic_dec(&dev->user->inotify_devs);
+		free_uid(dev->user);
+		kfree(dev);
+	}
+}
+
+/*
+ * free_inotify_user_watch - cleans up the watch and its references
+ */
+static void free_inotify_user_watch(struct inotify_watch *w)
+{
+	struct inotify_user_watch *watch;
+	struct inotify_device *dev;
+
+	watch = container_of(w, struct inotify_user_watch, wdata);
+	dev = watch->dev;
+
+	atomic_dec(&dev->user->inotify_watches);
+	put_inotify_dev(dev);
+	kmem_cache_free(watch_cachep, watch);
+}
+
+/*
+ * kernel_event - create a new kernel event with the given parameters
+ *
+ * This function can sleep.
+ */
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+						  const char *name)
+{
+	struct inotify_kernel_event *kevent;
+
+	kevent = kmem_cache_alloc(event_cachep, GFP_NOFS);
+	if (unlikely(!kevent))
+		return NULL;
+
+	/* we hand this out to user-space, so zero it just in case */
+	memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+	kevent->event.wd = wd;
+	kevent->event.mask = mask;
+	kevent->event.cookie = cookie;
+
+	INIT_LIST_HEAD(&kevent->list);
+
+	if (name) {
+		size_t len, rem, event_size = sizeof(struct inotify_event);
+
+		/*
+		 * We need to pad the filename so as to properly align an
+		 * array of inotify_event structures.  Because the structure is
+		 * small and the common case is a small filename, we just round
+		 * up to the next multiple of the structure's sizeof.  This is
+		 * simple and safe for all architectures.
+		 */
+		len = strlen(name) + 1;
+		rem = event_size - len;
+		if (len > event_size) {
+			rem = event_size - (len % event_size);
+			if (len % event_size == 0)
+				rem = 0;
+		}
+
+		kevent->name = kmalloc(len + rem, GFP_KERNEL);
+		if (unlikely(!kevent->name)) {
+			kmem_cache_free(event_cachep, kevent);
+			return NULL;
+		}
+		memcpy(kevent->name, name, len);
+		if (rem)
+			memset(kevent->name + len, 0, rem);
+		kevent->event.len = len + rem;
+	} else {
+		kevent->event.len = 0;
+		kevent->name = NULL;
+	}
+
+	return kevent;
+}
+
+/*
+ * inotify_dev_get_event - return the next event in the given dev's queue
+ *
+ * Caller must hold dev->ev_mutex.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_event(struct inotify_device *dev)
+{
+	return list_entry(dev->events.next, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_get_last_event - return the last event in the given dev's queue
+ *
+ * Caller must hold dev->ev_mutex.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_last_event(struct inotify_device *dev)
+{
+	if (list_empty(&dev->events))
+		return NULL;
+	return list_entry(dev->events.prev, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_queue_event - event handler registered with core inotify, adds
+ * a new event to the given device
+ *
+ * Can sleep (calls kernel_event()).
+ */
+static void inotify_dev_queue_event(struct inotify_watch *w, u32 wd, u32 mask,
+				    u32 cookie, const char *name,
+				    struct inode *ignored)
+{
+	struct inotify_user_watch *watch;
+	struct inotify_device *dev;
+	struct inotify_kernel_event *kevent, *last;
+
+	watch = container_of(w, struct inotify_user_watch, wdata);
+	dev = watch->dev;
+
+	mutex_lock(&dev->ev_mutex);
+
+	/* we can safely put the watch as we don't reference it while
+	 * generating the event
+	 */
+	if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
+		put_inotify_watch(w); /* final put */
+
+	/* coalescing: drop this event if it is a dupe of the previous */
+	last = inotify_dev_get_last_event(dev);
+	if (last && last->event.mask == mask && last->event.wd == wd &&
+			last->event.cookie == cookie) {
+		const char *lastname = last->name;
+
+		if (!name && !lastname)
+			goto out;
+		if (name && lastname && !strcmp(lastname, name))
+			goto out;
+	}
+
+	/* the queue overflowed and we already sent the Q_OVERFLOW event */
+	if (unlikely(dev->event_count > dev->max_events))
+		goto out;
+
+	/* if the queue overflows, we need to notify user space */
+	if (unlikely(dev->event_count == dev->max_events))
+		kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+	else
+		kevent = kernel_event(wd, mask, cookie, name);
+
+	if (unlikely(!kevent))
+		goto out;
+
+	/* queue the event and wake up anyone waiting */
+	dev->event_count++;
+	dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+	list_add_tail(&kevent->list, &dev->events);
+	wake_up_interruptible(&dev->wq);
+	kill_fasync(&dev->fa, SIGIO, POLL_IN);
+
+out:
+	mutex_unlock(&dev->ev_mutex);
+}
+
+/*
+ * remove_kevent - cleans up the given kevent
+ *
+ * Caller must hold dev->ev_mutex.
+ */
+static void remove_kevent(struct inotify_device *dev,
+			  struct inotify_kernel_event *kevent)
+{
+	list_del(&kevent->list);
+
+	dev->event_count--;
+	dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+}
+
+/*
+ * free_kevent - frees the given kevent.
+ */
+static void free_kevent(struct inotify_kernel_event *kevent)
+{
+	kfree(kevent->name);
+	kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->ev_mutex.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+	if (!list_empty(&dev->events)) {
+		struct inotify_kernel_event *kevent;
+		kevent = inotify_dev_get_event(dev);
+		remove_kevent(dev, kevent);
+		free_kevent(kevent);
+	}
+}
+
+/*
+ * find_inode - resolve a user-given path to a specific inode
+ */
+static int find_inode(const char __user *dirname, struct path *path,
+		      unsigned flags)
+{
+	int error;
+
+	error = user_path_at(AT_FDCWD, dirname, flags, path);
+	if (error)
+		return error;
+	/* you can only watch an inode if you have read permissions on it */
+	error = inode_permission(path->dentry->d_inode, MAY_READ);
+	if (error)
+		path_put(path);
+	return error;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Callers must hold dev->up_mutex.
+ */
+static int create_watch(struct inotify_device *dev, struct inode *inode,
+			u32 mask)
+{
+	struct inotify_user_watch *watch;
+	int ret;
+
+	if (atomic_read(&dev->user->inotify_watches) >=
+			inotify_max_user_watches)
+		return -ENOSPC;
+
+	watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+	if (unlikely(!watch))
+		return -ENOMEM;
+
+	/* save a reference to device and bump the count to make it official */
+	get_inotify_dev(dev);
+	watch->dev = dev;
+
+	atomic_inc(&dev->user->inotify_watches);
+
+	inotify_init_watch(&watch->wdata);
+	ret = inotify_add_watch(dev->ih, &watch->wdata, inode, mask);
+	if (ret < 0)
+		free_inotify_user_watch(&watch->wdata);
+
+	return ret;
+}
+
+/* Device Interface */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+	struct inotify_device *dev = file->private_data;
+	int ret = 0;
+
+	poll_wait(file, &dev->wq, wait);
+	mutex_lock(&dev->ev_mutex);
+	if (!list_empty(&dev->events))
+		ret = POLLIN | POLLRDNORM;
+	mutex_unlock(&dev->ev_mutex);
+
+	return ret;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+			    size_t count, loff_t *pos)
+{
+	size_t event_size = sizeof (struct inotify_event);
+	struct inotify_device *dev;
+	char __user *start;
+	int ret;
+	DEFINE_WAIT(wait);
+
+	start = buf;
+	dev = file->private_data;
+
+	while (1) {
+
+		prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
+
+		mutex_lock(&dev->ev_mutex);
+		if (!list_empty(&dev->events)) {
+			ret = 0;
+			break;
+		}
+		mutex_unlock(&dev->ev_mutex);
+
+		if (file->f_flags & O_NONBLOCK) {
+			ret = -EAGAIN;
+			break;
+		}
+
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		schedule();
+	}
+
+	finish_wait(&dev->wq, &wait);
+	if (ret)
+		return ret;
+
+	while (1) {
+		struct inotify_kernel_event *kevent;
+
+		ret = buf - start;
+		if (list_empty(&dev->events))
+			break;
+
+		kevent = inotify_dev_get_event(dev);
+		if (event_size + kevent->event.len > count) {
+			if (ret == 0 && count > 0) {
+				/*
+				 * could not get a single event because we
+				 * didn't have enough buffer space.
+				 */
+				ret = -EINVAL;
+			}
+			break;
+		}
+		remove_kevent(dev, kevent);
+
+		/*
+		 * Must perform the copy_to_user outside the mutex in order
+		 * to avoid a lock order reversal with mmap_sem.
+		 */
+		mutex_unlock(&dev->ev_mutex);
+
+		if (copy_to_user(buf, &kevent->event, event_size)) {
+			ret = -EFAULT;
+			free_kevent(kevent);
+			mutex_lock(&dev->ev_mutex);
+			break;
+		}
+		buf += event_size;
+		count -= event_size;
+
+		if (kevent->name) {
+			if (copy_to_user(buf, kevent->name, kevent->event.len)) {
+				ret = -EFAULT;
+				free_kevent(kevent);
+				mutex_lock(&dev->ev_mutex);
+				break;
+			}
+			buf += kevent->event.len;
+			count -= kevent->event.len;
+		}
+
+		free_kevent(kevent);
+
+		mutex_lock(&dev->ev_mutex);
+	}
+	mutex_unlock(&dev->ev_mutex);
+
+	return ret;
+}
+
+static int inotify_fasync(int fd, struct file *file, int on)
+{
+	struct inotify_device *dev = file->private_data;
+
+	return fasync_helper(fd, file, on, &dev->fa) >= 0 ? 0 : -EIO;
+}
+
+static int inotify_release(struct inode *ignored, struct file *file)
+{
+	struct inotify_device *dev = file->private_data;
+
+	inotify_destroy(dev->ih);
+
+	/* destroy all of the events on this device */
+	mutex_lock(&dev->ev_mutex);
+	while (!list_empty(&dev->events))
+		inotify_dev_event_dequeue(dev);
+	mutex_unlock(&dev->ev_mutex);
+
+	/* free this device: the put matching the get in inotify_init() */
+	put_inotify_dev(dev);
+
+	return 0;
+}
+
+static long inotify_ioctl(struct file *file, unsigned int cmd,
+			  unsigned long arg)
+{
+	struct inotify_device *dev;
+	void __user *p;
+	int ret = -ENOTTY;
+
+	dev = file->private_data;
+	p = (void __user *) arg;
+
+	switch (cmd) {
+	case FIONREAD:
+		ret = put_user(dev->queue_size, (int __user *) p);
+		break;
+	}
+
+	return ret;
+}
+
+static const struct file_operations inotify_fops = {
+	.poll           = inotify_poll,
+	.read           = inotify_read,
+	.fasync         = inotify_fasync,
+	.release        = inotify_release,
+	.unlocked_ioctl = inotify_ioctl,
+	.compat_ioctl	= inotify_ioctl,
+};
+
+static const struct inotify_operations inotify_user_ops = {
+	.handle_event	= inotify_dev_queue_event,
+	.destroy_watch	= free_inotify_user_watch,
+};
+
+asmlinkage long sys_inotify_init1(int flags)
+{
+	struct inotify_device *dev;
+	struct inotify_handle *ih;
+	struct user_struct *user;
+	struct file *filp;
+	int fd, ret;
+
+	/* Check the IN_* constants for consistency.  */
+	BUILD_BUG_ON(IN_CLOEXEC != O_CLOEXEC);
+	BUILD_BUG_ON(IN_NONBLOCK != O_NONBLOCK);
+
+	if (flags & ~(IN_CLOEXEC | IN_NONBLOCK))
+		return -EINVAL;
+
+	fd = get_unused_fd_flags(flags & O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	filp = get_empty_filp();
+	if (!filp) {
+		ret = -ENFILE;
+		goto out_put_fd;
+	}
+
+	user = get_uid(current->user);
+	if (unlikely(atomic_read(&user->inotify_devs) >=
+			inotify_max_user_instances)) {
+		ret = -EMFILE;
+		goto out_free_uid;
+	}
+
+	dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+	if (unlikely(!dev)) {
+		ret = -ENOMEM;
+		goto out_free_uid;
+	}
+
+	ih = inotify_init(&inotify_user_ops);
+	if (IS_ERR(ih)) {
+		ret = PTR_ERR(ih);
+		goto out_free_dev;
+	}
+	dev->ih = ih;
+	dev->fa = NULL;
+
+	filp->f_op = &inotify_fops;
+	filp->f_path.mnt = mntget(inotify_mnt);
+	filp->f_path.dentry = dget(inotify_mnt->mnt_root);
+	filp->f_mapping = filp->f_path.dentry->d_inode->i_mapping;
+	filp->f_mode = FMODE_READ;
+	filp->f_flags = O_RDONLY | (flags & O_NONBLOCK);
+	filp->private_data = dev;
+
+	INIT_LIST_HEAD(&dev->events);
+	init_waitqueue_head(&dev->wq);
+	mutex_init(&dev->ev_mutex);
+	mutex_init(&dev->up_mutex);
+	dev->event_count = 0;
+	dev->queue_size = 0;
+	dev->max_events = inotify_max_queued_events;
+	dev->user = user;
+	atomic_set(&dev->count, 0);
+
+	get_inotify_dev(dev);
+	atomic_inc(&user->inotify_devs);
+	fd_install(fd, filp);
+
+	return fd;
+out_free_dev:
+	kfree(dev);
+out_free_uid:
+	free_uid(user);
+	put_filp(filp);
+out_put_fd:
+	put_unused_fd(fd);
+	return ret;
+}
+
+asmlinkage long sys_inotify_init(void)
+{
+	return sys_inotify_init1(0);
+}
+
+asmlinkage long sys_inotify_add_watch(int fd, const char __user *pathname, u32 mask)
+{
+	struct inode *inode;
+	struct inotify_device *dev;
+	struct path path;
+	struct file *filp;
+	int ret, fput_needed;
+	unsigned flags = 0;
+
+	filp = fget_light(fd, &fput_needed);
+	if (unlikely(!filp))
+		return -EBADF;
+
+	/* verify that this is indeed an inotify instance */
+	if (unlikely(filp->f_op != &inotify_fops)) {
+		ret = -EINVAL;
+		goto fput_and_out;
+	}
+
+	if (!(mask & IN_DONT_FOLLOW))
+		flags |= LOOKUP_FOLLOW;
+	if (mask & IN_ONLYDIR)
+		flags |= LOOKUP_DIRECTORY;
+
+	ret = find_inode(pathname, &path, flags);
+	if (unlikely(ret))
+		goto fput_and_out;
+
+	/* inode held in place by reference to path; dev by fget on fd */
+	inode = path.dentry->d_inode;
+	dev = filp->private_data;
+
+	mutex_lock(&dev->up_mutex);
+	ret = inotify_find_update_watch(dev->ih, inode, mask);
+	if (ret == -ENOENT)
+		ret = create_watch(dev, inode, mask);
+	mutex_unlock(&dev->up_mutex);
+
+	path_put(&path);
+fput_and_out:
+	fput_light(filp, fput_needed);
+	return ret;
+}
+
+asmlinkage long sys_inotify_rm_watch(int fd, u32 wd)
+{
+	struct file *filp;
+	struct inotify_device *dev;
+	int ret, fput_needed;
+
+	filp = fget_light(fd, &fput_needed);
+	if (unlikely(!filp))
+		return -EBADF;
+
+	/* verify that this is indeed an inotify instance */
+	if (unlikely(filp->f_op != &inotify_fops)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	dev = filp->private_data;
+
+	/* we free our watch data when we get IN_IGNORED */
+	ret = inotify_rm_wd(dev->ih, wd);
+
+out:
+	fput_light(filp, fput_needed);
+	return ret;
+}
+
+static int
+inotify_get_sb(struct file_system_type *fs_type, int flags,
+	       const char *dev_name, void *data, struct vfsmount *mnt)
+{
+	return get_sb_pseudo(fs_type, "inotify", NULL,
+			INOTIFYFS_SUPER_MAGIC, mnt);
+}
+
+static struct file_system_type inotify_fs_type = {
+    .name           = "inotifyfs",
+    .get_sb         = inotify_get_sb,
+    .kill_sb        = kill_anon_super,
+};
+
+/*
+ * inotify_user_setup - Our initialization function.  Note that we cannot return
+ * error because we have compiled-in VFS hooks.  So an (unlikely) failure here
+ * must result in panic().
+ */
+static int __init inotify_user_setup(void)
+{
+	int ret;
+
+	ret = register_filesystem(&inotify_fs_type);
+	if (unlikely(ret))
+		panic("inotify: register_filesystem returned %d!\n", ret);
+
+	inotify_mnt = kern_mount(&inotify_fs_type);
+	if (IS_ERR(inotify_mnt))
+		panic("inotify: kern_mount ret %ld!\n", PTR_ERR(inotify_mnt));
+
+	inotify_max_queued_events = 16384;
+	inotify_max_user_instances = 128;
+	inotify_max_user_watches = 8192;
+
+	watch_cachep = kmem_cache_create("inotify_watch_cache",
+					 sizeof(struct inotify_user_watch),
+					 0, SLAB_PANIC, NULL);
+	event_cachep = kmem_cache_create("inotify_event_cache",
+					 sizeof(struct inotify_kernel_event),
+					 0, SLAB_PANIC, NULL);
+
+	return 0;
+}
+
+module_init(inotify_user_setup);


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH -v3 2/8] fsnotify: pass a file instead of an inode to open, read, and write
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
  2008-11-25 17:20 ` [PATCH -v3 1/8] filesystem notification: create fs/notify to contain all fs notification Eric Paris
@ 2008-11-25 17:21 ` Eric Paris
  2008-11-25 17:21 ` [PATCH -v3 3/8] fsnotify: sys_execve and sys_uselib do not call into fsnotify Eric Paris
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

fanotify, the upcoming notification system, actually needs an f_path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process.  Close was already passing a file.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/compat.c              |    5 ++---
 fs/nfsd/vfs.c            |    4 ++--
 fs/open.c                |    2 +-
 fs/read_write.c          |    8 ++++----
 include/linux/fsnotify.h |    9 ++++++---
 5 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index e5f49f5..4a7788f 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1158,11 +1158,10 @@ out:
 	if (iov != iovstack)
 		kfree(iov);
 	if ((ret + (type == READ)) > 0) {
-		struct dentry *dentry = file->f_path.dentry;
 		if (type == READ)
-			fsnotify_access(dentry);
+			fsnotify_access(file);
 		else
-			fsnotify_modify(dentry);
+			fsnotify_modify(file);
 	}
 	return ret;
 }
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 4433c8f..f4bc1e6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -940,7 +940,7 @@ nfsd_vfs_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 		nfsdstats.io_read += host_err;
 		*count = host_err;
 		err = 0;
-		fsnotify_access(file->f_path.dentry);
+		fsnotify_access(file);
 	} else 
 		err = nfserrno(host_err);
 out:
@@ -1007,7 +1007,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	set_fs(oldfs);
 	if (host_err >= 0) {
 		nfsdstats.io_write += cnt;
-		fsnotify_modify(file->f_path.dentry);
+		fsnotify_modify(file);
 	}
 
 	/* clear setuid/setgid flag after write */
diff --git a/fs/open.c b/fs/open.c
index 83cdb9d..9d69dd9 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1020,7 +1020,7 @@ long do_sys_open(int dfd, const char __user *filename, int flags, int mode)
 				put_unused_fd(fd);
 				fd = PTR_ERR(f);
 			} else {
-				fsnotify_open(f->f_path.dentry);
+				fsnotify_open(f);
 				fd_install(fd, f);
 			}
 		}
diff --git a/fs/read_write.c b/fs/read_write.c
index 969a6d9..7eb2949 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -280,7 +280,7 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 		else
 			ret = do_sync_read(file, buf, count, pos);
 		if (ret > 0) {
-			fsnotify_access(file->f_path.dentry);
+			fsnotify_access(file);
 			add_rchar(current, ret);
 		}
 		inc_syscr(current);
@@ -335,7 +335,7 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 		else
 			ret = do_sync_write(file, buf, count, pos);
 		if (ret > 0) {
-			fsnotify_modify(file->f_path.dentry);
+			fsnotify_modify(file);
 			add_wchar(current, ret);
 		}
 		inc_syscw(current);
@@ -626,9 +626,9 @@ out:
 		kfree(iov);
 	if ((ret + (type == READ)) > 0) {
 		if (type == READ)
-			fsnotify_access(file->f_path.dentry);
+			fsnotify_access(file);
 		else
-			fsnotify_modify(file->f_path.dentry);
+			fsnotify_modify(file);
 	}
 	return ret;
 }
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 00fbd5b..dec1afb 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -136,8 +136,9 @@ static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 /*
  * fsnotify_access - file was read
  */
-static inline void fsnotify_access(struct dentry *dentry)
+static inline void fsnotify_access(struct file *file)
 {
+	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 	u32 mask = IN_ACCESS;
 
@@ -152,8 +153,9 @@ static inline void fsnotify_access(struct dentry *dentry)
 /*
  * fsnotify_modify - file was modified
  */
-static inline void fsnotify_modify(struct dentry *dentry)
+static inline void fsnotify_modify(struct file *file)
 {
+	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 	u32 mask = IN_MODIFY;
 
@@ -168,8 +170,9 @@ static inline void fsnotify_modify(struct dentry *dentry)
 /*
  * fsnotify_open - file was opened
  */
-static inline void fsnotify_open(struct dentry *dentry)
+static inline void fsnotify_open(struct file *file)
 {
+	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 	u32 mask = IN_OPEN;
 



* [PATCH -v3 3/8] fsnotify: sys_execve and sys_uselib do not call into fsnotify
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
  2008-11-25 17:20 ` [PATCH -v3 1/8] filesystem notification: create fs/notify to contain all fs notification Eric Paris
  2008-11-25 17:21 ` [PATCH -v3 2/8] fsnotify: pass a file instead of an inode to open, read, and write Eric Paris
@ 2008-11-25 17:21 ` Eric Paris
  2008-11-28 10:16   ` Christoph Hellwig
  2008-11-25 17:21 ` [PATCH -v3 4/8] fsnotify: use the new open-exec hook for inotify and dnotify Eric Paris
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

sys_execve and sys_uselib do not call into fsnotify, so inotify, dnotify,
and (importantly to me) fanotify do not see opens on things which are going
to be executed.  Create a generic fsnotify hook for these paths.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/exec.c                |    5 +++++
 include/linux/fsnotify.h |    7 +++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 4e834f1..8f56995 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -51,6 +51,7 @@
 #include <linux/audit.h>
 #include <linux/tracehook.h>
 #include <linux/kmod.h>
+#include <linux/fsnotify.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -135,6 +136,8 @@ asmlinkage long sys_uselib(const char __user * library)
 	if (IS_ERR(file))
 		goto out;
 
+	fsnotify_open_exec(file);
+
 	error = -ENOEXEC;
 	if(file->f_op) {
 		struct linux_binfmt * fmt;
@@ -687,6 +690,8 @@ struct file *open_exec(const char *name)
 	if (IS_ERR(file))
 		return file;
 
+	fsnotify_open_exec(file);
+
 	err = deny_write_access(file);
 	if (err) {
 		fput(file);
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index dec1afb..ffe787f 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -168,6 +168,13 @@ static inline void fsnotify_modify(struct file *file)
 }
 
 /*
+ * fsnotify_open_exec - file was opened by execve or uselib
+ */
+static inline void fsnotify_open_exec(struct file *file)
+{
+}
+
+/*
  * fsnotify_open - file was opened
  */
 static inline void fsnotify_open(struct file *file)



* [PATCH -v3 4/8] fsnotify: use the new open-exec hook for inotify and dnotify
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
                   ` (2 preceding siblings ...)
  2008-11-25 17:21 ` [PATCH -v3 3/8] fsnotify: sys_execve and sys_uselib do not call into fsnotify Eric Paris
@ 2008-11-25 17:21 ` Eric Paris
  2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

inotify and dnotify did not get access events when their children were
accessed for shlib or exec purposes.  Trigger on those events as well.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 include/linux/fsnotify.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index ffe787f..6fbf455 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -172,6 +172,12 @@ static inline void fsnotify_modify(struct file *file)
  */
 static inline void fsnotify_open_exec(struct file *file)
 {
+	struct dentry *dentry = file->f_path.dentry;
+	struct inode *inode = dentry->d_inode;
+
+	dnotify_parent(dentry, DN_ACCESS);
+	inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0, dentry->d_name.name);
+	inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL, NULL);
 }
 
 /*



* [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
                   ` (3 preceding siblings ...)
  2008-11-25 17:21 ` [PATCH -v3 4/8] fsnotify: use the new open-exec hook for inotify and dnotify Eric Paris
@ 2008-11-25 17:21 ` Eric Paris
  2008-11-27 16:14   ` Peter Zijlstra
                     ` (4 more replies)
  2008-11-25 17:21 ` [PATCH -v3 6/8] fsnotify: add group priorities Eric Paris
                   ` (3 subsequent siblings)
  8 siblings, 5 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

fsnotify is a backend for filesystem notification.  fsnotify does
not provide any userspace interface but does provide the basis
needed for other notification schemes such as dnotify.  fsnotify
can be extended to be the backend for inotify or the upcoming
fanotify.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/Kconfig                |   12 ++
 fs/notify/Makefile               |    2 
 fs/notify/fsnotify.c             |   78 ++++++++++++++++
 fs/notify/fsnotify.h             |   69 ++++++++++++++
 fs/notify/group.c                |  124 +++++++++++++++++++++++++
 fs/notify/notification.c         |  188 ++++++++++++++++++++++++++++++++++++++
 include/linux/fsnotify_backend.h |   80 ++++++++++++++++
 7 files changed, 553 insertions(+), 0 deletions(-)
 create mode 100644 fs/notify/fsnotify.c
 create mode 100644 fs/notify/fsnotify.h
 create mode 100644 fs/notify/group.c
 create mode 100644 fs/notify/notification.c
 create mode 100644 include/linux/fsnotify_backend.h

diff --git a/fs/notify/Kconfig b/fs/notify/Kconfig
index 50914d7..269b59a 100644
--- a/fs/notify/Kconfig
+++ b/fs/notify/Kconfig
@@ -1,2 +1,14 @@
+config FSNOTIFY
+        bool "Filesystem notification backend"
+        default y
+        ---help---
+           fsnotify is a backend for filesystem notification.  fsnotify does
+           not provide any userspace interface but does provide the basis
+           needed for other notification schemes such as dnotify and inotify.
+
+           Say Y here to enable fsnotify support.
+
+           If unsure, say Y.
+
 source "fs/notify/dnotify/Kconfig"
 source "fs/notify/inotify/Kconfig"
diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index 5a95b60..7cb285a 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,2 +1,4 @@
 obj-y			+= dnotify/
 obj-y			+= inotify/
+
+obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
new file mode 100644
index 0000000..3c4262b
--- /dev/null
+++ b/fs/notify/fsnotify.c
@@ -0,0 +1,78 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/srcu.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+void fsnotify(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask)
+{
+	struct fsnotify_group *group;
+	struct fsnotify_event *event = NULL;
+	int idx;
+
+	if (likely(list_empty(&fsnotify_groups)))
+		return;
+
+	if (!(mask & fsnotify_mask))
+		return;
+
+	/*
+	 * SRCU!!  The groups list is very much read-only and this path is
+	 * very hot (assuming something is using fsnotify).  Blocking while
+	 * walking this list would be ugly.  We could preallocate an event and
+	 * an event holder for every group the event might need to be put on,
+	 * but all that possibly wasted allocation is nuts.  For all we know
+	 * there are already mark entries, groups may not need this event, and
+	 * there are all sorts of reasons to believe not every kernel action is
+	 * going to be sent to userspace.  SRCU keeps this path from blocking;
+	 * taking a mutex here would needlessly serialize
+	 * read/write/open/close across the whole system....
+	 */
+	idx = srcu_read_lock(&fsnotify_grp_srcu_struct);
+	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
+		if (mask & group->mask) {
+			if (!event) {
+				event = fsnotify_create_event(file, dentry, inode, mask);
+				/* allocation failed and there is no way to report it; the event is silently dropped */
+				if (!event)
+					break;
+			}
+			group->ops->event_to_notif(group, event);
+		}
+	}
+	srcu_read_unlock(&fsnotify_grp_srcu_struct, idx);
+	/*
+	 * fsnotify_create_event() took a reference so the event can't be cleaned
+	 * up while we are still trying to add it to lists, drop that one.
+	 */
+	if (event)
+		fsnotify_put_event(event);
+}
+EXPORT_SYMBOL_GPL(fsnotify);
+
+static __init int fsnotify_init(void)
+{
+	return init_srcu_struct(&fsnotify_grp_srcu_struct);
+}
+subsys_initcall(fsnotify_init);
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
new file mode 100644
index 0000000..007bc28
--- /dev/null
+++ b/fs/notify/fsnotify.h
@@ -0,0 +1,69 @@
+#ifndef _LINUX_FSNOTIFY_PRIVATE_H
+#define _LINUX_FSNOTIFY_PRIVATE_H
+
+#include <linux/dcache.h>
+#include <linux/list.h>
+#include <linux/fs.h>
+#include <linux/path.h>
+#include <linux/spinlock.h>
+
+#include <linux/fsnotify.h>
+
+#include <asm/atomic.h>
+/*
+ * A single event can be queued in multiple group->notification_lists.
+ *
+ * Each group->notification_list entry points to an event_holder, which in
+ * turn points to the actual event that needs to be sent to userspace.
+ *
+ * It seemed cheaper to create one refcounted event and a small holder for
+ * every group than to create a separate event for every group.
+ *
+ */
+struct fsnotify_event_holder {
+	struct fsnotify_event *event;
+	struct list_head event_list;
+};
+
+/*
+ * all of the information about the original object we want to now send to
+ * a scanner.  If you want to carry more info from the accessing task to the
+ * listener this structure is where you need to be adding fields.
+ */
+struct fsnotify_event {
+	/*
+	 * If we create an event we are also going to need to create a holder
+	 * to link to a group.  So embed one holder in the event.  Means only
+	 * one allocation for the common case where we only have one group
+	 */
+	struct fsnotify_event_holder holder;
+	spinlock_t holder_spinlock; /* protection for the associated event_holder */
+	/*
+	 * depending on the event type we have either a path, a dentry, or an
+	 * inode; we should never have more than one....
+	 */
+	union {
+		struct path path;
+		struct dentry *dentry;
+		struct inode *inode;
+	};
+#define FSNOTIFY_EVENT_PATH	1
+#define FSNOTIFY_EVENT_DENTRY	2
+#define FSNOTIFY_EVENT_INODE	3
+	int flag;		/* which of the above we have */
+	unsigned long mask;	/* the type of access */
+	atomic_t refcnt;	/* how many groups still are using/need to send this event */
+};
+
+extern struct srcu_struct fsnotify_grp_srcu_struct;
+extern struct list_head fsnotify_groups;
+extern unsigned long fsnotify_mask;
+
+extern int fsnotify_check_notif_queue(struct fsnotify_group *group);
+extern void fsnotify_clear_notif(struct fsnotify_group *group);
+extern void fsnotify_get_event(struct fsnotify_event *event);
+extern void fsnotify_put_event(struct fsnotify_event *event);
+extern struct fsnotify_event *fsnotify_create_event(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask);
+extern struct fsnotify_event_holder *fsnotify_alloc_event_holder(void);
+extern void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder);
+#endif	/* _LINUX_FSNOTIFY_PRIVATE_H */
diff --git a/fs/notify/group.c b/fs/notify/group.c
new file mode 100644
index 0000000..dcc0547
--- /dev/null
+++ b/fs/notify/group.c
@@ -0,0 +1,124 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/srcu.h>
+#include <linux/rculist.h>
+#include <linux/wait.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+#include <asm/atomic.h>
+
+DEFINE_MUTEX(fsnotify_grp_mutex);
+struct srcu_struct fsnotify_grp_srcu_struct;
+LIST_HEAD(fsnotify_groups);
+unsigned long fsnotify_mask;
+
+void fsnotify_recalc_global_mask(void)
+{
+	struct fsnotify_group *group;
+	unsigned long mask = 0;
+	int idx;
+
+	idx = srcu_read_lock(&fsnotify_grp_srcu_struct);
+	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
+		mask |= group->mask;
+	}
+	srcu_read_unlock(&fsnotify_grp_srcu_struct, idx);
+	fsnotify_mask = mask;
+}
+
+void fsnotify_get_group(struct fsnotify_group *group)
+{
+	atomic_inc(&group->refcnt);
+}
+
+void fsnotify_kill_group(struct fsnotify_group *group)
+{
+	/* clear the notification queue of all events */
+	fsnotify_clear_notif(group);
+
+	kfree(group);
+}
+
+void fsnotify_put_group(struct fsnotify_group *group)
+{
+	mutex_lock(&fsnotify_grp_mutex);
+	if (atomic_dec_and_test(&group->refcnt)) {
+		list_del_rcu(&group->group_list);
+		mutex_unlock(&fsnotify_grp_mutex);
+
+		synchronize_srcu(&fsnotify_grp_srcu_struct);
+
+		fsnotify_recalc_global_mask();
+		fsnotify_kill_group(group);
+
+		return;
+	}
+	mutex_unlock(&fsnotify_grp_mutex);
+
+	return;
+}
+
+struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long mask, struct fsnotify_ops *ops)
+{
+	struct fsnotify_group *group_iter;
+	struct fsnotify_group *group = NULL;
+
+	mutex_lock(&fsnotify_grp_mutex);
+	list_for_each_entry_rcu(group_iter, &fsnotify_groups, group_list) {
+		if (group_iter->group_num == group_num) {
+			if ((group_iter->mask == mask) &&
+			    (group_iter->ops == ops)) {
+				fsnotify_get_group(group_iter);
+				group = group_iter;
+			} else
+				group = ERR_PTR(-EEXIST);
+			goto out;
+		}
+	}
+
+	group = kmalloc(sizeof(struct fsnotify_group), GFP_KERNEL);
+	if (!group) {
+		group = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	atomic_set(&group->refcnt, 1);
+
+	group->group_num = group_num;
+	group->mask = mask;
+
+	mutex_init(&group->notification_mutex);
+	INIT_LIST_HEAD(&group->notification_list);
+	init_waitqueue_head(&group->notification_waitq);
+
+	group->ops = ops;
+
+	/* add it */
+	list_add_rcu(&group->group_list, &fsnotify_groups);
+
+out:
+	mutex_unlock(&fsnotify_grp_mutex);
+	fsnotify_recalc_global_mask();
+	return group;
+}
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
new file mode 100644
index 0000000..2467b5b
--- /dev/null
+++ b/fs/notify/notification.c
@@ -0,0 +1,188 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mount.h>
+#include <linux/mutex.h>
+#include <linux/namei.h>
+#include <linux/path.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <asm/atomic.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+static struct kmem_cache *event_kmem_cache;
+static struct kmem_cache *event_holder_kmem_cache;
+
+int fsnotify_check_notif_queue(struct fsnotify_group *group)
+{
+	mutex_lock(&group->notification_mutex);
+	if (!list_empty(&group->notification_list))
+		return 1;
+	mutex_unlock(&group->notification_mutex);
+	return 0;
+}
+
+void fsnotify_get_event(struct fsnotify_event *event)
+{
+	atomic_inc(&event->refcnt);
+}
+
+void fsnotify_put_event(struct fsnotify_event *event)
+{
+	if (!event)
+		return;
+
+	if (atomic_dec_and_test(&event->refcnt)) {
+		switch (event->flag) {
+		case FSNOTIFY_EVENT_PATH:
+			path_put(&event->path);
+			event->path.dentry = NULL;
+			event->path.mnt = NULL;
+			break;
+		case FSNOTIFY_EVENT_INODE:
+			iput(event->inode);
+			event->inode = NULL;
+			break;
+		case FSNOTIFY_EVENT_DENTRY:
+			dput(event->dentry);
+			event->dentry = NULL;
+			break;
+		default:
+			BUG();
+		};
+
+		event->mask = 0;
+		kmem_cache_free(event_kmem_cache, event);
+	}
+}
+
+struct fsnotify_event_holder *fsnotify_alloc_event_holder(void)
+{
+	return kmem_cache_alloc(event_holder_kmem_cache, GFP_KERNEL);
+}
+
+void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder)
+{
+	kmem_cache_free(event_holder_kmem_cache, holder);
+}
+
+/*
+ * must be called with group->notification_mutex held and must know an event
+ * is present.  It is the responsibility of the caller to call
+ * fsnotify_put_event() on the returned structure.
+ */
+struct fsnotify_event *get_event_from_notif(struct fsnotify_group *group)
+{
+	struct fsnotify_event *event;
+	struct fsnotify_event_holder *holder;
+
+	holder = list_first_entry(&group->notification_list, struct fsnotify_event_holder, event_list);
+
+	event = holder->event;
+
+	spin_lock(&event->holder_spinlock);
+	holder->event = NULL;
+	list_del_init(&holder->event_list);
+	spin_unlock(&event->holder_spinlock);
+
+	/* event == holder means the event is referenced through its embedded holder */
+	if (event != (struct fsnotify_event *)holder)
+		fsnotify_destroy_event_holder(holder);
+
+	return event;
+}
+
+void fsnotify_clear_notif(struct fsnotify_group *group)
+{
+	struct fsnotify_event *event;
+
+	while (fsnotify_check_notif_queue(group)) {
+		event = get_event_from_notif(group);
+		fsnotify_put_event(event);
+		/* fsnotify_check_notif_queue() took this lock */
+		mutex_unlock(&group->notification_mutex);
+	}
+}
+
+struct fsnotify_event *fsnotify_create_event(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask)
+{
+	struct fsnotify_event *event;
+
+	event = kmem_cache_alloc(event_kmem_cache, GFP_KERNEL);
+	if (!event)
+		return NULL;
+
+	event->holder.event = NULL;
+	INIT_LIST_HEAD(&event->holder.event_list);
+	atomic_set(&event->refcnt, 1);
+
+	spin_lock_init(&event->holder_spinlock);
+
+	event->path.dentry = NULL;
+	event->path.mnt = NULL;
+	event->dentry = NULL;
+	event->inode = NULL;
+
+	if (file) {
+		event->path.dentry = file->f_path.dentry;
+		event->path.mnt = file->f_path.mnt;
+		path_get(&event->path);
+		event->flag = FSNOTIFY_EVENT_PATH;
+	} else if (dentry) {
+		event->dentry = dget(dentry);
+		event->flag = FSNOTIFY_EVENT_DENTRY;
+	} else if (inode) {
+		event->inode = igrab(inode);
+		event->flag = FSNOTIFY_EVENT_INODE;
+	}
+
+#if 1
+	/* sanity check: exactly one of file, dentry, and inode must be set */
+	do {
+		int i = 0;
+		if (file)
+			i++;
+		if (dentry)
+			i++;
+		if (inode)
+			i++;
+		WARN_ON(i != 1);
+	} while (0);
+#endif
+
+	event->mask = mask;
+
+	return event;
+}
+
+__init int fsnotify_notification_init(void)
+{
+	event_kmem_cache = kmem_cache_create("fsnotify_event", sizeof(struct fsnotify_event), 0, SLAB_PANIC, NULL);
+	event_holder_kmem_cache = kmem_cache_create("fsnotify_event_holder", sizeof(struct fsnotify_event_holder), 0, SLAB_PANIC, NULL);
+
+	return 0;
+}
+subsys_initcall(fsnotify_notification_init);
+
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
new file mode 100644
index 0000000..6a7b95c
--- /dev/null
+++ b/include/linux/fsnotify_backend.h
@@ -0,0 +1,80 @@
+/*
+ * Filesystem access notification for Linux
+ *
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ */
+
+#ifndef _LINUX_FSNOTIFY_BACKEND_H
+#define _LINUX_FSNOTIFY_BACKEND_H
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/wait.h>
+
+#include <asm/atomic.h>
+
+#define FS_ACCESS		0x00000001	/* file was accessed */
+#define FS_ACCESS_CHILD		0x00000002	/* child was accessed */
+#define FS_MODIFY		0x00000004	/* file was modified */
+#define FS_MODIFY_CHILD		0x00000008	/* child was modified */
+#define FS_DELETE		0x00000010	/* deleted */
+#define FS_DELETE_CHILD		0x00000020	/* child was deleted */
+#define FS_ATTRIB		0x00000040	/* attributes were changed */
+#define FS_ATTRIB_CHILD		0x00000080	/* child attributes changed */
+#define FS_CLOSE_NOWRITE	0x00000100	/* Unwritable file closed */
+#define FS_CLOSE_WRITE		0x00000200	/* Writable file closed */
+#define FS_OPEN			0x00000400	/* File was opened */
+#define FS_CREATE		0x00000800	/* new file created */
+#define FS_RENAME		0x00001000	/* file renamed */
+
+/* FIXME: currently queues have no limit.... */
+#define FS_Q_OVERFLOW		0x80000000	/* Event queue overflowed */
+#define FS_DN_MULTISHOT		0x40000000	/* dnotify multishot */
+
+/* helper events */
+#define FS_CLOSE		(FS_CLOSE_WRITE | FS_CLOSE_NOWRITE) /* close */
+
+struct fsnotify_group;
+struct fsnotify_event;
+
+struct fsnotify_ops {
+	int (*event_to_notif)(struct fsnotify_group *group, struct fsnotify_event *event);
+};
+
+struct fsnotify_group {
+	struct list_head group_list;	/* list of all groups on the system */
+	unsigned int group_num;		/* the 'name' of the group */
+	unsigned long mask;		/* mask of events this group cares about */
+	atomic_t refcnt;		/* num of processes with a special file open */
+
+	struct fsnotify_ops *ops;	/* callbacks this group uses to handle events */
+
+	/* needed to send notification to userspace */
+	struct mutex notification_mutex;/* protect the notification_list */
+	struct list_head notification_list;	/* list of event_holder this group needs to send to userspace */
+	wait_queue_head_t notification_waitq;	/* read() on the notification file blocks on this waitq */
+};
+
+#ifdef CONFIG_FSNOTIFY
+
+/* called from the vfs to signal fs events */
+extern void fsnotify(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask);
+
+/* called from fsnotify interfaces, such as fanotify or dnotify */
+extern void fsnotify_recalc_global_mask(void);
+extern void fsnotify_get_group(struct fsnotify_group *group);
+extern struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long mask, struct fsnotify_ops *ops);
+extern void fsnotify_put_group(struct fsnotify_group *group);
+#else
+
+static inline void fsnotify(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask)
+{}
+#endif	/* CONFIG_FSNOTIFY */
+
+#endif	/* __KERNEL__ */
+
+#endif	/* _LINUX_FSNOTIFY_BACKEND_H */



* [PATCH -v3 6/8] fsnotify: add group priorities
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
                   ` (4 preceding siblings ...)
  2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
@ 2008-11-25 17:21 ` Eric Paris
  2008-11-27 16:25   ` Peter Zijlstra
  2008-11-25 17:21 ` [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings Eric Paris
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

In preparation for blocking fsnotify calls, group priorities must be added.
When multiple groups request the same event type, the group with the lowest
priority value will receive the notification first.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/group.c                |   28 ++++++++++++++++++++++++----
 include/linux/fsnotify_backend.h |    4 +++-
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index dcc0547..bb8d6c6 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -79,15 +79,17 @@ void fsnotify_put_group(struct fsnotify_group *group)
 	return;
 }
 
-struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long mask, struct fsnotify_ops *ops)
+struct fsnotify_group *fsnotify_find_group(unsigned int priority, unsigned int group_num,
+					   unsigned long mask, struct fsnotify_ops *ops)
 {
 	struct fsnotify_group *group_iter;
 	struct fsnotify_group *group = NULL;
 
 	mutex_lock(&fsnotify_grp_mutex);
 	list_for_each_entry_rcu(group_iter, &fsnotify_groups, group_list) {
-		if (group_iter->group_num == group_num) {
+		if (group_iter->priority == priority) {
 			if ((group_iter->mask == mask) &&
+			    (group_iter->group_num == group_num) &&
 			    (group_iter->ops == ops)) {
 				fsnotify_get_group(group_iter);
 				group = group_iter;
@@ -105,6 +107,7 @@ struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long
 
 	atomic_set(&group->refcnt, 1);
 
+	group->priority = priority;
 	group->group_num = group_num;
 	group->mask = mask;
 
@@ -114,9 +117,26 @@ struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long
 
 	group->ops = ops;
 
-	/* add it */
-	list_add_rcu(&group->group_list, &fsnotify_groups);
+	/* Do we need to be the first entry? */
+	if (list_empty(&fsnotify_groups)) {
+		list_add_rcu(&group->group_list, &fsnotify_groups);
+		goto out;
+	}
+
+	list_for_each_entry(group_iter, &fsnotify_groups, group_list) {
+		/* insert in front of this one? */
+		if (priority < group_iter->priority) {
+			/* list_add_tail() on group_iter's node inserts in front of group_iter */
+			list_add_tail_rcu(&group->group_list, &group_iter->group_list);
+			break;
+		}
 
+		/* are we at the end?  if so insert at end */
+		if (list_is_last(&group_iter->group_list, &fsnotify_groups)) {
+			list_add_tail_rcu(&group->group_list, &fsnotify_groups);
+			break;
+		}
+	}
 out:
 	mutex_unlock(&fsnotify_grp_mutex);
 	fsnotify_recalc_global_mask();
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 6a7b95c..e0b5528 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -57,6 +57,8 @@ struct fsnotify_group {
 	struct mutex notification_mutex;/* protect the notification_list */
 	struct list_head notification_list;	/* list of event_holder this group needs to send to userspace */
 	wait_queue_head_t notification_waitq;	/* read() on the notification file blocks on this waitq */
+
+	unsigned int priority;		/* order this group receives events; lowest first */
 };
 
 #ifdef CONFIG_FSNOTIFY
@@ -67,7 +69,7 @@ extern void fsnotify(struct file *file, struct dentry *dentry, struct inode *ino
 /* called from fsnotify interfaces, such as fanotify or dnotify */
 extern void fsnotify_recalc_global_mask(void);
 extern void fsnotify_get_group(struct fsnotify_group *group);
-extern struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long mask, struct fsnotify_ops *ops);
+extern struct fsnotify_group *fsnotify_find_group(unsigned int priority, unsigned int group_num, unsigned long mask, struct fsnotify_ops *ops);
 extern void fsnotify_put_group(struct fsnotify_group *group);
 #else
 



* [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
                   ` (5 preceding siblings ...)
  2008-11-25 17:21 ` [PATCH -v3 6/8] fsnotify: add group priorities Eric Paris
@ 2008-11-25 17:21 ` Eric Paris
  2008-11-27 16:29   ` Peter Zijlstra
  2008-11-28  5:42   ` Al Viro
  2008-11-25 17:21 ` [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify Eric Paris
  2008-11-26  0:14 ` [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Andrew Morton
  8 siblings, 2 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

This patch creates in-inode fsnotify markings.  dnotify will use in-inode
markings to mark the inodes for which it wishes to receive events.  fanotify
will use them to mark the inodes for which it does not wish to receive events.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/inode.c                       |    7 +
 fs/notify/Makefile               |    2 
 fs/notify/fsnotify.c             |   26 ++++
 fs/notify/fsnotify.h             |   30 +++++
 fs/notify/group.c                |    6 +
 fs/notify/inode_mark.c           |  226 ++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h               |    6 +
 include/linux/fsnotify.h         |    8 +
 include/linux/fsnotify_backend.h |   21 ++++
 9 files changed, 331 insertions(+), 1 deletions(-)
 create mode 100644 fs/notify/inode_mark.c

diff --git a/fs/inode.c b/fs/inode.c
index 0487ddb..d17cd05 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@
 #include <linux/cdev.h>
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
+#include <linux/fsnotify.h>
 #include <linux/mount.h>
 
 /*
@@ -183,6 +184,11 @@ static struct inode *alloc_inode(struct super_block *sb)
 		}
 		inode->i_private = NULL;
 		inode->i_mapping = mapping;
+#ifdef CONFIG_FSNOTIFY
+		inode->i_fsnotify_mask = 0;
+		INIT_LIST_HEAD(&inode->i_fsnotify_mark_entries);
+		spin_lock_init(&inode->i_fsnotify_lock);
+#endif
 	}
 	return inode;
 }
@@ -191,6 +197,7 @@ void destroy_inode(struct inode *inode)
 {
 	BUG_ON(inode_has_buffers(inode));
 	security_inode_free(inode);
+	fsnotify_inode_delete(inode);
 	if (inode->i_sb->s_op->destroy_inode)
 		inode->i_sb->s_op->destroy_inode(inode);
 	else
diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index 7cb285a..47b60f3 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,4 +1,4 @@
 obj-y			+= dnotify/
 obj-y			+= inotify/
 
-obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o
+obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o inode_mark.o
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 3c4262b..1d43bf4 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -25,18 +25,42 @@
 #include <linux/fsnotify_backend.h>
 #include "fsnotify.h"
 
+void __fsnotify_inode_delete(struct inode *inode)
+{
+	if (likely(list_empty(&fsnotify_groups)))
+		return;
+
+	fsnotify_clear_mark_inode(inode, 0, FSNOTIFY_FORCE_CLEAR_MARK);
+}
+EXPORT_SYMBOL_GPL(__fsnotify_inode_delete);
+
 void fsnotify(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask)
 {
 	struct fsnotify_group *group;
 	struct fsnotify_event *event = NULL;
+	struct inode *cinode;
 	int idx;
 
 	if (likely(list_empty(&fsnotify_groups)))
 		return;
 
+	if (file)
+		cinode = file->f_path.dentry->d_inode;
+	else if (dentry)
+		cinode = dentry->d_inode;
+	else if (inode)
+		cinode = inode;
+	else
+		BUG();
+
+	if (mask & FS_MODIFY)
+		fsnotify_clear_mark_inode(cinode, mask, 0);
+
 	if (!(mask & fsnotify_mask))
 		return;
 
+	if (!(mask & cinode->i_fsnotify_mask))
+		return;
 	/*
 	 * SRCU!!  The groups list is very much read-only and this path is
 	 * very hot (assuming something is using fsnotify).  Blocking while
@@ -52,6 +76,8 @@ void fsnotify(struct file *file, struct dentry *dentry, struct inode *inode, uns
 	idx = srcu_read_lock(&fsnotify_grp_srcu_struct);
 	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
 		if (mask & group->mask) {
+			if (!group->ops->should_send_event(group, cinode, mask))
+				continue;
 			if (!event) {
 				event = fsnotify_create_event(file, dentry, inode, mask);
 				/* allocation failed and there is no way to report it; the event is silently dropped */
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 007bc28..e5e21a5 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -55,6 +55,29 @@ struct fsnotify_event {
 	atomic_t refcnt;	/* how many groups still are using/need to send this event */
 };
 
+/*
+ * A mark is simply an entry attached to an in-core inode which allows an
+ * fsnotify listener to indicate that it is either no longer interested in
+ * events of a type matching the mask, or only interested in those events.
+ *
+ * Marks are flushed when an inode is evicted from core and may be flushed
+ * when the inode is modified (as seen by fsnotify_access).  Some fsnotify
+ * users (such as dnotify) will flush these when the open fd is closed,
+ * not at inode eviction or modification.
+ */
+struct fsnotify_mark_entry {
+	struct fsnotify_group *group;	/* group this mark entry is for */
+	unsigned long mask;		/* mask this mark entry is for */
+	struct inode *inode;		/* inode this entry is associated with */
+	void *private;			/* private data for the listener */
+	spinlock_t lock;		/* protect refcnt and killme */
+	int refcnt;			/* active things looking at this mark */
+	/* indication one of the users wants this object dead.  Kill will happen when refcnt hits 0 */
+	int killme;
+	struct list_head i_list;	/* list of mark_entries hanging off inode->i_fsnotify_mark_entries */
+	struct list_head g_list;	/* list of mark_entries hanging off group->mark_entries */
+};
+
 extern struct srcu_struct fsnotify_grp_srcu_struct;
 extern struct list_head fsnotify_groups;
 extern unsigned long fsnotify_mask;
@@ -66,4 +89,11 @@ extern void fsnotify_put_event(struct fsnotify_event *event);
 extern struct fsnotify_event *fsnotify_create_event(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask);
 extern struct fsnotify_event_holder *fsnotify_alloc_event_holder(void);
 extern void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder);
+
+#define FSNOTIFY_FORCE_CLEAR_MARK	0x01
+extern void fsnotify_mark_get(struct fsnotify_mark_entry *entry);
+extern void fsnotify_mark_put(struct fsnotify_mark_entry *entry);
+extern void fsnotify_clear_mark_group(struct fsnotify_group *group);
+extern void fsnotify_kill_mark_inode(struct fsnotify_mark_entry *entry);
+extern void fsnotify_clear_mark_inode(struct inode *inode, unsigned long mask, unsigned int flags);
 #endif	/* _LINUX_FSNOTIFY_PRIVATE_H */
diff --git a/fs/notify/group.c b/fs/notify/group.c
index bb8d6c6..5e6f974 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -57,6 +57,9 @@ void fsnotify_kill_group(struct fsnotify_group *group)
 	/* clear the notification queue of all events */
 	fsnotify_clear_notif(group);
 
+	/* clear all inode mark entries for this group */
+	fsnotify_clear_mark_group(group);
+
 	kfree(group);
 }
 
@@ -115,6 +118,9 @@ struct fsnotify_group *fsnotify_find_group(unsigned int priority, unsigned int g
 	INIT_LIST_HEAD(&group->notification_list);
 	init_waitqueue_head(&group->notification_waitq);
 
+	mutex_init(&group->mark_mutex);
+	INIT_LIST_HEAD(&group->mark_entries);
+
 	group->ops = ops;
 
 	/* Do we need to be the first entry? */
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
new file mode 100644
index 0000000..7d95eeb
--- /dev/null
+++ b/fs/notify/inode_mark.c
@@ -0,0 +1,226 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <asm/atomic.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+static struct kmem_cache *fsnotify_mark_kmem_cache;
+
+static void fsnotify_mark_kill(struct fsnotify_mark_entry *entry)
+{
+	entry->group = NULL;
+	entry->inode = NULL;
+	entry->mask = 0;
+	entry->private = NULL;
+	INIT_LIST_HEAD(&entry->i_list);
+	INIT_LIST_HEAD(&entry->g_list);
+	kmem_cache_free(fsnotify_mark_kmem_cache, entry);
+}
+
+static struct fsnotify_mark_entry *fsnotify_mark_alloc(void)
+{
+	struct fsnotify_mark_entry *entry;
+
+	entry = kmem_cache_alloc(fsnotify_mark_kmem_cache, GFP_KERNEL);
+
+	return entry;
+}
+
+void fsnotify_mark_get(struct fsnotify_mark_entry *entry)
+{
+	spin_lock(&entry->lock);
+	entry->refcnt++;
+	spin_unlock(&entry->lock);
+}
+
+void fsnotify_mark_put(struct fsnotify_mark_entry *entry)
+{
+	spin_lock(&entry->lock);
+	entry->refcnt--;
+	/* if (!refcnt && killme) we are off both lists and nothing else can find us. */
+	if ((!entry->refcnt) && (entry->killme)) {
+		spin_unlock(&entry->lock);
+		fsnotify_mark_kill(entry);
+		return;
+	}
+	spin_unlock(&entry->lock);
+}
+
+void fsnotify_clear_mark_group(struct fsnotify_group *group)
+{
+	struct fsnotify_mark_entry *entry;
+	struct inode *inode;
+
+	mutex_lock(&group->mark_mutex);
+	while (!list_empty(&group->mark_entries)) {
+		entry = list_first_entry(&group->mark_entries, struct fsnotify_mark_entry, g_list);
+
+		/* make sure the entry survives until it is off both lists */
+		fsnotify_mark_get(entry);
+
+		/* remove from g_list */
+		list_del_init(&entry->g_list);
+		mutex_unlock(&group->mark_mutex);
+
+		inode = entry->inode;
+
+		spin_lock(&entry->lock);
+		entry->killme = 1;
+		spin_unlock(&entry->lock);
+
+		/* remove from i_list */
+		spin_lock(&inode->i_fsnotify_lock);
+		list_del_init(&entry->i_list);
+		spin_unlock(&inode->i_fsnotify_lock);
+
+		/* off both lists, may free now */
+		fsnotify_mark_put(entry);
+
+		mutex_lock(&group->mark_mutex);
+	}
+	mutex_unlock(&group->mark_mutex);
+}
+
+/* called with entry->inode->i_fsnotify_lock held! */
+void fsnotify_kill_mark_inode(struct fsnotify_mark_entry *entry)
+{
+	struct fsnotify_group *group = entry->group;
+	struct inode *inode = entry->inode;
+
+	/* make sure the entry survives until it is off both lists */
+	fsnotify_mark_get(entry);
+
+	list_del_init(&entry->i_list);
+	spin_unlock(&inode->i_fsnotify_lock);
+
+	spin_lock(&entry->lock);
+	entry->killme = 1;
+	spin_unlock(&entry->lock);
+
+	/* remove from g_list */
+	mutex_lock(&group->mark_mutex);
+	list_del_init(&entry->g_list);
+	mutex_unlock(&group->mark_mutex);
+
+	/* off both lists, may free now */
+	fsnotify_mark_put(entry);
+
+	spin_lock(&inode->i_fsnotify_lock);
+}
+
+void fsnotify_clear_mark_inode(struct inode *inode, unsigned long mask, unsigned int flags)
+{
+	struct fsnotify_mark_entry *entry;
+	LIST_HEAD(list);
+
+	spin_lock(&inode->i_fsnotify_lock);
+
+	/* blank the inode list and move all entries to a new list */
+	list_splice_init(&inode->i_fsnotify_mark_entries, &list);
+
+	/*
+	 * walk the new list letting each group decide how to handle its mark.
+	 * we hold inode->i_fsnotify_lock so group unregister can't free our
+	 * entries under us.  Be careful if you drop this lock.
+	 */
+	while (!list_empty(&list)) {
+		entry = list_first_entry(&list, struct fsnotify_mark_entry, i_list);
+		entry->group->ops->mark_clear_inode(entry, inode, mask, flags);
+	}
+	spin_unlock(&inode->i_fsnotify_lock);
+}
+
+/*
+ * add (we use |=) the mark to the in core inode mark
+ *
+ * ON SUCCESS THIS FUNCTION RETURNS HOLDING inode->i_fsnotify_lock!!!
+ */
+struct fsnotify_mark_entry *fsnotify_mark_add(struct fsnotify_group *group, struct inode *inode, unsigned long mask)
+{
+	/* initialize entry to NULL to shut up the compiler in case we jump out early */
+	struct fsnotify_mark_entry *entry = NULL, *lentry;
+
+	/* pre-allocate an entry so we do not allocate while holding the spinlock */
+	entry = fsnotify_mark_alloc();
+	if (!entry)
+		return NULL;
+
+	/*
+	 * this is the only place we hold both the group and the inode lock
+	 * we could take them in either order, but we take the mutex first
+	 * so we don't sleep holding the spinlock
+	 */
+	mutex_lock(&group->mark_mutex);
+	spin_lock(&inode->i_fsnotify_lock);
+	list_for_each_entry(lentry, &inode->i_fsnotify_mark_entries, i_list) {
+		if (lentry->group == group) {
+			lentry->mask |= mask;
+			/* we didn't use entry, kill it */
+			fsnotify_mark_kill(entry);
+			entry = lentry;
+			fsnotify_mark_get(entry);
+			goto out_unlock;
+		}
+	}
+
+	spin_lock_init(&entry->lock);
+	entry->refcnt = 1;
+	entry->group = group;
+	entry->mask = mask;
+	entry->inode = inode;
+	entry->killme = 0;
+	entry->private = NULL;
+
+	list_add(&entry->i_list, &inode->i_fsnotify_mark_entries);
+	list_add(&entry->g_list, &group->mark_entries);
+
+out_unlock:
+	mutex_unlock(&group->mark_mutex);
+	return entry;
+}
+
+void fsnotify_recalc_inode_mask(struct inode *inode)
+{
+	unsigned long new_mask = 0;
+	struct fsnotify_mark_entry *entry;
+
+	list_for_each_entry(entry, &inode->i_fsnotify_mark_entries, i_list) {
+		new_mask |= entry->mask;
+	}
+
+	inode->i_fsnotify_mask = new_mask;
+}
+
+
+__init int fsnotify_mark_init(void)
+{
+	fsnotify_mark_kmem_cache = kmem_cache_create("fsnotify_mark_entry", sizeof(struct fsnotify_mark_entry), 0, SLAB_PANIC, NULL);
+
+	return 0;
+}
+subsys_initcall(fsnotify_mark_init);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0dcdd94..de1e477 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -664,6 +664,12 @@ struct inode {
 
 	__u32			i_generation;
 
+#ifdef CONFIG_FSNOTIFY
+	unsigned long		i_fsnotify_mask; /* all events this inode cares about */
+	struct list_head	i_fsnotify_mark_entries; /* fsnotify mark entries */
+	spinlock_t		i_fsnotify_lock; /* protect the entries list */
+#endif
+
 #ifdef CONFIG_DNOTIFY
 	unsigned long		i_dnotify_mask; /* Directory notify events */
 	struct dnotify_struct	*i_dnotify; /* for directory notifications */
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 6fbf455..efd1d85 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -81,6 +81,14 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 }
 
 /*
+ * fsnotify_inode_delete - an inode is being evicted from cache; clean up is needed
+ */
+static inline void fsnotify_inode_delete(struct inode *inode)
+{
+	__fsnotify_inode_delete(inode);
+}
+
+/*
  * fsnotify_inoderemove - an inode is going away
  */
 static inline void fsnotify_inoderemove(struct inode *inode)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index e0b5528..d3fda04 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -40,9 +40,12 @@
 
 struct fsnotify_group;
 struct fsnotify_event;
+struct fsnotify_mark_entry;
 
 struct fsnotify_ops {
 	int (*event_to_notif)(struct fsnotify_group *group, struct fsnotify_event *event);
+	void (*mark_clear_inode)(struct fsnotify_mark_entry *entry, struct inode *inode, unsigned long mask, unsigned int flags);
+	int (*should_send_event)(struct fsnotify_group *group, struct inode *inode, unsigned long mask);
 };
 
 struct fsnotify_group {
@@ -58,6 +61,10 @@ struct fsnotify_group {
 	struct list_head notification_list;	/* list of event_holder this group needs to send to userspace */
 	wait_queue_head_t notification_waitq;	/* read() on the notification file blocks on this waitq */
 
+	/* stores all mark entries associated with this group so they can be cleaned up on unregister */
+	struct mutex mark_mutex;    /* protect mark_entries list */
+	struct list_head mark_entries; /* all inode mark entries for this group */
+
 	unsigned int priority;		/* order this group should receive msgs.  low first */
 };
 
@@ -65,16 +72,30 @@ struct fsnotify_group {
 
 /* called from the vfs to signal fs events */
 extern void fsnotify(struct file *file, struct dentry *dentry, struct inode *inode, unsigned long mask);
+extern void __fsnotify_inode_delete(struct inode *inode);
 
 /* called from fsnotify interfaces, such as fanotify or dnotify */
 extern void fsnotify_recalc_global_mask(void);
 extern void fsnotify_get_group(struct fsnotify_group *group);
 extern struct fsnotify_group *fsnotify_find_group(unsigned int priority, unsigned int group_num, unsigned long mask, struct fsnotify_ops *ops);
 extern void fsnotify_put_group(struct fsnotify_group *group);
+
+extern void fsnotify_recalc_inode_mask(struct inode *inode);
+extern void fsnotify_mark_get(struct fsnotify_mark_entry *entry);
+extern void fsnotify_mark_put(struct fsnotify_mark_entry *entry);
+extern void fsnotify_clear_mark_group(struct fsnotify_group *group);
+extern void fsnotify_kill_mark_inode(struct fsnotify_mark_entry *entry);
+extern void fsnotify_clear_mark_inode(struct inode *inode, unsigned long mask, unsigned int flags);
+extern struct fsnotify_mark_entry *fsnotify_mark_add(struct fsnotify_group *group, struct inode *inode, unsigned long mask);
+
 #else
 
 static inline void fsnotify(struct file *file, unsigned long mask)
 {}
+
+static inline void __fsnotify_inode_delete(struct inode *inode)
+{}
+
 #endif	/* CONFIG_FSNOTIFY */
 
 #endif	/* __KERNEL __ */
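[Editor's note: the mark teardown above frees an entry only when its refcount
hits zero AND killme is set, i.e. it has already been unlinked from both the
inode and the group lists.  A minimal single-threaded user-space sketch of that
rule follows; all names are illustrative, not the kernel API, and entry->lock
(which serializes these checks in the kernel) is elided.]

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch of the fsnotify mark lifetime rule: a mark is
 * freed only when its refcount reaches zero AND killme is set, meaning
 * it has already been unlinked from both the inode and group lists. */
struct mark {
	int refcnt;
	int killme;	/* set once the mark is off both lists */
	int *freed;	/* test hook: set to 1 when the mark is freed */
};

struct mark *mark_alloc(int *freed_flag)
{
	struct mark *m = malloc(sizeof(*m));
	m->refcnt = 1;	/* the reference held by the lists themselves */
	m->killme = 0;
	m->freed = freed_flag;
	return m;
}

void mark_get(struct mark *m)
{
	m->refcnt++;
}

static void mark_kill(struct mark *m)
{
	if (m->freed)
		*m->freed = 1;
	free(m);
}

void mark_put(struct mark *m)
{
	m->refcnt--;
	/* off both lists and nothing else can find us: safe to free */
	if (!m->refcnt && m->killme)
		mark_kill(m);
}
```

Dropping the refcount to zero is not enough on its own: a teardown path may
still hold a temporary reference while the mark is being unlinked, which is
exactly why fsnotify_clear_mark_group() takes a reference before removing the
entry from either list.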


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
                   ` (6 preceding siblings ...)
  2008-11-25 17:21 ` [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings Eric Paris
@ 2008-11-25 17:21 ` Eric Paris
  2008-11-28  5:14   ` Al Viro
  2008-11-28  6:25   ` Al Viro
  2008-11-26  0:14 ` [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Andrew Morton
  8 siblings, 2 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-25 17:21 UTC (permalink / raw)
  To: linux-kernel, malware-list; +Cc: viro, akpm, alan, arjan, hch, a.p.zijlstra

Reimplement dnotify using fsnotify.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/dnotify/Kconfig   |    1 
 fs/notify/dnotify/dnotify.c |  334 +++++++++++++++++++++++++++++++++++--------
 include/linux/dnotify.h     |   21 +--
 include/linux/fs.h          |    5 -
 include/linux/fsnotify.h    |   45 +++---
 5 files changed, 300 insertions(+), 106 deletions(-)

diff --git a/fs/notify/dnotify/Kconfig b/fs/notify/dnotify/Kconfig
index 26adf5d..904ff8d 100644
--- a/fs/notify/dnotify/Kconfig
+++ b/fs/notify/dnotify/Kconfig
@@ -1,5 +1,6 @@
 config DNOTIFY
 	bool "Dnotify support"
+	depends on FSNOTIFY
 	default y
 	help
 	  Dnotify is a directory-based per-fd file change notification system
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 676073b..6908860 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -21,24 +21,169 @@
 #include <linux/spinlock.h>
 #include <linux/slab.h>
 #include <linux/fdtable.h>
+#include <linux/fsnotify_backend.h>
+
+#include "../fsnotify.h"
 
 int dir_notify_enable __read_mostly = 1;
 
 static struct kmem_cache *dn_cache __read_mostly;
 
-static void redo_inode_mask(struct inode *inode)
+static int inode_dir_notify(struct fsnotify_group *group, struct fsnotify_event *event);
+static void clear_mark_dir_notify(struct fsnotify_mark_entry *entry, struct inode *inode, unsigned long mask, unsigned int flags);
+static int should_send_event_dir_notify(struct fsnotify_group *group, struct inode *inode, unsigned long mask);
+
+static struct fsnotify_ops dnotify_fsnotify_ops = {
+	.event_to_notif = inode_dir_notify,
+	.mark_clear_inode = clear_mark_dir_notify,
+	.should_send_event = should_send_event_dir_notify,
+};
+
+static inline struct fsnotify_mark_entry *dnotify_get_mark(struct inode *inode)
+{
+	struct fsnotify_mark_entry *entry;
+
+	list_for_each_entry(entry, &inode->i_fsnotify_mark_entries, i_list) {
+		if (entry->group->ops == &dnotify_fsnotify_ops) {
+			fsnotify_mark_get(entry);
+			return entry;
+		}
+	}
+	return NULL;
+}
+
+/* caller must hold inode->i_fsnotify_lock to protect the private data and the inode mask */
+static void dnotify_recalc_inode_mask(struct fsnotify_mark_entry *entry)
 {
 	unsigned long new_mask;
 	struct dnotify_struct *dn;
 
 	new_mask = 0;
-	for (dn = inode->i_dnotify; dn != NULL; dn = dn->dn_next)
-		new_mask |= dn->dn_mask & ~DN_MULTISHOT;
-	inode->i_dnotify_mask = new_mask;
+	dn = (struct dnotify_struct *)entry->private;
+	for (; dn != NULL; dn = dn->dn_next)
+		new_mask |= (dn->dn_mask & ~FS_DN_MULTISHOT);
+	entry->mask = new_mask;
+	fsnotify_recalc_inode_mask(entry->inode);
+}
+
+static int inode_dir_notify(struct fsnotify_group *group, struct fsnotify_event *event)
+{
+	struct fsnotify_mark_entry *entry = NULL;
+	struct inode *inode;
+	struct dnotify_struct *dn;
+	struct dnotify_struct **prev;
+	struct fown_struct *fown;
+	int changed = 0;
+
+	switch (event->flag) {
+	case FSNOTIFY_EVENT_INODE:
+		inode = event->inode;
+		break;
+	case FSNOTIFY_EVENT_PATH:
+		inode = event->path.dentry->d_inode;
+		break;
+	case FSNOTIFY_EVENT_DENTRY:
+		inode = event->dentry->d_inode;
+		break;
+	default:
+		BUG();
+	};
+
+	spin_lock(&inode->i_fsnotify_lock);
+
+	entry = dnotify_get_mark(inode);
+	/* unlikely since we already passed should_send_event_dir_notify() */
+	if (unlikely(!entry))
+		goto out_unlock;
+
+	prev = (struct dnotify_struct **)&entry->private;
+	while ((dn = *prev) != NULL) {
+		if ((dn->dn_mask & event->mask) == 0) {
+			prev = &dn->dn_next;
+			continue;
+		}
+		fown = &dn->dn_filp->f_owner;
+		send_sigio(fown, dn->dn_fd, POLL_MSG);
+		if (dn->dn_mask & FS_DN_MULTISHOT)
+			prev = &dn->dn_next;
+		else {
+			*prev = dn->dn_next;
+			changed = 1;
+			kmem_cache_free(dn_cache, dn);
+		}
+	}
+	if (changed)
+		dnotify_recalc_inode_mask(entry);
+
+out_unlock:
+	spin_unlock(&inode->i_fsnotify_lock);
+	if (entry)
+		fsnotify_mark_put(entry);
+
+	return 0;
+}
+
+static void clear_mark_dir_notify(struct fsnotify_mark_entry *entry, struct inode *inode, unsigned long mask __attribute__ ((unused)), unsigned int flags)
+{
+	struct dnotify_struct *dn;
+	struct dnotify_struct **prev;
+	struct fsnotify_group *dnotify_group;
+
+	/* if this isn't a dir it better not have dnotify marks */
+	BUG_ON(!S_ISDIR(inode->i_mode));
+
+	if (!(flags & FSNOTIFY_FORCE_CLEAR_MARK)) {
+		/* not a force just put it back */
+		list_move(&entry->i_list, &inode->i_fsnotify_mark_entries);
+		return;
+	}
+
+	/* everything should have been cleaned up before we got here. */
+	/* this could/should be a BUG(), but for now just be safe */
+	WARN(1, "A dnotify watch survived until the inode was evicted from cache!\n");
+
+	/* free our private data */
+	prev = (struct dnotify_struct **)&entry->private;
+	while ((dn = *prev) != NULL) {
+		*prev = dn->dn_next;
+		kmem_cache_free(dn_cache, dn);
+	}
+
+	dnotify_group = entry->group;
+	/* clean up both lists and mark the entry for freeing */
+	fsnotify_kill_mark_inode(entry);
+	fsnotify_put_group(dnotify_group);
+}
+
+static int should_send_event_dir_notify(struct fsnotify_group *group, struct inode *inode, unsigned long mask)
+{
+	struct fsnotify_mark_entry *entry;
+	int send = 0;
+
+	/* if !dir_notify_enable we should never get here, don't waste time checking
+	if (!dir_notify_enable)
+		return 0; */
+
+	/* not a dir, dnotify doesn't care */
+	if (!S_ISDIR(inode->i_mode))
+		return 0;
+
+	spin_lock(&inode->i_fsnotify_lock);
+
+	/* no mark means no dnotify watch */
+	entry = dnotify_get_mark(inode);
+	if (!entry)
+		goto out;
+
+	send = !!(mask & entry->mask);
+	fsnotify_mark_put(entry);
+out:
+	spin_unlock(&inode->i_fsnotify_lock);
+	return send;
 }
 
 void dnotify_flush(struct file *filp, fl_owner_t id)
 {
+	struct fsnotify_group *dnotify_group = NULL;
+	struct fsnotify_mark_entry *entry = NULL;
 	struct dnotify_struct *dn;
 	struct dnotify_struct **prev;
 	struct inode *inode;
@@ -46,22 +191,62 @@ void dnotify_flush(struct file *filp, fl_owner_t id)
 	inode = filp->f_path.dentry->d_inode;
 	if (!S_ISDIR(inode->i_mode))
 		return;
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
+	spin_lock(&inode->i_fsnotify_lock);
+	entry = dnotify_get_mark(inode);
+	if (!entry)
+		goto out_unlock;
+
+	prev = (struct dnotify_struct **)&entry->private;
 	while ((dn = *prev) != NULL) {
 		if ((dn->dn_owner == id) && (dn->dn_filp == filp)) {
 			*prev = dn->dn_next;
-			redo_inode_mask(inode);
+			dnotify_recalc_inode_mask(entry);
 			kmem_cache_free(dn_cache, dn);
 			break;
 		}
 		prev = &dn->dn_next;
 	}
-	spin_unlock(&inode->i_lock);
+
+	/* last dnotify watch on this inode is gone */
+	if (entry->private == NULL) {
+		dnotify_group = entry->group;
+		fsnotify_kill_mark_inode(entry);
+	}
+out_unlock:
+	spin_unlock(&inode->i_fsnotify_lock);
+	if (entry)
+		fsnotify_mark_put(entry);
+	if (dnotify_group)
+		fsnotify_put_group(dnotify_group);
+}
+
+/* this conversion is done only at watch creation */
+static inline unsigned long convert_arg(unsigned long arg)
+{
+	unsigned long new_mask = 0;
+
+	if (arg & DN_MULTISHOT)
+		new_mask |= FS_DN_MULTISHOT;
+	if (arg & DN_DELETE)
+		new_mask |= (FS_DELETE | FS_DELETE_CHILD);
+	if (arg & DN_MODIFY)
+		new_mask |= (FS_MODIFY | FS_MODIFY_CHILD);
+	if (arg & DN_ACCESS)
+		new_mask |= (FS_ACCESS | FS_ACCESS_CHILD);
+	if (arg & DN_ATTRIB)
+		new_mask |= (FS_ATTRIB | FS_ATTRIB_CHILD);
+	if (arg & DN_RENAME)
+		new_mask |= FS_RENAME;
+	if (arg & DN_CREATE)
+		new_mask |= FS_CREATE;
+
+	return new_mask;
 }
 
 int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 {
+	struct fsnotify_group *dnotify_group;
+	struct fsnotify_mark_entry *entry;
 	struct dnotify_struct *dn;
 	struct dnotify_struct *odn;
 	struct dnotify_struct **prev;
@@ -69,27 +254,63 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 	fl_owner_t id = current->files;
 	struct file *f;
 	int error = 0;
+	unsigned long mask;
+
+	if (!dir_notify_enable)
+		return -EINVAL;
 
 	if ((arg & ~DN_MULTISHOT) == 0) {
 		dnotify_flush(filp, id);
 		return 0;
 	}
-	if (!dir_notify_enable)
-		return -EINVAL;
 	inode = filp->f_path.dentry->d_inode;
 	if (!S_ISDIR(inode->i_mode))
 		return -ENOTDIR;
+
+	/* expect most fcntl to add new rather than augment old */
 	dn = kmem_cache_alloc(dn_cache, GFP_KERNEL);
 	if (dn == NULL)
 		return -ENOMEM;
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
+
+	/* convert the userspace DN_* "arg" to the internal FS_* defines in fsnotify */
+	mask = convert_arg(arg);
+
+	/*
+	 * I really don't like using ALL_DNOTIFY_EVENTS.  We could probably do
+	 * better by setting the group->mask equal to only those events dnotify
+	 * watches care about, but removing events means walking the entire
+	 * group->mark_entries list to recalculate the mask.  It also makes it
+	 * harder to find the right group, but this is not a fast path, so
+	 * harder doesn't mean bad.  Maybe a future performance win since it
+	 * could result in faster fsnotify() processing.
+	 */
+	dnotify_group = fsnotify_find_group(INT_MAX, INT_MAX, ALL_DNOTIFY_EVENTS, &dnotify_fsnotify_ops);
+	/* screw it, i don't care */
+	if (IS_ERR(dnotify_group)) {
+		error = PTR_ERR(dnotify_group);
+		goto out_free;
+	}
+
+	/* if successful mark_add returns holding the inode->i_fsnotify_lock */
+	entry = fsnotify_mark_add(dnotify_group, inode, mask);
+	if (!entry) {
+		error = -ENOMEM;
+		goto out_put_group;
+	}
+	/* entry->private == NULL means that this is a new inode mark.
+	 * take a group reference for it.  */
+	if (entry->private == NULL)
+		fsnotify_get_group(dnotify_group);
+
+	prev = (struct dnotify_struct **)&entry->private;
 	while ((odn = *prev) != NULL) {
+		/* do we already have a dnotify struct and we are just adding more events? */
 		if ((odn->dn_owner == id) && (odn->dn_filp == filp)) {
 			odn->dn_fd = fd;
-			odn->dn_mask |= arg;
-			inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
-			goto out_free;
+			odn->dn_mask |= mask;
+			/* recalculate the entry->mask and entry->inode->i_fsnotify_mask */
+			dnotify_recalc_inode_mask(entry);
+			goto out_unlock;
 		}
 		prev = &odn->dn_next;
 	}
@@ -98,65 +319,38 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 	f = fcheck(fd);
 	rcu_read_unlock();
 	/* we'd lost the race with close(), sod off silently */
-	/* note that inode->i_lock prevents reordering problems
-	 * between accesses to descriptor table and ->i_dnotify */
+	/* note that inode->i_fsnotify_lock prevents reordering problems
+	 * between accesses to descriptor table and the private data in the
+	 * inode mark */
 	if (f != filp)
-		goto out_free;
+		goto out_unlock;
 
 	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
 	if (error)
-		goto out_free;
+		goto out_unlock;
 
-	dn->dn_mask = arg;
+	dn->dn_mask = mask;
 	dn->dn_fd = fd;
 	dn->dn_filp = filp;
 	dn->dn_owner = id;
-	inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
-	dn->dn_next = inode->i_dnotify;
-	inode->i_dnotify = dn;
-	spin_unlock(&inode->i_lock);
-
-	if (filp->f_op && filp->f_op->dir_notify)
-		return filp->f_op->dir_notify(filp, arg);
+	dn->dn_next = entry->private;
+	entry->private = dn;
+	dnotify_recalc_inode_mask(entry);
+	spin_unlock(&inode->i_fsnotify_lock);
+	fsnotify_mark_put(entry);
+	fsnotify_put_group(dnotify_group);
 	return 0;
 
+out_unlock:
+	spin_unlock(&inode->i_fsnotify_lock);
+	fsnotify_mark_put(entry);
+out_put_group:
+	fsnotify_put_group(dnotify_group);
 out_free:
-	spin_unlock(&inode->i_lock);
 	kmem_cache_free(dn_cache, dn);
 	return error;
 }
 
-void __inode_dir_notify(struct inode *inode, unsigned long event)
-{
-	struct dnotify_struct *	dn;
-	struct dnotify_struct **prev;
-	struct fown_struct *	fown;
-	int			changed = 0;
-
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
-	while ((dn = *prev) != NULL) {
-		if ((dn->dn_mask & event) == 0) {
-			prev = &dn->dn_next;
-			continue;
-		}
-		fown = &dn->dn_filp->f_owner;
-		send_sigio(fown, dn->dn_fd, POLL_MSG);
-		if (dn->dn_mask & DN_MULTISHOT)
-			prev = &dn->dn_next;
-		else {
-			*prev = dn->dn_next;
-			changed = 1;
-			kmem_cache_free(dn_cache, dn);
-		}
-	}
-	if (changed)
-		redo_inode_mask(inode);
-	spin_unlock(&inode->i_lock);
-}
-
-EXPORT_SYMBOL(__inode_dir_notify);
-
 /*
  * This is hopelessly wrong, but unfixable without API changes.  At
  * least it doesn't oops the kernel...
@@ -164,23 +358,35 @@ EXPORT_SYMBOL(__inode_dir_notify);
  * To safely access ->d_parent we need to keep d_move away from it.  Use the
  * dentry's d_lock for this.
  */
-void dnotify_parent(struct dentry *dentry, unsigned long event)
+void dnotify_parent(struct dentry *dentry, unsigned long mask)
 {
 	struct dentry *parent;
+	struct fsnotify_mark_entry *entry;
 
 	if (!dir_notify_enable)
 		return;
 
 	spin_lock(&dentry->d_lock);
 	parent = dentry->d_parent;
-	if (parent->d_inode->i_dnotify_mask & event) {
+
+	if (!(parent->d_inode->i_fsnotify_mask & mask))
+		goto out_unlock;
+
+	entry = dnotify_get_mark(parent->d_inode);
+	if (!entry)
+		goto out_unlock;
+
+	if (entry->mask & mask) {
 		dget(parent);
 		spin_unlock(&dentry->d_lock);
-		__inode_dir_notify(parent->d_inode, event);
+		fsnotify(NULL, parent, NULL, mask);
 		dput(parent);
-	} else {
-		spin_unlock(&dentry->d_lock);
 	}
+	fsnotify_mark_put(entry);
+	return;
+
+out_unlock:
+	spin_unlock(&dentry->d_lock);
 }
 EXPORT_SYMBOL_GPL(dnotify_parent);
 
diff --git a/include/linux/dnotify.h b/include/linux/dnotify.h
index 102a902..5fbb01c 100644
--- a/include/linux/dnotify.h
+++ b/include/linux/dnotify.h
@@ -21,23 +21,18 @@ struct dnotify_struct {
 
 #ifdef CONFIG_DNOTIFY
 
-extern void __inode_dir_notify(struct inode *, unsigned long);
+#define ALL_DNOTIFY_EVENTS (FS_DELETE | FS_DELETE_CHILD |\
+			    FS_MODIFY | FS_MODIFY_CHILD |\
+			    FS_ACCESS | FS_ACCESS_CHILD |\
+			    FS_ATTRIB | FS_ATTRIB_CHILD |\
+			    FS_CREATE | FS_RENAME)
+
 extern void dnotify_flush(struct file *, fl_owner_t);
 extern int fcntl_dirnotify(int, struct file *, unsigned long);
 extern void dnotify_parent(struct dentry *, unsigned long);
 
-static inline void inode_dir_notify(struct inode *inode, unsigned long event)
-{
-	if (inode->i_dnotify_mask & (event))
-		__inode_dir_notify(inode, event);
-}
-
 #else
 
-static inline void __inode_dir_notify(struct inode *inode, unsigned long event)
-{
-}
-
 static inline void dnotify_flush(struct file *filp, fl_owner_t id)
 {
 }
@@ -51,10 +46,6 @@ static inline void dnotify_parent(struct dentry *dentry, unsigned long event)
 {
 }
 
-static inline void inode_dir_notify(struct inode *inode, unsigned long event)
-{
-}
-
 #endif /* CONFIG_DNOTIFY */
 
 #endif /* __KERNEL __ */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index de1e477..3ada08d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -670,11 +670,6 @@ struct inode {
 	spinlock_t		i_fsnotify_lock; /* protect the entries list */
 #endif
 
-#ifdef CONFIG_DNOTIFY
-	unsigned long		i_dnotify_mask; /* Directory notify events */
-	struct dnotify_struct	*i_dnotify; /* for directory notifications */
-#endif
-
 #ifdef CONFIG_INOTIFY
 	struct list_head	inotify_watches; /* watches on this inode */
 	struct mutex		inotify_mutex;	/* protects the watches list */
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index efd1d85..55703fb 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -14,6 +14,7 @@
 #include <linux/dnotify.h>
 #include <linux/inotify.h>
 #include <linux/audit.h>
+#include <linux/fsnotify_backend.h>
 
 /*
  * fsnotify_d_instantiate - instantiate a dentry for inode
@@ -44,11 +45,11 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 	struct inode *source = moved->d_inode;
 	u32 cookie = inotify_get_cookie();
 
-	if (old_dir == new_dir)
-		inode_dir_notify(old_dir, DN_RENAME);
-	else {
-		inode_dir_notify(old_dir, DN_DELETE);
-		inode_dir_notify(new_dir, DN_CREATE);
+	if (old_dir == new_dir) {
+		fsnotify(NULL, NULL, old_dir, FS_RENAME);
+	} else {
+		fsnotify(NULL, NULL, old_dir, FS_DELETE);
+		fsnotify(NULL, NULL, new_dir, FS_CREATE);
 	}
 
 	if (isdir)
@@ -76,7 +77,7 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 {
 	if (isdir)
 		isdir = IN_ISDIR;
-	dnotify_parent(dentry, DN_DELETE);
+	dnotify_parent(dentry, FS_DELETE_CHILD);
 	inotify_dentry_parent_queue_event(dentry, IN_DELETE|isdir, 0, dentry->d_name.name);
 }
 
@@ -110,7 +111,7 @@ static inline void fsnotify_link_count(struct inode *inode)
  */
 static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
 {
-	inode_dir_notify(inode, DN_CREATE);
+	fsnotify(NULL, NULL, inode, FS_CREATE);
 	inotify_inode_queue_event(inode, IN_CREATE, 0, dentry->d_name.name,
 				  dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
@@ -123,7 +124,7 @@ static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
  */
 static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct dentry *new_dentry)
 {
-	inode_dir_notify(dir, DN_CREATE);
+	fsnotify(NULL, NULL, dir, FS_CREATE);
 	inotify_inode_queue_event(dir, IN_CREATE, 0, new_dentry->d_name.name,
 				  inode);
 	fsnotify_link_count(inode);
@@ -135,7 +136,7 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct
  */
 static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 {
-	inode_dir_notify(inode, DN_CREATE);
+	fsnotify(NULL, NULL, inode, FS_CREATE);
 	inotify_inode_queue_event(inode, IN_CREATE | IN_ISDIR, 0, 
 				  dentry->d_name.name, dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
@@ -153,7 +154,7 @@ static inline void fsnotify_access(struct file *file)
 	if (S_ISDIR(inode->i_mode))
 		mask |= IN_ISDIR;
 
-	dnotify_parent(dentry, DN_ACCESS);
+	dnotify_parent(dentry, FS_ACCESS_CHILD);
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 }
@@ -170,7 +171,7 @@ static inline void fsnotify_modify(struct file *file)
 	if (S_ISDIR(inode->i_mode))
 		mask |= IN_ISDIR;
 
-	dnotify_parent(dentry, DN_MODIFY);
+	dnotify_parent(dentry, FS_MODIFY_CHILD);
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 }
@@ -183,7 +184,7 @@ static inline void fsnotify_open_exec(struct file *file)
 	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 
-	dnotify_parent(dentry, DN_ACCESS);
+	dnotify_parent(dentry, FS_ACCESS_CHILD);
 	inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL, NULL);
 }
@@ -244,40 +245,40 @@ static inline void fsnotify_xattr(struct dentry *dentry)
 static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 {
 	struct inode *inode = dentry->d_inode;
-	int dn_mask = 0;
+	int fs_dn_mask = 0;
 	u32 in_mask = 0;
 
 	if (ia_valid & ATTR_UID) {
 		in_mask |= IN_ATTRIB;
-		dn_mask |= DN_ATTRIB;
+		fs_dn_mask |= FS_ATTRIB_CHILD;
 	}
 	if (ia_valid & ATTR_GID) {
 		in_mask |= IN_ATTRIB;
-		dn_mask |= DN_ATTRIB;
+		fs_dn_mask |= FS_ATTRIB_CHILD;
 	}
 	if (ia_valid & ATTR_SIZE) {
 		in_mask |= IN_MODIFY;
-		dn_mask |= DN_MODIFY;
+		fs_dn_mask |= FS_MODIFY_CHILD;
 	}
 	/* both times implies a utime(s) call */
 	if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
 	{
 		in_mask |= IN_ATTRIB;
-		dn_mask |= DN_ATTRIB;
+		fs_dn_mask |= FS_ATTRIB_CHILD;
 	} else if (ia_valid & ATTR_ATIME) {
 		in_mask |= IN_ACCESS;
-		dn_mask |= DN_ACCESS;
+		fs_dn_mask |= FS_ACCESS_CHILD;
 	} else if (ia_valid & ATTR_MTIME) {
 		in_mask |= IN_MODIFY;
-		dn_mask |= DN_MODIFY;
+		fs_dn_mask |= FS_MODIFY_CHILD;
 	}
 	if (ia_valid & ATTR_MODE) {
 		in_mask |= IN_ATTRIB;
-		dn_mask |= DN_ATTRIB;
+		fs_dn_mask |= FS_ATTRIB_CHILD;
 	}
 
-	if (dn_mask)
-		dnotify_parent(dentry, dn_mask);
+	if (fs_dn_mask)
+		dnotify_parent(dentry, fs_dn_mask);
 	if (in_mask) {
 		if (S_ISDIR(inode->i_mode))
 			in_mask |= IN_ISDIR;


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend
  2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
                   ` (7 preceding siblings ...)
  2008-11-25 17:21 ` [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify Eric Paris
@ 2008-11-26  0:14 ` Andrew Morton
  2008-11-26  2:00   ` Eric Paris
  8 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2008-11-26  0:14 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, viro, alan, arjan, hch, a.p.zijlstra

On Tue, 25 Nov 2008 12:20:51 -0500
Eric Paris <eparis@redhat.com> wrote:

> This series only reimplements dnotify using the new fsnotify backend.  If
> accepted I will do the work to port inotify as well.  Currently struct inode
> goes from:
> 
> #ifdef CONFIG_DNOTIFY
>        unsigned long           i_dnotify_mask; /* Directory notify events */
>        struct dnotify_struct   *i_dnotify; /* for directory notifications */
> #endif
> 
> to:
> #ifdef CONFIG_FSNOTIFY
>        unsigned long           i_fsnotify_mask; /* all events this inode cares about */
>        struct list_head        i_fsnotify_mark_entries; /* fsnotify mark entries */
>        spinlock_t              i_fsnotify_lock; /* protect the entries list */
> #endif
> 
> so the inode still grows, but the inotify fields will be dropped as well
> resulting in a smaller struct inode.  These are all the fields fanotify will
> want as well.

Did you consider using i_lock to protect that list?  Its mandate is "an
innermost lock which protects fields within the inode".

> 29 files changed, 3100 insertions(+), 1977 deletions(-)

	if (code > code_reviewers)
		fix();

but how?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend
  2008-11-26  0:14 ` [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Andrew Morton
@ 2008-11-26  2:00   ` Eric Paris
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-26  2:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, malware-list, viro, alan, arjan, hch, a.p.zijlstra

On Tue, 2008-11-25 at 16:14 -0800, Andrew Morton wrote:
> On Tue, 25 Nov 2008 12:20:51 -0500
> Eric Paris <eparis@redhat.com> wrote:
> 
> > This series only reimplements dnotify using the new fsnotify backend.  If
> > accepted I will do the work to port inotify as well.  Currently struct inode
> > goes from:
> > 
> > #ifdef CONFIG_DNOTIFY
> >        unsigned long           i_dnotify_mask; /* Directory notify events */
> >        struct dnotify_struct   *i_dnotify; /* for directory notifications */
> > #endif
> > 
> > to:
> > #ifdef CONFIG_FSNOTIFY
> >        unsigned long           i_fsnotify_mask; /* all events this inode cares about */
> >        struct list_head        i_fsnotify_mark_entries; /* fsnotify mark entries */
> >        spinlock_t              i_fsnotify_lock; /* protect the entries list */
> > #endif
> > 
> > so the inode still grows, but the inotify fields will be dropped as well
> > resulting in a smaller struct inode.  These are all the fields fanotify will
> > want as well.
> 
> Did you consider using i_lock to protect that list?  Its mandate is "an
> innermost lock which protects fields within the inode".

I didn't really consider it.  It absolutely could be used.  Currently
dnotify uses i_lock and inotify uses its own smaller mutex.  If people
like, I can run some perf tests comparing i_lock against this smaller
lock and would gladly send a patch on top of this set to drop the
i_fsnotify_lock.

> > 29 files changed, 3100 insertions(+), 1977 deletions(-)
> 
> 	if (code > code_reviewers)
> 		fix();
> 
> but how?

patch #1 does nothing but move dnotify and inotify...
14 files changed, 1929 insertions(+), 1925 deletions(-)

So that total number seems worse than it is (though I'll agree that all
of this churn for no new functionality sucks).  At least I promise a
smaller inode struct at the end!


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
@ 2008-11-27 16:14   ` Peter Zijlstra
  2008-11-27 16:17   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2008-11-27 16:14 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> +
> +void fsnotify_put_group(struct fsnotify_group *group)
> +{
> +       mutex_lock(&fsnotify_grp_mutex);
> +       if (atomic_dec_and_test(&group->refcnt)) {
> +               list_del_rcu(&group->group_list);
> +               mutex_unlock(&fsnotify_grp_mutex);
> +
> +               synchronize_srcu(&fsnotify_grp_srcu_struct);
> +
> +               fsnotify_recalc_global_mask();
> +               fsnotify_kill_group(group);
> +
> +               return;
> +       }
> +       mutex_unlock(&fsnotify_grp_mutex);
> +
> +       return;
> +}

do you really need that mutex in the ! case?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
  2008-11-27 16:14   ` Peter Zijlstra
@ 2008-11-27 16:17   ` Peter Zijlstra
  2008-11-27 16:20   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2008-11-27 16:17 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> +struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long mask, struct fsnotify_ops *ops)
> +{
> +       struct fsnotify_group *group_iter;
> +       struct fsnotify_group *group = NULL;
> +
> +       mutex_lock(&fsnotify_grp_mutex);
> +       list_for_each_entry_rcu(group_iter, &fsnotify_groups, group_list) {
> +               if (group_iter->group_num == group_num) {
> +                       if ((group_iter->mask == mask) &&
> +                           (group_iter->ops == ops)) {
> +                               fsnotify_get_group(group_iter);
> +                               group = group_iter;
> +                       } else
> +                               group = ERR_PTR(-EEXIST);
> +                       goto out;
> +               }
> +       }
> +
> +       group = kmalloc(sizeof(struct fsnotify_group), GFP_KERNEL);
> +       if (!group) {
> +               group = ERR_PTR(-ENOMEM);
> +               goto out;
> +       }
> +
> +       atomic_set(&group->refcnt, 1);
> +
> +       group->group_num = group_num;
> +       group->mask = mask;
> +
> +       mutex_init(&group->notification_mutex);
> +       INIT_LIST_HEAD(&group->notification_list);
> +       init_waitqueue_head(&group->notification_waitq);
> +
> +       group->ops = ops;
> +
> +       /* add it */
> +       list_add_rcu(&group->group_list, &fsnotify_groups);
> +
> +out:
> +       mutex_unlock(&fsnotify_grp_mutex);
> +       fsnotify_recalc_global_mask();
> +       return group;
> +}

Can't you do a lockless lookup and handle the insertion race?

Also, since it creates the object if its not found, _find_ might not be
the best name, how about obtain?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
  2008-11-27 16:14   ` Peter Zijlstra
  2008-11-27 16:17   ` Peter Zijlstra
@ 2008-11-27 16:20   ` Peter Zijlstra
  2008-11-28 23:22     ` Eric Paris
  2008-11-27 16:21   ` Peter Zijlstra
  2008-11-28  4:54   ` Al Viro
  4 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2008-11-27 16:20 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> +int fsnotify_check_notif_queue(struct fsnotify_group *group)
> +{
> +       mutex_lock(&group->notification_mutex);
> +       if (!list_empty(&group->notification_list))
> +               return 1;
> +       mutex_unlock(&group->notification_mutex);
> +       return 0;
> +}

> +void fsnotify_clear_notif(struct fsnotify_group *group)
> +{
> +       struct fsnotify_event *event;
> +
> +       while (fsnotify_check_notif_queue(group)) {
> +               event = get_event_from_notif(group);
> +               fsnotify_put_event(event);
> +               /* fsnotify_check_notif_queue() took this lock */
> +               mutex_unlock(&group->notification_mutex);
> +       }
> +}

That is quite horrible, please just open code that to keep the locking
symmetric.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
                     ` (2 preceding siblings ...)
  2008-11-27 16:20   ` Peter Zijlstra
@ 2008-11-27 16:21   ` Peter Zijlstra
  2008-11-28  4:54   ` Al Viro
  4 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2008-11-27 16:21 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:

> +void fsnotify_kill_group(struct fsnotify_group *group)
> +{
> +       /* clear the notification queue of all events */
> +       fsnotify_clear_notif(group);
> +
> +       kfree(group);
> +}

We're not sending any signals ;-) destroy or free perhaps?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 6/8] fsnotify: add group priorities
  2008-11-25 17:21 ` [PATCH -v3 6/8] fsnotify: add group priorities Eric Paris
@ 2008-11-27 16:25   ` Peter Zijlstra
  2008-12-01 15:20     ` Eric Paris
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2008-11-27 16:25 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> In preperation for blocking fsnotify calls group priorities must be added.
> When multiple groups request the same event type the lowest priority group
> will receive the notification first.

> @@ -114,9 +117,26 @@ struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long
>  
>  	group->ops = ops;
>  
> -	/* add it */
> -	list_add_rcu(&group->group_list, &fsnotify_groups);
> +	/* Do we need to be the first entry? */
> +	if (list_empty(&fsnotify_groups)) {
> +		list_add_rcu(&group->group_list, &fsnotify_groups);
> +		goto out;
> +	}
> +
> +	list_for_each_entry(group_iter, &fsnotify_groups, group_list) {
> +		/* insert in front of this one? */
> +		if (priority < group_iter->priority) {
> +			/* I used list_add_tail() to insert in front of group_iter...  */
> +			list_add_tail_rcu(&group->group_list, &group_iter->group_list);
> +			break;
> +		}
>  
> +		/* are we at the end?  if so insert at end */
> +		if (list_is_last(&group_iter->group_list, &fsnotify_groups)) {
> +			list_add_tail_rcu(&group->group_list, &fsnotify_groups);
> +			break;
> +		}
> +	}
>  out:
>  	mutex_unlock(&fsnotify_grp_mutex);
>  	fsnotify_recalc_global_mask();

What priority range do you need to cater for, and how many groups? I can
imagine for many groups and limit range a priority list might be better
suited.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings
  2008-11-25 17:21 ` [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings Eric Paris
@ 2008-11-27 16:29   ` Peter Zijlstra
  2008-11-28  5:42   ` Al Viro
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2008-11-27 16:29 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> +/*
> + * add (we use |=) the mark to the in core inode mark
> + *
> + * THIS FUNCTION RETURNS HOLDING THE INODE LOCK!!!
> + */

YUCK, BUT I GUESS VFS PEOPLE WILL HAVE TO VALIDATE ITS MERIT.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
                     ` (3 preceding siblings ...)
  2008-11-27 16:21   ` Peter Zijlstra
@ 2008-11-28  4:54   ` Al Viro
  2008-11-28 23:32     ` Eric Paris
  4 siblings, 1 reply; 31+ messages in thread
From: Al Viro @ 2008-11-28  4:54 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Tue, Nov 25, 2008 at 12:21:18PM -0500, Eric Paris wrote:

What the hell is ->notification_list and what in this patchset would
add stuff to it?  Even more interesting question: how long would these
guys remain there and what's to prevent a race with umount?  At least
'inode-only' events will pin down the inode and leaving the matching
iput() until after umount() is a Bad Thing(tm)...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify
  2008-11-25 17:21 ` [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify Eric Paris
@ 2008-11-28  5:14   ` Al Viro
  2008-11-28 23:37     ` Eric Paris
  2008-11-28  6:25   ` Al Viro
  1 sibling, 1 reply; 31+ messages in thread
From: Al Viro @ 2008-11-28  5:14 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Tue, Nov 25, 2008 at 12:21:33PM -0500, Eric Paris wrote:

> +	.mark_clear_inode = clear_mark_dir_notify,

... called under a spinlock

> +static void clear_mark_dir_notify(struct fsnotify_mark_entry *entry, struct inode *inode, unsigned long mask __attribute__ ((unused)), unsigned int flags)
> +{
...
> +	fsnotify_put_group(dnotify_group);

... which grabs a mutex.

Incidentally, why the hell do you bother with refcounting on groups here?
dnotify is not something that's going to be unloaded, for fsck sake...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 1/8] filesystem notification: create fs/notify to contain all fs notification
  2008-11-25 17:20 ` [PATCH -v3 1/8] filesystem notification: create fs/notify to contain all fs notification Eric Paris
@ 2008-11-28  5:24   ` Al Viro
  0 siblings, 0 replies; 31+ messages in thread
From: Al Viro @ 2008-11-28  5:24 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Tue, Nov 25, 2008 at 12:20:57PM -0500, Eric Paris wrote:
> Adding yet another filesystem notification system it seemed like a good
> idea to clean up fs/ by creating an fs/notify and putting everything
> there.

	FWIW, passing -M to git-format-patch and friends is a damn good
idea - at least it would be immediately obvious from the posted diff
which files simply got moved as-is.  ~3KLines off the mail also wouldn't
hurt...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings
  2008-11-25 17:21 ` [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings Eric Paris
  2008-11-27 16:29   ` Peter Zijlstra
@ 2008-11-28  5:42   ` Al Viro
  2008-11-28 23:43     ` Eric Paris
  1 sibling, 1 reply; 31+ messages in thread
From: Al Viro @ 2008-11-28  5:42 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Tue, Nov 25, 2008 at 12:21:28PM -0500, Eric Paris wrote:

> +void fsnotify_mark_get(struct fsnotify_mark_entry *entry)
> +{
> +	spin_lock(&entry->lock);
> +	entry->refcnt++;
> +	spin_unlock(&entry->lock);
> +}

> +void fsnotify_mark_put(struct fsnotify_mark_entry *entry)
> +{
> +	spin_lock(&entry->lock);
> +	entry->refcnt--;
> +	/* if (!refcnt && killme) we are off both lists and nothing else can find us. */
> +	if ((!entry->refcnt) && (entry->killme)) {
> +		spin_unlock(&entry->lock);
> +		fsnotify_mark_kill(entry);
> +		return;
> +	}
> +	spin_unlock(&entry->lock);
> +}

Uh-huh...  And what happens if fsnotify_mark_get() comes in the middle
of final fsnotify_mark_put()?  You spin on entry->lock, gain it just before
fsnotify_mark_kill() which proceeds to kfree entry under you just as you
increment its refcnt...

> +void fsnotify_clear_mark_group(struct fsnotify_group *group)
> +{
> +	struct fsnotify_mark_entry *entry;
> +	struct inode *inode;
> +
> +	mutex_lock(&group->mark_mutex);
> +	while (!list_empty(&group->mark_entries)) {
> +		entry = list_first_entry(&group->mark_entries, struct fsnotify_mark_entry, g_list);
> +
> +		/* make sure the entry survives until it is off both lists */
> +		fsnotify_mark_get(entry);
> +
> +		/* remove from g_list */
> +		list_del_init(&entry->g_list);
> +		mutex_unlock(&group->mark_mutex);
> +
> +		inode = entry->inode;
> +
> +		spin_lock(&entry->lock);
> +		entry->killme = 1;
> +		spin_unlock(&entry->lock);
> 
> +		/* remove from i_list */
> +		spin_lock(&inode->i_fsnotify_lock);

... and just what would keep the inode from being freed under you here?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify
  2008-11-25 17:21 ` [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify Eric Paris
  2008-11-28  5:14   ` Al Viro
@ 2008-11-28  6:25   ` Al Viro
  2008-11-28 23:44     ` Eric Paris
  1 sibling, 1 reply; 31+ messages in thread
From: Al Viro @ 2008-11-28  6:25 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Tue, Nov 25, 2008 at 12:21:33PM -0500, Eric Paris wrote:
> -	inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
> -	dn->dn_next = inode->i_dnotify;
> -	inode->i_dnotify = dn;
> -	spin_unlock(&inode->i_lock);
> -
> -	if (filp->f_op && filp->f_op->dir_notify)
> -		return filp->f_op->dir_notify(filp, arg);
> +	dn->dn_next = entry->private;
> +	entry->private = dn;
> +	dnotify_recalc_inode_mask(entry);
> +	spin_unlock(&inode->i_fsnotify_lock);
> +	fsnotify_mark_put(entry);
> +	fsnotify_put_group(dnotify_group);

Now, that is interesting - you've just taken out the fscked-in-head
->dir_notify().  The action is quite laudable, but it deserves being
announced properly:

* Remove the hopelessly misguided ->dir_notify().  The only instance (cifs)
has been broken by design from the very beginning; the objects it creates
are never destroyed, keep references to struct file they can outlive, nothing
that could possibly evict them exists on close(2) path *and* no locking
whatsoever is done to prevent races with close(), should the previous, er,
deficiencies someday be dealt with.

While we are at it, removing the only call of that method is obviously
only a half of the job...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 3/8] fsnotify: sys_execve and sys_uselib do not call into fsnotify
  2008-11-25 17:21 ` [PATCH -v3 3/8] fsnotify: sys_execve and sys_uselib do not call into fsnotify Eric Paris
@ 2008-11-28 10:16   ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2008-11-28 10:16 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch,
	a.p.zijlstra

On Tue, Nov 25, 2008 at 12:21:07PM -0500, Eric Paris wrote:
> sys_execve and sys_uselib do not call into fsnotify so inotify, dnotify,
> and importantly to me fanotify do not see opens on things which are going
> to be exectued.  Create a generic fsnotify hook for these paths.

Can you please send these exec fixes in a separate series not depending
on any of the other stuff?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-27 16:20   ` Peter Zijlstra
@ 2008-11-28 23:22     ` Eric Paris
  2008-11-28 23:39       ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Paris @ 2008-11-28 23:22 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Thu, 2008-11-27 at 17:20 +0100, Peter Zijlstra wrote:
> On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> > +int fsnotify_check_notif_queue(struct fsnotify_group *group)
> > +{
> > +       mutex_lock(&group->notification_mutex);
> > +       if (!list_empty(&group->notification_list))
> > +               return 1;
> > +       mutex_unlock(&group->notification_mutex);
> > +       return 0;
> > +}
> 
> > +void fsnotify_clear_notif(struct fsnotify_group *group)
> > +{
> > +       struct fsnotify_event *event;
> > +
> > +       while (fsnotify_check_notif_queue(group)) {
> > +               event = get_event_from_notif(group);
> > +               fsnotify_put_event(event);
> > +               /* fsnotify_check_notif_queue() took this lock */
> > +               mutex_unlock(&group->notification_mutex);
> > +       }
> > +}
> 
> That is quite horrible, please just open code that to keep the locking
> symmetric.

While horrible, I use fsnotify_check_notif_queue() in my fanotify code
(not in this series, since this only includes dnotify), which has

wait_event_interruptible(group->notification_waitq, fanotify_check_notif_queue(group));

So I wouldn't know how to open code that...  I can open code this
instance, but it's going to mean redoing all of that other code to
handle the entry not being present when we return.  Since I didn't
submit that as well I guess I'm not allowed to use it as a reason...


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-28  4:54   ` Al Viro
@ 2008-11-28 23:32     ` Eric Paris
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-28 23:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Fri, 2008-11-28 at 04:54 +0000, Al Viro wrote:
> On Tue, Nov 25, 2008 at 12:21:18PM -0500, Eric Paris wrote:
> 
> What the hell is ->notification_list and what in this patchset would
> add stuff to it?  Even more interesting question: how long would these
> guys remain there and what's to prevent a race with umount?  At least
> 'inode-only' events will pin down the inode and leaving the matching
> iput() until after umount() is a Bad Thing(tm)...

It's not in this set, my failure.  But I'm glad you noticed it since you
can help me get it right before I send the fanotify stuff....

If you look at the "fsnotify()" function in

http://marc.info/?l=linux-kernel&m=122650641702090&w=2

you will see users.  I've since moved fanotify_add_event_to_notif() to
be a per group function.  dnotify doesn't make use of the
notification_list.  fanotify will.  I can remove that for this patch set
(but removing everything that isn't in preparation for fanotify leaves
us with little new and useful).

Anyway at the fsnotify_BLAH my intention is to only put events which
include a struct path (for which I've taken a path_get()).  When the
event is later pulled off of the queue I call dentry_open.  I assume
that a normally opened fd, if the open returns, is always safe vs umount.  Since
I've taken a ref to the path I assume it's safe to use in an open call.

In my previous patch set these entries with struct path can survive
forever if userspace fanotify listeners suck.  I saw it as a future
improvement to drop notification events on a timer if needed...

-Eric


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify
  2008-11-28  5:14   ` Al Viro
@ 2008-11-28 23:37     ` Eric Paris
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-28 23:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Fri, 2008-11-28 at 05:14 +0000, Al Viro wrote:
> On Tue, Nov 25, 2008 at 12:21:33PM -0500, Eric Paris wrote:
> 
> > +	.mark_clear_inode = clear_mark_dir_notify,
> 
> ... called under a spinlock
> 
> > +static void clear_mark_dir_notify(struct fsnotify_mark_entry *entry, struct inode *inode, unsigned long mask __attribute__ ((unused)), unsigned int flags)
> > +{
> ...
> > +	fsnotify_put_group(dnotify_group);
> 
> ... which grabs a mutex.

You're right, I should drop and retake the spinlock.  But in reality I
shouldn't ever get here and plan to replace all of this code with a
BUG() rather than the WARN() I have today since I know I can safely
recover.

> Incidentally, why the hell do you bother with refcounting on groups here?
> dnotify is not something that's going to be unloaded, for fsck sake...

Well, I do unregister dnotify if you stop watching any files.  I also
plan to implement inotify with one group per inotify_init().  And fsnotify
groups exist only as long as there is a fsnotify socket bound....

-Eric


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 5/8] fsnotify: unified filesystem notification backend
  2008-11-28 23:22     ` Eric Paris
@ 2008-11-28 23:39       ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2008-11-28 23:39 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch,
	Ingo Molnar

On Fri, 2008-11-28 at 18:22 -0500, Eric Paris wrote:
> On Thu, 2008-11-27 at 17:20 +0100, Peter Zijlstra wrote:
> > On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> > > +int fsnotify_check_notif_queue(struct fsnotify_group *group)
> > > +{
> > > +       mutex_lock(&group->notification_mutex);
> > > +       if (!list_empty(&group->notification_list))
> > > +               return 1;
> > > +       mutex_unlock(&group->notification_mutex);
> > > +       return 0;
> > > +}
> > 
> > > +void fsnotify_clear_notif(struct fsnotify_group *group)
> > > +{
> > > +       struct fsnotify_event *event;
> > > +
> > > +       while (fsnotify_check_notif_queue(group)) {
> > > +               event = get_event_from_notif(group);
> > > +               fsnotify_put_event(event);
> > > +               /* fsnotify_check_notif_queue() took this lock */
> > > +               mutex_unlock(&group->notification_mutex);
> > > +       }
> > > +}
> > 
> > That is quite horrible, please just open code that to keep the locking
> > symmetric.
> 
> While horrible, I use fsnotify_check_notif_queue in my fsnotify (not in
> this series as this only includes dnotify) has
> 
> wait_event_interruptible(group->notification_waitq, fanotify_check_notif_queue(group));
> 
> So I wouldn't know how to open code that...  I can open code this
> instance, but it's going to mean redoing all of that other code to
> handle having thing not be present when we return.  Since I didn't
> submit that as well I guess I'm not allowed to use it as a reason...

Or you add a lock parameter to wait_event*() which gets unlocked before
schedule and locks again afterwards.

That would allow you to write it like so:

 mutex_lock(&group->notification_mutex);
 wait_event_interruptible_lock(group->notification_waitq,
                               !list_empty(&group_notificatioin_list), 
                               &group_notification_mutex);

 /* handle the !empty list */
 mutex_unlock(&group->notification_mutex);

You could use the type matching magic we have to select between
spinlock/mutex operations for the lock argument.

I've come across such a pattern a few times, most of the times we end up
open coding the wait_event stuff.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings
  2008-11-28  5:42   ` Al Viro
@ 2008-11-28 23:43     ` Eric Paris
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-28 23:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Fri, 2008-11-28 at 05:42 +0000, Al Viro wrote:
> On Tue, Nov 25, 2008 at 12:21:28PM -0500, Eric Paris wrote:
> 
> > +void fsnotify_mark_get(struct fsnotify_mark_entry *entry)
> > +{
> > +	spin_lock(&entry->lock);
> > +	entry->refcnt++;
> > +	spin_unlock(&entry->lock);
> > +}
> 
> > +void fsnotify_mark_put(struct fsnotify_mark_entry *entry)
> > +{
> > +	spin_lock(&entry->lock);
> > +	entry->refcnt--;
> > +	/* if (!refcnt && killme) we are off both lists and nothing else can find us. */
> > +	if ((!entry->refcnt) && (entry->killme)) {
> > +		spin_unlock(&entry->lock);
> > +		fsnotify_mark_kill(entry);
> > +		return;
> > +	}
> > +	spin_unlock(&entry->lock);
> > +}
> 
> Uh-huh...  And what happens if fsnotify_mark_get() comes in the middle
> of final fsnotify_mark_put()?  You spin on entry->lock, gain it just before
> fsnotify_mark_kill() which proceeds to kfree entry under you just as you
> increment its refcnt...

fsnotify_mark_get() can only find this object through either the
entry->i_list or entry->g_list.  When we drop our ref to 0 and hold the
spinlock we know that no other task would have been able to find us on
those lists (everything that searches the i_list holds the
i_fsnotify_lock and that lock was dropped since we cleared ourselves
from that list and the same is true for the lock on the g_list side)

So if killme is set and the refcnt == 0 we are not on either list and
no other task could find this entry to call fsnotify_mark_get() on.
I'll review it to make sure, but the design is that we are safe since
nothing else can find us to increment the refcnt.
> 
> > +void fsnotify_clear_mark_group(struct fsnotify_group *group)
> > +{
> > +	struct fsnotify_mark_entry *entry;
> > +	struct inode *inode;
> > +
> > +	mutex_lock(&group->mark_mutex);
> > +	while (!list_empty(&group->mark_entries)) {
> > +		entry = list_first_entry(&group->mark_entries, struct fsnotify_mark_entry, g_list);
> > +
> > +		/* make sure the entry survives until it is off both lists */
> > +		fsnotify_mark_get(entry);
> > +
> > +		/* remove from g_list */
> > +		list_del_init(&entry->g_list);
> > +		mutex_unlock(&group->mark_mutex);
> > +
> > +		inode = entry->inode;
> > +
> > +		spin_lock(&entry->lock);
> > +		entry->killme = 1;
> > +		spin_unlock(&entry->lock);
> > 
> > +		/* remove from i_list */
> > +		spin_lock(&inode->i_fsnotify_lock);
> 
> ... and just what would keep the inode from being freed under you here?

I'll review.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify
  2008-11-28  6:25   ` Al Viro
@ 2008-11-28 23:44     ` Eric Paris
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Paris @ 2008-11-28 23:44 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, malware-list, akpm, alan, arjan, hch, a.p.zijlstra

On Fri, 2008-11-28 at 06:25 +0000, Al Viro wrote:
> On Tue, Nov 25, 2008 at 12:21:33PM -0500, Eric Paris wrote:
> > -	inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
> > -	dn->dn_next = inode->i_dnotify;
> > -	inode->i_dnotify = dn;
> > -	spin_unlock(&inode->i_lock);
> > -
> > -	if (filp->f_op && filp->f_op->dir_notify)
> > -		return filp->f_op->dir_notify(filp, arg);
> > +	dn->dn_next = entry->private;
> > +	entry->private = dn;
> > +	dnotify_recalc_inode_mask(entry);
> > +	spin_unlock(&inode->i_fsnotify_lock);
> > +	fsnotify_mark_put(entry);
> > +	fsnotify_put_group(dnotify_group);
> 
> Now, that is interesting - you've just taken out the fscked-in-head
> ->dir_notify().  The action is quite laudable, but it deserves being
> announced properly:
> 
> * Remove the hopelessly misguided ->dir_notify().  The only instance (cifs)
> has been broken by design from the very beginning; the objects it creates
> are never destroyed, keep references to struct file they can outlive, nothing
> that could possibly evict them exists on close(2) path *and* no locking
> whatsoever is done to prevent races with close(), should the previous, er,
> deficiencies someday be dealt with.
> 
> While we are at it, removing the only call of that method is obviously
> only a half of the job...

crap, I actually meant to move that out to the actual do_fcntl() call to
get it out of my way.  I did see that it is useless since we don't
handle responses in any way and obviously this has nothing to do with
dnotify.....

I'll poke the cifs people to see if they are ok with dropping it
altogether properly....


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 6/8] fsnotify: add group priorities
  2008-11-27 16:25   ` Peter Zijlstra
@ 2008-12-01 15:20     ` Eric Paris
  2008-12-01 15:37       ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Paris @ 2008-12-01 15:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Thu, 2008-11-27 at 17:25 +0100, Peter Zijlstra wrote:
> On Tue, 2008-11-25 at 12:21 -0500, Eric Paris wrote:
> > In preperation for blocking fsnotify calls group priorities must be added.
> > When multiple groups request the same event type the lowest priority group
> > will receive the notification first.
> 
> > @@ -114,9 +117,26 @@ struct fsnotify_group *fsnotify_find_group(unsigned int group_num, unsigned long
> >  
> >  	group->ops = ops;
> >  
> > -	/* add it */
> > -	list_add_rcu(&group->group_list, &fsnotify_groups);
> > +	/* Do we need to be the first entry? */
> > +	if (list_empty(&fsnotify_groups)) {
> > +		list_add_rcu(&group->group_list, &fsnotify_groups);
> > +		goto out;
> > +	}
> > +
> > +	list_for_each_entry(group_iter, &fsnotify_groups, group_list) {
> > +		/* insert in front of this one? */
> > +		if (priority < group_iter->priority) {
> > +			/* I used list_add_tail() to insert in front of group_iter...  */
> > +			list_add_tail_rcu(&group->group_list, &group_iter->group_list);
> > +			break;
> > +		}
> >  
> > +		/* are we at the end?  if so insert at end */
> > +		if (list_is_last(&group_iter->group_list, &fsnotify_groups)) {
> > +			list_add_tail_rcu(&group->group_list, &fsnotify_groups);
> > +			break;
> > +		}
> > +	}
> >  out:
> >  	mutex_unlock(&fsnotify_grp_mutex);
> >  	fsnotify_recalc_global_mask();
> 
> What priority range do you need to cater for, and how many groups? 

On a typical system I'd expect to see one group for dnotify
(rpc.idmapd uses dnotify, so I'd expect most systems to end up with 1
group).

inotify I wouldn't expect more than 3-4 inotify_init() calls

fsnotify I wouldn't imagine more than 3 groups.

So total we are talking about maybe 10 groups on a system really making
use of fs notification?

> I can
> imagine for many groups and limit range a priority list might be better
> suited.

talking about plist.h?  Since I don't allow 2 groups with the same
priority I'd say a lot of the plist code would just be overhead (the
prio list and the node list would be the same)

That's not a big deal, as I don't really care about the add/remove
code paths; they are all notification overhead/setup/teardown.  I would
think cleaner, simpler code is a better goal than performance for these
areas, especially since the speed-critical part of plists
(list_for_each_entry) looks to be exactly the same.

What I don't see is plists being protected by RCU, and looking at
plist_del() it doesn't seem like it would be RCU safe.  RCU-safe plists
might be a good idea, but for now I think I should just do my own
priority listing so I don't have to hold a lock while I walk the group
list (that path is VERY hot).

-Eric


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v3 6/8] fsnotify: add group priorities
  2008-12-01 15:20     ` Eric Paris
@ 2008-12-01 15:37       ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2008-12-01 15:37 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, malware-list, viro, akpm, alan, arjan, hch

On Mon, 2008-12-01 at 10:20 -0500, Eric Paris wrote:
> 
> > I can
> > imagine for many groups and limit range a priority list might be better
> > suited.
> 
> Talking about plist.h?  Since I don't allow two groups with the same
> priority, I'd say a lot of the plist code would just be overhead (the
> prio list and the node list would be the same).
> 
> That's not a big deal, since I don't really care about the add/remove
> code paths; they are all notification setup/teardown overhead.  I would
> think cleaner, simpler code is a better idea than raw performance in
> those areas, especially since the speed-critical part of plists
> (list_for_each_entry) would be exactly the same.
> 
> What I don't see is plists being protected by RCU, and looking at
> plist_del it doesn't seem like it would be RCU safe.  RCU-safe plists
> might be a good idea, but for now I think I should just do my own
> priority listing so I don't have to hold a lock while I walk the group
> list (that path is VERY hot)

plist.h provides a 2-D structure, where you can iterate the priorities in
constant time no matter how many items of any one priority are enqueued.

It's basically a list of lists.

If, as you say, you only have a handful of items, there is no point.
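
The "list of lists" shape can be modeled in a few lines of userspace C —
names invented for illustration (the real plist.h instead embeds two
list_heads in each node), but the structure is the same: one chain of
distinct priorities, each carrying its own chain of same-priority items:

```c
#include <stddef.h>

/* An item queued at some priority. */
struct item {
	int value;
	struct item *next;
};

/* One node per distinct priority; items sharing a priority hang off
 * the same prio_node rather than lengthening the priority chain. */
struct prio_node {
	int prio;
	struct item *items;      /* all items enqueued at this priority */
	struct prio_node *next;  /* next distinct priority */
};

/* Walking prio_node->next visits each priority exactly once,
 * regardless of how many items share it -- the constant-time
 * priority iteration described above. */
static int count_priorities(const struct prio_node *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

With only a handful of groups, and no two sharing a priority, every
sublist holds one item, so this two-level walk degenerates into the flat
sorted list — which is the point that plist buys nothing here.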




end of thread, other threads:[~2008-12-01 15:38 UTC | newest]

Thread overview: 31+ messages
2008-11-25 17:20 [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Eric Paris
2008-11-25 17:20 ` [PATCH -v3 1/8] filesystem notification: create fs/notify to contain all fs notification Eric Paris
2008-11-28  5:24   ` Al Viro
2008-11-25 17:21 ` [PATCH -v3 2/8] fsnotify: pass a file instead of an inode to open, read, and write Eric Paris
2008-11-25 17:21 ` [PATCH -v3 3/8] fsnotify: sys_execve and sys_uselib do not call into fsnotify Eric Paris
2008-11-28 10:16   ` Christoph Hellwig
2008-11-25 17:21 ` [PATCH -v3 4/8] fsnotify: use the new open-exec hook for inotify and dnotify Eric Paris
2008-11-25 17:21 ` [PATCH -v3 5/8] fsnotify: unified filesystem notification backend Eric Paris
2008-11-27 16:14   ` Peter Zijlstra
2008-11-27 16:17   ` Peter Zijlstra
2008-11-27 16:20   ` Peter Zijlstra
2008-11-28 23:22     ` Eric Paris
2008-11-28 23:39       ` Peter Zijlstra
2008-11-27 16:21   ` Peter Zijlstra
2008-11-28  4:54   ` Al Viro
2008-11-28 23:32     ` Eric Paris
2008-11-25 17:21 ` [PATCH -v3 6/8] fsnotify: add group priorities Eric Paris
2008-11-27 16:25   ` Peter Zijlstra
2008-12-01 15:20     ` Eric Paris
2008-12-01 15:37       ` Peter Zijlstra
2008-11-25 17:21 ` [PATCH -v3 7/8] fsnotify: add in inode fsnotify markings Eric Paris
2008-11-27 16:29   ` Peter Zijlstra
2008-11-28  5:42   ` Al Viro
2008-11-28 23:43     ` Eric Paris
2008-11-25 17:21 ` [PATCH -v3 8/8] dnotify: reimplement dnotify using fsnotify Eric Paris
2008-11-28  5:14   ` Al Viro
2008-11-28 23:37     ` Eric Paris
2008-11-28  6:25   ` Al Viro
2008-11-28 23:44     ` Eric Paris
2008-11-26  0:14 ` [PATCH -v3 0/8] file notification: fsnotify a unified file notification backend Andrew Morton
2008-11-26  2:00   ` Eric Paris
