From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: John McCutchan <john@johnmccutchan.com>,
holt@sgi.com, linux-kernel@vger.kernel.org, rml@novell.com,
arnd@arndb.de, hch@lst.de, Dipankar Sarma <dipankar@in.ibm.com>
Subject: Re: udevd is killing file write performance.
Date: Fri, 24 Feb 2006 18:07:43 +1100 [thread overview]
Message-ID: <43FEB0BF.6080403@yahoo.com.au> (raw)
In-Reply-To: <20060223220053.2f7a977e.akpm@osdl.org>
[-- Attachment #1: Type: text/plain, Size: 1813 bytes --]
Andrew Morton wrote:
> John McCutchan <john@johnmccutchan.com> wrote:
>
>> > > @@ -538,7 +537,7 @@
>> > > struct dentry *parent;
>> > > struct inode *inode;
>> > >
>> > > - if (!atomic_read (&inotify_watches))
>> > > + if (!atomic_read (&dentry->d_sb->s_inotify_watches))
>> > > return;
>> > >
>> >
>> > What happens here if we're watching a mountpoint - the parent is on a
>> > different fs?
>>
>> There are four cases to consider here.
>>
>> Case 1: parent fs watched and child fs watched
>> correct results
>> Case 2: parent fs watched and child fs not watched
>> We may not deliver an event that should be delivered.
>> Case 3: parent fs not watched and child fs watched
>> We take d_lock when we don't need to
>> Case 4: parent fs not watched and child fs not watched
>> correct results
>>
>> Case 2 screws us. We have to take the lock to even look at the parent's
>> dentry->d_sb->s_inotify_watches. I don't know of a way around this one.
>
>
> Yeah. There are a lot of "screw"s in this thread.
>
> I wonder if RCU can save us - if we do an rcu_read_lock() we at least know
> that the dentries won't get deallocated. Then we can take a look at
> d_parent (which might not be the parent any more). Once in a million years
> we might send a false event or miss sending an event, depending on where
> our dentry suddenly got moved to. Not very nice, but at least it won't
> oops.
>
> (hopefully cc's Dipankar)
I saw this problem when testing my lockless pagecache a while back.
Attached is a first implementation of what was my idea then of how
to solve it... note it is pretty rough and I never got around to doing
much testing of it.
Basically: moves work out of inotify event time and to inotify attach
/detach time while staying out of the core VFS.
--
SUSE Labs, Novell Inc.
[-- Attachment #2: inotify-dentry-flag.patch --]
[-- Type: text/plain, Size: 5880 bytes --]
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -799,6 +799,7 @@ void d_instantiate(struct dentry *entry,
if (inode)
list_add(&entry->d_alias, &inode->i_dentry);
entry->d_inode = inode;
+ fsnotify_d_instantiate(entry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
}
@@ -850,6 +851,7 @@ struct dentry *d_instantiate_unique(stru
list_add(&entry->d_alias, &inode->i_dentry);
do_negative:
entry->d_inode = inode;
+ fsnotify_d_instantiate(entry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
return NULL;
@@ -980,6 +982,7 @@ struct dentry *d_splice_alias(struct ino
new = __d_find_alias(inode, 1);
if (new) {
BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
+ fsnotify_d_instantiate(new, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
@@ -989,6 +992,7 @@ struct dentry *d_splice_alias(struct ino
/* d_instantiate takes dcache_lock, so we do it by hand */
list_add(&dentry->d_alias, &inode->i_dentry);
dentry->d_inode = inode;
+ fsnotify_d_instantiate(dentry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(dentry, inode);
d_rehash(dentry);
Index: linux-2.6/fs/inotify.c
===================================================================
--- linux-2.6.orig/fs/inotify.c
+++ linux-2.6/fs/inotify.c
@@ -381,6 +381,41 @@ static int find_inode(const char __user
}
/*
+ * inotify_inode_watched - returns nonzero if there are watches on this inode
+ * and zero otherwise. We call this lockless, we do not care if we race.
+ */
+static inline int inotify_inode_watched(struct inode *inode)
+{
+ return !list_empty(&inode->inotify_watches);
+}
+
+static void set_dentry_child_flags(struct inode *inode, int new_watch)
+{
+ struct dentry *alias;
+
+ if (inotify_inode_watched(inode))
+ return;
+
+ spin_lock(&dcache_lock);
+ list_for_each_entry(alias, &inode->i_dentry, d_alias) {
+ struct dentry *child;
+
+ list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
+ spin_lock(&child->d_lock);
+ if (new_watch) {
+ BUG_ON(child->d_flags & DCACHE_INOTIFY_PARENT_WATCHED);
+ child->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+ } else {
+ BUG_ON(!(child->d_flags & DCACHE_INOTIFY_PARENT_WATCHED));
+ child->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
+ }
+ spin_unlock(&child->d_lock);
+ }
+ }
+ spin_unlock(&dcache_lock);
+}
+
+/*
* create_watch - creates a watch on the given device.
*
* Callers must hold dev->sem. Calls inotify_dev_get_wd() so may sleep.
@@ -406,6 +441,8 @@ static struct inotify_watch *create_watc
return ERR_PTR(ret);
}
+ set_dentry_child_flags(inode, 1);
+
dev->last_wd = watch->wd;
watch->mask = mask;
atomic_set(&watch->count, 0);
@@ -462,6 +499,8 @@ static void remove_watch_no_event(struct
atomic_dec(&inotify_watches);
idr_remove(&dev->idr, watch->wd);
put_inotify_watch(watch);
+
+ set_dentry_child_flags(watch->inode, 0);
}
/*
@@ -481,16 +520,18 @@ static void remove_watch(struct inotify_
remove_watch_no_event(watch, dev);
}
-/*
- * inotify_inode_watched - returns nonzero if there are watches on this inode
- * and zero otherwise. We call this lockless, we do not care if we race.
- */
-static inline int inotify_inode_watched(struct inode *inode)
+/* Kernel API */
+
+void inotify_d_instantiate(struct dentry *entry, struct inode *inode)
{
- return !list_empty(&inode->inotify_watches);
-}
+ struct dentry *parent;
-/* Kernel API */
+ spin_lock(&entry->d_lock);
+ parent = entry->d_parent;
+ if (inotify_inode_watched(parent->d_inode))
+ entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+ spin_unlock(&entry->d_lock);
+}
/**
* inotify_inode_queue_event - queue an event to all watches on this inode
@@ -538,7 +579,7 @@ void inotify_dentry_parent_queue_event(s
struct dentry *parent;
struct inode *inode;
- if (!atomic_read (&inotify_watches))
+ if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
return;
spin_lock(&dentry->d_lock);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -162,6 +162,8 @@ d_iput: no no no yes
#define DCACHE_REFERENCED 0x0008 /* Recently used, don't discard. */
#define DCACHE_UNHASHED 0x0010
+#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020 /* Parent inode is watched */
+
extern spinlock_t dcache_lock;
/**
Index: linux-2.6/include/linux/fsnotify.h
===================================================================
--- linux-2.6.orig/include/linux/fsnotify.h
+++ linux-2.6/include/linux/fsnotify.h
@@ -16,6 +16,12 @@
#include <linux/dnotify.h>
#include <linux/inotify.h>
+static inline void fsnotify_d_instantiate(struct dentry *entry,
+ struct inode *inode)
+{
+ inotify_d_instantiate(entry, inode);
+}
+
/*
* fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
*/
Index: linux-2.6/include/linux/inotify.h
===================================================================
--- linux-2.6.orig/include/linux/inotify.h
+++ linux-2.6/include/linux/inotify.h
@@ -71,6 +71,7 @@ struct inotify_event {
#ifdef CONFIG_INOTIFY
+extern void inotify_d_instantiate(struct dentry *, struct inode *);
extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
const char *);
extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
@@ -81,6 +82,10 @@ extern u32 inotify_get_cookie(void);
#else
+static inline void inotify_d_instantiate(struct dentry *, struct inode *)
+{
+}
+
static inline void inotify_inode_queue_event(struct inode *inode,
__u32 mask, __u32 cookie,
const char *filename)
next prev parent reply other threads:[~2006-02-24 7:07 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-02-22 13:42 udevd is killing file write performance Robin Holt
2006-02-22 13:55 ` Andrew Morton
2006-02-22 16:48 ` John McCutchan
2006-02-22 17:50 ` Robin Holt
2006-02-22 20:05 ` Andrew Morton
2006-02-22 21:50 ` Jeff V. Merkey
2006-02-23 12:56 ` Robin Holt
2006-02-23 13:42 ` David Chinner
2006-02-22 22:52 ` John McCutchan
2006-02-22 23:12 ` Andrew Morton
2006-02-22 23:41 ` John McCutchan
2006-02-24 0:14 ` Andrew Morton
2006-02-24 0:14 ` Andrew Morton
2006-02-24 5:47 ` John McCutchan
2006-02-24 6:00 ` Andrew Morton
2006-02-24 7:07 ` Nick Piggin [this message]
2006-02-24 7:16 ` Andrew Morton
2006-02-24 7:19 ` Nick Piggin
2006-02-26 16:58 ` John McCutchan
2006-02-24 18:56 ` Robin Holt
2006-02-25 2:44 ` Nick Piggin
2006-02-25 15:53 ` [patch] inotify: lock avoidance with parent watch status in dentry Nick Piggin
2006-02-28 0:48 ` Andrew Morton
2006-02-26 16:55 ` udevd is killing file write performance John McCutchan
2006-02-27 10:11 ` Nick Piggin
2006-02-27 20:17 ` John McCutchan
2006-02-23 20:38 ` Benjamin LaHaise
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43FEB0BF.6080403@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=akpm@osdl.org \
--cc=arnd@arndb.de \
--cc=dipankar@in.ibm.com \
--cc=hch@lst.de \
--cc=holt@sgi.com \
--cc=john@johnmccutchan.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rml@novell.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.