netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>, Al Viro <viro@ftp.linux.org.uk>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC, PATCH] dont insert sockets/pipes dentries into dentry_hashtable.
Date: Wed, 01 Nov 2006 14:54:56 +0100	[thread overview]
Message-ID: <4548A730.8060507@cosmosbay.com> (raw)
In-Reply-To: <20061031.231954.23010447.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 2052 bytes --]

David Miller a écrit :
> It turns out that while procfs uses a different "mechanism", those
> procfs symlinks do point to the real socket dentry, so when you
> readlink() on it you do d_path() on the real socket dentry.
> 
> Al Viro just suggested a way around this to me:
> 
> 1) Just mark the dentry HASHED by hand in the dentry flags, but don't
>    actually hash it.
> 
> 2) Create a special dentry->d_deleted method for sockets that returns
>    0 and clears by hand the HASHED flag bit in the dentry (see what
>    dput() does when this happens).
> 
> It's an abuse but it will work.
> 


Thank you David and Al for the feedback.

Here is a new version of the patch with your suggestions included.

If necessary, I could split this patch in 3 elementary patches. I chose to
sent it as one patch for ease of discussion.

[RFC, PATCH] dont insert sockets/pipes dentries into dentry_hashtable.

We currently insert sockets/pipes dentries into the global dentry hashtable.
This is *useless* because there is currently no way these entries can be used
for a lookup(). (/proc/xxx/fd/xxx uses a different mechanism)

Machines with a lot of sockets/pipes might suffer from longer chains in dentry
hashtable.

Since dentries an unhashed dentry means __dpath() adds a " (deleted)", the 
trick for socket/pipe dentries is to :

- Right after d_alloc(), pretend they are hashed by clearing the 
DCACHE_UNHASHED bit. __dpath() & friends work as intended.

- Call d_instantiate() instead of d_add() : dentry is not inserted in hash table.

- Once dput() must clear the dentry, setting again DCACHE_UNHASHED bit inside 
the custom d_delete() function provided by socket/pipe code.


[patch 1/3] Small optimization to bypass RCU freeing in d_free() for dentries 
that were never hashed (like sockets and pipes). Such dentries dont have to 
wait a RCU grace period.

[patch 2/3] Change socket code to use d_instantiate() instead of d_add()

[patch 3/3] Change pipe code to use d_instantiate() instead of d_add()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

[-- Attachment #2: donthash.patch --]
[-- Type: text/plain, Size: 3861 bytes --]

--- linux-2.6.19-rc4/fs/dcache.c	2006-11-01 12:38:23.000000000 +0100
+++ linux-2.6.19-rc4-ed/fs/dcache.c	2006-11-01 13:22:44.000000000 +0100
@@ -68,15 +68,19 @@
 	.age_limit = 45,
 };
 
-static void d_callback(struct rcu_head *head)
+static void __d_free(struct dentry *dentry)
 {
-	struct dentry * dentry = container_of(head, struct dentry, d_u.d_rcu);
-
 	if (dname_external(dentry))
 		kfree(dentry->d_name.name);
 	kmem_cache_free(dentry_cache, dentry); 
 }
 
+static void d_callback(struct rcu_head *head)
+{
+	struct dentry * dentry = container_of(head, struct dentry, d_u.d_rcu);
+	__d_free(dentry);
+}
+
 /*
  * no dcache_lock, please.  The caller must decrement dentry_stat.nr_dentry
  * inside dcache_lock.
@@ -85,7 +89,11 @@
 {
 	if (dentry->d_op && dentry->d_op->d_release)
 		dentry->d_op->d_release(dentry);
- 	call_rcu(&dentry->d_u.d_rcu, d_callback);
+	/* if dentry was never inserted into hash, immediate free is OK */
+	if (dentry->d_hash.pprev == NULL)
+		__d_free(dentry);
+	else
+		call_rcu(&dentry->d_u.d_rcu, d_callback);
 }
 
 /*
--- linux-2.6.19-rc4/net/socket.c	2006-11-01 12:40:27.000000000 +0100
+++ linux-2.6.19-rc4-ed/net/socket.c	2006-11-01 14:33:18.000000000 +0100
@@ -306,7 +306,14 @@
 
 static int sockfs_delete_dentry(struct dentry *dentry)
 {
-	return 1;
+	/*
+	 * At creation time, we pretended this dentry was hashed
+	 * (by clearing DCACHE_UNHASHED bit in d_flags)
+	 * At delete time, we restore the truth : not hashed.
+	 * (so that dput() can proceed correctly)
+	 */
+	dentry->d_flags |= DCACHE_UNHASHED;
+	return 0;
 }
 static struct dentry_operations sockfs_dentry_operations = {
 	.d_delete = sockfs_delete_dentry,
@@ -354,14 +361,20 @@
 
 	this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino);
 	this.name = name;
-	this.hash = SOCK_INODE(sock)->i_ino;
+	this.hash = 0;
 
 	file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this);
 	if (unlikely(!file->f_dentry))
 		return -ENOMEM;
 
 	file->f_dentry->d_op = &sockfs_dentry_operations;
-	d_add(file->f_dentry, SOCK_INODE(sock));
+	/*
+	 * We dont want to push this dentry into global dentry hash table. 
+	 * We pretend dentry is already hashed, by unsetting DCACHE_UNHASHED
+	 * This hack permits a working /proc/$pid/fd/XXX on sockets
+	 */
+	file->f_dentry->d_flags &= ~DCACHE_UNHASHED;
+	d_instantiate(file->f_dentry, SOCK_INODE(sock));
 	file->f_vfsmnt = mntget(sock_mnt);
 	file->f_mapping = file->f_dentry->d_inode->i_mapping;
 
--- linux-2.6.19-rc4/fs/pipe.c	2006-11-01 12:56:05.000000000 +0100
+++ linux-2.6.19-rc4-ed/fs/pipe.c	2006-11-01 14:33:18.000000000 +0100
@@ -830,7 +830,14 @@
 static struct vfsmount *pipe_mnt __read_mostly;
 static int pipefs_delete_dentry(struct dentry *dentry)
 {
-	return 1;
+	/*
+	 * At creation time, we pretended this dentry was hashed
+	 * (by clearing DCACHE_UNHASHED bit in d_flags)
+	 * At delete time, we restore the truth : not hashed.
+	 * (so that dput() can proceed correctly)
+	 */
+	dentry->d_flags |= DCACHE_UNHASHED;
+	return 0;
 }
 
 static struct dentry_operations pipefs_dentry_operations = {
@@ -891,17 +898,22 @@
 	if (!inode)
 		goto err_file;
 
-	sprintf(name, "[%lu]", inode->i_ino);
+	this.len = sprintf(name, "[%lu]", inode->i_ino);
 	this.name = name;
-	this.len = strlen(name);
-	this.hash = inode->i_ino; /* will go */
+	this.hash = 0;
 	err = -ENOMEM;
 	dentry = d_alloc(pipe_mnt->mnt_sb->s_root, &this);
 	if (!dentry)
 		goto err_inode;
 
 	dentry->d_op = &pipefs_dentry_operations;
-	d_add(dentry, inode);
+	/*
+	 * We dont want to push this dentry into global dentry hash table. 
+	 * We pretend dentry is already hashed, by unsetting DCACHE_UNHASHED
+	 * This hack permits a working /proc/$pid/fd/XXX on pipes
+	 */
+	dentry->d_flags &= ~DCACHE_UNHASHED;
+	d_instantiate(dentry, inode);
 	f->f_vfsmnt = mntget(pipe_mnt);
 	f->f_dentry = dentry;
 	f->f_mapping = inode->i_mapping;

  parent reply	other threads:[~2006-11-01 13:55 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-24 14:17 [PATCH] fix integer overflow in H-TCP congestion control Gavin McCullagh
2006-10-24 22:30 ` David Miller
2006-10-25  8:47   ` Gavin McCullagh
2006-10-26  6:06     ` David Miller
2006-10-31 18:48       ` [RFC, PATCH] dont insert sockets/pipes dentries into dentry_hashtable Eric Dumazet
2006-11-01  7:19         ` David Miller
2006-11-01  8:21           ` Eric Dumazet
2006-11-01  8:34             ` David Miller
2006-11-01  8:38             ` Al Viro
2006-11-01  8:42               ` Al Viro
2006-11-01  9:04               ` Eric Dumazet
2006-11-01 13:54           ` Eric Dumazet [this message]
2006-11-22 18:00         ` [PATCH] [NET] dont insert socket " Eric Dumazet
2006-11-28 23:35           ` David Miller
2006-11-29  0:13             ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4548A730.8060507@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=viro@ftp.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).