linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mateusz Guzik <mguzik@redhat.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Yann Droneaud <ydroneaud@opteya.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install
Date: Thu, 16 Apr 2015 14:16:31 +0200	[thread overview]
Message-ID: <20150416121628.GA20615@mguzik> (raw)

Hi,

Currently obtaining a new file descriptor results in locking fdtable
twice - once in order to reserve a slot and second time to fill it.

Hack below gets rid of the second lock usage.

It gives me a ~30% speedup (~300k ops -> ~400k ops) in a microbenchmark
where 16 threads create a pipe (2 fds) and call 2 * close.

Results are fluctuating and even with the patch sometimes drop to around
~300k ops. However, without the patch they never get higher.

I can come up with a better benchmark if that's necessary.

Comments?

==============================================

Subject: [PATCH] fs: use a sequence counter instead of file_lock in fd_install

Because the lock is not held, it is possible that fdtable will be
reallocated as we fill it.

RCU is used to guarantee the old table is not freed just in case we
happen to write to it (which is harmless).

sequence counter is used to ensure we updated the right table.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
---
 fs/file.c               | 24 +++++++++++++++++++-----
 include/linux/fdtable.h |  5 +++++
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 93c5f89..bd1ef4c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
 	cur_fdt = files_fdtable(files);
 	if (nr >= cur_fdt->max_fds) {
 		/* Continue as planned */
+		write_seqcount_begin(&files->fdt_seqcount);
 		copy_fdtable(new_fdt, cur_fdt);
 		rcu_assign_pointer(files->fdt, new_fdt);
+		write_seqcount_end(&files->fdt_seqcount);
 		if (cur_fdt != &files->fdtab)
 			call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
 	} else {
@@ -256,6 +258,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	atomic_set(&newf->count, 1);
 
 	spin_lock_init(&newf->file_lock);
+	seqcount_init(&newf->fdt_seqcount);
 	newf->next_fd = 0;
 	new_fdt = &newf->fdtab;
 	new_fdt->max_fds = NR_OPEN_DEFAULT;
@@ -429,6 +432,7 @@ void exit_files(struct task_struct *tsk)
 
 struct files_struct init_files = {
 	.count		= ATOMIC_INIT(1),
+	.fdt_seqcount	= SEQCNT_ZERO(fdt_seqcount),
 	.fdt		= &init_files.fdtab,
 	.fdtab		= {
 		.max_fds	= NR_OPEN_DEFAULT,
@@ -552,12 +556,22 @@ EXPORT_SYMBOL(put_unused_fd);
 void __fd_install(struct files_struct *files, unsigned int fd,
 		struct file *file)
 {
+	unsigned long seq;
 	struct fdtable *fdt;
-	spin_lock(&files->file_lock);
-	fdt = files_fdtable(files);
-	BUG_ON(fdt->fd[fd] != NULL);
-	rcu_assign_pointer(fdt->fd[fd], file);
-	spin_unlock(&files->file_lock);
+
+	rcu_read_lock();
+	do {
+		seq = read_seqcount_begin(&files->fdt_seqcount);
+		fdt = files_fdtable_seq(files);
+		/*
+		 * Entry in the table can already be equal to file if we
+		 * had to restart and copy_fdtable picked up our update.
+		 */
+		BUG_ON(!(fdt->fd[fd] == NULL || fdt->fd[fd] == file));
+		rcu_assign_pointer(fdt->fd[fd], file);
+		smp_mb();
+	} while (__read_seqcount_retry(&files->fdt_seqcount, seq));
+	rcu_read_unlock();
 }
 
 void fd_install(unsigned int fd, struct file *file)
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 230f87b..9e41765 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/fs.h>
+#include <linux/seqlock.h>
 
 #include <linux/atomic.h>
 
@@ -47,6 +48,7 @@ struct files_struct {
    * read mostly part
    */
 	atomic_t count;
+	seqcount_t fdt_seqcount;
 	struct fdtable __rcu *fdt;
 	struct fdtable fdtab;
   /*
@@ -69,6 +71,9 @@ struct dentry;
 #define files_fdtable(files) \
 	rcu_dereference_check_fdtable((files), (files)->fdt)
 
+#define files_fdtable_seq(files) \
+	rcu_dereference((files)->fdt)
+
 /*
  * The caller must ensure that fd table isn't shared or hold rcu or file lock
  */
-- 
1.8.3.1

             reply	other threads:[~2015-04-16 12:16 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-16 12:16 Mateusz Guzik [this message]
2015-04-16 17:47 ` [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install Eric Dumazet
2015-04-16 18:09 ` Al Viro
2015-04-16 20:42   ` Eric Dumazet
2015-04-16 20:55     ` Eric Dumazet
2015-04-16 22:00       ` Mateusz Guzik
2015-04-16 22:52         ` Eric Dumazet
2015-04-16 22:35   ` Mateusz Guzik
2015-04-17 21:46 ` Eric Dumazet
2015-04-17 22:16   ` Mateusz Guzik
2015-04-17 23:02     ` Al Viro
2015-04-18 19:41       ` Eric Dumazet
2015-04-20 13:41         ` Mateusz Guzik
2015-04-20 16:46           ` Eric Dumazet
2015-04-20 16:48             ` Eric Dumazet
2015-04-20 13:06       ` Mateusz Guzik
2015-04-20 13:43         ` Mateusz Guzik
2015-04-20 15:10           ` Mateusz Guzik
2015-04-20 17:15             ` Eric Dumazet
2015-04-20 20:49               ` Eric Dumazet
2015-04-21 18:05                 ` Eric Dumazet
2015-04-21 20:06                   ` Mateusz Guzik
2015-04-21 20:12                     ` Mateusz Guzik
2015-04-21 21:06                       ` Eric Dumazet
2015-04-22  4:59                         ` [PATCH] fs/file.c: don't acquire files->file_lock in fd_install() Eric Dumazet
2015-04-27 19:05                           ` Mateusz Guzik
2015-04-28 16:20                             ` Eric Dumazet
2015-04-29  4:25                           ` [PATCH v2] " Eric Dumazet
2015-06-22  2:32                             ` Al Viro
2015-06-23  5:31                               ` Eric Dumazet
2015-06-30 13:54                             ` [PATCH v3] " Eric Dumazet
2015-04-22 13:31                         ` [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install Mateusz Guzik
2015-04-22 13:55                           ` Eric Dumazet
2015-04-21 20:57                     ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150416121628.GA20615@mguzik \
    --to=mguzik@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=ydroneaud@opteya.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).