All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Al Viro <viro@zeniv.linux.org.uk>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Maciej Żenczykowski" <maze@google.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] fput: don't abuse task_work_add() when possible
Date: Tue, 8 Sep 2015 19:14:46 +0200	[thread overview]
Message-ID: <20150908171446.GA14589@redhat.com> (raw)
In-Reply-To: <20150908171433.GA14573@redhat.com>

The commit c82199061009 ("task_work: remove fifo ordering guarantee")
removed the "Reverse the list" loop because the ->task_works list can
be huge if (say) a process with 2,000,000 files is killed.

However, imo this doesn't really fix the problem: we do not want this
list to be huge. For example, suppose that keyctl_session_to_parent()
races with the exiting parent which has a lot of opened files. In this
case task_work_cancel() will spend the same time walking the list but
with irqs disabled and tasklist_lock/pi_lock held. Yes, this is very
unlikely, but still this does not look good imho. Plus the out-of-tree
modules like systemtap can (more likely) hit this problem too.

And I don't think that "remove fifo ordering" is the right thing, see
the next change.

With this patch fput(file) checks the last queued work, if it is also
the ____fput() callback, it just adds this "file" to the list processed
by ____fput(). This adds the new ->f_next_put member into "struct file",
but hopefully it can share the memory with another member, see the next
patch. This way the exiting task will likely do task_work_add(____fput)
only once, so ->task_works shouldn't grow too much and we can probably
even remove cond_resched() in task_work_run().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/file_table.c    | 37 ++++++++++++++++++++++++++++++-------
 include/linux/fs.h |  1 +
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05..8b91ef9 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -41,9 +41,12 @@ static struct kmem_cache *filp_cachep __read_mostly;
 
 static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
+#define rcuhead_to_file(head) \
+	container_of(head, struct file, f_u.fu_rcuhead)
+
 static void file_free_rcu(struct rcu_head *head)
 {
-	struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
+	struct file *f = rcuhead_to_file(head);
 
 	put_cred(f->f_cred);
 	kmem_cache_free(filp_cachep, f);
@@ -239,11 +242,6 @@ static void delayed_fput(struct work_struct *unused)
 	}
 }
 
-static void ____fput(struct callback_head *work)
-{
-	__fput(container_of(work, struct file, f_u.fu_rcuhead));
-}
-
 /*
  * If kernel thread really needs to have the final fput() it has done
  * to complete, call this.  The only user right now is the boot - we
@@ -261,15 +259,40 @@ void flush_delayed_fput(void)
 
 static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
+static void ____fput(struct callback_head *work)
+{
+	struct file *file = rcuhead_to_file(work);
+
+	do {
+		struct file *next = READ_ONCE(file->f_next_put);
+		__fput(file);
+		cond_resched();
+		file = next;
+
+	} while (file);
+}
+
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
 		struct task_struct *task = current;
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+			struct callback_head *work = READ_ONCE(task->task_works);
+
+			/* avoid task_work_add() below if it is aready pending */
+			if (work && work->func == ____fput) {
+				struct file *prev = rcuhead_to_file(work);
+				file->f_next_put = prev->f_next_put;
+				prev->f_next_put = file;
+				return;
+			}
+
 			init_task_work(&file->f_u.fu_rcuhead, ____fput);
-			if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
+			if (!task_work_add(task, &file->f_u.fu_rcuhead, true)) {
+				file->f_next_put = NULL;
 				return;
+			}
 			/*
 			 * After this task has run exit_task_work(),
 			 * task_work_add() will fail.  Fall through to delayed
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8514e65..3941b86 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -861,6 +861,7 @@ struct file {
 	struct file_ra_state	f_ra;
 
 	u64			f_version;
+	struct file		*f_next_put;
 #ifdef CONFIG_SECURITY
 	void			*f_security;
 #endif
-- 
2.4.3


  reply	other threads:[~2015-09-08 17:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-08 17:14 [PATCH 0/3] task_work: restore fifo ordering guarantee Oleg Nesterov
2015-09-08 17:14 ` Oleg Nesterov [this message]
2015-09-08 17:14 ` [PATCH 2/3] fput: move ->f_next_put into a union with ->f_version Oleg Nesterov
2015-09-08 17:14 ` [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee" Oleg Nesterov
2015-09-08 17:39   ` Linus Torvalds
2015-09-08 17:41     ` Linus Torvalds
2015-09-09 13:16     ` Oleg Nesterov
2015-09-09 16:17       ` Linus Torvalds
2015-09-09 16:43         ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150908171446.GA14589@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maze@google.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.