From: Oleg Nesterov <oleg@redhat.com>
To: Al Viro <viro@ZenIV.linux.org.uk>,
Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@kernel.org>,
"Maciej Żenczykowski" <maze@google.com>
Subject: [PATCH?] fput: don't abuse task_work_add() too much
Date: Mon, 7 Sep 2015 14:27:09 +0200 [thread overview]
Message-ID: <20150907122709.GA31811@redhat.com> (raw)
In-Reply-To: <20150905053536.GD22011@ZenIV.linux.org.uk>
On 09/05, Al Viro wrote:
>
> On Sat, Aug 29, 2015 at 02:49:21PM +0200, Oleg Nesterov wrote:
>
> > Let's look keyctl_session_to_parent(). It does task_work_cancel()
> > but only because we can not trust user-space. Otherwise we could
> > remove it and just do task_work_add(), but this needs fifo.
> >
> > Fifo just looks more sane to me.
>
> Not if it costs us.
OK, Eric reports this trivial lockless loop takes 275ms. Apparently the
list of ____fput() works is huge. Now, how much does it cost to _create_
this huge list using task_work_add() which does cmpxchg() ? I guess a bit
more. This is what should be fixed if we think this hurts performance-wise.
And again, ignoring the latency problem due to the lack of cond_resched,
I am wondering if these 275ms are actually noticable compared to the time
the next loop needs to call all these ____fput's.
> As far as files closing is concerned, the order
> really doesn't matter. Ditto for final mntput() uses of that stuff.
Sure, fput/mntput do not care about the ordering. And more, they do
not even really need task_work_add(). We can just remove it from fput()
and everything will work fine. Correctness-wise, I mean.
And yes, unfortunately we do not have in-kernel users which already
rely on fifo.
But task_work_add() is the generic API, loosing the ordering makes
it less useful or at least less convenient.
Just for (yes sure, artificial) example. Suppose we want to implement
sys_change_process_flags(int pid, uint set, uint clr). Only current
can change its ->flags, so we can use task_work_add() to do this. But
obviously only if it is fifo.
fput() differs because it does not care which process actually does
__fput(). And thus imo we should not count this user if we need to
decide do we need fifo or not.
> IMO the obvious solution is to lose the reordering...
Oh, I disagree. But I guess I can't convince you/Eric/Linus, so I have
to shut up.
Damn. But I can't relax ;) Al, Linus, could you comment the patch below?
Not for inclusion, lacks the changelog/testing, fput() can be simplified.
But as you can see it is simple. With this patch task_work_add(____fput)
will be called only once by (say) do_exit() path. ->fput_list does not
need any serialization / atomic ops / etc. Probably we also need to move
cond_resched() from task_work_run() to ____fput() after this patch.
Again, it is not that I think this actually makes sense, but since you
dislike these 275ms...
What do you think?
Oleg.
---
diff --git a/fs/file_table.c b/fs/file_table.c
index 294174d..36af701 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -241,7 +241,13 @@ static void delayed_fput(struct work_struct *unused)
static void ____fput(struct callback_head *work)
{
- __fput(container_of(work, struct file, f_u.fu_rcuhead));
+ struct task_struct *task = current;
+ do {
+ struct file *file = task->fput_list;
+ task->fput_list = file->f_u.fu_next;
+ __fput(file);
+
+ } while (task->fput_list);
}
/*
@@ -267,9 +273,19 @@ void fput(struct file *file)
struct task_struct *task = current;
if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+ if (task->fput_list) {
+ /* task_work_add() below was already called */
+ file->f_u.fu_next = task->fput_list;
+ task->fput_list = file;
+ return;
+ }
+
init_task_work(&file->f_u.fu_rcuhead, ____fput);
- if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
+ if (!task_work_add(task, &file->f_u.fu_rcuhead, true)) {
+ file->f_u.fu_next = NULL;
+ task->fput_list = file;
return;
+ }
/*
* After this task has run exit_task_work(),
* task_work_add() will fail. Fall through to delayed
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0774487..73fe16c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -830,6 +830,7 @@ struct file {
union {
struct llist_node fu_llist;
struct rcu_head fu_rcuhead;
+ struct file *fu_next;
} f_u;
struct path f_path;
struct inode *f_inode; /* cached value */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f192cfe..6f704ff 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1477,6 +1477,7 @@ struct task_struct {
struct fs_struct *fs;
/* open file information */
struct files_struct *files;
+ struct file *fput_list;
/* namespaces */
struct nsproxy *nsproxy;
/* signal handlers */
diff --git a/kernel/fork.c b/kernel/fork.c
index 03c1eaa..77c0a50 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,7 @@ static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
struct files_struct *oldf, *newf;
int error = 0;
+ tsk->fput_list = NULL;
/*
* A background process may not have any files ...
*/
next prev parent reply other threads:[~2015-09-07 12:29 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-29 2:42 [PATCH] task_work: remove fifo ordering guarantee Eric Dumazet
2015-08-29 3:19 ` Linus Torvalds
2015-08-29 9:22 ` Ingo Molnar
2015-08-29 12:54 ` Oleg Nesterov
2015-08-31 6:02 ` Ingo Molnar
2015-08-31 12:51 ` Oleg Nesterov
2015-08-29 12:49 ` Oleg Nesterov
2015-08-29 13:57 ` Eric Dumazet
2015-08-29 14:11 ` Eric Dumazet
2015-08-29 17:08 ` Linus Torvalds
2015-08-31 5:22 ` yalin wang
2015-09-05 5:19 ` Al Viro
2015-08-31 12:44 ` Oleg Nesterov
2015-09-05 5:12 ` Al Viro
2015-09-05 5:42 ` Al Viro
2015-09-05 20:46 ` Linus Torvalds
2015-08-31 12:05 ` change filp_close() to use __fput_sync() ? (Was: [PATCH] task_work: remove fifo ordering guarantee) Oleg Nesterov
2015-09-05 5:35 ` [PATCH] task_work: remove fifo ordering guarantee Al Viro
2015-09-07 12:27 ` Oleg Nesterov [this message]
2015-09-07 13:49 ` [PATCH? v2] fput: don't abuse task_work_add() too much Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150907122709.GA31811@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=eric.dumazet@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maze@google.com \
--cc=mingo@kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.