From: Oleg Nesterov <oleg@redhat.com>
To: Darren Hart <darren@dvhart.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>
Cc: Jerome Marchand <jmarchan@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Mateusz Guzik <mguzik@redhat.com>,
linux-kernel@vger.kernel.org
Subject: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to filter out kthreads
Date: Mon, 2 Feb 2015 15:05:15 +0100 [thread overview]
Message-ID: <20150202140515.GA26398@redhat.com> (raw)
Thomas, et al, could you please review?
The change looks trivial, but I simply can not understand this logic,
please help.
First of all, why exactly do we need this mm/PF_KTHREAD check added by
f0d71b3dcb8332f7971 ? Of course, it is simply wrong to declare a random
kernel thread to be the owner as the changelog says. But why kthread is
worse than a random user-space task, say, /sbin/init?
IIUC, the fact that we can abuse ->pi_state_list is not that bad, no matter
if this (k)thread will exit or not. AFAICS, the only problem is that we can
boost the prio of this thread. Or I missed another problem?
I am asking because we need to backport some fixes, and I am trying to
convince myself that I actually understand what I am trying to do ;)
And another question. Lets forget about this ->mm check. I simply can not
understand this
ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN
logic in attach_to_pi_owner(). First of all, why do we need to retry if
PF_EXITING is set but PF_EXITPIDONE is not? Why we can not simply ignore
PF_EXITING and rely on exit_pi_state_list() if PF_EXITPIDONE is not set?
I must have missed something but this looks buggy, I do not see any
preemption point in this "retry" loop. Suppose that max_cpus=1 and rt_task()
preempts the non-rt PF_EXITING owner. Looks like futex_lock_pi() can spin
forever in this case? (OK, ignoring RT throttling).
IOW. Could you explain why the patch below is wrong correctness-wise?
Lets ignore the fact it is obviously ugly and suboptimal...
Thanks,
Oleg.
--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -684,13 +684,7 @@ void do_exit(long code)
if (unlikely(tsk->flags & PF_EXITING)) {
pr_alert("Fixing recursive fault but reboot is needed!\n");
/*
- * We can do this unlocked here. The futex code uses
- * this flag just to verify whether the pi state
- * cleanup has been done or not. In the worst case it
- * loops once more. We pretend that the cleanup was
- * done as there is no way to return. Either the
- * OWNER_DIED bit is set by now or we push the blocked
- * task into the wait for ever nirwana as well.
+ * XXX: ADD COMMENT
*/
tsk->flags |= PF_EXITPIDONE;
set_current_state(TASK_UNINTERRUPTIBLE);
@@ -699,8 +693,7 @@ void do_exit(long code)
exit_signals(tsk); /* sets PF_EXITING */
/*
- * tsk->flags are checked in the futex code to protect against
- * an exiting task cleaning up the robust pi futexes.
+ * XXX: anobody else needs this ?
*/
smp_mb();
raw_spin_unlock_wait(&tsk->pi_lock);
@@ -779,12 +772,6 @@ void do_exit(long code)
* Make sure we are holding no locks:
*/
debug_check_no_locks_held();
- /*
- * We can do this unlocked here. The futex code uses this flag
- * just to verify whether the pi state cleanup has been done
- * or not. In the worst case it loops once more.
- */
- tsk->flags |= PF_EXITPIDONE;
if (tsk->io_context)
exit_io_context(tsk);
--- x/kernel/fork.c
+++ x/kernel/fork.c
@@ -803,8 +803,7 @@ void mm_release(struct task_struct *tsk,
tsk->compat_robust_list = NULL;
}
#endif
- if (unlikely(!list_empty(&tsk->pi_state_list)))
- exit_pi_state_list(tsk);
+ exit_pi_state_list(tsk);
#endif
uprobe_free_utask(tsk);
--- x/kernel/futex.c
+++ x/kernel/futex.c
@@ -722,6 +722,7 @@ void exit_pi_state_list(struct task_stru
* versus waiters unqueueing themselves:
*/
raw_spin_lock_irq(&curr->pi_lock);
+ curr->flags |= PF_EXITPIDONE;
while (!list_empty(head)) {
next = head->next;
@@ -912,17 +913,15 @@ static int attach_to_pi_owner(u32 uval,
* p->pi_lock:
*/
raw_spin_lock_irq(&p->pi_lock);
- if (unlikely(p->flags & PF_EXITING)) {
+ if (unlikely(p->flags & PF_EXITPIDONE)) {
/*
* The task is on the way out. When PF_EXITPIDONE is
* set, we know that the task has finished the
* cleanup:
*/
- int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
-
raw_spin_unlock_irq(&p->pi_lock);
put_task_struct(p);
- return ret;
+ return -ESRCH;
}
/*
next reply other threads:[~2015-02-02 14:06 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-02 14:05 Oleg Nesterov [this message]
2015-02-02 14:05 ` [PATCH 1/1] futex: check PF_KTHREAD rather than !p->mm to filter out kthreads Oleg Nesterov
2015-02-04 10:48 ` Peter Zijlstra
2015-02-14 18:01 ` Davidlohr Bueso
2015-02-14 20:57 ` Oleg Nesterov
2015-02-14 21:15 ` Davidlohr Bueso
2015-02-14 21:54 ` Oleg Nesterov
2015-02-18 17:11 ` [tip:locking/core] locking/futex: Check " tip-bot for Oleg Nesterov
2015-02-02 15:11 ` [PATCH 0/1] futex: check " Peter Zijlstra
2015-02-02 15:13 ` Peter Zijlstra
2015-02-02 15:14 ` Peter Zijlstra
2015-02-02 16:20 ` Oleg Nesterov
2015-02-03 20:09 ` Oleg Nesterov
2015-02-04 11:12 ` Peter Zijlstra
2015-02-04 20:25 ` Oleg Nesterov
2015-02-05 16:27 ` Peter Zijlstra
2015-02-05 18:10 ` Oleg Nesterov
2015-02-06 10:46 ` Peter Zijlstra
2015-02-06 17:04 ` Oleg Nesterov
2015-02-09 20:38 ` Darren Hart
2015-02-10 11:14 ` Oleg Nesterov
2015-02-16 20:13 ` [PATCH 0/1] futex: don't spin waiting for PF_EXITING -> PF_EXITPIDONE transition Oleg Nesterov
2015-02-16 20:13 ` [PATCH 1/1] " Oleg Nesterov
2015-02-27 9:52 ` Peter Zijlstra
2015-02-27 11:54 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150202140515.GA26398@redhat.com \
--to=oleg@redhat.com \
--cc=darren@dvhart.com \
--cc=jmarchan@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lwoodman@redhat.com \
--cc=mguzik@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox