From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
a.p.zijlstra@chello.nl, y-goto@jp.fujitsu.com,
akpm@linux-foundation.org, tglx@linutronix.de, mingo@elte.hu,
linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched: Fix ancient race in do_exit()
Date: Sun, 29 Jan 2012 19:59:49 +0100 [thread overview]
Message-ID: <20120129185949.GA27192@redhat.com> (raw)
In-Reply-To: <CA+55aFz20=H0ZubZEAmEbv0gL8j3C5Kgt4m2+5MU63uVipZt0Q@mail.gmail.com>
On 01/29, Linus Torvalds wrote:
>
> But yes, if you're talking about TASK_UNINTERRUPTIBLE, we do need to
> just remove the setting of that entirely. It needs to be set *before*
> adding us to the list, not after. That's just a bug - we get woken up
> when we've been given the lock.
Yes, I think this should work although I am not familiar with this code.
If we remove set_task_state() from the main waiting loop we can never race
with __rwsem_do_wake()->try_to_wake_up() seeing us in UNINTERRUPTIBLE state.
rwsem_down_failed_common() simply can't return until UNINTERRUPTIBLE->RUNNING
transition is finished (__rwsem_do_wake does wakeup first).
And since we do not play with current->state after spin_unlock(), it is
fine to "race" with waiter->task clearing, just we can do the unnecessary
but harmless schedule() in TASK_RUNNING.
> So it may be completely and utterly broken for some subtle reason,
Well, what about another spurious wakeup from somewhere? In this case
rwsem_down_failed_common() will do a busy-wait loop.
> @@ -92,10 +92,9 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type)
> */
> list_del(&waiter->list);
> tsk = waiter->task;
> + wake_up_process(tsk);
> smp_mb();
> waiter->task = NULL;
OK, now I understand why do we need "clear after wakeup".
But then I don't really understand this mb, perhaps wmb() is enough?
Afaics we only need to ensure we change waiter->task after changing
task's state.
OTOH,
> @@ -183,7 +181,6 @@ rwsem_down_failed_common(struct rw_semaphore *sem,
> raw_spin_lock_irq(&sem->wait_lock);
> waiter.task = tsk;
> waiter.flags = flags;
> - get_task_struct(tsk);
>
> if (list_empty(&sem->wait_list))
> adjustment += RWSEM_WAITING_BIAS;
> @@ -211,11 +208,8 @@ rwsem_down_failed_common(struct rw_semaphore *sem,
> if (!waiter.task)
> break;
> schedule();
> - set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> }
>
> - tsk->state = TASK_RUNNING;
> -
> return sem;
> }
Suppose that this task does tsk->state = TASK_WHATEVER after that.
It seems that we need mb() before return, otherwise the next ->state
change can be reordered with "if (!waiter.task)" above. Or not?
Oleg.
next prev parent reply other threads:[~2012-01-29 19:06 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-22 0:42 [BUG] TASK_DEAD task is able to be woken up in special condition Yasunori Goto
2011-12-22 2:14 ` KOSAKI Motohiro
2011-12-22 8:22 ` Yasunori Goto
2011-12-22 20:02 ` KOSAKI Motohiro
2011-12-23 9:49 ` Peter Zijlstra
2011-12-23 15:41 ` Oleg Nesterov
2011-12-26 8:23 ` Yasunori Goto
2011-12-26 17:11 ` Oleg Nesterov
2011-12-27 6:48 ` Yasunori Goto
2012-01-06 10:22 ` Yasunori Goto
2012-01-06 11:01 ` Peter Zijlstra
2012-01-06 12:01 ` Yasunori Goto
2012-01-06 12:43 ` Peter Zijlstra
2012-01-06 14:12 ` Oleg Nesterov
2012-01-06 14:19 ` Oleg Nesterov
2012-01-07 1:31 ` Yasunori Goto
2012-01-16 11:51 ` Yasunori Goto
2012-01-16 13:38 ` Peter Zijlstra
2012-01-17 8:40 ` Yasunori Goto
2012-01-17 9:06 ` Ingo Molnar
2012-01-17 15:12 ` Oleg Nesterov
2012-01-18 9:42 ` Ingo Molnar
2012-01-18 14:20 ` Oleg Nesterov
2012-01-24 10:19 ` Peter Zijlstra
2012-01-24 10:55 ` Peter Zijlstra
2012-01-24 17:25 ` KOSAKI Motohiro
2012-01-25 15:45 ` Oleg Nesterov
2012-01-25 16:51 ` Peter Zijlstra
2012-01-25 17:43 ` Oleg Nesterov
2012-01-26 15:32 ` Peter Zijlstra
2012-01-26 16:26 ` Oleg Nesterov
2012-01-27 8:59 ` Peter Zijlstra
2012-01-24 10:11 ` Peter Zijlstra
2012-01-26 9:39 ` Ingo Molnar
2012-01-28 12:03 ` [tip:sched/core] sched: Fix ancient race in do_exit() tip-bot for Yasunori Goto
2012-01-28 21:12 ` Linus Torvalds
2012-01-29 16:07 ` Oleg Nesterov
2012-01-29 17:44 ` Linus Torvalds
2012-01-29 18:28 ` Linus Torvalds
2012-01-29 18:59 ` Oleg Nesterov [this message]
2012-01-30 16:27 ` Linus Torvalds
2012-01-06 13:48 ` [BUG] TASK_DEAD task is able to be woken up in special condition Oleg Nesterov
2011-12-28 21:07 ` KOSAKI Motohiro
2012-01-24 10:23 ` Peter Zijlstra
2012-01-24 18:01 ` KOSAKI Motohiro
2012-01-25 6:15 ` Mike Galbraith
2012-01-26 21:24 ` KOSAKI Motohiro
2012-01-25 10:10 ` Peter Zijlstra
2012-01-26 20:25 ` [tip:sched/urgent] sched: Fix rq->nr_uninterruptible update race tip-bot for Peter Zijlstra
2012-01-27 5:20 ` Rakib Mullick
2012-01-27 8:19 ` Peter Zijlstra
2012-01-27 14:11 ` Rakib Mullick
2012-01-26 21:21 ` [BUG] TASK_DEAD task is able to be woken up in special condition KOSAKI Motohiro
2012-01-27 8:21 ` Peter Zijlstra
2011-12-26 6:52 ` Yasunori Goto
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120129185949.GA27192@redhat.com \
--to=oleg@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mingo@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=y-goto@jp.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.