rt_mutex_timed_lock() vs hrtimer_wakeup() race ?

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Oleg Nesterov <oleg@tv-sign.ru>
To: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Esben Nielsen <nielsen.esben@googlemail.com>
Cc: linux-kernel@vger.kernel.org
Subject: rt_mutex_timed_lock() vs hrtimer_wakeup() race ?
Date: Sun, 30 Jul 2006 08:36:05 +0400	[thread overview]
Message-ID: <20060730043605.GA2894@oleg> (raw)

I am trying to get some understanding of rt_mutex, but I'm afraid it's
not possible without your help...

// runs on CPU 0
rt_mutex_slowlock:

	// main loop
	for (;;) {
		...

			if (timeout && !timeout->task) {
				ret = -ETIMEDOUT;
				break;
			}

		...

		schedule();

		...

		set_current_state(state);
	}

What if timeout->timer is fired on CPU 1 right before set_current_state() ?

hrtimer_wakeup() does:

	timeout->task = NULL;		<----- [1]

	spin_lock(runqueues->lock);

	task->state = TASK_RUNNING;	<----- [2]

(task->array != NULL, so try_to_wake_up() just goes to out_running)

If my understanding correct, [1] may slip into the critical section (because
spin_lock() is not a wmb), so that CPU 0 will see [2] but not [1]. In that
case rt_mutex_slowlock() can miss the timeout (set_current_state()->mb()
doesn't help).

Of course, this race (even _if_ I am right) is pure theoretical, but probably
we need smp_wmb() after [1] in hrtimer_wakeup().

Note that do_nanosleep() is ok, hrtimer_base->lock provides a necessary
serialization. In fact, I think it can use __set_current_state(), because
both hrtimer_start() and run_hrtimer_queue() do lock/unlock of base->lock.

Another question, task_blocks_on_rt_mutex() does get_task_struct() and checks
owner->pi_blocked_on != NULL under owner->pi_lock. Why ? RT_MUTEX_HAS_WAITERS
bit is set, we are holding ->wait_lock, so the 'owner' can't go away until
we drop ->wait_lock. I think we can drop owner->pi_lock right after
__rt_mutex_adjust_prio(owner), we can't miss owner->pi_blocked_on != NULL
if it was true before we take owner->pi_lock, and this is the case we should
worry about, yes?

In other words (because I myself can't parse the paragraph above :), could
you explain me why this patch is not correct:

--- rtmutex.c~	2006-07-30 05:15:38.000000000 +0400
+++ rtmutex.c	2006-07-30 05:41:44.000000000 +0400
@@ -407,7 +407,7 @@ static int task_blocks_on_rt_mutex(struc
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex_waiter *top_waiter = waiter;
 	unsigned long flags;
-	int boost = 0, res;
+	int res;

 	spin_lock_irqsave(&current->pi_lock, flags);
 	__rt_mutex_adjust_prio(current);
@@ -431,24 +431,20 @@ static int task_blocks_on_rt_mutex(struc
 		plist_add(&waiter->pi_list_entry, &owner->pi_waiters);

 		__rt_mutex_adjust_prio(owner);
-		if (owner->pi_blocked_on) {
-			boost = 1;
-			/* gets dropped in rt_mutex_adjust_prio_chain()! */
-			get_task_struct(owner);
-		}
 		spin_unlock_irqrestore(&owner->pi_lock, flags);
+
+		if (owner->pi_blocked_on)
+			goto boost;
 	}
 	else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
-		spin_lock_irqsave(&owner->pi_lock, flags);
-		if (owner->pi_blocked_on) {
-			boost = 1;
-			/* gets dropped in rt_mutex_adjust_prio_chain()! */
-			get_task_struct(owner);
-		}
-		spin_unlock_irqrestore(&owner->pi_lock, flags);
+		if (owner->pi_blocked_on)
+			goto boost;
 	}
-	if (!boost)
-		return 0;
+
+	return 0;
+boost:
+	/* gets dropped in rt_mutex_adjust_prio_chain()! */
+	get_task_struct(owner);

 	spin_unlock(&lock->wait_lock);

----------------------------------------------------------
The same question for remove_waiter()/rt_mutex_adjust_pi().

The last (stupid) one,
wake_up_new_task:

		if (unlikely(!current->array))
			__activate_task(p, rq);

(offtopic) Is it really possible to have current->array == NULL here?

		else {
			p->prio = current->prio;

What if current was pi-boosted so that rt_prio(current->prio) == 1,
who will de-boost the child?

			p->normal_prio = current->normal_prio;

Why? p->normal_prio was calculated by effective_prio() above, could you
explain why that value is not ok?

Thanks,

Oleg.

next             reply	other threads:[~2006-07-30  0:36 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-30  4:36 Oleg Nesterov [this message]
2006-07-30 22:23 ` rt_mutex_timed_lock() vs hrtimer_wakeup() race ? Steven Rostedt
2006-08-01  0:12   ` Oleg Nesterov
2006-07-31 20:47     ` Steven Rostedt
2006-08-01  7:58 ` Thomas Gleixner
2006-08-01 12:07   ` Steven Rostedt
2006-08-01 12:52     ` Thomas Gleixner
2006-08-01 13:21       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060730043605.GA2894@oleg \
    --to=oleg@tv-sign.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nielsen.esben@googlemail.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox