Re: [PATCH] semaphore fairness patch against test11-pre6

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <andrewm@uow.edu.au>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Christoph Rohland <cr@sap.com>,
	David Mansfield <lkml@dm.ultramaster.com>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] semaphore fairness patch against test11-pre6
Date: Tue, 21 Nov 2000 00:39:29 +1100	[thread overview]
Message-ID: <3A192991.12BC2EC2@uow.edu.au> (raw)
In-Reply-To: <3A17CCB4.19416026@uow.edu.au> <Pine.LNX.4.10.10011191008150.1934-100000@penguin.transmeta.com>

Linus Torvalds wrote:
> 
> ...
> 
> I'll think about this some more. One thing I noticed is that the
> "wake_up(&sem->wait);" at the end of __down() is kind of bogus: we don't
> actually want to wake anybody up at that point at all, it's just that if
> we don't wake anybody up we'll end up having "sem = 0, sleeper = 0", and
> when we unlock the semaphore the "__up()" logic won't trigger, and we
> won't ever wake anybody up. That's just incredibly bogus.

There's another reason why we need the wakeup in the tail of __down():

1	void __down(struct semaphore * sem)
2	{
3		struct task_struct *tsk = current;
4		DECLARE_WAITQUEUE(wait, tsk);
5		tsk->state = TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE;
6		add_wait_queue_exclusive(&sem->wait, &wait);
7
8		spin_lock_irq(&semaphore_lock);
9		sem->sleepers++;
10		for (;;) {
11			int sleepers = sem->sleepers;
12
13			/*
14			 * Add "everybody else" into it. They aren't
15			 * playing, because we own the spinlock.
16			 */
17			if (!atomic_add_negative(sleepers - 1, &sem->count)) {
18				sem->sleepers = 0;
19				break;
20			}
21			sem->sleepers = 1;	/* us - see -1 above */
22			spin_unlock_irq(&semaphore_lock);
23
24			schedule();
25			tsk->state = TASK_UNINTERRUPTIBLE;
26			spin_lock_irq(&semaphore_lock);
27		}
28		spin_unlock_irq(&semaphore_lock);
29		remove_wait_queue(&sem->wait, &wait);
30		tsk->state = TASK_RUNNING;
31		wake_up(&sem->wait);
32	}

Suppose two tasks A and B are sleeping on the semaphore and somebody
does an up().  Task A wakes and sets TASK_UNINTERRUPTIBLE.

Task A then executes statements 26, 11, 17, 18, 19, 28 and 29 while
it's on the head of the exclusive waitqueue in state
TASK_UNINTERRUPTIBLE.  If during this, another CPU does an up() on the
semaphore the wakeup will be directed to Task A and will be lost.

Hence Task A must do the wakeup once it has removed itself from the
waitqueue just in case this happened.  Did anyone ever mention that
exclusive wakeups should atomically remove tasks from the waitqueue? :)


> Instead of the "wake_up()" at the end of __down(), we should have
> something like this at the end of __down() instead:
> 
>                         ... for-loop ...
>                 }
>                 tsk->state = TASK_RUNNING;
>                 remove_wait_queue(&sem->wait, &wait);
> 
>                 /* If there are others, mark the semaphore active */
>                 if (wait_queue_active(&sem_wait)) {
>                         atomic_dec(&sem->count);
>                         sem->sleepers = 1;
>                 }
>                 spin_unlock_irq(&semaphore_lock);
>         }
> 
> which would avoid an unnecessary reschedule, and cause the wakeup to
> happen at the proper point, namely "__up()" when we release the
> semaphore.

This won't help, because in the failure scenario we never get to
exit the 'for' loop.  Read on...

Here's why the `goto inside' thingy isn't working:

Assume the semaphore is free:

Task A does down():
-------------------

count->0
sleepers=0

Task B does down():
-------------------

count: 0 -> -1
sleepers: 0->1
if (sleepers > 1) is false
atomic_add_negative(1-1, -1): count -1 -> -1.
`if' fails.
sleepers: 1->1
schedule()

Task C does down():
-------------------

count: -1 -> -2
sleepers: 1->2
if (sleepers > 1) is true
schedule()

Someone does up():
------------------

(reminder: count=-2, sleepers=2)
count: -2 -> -1
wakeup()
-FINISHED-

Task C, CPU0					CPU1

wakes up
tsk->state = TASK_UNINTERRUPTIBLE
spin_lock
						down()
						count: -1->-2
						spins....
(reminder: sleepers=2, count=-2)
atomic_add_negative(2-1, -2): count->-1
aargh! `if' fails!
sleepers->1
spin_unlock()
schedule()
						unspins....
						sem->sleepers++ sleepers: 1->2
						if (sleepers > 1) is true
						aargh! atomic_add_negative test would have succeeded!
						spin_unlock()
						schedule()


	** Nobody woke up **


Someone does up(), but assume CPU1 gets the lock
------------------------------------------------

(reminder: count=-2, sleepers=2)

count: -2 -> -1
wakeup()
-FINISHED-

Task C, CPU0					CPU1

wakes up
tsk->state = TASK_UNINTERRUPTIBLE
						down()
						count: -1->-2
						spin_lock
spins...
						sem->sleepers++ sleepers: 2->3
						if (sleepers > 1) is true
						spin_unlock()
						schedule()
unspins...
(reminder: sleepers=3, count=-2)
atomic_add_negative(3-1, -2): count->0
'if' succeeds
sleepers->1
spin_unlock()
schedule()

	** OK, that worked, and it was "fair" **

Here's the story with the current (test11) implementation
=========================================================

Task A does down():
-------------------

count->0
sleepers=0

Task B does down():
-------------------

count: 0 -> -1
sleepers: 0->1
atomic_add_negative: count -1 -> -1. false.
sleepers->1
schedule()

Task C does down():
-------------------

count: -1 -> -2
sleepers: 1->2
atomic_add_negative: count -2 -> -1. false
sleepers->1
schedule()

Someone does up():
------------------

count: -1 -> 0
wakeup()
-FINISHED-

Task C, CPU0					CPU1

wakes up
tsk->state = TASK_UNINTERRUPTIBLE
spin_lock
						down()
						count: 0 -> -1
						spins....
(reminder: count=-1, sleepers=1)
atomic_add_negative(1-1, -1): count->-1
`if' fails.
sleepers->1
spin_unlock()
schedule()
						unspins...
						sem->sleepers++: 1->2
						atomic_add_negative(2-1, -1): count: -1->0
						'if' is true.
						sleepers->0
						break
						<runs>

	** OK, that worked too **
						

> I suspect this may be part of the trouble with the "sleepers" count
> playing: we had these magic rules that I know I thought about when the
> code was written, but that aren't really part of the "real" rules.

Linus, this code is too complex.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

     prev parent reply	other threads:[~2000-11-20 14:10 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-11-18  1:01 [PATCH] semaphore fairness patch against test11-pre6 David Mansfield
2000-11-18  9:45 ` Christoph Rohland
2000-11-19  1:12   ` Andrew Morton
2000-11-19  1:47     ` Linus Torvalds
2000-11-19 12:51       ` Andrew Morton
2000-11-19 18:46         ` Linus Torvalds
2000-11-20 13:39           ` Andrew Morton [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3A192991.12BC2EC2@uow.edu.au \
    --to=andrewm@uow.edu.au \
    --cc=cr@sap.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkml@dm.ultramaster.com \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox