From: Andrew Morton <andrewm@uow.edu.au>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
linux-kernel@vger.kernel.org,
"David S. Miller" <davem@redhat.com>
Subject: Re: Linux 2.2.19pre2
Date: Sun, 24 Dec 2000 16:17:10 +1100 [thread overview]
Message-ID: <3A4586D6.EF357D62@uow.edu.au> (raw)
In-Reply-To: <3A4303AC.C635F671@uow.edu.au>, <3A4303AC.C635F671@uow.edu.au>; <20001222141929.A13032@athlon.random> <3A444CAA.4C5A7A89@uow.edu.au>, <3A444CAA.4C5A7A89@uow.edu.au>; <20001223191159.B29450@athlon.random> <3A454205.D33090A8@uow.edu.au>, <3A454205.D33090A8@uow.edu.au>; <20001224015346.A17046@athlon.random> <3A455F6B.FAC3A4B7@uow.edu.au>, <3A455F6B.FAC3A4B7@uow.edu.au>; from andrewm@uow.edu.au on Sun, Dec 24, 2000 at 01:28:59PM +1100 <20001224052128.A24560@athlon.random>
Andrea Arcangeli wrote:
>
> On Sun, Dec 24, 2000 at 01:28:59PM +1100, Andrew Morton wrote:
> > This could happen with the old scheme where exclusiveness
> > was stored in the task, not the waitqueue.
> >
> > >From test4:
> >
> > for (;;) {
> > __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE);
> > /* WINDOW */
> > spin_lock_irq(&io_request_lock);
> > rq = get_request(q, rw);
> > spin_unlock_irq(&io_request_lock);
> > if (rq)
> > break;
> > generic_unplug_device(q);
> > schedule();
> > }
> >
> > As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this
> > task becomes visible to __wake_up_common and can soak
> > up a second wakeup. [..]
>
> I disagree, none wakeup could be lost in test4 in the above section of code
> (besides the __set_current_state that should be set_current_state ;) and that's
> not an issue for both x86 and alpha where spin_lock is a full memory barrier).
>
> Let's examine what happens in test4:
>
> [ snip ]
I was talking about a different scenario:
add_wait_queue_exclusive(&q->wait_for_request, &wait);
for (;;) {
__set_current_state(TASK_UNINTERRUPTIBLE);
/* WINDOW */
spin_lock_irq(&io_request_lock);
rq = get_request(q, rw);
spin_unlock_irq(&io_request_lock);
if (rq)
break;
generic_unplug_device(q);
schedule();
}
remove_wait_queue(&q->wait_for_request, &wait);
Suppose there are two tasks sleeping in the schedule().
A wakeup comes. One task wakes. It loops aound and reaches
the window. At this point in time, another wakeup gets sent
to the waitqueue. It gets directed to the task which just
woke up! It should have been directed to a sleeping task,
not the current one which just *looks* like it's sleeping,
because it's in state TASK_UNINTERRUPTIBLE.
This can happen in test4.
> This race is been introduced in test1X because there the task keeps asking for
> an exclusive wakeup even after getting a wakeup: until it has the time to
> cleanup and unregister explicitly.
No, I think it's the same in test4 and test1X. In current kernels
it's still the case that the woken task gets atomically switched into
state TASK_RUNNING, and then becomes "hidden" from the wakeup code.
The problem is, the task bogusly sets itself back into
TASK_UNINTERRUPTIBLE for a very short period and becomes
eligible for another wakeup.
There's not much which ll_rw_blk.c can do about this.
> And btw, 2.2.19pre3 has the same race, while the alternate wake-one
> implementation in 2.2.18aa2 doesn't have it (for the same reasons).
The above scenario can happen in all kernels.
> And /* WINDOW */ is not the only window for such race: the window is the whole
> section where the task is registered in the waitqueue in exclusive mode:
>
> add_wait_queue_exclusive
> /* all is WINDOW here */
> remove_wait_queue
>
> Until remove_wait_queue runs explicitly in the task context any second wakeup
> will keep waking up only such task (so in turn it will be lost). So it should
> trigger very easily under high load since two wakeups will happen much faster
> compared to the task that needs to be rescheduled in order run
> remove_wait_queue and cleanup.
>
> Any deadlocks in test1X could be very well explained by this race condition.
Yes, but I haven't seen a "stuck in D state" report for a while.
I assume this is because this waitqueue gets lots of wakeups sent to it.
The ones in networking I expect are protected by other locking
which serialises the wakeups.
The one in the x86 semaphores avoids it by sending an extra wakeup
to the waitqueue on the way out.
> > Still, changing the wakeup code in the way we've discussed
> > seems the way to go. [..]
>
> I agree. I'm quite convinced it's right way too and of course I see why we
> moved the exclusive information in the waitqueue instead of keeping it in the
> task struct to provide sane semantics in the long run ;).
Linus suggested at one point that we clear the waitqueue's
WQ_FLAG_EXCLUSIVE bit when we wake it up, so it falls back
to non-exclusive mode. This was to address one of the
task-on-two-waitqueues problems...
-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2000-12-24 5:42 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2000-12-16 19:11 Linux 2.2.19pre2 Alan Cox
2000-12-17 10:56 ` Petri Kaukasoina
2000-12-17 15:38 ` Kurt Garloff
2000-12-20 10:32 ` Petri Kaukasoina
2000-12-20 10:44 ` Kurt Garloff
2000-12-20 13:28 ` Andrea Arcangeli
2000-12-20 14:57 ` Andrew Morton
2000-12-20 15:24 ` Andrea Arcangeli
2000-12-20 17:48 ` Rik van Riel
2000-12-20 18:09 ` Andrea Arcangeli
2000-12-21 10:38 ` Andrew Morton
2000-12-21 15:19 ` Andrea Arcangeli
2000-12-21 17:07 ` Rik van Riel
2000-12-21 17:44 ` Andrea Arcangeli
2000-12-21 17:55 ` Rik van Riel
2000-12-22 7:33 ` Andrew Morton
2000-12-22 13:19 ` Andrea Arcangeli
2000-12-23 6:56 ` Andrew Morton
2000-12-23 18:11 ` Andrea Arcangeli
2000-12-24 0:23 ` Andrew Morton
2000-12-24 0:53 ` Andrea Arcangeli
2000-12-24 2:28 ` Andrew Morton
2000-12-24 4:21 ` Andrea Arcangeli
2000-12-24 5:17 ` Andrew Morton [this message]
2000-12-24 14:43 ` Andrea Arcangeli
2000-12-24 15:40 ` Andrea Arcangeli
2000-12-29 17:09 ` wake-one-3 bug (affected 2.2.19pre3aa[123]) Andrea Arcangeli
2000-12-21 20:23 ` Linux 2.2.19pre2 Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3A4586D6.EF357D62@uow.edu.au \
--to=andrewm@uow.edu.au \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=andrea@suse.de \
--cc=davem@redhat.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox