From: Jarek Poplawski <jarkao2@o2.pl>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>, Miklos Szeredi <miklos@szeredi.hu>,
cebbert@redhat.com, chris@atlee.ca, linux-kernel@vger.kernel.org,
tglx@linutronix.de, akpm@linux-foundation.org
Subject: Re: [BUG] long freezes on thinkpad t60
Date: Fri, 22 Jun 2007 12:38:17 +0200 [thread overview]
Message-ID: <20070622103817.GC1778@ff.dom.local> (raw)
In-Reply-To: <alpine.LFD.0.98.0706210851570.3593@woody.linux-foundation.org>
On Thu, Jun 21, 2007 at 09:01:28AM -0700, Linus Torvalds wrote:
>
>
> On Thu, 21 Jun 2007, Jarek Poplawski wrote:
...
> So I don't see how you could possibly having two different CPU's getting
> into some lock-step in that loop: changing "task_rq()" is a really quite
> heavy operation (it's about migrating between CPU's), and generally
> happens at a fairly low frequency (ie "a couple of times a second" kind of
> thing, not "tight CPU loop").
Yes, I've agreed with Ingo it was only one of my illusions...
> But bugs happen..
>
> > Another possible problem could be a result of some wrong optimization
> > or wrong propagation of change of this task_rq(p) value.
>
> I agree, but that kind of bug would likely not cause temporary hangs, but
> actual "the machine is dead" operations. If you get the totally *wrong*
> value due to some systematic bug, you'd be waiting forever for it to
> match, not get into a loop for a half second and then it clearing up..
>
> But I don't think we can throw the "it's another bug" theory _entirely_
> out the window.
Alas, I'm the last here who should talk with you or Ingo about
hardware, but my point is that until it's not 100% proven this
is spinlocks vs. cpu case any nearby possibilities should be
considered. One of them is this loop, which can ... loop. Of
course we can't see any reason for this, but if something is
theoretically allowed it can happen. Here it's enough the
task_rq(p) is for some very unprobable reason (maybe buggy
hardware, code or compiling) cached or read too late, and
maybe it's only on this one only notebook in the world? If
you're sure this is not the case, let's forget about this.
Of course, I don't mean it's a direct optimization. I think
about something that could be triggered by something like this
tested smp_mb() in entirely another place. IMHO, it would be
very interesting to assure if this barrier fixed the spinlock
or maybe some variable.
There is also another interesting question: if it was only about
spinlocks (which may be) why on those watchdog traces there
are mainly two "fighters": wait_task_inactive() and
try_to_wake_up(). It seems there should be seen more clients
of task_rq_lock(). So, my another unlikely idea is: maybe
for some other strange, unprobable but only theoretically
possibility there could be something wrong around this
yield() in wait_task_inactive(). The comment above this
function reads about the task that *will* unschedule.
So, maybe it would be not nice if e.g. during this yield()
something wakes it up?
...
> But it only happens with badly coded software: the rule simply is that you
> MUST NOT release and immediately re-acquire the same spinlock on the same
> core, because as far as other cores are concerned, that's basically the
> same as never releasing it in the first place.
I totally agree! Let's only get certainty there is nothing
more here. BTW, is there any reason not to add some test with
a warning under spinlock debugging against such badly behaving
places?
Regards,
Jarek P.
next prev parent reply other threads:[~2007-06-22 10:30 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-24 12:04 [BUG] long freezes on thinkpad t60 Miklos Szeredi
2007-05-24 12:54 ` Ingo Molnar
2007-05-24 14:03 ` Miklos Szeredi
2007-05-24 14:10 ` Ingo Molnar
2007-05-24 14:28 ` Miklos Szeredi
2007-05-24 14:42 ` Ingo Molnar
2007-05-24 14:44 ` Ingo Molnar
2007-05-24 17:09 ` Miklos Szeredi
2007-05-24 21:01 ` Ingo Molnar
2007-05-25 9:51 ` Miklos Szeredi
2007-06-14 16:04 ` Miklos Szeredi
2007-06-15 21:25 ` Chuck Ebbert
2007-06-16 10:37 ` Ingo Molnar
2007-06-17 21:46 ` Miklos Szeredi
2007-06-18 6:43 ` Ingo Molnar
2007-06-18 7:24 ` Miklos Szeredi
2007-06-18 8:12 ` Ingo Molnar
2007-06-18 8:20 ` Andrew Morton
2007-06-19 4:22 ` Ravikiran G Thirumalai
2007-06-18 8:25 ` Miklos Szeredi
2007-06-18 8:31 ` Ingo Molnar
2007-06-18 8:34 ` Miklos Szeredi
2007-06-18 9:18 ` Ingo Molnar
2007-06-18 9:38 ` Ingo Molnar
2007-06-18 9:44 ` Ingo Molnar
2007-06-18 10:18 ` Miklos Szeredi
2007-06-18 12:36 ` Ingo Molnar
2007-06-18 13:10 ` Miklos Szeredi
2007-06-18 16:34 ` Linus Torvalds
2007-06-18 17:41 ` Miklos Szeredi
2007-06-18 17:48 ` Linus Torvalds
2007-06-18 18:02 ` Ingo Molnar
2007-06-18 18:00 ` Ingo Molnar
2007-06-18 18:25 ` Linus Torvalds
2007-06-20 9:36 ` Jarek Poplawski
2007-06-20 17:34 ` Linus Torvalds
2007-06-21 7:30 ` Ingo Molnar
2007-06-21 15:50 ` Linus Torvalds
2007-06-21 16:08 ` Ingo Molnar
2007-06-21 16:32 ` Linus Torvalds
2007-06-21 16:44 ` Chuck Ebbert
2007-06-21 17:31 ` Linus Torvalds
2007-06-21 18:29 ` Eric Dumazet
2007-06-21 18:44 ` Linus Torvalds
2007-06-21 19:35 ` Linus Torvalds
2007-06-21 20:09 ` Ingo Molnar
2007-06-21 20:14 ` Linus Torvalds
2007-06-21 20:30 ` Ingo Molnar
2007-06-21 20:48 ` Linus Torvalds
2007-06-21 21:06 ` Ingo Molnar
2007-06-21 20:42 ` [patch] spinlock debug: make looping nicer Ingo Molnar
2007-06-21 20:58 ` Linus Torvalds
2007-06-21 21:15 ` Ingo Molnar
2007-06-22 7:00 ` Jarek Poplawski
2007-06-21 20:36 ` [BUG] long freezes on thinkpad t60 Eric Dumazet
2007-06-21 19:56 ` Ingo Molnar
2007-06-21 20:10 ` Linus Torvalds
2007-06-21 20:23 ` Ingo Molnar
2007-06-21 20:12 ` Ingo Molnar
2007-06-26 8:42 ` Nick Piggin
2007-06-26 10:56 ` Jarek Poplawski
2007-06-26 17:23 ` Linus Torvalds
2007-06-27 5:23 ` Nick Piggin
2007-06-27 6:04 ` Linus Torvalds
2007-06-27 6:20 ` Nick Piggin
2007-06-27 19:47 ` Linus Torvalds
2007-06-27 20:10 ` Ingo Molnar
2007-06-27 20:17 ` Davide Libenzi
2007-06-27 22:11 ` Linus Torvalds
2007-06-27 23:30 ` Davide Libenzi
2007-06-28 0:46 ` Linus Torvalds
2007-06-28 3:03 ` Davide Libenzi
2007-07-02 7:06 ` Nick Piggin
2007-06-21 20:16 ` Ingo Molnar
2007-06-22 8:17 ` Ingo Molnar
2007-06-23 10:36 ` Miklos Szeredi
2007-06-23 16:39 ` Linus Torvalds
2007-06-25 6:45 ` Jarek Poplawski
2007-06-21 20:18 ` Ingo Molnar
2007-06-21 20:36 ` Linus Torvalds
2007-06-21 7:38 ` Jarek Poplawski
2007-06-21 8:39 ` Ingo Molnar
2007-06-21 11:09 ` Jarek Poplawski
2007-06-21 16:01 ` Linus Torvalds
2007-06-22 10:38 ` Jarek Poplawski [this message]
2007-05-24 22:08 ` Henrique de Moraes Holschuh
2007-05-24 22:13 ` Kok, Auke
2007-05-25 6:58 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070622103817.GC1778@ff.dom.local \
--to=jarkao2@o2.pl \
--cc=akpm@linux-foundation.org \
--cc=cebbert@redhat.com \
--cc=chris@atlee.ca \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.