* Re: 2.6.12 hangs on boot
[not found] <200506221813.50385.gluk@php4.ru>
@ 2005-06-24 22:20 ` Linus Torvalds
2005-07-07 14:18 ` Alexander Y. Fomichev
2005-07-18 11:27 ` Alexander Y. Fomichev
0 siblings, 2 replies; 5+ messages in thread
From: Linus Torvalds @ 2005-06-24 22:20 UTC
To: Alexander Y. Fomichev; +Cc: Kernel Mailing List, admin, Git Mailing List
On Wed, 22 Jun 2005, Alexander Y. Fomichev wrote:
>
> I've been trying to switch from 2.6.12-rc3 to 2.6.12 on Dual EM64T 2.8 GHz
> [ MoBo: Intel E7520, intel 82801 ]
> but kernel hangs on boot right after records:
>
> Booting processor 2/1 rip 6000 rsp ffff8100023dbf58
> Initializing CPU#2
Hmm.. Since you seem to be a git user, maybe you could try the git
"bisect" thing to help narrow down exactly where this happened (and help
test that thing too ;).
You can basically use git to find the half-way point between a set of
"known good" points and a "known bad" point ("bisecting" the set of
commits), and doing just a few of those should give us a much better view
of where things started going wrong.
For example, since you know that 2.6.12-rc3 is good, and 2.6.12 is bad,
you'd do
git-rev-list --bisect v2.6.12 ^v2.6.12-rc3
where the "v2.6.12 ^v2.6.12-rc3" thing basically means "everything in
v2.6.12 but _not_ in v2.6.12-rc3" (that's what the ^ marks), and the
"--bisect" flag just asks git-rev-list to list the middle-most commit,
rather than all the commits in between those kernel versions.
You should get the answer "0e6ef3e02b6f07e37ba1c1abc059f8bee4e0847f", but
before you go any further, just make sure your git index is all clean:
git status
should not print anything other than "nothing to commit". If so, then
you're ready to try the new "mid-point" head:
git-rev-list --bisect v2.6.12 ^v2.6.12-rc3 > .git/refs/heads/try1
git checkout try1
which will create a new branch called "try1", where the head is that
"mid-point", and it will switch to that branch (this requires a fairly
recent "git", btw, so make sure you update your git first).
Then, compile that kernel, and try it out.
Now, there are two possibilities: either "try1" ends up being good, or it
still shows the bug. If it is a buggy kernel, then you now have a new
"bad" point, and you do
git-rev-list --bisect try1 ^v2.6.12-rc3 > .git/refs/heads/try2
git checkout try2
which is all the same thing as you did before, except now we use "try1" as
the known bad one rather than v2.6.12 (and we call the new branch "try2"
of course).
However, if that "try1" is _good_, and doesn't show the bug, then you
shouldn't replace the other "known good" case, but instead you should add
it to the list of good commits (aka commits we don't want to know about):
git-rev-list --bisect v2.6.12 ^v2.6.12-rc3 ^try1 > .git/refs/heads/try2
git checkout try2
i.e. notice how we now say: we want to get the bisection of the commits in
v2.6.12 (known bad) but _not_ in either of v2.6.12-rc3 or the 'try1'
branch (which are known good).
After compiling and testing a few kernels, you will have narrowed the
range down a _lot_, and at some point you can just say
git-rev-list --pretty try4 ^v2.6.12-rc3 ^try1 ^try3
(or however the "success/failure" pattern ends up being - the above
example line assumes that "try1" didn't have the bug, but "try2" did, and
then "try3" was ok again but "try4" was buggy), and you'll get a fairly
small list of commits that are the potential "bad" ones.
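The bookkeeping described above (a bad result replaces the current "bad" commit, a good result is appended to the exclusion list) can be sketched as a small shell helper. This is only an illustration: the names `record` and `rev_list_args` are hypothetical and not part of git, and the script only prints the command line rather than running git.

```sh
#!/bin/sh
# Hypothetical bookkeeping helper (not part of git): tracks the current
# known-bad commit and the growing list of known-good commits.
bad="v2.6.12"
goods="^v2.6.12-rc3"

# record <branch> good|bad
record() {
    case "$2" in
        bad)  bad="$1" ;;               # a bad result replaces the bad point
        good) goods="$goods ^$1" ;;     # a good result is added to the excludes
        *)    echo "usage: record <branch> good|bad" >&2; return 1 ;;
    esac
}

# Print the arguments to pass to git-rev-list for the current state.
rev_list_args() {
    printf '%s %s\n' "$bad" "$goods"
}

# The success/failure pattern assumed in the example above.
record try1 good
record try2 bad
record try3 good
record try4 bad

echo "git-rev-list --pretty $(rev_list_args)"
```

With that pattern the printed line is the same command as in the example: try4 is the newest bad point, and try1 and try3 join v2.6.12-rc3 as excluded good points.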
After the above four tries, you'd have limited it down to a list of 95
changes (from the original 1520), so it would really be best to try six or
seven different kernels, but at that point you'd have it down to less than
20 commits and then pinpointing the bug is usually much easier.
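The arithmetic behind those numbers can be checked with a short sketch. It assumes that every try cuts the candidate range exactly in half, which is an idealisation: on real, non-linear history the split is only approximate. The function name `remaining_after` is made up for this illustration.

```sh
#!/bin/sh
# remaining_after <commits> <tries>
# Commits left to inspect if each try halves the range (rounding up).
remaining_after() {
    remaining=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        remaining=$(( (remaining + 1) / 2 ))
        tries=$(( tries - 1 ))
    done
    echo "$remaining"
}

# Starting from the 1520 commits between v2.6.12-rc3 and v2.6.12.
for n in 4 6 7; do
    echo "after $n tries: about $(remaining_after 1520 "$n") commits"
done
```

Under that assumption four tries leave 95 commits, which agrees with the figure quoted above, and the seventh try is the one that brings the list under 20.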
And when you're done, you can just do
git checkout master
and you're back to where you started.
Linus
* Re: 2.6.12 hangs on boot
2005-06-24 22:20 ` 2.6.12 hangs on boot Linus Torvalds
@ 2005-07-07 14:18 ` Alexander Y. Fomichev
2005-07-18 11:27 ` Alexander Y. Fomichev
1 sibling, 0 replies; 5+ messages in thread
From: Alexander Y. Fomichev @ 2005-07-07 14:18 UTC
To: Linus Torvalds; +Cc: Kernel Mailing List, admin, Git Mailing List
On Saturday 25 June 2005 02:20, Linus Torvalds wrote:
> On Wed, 22 Jun 2005, Alexander Y. Fomichev wrote:
> > I've been trying to switch from 2.6.12-rc3 to 2.6.12 on Dual EM64T 2.8
> > GHz [ MoBo: Intel E7520, intel 82801 ]
> > but kernel hangs on boot right after records:
> >
> > Booting processor 2/1 rip 6000 rsp ffff8100023dbf58
> > Initializing CPU#2
>
> Hmm.. Since you seem to be a git user, maybe you could try the git
> "bisect" thing to help narrow down exactly where this happened (and help
> test that thing too ;).
>
> You can basically use git to find the half-way point between a set of
> "known good" points and a "known bad" point ("bisecting" the set of
> commits), and doing just a few of those should give us a much better view
> of where things started going wrong.
>
> For example, since you know that 2.6.12-rc3 is good, and 2.6.12 is bad,
> you'd do
>
> git-rev-list --bisect v2.6.12 ^v2.6.12-rc3
>
> where the "v2.6.12 ^v2.6.12-rc3" thing basically means "everything in
> v2.6.12 but _not_ in v2.6.12-rc3" (that's what the ^ marks), and the
> "--bisect" flag just asks git-rev-list to list the middle-most commit,
> rather than all the commits in between those kernel versions.
>
> You should get the answer "0e6ef3e02b6f07e37ba1c1abc059f8bee4e0847f", but
> before you go any further, just make sure your git index is all clean:
>
> git status
>
> should not print anything else than "nothing to commit". If so, then
> you're ready to try the new "mid-point" head:
>
> git-rev-list --bisect v2.6.12 ^v2.6.12-rc3 > .git/refs/heads/try1
> git checkout try1
>
> which will create a new branch called "try1", where the head is that
> "mid-point", and it will switch to that branch (this requires a fairly
> recent "git", btw, so make sure you update your git first).
>
> Then, compile that kernel, and try it out.
>
> Now, there are two possibilities: either "try1" ends up being good, or it
> still shows the bug. If it is a buggy kernel, then you now have a new
> "bad" point, and you do
>
> git-rev-list --bisect try1 ^v2.6.12-rc3 > .git/refs/heads/try2
> git checkout try2
>
> which is all the same thing as you did before, except now we use "try1" as
> the known bad one rather than v2.6.12 (and we call the new branch "try2"
> of course).
>
> However, if that "try1" is _good_, and doesn't show the bug, then you
> shouldn't replace the other "known good" case, but instead you should add
> it to the list of good commits (aka commits we don't want to know about):
>
> git-rev-list --bisect v2.6.12 ^v2.6.12-rc3 ^try1 > .git/refs/heads/try2
> git checkout try2
>
> ie notice how we now say: want to get the bisection of the commits in
> v2.6.12 (known bad) but _not_ in either of v2.6.12-rc3 or the 'try1'
> branch (which are known good).
>
> After compiling and testing a few kernels, you will have narrowed the
> range down a _lot_, and at some point you can just say
>
> git-rev-list --pretty try4 ^v2.6.12-rc3 ^try1 ^try3
>
> (or however the "success/failure" pattern ends up being - the above
> example line assumes that "try1" didn't have the bug, but "try2" did, and
> then "try3" was ok again but "try4" was buggy), and you'll get a fairly
> small list of commits that are the potential "bad" ones.
>
> After the above four tries, you'd have limited it down to a list of 95
> changes (from the original 1520), so it would really be best to try six or
> seven different kernels, but at that point you'd have it down to less than
> 20 commits and then pinpointing the bug is usually much easier.
>
> And when you're done, you can just do
>
> git checkout master
>
> and you're back to where you started.
>
> Linus
Thank you for your answer. I was on vacation for the last two weeks
and had no access to my mail account.
Hmmm... it seems the 'bisect' method is not applicable to this host: it is a
production server, which can tolerate one or two reboots, but 'bisect' will
require many more, I suspect. I have another, non-critical host with nearly
the same hardware where such tests could be done, but it has no serial
console at the moment. Connecting a console will take some time because both
hosts are remote.
--
Best regards.
Alexander Y. Fomichev <gluk@php4.ru>
Public PGP key: http://sysadminday.org.ru/gluk.asc
* Re: 2.6.12 hangs on boot
2005-06-24 22:20 ` 2.6.12 hangs on boot Linus Torvalds
2005-07-07 14:18 ` Alexander Y. Fomichev
@ 2005-07-18 11:27 ` Alexander Y. Fomichev
2005-07-18 12:58 ` Andi Kleen
1 sibling, 1 reply; 5+ messages in thread
From: Alexander Y. Fomichev @ 2005-07-18 11:27 UTC
To: Linus Torvalds; +Cc: Kernel Mailing List, admin, Git Mailing List, ak
On Saturday 25 June 2005 02:20, Linus Torvalds wrote:
> On Wed, 22 Jun 2005, Alexander Y. Fomichev wrote:
> > I've been trying to switch from 2.6.12-rc3 to 2.6.12 on Dual EM64T 2.8
> > GHz [ MoBo: Intel E7520, intel 82801 ]
> > but kernel hangs on boot right after records:
> >
> > Booting processor 2/1 rip 6000 rsp ffff8100023dbf58
> > Initializing CPU#2
>
> Hmm.. Since you seem to be a git user, maybe you could try the git
> "bisect" thing to help narrow down exactly where this happened (and help
> test that thing too ;).
[skipped]
OK, as far as I can see [and as Andi guessed in
http://bugme.osdl.org/show_bug.cgi?id=4792],
the issue was introduced by the new TSC sync algorithm,
git id: dda50e716dc9451f40eebfb2902c260e4f62cf34.
And yes, it seems to depend on timing...
In my case, a kludge that inserts a short delay (e.g. a printk) between
cpu_set/mb and tsc_sync_wait() makes the kernel bootable.
diff -urN b/arch/x86_64/kernel/smpboot.c a/arch/x86_64/kernel/smpboot.c
--- b/arch/x86_64/kernel/smpboot.c 2005-07-17 21:55:55.000000000 +0400
+++ a/arch/x86_64/kernel/smpboot.c 2005-07-17 21:57:56.000000000 +0400
@@ -451,6 +451,7 @@
cpu_set(smp_processor_id(), cpu_online_map);
mb();
+ printk(KERN_INFO "We're still here!\n");
/* Wait for TSC sync to not schedule things before.
We still process interrupts, which could see an inconsistent
time in that window unfortunately. */
--
Best regards.
Alexander Y. Fomichev <gluk@php4.ru>
Public PGP key: http://sysadminday.org.ru/gluk.asc
* Re: 2.6.12 hangs on boot
2005-07-18 11:27 ` Alexander Y. Fomichev
@ 2005-07-18 12:58 ` Andi Kleen
2005-07-19 11:53 ` Alexander Y. Fomichev
0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2005-07-18 12:58 UTC
To: Alexander Y. Fomichev
Cc: Linus Torvalds, Kernel Mailing List, admin, Git Mailing List, ak
Can you please test if this patch fixes it?
-Andi
Don't compare linux processor index with APICID
Fixes boot up lockups on some machines where CPU apic ids
don't start with 0
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smpboot.c
+++ linux/arch/x86_64/kernel/smpboot.c
@@ -211,7 +211,7 @@ static __cpuinit void sync_master(void *
{
unsigned long flags, i;
- if (smp_processor_id() != boot_cpu_id)
+ if (smp_processor_id() != 0)
return;
go[MASTER] = 0;
* Re: 2.6.12 hangs on boot
2005-07-18 12:58 ` Andi Kleen
@ 2005-07-19 11:53 ` Alexander Y. Fomichev
0 siblings, 0 replies; 5+ messages in thread
From: Alexander Y. Fomichev @ 2005-07-19 11:53 UTC
To: Andi Kleen; +Cc: Linus Torvalds, Kernel Mailing List, admin, Git Mailing List
On Monday 18 July 2005 16:58, Andi Kleen wrote:
> Can you please test if this patch fixes it?
>
> -Andi
>
>
> Don't compare linux processor index with APICID
>
> Fixes boot up lockups on some machines where CPU apic ids
> don't start with 0
>
> Signed-off-by: Andi Kleen <ak@suse.de>
>
> Index: linux/arch/x86_64/kernel/smpboot.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/smpboot.c
> +++ linux/arch/x86_64/kernel/smpboot.c
> @@ -211,7 +211,7 @@ static __cpuinit void sync_master(void *
> {
> unsigned long flags, i;
>
> - if (smp_processor_id() != boot_cpu_id)
> + if (smp_processor_id() != 0)
> return;
>
> go[MASTER] = 0;
No, sorry, the same result: it hangs just after:
Booting processor 2/1 rip 6000 rsp ffff8100dff7df58
Initializing CPU#2
(hmm... judging by the line just above these [if I understand correctly],
boot_cpu_id == 0 in my case:
CPU 1: Syncing TSC to CPU 0 )
--
Best regards.
Alexander Y. Fomichev <gluk@php4.ru>
Public PGP key: http://sysadminday.org.ru/gluk.asc