public inbox for linux-kernel@vger.kernel.org
* 2.6.16-rc6-git[12] spontaneous reboots on x86_64
@ 2006-03-14 11:55 Andrew Clayton
  2006-03-14 15:30 ` Hugh Dickins
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Clayton @ 2006-03-14 11:55 UTC (permalink / raw)
  To: linux-kernel

Hi,

With the above kernels I am seeing spontaneous system reboots. Nothing
seems to get logged anywhere and when I've been at the console I haven't
noticed any oops or anything before the machine resets.

It was first triggered by accessing a USB key drive; that happened a
couple of times. Then this morning, while investigating further, it
happened again as I was exiting my X session.

The machine is an AMD Athlon(tm) 64 Processor 3500+ (Single processor,
single core), with 1GB RAM. GCC is gcc (GCC) 4.0.2 20051125 (Red Hat
4.0.2-8) from Fedora Core 4.


2.6.16-rc6 is working fine.


The following change looked like an obvious candidate:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c33d4568aca9028a22857f94f5e0850012b6444b

So I took a 2.6.16-rc6-git2 tree and reverted arch/x86_64/kernel/entry.S
to the one in 2.6.16-rc6 and so far (35 minutes) no problems.



Let me know if you'd like any more info.


Cheers,

Andrew
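
For anyone wanting to repeat the single-file revert described above, here
is a minimal sketch. It assumes a git checkout is being used, which the
message does not say (the file may equally have been copied from a
tarball). The demo builds a throwaway repository, so the tag name `good`
and the file contents are stand-ins, not the real kernel history; only the
path arch/x86_64/kernel/entry.S is taken from the report.

```shell
#!/bin/sh
# Self-contained demo: restore one file from a known-good tag while
# leaving the rest of the tree at the newer snapshot.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# Known-good state; the tag "good" stands in for 2.6.16-rc6.
mkdir -p arch/x86_64/kernel
echo "known-good entry code" > arch/x86_64/kernel/entry.S
echo "other file v1" > other.c
git add .
git commit -q -m "known-good state"
git tag good

# Later snapshot; stands in for 2.6.16-rc6-git2.
echo "suspect entry code" > arch/x86_64/kernel/entry.S
echo "other file v2" > other.c
git commit -q -a -m "later snapshot"

# Take only entry.S back to the known-good version.
git checkout good -- arch/x86_64/kernel/entry.S

cat arch/x86_64/kernel/entry.S
cat other.c
```

In a real kernel tree the equivalent step would presumably be the same
`git checkout <tag> -- <path>` form with the release tag in place of
`good`, followed by a rebuild.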




* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 11:55 2.6.16-rc6-git[12] spontaneous reboots on x86_64 Andrew Clayton
@ 2006-03-14 15:30 ` Hugh Dickins
  2006-03-14 15:39   ` Andi Kleen
  2006-03-14 16:06   ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Hugh Dickins @ 2006-03-14 15:30 UTC (permalink / raw)
  To: Andrew Clayton; +Cc: Linus Torvalds, Andrew Morton, Andi Kleen, linux-kernel

On Tue, 14 Mar 2006, Andrew Clayton wrote:
> 
> With the above kernels I am seeing spontaneous system reboots. Nothing
> seems to get logged anywhere and when I've been at the console I haven't
> noticed any oops or anything before the machine resets.
> 
> It was first triggered by accessing a USB key drive; that happened a
> couple of times. Then this morning, while investigating further, it
> happened again as I was exiting my X session.
> 
> The machine is an AMD Athlon(tm) 64 Processor 3500+ (Single processor,
> single core), with 1GB RAM. GCC is gcc (GCC) 4.0.2 20051125 (Red Hat
> 4.0.2-8) from Fedora Core 4.
> 
> 
> 2.6.16-rc6 is working fine.
> 
> 
> The following change looked like an obvious candidate:
> 
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c33d4568aca9028a22857f94f5e0850012b6444b
> 
> So I took a 2.6.16-rc6-git2 tree and reverted arch/x86_64/kernel/entry.S
> to the one in 2.6.16-rc6 and so far (35 minutes) no problems.

Yep, that one's a turkey, definitely something for Linus to revert.

Seeing your report, I gave 2.6.16-rc6-git2 a try at concurrent kernel
builds on dual HT EM64T: collapsed in all kinds of weird page table
corruption or slab corruption within minutes, three boots in a row.
Backed out that patch and it's going fine for half an hour now.

Andi, if you've a replacement patch you'd like everybody to test,
please post: I for one will surely give it a try.

Hugh


* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 15:30 ` Hugh Dickins
@ 2006-03-14 15:39   ` Andi Kleen
  2006-03-14 16:27     ` Hugh Dickins
  2006-03-14 16:06   ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2006-03-14 15:39 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Clayton, Linus Torvalds, Andrew Morton, linux-kernel

On Tuesday 14 March 2006 16:30, Hugh Dickins wrote:
> Andi, if you've a replacement patch you'd like everybody to test,
> please post: I for one will surely give it a try.

Hrm, it worked on my test machine. 

But what happens when you just revert the last hunk (the stub_execve change)?

-Andi


* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 15:30 ` Hugh Dickins
  2006-03-14 15:39   ` Andi Kleen
@ 2006-03-14 16:06   ` Linus Torvalds
  2006-03-14 16:24     ` Andrew Clayton
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2006-03-14 16:06 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Clayton, Andrew Morton, Andi Kleen, linux-kernel



On Tue, 14 Mar 2006, Hugh Dickins wrote:
>
> Yep, that one's a turkey, definitely something for Linus to revert.

Reverted. Let's get wider testing before applying an alternate fix.

		Linus


* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 16:06   ` Linus Torvalds
@ 2006-03-14 16:24     ` Andrew Clayton
  2006-03-14 18:50       ` Hugh Dickins
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Clayton @ 2006-03-14 16:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, Andi Kleen, linux-kernel

On Tue, 2006-03-14 at 08:06 -0800, Linus Torvalds wrote:
> 
> On Tue, 14 Mar 2006, Hugh Dickins wrote:
> >
> > Yep, that one's a turkey, definitely something for Linus to revert.
> 
> Reverted. Let's get wider testing before applying an alternate fix.
> 
> 		Linus


Just to note: Doing what Andi suggested seems to be working OK.

Cheers,

Andrew




* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 15:39   ` Andi Kleen
@ 2006-03-14 16:27     ` Hugh Dickins
  0 siblings, 0 replies; 9+ messages in thread
From: Hugh Dickins @ 2006-03-14 16:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Clayton, Linus Torvalds, Andrew Morton, linux-kernel

On Tue, 14 Mar 2006, Andi Kleen wrote:
> 
> But what happens when you just revert the last hunk (the stub_execve change)?

Still no good: spontaneously rebooted under load after eight minutes.

Hugh


* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 16:24     ` Andrew Clayton
@ 2006-03-14 18:50       ` Hugh Dickins
  2006-03-14 18:55         ` Andrew Clayton
  2006-03-14 21:30         ` Andrew Clayton
  0 siblings, 2 replies; 9+ messages in thread
From: Hugh Dickins @ 2006-03-14 18:50 UTC (permalink / raw)
  To: Andrew Clayton; +Cc: Linus Torvalds, Andrew Morton, Andi Kleen, linux-kernel

On Tue, 14 Mar 2006, Andrew Clayton wrote:
> On Tue, 2006-03-14 at 08:06 -0800, Linus Torvalds wrote:
> > 
> > Reverted. Let's get wider testing before applying an alternate fix.
> 
> Just to note: Doing what Andi suggested seems to be working OK.

Whereas on EM64T I found the opposite,
reverting just the stub_execve hunk still behaved badly.

I've double-checked that finding since, built and ran another
kernel to confirm it.  But your Athlon64 still works OK that way?

Just trying to clarify - I don't think we're in any rush to
settle it now that Linus has reverted the damage from his tree.

Hugh


* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 18:50       ` Hugh Dickins
@ 2006-03-14 18:55         ` Andrew Clayton
  2006-03-14 21:30         ` Andrew Clayton
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Clayton @ 2006-03-14 18:55 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, Andrew Morton, Andi Kleen, linux-kernel


On Tue, 2006-03-14 at 18:50 +0000, Hugh Dickins wrote:
> On Tue, 14 Mar 2006, Andrew Clayton wrote:
> > On Tue, 2006-03-14 at 08:06 -0800, Linus Torvalds wrote:
> > > 
> > > Reverted. Let's get wider testing before applying an alternate fix.
> > 
> > Just to note: Doing what Andi suggested seems to be working OK.
> 
> Whereas on EM64T I found the opposite,
> reverting just the stub_execve hunk still behaved badly.
> 
> I've double-checked that finding since, built and ran another
> kernel to confirm it.  But your Athlon64 still works OK that way?

Yeah, with just the stub_execve hunk reverted, everything still looks
good 3 hours later.

> Just trying to clarify - I don't think we're in any rush to
> settle it now that Linus has reverted the damage from his tree.

Sure.

> Hugh

Andrew




* Re: 2.6.16-rc6-git[12] spontaneous reboots on x86_64
  2006-03-14 18:50       ` Hugh Dickins
  2006-03-14 18:55         ` Andrew Clayton
@ 2006-03-14 21:30         ` Andrew Clayton
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Clayton @ 2006-03-14 21:30 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, Andrew Morton, Andi Kleen, linux-kernel


On Tue, 2006-03-14 at 18:50 +0000, Hugh Dickins wrote:
> On Tue, 14 Mar 2006, Andrew Clayton wrote:
> > On Tue, 2006-03-14 at 08:06 -0800, Linus Torvalds wrote:
> > > 
> > > Reverted. Let's get wider testing before applying an alternate fix.
> > 
> > Just to note: Doing what Andi suggested seems to be working OK.
> 
> Whereas on EM64T I found the opposite,
> reverting just the stub_execve hunk still behaved badly.
> 
> I've double-checked that finding since, built and ran another
> kernel to confirm it.  But your Athlon64 still works OK that way?

OK, looks like I may have spoken too soon: I just found my ssh session to
it dead and the machine no longer reachable (other machines on the same
network are). I'll be able to see for sure when I get into work in the
morning.


Andrew



