public inbox for linux-kernel@vger.kernel.org
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
       [not found] <200611111129.kABBTWgp014081@fire-2.osdl.org>
@ 2006-11-11 18:00 ` Andrew Morton
  2006-11-11 18:10   ` Arjan van de Ven
  2006-11-13  6:42   ` Neil Brown
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2006-11-11 18:00 UTC (permalink / raw)
  To: David Howells, Neil Brown
  Cc: bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex

On Sat, 11 Nov 2006 03:29:32 -0800
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=7495
> 
>            Summary: Kernel periodically hangs.
>     Kernel Version: Linux version 2.6.18.2 (root@pub) (gcc version 3.4.6)
>                     #13 SMP Fr
>             Status: NEW
>           Severity: blocking
>              Owner: other_other@kernel-bugs.osdl.org
>          Submitter: alex@hausnet.ru
> 
> 
> [42587.676000] BUG: unable to handle kernel NULL pointer dereference at 
> virtual address 0000003c
> [42587.680000]  printing eip:
> [42587.680000] 781610e7
> [42587.680000] *pde = 00000000
> [42587.680000] Oops: 0000 [#1]
> [42587.684000] SMP
> [42587.684000] Modules linked in: sata_promise sk98lin 8250_pnp 8250 
> i2c_nforce2 ehci_hcd serial_core sata_nv ahci i2c_core ohci_hcd forcedeth 
> libata
> [42587.688000] CPU:    1
> [42587.688000] EIP:    0060:[<781610e7>]    Not tainted VLI
> [42587.688000] EFLAGS: 00010286   (2.6.18.2 #13)
> [42587.692000] EIP is at clear_inode+0x96/0xce
> [42587.692000] eax: 00000000   ebx: c0102240   ecx: f7f278d4   edx: f510d400
> [42587.692000] esi: c0102384   edi: f7e6dec0   ebp: 00000070   esp: f7e6de98
> [42587.696000] ds: 007b   es: 007b   ss: 0068
> [42587.696000] Process kswapd0 (pid: 230, ti=f7e6c000 task=f7c03560 
> task.ti=f7e6c000)
> [42587.696000] Stack: c0102248 c0102240 7816116a da7b4af0 da7b4af8 00000000 
> 00000080 781614a2
> [42587.700000]        00000080 00000080 c01023f8 ef78dca8 00000000 00009858 
> 00000083 f7fee560
> [42587.700000]        781614c8 7813a643 00261600 00000000 00009858 00000005 
> 00000000 00000000
> [42587.700000] Call Trace:
> [42587.704000]  [<7816116a>] dispose_list+0x4b/0xc1
> [42587.708000]  [<781614a2>] prune_icache+0x17c/0x18e
> [42587.708000]  [<781614c8>] shrink_icache_memory+0x14/0x2b
> [42587.708000]  [<7813a643>] shrink_slab+0x130/0x18c
> [42587.712000]  [<7813b75a>] balance_pgdat+0x1ea/0x2dd
> [42587.712000]  [<7813b933>] kswapd+0xe6/0xe8
> [42587.716000]  [<781261dc>] kthread+0x7d/0xa1
> [42587.716000]  [<78100e05>] kernel_thread_helper+0x5/0xb
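
A minimal sketch, assuming the oops text above is available as a plain string, of pulling the call-trace frames out of such a report so that several reports can be compared. The names `FRAME_RE` and `parse_call_trace` are hypothetical helpers for illustration, not part of any kernel tooling.

```python
import re

# Matches frames such as "[<7816116a>] dispose_list+0x4b/0xc1".
# The "EIP: 0060:[<...>]" line is deliberately not matched: it has no symbol+offset.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_call_trace(text):
    """Return (address, symbol, offset, size) tuples for each frame found."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            addr, sym, off, size = m.groups()
            frames.append((int(addr, 16), sym, int(off, 16), int(size, 16)))
    return frames


# Three frames copied from the trace in the report.
sample = """\
[42587.704000]  [<7816116a>] dispose_list+0x4b/0xc1
[42587.708000]  [<781614a2>] prune_icache+0x17c/0x18e
[42587.708000]  [<781614c8>] shrink_icache_memory+0x14/0x2b
"""

for addr, sym, off, size in parse_call_trace(sample):
    print(f"{sym} at {addr:#x} (+{off:#x} of {size:#x})")
```

Comparing the symbol lists from different reports is one way to check whether they really share the clear_inode/dispose_list path seen here.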

I've seen three or four reports of oopses like this in 2.6.18.  I have a
suspicion we broke something.


> Kernel started with noapic option, cause it hangs on load without this option.

Him and a million other people.  I know we broke APIC.  Around 2.6.9, I
think.



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:00 ` [Bugme-new] [Bug 7495] New: Kernel periodically hangs Andrew Morton
@ 2006-11-11 18:10   ` Arjan van de Ven
  2006-11-11 18:19     ` Andrew Morton
  2006-11-13  6:42   ` Neil Brown
  1 sibling, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-11 18:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org,
	linux-kernel, alex

> > Kernel started with noapic option, cause it hangs on load without this option.
> 
> Him and a million other people.  I know we broke APIC.  Around 2.6.9, I
> think.


is that when the "enable apic even on UP so that distro kernels can
install on the ibm x44*" patches went in?

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:10   ` Arjan van de Ven
@ 2006-11-11 18:19     ` Andrew Morton
  2006-11-12 11:50       ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-11-11 18:19 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org,
	linux-kernel, alex

On Sat, 11 Nov 2006 19:10:03 +0100
Arjan van de Ven <arjan@infradead.org> wrote:

> > > Kernel started with noapic option, cause it hangs on load without this option.
> > 
> > Him and a million other people.  I know we broke APIC.  Around 2.6.9, I
> > think.
> 
> 
> is that when the "enable apic even on UP so that distro kernels can
> install on the ibm x44*" patches went in?
> 

I don't know.  In fact I forget how I worked out that it worsened in
2.6.early.

google(noapic) gets 232,000 hits.

I don't think it really matters when or why it happened.  If we take the
approach of fixing one machine at a time, we'll only need to fix a few
individual machines to improve the situation for a lot of people.



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:19     ` Andrew Morton
@ 2006-11-12 11:50       ` Arjan van de Ven
  2006-11-12 12:53         ` Adrian Bunk
  0 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 11:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org,
	linux-kernel, alex


> I don't know.  In fact I forget how I worked out that it worsened in
> 2.6.early.
> 
> google(noapic) gets 232,000 hits.

is there a way to ask google "only stuff in the last year"?
Asking because "noapic" in 2.4 was the standard "try this" answer when
people had a bios that had busted MPS (but good ACPI)...


> I don't think it really matters when or why it happened. 

well to some degree it does; if it's one patch causing it narrowing it
down at least somewhat in time would help ;)

>  If we take the
> approach of fixing one machine at a time, we'll only need to fix a few
> individual machines to improve the situation for a lot of people.

alternative is that more new machines showed up that need it somehow, eg
not really a regression just something else. Different approach is
needed for hunting that down. But to be realistic we need to narrow
things down a bit, which means

1) Only care about SMP machines. APIC on true UP (no
Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft
doesn't use it) and is just too likely to trip up SMM and other bad BIOS
stuff. 
 * exception is probably people who don't WANT to use apic but where it
somehow gets used anyway; if that happens we probably have the magic
bullet that causes the regression :)
2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for
this, but most vendors hardly maintain those anymore at all and they are
generally just /dev/random nowadays
3) Ignore overclocking; if you overclock using the FSB the apic busses
run out of spec as well; can be a huge timewaster in debug time.
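
The three narrowing criteria above can be sketched as a small filter over incoming reports. This is only an illustration: the `Report` fields and the `worth_chasing` function are hypothetical, and how the UP exception interacts with points 2 and 3 is not spelled out above, so this sketch simply applies those two points to every report.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # All field names are hypothetical, chosen for illustration only.
    smp: bool                    # SMP, including Hyperthreading / dual core
    acpi_enabled: bool           # kernel uses ACPI rather than MPS tables
    overclocked: bool            # FSB overclocking runs the APIC bus out of spec
    apic_requested: bool = True  # False: user did not want APIC but it was used anyway


def worth_chasing(r: Report) -> bool:
    """Apply the three criteria; True means the report is worth debugging."""
    if r.overclocked:       # point 3: ignore overclocked machines
        return False
    if not r.acpi_enabled:  # point 2: only ACPI-using kernels
        return False
    if not r.smp:           # point 1, with its exception for unwanted APIC use
        return not r.apic_requested
    return True


print(worth_chasing(Report(smp=True, acpi_enabled=True, overclocked=False)))
print(worth_chasing(Report(smp=False, acpi_enabled=True, overclocked=False)))
```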



-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 11:50       ` Arjan van de Ven
@ 2006-11-12 12:53         ` Adrian Bunk
  2006-11-12 13:16           ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Adrian Bunk @ 2006-11-12 12:53 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, Nov 12, 2006 at 12:50:37PM +0100, Arjan van de Ven wrote:
> 
> > I don't know.  In fact I forget how I worked out that it worsened in
> > 2.6.early.
> > 
> > google(noapic) gets 232,000 hits.
> 
> is there a way to ask google "only stuff in the last year"?
> Asking because "noapic" in 2.4 was the standard "try this" answer when
> people had a bios that had busted MPS (but good ACPI)...

Some APIC-related bugs in the kernel Bugzilla that have been reported or 
confirmed during the last 12 months (I only looked at "apic" in the 
subject, there might be more related bugs in the Bugzilla):

#5038 Fast running system clock with IO-APIC enabled
#5303 AMD64 Erratum: Should not enable C2 when using APIC
#5565 Guess of i386 APIC PTE area scribble
#6404 APIC error on CPU0: 40(40)
#6748 Clock drifts by 30% for SMP kernel w/APIC
#6859 Linux kernel won't work without "nolapic" passed
#6890 Kernel boot freezes when APIC is enabled & SATA is used

> > I don't think it really matters when or why it happened. 
> 
> well to some degree it does; if it's one patch causing it narrowing it
> down at least somewhat in time would help ;)
> 
> >  If we take the
> > approach of fixing one machine at a time, we'll only need to fix a few
> > individual machines to improve the situation for a lot of people.
> 
> alternative is that more new machines showed up that need it somehow, eg
> not really a regression just something else. Different approach is
> needed for hunting that down. But to be realistic we need to narrow
> things down a bit, which means
> 
> 1) Only care about SMP machines. APIC on true UP (no
> Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft
> doesn't use it) and is just too likely to trip up SMM and other bad BIOS
> stuff. 
>  * exception is probably people who don't WANT to use apic but where it
> somehow gets used anyway; if that happens we probably have the magic
> bullet that causes the regression :)

On i386, it's a kernel configuration option.

On x86_64, the APIC is currently always enabled even when configuring a 
UP kernel.

> 2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for
> this, but most vendors hardly maintain those anymore at all and they are
> generally just /dev/random nowadays

What about non-ACPI SMP?

> 3) Ignore overclocking; if you overclock using the FSB the apic busses
> run out of spec as well; can be a huge timewaster in debug time.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 12:53         ` Adrian Bunk
@ 2006-11-12 13:16           ` Arjan van de Ven
  2006-11-12 13:37             ` Adrian Bunk
  2006-11-12 19:18             ` Ingo Oeser
  0 siblings, 2 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 13:16 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo


> Some APIC-related bugs in the kernel Bugzilla that have been reported or 
> confirmed during the last 12 months (I only looked at "apic" in the 
> subject, there might be more related bugs in the Bugzilla):
> 
> #5038 Fast running system clock with IO-APIC enabled

This is a UP machine. NotInteresting(tm) wrt APIC.

> #5303 AMD64 Erratum: Should not enable C2 when using APIC

This is clearly not a linux issue but a hardware bug, as the title says

> #5565 Guess of i386 APIC PTE area scribble
this is only on one machine and a "special case"; not ruling out
anything fundamental but..

> #6404 APIC error on CPU0: 40(40)

This bug is a mess though; many different people seeing a symptom of an
apic error, and all jumping in assuming they see the same problem...
Also it's afaik only a message and not (yet) fatal in any way.
Sometimes apics do this a few times a day, esp when things are getting
hot in the box. Afaik there is then just a resend of the message and
nothing is lost.

> #6748 Clock drifts by 30% for SMP kernel w/APIC

this looks like a totally weird hardware case that probably just wants
to be blacklisted.

> #6859 Linux kernel won't work without "nolapic" passed
weird one, probably a bios issue but it's the opposite of "noapic", and
also this is about local apic not about ioapic. Although they share 4
letters they're entirely different animals.

> #6890 Kernel boot freezes when APIC is enabled & SATA is used

seems to be UP as well but asked for confirmation in the bug (lack of
lots of information here!). 

If this isn't UP this could be the first real case of "noapic" in your
entire list...... which isn't too useful. 
Maybe we need to get more/any people who see "need noapic on SMP" to
file a bug (and provide a reasonable amount of info)
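
Since "noapic" (IO-APIC) and "nolapic" (local APIC) are easy to mix up in reports, a minimal sketch of telling them apart on a kernel command line may help when sorting bugs. The function name `apic_flags` is hypothetical; the sample command line is made up for illustration.

```python
def apic_flags(cmdline: str) -> dict:
    """Report which APIC-disabling options appear on a kernel command line."""
    params = cmdline.split()
    return {
        "noapic": "noapic" in params,    # do not use the IO-APIC
        "nolapic": "nolapic" in params,  # do not use the local APIC
    }


# On a running system the command line can be read from /proc/cmdline;
# a literal string is used here so the sketch runs anywhere.
print(apic_flags("root=/dev/sda1 ro noapic"))
```

Matching whole whitespace-separated parameters, rather than substrings, keeps "nolapic" from being counted as "noapic".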

> > 
> > 1) Only care about SMP machines. APIC on true UP (no
> > Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft
> > doesn't use it) and is just too likely to trip up SMM and other bad BIOS
> > stuff. 
> >  * exception is probably people who don't WANT to use apic but where it
> > somehow gets used anyway; if that happens we probably have the magic
> > bullet that causes the regression :)
> 
> On i386, it's a kernel configuration option.

yes but it's generally a bad idea to set it; it only works on some
machines. (and it can't be fixed)
> 
> On x86_64, the APIC is currently always enabled even when configuring a 
> UP kernel.

I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS
cause it to be turned off automatic most of the time.

> 
> > 2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for
> > this, but most vendors hardly maintain those anymore at all and they are
> > generally just /dev/random nowadays
> 
> What about non-ACPI SMP?

if the machine is new enough to run ACPI I don't care about the non-ACPI
case; just enable it. Really. On newish machines (and that is 7 years
old or newer) MPS tables are NOT getting much if any attention by the
bios guys. So Linux should use ACPI, and if you deliberately disable
ACPI and THEN hit a problem to a large degree you asked for the problem
in the first place.

Older machines, different story.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 13:16           ` Arjan van de Ven
@ 2006-11-12 13:37             ` Adrian Bunk
  2006-11-12 13:57               ` Arjan van de Ven
  2006-11-12 19:18             ` Ingo Oeser
  1 sibling, 1 reply; 22+ messages in thread
From: Adrian Bunk @ 2006-11-12 13:37 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, Nov 12, 2006 at 02:16:16PM +0100, Arjan van de Ven wrote:
> 
> > Some APIC-related bugs in the kernel Bugzilla that have been reported or 
> > confirmed during the last 12 months (I only looked at "apic" in the 
> > subject, there might be more related bugs in the Bugzilla):
> > 
> > #5038 Fast running system clock with IO-APIC enabled
> 
> This is a UP machine. NotInteresting(tm) wrt APIC.
>... 

Currently it's a supported configuration.

We must either handle such cases or explicitly disable the APIC on all 
UP machines (BTW: Is there any way to handle this when installing a 
distribution kernel with CONFIG_HOTPLUG_CPU=y on a UP machine?).

> > > 1) Only care about SMP machines. APIC on true UP (no
> > > Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft
> > > doesn't use it) and is just too likely to trip up SMM and other bad BIOS
> > > stuff. 
> > >  * exception is probably people who don't WANT to use apic but where it
> > > somehow gets used anyway; if that happens we probably have the magic
> > > bullet that causes the regression :)
> > 
> > On i386, it's a kernel configuration option.
> 
> yes but it's generally a bad idea to set it; it only works on some
> machines. (and it can't be fixed)
> > 
> > On x86_64, the APIC is currently always enabled even when configuring a 
> > UP kernel.
> 
> I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS
> cause it to be turned off automatic most of the time.

I'd doubt the latter. Even on my cheap Asus board running an i386
AMD Athlon XP with 1.8 GHz the APIC is both used and working without any
problems.

> > > 2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for
> > > this, but most vendors hardly maintain those anymore at all and they are
> > > generally just /dev/random nowadays
> > 
> > What about non-ACPI SMP?
> 
> if the machine is new enough to run ACPI I don't care about the non-ACPI
> case; just enable it. Really. On newish machines (and that is 7 years
> old or newer) MPS tables are NOT getting much if any attention by the
> bios guys. So Linux should use ACPI, and if you deliberately disable
> ACPI and THEN hit a problem to a large degree you asked for the problem
> in the first place.
> 
> Older machines, different story.

My point was regarding the latter ones...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 13:37             ` Adrian Bunk
@ 2006-11-12 13:57               ` Arjan van de Ven
  2006-11-12 14:10                 ` Adrian Bunk
  0 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 13:57 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, 2006-11-12 at 14:37 +0100, Adrian Bunk wrote:
> On Sun, Nov 12, 2006 at 02:16:16PM +0100, Arjan van de Ven wrote:
> > 
> > > Some APIC-related bugs in the kernel Bugzilla that have been reported or 
> > > confirmed during the last 12 months (I only looked at "apic" in the 
> > > subject, there might be more related bugs in the Bugzilla):
> > > 
> > > #5038 Fast running system clock with IO-APIC enabled
> > 
> > This is a UP machine. NotInteresting(tm) wrt APIC.
> >... 
> 
> Currently it's a supported configuration.

define "supported"; we have code to try it and it's great if it works.
But if it doesn't... you're out of luck.

We KNOW it can't work on a sizable amount of machines.  This is why it
is a config option; you can enable it if YOUR machine is KNOWN to work,
and you get some gains. But it's also understood that often it won't
work. So any sensible distro (since they have to aim for a wide
audience) disables this option ...

> 
> We must either handle such cases or explicitly disable the APIC on all 
> UP machines 

that'd be the same as setting the config option off...
> > I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS
> > cause it to be turned off automatic most of the time.
> 
> I'd doubt the latter. Even on my cheap Asus board running an i386
> AMD Athlon XP with 1.8 GHz the APIC is both used and working without any
> problems.

"it works on my one machine so it works for everyone". That's simply not
true. We KNOW it can't work everywhere on UP, especially on i386. SMM
assumptions; people gluing the apic pins to the reset line, we've seen
it all. 
That it works for you is great. But that doesn't mean it automatically
works for everyone.



-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 13:57               ` Arjan van de Ven
@ 2006-11-12 14:10                 ` Adrian Bunk
  2006-11-12 14:16                   ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Adrian Bunk @ 2006-11-12 14:10 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, Nov 12, 2006 at 02:57:48PM +0100, Arjan van de Ven wrote:
> On Sun, 2006-11-12 at 14:37 +0100, Adrian Bunk wrote:
> > On Sun, Nov 12, 2006 at 02:16:16PM +0100, Arjan van de Ven wrote:
> > > 
> > > > Some APIC-related bugs in the kernel Bugzilla that have been reported or 
> > > > confirmed during the last 12 months (I only looked at "apic" in the 
> > > > subject, there might be more related bugs in the Bugzilla):
> > > > 
> > > > #5038 Fast running system clock with IO-APIC enabled
> > > 
> > > This is a UP machine. NotInteresting(tm) wrt APIC.
> > >... 
> > 
> > Currently it's a supported configuration.
> 
> define "supported"; we have code to try it and it's great if it works.
> But if it doesn't... you're out of luck.
> 
> We KNOW it can't work on a sizable amount of machines.  This is why it
> is a config option; you can enable it if YOUR machine is KNOWN to work,
> and you get some gains. But it's also understood that often it won't
> work. So any sensible distro (since they have to aim for a wide
> audience) disables this option ...

Nowadays, many distributions only ship CONFIG_SMP=y kernels...

> > We must either handle such cases or explicitly disable the APIC on all 
> > UP machines 
> 
> that'd be the same as setting the config option off...

Except for the common case of CONFIG_SMP=y kernels on UP machines...

> > > I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS
> > > cause it to be turned off automatic most of the time.
> > 
> > I'd doubt the latter. Even on my cheap Asus board running an i386
> > AMD Athlon XP with 1.8 GHz the APIC is both used and working without any
> > problems.
> 
> "it works on my one machine so it works for everyone". That's simply not
> true. We KNOW it can't work everywhere on UP, especially on i386. SMM
> assumptions; people gluing the apic pins to the reset line, we've seen
> it all. 
> That it works for you is great. But that doesn't mean it automatically
> works for everyone.

You miss my point.

You said you'd suspect it to be turned off automatic most of the time, 
and that's the point I think you might be wrong at.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 14:10                 ` Adrian Bunk
@ 2006-11-12 14:16                   ` Arjan van de Ven
  2006-11-12 15:21                     ` Adrian Bunk
  2006-11-12 21:45                     ` Dave Jones
  0 siblings, 2 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 14:16 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo


> > We KNOW it can't work on a sizable amount of machines.  This is why it
> > is a config option; you can enable it if YOUR machine is KNOWN to work,
> > and you get some gains. But it's also understood that often it won't
> > work. So any sensible distro (since they have to aim for a wide
> > audience) disables this option ...
> 
> Nowadays, many distributions only ship CONFIG_SMP=y kernels...

that's a calculated risk on their side (and they know that); they're
balancing not functioning on a set of machines off against needing more
kernels.


> You miss my point.
> 
> You said you'd suspect it to be turned off automatic most of the time, 
> and that's the point I think you might be wrong at.

it won't be turned off on machines that support dual core processors
etc, since those DO get validated and designed for APIC use.. even if
you only stick a single core processor in. So yes you're right, that
nowadays is a pretty large group. But it's the safe group I guess:)

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 14:16                   ` Arjan van de Ven
@ 2006-11-12 15:21                     ` Adrian Bunk
  2006-11-12 15:50                       ` Arjan van de Ven
  2006-11-12 15:59                       ` Patrick McFarland
  2006-11-12 21:45                     ` Dave Jones
  1 sibling, 2 replies; 22+ messages in thread
From: Adrian Bunk @ 2006-11-12 15:21 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote:
> 
> > > We KNOW it can't work on a sizable amount of machines.  This is why it
> > > is a config option; you can enable it if YOUR machine is KNOWN to work,
> > > and you get some gains. But it's also understood that often it won't
> > > work. So any sensible distro (since they have to aim for a wide
> > > audience) disables this option ...
> > 
> > Nowadays, many distributions only ship CONFIG_SMP=y kernels...
> 
> that's a calculated risk on their side (and they know that); they're
> balancing not functioning on a set of machines off against needing more
> kernels.

This might soon affect the majority of Linux users, so it's a case that 
has to be handled...

> > You miss my point.
> > 
> > You said you'd suspect it to be turned off automatic most of the time, 
> > and that's the point I think you might be wrong at.
> 
> it won't be turned off on machines that support dual core processors
> etc, since those DO get validated and designed for APIC use.. even if
> you only stick a single core processor in. So yes you're right, that
> nowadays is a pretty large group. But it's the safe group I guess:)

But if APIC is even used on my more than 1 year old 40 Euro Socket A 
board (AFAIK there have never been dual core Socket A processors, there 
were no Socket A hyperthreading CPUs, it's not an SMP board, and the
VIA KT600 is not an SMP chipset) it's not in what you call "safe group",
and I don't see any reason why my board should behave different in this 
respect from all of the millions of other UP Socket A boards.

Googling shows that it could be that your claim "APIC on true UP (no 
Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft 
doesn't use it)" earlier in this thread was wrong. Looking at e.g. [1], 
it seems Windows does use the APIC even on UP.

cu
Adrian

[1] http://www.microsoft.com/whdc/system/sysperf/IO-APIC.mspx

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 15:21                     ` Adrian Bunk
@ 2006-11-12 15:50                       ` Arjan van de Ven
  2006-11-12 15:59                       ` Patrick McFarland
  1 sibling, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 15:50 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo


> But if APIC is even used on my more than 1 year old 40 Euro Socket A 

one swallow does not a summer make.


now can we get constructive again. If you find a real case where noapic
is needed on an SMP machine, preferably one where it wasn't needed
earlier in 2.6, let us know; it's worthwhile to chase those down
since we know it's a decent use case and it's not flaky hardware.




* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 15:21                     ` Adrian Bunk
  2006-11-12 15:50                       ` Arjan van de Ven
@ 2006-11-12 15:59                       ` Patrick McFarland
  2006-11-12 16:07                         ` Arjan van de Ven
  2006-11-12 16:47                         ` Adrian Bunk
  1 sibling, 2 replies; 22+ messages in thread
From: Patrick McFarland @ 2006-11-12 15:59 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Arjan van de Ven, Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sunday 12 November 2006 10:21, Adrian Bunk wrote:
> On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote:
> > > > We KNOW it can't work on a sizable amount of machines.  This is why
> > > > it is a config option; you can enable it if YOUR machine is KNOWN to
> > > > work, and you get some gains. But it's also understood that often
> > > > it won't work. So any sensible distro (since they have to aim for a
> > > > wide audience) disables this option ...
> > >
> > > Nowadays, many distributions only ship CONFIG_SMP=y kernels...
> >
> > that's a calculated risk on their side (and they know that); they're
> > balancing not functioning on a set of machines off against needing more
> > kernels.
>
> This might soon affect the majority of Linux users, so it's a case that
> has to be handled...

I actually agree here. Linux needs to be easier for people to use, not harder. 
Isn't there a way for bootloaders or the kernel early on to figure out if the 
machine supports SMP, and if it doesn't, load a uniproc kernel instead?

> > > You miss my point.
> > >
> > > You said you'd suspect it to be turned off automatic most of the time,
> > > and that's the point I think you might be wrong at.
> >
> > it won't be turned off on machines that support dual core processors
> > etc, since those DO get validated and designed for APIC use.. even if
> > you only stick a single core processor in. So yes you're right, that
> > nowadays is a pretty large group. But it's the safe group I guess:)
>
> But if APIC is even used on my more than 1 year old 40 Euro Socket A
> board (AFAIK there have never been dual core Socket A processors, there
> were no Socket A hyperthreading CPUs, it's not an SMP board, and the
> VIA KT600 is not an SMP chipset) it's not in what you call "safe group",
> and I don't see any reason why my board should behave different in this
> respect from all of the millions of other UP Socket A boards.
>
> Googling shows that it could be that your claim "APIC on true UP (no
> Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft
> doesn't use it)" earlier in this thread was wrong. Looking at e.g. [1],
> it seems Windows does use the APIC even on UP.

Socket A CPUs are also ungodly common. They're as common as slot 1/socket 370 
Pentium 3s, and, at least with my old P3 board, trying to use APIC on UP 
caused lockups. My Duron 1ghz laptop also does the same thing. (Booting 
either with noapic fixes it).

So yeah, if distros make stupid choices like these, then we're pretty screwed.

> cu
> Adrian
>
> [1] http://www.microsoft.com/whdc/system/sysperf/IO-APIC.mspx

-- 
Patrick McFarland || http://AdTerrasPerAspera.com
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 15:59                       ` Patrick McFarland
@ 2006-11-12 16:07                         ` Arjan van de Ven
  2006-11-12 16:47                         ` Adrian Bunk
  1 sibling, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 16:07 UTC (permalink / raw)
  To: Patrick McFarland
  Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, 2006-11-12 at 10:59 -0500, Patrick McFarland wrote:
> On Sunday 12 November 2006 10:21, Adrian Bunk wrote:
> > On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote:
> > > > > We KNOW it can't work on a sizable amount of machines.  This is why
> > > > > it is a config option; you can enable it if YOUR machine is KNOWN to
> > > > > > work, and you get some gains. But it's also understood that often
> > > > > it won't work. So any sensible distro (since they have to aim for a
> > > > > wide audience) disables this option ...
> > > >
> > > > Nowadays, many distributions only ship CONFIG_SMP=y kernels...
> > >
> > > that's a calculated risk on their side (and they know that); they're
> > > balancing not functioning on a set of machines off against needing more
> > > kernels.
> >
> > This might soon affect the majority of Linux users, so it's a case that
> > has to be handled...
> 
> I actually agree here. Linux needs to be easier for people to use, not harder. 
> Isn't there a way for bootloaders or the kernel to figure out early on if the 
> machine supports SMP, and if it doesn't, load a uniprocessor kernel instead?

this is what OS installers have been doing for a decade or so.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 15:59                       ` Patrick McFarland
  2006-11-12 16:07                         ` Arjan van de Ven
@ 2006-11-12 16:47                         ` Adrian Bunk
  1 sibling, 0 replies; 22+ messages in thread
From: Adrian Bunk @ 2006-11-12 16:47 UTC (permalink / raw)
  To: Patrick McFarland
  Cc: Arjan van de Ven, Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, Nov 12, 2006 at 10:59:55AM -0500, Patrick McFarland wrote:
>...
> Socket A CPUs are also ungodly common. They're as common as slot 1/socket 370 
> Pentium 3s, and, at least with my old P3 board, trying to use APIC on UP 
> caused lockups. My Duron 1ghz laptop also does the same thing. (Booting 
> either with noapic fixes it).
>...

It might depend on the age of your computer.

Microsoft mandates the presence of an APIC implemented per MADT and all 
hardware interrupts connected to an IOAPIC for all servers and desktops 
with a "Designed for Windows XP" sticker.

This implies more or less that a working APIC is present in all
non-laptop x86 UP systems manufactured during the last 5 years.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 13:16           ` Arjan van de Ven
  2006-11-12 13:37             ` Adrian Bunk
@ 2006-11-12 19:18             ` Ingo Oeser
  2006-11-12 19:34               ` Andrew Morton
  2006-11-12 20:32               ` Arjan van de Ven
  1 sibling, 2 replies; 22+ messages in thread
From: Ingo Oeser @ 2006-11-12 19:18 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

Hi there,

On Sunday, 12. November 2006 14:16, Arjan van de Ven wrote:
> If this isn't UP this could be the first real case of "noapic" in your
> entire list...... which isn't too useful. 
> Maybe we need to get more/any people who see "need noapic on SMP" to
> file a bug (and provide a reasonable amount of info)

I have needed noapic forever (5 years!) to get my USB controller running.
Without noapic it doesn't get any interrupts for some reason.

If now is the time to fix those bugs, I would be happy to try a new kernel
and get you the dmesg + result of plugging in a USB mass storage device
and reading from it on a DAILY basis.

If you need anything else to resolve the issue, I would be happy to help 
out here.

Maybe a pattern can be detected that could help others.
If you would like to blacklist this machine by DMI, that would also
help me.

Many Thanks!

Best Regards

Ingo Oeser


* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 19:18             ` Ingo Oeser
@ 2006-11-12 19:34               ` Andrew Morton
  2006-11-12 20:32               ` Arjan van de Ven
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2006-11-12 19:34 UTC (permalink / raw)
  To: Ingo Oeser
  Cc: Arjan van de Ven, Adrian Bunk, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, 12 Nov 2006 20:18:51 +0100
Ingo Oeser <ioe-lkml@rameria.de> wrote:

> Hi there,
> 
> On Sunday, 12. November 2006 14:16, Arjan van de Ven wrote:
> > If this isn't UP this could be the first real case of "noapic" in your
> > entire list...... which isn't too useful. 
> > Maybe we need to get more/any people who see "need noapic on SMP" to
> > file a bug (and provide a reasonable amount of info)
> 
> I have needed noapic forever (5 years!) to get my USB controller running.
> Without noapic it doesn't get any interrupts for some reason.
> 
> If now is the time to fix those bugs, I would be happy to try a new kernel
> and get you the dmesg + result of plugging in a USB mass storage device
> and reading from it on a DAILY basis.

Yes, please send those.  It'd be best to get the info into bugzilla too -
this doesn't look like a quick-fix scenario.




* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 19:18             ` Ingo Oeser
  2006-11-12 19:34               ` Andrew Morton
@ 2006-11-12 20:32               ` Arjan van de Ven
  1 sibling, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 20:32 UTC (permalink / raw)
  To: Ingo Oeser
  Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, 2006-11-12 at 20:18 +0100, Ingo Oeser wrote:
> Hi there,
> 
> On Sunday, 12. November 2006 14:16, Arjan van de Ven wrote:
> > If this isn't UP this could be the first real case of "noapic" in your
> > entire list...... which isn't too useful. 
> > Maybe we need to get more/any people who see "need noapic on SMP" to
> > file a bug (and provide a reasonable amount of info)
> 
> I have needed noapic forever (5 years!) to get my USB controller running.
> Without noapic it doesn't get any interrupts for some reason.

So it never worked? (That's important to know, as opposed to it being a regression.)

Also does this machine use ACPI for interrupt routing?
That's also important, because if you're NOT using ACPI, "noapic" means
that you're using the PIRQ for irq routing and not MPS, so you're not
"just" changing apic behavior, you're actually using a different BIOS
table. (and to be honest, a buggy bios table is more likely the
cause ... ;)
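
To make the distinction above concrete, here is a minimal sketch of which
BIOS table ends up describing interrupt routing. The enum and function names
are hypothetical and are not kernel code; the only inputs are the two
conditions named above (ACPI routing in use or not, "noapic" given or not).

```c
#include <stdbool.h>

/* Hypothetical labels for the table that describes interrupt routing. */
enum mock_irq_table {
	MOCK_TABLE_ACPI,	/* ACPI supplies the routing */
	MOCK_TABLE_MPS,		/* MP spec table, used with the IO-APIC */
	MOCK_TABLE_PIRQ		/* PIRQ table, used when the IO-APIC is off */
};

/* Pick the routing table for a given boot configuration (illustrative). */
static enum mock_irq_table mock_routing_source(bool acpi_routing, bool noapic)
{
	if (acpi_routing)
		return MOCK_TABLE_ACPI;
	/* Without ACPI, "noapic" swaps the BIOS table, not just APIC behavior. */
	return noapic ? MOCK_TABLE_PIRQ : MOCK_TABLE_MPS;
}
```

So on a non-ACPI boot, a machine that only works with "noapic" may simply
have a bad MPS table rather than a bad APIC.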



-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 14:16                   ` Arjan van de Ven
  2006-11-12 15:21                     ` Adrian Bunk
@ 2006-11-12 21:45                     ` Dave Jones
  2006-11-13  2:07                       ` Andi Kleen
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Jones @ 2006-11-12 21:45 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
	bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo, ak

On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote:
 > 
 > > > We KNOW it can't work on a sizable amount of machines.  This is why it
 > > > is a config option; you can enable it if YOUR machine is KNOWN to work,
 > > > and you get some gains. But it's also understood that often it won't
 > > > work. So any sensible distro (since they have to aim for a wide
 > > > audience) disables this option ...
 > > 
 > > Nowadays, many distributions only ship CONFIG_SMP=y kernels...
 > 
 > that's a calculated risk on their side (and they know that); they're
 > balancing not functioning on a set of machines off against needing more
 > kernels.

Andi has a nice patch in the suse kernel which adds heuristics to disable
apic on systems where it isn't likely to work.  It DTRT in at least
one problem case that I know of.   The actual fall-out from enabling
'run SMP kernels on UP i686' for FC6 has mostly been a non-event.
Literally a handful of cases, that will likely all get caught and worked
around by Andi's patch or similar.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 21:45                     ` Dave Jones
@ 2006-11-13  2:07                       ` Andi Kleen
  0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2006-11-13  2:07 UTC (permalink / raw)
  To: Dave Jones
  Cc: Arjan van de Ven, Adrian Bunk, Andrew Morton, David Howells,
	Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex,
	mingo


> Andi has a nice patch in the suse kernel which adds heuristics to disable
> apic on systems where it isn't likely to work.  It DTRT in at least
> one problem case that I know of.   The actual fall-out from enabling
> 'run SMP kernels on UP i686' for FC6 has mostly been a non-event.
> Literally a handful of cases, that will likely all get caught and worked
> around by Andi's patch or similar.

I haven't pushed that recently because I was busy with other things, but
yes, it needs to be revisited.

One broken case that still happens is that the patch assumes a working
SMBIOS. When there is no year in SMBIOS it will turn off APIC because
it assumes it is a very old system. But sometimes new systems that would
like APIC have an invalid or broken SMBIOS year. On very new systems this
isn't a problem, because those tend to have multiple cores.

That could probably be a bit more clever. It's always difficult to
navigate around all kinds of BIOS bugs.
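
As a rough illustration of the failure mode described above (not the actual
patch), the decision could be modelled as below. The function name, the
"cores" shortcut and the cutoff_year parameter are all assumptions made for
this sketch; the real heuristic and its cutoff are not given in this mail.

```c
#include <stdbool.h>

/*
 * Illustrative model only: decide whether to leave the APIC enabled.
 * smbios_year <= 0 stands for "no usable year in SMBIOS".
 * cutoff_year is a hypothetical threshold supplied by the caller.
 */
static bool mock_want_apic(int smbios_year, int cores, int cutoff_year)
{
	if (cores > 1)
		return true;	/* multi-core systems are not affected */
	if (smbios_year <= 0)
		return false;	/* missing year is treated as "very old" */
	return smbios_year >= cutoff_year;
}
```

The broken case is the middle branch: a recent single-core board with a
missing or invalid SMBIOS year gets its APIC turned off even though it would
work.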

-Andi


* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:00 ` [Bugme-new] [Bug 7495] New: Kernel periodically hangs Andrew Morton
  2006-11-11 18:10   ` Arjan van de Ven
@ 2006-11-13  6:42   ` Neil Brown
  2006-11-13 11:22     ` David Howells
  1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2006-11-13  6:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, bugme-daemon@kernel-bugs.osdl.org, linux-kernel,
	alex

On Saturday November 11, akpm@osdl.org wrote:
> On Sat, 11 Nov 2006 03:29:32 -0800
> bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=7495
> > 
> >            Summary: Kernel periodically hangs.
> >     Kernel Version: Linux version 2.6.18.2 (root@pub) (gcc version 3.4.6)
> >                     #13 SMP Fr
> >             Status: NEW
> >           Severity: blocking
> >              Owner: other_other@kernel-bugs.osdl.org
> >          Submitter: alex@hausnet.ru

So getting back to the main issue in this bug report.....


> > 
> > 
> > [42587.676000] BUG: unable to handle kernel NULL pointer dereference at 
> > virtual address 0000003c

it would appear that in:
	if (inode->i_sb && inode->i_sb->s_op->clear_inode)
		inode->i_sb->s_op->clear_inode(inode);

inode->i_sb->s_op is NULL.  This is unfortunate :-)
alloc_super initialises s_op to '&default_op' and it isn't cleared on
unmount, so the implication seems to be that i_sb has been freed and
the memory has been reused.  This tends to suggest that
generic_shutdown_super isn't releasing all inodes before the
superblock gets destroyed.
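
For anyone wondering why a NULL s_op would fault at 0000003c rather than at
0: the load of s_op->clear_inode touches the address of that member, which is
the base pointer plus the member's offset. A minimal sketch, assuming 4-byte
function pointers (32-bit x86) and assuming 15 pointer-sized members precede
clear_inode in the ops table; that count is an assumption here and should be
checked against include/linux/fs.h in the tree in question. The struct below
is a stand-in, not the kernel's definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the ops table, using 32-bit slots to mimic i386 pointers. */
struct mock_super_operations {
	uint32_t slots_before[15];	/* assumed number of preceding members */
	uint32_t clear_inode;		/* the member being tested in the 'if' */
};

/* Address touched when reading s_op->clear_inode for a given s_op value. */
static uint32_t mock_fault_address(uint32_t s_op_base)
{
	return s_op_base +
	       (uint32_t)offsetof(struct mock_super_operations, clear_inode);
}
```

Under those assumptions mock_fault_address(0) is 15 * 4 = 0x3c, which is
consistent with the reported virtual address.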

I cannot see how this could be happening yet, but it might be helpful
to compile with CONFIG_DEBUG_SLAB and maybe even
CONFIG_DEBUG_PAGEALLOC.
That might make the problem trigger earlier and so be easier to track.

NeilBrown


* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-13  6:42   ` Neil Brown
@ 2006-11-13 11:22     ` David Howells
  0 siblings, 0 replies; 22+ messages in thread
From: David Howells @ 2006-11-13 11:22 UTC (permalink / raw)
  To: Neil Brown
  Cc: Andrew Morton, David Howells, bugme-daemon@kernel-bugs.osdl.org,
	linux-kernel, alex

Neil Brown <neilb@suse.de> wrote:

> it would appear that in:
> 	if (inode->i_sb && inode->i_sb->s_op->clear_inode)
> 		inode->i_sb->s_op->clear_inode(inode);
> 
> inode->i_sb->s_op is NULL.

Agreed.

> This tends to suggest that generic_shutdown_super isn't releasing all inodes
> before the superblock gets destroyed.
> 
> I cannot see how this could be happening

Perhaps sb->s_root == NULL?  That would permit most of generic_shutdown_super()
to be bypassed, including the check that all the inodes have been consumed.
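
A simplified model of what I mean, with made-up types and names (this is not
the real generic_shutdown_super(), just the shape of the control flow I am
describing):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down superblock for illustration only. */
struct mock_super_block {
	void *s_root;		/* root dentry; NULL is the suspect case */
	int live_inodes;	/* inodes still attached to this superblock */
};

/* Returns true if the inode teardown and busy-inode check actually ran. */
static bool mock_shutdown_super(struct mock_super_block *sb)
{
	bool checked = false;

	if (sb->s_root) {
		sb->live_inodes = 0;	/* models releasing all inodes */
		sb->s_root = NULL;
		checked = true;
	}
	/* With s_root == NULL we fall straight through to here. */
	return checked;
}
```

In this model a superblock with s_root == NULL reaches the end with its
inodes still live, which would leave them pointing at a superblock that is
about to be freed.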

David


Thread overview: 22+ messages
     [not found] <200611111129.kABBTWgp014081@fire-2.osdl.org>
2006-11-11 18:00 ` [Bugme-new] [Bug 7495] New: Kernel periodically hangs Andrew Morton
2006-11-11 18:10   ` Arjan van de Ven
2006-11-11 18:19     ` Andrew Morton
2006-11-12 11:50       ` Arjan van de Ven
2006-11-12 12:53         ` Adrian Bunk
2006-11-12 13:16           ` Arjan van de Ven
2006-11-12 13:37             ` Adrian Bunk
2006-11-12 13:57               ` Arjan van de Ven
2006-11-12 14:10                 ` Adrian Bunk
2006-11-12 14:16                   ` Arjan van de Ven
2006-11-12 15:21                     ` Adrian Bunk
2006-11-12 15:50                       ` Arjan van de Ven
2006-11-12 15:59                       ` Patrick McFarland
2006-11-12 16:07                         ` Arjan van de Ven
2006-11-12 16:47                         ` Adrian Bunk
2006-11-12 21:45                     ` Dave Jones
2006-11-13  2:07                       ` Andi Kleen
2006-11-12 19:18             ` Ingo Oeser
2006-11-12 19:34               ` Andrew Morton
2006-11-12 20:32               ` Arjan van de Ven
2006-11-13  6:42   ` Neil Brown
2006-11-13 11:22     ` David Howells
