* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
       [not found] <200611111129.kABBTWgp014081@fire-2.osdl.org>
@ 2006-11-11 18:00 ` Andrew Morton
  2006-11-11 18:10   ` Arjan van de Ven
  2006-11-13  6:42   ` Neil Brown
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2006-11-11 18:00 UTC
To: David Howells, Neil Brown
Cc: bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex

On Sat, 11 Nov 2006 03:29:32 -0800 bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=7495
>
> Summary: Kernel periodically hangs.
> Kernel Version: Linux version 2.6.18.2 (root@pub) (gcc version 3.4.6) #13 SMP Fr
> Status: NEW
> Severity: blocking
> Owner: other_other@kernel-bugs.osdl.org
> Submitter: alex@hausnet.ru
>
>
> [42587.676000] BUG: unable to handle kernel NULL pointer dereference at virtual address 0000003c
> [42587.680000] printing eip:
> [42587.680000] 781610e7
> [42587.680000] *pde = 00000000
> [42587.684000] Oops: 0000 [#1]
> [42587.684000] SMP
> [42587.684000] Modules linked in: sata_promise sk98lin 8250_pnp 8250 i2c_nforce2 ehci_hcd serial_core sata_nv ahci i2c_core ohci_hcd forcedeth libata
> [42587.688000] CPU: 1
> [42587.688000] EIP: 0060:[<781610e7>] Not tainted VLI
> [42587.688000] EFLAGS: 00010286 (2.6.18.2 #13)
> [42587.692000] EIP is at clear_inode+0x96/0xce
> [42587.692000] eax: 00000000 ebx: c0102240 ecx: f7f278d4 edx: f510d400
> [42587.692000] esi: c0102384 edi: f7e6dec0 ebp: 00000070 esp: f7e6de98
> [42587.696000] ds: 007b es: 007b ss: 0068
> [42587.696000] Process kswapd0 (pid: 230, ti=f7e6c000 task=f7c03560 task.ti=f7e6c000)
> [42587.696000] Stack: c0102248 c0102240 7816116a da7b4af0 da7b4af8 00000000 00000080 781614a2
> [42587.700000] 00000080 00000080 c01023f8 ef78dca8 00000000 00009858 00000083 f7fee560
> [42587.700000] 781614c8 7813a643 00261600 00000000 00009858 00000005 00000000 00000000
> [42587.700000] Call Trace:
> [42587.704000] [<7816116a>] dispose_list+0x4b/0xc1
> [42587.708000] [<781614a2>] prune_icache+0x17c/0x18e
> [42587.708000] [<781614c8>] shrink_icache_memory+0x14/0x2b
> [42587.708000] [<7813a643>] shrink_slab+0x130/0x18c
> [42587.712000] [<7813b75a>] balance_pgdat+0x1ea/0x2dd
> [42587.712000] [<7813b933>] kswapd+0xe6/0xe8
> [42587.716000] [<781261dc>] kthread+0x7d/0xa1
> [42587.716000] [<78100e05>] kernel_thread_helper+0x5/0xb

I've seen three or four reports of oopses like this in 2.6.18. I have a
suspicion we broke something.

> Kernel started with noapic option, cause it hangs on load without this option.

Him and a million other people. I know we broke APIC. Around 2.6.9, I
think.
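Each frame in the call trace above has the form [<address>] symbol+0xoffset/0xsize: the address lies offset bytes into a function that is size bytes long. A minimal Python sketch for pulling those fields apart follows. The helper name parse_frames is made up for this illustration and is not part of any kernel tool; the sample lines are copied from the report.

```python
import re

# Matches entries such as "[<7816116a>] dispose_list+0x4b/0xc1".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(text):
    """Return (address, symbol, offset, size) tuples for each frame found.

    Hypothetical helper for illustration only.
    """
    frames = []
    for addr, sym, off, size in FRAME_RE.findall(text):
        frames.append((int(addr, 16), sym, int(off, 16), int(size, 16)))
    return frames


# Three frames taken from the quoted oops.
SAMPLE = """
[<7816116a>] dispose_list+0x4b/0xc1
[<781614a2>] prune_icache+0x17c/0x18e
[<781614c8>] shrink_icache_memory+0x14/0x2b
"""

for addr, sym, off, size in parse_frames(SAMPLE):
    # The function's start address is the frame address minus the offset.
    print(f"{sym}: starts at 0x{addr - off:08x}, {off} of {size} bytes in")
```

The same arithmetic applies to the faulting instruction: EIP 781610e7 is 0x96 bytes into clear_inode, so the function would start at 0x78161051 on that build.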
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:00 ` [Bugme-new] [Bug 7495] New: Kernel periodically hangs Andrew Morton
@ 2006-11-11 18:10   ` Arjan van de Ven
  2006-11-11 18:19     ` Andrew Morton
  2006-11-13  6:42   ` Neil Brown
  1 sibling, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-11 18:10 UTC
To: Andrew Morton
Cc: David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex

> > Kernel started with noapic option, cause it hangs on load without this option.
>
> Him and a million other people. I know we broke APIC. Around 2.6.9, I
> think.

is that when the "enable apic even on UP so that distro kernels can
install on the ibm x44*" patches went in?

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:10 ` Arjan van de Ven
@ 2006-11-11 18:19   ` Andrew Morton
  2006-11-12 11:50     ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-11-11 18:19 UTC
To: Arjan van de Ven
Cc: David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex

On Sat, 11 Nov 2006 19:10:03 +0100 Arjan van de Ven <arjan@infradead.org> wrote:

> > > Kernel started with noapic option, cause it hangs on load without this option.
> >
> > Him and a million other people. I know we broke APIC. Around 2.6.9, I
> > think.
>
>
> is that when the "enable apic even on UP so that distro kernels can
> install on the ibm x44*" patches went in?
>

I don't know. In fact I forget how I worked out that it worsened in
2.6.early.

google(noapic) gets 232,000 hits.

I don't think it really matters when or why it happened. If we take the
approach of fixing one machine at a time, we'll only need to fix a few
individual machines to improve the situation for a lot of people.
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:19 ` Andrew Morton
@ 2006-11-12 11:50   ` Arjan van de Ven
  2006-11-12 12:53     ` Adrian Bunk
  0 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 11:50 UTC
To: Andrew Morton
Cc: David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex

> I don't know. In fact I forget how I worked out that it worsened in
> 2.6.early.
>
> google(noapic) gets 232,000 hits.

is there a way to ask google "only stuff in the last year"?
Asking because "noapic" in 2.4 was the standard "try this" answer when
people had a bios that had busted MPS (but good ACPI)...

> I don't think it really matters when or why it happened.

well to some degree it does; if it's one patch causing it narrowing it
down at least somewhat in time would help ;)

> If we take the
> approach of fixing one machine at a time, we'll only need to fix a few
> individual machines to improve the situation for a lot of people.

alternative is that more new machines showed up that need it somehow, eg
not really a regression just something else. Different approach is
needed for hunting that down. But to be realistic we need to narrow
things down a bit, which means

1) Only care about SMP machines. APIC on true UP (no
Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft
doesn't use it) and is just too likely to trip up SMM and other bad BIOS
stuff.
 * exception is probably people who don't WANT to use apic but where it
   somehow gets used anyway; if that happens we probably have the magic
   bullet that causes the regression :)

2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for
this, but most vendors hardly maintain those anymore at all and they are
generally just /dev/random nowadays

3) Ignore overclocking; if you overclock using the FSB the apic busses
run out of spec as well; can be a huge timewaster in debug time.
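The three narrowing rules in the message above (SMP only, ACPI only, no overclocking) amount to a simple filter over incoming reports. The sketch below shows that filter in Python. The Report type, its field names, and the sample reports are all hypothetical, invented here for illustration; nothing like it is proposed in the thread.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # All field names are hypothetical, chosen for this illustration.
    title: str
    smp: bool          # more than one logical CPU (HT or dual core counts)
    acpi: bool         # kernel is running with ACPI in use
    overclocked: bool  # reporter overclocks via the FSB


def worth_chasing(report: Report) -> bool:
    """Apply the three narrowing rules: SMP, ACPI, not overclocked."""
    return report.smp and report.acpi and not report.overclocked


# Made-up sample reports, one per way of failing the filter plus one pass.
reports = [
    Report("dual-core box needs noapic", smp=True, acpi=True, overclocked=False),
    Report("UP board, fast clock with IO-APIC", smp=False, acpi=True, overclocked=False),
    Report("SMP box booted with ACPI disabled", smp=True, acpi=False, overclocked=False),
    Report("overclocked SMP box, APIC errors", smp=True, acpi=True, overclocked=True),
]

for r in reports:
    if worth_chasing(r):
        print("investigate:", r.title)
```

The exception noted under rule 1 (APIC getting used on a UP machine where the user did not ask for it) is deliberately left out of this sketch; it would need an extra field recording what the user configured.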
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs. 2006-11-12 11:50 ` Arjan van de Ven @ 2006-11-12 12:53 ` Adrian Bunk 2006-11-12 13:16 ` Arjan van de Ven 0 siblings, 1 reply; 22+ messages in thread From: Adrian Bunk @ 2006-11-12 12:53 UTC (permalink / raw) To: Arjan van de Ven Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo On Sun, Nov 12, 2006 at 12:50:37PM +0100, Arjan van de Ven wrote: > > > I don't know. In fact I forget how I worked out that it worsened in > > 2.6.early. > > > > google(noapic) gets 232,000 hits. > > is there a way to ask google "only stuff in the last year"? > Asking because "noapic" in 2.4 was the standard "try this" answer when > people had a bios that had busted MPS (but good ACPI)... Some APIC-related bugs in the kernel Bugzilla that have been reported or confirmed during the last 12 months (I only looked at "apic" in the subject, there might be more related bugs in the Bugzilla): #5038 Fast running system clock with IO-APIC enabled #5303 AMD64 Erratum: Should not enable C2 when using APIC #5565 Guess of i386 APIC PTE area scribble #6404 APIC error on CPU0: 40(40) #6748 Clock drifts by 30% for SMP kernel w/APIC #6859 Linux kernel won't work without "nolapic" passed #6890 Kernel boot freezes when APIC is enabled & SATA is used > > I don't think it really matters when or why it happened. > > well to some degree it does; if it's one patch causing it narrowing it > down at least somewhat in time would help ;) > > > If we take the > > approach of fixing one machine at a time, we'll only need to fix a few > > individual machines to improve the situation for a lot of people. > > alternative is that more new machines showed up that need it somehow, eg > not really a regression just something else. Different approach is > needed for hunting that down. But to be realistic we need to narrow > things down a bit, which means > > 1) Only care about SMP machines. 
APIC on true UP (no > Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft > doesn't use it) and is just too likely to trip up SMM and other bad BIOS > stuff. > * exception is probably people who don't WANT to use apic but where it > somehow gets used anyway; if that happens we probably have the magic > bullet that causes the regression :) On i386, it's a kernel configuration option. On x86_64, the APIC is currently always enabled even when configuring a UP kernel. > 2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for > this, but most vendors hardly maintain those anymore at all and they are > generally just /dev/random nowadays What about non-ACPI SMP? > 3) Ignore overclocking; if you overclock using the FSB the apic busses > run out of spec as well; can be a huge timewaster in debug time. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs. 2006-11-12 12:53 ` Adrian Bunk @ 2006-11-12 13:16 ` Arjan van de Ven 2006-11-12 13:37 ` Adrian Bunk 2006-11-12 19:18 ` Ingo Oeser 0 siblings, 2 replies; 22+ messages in thread From: Arjan van de Ven @ 2006-11-12 13:16 UTC (permalink / raw) To: Adrian Bunk Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo > Some APIC-related bugs in the kernel Bugzilla that have been reported or > confirmed during the last 12 months (I only looked at "apic" in the > subject, there might be more related bugs in the Bugzilla): > > #5038 Fast running system clock with IO-APIC enabled This is a UP machine. NotInteresting(tm) wrt APIC. > #5303 AMD64 Erratum: Should not enable C2 when using APIC This is clearly not a linux issue but a hardware bug, as the title says > #5565 Guess of i386 APIC PTE area scribble this is only on one machine and a "special case"; not ruling out anything fundamental but.. > #6404 APIC error on CPU0: 40(40) This bug is a mess though; many different people seeing a symptom of an apic error, and all jumping in assuming they see the same problem... Also it's afaik only a message and not (yet) fatal in any way. Sometimes apics do this a few times a day, esp when things are getting hot in the box. Afaik there is then just a resend of the message and nothing is lost. > #6748 Clock drifts by 30% for SMP kernel w/APIC this looks like a totally weird hardware case that probably just wants to be blacklisted. > #6859 Linux kernel won't work without "nolapic" passed weird one, probably a bios issue but it's the opposite of "noapic", and also this is about local apic not about ioapic. Although they share 4 letters they're entirely different animals. > #6890 Kernel boot freezes when APIC is enabled & SATA is used seems to be UP as well but asked for confirmation in the bug (lack of lots of information here!). 
If this isn't UP this could be the first real case of "noapic" in your entire list...... which isn't too useful. Maybe we need to get more/any people who see "need noapic on SMP" to file a bug (and provide a reasonable amount of info) > > > > 1) Only care about SMP machines. APIC on true UP (no > > Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft > > doesn't use it) and is just too likely to trip up SMM and other bad BIOS > > stuff. > > * exception is probably people who don't WANT to use apic but where it > > somehow gets used anyway; if that happens we probably have the magic > > bullet that causes the regression :) > > On i386, it's a kernel configuration option. yes but it's generally a bad idea to set it; it only works on some machines. (and it can't be fixed) > > On x86_64, the APIC is currently always enabled even when configuring a > UP kernel. I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS cause it to be turned off automatic most of the time. > > > 2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for > > this, but most vendors hardly maintain those anymore at all and they are > > generally just /dev/random nowadays > > What about non-ACPI SMP? if the machine is new enough to run ACPI I don't care about the non-ACPI case; just enable it. Really. On newish machines (and that is 7 years old or newer) MPS tables are NOT getting much if any attention by the bios guys. So Linux should use ACPI, and if you deliberately disable ACPI and THEN hit a problem to a large degree you asked for the problem in the first place. Older machines, different story. -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 22+ messages in thread
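As the reply on bug #6859 points out, "noapic" and "nolapic" are different boot options: the first concerns the IO-APIC, the second the local APIC. A small Python sketch that keeps the two apart when reading a kernel command line is shown below. The function name apic_overrides and the sample command lines are made up for this illustration; on a running Linux system the string would typically come from /proc/cmdline.

```python
def apic_overrides(cmdline: str) -> dict:
    """Report which APIC-related boot options appear on a command line.

    Hypothetical helper for illustration. Whole-token matching is used so
    that "noapic" is not confused with "nolapic" or other longer options.
    """
    tokens = cmdline.split()
    return {
        "io_apic_disabled": "noapic" in tokens,      # IO-APIC interrupt routing
        "local_apic_disabled": "nolapic" in tokens,  # per-CPU local APIC
    }


# Made-up command lines, one for each option.
print(apic_overrides("root=/dev/sda1 ro noapic"))
print(apic_overrides("root=/dev/sda1 ro nolapic"))
```

Sorting reports by which of the two options they needed would be one way to avoid mixing the two classes of problem in a single bug.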
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs. 2006-11-12 13:16 ` Arjan van de Ven @ 2006-11-12 13:37 ` Adrian Bunk 2006-11-12 13:57 ` Arjan van de Ven 2006-11-12 19:18 ` Ingo Oeser 1 sibling, 1 reply; 22+ messages in thread From: Adrian Bunk @ 2006-11-12 13:37 UTC (permalink / raw) To: Arjan van de Ven Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo On Sun, Nov 12, 2006 at 02:16:16PM +0100, Arjan van de Ven wrote: > > > Some APIC-related bugs in the kernel Bugzilla that have been reported or > > confirmed during the last 12 months (I only looked at "apic" in the > > subject, there might be more related bugs in the Bugzilla): > > > > #5038 Fast running system clock with IO-APIC enabled > > This is a UP machine. NotInteresting(tm) wrt APIC. >... Currently it's a supported configuration. We must either handle such cases or explicitely disable the APIC on all UP machines (BTW: Is there any way to handle this when installing a distribution kernel with CONFIG_HOTPLUG_CPU=y on an UP machine?). > > > 1) Only care about SMP machines. APIC on true UP (no > > > Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft > > > doesn't use it) and is just too likely to trip up SMM and other bad BIOS > > > stuff. > > > * exception is probably people who don't WANT to use apic but where it > > > somehow gets used anyway; if that happens we probably have the magic > > > bullet that causes the regression :) > > > > On i386, it's a kernel configuration option. > > yes but it's generally a bad idea to set it; it only works on some > machines. (and it can't be fixed) > > > > On x86_64, the APIC is currently always enabled even when configuring a > > UP kernel. > > I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS > cause it to be turned off automatic most of the time. I'd doubt the latter. 
Even on my cheap Asus board running an i386 AMD Athlon XP with 1.8 GHz the APIC is both used and working without any problems. > > > 2) Only care about ACPI using kernels. Non-ACPI uses MPS tables for > > > this, but most vendors hardly maintain those anymore at all and they are > > > generally just /dev/random nowadays > > > > What about non-ACPI SMP? > > if the machine is new enough to run ACPI I don't care about the non-ACPI > case; just enable it. Really. On newish machines (and that is 7 years > old or newer) MPS tables are NOT getting much if any attention by the > bios guys. So Linux should use ACPI, and if you deliberately disable > ACPI and THEN hit a problem to a large degree you asked for the problem > in the first place. > > Older machines, different story. My point was regarding the latter ones... cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs. 2006-11-12 13:37 ` Adrian Bunk @ 2006-11-12 13:57 ` Arjan van de Ven 2006-11-12 14:10 ` Adrian Bunk 0 siblings, 1 reply; 22+ messages in thread From: Arjan van de Ven @ 2006-11-12 13:57 UTC (permalink / raw) To: Adrian Bunk Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo On Sun, 2006-11-12 at 14:37 +0100, Adrian Bunk wrote: > On Sun, Nov 12, 2006 at 02:16:16PM +0100, Arjan van de Ven wrote: > > > > > Some APIC-related bugs in the kernel Bugzilla that have been reported or > > > confirmed during the last 12 months (I only looked at "apic" in the > > > subject, there might be more related bugs in the Bugzilla): > > > > > > #5038 Fast running system clock with IO-APIC enabled > > > > This is a UP machine. NotInteresting(tm) wrt APIC. > >... > > Currently it's a supported configuration. define "supported"; we have code to try it and it's great if it works. But if it doesn't... you're out of luck. We KNOW it can't work on a sizable amount of machines. This is why it is a config option; you can enable it if YOUR machine is KNOWN to work, and you get some gains. But it's also understood that it often it won't work. So any sensible distro (since they have to aim for a wide audience) disables this option ... > > We must either handle such cases or explicitely disable the APIC on all > UP machines that'd be the same as setting the config option off... > > I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS > > cause it to be turned off automatic most of the time. > > I'd doubt the latter. Even on my cheap Asus board running an i386 > AMD Athlon XP with 1.8 GHz the APIC is both used and working without any > problems. "it works on my one machine so it works for everyone". That's simply not true. We KNOW it can't work everywhere on UP, especially on i386. SMM assumptions; people gluing the apic pins to the reset line, we've seen it all. 
That it works for you is great. But that doesn't mean it automatically works for everyone.
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs. 2006-11-12 13:57 ` Arjan van de Ven @ 2006-11-12 14:10 ` Adrian Bunk 2006-11-12 14:16 ` Arjan van de Ven 0 siblings, 1 reply; 22+ messages in thread From: Adrian Bunk @ 2006-11-12 14:10 UTC (permalink / raw) To: Arjan van de Ven Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo On Sun, Nov 12, 2006 at 02:57:48PM +0100, Arjan van de Ven wrote: > On Sun, 2006-11-12 at 14:37 +0100, Adrian Bunk wrote: > > On Sun, Nov 12, 2006 at 02:16:16PM +0100, Arjan van de Ven wrote: > > > > > > > Some APIC-related bugs in the kernel Bugzilla that have been reported or > > > > confirmed during the last 12 months (I only looked at "apic" in the > > > > subject, there might be more related bugs in the Bugzilla): > > > > > > > > #5038 Fast running system clock with IO-APIC enabled > > > > > > This is a UP machine. NotInteresting(tm) wrt APIC. > > >... > > > > Currently it's a supported configuration. > > define "supported"; we have code to try it and it's great if it works. > But if it doesn't... you're out of luck. > > We KNOW it can't work on a sizable amount of machines. This is why it > is a config option; you can enable it if YOUR machine is KNOWN to work, > and you get some gains. But it's also understood that it often it won't > work. So any sensible distro (since they have to aim for a wide > audience) disables this option ... Nowadays, many distributions only ship CONFIG_SMP=y kernels... > > We must either handle such cases or explicitely disable the APIC on all > > UP machines > > that'd be the same as setting the config option off... Except for the common case of CONFIG_SMP=y kernels on UP machines... > > > I think that's a mistake. But oh well, I suspect in practice ACPI/BIOS > > > cause it to be turned off automatic most of the time. > > > > I'd doubt the latter. 
Even on my cheap Asus board running an i386 > > AMD Athlon XP with 1.8 GHz the APIC is both used and working without any > > problems. > > "it works on my one machine so it works for everyone". That's simply not > true. We KNOW it can't work everywhere on UP, especially on i386. SMM > assumptions; people gluing the apic pins to the reset line, we've seen > it all. > That it works for you is great. But that doesn't mean it automatically > works for everyone. You miss my point. You said you'd suspect it to be turned off automatic most of the time, and that's the point I think you might be wrong at. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 14:10 ` Adrian Bunk
@ 2006-11-12 14:16   ` Arjan van de Ven
  2006-11-12 15:21     ` Adrian Bunk
  2006-11-12 21:45     ` Dave Jones
  0 siblings, 2 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 14:16 UTC
To: Adrian Bunk
Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

> > We KNOW it can't work on a sizable amount of machines. This is why it
> > is a config option; you can enable it if YOUR machine is KNOWN to work,
> > and you get some gains. But it's also understood that it often it won't
> > work. So any sensible distro (since they have to aim for a wide
> > audience) disables this option ...
>
> Nowadays, many distributions only ship CONFIG_SMP=y kernels...

that's a calculated risk on their side (and they know that); they're
balancing not functioning on a set of machines off against needing more
kernels.

> You miss my point.
>
> You said you'd suspect it to be turned off automatic most of the time,
> and that's the point I think you might be wrong at.

it won't be turned off on machines that support dual core processors
etc, since those DO get validated and designed for APIC use.. even if
you only stick a single core processor in. So yes you're right, that
nowadays is a pretty large group. But it's the safe group I guess:)
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs. 2006-11-12 14:16 ` Arjan van de Ven @ 2006-11-12 15:21 ` Adrian Bunk 2006-11-12 15:50 ` Arjan van de Ven 2006-11-12 15:59 ` Patrick McFarland 2006-11-12 21:45 ` Dave Jones 1 sibling, 2 replies; 22+ messages in thread From: Adrian Bunk @ 2006-11-12 15:21 UTC (permalink / raw) To: Arjan van de Ven Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote: > > > > We KNOW it can't work on a sizable amount of machines. This is why it > > > is a config option; you can enable it if YOUR machine is KNOWN to work, > > > and you get some gains. But it's also understood that it often it won't > > > work. So any sensible distro (since they have to aim for a wide > > > audience) disables this option ... > > > > Nowadays, many distributions only ship CONFIG_SMP=y kernels... > > that's a calculated risk on their side (and they know that); they're > balancing not functioning on a set of machines off against needing more > kernels. This might soon affect the majority of Linux users, so it's a case that has to be handled... > > You miss my point. > > > > You said you'd suspect it to be turned off automatic most of the time, > > and that's the point I think you might be wrong at. > > it won't be turned off on machines that support dual core processors > etc, since those DO get validated and designed for APIC use.. even if > you only stick a single core processor in. So yes you're right, that > nowadays is a pretty large group. 
But it's the safe group I guess:) But if APIC is even used on my more than 1 year old 40 Euro Socket A board (AFAIK there have never been dual core Socket A processors, there were no Socket A hyperthreading CPUs, it's not an SMP board, and the VIA KT600 is not an SMP chipset) it's not in what you call "safe group", and I don't see any reason why my board should behave different in this respect from all of the millions of other UP Socket A boards. Googling show that it could be that your claim "APIC on true UP (no Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft doesn't use it)" earlier in this thread was wrong. Looking at e.g. [1], it seems Windows does use the APIC even on UP. cu Adrian [1] http://www.microsoft.com/whdc/system/sysperf/IO-APIC.mspx -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 15:21 ` Adrian Bunk
@ 2006-11-12 15:50   ` Arjan van de Ven
  2006-11-12 15:59   ` Patrick McFarland
  1 sibling, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 15:50 UTC
To: Adrian Bunk
Cc: Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

> But if APIC is even used on my more than 1 year old 40 Euro Socket A

one sparrow does not a summer make.

now can we get constructive again. If you find a real case where noapic
is needed on an SMP machine, preferably one where it wasn't needed
earlier in 2.6, let us know; it's worthwhile to chase those down since
we know it's a decent use case and it's not flaky hardware.
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs. 2006-11-12 15:21 ` Adrian Bunk 2006-11-12 15:50 ` Arjan van de Ven @ 2006-11-12 15:59 ` Patrick McFarland 2006-11-12 16:07 ` Arjan van de Ven 2006-11-12 16:47 ` Adrian Bunk 1 sibling, 2 replies; 22+ messages in thread From: Patrick McFarland @ 2006-11-12 15:59 UTC (permalink / raw) To: Adrian Bunk Cc: Arjan van de Ven, Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo On Sunday 12 November 2006 10:21, Adrian Bunk wrote: > On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote: > > > > We KNOW it can't work on a sizable amount of machines. This is why > > > > it is a config option; you can enable it if YOUR machine is KNOWN to > > > > work, and you get some gains. But it's also understood that it often > > > > it won't work. So any sensible distro (since they have to aim for a > > > > wide audience) disables this option ... > > > > > > Nowadays, many distributions only ship CONFIG_SMP=y kernels... > > > > that's a calculated risk on their side (and they know that); they're > > balancing not functioning on a set of machines off against needing more > > kernels. > > This might soon affect the majority of Linux users, so it's a case that > has to be handled... I actually agree here. Linux needs to be easier for people to use, not harder. Isn't there a way for bootloaders or the kernel early on figure out if the machine supports SMP, and if it doesnt, load a uniproc kernel instead? > > > You miss my point. > > > > > > You said you'd suspect it to be turned off automatic most of the time, > > > and that's the point I think you might be wrong at. > > > > it won't be turned off on machines that support dual core processors > > etc, since those DO get validated and designed for APIC use.. even if > > you only stick a single core processor in. So yes you're right, that > > nowadays is a pretty large group. 
But it's the safe group I guess:) > > But if APIC is even used on my more than 1 year old 40 Euro Socket A > board (AFAIK there have never been dual core Socket A processors, there > were no Socket A hyperthreading CPUs, it's not an SMP board, and the > VIA KT600 is not an SMP chipset) it's not in what you call "safe group", > and I don't see any reason why my board should behave different in this > respect from all of the millions of other UP Socket A boards. > > Googling show that it could be that your claim "APIC on true UP (no > Hyperthreading/Dualcore) is a thing no hardware vendor tests (Microsoft > doesn't use it)" earlier in this thread was wrong. Looking at e.g. [1], > it seems Windows does use the APIC even on UP. Socket A CPUs are also ungodly common. They're as common as slot 1/socket 370 Pentium 3s, and, at least with my old P3 board, trying to use APIC on UP caused lockups. My Duron 1ghz laptop also does the same thing. (Booting either with noapic fixes it). So yeah, if distros make stupid choices like these, then we're pretty screwed. > cu > Adrian > > [1] http://www.microsoft.com/whdc/system/sysperf/IO-APIC.mspx -- Patrick McFarland || http://AdTerrasPerAspera.com "Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms, munching magic pills and listening to repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 15:59 ` Patrick McFarland
@ 2006-11-12 16:07   ` Arjan van de Ven
  2006-11-12 16:47   ` Adrian Bunk
  1 sibling, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 16:07 UTC
To: Patrick McFarland
Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, 2006-11-12 at 10:59 -0500, Patrick McFarland wrote:
> On Sunday 12 November 2006 10:21, Adrian Bunk wrote:
> > On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote:
> > > > > We KNOW it can't work on a sizable amount of machines. This is why
> > > > > it is a config option; you can enable it if YOUR machine is KNOWN to
> > > > > work, and you get some gains. But it's also understood that it often
> > > > > it won't work. So any sensible distro (since they have to aim for a
> > > > > wide audience) disables this option ...
> > > >
> > > > Nowadays, many distributions only ship CONFIG_SMP=y kernels...
> > >
> > > that's a calculated risk on their side (and they know that); they're
> > > balancing not functioning on a set of machines off against needing more
> > > kernels.
> >
> > This might soon affect the majority of Linux users, so it's a case that
> > has to be handled...
>
> I actually agree here. Linux needs to be easier for people to use, not harder.
> Isn't there a way for bootloaders or the kernel early on figure out if the
> machine supports SMP, and if it doesnt, load a uniproc kernel instead?

this is what OS installers have been doing for a decade or so.
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 15:59 ` Patrick McFarland
  2006-11-12 16:07   ` Arjan van de Ven
@ 2006-11-12 16:47   ` Adrian Bunk
  1 sibling, 0 replies; 22+ messages in thread
From: Adrian Bunk @ 2006-11-12 16:47 UTC
To: Patrick McFarland
Cc: Arjan van de Ven, Andrew Morton, David Howells, Neil Brown,
    bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, Nov 12, 2006 at 10:59:55AM -0500, Patrick McFarland wrote:
>...
> Socket A CPUs are also ungodly common. They're as common as slot 1/socket 370
> Pentium 3s, and, at least with my old P3 board, trying to use APIC on UP
> caused lockups. My Duron 1ghz laptop also does the same thing. (Booting
> either with noapic fixes it).
>...

It might depend on the age of your computer.

Microsoft mandates the presence of an APIC implemented per MADT and all
hardware interrupts connected to an IOAPIC for all servers and desktops
with a "Designed for Windows XP" sticker.

This implies more or less that a working APIC is present in all
non-laptop x86 UP systems manufactured during the last 5 years.

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 14:16 ` Arjan van de Ven
  2006-11-12 15:21   ` Adrian Bunk
@ 2006-11-12 21:45   ` Dave Jones
  2006-11-13  2:07     ` Andi Kleen
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Jones @ 2006-11-12 21:45 UTC
To: Arjan van de Ven
Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
    bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo, ak

On Sun, Nov 12, 2006 at 03:16:38PM +0100, Arjan van de Ven wrote:
>
> > > We KNOW it can't work on a sizable amount of machines. This is why it
> > > is a config option; you can enable it if YOUR machine is KNOWN to work,
> > > and you get some gains. But it's also understood that it often it won't
> > > work. So any sensible distro (since they have to aim for a wide
> > > audience) disables this option ...
> >
> > Nowadays, many distributions only ship CONFIG_SMP=y kernels...
>
> that's a calculated risk on their side (and they know that); they're
> balancing not functioning on a set of machines off against needing more
> kernels.

Andi has a nice patch in the suse kernel which adds heuristics to disable
apic on systems where it isn't likely to work. It DTRT in at least
one problem case that I know of. The actual fall-out from enabling
'run SMP kernels on UP i686' for FC6 has mostly been a non-event.
Literally a handful of cases, that will likely all get caught and worked
around by Andi's patch or similar.

Dave

--
http://www.codemonkey.org.uk

* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 21:45 ` Dave Jones
@ 2006-11-13  2:07   ` Andi Kleen
  0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2006-11-13 2:07 UTC
To: Dave Jones
Cc: Arjan van de Ven, Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
    bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

> Andi has a nice patch in the suse kernel which adds heuristics to disable
> apic on systems where it isn't likely to work. It DTRT in at least
> one problem case that I know of. The actual fall-out from enabling
> 'run SMP kernels on UP i686' for FC6 has mostly been a non-event.
> Literally a handful of cases, that will likely all get caught and worked
> around by Andi's patch or similar.

I haven't pushed that recently because I was busy with other things,
but it needs to be revisited, yes.

One broken case that still happens is that the patch assumes
working SMBIOS. When there is no year in SMBIOS it will turn off APIC because
it assumes it is a very old system. But sometimes new systems that would
like APIC have an illegal or broken SMBIOS year. On very new systems
it isn't a problem again because those tend to have multiple cores.
That could probably be a bit more clever. It's always difficult to navigate
around all kinds of BIOS bugs.

-Andi

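The SMBIOS failure mode Andi describes can be sketched roughly as follows. This is not the SUSE patch: `model_use_apic()`, its parameters and the `MODEL_CUTOFF_YEAR` value are hypothetical, and exempting multi-core machines from the check is an assumption drawn from his remark that very new systems are unaffected.

```c
#include <stdbool.h>

/* Hypothetical cut-off; the real patch's threshold is not given in the thread. */
#define MODEL_CUTOFF_YEAR 2001

/*
 * bios_year: year parsed from the SMBIOS BIOS date, or 0 if it is
 *            missing or unparseable.
 * cpu_cores: number of cores detected.
 * Returns true if the APIC should be left enabled.
 */
bool model_use_apic(int bios_year, int cpu_cores)
{
	if (cpu_cores > 1)
		return true;	/* ASSUMPTION: the heuristic only targets UP */
	if (bios_year == 0)
		return false;	/* no year: treated as a very old system */
	return bios_year >= MODEL_CUTOFF_YEAR;
}
```

In this model a new single-core board whose BIOS reports no year gets `model_use_apic(0, 1) == false`, which is the broken case mentioned above.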
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 13:16 ` Arjan van de Ven
  2006-11-12 13:37   ` Adrian Bunk
@ 2006-11-12 19:18   ` Ingo Oeser
  2006-11-12 19:34     ` Andrew Morton
  2006-11-12 20:32     ` Arjan van de Ven
  1 sibling, 2 replies; 22+ messages in thread
From: Ingo Oeser @ 2006-11-12 19:18 UTC
To: Arjan van de Ven
Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
    bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

Hi there,

On Sunday, 12. November 2006 14:16, Arjan van de Ven wrote:
> If this isn't UP this could be the first real case of "noapic" in your
> entire list...... which isn't too useful.
> Maybe we need to get more/any people who see "need noapic on SMP" to
> file a bug (and provide a reasonable amount of info)

I need noapic since ever (5 years!) to get my USB controller running.
Without noapic it doesn't get any interrupts for some reason.

If now is the time to fix those bugs, I would be happy to try a new kernel
and get you the dmesg + result of plugging in an usb mass storage device
and reading from it on a DAILY basis.

If you need anything else to resolve the issue, I would be happy to help out
here. Maybe a pattern can be detected, which could help others.

If you like to blacklist this machine by DMI, that would also help me.

Many Thanks!

Best Regards

Ingo Oeser

* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 19:18 ` Ingo Oeser
@ 2006-11-12 19:34   ` Andrew Morton
  2006-11-12 20:32   ` Arjan van de Ven
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2006-11-12 19:34 UTC
To: Ingo Oeser
Cc: Arjan van de Ven, Adrian Bunk, David Howells, Neil Brown,
    bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, 12 Nov 2006 20:18:51 +0100
Ingo Oeser <ioe-lkml@rameria.de> wrote:

> Hi there,
>
> On Sunday, 12. November 2006 14:16, Arjan van de Ven wrote:
> > If this isn't UP this could be the first real case of "noapic" in your
> > entire list...... which isn't too useful.
> > Maybe we need to get more/any people who see "need noapic on SMP" to
> > file a bug (and provide a reasonable amount of info)
>
> I need noapic since ever (5 years!) to get my USB controller running.
> Without noapic it doesn't get any interrupts for some reason.
>
> If now is the time to fix those bugs, I would be happy to try a new kernel
> and get you the dmesg + result of plugging in an usb mass storage device
> and reading from it on a DAILY basis.

Yes, please send those. It'd be best to get the info into bugzilla too - this
doesn't look like a quick-fix scenario.

* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-12 19:18 ` Ingo Oeser
  2006-11-12 19:34   ` Andrew Morton
@ 2006-11-12 20:32   ` Arjan van de Ven
  1 sibling, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2006-11-12 20:32 UTC
To: Ingo Oeser
Cc: Adrian Bunk, Andrew Morton, David Howells, Neil Brown,
    bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex, mingo

On Sun, 2006-11-12 at 20:18 +0100, Ingo Oeser wrote:
> Hi there,
>
> On Sunday, 12. November 2006 14:16, Arjan van de Ven wrote:
> > If this isn't UP this could be the first real case of "noapic" in your
> > entire list...... which isn't too useful.
> > Maybe we need to get more/any people who see "need noapic on SMP" to
> > file a bug (and provide a reasonable amount of info)
>
> I need noapic since ever (5 years!) to get my USB controller running.
> Without noapic it doesn't get any interrupts for some reason.

so it never worked? (that's important to know versus regression)

Also does this machine use ACPI for interrupt routing? That's also
important, because if you're NOT using ACPI, "noapic" means that you're
using the PIRQ for irq routing and not MPS, so you're not "just"
changing apic behavior, you're actually using a different BIOS table.
(and to be honest, a buggy bios table is more likely the cause ... ;)

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-11 18:00 ` [Bugme-new] [Bug 7495] New: Kernel periodically hangs Andrew Morton
  2006-11-11 18:10   ` Arjan van de Ven
@ 2006-11-13  6:42   ` Neil Brown
  2006-11-13 11:22     ` David Howells
  1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2006-11-13 6:42 UTC
To: Andrew Morton
Cc: David Howells, bugme-daemon@kernel-bugs.osdl.org, linux-kernel, alex

On Saturday November 11, akpm@osdl.org wrote:
> On Sat, 11 Nov 2006 03:29:32 -0800
> bugme-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=7495
> >
> >            Summary: Kernel periodically hangs.
> >     Kernel Version: Linux version 2.6.18.2 (root@pub) (gcc version 3.4.6)
> >                     #13 SMP Fr
> >             Status: NEW
> >           Severity: blocking
> >              Owner: other_other@kernel-bugs.osdl.org
> >          Submitter: alex@hausnet.ru

So getting back to the main issue in this bug report.....

> >
> >
> > [42587.676000] BUG: unable to handle kernel NULL pointer dereference at
> > virtual address 0000003c

it would appear that in:
	if (inode->i_sb && inode->i_sb->s_op->clear_inode)
		inode->i_sb->s_op->clear_inode(inode);

inode->i_sb->s_op is NULL. This is unfortunate :-)

alloc_super initialises s_op to '&default_op' and it isn't cleared on unmount,
so the implication seems to be that i_sb has been freed and the memory
has been reused.

This tends to suggest that generic_shutdown_super isn't releasing all inodes
before the superblock gets destroyed.

I cannot see how this could be happening yet, but it might be helpful
to compile with CONFIG_DEBUG_SLAB and maybe even CONFIG_DEBUG_PAGEALLOC.
That might make the problem trigger earlier and so be easier to track.

NeilBrown

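As a rough sanity check on the analysis above: if `s_op` is NULL, loading `s_op->clear_inode` faults at the offset of `clear_inode` within `struct super_operations`. The sketch below models that table for a 32-bit build, where each method pointer takes one 4-byte slot. The slot order is written from memory of the 2.6.18 `include/linux/fs.h` and is an assumption to be checked against the real header; `struct sop_model_i386` and `fault_address_if_s_op_null()` are hypothetical names, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Model of struct super_operations as laid out on i386: every method
 * pointer occupies one 4-byte slot. ASSUMPTION: the slot order below is
 * recalled from 2.6.18 and has not been checked against the header.
 */
struct sop_model_i386 {
	uint32_t alloc_inode;
	uint32_t destroy_inode;
	uint32_t read_inode;
	uint32_t dirty_inode;
	uint32_t write_inode;
	uint32_t put_inode;
	uint32_t drop_inode;
	uint32_t delete_inode;
	uint32_t put_super;
	uint32_t write_super;
	uint32_t sync_fs;
	uint32_t write_super_lockfs;
	uint32_t unlockfs;
	uint32_t statfs;
	uint32_t remount_fs;
	uint32_t clear_inode;
};

/* Address touched when s_op == NULL and s_op->clear_inode is loaded. */
size_t fault_address_if_s_op_null(void)
{
	return offsetof(struct sop_model_i386, clear_inode);
}
```

Under that assumed layout 15 slots precede `clear_inode`, so the function returns 15 * 4 = 0x3c, which matches the faulting address 0000003c in the oops.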
* Re: [Bugme-new] [Bug 7495] New: Kernel periodically hangs.
  2006-11-13  6:42 ` Neil Brown
@ 2006-11-13 11:22   ` David Howells
  0 siblings, 0 replies; 22+ messages in thread
From: David Howells @ 2006-11-13 11:22 UTC
To: Neil Brown
Cc: Andrew Morton, David Howells, bugme-daemon@kernel-bugs.osdl.org,
    linux-kernel, alex

Neil Brown <neilb@suse.de> wrote:

> it would appear that in:
> 	if (inode->i_sb && inode->i_sb->s_op->clear_inode)
> 		inode->i_sb->s_op->clear_inode(inode);
>
> inode->i_sb->s_op is NULL.

Agreed.

> This tends to suggest that generic_shutdown_super isn't releasing all inodes
> before the superblock gets destroyed.
>
> I cannot see how this could be happening

Perhaps sb->s_root == NULL? That would permit most of generic_shutdown_super()
to be bypassed, including the check that all the inodes have been consumed.

David

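David's hypothesis can be illustrated with a toy user-space model. It assumes, as his message suggests, that the inode clean-up in generic_shutdown_super() only runs when `sb->s_root` is set. `struct model_sb`, `model_shutdown_super()` and the `live_inodes` counter are hypothetical stand-ins, not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct super_block; only what the sketch needs. */
struct model_sb {
	void *s_root;		/* root dentry, NULL if never set or already dropped */
	int live_inodes;	/* inodes still attached to this superblock */
};

/*
 * Toy version of the shutdown path. ASSUMPTION: all inode clean-up sits
 * behind the s_root test. Returns true when the clean-up ran.
 */
bool model_shutdown_super(struct model_sb *sb)
{
	if (sb->s_root == NULL)
		return false;	/* bypass: inodes are left behind */

	sb->live_inodes = 0;	/* stands in for evicting every inode */
	sb->s_root = NULL;
	return true;
}
```

With `s_root == NULL` the call returns without touching `live_inodes`. Any inode left behind that way would later reach clear_inode() holding an `i_sb` that points at freed memory, which is the situation Neil describes.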
end of thread, other threads:[~2006-11-13 11:25 UTC | newest]
Thread overview: 22+ messages
[not found] <200611111129.kABBTWgp014081@fire-2.osdl.org>
2006-11-11 18:00 ` [Bugme-new] [Bug 7495] New: Kernel periodically hangs Andrew Morton
2006-11-11 18:10 ` Arjan van de Ven
2006-11-11 18:19 ` Andrew Morton
2006-11-12 11:50 ` Arjan van de Ven
2006-11-12 12:53 ` Adrian Bunk
2006-11-12 13:16 ` Arjan van de Ven
2006-11-12 13:37 ` Adrian Bunk
2006-11-12 13:57 ` Arjan van de Ven
2006-11-12 14:10 ` Adrian Bunk
2006-11-12 14:16 ` Arjan van de Ven
2006-11-12 15:21 ` Adrian Bunk
2006-11-12 15:50 ` Arjan van de Ven
2006-11-12 15:59 ` Patrick McFarland
2006-11-12 16:07 ` Arjan van de Ven
2006-11-12 16:47 ` Adrian Bunk
2006-11-12 21:45 ` Dave Jones
2006-11-13 2:07 ` Andi Kleen
2006-11-12 19:18 ` Ingo Oeser
2006-11-12 19:34 ` Andrew Morton
2006-11-12 20:32 ` Arjan van de Ven
2006-11-13 6:42 ` Neil Brown
2006-11-13 11:22 ` David Howells