* latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Vegard Nossum @ 2008-08-19 19:51 UTC
To: the arch/x86 maintainers; +Cc: Andi Kleen, Linux Kernel Mailing List

Hi,

With latest -git (1fca25427482387689fa27594c992a961d98768f), I got
this on reading from /dev/cpu/*/* while hot-unplugging cpu1.

------------[ cut here ]------------
WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
Pid: 3881, comm: cat Not tainted 2.6.27-rc3-00464-g1fca254 #12
 [<c013591f>] warn_on_slowpath+0x4f/0x80
 [<c010a300>] ? native_sched_clock+0x80/0x110
 [<c010a335>] ? native_sched_clock+0xb5/0x110
 [<c015ae5a>] ? __lock_acquire+0x27a/0xa00
 [<c015635b>] ? trace_hardirqs_off+0xb/0x10
 [<c010a335>] ? native_sched_clock+0xb5/0x110
 [<c01563bd>] ? put_lock_stats+0xd/0x30
 [<c0118a43>] send_IPI_mask_bitmask+0xc3/0xe0
 [<c01017c8>] send_IPI_mask+0x8/0x10
 [<c0118307>] native_send_call_func_single_ipi+0x27/0x30
 [<c0160a2b>] generic_exec_single+0x7b/0x80
 [<c0160adf>] smp_call_function_single+0x5f/0x110
 [<c037a440>] ? __rdmsr_safe_on_cpu+0x0/0x60
 [<c037a440>] ? __rdmsr_safe_on_cpu+0x0/0x60
 [<c037a597>] _rdmsr_on_cpu+0x27/0x60
 [<c037a5ea>] rdmsr_safe_on_cpu+0x1a/0x20
 [<c011733e>] msr_read+0x6e/0xa0
 [<c01a87b4>] vfs_read+0x94/0x130
 [<c01172d0>] ? msr_read+0x0/0xa0
 [<c01a8b5d>] sys_read+0x3d/0x70
 [<c01040db>] sysenter_do_call+0x12/0x3f
 =======================
---[ end trace fe4338948cb73be2 ]---
BUG: soft lockup - CPU#0 stuck for 61s! [cat:3881]
irq event stamp: 14632440
hardirqs last enabled at (14632439): [<c015968b>] trace_hardirqs_on+0xb/0x10
hardirqs last disabled at (14632440): [<c015635b>] trace_hardirqs_off+0xb/0x10
softirqs last enabled at (14632434): [<c013a4d1>] __do_softirq+0xe1/0x100
softirqs last disabled at (14632427): [<c013a595>] do_softirq+0xa5/0xb0

Pid: 3881, comm: cat Tainted: G W (2.6.27-rc3-00464-g1fca254 #12)
EIP: 0060:[<c0160952>] EFLAGS: 00200202 CPU: 0
EIP is at csd_flag_wait+0x12/0x20
EAX: f5f31ef0 EBX: c215dc60 ECX: ffffb300 EDX: 000008fa
ESI: 00200292 EDI: c215dc68 EBP: f5f31ec0 ESP: f5f31ec0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 087d0a5c CR3: 33c36000 CR4: 000006d0
DR0: c0ebd43c DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0160a15>] generic_exec_single+0x65/0x80
 [<c0160adf>] smp_call_function_single+0x5f/0x110
 [<c037a440>] ? __rdmsr_safe_on_cpu+0x0/0x60
 [<c037a440>] ? __rdmsr_safe_on_cpu+0x0/0x60
 [<c037a597>] _rdmsr_on_cpu+0x27/0x60
 [<c037a5ea>] rdmsr_safe_on_cpu+0x1a/0x20
 [<c011733e>] msr_read+0x6e/0xa0
 [<c01a87b4>] vfs_read+0x94/0x130
 [<c01172d0>] ? msr_read+0x0/0xa0
 [<c01a8b5d>] sys_read+0x3d/0x70
 [<c01040db>] sysenter_do_call+0x12/0x3f
 =======================

At least SSH is not usable after this, but I guess SysRq and such
would work (the "CPU stuck" message still showed up after the apparent
freeze).

Vegard

PS: This is probably not a regression.

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Andi Kleen @ 2008-08-20 1:39 UTC
To: Vegard Nossum
Cc: the arch/x86 maintainers, Andi Kleen, Linux Kernel Mailing List

On Tue, Aug 19, 2008 at 09:51:44PM +0200, Vegard Nossum wrote:
> Hi,
>
> With latest -git (1fca25427482387689fa27594c992a961d98768f), I got
> this on reading from /dev/cpu/*/* while hot-unplugging cpu1.

It's generally known the oprofile doesn't support CPU hotplug well.
Someone needs to make a project out of fixing it properly. Right now
it's just a "don't do that when it hurts"

-Andi

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Vegard Nossum @ 2008-08-20 6:26 UTC
To: Andi Kleen; +Cc: the arch/x86 maintainers, Linux Kernel Mailing List

On Wed, Aug 20, 2008 at 3:39 AM, Andi Kleen <andi@firstfloor.org> wrote:
> On Tue, Aug 19, 2008 at 09:51:44PM +0200, Vegard Nossum wrote:
>> Hi,
>>
>> With latest -git (1fca25427482387689fa27594c992a961d98768f), I got
>> this on reading from /dev/cpu/*/* while hot-unplugging cpu1.
>
> It's generally known the oprofile doesn't support CPU hotplug well.
> Someone needs to make a project out of fixing it properly. Right now
> it's just a "don't do that when it hurts"

Hm. What you say is true, but this one in particular has nothing to do
with oprofile! It has something to do with reading /dev/cpu/*/msr
while hot-unplugging cpu1:

 [<c011733e>] msr_read+0x6e/0xa0
 [<c01a87b4>] vfs_read+0x94/0x130

I wasn't using oprofile when this happened. So I think it should also
be considered a separate issue. Though yes -- CPU hotplug in general
tends to break a lot of things.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Dave Jones @ 2008-08-22 0:36 UTC
To: Vegard Nossum
Cc: Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List

On Wed, Aug 20, 2008 at 08:26:19AM +0200, Vegard Nossum wrote:
 > On Wed, Aug 20, 2008 at 3:39 AM, Andi Kleen <andi@firstfloor.org> wrote:
 > > On Tue, Aug 19, 2008 at 09:51:44PM +0200, Vegard Nossum wrote:
 > >> Hi,
 > >>
 > >> With latest -git (1fca25427482387689fa27594c992a961d98768f), I got
 > >> this on reading from /dev/cpu/*/* while hot-unplugging cpu1.
 > >
 > > It's generally known the oprofile doesn't support CPU hotplug well.
 > > Someone needs to make a project out of fixing it properly. Right now
 > > it's just a "don't do that when it hurts"
 >
 > Hm. What you say is true, but this one in particular has nothing to do
 > with oprofile! It has something to do with reading /dev/cpu/*/msr
 > while hot-unplugging cpu1:
 >
 >  [<c011733e>] msr_read+0x6e/0xa0
 >  [<c01a87b4>] vfs_read+0x94/0x130
 >
 > I wasn't using oprofile when this happened. So I think it should also
 > be considered a separate issue. Though yes -- CPU hotplug in general
 > tends to break a lot of things.

From my reading of the msr code, we check that the cpu is online in ->open,
but we never check it again, and also, we make no guarantees that it
won't go away before we ->read or even ->close it.

Would adding a get_cpu/put_cpu across the open/close solve this?
Peter?

	Dave

--
http://www.codemonkey.org.uk

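Editor's note (not part of the original thread): the gap Dave describes, an online check in ->open that is never repeated in ->read, can be modelled in plain userspace C. This is only a sketch under stated assumptions. Every name below (`sketch_online_map`, `sketch_msr_open`, `sketch_msr_read`, `sketch_msr_read_checked`, `sketch_cpu_down`) is hypothetical and merely mimics the shape of the driver; it is not the kernel's msr code.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical stand-in for the online mask: bit n set => CPU n online. */
static unsigned int sketch_online_map = 0x3;	/* CPUs 0 and 1 online */

bool sketch_cpu_online(int cpu)
{
	return cpu >= 0 && cpu < SKETCH_NR_CPUS &&
	       (sketch_online_map & (1u << cpu)) != 0;
}

/* Models ->open: the only place where the online check happens. */
int sketch_msr_open(int cpu)
{
	return sketch_cpu_online(cpu) ? 0 : -ENXIO;
}

/* Models ->read as described: no re-check, so a dead CPU is still targeted. */
int sketch_msr_read(int cpu)
{
	(void)cpu;
	return 0;	/* a real driver would now send an IPI to 'cpu' */
}

/* Variant that re-checks at the point of use instead of trusting ->open. */
int sketch_msr_read_checked(int cpu)
{
	return sketch_cpu_online(cpu) ? 0 : -ENXIO;
}

/* Models a hot-unplug that happens between open and read. */
void sketch_cpu_down(int cpu)
{
	if (cpu >= 0 && cpu < SKETCH_NR_CPUS)
		sketch_online_map &= ~(1u << cpu);
}
```

Opening CPU 1, taking it down, and then reading shows the stale result: the unchecked read still reports success, while the checked variant returns -ENXIO.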
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: H. Peter Anvin @ 2008-08-22 2:13 UTC
To: Dave Jones, Vegard Nossum, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List
Cc: Rusty Russell

Dave Jones wrote:
> >
> > Hm. What you say is true, but this one in particular has nothing to do
> > with oprofile! It has something to do with reading /dev/cpu/*/msr
> > while hot-unplugging cpu1:
> >
> >  [<c011733e>] msr_read+0x6e/0xa0
> >  [<c01a87b4>] vfs_read+0x94/0x130
> >
> > I wasn't using oprofile when this happened. So I think it should also
> > be considered a separate issue. Though yes -- CPU hotplug in general
> > tends to break a lot of things.
>
> From my reading of the msr code, we check that the cpu is online in ->open,
> but we never check it again, and also, we make no guarantees that it
> won't go away before we ->read or even ->close it.
>
> Would adding a get_cpu/put_cpu across the open/close solve this?
> Peter?
>

A get_cpu/put_cpu across the whole open..close sequence would seem to
be, ahem, rude, since userspace could hold it for an arbitrary amount of
time (plus, there is no guarantee that they are invoked on the same CPU.)

The cpuid driver has the same problem, obviously.

get_online_cpus() and put_online_cpus() around the call to
{rd,wr}msr_safe_on_cpu() should work; and the CPU hotplug documentation
seems to claim that we can just disable preemption around those calls,
which is exactly what get_cpu()..put_cpu() does, so I guess
get_cpu()..put_cpu() here is fine.

Now, the big question is: should this really be done in the MSR/CPUID
drivers, or should it be done in smp_call_function_single(), which is
the generic code invoked by this?

It seems to me that doing it in smp_call_function_single() would be
more correct as it's already protected by get_cpu()..put_cpu() and a
cpu_online() test in there should not be expensive in comparison to the
whole rest of the code.

You may want to see if this patch fixes the problem; it does *NOT* have
the correct error behaviour (some of the intervening layers don't
propagate errors), but it should make the fault go away.

	-hpa

[-- Attachment #2: smp-unplug-fault.patch --]
[-- Type: text/x-patch, Size: 1157 bytes --]

diff --git a/kernel/smp.c b/kernel/smp.c
index 782e2b9..f362a85 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -210,8 +210,10 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 {
 	struct call_single_data d;
 	unsigned long flags;
-	/* prevent preemption and reschedule on another processor */
+	/* prevent preemption and reschedule on another processor,
+	   as well as CPU removal */
 	int me = get_cpu();
+	int err = 0;

 	/* Can deadlock when called with interrupts disabled */
 	WARN_ON(irqs_disabled());
@@ -220,7 +222,7 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 		local_irq_save(flags);
 		func(info);
 		local_irq_restore(flags);
-	} else {
+	} else if ((unsigned)cpu < NR_CPUS && cpu_online(cpu)) {
 		struct call_single_data *data = NULL;

 		if (!wait) {
@@ -236,10 +238,12 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 		data->func = func;
 		data->info = info;
 		generic_exec_single(cpu, data);
+	} else {
+		err = -ENXIO;	/* CPU not online */
 	}

 	put_cpu();
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(smp_call_function_single);

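Editor's note (not part of the original thread): the control flow of the patch above can be exercised outside the kernel with a small model. This is a sketch only, assuming a bitmask in place of the real online map and a synchronous call in place of the IPI. `model_online_mask`, `model_cpu_online`, `model_call_function_single`, `model_count_call` and `model_hits` are hypothetical names, and `me` stands in for whatever get_cpu() would return.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical online mask: bit n set => CPU n online. */
static unsigned long model_online_mask = 0x3;	/* CPUs 0 and 1 online */

/* Counter used as the 'info' argument in the usage example. */
int model_hits;

bool model_cpu_online(unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS && ((model_online_mask >> cpu) & 1ul) != 0;
}

/*
 * Userspace model of the patched branch structure. There is no real IPI
 * here, so func runs synchronously in both the local and the remote case.
 */
int model_call_function_single(int me, int cpu,
			       void (*func)(void *info), void *info)
{
	int err = 0;

	if (cpu == me) {
		func(info);		/* local call, no IPI needed */
	} else if ((unsigned int)cpu < MODEL_NR_CPUS && model_cpu_online(cpu)) {
		func(info);		/* stands in for generic_exec_single() */
	} else {
		err = -ENXIO;		/* CPU not online */
	}

	return err;
}

/* Example callback: bumps the int that 'info' points at. */
void model_count_call(void *info)
{
	++*(int *)info;
}
```

The cast to unsigned mirrors the patch: a negative cpu becomes a very large value and fails the range test, so it is rejected before the online mask is consulted.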
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Andi Kleen @ 2008-08-22 2:28 UTC
To: H. Peter Anvin
Cc: Dave Jones, Vegard Nossum, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

> You may want to see if this patch fixes the problem; it does *NOT* have
> the correct error behaviour (some of the intervening layers don't
> propagate errors), but it should make the fault go away.

The alternative would be to just take out those msr_on_cpu()
interfaces again. Right now they are useless in the kernel,
but still cause problems.

They were only added for OpenVZ's vCPUs which they back then
promised me would hit mainline soon. But that was some time
ago and there wasn't much progress on this.

-Andi

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: H. Peter Anvin @ 2008-08-22 6:24 UTC
To: Andi Kleen
Cc: H. Peter Anvin, Dave Jones, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

Andi Kleen wrote:
>> You may want to see if this patch fixes the problem; it does *NOT* have
>> the correct error behaviour (some of the intervening layers don't
>> propagate errors), but it should make the fault go away.
>
> The alternative would be to just take out those msr_on_cpu()
> interfaces again. Right now they are useless in the kernel,
> but still cause problems.
>
> They were only added for OpenVZ's vCPUs which they back then
> promised me would hit mainline soon. But that was some time
> ago and there wasn't much progress on this.
>
> -Andi

We still need the equivalent functionality, though. The midlayer
(msr_on_cpu) may be pointless, but that doesn't change the fact that
putting this functionality in the lower layer (smp_call_function_single)
makes more sense.

	-hpa

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Andi Kleen @ 2008-08-22 9:35 UTC
To: H. Peter Anvin
Cc: Andi Kleen, H. Peter Anvin, Dave Jones, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

> We still need the equivalent functionality, though. The midlayer
> (msr_on_cpu) may be pointless, but that doesn't change the fact that
> putting this functionality in the lower layer (smp_call_function_single)
> makes more sense.

Assuming you can actually have interrupts enabled at this point
and be otherwise ready to do smp_call_function_single (e.g. cpu hotplug
locking etc.)

For a lot of MSR accesses in more complicated subsystems like cpufreq
that requires complications. I would think for many circumstances it's
better to simply set affinity of the thread before at a higher level.

In hindsight I think it was my mistake to ever merge that.
I admit I never liked it, but just merged it because I wasn't able
to come up with a strong enough counter argument back then.

-Andi

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: H. Peter Anvin @ 2008-08-22 16:41 UTC
To: Andi Kleen
Cc: H. Peter Anvin, Dave Jones, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

Andi Kleen wrote:
>> We still need the equivalent functionality, though. The midlayer
>> (msr_on_cpu) may be pointless, but that doesn't change the fact that
>> putting this functionality in the lower layer (smp_call_function_single)
>> makes more sense.
>
> Assuming you can actually have interrupts enabled at these point
> and be otherwise ready to do call_function_simple (e.g. cpu hotplug
> locking etc.)
>
> For a lot of MSR accesses in more complicated subsystems like cpufreq
> that requires complications. I would think for many circumstances it's
> better to simply set affinity of the thread before at a higher level.
>
> In hindsight I think it was my mistake to ever merge that.
> I admit I never liked it, but just merged it because I wasn't able
> to come up with a strong enough counter argument back then.

Well, smp_call_function_single already does all necessary locking; it
makes more sense for it to check that what it's about to call still
exists while inside the lock, instead of requiring the higher layers to
guarantee that cannot happen on it. This is simply a matter of the cost
of checking at this point being quite low.

	-hpa

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Jeremy Fitzhardinge @ 2008-08-23 6:42 UTC
To: H. Peter Anvin
Cc: Andi Kleen, H. Peter Anvin, Dave Jones, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

H. Peter Anvin wrote:
> Well, smp_call_function_single already does all necessary locking; it
> makes more sense for it to check that what it's about to call still
> exists while inside the lock, instead of requiring the higher layers
> to guarantee that cannot happen on it. This is simply a matter of the
> cost of checking at this point being quite low.

It does, already doesn't it? Hm, smp_call_function_mask() ands the
provided mask with the online mask, but it doesn't look like
smp_call_function_single() does the equivalent.

	J

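Editor's note (not part of the original thread): the asymmetry Jeremy points out comes down to a single AND. The sketch below assumes a one-word bitmask in place of a real cpumask; `model_cpumask_t` and the three functions are hypothetical names, not kernel APIs, and `cpu` must be smaller than the number of bits in an unsigned long.

```c
typedef unsigned long model_cpumask_t;	/* one bit per CPU */

/* Mask-style call: offline CPUs are silently dropped from the target set. */
model_cpumask_t model_mask_targets(model_cpumask_t requested,
				   model_cpumask_t online)
{
	return requested & online;
}

/* Single-CPU call as discussed: nothing filters the requested target. */
model_cpumask_t model_single_target_unchecked(unsigned int cpu)
{
	return 1ul << cpu;
}

/* Single-CPU call with the equivalent filter applied. */
model_cpumask_t model_single_target_checked(unsigned int cpu,
					    model_cpumask_t online)
{
	return (1ul << cpu) & online;
}
```

With only CPU 0 online, a request for CPUs 0 and 1 shrinks to CPU 0 in the mask case, the unchecked single case still targets CPU 1, and the checked single case yields an empty target set that the caller can turn into an error.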
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: H. Peter Anvin @ 2008-08-23 6:44 UTC
To: Jeremy Fitzhardinge
Cc: Andi Kleen, H. Peter Anvin, Dave Jones, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
>> Well, smp_call_function_single already does all necessary locking; it
>> makes more sense for it to check that what it's about to call still
>> exists while inside the lock, instead of requiring the higher layers
>> to guarantee that cannot happen on it. This is simply a matter of the
>> cost of checking at this point being quite low.
>
> It does, already doesn't it? Hm, smp_call_function_mask() ands the
> provided mask with the online mask, but it doesn't look like
> smp_call_function_single() does the equivalent.

It doesn't, and that's how this bug was introduced. It's a trivial add
(see test patch already posted) and should hardly matter in terms of
execution time.

I'll write up a clean patch with all the error propagation tomorrow or
Sunday.

	-hpa

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: adobriyan @ 2008-08-22 11:13 UTC
To: Andi Kleen
Cc: H. Peter Anvin, Dave Jones, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

On Fri, Aug 22, 2008 at 04:28:41AM +0200, Andi Kleen wrote:
> > You may want to see if this patch fixes the problem; it does *NOT* have
> > the correct error behaviour (some of the intervening layers don't
> > propagate errors), but it should make the fault go away.
>
> The alternative would be to just take out those msr_on_cpu()
> interfaces again. Right now they are useless in the kernel,
> but still cause problems.
>
> They were only added for OpenVZ's vCPUs which they back then
> promised me would hit mainline soon.

There were no such promises made. Reread thread.

> But that was some time ago and there wasn't much progress on this.

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Vegard Nossum @ 2008-08-24 9:20 UTC
To: H. Peter Anvin
Cc: Dave Jones, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

On Fri, Aug 22, 2008 at 4:13 AM, H. Peter Anvin <hpa@kernel.org> wrote:
> It seems to be that doing it in smp_call_function_single() would be more
> correct as it's already protected by get_cpu()..put_cpu() and a cpu_online()
> test in there should not be expensive in comparison to the whole rest of the
> code.
>
> You may want to see if this patch fixes the problem; it does *NOT* have the
> correct error behaviour (some of the intervening layers don't propagate
> errors), but it should make the fault go away.

Hm.

Kernel fails to detect cpu1 at all.

I am currently unsure of whether it's your patch or not. But it's the
same config that I've been booting for ages (and I copy it over for
each new kernel version I check out).

Processor #0 (Bootup-CPU)
I/O APIC #2 Version 32 at 0xFEC00000.
Enabling APIC mode: Flat. Using 1 I/O APICs
Processors: 1
SMP: Allowing 1 CPUs, 0 hotplug CPUs
mapped APIC to ffffb000 (fee00000)
mapped IOAPIC to ffffa000 (fec00000)
Allocating PCI resources starting at 50000000 (gap: 40000000:bee00000)
PERCPU: Allocating 1221764 bytes of per cpu data
NR_CPUS: 7, nr_cpu_ids: 1, nr_node_ids 1

I really don't get it. Is this something that can be caused by your
patch _at all_ ?

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: H. Peter Anvin @ 2008-08-24 16:43 UTC
To: Vegard Nossum
Cc: Dave Jones, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

Vegard Nossum wrote:
>
> Hm.
>
> Kernel fails to detect cpu1 at all.
>
> I am currently unsure of whether it's your patch or not. But it's the
> same config that I've been booting for ages (and I copy it over for
> each new kernel version I check out).
>
> Processor #0 (Bootup-CPU)
> I/O APIC #2 Version 32 at 0xFEC00000.
> Enabling APIC mode: Flat. Using 1 I/O APICs
> Processors: 1
> SMP: Allowing 1 CPUs, 0 hotplug CPUs
> mapped APIC to ffffb000 (fee00000)
> mapped IOAPIC to ffffa000 (fec00000)
> Allocating PCI resources starting at 50000000 (gap: 40000000:bee00000)
> PERCPU: Allocating 1221764 bytes of per cpu data
> NR_CPUS: 7, nr_cpu_ids: 1, nr_node_ids 1
>
> I really don't get it. Is this something that can be caused by your
> patch _at all_ ?
>

Well, if smp_call_function_single() is called during the CPU up
sequence, without the CPU having been added to the online mask, then
yes, it could. The most likely place would be from a notifier.

That makes it ugly. Need to track down the reason.

	-hpa

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: H. Peter Anvin @ 2008-08-24 17:17 UTC
To: Vegard Nossum
Cc: Dave Jones, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

Vegard Nossum wrote:
>
> Hm.
>
> Kernel fails to detect cpu1 at all.
>
> I am currently unsure of whether it's your patch or not. But it's the
> same config that I've been booting for ages (and I copy it over for
> each new kernel version I check out).
>
> Processor #0 (Bootup-CPU)
> I/O APIC #2 Version 32 at 0xFEC00000.
> Enabling APIC mode: Flat. Using 1 I/O APICs
> Processors: 1
> SMP: Allowing 1 CPUs, 0 hotplug CPUs
> mapped APIC to ffffb000 (fee00000)
> mapped IOAPIC to ffffa000 (fec00000)
> Allocating PCI resources starting at 50000000 (gap: 40000000:bee00000)
> PERCPU: Allocating 1221764 bytes of per cpu data
> NR_CPUS: 7, nr_cpu_ids: 1, nr_node_ids 1
>
> I really don't get it. Is this something that can be caused by your
> patch _at all_ ?
>

Could you try this patch? It should (hopefully) tell us if there is any
such invocations and what the call trace looks like.

	-hpa

[-- Attachment #2: smp-unplug-fault2.patch --]
[-- Type: text/x-patch, Size: 1167 bytes --]

diff --git a/kernel/smp.c b/kernel/smp.c
index 782e2b9..95e1bad 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -210,8 +210,10 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 {
 	struct call_single_data d;
 	unsigned long flags;
-	/* prevent preemption and reschedule on another processor */
+	/* prevent preemption and reschedule on another processor,
+	   as well as CPU removal */
 	int me = get_cpu();
+	int err = 0;

 	/* Can deadlock when called with interrupts disabled */
 	WARN_ON(irqs_disabled());
@@ -220,7 +222,7 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 		local_irq_save(flags);
 		func(info);
 		local_irq_restore(flags);
-	} else {
+	} else if ((unsigned)cpu < NR_CPUS && cpu_online(cpu)) {
 		struct call_single_data *data = NULL;

 		if (!wait) {
@@ -236,10 +238,13 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 		data->func = func;
 		data->info = info;
 		generic_exec_single(cpu, data);
+	} else {
+		BUG();
+		err = -ENXIO;	/* CPU not online */
 	}

 	put_cpu();
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(smp_call_function_single);

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Vegard Nossum @ 2008-08-24 17:22 UTC
To: H. Peter Anvin
Cc: Dave Jones, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

On Sun, Aug 24, 2008 at 7:17 PM, H. Peter Anvin <hpa@kernel.org> wrote:
> Vegard Nossum wrote:
>>
>> Hm.
>>
>> Kernel fails to detect cpu1 at all.
>>
>> I am currently unsure of whether it's your patch or not. But it's the
>> same config that I've been booting for ages (and I copy it over for
>> each new kernel version I check out).
>>
>> Processor #0 (Bootup-CPU)
>> I/O APIC #2 Version 32 at 0xFEC00000.
>> Enabling APIC mode: Flat. Using 1 I/O APICs
>> Processors: 1
>> SMP: Allowing 1 CPUs, 0 hotplug CPUs
>> mapped APIC to ffffb000 (fee00000)
>> mapped IOAPIC to ffffa000 (fec00000)
>> Allocating PCI resources starting at 50000000 (gap: 40000000:bee00000)
>> PERCPU: Allocating 1221764 bytes of per cpu data
>> NR_CPUS: 7, nr_cpu_ids: 1, nr_node_ids 1
>>
>> I really don't get it. Is this something that can be caused by your
>> patch _at all_ ?
>>
>
> Could you try this patch? It should (hopefully) tell us if there is any
> such invocations and what the call trace looks like.

I'm sorry, I _just_ reverted your patch and tested the bare kernel...
but it still only detects cpu0 :-(

Apart from that, it's also incredibly slow and I get some
"end_request: I/O error, dev fd0, sector 0" messages. Start-up (init 3
on a F7) takes closer to 10 minutes. Will now take a closer look at my
config.

Oh. I _just_ noticed a completely different change -- I added acpi=off
to my boot line *blush*

Will now remove it and retry your original patch.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Vegard Nossum @ 2008-08-24 17:45 UTC
To: H. Peter Anvin
Cc: Dave Jones, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

On Sun, Aug 24, 2008 at 7:22 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>>> Kernel fails to detect cpu1 at all.

> I'm sorry, I _just_ reverted your patch and tested the bare kernel...
> but it still only detects cpu0 :-(
>
> Apart from that, it's also incredibly slow and I get some
> "end_request: I/O error, dev fd0, sector 0" messages. Start-up (init 3
> on a F7) takes closer to 10 minutes. Will now take a closer look at my
> config.
>
> Oh. I _just_ noticed a completely different change -- I added acpi=off
> to my boot line *blush*

Removing acpi=off helps with the CPU detection problem. The kernel is
still really slow, though. From /proc/cpuinfo:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 5
cpu MHz : 375.000
cache size : 2048 KB

Why is MHz on 375!? I tried cpufreq-selector, but nothing changed. Maybe

calling acpi_cpufreq_init+0x0/0x90
initcall acpi_cpufreq_init+0x0/0x90 returned -19 after 0 msecs

There's also this:

SMP: Allowing 2 CPUs, 0 hotplug CPUs

(but CPU hotplug still works; is the line above about something
different, like physical hotplug?)

Apart from that, with your patch applied, hotplug seems to work OK (no
warnings).

Okay, now I used cpufreq-selector to change to "ondemand" governor,
and MHz goes back to 3000. Weird. Why would "performance" governor put
my machine to a constant 375?

Thanks,

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: H. Peter Anvin @ 2008-08-24 17:59 UTC
To: Vegard Nossum
Cc: Dave Jones, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

Vegard Nossum wrote:
>
> Okay, now I used cpufreq-selector to change to "ondemand" governor,
> and MHz goes back to 3000. Weird. Why would "performance" governor put
> my machine to a constant 375?
>

That would be a problem... I presume this problem is independent of the
patch, though?

	-hpa

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0()
From: Dave Jones @ 2008-08-24 18:13 UTC
To: Vegard Nossum
Cc: H. Peter Anvin, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell

On Sun, Aug 24, 2008 at 07:45:48PM +0200, Vegard Nossum wrote:
 > Removing acpi=off helps with the CPU detection problem. The kernel is
 > still really slow, though. From /proc/cpuinfo:
 >
 > processor : 1
 > vendor_id : GenuineIntel
 > cpu family : 15
 > model : 6
 > model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
 > stepping : 5
 > cpu MHz : 375.000
 > cache size : 2048 KB
 >
 > Why is MHz on 375!? I tried cpufreq-selector, but nothing changed. Maybe
 >
 > calling acpi_cpufreq_init+0x0/0x90
 > initcall acpi_cpufreq_init+0x0/0x90 returned -19 after 0 msecs

-ENODEV. Because you don't have a frequency scaling capable CPU.

 > Okay, now I used cpufreq-selector to change to "ondemand" governor,
 > and MHz goes back to 3000. Weird. Why would "performance" governor put
 > my machine to a constant 375?

Probably because you're using p4-clockmod, and it's crap.

	Dave

--
http://www.codemonkey.org.uk

* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-24 18:13 ` Dave Jones @ 2008-08-25 18:31 ` Vegard Nossum 2008-08-25 18:38 ` Dave Jones 2008-08-25 18:36 ` Andi Kleen 1 sibling, 1 reply; 30+ messages in thread From: Vegard Nossum @ 2008-08-25 18:31 UTC (permalink / raw) To: Dave Jones, H. Peter Anvin, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Sun, Aug 24, 2008 at 8:13 PM, Dave Jones <davej@redhat.com> wrote: > On Sun, Aug 24, 2008 at 07:45:48PM +0200, Vegard Nossum wrote: > > Why is MHz on 375!? I tried cpufreq-selector, but nothing changed. Maybe > > > > calling acpi_cpufreq_init+0x0/0x90 > > initcall acpi_cpufreq_init+0x0/0x90 returned -19 after 0 msecs > > -ENODEV. Because you don't have frequency scaling capable CPU. > > > Okay, now I used cpufreq-selector to change to "ondemand" governor, > > and MHz goes back to 3000. Weird. Why would "performance" governor put > > my machine to a constant 375? > > Probably because you're using p4-clockmod, and it's crap. On Sun, Aug 24, 2008 at 7:59 PM, H. Peter Anvin <hpa@kernel.org> wrote: > That would be a problem... I presume this problem is independent of the > patch, though? I sorted it -- thanks! It turned out to be pretty obscure; my tty setting for the receiving end of the serial console was set to echo. So when the machine booted, it was echoing lots of characters into the Fedora 7 init, which would prompt for the starting of cpuspeed initscript. Turning off echo for the tty was what triggered the slowness; removing cpuspeed from the runlevel entirely solved the problem. Don't know why cpuspeed would select a governor which runs the CPU at a constant 300 MHz, though. Vegard -- "The animistic metaphor of the bug that maliciously sneaked in while the programmer was not looking is intellectually dishonest as it disguises that the error is the programmer's own creation." -- E. W. 
Dijkstra, EWD1036 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 18:31 ` Vegard Nossum @ 2008-08-25 18:38 ` Dave Jones 0 siblings, 0 replies; 30+ messages in thread From: Dave Jones @ 2008-08-25 18:38 UTC (permalink / raw) To: Vegard Nossum Cc: H. Peter Anvin, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Mon, Aug 25, 2008 at 08:31:04PM +0200, Vegard Nossum wrote: > Fedora 7 init, which would prompt for the starting of cpuspeed > initscript. Turning off echo for the tty was what triggered the > slowness; removing cpuspeed from the runlevel entirely solved the > problem. > > Don't know why cpuspeed would select a governor which runs the CPU at > a constant 300 MHz, though. p4-clockmod is the only cpufreq driver that can run on your hardware. There's nothing better. A while back, Fedora stopped loading (and even building) p4-clockmod, because it sucks so bad. I can't remember when we made that change, but it sounds like it must have been a post F7 thing. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-24 18:13 ` Dave Jones 2008-08-25 18:31 ` Vegard Nossum @ 2008-08-25 18:36 ` Andi Kleen 2008-08-25 18:54 ` Dave Jones 2008-08-25 19:08 ` H. Peter Anvin 1 sibling, 2 replies; 30+ messages in thread From: Andi Kleen @ 2008-08-25 18:36 UTC (permalink / raw) To: Dave Jones, Vegard Nossum, H. Peter Anvin, Andi Kleen, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell > Probably because you're using p4-clockmod, and it's crap. We really should bite the bullet and just remove it. People run into this all the time, and I bet you can count the people who actually use it consciously and usefully on one hand. Or at least only make it run when the user explicitly sets an "I_REALLY_KNOW_WHAT_I_AM_DOING" option. -Andi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 18:36 ` Andi Kleen @ 2008-08-25 18:54 ` Dave Jones 2008-08-25 19:39 ` Andi Kleen 2008-08-25 19:08 ` H. Peter Anvin 1 sibling, 1 reply; 30+ messages in thread From: Dave Jones @ 2008-08-25 18:54 UTC (permalink / raw) To: Andi Kleen Cc: Vegard Nossum, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Mon, Aug 25, 2008 at 08:36:11PM +0200, Andi Kleen wrote: > > Probably because you're using p4-clockmod, and it's crap. > > Really should really bite the bullet and just remove it. People > run in this all the time and I bet you can count the people who > actually use it consciously and usefully with one hand. > > Or at least only make it run when the user set a "I_REALLY_KNOW_WHAT_I_AM_DOING" > option explicitely. We can't really remove it until ACPI processor driver has a better response than 'thermal event, argh!, shut down'. When that happens, I'll be glad to see it go. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 18:54 ` Dave Jones @ 2008-08-25 19:39 ` Andi Kleen 2008-08-25 19:50 ` Dave Jones 0 siblings, 1 reply; 30+ messages in thread From: Andi Kleen @ 2008-08-25 19:39 UTC (permalink / raw) To: Dave Jones, Andi Kleen, Vegard Nossum, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Mon, Aug 25, 2008 at 02:54:51PM -0400, Dave Jones wrote: > On Mon, Aug 25, 2008 at 08:36:11PM +0200, Andi Kleen wrote: > > > Probably because you're using p4-clockmod, and it's crap. > > > > Really should really bite the bullet and just remove it. People > > run in this all the time and I bet you can count the people who > > actually use it consciously and usefully with one hand. > > > > Or at least only make it run when the user set a "I_REALLY_KNOW_WHAT_I_AM_DOING" > > option explicitely. > > We can't really remove it until ACPI processor driver has a better > response than 'thermal event, argh!, shut down'. It only does that when the critical trip point is reached (which basically means that the BIOS tells it -- "I'm on fire"). What else should it do in your opinion when this happens? -Andi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 19:39 ` Andi Kleen @ 2008-08-25 19:50 ` Dave Jones 2008-08-25 20:36 ` Andi Kleen 0 siblings, 1 reply; 30+ messages in thread From: Dave Jones @ 2008-08-25 19:50 UTC (permalink / raw) To: Andi Kleen Cc: Vegard Nossum, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Mon, Aug 25, 2008 at 09:39:26PM +0200, Andi Kleen wrote: > On Mon, Aug 25, 2008 at 02:54:51PM -0400, Dave Jones wrote: > > On Mon, Aug 25, 2008 at 08:36:11PM +0200, Andi Kleen wrote: > > > > Probably because you're using p4-clockmod, and it's crap. > > > > > > Really should really bite the bullet and just remove it. People > > > run in this all the time and I bet you can count the people who > > > actually use it consciously and usefully with one hand. > > > > > > Or at least only make it run when the user set a "I_REALLY_KNOW_WHAT_I_AM_DOING" > > > option explicitely. > > > > We can't really remove it until ACPI processor driver has a better > > response than 'thermal event, argh!, shut down'. > > It only does that when the critical trip point is reached (which > basically means that the BIOS tells it -- "I'm on fire"). What else should > it do in your opinion when this happens? On some systems (for which there aren't BIOS updates) the trip points are set too low. If we get a thermal event that was caused by temporary increased workload, temperature will drop off again when that workload is complete. For sustained workloads we'd get additional thermal events, at which time we make a decision "ok, we've throttled as far as we can, and things are still going badly, power off". In the event of a failed fan or similar, shutting down is obviously the right thing to do, and we'd get further thermal events after throttling which would allow us to do so. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 30+ messages in thread
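Editorial sketch, not part of the original thread: the policy Dave describes (throttle on a thermal event, and power off only once throttling is exhausted and events keep arriving) can be modelled as a small state machine. The class name, the `max_throttle_steps` parameter, and the action strings are hypothetical; this is not how the ACPI processor driver or any cpufreq driver is implemented.

```python
from dataclasses import dataclass


@dataclass
class ThermalPolicy:
    """Toy model: throttle step by step, power off once throttling is exhausted."""
    max_throttle_steps: int = 3   # hypothetical number of throttle levels
    throttle_level: int = 0       # 0 means running unthrottled

    def on_thermal_event(self) -> str:
        """Return the action to take for one thermal event."""
        if self.throttle_level < self.max_throttle_steps:
            # Still room to slow down: a temporary load spike ends here.
            self.throttle_level += 1
            return "throttle"
        # Throttled as far as possible and still overheating,
        # e.g. a failed fan: shutting down is the right call.
        return "power_off"

    def on_cooled_down(self) -> None:
        """The workload finished and the temperature dropped again."""
        self.throttle_level = 0


if __name__ == "__main__":
    policy = ThermalPolicy(max_throttle_steps=2)
    for _ in range(3):
        print(policy.on_thermal_event())
```

With two throttle steps, a third consecutive event yields `power_off`, while a spike that ends after one or two events only costs some performance.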
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 19:50 ` Dave Jones @ 2008-08-25 20:36 ` Andi Kleen 2008-08-25 20:47 ` Dave Jones 0 siblings, 1 reply; 30+ messages in thread From: Andi Kleen @ 2008-08-25 20:36 UTC (permalink / raw) To: Dave Jones, Andi Kleen, Vegard Nossum, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell > On some systems (for which there aren't BIOS updates) the trip points are > set too low. There were patches floating around to make this configurable. I was always a little sceptical of them, but they exist. > If we get a thermal event that was caused by temporary > increased workload, temperature will drop off again when that workload > is complete. But none of the cpufreq governors do this. They only care about load, not about temperature. > For sustained workloads we'd get additional thermal events, at which > time we make a decision "ok, we've throttled as far as we can, and > things are still going badly, power off". That is what the ACPI driver does when the trip point is reached. > In the event of a failed fan or similar, shutting down is obviously > the right thing to do, and we'd get further thermal events after > throttling which would allow us to do so. So you're saying processor_thermal should let the system cook for some time first before really taking action? -Andi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 20:36 ` Andi Kleen @ 2008-08-25 20:47 ` Dave Jones 2008-08-25 21:24 ` Arjan van de Ven 0 siblings, 1 reply; 30+ messages in thread From: Dave Jones @ 2008-08-25 20:47 UTC (permalink / raw) To: Andi Kleen Cc: Vegard Nossum, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Mon, Aug 25, 2008 at 10:36:49PM +0200, Andi Kleen wrote: > > If we get a thermal event that was caused by temporary > > increased workload, temperature will drop off again when that workload > > is complete. > > But none of the cpufreq governours do this. They only care about > load, not about temperature. Which is good enough to stop p4 laptops from shutting down as soon as they've finished booting up. > > For sustained workloads we'd get additional thermal events, at which > > time we make a decision "ok, we've throttled as far as we can, and > > things are still going badly, power off". > > That is what the ACPI driver does when the trip point is reached. yes, except for that "we've throttled" part. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 20:47 ` Dave Jones @ 2008-08-25 21:24 ` Arjan van de Ven 0 siblings, 0 replies; 30+ messages in thread From: Arjan van de Ven @ 2008-08-25 21:24 UTC (permalink / raw) To: Dave Jones Cc: Andi Kleen, Vegard Nossum, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Mon, 25 Aug 2008 16:47:02 -0400 Dave Jones <davej@redhat.com> wrote: > On Mon, Aug 25, 2008 at 10:36:49PM +0200, Andi Kleen wrote: > > > > If we get a thermal event that was caused by temporary > > > increased workload, temperature will drop off again when that > > > workload is complete. > > > > But none of the cpufreq governours do this. They only care about > > load, not about temperature. > > Which is good enough to stop p4 laptops from shutting down as > soon as they've finished booting up. That's such an enormous gamble it's not funny. Really; if your BIOS has broken trip points, we should use the kernel command line to disable them (and a DMI blacklist if the number of BIOSes that have it wrong is low, maybe combined with a date-based threshold). Just praying that p4-clockmod keeps it kinda low enough is not the answer. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 30+ messages in thread
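Editorial sketch, not part of the original thread: Arjan's alternative (an explicit command-line override, plus a DMI blacklist limited to BIOSes older than some date) amounts to a small decision function. Every name here is hypothetical, including the blacklist entry and the `cmdline_override` parameter; no real kernel option or DMI table is being described.

```python
from datetime import date
from typing import NamedTuple, Optional


class DmiInfo(NamedTuple):
    vendor: str
    product: str
    bios_date: date


# Hypothetical entries: (vendor, product, date of the first fixed BIOS).
BROKEN_TRIP_POINT_BIOSES = [
    ("ExampleVendor", "ExampleBook 100", date(2005, 1, 1)),
]


def ignore_critical_trip_point(dmi: DmiInfo,
                               cmdline_override: Optional[bool] = None) -> bool:
    """Decide whether the BIOS-provided critical trip point should be ignored."""
    # An explicit choice on the kernel command line always wins.
    if cmdline_override is not None:
        return cmdline_override
    # Otherwise consult the blacklist, but only for BIOSes that predate the fix.
    for vendor, product, fixed_since in BROKEN_TRIP_POINT_BIOSES:
        if (dmi.vendor, dmi.product) == (vendor, product) and dmi.bios_date < fixed_since:
            return True
    return False


if __name__ == "__main__":
    old_bios = DmiInfo("ExampleVendor", "ExampleBook 100", date(2004, 6, 1))
    print(ignore_critical_trip_point(old_bios))
```

The point of the date threshold is that a later BIOS for the same model is assumed to carry sane trip points, so it is trusted again by default.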
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 18:36 ` Andi Kleen 2008-08-25 18:54 ` Dave Jones @ 2008-08-25 19:08 ` H. Peter Anvin 2008-08-25 19:13 ` Dave Jones 1 sibling, 1 reply; 30+ messages in thread From: H. Peter Anvin @ 2008-08-25 19:08 UTC (permalink / raw) To: Andi Kleen Cc: Dave Jones, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell Andi Kleen wrote: >> Probably because you're using p4-clockmod, and it's crap. > > Really should really bite the bullet and just remove it. People > run in this all the time and I bet you can count the people who > actually use it consciously and usefully with one hand. > > Or at least only make it run when the user set a "I_REALLY_KNOW_WHAT_I_AM_DOING" > option explicitely. > CONFIG_BROKEN? -hpa ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask_bitmask+0xc3/0xe0() 2008-08-25 19:08 ` H. Peter Anvin @ 2008-08-25 19:13 ` Dave Jones 0 siblings, 0 replies; 30+ messages in thread From: Dave Jones @ 2008-08-25 19:13 UTC (permalink / raw) To: H. Peter Anvin Cc: Andi Kleen, Vegard Nossum, the arch/x86 maintainers, Linux Kernel Mailing List, Rusty Russell On Mon, Aug 25, 2008 at 12:08:23PM -0700, H. Peter Anvin wrote: > Andi Kleen wrote: > >> Probably because you're using p4-clockmod, and it's crap. > > > > Really should really bite the bullet and just remove it. People > > run in this all the time and I bet you can count the people who > > actually use it consciously and usefully with one hand. > > > > Or at least only make it run when the user set a "I_REALLY_KNOW_WHAT_I_AM_DOING" > > option explicitely. > > CONFIG_BROKEN? It's not really broken (at least in the CONFIG_BROKEN sense); it just sucks when used in the wrong situations (which is 99% of the use cases people try to use it for). Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 30+ messages in thread