* [patch] Regression in 2.6.19-rc microcode driver
@ 2006-11-06 14:15 Arjan van de Ven
2006-11-06 19:01 ` Andrew Morton
2006-11-07 1:20 ` Shaohua Li
0 siblings, 2 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-11-06 14:15 UTC
To: linux-kernel; +Cc: shaohua.li, akpm, bunk
Hi,
if the microcode driver is built in (rather than module) there are some,
ehm, interesting effects happening due to the new "call out to
userspace" behavior that is introduced.. and which runs too early. The
result is a boot hang; which is really nasty.
The patch below is a minimally safe patch to fix this regression for
2.6.19 by just not requesting actual microcode updates during early
boot. (That is a good idea in general anyway)
The "real" fix is a lot more complex given the entire cpu hotplug
scenario (during cpu hotplug you normally need to load the microcode as
well); but the interactions for that are just really messy at this
point; this fix at least makes it work and avoids a full detangle of
hotplug.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
--- linux-2.6.18/arch/i386/kernel/microcode.c.org 2006-11-06 14:50:37.000000000 +0100
+++ linux-2.6.18/arch/i386/kernel/microcode.c 2006-11-06 14:52:30.000000000 +0100
@@ -577,7 +577,7 @@ static void microcode_init_cpu(int cpu)
 	set_cpus_allowed(current, cpumask_of_cpu(cpu));
 	mutex_lock(&microcode_mutex);
 	collect_cpu_info(cpu);
-	if (uci->valid)
+	if (uci->valid && system_state==SYSTEM_RUNNING)
 		cpu_request_microcode(cpu);
 	mutex_unlock(&microcode_mutex);
 	set_cpus_allowed(current, old);
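
For illustration only, the condition the patch introduces can be sketched as a small standalone C predicate. This is a userspace sketch, not kernel code: the enum `sketch_system_state` and the helper `should_request_microcode()` are hypothetical names standing in for the kernel's `system_state` variable and the patched `if` condition, and only the two states needed for the example are modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's system_state values;
 * only the two states relevant to this example are modelled. */
enum sketch_system_state {
	SKETCH_SYSTEM_BOOTING,
	SKETCH_SYSTEM_RUNNING
};

/* Hypothetical helper mirroring the patched condition: request a
 * microcode update only if the CPU info is valid AND the system has
 * finished booting, so the request never runs before userspace exists. */
static bool should_request_microcode(bool uci_valid,
				     enum sketch_system_state state)
{
	return uci_valid && state == SKETCH_SYSTEM_RUNNING;
}
```

With this gating, a built-in driver skips the request during early boot, while a CPU brought up later (with the system already running) still gets one.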
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-06 14:15 [patch] Regression in 2.6.19-rc microcode driver Arjan van de Ven
@ 2006-11-06 19:01 ` Andrew Morton
2006-11-06 19:02 ` Arjan van de Ven
2006-11-06 19:18 ` Arjan van de Ven
2006-11-07 1:20 ` Shaohua Li
1 sibling, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2006-11-06 19:01 UTC
To: Arjan van de Ven; +Cc: linux-kernel, shaohua.li, bunk
On Mon, 06 Nov 2006 15:15:38 +0100
Arjan van de Ven <arjan@linux.intel.com> wrote:
> Hi,
>
> if the microcode driver is built in (rather than module) there are some,
> ehm, interesting effects happening due to the new "call out to
> userspace" behavior that is introduced.. and which runs too early. The
> result is a boot hang; which is really nasty.
>
> The patch below is a minimally safe patch to fix this regression for
> 2.6.19 by just not requesting actual microcode updates during early
> boot. (That is a good idea in general anyway)
>
> The "real" fix is a lot more complex given the entire cpu hotplug
> scenario (during cpu hotplug you normally need to load the microcode as
> well); but the interactions for that are just really messy at this
> point; this fix at least makes it work and avoids a full detangle of
> hotplug.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
>
> --- linux-2.6.18/arch/i386/kernel/microcode.c.org 2006-11-06 14:50:37.000000000 +0100
> +++ linux-2.6.18/arch/i386/kernel/microcode.c 2006-11-06 14:52:30.000000000 +0100
> @@ -577,7 +577,7 @@ static void microcode_init_cpu(int cpu)
>  	set_cpus_allowed(current, cpumask_of_cpu(cpu));
>  	mutex_lock(&microcode_mutex);
>  	collect_cpu_info(cpu);
> -	if (uci->valid)
> +	if (uci->valid && system_state==SYSTEM_RUNNING)
>  		cpu_request_microcode(cpu);
>  	mutex_unlock(&microcode_mutex);
>  	set_cpus_allowed(current, old);
Can we fix this by switching to late_initcall() or something like that?
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-06 19:01 ` Andrew Morton
@ 2006-11-06 19:02 ` Arjan van de Ven
2006-11-06 19:18 ` Arjan van de Ven
1 sibling, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-11-06 19:02 UTC
To: Andrew Morton; +Cc: linux-kernel, shaohua.li, bunk
Andrew Morton wrote:
>> --- linux-2.6.18/arch/i386/kernel/microcode.c.org 2006-11-06 14:50:37.000000000 +0100
>> +++ linux-2.6.18/arch/i386/kernel/microcode.c 2006-11-06 14:52:30.000000000 +0100
>> @@ -577,7 +577,7 @@ static void microcode_init_cpu(int cpu)
>>  	set_cpus_allowed(current, cpumask_of_cpu(cpu));
>>  	mutex_lock(&microcode_mutex);
>>  	collect_cpu_info(cpu);
>> -	if (uci->valid)
>> +	if (uci->valid && system_state==SYSTEM_RUNNING)
>>  		cpu_request_microcode(cpu);
>>  	mutex_unlock(&microcode_mutex);
>>  	set_cpus_allowed(current, old);
>
> Can we fix this by switching to late_initcall() or something like that?
I will try but it then still runs before userspace (esp "init") is
alive so I'm not convinced it'll do the right thing
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-06 19:01 ` Andrew Morton
2006-11-06 19:02 ` Arjan van de Ven
@ 2006-11-06 19:18 ` Arjan van de Ven
1 sibling, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-11-06 19:18 UTC
To: Andrew Morton; +Cc: linux-kernel, shaohua.li, bunk
Andrew Morton wrote:
> Can we fix this by switching to late_initcall() or something like that?
after testing this: the answer is "no." ;(
at least not without significant redesign on how this all interacts
(which includes cpuhotplug meeting sysfs which isn't all that pretty
already) which is imo not a 2.6.19 thing.
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-06 14:15 [patch] Regression in 2.6.19-rc microcode driver Arjan van de Ven
2006-11-06 19:01 ` Andrew Morton
@ 2006-11-07 1:20 ` Shaohua Li
2006-11-07 1:59 ` Andrew Morton
1 sibling, 1 reply; 10+ messages in thread
From: Shaohua Li @ 2006-11-07 1:20 UTC
To: Arjan van de Ven; +Cc: linux-kernel, akpm, bunk
On Mon, 2006-11-06 at 15:15 +0100, Arjan van de Ven wrote:
> Hi,
>
> if the microcode driver is built in (rather than module) there are some,
> ehm, interesting effects happening due to the new "call out to
> userspace" behavior that is introduced.. and which runs too early. The
> result is a boot hang; which is really nasty.
>
> The patch below is a minimally safe patch to fix this regression for
> 2.6.19 by just not requesting actual microcode updates during early
> boot. (That is a good idea in general anyway)
>
> The "real" fix is a lot more complex given the entire cpu hotplug
> scenario (during cpu hotplug you normally need to load the microcode as
> well); but the interactions for that are just really messy at this
> point; this fix at least makes it work and avoids a full detangle of
> hotplug.
Yes, this is an issue which I documented in my patch. It's not a hang,
but a long delay if you have many cpus. Other drivers with firmware
request have the same issue if they are built-in. Maybe we should fix
the firmware request mechanism itself. I hope no distribution has
microcode driver built-in.
Thanks,
Shaohua
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-07 1:20 ` Shaohua Li
@ 2006-11-07 1:59 ` Andrew Morton
2006-11-07 9:13 ` Arjan van de Ven
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2006-11-07 1:59 UTC
To: Shaohua Li; +Cc: Arjan van de Ven, linux-kernel, bunk
On Tue, 07 Nov 2006 09:20:27 +0800
Shaohua Li <shaohua.li@intel.com> wrote:
> On Mon, 2006-11-06 at 15:15 +0100, Arjan van de Ven wrote:
> > Hi,
> >
> > if the microcode driver is built in (rather than module) there are some,
> > ehm, interesting effects happening due to the new "call out to
> > userspace" behavior that is introduced.. and which runs too early. The
> > result is a boot hang; which is really nasty.
> >
> > The patch below is a minimally safe patch to fix this regression for
> > 2.6.19 by just not requesting actual microcode updates during early
> > boot. (That is a good idea in general anyway)
> >
> > The "real" fix is a lot more complex given the entire cpu hotplug
> > scenario (during cpu hotplug you normally need to load the microcode as
> > well); but the interactions for that are just really messy at this
> > point; this fix at least makes it work and avoids a full detangle of
> > hotplug.
> Yes, this is an issue which I documented in my patch. It's not a hang,
> but a long delay if you have many cpus.
Due to the timeout? So it should come back after 10*num_online_cpus seconds?
Does Arjan have a lot of CPUs?
> Other drivers with firmware
> request have the same issue if they are built-in. Maybe we should fix
> the firmware request mechanism itself. I hope no distribution has
> microcode driver built-in.
But what would a fix look like? I think things would work OK if all the
appropriate stuff is present in initramfs, yes? We wouldn't want to break
that.
hm. kobject_uevent() stupidly returns void. If we were to fix that, is
there any reason why _request_firmware() should still wait for ten seconds
if kobject_uevent() returned a synchronous error? (ie:
__call_usermodehelper failed?)
Answer: yes. That won't work because request_firmware() uses
call_usermodehelper(wait=0) (iirc this bad thing was done because of
deadlock problems which were hard to fix properly).
But all is not lost - because call_usermodehelper() will use CLONE_VFORK I
_think_ we can still work out whether the child thread successfully exec'ed
a new program. It'd take a bit of hacking on the fork() code to make that
work though.
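
As a rough illustration of the "10*num_online_cpus seconds" estimate above, the worst-case boot delay can be sketched in standalone C. This is a sketch under stated assumptions, not kernel code: the constant `SKETCH_FW_TIMEOUT_SECS` and the helper `worst_case_delay_secs()` are hypothetical names, and the calculation assumes every online CPU's firmware request times out in turn after the ten-second wait mentioned in this message.

```c
/* Assumed per-request timeout in seconds, taken from the
 * "wait for ten seconds" figure discussed in this thread. */
#define SKETCH_FW_TIMEOUT_SECS 10u

/* Hypothetical helper: worst-case delay when each online CPU's
 * firmware request is issued sequentially and every one times out. */
static unsigned int worst_case_delay_secs(unsigned int num_online_cpus)
{
	return SKETCH_FW_TIMEOUT_SECS * num_online_cpus;
}
```

Under these assumptions, 8 online CPUs would stall boot for 80 seconds, which is long enough to look like a hang even though boot eventually continues.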
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-07 1:59 ` Andrew Morton
@ 2006-11-07 9:13 ` Arjan van de Ven
2006-11-07 9:21 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Arjan van de Ven @ 2006-11-07 9:13 UTC
To: Andrew Morton; +Cc: Shaohua Li, linux-kernel, bunk
>
> Due to the timeout? So it should come back after 10*num_online_cpus seconds?
>
> Does Arjan have a lot of CPUs?
eh yes, my test machine has quite a large number of those.
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-07 9:13 ` Arjan van de Ven
@ 2006-11-07 9:21 ` Andrew Morton
2006-11-07 9:35 ` Arjan van de Ven
2006-11-07 9:43 ` Arjan van de Ven
0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2006-11-07 9:21 UTC
To: Arjan van de Ven; +Cc: Shaohua Li, linux-kernel, bunk
On Tue, 07 Nov 2006 10:13:17 +0100
Arjan van de Ven <arjan@linux.intel.com> wrote:
> >
> > Due to the timeout? So it should come back after 10*num_online_cpus seconds?
> >
> > Does Arjan have a lot of CPUs?
>
> eh yes, my test machine has quite a large number of those.
So did it really hang?
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-07 9:21 ` Andrew Morton
@ 2006-11-07 9:35 ` Arjan van de Ven
2006-11-07 9:43 ` Arjan van de Ven
1 sibling, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-11-07 9:35 UTC
To: Andrew Morton; +Cc: Shaohua Li, linux-kernel, bunk
Andrew Morton wrote:
> On Tue, 07 Nov 2006 10:13:17 +0100
> Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>>> Due to the timeout? So it should come back after 10*num_online_cpus seconds?
>>>
>>> Does Arjan have a lot of CPUs?
>> eh yes, my test machine has quite a large number of those.
>
> So did it really hang?
I'll retry and wait a few minutes ;(
* Re: [patch] Regression in 2.6.19-rc microcode driver
2006-11-07 9:21 ` Andrew Morton
2006-11-07 9:35 ` Arjan van de Ven
@ 2006-11-07 9:43 ` Arjan van de Ven
1 sibling, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2006-11-07 9:43 UTC
To: Andrew Morton; +Cc: Shaohua Li, linux-kernel, bunk
Andrew Morton wrote:
> On Tue, 07 Nov 2006 10:13:17 +0100
> Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>>> Due to the timeout? So it should come back after 10*num_online_cpus seconds?
>>>
>>> Does Arjan have a lot of CPUs?
>> eh yes, my test machine has quite a large number of those.
>
> So did it really hang?
ok so it eventually continues.. just way past my "it hung" patience
window of about a minute