* [PATCH] 2.4.16 kernel/printk.c (per processor initialization check)
@ 2001-12-03 5:46 j-nomura
2001-12-03 9:20 ` Andrew Morton
0 siblings, 1 reply; 24+ messages in thread
From: j-nomura @ 2001-12-03 5:46 UTC (permalink / raw)
To: linux-kernel; +Cc: j-nomura
Hello,
I experienced system hang on my SMP machine and it turned out to be due to
console write before mmu initialization completes.
To be more specific, even if secondary processors are not in status enough
to do actual console I/O (e.g. mmu is not initialized), call_console_drivers()
tries to do it.
This leads to unpredictable result. For me, for example, it cause machine
check abort and hang up system.
Attached is a patch for it.
--- kernel/printk.c 2001/11/27 04:41:49 1.1.1.8
+++ kernel/printk.c 2001/12/03 05:25:26
@@ -491,20 +491,22 @@
*/
void release_console_sem(void)
{
unsigned long flags;
unsigned long _con_start, _log_end;
unsigned long must_wake_klogd = 0;
for ( ; ; ) {
spin_lock_irqsave(&logbuf_lock, flags);
must_wake_klogd |= log_start - log_end;
+ if (!(cpu_online_map & 1UL << smp_processor_id()))
+ break;
if (con_start == log_end)
break; /* Nothing to print */
_con_start = con_start;
_log_end = log_end;
con_start = log_end; /* Flush */
spin_unlock_irqrestore(&logbuf_lock, flags);
call_console_drivers(_con_start, _log_end);
}
console_may_schedule = 0;
up(&console_sem);
Best regards.
--
NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp>
HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) 2001-12-03 5:46 [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) j-nomura @ 2001-12-03 9:20 ` Andrew Morton 2001-12-03 10:32 ` j-nomura 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2001-12-03 9:20 UTC (permalink / raw) To: j-nomura; +Cc: linux-kernel j-nomura@ce.jp.nec.com wrote: > > Hello, > > I experienced system hang on my SMP machine and it turned out to be due to > console write before mmu initialization completes. > > To be more specific, even if secondary processors are not in status enough > to do actual console I/O (e.g. mmu is not initialized), call_console_drivers() > tries to do it. > This leads to unpredictable result. For me, for example, it cause machine > check abort and hang up system. > > Attached is a patch for it. > > --- kernel/printk.c 2001/11/27 04:41:49 1.1.1.8 > +++ kernel/printk.c 2001/12/03 05:25:26 > @@ -491,20 +491,22 @@ > */ > void release_console_sem(void) > { > unsigned long flags; > unsigned long _con_start, _log_end; > unsigned long must_wake_klogd = 0; > > for ( ; ; ) { > spin_lock_irqsave(&logbuf_lock, flags); > must_wake_klogd |= log_start - log_end; > + if (!(cpu_online_map & 1UL << smp_processor_id())) > + break; > if (con_start == log_end) > break; /* Nothing to print */ > _con_start = con_start; > _log_end = log_end; > con_start = log_end; /* Flush */ > spin_unlock_irqrestore(&logbuf_lock, flags); > call_console_drivers(_con_start, _log_end); > } > console_may_schedule = 0; > up(&console_sem); > Seems that there is some sort of ordering problem here - someone is calling printk before the MMU is initialised, but after some console drivers have been installed. I suspect the real fix is elsewhere, but I'm not sure where. Probably a clearer place to put this test would be within printk itself, immediately before the down_trylock. Does that work? - ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) 2001-12-03 9:20 ` Andrew Morton @ 2001-12-03 10:32 ` j-nomura 2001-12-04 1:45 ` [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: j-nomura @ 2001-12-03 10:32 UTC (permalink / raw) To: akpm; +Cc: j-nomura, linux-kernel Hi, Thank you for commenting. From: Andrew Morton <akpm@zip.com.au> Subject: Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) Date: Mon, 03 Dec 2001 01:20:28 -0800 > Seems that there is some sort of ordering problem here - someone > is calling printk before the MMU is initialised, but after some > console drivers have been installed. Yes. Because smp_init() is later in place than console_init(), printk() can be called in such a situation. For example, in IA-64, identify_cpu() is called before ia64_mmu_init(), while identify_cpu() calls printk() in it. I don't think the ordering itself is a problem. > I suspect the real fix is elsewhere, but I'm not sure where. > > Probably a clearer place to put this test would be within > printk itself, immediately before the down_trylock. Does that > work? The reason I put it in release_console_sem() is that release_console_sem() can be called from other functions than printk(), e.g. console_unblank(). I agree with you that it is clearer but I think it is not sufficient. Best regards. -- NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp> HPC Operating System Group, 1st Computers Software Division, Computers Software Operations Unit, NEC Solutions. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) 2001-12-03 10:32 ` j-nomura @ 2001-12-04 1:45 ` Andrew Morton 2001-12-06 5:01 ` j-nomura 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2001-12-04 1:45 UTC (permalink / raw) To: j-nomura; +Cc: linux-kernel j-nomura@ce.jp.nec.com wrote: > > Hi, > > Thank you for commenting. > > From: Andrew Morton <akpm@zip.com.au> > Subject: Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) > Date: Mon, 03 Dec 2001 01:20:28 -0800 > > > Seems that there is some sort of ordering problem here - someone > > is calling printk before the MMU is initialised, but after some > > console drivers have been installed. > > Yes. > Because smp_init() is later in place than console_init(), printk() can be > called in such a situation. > For example, in IA-64, identify_cpu() is called before ia64_mmu_init(), > while identify_cpu() calls printk() in it. > I don't think the ordering itself is a problem. > > > I suspect the real fix is elsewhere, but I'm not sure where. > > > > Probably a clearer place to put this test would be within > > printk itself, immediately before the down_trylock. Does that > > work? > > The reason I put it in release_console_sem() is that release_console_sem() > can be called from other functions than printk(), e.g. console_unblank(). > I agree with you that it is clearer but I think it is not sufficient. > I really doubt if any of those paths could be called before even the MMU is set up. It seems that the ia64 port has installed some console drivers, and has then called them before it is ready to do so. Via printk. It should not have installed the console drivers that early. Do you know what console driver is causing the problem? If the console driver is not fixable then a more general approach would be, in printk.c: #ifndef ARCH_HAS_PRINTK_MAY_BE_USED #define printk_may_be_used() (1) #endif then, in printk() itself: if (*p == '\n') log_level_unknown = 1; } + if (!printk_may_be_used()) + return printed_len; if (!down_trylock(&console_sem)) { /* then, for ia64, give it a printk_may_be_used() function, and define ARCH_HAS_PRINTK_MAY_BE_USED somewhere. Or just not install console drivers before they may be safely used! - ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) 2001-12-04 1:45 ` [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) Andrew Morton @ 2001-12-06 5:01 ` j-nomura 2001-12-07 3:40 ` [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: j-nomura @ 2001-12-06 5:01 UTC (permalink / raw) To: akpm; +Cc: j-nomura, linux-kernel Hi, excuse me for not prompt response. I've been off-line for 2 days. > > The reason I put it in release_console_sem() is that release_console_sem() > > can be called from other functions than printk(), e.g. console_unblank(). > > I agree with you that it is clearer but I think it is not sufficient. > > I really doubt if any of those paths could be called before > even the MMU is set up. I didn't have any intention to say that console_unblank() is called so early. OK. Here is revised patch which checks if cpu initialization is done just before down_trylock(). This works for me. diff -u -r1.1.1.8 printk.c --- kernel/printk.c 2001/11/27 04:41:49 1.1.1.8 +++ kernel/printk.c 2001/12/06 04:54:50 @@ -438,7 +438,13 @@ log_level_unknown = 1; } - if (!down_trylock(&console_sem)) { + if (!(cpu_online_map & 1UL << smp_processor_id())) { + /* + * The cpu has not been initialized completely + * enough to call console drivers. + */ + spin_unlock_irqrestore(&logbuf_lock, flags); + } else if (!down_trylock(&console_sem)) { /* * We own the drivers. We can drop the spinlock and let * release_console_sem() print the text Best regards. -- NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp> HPC Operating System Group, 1st Computers Software Division, Computers Software Operations Unit, NEC Solutions. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-06 5:01 ` j-nomura @ 2001-12-07 3:40 ` Andrew Morton 2001-12-07 18:52 ` Marcelo Tosatti 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2001-12-07 3:40 UTC (permalink / raw) To: j-nomura; +Cc: linux-kernel, David Mosberger, Marcelo Tosatti j-nomura@ce.jp.nec.com wrote: > > Hi, > excuse me for not prompt response. I've been off-line for 2 days. > > > > The reason I put it in release_console_sem() is that release_console_sem() > > > can be called from other functions than printk(), e.g. console_unblank(). > > > I agree with you that it is clearer but I think it is not sufficient. > > > > I really doubt if any of those paths could be called before > > even the MMU is set up. > Marcelo, after a fairly lengthy off-list discussion, it turns out that special-casing printk is probably the best way to proceed to fix this one. The problem is that the boot processor sets up the console drivers, and is able to call them via printk(), but the application processors at that time are not able to call printk() because the console device driver mappings are not set up. Undoing this inside the ia64 boot code is complex and fragile. Possibly the problem exists on other platforms but hasn't been discovered yet. So the patch defines an architecture-specific macro `arch_consoles_callable()' which, if not defined, defaults to `1', so the impact to other platforms (and to uniprocessor ia64) is zero. --- linux-2.4.17-pre4/include/asm-ia64/system.h Thu Nov 22 23:02:59 2001 +++ linux-akpm/include/asm-ia64/system.h Wed Dec 5 23:09:15 2001 @@ -405,6 +405,10 @@ extern void ia64_load_extra (struct task ia64_psr(ia64_task_regs(prev))->dfh = 1; \ __switch_to(prev,next,last); \ } while (0) + +/* Return true if this CPU can call the console drivers in printk() */ +#define arch_consoles_callable() (cpu_online_map & (1UL << smp_processor_id())) + #else # define switch_to(prev,next,last) do { \ ia64_psr(ia64_task_regs(next))->dfh = (ia64_get_fpu_owner() != (next)); \ --- linux-2.4.17-pre4/kernel/printk.c Thu Nov 22 23:02:59 2001 +++ linux-akpm/kernel/printk.c Wed Dec 5 23:03:54 2001 @@ -38,6 +38,10 @@ #define LOG_BUF_MASK (LOG_BUF_LEN-1) +#ifndef arch_consoles_callable +#define arch_consoles_callable() (1) +#endif + /* printk's without a loglevel use this.. */ #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */ @@ -438,6 +442,14 @@ asmlinkage int printk(const char *fmt, . log_level_unknown = 1; } + if (!arch_consoles_callable()) { + /* + * On some architectures, the consoles are not usable + * on secondary CPUs early in the boot process. + */ + spin_unlock_irqrestore(&logbuf_lock, flags); + goto out; + } if (!down_trylock(&console_sem)) { /* * We own the drivers. We can drop the spinlock and let @@ -454,6 +466,7 @@ asmlinkage int printk(const char *fmt, . */ spin_unlock_irqrestore(&logbuf_lock, flags); } +out: return printed_len; } EXPORT_SYMBOL(printk); ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 3:40 ` [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) Andrew Morton @ 2001-12-07 18:52 ` Marcelo Tosatti 2001-12-07 20:52 ` William Lee Irwin III 2001-12-07 21:37 ` David Mosberger 0 siblings, 2 replies; 24+ messages in thread From: Marcelo Tosatti @ 2001-12-07 18:52 UTC (permalink / raw) To: Andrew Morton; +Cc: j-nomura, linux-kernel, David Mosberger How hard would it be to fix that on ia64 code? I'm really not willing to apply this kludge... On Thu, 6 Dec 2001, Andrew Morton wrote: > j-nomura@ce.jp.nec.com wrote: > > > > Hi, > > excuse me for not prompt response. I've been off-line for 2 days. > > > > > > The reason I put it in release_console_sem() is that release_console_sem() > > > > can be called from other functions than printk(), e.g. console_unblank(). > > > > I agree with you that it is clearer but I think it is not sufficient. > > > > > > I really doubt if any of those paths could be called before > > > even the MMU is set up. > > > > Marcelo, > > after a fairly lengthy off-list discussion, it turns out that > special-casing printk is probably the best way to proceed > to fix this one. > > The problem is that the boot processor sets up the console drivers, > and is able to call them via printk(), but the application processors > at that time are not able to call printk() because the console device > driver mappings are not set up. Undoing this inside the ia64 boot code is > complex and fragile. Possibly the problem exists on other platforms > but hasn't been discovered yet. > > So the patch defines an architecture-specific macro `arch_consoles_callable()' > which, if not defined, defaults to `1', so the impact to other platforms > (and to uniprocessor ia64) is zero. > > > > --- linux-2.4.17-pre4/include/asm-ia64/system.h Thu Nov 22 23:02:59 2001 > +++ linux-akpm/include/asm-ia64/system.h Wed Dec 5 23:09:15 2001 > @@ -405,6 +405,10 @@ extern void ia64_load_extra (struct task > ia64_psr(ia64_task_regs(prev))->dfh = 1; \ > __switch_to(prev,next,last); \ > } while (0) > + > +/* Return true if this CPU can call the console drivers in printk() */ > +#define arch_consoles_callable() (cpu_online_map & (1UL << smp_processor_id())) > + > #else > # define switch_to(prev,next,last) do { \ > ia64_psr(ia64_task_regs(next))->dfh = (ia64_get_fpu_owner() != (next)); \ > --- linux-2.4.17-pre4/kernel/printk.c Thu Nov 22 23:02:59 2001 > +++ linux-akpm/kernel/printk.c Wed Dec 5 23:03:54 2001 > @@ -38,6 +38,10 @@ > > #define LOG_BUF_MASK (LOG_BUF_LEN-1) > > +#ifndef arch_consoles_callable > +#define arch_consoles_callable() (1) > +#endif > + > /* printk's without a loglevel use this.. */ > #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */ > > @@ -438,6 +442,14 @@ asmlinkage int printk(const char *fmt, . > log_level_unknown = 1; > } > > + if (!arch_consoles_callable()) { > + /* > + * On some architectures, the consoles are not usable > + * on secondary CPUs early in the boot process. > + */ > + spin_unlock_irqrestore(&logbuf_lock, flags); > + goto out; > + } > if (!down_trylock(&console_sem)) { > /* > * We own the drivers. We can drop the spinlock and let > @@ -454,6 +466,7 @@ asmlinkage int printk(const char *fmt, . > */ > spin_unlock_irqrestore(&logbuf_lock, flags); > } > +out: > return printed_len; > } > EXPORT_SYMBOL(printk); > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 18:52 ` Marcelo Tosatti @ 2001-12-07 20:52 ` William Lee Irwin III 2001-12-07 21:37 ` David Mosberger 1 sibling, 0 replies; 24+ messages in thread From: William Lee Irwin III @ 2001-12-07 20:52 UTC (permalink / raw) To: linux-kernel On Fri, Dec 07, 2001 at 04:52:07PM -0200, Marcelo Tosatti wrote: > How hard would it be to fix that on ia64 code? > I'm really not willing to apply this kludge... It's possible, but the off-list discussion's consensus centered around this being a bootstrap ordering issue, where drivers may not be called from the application processors (at least not universally) until they have been initialized to some degree. printk() is in the unique position of performing such calls, and that is the idea around which this patch is centered. Specifically, on those architectures where some initialization of kernel virtual addressing is required to access memory regions from which console (or any) I/O is done, the driver calls may not be done from application processors until they have been initialized (on IA64, this is initializing the uncached region's region register and installing TLB fault handlers). So this is actually a broader issue than IA64 alone, though IA64 seems to be the first to have encountered it. The patch here provides a hook for all affected architectures to protect themselves against this issue, although there are perhaps other ways. Cheers, Bill ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 18:52 ` Marcelo Tosatti 2001-12-07 20:52 ` William Lee Irwin III @ 2001-12-07 21:37 ` David Mosberger 2001-12-07 20:47 ` Marcelo Tosatti ` (2 more replies) 1 sibling, 3 replies; 24+ messages in thread From: David Mosberger @ 2001-12-07 21:37 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Andrew Morton, j-nomura, linux-kernel, David Mosberger >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said: Marcelo> I'm really not willing to apply this kludge... Do you agree that it should always be safe to call printk() from C code? --david ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 21:37 ` David Mosberger @ 2001-12-07 20:47 ` Marcelo Tosatti 2001-12-07 22:17 ` David Mosberger 2001-12-07 22:08 ` Alan Cox 2001-12-07 22:14 ` Christopher Friesen 2 siblings, 1 reply; 24+ messages in thread From: Marcelo Tosatti @ 2001-12-07 20:47 UTC (permalink / raw) To: David Mosberger; +Cc: Andrew Morton, j-nomura, linux-kernel On Fri, 7 Dec 2001, David Mosberger wrote: > >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said: > > Marcelo> I'm really not willing to apply this kludge... > > Do you agree that it should always be safe to call printk() from C code? No if you can't access the console to print the message :) Its just that I would prefer to see the thing fixed in arch-dependant code instead special casing core code. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 20:47 ` Marcelo Tosatti @ 2001-12-07 22:17 ` David Mosberger 2001-12-07 21:09 ` Marcelo Tosatti 0 siblings, 1 reply; 24+ messages in thread From: David Mosberger @ 2001-12-07 22:17 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: David Mosberger, Andrew Morton, j-nomura, linux-kernel >>>>> On Fri, 7 Dec 2001 18:47:11 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said: Marcelo> On Fri, 7 Dec 2001, David Mosberger wrote: >> >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti >> <marcelo@conectiva.com.br> said: >> Marcelo> I'm really not willing to apply this kludge... >> Do you agree that it should always be safe to call printk() from >> C code? Marcelo> No if you can't access the console to print the message :) Let me quote the first few lines of the Linux kernel: ---- asmlinkage void __init start_kernel(void) { char * command_line; unsigned long mempages; extern char saved_command_line[]; /* * Interrupts are still disabled. Do necessary setups, then * enable them */ lock_kernel(); printk(linux_banner); ---- You still think it doesn't make sense? Marcelo> Its just that I would prefer to see the thing fixed in Marcelo> arch-dependant code instead special casing core code. Only architecture specific problems should be fixed with architecture specific code. I'm not entirely sure whether this particular problem is architecture specific. Perhaps it is and, if so, I'm certainly happy to fix it in the ia64 specific code. However, are you really 100% certain that on x86 there are no console drivers which in some fashion depend on cpu_init() having completed execution? Note that the x86 version of cpu_init() also has printk()s. If you're not certain of this, the AP startup problem could occur on x86, too. I haven't looked at all the other platforms, but I suspect similar things will be true, there. --david ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 22:17 ` David Mosberger @ 2001-12-07 21:09 ` Marcelo Tosatti 2001-12-08 1:10 ` David Mosberger 0 siblings, 1 reply; 24+ messages in thread From: Marcelo Tosatti @ 2001-12-07 21:09 UTC (permalink / raw) To: David Mosberger; +Cc: Andrew Morton, j-nomura, linux-kernel On Fri, 7 Dec 2001, David Mosberger wrote: > >>>>> On Fri, 7 Dec 2001 18:47:11 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said: > > Marcelo> On Fri, 7 Dec 2001, David Mosberger wrote: > > >> >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti > >> <marcelo@conectiva.com.br> said: > >> > Marcelo> I'm really not willing to apply this kludge... > >> Do you agree that it should always be safe to call printk() from > >> C code? > > Marcelo> No if you can't access the console to print the message :) > > Let me quote the first few lines of the Linux kernel: > > ---- > asmlinkage void __init start_kernel(void) > { > char * command_line; > unsigned long mempages; > extern char saved_command_line[]; > /* > * Interrupts are still disabled. Do necessary setups, then > * enable them > */ > lock_kernel(); > printk(linux_banner); > ---- > > You still think it doesn't make sense? Ok, it does. However, this still does not make me change my mind. > Marcelo> Its just that I would prefer to see the thing fixed in > Marcelo> arch-dependant code instead special casing core code. > > Only architecture specific problems should be fixed with architecture > specific code. > > I'm not entirely sure whether this particular problem is architecture > specific. Perhaps it is and, if so, I'm certainly happy to fix it in > the ia64 specific code. However, are you really 100% certain that on > x86 there are no console drivers which in some fashion depend on > cpu_init() having completed execution? Note that the x86 version of > cpu_init() also has printk()s. If you're not certain of this, the AP > startup problem could occur on x86, too. I haven't looked at all the > other platforms, but I suspect similar things will be true, there. Prove, please. If you show me it can also happen on other architectures, I'll be glad to apply the patch. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 21:09 ` Marcelo Tosatti @ 2001-12-08 1:10 ` David Mosberger 2001-12-08 11:27 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: David Mosberger @ 2001-12-08 1:10 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: David Mosberger, Andrew Morton, j-nomura, linux-kernel >>>>> On Fri, 7 Dec 2001 19:09:03 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said: >> I'm not entirely sure whether this particular problem is >> architecture specific. Perhaps it is and, if so, I'm certainly >> happy to fix it in the ia64 specific code. However, are you >> really 100% certain that on x86 there are no console drivers >> which in some fashion depend on cpu_init() having completed >> execution? Note that the x86 version of cpu_init() also has >> printk()s. If you're not certain of this, the AP startup problem >> could occur on x86, too. I haven't looked at all the other >> platforms, but I suspect similar things will be true, there. Marcelo> Prove, please. If you show me it can also happen on other Marcelo> architectures, I'll be glad to apply the patch. I'm no x86 expert, but I have the impression that current_cpu_data.loops_per_jiffy will be invalid (probably 0) until smp_store_cpu_info() is called in smp_callin(). If so, a console driver using udelay() might not work properly. I suspect there are other issues, but this is just based on looking at the x86 source code for 5 minutes. --david ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-08 1:10 ` David Mosberger @ 2001-12-08 11:27 ` Alan Cox 2001-12-08 16:41 ` David Mosberger 0 siblings, 1 reply; 24+ messages in thread From: Alan Cox @ 2001-12-08 11:27 UTC (permalink / raw) To: davidm; +Cc: Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel > I'm no x86 expert, but I have the impression that > current_cpu_data.loops_per_jiffy will be invalid (probably 0) until > smp_store_cpu_info() is called in smp_callin(). If so, a console > driver using udelay() might not work properly. I suspect there are > other issues, but this is just based on looking at the x86 source code > for 5 minutes. x86_udelay_tsc wont have been set at that point so the main timer is still being used. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-08 11:27 ` Alan Cox @ 2001-12-08 16:41 ` David Mosberger 2001-12-08 20:45 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: David Mosberger @ 2001-12-08 16:41 UTC (permalink / raw) To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel >>>>> On Sat, 8 Dec 2001 11:27:19 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said: >> I'm no x86 expert, but I have the impression that >> current_cpu_data.loops_per_jiffy will be invalid (probably 0) >> until smp_store_cpu_info() is called in smp_callin(). If so, a >> console driver using udelay() might not work properly. I suspect >> there are other issues, but this is just based on looking at the >> x86 source code for 5 minutes. Alan> x86_udelay_tsc wont have been set at that point so the main Alan> timer is still being used. x86 does use current_cpu_data.loops_per_jiffy in the non-TSC case, no? --david ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-08 16:41 ` David Mosberger @ 2001-12-08 20:45 ` Alan Cox 2001-12-09 0:32 ` David Mosberger 0 siblings, 1 reply; 24+ messages in thread From: Alan Cox @ 2001-12-08 20:45 UTC (permalink / raw) To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel > Alan> x86_udelay_tsc wont have been set at that point so the main > Alan> timer is still being used. > > x86 does use current_cpu_data.loops_per_jiffy in the non-TSC case, no? I believe so. So we should propogate that across earlier, although its not needed for our current console drivers that I can see ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-08 20:45 ` Alan Cox @ 2001-12-09 0:32 ` David Mosberger 2001-12-09 0:55 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: David Mosberger @ 2001-12-09 0:32 UTC (permalink / raw) To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel >>>>> On Sat, 8 Dec 2001 20:45:07 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said: Alan> x86_udelay_tsc wont have been set at that point so the main Alan> timer is still being used. >> x86 does use current_cpu_data.loops_per_jiffy in the non-TSC >> case, no? Alan> I believe so. So we should propogate that across earlier, Alan> although its not needed for our current console drivers that I Alan> can see I don't think you can do it early enough. calibrate_delay() requires irqs to be enabled and the first printk() happens long before irqs are enabled on an AP. --david ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-09 0:32 ` David Mosberger @ 2001-12-09 0:55 ` Alan Cox 2001-12-09 0:58 ` David Mosberger 0 siblings, 1 reply; 24+ messages in thread From: Alan Cox @ 2001-12-09 0:55 UTC (permalink / raw) To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel > I don't think you can do it early enough. calibrate_delay() requires > irqs to be enabled and the first printk() happens long before irqs are > enabled on an AP. So we make sure our initial console code doesnt need udelay(), or set an initial safe default like 25MHz ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-09 0:55 ` Alan Cox @ 2001-12-09 0:58 ` David Mosberger 2001-12-09 1:15 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: David Mosberger @ 2001-12-09 0:58 UTC (permalink / raw) To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel >>>>> On Sun, 9 Dec 2001 00:55:25 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said: >> I don't think you can do it early enough. calibrate_delay() >> requires irqs to be enabled and the first printk() happens long >> before irqs are enabled on an AP. Alan> So we make sure our initial console code doesnt need udelay(), Alan> or set an initial safe default like 25MHz So someone is going to maintain a list of what a console driver can and cannot do for all 12+ ports in existence? The alternative is to do: --- linux-2.4.16/kernel/printk.c Mon Nov 26 11:19:24 2001 +++ lia64-kdb/kernel/printk.c Thu Nov 29 21:45:08 2001 @@ -498,6 +505,10 @@ for ( ; ; ) { spin_lock_irqsave(&logbuf_lock, flags); must_wake_klogd |= log_start - log_end; +#ifdef CONFIG_SMP + if (!(cpu_online_map & (1UL << smp_processor_id()))) + break; +#endif if (con_start == log_end) break; /* Nothing to print */ _con_start = con_start; and be done with it. --david ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-09 0:58 ` David Mosberger @ 2001-12-09 1:15 ` Alan Cox 2001-12-09 1:14 ` David Mosberger 0 siblings, 1 reply; 24+ messages in thread From: Alan Cox @ 2001-12-09 1:15 UTC (permalink / raw) To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel > Alan> So we make sure our initial console code doesnt need udelay(), > Alan> or set an initial safe default like 25MHz > > So someone is going to maintain a list of what a console driver can > and cannot do for all 12+ ports in existence? > > The alternative is to do: And break the ability for non broken setups to debug SMP boot up. Lets do the job properly. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-09 1:15 ` Alan Cox @ 2001-12-09 1:14 ` David Mosberger 2001-12-09 1:32 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: David Mosberger @ 2001-12-09 1:14 UTC (permalink / raw) To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel >>>>> On Sun, 9 Dec 2001 01:15:25 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said: Alan> And break the ability for non broken setups to debug SMP boot Alan> up. Lets do the job properly. Then use Andrew's patch (attached below). --david --- linux-2.4.17-pre4/include/asm-ia64/system.h Thu Nov 22 23:02:59 2001 +++ linux-akpm/include/asm-ia64/system.h Wed Dec 5 23:09:15 2001 @@ -405,6 +405,10 @@ extern void ia64_load_extra (struct task ia64_psr(ia64_task_regs(prev))->dfh = 1; \ __switch_to(prev,next,last); \ } while (0) + +/* Return true if this CPU can call the console drivers in printk() */ +#define arch_consoles_callable() (cpu_online_map & (1UL << smp_processor_id())) + #else # define switch_to(prev,next,last) do { \ ia64_psr(ia64_task_regs(next))->dfh = (ia64_get_fpu_owner() != (next)); \ --- linux-2.4.17-pre4/kernel/printk.c Thu Nov 22 23:02:59 2001 +++ linux-akpm/kernel/printk.c Wed Dec 5 23:03:54 2001 @@ -38,6 +38,10 @@ #define LOG_BUF_MASK (LOG_BUF_LEN-1) +#ifndef arch_consoles_callable +#define arch_consoles_callable() (1) +#endif + /* printk's without a loglevel use this.. */ #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */ @@ -438,6 +442,14 @@ asmlinkage int printk(const char *fmt, . log_level_unknown = 1; } + if (!arch_consoles_callable()) { + /* + * On some architectures, the consoles are not usable + * on secondary CPUs early in the boot process. + */ + spin_unlock_irqrestore(&logbuf_lock, flags); + goto out; + } if (!down_trylock(&console_sem)) { /* * We own the drivers. We can drop the spinlock and let @@ -454,6 +466,7 @@ asmlinkage int printk(const char *fmt, . */ spin_unlock_irqrestore(&logbuf_lock, flags); } +out: return printed_len; } EXPORT_SYMBOL(printk); ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-09 1:14 ` David Mosberger @ 2001-12-09 1:32 ` Alan Cox 0 siblings, 0 replies; 24+ messages in thread From: Alan Cox @ 2001-12-09 1:32 UTC (permalink / raw) To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel > Alan> And break the ability for non broken setups to debug SMP boot > Alan> up. Lets do the job properly. > > Then use Andrew's patch (attached below). No objection. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 21:37 ` David Mosberger 2001-12-07 20:47 ` Marcelo Tosatti @ 2001-12-07 22:08 ` Alan Cox 2001-12-07 22:14 ` Christopher Friesen 2 siblings, 0 replies; 24+ messages in thread From: Alan Cox @ 2001-12-07 22:08 UTC (permalink / raw) To: davidm; +Cc: Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel > >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said: > Marcelo> I'm really not willing to apply this kludge... > > Do you agree that it should always be safe to call printk() from C code? Sounds a good goal, but surely thats up to the arch code to get right ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) 2001-12-07 21:37 ` David Mosberger 2001-12-07 20:47 ` Marcelo Tosatti 2001-12-07 22:08 ` Alan Cox @ 2001-12-07 22:14 ` Christopher Friesen 2 siblings, 0 replies; 24+ messages in thread From: Christopher Friesen @ 2001-12-07 22:14 UTC (permalink / raw) To: davidm; +Cc: Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel David Mosberger wrote: > > >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said: > > Marcelo> I'm really not willing to apply this kludge... > > Do you agree that it should always be safe to call printk() from C code? Is it really safe to call this from interrupt handlers? I can think of cases where the time required to print can totally mess stuff up... Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2001-12-09 1:24 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-12-03 5:46 [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) j-nomura 2001-12-03 9:20 ` Andrew Morton 2001-12-03 10:32 ` j-nomura 2001-12-04 1:45 ` [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) Andrew Morton 2001-12-06 5:01 ` j-nomura 2001-12-07 3:40 ` [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) Andrew Morton 2001-12-07 18:52 ` Marcelo Tosatti 2001-12-07 20:52 ` William Lee Irwin III 2001-12-07 21:37 ` David Mosberger 2001-12-07 20:47 ` Marcelo Tosatti 2001-12-07 22:17 ` David Mosberger 2001-12-07 21:09 ` Marcelo Tosatti 2001-12-08 1:10 ` David Mosberger 2001-12-08 11:27 ` Alan Cox 2001-12-08 16:41 ` David Mosberger 2001-12-08 20:45 ` Alan Cox 2001-12-09 0:32 ` David Mosberger 2001-12-09 0:55 ` Alan Cox 2001-12-09 0:58 ` David Mosberger 2001-12-09 1:15 ` Alan Cox 2001-12-09 1:14 ` David Mosberger 2001-12-09 1:32 ` Alan Cox 2001-12-07 22:08 ` Alan Cox 2001-12-07 22:14 ` Christopher Friesen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox