* Re: [PATCH] panic: fix incomplete panic log in panic()
2012-10-15 22:02 ` Andrew Morton
@ 2012-10-15 22:06 ` Andi Kleen
2012-10-16 14:25 ` Qing Z
2012-10-17 10:44 ` Qing Z
2 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2012-10-15 22:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Qing Z, mingo, ben, markivx, linux-kernel, cxie4, binw, wwang27,
xjian, zhangwm, Qing Zhu
> Also, we should be careful with things like up() on a semaphore which
> hasn't been down()ed. Because under some Kconfig combinations, such an
> operation might trigger debugging traces and we could get into a big
> mess. (An up() on non-down()ed semaphore is actually an OK operation,
> so this was a bad example. But you see the problem).
iirc the debugging code normally checks oops_in_progress, so this shouldn't
happen.
Agree on the rest of your email -- keep oops simple ... (KOSS)
-Andi
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] panic: fix incomplete panic log in panic()
2012-10-15 22:02 ` Andrew Morton
2012-10-15 22:06 ` Andi Kleen
@ 2012-10-16 14:25 ` Qing Z
2012-10-17 10:44 ` Qing Z
2 siblings, 0 replies; 10+ messages in thread
From: Qing Z @ 2012-10-16 14:25 UTC (permalink / raw)
To: Andrew Morton
Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
xjian, zhangwm, Qing Zhu
2012/10/16 Andrew Morton <akpm@linux-foundation.org>:
> On Mon, 15 Oct 2012 19:38:46 +0800
> Qing Z <njumical@gmail.com> wrote:
>
>> >> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >>
>> >> + /*
>> >> + * Unlock the console anyway here, in case it's occupied by another
>> >> + * one which has no chance to unlock the console thus prevents the
>> >> + * panic log prints on the console.
>> >> + */
>> >> + console_unlock();
>> >> +
>> >> bust_spinlocks(0);
>> >>
>> >> if (!panic_blink)
>> >
>> > hm. console_unlock() does a large amount of work, and it seems risky
>> > to do all of that when the system is in a bad-and-getting-worse state.
>> >
>> > Is there some more modest thing we could do here, for example,
>> >
>> > if (console_locked) {
>> > up(&console_sem);
>> > console_locked = 0;
>> > }
>> >
>> > or something along those lines?
>> >
>> > Also, perhaps this operation should be moved into bust_spinlocks().
>> > What would have happened if your code had triggered an oops, rather
>> > than called panic()?
>> >
>> >
>> Hi Andrew,
>> Thanks for your reply!
>> For your question" What would have happened if your code had
>> triggered an oops, rather than called panic()?", actually we found the
>> issue when trigger an oops. When we call FBIOPAN_DISPLAY in
>> ./drivers/video/fbmem.c, it will first lock console, if we trigger an
>> oops before unlock console, the issue happen. It also exist when call
>> panic() directly in the same case. It is a common issue for panic
>> process.
>> I have two options for solution:
>> 1. I agree with your suggestion that add some modest thing in
>> bust_spinlocks(), bust_spinlocks() is supposed to clear spinlocks
>> which prevent oops information from reaching the user. But it didn't
>> clear console_sem. We can add codes that clear console_sem.
>> 1) add up(&console_sem) in bust_spinlocks(0).
>> It will be risky in case that no printk after bust_spinlocks(0) in
>> panic(), because no console_unlock() to print log out.
>> 2) call console_unlock()in bust_spinlocks(0).
>> For bust_spinlocks(0), console_unblank() is used to flush oops to
>> mtdoops console(commit: b61312d353da1871778711040464b10f5cd904df).
>> Logically, if panic without the issue, console_unlock is called after
>> couples of console_lock and console_unlock; if panic with the issue,
>> will it be risky call console_unlock() in console_unblank() after
>> console_lock()?
>> 2. Moreover, there is another option. We can also add protect codes
>> in vprintk(), vprintk() just cover the cases that two cores' log
>> interleave when panic and printk recurse itself. We can add all cases'
>> protection here. Actually the original vprintk() don___t have the issue,
>> but after the patch(commit: fe21773d655c2c64641ec2cef499289ea175c817)
>> which fix two cores' log interleave issue , the issue is not covered.
>> I add a flag after panic_smp_self_stop() in panic(), and check the
>> flag, if flag is set, vprintk will call zap_locks(), I have tested the
>> option, the issue also disappear.
>> What do you think?
>
> The #1 priority is to get the oops message reliably delivered.
>
> That means we should avoid console_unlock() on the oops path: it's far
> too complicated and risks deadlocks, re-oopses, recursion, etc.
>
> If there was text queued in the console layer and that text fails to be
> emitted, well, that's sad, but it's more important that the oops
> message be displayed.
>
> If the oops trace is occasionally interleaved with other text then
> that's sad too, but at least the info we need is readable. Oopses
> inside console_lock() are rare.
>
>
> So I'd suggest that the code in bust_spinlocks(1) should simply do
> whatever needs to be done to make the forthcoming oops trace be
> visible, and leave it at that - don't bother trying to flush out any
> old text.
>
>
> Also, we should be careful with things like up() on a semaphore which
> hasn't been down()ed. Because under some Kconfig combinations, such an
> operation might trigger debugging traces and we could get into a big
> mess. (An up() on non-down()ed semaphore is actually an OK operation,
> so this was a bad example. But you see the problem).
>
Hi Andrew,
It is indeed risky that call console_unlock() directly in
panic process, like deadlock that core0 stop core1 but core1 is
holding logbuf_lock.
As to old text, if we hold logbuf_lock when panic,we can keep
panic log continuous and entire, old text will be before panic
log.Frankly, I don't see any risk that old text impact or interleave
panic log.
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] panic: fix incomplete panic log in panic()
2012-10-15 22:02 ` Andrew Morton
2012-10-15 22:06 ` Andi Kleen
2012-10-16 14:25 ` Qing Z
@ 2012-10-17 10:44 ` Qing Z
2012-10-18 0:06 ` Andrew Morton
2 siblings, 1 reply; 10+ messages in thread
From: Qing Z @ 2012-10-17 10:44 UTC (permalink / raw)
To: Andrew Morton
Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
xjian, zhangwm, Qing Zhu
2012/10/16 Andrew Morton <akpm@linux-foundation.org>:
> On Mon, 15 Oct 2012 19:38:46 +0800
> Qing Z <njumical@gmail.com> wrote:
>
>> >> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >>
>> >> + /*
>> >> + * Unlock the console anyway here, in case it's occupied by another
>> >> + * one which has no chance to unlock the console thus prevents the
>> >> + * panic log prints on the console.
>> >> + */
>> >> + console_unlock();
>> >> +
>> >> bust_spinlocks(0);
>> >>
>> >> if (!panic_blink)
>> >
>> > hm. console_unlock() does a large amount of work, and it seems risky
>> > to do all of that when the system is in a bad-and-getting-worse state.
>> >
>> > Is there some more modest thing we could do here, for example,
>> >
>> > if (console_locked) {
>> > up(&console_sem);
>> > console_locked = 0;
>> > }
>> >
>> > or something along those lines?
>> >
>> > Also, perhaps this operation should be moved into bust_spinlocks().
>> > What would have happened if your code had triggered an oops, rather
>> > than called panic()?
>> >
>> >
>> Hi Andrew,
>> Thanks for your reply!
>> For your question" What would have happened if your code had
>> triggered an oops, rather than called panic()?", actually we found the
>> issue when trigger an oops. When we call FBIOPAN_DISPLAY in
>> ./drivers/video/fbmem.c, it will first lock console, if we trigger an
>> oops before unlock console, the issue happen. It also exist when call
>> panic() directly in the same case. It is a common issue for panic
>> process.
>> I have two options for solution:
>> 1. I agree with your suggestion that add some modest thing in
>> bust_spinlocks(), bust_spinlocks() is supposed to clear spinlocks
>> which prevent oops information from reaching the user. But it didn't
>> clear console_sem. We can add codes that clear console_sem.
>> 1) add up(&console_sem) in bust_spinlocks(0).
>> It will be risky in case that no printk after bust_spinlocks(0) in
>> panic(), because no console_unlock() to print log out.
>> 2) call console_unlock()in bust_spinlocks(0).
>> For bust_spinlocks(0), console_unblank() is used to flush oops to
>> mtdoops console(commit: b61312d353da1871778711040464b10f5cd904df).
>> Logically, if panic without the issue, console_unlock is called after
>> couples of console_lock and console_unlock; if panic with the issue,
>> will it be risky call console_unlock() in console_unblank() after
>> console_lock()?
>> 2. Moreover, there is another option. We can also add protect codes
>> in vprintk(), vprintk() just cover the cases that two cores' log
>> interleave when panic and printk recurse itself. We can add all cases'
>> protection here. Actually the original vprintk() don___t have the issue,
>> but after the patch(commit: fe21773d655c2c64641ec2cef499289ea175c817)
>> which fix two cores' log interleave issue , the issue is not covered.
>> I add a flag after panic_smp_self_stop() in panic(), and check the
>> flag, if flag is set, vprintk will call zap_locks(), I have tested the
>> option, the issue also disappear.
>> What do you think?
>
> The #1 priority is to get the oops message reliably delivered.
>
> That means we should avoid console_unlock() on the oops path: it's far
> too complicated and risks deadlocks, re-oopses, recursion, etc.
>
> If there was text queued in the console layer and that text fails to be
> emitted, well, that's sad, but it's more important that the oops
> message be displayed.
>
> If the oops trace is occasionally interleaved with other text then
> that's sad too, but at least the info we need is readable. Oopses
> inside console_lock() are rare.
>
>
> So I'd suggest that the code in bust_spinlocks(1) should simply do
> whatever needs to be done to make the forthcoming oops trace be
> visible, and leave it at that - don't bother trying to flush out any
> old text.
>
>
> Also, we should be careful with things like up() on a semaphore which
> hasn't been down()ed. Because under some Kconfig combinations, such an
> operation might trigger debugging traces and we could get into a big
> mess. (An up() on non-down()ed semaphore is actually an OK operation,
> so this was a bad example. But you see the problem).
>
Hi Andrew,
Thanks for your suggestions.
I am e newbie, may not state the issue clearly. Let me state it again:
In ./drivers/video/fbmem.c, codes below cause issues:
case FBIOPAN_DISPLAY:
...
console_lock();
ret = fb_pan_display(info, &var);
console_unlock();
...
break;
issue case 1:
1. core 0 call console_lock();
2. panic;
...
4. panic process done.
Result: all panic log won't be printed.
issue case 2:
1. core 0 panic;
2. core 1 call console_lock();
3. core 0 call smp_send_stop(), core1 stop;
4. core 0 panic process done.
Result: only little top part of panic log will be printed.
My soluiton according to your suggestions:
As you said, the first priority is to get oops message reliably
delivered. I think we needn't care about console_sem when panic, just
make sure we print the log imediately, so add
sema_init(&console_sem,1) in bust_spinlocks(0), just like zap_locks()
do. It is safer than console_unlock() or up().
We can't add sema_init(..) in bust_spinlocks(1) due to issue case2,
although the condition is rare. About issue case 2: should we avoid
call console_lock() when panic?
If we init console_sem in panic, old text may be flushed too, but
should be before panic oops message. Also we can fix it by updating
con_start("con_start = log_end") once panic happen, only log after
panic will be printed.
What do you think? Thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] panic: fix incomplete panic log in panic()
2012-10-17 10:44 ` Qing Z
@ 2012-10-18 0:06 ` Andrew Morton
2012-10-22 14:54 ` Qing Z
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2012-10-18 0:06 UTC (permalink / raw)
To: Qing Z
Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
xjian, zhangwm, Qing Zhu
On Wed, 17 Oct 2012 18:44:32 +0800
Qing Z <njumical@gmail.com> wrote:
> In ./drivers/video/fbmem.c, codes below cause issues:
>
> case FBIOPAN_DISPLAY:
> ...
> console_lock();
> ret = fb_pan_display(info, &var);
> console_unlock();
> ...
> break;
>
> issue case 1:
> 1. core 0 call console_lock();
> 2. panic;
> ...
> 4. panic process done.
> Result: all panic log won't be printed.
>
> issue case 2:
> 1. core 0 panic;
> 2. core 1 call console_lock();
> 3. core 0 call smp_send_stop(), core1 stop;
> 4. core 0 panic process done.
> Result: only little top part of panic log will be printed.
>
> My soluiton according to your suggestions:
>
> As you said, the first priority is to get oops message reliably
> delivered. I think we needn't care about console_sem when panic, just
> make sure we print the log imediately, so add
> sema_init(&console_sem,1) in bust_spinlocks(0), just like zap_locks()
> do. It is safer than console_unlock() or up().
hm, I see.
> We can't add sema_init(..) in bust_spinlocks(1) due to issue case2,
> although the condition is rare. About issue case 2: should we avoid
> call console_lock() when panic?
Well, I think we do have infrastructure to support that:
+ if (!oops_in_progress)
console_lock();
I haven't looked to see how practical that approach would be.
It would be better if we were to do
if (oops_in_progress)
console_trylock();
else
console_lock();
where console_trylock() would *try* to do a console_lock() but would
bail out if it was unable to immediately take the lock. This is better
because most of the time, the oopsing CPU *will* lock the console and
will prevent other code from getting into the console code and messing
things up.
A problem with this approach is that it is very hard to test - the
"console_trylock failed" case will be rare.
I think it would be acceptable to just skip over the console_lock() if
oops_in_progress is set. And if we skipped the console_lock(), we
should also skip the console_unlock(). So something like:
bool console_unlock_needed = true;
if (unlikely(oops_in_progress))
console_unlock_needed = false;
else
console_lock();
...
if (console_unlock_needed)
console_unlock();
> If we init console_sem in panic, old text may be flushed too, but
> should be before panic oops message. Also we can fix it by updating
> con_start("con_start = log_end") once panic happen, only log after
> panic will be printed.
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] panic: fix incomplete panic log in panic()
2012-10-18 0:06 ` Andrew Morton
@ 2012-10-22 14:54 ` Qing Z
2012-10-22 19:51 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Qing Z @ 2012-10-22 14:54 UTC (permalink / raw)
To: Andrew Morton
Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
xjian, zhangwm, Qing Zhu
2012/10/18 Andrew Morton <akpm@linux-foundation.org>:
> On Wed, 17 Oct 2012 18:44:32 +0800
> Qing Z <njumical@gmail.com> wrote:
>
>> In ./drivers/video/fbmem.c, codes below cause issues:
>>
>> case FBIOPAN_DISPLAY:
>> ...
>> console_lock();
>> ret = fb_pan_display(info, &var);
>> console_unlock();
>> ...
>> break;
>>
>> issue case 1:
>> 1. core 0 call console_lock();
>> 2. panic;
>> ...
>> 4. panic process done.
>> Result: all panic log won't be printed.
>>
>> issue case 2:
>> 1. core 0 panic;
>> 2. core 1 call console_lock();
>> 3. core 0 call smp_send_stop(), core1 stop;
>> 4. core 0 panic process done.
>> Result: only little top part of panic log will be printed.
>>
>> My soluiton according to your suggestions:
>>
>> As you said, the first priority is to get oops message reliably
>> delivered. I think we needn't care about console_sem when panic, just
>> make sure we print the log imediately, so add
>> sema_init(&console_sem,1) in bust_spinlocks(0), just like zap_locks()
>> do. It is safer than console_unlock() or up().
>
> hm, I see.
>
>> We can't add sema_init(..) in bust_spinlocks(1) due to issue case2,
>> although the condition is rare. About issue case 2: should we avoid
>> call console_lock() when panic?
>
> Well, I think we do have infrastructure to support that:
>
> + if (!oops_in_progress)
> console_lock();
>
> I haven't looked to see how practical that approach would be.
>
>
> It would be better if we were to do
>
> if (oops_in_progress)
> console_trylock();
> else
> console_lock();
>
> where console_trylock() would *try* to do a console_lock() but would
> bail out if it was unable to immediately take the lock. This is better
> because most of the time, the oopsing CPU *will* lock the console and
> will prevent other code from getting into the console code and messing
> things up.
>
> A problem with this approach is that it is very hard to test - the
> "console_trylock failed" case will be rare.
>
>
> I think it would be acceptable to just skip over the console_lock() if
> oops_in_progress is set. And if we skipped the console_lock(), we
> should also skip the console_unlock(). So something like:
>
> bool console_unlock_needed = true;
>
> if (unlikely(oops_in_progress))
> console_unlock_needed = false;
> else
> console_lock();
>
> ...
>
> if (console_unlock_needed)
> console_unlock();
>
>
>> If we init console_sem in panic, old text may be flushed too, but
>> should be before panic oops message. Also we can fix it by updating
>> con_start("con_start = log_end") once panic happen, only log after
>> panic will be printed.
Hi Andrew,
Basically, console_unlock() should be called to make panic
log printed. Call console_unlock() in panic have some risks when
recurse in it(are there other bad cases?). The condition is very rare
and the two issue cases I list always happen between console_lock()
and console_unlock(). So I think we need to couple with
console_lock(), but should avoid the case that panic happen in
console_unlock(). I think it is a more modest and safe way. Please
corect me if there is something wrong. Thanks!
bool Is_in_console_unlock;
void console_unlock(void)
{
...
+ Is_in_console_unlock = ture;
/* flush buffered message fragment immediately to console */
console_cont_flush(text, sizeof(text));
again:
for (;;) {
....
+ Is_in_console_unlock = false;
}
void panic(const char *fmt, ...)
{
....
+ /*
+ * we should unlock console here to make oops log printed, in case
+ * console is locked before panic in this cpu, or other cpus lock the
+ * console before be stopped.
+ */
+ if( unlikely(console_locked) && !Is_in_console_unlock )
+ {
+ console_unlock();
+ console_locked = 0;
+ }
/*
* Note smp_send_stop is the usual smp shutdown function, which
* unfortunately means it may not be hardened to work in a panic
* situation.
*/
smp_send_stop();
....
}
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] panic: fix incomplete panic log in panic()
2012-10-22 14:54 ` Qing Z
@ 2012-10-22 19:51 ` Andrew Morton
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2012-10-22 19:51 UTC (permalink / raw)
To: Qing Z
Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
xjian, zhangwm, Qing Zhu
On Mon, 22 Oct 2012 22:54:54 +0800
Qing Z <njumical@gmail.com> wrote:
> Hi Andrew,
> Basically, console_unlock() should be called to make panic
> log printed. Call console_unlock() in panic have some risks when
> recurse in it(are there other bad cases?). The condition is very rare
> and the two issue cases I list always happen between console_lock()
> and console_unlock(). So I think we need to couple with
> console_lock(), but should avoid the case that panic happen in
> console_unlock(). I think it is a more modest and safe way. Please
> corect me if there is something wrong. Thanks!
>
> bool Is_in_console_unlock;
> void console_unlock(void)
> {
> ...
> + Is_in_console_unlock = ture;
> /* flush buffered message fragment immediately to console */
> console_cont_flush(text, sizeof(text));
> again:
> for (;;) {
> ....
> + Is_in_console_unlock = false;
> }
>
> void panic(const char *fmt, ...)
> {
> ....
> + /*
> + * we should unlock console here to make oops log printed, in case
> + * console is locked before panic in this cpu, or other cpus lock the
> + * console before be stopped.
> + */
> + if( unlikely(console_locked) && !Is_in_console_unlock )
> + {
> + console_unlock();
> + console_locked = 0;
> + }
>
> /*
> * Note smp_send_stop is the usual smp shutdown function, which
> * unfortunately means it may not be hardened to work in a panic
> * situation.
> */
> smp_send_stop();
>
> ....
> }
Well, if something like that will solve the problem then yes, I guess
that is the way to go. It's not pretty, but it is clear and direct,
and this isn't a pretty problem!
But is this approach sufficient? What happens in the case of an oops
or a BUG() inside console_lock()?
^ permalink raw reply [flat|nested] 10+ messages in thread