* printk badness with VMAP_STACK
@ 2016-10-26 22:55 Laura Abbott
2016-10-26 23:34 ` Linus Torvalds
2016-10-27 9:19 ` Petr Mladek
0 siblings, 2 replies; 5+ messages in thread
From: Laura Abbott @ 2016-10-26 22:55 UTC (permalink / raw)
To: Andy Lutomirski, Linus Torvalds
Cc: Linux Kernel Mailing List, Kees Cook, Andrew Morton
Hi,
I was playing around with overflowing stacks and I managed to generate a test
case that hung the kernel with vmapped stacks. The test case is just
static void noinline foo1(void)
{
pr_info("%p\n", (void *)current_stack_pointer());
foo2();
}
where foo$n is the same function with the name changed. I'm super
creative. I have a couple thousand of these for testing with the final
one doing a WARN. The kernel eventually hangs in printk on logbuf_lock
(gdb) bt
#0 __read_once_size (size=<optimized out>, res=<optimized out>, p=<optimized out>)
at ./include/linux/compiler.h:243
#1 queued_spin_lock_slowpath (lock=0xffffffff82078e6c <logbuf_lock>, val=1)
at kernel/locking/qspinlock.c:478
#2 0xffffffff8191611b in queued_spin_lock (lock=<optimized out>)
at ./include/asm-generic/qspinlock.h:103
#3 do_raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock.h:148
#4 __raw_spin_lock (lock=<optimized out>)
at ./include/linux/spinlock_api_smp.h:145
#5 _raw_spin_lock (lock=<optimized out>) at kernel/locking/spinlock.c:151
#6 0xffffffff810a4244 in vprintk_emit (facility=-2113434004, level=1,
dict=<optimized out>, dictlen=<optimized out>,
fmt=0x101 <irq_stack_union+257> <error: Cannot access memory at address 0x101>, args=0xffff880011804eb0) at kernel/printk/printk.c:1835
#7 0xffffffff810a476a in vprintk_default (fmt=<optimized out>,
args=<optimized out>) at kernel/printk/printk.c:1953
#8 0xffffffff81128152 in vprintk_func (args=<optimized out>, fmt=<optimized out>)
at kernel/printk/internal.h:36
#9 printk (fmt=<optimized out>) at kernel/printk/printk.c:1986
#10 0xffffffff8101d590 in handle_stack_overflow (
message=0xffffffff81ba3560 "kernel stack overflow (double-fault)",
regs=0xffff880011804f58, fault_address=<optimized out>)
at arch/x86/kernel/traps.c:300
#11 0xffffffff8101d67f in do_double_fault (regs=0xffff880011804f58, error_code=0)
at arch/x86/kernel/traps.c:393
#12 0xffffffff81917c32 in double_fault () at arch/x86/entry/entry_64.S:854
#13 0xffffc90000178038 in ?? ()
#14 0x0000000000ffff0a in ?? ()
#15 0x0000000000000000 in ?? ()
handle_stack_overflow does
printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
(void *)fault_address, current->stack,
(char *)current->stack + THREAD_SIZE - 1);
die(message, regs, 0);
so there is a printk before the die and bust_spinlocks there. Just doing a
bust_spinlock before the printk doesn't help though and if the printk is removed
the kernel still hangs in the printk in __die
gdb shows logbuf_cpu as unlocked
(gdb) print /x logbuf_cpu
$1 = 0xffffffff
and walking back up the stack it looks like this finally ran out of stack space
in console_unlock from the end of vprintk_emit. console_unlock takes logbuf_lock
but doesn't update logbuf_cpu to possibly check for recursion in a panic case,
probably because nobody every considered it would be possible to die there
before.
So I think this is a printk bug that VMAP_STACK + my flavor of test case is
exposing. I think the fix should be to update logbuf_cpu everywhere logbuf_lock
is taken but I wasn't confident enough to try it because the console is testy.
Thanks,
Laura
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: printk badness with VMAP_STACK
2016-10-26 22:55 printk badness with VMAP_STACK Laura Abbott
@ 2016-10-26 23:34 ` Linus Torvalds
2016-10-27 0:04 ` Laura Abbott
2016-10-27 9:19 ` Petr Mladek
1 sibling, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2016-10-26 23:34 UTC (permalink / raw)
To: Laura Abbott
Cc: Andy Lutomirski, Linux Kernel Mailing List, Kees Cook,
Andrew Morton
On Wed, Oct 26, 2016 at 3:55 PM, Laura Abbott <labbott@redhat.com> wrote:
>
> I was playing around with overflowing stacks and I managed to generate a
> test
> case that hung the kernel with vmapped stacks. The test case is just
>
> static void noinline foo1(void)
> {
> pr_info("%p\n", (void *)current_stack_pointer());
> foo2();
> }
>
> where foo$n is the same function with the name changed. I'm super
> creative. I have a couple thousand of these for testing with the final
> one doing a WARN. The kernel eventually hangs in printk on logbuf_lock
So just to get this right - your test-case is intentionally doing that
mutually recursive thing with foo1/foo2 calling each other until they
run out of stack?
And yes, occasionally the stack will run out while in the middle of
"printk()", and then when we take a fault, we'll be screwed.
Note that we do *not* guarantee that "printk()" works in all contexts,
so it might not really be considered a bug. It's very much a "best
effort", but the scheduler and timekeeping, for example, uses
"printk_deferred()" exactly because one of the contexts where printk()
does *not* work is when you hold the rq lock.
And the reason for *that* is that printk() ends up relying on a few
different locks:
- logbuf_lock, obviously.
- console_sem for actual output
- cond_resched() requires rq->lock
And we do have some hacks on place - the recursive printk test
(logbuf_cpu, as you note) and oops_in_progress and that "zap_locks()".
But zap_locks only zaps logbuf_lock and console_sem, for example.
If you run out of stack somewhere in the middle of the scheduler when
the "cond_resched()" case of printk triggers, and we hold "rq->lock"
when the double fault occurs, the machine *will* be dead. It will
still try to print things out (thanks to that zap_locks thing), but
rq->lock will be wrong, and nothing will ever recover.
And it _sounds_ like that's the case you hit.
Basically, zap_locks and the other printk "try to at least print
things out" can only handle so much.
Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: printk badness with VMAP_STACK
2016-10-26 23:34 ` Linus Torvalds
@ 2016-10-27 0:04 ` Laura Abbott
0 siblings, 0 replies; 5+ messages in thread
From: Laura Abbott @ 2016-10-27 0:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Linux Kernel Mailing List, Kees Cook,
Andrew Morton
On 10/26/2016 04:34 PM, Linus Torvalds wrote:
> On Wed, Oct 26, 2016 at 3:55 PM, Laura Abbott <labbott@redhat.com> wrote:
>>
>> I was playing around with overflowing stacks and I managed to generate a
>> test
>> case that hung the kernel with vmapped stacks. The test case is just
>>
>> static void noinline foo1(void)
>> {
>> pr_info("%p\n", (void *)current_stack_pointer());
>> foo2();
>> }
>>
>> where foo$n is the same function with the name changed. I'm super
>> creative. I have a couple thousand of these for testing with the final
>> one doing a WARN. The kernel eventually hangs in printk on logbuf_lock
>
> So just to get this right - your test-case is intentionally doing that
> mutually recursive thing with foo1/foo2 calling each other until they
> run out of stack?
No, it's 1000 functions of foo1, foo2 up to foo1000. The idea is
to have a very deep stack that would eventually terminate if stack
space was infinite.
>
> And yes, occasionally the stack will run out while in the middle of
> "printk()", and then when we take a fault, we'll be screwed.
>
> Note that we do *not* guarantee that "printk()" works in all contexts,
> so it might not really be considered a bug. It's very much a "best
> effort", but the scheduler and timekeeping, for example, uses
> "printk_deferred()" exactly because one of the contexts where printk()
> does *not* work is when you hold the rq lock.
>
> And the reason for *that* is that printk() ends up relying on a few
> different locks:
>
> - logbuf_lock, obviously.
> - console_sem for actual output
> - cond_resched() requires rq->lock
>
> And we do have some hacks on place - the recursive printk test
> (logbuf_cpu, as you note) and oops_in_progress and that "zap_locks()".
>
> But zap_locks only zaps logbuf_lock and console_sem, for example.
>
> If you run out of stack somewhere in the middle of the scheduler when
> the "cond_resched()" case of printk triggers, and we hold "rq->lock"
> when the double fault occurs, the machine *will* be dead. It will
> still try to print things out (thanks to that zap_locks thing), but
> rq->lock will be wrong, and nothing will ever recover.
>
> And it _sounds_ like that's the case you hit.
>
A similar one at least.
> Basically, zap_locks and the other printk "try to at least print
> things out" can only handle so much.
>
> Linus
>
I wonder if the addition of hardening features will lead to an actual
increase in problems for printk since things will be faulting more
often instead of chugging along. Or maybe it will just be limited
to developers trying to purposely break things.
Anyway, thanks for the explanation about the limitations of printk
for future reference.
Laura
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: printk badness with VMAP_STACK
2016-10-26 22:55 printk badness with VMAP_STACK Laura Abbott
2016-10-26 23:34 ` Linus Torvalds
@ 2016-10-27 9:19 ` Petr Mladek
2016-10-27 11:58 ` Sergey Senozhatsky
1 sibling, 1 reply; 5+ messages in thread
From: Petr Mladek @ 2016-10-27 9:19 UTC (permalink / raw)
To: Laura Abbott
Cc: Andy Lutomirski, Linus Torvalds, Linux Kernel Mailing List,
Kees Cook, Andrew Morton, Sergey Senozhatsky
On Wed 2016-10-26 15:55:00, Laura Abbott wrote:
> Hi,
>
> I was playing around with overflowing stacks and I managed to generate a test
> case that hung the kernel with vmapped stacks. The test case is just
>
> static void noinline foo1(void)
> {
> pr_info("%p\n", (void *)current_stack_pointer());
> foo2();
> }
>
> where foo$n is the same function with the name changed. I'm super
> creative. I have a couple thousand of these for testing with the final
> one doing a WARN. The kernel eventually hangs in printk on logbuf_lock
>
> (gdb) bt
> #0 __read_once_size (size=<optimized out>, res=<optimized out>, p=<optimized out>)
> at ./include/linux/compiler.h:243
> #1 queued_spin_lock_slowpath (lock=0xffffffff82078e6c <logbuf_lock>, val=1)
> at kernel/locking/qspinlock.c:478
> #2 0xffffffff8191611b in queued_spin_lock (lock=<optimized out>)
> at ./include/asm-generic/qspinlock.h:103
> #3 do_raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock.h:148
> #4 __raw_spin_lock (lock=<optimized out>)
> at ./include/linux/spinlock_api_smp.h:145
> #5 _raw_spin_lock (lock=<optimized out>) at kernel/locking/spinlock.c:151
> #6 0xffffffff810a4244 in vprintk_emit (facility=-2113434004, level=1,
> dict=<optimized out>, dictlen=<optimized out>,
> fmt=0x101 <irq_stack_union+257> <error: Cannot access memory at address 0x101>, args=0xffff880011804eb0) at kernel/printk/printk.c:1835
> #7 0xffffffff810a476a in vprintk_default (fmt=<optimized out>,
> args=<optimized out>) at kernel/printk/printk.c:1953
> #8 0xffffffff81128152 in vprintk_func (args=<optimized out>, fmt=<optimized out>)
> at kernel/printk/internal.h:36
> #9 printk (fmt=<optimized out>) at kernel/printk/printk.c:1986
> #10 0xffffffff8101d590 in handle_stack_overflow (
> message=0xffffffff81ba3560 "kernel stack overflow (double-fault)",
> regs=0xffff880011804f58, fault_address=<optimized out>)
> at arch/x86/kernel/traps.c:300
> #11 0xffffffff8101d67f in do_double_fault (regs=0xffff880011804f58, error_code=0)
> at arch/x86/kernel/traps.c:393
> #12 0xffffffff81917c32 in double_fault () at arch/x86/entry/entry_64.S:854
> #13 0xffffc90000178038 in ?? ()
> #14 0x0000000000ffff0a in ?? ()
> #15 0x0000000000000000 in ?? ()
>
> handle_stack_overflow does
>
> printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
> (void *)fault_address, current->stack,
> (char *)current->stack + THREAD_SIZE - 1);
> die(message, regs, 0);
>
> so there is a printk before the die and bust_spinlocks there. Just doing a
> bust_spinlock before the printk doesn't help though and if the printk is removed
> the kernel still hangs in the printk in __die
>
> gdb shows logbuf_cpu as unlocked
>
> (gdb) print /x logbuf_cpu
> $1 = 0xffffffff
>
> and walking back up the stack it looks like this finally ran out of stack space
> in console_unlock from the end of vprintk_emit. console_unlock takes logbuf_lock
> but doesn't update logbuf_cpu to possibly check for recursion in a panic case,
> probably because nobody every considered it would be possible to die there
> before.
Yeah, logbuf_lock is taken on many locations but logbuf_cpu is set
only in vprintk_emit(). It means that the other locations, including
console_unlock() are not protected against this type of recursion.
There is actually a whole bunch of possible printk-related deadlocks.
There are several approaches how to handle some of them, for example:
+ printk_save(), see
https://lkml.kernel.org/r/20161018154045.7364-1-sergey.senozhatsky@gmail.com
+ async printk, see
https://lkml.kernel.org/r/1459789048-1337-1-git-send-email-sergey.senozhatsky@gmail.com
+ early console, see
https://lkml.kernel.org/r/20161018170830.405990950@infradead.org
The more we try to fix them, the more problems we see. Sergey probably
has the best overview about it at the moment.
We are going to discuss a possible progress on Plumbers next week.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: printk badness with VMAP_STACK
2016-10-27 9:19 ` Petr Mladek
@ 2016-10-27 11:58 ` Sergey Senozhatsky
0 siblings, 0 replies; 5+ messages in thread
From: Sergey Senozhatsky @ 2016-10-27 11:58 UTC (permalink / raw)
To: Petr Mladek
Cc: Laura Abbott, Andy Lutomirski, Linus Torvalds,
Linux Kernel Mailing List, Kees Cook, Andrew Morton,
Sergey Senozhatsky
Hello,
thanks for Cc-ing.
On (10/27/16 11:19), Petr Mladek wrote:
[..]
> Yeah, logbuf_lock is taken on many locations but logbuf_cpu is set
> only in vprintk_emit(). It means that the other locations, including
> console_unlock() are not protected against this type of recursion.
>
> There is actually a whole bunch of possible printk-related deadlocks.
> There are several approaches how to handle some of them, for example:
>
> + printk_save(), see
> https://lkml.kernel.org/r/20161018154045.7364-1-sergey.senozhatsky@gmail.com
>
> + async printk, see
> https://lkml.kernel.org/r/1459789048-1337-1-git-send-email-sergey.senozhatsky@gmail.com
>
> + early console, see
> https://lkml.kernel.org/r/20161018170830.405990950@infradead.org
>
>
> The more we try to fix them, the more problems we see. Sergey probably
> has the best overview about it at the moment.
yep. I think I'll re-fresh my printk_safe patch set tonight. it will
conflict with Linus' pr_cont() rework, so I'll keep my patch set as a
RFC.
> We are going to discuss a possible progress on Plumbers next week.
yep. I'll prepare some PDF slides :)
I'm afraid due to the lack of experience/time/"anything else that is
crucially important" I have no idea at the moment if my KS tech-topic
proposal [1][2] has been approved or rejected (is KS schedule available
somewhere online already? I can't find it) and I absolutely forgot to
submit it as a LPC micro-conference as a back-up plan (hm, can people
actually do this?). so in the worst case (printk tech-topic is out of KS
schedule) we will have to find an empty room on our own (I'm available
*any* time). my apologies for any incontinence in advance.
[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-July/002740.html
[2] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-September/004006.html
-ss
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-10-27 14:08 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-26 22:55 printk badness with VMAP_STACK Laura Abbott
2016-10-26 23:34 ` Linus Torvalds
2016-10-27 0:04 ` Laura Abbott
2016-10-27 9:19 ` Petr Mladek
2016-10-27 11:58 ` Sergey Senozhatsky
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).