* 2.6.19, more unwinder problems ...
@ 2006-12-15 10:15 Ingo Molnar
2006-12-15 10:55 ` Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2006-12-15 10:15 UTC (permalink / raw)
To: Jan Beulich; +Cc: linux-kernel, Andrew Morton, Linus Torvalds
Jan,
i got the dump below on x86_64 (2.6.19 with the -rt patch applied but no
changes to the unwinder). How on earth did we get a pagefault in
unwind()? It seems we do:
0xffffffff8026127e is in unwind (kernel/unwind.c:1109).
1104 return -EIO;
1105 switch(reg_info[i].width) {
1106 #define CASE(n) case sizeof(u##n): \
1107 __get_user(FRAME_REG(i, u##n), (u##n *)addr); \
1108 break
1109 CASES;
1110 #undef CASE
1111 default:
1112 return -EIO;
1113 }
(gdb)
now that looks quite wrong to me - why the __get_user() to begin with -
you should really validate that the pointer is where you expect it - at
which point __get_user() is not needed.
now, this dump was not fatal so the pagefault did eventually manage to
extend the userspace stack - but it could have been fatal.
i thought we agreed that the unwinder would not be doing get_user() at
all?
(gcc version 4.0.2, binutils 2.16.1)
Ingo
------------>
BUG: sleeping function called from invalid context mount(489) at mm/rmap.c:78
in_atomic():0 [00000000], irqs_disabled():1
1 lock held by mount/489:
#0: ((struct rw_semaphore *)(&mm->mmap_sem)){----}, at: [<ffffffff804cade2>] do_page_fault+0x3de/0x848
irq event stamp: 861
hardirqs last enabled at (861): [<ffffffff80285828>] __inc_zone_page_state+0x4d/0x58
hardirqs last disabled at (860): [<ffffffff80285805>] __inc_zone_page_state+0x2a/0x58
softirqs last enabled at (0): [<ffffffff80235621>] copy_process+0x580/0x17c8
softirqs last disabled at (0): [<0000000000000000>] 0x0
Call Trace:
[<ffffffff8020b5f3>] dump_trace+0xaa/0x406
[<ffffffff8020b989>] show_trace+0x3a/0x60
[<ffffffff8020bbcb>] dump_stack+0x15/0x17
[<ffffffff8022d826>] __might_sleep+0x128/0x12d
[<ffffffff8028ff95>] anon_vma_prepare+0x27/0xe1
[<ffffffff8028c20d>] expand_stack+0x1e/0x140
[<ffffffff804cae5c>] do_page_fault+0x458/0x848
[<ffffffff804c8f4d>] error_exit+0x0/0x96
[<ffffffff8026127e>] unwind+0xaca/0xb0d
[<ffffffff8020b52e>] dump_trace_unwind+0x59/0x74
[<ffffffff802602f2>] unwind_init_running+0x20/0x22
[<ffffffff8020b5f3>] dump_trace+0xaa/0x406
[<ffffffff80211da7>] save_stack_trace+0x1f/0x38
[<ffffffff802538cb>] save_trace+0x46/0xa5
[<ffffffff80253a1d>] add_lock_to_list+0x78/0xa7
[<ffffffff80254e67>] __lock_acquire+0x906/0xa1f
[<ffffffff802554d8>] lock_acquire+0x4c/0x66
[<ffffffff804c78a9>] rt_spin_lock+0x3d/0x41
[<ffffffff80240d6d>] lock_timer_base+0x23/0x4a
[<ffffffff80240f27>] __mod_timer+0x40/0xd5
[<ffffffff80241002>] mod_timer+0x46/0x4b
[<ffffffff80440e37>] ledtrig_ide_activity+0x37/0x3b
[<ffffffff803fe329>] ide_do_rw_disk+0x6a/0x4a4
[<ffffffff803f535e>] ide_do_request+0x7e5/0x9ca
[<ffffffff803f5881>] do_ide_request+0x1b/0x1d
[<ffffffff8033a3b5>] __generic_unplug_device+0x27/0x2b
[<ffffffff8033a532>] blk_start_queueing+0x1e/0x20
[<ffffffff80344fc6>] cfq_insert_request+0x24e/0x3d9
[<ffffffff80339042>] elv_insert+0x14c/0x1ff
[<ffffffff8033918d>] __elv_add_request+0x98/0xa0
[<ffffffff8033e1c0>] __make_request+0x41b/0x459
[<ffffffff8033a74f>] generic_make_request+0x21b/0x236
[<ffffffff8033c62c>] submit_bio+0x110/0x119
[<ffffffff802cc47e>] mpage_bio_submit+0x22/0x26
[<ffffffff802cca56>] do_mpage_readpage+0x466/0x4fe
[<ffffffff802cd60e>] mpage_readpages+0xcf/0x165
[<ffffffff802efb4b>] ext3_readpages+0x1a/0x1c
[<ffffffff8028119d>] __do_page_cache_readahead+0x10b/0x1f0
[<ffffffff802813bf>] do_page_cache_readahead+0x52/0x5f
[<ffffffff8027c33e>] filemap_nopage+0x193/0x3dd
[<ffffffff80289c55>] __handle_mm_fault+0x22e/0xcf8
[<ffffffff804cae9e>] do_page_fault+0x49a/0x848
[<ffffffff804c8f4d>] error_exit+0x0/0x96
[<ffffffff8034c4ac>] __clear_user+0x3a/0x5c
[<ffffffff8034c592>] clear_user+0x2b/0x33
[<ffffffff802d7940>] load_elf_binary+0x7a1/0x1a76
[<ffffffff802a9e74>] search_binary_handler+0x113/0x35d
[<ffffffff802aa266>] do_execve+0x1a8/0x266
[<ffffffff8020821b>] sys_execve+0x36/0x89
[<ffffffff8020a407>] stub_execve+0x67/0xb0
[<ffffffff8029eb5c>] kmem_cache_zalloc+0x52/0x110
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.19, more unwinder problems ...
2006-12-15 10:15 2.6.19, more unwinder problems Ingo Molnar
@ 2006-12-15 10:55 ` Jan Beulich
2006-12-15 11:24 ` Ingo Molnar
0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2006-12-15 10:55 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, Linus Torvalds, Andi Kleen, linux-kernel
When I submitted a patch to replace __get_user() with probe_kernel_address(),
Andi told me he had done this conversion already. I had assumed that he had
included this in his final merge submission. As he hasn't I now have to conclude
that he has or will submit it for 2.6.20.
Also, there is extra checking being done right before the memory access:
if ((state.regs[i].value * state.dataAlign)
% sizeof(unsigned long)
|| addr < startLoc
|| addr + sizeof(unsigned long) < addr
|| addr + sizeof(unsigned long) > endLoc)
return -EIO;
validating that the item read is between current and previous stack pointer,
which in turn are being derived from register state and unwind information.
A tighter register state check is also pending for 2.6.20 (or already in
2.6.19-gitX), but as I refuse to disallow the unwinder to follow stack
switches (as I view this one of its core features) I can't currently see how to
prevent the possibility of the cfa running wild in
cfa = FRAME_REG(state.cfa.reg, unsigned long) + state.cfa.offs;
startLoc = min((unsigned long)UNW_SP(frame), cfa);
endLoc = max((unsigned long)UNW_SP(frame), cfa);
if (STACK_LIMIT(startLoc) != STACK_LIMIT(endLoc)) {
startLoc = min(STACK_LIMIT(cfa), cfa);
endLoc = max(STACK_LIMIT(cfa), cfa);
}
other than by adding a range check to disallow this pointing into user space.
But in my opinion such a range check wouldn't help at all, as it doesn't catch
all wrong cases, it should really be the probe_kernel_address() that should
prevent any blocking page fault processing.
Jan
>>> Ingo Molnar <mingo@elte.hu> 15.12.06 11:15 >>>
Jan,
i got the dump below on x86_64 (2.6.19 with the -rt patch applied but no
changes to the unwinder). How on earth did we get a pagefault in
unwind()? It seems we do:
0xffffffff8026127e is in unwind (kernel/unwind.c:1109).
1104 return -EIO;
1105 switch(reg_info[i].width) {
1106 #define CASE(n) case sizeof(u##n): \
1107 __get_user(FRAME_REG(i, u##n), (u##n *)addr); \
1108 break
1109 CASES;
1110 #undef CASE
1111 default:
1112 return -EIO;
1113 }
(gdb)
now that looks quite wrong to me - why the __get_user() to begin with -
you should really validate that the pointer is where you expect it - at
which point __get_user() is not needed.
now, this dump was not fatal so the pagefault did eventually manage to
extend the userspace stack - but it could have been fatal.
i thought we agreed that the unwinder would not be doing get_user() at
all?
(gcc version 4.0.2, binutils 2.16.1)
Ingo
------------>
BUG: sleeping function called from invalid context mount(489) at mm/rmap.c:78
in_atomic():0 [00000000], irqs_disabled():1
1 lock held by mount/489:
#0: ((struct rw_semaphore *)(&mm->mmap_sem)){----}, at: [<ffffffff804cade2>] do_page_fault+0x3de/0x848
irq event stamp: 861
hardirqs last enabled at (861): [<ffffffff80285828>] __inc_zone_page_state+0x4d/0x58
hardirqs last disabled at (860): [<ffffffff80285805>] __inc_zone_page_state+0x2a/0x58
softirqs last enabled at (0): [<ffffffff80235621>] copy_process+0x580/0x17c8
softirqs last disabled at (0): [<0000000000000000>] 0x0
Call Trace:
[<ffffffff8020b5f3>] dump_trace+0xaa/0x406
[<ffffffff8020b989>] show_trace+0x3a/0x60
[<ffffffff8020bbcb>] dump_stack+0x15/0x17
[<ffffffff8022d826>] __might_sleep+0x128/0x12d
[<ffffffff8028ff95>] anon_vma_prepare+0x27/0xe1
[<ffffffff8028c20d>] expand_stack+0x1e/0x140
[<ffffffff804cae5c>] do_page_fault+0x458/0x848
[<ffffffff804c8f4d>] error_exit+0x0/0x96
[<ffffffff8026127e>] unwind+0xaca/0xb0d
[<ffffffff8020b52e>] dump_trace_unwind+0x59/0x74
[<ffffffff802602f2>] unwind_init_running+0x20/0x22
[<ffffffff8020b5f3>] dump_trace+0xaa/0x406
[<ffffffff80211da7>] save_stack_trace+0x1f/0x38
[<ffffffff802538cb>] save_trace+0x46/0xa5
[<ffffffff80253a1d>] add_lock_to_list+0x78/0xa7
[<ffffffff80254e67>] __lock_acquire+0x906/0xa1f
[<ffffffff802554d8>] lock_acquire+0x4c/0x66
[<ffffffff804c78a9>] rt_spin_lock+0x3d/0x41
[<ffffffff80240d6d>] lock_timer_base+0x23/0x4a
[<ffffffff80240f27>] __mod_timer+0x40/0xd5
[<ffffffff80241002>] mod_timer+0x46/0x4b
[<ffffffff80440e37>] ledtrig_ide_activity+0x37/0x3b
[<ffffffff803fe329>] ide_do_rw_disk+0x6a/0x4a4
[<ffffffff803f535e>] ide_do_request+0x7e5/0x9ca
[<ffffffff803f5881>] do_ide_request+0x1b/0x1d
[<ffffffff8033a3b5>] __generic_unplug_device+0x27/0x2b
[<ffffffff8033a532>] blk_start_queueing+0x1e/0x20
[<ffffffff80344fc6>] cfq_insert_request+0x24e/0x3d9
[<ffffffff80339042>] elv_insert+0x14c/0x1ff
[<ffffffff8033918d>] __elv_add_request+0x98/0xa0
[<ffffffff8033e1c0>] __make_request+0x41b/0x459
[<ffffffff8033a74f>] generic_make_request+0x21b/0x236
[<ffffffff8033c62c>] submit_bio+0x110/0x119
[<ffffffff802cc47e>] mpage_bio_submit+0x22/0x26
[<ffffffff802cca56>] do_mpage_readpage+0x466/0x4fe
[<ffffffff802cd60e>] mpage_readpages+0xcf/0x165
[<ffffffff802efb4b>] ext3_readpages+0x1a/0x1c
[<ffffffff8028119d>] __do_page_cache_readahead+0x10b/0x1f0
[<ffffffff802813bf>] do_page_cache_readahead+0x52/0x5f
[<ffffffff8027c33e>] filemap_nopage+0x193/0x3dd
[<ffffffff80289c55>] __handle_mm_fault+0x22e/0xcf8
[<ffffffff804cae9e>] do_page_fault+0x49a/0x848
[<ffffffff804c8f4d>] error_exit+0x0/0x96
[<ffffffff8034c4ac>] __clear_user+0x3a/0x5c
[<ffffffff8034c592>] clear_user+0x2b/0x33
[<ffffffff802d7940>] load_elf_binary+0x7a1/0x1a76
[<ffffffff802a9e74>] search_binary_handler+0x113/0x35d
[<ffffffff802aa266>] do_execve+0x1a8/0x266
[<ffffffff8020821b>] sys_execve+0x36/0x89
[<ffffffff8020a407>] stub_execve+0x67/0xb0
[<ffffffff8029eb5c>] kmem_cache_zalloc+0x52/0x110
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.19, more unwinder problems ...
2006-12-15 10:55 ` Jan Beulich
@ 2006-12-15 11:24 ` Ingo Molnar
2006-12-15 12:18 ` Jan Beulich
0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2006-12-15 11:24 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Morton, Linus Torvalds, Andi Kleen, linux-kernel
* Jan Beulich <jbeulich@novell.com> wrote:
> validating that the item read is between current and previous stack
> pointer, which in turn are being derived from register state and
> unwind information.
i still dont quite get it - and i feel deja vu. Didnt we agree that the
right way to go about this is to validate all stack information based on
what the kernel already knows about all the stacks that the task may
use? I.e. only allow pointers into the kernel stack and into the various
kernel stacks. No 'probe kernel pointer' or anything. If the unwind data
or register state ever points outside that basic filter, abandon the
walk. Am i missing something?
Ingo
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.19, more unwinder problems ...
2006-12-15 11:24 ` Ingo Molnar
@ 2006-12-15 12:18 ` Jan Beulich
0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2006-12-15 12:18 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, Linus Torvalds, Andi Kleen, linux-kernel
>>> Ingo Molnar <mingo@elte.hu> 15.12.06 12:24 >>>
>
>* Jan Beulich <jbeulich@novell.com> wrote:
>
>> validating that the item read is between current and previous stack
>> pointer, which in turn are being derived from register state and
>> unwind information.
>
>i still dont quite get it - and i feel deja vu. Didnt we agree that the
>right way to go about this is to validate all stack information based on
>what the kernel already knows about all the stacks that the task may
>use? I.e. only allow pointers into the kernel stack and into the various
>kernel stacks. No 'probe kernel pointer' or anything. If the unwind data
>or register state ever points outside that basic filter, abandon the
>walk. Am i missing something?
No, we didn't agree on this. Linus asked me to, but I didn't agree (Andi
seemed to tend towards Linus' opinion), based on my intentions of rather
(mid to long term) removing all this a priori knowledge of what stacks may
be switched from/to. Also, I think a filter like you suggest can far too
easily get out of sync with reality (as an example, on i386 I had long ago
submitted a patch to enhance the double fault handling, which hadn't
been rejected with a reason but also wasn't picked up - in that light I
would immediately question whether the double fault stack, in anticipation
of the double fault handler hopefully calling the unwinder rather sooner
than later to get out information, if possible, on what lead to the double
fault, should be treated positively by that filter, and whether, once that
stack finally gets converted to a per-CPU ones, it would be remembered
to update the filter accordingly).
Basically, as I stated before, I wouldn't try to veto a change that does
that (especially under the impression that my opinion here doesn't seem
to count), but I would neither put me as Signed-off-by or Acked-by
under it.
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2006-12-15 12:17 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-15 10:15 2.6.19, more unwinder problems Ingo Molnar
2006-12-15 10:55 ` Jan Beulich
2006-12-15 11:24 ` Ingo Molnar
2006-12-15 12:18 ` Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.