2.6.19, more unwinder problems ...


* 2.6.19, more unwinder problems ...
@ 2006-12-15 10:15 Ingo Molnar
  2006-12-15 10:55 ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2006-12-15 10:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: linux-kernel, Andrew Morton, Linus Torvalds

Jan,

i got the dump below on x86_64 (2.6.19 with the -rt patch applied but no 
changes to the unwinder). How on earth did we get a pagefault in 
unwind()? It seems we do:

0xffffffff8026127e is in unwind (kernel/unwind.c:1109).
1104                                            return -EIO;
1105                                    switch(reg_info[i].width) {
1106    #define CASE(n)     case sizeof(u##n): \
1107                                            __get_user(FRAME_REG(i, u##n), (u##n *)addr); \
1108                                            break
1109                                    CASES;
1110    #undef CASE
1111                                    default:
1112                                            return -EIO;
1113                                    }
(gdb)

now that looks quite wrong to me - why the __get_user() to begin with - 
you should really validate that the pointer is where you expect it - at 
which point __get_user() is not needed.

now, this dump was not fatal so the pagefault did eventually manage to 
extend the userspace stack - but it could have been fatal.

i thought we agreed that the unwinder would not be doing get_user() at 
all?

(gcc version 4.0.2, binutils 2.16.1)

	Ingo

------------>
BUG: sleeping function called from invalid context mount(489) at mm/rmap.c:78
in_atomic():0 [00000000], irqs_disabled():1
1 lock held by mount/489:
 #0:  ((struct rw_semaphore *)(&mm->mmap_sem)){----}, at: [<ffffffff804cade2>] do_page_fault+0x3de/0x848
irq event stamp: 861
hardirqs last  enabled at (861): [<ffffffff80285828>] __inc_zone_page_state+0x4d/0x58
hardirqs last disabled at (860): [<ffffffff80285805>] __inc_zone_page_state+0x2a/0x58
softirqs last  enabled at (0): [<ffffffff80235621>] copy_process+0x580/0x17c8
softirqs last disabled at (0): [<0000000000000000>] 0x0

Call Trace:
 [<ffffffff8020b5f3>] dump_trace+0xaa/0x406
 [<ffffffff8020b989>] show_trace+0x3a/0x60
 [<ffffffff8020bbcb>] dump_stack+0x15/0x17
 [<ffffffff8022d826>] __might_sleep+0x128/0x12d
 [<ffffffff8028ff95>] anon_vma_prepare+0x27/0xe1
 [<ffffffff8028c20d>] expand_stack+0x1e/0x140
 [<ffffffff804cae5c>] do_page_fault+0x458/0x848
 [<ffffffff804c8f4d>] error_exit+0x0/0x96
 [<ffffffff8026127e>] unwind+0xaca/0xb0d
 [<ffffffff8020b52e>] dump_trace_unwind+0x59/0x74
 [<ffffffff802602f2>] unwind_init_running+0x20/0x22
 [<ffffffff8020b5f3>] dump_trace+0xaa/0x406
 [<ffffffff80211da7>] save_stack_trace+0x1f/0x38
 [<ffffffff802538cb>] save_trace+0x46/0xa5
 [<ffffffff80253a1d>] add_lock_to_list+0x78/0xa7
 [<ffffffff80254e67>] __lock_acquire+0x906/0xa1f
 [<ffffffff802554d8>] lock_acquire+0x4c/0x66
 [<ffffffff804c78a9>] rt_spin_lock+0x3d/0x41
 [<ffffffff80240d6d>] lock_timer_base+0x23/0x4a
 [<ffffffff80240f27>] __mod_timer+0x40/0xd5
 [<ffffffff80241002>] mod_timer+0x46/0x4b
 [<ffffffff80440e37>] ledtrig_ide_activity+0x37/0x3b
 [<ffffffff803fe329>] ide_do_rw_disk+0x6a/0x4a4
 [<ffffffff803f535e>] ide_do_request+0x7e5/0x9ca
 [<ffffffff803f5881>] do_ide_request+0x1b/0x1d
 [<ffffffff8033a3b5>] __generic_unplug_device+0x27/0x2b
 [<ffffffff8033a532>] blk_start_queueing+0x1e/0x20
 [<ffffffff80344fc6>] cfq_insert_request+0x24e/0x3d9
 [<ffffffff80339042>] elv_insert+0x14c/0x1ff
 [<ffffffff8033918d>] __elv_add_request+0x98/0xa0
 [<ffffffff8033e1c0>] __make_request+0x41b/0x459
 [<ffffffff8033a74f>] generic_make_request+0x21b/0x236
 [<ffffffff8033c62c>] submit_bio+0x110/0x119
 [<ffffffff802cc47e>] mpage_bio_submit+0x22/0x26
 [<ffffffff802cca56>] do_mpage_readpage+0x466/0x4fe
 [<ffffffff802cd60e>] mpage_readpages+0xcf/0x165
 [<ffffffff802efb4b>] ext3_readpages+0x1a/0x1c
 [<ffffffff8028119d>] __do_page_cache_readahead+0x10b/0x1f0
 [<ffffffff802813bf>] do_page_cache_readahead+0x52/0x5f
 [<ffffffff8027c33e>] filemap_nopage+0x193/0x3dd
 [<ffffffff80289c55>] __handle_mm_fault+0x22e/0xcf8
 [<ffffffff804cae9e>] do_page_fault+0x49a/0x848
 [<ffffffff804c8f4d>] error_exit+0x0/0x96
 [<ffffffff8034c4ac>] __clear_user+0x3a/0x5c
 [<ffffffff8034c592>] clear_user+0x2b/0x33
 [<ffffffff802d7940>] load_elf_binary+0x7a1/0x1a76
 [<ffffffff802a9e74>] search_binary_handler+0x113/0x35d
 [<ffffffff802aa266>] do_execve+0x1a8/0x266
 [<ffffffff8020821b>] sys_execve+0x36/0x89
 [<ffffffff8020a407>] stub_execve+0x67/0xb0
 [<ffffffff8029eb5c>] kmem_cache_zalloc+0x52/0x110



* Re: 2.6.19, more unwinder problems ...
  2006-12-15 10:15 2.6.19, more unwinder problems Ingo Molnar
@ 2006-12-15 10:55 ` Jan Beulich
  2006-12-15 11:24   ` Ingo Molnar
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2006-12-15 10:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linus Torvalds, Andi Kleen, linux-kernel

When I submitted a patch to replace __get_user() with probe_kernel_address(),
Andi told me he had already done this conversion. I had assumed that he had
included it in his final merge submission. As he hasn't, I have to conclude
that he has submitted it, or will submit it, for 2.6.20.

Also, there is extra checking being done right before the memory access:

				if ((state.regs[i].value * state.dataAlign)
				    % sizeof(unsigned long)
				    || addr < startLoc
				    || addr + sizeof(unsigned long) < addr
				    || addr + sizeof(unsigned long) > endLoc)
					return -EIO;

validating that the item read is between current and previous stack pointer,
which in turn are being derived from register state and unwind information.
A tighter register state check is also pending for 2.6.20 (or already in
2.6.19-gitX), but as I refuse to stop the unwinder from following stack
switches (I view this as one of its core features), I can't currently see how
to prevent the possibility of the cfa running wild in

	cfa = FRAME_REG(state.cfa.reg, unsigned long) + state.cfa.offs;
	startLoc = min((unsigned long)UNW_SP(frame), cfa);
	endLoc = max((unsigned long)UNW_SP(frame), cfa);
	if (STACK_LIMIT(startLoc) != STACK_LIMIT(endLoc)) {
		startLoc = min(STACK_LIMIT(cfa), cfa);
		endLoc = max(STACK_LIMIT(cfa), cfa);
	}

other than by adding a range check to disallow this pointing into user space.
But in my opinion such a range check wouldn't help at all, as it doesn't catch
all wrong cases; it should really be probe_kernel_address() that prevents any
blocking page fault processing.
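
A user-space sketch of the two snippets above may make the concern concrete.
It is not the kernel code: the sketch_ names, the 8 KiB stack size, and the
definition used for the STACK_LIMIT() stand-in are assumptions made for
illustration, and the alignment test on the register offset is left out. It
shows that once the cfa itself points into user space, the window is
recomputed around the cfa and the slot check passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks; the real THREAD_SIZE depends on arch and config. */
#define SKETCH_THREAD_SIZE 8192ULL
/* Assumption: 64-bit machine words, as on x86_64. */
#define SKETCH_WORD 8ULL

/* Hypothetical stand-in for STACK_LIMIT(): base of the stack-sized area holding ptr. */
static uint64_t sketch_stack_limit(uint64_t ptr)
{
	return (ptr - 1) & ~(SKETCH_THREAD_SIZE - 1);
}

struct sketch_window {
	uint64_t start;
	uint64_t end;
};

/* Mirrors the startLoc/endLoc computation quoted above. */
static struct sketch_window sketch_frame_window(uint64_t sp, uint64_t cfa)
{
	struct sketch_window w;

	w.start = sp < cfa ? sp : cfa;
	w.end = sp < cfa ? cfa : sp;
	if (sketch_stack_limit(w.start) != sketch_stack_limit(w.end)) {
		/* sp and cfa are on different stacks: trust the cfa alone. */
		uint64_t limit = sketch_stack_limit(cfa);

		w.start = limit < cfa ? limit : cfa;
		w.end = limit < cfa ? cfa : limit;
	}
	assert(w.start <= w.end);
	return w;
}

/* Mirrors the range part of the check done right before the memory access. */
static bool sketch_slot_ok(struct sketch_window w, uint64_t addr)
{
	if (addr < w.start)
		return false;
	if (addr + SKETCH_WORD < addr)		/* address wraps around */
		return false;
	if (addr + SKETCH_WORD > w.end)
		return false;
	return true;
}
```

With a kernel stack pointer of 0xffff810012345e00 and a bogus cfa of
0x00007fff0000a010, sketch_frame_window() returns a window that lies entirely
in user space, and sketch_slot_ok() accepts the slot at 0x00007fff0000a008.
Nothing in the range check itself can tell that this address is not on a
kernel stack.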

Jan


* Re: 2.6.19, more unwinder problems ...
  2006-12-15 10:55 ` Jan Beulich
@ 2006-12-15 11:24   ` Ingo Molnar
  2006-12-15 12:18     ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2006-12-15 11:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Morton, Linus Torvalds, Andi Kleen, linux-kernel

* Jan Beulich <jbeulich@novell.com> wrote:

> validating that the item read is between current and previous stack 
> pointer, which in turn are being derived from register state and 
> unwind information.

i still don't quite get it - and i feel deja vu. Didn't we agree that the
right way to go about this is to validate all stack information based on
what the kernel already knows about all the stacks that the task may
use? I.e. only allow pointers into the kernel stack and into the various
kernel stacks. No 'probe kernel pointer' or anything. If the unwind data
or register state ever points outside that basic filter, abandon the
walk. Am i missing something?

	Ingo
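
The filter described above could look roughly like the following user-space
sketch. The struct, the function names, and the idea of passing the known
stacks in as an array are assumptions for illustration; the kernel has no
interface under these names. A slot is accepted only if it lies entirely
inside one of the listed stacks, and the walk is abandoned at the first slot
that does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one stack the task may legitimately run on. */
struct sketch_stack {
	uint64_t base;	/* lowest address of the stack */
	uint64_t size;	/* size in bytes */
};

/* True if [addr, addr + len) lies entirely inside one of the known stacks. */
static bool sketch_on_known_stack(const struct sketch_stack *stacks, size_t n,
				  uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		const struct sketch_stack *s = &stacks[i];

		/* A stack description must not wrap around the address space. */
		assert(s->size <= UINT64_MAX - s->base);
		if (addr >= s->base && len <= s->size &&
		    addr - s->base <= s->size - len)
			return true;
	}
	return false;
}

/*
 * Walk a list of candidate frame slots and stop at the first one that is
 * outside every known stack. Returns the number of slots accepted.
 */
static size_t sketch_walk(const struct sketch_stack *stacks, size_t n,
			  const uint64_t *slots, size_t nslots)
{
	size_t accepted = 0;

	for (size_t i = 0; i < nslots; i++) {
		if (!sketch_on_known_stack(stacks, n, slots[i], sizeof(uint64_t)))
			break;	/* abandon the walk */
		accepted++;
	}
	return accepted;
}
```

The cost of this design is that the array has to list every stack the
unwinder may cross, and it has to be kept up to date when a new stack is
introduced.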


* Re: 2.6.19, more unwinder problems ...
  2006-12-15 11:24   ` Ingo Molnar
@ 2006-12-15 12:18     ` Jan Beulich
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2006-12-15 12:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Linus Torvalds, Andi Kleen, linux-kernel

>>> Ingo Molnar <mingo@elte.hu> 15.12.06 12:24 >>>
>
>* Jan Beulich <jbeulich@novell.com> wrote:
>
>> validating that the item read is between current and previous stack 
>> pointer, which in turn are being derived from register state and 
>> unwind information.
>
>i still dont quite get it - and i feel deja vu. Didnt we agree that the 
>right way to go about this is to validate all stack information based on 
>what the kernel already knows about all the stacks that the task may 
>use? I.e. only allow pointers into the kernel stack and into the various 
>kernel stacks. No 'probe kernel pointer' or anything. If the unwind data 
>or register state ever points outside that basic filter, abandon the 
>walk. Am i missing something?

No, we didn't agree on this. Linus asked me to, but I didn't agree (Andi
seemed to lean towards Linus' opinion), because my intention is rather to
remove (mid to long term) all this a priori knowledge of what stacks may
be switched from/to. Also, I think a filter like the one you suggest can far
too easily get out of sync with reality. As an example, on i386 I had long
ago submitted a patch to enhance the double fault handling, which wasn't
rejected with a reason but also wasn't picked up. In that light I would
immediately question whether the double fault stack would be treated
positively by that filter (in anticipation of the double fault handler
hopefully calling the unwinder sooner rather than later to get out
information, if possible, on what led to the double fault), and whether,
once that stack finally gets converted to a per-CPU one, anyone would
remember to update the filter accordingly.

Basically, as I stated before, I wouldn't try to veto a change that does
that (especially under the impression that my opinion here doesn't seem
to count), but I would put neither my Signed-off-by nor my Acked-by
under it.

Jan

