* [PATCH -tip 1/2] [BUGFIX]kprobes/x86: Fix page-fault handling logic
2014-02-20 3:39 [PATCH -tip 0/2] kprobes/x86: Fix bugs for NMI handling Masami Hiramatsu
@ 2014-02-20 3:39 ` Masami Hiramatsu
2014-02-20 3:39 ` [PATCH -tip 2/2] kprobes/x86: Allow to handle reentered kprobe on singlestepping Masami Hiramatsu
2014-02-24 2:39 ` [PATCH -tip 0/2] kprobes/x86: Fix bugs for NMI handling Masami Hiramatsu
2 siblings, 0 replies; 4+ messages in thread
From: Masami Hiramatsu @ 2014-02-20 3:39 UTC (permalink / raw)
To: Ingo Molnar, linux-kernel; +Cc: Thomas Gleixner, x86, fche, H. Peter Anvin
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).
In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.
But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.
----
[ 2264.726905] BUG: unable to handle kernel NULL pointer
dereference at 0000000000000020
[ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
[ 2264.727380] PGD cbcd067 PUD cbcc067 PMD 0
[ 2264.727529] Oops: 0000 [#1] SMP
[ 2264.727683] Modules linked in: ipt_MASQUERADE bnep bluetooth 6lowpan_iphc iptable_nat nf_nat_ipv4 nf_nat aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper virtio_balloon snd_hda_intel snd_hda_codec snd_hwdep
[ 2264.728391] CPU: 1 PID: 25094 Comm: perf Not tainted 3.14.0-rc1.badprobe+ #24
[ 2264.728592] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2264.728747] task: ffff88003db9c210 ti: ffff88000caac000 task.ti: ffff88000caac000
[ 2264.728950] RIP: 0010:[<ffffffff813c46e0>] [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
[ 2264.729163] RSP: 0018:ffff88003fd06bd0 EFLAGS: 00010246
[ 2264.729291] RAX: 0000000000000000 RBX: ffff88003fd06bf8 RCX: 0000000000000002
[ 2264.729472] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff88003fd06bf8
[ 2264.729661] RBP: ffff88003fd06bd8 R08: 0000000000000030 R09: 0000000000000000
[ 2264.729789] R10: 000000000000001e R11: 0000000000000015 R12: ffff88000caadfd8
[ 2264.729789] R13: ffff88003d76bc00 R14: ffff88003db9c210 R15: 0000000000000020
[ 2264.729789] FS: 00007f398bbcc780(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 2264.729789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2264.729789] CR2: 0000000000000020 CR3: 00000000204f2000 CR4: 00000000000007e0
[ 2264.729789] Stack:
[ 2264.729789] ffffffff813c5fd4 ffff88003fd06c30 ffffffff810183b0 ffff88003d76bc00
[ 2264.729789] ffff88003fd06ef8 0000000000000000 0000000000000000 ffff88003d76bc00
[ 2264.729789] 000000000000000c 0000000000052ce0 ffff88000956f800 ffff88000caadf58
[ 2264.729789] Call Trace:
[ 2264.729789] <NMI>
[ 2264.729789] [<ffffffff813c5fd4>] ? copy_from_user_nmi+0x64/0x70
[ 2264.729789] [<ffffffff810183b0>] perf_callchain_user+0xc0/0x220
[ 2264.729789] [<ffffffff8111cdb4>] perf_callchain+0x1c4/0x210
[ 2264.729789] [<ffffffff811193e3>] perf_prepare_sample+0x253/0x320
[ 2264.729789] [<ffffffff81119597>] __perf_event_overflow+0xe7/0x230
[ 2264.729789] [<ffffffff810177c8>] ? x86_perf_event_set_period+0xe8/0x150
[ 2264.729789] [<ffffffff8111a034>] perf_event_overflow+0x14/0x20
[ 2264.729789] [<ffffffff8101cf2d>] intel_pmu_handle_irq+0x1cd/0x400
[ 2264.729789] [<ffffffff81854c71>] ? ftrace_regs_caller+0x81/0xcd
[ 2264.729789] [<ffffffff813c46e0>] ? copy_user_generic_unrolled+0xc0/0xc0
[ 2264.729789] [<ffffffff810161bb>] perf_event_nmi_handler+0x2b/0x50
[ 2264.729789] [<ffffffff81006ce8>] nmi_handle+0x88/0x180
[ 2264.729789] [<ffffffff813c46e0>] ? copy_user_generic_unrolled+0xc0/0xc0
[ 2264.729789] [<ffffffff81006fca>] default_do_nmi+0x4a/0x140
[ 2264.729789] [<ffffffff81007168>] do_nmi+0xa8/0xe0
[ 2264.729789] [<ffffffff81856fe1>] end_repeat_nmi+0x1e/0x2e
[ 2264.729789] [<ffffffff813c46e0>] ? copy_user_generic_unrolled+0xc0/0xc0
[ 2264.729789] [<ffffffff810376ac>] ? skip_prefixes+0x1c/0x40
[ 2264.729789] [<ffffffff813c4c80>] ? bad_get_user+0x17/0x17
[ 2264.729789] [<ffffffff81854c71>] ? ftrace_regs_caller+0x81/0xcd
[ 2264.729789] [<ffffffff81854c71>] ? ftrace_regs_caller+0x81/0xcd
[ 2264.729789] [<ffffffff81854c71>] ? ftrace_regs_caller+0x81/0xcd
[ 2264.729789] <<EOE>>
[ 2264.729789] <#DB> [<ffffffff813c46e0>] ? copy_user_generic_unrolled+0xc0/0xc0
[ 2264.729789] [<ffffffff813c46e1>] ? copy_user_generic_string+0x1/0x40
[ 2264.729789] [<ffffffff810e4321>] ? ftrace_cmp_recs+0x1/0x30
[ 2264.729789] [<ffffffff813c4c85>] ? inat_get_opcode_attribute+0x5/0x20
[ 2264.729789] [<ffffffff813c4c85>] ? inat_get_opcode_attribute+0x5/0x20
[ 2264.729789] [<ffffffff810376ac>] ? skip_prefixes+0x1c/0x40
[ 2264.729789] [<ffffffff81038107>] resume_execution+0x37/0x1d0
[ 2264.729789] [<ffffffff810382df>] kprobe_debug_handler+0x3f/0xe0
[ 2264.729789] [<ffffffff8100305f>] do_debug+0x7f/0x1d0
[ 2264.729789] [<ffffffff81856afa>] debug+0x3a/0x50
[ 2264.729789] <<EOE>>
[ 2264.729789] [<ffffffff811aae38>] ? seq_read+0x88/0x390
[ 2264.729789] [<ffffffff81363bc4>] ? security_file_permission+0x84/0xa0
[ 2264.729789] [<ffffffff811ed58d>] proc_reg_read+0x3d/0x80
[ 2264.729789] [<ffffffff81187c7b>] vfs_read+0x9b/0x160
[ 2264.729789] [<ffffffff81188739>] SyS_read+0x49/0xa0
[ 2264.729789] [<ffffffff81854f99>] system_call_fastpath+0x16/0x1b
[ 2264.729789] Code: c9 75 ee 21 d2 74 10 89 d1 8a 06 88 07 48 ff c6 48 ff c7 ff
c9 75 f2 31 c0 0f 1f 00 c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <cc> 1f 00
83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29
[ 2264.729789] RIP [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
[ 2264.729789] RSP <ffff88003fd06bd0>
[ 2264.729789] CR2: 0000000000000020
[ 2264.729789] ---[ end trace 533fc16b4cc45447 ]---
[ 2264.729789] Kernel panic - not syncing: Fatal exception in interrupt
[ 2264.729789] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xf
fffffff80000000-0xffffffff9fffffff)
----
To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/kernel/kprobes/core.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 79a3f96..b482e96 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -897,9 +897,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
struct kprobe *cur = kprobe_running();
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- switch (kcb->kprobe_status) {
- case KPROBE_HIT_SS:
- case KPROBE_REENTER:
+ if (unlikely(regs->ip == (unsigned long)cur->ainsn.insn)) {
/*
* We are here because the instruction being single
* stepped caused a page fault. We reset the current
@@ -914,9 +912,8 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
else
reset_current_kprobe();
preempt_enable_no_resched();
- break;
- case KPROBE_HIT_ACTIVE:
- case KPROBE_HIT_SSDONE:
+ } else if (kcb->kprobe_status == KPROBE_HIT_ACTIVE ||
+ kcb->kprobe_status == KPROBE_HIT_SSDONE) {
/*
* We increment the nmissed count for accounting,
* we can also use npre/npostfault count for accounting
@@ -945,10 +942,8 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
* fixup routine could not handle it,
* Let do_page_fault() fix it.
*/
- break;
- default:
- break;
}
+
return 0;
}
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH -tip 2/2] kprobes/x86: Allow to handle reentered kprobe on singlestepping
2014-02-20 3:39 [PATCH -tip 0/2] kprobes/x86: Fix bugs for NMI handling Masami Hiramatsu
2014-02-20 3:39 ` [PATCH -tip 1/2] [BUGFIX]kprobes/x86: Fix page-fault handling logic Masami Hiramatsu
@ 2014-02-20 3:39 ` Masami Hiramatsu
2014-02-24 2:39 ` [PATCH -tip 0/2] kprobes/x86: Fix bugs for NMI handling Masami Hiramatsu
2 siblings, 0 replies; 4+ messages in thread
From: Masami Hiramatsu @ 2014-02-20 3:39 UTC (permalink / raw)
To: Ingo Molnar, linux-kernel; +Cc: Thomas Gleixner, x86, fche, H. Peter Anvin
Since the NMI handlers(e.g. perf) can interrupt in the
single stepping (or preparing the single stepping, do_debug
etc.), we should consider a kprobe is hit in the NMI
handler. Even in that case, the kprobe is allowed to be
reentered as same as the kprobes hit in kprobe handlers
(KPROBE_HIT_ACTIVE or KPROBE_HIT_SSDONE).
The real issue will happen when a kprobe hit while another
reentered kprobe is processing (KPROBE_REENTER), because
we already consumed a saved-area for the previous kprobe.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/kernel/kprobes/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index b482e96..a9a42fa 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -531,10 +531,11 @@ reenter_kprobe(struct kprobe *p, struct pt_regs *regs, struct kprobe_ctlblk *kcb
switch (kcb->kprobe_status) {
case KPROBE_HIT_SSDONE:
case KPROBE_HIT_ACTIVE:
+ case KPROBE_HIT_SS:
kprobes_inc_nmissed_count(p);
setup_singlestep(p, regs, kcb, 1);
break;
- case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
/* A probe has been hit in the codepath leading up to, or just
* after, single-stepping of a probed instruction. This entire
* codepath should strictly reside in .kprobes.text section.
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH -tip 0/2] kprobes/x86: Fix bugs for NMI handling
2014-02-20 3:39 [PATCH -tip 0/2] kprobes/x86: Fix bugs for NMI handling Masami Hiramatsu
2014-02-20 3:39 ` [PATCH -tip 1/2] [BUGFIX]kprobes/x86: Fix page-fault handling logic Masami Hiramatsu
2014-02-20 3:39 ` [PATCH -tip 2/2] kprobes/x86: Allow to handle reentered kprobe on singlestepping Masami Hiramatsu
@ 2014-02-24 2:39 ` Masami Hiramatsu
2 siblings, 0 replies; 4+ messages in thread
From: Masami Hiramatsu @ 2014-02-24 2:39 UTC (permalink / raw)
To: Ingo Molnar, linux-kernel, mingo
Cc: Thomas Gleixner, x86, fche, H. Peter Anvin
Hi Ingo,
Should I resend this series to your kernel.org address?
Thank you,
(2014/02/20 12:39), Masami Hiramatsu wrote:
> The following series fixes bugs hidden in the ancient code.
>
> The bugs suddenly appeared when I enabled over 6,000 kprobes
> and ran perf-top with --call-graph. The bugs are hidden in
> the old code and it have woken up by real stress testing.
>
> Actually, current kprobes doesn't expect an NMI handler
> hits in single-stepping state (including preparation and
> do_debug() handling). Moreover, the NMI handler causing
> a page fault by trying to access user pages, is out of
> imagination! :) But perf does it.
>
> Thus the previous code optimistically check the current
> running kprobe state, and if it is in the singlestep state,
> it changes the IP address to probed address and return,
> because it expects the page fault happened on the single
> stepped code.
> However, in fact, the perf's NMI can interrupt the
> do_debug or somewhere around that and it may cause a
> page fault. In this case, putting the IP address to
> probed address is simply wrong. It causes unexpected
> kernel crash.
> To handle this correctly, this patch fixes it to ensure
> the page-fault address is actually same to the single-
> stepping address, and only if so, set the IP address
> to the probed address.
>
> I also found another small mistake which gives up the
> recovery from reentered kprobes in single-stepping state,
> but it also assumes that there is no NMI handler interrupts
> in that state. It should gives up only when the nested
> reentering happens.
>
> Thanks to Ingo and Frank for encouraging me to start
> stress testing with massive multiple kprobes. :)
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (2):
> [BUGFIX]kprobes/x86: Fix page-fault handling logic
> kprobes/x86: Allow to handle reentered kprobe on singlestepping
>
>
> arch/x86/kernel/kprobes/core.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> --
> Signature
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
^ permalink raw reply [flat|nested] 4+ messages in thread