From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760105Ab2FAPBf (ORCPT ); Fri, 1 Jun 2012 11:01:35 -0400
Received: from hrndva-omtalb.mail.rr.com ([71.74.56.122]:24202 "EHLO
	hrndva-omtalb.mail.rr.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1759119Ab2FAPBA (ORCPT ); Fri, 1 Jun 2012 11:01:00 -0400
X-Authority-Analysis: v=2.0 cv=T6AOvo2Q c=1 sm=0 a=ZycB6UtQUfgMyuk2+PxD7w==:17
	a=XQbtiDEiEegA:10 a=Ciwy3NGCPMMA:10 a=-pWtUAXVJaIA:10 a=5SG0PmZfjMsA:10
	a=bbbx4UPp9XUA:10 a=meVymXHHAAAA:8 a=20KFwNOVAAAA:8 a=Ds3GogaVUjTumUKFKR0A:9
	a=QEXdDO2ut3YA:10 a=jEp0ucaQiEUA:10 a=jeBq3FmKZ4MA:10 a=CfrRryvAoaqJq3jVKUoA:9
	a=ZycB6UtQUfgMyuk2+PxD7w==:117
X-Cloudmark-Score: 0
X-Originating-IP: 74.67.80.29
Message-Id: <20120601150058.588173703@goodmis.org>
User-Agent: quilt/0.60-1
Date: Fri, 01 Jun 2012 10:57:07 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Frederic Weisbecker,
	Masami Hiramatsu, "H. Peter Anvin", Dave Jones, Andi Kleen
Subject: [PATCH 5/5 v2] ftrace/x86: Do not change stacks in DEBUG when calling lockdep
References: <20120601145702.428441016@goodmis.org>
Content-Disposition: inline; filename=0005-ftrace-x86-Do-not-change-stacks-in-DEBUG-when-callin.patch
Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="00GvhwF7k39YY"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--00GvhwF7k39YY
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

From: Steven Rostedt

When both DYNAMIC_FTRACE and LOCKDEP are set, the TRACE_IRQS_ON/OFF will
call into the lockdep code. The lockdep code can call lots of functions
that may be traced by ftrace. When ftrace is updating its code and hits
a breakpoint, the breakpoint handler will call into lockdep.
If lockdep happens to call a function that also has a breakpoint
attached, it will jump back into the breakpoint handler resetting the
stack to the debug stack and corrupt the contents currently on that
stack.

The 'do_sym' call that calls do_int3() is protected by modifying the
IST table to point to a different location if another breakpoint is
hit. But the TRACE_IRQS_OFF/ON are outside that protection, and if
a breakpoint is hit from those, the stack will get corrupted, and
the kernel will crash:

[ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
[ 1013.272665] IP: [] 0xffff880145cbffff
[ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0
[ 1013.298832] Oops: 0010 [#1] PREEMPT SMP
[ 1013.310600] CPU 2
[ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
[ 1013.401848]
[ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30
[ 1013.437943] RIP: 8eb8:[] [] 0xffff880146309fff
[ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046
[ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000
[ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20
[ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000
[ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000
[ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40
[ 1013.586108] FS: 0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000
[ 1013.609458] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0
[ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0)
[ 1013.717028] Stack:
[ 1013.724131] ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000
[ 1013.745918] cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627
[ 1013.767870] ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb
[ 1013.790021] Call Trace:
[ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff
[ 1013.861443] RIP [] 0xffff880146309fff
[ 1013.884466] RSP
[ 1013.901507] CR2: 0000000000000002

The solution was to reuse the NMI functions that change the IDT table to make
the debug stack keep its current stack (in kernel mode) when hitting a
breakpoint:

	call debug_stack_set_zero
	TRACE_IRQS_ON
	call debug_stack_reset

If the TRACE_IRQS_ON happens to hit a breakpoint then it will keep the current
stack and not crash the box.

Reported-by: Dave Jones
Signed-off-by: Steven Rostedt
---
 arch/x86/kernel/entry_64.S |   44 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 320852d..7d65133 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -191,6 +191,44 @@ ENDPROC(native_usergs_sysret64)
 .endm
 
 /*
+ * When dynamic function tracer is enabled it will add a breakpoint
+ * to all locations that it is about to modify, sync CPUs, update
+ * all the code, sync CPUs, then remove the breakpoints. In this time
+ * if lockdep is enabled, it might jump back into the debug handler
+ * outside the updating of the IST protection. (TRACE_IRQS_ON/OFF).
+ *
+ * We need to change the IDT table before calling TRACE_IRQS_ON/OFF to
+ * make sure the stack pointer does not get reset back to the top
+ * of the debug stack, and instead just reuses the current stack.
+ */
+#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_TRACE_IRQFLAGS)
+
+.macro TRACE_IRQS_OFF_DEBUG
+	call debug_stack_set_zero
+	TRACE_IRQS_OFF
+	call debug_stack_reset
+.endm
+
+.macro TRACE_IRQS_ON_DEBUG
+	call debug_stack_set_zero
+	TRACE_IRQS_ON
+	call debug_stack_reset
+.endm
+
+.macro TRACE_IRQS_IRETQ_DEBUG offset=ARGOFFSET
+	bt $9,EFLAGS-\offset(%rsp)	/* interrupts off? */
+	jnc 1f
+	TRACE_IRQS_ON_DEBUG
+1:
+.endm
+
+#else
+# define TRACE_IRQS_OFF_DEBUG		TRACE_IRQS_OFF
+# define TRACE_IRQS_ON_DEBUG		TRACE_IRQS_ON
+# define TRACE_IRQS_IRETQ_DEBUG		TRACE_IRQS_IRETQ
+#endif
+
+/*
  * C code is not supposed to know about undefined top of stack. Every time
  * a C function with an pt_regs argument is called from the SYSCALL based
  * fast path FIXUP_TOP_OF_STACK is needed.
@@ -1098,7 +1136,7 @@ ENTRY(\sym)
 	subq $ORIG_RAX-R15, %rsp
 	CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
 	call save_paranoid
-	TRACE_IRQS_OFF
+	TRACE_IRQS_OFF_DEBUG
 	movq %rsp,%rdi		/* pt_regs pointer */
 	xorl %esi,%esi		/* no error code */
 	subq $EXCEPTION_STKSZ, INIT_TSS_IST(\ist)
@@ -1393,7 +1431,7 @@ paranoidzeroentry machine_check *machine_check_vector(%rip)
 ENTRY(paranoid_exit)
 	DEFAULT_FRAME
 	DISABLE_INTERRUPTS(CLBR_NONE)
-	TRACE_IRQS_OFF
+	TRACE_IRQS_OFF_DEBUG
 	testl %ebx,%ebx			/* swapgs needed? */
 	jnz paranoid_restore
 	testl $3,CS(%rsp)
@@ -1404,7 +1442,7 @@ paranoid_swapgs:
 	RESTORE_ALL 8
 	jmp irq_return
 paranoid_restore:
-	TRACE_IRQS_IRETQ 0
+	TRACE_IRQS_IRETQ_DEBUG 0
 	RESTORE_ALL 8
 	jmp irq_return
 paranoid_userspace:
-- 
1.7.10

--00GvhwF7k39YY
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iQIcBAABAgAGBQJPyNkqAAoJEIy3vGnGbaoA2OIQAN4yDDdGlG6tA8z45NZY02q4
Cb6iQLqWV+nXzFn8WK7TzTfJvHbH0VczYAXTE2a9T8RYkopse4XI2TLDRMdcmPNO
z68IgaQsh7qJTGMiKYTcz1jm0TX4Dc1raXIRXr2br6mwU3Tw7dA+ol4KwZ2umFaw
Jsawq2e3dUS3a1rwgZZz72s+Q7l7WKu0PBgxn4fRzGt/gDI9XAOZ5c/5bdeXCzuT
WPoWPf4NVWh+YxmQmgFkz6kN7xoJU98WsK9gpIu041jvUTbhfhSrLgcBmMyItQPd
K33JtQf+cs6Gqfrx2M1VIv5DDz52/cylhSep1DeRUWrlugBDpMDiJhfunem4n6jK
UiXNJ3HcjBzjcMcJ0lPCE3+o2GOcWsmrl6jfmOXMuM92EeM+jabRaMD57gXbQVLn
Xuxh0eYRAeKlLGeiETceBM6iq/4KE5irFUgRRq4VgXbVmIoA+OnSNPlz4aoO2S3r
IaVJ440jOkYhk4dCkQLCniXrVOZr//sO+kMVBY/pC9UBoleuplpyf9b2Cd8mP/W7
X038aG+5+DxDvjWq13+hp6JnP5AhFfUgLIUlMCnz2DriUJBhljXioCNbr3PeMnVI
8gF9Wgc8nbGNaOFzmYQL8GwcoV/LEJ47fVrn7BdACVmRKQ9tPA4jCE2YKCq6jmqN
fVE3So8CIQa0vJAJnMwJ
=LWsh
-----END PGP SIGNATURE-----

--00GvhwF7k39YY--