* Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c
From: Nathan Chancellor @ 2022-09-28 15:44 UTC
To: kernel test robot
Cc: Sathvika Vasireddy, lkp, lkp, Peter Zijlstra, Christophe Leroy, linux-kbuild, linux-kernel, linuxppc-dev, jpoimboe, aik, mpe, mingo, rostedt, mbenes, npiggin, chenzhongjin, naveen.n.rao, llvm

Hi all,

On Wed, Sep 28, 2022 at 08:48:53AM +0800, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed the following commit (built with clang-14):
>
> commit: ca5e2b42c0d4438ba93623579b6860b98f3598f3 ("[PATCH v3 11/16] objtool: Add --mnop as an option to --mcount")
> url: https://github.com/intel-lab-lkp/linux/commits/Sathvika-Vasireddy/objtool-Enable-and-implement-mcount-option-on-powerpc/20220912-163023
> base: https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git topic/ppc-kvm
> patch link: https://lore.kernel.org/linuxppc-dev/20220912082020.226755-12-sv@linux.ibm.com
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> [ 152.068363][ T0] jump_label: Fatal kernel bug, unexpected op at trace_initcall_start+0xc/0x180 [ffffffff810016ec] (e9 c9 00 00 00 != 0f 1f 44 00 00)) size:5 type:1
> [ 152.070368][ T0] ------------[ cut here ]------------
> [ 152.071050][ T0] kernel BUG at arch/x86/kernel/jump_label.c:73!
> [ 152.071825][ T0] invalid opcode: 0000 [#1] SMP KASAN PTI
> [ 152.072427][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc2-00011-gca5e2b42c0d4 #1 96a19ca45386d518c4bccc5b3bc53f548a2dc122
> [ 152.073837][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [ 152.075461][ T0] RIP: 0010:__jump_label_patch+0x340/0x350
> [ 152.076162][ T0] Code: 00 48 89 da e9 51 fe ff ff 48 c7 c7 00 d1 80 83 4c 89 fe 4c 89 fa 4c 89 f9 49 89 d8 45 89 e9 41 54 e8 f2 91 34 02 48 83 c4 08 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 84 00 00 00 00 00 48 c7 c7 00 09 69
> [ 152.078374][ T0] RSP: 0000:ffffffff84607cb8 EFLAGS: 00010086
> [ 152.079159][ T0] RAX: 0000000000000092 RBX: ffffffff8380f62a RCX: ffffffff84634d80
> [ 152.080100][ T0] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 00000000fffffffe
> [ 152.081020][ T0] RBP: ffffffff855d9f60 R08: ffffffff8124f17c R09: fffffbfff08c0f53
> [ 152.081936][ T0] R10: dffff7fff08c0f54 R11: 1ffffffff08c0f52 R12: 0000000000000001
> [ 152.082832][ T0] R13: 0000000000000005 R14: ffffffff8380f62a R15: ffffffff810016ec
> [ 152.083744][ T0] FS: 0000000000000000(0000) GS:ffff8883aee00000(0000) knlGS:0000000000000000
> [ 152.084763][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 152.085567][ T0] CR2: ffff88843ffff000 CR3: 0000000004628000 CR4: 00000000000406b0
> [ 152.086472][ T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 152.087407][ T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 152.088326][ T0] Call Trace:
> [ 152.088702][ T0] <TASK>
> [ 152.089042][ T0] ? trace_initcall_start+0xc/0x180
> [ 152.089660][ T0] ? trace_initcall_start+0x1b/0x180
> [ 152.090281][ T0] ? trace_initcall_start+0x11/0x180
> [ 152.091237][ T0] ? jump_label_transform+0x25/0xd0
> [ 152.091923][ T0] ? arch_jump_label_transform_queue+0x87/0xd0
> [ 152.092651][ T0] ? __jump_label_update+0x192/0x3b0
> [ 152.093320][ T0] ? static_key_enable_cpuslocked+0x129/0x250
> [ 152.094020][ T0] ? rcu_lock_release+0x20/0x20
> [ 152.094573][ T0] ? static_key_enable+0x16/0x20
> [ 152.095167][ T0] ? tracepoint_add_func+0x87e/0x9d0
> [ 152.095822][ T0] ? rcu_lock_release+0x20/0x20
> [ 152.096394][ T0] ? tracepoint_probe_register+0x99/0xd0
> [ 152.097055][ T0] ? rcu_lock_release+0x20/0x20
> [ 152.097606][ T0] ? initcall_debug_enable+0x21/0x6b
> [ 152.098305][ T0] ? start_kernel+0x24b/0x4e6
> [ 152.098861][ T0] ? secondary_startup_64_no_verify+0xce/0xdb
> [ 152.099556][ T0] </TASK>
> [ 152.099891][ T0] Modules linked in:
> [ 152.100352][ T0] ---[ end trace 0000000000000000 ]---
> [ 152.100980][ T0] RIP: 0010:__jump_label_patch+0x340/0x350
> [ 152.101652][ T0] Code: 00 48 89 da e9 51 fe ff ff 48 c7 c7 00 d1 80 83 4c 89 fe 4c 89 fa 4c 89 f9 49 89 d8 45 89 e9 41 54 e8 f2 91 34 02 48 83 c4 08 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 84 00 00 00 00 00 48 c7 c7 00 09 69
> [ 152.103892][ T0] RSP: 0000:ffffffff84607cb8 EFLAGS: 00010086
> [ 152.104544][ T0] RAX: 0000000000000092 RBX: ffffffff8380f62a RCX: ffffffff84634d80
> [ 152.105421][ T0] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 00000000fffffffe
> [ 152.106280][ T0] RBP: ffffffff855d9f60 R08: ffffffff8124f17c R09: fffffbfff08c0f53
> [ 152.107182][ T0] R10: dffff7fff08c0f54 R11: 1ffffffff08c0f52 R12: 0000000000000001
> [ 152.108110][ T0] R13: 0000000000000005 R14: ffffffff8380f62a R15: ffffffff810016ec
> [ 152.109002][ T0] FS: 0000000000000000(0000) GS:ffff8883aee00000(0000) knlGS:0000000000000000
> [ 152.109986][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 152.110796][ T0] CR2: ffff88843ffff000 CR3: 0000000004628000 CR4: 00000000000406b0
> [ 152.111748][ T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 152.112686][ T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 152.113568][ T0] Kernel panic - not syncing: Fatal exception
>
>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <yujie.liu@intel.com>
> | Link: https://lore.kernel.org/r/202209280801.2d5eebb5-yujie.liu@intel.com

This crash appears to just be a symptom of objtool erroring throughout
the entire build, which means things like the jump label hacks do not
get applied. I see a flood of

  error: objtool: --mnop requires --mcount

throughout the build because the configuration has
CONFIG_HAVE_NOP_MCOUNT=y because CONFIG_HAVE_OBJTOOL_MCOUNT is
unconditionally enabled for x86_64 due to CONFIG_HAVE_OBJTOOL but
'--mcount' is only actually used when CONFIG_FTRACE_MCOUNT_USE_OBJTOOL
is enabled so '--mnop' gets passed in without '--mcount'. This should
obviously be fixed somehow, perhaps by moving the '--mnop' addition into
the '--mcount' if, even if that makes the line really long.

A secondary issue is that it seems like if objtool encounters a fatal
error like this, it should completely fail the build to make it obvious
that something is wrong, rather than allowing it to continue and
generate a broken kernel, especially since x86_64 requires objtool to
build a working kernel at this point.

Cheers,
Nathan
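
The flag dependency described above can be modelled in a few lines of C.
This is only a sketch under stated assumptions: the struct and function
names (build_config, objtool_flags, flags_independent, flags_nested,
check_flags) are hypothetical and are not taken from the kernel build
scripts or from objtool. Only the Kconfig symbol names in the comments
and the error text come from this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two Kconfig symbols named in the mail. */
struct build_config {
	bool ftrace_mcount_use_objtool;	/* CONFIG_FTRACE_MCOUNT_USE_OBJTOOL */
	bool have_nop_mcount;		/* CONFIG_HAVE_NOP_MCOUNT */
};

/* Flags that would be handed to objtool for one object file. */
struct objtool_flags {
	bool mcount;	/* --mcount */
	bool mnop;	/* --mnop */
};

/* Behaviour described in the report: each flag is added on its own. */
struct objtool_flags flags_independent(const struct build_config *c)
{
	struct objtool_flags f = {
		.mcount = c->ftrace_mcount_use_objtool,
		.mnop   = c->have_nop_mcount,
	};
	return f;
}

/* Suggested shape: --mnop is only considered once --mcount is in use. */
struct objtool_flags flags_nested(const struct build_config *c)
{
	struct objtool_flags f = {
		.mcount = c->ftrace_mcount_use_objtool,
		.mnop   = c->ftrace_mcount_use_objtool && c->have_nop_mcount,
	};
	return f;
}

/* Returns 0 when the combination is acceptable, -1 otherwise. */
int check_flags(const struct objtool_flags *f)
{
	assert(f != NULL);
	if (f->mnop && !f->mcount) {
		fprintf(stderr, "error: objtool: --mnop requires --mcount\n");
		return -1;
	}
	return 0;
}
```

With CONFIG_HAVE_NOP_MCOUNT set and CONFIG_FTRACE_MCOUNT_USE_OBJTOOL
unset, flags_independent() produces the rejected combination, while
flags_nested() passes neither flag and check_flags() accepts it.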

* Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c
From: Josh Poimboeuf @ 2022-09-28 19:13 UTC
To: Nathan Chancellor
Cc: kernel test robot, lkp, aik, linux-kbuild, Peter Zijlstra, chenzhongjin, llvm, npiggin, linux-kernel, lkp, mingo, Sathvika Vasireddy, rostedt, jpoimboe, naveen.n.rao, mbenes, linuxppc-dev

On Wed, Sep 28, 2022 at 08:44:27AM -0700, Nathan Chancellor wrote:
> This crash appears to just be a symptom of objtool erroring throughout
> the entire build, which means things like the jump label hacks do not
> get applied. I see a flood of
>
> error: objtool: --mnop requires --mcount
>
> throughout the build because the configuration has
> CONFIG_HAVE_NOP_MCOUNT=y because CONFIG_HAVE_OBJTOOL_MCOUNT is
> unconditionally enabled for x86_64 due to CONFIG_HAVE_OBJTOOL but
> '--mcount' is only actually used when CONFIG_FTRACE_MCOUNT_USE_OBJTOOL
> is enabled so '--mnop' gets passed in without '--mcount'. This should
> obviously be fixed somehow, perhaps by moving the '--mnop' addition into
> the '--mcount' if, even if that makes the line really long.
>
> A secondary issue is that it seems like if objtool encounters a fatal
> error like this, it should completely fail the build to make it obvious
> that something is wrong, rather than allowing it to continue and
> generate a broken kernel, especially since x86_64 requires objtool to
> build a working kernel at this point.

Grrr... I really dislike that objtool is capable of bricking the kernel
like this. We just saw something similar in RHEL.

IMO, we should just get rid of this "short JMP" feature in the jump
label code, those saved three bytes aren't worth the pain.

But yes, we do need to fix that config issue.

And yes, maybe fatal objtool warnings should cause a build failure. We
used to do that, but it brought a different sort of pain. But if
objtool is going to be in the kernel's critical boot path then I guess
we have to do that.

--
Josh
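
For readers following the byte-level detail, here is a small C sketch of
what the jump_label log line in the original report encodes, and of
where the "three bytes" come from. The byte values are copied from that
log line and the instruction lengths are x86 encoding facts; the helper
names (site_matches, jmp32_disp, jmp32_target_off) are hypothetical and
are not the kernel's jump label code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x86 instruction lengths for the two JMP encodings. */
enum {
	JMP8_SIZE  = 2,	/* eb xx:          JMP rel8  ("short JMP") */
	JMP32_SIZE = 5,	/* e9 xx xx xx xx: JMP rel32 */
};

/* 5-byte NOP that the kernel expected at the patch site (from the log). */
static const uint8_t nop5[JMP32_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Bytes actually found at trace_initcall_start+0xc (from the log). */
static const uint8_t found[JMP32_SIZE] = { 0xe9, 0xc9, 0x00, 0x00, 0x00 };

/* Hypothetical helper: nonzero when the site holds the expected bytes. */
int site_matches(const uint8_t *site, const uint8_t *expect, size_t size)
{
	return memcmp(site, expect, size) == 0;
}

/* Decode the little-endian rel32 displacement of a JMP rel32. */
int32_t jmp32_disp(const uint8_t *insn)
{
	assert(insn[0] == 0xe9);
	uint32_t d = (uint32_t)insn[1] |
		     (uint32_t)insn[2] << 8 |
		     (uint32_t)insn[3] << 16 |
		     (uint32_t)insn[4] << 24;
	return (int32_t)d;
}

/* Jump target as an offset into the function: next instruction + disp. */
int64_t jmp32_target_off(int64_t site_off, const uint8_t *insn)
{
	return site_off + JMP32_SIZE + jmp32_disp(insn);
}
```

With the bytes from the report, the site does not match the expected
NOP, the displacement decodes to 0xc9, and the jump target works out to
trace_initcall_start+0xda. The difference between the two JMP encodings
is JMP32_SIZE - JMP8_SIZE = 3 bytes.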

* Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c
From: Nathan Chancellor @ 2022-09-28 20:45 UTC
To: Josh Poimboeuf
Cc: kernel test robot, lkp, aik, linux-kbuild, Peter Zijlstra, chenzhongjin, llvm, npiggin, linux-kernel, lkp, mingo, Sathvika Vasireddy, rostedt, jpoimboe, naveen.n.rao, mbenes, linuxppc-dev

On Wed, Sep 28, 2022 at 12:13:53PM -0700, Josh Poimboeuf wrote:
> On Wed, Sep 28, 2022 at 08:44:27AM -0700, Nathan Chancellor wrote:
> > This crash appears to just be a symptom of objtool erroring throughout
> > the entire build, which means things like the jump label hacks do not
> > get applied. I see a flood of
> >
> > error: objtool: --mnop requires --mcount
> >
> > throughout the build because the configuration has
> > CONFIG_HAVE_NOP_MCOUNT=y because CONFIG_HAVE_OBJTOOL_MCOUNT is
> > unconditionally enabled for x86_64 due to CONFIG_HAVE_OBJTOOL but
> > '--mcount' is only actually used when CONFIG_FTRACE_MCOUNT_USE_OBJTOOL
> > is enabled so '--mnop' gets passed in without '--mcount'. This should
> > obviously be fixed somehow, perhaps by moving the '--mnop' addition into
> > the '--mcount' if, even if that makes the line really long.
> >
> > A secondary issue is that it seems like if objtool encounters a fatal
> > error like this, it should completely fail the build to make it obvious
> > that something is wrong, rather than allowing it to continue and
> > generate a broken kernel, especially since x86_64 requires objtool to
> > build a working kernel at this point.
>
> Grrr... I really dislike that objtool is capable of bricking the kernel
> like this. We just saw something similar in RHEL.
>
> IMO, we should just get rid of this "short JMP" feature in the jump
> label code, those saved three bytes aren't worth the pain.
>
> But yes, we do need to fix that config issue.

Right, I actually see that the report I was CC'd on was a part of a
larger thread, where Naveen already suggested the fix for this problem,
which is not clang specific it seems:

https://lore.kernel.org/1663223588.wppdx3129x.naveen@linux.ibm.com/

> And yes, maybe fatal objtool warnings should cause a build failure. We
> used to do that, but it brought a different sort of pain. But if
> objtool is going to be in the kernel's critical boot path then I guess
> we have to do that.

Right, that was

  644592d32837 ("objtool: Fail the kernel build on fatal errors")

which was reverted in

  655cf86548a3 ("objtool: Don't fail the kernel build on fatal errors")

objtool should not error on warnings but it seems like it should error
for invalid option combinations and other misconfiguration problems?

Did this regress with commit b51277eb9775 ("objtool: Ditch subcommands")?
I can see that the return code of the subcommands would be passed back
via exit() (?) so objtool could fail the build if there was a true
problem but after that change, objtool_run() does not have its return
code checked so any errors that happen don't get passed back up.

Perhaps just the following diff would resolve this? I assume we would
need to look at all the different return values to know if this is safe
though.

Cheers,
Nathan

diff --git a/tools/objtool/objtool.c b/tools/objtool/objtool.c
index a7ecc32e3512..cda649644e32 100644
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -146,7 +146,5 @@ int main(int argc, const char **argv)
 	exec_cmd_init("objtool", UNUSED, UNUSED, UNUSED);
 	pager_init(UNUSED);
 
-	objtool_run(argc, argv);
-
-	return 0;
+	return objtool_run(argc, argv);
 }
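
To illustrate why the discarded return value matters to the build, here
is a minimal standalone sketch. Everything in it is an assumption made
for illustration: run_sketch() is a hypothetical stand-in for
objtool_run() that only models the '--mnop' without '--mcount' case, and
main_discarding() / main_propagating() stand in for the two variants of
main() in the diff above. It says nothing about what the real
objtool_run() returns in other situations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for objtool_run(): returns 1 when --mnop is
 * given without --mcount, 0 otherwise.
 */
int run_sketch(int argc, const char **argv)
{
	int mcount = 0, mnop = 0;

	assert(argv != NULL);
	for (int i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--mcount") == 0)
			mcount = 1;
		else if (strcmp(argv[i], "--mnop") == 0)
			mnop = 1;
	}
	return (mnop && !mcount) ? 1 : 0;
}

/* Shape of main() before the diff: the result is dropped. */
int main_discarding(int argc, const char **argv)
{
	run_sketch(argc, argv);
	return 0;
}

/* Shape of main() after the diff: the result becomes the exit status. */
int main_propagating(int argc, const char **argv)
{
	return run_sketch(argc, argv);
}
```

A build system only sees the process exit status, so in the first
variant an invalid invocation still looks like success and the build
carries on, while in the second variant the nonzero status reaches the
caller.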