All of lore.kernel.org
 help / color / mirror / Atom feed
From: Beau Belgrave <beaub@linux.microsoft.com>
To: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: "open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	linux-trace-kernel@vger.kernel.org, lkft-triage@lists.linaro.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Mark Brown <broonie@kernel.org>,
	Zheng Yejian <zhengyejian1@huawei.com>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	Arnd Bergmann <arnd@arndb.de>
Subject: Re: selftests: user_events: ftrace_test - RIP: 0010:tracing_update_buffers (kernel/trace/trace.c:6470)
Date: Fri, 27 Oct 2023 12:20:11 -0700	[thread overview]
Message-ID: <20231027192011.GA436-beaub@linux.microsoft.com> (raw)
In-Reply-To: <CA+G9fYuDP3hVQ3t7FfrBAjd_WFVSurMgCepTxunSJf=MTe=6aA@mail.gmail.com>

On Fri, Oct 27, 2023 at 05:38:41PM +0530, Naresh Kamboju wrote:
> Following kernel crash noticed on x86_64 while running selftests: user_events:
> ftrace_test running 6.6.0-rc7-next-20231026.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> 
> kselftest: Running tests in user_events
> TAP version 13
> 1..4
> # timeout set to 90
> # selftests: user_events: ftrace_test
> [ 2391.606817] general protection fault, probably for non-canonical
> address 0x6b6b6b6b6b6b8a83: 0000 [#1] PREEMPT SMP PTI
> [ 2391.617519] CPU: 1 PID: 34662 Comm: ftrace_test Not tainted
> 6.6.0-rc7-next-20231026 #1
> [ 2391.625428] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.7 12/07/2021
> [ 2391.632811] RIP: 0010:tracing_update_buffers (kernel/trace/trace.c:6470)
> [ 2391.637952] Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00
> 55 31 f6 48 89 e5 41 55 41 54 53 48 89 fb 48 c7 c7 40 8c 61 94 e8 92
> d3 5a 01 <44> 0f b6 a3 18 1f 00 00 41 80 fc 01 0f 87 c8 dc 4e 01 45 31
> ed 41
> All code
> ========
>    0: 90                    nop
>    1: 90                    nop
>    2: 90                    nop
>    3: 90                    nop
>    4: 90                    nop
>    5: 90                    nop
>    6: 90                    nop
>    7: 90                    nop
>    8: 90                    nop
>    9: 90                    nop
>    a: 90                    nop
>    b: 90                    nop
>    c: 66 0f 1f 00          nopw   (%rax)
>   10: 55                    push   %rbp
>   11: 31 f6                xor    %esi,%esi
>   13: 48 89 e5              mov    %rsp,%rbp
>   16: 41 55                push   %r13
>   18: 41 54                push   %r12
>   1a: 53                    push   %rbx
>   1b: 48 89 fb              mov    %rdi,%rbx
>   1e: 48 c7 c7 40 8c 61 94 mov    $0xffffffff94618c40,%rdi
>   25: e8 92 d3 5a 01        callq  0x15ad3bc
>   2a:* 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d <-- trapping instruction
>   31: 00
>   32: 41 80 fc 01          cmp    $0x1,%r12b
>   36: 0f 87 c8 dc 4e 01    ja     0x14edd04
>   3c: 45 31 ed              xor    %r13d,%r13d
>   3f: 41                    rex.B
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d
>    7: 00
>    8: 41 80 fc 01          cmp    $0x1,%r12b
>    c: 0f 87 c8 dc 4e 01    ja     0x14edcda
>   12: 45 31 ed              xor    %r13d,%r13d
>   15: 41                    rex.B
> [ 2391.656696] RSP: 0018:ffffb36e0a477d80 EFLAGS: 00010246
> [ 2391.661937] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000080000000
> [ 2391.669064] RDX: 0000000000000000 RSI: ffffffff9299b722 RDI: ffffffff9299b722
> [ 2391.676195] RBP: ffffb36e0a477d98 R08: 000000000000002f R09: 0000000000000002
> [ 2391.683321] R10: ffffb36e0a477d70 R11: 0000000000000000 R12: 0000000000000002
> [ 2391.690453] R13: ffffb36e0a477e88 R14: ffff99c5803a2230 R15: ffff99c581c39000
> [ 2391.697586] FS:  00007fb4b9681740(0000) GS:ffff99c6efa80000(0000)
> knlGS:0000000000000000
> [ 2391.705670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2391.711410] CR2: 00007fb4b96ab5e0 CR3: 000000010635c002 CR4: 00000000003706f0
> [ 2391.718540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2391.725665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2391.732797] Call Trace:
> [ 2391.735240]  <TASK>
> [ 2391.737339] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [ 2391.740744] ? die_addr (arch/x86/kernel/dumpstack.c:421
> arch/x86/kernel/dumpstack.c:460)
> [ 2391.744056] ? exc_general_protection (arch/x86/kernel/traps.c:697
> arch/x86/kernel/traps.c:642)
> [ 2391.748766] ? asm_exc_general_protection
> (arch/x86/include/asm/idtentry.h:564)
> [ 2391.753652] ? __mutex_lock (kernel/locking/mutex.c:613
> (discriminator 3) kernel/locking/mutex.c:747 (discriminator 3))
> [ 2391.757487] ? __mutex_lock (kernel/locking/mutex.c:613
> (discriminator 3) kernel/locking/mutex.c:747 (discriminator 3))
> [ 2391.761318] ? tracing_update_buffers (kernel/trace/trace.c:6470)
> [ 2391.765851] event_enable_write (kernel/trace/trace_events.c:1408)
> [ 2391.769976] vfs_write (fs/read_write.c:582)
> [ 2391.773296] ? close_fd_get_file (fs/file.c:821)
> [ 2391.777396] ? preempt_count_sub (kernel/sched/core.c:5857
> kernel/sched/core.c:5853 kernel/sched/core.c:5875)
> [ 2391.781496] ksys_write (fs/read_write.c:638)
> [ 2391.784918] __x64_sys_write (fs/read_write.c:646)
> [ 2391.788671] do_syscall_64 (arch/x86/entry/common.c:51
> arch/x86/entry/common.c:82)
> [ 2391.792248] ? do_syscall_64 (arch/x86/entry/common.c:101)
> [ 2391.795995] ? syscall_exit_to_user_mode (kernel/entry/common.c:299)
> [ 2391.800785] ? do_syscall_64 (arch/x86/entry/common.c:101)
> [ 2391.804529] ? do_syscall_64 (arch/x86/entry/common.c:101)
> [ 2391.808275] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
> [ 2391.813327] RIP: 0033:0x7fb4b977c140
> [ 2391.816920] Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7
> c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28
> 48 89
> All code
> ========
>    0: 40 00 48 8b          add    %cl,-0x75(%rax)
>    4: 15 c1 9c 0d 00        adc    $0xd9cc1,%eax
>    9: f7 d8                neg    %eax
>    b: 64 89 02              mov    %eax,%fs:(%rdx)
>    e: 48 c7 c0 ff ff ff ff mov    $0xffffffffffffffff,%rax
>   15: eb b7                jmp    0xffffffffffffffce
>   17: 0f 1f 00              nopl   (%rax)
>   1a: 80 3d a1 24 0e 00 00 cmpb   $0x0,0xe24a1(%rip)        # 0xe24c2
>   21: 74 17                je     0x3a
>   23: b8 01 00 00 00        mov    $0x1,%eax
>   28: 0f 05                syscall
>   2a:* 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax <--
> trapping instruction
>   30: 77 58                ja     0x8a
>   32: c3                    retq
>   33: 0f 1f 80 00 00 00 00 nopl   0x0(%rax)
>   3a: 48 83 ec 28          sub    $0x28,%rsp
>   3e: 48                    rex.W
>   3f: 89                    .byte 0x89
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax
>    6: 77 58                ja     0x60
>    8: c3                    retq
>    9: 0f 1f 80 00 00 00 00 nopl   0x0(%rax)
>   10: 48 83 ec 28          sub    $0x28,%rsp
>   14: 48                    rex.W
>   15: 89                    .byte 0x89
> [ 2391.835660] RSP: 002b:00007ffc43b05b38 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [ 2391.843225] RAX: ffffffffffffffda RBX: 00007ffc43b05d88 RCX: 00007fb4b977c140
> [ 2391.850350] RDX: 0000000000000002 RSI: 000056376b59b7d4 RDI: 0000000000000007
> [ 2391.857482] RBP: 00007ffc43b05b60 R08: 0000000000000000 R09: 00007fb4b9681740
> [ 2391.864615] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> [ 2391.871747] R13: 00007ffc43b05d98 R14: 000056376b59ddc8 R15: 00007fb4b9981020
> [ 2391.878907]  </TASK>
> [ 2391.881106] Modules linked in: x86_pkg_temp_thermal fuse configfs
> [ 2391.887288] ---[ end trace 0000000000000000 ]---
> [ 2391.891915] RIP: 0010:tracing_update_buffers (kernel/trace/trace.c:6470)
> [ 2391.897231] Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00
> 55 31 f6 48 89 e5 41 55 41 54 53 48 89 fb 48 c7 c7 40 8c 61 94 e8 92
> d3 5a 01 <44> 0f b6 a3 18 1f 00 00 41 80 fc 01 0f 87 c8 dc 4e 01 45 31
> ed 41
> All code
> ========
>    0: 90                    nop
>    1: 90                    nop
>    2: 90                    nop
>    3: 90                    nop
>    4: 90                    nop
>    5: 90                    nop
>    6: 90                    nop
>    7: 90                    nop
>    8: 90                    nop
>    9: 90                    nop
>    a: 90                    nop
>    b: 90                    nop
>    c: 66 0f 1f 00          nopw   (%rax)
>   10: 55                    push   %rbp
>   11: 31 f6                xor    %esi,%esi
>   13: 48 89 e5              mov    %rsp,%rbp
>   16: 41 55                push   %r13
>   18: 41 54                push   %r12
>   1a: 53                    push   %rbx
>   1b: 48 89 fb              mov    %rdi,%rbx
>   1e: 48 c7 c7 40 8c 61 94 mov    $0xffffffff94618c40,%rdi
>   25: e8 92 d3 5a 01        callq  0x15ad3bc
>   2a:* 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d <-- trapping instruction
>   31: 00
>   32: 41 80 fc 01          cmp    $0x1,%r12b
>   36: 0f 87 c8 dc 4e 01    ja     0x14edd04
>   3c: 45 31 ed              xor    %r13d,%r13d
>   3f: 41                    rex.B
> 
> Code starting with the faulting instruction
> ===========================================
>    0: 44 0f b6 a3 18 1f 00 movzbl 0x1f18(%rbx),%r12d
>    7: 00
>    8: 41 80 fc 01          cmp    $0x1,%r12b
>    c: 0f 87 c8 dc 4e 01    ja     0x14edcda
>   12: 45 31 ed              xor    %r13d,%r13d
>   15: 41                    rex.B
> [ 2391.916120] RSP: 0018:ffffb36e0a477d80 EFLAGS: 00010246
> [ 2391.921569] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000080000000
> [ 2391.928872] RDX: 0000000000000000 RSI: ffffffff9299b722 RDI: ffffffff9299b722
> [ 2391.936237] RBP: ffffb36e0a477d98 R08: 000000000000002f R09: 0000000000000002
> [ 2391.943388] R10: ffffb36e0a477d70 R11: 0000000000000000 R12: 0000000000000002
> [ 2391.950527] R13: ffffb36e0a477e88 R14: ffff99c5803a2230 R15: ffff99c581c39000
> [ 2391.957670] FS:  00007fb4b9681740(0000) GS:ffff99c6efa80000(0000)
> knlGS:0000000000000000
> [ 2391.965822] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2391.971579] CR2: 00007fb4b96ab5e0 CR3: 000000010635c002 CR4: 00000000003706f0
> [ 2391.978721] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2391.985879] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2391.993028] Kernel panic - not syncing: Fatal exception
> [ 2391.998287] Kernel Offset: 0x10000000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 2392.009066] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 

It's hard to repro this, it does repro after running several times for
me.

Steven, this is firing when the self test writes 1 to the enable file
for a user_event created event.

Sometimes I get this splat instead of a panic:
[  956.819778] ------------[ cut here ]------------
[  956.820526] WARNING: CPU: 12 PID: 914 at kernel/trace/trace.c:516 tracing_release_file_tr+0x46/0x50
[  956.821389] Modules linked in:
[  956.821708] CPU: 12 PID: 914 Comm: ftrace_test Not tainted 6.6.0-rc7-next-20231026 #1
[  956.822501] RIP: 0010:tracing_release_file_tr+0x46/0x50
[  956.822902] Code: d1 03 01 8b 83 c0 1e 00 00 85 c0 74 1d 83 e8 01 48 c7 c7 80 5b ef 84 89 83 c0 1e 00 00 e8 b2 b5 03 01 31 c0 5b e9 75 ee 27 01 <0f> 0b eb df 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[  956.824261] RSP: 0018:ffffb59d879e3ee8 EFLAGS: 00010246
[  956.824656] RAX: 0000000000000000 RBX: ffff9e386a82c058 RCX: 0000000000000000
[  956.825239] RDX: 0000000000000000 RSI: ffffffff84b6b383 RDI: 00000000ffffffff
[  956.825835] RBP: ffff9e3844076200 R08: 0000000000000000 R09: 0000000000000000
[  956.826425] R10: ffffb59d879e3ee8 R11: ffffffff8328526f R12: 00000000000f801f
[  956.827359] R13: ffff9e3846134020 R14: ffff9e3843fd03a8 R15: 0000000000000000
[  956.828732] FS:  00007eff4f818740(0000) GS:ffff9e3f26300000(0000) knlGS:0000000000000000
[  956.829436] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  956.829968] CR2: 00007eff4f42a250 CR3: 00000007d0744004 CR4: 0000000000370eb0
[  956.830591] Call Trace:
[  956.830804]  <TASK>
[  956.831113]  ? __warn+0x7f/0x160
[  956.831469]  ? tracing_release_file_tr+0x46/0x50
[  956.831891]  ? report_bug+0x1c3/0x1d0
[  956.832224]  ? handle_bug+0x3c/0x70
[  956.832544]  ? exc_invalid_op+0x14/0x70
[  956.832861]  ? asm_exc_invalid_op+0x16/0x20
[  956.833190]  ? tracing_release_file_tr+0x1f/0x50
[  956.833727]  ? tracing_release_file_tr+0x46/0x50
[  956.834159]  ? tracing_release_file_tr+0x1f/0x50
[  956.834579]  __fput+0xab/0x300
[  956.834903]  __x64_sys_close+0x38/0x80
[  956.835235]  do_syscall_64+0x41/0xf0
[  956.835554]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[  956.835974] RIP: 0033:0x7eff4f515157
[  956.836291] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 83 b8 f7 ff
[  956.837728] RSP: 002b:00007ffc021d8158 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  956.838351] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007eff4f515157
[  956.838975] RDX: 0000000000000002 RSI: 000055637921d7d4 RDI: 0000000000000005
[  956.839586] RBP: 00007ffc021d8180 R08: 0000000000000000 R09: 00007eff4f818740
[  956.840201] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc021d83b8
[  956.840810] R13: 000055637921cae7 R14: 000055637921fc38 R15: 00007eff4f946040
[  956.841562]  </TASK>
[  956.841787] irq event stamp: 881
[  956.842145] hardirqs last  enabled at (889): [<ffffffff831d7f49>] console_unlock+0x109/0x130
[  956.843013] hardirqs last disabled at (896): [<ffffffff831d7f2e>] console_unlock+0xee/0x130
[  956.843648] softirqs last  enabled at (694): [<ffffffff8314ba2a>] irq_exit_rcu+0x8a/0xe0
[  956.844276] softirqs last disabled at (689): [<ffffffff8314ba2a>] irq_exit_rcu+0x8a/0xe0
[  956.844893] ---[ end trace 0000000000000000 ]---

Warning is from this code:
static void __trace_array_put(struct trace_array *this_tr)
{
        WARN_ON(!this_tr->ref);
        this_tr->ref--;
}

It seems like there might be a timing window or an incorrect call to
trace_array_put() somewhere. Do you think this is related to the eventfs
work?

Thanks,
-Beau

> Links:
> - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823454/suite/log-parser-test/tests/
> - https://lkft.validation.linaro.org/scheduler/job/6974179#L5053
> 
> metadata:
> git_ref: master
> git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> git_sha: 2ef7141596eed0b4b45ef18b3626f428a6b0a822
> git_describe: next-20231026
> kernel_version: 6.6.0-rc7
> kernel-config:
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2XHt24sNSdog7DYY3FLKFZpZmjG/config
> artifact-location:
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2XHt24sNSdog7DYY3FLKFZpZmjG/
> toolchain: gcc-13
> 
> 
> --
> Linaro LKFT
> https://lkft.linaro.org

  reply	other threads:[~2023-10-27 19:20 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-27 12:08 selftests: user_events: ftrace_test - RIP: 0010:tracing_update_buffers (kernel/trace/trace.c:6470) Naresh Kamboju
2023-10-27 19:20 ` Beau Belgrave [this message]
2023-10-27 22:36   ` Steven Rostedt
2023-10-28  2:33     ` Steven Rostedt
2023-10-30 16:31       ` Beau Belgrave
2023-10-30 16:42         ` Steven Rostedt
2023-10-30 16:50           ` Beau Belgrave
2023-10-30 21:31           ` Steven Rostedt
2023-10-31  0:27             ` Beau Belgrave
2023-10-31  4:00               ` Steven Rostedt
2023-10-31 14:45                 ` Steven Rostedt
2023-10-31 16:07                   ` Steven Rostedt
2023-10-31 17:05                     ` Beau Belgrave

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231027192011.GA436-beaub@linux.microsoft.com \
    --to=beaub@linux.microsoft.com \
    --cc=arnd@arndb.de \
    --cc=broonie@kernel.org \
    --cc=dan.carpenter@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=lkft-triage@lists.linaro.org \
    --cc=naresh.kamboju@linaro.org \
    --cc=rostedt@goodmis.org \
    --cc=zhengyejian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.