All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pengfei Xu <pengfei.xu@intel.com>
To: Hillf Danton <hdanton@sina.com>
Cc: <lihuafei1@huawei.com>, <rostedt@goodmis.org>,
	<linux-kernel@vger.kernel.org>, <lkp@intel.com>,
	<frederic@kernel.org>, <quic_neeraju@quicinc.com>,
	<paulmck@kernel.org>, <heng.su@intel.com>
Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel
Date: Thu, 13 Apr 2023 16:03:51 +0800	[thread overview]
Message-ID: <ZDe3Z2WD5hxeQFTn@xpf.sh.intel.com> (raw)
In-Reply-To: <20230413070042.1479-1-hdanton@sina.com>

Hi Danton,

On 2023-04-13 at 15:00:42 +0800, Hillf Danton wrote:
> On 13 Apr 2023 10:45:35 +0800 Pengfei Xu <pengfei.xu@intel.com>
> > Hi Huafei and kernel experts,
> > 
> > It's a soft remind.
> > This issue could be reproduced in v6.3-rc6 kernel.
> > It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
> > After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
> > dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.
> > 
> > New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
> > are in link: https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu
> > 
> > Thanks!
> > BR.
> > 
> > On 2023-03-15 at 20:20:44 +0800, Pengfei Xu wrote:
> > > Hi Li Huafei and kernel experts,
> > > 
> > > Greeting!
> > > 
> > > Platform: x86 platforms
> > > There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:
> > > 
> > > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
> > > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
> > > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
> > > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
> > > v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
> > > "
> > > [   24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
> > > [   31.392668] hrtimer: interrupt took 10726570 ns
> 
> What hrtimer/s triggered this print? And for what? The hrtimer trace helps here.
> Info like that helps work out the cause of the task hang.
> Feel free to fold in the debug diff below if it makes sense to you.
> 
  Thanks for your debug patch.

  I newly updated previous alder lake S reproduced syzkaller repro.prog and
  machineInfo0(vm info) info in log link:
https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu

And tried debug kernel on alder lake S, there were 2 cpu in guest, and
CPU1 reported "interrupt took 331596 ns", then kernel should not trigger
"rcu_tasks_wait_gp" for this issue.
"
[  104.171591] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=334 'systemd'
[  126.770981] hrtimer CPU1: interrupt took 331596 ns      //  "rcu_tasks_wait_gp" after 15s
[  141.127282] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2571 jiffies old.
[  171.272100] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10107 jiffies old.
[  261.767189] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 32731 jiffies old.
[  300.102911] INFO: task repro_rcu:398 blocked for more than 147 seconds.
[  300.103408]       Not tainted 6.3.0-rc6-dbg-dirty #1
[  300.103743] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  300.104230] task:repro_rcu       state:D stack:0     pid:398   ppid:396    flags:0x00004006
[  300.104780] Call Trace:
[  300.104992]  <TASK>
[  300.105265]  __schedule+0x40a/0xc30
[  300.105706]  ? wait_for_completion+0x7b/0x180
[  300.106041]  schedule+0x5b/0xe0
[  300.106330]  schedule_timeout+0x4db/0x5b0
[  300.106959]  ? schedule_timeout+0x9/0x5b0
[  300.107364]  ? wait_for_completion+0x7b/0x180
[  300.107698]  wait_for_completion+0xa6/0x180
[  300.108141]  __wait_rcu_gp+0x136/0x160
[  300.108474]  ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
[  300.108875]  synchronize_rcu_tasks_generic.part.24+0x3b/0x60
[  300.109263]  ? __pfx_call_rcu_tasks+0x10/0x10
[  300.109592]  ? __pfx_wakeme_after_rcu+0x10/0x10
[  300.110020]  ? verify_cpu+0x10/0x100
[  300.110353]  synchronize_rcu_tasks_generic+0x24/0x70
[  300.110840]  synchronize_rcu_tasks+0x19/0x20
[  300.111173]  ftrace_shutdown+0x1cc/0x410
[  300.111564]  unregister_ftrace_function+0x35/0x230
[  300.111950]  ? __sanitizer_cov_trace_switch+0x57/0xa0
[  300.112380]  perf_ftrace_event_register+0x95/0xf0
[  300.112733]  ? __pfx_tp_perf_event_destroy+0x10/0x10
[  300.113092]  perf_trace_destroy+0x3a/0xa0
[  300.113410]  ? __pfx_tp_perf_event_destroy+0x10/0x10
[  300.113768]  tp_perf_event_destroy+0x1e/0x30
[  300.114109]  _free_event+0x101/0x810
[  300.114470]  put_event+0x3c/0x50
[  300.114920]  perf_event_release_kernel+0x2de/0x360
[  300.115266]  ? perf_event_release_kernel+0x9/0x360
[  300.115685]  ? __pfx_perf_release+0x10/0x10
[  300.116036]  perf_release+0x22/0x30
[  300.116344]  __fput+0x11f/0x450
[  300.116739]  ____fput+0x1e/0x30
[  300.117031]  task_work_run+0xb6/0x120
[  300.117407]  do_exit+0x547/0x12b0
[  300.117739]  ? write_comp_data+0x2f/0x90
[  300.118141]  do_group_exit+0x5e/0xf0
[  300.118498]  get_signal+0x1465/0x14a0
[  300.119142]  arch_do_signal_or_restart+0x33/0x280
[  300.119697]  exit_to_user_mode_prepare+0x13b/0x210
[  300.120078]  syscall_exit_to_user_mode+0x2d/0x60
[  300.120439]  do_syscall_64+0x4a/0x90
[  300.120748]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  300.121102] RIP: 0033:0x7f2c55b8a59d
[  300.121389] RSP: 002b:00007ffc6ea72078 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[  300.121894] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2c55b8a59d
[  300.122332] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000080
[  300.122890] RBP: 00007ffc6ea72090 R08: 0000000000000000 R09: 00007ffc6ea72090
[  300.123327] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011c0
[  300.123765] R13: 00007ffc6ea721b0 R14: 0000000000000000 R15: 0000000000000000
[  300.124479]  </TASK>
[  300.124676] 
               Showing all locks held in the system:
[  300.125063] 1 lock held by rcu_tasks_kthre/11:
[  300.125370]  #0: ffffffff83d63450 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[  300.126170] 1 lock held by rcu_tasks_rude_/12:
[  300.126476]  #0: ffffffff83d631d0 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[  300.127421] 1 lock held by rcu_tasks_trace/13:
[  300.127728]  #0: ffffffff83d62f10 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[  300.128551] 1 lock held by khungtaskd/29:
[  300.128834]  #0: ffffffff83d63e60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
[  300.129607] 1 lock held by systemd-journal/124:
[  300.129921] 2 locks held by repro_rcu/398:
[  300.130209]  #0: ffffffff83e20668 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
[  300.131083]  #1: ffffffff83e1cd28 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230

[  300.132040] =============================================
"

Thanks!
BR.

> +++ b/kernel/time/hrtimer.c
> @@ -1866,7 +1866,7 @@ retry:
>  	else
>  		expires_next = ktime_add(now, delta);
>  	tick_program_event(expires_next, 1);
> -	pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
> +	pr_warn_once("hrtimer CPU%u: interrupt took %llu ns\n", cpu_base->cpu, ktime_to_ns(delta));
>  }
>  
>  /* called with interrupts disabled */

  parent reply	other threads:[~2023-04-13  8:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-15 12:20 [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel Pengfei Xu
2023-04-13  2:45 ` Pengfei Xu
2023-04-13 11:19   ` Frederic Weisbecker
2023-04-13 11:51     ` Zhang, Qiang1
2023-04-13 12:09     ` Pengfei Xu
2023-04-13 12:40     ` Jens Axboe
2023-04-13 13:55       ` Frederic Weisbecker
     [not found] ` <20230413070042.1479-1-hdanton@sina.com>
2023-04-13  8:03   ` Pengfei Xu [this message]
     [not found]   ` <20230413083017.1518-1-hdanton@sina.com>
2023-04-13  9:08     ` Pengfei Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZDe3Z2WD5hxeQFTn@xpf.sh.intel.com \
    --to=pengfei.xu@intel.com \
    --cc=frederic@kernel.org \
    --cc=hdanton@sina.com \
    --cc=heng.su@intel.com \
    --cc=lihuafei1@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=paulmck@kernel.org \
    --cc=quic_neeraju@quicinc.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.