public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Pengfei Xu <pengfei.xu@intel.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: <linux-kernel@vger.kernel.org>, <heng.su@intel.com>, <axboe@kernel.dk>
Subject: Re: [Syzkaller & bisect] There is "io_ring_exit_work" related Call Trace in v5.2-rc5 kernel
Date: Thu, 2 Feb 2023 16:56:21 +0800	[thread overview]
Message-ID: <Y9t6tflIcFPy84S5@xpf.sh.intel.com> (raw)
In-Reply-To: <15161e5f-fe26-23e9-1d0a-ebd579d64d9f@gmail.com>

Hi Pavel Begunkov,

On 2023-02-01 at 15:52:47 +0000, Pavel Begunkov wrote:
> On 1/28/23 14:49, Pavel Begunkov wrote:
> > On 1/28/23 08:44, Pengfei Xu wrote:
> > > Hi Pavel Begunkov and kernel expert,
> > > 
> > > Greeting!
> > > 
> > > There is "io_ring_exit_work" related Call Trace in v5.2-rc5 kernel in guest
> > > on Sapphire Rapids server.
> > 
> > Thanks for the report, we'll take a look
> 
> Not reproducible for me. Apparently, the repro creates a normal ring
> and immediately closes it, then io_ring_exit_work() hangs waiting to
> the task to execute task_work.
  Ah, I put the reproduced virtual machine in gitlab.
  git clone https://gitlab.com/xupengfe/repro_vm_env.git
  cd repro_vm_env
  tar -xvf repro_vm_env.tar.gz
  cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
     //start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  In vm and login with root,  there is no password for root.
  After login vm successfully, ./repro   to reproduce this issue.
  It could be reproduced in about 6min.

  If you already have qemu-system-x86_64, please ignore below info.
  If you want to install qemu v7.1.0 version,
  git clone https://github.com/qemu/qemu.git
  cd qemu
  git checkout -f v7.1.0
  mkdir build
  cd build
  yum install -y ninja-build.x86_64
  ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
  make
  make install
  Done.

> 
> It reminds me a bug we had before when the task was being stopped
> for debugging blocking task_work. Maybe some signal interaction, or
> vm and/or syz magic specific.
> 
  Thanks for your info!
  I hope it's helpful.

  Thanks!
  BR.

> > > INFO: task kworker/u4:2:32 blocked for more than 147 seconds.
> > > [  300.212512]       Not tainted 6.2.0-rc5-2241ab53cbb5 #1
> > > [  300.213087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  300.213921] task:kworker/u4:2    state:D stack:0     pid:32    ppid:2      flags:0x00004000
> > > [  300.214849] Workqueue: events_unbound io_ring_exit_work
> > > [  300.215506] Call Trace:
> > > [  300.215789]  <TASK>
> > > [  300.216051]  __schedule+0x385/0xb00
> > > [  300.216467]  ? write_comp_data+0x2f/0x90
> > > [  300.216949]  ? wait_for_completion+0x7b/0x180
> > > [  300.217445]  schedule+0x5b/0xe0
> > > [  300.217836]  schedule_timeout+0x561/0x650
> > > [  300.218301]  ? _raw_spin_unlock_irq+0x2b/0x60
> > > [  300.218816]  ? wait_for_completion+0x7b/0x180
> > > [  300.219341]  ? __this_cpu_preempt_check+0x20/0x30
> > > [  300.219893]  ? lockdep_hardirqs_on+0x8a/0x110
> > > [  300.220384]  ? _raw_spin_unlock_irq+0x2b/0x60
> > > [  300.220881]  ? wait_for_completion+0x9e/0x180
> > > [  300.221382]  ? trace_hardirqs_on+0x3f/0x100
> > > [  300.221868]  ? wait_for_completion+0x7b/0x180
> > > [  300.222379]  wait_for_completion+0xa6/0x180
> > > [  300.222897]  io_ring_exit_work+0x2f7/0x747
> > > [  300.223383]  ? __pfx_io_tctx_exit_cb+0x10/0x10
> > > [  300.223947]  process_one_work+0x3b1/0x960
> > > [  300.224446]  worker_thread+0x52/0x660
> > > [  300.224884]  ? __pfx_worker_thread+0x10/0x10
> > > [  300.225376]  kthread+0x161/0x1a0
> > > [  300.225782]  ? __pfx_kthread+0x10/0x10
> > > [  300.226225]  ret_from_fork+0x29/0x50
> > > [  300.226688]  </TASK>
> > > [  300.227791]
> > > [  300.227791] Showing all locks held in the system:
> > > [  300.228501] 1 lock held by rcu_tasks_kthre/11:
> > > [  300.229003]  #0: ffffffff83d617b0 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x2d/0x3d0
> > > [  300.230098] 1 lock held by rcu_tasks_rude_/12:
> > > [  300.230601]  #0: ffffffff83d61530 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x2d/0x3d0
> > > [  300.231733] 1 lock held by rcu_tasks_trace/13:
> > > [  300.232244]  #0: ffffffff83d61270 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x2d/0x3d0
> > > [  300.233377] 1 lock held by khungtaskd/29:
> > > [  300.233842]  #0: ffffffff83d621c0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x27/0x18c
> > > [  300.234866] 2 locks held by kworker/u4:2/32:
> > > [  300.235351]  #0: ff11000006045d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x2ff/0x960
> > > [  300.236502]  #1: ffa0000000117e60 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x303/0x960
> > > [  300.237696]
> > > [  300.237882] =============================================
> > > [  300.237882]
> > > 
> > > Bisected and found the first bad commit is:
> > > eebd2e37e662617a6b8041db75205f0a262ce870
> > > io_uring: don't take task ring-file notes
> > > 
> > > But after reverted above commit eebd2e37e on top of v6.2-rc5 kernel and made
> > > the kernel failed. So it's just the suspected commit for above problem.
> > > 
> > > Syzkaller reproduced code, bisect info, kconfig and v6.2-rc5 dmesg are in
> > > attached.
> > > 
> > > All detailed info is in link:
> > > https://github.com/xupengfe/syzkaller_logs/tree/main/230124_055801_io_ring_exit_work
> > > 
> > > I hope it's helpful.
> > > 
> > > Thanks!
> > > BR.
> > 
> 
> -- 
> Pavel Begunkov

  reply	other threads:[~2023-02-02  8:56 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-28  8:44 [Syzkaller & bisect] There is "io_ring_exit_work" related Call Trace in v5.2-rc5 kernel Pengfei Xu
2023-01-28  8:52 ` [Syzkaller & bisect] There is "io_ring_exit_work" related Call Trace in v6.2-rc5 kernel Pengfei Xu
2023-01-28 14:49 ` [Syzkaller & bisect] There is "io_ring_exit_work" related Call Trace in v5.2-rc5 kernel Pavel Begunkov
2023-02-01 15:52   ` Pavel Begunkov
2023-02-02  8:56     ` Pengfei Xu [this message]
2023-02-02 19:44       ` Jens Axboe
2023-02-02 20:37         ` Jens Axboe
2023-02-03  2:08           ` Pengfei Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9t6tflIcFPy84S5@xpf.sh.intel.com \
    --to=pengfei.xu@intel.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=heng.su@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox