Linux real-time development
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "idryomov@gmail.com" <idryomov@gmail.com>,
	"ionut.nechita@windriver.com" <ionut.nechita@windriver.com>
Cc: "ionut_n2001@yahoo.com" <ionut_n2001@yahoo.com>,
	"sage@newdream.net" <sage@newdream.net>,
	Xiubo Li <xiubli@redhat.com>,
	"linux-rt-devel@lists.linux.dev" <linux-rt-devel@lists.linux.dev>,
	"jkosina@suse.com" <jkosina@suse.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"slava@dubeyko.com" <slava@dubeyko.com>,
	Alex Markuze <amarkuze@redhat.com>,
	"jlayton@kernel.org" <jlayton@kernel.org>,
	"bigeasy@linutronix.de" <bigeasy@linutronix.de>,
	"clrkwllms@kernel.org" <clrkwllms@kernel.org>,
	"superm1@kernel.org" <superm1@kernel.org>
Subject: RE: [PATCH] ceph: add timeout protection to ceph_mdsc_sync() path
Date: Tue, 17 Feb 2026 21:52:00 +0000	[thread overview]
Message-ID: <fe346c26386a4fc85cfa27e669dffd13191e7ea4.camel@ibm.com> (raw)
In-Reply-To: <20260213075111.32886-2-ionut.nechita@windriver.com>

On Fri, 2026-02-13 at 09:51 +0200, Ionut Nechita (Wind River) wrote:
> I also created a tracker issue for this on the Ceph bug tracker:
> 
> https://tracker.ceph.com/issues/74897
> 


It looks like I was able to reproduce the symptoms of the issue by running
xfstests' generic/013 test case in a loop:

#!/bin/bash

while true; do
  sudo ./check generic/013
done

Feb 16 15:46:30 ceph-0005 kernel: [ 1845.346895] INFO: task fsstress:14466 blocked for more than 122 seconds.
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.347995]       Not tainted 6.19.0-rc8+ #10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.348530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349426] task:fsstress        state:D stack:0     pid:14466 tgid:14466 ppid:14464  task_flags:0x400140 flags:0x00080800
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349438] Call Trace:
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349441]  <TASK>
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349445]  __schedule+0xe8a/0x57f0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349457]  ? kasan_save_stack+0x39/0x60
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349466]  ? kasan_save_stack+0x26/0x60
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349471]  ? kasan_save_track+0x14/0x40
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349475]  ? kasan_save_free_info+0x3b/0x60
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349485]  ? __kasan_slab_free+0x7a/0xb0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349489]  ? ceph_mdsc_release_request+0x6a3/0x880
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349497]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349502]  ? __kasan_check_write+0x14/0x30
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349507]  ? __pv_queued_spin_lock_slowpath+0xb04/0xf80
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349514]  ? __pfx___schedule+0x10/0x10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349520]  ? __kasan_check_read+0x11/0x20
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349525]  ? __call_rcu_common+0x386/0x14b0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349532]  schedule+0x75/0x2f0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349538]  schedule_timeout+0x16d/0x210
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349542]  ? __pfx_schedule_timeout+0x10/0x10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349548]  ? __kasan_check_write+0x14/0x30
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349553]  ? _raw_spin_lock_irq+0x8b/0x100
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349559]  ? __pfx__raw_spin_lock_irq+0x10/0x10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349565]  ? kasan_save_track+0x14/0x40
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349569]  wait_for_completion+0x14a/0x340
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349573]  ? __pfx_wait_for_completion+0x10/0x10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349577]  ? __kasan_check_write+0x14/0x30
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349582]  ? __pfx_mutex_unlock+0x10/0x10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349587]  ceph_mdsc_sync+0x4b4/0xe80
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349593]  ? __pfx_ceph_mdsc_sync+0x10/0x10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349597]  ? ceph_osdc_put_request+0x38/0x770
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349603]  ? ceph_osdc_sync+0x1cb/0x350
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349608]  ceph_sync_fs+0xa0/0x4c0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349612]  sync_filesystem+0x182/0x240
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349618]  __x64_sys_syncfs+0xac/0x160
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349623]  x64_sys_call+0x746/0x2360
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349629]  do_syscall_64+0x82/0x5d0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349635]  ? __x64_sys_openat+0x108/0x240
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349641]  ? __kasan_check_read+0x11/0x20
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349647]  ? fpregs_assert_state_consistent+0x5c/0x100
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349655]  ? __pfx___x64_sys_openat+0x10/0x10
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349661]  ? __kasan_check_write+0x14/0x30
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349667]  ? ksys_write+0x1a3/0x230
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349672]  ? __kasan_check_read+0x11/0x20
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349677]  ? fpregs_assert_state_consistent+0x5c/0x100
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349682]  ? do_syscall_64+0xbf/0x5d0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349687]  ? fpregs_assert_state_consistent+0x5c/0x100
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349692]  ? __kasan_check_read+0x11/0x20
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349705]  ? fpregs_assert_state_consistent+0x5c/0x100
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349709]  ? do_syscall_64+0xbf/0x5d0
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349715]  ? __kasan_check_read+0x11/0x20
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349720]  ? fpregs_assert_state_consistent+0x5c/0x100
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349724]  ? irqentry_exit+0xa5/0x600
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349730]  ? exc_page_fault+0x95/0x100
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349736]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349740] RIP: 0033:0x792fb1d1ba4b
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349745] RSP: 002b:00007ffc3844eb58 EFLAGS: 00000246 ORIG_RAX: 0000000000000132
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349752] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000792fb1d1ba4b
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349756] RDX: 0000000000000000 RSI: 000059045610b440 RDI: 0000000000000004
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349759] RBP: 0000000000000004 R08: 0000000000000026 R09: 00007ffc3844e986
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349762] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000149
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349765] R13: 00007ffc3844eba0 R14: 000059042de9d0b3 R15: 0000000000000149
Feb 16 15:46:30 ceph-0005 kernel: [ 1845.349771]  </TASK>

l *ceph_mdsc_sync+0x4b4
0xffffffff82cddbe4 is in ceph_mdsc_sync (fs/ceph/mds_client.c:5916).
5911                }
5912                doutc(cl, "wait on %llu (want %llu)\n",
5913                      req->r_tid, want_tid);
5914                wait_for_completion(&req->r_safe_completion);
5915    
5916                mutex_lock(&mdsc->mutex);
5917                ceph_mdsc_put_request(req);
5918                if (!nextreq)
5919                    break;  /* next dne before, so we're done! */
5920                if (RB_EMPTY_NODE(&nextreq->r_node)) {

I am not sure yet that the reason is the same.
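
For reference, the blocked call above is the unbounded wait_for_completion()
that the patch under discussion proposes to bound. A rough sketch of that
direction (illustrative only, not the actual patch;
wait_for_completion_killable_timeout() and ceph_timeout_jiffies() are existing
kernel/libceph helpers, but the choice of mount_timeout as the bound and the
error handling are my assumptions):

```c
/*
 * Illustrative sketch only; not the actual patch.  Bound the wait
 * instead of blocking forever.  The timeout source (mount_timeout)
 * and the warning-only error path are assumptions for illustration.
 */
long left;

doutc(cl, "wait on %llu (want %llu)\n", req->r_tid, want_tid);
left = wait_for_completion_killable_timeout(&req->r_safe_completion,
		ceph_timeout_jiffies(mdsc->fsc->client->options->mount_timeout));
if (left == 0)
	pr_warn("ceph: sync timed out waiting on tid %llu\n", req->r_tid);
else if (left < 0)
	pr_warn("ceph: sync wait interrupted (%ld)\n", left);

mutex_lock(&mdsc->mutex);
ceph_mdsc_put_request(req);
```

Whether a timed-out sync should merely warn, retry, or propagate an error to
the syncfs() caller is a policy question I am not taking a position on here.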

Thanks,
Slava.


Thread overview: 8+ messages
2026-02-08 13:18 [PATCH] ceph: add timeout protection to ceph_mdsc_sync() path Ionut Nechita (Wind River)
2026-02-09 23:03 ` Viacheslav Dubeyko
2026-02-11  7:21 ` Sebastian Andrzej Siewior
2026-02-13  7:51 ` Ionut Nechita (Wind River)
2026-02-17 21:52   ` Viacheslav Dubeyko [this message]
2026-02-18 19:57     ` Ionut Nechita (Wind River)
2026-02-18 20:04       ` Viacheslav Dubeyko
2026-02-19  9:37         ` Alex Markuze
