All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: Yu Kuai <hailan@yukuai.org.cn>, Yu Kuai <yukuai3@huawei.com>,
	tj@kernel.org, josef@toxicpanda.com, axboe@kernel.dk,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com,
	yi.zhang@huawei.com, yangerkun@huawei.com,
	johnny.chenyi@huawei.com
Subject: Re: [PATCH 0/4] blk-rq-qos: fix possible deadlock
Date: Wed, 15 Oct 2025 17:27:35 +0800	[thread overview]
Message-ID: <aO9pB8DEDFYcvbz7@fedora> (raw)
In-Reply-To: <4b8aab6f-f341-49af-9ccb-d592e1a40fe5@linux.ibm.com>

On Wed, Oct 15, 2025 at 10:46:51AM +0530, Nilay Shroff wrote:
> 
> 
> On 10/15/25 7:12 AM, Ming Lei wrote:
> > On Tue, Oct 14, 2025 at 07:14:16PM +0800, Yu Kuai wrote:
> >> Hi,
> >>
> >> 在 2025/10/14 18:58, Nilay Shroff 写道:
> >>>
> >>> On 10/14/25 7:51 AM, Yu Kuai wrote:
> >>>> Currently rq-qos debugfs entries is created from rq_qos_add(), while
> >>>> rq_qos_add() requires queue to be freezed. This can deadlock because
> >>>>
> >>>> creating new entries can trigger fs reclaim.
> >>>>
> >>>> Fix this problem by delaying creating rq-qos debugfs entries until
> >>>> it's initialization is complete.
> >>>>
> >>>> - For wbt, it can be initialized by default of by blk-sysfs, fix it by
> >>>>    delaying after wbt_init();
> >>>> - For other policies, they can only be initialized by blkg configuration,
> >>>>    fix it by delaying to blkg_conf_end();
> >>>>
> >>>> Noted this set is cooked on the top of my other thread:
> >>>> https://lore.kernel.org/all/20251010091446.3048529-1-yukuai@kernel.org/
> >>>>
> >>>> And the deadlock can be reporduced with above thead, by running blktests
> >>>> throtl/001 with wbt enabled by default. While the deadlock is really a
> >>>> long term problem.
> >>>>
> >>> While freezing the queue we also mark GFP_NOIO scope, so doesn't that
> >>> help avoid fs-reclaim? Or maybe if you can share the lockdep splat
> >>> encountered running throtl/001?
> >>
> >> Yes, we can avoid fs-reclaim if queue is freezing, however,
> >> because debugfs is a generic file system, and we can't avoid fs reclaim from
> >> all context. There is still
> >>
> >> Following is the log with above set and wbt enabled by default, the set acquire
> >> lock order by:
> >>
> >> freeze queue -> elevator lock -> rq_qos_mutex -> blkcg_mutex
> >>
> >> However, fs-reclaim from other context cause the deadlock report.
> >>
> >>
> >> [   45.632372][  T531] ======================================================
> >> [   45.633734][  T531] WARNING: possible circular locking dependency detected
> >> [   45.635062][  T531] 6.17.0-gfd4a560a0864-dirty #30 Not tainted
> >> [   45.636220][  T531] ------------------------------------------------------
> >> [   45.637587][  T531] check/531 is trying to acquire lock:
> >> [   45.638626][  T531] ffff9473884382b0 (&q->rq_qos_mutex){+.+.}-{4:4}, at: blkg_
> >> conf_start+0x116/0x190
> >> [   45.640416][  T531]
> >> [   45.640416][  T531] but task is already holding lock:
> >> [   45.641828][  T531] ffff9473884385d8 (&q->elevator_lock){+.+.}-{4:4}, at: blkg
> >> _conf_start+0x108/0x190
> >> [   45.643322][  T531]
> >> [   45.643322][  T531] which lock already depends on the new lock.
> >> [   45.643322][  T531]
> >> [   45.644862][  T531]
> >> [   45.644862][  T531] the existing dependency chain (in reverse order) is:
> >> [   45.646046][  T531]
> >> [   45.646046][  T531] -> #5 (&q->elevator_lock){+.+.}-{4:4}:
> >> [   45.647052][  T531]        __mutex_lock+0xd3/0x8d0
> >> [   45.647716][  T531]        blkg_conf_start+0x108/0x190
> >> [   45.648395][  T531]        tg_set_limit+0x74/0x300
> >> [   45.649046][  T531]        kernfs_fop_write_iter+0x14a/0x210
> >> [   45.649813][  T531]        vfs_write+0x29e/0x550
> >> [   45.650413][  T531]        ksys_write+0x74/0xf0
> >> [   45.651032][  T531]        do_syscall_64+0xbb/0x380
> >> [   45.651730][  T531] entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > 
> > Not sure why elevator lock is grabbed in throttle code, which looks a elevator lock
> > misuse, what does the elevator try to protect here?
> > 
> > The comment log doesn't explain the usage too:
> > 
> Lets go back to the history:
> The ->elevator_lock was first added in the wbt code path under this commit 
> 245618f8e45f ("block: protect wbt_lat_usec using q->elevator_lock"). It was
> introduced to protect the wbt latency and state updates which could be 
> simultaneously accessed from elevator switch path and from sysfs write method
> (queue_wb_lat_store()) as well as from cgroup (ioc_qos_write()).
> 
> Later above change caused a lockdep splat and then we updated the code 
> to fix locking order between ->freeze_lock, ->elevator_lock and ->rq_qos_mutex
> and that was implemented in this commit 9730763f4756 ("block: correct locking
> order for protecting blk-wbt parameters"). With this change we set the 
> locking order as follows: 
> ->freeze_lock ->elevator_lock ->rq_qos_mutex
> 
> Then later on under this commit 78c271344b6f ("block: move wbt_enable_default()
> out of queue freezing from sched ->exit()") we moved the wbt latency/stat
> update code out of the ->freeze_lock and ->elevator_lock from elevator switch
> path. So essentially with this commit now in theory we don't need to acquire
> ->elevator_lock while updating wbt latency/stat values. In fact, we also removed
> ->elevator_lock  from queue_wb_lat_store() in this commit but I think we missed
> to remove ->elevator_lock from cgroup (ioc_qos_write()). 

Nice looking back!

I will cook one patch to remove it from ioc_qos_write().

> 
> > 
> > I think it is still order issue between queue freeze and q->rq_qos_mutex
> > first, which need to be solved first.
> > 
> So yes we should first target to get rid off the use of ->elevator_lock
> from ioc_qos_write(). Later we can decide on locking order between
> ->freeze_lock, ->rq_qos_mutex and ->debugfs_mutex. 

Yu Kuai is working on ordering queue freeze and ->rq_qos_mutex.


Thanks,
Ming


      reply	other threads:[~2025-10-15  9:27 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-14  2:21 [PATCH 0/4] blk-rq-qos: fix possible deadlock Yu Kuai
2025-10-14  2:21 ` [PATCH 1/4] blk-mq-debugfs: warn about " Yu Kuai
2025-10-14  8:06   ` Ming Lei
2025-10-14  8:21     ` Yu Kuai
2025-10-14  8:34       ` Ming Lei
2025-10-14  2:21 ` [PATCH 2/4] blk-mq-debugfs: factor out a helper blk_mq_debugfs_register_rq_qos() Yu Kuai
2025-10-14  2:21 ` [PATCH 3/4] blk-rq-qos: fix possible deadlock Yu Kuai
2025-10-14  8:13   ` Ming Lei
2025-10-14  8:24     ` Yu Kuai
2025-10-14  8:37       ` Ming Lei
2025-10-14  8:42         ` Yu Kuai
2025-10-14  8:55           ` Ming Lei
2025-10-14  9:03             ` Yu Kuai
2025-10-14  2:21 ` [PATCH 4/4] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
2025-10-14  8:15   ` Ming Lei
2025-10-14  8:26     ` Yu Kuai
2025-10-14 10:58 ` [PATCH 0/4] blk-rq-qos: fix possible deadlock Nilay Shroff
2025-10-14 11:14   ` Yu Kuai
2025-10-14 17:57     ` Nilay Shroff
2025-10-15  1:36       ` Yu Kuai
2025-10-15  1:42     ` Ming Lei
2025-10-15  5:16       ` Nilay Shroff
2025-10-15  9:27         ` Ming Lei [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aO9pB8DEDFYcvbz7@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=hailan@yukuai.org.cn \
    --cc=johnny.chenyi@huawei.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nilay@linux.ibm.com \
    --cc=tj@kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai1@huaweicloud.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.