From: Yu Kuai <yukuai@kernel.org>
To: Jens Axboe <axboe@kernel.dk>, Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
Sagi Grimberg <sagi@grimberg.me>,
Alasdair Kergon <agk@redhat.com>,
Benjamin Marzinski <bmarzins@redhat.com>,
Mike Snitzer <snitzer@kernel.org>,
Mikulas Patocka <mpatocka@redhat.com>,
Dongsheng Yang <dongsheng.yang@linux.dev>,
Zheng Gu <cengku@gmail.com>, Coly Li <colyli@fygo.io>,
Kent Overstreet <kent.overstreet@linux.dev>,
Josef Bacik <josef@toxicpanda.com>, Yu Kuai <yukuai@fygo.io>,
Nilay Shroff <nilay@linux.ibm.com>,
linux-block@vger.kernel.org, cgroups@vger.kernel.org,
linux-nvme@lists.infradead.org, dm-devel@lists.linux.dev,
linux-bcache@vger.kernel.org
Subject: [RFC PATCH v1 00/17] blk-cgroup: protect blkgs with blkcg_mutex
Date: Sun, 5 Jul 2026 03:51:07 +0800
Message-ID: <20260704195124.1375075-1-yukuai@kernel.org>
From: Yu Kuai <yukuai@fygo.io>
This RFC moves queue-local blkg topology synchronization from
q->queue_lock to q->blkcg_mutex.
q->queue_lock is a hot block-layer spinlock used by request queue runtime
paths, and it is also used in irq-disabled or otherwise atomic contexts.
Using it to protect blkg topology makes blkg lookup, creation,
destruction, policy activation, and policy-state walks inherit those atomic
locking constraints. That forces awkward preallocation schemes such as
radix-tree preloading and prevents missing-blkg creation from sleeping,
even though blkg creation is a blkcg control-plane operation rather than a
queue dispatch fast-path operation.
q->blkcg_mutex is a better fit for blkg protection because it is already a
queue-local blkcg lock, it can serialize the full lookup/create/destroy and
policy activation path, and it allows allocation and parent lookup to run
from sleepable contexts. Moving blkg topology under q->blkcg_mutex also
separates blkcg topology from queue runtime locking, reducing queue_lock
scope and making the locking rules for blkcg policy users explicit.
bio_set_dev() and bio allocation with a bdev can associate a bio with the
destination queue's blkg. Once missing blkg creation is serialized by
q->blkcg_mutex, those helpers may sleep when they create a blkg. The first
part of the series therefore audits callers that can reach these helpers
from completion, spinlocked, irq-disabled, GFP_NOWAIT, or other
non-blocking paths, and either moves association to process context or uses
a nowait association path that avoids sleeping.
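As an illustration of the "move association to process context" option, the
following is a minimal userspace sketch, not code from this series. All names
(model_bio, model_requeue, model_defer_retarget, model_requeue_work) are
hypothetical, and a C11 lock-free list stands in for whatever requeue
mechanism the real drivers use. The completion side only records the new
target and never sleeps; the worker side performs the step that may sleep.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio; not the kernel's struct bio. */
struct model_bio {
	int dev;            /* device the bio currently targets */
	int retarget_dev;   /* device chosen by failover, applied later */
	bool associated;    /* models "blkg association done" */
	struct model_bio *next;
};

/* Hypothetical lock-free requeue list shared by completion and worker. */
struct model_requeue {
	_Atomic(struct model_bio *) head;
};

/*
 * Models a completion-context path: record the retarget request and push
 * the bio onto the list. No lock is taken and nothing here can sleep.
 */
static void model_defer_retarget(struct model_requeue *rq,
				 struct model_bio *bio, int new_dev)
{
	struct model_bio *old =
		atomic_load_explicit(&rq->head, memory_order_relaxed);

	bio->retarget_dev = new_dev;
	do {
		bio->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&rq->head, &old, bio,
							memory_order_release,
							memory_order_relaxed));
}

/*
 * Models requeue work in process context: detach the whole list at once,
 * then retarget each bio. This is where a call that may sleep, such as a
 * bio_set_dev()-style association, would be allowed.
 * Returns the number of bios processed.
 */
static int model_requeue_work(struct model_requeue *rq)
{
	struct model_bio *bio =
		atomic_exchange_explicit(&rq->head, NULL, memory_order_acquire);
	int processed = 0;

	while (bio) {
		struct model_bio *next = bio->next;

		bio->dev = bio->retarget_dev;
		bio->associated = true;
		bio->next = NULL;
		bio = next;
		processed++;
	}
	return processed;
}
```

In this model a bio keeps its old target until the worker runs, so the
completion path stays non-blocking regardless of how expensive the later
association becomes.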
The preparatory patches cover NVMe multipath requeue, dm-thin and
dm-snapshot map paths, blk-throttle's private runtime lock, atomic bio
allocation helpers, bcache, dm-bufio, dm-pcache, DM NOWAIT clones/remaps,
and BFQ's locked cgroup update path. The final blkcg patches then move
blkg lookup/create/destroy, policy activation, and configuration
preparation to q->blkcg_mutex; remove radix-tree preloading; move blkg
allocation into blkg_create(); and share creation code between bio
association and config preparation.
This is an RFC because the locking conversion changes a central blkcg lifetime
path and relies on all non-sleepable bio association users either being
converted or tolerating nowait association failure.
One intentional tradeoff is left in the nowait paths. They first try to
associate with an existing blkg. If a thread issues IO to a queue for the
first time
from a GFP_NOWAIT or otherwise non-blocking path, the cgroup's blkg for
that queue may not exist yet. After blkg topology moves to q->blkcg_mutex,
preemptible task-context callers try q->blkcg_mutex and attempt blkg
creation. Once allocation moves into blkg_create(), that opportunistic
nowait creation uses GFP_ATOMIC. If the caller is in atomic context,
q->blkcg_mutex is contended, or allocation fails, the nowait helper still
fails and the caller needs to retry from a blocking context, defer the
association, or fall back to an existing slow path.
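The nowait decision sequence described above can be modeled in userspace as
follows. This is a sketch under stated assumptions, not the series' code:
every name (model_queue, model_blkg, model_associate_nowait, and so on) is
hypothetical, a pthread mutex stands in for q->blkcg_mutex, an acquire load
on a prepend-only list stands in for the RCU lookup, malloc() stands in for
a GFP_ATOMIC allocation, and blkg destruction is not modeled at all.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a blkg; immutable once published. */
struct model_blkg {
	int cgroup_id;
	struct model_blkg *next;
};

struct model_queue {
	pthread_mutex_t blkcg_mutex;          /* models q->blkcg_mutex */
	_Atomic(struct model_blkg *) blkgs;   /* prepend-only, never freed here */
};

static void model_queue_init(struct model_queue *q)
{
	pthread_mutex_init(&q->blkcg_mutex, NULL);
	atomic_init(&q->blkgs, NULL);
}

/* Lockless lookup; models the RCU lookup of an existing blkg. */
static struct model_blkg *model_lookup(struct model_queue *q, int id)
{
	struct model_blkg *b =
		atomic_load_explicit(&q->blkgs, memory_order_acquire);

	for (; b; b = b->next)
		if (b->cgroup_id == id)
			return b;
	return NULL;
}

/* Caller must hold blkcg_mutex. May fail, like an atomic allocation. */
static struct model_blkg *model_create_locked(struct model_queue *q, int id)
{
	struct model_blkg *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->cgroup_id = id;
	b->next = atomic_load_explicit(&q->blkgs, memory_order_relaxed);
	atomic_store_explicit(&q->blkgs, b, memory_order_release);
	return b;
}

/*
 * Nowait association: returns NULL when the caller must retry from a
 * blocking context, defer the association, or take a slow path.
 */
static struct model_blkg *model_associate_nowait(struct model_queue *q,
						  int id, bool atomic_ctx)
{
	struct model_blkg *b = model_lookup(q, id);

	if (b)
		return b;               /* existing blkg: always succeeds */
	if (atomic_ctx)
		return NULL;            /* may not even try the mutex */
	if (pthread_mutex_trylock(&q->blkcg_mutex) != 0)
		return NULL;            /* contended: fail, do not wait */
	b = model_lookup(q, id);        /* recheck under the mutex */
	if (!b)
		b = model_create_locked(q, id);
	pthread_mutex_unlock(&q->blkcg_mutex);
	return b;
}

/* Blocking association: may sleep on the mutex. */
static struct model_blkg *model_associate_blocking(struct model_queue *q,
						    int id)
{
	struct model_blkg *b;

	pthread_mutex_lock(&q->blkcg_mutex);
	b = model_lookup(q, id);
	if (!b)
		b = model_create_locked(q, id);
	pthread_mutex_unlock(&q->blkcg_mutex);
	return b;
}
```

A caller in this model treats a NULL return from model_associate_nowait()
as "not now" rather than as a hard error, and falls back to
model_associate_blocking() once it is allowed to sleep.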
Patch layout:
Patch 1: move NVMe multipath failover bio retargeting to requeue work so
bio_set_dev() runs from process context instead of completion context.
Patches 2-3: remove or avoid bio_set_dev() while dm-thin and dm-snapshot
locks are held, and restore blkcg association later where needed.
Patch 4: give blk-throttle its own runtime-state lock so blkcg topology
can be moved away from queue_lock.
Patches 5-7: add bio_alloc_atomic(), make bio association nowait-aware,
and make bio allocation with a bdev fail rather than sleep for
non-blocking callers.
Patches 8-12: convert bcache, dm-bufio, dm-pcache, block helper
allocations, and DM NOWAIT remaps/clones to the new nowait or deferred
association model.
Patch 13: avoid a sleeping blkg lookup from BFQ while bfqd->lock is held.
Patch 14: protect queue-local blkg lookup, creation, destruction, policy
activation, and policy state walks with q->blkcg_mutex. This also makes
preemptible nowait bio association try q->blkcg_mutex instead of failing
immediately after an RCU lookup miss.
Patch 15: remove radix-tree preloading after blkg creation no longer runs
under queue_lock.
Patch 16: allocate blkgs inside blkg_create() and use GFP_ATOMIC for the
nowait bio-association trylock creation path.
Patch 17: share blkg creation between bio association and config
preparation.
Yu Kuai (17):
  nvme-multipath: retarget failedover bios from requeue work
  dm thin: avoid bio_set_dev under pool lock
  dm snapshot: avoid bio_set_dev in locked map paths
  blk-throttle: protect throttle state with td lock
  block: add bio_alloc_atomic() for atomic bio users
  blk-cgroup: support non-blocking bio association
  block: support non-blocking bio allocation with a bdev
  bcache: avoid sleeping blkg association from locked paths
  dm bufio: avoid blkg association from GFP_NOWAIT bio init
  dm pcache: handle non-blocking bio clone init failure
  block: avoid scheduling from non-blocking helper allocations
  dm: avoid sleeping blkg association from NOWAIT remaps
  bfq: avoid blkg lookup from locked cgroup update
  blk-cgroup: protect blkgs with blkcg_mutex
  blk-cgroup: remove blkg radix tree preloading
  blk-cgroup: allocate blkgs in blkg_create
  blk-cgroup: share blkg creation between lookup and config prep
block/bfq-cgroup.c                 |  26 +-
block/bio.c                        |  50 +++-
block/blk-cgroup.c                 | 397 ++++++++++++-----------------
block/blk-cgroup.h                 |  16 +-
block/blk-crypto-fallback.c        |   2 +-
block/blk-iocost.c                 |   5 +-
block/blk-iolatency.c              |   7 +-
block/blk-lib.c                    |   3 +-
block/blk-map.c                    |   7 +-
block/blk-throttle.c               |  93 +++++--
drivers/md/bcache/journal.c        |   9 +-
drivers/md/bcache/request.c        |   4 +-
drivers/md/dm-bufio.c              |   9 +-
drivers/md/dm-linear.c             |   2 +-
drivers/md/dm-pcache/backing_dev.c |  10 +-
drivers/md/dm-snap.c               |  29 ++-
drivers/md/dm-stripe.c             |   6 +-
drivers/md/dm-switch.c             |   2 +-
drivers/md/dm-thin.c               |   3 -
drivers/md/dm-unstripe.c           |   2 +-
drivers/md/dm.c                    |  28 +-
drivers/md/md.c                    |   2 +-
drivers/nvdimm/nd_virtio.c         |  11 +-
drivers/nvme/host/multipath.c      |   4 +-
fs/gfs2/lops.c                     |   3 +-
fs/ocfs2/cluster/heartbeat.c       |  15 +-
include/linux/bio.h                |  53 ++--
include/linux/device-mapper.h      |   8 +
include/linux/writeback.h          |   2 +-
mm/page_io.c                       |   2 +-
30 files changed, 467 insertions(+), 343 deletions(-)
base-commit: a1c8bdbbd72564cebb0d02948c1ed57b80b2e773
--
2.51.0