From: Yu Kuai <yukuai@kernel.org>
To: Jens Axboe <axboe@kernel.dk>, Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
	Sagi Grimberg <sagi@grimberg.me>,
	Alasdair Kergon <agk@redhat.com>,
	Benjamin Marzinski <bmarzins@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Dongsheng Yang <dongsheng.yang@linux.dev>,
	Zheng Gu <cengku@gmail.com>, Coly Li <colyli@fygo.io>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Josef Bacik <josef@toxicpanda.com>, Yu Kuai <yukuai@fygo.io>,
	Nilay Shroff <nilay@linux.ibm.com>,
	linux-block@vger.kernel.org, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org, dm-devel@lists.linux.dev,
	linux-bcache@vger.kernel.org
Subject: [RFC PATCH v1 03/17] dm snapshot: avoid bio_set_dev in locked map paths
Date: Sun, 5 Jul 2026 03:51:10 +0800
Message-ID: <20260704195124.1375075-4-yukuai@kernel.org>
In-Reply-To: <20260704195124.1375075-1-yukuai@kernel.org>

From: Yu Kuai <yukuai@fygo.io>

bio_set_dev() is about to become explicitly sleepable. It currently
updates the bio's target device and then associates the bio with the
destination queue's blkcg state. After blkcg lookup/creation is moved
under the queue's blkcg_mutex, that association may take blkcg_mutex and
allocate a new blkg. Callers therefore must not invoke bio_set_dev() from
atomic or otherwise non-sleepable sections.
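
To make the constraint concrete, here is a small userspace model, not
kernel code. A counter stands in for "holding a spinlock", and the stand-in
for a sleepable bio_set_dev() records a violation when it is called with
the counter raised, loosely similar in spirit to what the kernel's
might_sleep() debug check reports. All model_* names are hypothetical and
exist only for this sketch.

```c
#include <assert.h>

/* Hypothetical model state: depth > 0 means "inside a spinlocked section". */
static int model_atomic_depth;
static int model_violations;

static void model_spin_lock(void)
{
	model_atomic_depth++;
}

static void model_spin_unlock(void)
{
	assert(model_atomic_depth > 0);
	model_atomic_depth--;
}

/*
 * Stand-in for a helper that may take a mutex or allocate memory.
 * Calling it while the model lock is held is counted as a violation.
 */
static void model_sleepable_set_dev(void)
{
	if (model_atomic_depth > 0)
		model_violations++;
}
```

Calling model_sleepable_set_dev() between model_spin_lock() and
model_spin_unlock() increments model_violations; calling it outside the
pair does not.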

snapshot_map() has several remap decisions inside
dm_exception_table_lock(), which nests the completed and pending
exception hash-table spinlocks. Those locks protect the lookup result,
pending-exception insertion, pe->started, and the pending bio lists until
the bio has either been returned to DM core or queued on the pending
exception. Dropping the locks just to call bio_set_dev() would require
revalidating the exception state and preserving the pending-list ordering
rules; calling a sleepable bio_set_dev() while holding the spinlocks is not
allowed either.

Split out snapshot_bio_set_dev() for these locked remap decisions. It only
performs the non-sleeping part of bio_set_dev(): clear BIO_REMAPPED, clear
BIO_BPS_THROTTLED when the bdev changes, and update bi_bdev. It
deliberately does not associate the bio with a blkg while snapshot locks
are held.
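
The flag handling described above can be exercised with a userspace model.
This is a sketch under stated assumptions, not the kernel code: struct
model_bio, struct model_bdev, the MODEL_BIO_* bits and
model_bio_set_dev_nosleep() are hypothetical stand-ins for struct bio,
struct block_device, BIO_REMAPPED/BIO_BPS_THROTTLED and
snapshot_bio_set_dev(), and the real flag values and layouts differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel definitions. */
struct model_bdev {
	int id;
};

enum {
	MODEL_BIO_REMAPPED      = 1u << 0,
	MODEL_BIO_BPS_THROTTLED = 1u << 1,
};

struct model_bio {
	struct model_bdev *bdev;
	unsigned int flags;
	bool blkg_associated;	/* never touched by the helper below */
};

/*
 * Non-sleeping part only: clear the remapped flag, clear the throttle
 * flag when the device actually changes, then retarget the bio.
 * No blkg association is attempted here.
 */
static void model_bio_set_dev_nosleep(struct model_bio *bio,
				      struct model_bdev *bdev)
{
	assert(bio != NULL && bdev != NULL);

	bio->flags &= ~MODEL_BIO_REMAPPED;
	if (bio->bdev != bdev)
		bio->flags &= ~MODEL_BIO_BPS_THROTTLED;
	bio->bdev = bdev;
}
```

In this model, retargeting to the same device keeps the throttle bit set,
while retargeting to a different device clears it; blkg_associated is left
unchanged in both cases.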

This does not lose blkcg attribution for the normal DM_MAPIO_REMAPPED case.
After the target returns, DM core submits the mapped bio through
dm_submit_bio_remap(), and that helper clones the blkg association from the
original bio in the normal submission context.

Some snapshot bios are not submitted by DM core immediately. Writes
waiting for a pending exception and bios queued during snapshot merge are
kept on snapshot-owned lists and submitted later after copy or merge
completion. Once bio_set_dev() is no longer used in the locked path,
these delayed bios also need their blkcg association restored at submission
time. Submit those bios through dm_submit_bio_remap() instead of
submit_bio_noacct() so the association is cloned from the original bio
after the snapshot locks have been released.
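
The deferred path can be pictured with another userspace model. It is only
a sketch: struct model_bio, model_queue_locked(), model_submit_remap() and
model_flush() are hypothetical names, blkg_id is an invented integer
standing in for the blkg association, and the copy in model_submit_remap()
merely mirrors what this commit message says dm_submit_bio_remap() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio kept on a snapshot-owned list. */
struct model_bio {
	struct model_bio *next;		/* models bio->bi_next */
	const struct model_bio *orig;	/* bio this one was cloned from */
	int bdev_id;
	int blkg_id;			/* 0 means "no association yet" */
};

/* Locked phase: retarget and append to the tail; no association. */
static void model_queue_locked(struct model_bio **head,
			       struct model_bio *bio, int bdev_id)
{
	struct model_bio **pp = head;

	assert(bio->next == NULL);
	bio->bdev_id = bdev_id;
	while (*pp)
		pp = &(*pp)->next;
	*pp = bio;
}

/* Unlocked phase: restore the association from the original bio. */
static void model_submit_remap(struct model_bio *bio)
{
	if (bio->orig)
		bio->blkg_id = bio->orig->blkg_id;
}

/* Walk the detached list in order; returns how many bios were submitted. */
static int model_flush(struct model_bio *bio)
{
	int submitted = 0;

	while (bio) {
		struct model_bio *n = bio->next;

		bio->next = NULL;
		model_submit_remap(bio);
		submitted++;
		bio = n;
	}
	return submitted;
}
```

Queued bios keep blkg_id at 0 until model_flush() runs, which corresponds
to the association being restored only after the locks have been dropped.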

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/dm-snap.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 1489fda9d24a..373a94156ec7 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -192,6 +192,19 @@ static sector_t chunk_to_sector(struct dm_exception_store *store,
 	return chunk << store->chunk_shift;
 }
 
+/*
+ * Snapshot exception-table locks are spinlocks. Only update the target
+ * device while holding them; dm_submit_bio_remap() will associate target-owned
+ * bios with the original bio's blkg from a sleepable submission context.
+ */
+static void snapshot_bio_set_dev(struct bio *bio, struct block_device *bdev)
+{
+	bio_clear_flag(bio, BIO_REMAPPED);
+	if (bio->bi_bdev != bdev)
+		bio_clear_flag(bio, BIO_BPS_THROTTLED);
+	bio->bi_bdev = bdev;
+}
+
 static int bdev_equal(struct block_device *lhs, struct block_device *rhs)
 {
 	/*
@@ -1566,7 +1579,7 @@ static void flush_bios(struct bio *bio)
 	while (bio) {
 		n = bio->bi_next;
 		bio->bi_next = NULL;
-		submit_bio_noacct(bio);
+		dm_submit_bio_remap(bio, NULL);
 		bio = n;
 	}
 }
@@ -1586,7 +1599,7 @@ static void retry_origin_bios(struct dm_snapshot *s, struct bio *bio)
 		bio->bi_next = NULL;
 		r = do_origin(s->origin, bio, false);
 		if (r == DM_MAPIO_REMAPPED)
-			submit_bio_noacct(bio);
+			dm_submit_bio_remap(bio, NULL);
 		bio = n;
 	}
 }
@@ -1827,7 +1840,7 @@ static void start_full_bio(struct dm_snap_pending_exception *pe,
 	bio->bi_end_io = full_bio_end_io;
 	bio->bi_private = callback_data;
 
-	submit_bio_noacct(bio);
+	dm_submit_bio_remap(bio, NULL);
 }
 
 static struct dm_snap_pending_exception *
@@ -1898,7 +1911,7 @@ __find_pending_exception(struct dm_snapshot *s,
 static void remap_exception(struct dm_snapshot *s, struct dm_exception *e,
 			    struct bio *bio, chunk_t chunk)
 {
-	bio_set_dev(bio, s->cow->bdev);
+	snapshot_bio_set_dev(bio, s->cow->bdev);
 	bio->bi_iter.bi_sector =
 		chunk_to_sector(s->store, dm_chunk_number(e->new_chunk) +
 				(chunk - e->old_chunk)) +
@@ -1982,7 +1995,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 			 * defeat the goal of freeing space in origin that is
 			 * implied by the "discard_passdown_origin" feature)
 			 */
-			bio_set_dev(bio, s->origin->bdev);
+			snapshot_bio_set_dev(bio, s->origin->bdev);
 			track_chunk(s, bio, chunk);
 			goto out_unlock;
 		}
@@ -2081,7 +2094,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 			goto out;
 		}
 	} else {
-		bio_set_dev(bio, s->origin->bdev);
+		snapshot_bio_set_dev(bio, s->origin->bdev);
 		track_chunk(s, bio, chunk);
 	}
 
@@ -2143,7 +2156,7 @@ static int snapshot_merge_map(struct dm_target *ti, struct bio *bio)
 		    chunk >= s->first_merging_chunk &&
 		    chunk < (s->first_merging_chunk +
 			     s->num_merging_chunks)) {
-			bio_set_dev(bio, s->origin->bdev);
+			snapshot_bio_set_dev(bio, s->origin->bdev);
 			bio_list_add(&s->bios_queued_during_merge, bio);
 			r = DM_MAPIO_SUBMITTED;
 			goto out_unlock;
@@ -2157,7 +2170,7 @@ static int snapshot_merge_map(struct dm_target *ti, struct bio *bio)
 	}
 
 redirect_to_origin:
-	bio_set_dev(bio, s->origin->bdev);
+	snapshot_bio_set_dev(bio, s->origin->bdev);
 
 	if (bio_data_dir(bio) == WRITE) {
 		up_write(&s->lock);
-- 
2.51.0