From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Davidlohr Bueso <dave@stgolabs.net>, Jan Kara <jack@suse.cz>,
	kdevops@lists.linux.dev, Luis Chamberlain <mcgrof@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.14 26/39] fs/buffer: split locking for pagecache lookups
Date: Tue, 29 Apr 2025 19:49:53 -0400
Message-ID: <20250429235006.536648-26-sashal@kernel.org>
In-Reply-To: <20250429235006.536648-1-sashal@kernel.org>
From: Davidlohr Bueso <dave@stgolabs.net>
[ Upstream commit 7ffe3de53a885dbb5836541c2178bd07d1bad7df ]
Callers of __find_get_block() may or may not allow for blocking
semantics, and it is currently assumed that they do not. Lay out
two paths based on this. The private_lock scheme will
continue to be used for atomic contexts. Otherwise take the
folio lock instead, which protects the buffers, e.g. against
migration and try_to_free_buffers().
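The two paths can be pictured with a small userspace analogy. This is only
a sketch under stated assumptions, not the kernel code: every demo_* name is
hypothetical, a pthread mutex stands in for the folio lock, and a C11
atomic_flag stands in for i_private_lock. It assumes, as the text above
implies, that whoever frees the buffers holds both locks, so a lookup is
safe under either one.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head. */
struct demo_buf {
	unsigned long block;
	atomic_int refcount;	/* atomic: readers may hold different locks */
};

/* Hypothetical stand-in for a folio plus its mapping. */
struct demo_page {
	pthread_mutex_t page_lock;	/* plays the folio lock, may sleep */
	atomic_flag private_lock;	/* plays i_private_lock, only spins */
	struct demo_buf bufs[4];
	size_t nr_bufs;
};

static void demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait, never sleeps */
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Look up a block, taking whichever lock the caller's context allows. */
static struct demo_buf *demo_lookup(struct demo_page *pg,
				    unsigned long block, bool atomic)
{
	struct demo_buf *ret = NULL;

	if (atomic)
		demo_spin_lock(&pg->private_lock);
	else
		pthread_mutex_lock(&pg->page_lock);

	for (size_t i = 0; i < pg->nr_bufs; i++) {
		if (pg->bufs[i].block == block) {
			ret = &pg->bufs[i];
			atomic_fetch_add(&ret->refcount, 1);
			break;
		}
	}

	if (atomic)
		demo_spin_unlock(&pg->private_lock);
	else
		pthread_mutex_unlock(&pg->page_lock);
	return ret;
}

/* The "freer" takes both locks, so it excludes both kinds of lookup. */
static void demo_drop_all(struct demo_page *pg)
{
	pthread_mutex_lock(&pg->page_lock);
	demo_spin_lock(&pg->private_lock);
	pg->nr_bufs = 0;
	demo_spin_unlock(&pg->private_lock);
	pthread_mutex_unlock(&pg->page_lock);
}
```

A caller that may sleep would pass atomic = false and only touch the mutex,
leaving the spinning lock uncontended for callers that cannot block.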
Per the "hack idea", the latter can alleviate contention on
the private_lock for bdev mappings. For reasons of determinism,
and to avoid making bugs hard to reproduce, trylocking is not
attempted.
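For contrast, the trylock variant that is not attempted might look roughly
like the userspace sketch below. It is an illustration only: the demo_*
names are hypothetical, with a pthread mutex standing in for the folio lock
and an atomic_flag for the private_lock. The point is that the lock actually
held depends on whether the trylock happened to succeed, which is the
non-determinism the message wants to avoid.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Which lock the opportunistic scheme ended up holding. */
enum demo_path { DEMO_PATH_SLEEPING_LOCK, DEMO_PATH_SPINLOCK };

/* Hypothetical pair of locks protecting the same data. */
struct demo_locks {
	pthread_mutex_t page_lock;	/* plays the folio lock */
	atomic_flag private_lock;	/* plays the private_lock */
};

/*
 * Try the page lock without blocking; on failure fall back to spinning
 * on the private lock. The returned path is timing dependent.
 */
static enum demo_path demo_lock_opportunistic(struct demo_locks *l)
{
	if (pthread_mutex_trylock(&l->page_lock) == 0)
		return DEMO_PATH_SLEEPING_LOCK;

	while (atomic_flag_test_and_set_explicit(&l->private_lock,
						 memory_order_acquire))
		;	/* busy-wait */
	return DEMO_PATH_SPINLOCK;
}

/* Release whichever lock demo_lock_opportunistic() reported. */
static void demo_unlock(struct demo_locks *l, enum demo_path path)
{
	if (path == DEMO_PATH_SLEEPING_LOCK)
		pthread_mutex_unlock(&l->page_lock);
	else
		atomic_flag_clear_explicit(&l->private_lock,
					   memory_order_release);
}
```

With an explicit flag chosen by the caller, as in this patch, the same call
site always takes the same lock regardless of contention.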
No change in semantics. All lookup users still take the spinlock.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://kdevops.org/ext4/v6.15-rc2.html # [0]
Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1]
Link: https://lore.kernel.org/20250418015921.132400-2-dave@stgolabs.net
Tested-by: kdevops@lists.linux.dev
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/buffer.c | 41 +++++++++++++++++++++++++----------------
1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index cc8452f602516..a03c245022dcf 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -176,18 +176,8 @@ void end_buffer_write_sync(struct buffer_head *bh, int uptodate)
 }
 EXPORT_SYMBOL(end_buffer_write_sync);
 
-/*
- * Various filesystems appear to want __find_get_block to be non-blocking.
- * But it's the page lock which protects the buffers.  To get around this,
- * we get exclusion from try_to_free_buffers with the blockdev mapping's
- * i_private_lock.
- *
- * Hack idea: for the blockdev mapping, i_private_lock contention
- * may be quite high.  This code could TryLock the page, and if that
- * succeeds, there is no need to take i_private_lock.
- */
 static struct buffer_head *
-__find_get_block_slow(struct block_device *bdev, sector_t block)
+__find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
 {
 	struct address_space *bd_mapping = bdev->bd_mapping;
 	const int blkbits = bd_mapping->host->i_blkbits;
@@ -204,7 +194,16 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 	if (IS_ERR(folio))
 		goto out;
 
-	spin_lock(&bd_mapping->i_private_lock);
+	/*
+	 * Folio lock protects the buffers. Callers that cannot block
+	 * will fallback to serializing vs try_to_free_buffers() via
+	 * the i_private_lock.
+	 */
+	if (atomic)
+		spin_lock(&bd_mapping->i_private_lock);
+	else
+		folio_lock(folio);
+
 	head = folio_buffers(folio);
 	if (!head)
 		goto out_unlock;
@@ -236,7 +235,10 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 			1 << blkbits);
 	}
 out_unlock:
-	spin_unlock(&bd_mapping->i_private_lock);
+	if (atomic)
+		spin_unlock(&bd_mapping->i_private_lock);
+	else
+		folio_unlock(folio);
 	folio_put(folio);
 out:
 	return ret;
@@ -1388,14 +1390,15 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
  * it in the LRU and mark it as accessed.  If it is not present then return
  * NULL
  */
-struct buffer_head *
-__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
+static struct buffer_head *
+find_get_block_common(struct block_device *bdev, sector_t block,
+			unsigned size, bool atomic)
 {
 	struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
 
 	if (bh == NULL) {
 		/* __find_get_block_slow will mark the page accessed */
-		bh = __find_get_block_slow(bdev, block);
+		bh = __find_get_block_slow(bdev, block, atomic);
 		if (bh)
 			bh_lru_install(bh);
 	} else
@@ -1403,6 +1406,12 @@ __find_get_block(struct block_device *bdev, sector_t block, unsigned size)
 
 	return bh;
 }
+
+struct buffer_head *
+__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
+{
+	return find_get_block_common(bdev, block, size, true);
+}
 EXPORT_SYMBOL(__find_get_block);
 
 /**
--
2.39.5