* buffer cache cleanups
@ 2025-01-06 9:54 Christoph Hellwig
2025-01-06 9:54 ` [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets Christoph Hellwig
` (14 more replies)
0 siblings, 15 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
Hi all,
over the holidays I dusted off some old buffer cache cleanup as the bio
splitting in the zoned code gave me a better idea how to handle
discontiguous buffer bio submission. This spiraled a bit into various
additional minor fixes and cleanups.
Diffstat:
xfs_buf.c | 508 ++++++++++++++++++++-----------------------------------
xfs_buf.h | 9
xfs_buf_item.h | 5
xfs_dquot.c | 14 -
xfs_inode_item.c | 14 -
xfs_trans_ail.c | 9
xfs_trans_buf.c | 8
7 files changed, 195 insertions(+), 372 deletions(-)
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 2:00 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 02/15] xfs: remove the incorrect comment above xfs_buf_free_maps Christoph Hellwig
` (13 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
__xfs_buf_submit calls xfs_buf_ioend when b_io_remaining hits zero. For
in-memory buftargs b_io_remaining is never incremented from its initial
value of 1, so this always happens. Thus the extra call to xfs_buf_ioend
in _xfs_buf_ioapply causes a double completion. Fortunately
__xfs_buf_submit is only used for synchronous reads on in-memory buftargs
due to the peculiarities of how they work, so this is mostly harmless and
just causes a little extra work to be done.
Fixes: 5076a6040ca1 ("xfs: support in-memory buffer cache targets")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
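A userspace sketch of the completion accounting described above (illustrative names only, not the kernel API): b_io_remaining starts at 1, and the submit path runs the completion when its final decrement hits zero, so a direct extra call to the completion in the apply path makes it run twice.

```c
#include <assert.h>

/* Models bp->b_io_remaining and counts how often the completion runs. */
struct fake_buf {
	int io_remaining;
	int completions;
};

static void fake_ioend(struct fake_buf *bp)
{
	bp->completions++;
}

/* Buggy variant: the in-memory path completes directly, then the
 * submit path's final decrement completes again. */
static int submit_buggy(struct fake_buf *bp)
{
	bp->io_remaining = 1;
	fake_ioend(bp);			/* extra call in _xfs_buf_ioapply */
	if (--bp->io_remaining == 0)
		fake_ioend(bp);		/* completion in __xfs_buf_submit */
	return bp->completions;
}

/* Fixed variant: just return and let the submit path complete once. */
static int submit_fixed(struct fake_buf *bp)
{
	bp->io_remaining = 1;
	if (--bp->io_remaining == 0)
		fake_ioend(bp);
	return bp->completions;
}
```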
fs/xfs/xfs_buf.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index aa63b8efd782..787caf0c3254 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1656,10 +1656,8 @@ _xfs_buf_ioapply(
op |= REQ_META;
/* in-memory targets are directly mapped, no IO required. */
- if (xfs_buftarg_is_mem(bp->b_target)) {
- xfs_buf_ioend(bp);
+ if (xfs_buftarg_is_mem(bp->b_target))
return;
- }
/*
* Walk all the vectors issuing IO on them. Set up the initial offset
--
2.45.2
* [PATCH 02/15] xfs: remove the incorrect comment above xfs_buf_free_maps
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
2025-01-06 9:54 ` [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 2:00 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 03/15] xfs: remove the incorrect comment about the b_pag field Christoph Hellwig
` (12 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
The comment above xfs_buf_free_maps refers to fields that are not even used
in the function, and adds no other value. Remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 787caf0c3254..1927655fed13 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -202,9 +202,6 @@ xfs_buf_get_maps(
return 0;
}
-/*
- * Frees b_pages if it was allocated.
- */
static void
xfs_buf_free_maps(
struct xfs_buf *bp)
--
2.45.2
* [PATCH 03/15] xfs: remove the incorrect comment about the b_pag field
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
2025-01-06 9:54 ` [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets Christoph Hellwig
2025-01-06 9:54 ` [PATCH 02/15] xfs: remove the incorrect comment above xfs_buf_free_maps Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 2:01 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 04/15] xfs: move xfs_buf_iowait out of (__)xfs_buf_submit Christoph Hellwig
` (11 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
The rbtree root is long gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 3d56bc7a35cc..da80399c7457 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -187,7 +187,7 @@ struct xfs_buf {
int b_io_error; /* internal IO error state */
wait_queue_head_t b_waiters; /* unpin waiters */
struct list_head b_list;
- struct xfs_perag *b_pag; /* contains rbtree root */
+ struct xfs_perag *b_pag;
struct xfs_mount *b_mount;
struct xfs_buftarg *b_target; /* buffer target (device) */
void *b_addr; /* virtual address of buffer */
--
2.45.2
* [PATCH 04/15] xfs: move xfs_buf_iowait out of (__)xfs_buf_submit
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (2 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 03/15] xfs: remove the incorrect comment about the b_pag field Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 2:02 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf Christoph Hellwig
` (10 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
There is no good reason to pass a bool wait argument when the callers that
want to wait for a buffer can easily do so themselves.
This moves the wait out of the extra hold on the buffer, but as callers of
synchronous buffer I/O need to hold a reference anyway, that is perfectly
fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
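A userspace sketch of the new calling convention (illustrative names, not the kernel API): submit only starts the I/O and reports submission errors, and synchronous callers chain an explicit wait afterwards, mirroring the xfs_bwrite() hunk below.

```c
#include <stdbool.h>

/* Minimal stand-in for a buffer with a deferred completion error. */
struct sketch_buf {
	bool in_flight;
	int  error;
};

/* Starts the I/O; only submission errors are returned here. */
static int sketch_submit(struct sketch_buf *bp)
{
	bp->in_flight = true;
	return 0;
}

/* Waits for completion and returns the completion error. */
static int sketch_iowait(struct sketch_buf *bp)
{
	bp->in_flight = false;	/* pretend the I/O just completed */
	return bp->error;
}

/* New-style synchronous write: submit, then wait if submission worked. */
static int sketch_bwrite(struct sketch_buf *bp)
{
	int error = sketch_submit(bp);

	if (!error)
		error = sketch_iowait(bp);
	return error;
}
```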
fs/xfs/xfs_buf.c | 36 +++++++++++++++++-------------------
1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1927655fed13..a3484421a6d8 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -52,14 +52,8 @@ struct kmem_cache *xfs_buf_cache;
* b_lock (trylock due to inversion)
*/
-static int __xfs_buf_submit(struct xfs_buf *bp, bool wait);
-
-static inline int
-xfs_buf_submit(
- struct xfs_buf *bp)
-{
- return __xfs_buf_submit(bp, !(bp->b_flags & XBF_ASYNC));
-}
+static int xfs_buf_submit(struct xfs_buf *bp);
+static int xfs_buf_iowait(struct xfs_buf *bp);
static inline bool xfs_buf_is_uncached(struct xfs_buf *bp)
{
@@ -797,13 +791,18 @@ _xfs_buf_read(
struct xfs_buf *bp,
xfs_buf_flags_t flags)
{
+ int error;
+
ASSERT(!(flags & XBF_WRITE));
ASSERT(bp->b_maps[0].bm_bn != XFS_BUF_DADDR_NULL);
bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD | XBF_DONE);
bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
- return xfs_buf_submit(bp);
+ error = xfs_buf_submit(bp);
+ if (!error && !(flags & XBF_ASYNC))
+ error = xfs_buf_iowait(bp);
+ return error;
}
/*
@@ -978,9 +977,10 @@ xfs_buf_read_uncached(
bp->b_flags |= XBF_READ;
bp->b_ops = ops;
- xfs_buf_submit(bp);
- if (bp->b_error) {
- error = bp->b_error;
+ error = xfs_buf_submit(bp);
+ if (!error)
+ error = xfs_buf_iowait(bp);
+ if (error) {
xfs_buf_relse(bp);
return error;
}
@@ -1483,6 +1483,8 @@ xfs_bwrite(
XBF_DONE);
error = xfs_buf_submit(bp);
+ if (!error)
+ error = xfs_buf_iowait(bp);
if (error)
xfs_force_shutdown(bp->b_mount, SHUTDOWN_META_IO_ERROR);
return error;
@@ -1698,9 +1700,8 @@ xfs_buf_iowait(
* holds an additional reference itself.
*/
static int
-__xfs_buf_submit(
- struct xfs_buf *bp,
- bool wait)
+xfs_buf_submit(
+ struct xfs_buf *bp)
{
int error = 0;
@@ -1764,9 +1765,6 @@ __xfs_buf_submit(
xfs_buf_ioend_async(bp);
}
- if (wait)
- error = xfs_buf_iowait(bp);
-
/*
* Release the hold that keeps the buffer referenced for the entire
* I/O. Note that if the buffer is async, it is not safe to reference
@@ -2322,7 +2320,7 @@ xfs_buf_delwri_submit_buffers(
bp->b_flags |= XBF_ASYNC;
xfs_buf_list_del(bp);
}
- __xfs_buf_submit(bp, false);
+ xfs_buf_submit(bp);
}
blk_finish_plug(&plug);
--
2.45.2
* [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (3 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 04/15] xfs: move xfs_buf_iowait out of (__)xfs_buf_submit Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 2:08 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 06/15] xfs: remove xfs_buf_delwri_submit_buffers Christoph Hellwig
` (9 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
xfs_buf_delwri_pushbuf synchronously writes a buffer that is on a delwri
list already. Instead of doing a complicated dance with the delwri
and wait list, just leave them alone and open code the actual buffer
write.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 33 ++++++++-------------------------
1 file changed, 8 insertions(+), 25 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index a3484421a6d8..7edd7a1e9dae 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2391,14 +2391,9 @@ xfs_buf_delwri_submit(
* Push a single buffer on a delwri queue.
*
* The purpose of this function is to submit a single buffer of a delwri queue
- * and return with the buffer still on the original queue. The waiting delwri
- * buffer submission infrastructure guarantees transfer of the delwri queue
- * buffer reference to a temporary wait list. We reuse this infrastructure to
- * transfer the buffer back to the original queue.
+ * and return with the buffer still on the original queue.
*
- * Note the buffer transitions from the queued state, to the submitted and wait
- * listed state and back to the queued state during this call. The buffer
- * locking and queue management logic between _delwri_pushbuf() and
+ * The buffer locking and queue management logic between _delwri_pushbuf() and
* _delwri_queue() guarantee that the buffer cannot be queued to another list
* before returning.
*/
@@ -2407,33 +2402,21 @@ xfs_buf_delwri_pushbuf(
struct xfs_buf *bp,
struct list_head *buffer_list)
{
- LIST_HEAD (submit_list);
int error;
ASSERT(bp->b_flags & _XBF_DELWRI_Q);
trace_xfs_buf_delwri_pushbuf(bp, _RET_IP_);
- /*
- * Isolate the buffer to a new local list so we can submit it for I/O
- * independently from the rest of the original list.
- */
xfs_buf_lock(bp);
- list_move(&bp->b_list, &submit_list);
- xfs_buf_unlock(bp);
-
- /*
- * Delwri submission clears the DELWRI_Q buffer flag and returns with
- * the buffer on the wait list with the original reference. Rather than
- * bounce the buffer from a local wait list back to the original list
- * after I/O completion, reuse the original list as the wait list.
- */
- xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
+ bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC);
+ bp->b_flags |= XBF_WRITE;
+ xfs_buf_submit(bp);
/*
- * The buffer is now locked, under I/O and wait listed on the original
- * delwri queue. Wait for I/O completion, restore the DELWRI_Q flag and
- * return with the buffer unlocked and on the original queue.
+ * The buffer is now locked, under I/O but still on the original delwri
+ * queue. Wait for I/O completion, restore the DELWRI_Q flag and
+ * return with the buffer unlocked and still on the original queue.
*/
error = xfs_buf_iowait(bp);
bp->b_flags |= _XBF_DELWRI_Q;
--
2.45.2
* [PATCH 06/15] xfs: remove xfs_buf_delwri_submit_buffers
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (4 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:31 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 07/15] xfs: move write verification out of _xfs_buf_ioapply Christoph Hellwig
` (8 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
xfs_buf_delwri_submit_buffers has two callers for synchronous and
asynchronous writes that share very little logic. Split out a helper for
the shared per-buffer loop and otherwise open code the submission in the
two callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
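The shape of the refactor can be sketched in userspace (illustrative stand-ins, not the kernel code): a shared prep step that can reject a buffer, with each submitter open coding its own loop around it.

```c
#include <stdbool.h>

/* Models the _XBF_DELWRI_Q check in the shared prep helper: a buffer
 * whose delwri flag was cleared in the meantime is dropped. */
static bool sketch_prep(int flags)
{
	return (flags & 1) != 0;
}

/* Async-style loop: skip buffers that fail prep and count the rest,
 * the way the nowait submitter would set XBF_ASYNC and submit them. */
static int sketch_submit_nowait(const int *flags, int n)
{
	int submitted = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (!sketch_prep(flags[i]))
			continue;
		submitted++;
	}
	return submitted;
}
```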
fs/xfs/xfs_buf.c | 121 +++++++++++++++++++++--------------------------
1 file changed, 55 insertions(+), 66 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 7edd7a1e9dae..e48d796c786b 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2259,72 +2259,26 @@ xfs_buf_cmp(
return 0;
}
-/*
- * Submit buffers for write. If wait_list is specified, the buffers are
- * submitted using sync I/O and placed on the wait list such that the caller can
- * iowait each buffer. Otherwise async I/O is used and the buffers are released
- * at I/O completion time. In either case, buffers remain locked until I/O
- * completes and the buffer is released from the queue.
- */
-static int
-xfs_buf_delwri_submit_buffers(
- struct list_head *buffer_list,
- struct list_head *wait_list)
+static bool
+xfs_buf_delwri_submit_prep(
+ struct xfs_buf *bp)
{
- struct xfs_buf *bp, *n;
- int pinned = 0;
- struct blk_plug plug;
-
- list_sort(NULL, buffer_list, xfs_buf_cmp);
-
- blk_start_plug(&plug);
- list_for_each_entry_safe(bp, n, buffer_list, b_list) {
- if (!wait_list) {
- if (!xfs_buf_trylock(bp))
- continue;
- if (xfs_buf_ispinned(bp)) {
- xfs_buf_unlock(bp);
- pinned++;
- continue;
- }
- } else {
- xfs_buf_lock(bp);
- }
-
- /*
- * Someone else might have written the buffer synchronously or
- * marked it stale in the meantime. In that case only the
- * _XBF_DELWRI_Q flag got cleared, and we have to drop the
- * reference and remove it from the list here.
- */
- if (!(bp->b_flags & _XBF_DELWRI_Q)) {
- xfs_buf_list_del(bp);
- xfs_buf_relse(bp);
- continue;
- }
-
- trace_xfs_buf_delwri_split(bp, _RET_IP_);
-
- /*
- * If we have a wait list, each buffer (and associated delwri
- * queue reference) transfers to it and is submitted
- * synchronously. Otherwise, drop the buffer from the delwri
- * queue and submit async.
- */
- bp->b_flags &= ~_XBF_DELWRI_Q;
- bp->b_flags |= XBF_WRITE;
- if (wait_list) {
- bp->b_flags &= ~XBF_ASYNC;
- list_move_tail(&bp->b_list, wait_list);
- } else {
- bp->b_flags |= XBF_ASYNC;
- xfs_buf_list_del(bp);
- }
- xfs_buf_submit(bp);
+ /*
+ * Someone else might have written the buffer synchronously or marked it
+ * stale in the meantime. In that case only the _XBF_DELWRI_Q flag got
+ * cleared, and we have to drop the reference and remove it from the
+ * list here.
+ */
+ if (!(bp->b_flags & _XBF_DELWRI_Q)) {
+ xfs_buf_list_del(bp);
+ xfs_buf_relse(bp);
+ return false;
}
- blk_finish_plug(&plug);
- return pinned;
+ trace_xfs_buf_delwri_split(bp, _RET_IP_);
+ bp->b_flags &= ~_XBF_DELWRI_Q;
+ bp->b_flags |= XBF_WRITE;
+ return true;
}
/*
@@ -2347,7 +2301,30 @@ int
xfs_buf_delwri_submit_nowait(
struct list_head *buffer_list)
{
- return xfs_buf_delwri_submit_buffers(buffer_list, NULL);
+ struct xfs_buf *bp, *n;
+ int pinned = 0;
+ struct blk_plug plug;
+
+ list_sort(NULL, buffer_list, xfs_buf_cmp);
+
+ blk_start_plug(&plug);
+ list_for_each_entry_safe(bp, n, buffer_list, b_list) {
+ if (!xfs_buf_trylock(bp))
+ continue;
+ if (xfs_buf_ispinned(bp)) {
+ xfs_buf_unlock(bp);
+ pinned++;
+ continue;
+ }
+ if (!xfs_buf_delwri_submit_prep(bp))
+ continue;
+ bp->b_flags |= XBF_ASYNC;
+ xfs_buf_list_del(bp);
+ xfs_buf_submit(bp);
+ }
+ blk_finish_plug(&plug);
+
+ return pinned;
}
/*
@@ -2364,9 +2341,21 @@ xfs_buf_delwri_submit(
{
LIST_HEAD (wait_list);
int error = 0, error2;
- struct xfs_buf *bp;
+ struct xfs_buf *bp, *n;
+ struct blk_plug plug;
- xfs_buf_delwri_submit_buffers(buffer_list, &wait_list);
+ list_sort(NULL, buffer_list, xfs_buf_cmp);
+
+ blk_start_plug(&plug);
+ list_for_each_entry_safe(bp, n, buffer_list, b_list) {
+ xfs_buf_lock(bp);
+ if (!xfs_buf_delwri_submit_prep(bp))
+ continue;
+ bp->b_flags &= ~XBF_ASYNC;
+ list_move_tail(&bp->b_list, &wait_list);
+ xfs_buf_submit(bp);
+ }
+ blk_finish_plug(&plug);
/* Wait for IO to complete. */
while (!list_empty(&wait_list)) {
--
2.45.2
* [PATCH 07/15] xfs: move write verification out of _xfs_buf_ioapply
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (5 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 06/15] xfs: remove xfs_buf_delwri_submit_buffers Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:33 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 08/15] xfs: move in-memory buftarg handling " Christoph Hellwig
` (7 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
Split the write verification logic out of _xfs_buf_ioapply into a new
xfs_buf_verify_write helper called by xfs_buf_submit given that it isn't
about applying the I/O and doesn't really fit in with the rest of
_xfs_buf_ioapply.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
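A minimal sketch of the factored-out control flow (illustrative names and error value, not the kernel code): verification runs first as its own helper, and the submit path bails out before dispatching any I/O when it fails.

```c
#include <stdbool.h>

typedef bool (*verify_fn)(int *error);

/* Stand-in for a verify_write callback that flags corruption. */
static bool failing_verifier(int *error)
{
	*error = -117;		/* hypothetical stand-in error code */
	return false;
}

static bool passing_verifier(int *error)
{
	(void)error;
	return true;
}

/* Mirrors the new call site shape in xfs_buf_submit: verify first,
 * only dispatch when verification passed. */
static int sketch_submit_write(verify_fn verify, int *dispatched)
{
	int error = 0;

	if (!verify(&error)) {
		/* the real code would shut the filesystem down here */
		return error;
	}
	*dispatched = 1;	/* I/O would be issued here */
	return 0;
}
```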
fs/xfs/xfs_buf.c | 67 ++++++++++++++++++++++++++----------------------
1 file changed, 37 insertions(+), 30 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e48d796c786b..18e830c4e990 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1615,36 +1615,6 @@ _xfs_buf_ioapply(
if (bp->b_flags & XBF_WRITE) {
op = REQ_OP_WRITE;
-
- /*
- * Run the write verifier callback function if it exists. If
- * this function fails it will mark the buffer with an error and
- * the IO should not be dispatched.
- */
- if (bp->b_ops) {
- bp->b_ops->verify_write(bp);
- if (bp->b_error) {
- xfs_force_shutdown(bp->b_mount,
- SHUTDOWN_CORRUPT_INCORE);
- return;
- }
- } else if (bp->b_rhash_key != XFS_BUF_DADDR_NULL) {
- struct xfs_mount *mp = bp->b_mount;
-
- /*
- * non-crc filesystems don't attach verifiers during
- * log recovery, so don't warn for such filesystems.
- */
- if (xfs_has_crc(mp)) {
- xfs_warn(mp,
- "%s: no buf ops on daddr 0x%llx len %d",
- __func__, xfs_buf_daddr(bp),
- bp->b_length);
- xfs_hex_dump(bp->b_addr,
- XFS_CORRUPTION_DUMP_LEN);
- dump_stack();
- }
- }
} else {
op = REQ_OP_READ;
if (bp->b_flags & XBF_READ_AHEAD)
@@ -1693,6 +1663,36 @@ xfs_buf_iowait(
return bp->b_error;
}
+/*
+ * Run the write verifier callback function if it exists. If this fails, mark
+ * the buffer with an error and do not dispatch the I/O.
+ */
+static bool
+xfs_buf_verify_write(
+ struct xfs_buf *bp)
+{
+ if (bp->b_ops) {
+ bp->b_ops->verify_write(bp);
+ if (bp->b_error)
+ return false;
+ } else if (bp->b_rhash_key != XFS_BUF_DADDR_NULL) {
+ /*
+ * Non-crc filesystems don't attach verifiers during log
+ * recovery, so don't warn for such filesystems.
+ */
+ if (xfs_has_crc(bp->b_mount)) {
+ xfs_warn(bp->b_mount,
+ "%s: no buf ops on daddr 0x%llx len %d",
+ __func__, xfs_buf_daddr(bp),
+ bp->b_length);
+ xfs_hex_dump(bp->b_addr, XFS_CORRUPTION_DUMP_LEN);
+ dump_stack();
+ }
+ }
+
+ return true;
+}
+
/*
* Buffer I/O submission path, read or write. Asynchronous submission transfers
* the buffer lock ownership and the current reference to the IO. It is not
@@ -1751,8 +1751,15 @@ xfs_buf_submit(
atomic_set(&bp->b_io_remaining, 1);
if (bp->b_flags & XBF_ASYNC)
xfs_buf_ioacct_inc(bp);
+
+ if ((bp->b_flags & XBF_WRITE) && !xfs_buf_verify_write(bp)) {
+ xfs_force_shutdown(bp->b_mount, SHUTDOWN_CORRUPT_INCORE);
+ goto done;
+ }
+
_xfs_buf_ioapply(bp);
+done:
/*
* If _xfs_buf_ioapply failed, we can get back here with only the IO
* reference we took above. If we drop it to zero, run completion so
--
2.45.2
* [PATCH 08/15] xfs: move in-memory buftarg handling out of _xfs_buf_ioapply
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (6 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 07/15] xfs: move write verification out of _xfs_buf_ioapply Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:34 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 09/15] xfs: simplify buffer I/O submission Christoph Hellwig
` (6 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
No I/O to apply for in-memory buffers, so skip the function call
entirely. Clean up the b_io_error initialization logic to allow
for this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 18e830c4e990..e886605b5721 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1607,12 +1607,6 @@ _xfs_buf_ioapply(
int size;
int i;
- /*
- * Make sure we capture only current IO errors rather than stale errors
- * left over from previous use of the buffer (e.g. failed readahead).
- */
- bp->b_error = 0;
-
if (bp->b_flags & XBF_WRITE) {
op = REQ_OP_WRITE;
} else {
@@ -1624,10 +1618,6 @@ _xfs_buf_ioapply(
/* we only use the buffer cache for meta-data */
op |= REQ_META;
- /* in-memory targets are directly mapped, no IO required. */
- if (xfs_buftarg_is_mem(bp->b_target))
- return;
-
/*
* Walk all the vectors issuing IO on them. Set up the initial offset
* into the buffer and the desired IO size before we start -
@@ -1740,7 +1730,11 @@ xfs_buf_submit(
if (bp->b_flags & XBF_WRITE)
xfs_buf_wait_unpin(bp);
- /* clear the internal error state to avoid spurious errors */
+ /*
+ * Make sure we capture only current IO errors rather than stale errors
+ * left over from previous use of the buffer (e.g. failed readahead).
+ */
+ bp->b_error = 0;
bp->b_io_error = 0;
/*
@@ -1757,6 +1751,10 @@ xfs_buf_submit(
goto done;
}
+ /* In-memory targets are directly mapped, no I/O required. */
+ if (xfs_buftarg_is_mem(bp->b_target))
+ goto done;
+
_xfs_buf_ioapply(bp);
done:
--
2.45.2
* [PATCH 09/15] xfs: simplify buffer I/O submission
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (7 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 08/15] xfs: move in-memory buftarg handling " Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:42 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 10/15] xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend Christoph Hellwig
` (5 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
The code in _xfs_buf_ioapply is unnecessarily complicated because it
doesn't take advantage of modern bio features.
Simplify it by making use of bio splitting and chaining: build a single
bio for the pages in the buffer using a simple loop, then split that bio
on the map boundaries for discontiguous multi-FSB buffers and chain the
split bios to the main one so that there is only a single I/O completion.
This not only simplifies the code that builds the buffer, but also removes
the need for the b_io_remaining field, as buffer ownership is granted to
the bio on submission of the final bio with no chance of a completion
before that. The b_io_error field also becomes superfluous because there
is now always exactly one completion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
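The single-completion guarantee of bio chaining can be modeled in userspace (an illustrative model, not the block layer API): each split child holds a reference on the parent, and the parent's end_io runs exactly once, after the parent and all chained children have completed.

```c
/* Models a parent bio's remaining reference count and its end_io. */
struct fake_bio {
	int remaining;
	int end_io_runs;
};

/* A chained child takes a reference on the parent. */
static void fake_chain(struct fake_bio *parent)
{
	parent->remaining++;
}

/* Each completion drops a reference; end_io fires on the last drop. */
static void fake_endio(struct fake_bio *parent)
{
	if (--parent->remaining == 0)
		parent->end_io_runs++;
}

/* Model submitting a buffer with nmaps discontiguous ranges:
 * nmaps - 1 splits chained to the final parent bio. */
static int submit_chained(int nmaps)
{
	struct fake_bio parent = { .remaining = 1, .end_io_runs = 0 };
	int i;

	for (i = 0; i < nmaps - 1; i++)
		fake_chain(&parent);
	for (i = 0; i < nmaps - 1; i++)
		fake_endio(&parent);	/* children complete */
	fake_endio(&parent);		/* parent completes last */
	return parent.end_io_runs;
}
```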
fs/xfs/xfs_buf.c | 207 ++++++++++++++---------------------------------
fs/xfs/xfs_buf.h | 2 -
2 files changed, 60 insertions(+), 149 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e886605b5721..094f16457998 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1364,13 +1364,6 @@ xfs_buf_ioend(
{
trace_xfs_buf_iodone(bp, _RET_IP_);
- /*
- * Pull in IO completion errors now. We are guaranteed to be running
- * single threaded, so we don't need the lock to read b_io_error.
- */
- if (!bp->b_error && bp->b_io_error)
- xfs_buf_ioerror(bp, bp->b_io_error);
-
if (bp->b_flags & XBF_READ) {
if (!bp->b_error && bp->b_ops)
bp->b_ops->verify_read(bp);
@@ -1494,118 +1487,26 @@ static void
xfs_buf_bio_end_io(
struct bio *bio)
{
- struct xfs_buf *bp = (struct xfs_buf *)bio->bi_private;
+ struct xfs_buf *bp = bio->bi_private;
- if (!bio->bi_status &&
- (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
- XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
- bio->bi_status = BLK_STS_IOERR;
-
- /*
- * don't overwrite existing errors - otherwise we can lose errors on
- * buffers that require multiple bios to complete.
- */
- if (bio->bi_status) {
- int error = blk_status_to_errno(bio->bi_status);
-
- cmpxchg(&bp->b_io_error, 0, error);
- }
+ if (bio->bi_status)
+ xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
+ else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
+ XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
+ xfs_buf_ioerror(bp, -EIO);
if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
- if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
- xfs_buf_ioend_async(bp);
+ xfs_buf_ioend_async(bp);
bio_put(bio);
}
-static void
-xfs_buf_ioapply_map(
- struct xfs_buf *bp,
- int map,
- int *buf_offset,
- int *count,
- blk_opf_t op)
-{
- int page_index;
- unsigned int total_nr_pages = bp->b_page_count;
- int nr_pages;
- struct bio *bio;
- sector_t sector = bp->b_maps[map].bm_bn;
- int size;
- int offset;
-
- /* skip the pages in the buffer before the start offset */
- page_index = 0;
- offset = *buf_offset;
- while (offset >= PAGE_SIZE) {
- page_index++;
- offset -= PAGE_SIZE;
- }
-
- /*
- * Limit the IO size to the length of the current vector, and update the
- * remaining IO count for the next time around.
- */
- size = min_t(int, BBTOB(bp->b_maps[map].bm_len), *count);
- *count -= size;
- *buf_offset += size;
-
-next_chunk:
- atomic_inc(&bp->b_io_remaining);
- nr_pages = bio_max_segs(total_nr_pages);
-
- bio = bio_alloc(bp->b_target->bt_bdev, nr_pages, op, GFP_NOIO);
- bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = xfs_buf_bio_end_io;
- bio->bi_private = bp;
-
- for (; size && nr_pages; nr_pages--, page_index++) {
- int rbytes, nbytes = PAGE_SIZE - offset;
-
- if (nbytes > size)
- nbytes = size;
-
- rbytes = bio_add_page(bio, bp->b_pages[page_index], nbytes,
- offset);
- if (rbytes < nbytes)
- break;
-
- offset = 0;
- sector += BTOBB(nbytes);
- size -= nbytes;
- total_nr_pages--;
- }
-
- if (likely(bio->bi_iter.bi_size)) {
- if (xfs_buf_is_vmapped(bp)) {
- flush_kernel_vmap_range(bp->b_addr,
- xfs_buf_vmap_len(bp));
- }
- submit_bio(bio);
- if (size)
- goto next_chunk;
- } else {
- /*
- * This is guaranteed not to be the last io reference count
- * because the caller (xfs_buf_submit) holds a count itself.
- */
- atomic_dec(&bp->b_io_remaining);
- xfs_buf_ioerror(bp, -EIO);
- bio_put(bio);
- }
-
-}
-
-STATIC void
-_xfs_buf_ioapply(
- struct xfs_buf *bp)
+static inline blk_opf_t
+xfs_buf_bio_op(
+ struct xfs_buf *bp)
{
- struct blk_plug plug;
- blk_opf_t op;
- int offset;
- int size;
- int i;
+ blk_opf_t op;
if (bp->b_flags & XBF_WRITE) {
op = REQ_OP_WRITE;
@@ -1615,25 +1516,52 @@ _xfs_buf_ioapply(
op |= REQ_RAHEAD;
}
- /* we only use the buffer cache for meta-data */
- op |= REQ_META;
+ return op | REQ_META;
+}
+
+static void
+xfs_buf_submit_bio(
+ struct xfs_buf *bp)
+{
+ unsigned int size = BBTOB(bp->b_length);
+ unsigned int map = 0, p;
+ struct blk_plug plug;
+ struct bio *bio;
+
+ bio = bio_alloc(bp->b_target->bt_bdev, bp->b_page_count,
+ xfs_buf_bio_op(bp), GFP_NOIO);
+ bio->bi_private = bp;
+ bio->bi_end_io = xfs_buf_bio_end_io;
+
+ if (bp->b_flags & _XBF_KMEM) {
+ __bio_add_page(bio, virt_to_page(bp->b_addr), size,
+ bp->b_offset);
+ } else {
+ for (p = 0; p < bp->b_page_count; p++)
+ __bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
+ bio->bi_iter.bi_size = size; /* limit to the actual size used */
+
+ if (xfs_buf_is_vmapped(bp))
+ flush_kernel_vmap_range(bp->b_addr,
+ xfs_buf_vmap_len(bp));
+ }
/*
- * Walk all the vectors issuing IO on them. Set up the initial offset
- * into the buffer and the desired IO size before we start -
- * _xfs_buf_ioapply_vec() will modify them appropriately for each
- * subsequent call.
+ * If there is more than one map segment, split the original bio to
+ * submit a separate bio for each discontiguous range.
*/
- offset = bp->b_offset;
- size = BBTOB(bp->b_length);
blk_start_plug(&plug);
- for (i = 0; i < bp->b_map_count; i++) {
- xfs_buf_ioapply_map(bp, i, &offset, &size, op);
- if (bp->b_error)
- break;
- if (size <= 0)
- break; /* all done */
+ for (map = 0; map < bp->b_map_count - 1; map++) {
+ struct bio *split;
+
+ split = bio_split(bio, bp->b_maps[map].bm_len, GFP_NOFS,
+ &fs_bio_set);
+ split->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
+ bio_chain(split, bio);
+ submit_bio(split);
}
+ bio->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
+ submit_bio(bio);
blk_finish_plug(&plug);
}
@@ -1693,8 +1621,6 @@ static int
xfs_buf_submit(
struct xfs_buf *bp)
{
- int error = 0;
-
trace_xfs_buf_submit(bp, _RET_IP_);
ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
@@ -1735,14 +1661,7 @@ xfs_buf_submit(
* left over from previous use of the buffer (e.g. failed readahead).
*/
bp->b_error = 0;
- bp->b_io_error = 0;
- /*
- * Set the count to 1 initially, this will stop an I/O completion
- * callout which happens before we have started all the I/O from calling
- * xfs_buf_ioend too early.
- */
- atomic_set(&bp->b_io_remaining, 1);
if (bp->b_flags & XBF_ASYNC)
xfs_buf_ioacct_inc(bp);
@@ -1755,28 +1674,22 @@ xfs_buf_submit(
if (xfs_buftarg_is_mem(bp->b_target))
goto done;
- _xfs_buf_ioapply(bp);
+ xfs_buf_submit_bio(bp);
+ goto rele;
done:
- /*
- * If _xfs_buf_ioapply failed, we can get back here with only the IO
- * reference we took above. If we drop it to zero, run completion so
- * that we don't return to the caller with completion still pending.
- */
- if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
- if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
- xfs_buf_ioend(bp);
- else
- xfs_buf_ioend_async(bp);
- }
-
+ if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
+ xfs_buf_ioend(bp);
+ else
+ xfs_buf_ioend_async(bp);
+rele:
/*
* Release the hold that keeps the buffer referenced for the entire
* I/O. Note that if the buffer is async, it is not safe to reference
* after this release.
*/
xfs_buf_rele(bp);
- return error;
+ return 0;
}
void *
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index da80399c7457..c53d27439ff2 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -184,7 +184,6 @@ struct xfs_buf {
struct list_head b_lru; /* lru list */
spinlock_t b_lock; /* internal state lock */
unsigned int b_state; /* internal state flags */
- int b_io_error; /* internal IO error state */
wait_queue_head_t b_waiters; /* unpin waiters */
struct list_head b_list;
struct xfs_perag *b_pag;
@@ -202,7 +201,6 @@ struct xfs_buf {
struct xfs_buf_map __b_map; /* inline compound buffer map */
int b_map_count;
atomic_t b_pin_count; /* pin count */
- atomic_t b_io_remaining; /* #outstanding I/O requests */
unsigned int b_page_count; /* size of page array */
unsigned int b_offset; /* page offset of b_addr,
only for _XBF_KMEM buffers */
--
2.45.2
* [PATCH 10/15] xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (8 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 09/15] xfs: simplify buffer I/O submission Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:42 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 11/15] xfs: remove the extra buffer reference in xfs_buf_submit Christoph Hellwig
` (4 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
Invalidating cache lines can be fairly expensive, so don't do it
in interrupt context. Note that in practice very few setups will
actually do anything here, as virtually indexed caches are rather
uncommon, but we might as well move the call to the proper place
while touching this area.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 094f16457998..49df4adf0e98 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1365,6 +1365,9 @@ xfs_buf_ioend(
trace_xfs_buf_iodone(bp, _RET_IP_);
if (bp->b_flags & XBF_READ) {
+ if (!bp->b_error && xfs_buf_is_vmapped(bp))
+ invalidate_kernel_vmap_range(bp->b_addr,
+ xfs_buf_vmap_len(bp));
if (!bp->b_error && bp->b_ops)
bp->b_ops->verify_read(bp);
if (!bp->b_error)
@@ -1495,9 +1498,6 @@ xfs_buf_bio_end_io(
XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
xfs_buf_ioerror(bp, -EIO);
- if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
- invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
-
xfs_buf_ioend_async(bp);
bio_put(bio);
}
--
2.45.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 11/15] xfs: remove the extra buffer reference in xfs_buf_submit
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (9 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 10/15] xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-13 7:13 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 12/15] xfs: always complete the buffer inline " Christoph Hellwig
` (3 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
Nothing touches the buffer after it has been submitted now, so the
extra transient reference is no longer needed either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 49df4adf0e98..352cc50aeea5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1646,13 +1646,6 @@ xfs_buf_submit(
return -EIO;
}
- /*
- * Grab a reference so the buffer does not go away underneath us. For
- * async buffers, I/O completion drops the callers reference, which
- * could occur before submission returns.
- */
- xfs_buf_hold(bp);
-
if (bp->b_flags & XBF_WRITE)
xfs_buf_wait_unpin(bp);
@@ -1675,20 +1668,12 @@ xfs_buf_submit(
goto done;
xfs_buf_submit_bio(bp);
- goto rele;
-
+ return 0;
done:
if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
xfs_buf_ioend(bp);
else
xfs_buf_ioend_async(bp);
-rele:
- /*
- * Release the hold that keeps the buffer referenced for the entire
- * I/O. Note that if the buffer is async, it is not safe to reference
- * after this release.
- */
- xfs_buf_rele(bp);
return 0;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 12/15] xfs: always complete the buffer inline in xfs_buf_submit
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (10 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 11/15] xfs: remove the extra buffer reference in xfs_buf_submit Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:46 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 13/15] xfs: simplify xfsaild_resubmit_item Christoph Hellwig
` (2 subsequent siblings)
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
xfs_buf_submit now only completes a buffer on error, or for in-memory
buftargs. There is no point in using a workqueue for the latter as
the completion will just wake up the caller. Optimize this case by
avoiding the workqueue roundtrip.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 352cc50aeea5..0ad3cacfdba1 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1670,10 +1670,7 @@ xfs_buf_submit(
xfs_buf_submit_bio(bp);
return 0;
done:
- if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
- xfs_buf_ioend(bp);
- else
- xfs_buf_ioend_async(bp);
+ xfs_buf_ioend(bp);
return 0;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 13/15] xfs: simplify xfsaild_resubmit_item
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (11 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 12/15] xfs: always complete the buffer inline " Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:49 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 14/15] xfs: move b_li_list based retry handling to common code Christoph Hellwig
2025-01-06 9:54 ` [PATCH 15/15] xfs: add a b_iodone callback to struct xfs_buf Christoph Hellwig
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
Since commit acc8f8628c37 ("xfs: attach dquot buffer to dquot log item
buffer") all buf items that use bp->b_li_list are explicitly checked for
in the branch that just clears XFS_LI_FAILED. Remove the dead arm that
calls xfs_clear_li_failed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_trans_ail.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index f56d62dced97..0fcb1828e598 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -359,13 +359,8 @@ xfsaild_resubmit_item(
}
/* protected by ail_lock */
- list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
- if (bp->b_flags & (_XBF_INODES | _XBF_DQUOTS))
- clear_bit(XFS_LI_FAILED, &lip->li_flags);
- else
- xfs_clear_li_failed(lip);
- }
-
+ list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
+ clear_bit(XFS_LI_FAILED, &lip->li_flags);
xfs_buf_unlock(bp);
return XFS_ITEM_SUCCESS;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 14/15] xfs: move b_li_list based retry handling to common code
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (12 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 13/15] xfs: simplify xfsaild_resubmit_item Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:55 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 15/15] xfs: add a b_iodone callback to struct xfs_buf Christoph Hellwig
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
The dquot and inode versions are very similar, which is expected given the
overall b_li_list logic. The differences are that the inode version also
clears the XFS_LI_FLUSHING flag, which is defined in common code but only
ever set by the inode item, and that the dquot version takes the ail_lock
over the list iteration. While the latter seems sensible given that
additions and removals from b_li_list are protected by the ail_lock, log
items are only added before buffer submission and only removed when
completing the buffer, so nothing can change the list when retrying a
buffer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 12 ++++++------
fs/xfs/xfs_buf_item.h | 5 -----
fs/xfs/xfs_dquot.c | 12 ------------
fs/xfs/xfs_inode_item.c | 12 ------------
4 files changed, 6 insertions(+), 35 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 0ad3cacfdba1..1cf5d14d0d06 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1288,6 +1288,7 @@ xfs_buf_ioend_handle_error(
{
struct xfs_mount *mp = bp->b_mount;
struct xfs_error_cfg *cfg;
+ struct xfs_log_item *lip;
/*
* If we've already shutdown the journal because of I/O errors, there's
@@ -1335,12 +1336,11 @@ xfs_buf_ioend_handle_error(
}
/* Still considered a transient error. Caller will schedule retries. */
- if (bp->b_flags & _XBF_INODES)
- xfs_buf_inode_io_fail(bp);
- else if (bp->b_flags & _XBF_DQUOTS)
- xfs_buf_dquot_io_fail(bp);
- else
- ASSERT(list_empty(&bp->b_li_list));
+ list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
+ set_bit(XFS_LI_FAILED, &lip->li_flags);
+ clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
+ }
+
xfs_buf_ioerror(bp, 0);
xfs_buf_relse(bp);
return true;
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 4d8a6aece995..8cde85259a58 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -54,17 +54,12 @@ bool xfs_buf_item_put(struct xfs_buf_log_item *);
void xfs_buf_item_log(struct xfs_buf_log_item *, uint, uint);
bool xfs_buf_item_dirty_format(struct xfs_buf_log_item *);
void xfs_buf_inode_iodone(struct xfs_buf *);
-void xfs_buf_inode_io_fail(struct xfs_buf *bp);
#ifdef CONFIG_XFS_QUOTA
void xfs_buf_dquot_iodone(struct xfs_buf *);
-void xfs_buf_dquot_io_fail(struct xfs_buf *bp);
#else
static inline void xfs_buf_dquot_iodone(struct xfs_buf *bp)
{
}
-static inline void xfs_buf_dquot_io_fail(struct xfs_buf *bp)
-{
-}
#endif /* CONFIG_XFS_QUOTA */
void xfs_buf_iodone(struct xfs_buf *);
bool xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index f11d475898f2..78dde811ab16 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1229,18 +1229,6 @@ xfs_buf_dquot_iodone(
}
}
-void
-xfs_buf_dquot_io_fail(
- struct xfs_buf *bp)
-{
- struct xfs_log_item *lip;
-
- spin_lock(&bp->b_mount->m_ail->ail_lock);
- list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
- set_bit(XFS_LI_FAILED, &lip->li_flags);
- spin_unlock(&bp->b_mount->m_ail->ail_lock);
-}
-
/* Check incore dquot for errors before we flush. */
static xfs_failaddr_t
xfs_qm_dqflush_check(
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 912f0b1bc3cb..4fb2e1a6ad26 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -1023,18 +1023,6 @@ xfs_buf_inode_iodone(
list_splice_tail(&flushed_inodes, &bp->b_li_list);
}
-void
-xfs_buf_inode_io_fail(
- struct xfs_buf *bp)
-{
- struct xfs_log_item *lip;
-
- list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
- set_bit(XFS_LI_FAILED, &lip->li_flags);
- clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
- }
-}
-
/*
* Clear the inode logging fields so no more flushes are attempted. If we are
* on a buffer list, it is now safe to remove it because the buffer is
--
2.45.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
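[To illustrate the consolidation in the patch above (not the kernel code itself): the common retry path can use one loop for both inode and dquot buffers because clearing a flag that was never set is a harmless no-op. A minimal userspace sketch with hypothetical DEMO_* flags mirroring XFS_LI_FAILED/XFS_LI_FLUSHING:]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits mirroring XFS_LI_FAILED / XFS_LI_FLUSHING. */
#define DEMO_LI_FAILED   (1u << 0)
#define DEMO_LI_FLUSHING (1u << 1)

struct demo_item {
	unsigned int flags;
	struct demo_item *next;	/* stand-in for the b_li_list linkage */
};

/*
 * Common retry handling: mark every attached log item failed and no
 * longer flushing.  Clearing FLUSHING is a no-op for items that never
 * set it (the dquot case), which is what lets one loop serve both.
 */
static void demo_ioend_fail(struct demo_item *head)
{
	for (struct demo_item *lip = head; lip; lip = lip->next) {
		lip->flags |= DEMO_LI_FAILED;
		lip->flags &= ~DEMO_LI_FLUSHING;
	}
}
```

[The list itself needs no locking here under the invariant stated in the commit message: items are attached before submission and detached at completion, so the list is stable while a retry runs.]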
* [PATCH 15/15] xfs: add a b_iodone callback to struct xfs_buf
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
` (13 preceding siblings ...)
2025-01-06 9:54 ` [PATCH 14/15] xfs: move b_li_list based retry handling to common code Christoph Hellwig
@ 2025-01-06 9:54 ` Christoph Hellwig
2025-01-07 6:58 ` Darrick J. Wong
14 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:54 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
Stop open coding the log item completions and instead add a callback
back into the submitter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 7 ++-----
fs/xfs/xfs_buf.h | 5 +----
fs/xfs/xfs_dquot.c | 2 +-
fs/xfs/xfs_inode_item.c | 2 +-
fs/xfs/xfs_trans_buf.c | 8 ++++----
5 files changed, 9 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1cf5d14d0d06..68a5148115e5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1394,11 +1394,8 @@ xfs_buf_ioend(
if (bp->b_log_item)
xfs_buf_item_done(bp);
- if (bp->b_flags & _XBF_INODES)
- xfs_buf_inode_iodone(bp);
- else if (bp->b_flags & _XBF_DQUOTS)
- xfs_buf_dquot_iodone(bp);
-
+ if (bp->b_iodone)
+ bp->b_iodone(bp);
}
bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD |
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index c53d27439ff2..10bf66e074a0 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -34,8 +34,6 @@ struct xfs_buf;
#define XBF_WRITE_FAIL (1u << 7) /* async writes have failed on this buffer */
/* buffer type flags for write callbacks */
-#define _XBF_INODES (1u << 16)/* inode buffer */
-#define _XBF_DQUOTS (1u << 17)/* dquot buffer */
#define _XBF_LOGRECOVERY (1u << 18)/* log recovery buffer */
/* flags used only internally */
@@ -65,8 +63,6 @@ typedef unsigned int xfs_buf_flags_t;
{ XBF_DONE, "DONE" }, \
{ XBF_STALE, "STALE" }, \
{ XBF_WRITE_FAIL, "WRITE_FAIL" }, \
- { _XBF_INODES, "INODES" }, \
- { _XBF_DQUOTS, "DQUOTS" }, \
{ _XBF_LOGRECOVERY, "LOG_RECOVERY" }, \
{ _XBF_PAGES, "PAGES" }, \
{ _XBF_KMEM, "KMEM" }, \
@@ -205,6 +201,7 @@ struct xfs_buf {
unsigned int b_offset; /* page offset of b_addr,
only for _XBF_KMEM buffers */
int b_error; /* error code on I/O */
+ void (*b_iodone)(struct xfs_buf *bp);
/*
* async write failure retry count. Initialised to zero on the first
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 78dde811ab16..e0a379729674 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1446,7 +1446,7 @@ xfs_qm_dqflush(
* Attach the dquot to the buffer so that we can remove this dquot from
* the AIL and release the flush lock once the dquot is synced to disk.
*/
- bp->b_flags |= _XBF_DQUOTS;
+ bp->b_iodone = xfs_buf_dquot_iodone;
list_add_tail(&lip->li_bio_list, &bp->b_li_list);
/*
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 4fb2e1a6ad26..e0990a8c4007 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -185,7 +185,7 @@ xfs_inode_item_precommit(
xfs_buf_hold(bp);
spin_lock(&iip->ili_lock);
iip->ili_item.li_buf = bp;
- bp->b_flags |= _XBF_INODES;
+ bp->b_iodone = xfs_buf_inode_iodone;
list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list);
xfs_trans_brelse(tp, bp);
}
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 8e886ecfd69a..53af546c0b23 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -659,7 +659,7 @@ xfs_trans_inode_buf(
ASSERT(atomic_read(&bip->bli_refcount) > 0);
bip->bli_flags |= XFS_BLI_INODE_BUF;
- bp->b_flags |= _XBF_INODES;
+ bp->b_iodone = xfs_buf_inode_iodone;
xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
}
@@ -684,7 +684,7 @@ xfs_trans_stale_inode_buf(
ASSERT(atomic_read(&bip->bli_refcount) > 0);
bip->bli_flags |= XFS_BLI_STALE_INODE;
- bp->b_flags |= _XBF_INODES;
+ bp->b_iodone = xfs_buf_inode_iodone;
xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
}
@@ -709,7 +709,7 @@ xfs_trans_inode_alloc_buf(
ASSERT(atomic_read(&bip->bli_refcount) > 0);
bip->bli_flags |= XFS_BLI_INODE_ALLOC_BUF;
- bp->b_flags |= _XBF_INODES;
+ bp->b_iodone = xfs_buf_inode_iodone;
xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
}
@@ -820,6 +820,6 @@ xfs_trans_dquot_buf(
break;
}
- bp->b_flags |= _XBF_DQUOTS;
+ bp->b_iodone = xfs_buf_dquot_iodone;
xfs_trans_buf_set_type(tp, bp, type);
}
--
2.45.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
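[As an illustration of the pattern in the patch above (not the kernel code itself): completion dispatch moves from switching on per-type buffer flags to calling a per-buffer function pointer that the submitter installs. A minimal userspace sketch with hypothetical demo_* names mirroring the new b_iodone member:]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical buffer with a per-buffer completion callback, mirroring
 * the b_iodone pointer this patch adds to struct xfs_buf.
 */
struct demo_buf {
	void (*iodone)(struct demo_buf *bp);
	int inode_done;
	int dquot_done;
};

static void demo_inode_iodone(struct demo_buf *bp) { bp->inode_done++; }
static void demo_dquot_iodone(struct demo_buf *bp) { bp->dquot_done++; }

/* Completion no longer tests type flags; it just calls back, if set. */
static void demo_buf_ioend(struct demo_buf *bp)
{
	if (bp->iodone)
		bp->iodone(bp);
}
```

[The code that attaches log items also picks the handler (as the patch does with `bp->b_iodone = xfs_buf_inode_iodone`), so adding a new item type needs no new _XBF_* flag bit in the shared flag space.]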
* Re: [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets
2025-01-06 9:54 ` [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets Christoph Hellwig
@ 2025-01-07 2:00 ` Darrick J. Wong
2025-01-07 6:05 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 2:00 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:38AM +0100, Christoph Hellwig wrote:
> __xfs_buf_submit calls xfs_buf_ioend when b_io_remaining hits zero. For
> in-memory buftargs b_io_remaining is never incremented from its initial
> value of 1, so this always happens. Thus the extra call to xfs_buf_ioend
> in _xfs_buf_ioapply causes a double completion. Fortunately
> __xfs_buf_submit is only used for synchronous reads on in-memory buftargs
> due to the peculiarities of how they work, so this is mostly harmless and
> just causes a little extra work to be done.
Tempted to add:
Cc: <stable@vger.kernel.org> # v6.9
though I think backporting isn't strictly necessary because in-memory
buffers don't have log items, right? If so, then we don't need to cc
stable.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> Fixes: 5076a6040ca1 ("xfs: support in-memory buffer cache targets")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_buf.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index aa63b8efd782..787caf0c3254 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1656,10 +1656,8 @@ _xfs_buf_ioapply(
> op |= REQ_META;
>
> /* in-memory targets are directly mapped, no IO required. */
> - if (xfs_buftarg_is_mem(bp->b_target)) {
> - xfs_buf_ioend(bp);
> + if (xfs_buftarg_is_mem(bp->b_target))
> return;
> - }
>
> /*
> * Walk all the vectors issuing IO on them. Set up the initial offset
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 02/15] xfs: remove the incorrect comment above xfs_buf_free_maps
2025-01-06 9:54 ` [PATCH 02/15] xfs: remove the incorrect comment above xfs_buf_free_maps Christoph Hellwig
@ 2025-01-07 2:00 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 2:00 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:39AM +0100, Christoph Hellwig wrote:
> The comment above xfs_buf_free_maps talks about fields not even used in
> the function and also doesn't add any other value. Remove it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 787caf0c3254..1927655fed13 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -202,9 +202,6 @@ xfs_buf_get_maps(
> return 0;
> }
>
> -/*
> - * Frees b_pages if it was allocated.
> - */
> static void
> xfs_buf_free_maps(
> struct xfs_buf *bp)
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 03/15] xfs: remove the incorrect comment about the b_pag field
2025-01-06 9:54 ` [PATCH 03/15] xfs: remove the incorrect comment about the b_pag field Christoph Hellwig
@ 2025-01-07 2:01 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 2:01 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:40AM +0100, Christoph Hellwig wrote:
> The rbtree root is long gone.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 3d56bc7a35cc..da80399c7457 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -187,7 +187,7 @@ struct xfs_buf {
> int b_io_error; /* internal IO error state */
> wait_queue_head_t b_waiters; /* unpin waiters */
> struct list_head b_list;
> - struct xfs_perag *b_pag; /* contains rbtree root */
> + struct xfs_perag *b_pag;
> struct xfs_mount *b_mount;
> struct xfs_buftarg *b_target; /* buffer target (device) */
> void *b_addr; /* virtual address of buffer */
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 04/15] xfs: move xfs_buf_iowait out of (__)xfs_buf_submit
2025-01-06 9:54 ` [PATCH 04/15] xfs: move xfs_buf_iowait out of (__)xfs_buf_submit Christoph Hellwig
@ 2025-01-07 2:02 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 2:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:41AM +0100, Christoph Hellwig wrote:
> There is no good reason to pass a bool argument to wait for a buffer when
> the callers that want that can easily just wait themselves.
>
> This means the wait moves out of the extra hold of the buffer, but as the
> callers of synchronous buffer I/O need to hold a reference anyway that is
> perfectly fine.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
That's much cleaner...
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 36 +++++++++++++++++-------------------
> 1 file changed, 17 insertions(+), 19 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 1927655fed13..a3484421a6d8 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -52,14 +52,8 @@ struct kmem_cache *xfs_buf_cache;
> * b_lock (trylock due to inversion)
> */
>
> -static int __xfs_buf_submit(struct xfs_buf *bp, bool wait);
> -
> -static inline int
> -xfs_buf_submit(
> - struct xfs_buf *bp)
> -{
> - return __xfs_buf_submit(bp, !(bp->b_flags & XBF_ASYNC));
> -}
> +static int xfs_buf_submit(struct xfs_buf *bp);
> +static int xfs_buf_iowait(struct xfs_buf *bp);
>
> static inline bool xfs_buf_is_uncached(struct xfs_buf *bp)
> {
> @@ -797,13 +791,18 @@ _xfs_buf_read(
> struct xfs_buf *bp,
> xfs_buf_flags_t flags)
> {
> + int error;
> +
> ASSERT(!(flags & XBF_WRITE));
> ASSERT(bp->b_maps[0].bm_bn != XFS_BUF_DADDR_NULL);
>
> bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD | XBF_DONE);
> bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
>
> - return xfs_buf_submit(bp);
> + error = xfs_buf_submit(bp);
> + if (!error && !(flags & XBF_ASYNC))
> + error = xfs_buf_iowait(bp);
> + return error;
> }
>
> /*
> @@ -978,9 +977,10 @@ xfs_buf_read_uncached(
> bp->b_flags |= XBF_READ;
> bp->b_ops = ops;
>
> - xfs_buf_submit(bp);
> - if (bp->b_error) {
> - error = bp->b_error;
> + error = xfs_buf_submit(bp);
> + if (!error)
> + error = xfs_buf_iowait(bp);
> + if (error) {
> xfs_buf_relse(bp);
> return error;
> }
> @@ -1483,6 +1483,8 @@ xfs_bwrite(
> XBF_DONE);
>
> error = xfs_buf_submit(bp);
> + if (!error)
> + error = xfs_buf_iowait(bp);
> if (error)
> xfs_force_shutdown(bp->b_mount, SHUTDOWN_META_IO_ERROR);
> return error;
> @@ -1698,9 +1700,8 @@ xfs_buf_iowait(
> * holds an additional reference itself.
> */
> static int
> -__xfs_buf_submit(
> - struct xfs_buf *bp,
> - bool wait)
> +xfs_buf_submit(
> + struct xfs_buf *bp)
> {
> int error = 0;
>
> @@ -1764,9 +1765,6 @@ __xfs_buf_submit(
> xfs_buf_ioend_async(bp);
> }
>
> - if (wait)
> - error = xfs_buf_iowait(bp);
> -
> /*
> * Release the hold that keeps the buffer referenced for the entire
> * I/O. Note that if the buffer is async, it is not safe to reference
> @@ -2322,7 +2320,7 @@ xfs_buf_delwri_submit_buffers(
> bp->b_flags |= XBF_ASYNC;
> xfs_buf_list_del(bp);
> }
> - __xfs_buf_submit(bp, false);
> + xfs_buf_submit(bp);
> }
> blk_finish_plug(&plug);
>
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf
2025-01-06 9:54 ` [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf Christoph Hellwig
@ 2025-01-07 2:08 ` Darrick J. Wong
2025-01-07 6:06 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 2:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:42AM +0100, Christoph Hellwig wrote:
> xfs_buf_delwri_pushbuf synchronously writes a buffer that is on a delwri
> list already. Instead of doing a complicated dance with the delwri
> and wait list, just leave them alone and open code the actual buffer
> write.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_buf.c | 33 ++++++++-------------------------
> 1 file changed, 8 insertions(+), 25 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index a3484421a6d8..7edd7a1e9dae 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -2391,14 +2391,9 @@ xfs_buf_delwri_submit(
> * Push a single buffer on a delwri queue.
> *
> * The purpose of this function is to submit a single buffer of a delwri queue
> - * and return with the buffer still on the original queue. The waiting delwri
> - * buffer submission infrastructure guarantees transfer of the delwri queue
> - * buffer reference to a temporary wait list. We reuse this infrastructure to
> - * transfer the buffer back to the original queue.
> + * and return with the buffer still on the original queue.
> *
> - * Note the buffer transitions from the queued state, to the submitted and wait
> - * listed state and back to the queued state during this call. The buffer
> - * locking and queue management logic between _delwri_pushbuf() and
> + * The buffer locking and queue management logic between _delwri_pushbuf() and
> * _delwri_queue() guarantee that the buffer cannot be queued to another list
> * before returning.
> */
> @@ -2407,33 +2402,21 @@ xfs_buf_delwri_pushbuf(
> struct xfs_buf *bp,
> struct list_head *buffer_list)
> {
> - LIST_HEAD (submit_list);
> int error;
>
> ASSERT(bp->b_flags & _XBF_DELWRI_Q);
>
> trace_xfs_buf_delwri_pushbuf(bp, _RET_IP_);
>
> - /*
> - * Isolate the buffer to a new local list so we can submit it for I/O
> - * independently from the rest of the original list.
> - */
> xfs_buf_lock(bp);
> - list_move(&bp->b_list, &submit_list);
> - xfs_buf_unlock(bp);
> -
> - /*
> - * Delwri submission clears the DELWRI_Q buffer flag and returns with
> - * the buffer on the wait list with the original reference. Rather than
> - * bounce the buffer from a local wait list back to the original list
> - * after I/O completion, reuse the original list as the wait list.
> - */
> - xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
> + bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC);
> + bp->b_flags |= XBF_WRITE;
> + xfs_buf_submit(bp);
Why is it ok to ignore the return value here? Is it because the only
error path in xfs_buf_submit is the xlog_is_shutdown case, in which case
the buffer ioend will have been called already and the EIO will be
returned by xfs_buf_iowait?
--D
>
> /*
> - * The buffer is now locked, under I/O and wait listed on the original
> - * delwri queue. Wait for I/O completion, restore the DELWRI_Q flag and
> - * return with the buffer unlocked and on the original queue.
> + * The buffer is now locked, under I/O but still on the original delwri
> + * queue. Wait for I/O completion, restore the DELWRI_Q flag and
> + * return with the buffer unlocked and still on the original queue.
> */
> error = xfs_buf_iowait(bp);
> bp->b_flags |= _XBF_DELWRI_Q;
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets
2025-01-07 2:00 ` Darrick J. Wong
@ 2025-01-07 6:05 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-07 6:05 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 06:00:35PM -0800, Darrick J. Wong wrote:
> Cc: <stable@vger.kernel.org> # v6.9
>
> though I think backporting isn't strictly necessary because in-memory
> buffers don't have log items, right? If so, then we don't need to cc
> stable.
The stable CC is fine with me. But as I tried to explain, I think
it's not really causing problems right now, just a double call
to read_verify. It is a bit of a landmine, though.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf
2025-01-07 2:08 ` Darrick J. Wong
@ 2025-01-07 6:06 ` Christoph Hellwig
2025-01-13 7:12 ` Darrick J. Wong
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-07 6:06 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 06:08:10PM -0800, Darrick J. Wong wrote:
> > - * after I/O completion, reuse the original list as the wait list.
> > - */
> > - xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
> > + bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC);
> > + bp->b_flags |= XBF_WRITE;
> > + xfs_buf_submit(bp);
>
> Why is it ok to ignore the return value here? Is it because the only
> error path in xfs_buf_submit is the xlog_is_shutdown case, in which case
> the buffer ioend will have been called already and the EIO will be
> returned by xfs_buf_iowait?
A very good question to ask the author of the original
xfs_buf_delwri_submit_buffers code that this got extracted from :)
I think your provided answer is correct, and it also implies that we
should either get rid of the xfs_buf_submit return value or check it
more consistently.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 06/15] xfs: remove xfs_buf_delwri_submit_buffers
2025-01-06 9:54 ` [PATCH 06/15] xfs: remove xfs_buf_delwri_submit_buffers Christoph Hellwig
@ 2025-01-07 6:31 ` Darrick J. Wong
2025-01-07 6:33 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:31 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:43AM +0100, Christoph Hellwig wrote:
> xfs_buf_delwri_submit_buffers has two callers for synchronous and
> asynchronous writes that share very little logic. Split out a helper for
> the shared per-buffer loop and otherwise open code the submission in the
> two callers.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_buf.c | 121 +++++++++++++++++++++--------------------------
> 1 file changed, 55 insertions(+), 66 deletions(-)
Sheesh, splitting a function into two reduces line count by 11??
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 7edd7a1e9dae..e48d796c786b 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -2259,72 +2259,26 @@ xfs_buf_cmp(
> return 0;
> }
>
> -/*
> - * Submit buffers for write. If wait_list is specified, the buffers are
> - * submitted using sync I/O and placed on the wait list such that the caller can
> - * iowait each buffer. Otherwise async I/O is used and the buffers are released
> - * at I/O completion time. In either case, buffers remain locked until I/O
> - * completes and the buffer is released from the queue.
> - */
> -static int
> -xfs_buf_delwri_submit_buffers(
> - struct list_head *buffer_list,
> - struct list_head *wait_list)
> +static bool
> +xfs_buf_delwri_submit_prep(
> + struct xfs_buf *bp)
> {
> - struct xfs_buf *bp, *n;
> - int pinned = 0;
> - struct blk_plug plug;
> -
> - list_sort(NULL, buffer_list, xfs_buf_cmp);
> -
> - blk_start_plug(&plug);
> - list_for_each_entry_safe(bp, n, buffer_list, b_list) {
> - if (!wait_list) {
> - if (!xfs_buf_trylock(bp))
> - continue;
> - if (xfs_buf_ispinned(bp)) {
> - xfs_buf_unlock(bp);
> - pinned++;
> - continue;
> - }
> - } else {
> - xfs_buf_lock(bp);
> - }
> -
> - /*
> - * Someone else might have written the buffer synchronously or
> - * marked it stale in the meantime. In that case only the
> - * _XBF_DELWRI_Q flag got cleared, and we have to drop the
> - * reference and remove it from the list here.
> - */
> - if (!(bp->b_flags & _XBF_DELWRI_Q)) {
> - xfs_buf_list_del(bp);
> - xfs_buf_relse(bp);
> - continue;
> - }
> -
> - trace_xfs_buf_delwri_split(bp, _RET_IP_);
> -
> - /*
> - * If we have a wait list, each buffer (and associated delwri
> - * queue reference) transfers to it and is submitted
> - * synchronously. Otherwise, drop the buffer from the delwri
> - * queue and submit async.
> - */
> - bp->b_flags &= ~_XBF_DELWRI_Q;
> - bp->b_flags |= XBF_WRITE;
> - if (wait_list) {
> - bp->b_flags &= ~XBF_ASYNC;
> - list_move_tail(&bp->b_list, wait_list);
> - } else {
> - bp->b_flags |= XBF_ASYNC;
> - xfs_buf_list_del(bp);
> - }
> - xfs_buf_submit(bp);
> + /*
> + * Someone else might have written the buffer synchronously or marked it
> + * stale in the meantime. In that case only the _XBF_DELWRI_Q flag got
> + * cleared, and we have to drop the reference and remove it from the
> + * list here.
> + */
> + if (!(bp->b_flags & _XBF_DELWRI_Q)) {
> + xfs_buf_list_del(bp);
> + xfs_buf_relse(bp);
> + return false;
> }
> - blk_finish_plug(&plug);
>
> - return pinned;
> + trace_xfs_buf_delwri_split(bp, _RET_IP_);
> + bp->b_flags &= ~_XBF_DELWRI_Q;
> + bp->b_flags |= XBF_WRITE;
> + return true;
> }
>
> /*
> @@ -2347,7 +2301,30 @@ int
> xfs_buf_delwri_submit_nowait(
> struct list_head *buffer_list)
> {
> - return xfs_buf_delwri_submit_buffers(buffer_list, NULL);
> + struct xfs_buf *bp, *n;
> + int pinned = 0;
> + struct blk_plug plug;
> +
> + list_sort(NULL, buffer_list, xfs_buf_cmp);
> +
> + blk_start_plug(&plug);
> + list_for_each_entry_safe(bp, n, buffer_list, b_list) {
> + if (!xfs_buf_trylock(bp))
> + continue;
> + if (xfs_buf_ispinned(bp)) {
> + xfs_buf_unlock(bp);
> + pinned++;
> + continue;
> + }
> + if (!xfs_buf_delwri_submit_prep(bp))
> + continue;
> + bp->b_flags |= XBF_ASYNC;
> + xfs_buf_list_del(bp);
> + xfs_buf_submit(bp);
> + }
> + blk_finish_plug(&plug);
> +
> + return pinned;
> }
>
> /*
> @@ -2364,9 +2341,21 @@ xfs_buf_delwri_submit(
> {
> LIST_HEAD (wait_list);
> int error = 0, error2;
> - struct xfs_buf *bp;
> + struct xfs_buf *bp, *n;
> + struct blk_plug plug;
>
> - xfs_buf_delwri_submit_buffers(buffer_list, &wait_list);
> + list_sort(NULL, buffer_list, xfs_buf_cmp);
> +
> + blk_start_plug(&plug);
> + list_for_each_entry_safe(bp, n, buffer_list, b_list) {
> + xfs_buf_lock(bp);
> + if (!xfs_buf_delwri_submit_prep(bp))
> + continue;
> + bp->b_flags &= ~XBF_ASYNC;
> + list_move_tail(&bp->b_list, &wait_list);
> + xfs_buf_submit(bp);
> + }
> + blk_finish_plug(&plug);
>
> /* Wait for IO to complete. */
> while (!list_empty(&wait_list)) {
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 06/15] xfs: remove xfs_buf_delwri_submit_buffers
2025-01-07 6:31 ` Darrick J. Wong
@ 2025-01-07 6:33 ` Christoph Hellwig
0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-07 6:33 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:31:48PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 06, 2025 at 10:54:43AM +0100, Christoph Hellwig wrote:
> > xfs_buf_delwri_submit_buffers has two callers for synchronous and
> > asynchronous writes that share very little logic. Split out a helper for
> > the shared per-buffer loop and otherwise open code the submission in the
> > two callers.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > fs/xfs/xfs_buf.c | 121 +++++++++++++++++++++--------------------------
> > 1 file changed, 55 insertions(+), 66 deletions(-)
>
> Sheesh, splitting a function into two reduces line count by 11??
Well, most of that is the comment on the low-level function saying it
does either the thing the one caller wants, which is already documented
in that caller, or the thing the other caller wants, which is already
documented in that caller.
* Re: [PATCH 07/15] xfs: move write verification out of _xfs_buf_ioapply
2025-01-06 9:54 ` [PATCH 07/15] xfs: move write verification out of _xfs_buf_ioapply Christoph Hellwig
@ 2025-01-07 6:33 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:33 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:44AM +0100, Christoph Hellwig wrote:
> Split the write verification logic out of _xfs_buf_ioapply into a new
> xfs_buf_verify_write helper called by xfs_buf_submit given that it isn't
> about applying the I/O and doesn't really fit in with the rest of
> _xfs_buf_ioapply.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Yeah, it's useful to break up this function...
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 67 ++++++++++++++++++++++++++----------------------
> 1 file changed, 37 insertions(+), 30 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index e48d796c786b..18e830c4e990 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1615,36 +1615,6 @@ _xfs_buf_ioapply(
>
> if (bp->b_flags & XBF_WRITE) {
> op = REQ_OP_WRITE;
> -
> - /*
> - * Run the write verifier callback function if it exists. If
> - * this function fails it will mark the buffer with an error and
> - * the IO should not be dispatched.
> - */
> - if (bp->b_ops) {
> - bp->b_ops->verify_write(bp);
> - if (bp->b_error) {
> - xfs_force_shutdown(bp->b_mount,
> - SHUTDOWN_CORRUPT_INCORE);
> - return;
> - }
> - } else if (bp->b_rhash_key != XFS_BUF_DADDR_NULL) {
> - struct xfs_mount *mp = bp->b_mount;
> -
> - /*
> - * non-crc filesystems don't attach verifiers during
> - * log recovery, so don't warn for such filesystems.
> - */
> - if (xfs_has_crc(mp)) {
> - xfs_warn(mp,
> - "%s: no buf ops on daddr 0x%llx len %d",
> - __func__, xfs_buf_daddr(bp),
> - bp->b_length);
> - xfs_hex_dump(bp->b_addr,
> - XFS_CORRUPTION_DUMP_LEN);
> - dump_stack();
> - }
> - }
> } else {
> op = REQ_OP_READ;
> if (bp->b_flags & XBF_READ_AHEAD)
> @@ -1693,6 +1663,36 @@ xfs_buf_iowait(
> return bp->b_error;
> }
>
> +/*
> + * Run the write verifier callback function if it exists. If this fails, mark
> + * the buffer with an error and do not dispatch the I/O.
> + */
> +static bool
> +xfs_buf_verify_write(
> + struct xfs_buf *bp)
> +{
> + if (bp->b_ops) {
> + bp->b_ops->verify_write(bp);
> + if (bp->b_error)
> + return false;
> + } else if (bp->b_rhash_key != XFS_BUF_DADDR_NULL) {
> + /*
> + * Non-crc filesystems don't attach verifiers during log
> + * recovery, so don't warn for such filesystems.
> + */
> + if (xfs_has_crc(bp->b_mount)) {
> + xfs_warn(bp->b_mount,
> + "%s: no buf ops on daddr 0x%llx len %d",
> + __func__, xfs_buf_daddr(bp),
> + bp->b_length);
> + xfs_hex_dump(bp->b_addr, XFS_CORRUPTION_DUMP_LEN);
> + dump_stack();
> + }
> + }
> +
> + return true;
> +}
> +
> /*
> * Buffer I/O submission path, read or write. Asynchronous submission transfers
> * the buffer lock ownership and the current reference to the IO. It is not
> @@ -1751,8 +1751,15 @@ xfs_buf_submit(
> atomic_set(&bp->b_io_remaining, 1);
> if (bp->b_flags & XBF_ASYNC)
> xfs_buf_ioacct_inc(bp);
> +
> + if ((bp->b_flags & XBF_WRITE) && !xfs_buf_verify_write(bp)) {
> + xfs_force_shutdown(bp->b_mount, SHUTDOWN_CORRUPT_INCORE);
> + goto done;
> + }
> +
> _xfs_buf_ioapply(bp);
>
> +done:
> /*
> * If _xfs_buf_ioapply failed, we can get back here with only the IO
> * reference we took above. If we drop it to zero, run completion so
> --
> 2.45.2
>
>
* Re: [PATCH 08/15] xfs: move in-memory buftarg handling out of _xfs_buf_ioapply
2025-01-06 9:54 ` [PATCH 08/15] xfs: move in-memory buftarg handling " Christoph Hellwig
@ 2025-01-07 6:34 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:34 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:45AM +0100, Christoph Hellwig wrote:
> No I/O to apply for in-memory buffers, so skip the function call
> entirely. Clean up the b_io_error initialization logic to allow
> for this.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Nice cleanup!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 20 +++++++++-----------
> 1 file changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 18e830c4e990..e886605b5721 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1607,12 +1607,6 @@ _xfs_buf_ioapply(
> int size;
> int i;
>
> - /*
> - * Make sure we capture only current IO errors rather than stale errors
> - * left over from previous use of the buffer (e.g. failed readahead).
> - */
> - bp->b_error = 0;
> -
> if (bp->b_flags & XBF_WRITE) {
> op = REQ_OP_WRITE;
> } else {
> @@ -1624,10 +1618,6 @@ _xfs_buf_ioapply(
> /* we only use the buffer cache for meta-data */
> op |= REQ_META;
>
> - /* in-memory targets are directly mapped, no IO required. */
> - if (xfs_buftarg_is_mem(bp->b_target))
> - return;
> -
> /*
> * Walk all the vectors issuing IO on them. Set up the initial offset
> * into the buffer and the desired IO size before we start -
> @@ -1740,7 +1730,11 @@ xfs_buf_submit(
> if (bp->b_flags & XBF_WRITE)
> xfs_buf_wait_unpin(bp);
>
> - /* clear the internal error state to avoid spurious errors */
> + /*
> + * Make sure we capture only current IO errors rather than stale errors
> + * left over from previous use of the buffer (e.g. failed readahead).
> + */
> + bp->b_error = 0;
> bp->b_io_error = 0;
>
> /*
> @@ -1757,6 +1751,10 @@ xfs_buf_submit(
> goto done;
> }
>
> + /* In-memory targets are directly mapped, no I/O required. */
> + if (xfs_buftarg_is_mem(bp->b_target))
> + goto done;
> +
> _xfs_buf_ioapply(bp);
>
> done:
> --
> 2.45.2
>
>
* Re: [PATCH 09/15] xfs: simplify buffer I/O submission
2025-01-06 9:54 ` [PATCH 09/15] xfs: simplify buffer I/O submission Christoph Hellwig
@ 2025-01-07 6:42 ` Darrick J. Wong
2025-01-07 6:46 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:42 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:46AM +0100, Christoph Hellwig wrote:
> The code in _xfs_buf_ioapply is unnecessarily complicated because it
> doesn't take advantage of modern bio features.
>
> Simplify it by making use of bio splitting and chaining, that is, build
> a single bio for the pages in the buffer using a simple loop, and then
> split that bio on the map boundaries for discontiguous multi-FSB buffers
> and chain the split bios to the main one so that there is only a single
> I/O completion.
>
> This not only simplifies the code that builds the buffer, but also
> removes the need for the b_io_remaining field, as buffer ownership is
> granted to the bio on submit of the final bio with no chance of a
> completion before that, and for the b_io_error field, which is now
> superfluous because there is always exactly one completion.
So I guess b_io_remaining was the count of in-flight bios? And making a
chain of bios basically just moves all that to the bio layer, so all
xfs_buf needs to know is whether or not a bio is in progress, right?
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_buf.c | 207 ++++++++++++++---------------------------------
> fs/xfs/xfs_buf.h | 2 -
> 2 files changed, 60 insertions(+), 149 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index e886605b5721..094f16457998 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1364,13 +1364,6 @@ xfs_buf_ioend(
> {
> trace_xfs_buf_iodone(bp, _RET_IP_);
>
> - /*
> - * Pull in IO completion errors now. We are guaranteed to be running
> - * single threaded, so we don't need the lock to read b_io_error.
> - */
> - if (!bp->b_error && bp->b_io_error)
> - xfs_buf_ioerror(bp, bp->b_io_error);
> -
> if (bp->b_flags & XBF_READ) {
> if (!bp->b_error && bp->b_ops)
> bp->b_ops->verify_read(bp);
> @@ -1494,118 +1487,26 @@ static void
> xfs_buf_bio_end_io(
> struct bio *bio)
> {
> - struct xfs_buf *bp = (struct xfs_buf *)bio->bi_private;
> + struct xfs_buf *bp = bio->bi_private;
>
> - if (!bio->bi_status &&
> - (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> - XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> - bio->bi_status = BLK_STS_IOERR;
> -
> - /*
> - * don't overwrite existing errors - otherwise we can lose errors on
> - * buffers that require multiple bios to complete.
> - */
> - if (bio->bi_status) {
> - int error = blk_status_to_errno(bio->bi_status);
> -
> - cmpxchg(&bp->b_io_error, 0, error);
> - }
> + if (bio->bi_status)
> + xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
> + else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> + XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> + xfs_buf_ioerror(bp, -EIO);
>
> if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
> invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
>
> - if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
> - xfs_buf_ioend_async(bp);
> + xfs_buf_ioend_async(bp);
> bio_put(bio);
> }
>
> -static void
> -xfs_buf_ioapply_map(
> - struct xfs_buf *bp,
> - int map,
> - int *buf_offset,
> - int *count,
> - blk_opf_t op)
> -{
> - int page_index;
> - unsigned int total_nr_pages = bp->b_page_count;
> - int nr_pages;
> - struct bio *bio;
> - sector_t sector = bp->b_maps[map].bm_bn;
> - int size;
> - int offset;
> -
> - /* skip the pages in the buffer before the start offset */
> - page_index = 0;
> - offset = *buf_offset;
> - while (offset >= PAGE_SIZE) {
> - page_index++;
> - offset -= PAGE_SIZE;
> - }
> -
> - /*
> - * Limit the IO size to the length of the current vector, and update the
> - * remaining IO count for the next time around.
> - */
> - size = min_t(int, BBTOB(bp->b_maps[map].bm_len), *count);
> - *count -= size;
> - *buf_offset += size;
> -
> -next_chunk:
> - atomic_inc(&bp->b_io_remaining);
> - nr_pages = bio_max_segs(total_nr_pages);
> -
> - bio = bio_alloc(bp->b_target->bt_bdev, nr_pages, op, GFP_NOIO);
> - bio->bi_iter.bi_sector = sector;
> - bio->bi_end_io = xfs_buf_bio_end_io;
> - bio->bi_private = bp;
> -
> - for (; size && nr_pages; nr_pages--, page_index++) {
> - int rbytes, nbytes = PAGE_SIZE - offset;
> -
> - if (nbytes > size)
> - nbytes = size;
> -
> - rbytes = bio_add_page(bio, bp->b_pages[page_index], nbytes,
> - offset);
> - if (rbytes < nbytes)
> - break;
> -
> - offset = 0;
> - sector += BTOBB(nbytes);
> - size -= nbytes;
> - total_nr_pages--;
> - }
> -
> - if (likely(bio->bi_iter.bi_size)) {
> - if (xfs_buf_is_vmapped(bp)) {
> - flush_kernel_vmap_range(bp->b_addr,
> - xfs_buf_vmap_len(bp));
> - }
> - submit_bio(bio);
> - if (size)
> - goto next_chunk;
> - } else {
> - /*
> - * This is guaranteed not to be the last io reference count
> - * because the caller (xfs_buf_submit) holds a count itself.
> - */
> - atomic_dec(&bp->b_io_remaining);
> - xfs_buf_ioerror(bp, -EIO);
> - bio_put(bio);
> - }
> -
> -}
> -
> -STATIC void
> -_xfs_buf_ioapply(
> - struct xfs_buf *bp)
> +static inline blk_opf_t
> +xfs_buf_bio_op(
> + struct xfs_buf *bp)
> {
> - struct blk_plug plug;
> - blk_opf_t op;
> - int offset;
> - int size;
> - int i;
> + blk_opf_t op;
>
> if (bp->b_flags & XBF_WRITE) {
> op = REQ_OP_WRITE;
> @@ -1615,25 +1516,52 @@ _xfs_buf_ioapply(
> op |= REQ_RAHEAD;
> }
>
> - /* we only use the buffer cache for meta-data */
> - op |= REQ_META;
> + return op | REQ_META;
> +}
> +
> +static void
> +xfs_buf_submit_bio(
> + struct xfs_buf *bp)
> +{
> + unsigned int size = BBTOB(bp->b_length);
> + unsigned int map = 0, p;
> + struct blk_plug plug;
> + struct bio *bio;
> +
> + bio = bio_alloc(bp->b_target->bt_bdev, bp->b_page_count,
> + xfs_buf_bio_op(bp), GFP_NOIO);
> + bio->bi_private = bp;
> + bio->bi_end_io = xfs_buf_bio_end_io;
> +
> + if (bp->b_flags & _XBF_KMEM) {
> + __bio_add_page(bio, virt_to_page(bp->b_addr), size,
> + bp->b_offset);
> + } else {
> + for (p = 0; p < bp->b_page_count; p++)
> + __bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
> + bio->bi_iter.bi_size = size; /* limit to the actual size used */
> +
> + if (xfs_buf_is_vmapped(bp))
> + flush_kernel_vmap_range(bp->b_addr,
> + xfs_buf_vmap_len(bp));
> + }
>
> /*
> - * Walk all the vectors issuing IO on them. Set up the initial offset
> - * into the buffer and the desired IO size before we start -
> - * _xfs_buf_ioapply_vec() will modify them appropriately for each
> - * subsequent call.
> + * If there is more than one map segment, split the original bio to
> + * submit a separate bio for each discontiguous range.
> */
> - offset = bp->b_offset;
> - size = BBTOB(bp->b_length);
> blk_start_plug(&plug);
> - for (i = 0; i < bp->b_map_count; i++) {
> - xfs_buf_ioapply_map(bp, i, &offset, &size, op);
> - if (bp->b_error)
> - break;
> - if (size <= 0)
> - break; /* all done */
Eerrugh, I spent quite a while trying to wrap my head around the old
code when I was writing the in-memory xfs_buf support. This is much
less weird to look at...
> + for (map = 0; map < bp->b_map_count - 1; map++) {
...but why isn't this "map < bp->b_map_count"?
--D
> + struct bio *split;
> +
> + split = bio_split(bio, bp->b_maps[map].bm_len, GFP_NOFS,
> + &fs_bio_set);
> + split->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
> + bio_chain(split, bio);
> + submit_bio(split);
> }
> + bio->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
> + submit_bio(bio);
> blk_finish_plug(&plug);
> }
>
> @@ -1693,8 +1621,6 @@ static int
> xfs_buf_submit(
> struct xfs_buf *bp)
> {
> - int error = 0;
> -
> trace_xfs_buf_submit(bp, _RET_IP_);
>
> ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
> @@ -1735,14 +1661,7 @@ xfs_buf_submit(
> * left over from previous use of the buffer (e.g. failed readahead).
> */
> bp->b_error = 0;
> - bp->b_io_error = 0;
>
> - /*
> - * Set the count to 1 initially, this will stop an I/O completion
> - * callout which happens before we have started all the I/O from calling
> - * xfs_buf_ioend too early.
> - */
> - atomic_set(&bp->b_io_remaining, 1);
> if (bp->b_flags & XBF_ASYNC)
> xfs_buf_ioacct_inc(bp);
>
> @@ -1755,28 +1674,22 @@ xfs_buf_submit(
> if (xfs_buftarg_is_mem(bp->b_target))
> goto done;
>
> - _xfs_buf_ioapply(bp);
> + xfs_buf_submit_bio(bp);
> + goto rele;
>
> done:
> - /*
> - * If _xfs_buf_ioapply failed, we can get back here with only the IO
> - * reference we took above. If we drop it to zero, run completion so
> - * that we don't return to the caller with completion still pending.
> - */
> - if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> - if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> - xfs_buf_ioend(bp);
> - else
> - xfs_buf_ioend_async(bp);
> - }
> -
> + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> + xfs_buf_ioend(bp);
> + else
> + xfs_buf_ioend_async(bp);
> +rele:
> /*
> * Release the hold that keeps the buffer referenced for the entire
> * I/O. Note that if the buffer is async, it is not safe to reference
> * after this release.
> */
> xfs_buf_rele(bp);
> - return error;
> + return 0;
> }
>
> void *
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index da80399c7457..c53d27439ff2 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -184,7 +184,6 @@ struct xfs_buf {
> struct list_head b_lru; /* lru list */
> spinlock_t b_lock; /* internal state lock */
> unsigned int b_state; /* internal state flags */
> - int b_io_error; /* internal IO error state */
> wait_queue_head_t b_waiters; /* unpin waiters */
> struct list_head b_list;
> struct xfs_perag *b_pag;
> @@ -202,7 +201,6 @@ struct xfs_buf {
> struct xfs_buf_map __b_map; /* inline compound buffer map */
> int b_map_count;
> atomic_t b_pin_count; /* pin count */
> - atomic_t b_io_remaining; /* #outstanding I/O requests */
> unsigned int b_page_count; /* size of page array */
> unsigned int b_offset; /* page offset of b_addr,
> only for _XBF_KMEM buffers */
> --
> 2.45.2
>
>
* Re: [PATCH 10/15] xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend
2025-01-06 9:54 ` [PATCH 10/15] xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend Christoph Hellwig
@ 2025-01-07 6:42 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:42 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:47AM +0100, Christoph Hellwig wrote:
> Invalidating cache lines can be fairly expensive, so don't do it
> in interrupt context. Note that in practice very few setups will
> actually do anything here as virtually indexed caches are rather
> uncommon, but we might as well move the call to the proper place
> while touching this area.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Seems fine to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 094f16457998..49df4adf0e98 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1365,6 +1365,9 @@ xfs_buf_ioend(
> trace_xfs_buf_iodone(bp, _RET_IP_);
>
> if (bp->b_flags & XBF_READ) {
> + if (!bp->b_error && xfs_buf_is_vmapped(bp))
> + invalidate_kernel_vmap_range(bp->b_addr,
> + xfs_buf_vmap_len(bp));
> if (!bp->b_error && bp->b_ops)
> bp->b_ops->verify_read(bp);
> if (!bp->b_error)
> @@ -1495,9 +1498,6 @@ xfs_buf_bio_end_io(
> XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> xfs_buf_ioerror(bp, -EIO);
>
> - if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
> - invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
> -
> xfs_buf_ioend_async(bp);
> bio_put(bio);
> }
> --
> 2.45.2
>
>
* Re: [PATCH 09/15] xfs: simplify buffer I/O submission
2025-01-07 6:42 ` Darrick J. Wong
@ 2025-01-07 6:46 ` Christoph Hellwig
2025-01-07 6:57 ` Darrick J. Wong
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-07 6:46 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:42:24PM -0800, Darrick J. Wong wrote:
> So I guess b_io_remaining was the count of in-flight bios?
Yes. Plus a bias of 1 for the submitting context so that the
completion isn't run until all bios are submitted.
> And making a
> chain of bios basically just moves all that to the bio layer, so all
> xfs_buf needs to know is whether or not a bio is in progress, right?
Yes.
> Eerrugh, I spent quite a while trying to wrap my head around the old
> code when I was writing the in-memory xfs_buf support. This is much
> less weird to look at...
>
> > + for (map = 0; map < bp->b_map_count - 1; map++) {
>
> ...but why isn't this "map < bp->b_map_count"?
Because the final ("main") bio is submitted outside the loop as the
loop body chains the ones before to it. I guess this should go into
a comment to confuse the readers a little less.
* Re: [PATCH 12/15] xfs: always complete the buffer inline in xfs_buf_submit
2025-01-06 9:54 ` [PATCH 12/15] xfs: always complete the buffer inline " Christoph Hellwig
@ 2025-01-07 6:46 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:46 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:49AM +0100, Christoph Hellwig wrote:
> xfs_buf_submit now only completes a buffer on error, or for in-memory
> buftargs. There is no point in using a workqueue for the latter as
> the completion will just wake up the caller. Optimize this case by
> avoiding the workqueue roundtrip.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
This all seems simpler now...
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 352cc50aeea5..0ad3cacfdba1 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1670,10 +1670,7 @@ xfs_buf_submit(
> xfs_buf_submit_bio(bp);
> return 0;
> done:
> - if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> - xfs_buf_ioend(bp);
> - else
> - xfs_buf_ioend_async(bp);
> + xfs_buf_ioend(bp);
> return 0;
> }
>
> --
> 2.45.2
>
>
* Re: [PATCH 13/15] xfs: simplify xfsaild_resubmit_item
2025-01-06 9:54 ` [PATCH 13/15] xfs: simplify xfsaild_resubmit_item Christoph Hellwig
@ 2025-01-07 6:49 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:49 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:50AM +0100, Christoph Hellwig wrote:
> Since commit acc8f8628c37 ("xfs: attach dquot buffer to dquot log item
> buffer") all buf items that use bp->b_li_list are explicitly checked for
> in the branch that just clears XFS_LI_FAILED. Remove the dead arm that
> calls xfs_clear_li_failed.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Makes sense to me
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_trans_ail.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index f56d62dced97..0fcb1828e598 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -359,13 +359,8 @@ xfsaild_resubmit_item(
> }
>
> /* protected by ail_lock */
> - list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
> - if (bp->b_flags & (_XBF_INODES | _XBF_DQUOTS))
> - clear_bit(XFS_LI_FAILED, &lip->li_flags);
> - else
> - xfs_clear_li_failed(lip);
> - }
> -
> + list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
> + clear_bit(XFS_LI_FAILED, &lip->li_flags);
> xfs_buf_unlock(bp);
> return XFS_ITEM_SUCCESS;
> }
> --
> 2.45.2
>
>
* Re: [PATCH 14/15] xfs: move b_li_list based retry handling to common code
2025-01-06 9:54 ` [PATCH 14/15] xfs: move b_li_list based retry handling to common code Christoph Hellwig
@ 2025-01-07 6:55 ` Darrick J. Wong
2025-01-07 7:03 ` Christoph Hellwig
0 siblings, 1 reply; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:55 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:51AM +0100, Christoph Hellwig wrote:
> The dquot and inode versions are very similar, which is expected given the
> overall b_li_list logic. The differences are that the inode version also
> clears the XFS_LI_FLUSHING flag, which is defined in common code but only ever set
> by the inode item, and that the dquot version takes the ail_lock over
> the list iteration. While this seems sensible given that additions and
> removals from b_li_list are protected by the ail_lock, log items are
> only added before buffer submission, and are only removed when completing
> the buffer, so nothing can change the list when retrying a buffer.
Heh, I think that's not quite true -- I think xfs_dquot_detach_buf
actually has a bug where it needs to take the buffer lock before
detaching the dquot from the b_li_list. And I think kfence just whacked
me for that on tonight's fstests run.
But that's neither here nor there. Moving along...
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_buf.c | 12 ++++++------
> fs/xfs/xfs_buf_item.h | 5 -----
> fs/xfs/xfs_dquot.c | 12 ------------
> fs/xfs/xfs_inode_item.c | 12 ------------
> 4 files changed, 6 insertions(+), 35 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 0ad3cacfdba1..1cf5d14d0d06 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1288,6 +1288,7 @@ xfs_buf_ioend_handle_error(
> {
> struct xfs_mount *mp = bp->b_mount;
> struct xfs_error_cfg *cfg;
> + struct xfs_log_item *lip;
>
> /*
> * If we've already shutdown the journal because of I/O errors, there's
> @@ -1335,12 +1336,11 @@ xfs_buf_ioend_handle_error(
> }
>
> /* Still considered a transient error. Caller will schedule retries. */
> - if (bp->b_flags & _XBF_INODES)
> - xfs_buf_inode_io_fail(bp);
> - else if (bp->b_flags & _XBF_DQUOTS)
> - xfs_buf_dquot_io_fail(bp);
> - else
> - ASSERT(list_empty(&bp->b_li_list));
> + list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
> + set_bit(XFS_LI_FAILED, &lip->li_flags);
> + clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
Should dquot log items be setting XFS_LI_FLUSHING?
--D
> + }
> +
> xfs_buf_ioerror(bp, 0);
> xfs_buf_relse(bp);
> return true;
> diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> index 4d8a6aece995..8cde85259a58 100644
> --- a/fs/xfs/xfs_buf_item.h
> +++ b/fs/xfs/xfs_buf_item.h
> @@ -54,17 +54,12 @@ bool xfs_buf_item_put(struct xfs_buf_log_item *);
> void xfs_buf_item_log(struct xfs_buf_log_item *, uint, uint);
> bool xfs_buf_item_dirty_format(struct xfs_buf_log_item *);
> void xfs_buf_inode_iodone(struct xfs_buf *);
> -void xfs_buf_inode_io_fail(struct xfs_buf *bp);
> #ifdef CONFIG_XFS_QUOTA
> void xfs_buf_dquot_iodone(struct xfs_buf *);
> -void xfs_buf_dquot_io_fail(struct xfs_buf *bp);
> #else
> static inline void xfs_buf_dquot_iodone(struct xfs_buf *bp)
> {
> }
> -static inline void xfs_buf_dquot_io_fail(struct xfs_buf *bp)
> -{
> -}
> #endif /* CONFIG_XFS_QUOTA */
> void xfs_buf_iodone(struct xfs_buf *);
> bool xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index f11d475898f2..78dde811ab16 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1229,18 +1229,6 @@ xfs_buf_dquot_iodone(
> }
> }
>
> -void
> -xfs_buf_dquot_io_fail(
> - struct xfs_buf *bp)
> -{
> - struct xfs_log_item *lip;
> -
> - spin_lock(&bp->b_mount->m_ail->ail_lock);
> - list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
> - set_bit(XFS_LI_FAILED, &lip->li_flags);
> - spin_unlock(&bp->b_mount->m_ail->ail_lock);
> -}
> -
> /* Check incore dquot for errors before we flush. */
> static xfs_failaddr_t
> xfs_qm_dqflush_check(
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 912f0b1bc3cb..4fb2e1a6ad26 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -1023,18 +1023,6 @@ xfs_buf_inode_iodone(
> list_splice_tail(&flushed_inodes, &bp->b_li_list);
> }
>
> -void
> -xfs_buf_inode_io_fail(
> - struct xfs_buf *bp)
> -{
> - struct xfs_log_item *lip;
> -
> - list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
> - set_bit(XFS_LI_FAILED, &lip->li_flags);
> - clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
> - }
> -}
> -
> /*
> * Clear the inode logging fields so no more flushes are attempted. If we are
> * on a buffer list, it is now safe to remove it because the buffer is
> --
> 2.45.2
>
>
* Re: [PATCH 09/15] xfs: simplify buffer I/O submission
2025-01-07 6:46 ` Christoph Hellwig
@ 2025-01-07 6:57 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Tue, Jan 07, 2025 at 07:46:37AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 10:42:24PM -0800, Darrick J. Wong wrote:
> > So I guess b_io_remaining was the count of in-flight bios?
>
> Yes. Plus a bias of 1 for the submitting context so that the
> completion isn't run until all bios are submitted.
>
> > And making a
> > chain of bios basically just moves all that to the bio layer, so all
> > xfs_buf needs to know is whether or not a bio is in progress, right?
>
> Yes.
>
> > Eerrugh, I spent quite a while trying to wrap my head around the old
> > code when I was writing the in-memory xfs_buf support. This is much
> > less weird to look at...
> >
> > > + for (map = 0; map < bp->b_map_count - 1; map++) {
> >
> > ...but why isn't this "map < bp->b_map_count"?
>
> Because the final ("main") bio is submitted outside the loop as the
> loop body chains the ones before to it. I guess this should go into
> a comment to confuse the readers a little less.
Ah, right, because we start with one bio for N maps, which means we need
to split it N-1 times to end up with N (chained) bios, one for each map.
Yes, comment please. :)
--D
* Re: [PATCH 15/15] xfs: add a b_iodone callback to struct xfs_buf
2025-01-06 9:54 ` [PATCH 15/15] xfs: add a b_iodone callback to struct xfs_buf Christoph Hellwig
@ 2025-01-07 6:58 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-07 6:58 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:52AM +0100, Christoph Hellwig wrote:
> Stop open coding the log item completions and instead add a callback
> back into the submitter.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Yeah, seems reasonable to me...
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 7 ++-----
> fs/xfs/xfs_buf.h | 5 +----
> fs/xfs/xfs_dquot.c | 2 +-
> fs/xfs/xfs_inode_item.c | 2 +-
> fs/xfs/xfs_trans_buf.c | 8 ++++----
> 5 files changed, 9 insertions(+), 15 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 1cf5d14d0d06..68a5148115e5 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1394,11 +1394,8 @@ xfs_buf_ioend(
> if (bp->b_log_item)
> xfs_buf_item_done(bp);
>
> - if (bp->b_flags & _XBF_INODES)
> - xfs_buf_inode_iodone(bp);
> - else if (bp->b_flags & _XBF_DQUOTS)
> - xfs_buf_dquot_iodone(bp);
> -
> + if (bp->b_iodone)
> + bp->b_iodone(bp);
> }
>
> bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD |
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index c53d27439ff2..10bf66e074a0 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -34,8 +34,6 @@ struct xfs_buf;
> #define XBF_WRITE_FAIL (1u << 7) /* async writes have failed on this buffer */
>
> /* buffer type flags for write callbacks */
> -#define _XBF_INODES (1u << 16)/* inode buffer */
> -#define _XBF_DQUOTS (1u << 17)/* dquot buffer */
> #define _XBF_LOGRECOVERY (1u << 18)/* log recovery buffer */
>
> /* flags used only internally */
> @@ -65,8 +63,6 @@ typedef unsigned int xfs_buf_flags_t;
> { XBF_DONE, "DONE" }, \
> { XBF_STALE, "STALE" }, \
> { XBF_WRITE_FAIL, "WRITE_FAIL" }, \
> - { _XBF_INODES, "INODES" }, \
> - { _XBF_DQUOTS, "DQUOTS" }, \
> { _XBF_LOGRECOVERY, "LOG_RECOVERY" }, \
> { _XBF_PAGES, "PAGES" }, \
> { _XBF_KMEM, "KMEM" }, \
> @@ -205,6 +201,7 @@ struct xfs_buf {
> unsigned int b_offset; /* page offset of b_addr,
> only for _XBF_KMEM buffers */
> int b_error; /* error code on I/O */
> + void (*b_iodone)(struct xfs_buf *bp);
>
> /*
> * async write failure retry count. Initialised to zero on the first
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index 78dde811ab16..e0a379729674 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1446,7 +1446,7 @@ xfs_qm_dqflush(
> * Attach the dquot to the buffer so that we can remove this dquot from
> * the AIL and release the flush lock once the dquot is synced to disk.
> */
> - bp->b_flags |= _XBF_DQUOTS;
> + bp->b_iodone = xfs_buf_dquot_iodone;
> list_add_tail(&lip->li_bio_list, &bp->b_li_list);
>
> /*
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 4fb2e1a6ad26..e0990a8c4007 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -185,7 +185,7 @@ xfs_inode_item_precommit(
> xfs_buf_hold(bp);
> spin_lock(&iip->ili_lock);
> iip->ili_item.li_buf = bp;
> - bp->b_flags |= _XBF_INODES;
> + bp->b_iodone = xfs_buf_inode_iodone;
> list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list);
> xfs_trans_brelse(tp, bp);
> }
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 8e886ecfd69a..53af546c0b23 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -659,7 +659,7 @@ xfs_trans_inode_buf(
> ASSERT(atomic_read(&bip->bli_refcount) > 0);
>
> bip->bli_flags |= XFS_BLI_INODE_BUF;
> - bp->b_flags |= _XBF_INODES;
> + bp->b_iodone = xfs_buf_inode_iodone;
> xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
> }
>
> @@ -684,7 +684,7 @@ xfs_trans_stale_inode_buf(
> ASSERT(atomic_read(&bip->bli_refcount) > 0);
>
> bip->bli_flags |= XFS_BLI_STALE_INODE;
> - bp->b_flags |= _XBF_INODES;
> + bp->b_iodone = xfs_buf_inode_iodone;
> xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
> }
>
> @@ -709,7 +709,7 @@ xfs_trans_inode_alloc_buf(
> ASSERT(atomic_read(&bip->bli_refcount) > 0);
>
> bip->bli_flags |= XFS_BLI_INODE_ALLOC_BUF;
> - bp->b_flags |= _XBF_INODES;
> + bp->b_iodone = xfs_buf_inode_iodone;
> xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF);
> }
>
> @@ -820,6 +820,6 @@ xfs_trans_dquot_buf(
> break;
> }
>
> - bp->b_flags |= _XBF_DQUOTS;
> + bp->b_iodone = xfs_buf_dquot_iodone;
> xfs_trans_buf_set_type(tp, bp, type);
> }
> --
> 2.45.2
>
>
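The shape of the change in the patch above — replacing the _XBF_INODES/_XBF_DQUOTS flag tests with a callback installed by the submitter — can be modeled in a few lines of plain C. The toy_* names are hypothetical stand-ins for the xfs_buf machinery, not kernel identifiers:

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer with a submitter-installed completion callback. */
struct toy_buf {
	int iodone_calls;
	void (*b_iodone)(struct toy_buf *bp);
};

/* What a submitter (inode, dquot, ...) would install. */
static void toy_inode_iodone(struct toy_buf *bp)
{
	bp->iodone_calls++;
}

/* Completion path: no per-type flag dispatch, just the callback. */
static void toy_buf_ioend(struct toy_buf *bp)
{
	if (bp->b_iodone)
		bp->b_iodone(bp);
}
```

The completion code no longer needs to know every buffer type up front; each submitter supplies its own handler, which is the point of the patch.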
* Re: [PATCH 14/15] xfs: move b_li_list based retry handling to common code
2025-01-07 6:55 ` Darrick J. Wong
@ 2025-01-07 7:03 ` Christoph Hellwig
2025-01-13 7:18 ` Darrick J. Wong
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-07 7:03 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:55:47PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 06, 2025 at 10:54:51AM +0100, Christoph Hellwig wrote:
> > The dquot and inode version are very similar, which is expected given the
> > overall b_li_list logic. The differences are that the inode version also
> > clears the XFS_LI_FLUSHING flag, which is defined in common code but only ever set
> > by the inode item, and that the dquot version takes the ail_lock over
> > the list iteration. While this seems sensible given that additions and
> > removals from b_li_list are protected by the ail_lock, log items are
> > only added before buffer submission, and are only removed when completing
> > the buffer, so nothing can change the list when retrying a buffer.
>
> Heh, I think that's not quite true -- I think xfs_dquot_detach_buf
> actually has a bug where it needs to take the buffer lock before
> detaching the dquot from the b_li_list. And I think kfence just whacked
> me for that on tonight's fstests run.
Ooops :)
> > + list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
> > + set_bit(XFS_LI_FAILED, &lip->li_flags);
> > + clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
>
> Should dquot log items be setting XFS_LI_FLUSHING?
That would help to avoid roundtrips into ->iop_push and thus a
dqlock (try)lock roundtrip for them. So it would be nice to have,
but it's not functionally needed.
* Re: [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf
2025-01-07 6:06 ` Christoph Hellwig
@ 2025-01-13 7:12 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-13 7:12 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Tue, Jan 07, 2025 at 07:06:57AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 06:08:10PM -0800, Darrick J. Wong wrote:
> > > - * after I/O completion, reuse the original list as the wait list.
> > > - */
> > > - xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
> > > + bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC);
> > > + bp->b_flags |= XBF_WRITE;
> > > + xfs_buf_submit(bp);
> >
> > Why is it ok to ignore the return value here? Is it because the only
> > error path in xfs_buf_submit is the xlog_is_shutdown case, in which case
> > the buffer ioend will have been called already and the EIO will be
> > returned by xfs_buf_iowait?
>
> A very good question to be asked to the author of the original
> xfs_buf_delwri_submit_buffers code that this got extracted from :)
>
> I think the answer you provided is correct and also implies that we
> should either get rid of the xfs_buf_submit return value or check it
> more consistently.
<nod>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
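The error-propagation argument settled above can be modeled in isolation, assuming (as the thread concludes) that a shutdown-time submit records the error on the buffer before returning; the names and the errno value here are illustrative only:

```c
#include <assert.h>

#define TOY_EIO 5	/* stand-in errno value */

struct toy_buf { int b_error; };

/*
 * Toy submit: on shutdown the buffer is completed with -EIO before the
 * function returns, so the return value may safely be ignored -- the
 * error is not lost.
 */
static int toy_submit(struct toy_buf *bp, int log_shutdown)
{
	bp->b_error = 0;
	if (log_shutdown) {
		bp->b_error = -TOY_EIO;	/* "ioend" records the error */
		return -TOY_EIO;	/* caller may ignore this */
	}
	return 0;
}

/* Toy iowait: the error surfaces here regardless. */
static int toy_iowait(const struct toy_buf *bp)
{
	return bp->b_error;
}
```

This is why ignoring the submit return value in the pushbuf path is safe: a subsequent iowait still observes the recorded -EIO.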
* Re: [PATCH 11/15] xfs: remove the extra buffer reference in xfs_buf_submit
2025-01-06 9:54 ` [PATCH 11/15] xfs: remove the extra buffer reference in xfs_buf_submit Christoph Hellwig
@ 2025-01-13 7:13 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-13 7:13 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 06, 2025 at 10:54:48AM +0100, Christoph Hellwig wrote:
> Nothing touches the buffer after it has been submitted now, so the need for
> the extra transient reference went away as well.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Ooops, forgot this one.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 17 +----------------
> 1 file changed, 1 insertion(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 49df4adf0e98..352cc50aeea5 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1646,13 +1646,6 @@ xfs_buf_submit(
> return -EIO;
> }
>
> - /*
> - * Grab a reference so the buffer does not go away underneath us. For
> - * async buffers, I/O completion drops the callers reference, which
> - * could occur before submission returns.
> - */
> - xfs_buf_hold(bp);
> -
> if (bp->b_flags & XBF_WRITE)
> xfs_buf_wait_unpin(bp);
>
> @@ -1675,20 +1668,12 @@ xfs_buf_submit(
> goto done;
>
> xfs_buf_submit_bio(bp);
> - goto rele;
> -
> + return 0;
> done:
> if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> xfs_buf_ioend(bp);
> else
> xfs_buf_ioend_async(bp);
> -rele:
> - /*
> - * Release the hold that keeps the buffer referenced for the entire
> - * I/O. Note that if the buffer is async, it is not safe to reference
> - * after this release.
> - */
> - xfs_buf_rele(bp);
> return 0;
> }
>
> --
> 2.45.2
>
>
* Re: [PATCH 14/15] xfs: move b_li_list based retry handling to common code
2025-01-07 7:03 ` Christoph Hellwig
@ 2025-01-13 7:18 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-13 7:18 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Tue, Jan 07, 2025 at 08:03:22AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 10:55:47PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 06, 2025 at 10:54:51AM +0100, Christoph Hellwig wrote:
> > > The dquot and inode version are very similar, which is expected given the
> > > overall b_li_list logic. The differences are that the inode version also
> > > clears the XFS_LI_FLUSHING flag, which is defined in common code but only ever set
> > > by the inode item, and that the dquot version takes the ail_lock over
> > > the list iteration. While this seems sensible given that additions and
> > > removals from b_li_list are protected by the ail_lock, log items are
> > > only added before buffer submission, and are only removed when completing
> > > the buffer, so nothing can change the list when retrying a buffer.
> >
> > Heh, I think that's not quite true -- I think xfs_dquot_detach_buf
> > actually has a bug where it needs to take the buffer lock before
> > detaching the dquot from the b_li_list. And I think kfence just whacked
> > me for that on tonight's fstests run.
>
> Ooops :)
...and I think this is now in -rc7 so no worries here.
> > > + list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
> > > + set_bit(XFS_LI_FAILED, &lip->li_flags);
> > > + clear_bit(XFS_LI_FLUSHING, &lip->li_flags);
> >
> > Should dquot log items be setting XFS_LI_FLUSHING?
>
> That would help to avoid roundtrips into ->iop_push and thus a
> dqlock (try)lock roundtrip for them. So it would be nice to have,
> but it's not functionally needed.
<nod> Sounds like a reasonable cleanup for someone.
For this change,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
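The common retry helper under discussion can be sketched as follows. Bit names are illustrative, and locking is deliberately omitted, since the thread establishes that nothing can change the b_li_list while a buffer is being retried:

```c
#include <assert.h>

/* Stand-in log-item flag bits (not the kernel's values). */
#define TOY_LI_FAILED	(1u << 0)
#define TOY_LI_FLUSHING	(1u << 1)

struct toy_item { unsigned int li_flags; };

/*
 * On a failed buffer, mark every attached log item failed and clear its
 * flushing state -- the dquot and inode variants both reduce to this.
 */
static void toy_buf_ioend_fail(struct toy_item *items, int n)
{
	for (int i = 0; i < n; i++) {
		items[i].li_flags |= TOY_LI_FAILED;
		items[i].li_flags &= ~TOY_LI_FLUSHING;
	}
}
```

Clearing FLUSHING is a no-op for items that never set it (today's dquots), which is why one common loop suffices.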
* [PATCH 09/15] xfs: simplify buffer I/O submission
2025-01-13 14:12 buffer cache cleanups v2 Christoph Hellwig
@ 2025-01-13 14:12 ` Christoph Hellwig
2025-01-13 18:15 ` Darrick J. Wong
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2025-01-13 14:12 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Darrick J. Wong, linux-xfs
The code in _xfs_buf_ioapply is unnecessarily complicated because it
doesn't take advantage of modern bio features.
Simplify it by making use of bio splitting and chaining: build a single
bio for the pages in the buffer using a simple loop, then split that bio
on the map boundaries of discontiguous multi-FSB buffers and chain the
split bios to the main one so that there is only a single I/O
completion.
This not only simplifies the code that builds the buffer, but also
removes the need for the b_io_remaining field: buffer ownership is
handed to the bio only when the final bio is submitted, so no completion
can run before that. The b_io_error field becomes superfluous as well,
because there is now exactly one completion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_buf.c | 204 ++++++++++++++---------------------------------
fs/xfs/xfs_buf.h | 2 -
2 files changed, 60 insertions(+), 146 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1e98fa812ba9..541e56b13869 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1362,13 +1362,6 @@ xfs_buf_ioend(
{
trace_xfs_buf_iodone(bp, _RET_IP_);
- /*
- * Pull in IO completion errors now. We are guaranteed to be running
- * single threaded, so we don't need the lock to read b_io_error.
- */
- if (!bp->b_error && bp->b_io_error)
- xfs_buf_ioerror(bp, bp->b_io_error);
-
if (bp->b_flags & XBF_READ) {
if (!bp->b_error && bp->b_ops)
bp->b_ops->verify_read(bp);
@@ -1491,118 +1484,26 @@ static void
xfs_buf_bio_end_io(
struct bio *bio)
{
- struct xfs_buf *bp = (struct xfs_buf *)bio->bi_private;
+ struct xfs_buf *bp = bio->bi_private;
- if (!bio->bi_status &&
- (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
- XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
- bio->bi_status = BLK_STS_IOERR;
-
- /*
- * don't overwrite existing errors - otherwise we can lose errors on
- * buffers that require multiple bios to complete.
- */
- if (bio->bi_status) {
- int error = blk_status_to_errno(bio->bi_status);
-
- cmpxchg(&bp->b_io_error, 0, error);
- }
+ if (bio->bi_status)
+ xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
+ else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
+ XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
+ xfs_buf_ioerror(bp, -EIO);
if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
- if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
- xfs_buf_ioend_async(bp);
+ xfs_buf_ioend_async(bp);
bio_put(bio);
}
-static void
-xfs_buf_ioapply_map(
- struct xfs_buf *bp,
- int map,
- int *buf_offset,
- int *count,
- blk_opf_t op)
-{
- int page_index;
- unsigned int total_nr_pages = bp->b_page_count;
- int nr_pages;
- struct bio *bio;
- sector_t sector = bp->b_maps[map].bm_bn;
- int size;
- int offset;
-
- /* skip the pages in the buffer before the start offset */
- page_index = 0;
- offset = *buf_offset;
- while (offset >= PAGE_SIZE) {
- page_index++;
- offset -= PAGE_SIZE;
- }
-
- /*
- * Limit the IO size to the length of the current vector, and update the
- * remaining IO count for the next time around.
- */
- size = min_t(int, BBTOB(bp->b_maps[map].bm_len), *count);
- *count -= size;
- *buf_offset += size;
-
-next_chunk:
- atomic_inc(&bp->b_io_remaining);
- nr_pages = bio_max_segs(total_nr_pages);
-
- bio = bio_alloc(bp->b_target->bt_bdev, nr_pages, op, GFP_NOIO);
- bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = xfs_buf_bio_end_io;
- bio->bi_private = bp;
-
- for (; size && nr_pages; nr_pages--, page_index++) {
- int rbytes, nbytes = PAGE_SIZE - offset;
-
- if (nbytes > size)
- nbytes = size;
-
- rbytes = bio_add_page(bio, bp->b_pages[page_index], nbytes,
- offset);
- if (rbytes < nbytes)
- break;
-
- offset = 0;
- sector += BTOBB(nbytes);
- size -= nbytes;
- total_nr_pages--;
- }
-
- if (likely(bio->bi_iter.bi_size)) {
- if (xfs_buf_is_vmapped(bp)) {
- flush_kernel_vmap_range(bp->b_addr,
- xfs_buf_vmap_len(bp));
- }
- submit_bio(bio);
- if (size)
- goto next_chunk;
- } else {
- /*
- * This is guaranteed not to be the last io reference count
- * because the caller (xfs_buf_submit) holds a count itself.
- */
- atomic_dec(&bp->b_io_remaining);
- xfs_buf_ioerror(bp, -EIO);
- bio_put(bio);
- }
-
-}
-
-STATIC void
-_xfs_buf_ioapply(
- struct xfs_buf *bp)
+static inline blk_opf_t
+xfs_buf_bio_op(
+ struct xfs_buf *bp)
{
- struct blk_plug plug;
- blk_opf_t op;
- int offset;
- int size;
- int i;
+ blk_opf_t op;
if (bp->b_flags & XBF_WRITE) {
op = REQ_OP_WRITE;
@@ -1612,25 +1513,53 @@ _xfs_buf_ioapply(
op |= REQ_RAHEAD;
}
- /* we only use the buffer cache for meta-data */
- op |= REQ_META;
+ return op | REQ_META;
+}
+
+static void
+xfs_buf_submit_bio(
+ struct xfs_buf *bp)
+{
+ unsigned int size = BBTOB(bp->b_length);
+ unsigned int map = 0, p;
+ struct blk_plug plug;
+ struct bio *bio;
+
+ bio = bio_alloc(bp->b_target->bt_bdev, bp->b_page_count,
+ xfs_buf_bio_op(bp), GFP_NOIO);
+ bio->bi_private = bp;
+ bio->bi_end_io = xfs_buf_bio_end_io;
+
+ if (bp->b_flags & _XBF_KMEM) {
+ __bio_add_page(bio, virt_to_page(bp->b_addr), size,
+ bp->b_offset);
+ } else {
+ for (p = 0; p < bp->b_page_count; p++)
+ __bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
+ bio->bi_iter.bi_size = size; /* limit to the actual size used */
+
+ if (xfs_buf_is_vmapped(bp))
+ flush_kernel_vmap_range(bp->b_addr,
+ xfs_buf_vmap_len(bp));
+ }
/*
- * Walk all the vectors issuing IO on them. Set up the initial offset
- * into the buffer and the desired IO size before we start -
- * _xfs_buf_ioapply_vec() will modify them appropriately for each
- * subsequent call.
+ * If there is more than one map segment, split out a new bio for each
+	 * map except for the last one. The last map is handled by the
+ * remainder of the original bio outside the loop.
*/
- offset = bp->b_offset;
- size = BBTOB(bp->b_length);
blk_start_plug(&plug);
- for (i = 0; i < bp->b_map_count; i++) {
- xfs_buf_ioapply_map(bp, i, &offset, &size, op);
- if (bp->b_error)
- break;
- if (size <= 0)
- break; /* all done */
+ for (map = 0; map < bp->b_map_count - 1; map++) {
+ struct bio *split;
+
+ split = bio_split(bio, bp->b_maps[map].bm_len, GFP_NOFS,
+ &fs_bio_set);
+ split->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
+ bio_chain(split, bio);
+ submit_bio(split);
}
+ bio->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
+ submit_bio(bio);
blk_finish_plug(&plug);
}
@@ -1729,14 +1658,7 @@ xfs_buf_submit(
* left over from previous use of the buffer (e.g. failed readahead).
*/
bp->b_error = 0;
- bp->b_io_error = 0;
- /*
- * Set the count to 1 initially, this will stop an I/O completion
- * callout which happens before we have started all the I/O from calling
- * xfs_buf_ioend too early.
- */
- atomic_set(&bp->b_io_remaining, 1);
if (bp->b_flags & XBF_ASYNC)
xfs_buf_ioacct_inc(bp);
@@ -1749,21 +1671,15 @@ xfs_buf_submit(
if (xfs_buftarg_is_mem(bp->b_target))
goto done;
- _xfs_buf_ioapply(bp);
+ xfs_buf_submit_bio(bp);
+ goto rele;
done:
- /*
- * If _xfs_buf_ioapply failed, we can get back here with only the IO
- * reference we took above. If we drop it to zero, run completion so
- * that we don't return to the caller with completion still pending.
- */
- if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
- if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
- xfs_buf_ioend(bp);
- else
- xfs_buf_ioend_async(bp);
- }
-
+ if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
+ xfs_buf_ioend(bp);
+ else
+ xfs_buf_ioend_async(bp);
+rele:
/*
* Release the hold that keeps the buffer referenced for the entire
* I/O. Note that if the buffer is async, it is not safe to reference
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index da80399c7457..c53d27439ff2 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -184,7 +184,6 @@ struct xfs_buf {
struct list_head b_lru; /* lru list */
spinlock_t b_lock; /* internal state lock */
unsigned int b_state; /* internal state flags */
- int b_io_error; /* internal IO error state */
wait_queue_head_t b_waiters; /* unpin waiters */
struct list_head b_list;
struct xfs_perag *b_pag;
@@ -202,7 +201,6 @@ struct xfs_buf {
struct xfs_buf_map __b_map; /* inline compound buffer map */
int b_map_count;
atomic_t b_pin_count; /* pin count */
- atomic_t b_io_remaining; /* #outstanding I/O requests */
unsigned int b_page_count; /* size of page array */
unsigned int b_offset; /* page offset of b_addr,
only for _XBF_KMEM buffers */
--
2.45.2
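For contrast, the b_io_remaining scheme this patch removes can be modeled in isolation; the counter's initial bias of 1 is what kept completion from running before all bios had been submitted. Names here are illustrative, not the kernel's:

```c
#include <assert.h>

struct toy_buf { int io_remaining; int ioend_ran; };

/* Each completion drops the count; ioend runs only at zero. */
static void toy_dec(struct toy_buf *bp)
{
	if (--bp->io_remaining == 0)
		bp->ioend_ran = 1;
}

/*
 * Old-style submit: start at 1 (the submitter's bias), add one per bio,
 * and drop the bias last, so completions arriving while bios are still
 * being issued can never trigger ioend early.
 */
static void toy_submit(struct toy_buf *bp, int nr_bios)
{
	bp->io_remaining = 1;		/* submitter's bias */
	for (int i = 0; i < nr_bios; i++)
		bp->io_remaining++;	/* one per in-flight bio */
	for (int i = 0; i < nr_bios; i++)
		toy_dec(bp);		/* bios complete, possibly early */
	toy_dec(bp);			/* drop the bias: ioend runs now */
}
```

With a single chained bio there is exactly one completion, so the counter and its bias become dead weight, which is what the patch exploits.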
* Re: [PATCH 09/15] xfs: simplify buffer I/O submission
2025-01-13 14:12 ` [PATCH 09/15] xfs: simplify buffer I/O submission Christoph Hellwig
@ 2025-01-13 18:15 ` Darrick J. Wong
0 siblings, 0 replies; 41+ messages in thread
From: Darrick J. Wong @ 2025-01-13 18:15 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-xfs
On Mon, Jan 13, 2025 at 03:12:13PM +0100, Christoph Hellwig wrote:
> The code in _xfs_buf_ioapply is unnecessarily complicated because it
> doesn't take advantage of modern bio features.
>
> Simplify it by making use of bio splitting and chaining: build a single
> bio for the pages in the buffer using a simple loop, then split that bio
> on the map boundaries of discontiguous multi-FSB buffers and chain the
> split bios to the main one so that there is only a single I/O
> completion.
>
> This not only simplifies the code that builds the buffer, but also
> removes the need for the b_io_remaining field: buffer ownership is
> handed to the bio only when the final bio is submitted, so no completion
> can run before that. The b_io_error field becomes superfluous as well,
> because there is now exactly one completion.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Much simpler! And thank you for amending the comments per the last
round of review.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_buf.c | 204 ++++++++++++++---------------------------------
> fs/xfs/xfs_buf.h | 2 -
> 2 files changed, 60 insertions(+), 146 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 1e98fa812ba9..541e56b13869 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1362,13 +1362,6 @@ xfs_buf_ioend(
> {
> trace_xfs_buf_iodone(bp, _RET_IP_);
>
> - /*
> - * Pull in IO completion errors now. We are guaranteed to be running
> - * single threaded, so we don't need the lock to read b_io_error.
> - */
> - if (!bp->b_error && bp->b_io_error)
> - xfs_buf_ioerror(bp, bp->b_io_error);
> -
> if (bp->b_flags & XBF_READ) {
> if (!bp->b_error && bp->b_ops)
> bp->b_ops->verify_read(bp);
> @@ -1491,118 +1484,26 @@ static void
> xfs_buf_bio_end_io(
> struct bio *bio)
> {
> - struct xfs_buf *bp = (struct xfs_buf *)bio->bi_private;
> + struct xfs_buf *bp = bio->bi_private;
>
> - if (!bio->bi_status &&
> - (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> - XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> - bio->bi_status = BLK_STS_IOERR;
> -
> - /*
> - * don't overwrite existing errors - otherwise we can lose errors on
> - * buffers that require multiple bios to complete.
> - */
> - if (bio->bi_status) {
> - int error = blk_status_to_errno(bio->bi_status);
> -
> - cmpxchg(&bp->b_io_error, 0, error);
> - }
> + if (bio->bi_status)
> + xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
> + else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> + XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> + xfs_buf_ioerror(bp, -EIO);
>
> if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
> invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
>
> - if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
> - xfs_buf_ioend_async(bp);
> + xfs_buf_ioend_async(bp);
> bio_put(bio);
> }
>
> -static void
> -xfs_buf_ioapply_map(
> - struct xfs_buf *bp,
> - int map,
> - int *buf_offset,
> - int *count,
> - blk_opf_t op)
> -{
> - int page_index;
> - unsigned int total_nr_pages = bp->b_page_count;
> - int nr_pages;
> - struct bio *bio;
> - sector_t sector = bp->b_maps[map].bm_bn;
> - int size;
> - int offset;
> -
> - /* skip the pages in the buffer before the start offset */
> - page_index = 0;
> - offset = *buf_offset;
> - while (offset >= PAGE_SIZE) {
> - page_index++;
> - offset -= PAGE_SIZE;
> - }
> -
> - /*
> - * Limit the IO size to the length of the current vector, and update the
> - * remaining IO count for the next time around.
> - */
> - size = min_t(int, BBTOB(bp->b_maps[map].bm_len), *count);
> - *count -= size;
> - *buf_offset += size;
> -
> -next_chunk:
> - atomic_inc(&bp->b_io_remaining);
> - nr_pages = bio_max_segs(total_nr_pages);
> -
> - bio = bio_alloc(bp->b_target->bt_bdev, nr_pages, op, GFP_NOIO);
> - bio->bi_iter.bi_sector = sector;
> - bio->bi_end_io = xfs_buf_bio_end_io;
> - bio->bi_private = bp;
> -
> - for (; size && nr_pages; nr_pages--, page_index++) {
> - int rbytes, nbytes = PAGE_SIZE - offset;
> -
> - if (nbytes > size)
> - nbytes = size;
> -
> - rbytes = bio_add_page(bio, bp->b_pages[page_index], nbytes,
> - offset);
> - if (rbytes < nbytes)
> - break;
> -
> - offset = 0;
> - sector += BTOBB(nbytes);
> - size -= nbytes;
> - total_nr_pages--;
> - }
> -
> - if (likely(bio->bi_iter.bi_size)) {
> - if (xfs_buf_is_vmapped(bp)) {
> - flush_kernel_vmap_range(bp->b_addr,
> - xfs_buf_vmap_len(bp));
> - }
> - submit_bio(bio);
> - if (size)
> - goto next_chunk;
> - } else {
> - /*
> - * This is guaranteed not to be the last io reference count
> - * because the caller (xfs_buf_submit) holds a count itself.
> - */
> - atomic_dec(&bp->b_io_remaining);
> - xfs_buf_ioerror(bp, -EIO);
> - bio_put(bio);
> - }
> -
> -}
> -
> -STATIC void
> -_xfs_buf_ioapply(
> - struct xfs_buf *bp)
> +static inline blk_opf_t
> +xfs_buf_bio_op(
> + struct xfs_buf *bp)
> {
> - struct blk_plug plug;
> - blk_opf_t op;
> - int offset;
> - int size;
> - int i;
> + blk_opf_t op;
>
> if (bp->b_flags & XBF_WRITE) {
> op = REQ_OP_WRITE;
> @@ -1612,25 +1513,53 @@ _xfs_buf_ioapply(
> op |= REQ_RAHEAD;
> }
>
> - /* we only use the buffer cache for meta-data */
> - op |= REQ_META;
> + return op | REQ_META;
> +}
> +
> +static void
> +xfs_buf_submit_bio(
> + struct xfs_buf *bp)
> +{
> + unsigned int size = BBTOB(bp->b_length);
> + unsigned int map = 0, p;
> + struct blk_plug plug;
> + struct bio *bio;
> +
> + bio = bio_alloc(bp->b_target->bt_bdev, bp->b_page_count,
> + xfs_buf_bio_op(bp), GFP_NOIO);
> + bio->bi_private = bp;
> + bio->bi_end_io = xfs_buf_bio_end_io;
> +
> + if (bp->b_flags & _XBF_KMEM) {
> + __bio_add_page(bio, virt_to_page(bp->b_addr), size,
> + bp->b_offset);
> + } else {
> + for (p = 0; p < bp->b_page_count; p++)
> + __bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
> + bio->bi_iter.bi_size = size; /* limit to the actual size used */
> +
> + if (xfs_buf_is_vmapped(bp))
> + flush_kernel_vmap_range(bp->b_addr,
> + xfs_buf_vmap_len(bp));
> + }
>
> /*
> - * Walk all the vectors issuing IO on them. Set up the initial offset
> - * into the buffer and the desired IO size before we start -
> - * _xfs_buf_ioapply_vec() will modify them appropriately for each
> - * subsequent call.
> + * If there is more than one map segment, split out a new bio for each
> +	 * map except for the last one. The last map is handled by the
> + * remainder of the original bio outside the loop.
> */
> - offset = bp->b_offset;
> - size = BBTOB(bp->b_length);
> blk_start_plug(&plug);
> - for (i = 0; i < bp->b_map_count; i++) {
> - xfs_buf_ioapply_map(bp, i, &offset, &size, op);
> - if (bp->b_error)
> - break;
> - if (size <= 0)
> - break; /* all done */
> + for (map = 0; map < bp->b_map_count - 1; map++) {
> + struct bio *split;
> +
> + split = bio_split(bio, bp->b_maps[map].bm_len, GFP_NOFS,
> + &fs_bio_set);
> + split->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
> + bio_chain(split, bio);
> + submit_bio(split);
> }
> + bio->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
> + submit_bio(bio);
> blk_finish_plug(&plug);
> }
>
> @@ -1729,14 +1658,7 @@ xfs_buf_submit(
> * left over from previous use of the buffer (e.g. failed readahead).
> */
> bp->b_error = 0;
> - bp->b_io_error = 0;
>
> - /*
> - * Set the count to 1 initially, this will stop an I/O completion
> - * callout which happens before we have started all the I/O from calling
> - * xfs_buf_ioend too early.
> - */
> - atomic_set(&bp->b_io_remaining, 1);
> if (bp->b_flags & XBF_ASYNC)
> xfs_buf_ioacct_inc(bp);
>
> @@ -1749,21 +1671,15 @@ xfs_buf_submit(
> if (xfs_buftarg_is_mem(bp->b_target))
> goto done;
>
> - _xfs_buf_ioapply(bp);
> + xfs_buf_submit_bio(bp);
> + goto rele;
>
> done:
> - /*
> - * If _xfs_buf_ioapply failed, we can get back here with only the IO
> - * reference we took above. If we drop it to zero, run completion so
> - * that we don't return to the caller with completion still pending.
> - */
> - if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> - if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> - xfs_buf_ioend(bp);
> - else
> - xfs_buf_ioend_async(bp);
> - }
> -
> + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> + xfs_buf_ioend(bp);
> + else
> + xfs_buf_ioend_async(bp);
> +rele:
> /*
> * Release the hold that keeps the buffer referenced for the entire
> * I/O. Note that if the buffer is async, it is not safe to reference
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index da80399c7457..c53d27439ff2 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -184,7 +184,6 @@ struct xfs_buf {
> struct list_head b_lru; /* lru list */
> spinlock_t b_lock; /* internal state lock */
> unsigned int b_state; /* internal state flags */
> - int b_io_error; /* internal IO error state */
> wait_queue_head_t b_waiters; /* unpin waiters */
> struct list_head b_list;
> struct xfs_perag *b_pag;
> @@ -202,7 +201,6 @@ struct xfs_buf {
> struct xfs_buf_map __b_map; /* inline compound buffer map */
> int b_map_count;
> atomic_t b_pin_count; /* pin count */
> - atomic_t b_io_remaining; /* #outstanding I/O requests */
> unsigned int b_page_count; /* size of page array */
> unsigned int b_offset; /* page offset of b_addr,
> only for _XBF_KMEM buffers */
> --
> 2.45.2
>
>
Thread overview: 41+ messages
2025-01-06 9:54 buffer cache cleanups Christoph Hellwig
2025-01-06 9:54 ` [PATCH 01/15] xfs: fix a double completion for buffers on in-memory targets Christoph Hellwig
2025-01-07 2:00 ` Darrick J. Wong
2025-01-07 6:05 ` Christoph Hellwig
2025-01-06 9:54 ` [PATCH 02/15] xfs: remove the incorrect comment above xfs_buf_free_maps Christoph Hellwig
2025-01-07 2:00 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 03/15] xfs: remove the incorrect comment about the b_pag field Christoph Hellwig
2025-01-07 2:01 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 04/15] xfs: move xfs_buf_iowait out of (__)xfs_buf_submit Christoph Hellwig
2025-01-07 2:02 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 05/15] xfs: simplify xfs_buf_delwri_pushbuf Christoph Hellwig
2025-01-07 2:08 ` Darrick J. Wong
2025-01-07 6:06 ` Christoph Hellwig
2025-01-13 7:12 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 06/15] xfs: remove xfs_buf_delwri_submit_buffers Christoph Hellwig
2025-01-07 6:31 ` Darrick J. Wong
2025-01-07 6:33 ` Christoph Hellwig
2025-01-06 9:54 ` [PATCH 07/15] xfs: move write verification out of _xfs_buf_ioapply Christoph Hellwig
2025-01-07 6:33 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 08/15] xfs: move in-memory buftarg handling " Christoph Hellwig
2025-01-07 6:34 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 09/15] xfs: simplify buffer I/O submission Christoph Hellwig
2025-01-07 6:42 ` Darrick J. Wong
2025-01-07 6:46 ` Christoph Hellwig
2025-01-07 6:57 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 10/15] xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend Christoph Hellwig
2025-01-07 6:42 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 11/15] xfs: remove the extra buffer reference in xfs_buf_submit Christoph Hellwig
2025-01-13 7:13 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 12/15] xfs: always complete the buffer inline " Christoph Hellwig
2025-01-07 6:46 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 13/15] xfs: simplify xfsaild_resubmit_item Christoph Hellwig
2025-01-07 6:49 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 14/15] xfs: move b_li_list based retry handling to common code Christoph Hellwig
2025-01-07 6:55 ` Darrick J. Wong
2025-01-07 7:03 ` Christoph Hellwig
2025-01-13 7:18 ` Darrick J. Wong
2025-01-06 9:54 ` [PATCH 15/15] xfs: add a b_iodone callback to struct xfs_buf Christoph Hellwig
2025-01-07 6:58 ` Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2025-01-13 14:12 buffer cache cleanups v2 Christoph Hellwig
2025-01-13 14:12 ` [PATCH 09/15] xfs: simplify buffer I/O submission Christoph Hellwig
2025-01-13 18:15 ` Darrick J. Wong