* [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces
@ 2014-08-15 6:38 Dave Chinner
2014-08-15 6:38 ` [PATCH 1/9] xfs: synchronous buffer IO needs a reference Dave Chinner
` (8 more replies)
0 siblings, 9 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:38 UTC (permalink / raw)
To: xfs
Hi folks,
This patch set has come from the conversation Brian and I have been
having over how to fix a use-after-free issue related to IO
completions racing with synchronous IO submission. It seemed to me
we were having trouble grasping all the different things that were
wrong and putting them in a line, so I figured the easiest way to
deal with that was to write a patchset that fixed the original bug,
everything else we'd noticed, and changed the IO API such that we
wouldn't have these problems in the future.
So, this patch set contains:
a) initial fix for the use after free issue (patch 1)
b) clear separation of IO submission errors at the XFS level
from IO errors reported through BIO completion callbacks
(patches 2,3)
c) removal of some historic cruft around IO submission
(patch 4)
d) rework of unnecessarily complex historic error handling
paths (patches 5,6,7)
e) a new API for clear and concise asynchronous and
synchronous IO. (patch 8)
The last patch fixes a bug I found writing patch 8, and can probably
be moved to first in the series...
I was just going to send this compile tested for comments, but I
threw it at a couple of VMs and it's half way through xfstests now
without any failures. That means I haven't broken anything badly,
though the error handling changes will be interesting to test.
The error handling changes have something in common - a buffer that
took an IO error was mostly marked stale, but not always. Sometimes
they were marked as "not done" indicating that IO needed to be done
on them before the contents were valid. Some got marked stale and
done, and so on. The thing is, when we mark a buffer stale, it is
essentially the same thing as marking it !XBF_DONE. So all the error
handling paths now stale the buffer and clear XBF_DONE. Nice,
consistent error handling. This enabled some significant cleanups -
xfs_trans_buf_read_map() finally got the love it's been needing for
a long time.
It also defines a specific, safe way for waiting on XBF_ASYNC buffer
IO, and makes use of that in the buffer delayed write code. Hence
the patchset also solves the problem of how to do async dispatch,
some work, then wait for IO completion as separate steps, as opposed
to issuing synchronous IO that is waited on immediately.
Also, the rework of the IO submission interfaces removes the fixes
added in the first patch to solve the use-after-free issue - we no
longer need an extra reference to protect parts of the IO completion
path because the reference we take during submission is held until
after the IO completion wait has occurred.
Anyway, I figured patches would speak louder than any description,
and most especially a diffstat that looks like this:
9 files changed, 286 insertions(+), 360 deletions(-)
Comments welcome.
-Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 1/9] xfs: synchronous buffer IO needs a reference
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
@ 2014-08-15 6:38 ` Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-29 0:18 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality Dave Chinner
` (7 subsequent siblings)
8 siblings, 2 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:38 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
When synchronous IO runs IO completion work, it does so without an
IO reference or a hold reference on the buffer. The IO "hold
reference" is owned by the submitter, and released when the
submission is complete. The IO reference is released once both the
submitter and the bio end_io processing have run, and so if the IO
completion work is run from IO completion context, it is run without
an IO reference.
Hence we can get the situation where the submitter can submit the
IO, see an error on the buffer and unlock and free the buffer while
there is still IO in progress. This leads to use-after-free and
memory corruption.
Fix this by taking a "sync IO hold" reference that is owned by the
IO and not released until after the buffer completion calls are run
to wake up synchronous waiters. This means that the buffer will not
be freed in any circumstance until all IO processing is completed.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf.c | 46 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index cd7b8ca..5d86bbd 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1019,6 +1019,7 @@ xfs_buf_iodone_work(
else {
ASSERT(read && bp->b_ops);
complete(&bp->b_iowait);
+ xfs_buf_rele(bp);
}
}
@@ -1044,6 +1045,7 @@ xfs_buf_ioend(
} else {
bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
complete(&bp->b_iowait);
+ xfs_buf_rele(bp);
}
}
@@ -1383,22 +1385,50 @@ xfs_buf_iorequest(
if (bp->b_flags & XBF_WRITE)
xfs_buf_wait_unpin(bp);
+
+ /*
+ * Take references to the buffer. For XBF_ASYNC buffers, holding a
+ * reference for as long as submission takes is all that is necessary
+ * here. The IO inherits the lock and hold count from the submitter,
+ * and these are release during IO completion processing. Taking a hold
+ * over submission ensures that the buffer is not freed until we have
+ * completed all processing, regardless of when IO errors occur or are
+ * reported.
+ *
+ * However, for synchronous IO, the IO does not inherit the submitter's
+ * reference count, nor the buffer lock. Hence we need to take an extra
+ * reference to the buffer for the IO context so that we can
+ * guarantee the buffer is not freed until all IO completion processing
+ * is done. Otherwise the caller can drop their reference while the IO
+ * is still in progress and hence trigger a use-after-free situation.
+ */
xfs_buf_hold(bp);
+ if (!(bp->b_flags & XBF_ASYNC))
+ xfs_buf_hold(bp);
+
/*
- * Set the count to 1 initially, this will stop an I/O
- * completion callout which happens before we have started
- * all the I/O from calling xfs_buf_ioend too early.
+ * Set the count to 1 initially, this will stop an I/O completion
+ * callout which happens before we have started all the I/O from calling
+ * xfs_buf_ioend too early.
*/
atomic_set(&bp->b_io_remaining, 1);
_xfs_buf_ioapply(bp);
+
/*
- * If _xfs_buf_ioapply failed, we'll get back here with
- * only the reference we took above. _xfs_buf_ioend will
- * drop it to zero, so we'd better not queue it for later,
- * or we'll free it before it's done.
+ * If _xfs_buf_ioapply failed or we are doing synchronous IO that
+ * completes extremely quickly, we can get back here with only the IO
+ * reference we took above. _xfs_buf_ioend will drop it to zero, so
+ * we'd better run completion processing synchronously so that we
+ * don't return to the caller with completion still pending. In the
+ * error case, this allows the caller to check b_error safely without
+ * waiting, and in the synchronous IO case it avoids unnecessary context
+ * switches and latency for high-performance devices.
*/
- _xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
+ if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
+ _xfs_buf_ioend(bp, 0);
+ else
+ _xfs_buf_ioend(bp, 1);
xfs_buf_rele(bp);
}
--
2.0.0
* [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
2014-08-15 6:38 ` [PATCH 1/9] xfs: synchronous buffer IO needs a reference Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-29 0:22 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling Dave Chinner
` (6 subsequent siblings)
8 siblings, 2 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
We do some work in xfs_buf_ioend, and some work in
xfs_buf_iodone_work, but much of that functionality is the same.
This work can all be done in a single function, leaving
xfs_buf_iodone just a wrapper to determine if we should execute it
by workqueue or directly. Hence rename xfs_buf_iodone_work to
xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
need async processing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf.c | 79 +++++++++++++++++++++---------------------------
fs/xfs/xfs_buf.h | 2 +-
fs/xfs/xfs_buf_item.c | 4 +--
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_log.c | 2 +-
fs/xfs/xfs_log_recover.c | 2 +-
6 files changed, 40 insertions(+), 51 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 5d86bbd..1b7f0bc 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -999,54 +999,49 @@ xfs_buf_wait_unpin(
*/
STATIC void
-xfs_buf_iodone_work(
- struct work_struct *work)
+xfs_buf_ioend(
+ struct xfs_buf *bp)
{
- struct xfs_buf *bp =
- container_of(work, xfs_buf_t, b_iodone_work);
- bool read = !!(bp->b_flags & XBF_READ);
+ bool read = !!(bp->b_flags & XBF_READ);
+
+ trace_xfs_buf_iodone(bp, _RET_IP_);
bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
- /* only validate buffers that were read without errors */
- if (read && bp->b_ops && !bp->b_error && (bp->b_flags & XBF_DONE))
- bp->b_ops->verify_read(bp);
+ if (!bp->b_error) {
+ bp->b_flags |= XBF_DONE;
+
+ /* only validate buffers that were read without errors */
+ if (read && bp->b_ops)
+ bp->b_ops->verify_read(bp);
+ }
if (bp->b_iodone)
(*(bp->b_iodone))(bp);
else if (bp->b_flags & XBF_ASYNC)
xfs_buf_relse(bp);
else {
- ASSERT(read && bp->b_ops);
complete(&bp->b_iowait);
xfs_buf_rele(bp);
}
}
-void
-xfs_buf_ioend(
- struct xfs_buf *bp,
- int schedule)
+static void
+xfs_buf_ioend_work(
+ struct work_struct *work)
{
- bool read = !!(bp->b_flags & XBF_READ);
-
- trace_xfs_buf_iodone(bp, _RET_IP_);
+ struct xfs_buf *bp =
+ container_of(work, xfs_buf_t, b_iodone_work);
- if (bp->b_error == 0)
- bp->b_flags |= XBF_DONE;
+ xfs_buf_ioend(bp);
+}
- if (bp->b_iodone || (read && bp->b_ops) || (bp->b_flags & XBF_ASYNC)) {
- if (schedule) {
- INIT_WORK(&bp->b_iodone_work, xfs_buf_iodone_work);
- queue_work(xfslogd_workqueue, &bp->b_iodone_work);
- } else {
- xfs_buf_iodone_work(&bp->b_iodone_work);
- }
- } else {
- bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
- complete(&bp->b_iowait);
- xfs_buf_rele(bp);
- }
+void
+xfs_buf_ioend_async(
+ struct xfs_buf *bp)
+{
+ INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
+ queue_work(xfslogd_workqueue, &bp->b_iodone_work);
}
void
@@ -1094,7 +1089,7 @@ xfs_bioerror(
XFS_BUF_UNDONE(bp);
xfs_buf_stale(bp);
- xfs_buf_ioend(bp, 0);
+ xfs_buf_ioend(bp);
return -EIO;
}
@@ -1181,15 +1176,6 @@ xfs_bwrite(
}
STATIC void
-_xfs_buf_ioend(
- xfs_buf_t *bp,
- int schedule)
-{
- if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
- xfs_buf_ioend(bp, schedule);
-}
-
-STATIC void
xfs_buf_bio_end_io(
struct bio *bio,
int error)
@@ -1206,7 +1192,8 @@ xfs_buf_bio_end_io(
if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
- _xfs_buf_ioend(bp, 1);
+ if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
+ xfs_buf_ioend_async(bp);
bio_put(bio);
}
@@ -1425,10 +1412,12 @@ xfs_buf_iorequest(
* waiting, and in the synchronous IO case it avoids unnecessary context
* switches and latency for high-performance devices.
*/
- if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
- _xfs_buf_ioend(bp, 0);
- else
- _xfs_buf_ioend(bp, 1);
+ if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
+ if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
+ xfs_buf_ioend(bp);
+ else
+ xfs_buf_ioend_async(bp);
+ }
xfs_buf_rele(bp);
}
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index c753183..4585c15 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -286,7 +286,7 @@ extern void xfs_buf_unlock(xfs_buf_t *);
/* Buffer Read and Write Routines */
extern int xfs_bwrite(struct xfs_buf *bp);
-extern void xfs_buf_ioend(xfs_buf_t *, int);
+extern void xfs_buf_ioend(struct xfs_buf *bp);
extern void xfs_buf_ioerror(xfs_buf_t *, int);
extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
extern void xfs_buf_iorequest(xfs_buf_t *);
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 76007de..4fd41b5 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -491,7 +491,7 @@ xfs_buf_item_unpin(
xfs_buf_ioerror(bp, -EIO);
XFS_BUF_UNDONE(bp);
xfs_buf_stale(bp);
- xfs_buf_ioend(bp, 0);
+ xfs_buf_ioend(bp);
}
}
@@ -1115,7 +1115,7 @@ do_callbacks:
xfs_buf_do_callbacks(bp);
bp->b_fspriv = NULL;
bp->b_iodone = NULL;
- xfs_buf_ioend(bp, 0);
+ xfs_buf_ioend(bp);
}
/*
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index fea3c92..00d210b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3056,7 +3056,7 @@ cluster_corrupt_out:
XFS_BUF_UNDONE(bp);
xfs_buf_stale(bp);
xfs_buf_ioerror(bp, -EIO);
- xfs_buf_ioend(bp, 0);
+ xfs_buf_ioend(bp);
} else {
xfs_buf_stale(bp);
xfs_buf_relse(bp);
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 8eaa8f5..e4665db 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1689,7 +1689,7 @@ xlog_bdstrat(
if (iclog->ic_state & XLOG_STATE_IOERROR) {
xfs_buf_ioerror(bp, -EIO);
xfs_buf_stale(bp);
- xfs_buf_ioend(bp, 0);
+ xfs_buf_ioend(bp);
/*
* It would seem logical to return EIO here, but we rely on
* the log state machine to propagate I/O errors instead of
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1fd5787..4ba19bf 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -383,7 +383,7 @@ xlog_recover_iodone(
SHUTDOWN_META_IO_ERROR);
}
bp->b_iodone = NULL;
- xfs_buf_ioend(bp, 0);
+ xfs_buf_ioend(bp);
}
/*
--
2.0.0
* [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
2014-08-15 6:38 ` [PATCH 1/9] xfs: synchronous buffer IO needs a reference Dave Chinner
2014-08-15 6:39 ` [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-29 0:23 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 4/9] xfs: kill xfs_bdstrat_cb Dave Chinner
` (5 subsequent siblings)
8 siblings, 2 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Currently a bio error reported at IO completion immediately marks
the buffer with an error. The issue is that this is racy w.r.t.
synchronous IO - the submitter can see b_error being set before the
IO is complete, and hence we cannot differentiate between submission
failures and completion failures.
Add an internal b_io_error field protected by the b_lock to catch IO
completion errors, and only propagate that to the buffer during
final IO completion handling. Hence we can tell in xfs_buf_iorequest
if we've had a submission failure by checking bp->b_error before
dropping our b_io_remaining reference - that reference will prevent
b_io_error values from being propagated to b_error in the event that
completion races with submission.
In doing so, allow xfs_buf_iorequest to return an error. That way,
the caller can check for submission errors safely if required, and
easily distinguish them from completion errors that come from
xfs_buf_iowait().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf.c | 18 ++++++++++++++++--
fs/xfs/xfs_buf.h | 1 +
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1b7f0bc..58ae34c 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1008,6 +1008,13 @@ xfs_buf_ioend(
bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
+ /*
+ * Pull in IO completion errors now. We are guaranteed to be running
+ * single threaded, so we don't need the lock to read b_io_error.
+ */
+ if (!bp->b_error && bp->b_io_error)
+ xfs_buf_ioerror(bp, bp->b_io_error);
+
if (!bp->b_error) {
bp->b_flags |= XBF_DONE;
@@ -1186,8 +1193,12 @@ xfs_buf_bio_end_io(
* don't overwrite existing errors - otherwise we can lose errors on
* buffers that require multiple bios to complete.
*/
- if (!bp->b_error)
- xfs_buf_ioerror(bp, error);
+ if (error) {
+ spin_lock(&bp->b_lock);
+ if (!bp->b_io_error)
+ bp->b_io_error = error;
+ spin_unlock(&bp->b_lock);
+ }
if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
@@ -1373,6 +1384,9 @@ xfs_buf_iorequest(
if (bp->b_flags & XBF_WRITE)
xfs_buf_wait_unpin(bp);
+ /* clear the internal error state to avoid spurious errors */
+ bp->b_io_error = 0;
+
/*
* Take references to the buffer. For XBF_ASYNC buffers, holding a
* reference for as long as submission takes is all that is necessary
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 4585c15..44db8cd 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -158,6 +158,7 @@ typedef struct xfs_buf {
struct list_head b_lru; /* lru list */
spinlock_t b_lock; /* internal state lock */
unsigned int b_state; /* internal state flags */
+ int b_io_error; /* internal IO error state */
wait_queue_head_t b_waiters; /* unpin waiters */
struct list_head b_list;
struct xfs_perag *b_pag; /* contains rbtree root */
--
2.0.0
* [PATCH 4/9] xfs: kill xfs_bdstrat_cb
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
` (2 preceding siblings ...)
2014-08-15 6:39 ` [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-29 0:24 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 5/9] xfs: xfs_bioerror can die Dave Chinner
` (4 subsequent siblings)
8 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
xfs_bdstrat_cb() only has two callers, and is just a shutdown check
and error handler around xfs_buf_iorequest. However, the error
handling is a mess of read and write semantics, and both internal
callers only call it for writes. Hence kill the wrapper, and follow
up with a patch to sanitise the error handling.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf.c | 51 ++++++++++++++++++++++++++++-----------------------
1 file changed, 28 insertions(+), 23 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 58ae34c..98fcf5a 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1140,27 +1140,6 @@ xfs_bioerror_relse(
return -EIO;
}
-STATIC int
-xfs_bdstrat_cb(
- struct xfs_buf *bp)
-{
- if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
- trace_xfs_bdstrat_shut(bp, _RET_IP_);
- /*
- * Metadata write that didn't get logged but
- * written delayed anyway. These aren't associated
- * with a transaction, and can be ignored.
- */
- if (!bp->b_iodone && !XFS_BUF_ISREAD(bp))
- return xfs_bioerror_relse(bp);
- else
- return xfs_bioerror(bp);
- }
-
- xfs_buf_iorequest(bp);
- return 0;
-}
-
int
xfs_bwrite(
struct xfs_buf *bp)
@@ -1172,7 +1151,20 @@ xfs_bwrite(
bp->b_flags |= XBF_WRITE;
bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q | XBF_WRITE_FAIL);
- xfs_bdstrat_cb(bp);
+ if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
+ trace_xfs_bdstrat_shut(bp, _RET_IP_);
+
+ /*
+ * Metadata write that didn't get logged but written anyway.
+ * These aren't associated with a transaction, and can be
+ * ignored.
+ */
+ if (!bp->b_iodone)
+ return xfs_bioerror_relse(bp);
+ return xfs_bioerror(bp);
+ }
+
+ xfs_buf_iorequest(bp);
error = xfs_buf_iowait(bp);
if (error) {
@@ -1852,7 +1844,20 @@ __xfs_buf_delwri_submit(
bp->b_flags |= XBF_ASYNC;
list_del_init(&bp->b_list);
}
- xfs_bdstrat_cb(bp);
+
+ if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
+ trace_xfs_bdstrat_shut(bp, _RET_IP_);
+
+ /*
+ * Metadata write that didn't get logged but written anyway.
+ * These aren't associated with a transaction, and can be
+ * ignored.
+ */
+ if (!bp->b_iodone)
+ return xfs_bioerror_relse(bp);
+ return xfs_bioerror(bp);
+ }
+ xfs_buf_iorequest(bp);
}
blk_finish_plug(&plug);
--
2.0.0
* [PATCH 5/9] xfs: xfs_bioerror can die.
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
` (3 preceding siblings ...)
2014-08-15 6:39 ` [PATCH 4/9] xfs: kill xfs_bdstrat_cb Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-15 14:35 ` Brian Foster
2014-08-29 0:28 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 6/9] xfs: kill xfs_bioerror_relse Dave Chinner
` (3 subsequent siblings)
8 siblings, 2 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Internal buffer write error handling is a mess due to the unnatural
split between xfs_bioerror and xfs_bioerror_relse(). The buffer
reference counting is also wrong for the xfs_bioerror path for
xfs_bwrite - it uses sync IO and so xfs_buf_ioend will release a
hold count that was never taken.
Further, xfs_bwrite only does sync IO and determines the handler to
call based on b_iodone, so for this caller the only difference
between xfs_bioerror() and xfs_bioerror_relse() is the XBF_DONE
flag. We don't care what the XBF_DONE flag state is because we stale
the buffer in both paths - the next buffer lookup will clear
XBF_DONE anyway because XBF_STALE is set. Hence we can use common
error handling for xfs_bwrite().
__xfs_buf_delwri_submit() is similar - it's only ever called on
writes - all sync or async - and again there's no reason to handle
them any differently at all.
Clean up the nasty error handling and remove xfs_bioerror().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf.c | 67 +++++++++++++++++---------------------------------------
1 file changed, 20 insertions(+), 47 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 98fcf5a..f444285 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1072,36 +1072,6 @@ xfs_buf_ioerror_alert(
}
/*
- * Called when we want to stop a buffer from getting written or read.
- * We attach the EIO error, muck with its flags, and call xfs_buf_ioend
- * so that the proper iodone callbacks get called.
- */
-STATIC int
-xfs_bioerror(
- xfs_buf_t *bp)
-{
-#ifdef XFSERRORDEBUG
- ASSERT(XFS_BUF_ISREAD(bp) || bp->b_iodone);
-#endif
-
- /*
- * No need to wait until the buffer is unpinned, we aren't flushing it.
- */
- xfs_buf_ioerror(bp, -EIO);
-
- /*
- * We're calling xfs_buf_ioend, so delete XBF_DONE flag.
- */
- XFS_BUF_UNREAD(bp);
- XFS_BUF_UNDONE(bp);
- xfs_buf_stale(bp);
-
- xfs_buf_ioend(bp);
-
- return -EIO;
-}
-
-/*
* Same as xfs_bioerror, except that we are releasing the buffer
* here ourselves, and avoiding the xfs_buf_ioend call.
* This is meant for userdata errors; metadata bufs come with
@@ -1149,19 +1119,19 @@ xfs_bwrite(
ASSERT(xfs_buf_islocked(bp));
bp->b_flags |= XBF_WRITE;
- bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q | XBF_WRITE_FAIL);
+ bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
+ XBF_WRITE_FAIL | XBF_DONE);
if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
trace_xfs_bdstrat_shut(bp, _RET_IP_);
- /*
- * Metadata write that didn't get logged but written anyway.
- * These aren't associated with a transaction, and can be
- * ignored.
- */
- if (!bp->b_iodone)
- return xfs_bioerror_relse(bp);
- return xfs_bioerror(bp);
+ xfs_buf_ioerror(bp, -EIO);
+ xfs_buf_stale(bp);
+
+ /* sync IO, xfs_buf_ioend is going to remove a ref here */
+ xfs_buf_hold(bp);
+ xfs_buf_ioend(bp);
+ return -EIO;
}
xfs_buf_iorequest(bp);
@@ -1848,16 +1818,19 @@ __xfs_buf_delwri_submit(
if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
trace_xfs_bdstrat_shut(bp, _RET_IP_);
+ xfs_buf_ioerror(bp, -EIO);
+ xfs_buf_stale(bp);
+
/*
- * Metadata write that didn't get logged but written anyway.
- * These aren't associated with a transaction, and can be
- * ignored.
+ * if sync IO, xfs_buf_ioend is going to remove a
+ * ref here. We need to add that so we can collect
+ * all the buffer errors in the wait loop.
*/
- if (!bp->b_iodone)
- return xfs_bioerror_relse(bp);
- return xfs_bioerror(bp);
- }
- xfs_buf_iorequest(bp);
+ if (wait)
+ xfs_buf_hold(bp);
+ xfs_buf_ioend(bp);
+ } else
+ xfs_buf_iorequest(bp);
}
blk_finish_plug(&plug);
--
2.0.0
* [PATCH 6/9] xfs: kill xfs_bioerror_relse
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
` (4 preceding siblings ...)
2014-08-15 6:39 ` [PATCH 5/9] xfs: xfs_bioerror can die Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-29 0:32 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 7/9] xfs: clean up xfs_trans_buf_read_map Dave Chinner
` (2 subsequent siblings)
8 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There is only one caller now - xfs_trans_read_buf_map() - and it has
very well defined call semantics - read, synchronous, and b_iodone
is NULL. Hence it's pretty clear what error handling is necessary
for this case. The bigger problem of untangling
xfs_trans_read_buf_map error handling is left to a future patch.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf.c | 39 ---------------------------------------
fs/xfs/xfs_buf.h | 2 --
fs/xfs/xfs_trans_buf.c | 8 +++++---
3 files changed, 5 insertions(+), 44 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f444285..352e9219 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1071,45 +1071,6 @@ xfs_buf_ioerror_alert(
(__uint64_t)XFS_BUF_ADDR(bp), func, -bp->b_error, bp->b_length);
}
-/*
- * Same as xfs_bioerror, except that we are releasing the buffer
- * here ourselves, and avoiding the xfs_buf_ioend call.
- * This is meant for userdata errors; metadata bufs come with
- * iodone functions attached, so that we can track down errors.
- */
-int
-xfs_bioerror_relse(
- struct xfs_buf *bp)
-{
- int64_t fl = bp->b_flags;
- /*
- * No need to wait until the buffer is unpinned.
- * We aren't flushing it.
- *
- * chunkhold expects B_DONE to be set, whether
- * we actually finish the I/O or not. We don't want to
- * change that interface.
- */
- XFS_BUF_UNREAD(bp);
- XFS_BUF_DONE(bp);
- xfs_buf_stale(bp);
- bp->b_iodone = NULL;
- if (!(fl & XBF_ASYNC)) {
- /*
- * Mark b_error and B_ERROR _both_.
- * Lot's of chunkcache code assumes that.
- * There's no reason to mark error for
- * ASYNC buffers.
- */
- xfs_buf_ioerror(bp, -EIO);
- complete(&bp->b_iowait);
- } else {
- xfs_buf_relse(bp);
- }
-
- return -EIO;
-}
-
int
xfs_bwrite(
struct xfs_buf *bp)
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 44db8cd..d8f57f6 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -297,8 +297,6 @@ extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
#define xfs_buf_zero(bp, off, len) \
xfs_buf_iomove((bp), (off), (len), NULL, XBRW_ZERO)
-extern int xfs_bioerror_relse(struct xfs_buf *);
-
/* Buffer Utility Routines */
extern xfs_caddr_t xfs_buf_offset(xfs_buf_t *, size_t);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 96c898e..758c07d 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -324,11 +324,13 @@ xfs_trans_read_buf_map(
*/
if (XFS_FORCED_SHUTDOWN(mp)) {
trace_xfs_bdstrat_shut(bp, _RET_IP_);
- xfs_bioerror_relse(bp);
- } else {
- xfs_buf_iorequest(bp);
+ bp->b_flags &= ~(XBF_READ | XBF_DONE);
+ xfs_buf_ioerror(bp, -EIO);
+ xfs_buf_stale(bp);
+ return -EIO;
}
+ xfs_buf_iorequest(bp);
error = xfs_buf_iowait(bp);
if (error) {
xfs_buf_ioerror_alert(bp, __func__);
--
2.0.0
* [PATCH 7/9] xfs: clean up xfs_trans_buf_read_map
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
` (5 preceding siblings ...)
2014-08-15 6:39 ` [PATCH 6/9] xfs: kill xfs_bioerror_relse Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-15 6:39 ` [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait] Dave Chinner
2014-08-15 6:39 ` [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly Dave Chinner
8 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The error handling is a mess of inconsistent spaghetti. Clean it up
like Christoph's comment from long ago said we should. While there,
get rid of the "xfs_do_error" error injection debug code. If we need
to do error injection, we can do it on demand via kprobes rather
than needing to run the kernel under a debugger and poke magic
variables.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_trans_buf.c | 187 ++++++++++++++++++-------------------------------
1 file changed, 69 insertions(+), 118 deletions(-)
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 758c07d..9c3e610 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -229,13 +229,6 @@ xfs_trans_getsb(xfs_trans_t *tp,
return bp;
}
-#ifdef DEBUG
-xfs_buftarg_t *xfs_error_target;
-int xfs_do_error;
-int xfs_req_num;
-int xfs_error_mod = 33;
-#endif
-
/*
* Get and lock the buffer for the caller if it is not already
* locked within the given transaction. If it has not yet been
@@ -257,165 +250,123 @@ xfs_trans_read_buf_map(
struct xfs_buf **bpp,
const struct xfs_buf_ops *ops)
{
- xfs_buf_t *bp;
- xfs_buf_log_item_t *bip;
+ struct xfs_buf *bp = NULL;
int error;
+ bool release = true;
*bpp = NULL;
- if (!tp) {
- bp = xfs_buf_read_map(target, map, nmaps, flags, ops);
- if (!bp)
- return (flags & XBF_TRYLOCK) ?
- -EAGAIN : -ENOMEM;
-
- if (bp->b_error) {
- error = bp->b_error;
- xfs_buf_ioerror_alert(bp, __func__);
- XFS_BUF_UNDONE(bp);
- xfs_buf_stale(bp);
- xfs_buf_relse(bp);
-
- /* bad CRC means corrupted metadata */
- if (error == -EFSBADCRC)
- error = -EFSCORRUPTED;
- return error;
- }
-#ifdef DEBUG
- if (xfs_do_error) {
- if (xfs_error_target == target) {
- if (((xfs_req_num++) % xfs_error_mod) == 0) {
- xfs_buf_relse(bp);
- xfs_debug(mp, "Returning error!");
- return -EIO;
- }
- }
- }
-#endif
- if (XFS_FORCED_SHUTDOWN(mp))
- goto shutdown_abort;
- *bpp = bp;
- return 0;
- }
/*
- * If we find the buffer in the cache with this transaction
- * pointer in its b_fsprivate2 field, then we know we already
- * have it locked. If it is already read in we just increment
- * the lock recursion count and return the buffer to the caller.
- * If the buffer is not yet read in, then we read it in, increment
- * the lock recursion count, and return it to the caller.
+ * If we find the buffer in this transaction's item list, then we know
+ * we already have it locked. If it is already read in we just
+ * increment the lock recursion count and return the buffer to the
+ * caller.
*/
- bp = xfs_trans_buf_item_match(tp, target, map, nmaps);
- if (bp != NULL) {
+ if (tp)
+ bp = xfs_trans_buf_item_match(tp, target, map, nmaps);
+
+ if (bp) {
+ struct xfs_buf_log_item *bip;
+
+ /*
+ * We don't own this buffer ourselves, so we shouldn't release
+ * it if we come across any errors in processing it.
+ */
+ release = false;
+
ASSERT(xfs_buf_islocked(bp));
ASSERT(bp->b_transp == tp);
ASSERT(bp->b_fspriv != NULL);
ASSERT(!bp->b_error);
- if (!(XFS_BUF_ISDONE(bp))) {
+ if (!(bp->b_flags & XBF_DONE)) {
trace_xfs_trans_read_buf_io(bp, _RET_IP_);
- ASSERT(!XFS_BUF_ISASYNC(bp));
+ ASSERT(!(bp->b_flags & XBF_ASYNC));
ASSERT(bp->b_iodone == NULL);
- XFS_BUF_READ(bp);
+
+ bp->b_flags |= XBF_READ;
bp->b_ops = ops;
- /*
- * XXX(hch): clean up the error handling here to be less
- * of a mess..
- */
if (XFS_FORCED_SHUTDOWN(mp)) {
trace_xfs_bdstrat_shut(bp, _RET_IP_);
- bp->b_flags &= ~(XBF_READ | XBF_DONE);
- xfs_buf_ioerror(bp, -EIO);
- xfs_buf_stale(bp);
- return -EIO;
+ error = -EIO;
+ goto out_stale;
}
xfs_buf_iorequest(bp);
error = xfs_buf_iowait(bp);
if (error) {
xfs_buf_ioerror_alert(bp, __func__);
- xfs_buf_relse(bp);
- /*
- * We can gracefully recover from most read
- * errors. Ones we can't are those that happen
- * after the transaction's already dirty.
- */
- if (tp->t_flags & XFS_TRANS_DIRTY)
- xfs_force_shutdown(tp->t_mountp,
- SHUTDOWN_META_IO_ERROR);
- /* bad CRC means corrupted metadata */
- if (error == -EFSBADCRC)
- error = -EFSCORRUPTED;
- return error;
+ goto out_do_shutdown;
+
}
}
- /*
- * We never locked this buf ourselves, so we shouldn't
- * brelse it either. Just get out.
- */
+
if (XFS_FORCED_SHUTDOWN(mp)) {
trace_xfs_trans_read_buf_shut(bp, _RET_IP_);
- *bpp = NULL;
return -EIO;
}
-
+ /* good buffer! */
bip = bp->b_fspriv;
bip->bli_recur++;
-
ASSERT(atomic_read(&bip->bli_refcount) > 0);
trace_xfs_trans_read_buf_recur(bip);
*bpp = bp;
return 0;
}
+ /*
+ * Read the buffer from disk as there is no transaction context or we
+ * didn't find a matching buffer in the transaction item list.
+ */
bp = xfs_buf_read_map(target, map, nmaps, flags, ops);
- if (bp == NULL) {
- *bpp = NULL;
- return (flags & XBF_TRYLOCK) ?
- 0 : -ENOMEM;
+ if (!bp) {
+ /* XXX(dgc): should always return -EAGAIN on trylock failure */
+ if (!(flags & XBF_TRYLOCK))
+ return -ENOMEM;
+ if (!tp)
+ return -EAGAIN;
+ return 0;
}
if (bp->b_error) {
- error = bp->b_error;
- xfs_buf_stale(bp);
- XFS_BUF_DONE(bp);
xfs_buf_ioerror_alert(bp, __func__);
- if (tp->t_flags & XFS_TRANS_DIRTY)
- xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR);
- xfs_buf_relse(bp);
-
- /* bad CRC means corrupted metadata */
- if (error == -EFSBADCRC)
- error = -EFSCORRUPTED;
- return error;
+ error = bp->b_error;
+ goto out_do_shutdown;
}
-#ifdef DEBUG
- if (xfs_do_error && !(tp->t_flags & XFS_TRANS_DIRTY)) {
- if (xfs_error_target == target) {
- if (((xfs_req_num++) % xfs_error_mod) == 0) {
- xfs_force_shutdown(tp->t_mountp,
- SHUTDOWN_META_IO_ERROR);
- xfs_buf_relse(bp);
- xfs_debug(mp, "Returning trans error!");
- return -EIO;
- }
- }
+
+ if (XFS_FORCED_SHUTDOWN(mp)) {
+ trace_xfs_trans_read_buf_shut(bp, _RET_IP_);
+ error = -EIO;
+ goto out_stale;
}
-#endif
- if (XFS_FORCED_SHUTDOWN(mp))
- goto shutdown_abort;
- _xfs_trans_bjoin(tp, bp, 1);
- trace_xfs_trans_read_buf(bp->b_fspriv);
+ if (tp) {
+ _xfs_trans_bjoin(tp, bp, 1);
+ trace_xfs_trans_read_buf(bp->b_fspriv);
+ }
*bpp = bp;
return 0;
-shutdown_abort:
- trace_xfs_trans_read_buf_shut(bp, _RET_IP_);
- xfs_buf_relse(bp);
- *bpp = NULL;
- return -EIO;
+
+out_do_shutdown:
+ /*
+ * We can gracefully recover from most read errors. Ones we can't are
+ * those that happen after the transaction's already dirty.
+ */
+ if (tp && (tp->t_flags & XFS_TRANS_DIRTY))
+ xfs_force_shutdown(tp->t_mountp, SHUTDOWN_META_IO_ERROR);
+out_stale:
+ bp->b_flags &= ~(XBF_READ | XBF_DONE);
+ xfs_buf_ioerror(bp, error);
+ xfs_buf_stale(bp);
+ if (release)
+ xfs_buf_relse(bp);
+
+ /* bad CRC means corrupted metadata */
+ if (error == -EFSBADCRC)
+ error = -EFSCORRUPTED;
+ return error;
}
/*
--
2.0.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
` (6 preceding siblings ...)
2014-08-15 6:39 ` [PATCH 7/9] xfs: clean up xfs_trans_buf_read_map Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-15 13:10 ` Christoph Hellwig
` (2 more replies)
2014-08-15 6:39 ` [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly Dave Chinner
8 siblings, 3 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There is a lot of cookie-cutter code that looks like:
	if (shutdown)
		handle buffer error
	xfs_buf_iorequest(bp)
	error = xfs_buf_iowait(bp)
	if (error)
		handle buffer error
spread through XFS. There's significant complexity now in
xfs_buf_iorequest() to specifically handle this sort of synchronous
IO pattern, but there's all sorts of nasty surprises in different
error handling code dependent on who owns the buffer references and
the locks.
Pull this pattern into a single helper, where we can hide all the
synchronous IO warts and hence make the error handling for all the
callers much saner. This removes the need for a special extra
reference to protect IO completion processing, as we can now hold a
single reference across dispatch and waiting, simplifying the sync
IO semantics and error handling.
In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
make it explicitly handle only asynchronous IO. This forces all users
to be switched specifically to one interface or the other and
removes any ambiguity between how the interfaces are to be used. It
also means that xfs_buf_iowait() goes away.
For the special case of delwri buffer submission and waiting, we
don't need to issue IO synchronously at all. The second pass to call
xfs_buf_iowait() can now just block on xfs_buf_lock() - the buffer
will be unlocked when the async IO is complete. This formalises a
sane method of waiting for async IO - take an extra reference,
submit the IO, call xfs_buf_lock() when you want to wait for IO
completion. i.e.:
	bp = xfs_buf_find();
	xfs_buf_hold(bp);
	bp->b_flags |= XBF_ASYNC;
	xfs_buf_submit(bp);
	xfs_buf_lock(bp);
	error = bp->b_error;
	....
	xfs_buf_relse(bp);
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_bmap_util.c | 14 +---
fs/xfs/xfs_buf.c | 186 ++++++++++++++++++++++++++---------------------
fs/xfs/xfs_buf.h | 4 +-
fs/xfs/xfs_buf_item.c | 4 +-
fs/xfs/xfs_log.c | 2 +-
fs/xfs/xfs_log_recover.c | 22 ++----
fs/xfs/xfs_trans_buf.c | 17 ++---
7 files changed, 122 insertions(+), 127 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2f1e30d..c53cc03 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1157,12 +1157,7 @@ xfs_zero_remaining_bytes(
XFS_BUF_READ(bp);
XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
- if (XFS_FORCED_SHUTDOWN(mp)) {
- error = -EIO;
- break;
- }
- xfs_buf_iorequest(bp);
- error = xfs_buf_iowait(bp);
+ error = xfs_buf_submit_wait(bp);
if (error) {
xfs_buf_ioerror_alert(bp,
"xfs_zero_remaining_bytes(read)");
@@ -1175,12 +1170,7 @@ xfs_zero_remaining_bytes(
XFS_BUF_UNREAD(bp);
XFS_BUF_WRITE(bp);
- if (XFS_FORCED_SHUTDOWN(mp)) {
- error = -EIO;
- break;
- }
- xfs_buf_iorequest(bp);
- error = xfs_buf_iowait(bp);
+ error = xfs_buf_submit_wait(bp);
if (error) {
xfs_buf_ioerror_alert(bp,
"xfs_zero_remaining_bytes(write)");
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 352e9219..a2599f9 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -623,10 +623,11 @@ _xfs_buf_read(
bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD);
bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
- xfs_buf_iorequest(bp);
- if (flags & XBF_ASYNC)
+ if (flags & XBF_ASYNC) {
+ xfs_buf_submit(bp);
return 0;
- return xfs_buf_iowait(bp);
+ }
+ return xfs_buf_submit_wait(bp);
}
xfs_buf_t *
@@ -708,12 +709,7 @@ xfs_buf_read_uncached(
bp->b_flags |= XBF_READ;
bp->b_ops = ops;
- if (XFS_FORCED_SHUTDOWN(target->bt_mount)) {
- xfs_buf_relse(bp);
- return NULL;
- }
- xfs_buf_iorequest(bp);
- xfs_buf_iowait(bp);
+ xfs_buf_submit_wait(bp);
return bp;
}
@@ -1027,10 +1023,8 @@ xfs_buf_ioend(
(*(bp->b_iodone))(bp);
else if (bp->b_flags & XBF_ASYNC)
xfs_buf_relse(bp);
- else {
+ else
complete(&bp->b_iowait);
- xfs_buf_rele(bp);
- }
}
static void
@@ -1083,21 +1077,7 @@ xfs_bwrite(
bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
XBF_WRITE_FAIL | XBF_DONE);
- if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
- trace_xfs_bdstrat_shut(bp, _RET_IP_);
-
- xfs_buf_ioerror(bp, -EIO);
- xfs_buf_stale(bp);
-
- /* sync IO, xfs_buf_ioend is going to remove a ref here */
- xfs_buf_hold(bp);
- xfs_buf_ioend(bp);
- return -EIO;
- }
-
- xfs_buf_iorequest(bp);
-
- error = xfs_buf_iowait(bp);
+ error = xfs_buf_submit_wait(bp);
if (error) {
xfs_force_shutdown(bp->b_target->bt_mount,
SHUTDOWN_META_IO_ERROR);
@@ -1206,7 +1186,7 @@ next_chunk:
} else {
/*
* This is guaranteed not to be the last io reference count
- * because the caller (xfs_buf_iorequest) holds a count itself.
+ * because the caller (xfs_buf_submit) holds a count itself.
*/
atomic_dec(&bp->b_io_remaining);
xfs_buf_ioerror(bp, -EIO);
@@ -1296,13 +1276,29 @@ _xfs_buf_ioapply(
blk_finish_plug(&plug);
}
+/*
+ * Asynchronous IO submission path. This transfers the buffer lock ownership and
+ * the current reference to the IO. It is not safe to reference the buffer after
+ * a call to this function unless the caller holds an additional reference
+ * itself.
+ */
void
-xfs_buf_iorequest(
- xfs_buf_t *bp)
+xfs_buf_submit(
+ struct xfs_buf *bp)
{
trace_xfs_buf_iorequest(bp, _RET_IP_);
ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
+ ASSERT(bp->b_flags & XBF_ASYNC);
+
+ /* on shutdown we stale and complete the buffer immediately */
+ if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
+ xfs_buf_ioerror(bp, -EIO);
+ bp->b_flags &= ~XBF_DONE;
+ xfs_buf_stale(bp);
+ xfs_buf_ioend(bp);
+ return;
+ }
if (bp->b_flags & XBF_WRITE)
xfs_buf_wait_unpin(bp);
@@ -1311,25 +1307,10 @@ xfs_buf_iorequest(
bp->b_io_error = 0;
/*
- * Take references to the buffer. For XBF_ASYNC buffers, holding a
- * reference for as long as submission takes is all that is necessary
- * here. The IO inherits the lock and hold count from the submitter,
- * and these are release during IO completion processing. Taking a hold
- * over submission ensures that the buffer is not freed until we have
- * completed all processing, regardless of when IO errors occur or are
- * reported.
- *
- * However, for synchronous IO, the IO does not inherit the submitters
- * reference count, nor the buffer lock. Hence we need to take an extra
- * reference to the buffer for the for the IO context so that we can
- * guarantee the buffer is not freed until all IO completion processing
- * is done. Otherwise the caller can drop their reference while the IO
- * is still in progress and hence trigger a use-after-free situation.
+ * Take an extra reference so that we don't have to guess when it's no
+ * longer safe to reference the buffer that was passed to us.
*/
xfs_buf_hold(bp);
- if (!(bp->b_flags & XBF_ASYNC))
- xfs_buf_hold(bp);
-
/*
* Set the count to 1 initially, this will stop an I/O completion
@@ -1340,14 +1321,13 @@ xfs_buf_iorequest(
_xfs_buf_ioapply(bp);
/*
- * If _xfs_buf_ioapply failed or we are doing synchronous IO that
- * completes extremely quickly, we can get back here with only the IO
+ * If _xfs_buf_ioapply failed,
+ * we can get back here with only the IO
* reference we took above. _xfs_buf_ioend will drop it to zero, so
* we'd better run completion processing synchronously so that the we
- * don't return to the caller with completion still pending. In the
- * error case, this allows the caller to check b_error safely without
- * waiting, and in the synchronous IO case it avoids unnecessary context
- * switches an latency for high-peformance devices.
+ * don't return to the caller with completion still pending.
+ * this allows the caller to check b_error safely without
+ * waiting
*/
if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
@@ -1357,25 +1337,70 @@ xfs_buf_iorequest(
}
xfs_buf_rele(bp);
+ /* Note: it is not safe to reference bp now we've dropped our ref */
}
/*
- * Waits for I/O to complete on the buffer supplied. It returns immediately if
- * no I/O is pending or there is already a pending error on the buffer, in which
- * case nothing will ever complete. It returns the I/O error code, if any, or
- * 0 if there was no error.
+ * Synchronous buffer IO submission path, read or write.
*/
int
-xfs_buf_iowait(
- xfs_buf_t *bp)
+xfs_buf_submit_wait(
+ struct xfs_buf *bp)
{
- trace_xfs_buf_iowait(bp, _RET_IP_);
+ int error;
+
+ trace_xfs_buf_iorequest(bp, _RET_IP_);
- if (!bp->b_error)
- wait_for_completion(&bp->b_iowait);
+ ASSERT(!(bp->b_flags & (_XBF_DELWRI_Q | XBF_ASYNC)));
+
+ if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
+ xfs_buf_ioerror(bp, -EIO);
+ xfs_buf_stale(bp);
+ bp->b_flags &= ~XBF_DONE;
+ return -EIO;
+ }
+ if (bp->b_flags & XBF_WRITE)
+ xfs_buf_wait_unpin(bp);
+
+ /* clear the internal error state to avoid spurious errors */
+ bp->b_io_error = 0;
+
+ /*
+ * For synchronous IO, the IO does not inherit the submitters reference
+ * count, nor the buffer lock. Hence we cannot release the reference we
+ * are about to take until we've waited for all IO completion to occur,
+ * including any xfs_buf_ioend_async() work that may be pending.
+ */
+ xfs_buf_hold(bp);
+
+ /*
+ * Set the count to 1 initially, this will stop an I/O completion
+ * callout which happens before we have started all the I/O from calling
+ * xfs_buf_ioend too early.
+ */
+ atomic_set(&bp->b_io_remaining, 1);
+ _xfs_buf_ioapply(bp);
+
+ /*
+ * make sure we run completion synchronously if it raced with us and is
+ * already complete.
+ */
+ if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
+ xfs_buf_ioend(bp);
+
+ /* wait for completion before gathering the error from the buffer */
+ trace_xfs_buf_iowait(bp, _RET_IP_);
+ wait_for_completion(&bp->b_iowait);
trace_xfs_buf_iowait_done(bp, _RET_IP_);
- return bp->b_error;
+ error = bp->b_error;
+
+ /*
+ * all done now, we can release the hold that keeps the buffer
+ * referenced for the entire IO.
+ */
+ xfs_buf_rele(bp);
+ return error;
}
xfs_caddr_t
@@ -1769,29 +1794,19 @@ __xfs_buf_delwri_submit(
blk_start_plug(&plug);
list_for_each_entry_safe(bp, n, io_list, b_list) {
bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
- bp->b_flags |= XBF_WRITE;
+ bp->b_flags |= XBF_WRITE | XBF_ASYNC;
- if (!wait) {
- bp->b_flags |= XBF_ASYNC;
+ /*
+ * we do all Io submission async. This means if we need to wait
+ * for IO completion we need to take an extra reference so the
+ * buffer is still valid on the other side.
+ */
+ if (!wait)
list_del_init(&bp->b_list);
- }
-
- if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
- trace_xfs_bdstrat_shut(bp, _RET_IP_);
-
- xfs_buf_ioerror(bp, -EIO);
- xfs_buf_stale(bp);
+ else
+ xfs_buf_hold(bp);
- /*
- * if sync IO, xfs_buf_ioend is going to remove a
- * ref here. We need to add that so we can collect
- * all the buffer errors in the wait loop.
- */
- if (wait)
- xfs_buf_hold(bp);
- xfs_buf_ioend(bp);
- } else
- xfs_buf_iorequest(bp);
+ xfs_buf_submit(bp);
}
blk_finish_plug(&plug);
@@ -1838,7 +1853,10 @@ xfs_buf_delwri_submit(
bp = list_first_entry(&io_list, struct xfs_buf, b_list);
list_del_init(&bp->b_list);
- error2 = xfs_buf_iowait(bp);
+
+ /* locking the buffer will wait for async IO completion. */
+ xfs_buf_lock(bp);
+ error2 = bp->b_error;
xfs_buf_relse(bp);
if (!error)
error = error2;
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index d8f57f6..0acfc30 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -290,8 +290,8 @@ extern int xfs_bwrite(struct xfs_buf *bp);
extern void xfs_buf_ioend(struct xfs_buf *bp);
extern void xfs_buf_ioerror(xfs_buf_t *, int);
extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
-extern void xfs_buf_iorequest(xfs_buf_t *);
-extern int xfs_buf_iowait(xfs_buf_t *);
+extern void xfs_buf_submit(struct xfs_buf *bp);
+extern int xfs_buf_submit_wait(struct xfs_buf *bp);
extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
xfs_buf_rw_t);
#define xfs_buf_zero(bp, off, len) \
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 4fd41b5..cbea909 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1081,7 +1081,7 @@ xfs_buf_iodone_callbacks(
* a way to shut the filesystem down if the writes keep failing.
*
* In practice we'll shut the filesystem down soon as non-transient
- * erorrs tend to affect the whole device and a failing log write
+ * errors tend to affect the whole device and a failing log write
* will make us give up. But we really ought to do better here.
*/
if (XFS_BUF_ISASYNC(bp)) {
@@ -1094,7 +1094,7 @@ xfs_buf_iodone_callbacks(
if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
bp->b_flags |= XBF_WRITE | XBF_ASYNC |
XBF_DONE | XBF_WRITE_FAIL;
- xfs_buf_iorequest(bp);
+ xfs_buf_submit(bp);
} else {
xfs_buf_relse(bp);
}
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e4665db..c4d7f79 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1699,7 +1699,7 @@ xlog_bdstrat(
return 0;
}
- xfs_buf_iorequest(bp);
+ xfs_buf_submit(bp);
return 0;
}
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 4ba19bf..1e14452 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -193,12 +193,8 @@ xlog_bread_noalign(
bp->b_io_length = nbblks;
bp->b_error = 0;
- if (XFS_FORCED_SHUTDOWN(log->l_mp))
- return -EIO;
-
- xfs_buf_iorequest(bp);
- error = xfs_buf_iowait(bp);
- if (error)
+ error = xfs_buf_submit_wait(bp);
+ if (error && !XFS_FORCED_SHUTDOWN(log->l_mp))
xfs_buf_ioerror_alert(bp, __func__);
return error;
}
@@ -4427,16 +4423,12 @@ xlog_do_recover(
XFS_BUF_UNASYNC(bp);
bp->b_ops = &xfs_sb_buf_ops;
- if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
- xfs_buf_relse(bp);
- return -EIO;
- }
-
- xfs_buf_iorequest(bp);
- error = xfs_buf_iowait(bp);
+ error = xfs_buf_submit_wait(bp);
if (error) {
- xfs_buf_ioerror_alert(bp, __func__);
- ASSERT(0);
+ if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
+ xfs_buf_ioerror_alert(bp, __func__);
+ ASSERT(0);
+ }
xfs_buf_relse(bp);
return error;
}
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 9c3e610..4bdf02c 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -286,18 +286,13 @@ xfs_trans_read_buf_map(
bp->b_flags |= XBF_READ;
bp->b_ops = ops;
- if (XFS_FORCED_SHUTDOWN(mp)) {
- trace_xfs_bdstrat_shut(bp, _RET_IP_);
- error = -EIO;
- goto out_stale;
- }
-
- xfs_buf_iorequest(bp);
- error = xfs_buf_iowait(bp);
+ error = xfs_buf_submit_wait(bp);
if (error) {
- xfs_buf_ioerror_alert(bp, __func__);
- goto out_do_shutdown;
-
+ if (!XFS_FORCED_SHUTDOWN(mp)) {
+ xfs_buf_ioerror_alert(bp, __func__);
+ goto out_do_shutdown;
+ }
+ goto out_stale;
}
}
--
2.0.0
* [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
` (7 preceding siblings ...)
2014-08-15 6:39 ` [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait] Dave Chinner
@ 2014-08-15 6:39 ` Dave Chinner
2014-08-15 12:56 ` Christoph Hellwig
8 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 6:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
xfs_buf_read_uncached() has two failure modes. It can either return
NULL or bp->b_error != 0 depending on the type of failure, and not
all callers check for both. Fix it up.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_mount.c | 55 ++++++++++++++++++++++++++++++------------------------
1 file changed, 31 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index d0ef01e..10a365a 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -304,13 +304,8 @@ xfs_readsb(
reread:
bp = xfs_buf_read_uncached(mp->m_ddev_targp, XFS_SB_DADDR,
BTOBB(sector_size), 0, buf_ops);
- if (!bp) {
- if (loud)
- xfs_warn(mp, "SB buffer read failed");
- return -EIO;
- }
- if (bp->b_error) {
- error = bp->b_error;
+ if (!bp || bp->b_error) {
+ error = bp ? bp->b_error : -ENOMEM;
if (loud)
xfs_warn(mp, "SB validate failed with error %d.", error);
/* bad CRC means corrupted metadata */
@@ -368,7 +363,8 @@ reread:
return 0;
release_buf:
- xfs_buf_relse(bp);
+ if (bp)
+ xfs_buf_relse(bp);
return error;
}
@@ -546,10 +542,12 @@ xfs_set_inoalignment(xfs_mount_t *mp)
* Check that the data (and log if separate) is an ok size.
*/
STATIC int
-xfs_check_sizes(xfs_mount_t *mp)
+xfs_check_sizes(
+ struct xfs_mount *mp)
{
- xfs_buf_t *bp;
+ struct xfs_buf *bp;
xfs_daddr_t d;
+ int error;
d = (xfs_daddr_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
if (XFS_BB_TO_FSB(mp, d) != mp->m_sb.sb_dblocks) {
@@ -559,28 +557,37 @@ xfs_check_sizes(xfs_mount_t *mp)
bp = xfs_buf_read_uncached(mp->m_ddev_targp,
d - XFS_FSS_TO_BB(mp, 1),
XFS_FSS_TO_BB(mp, 1), 0, NULL);
- if (!bp) {
+ if (!bp || bp->b_error) {
xfs_warn(mp, "last sector read failed");
- return -EIO;
+ goto out_fail;
}
xfs_buf_relse(bp);
- if (mp->m_logdev_targp != mp->m_ddev_targp) {
- d = (xfs_daddr_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
- if (XFS_BB_TO_FSB(mp, d) != mp->m_sb.sb_logblocks) {
- xfs_warn(mp, "log size mismatch detected");
- return -EFBIG;
- }
- bp = xfs_buf_read_uncached(mp->m_logdev_targp,
+ if (mp->m_logdev_targp == mp->m_ddev_targp)
+ return 0;
+
+ d = (xfs_daddr_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
+ if (XFS_BB_TO_FSB(mp, d) != mp->m_sb.sb_logblocks) {
+ xfs_warn(mp, "log size mismatch detected");
+ return -EFBIG;
+ }
+ bp = xfs_buf_read_uncached(mp->m_logdev_targp,
d - XFS_FSB_TO_BB(mp, 1),
XFS_FSB_TO_BB(mp, 1), 0, NULL);
- if (!bp) {
- xfs_warn(mp, "log device read failed");
- return -EIO;
- }
- xfs_buf_relse(bp);
+ if (!bp || bp->b_error) {
+ xfs_warn(mp, "log device read failed");
+ goto out_fail;
}
+ xfs_buf_relse(bp);
return 0;
+
+out_fail:
+ if (!bp)
+ return -EIO;
+ error = bp->b_error;
+ xfs_buf_relse(bp);
+ return error;
+
}
/*
--
2.0.0
* Re: [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly
2014-08-15 6:39 ` [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly Dave Chinner
@ 2014-08-15 12:56 ` Christoph Hellwig
2014-08-15 23:58 ` Dave Chinner
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-15 12:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:07PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_buf_read_uncached() has two failure modes. It can either return
> NULL or bp->b_error != 0 depending on the type of failure, and not
> all callers check for both. Fix it up.
I'd rather get rid of these annoying calling conventions. Make it
return an errno, and the bp in a pointer argument, with the bp
never non-NULL in case of error.
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 6:39 ` [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait] Dave Chinner
@ 2014-08-15 13:10 ` Christoph Hellwig
2014-08-15 23:37 ` Dave Chinner
2014-08-15 14:35 ` Brian Foster
2014-08-15 16:13 ` Brian Foster
2 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-15 13:10 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2f1e30d..c53cc03 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1157,12 +1157,7 @@ xfs_zero_remaining_bytes(
xfs_zero_remaining_bytes really should be switched to
xfs_buf_read_uncached + xfs_bwrite instead of all this mess just to
micro-optimize a few memory allocation away that don't even happen for
the common case of blocksize == PAGE_SIZE. I'm not even sure we
should be using the buffer cache at all for something inside a regular
file, but that's a discussion for another time.
> void
> +xfs_buf_submit(
> + struct xfs_buf *bp)
> {
> trace_xfs_buf_iorequest(bp, _RET_IP_);
I suspect these two should have properly named and separate trace
points now?
It also seems like a lot of the guts of the two functions are
still the same, so factoring that into a __xfs_buf_submit helper
would be useful.
> + * If _xfs_buf_ioapply failed,
> + * we can get back here with only the IO
> * reference we took above. _xfs_buf_ioend will drop it to zero, so
> * we'd better run completion processing synchronously so that the we
> + * don't return to the caller with completion still pending.
> + * this allows the caller to check b_error safely without
> + * waiting
> */
> if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
I don't think the !ASYNC case can happen here, can it?
> + if (!wait)
> list_del_init(&bp->b_list);
> + else
> + xfs_buf_hold(bp);
Maybe switch this around to avoid the negated condition in the if else?
Also might this be worth a change of its own?
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -286,18 +286,13 @@ xfs_trans_read_buf_map(
> bp->b_flags |= XBF_READ;
> bp->b_ops = ops;
>
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> + if (!XFS_FORCED_SHUTDOWN(mp)) {
> + xfs_buf_ioerror_alert(bp, __func__);
> + goto out_do_shutdown;
> + }
> + goto out_stale;
Again maybe reverse the conditions here for a cleaner flow?
* Re: [PATCH 1/9] xfs: synchronous buffer IO needs a reference
2014-08-15 6:38 ` [PATCH 1/9] xfs: synchronous buffer IO needs a reference Dave Chinner
@ 2014-08-15 13:18 ` Brian Foster
2014-08-15 23:17 ` Dave Chinner
2014-08-29 0:18 ` Christoph Hellwig
1 sibling, 1 reply; 43+ messages in thread
From: Brian Foster @ 2014-08-15 13:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:38:59PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When synchronous IO runs IO completion work, it does so without an
> IO reference or a hold reference on the buffer. The IO "hold
> reference" is owned by the submitter, and released when the
> submission is complete. The IO reference is released when both the
> submitter and the bio end_io processing is run, and so if the io
> completion work is run from IO completion context, it is run without
> an IO reference.
>
> Hence we can get the situation where the submitter can submit the
> IO, see an error on the buffer and unlock and free the buffer while
> there is still IO in progress. This leads to use-after-free and
> memory corruption.
>
> Fix this by taking a "sync IO hold" reference that is owned by the
> IO and not released until after the buffer completion calls are run
> to wake up synchronous waiters. This means that the buffer will not
> be freed in any circumstance until all IO processing is completed.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_buf.c | 46 ++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 38 insertions(+), 8 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index cd7b8ca..5d86bbd 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1019,6 +1019,7 @@ xfs_buf_iodone_work(
> else {
> ASSERT(read && bp->b_ops);
> complete(&bp->b_iowait);
> + xfs_buf_rele(bp);
/* !XBF_ASYNC ref */
> }
> }
>
> @@ -1044,6 +1045,7 @@ xfs_buf_ioend(
> } else {
> bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
> complete(&bp->b_iowait);
> + xfs_buf_rele(bp);
/* !XBF_ASYNC ref */
> }
> }
>
> @@ -1383,22 +1385,50 @@ xfs_buf_iorequest(
>
> if (bp->b_flags & XBF_WRITE)
> xfs_buf_wait_unpin(bp);
> +
> + /*
> + * Take references to the buffer. For XBF_ASYNC buffers, holding a
> + * reference for as long as submission takes is all that is necessary
> + * here. The IO inherits the lock and hold count from the submitter,
> + * and these are release during IO completion processing. Taking a hold
> + * over submission ensures that the buffer is not freed until we have
> + * completed all processing, regardless of when IO errors occur or are
> + * reported.
> + *
> + * However, for synchronous IO, the IO does not inherit the submitters
> + * reference count, nor the buffer lock. Hence we need to take an extra
> + * reference to the buffer for the for the IO context so that we can
> + * guarantee the buffer is not freed until all IO completion processing
> + * is done. Otherwise the caller can drop their reference while the IO
> + * is still in progress and hence trigger a use-after-free situation.
> + */
> xfs_buf_hold(bp);
> + if (!(bp->b_flags & XBF_ASYNC))
> + xfs_buf_hold(bp);
> +
>
> /*
> - * Set the count to 1 initially, this will stop an I/O
> - * completion callout which happens before we have started
> - * all the I/O from calling xfs_buf_ioend too early.
> + * Set the count to 1 initially, this will stop an I/O completion
> + * callout which happens before we have started all the I/O from calling
> + * xfs_buf_ioend too early.
> */
> atomic_set(&bp->b_io_remaining, 1);
> _xfs_buf_ioapply(bp);
> +
> /*
> - * If _xfs_buf_ioapply failed, we'll get back here with
> - * only the reference we took above. _xfs_buf_ioend will
> - * drop it to zero, so we'd better not queue it for later,
> - * or we'll free it before it's done.
> + * If _xfs_buf_ioapply failed or we are doing synchronous IO that
> + * completes extremely quickly, we can get back here with only the IO
> + * reference we took above. _xfs_buf_ioend will drop it to zero, so
> + * we'd better run completion processing synchronously so that the we
> + * don't return to the caller with completion still pending. In the
> + * error case, this allows the caller to check b_error safely without
> + * waiting, and in the synchronous IO case it avoids unnecessary context
> + * switches an latency for high-peformance devices.
> */
AFAICT there is no real wait if the buf has completed at this point. The
wait just decrements the completion counter. So what's the benefit of
"not waiting?" Where is the potential context switch? Are you referring
to the case where error is set but I/O is not complete? Are you saying
the advantage to the caller is it doesn't have to care about the state
of further I/O once it has been determined at least one error has
occurred? (If so, who cares about latency given that some operation that
depends on this I/O is already doomed to fail?).
The code looks fine, but I'm trying to understand the reasoning better
(and I suspect we can clarify the comment).
> - _xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
> + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> + _xfs_buf_ioend(bp, 0);
> + else
> + _xfs_buf_ioend(bp, 1);
Not related to this patch, but it seems like the problem this code tries
to address is still possible. Perhaps this papers over a particular
instance. Consider the case where an I/O fails immediately after this
call completes, but not before. We have an extra reference now for
completion, but we can still return to the caller with completion
pending. I suppose it's fine if we consider the "problem" to be that the
reference goes away underneath the completion, as opposed to the caller
caring about the status of completion.
Brian
>
> xfs_buf_rele(bp);
> }
> --
> 2.0.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality
2014-08-15 6:39 ` [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality Dave Chinner
@ 2014-08-15 13:18 ` Brian Foster
2014-08-15 23:21 ` Dave Chinner
2014-08-29 0:22 ` Christoph Hellwig
1 sibling, 1 reply; 43+ messages in thread
From: Brian Foster @ 2014-08-15 13:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:00PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> We do some work in xfs_buf_ioend, and some work in
> xfs_buf_iodone_work, but much of that functionality is the same.
> This work can all be done in a single function, leaving
> xfs_buf_iodone just a wrapper to determine if we should execute it
> by workqueue or directly. Hence rename xfs_buf_iodone_work to
> xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
> need async processing.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_buf.c | 79 +++++++++++++++++++++---------------------------
> fs/xfs/xfs_buf.h | 2 +-
> fs/xfs/xfs_buf_item.c | 4 +--
> fs/xfs/xfs_inode.c | 2 +-
> fs/xfs/xfs_log.c | 2 +-
> fs/xfs/xfs_log_recover.c | 2 +-
> 6 files changed, 40 insertions(+), 51 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 5d86bbd..1b7f0bc 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -999,54 +999,49 @@ xfs_buf_wait_unpin(
> */
>
> STATIC void
> -xfs_buf_iodone_work(
> - struct work_struct *work)
> +xfs_buf_ioend(
> + struct xfs_buf *bp)
Compile failure here due to STATIC.
> {
> - struct xfs_buf *bp =
> - container_of(work, xfs_buf_t, b_iodone_work);
> - bool read = !!(bp->b_flags & XBF_READ);
> + bool read = !!(bp->b_flags & XBF_READ);
> +
> + trace_xfs_buf_iodone(bp, _RET_IP_);
>
> bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
>
> - /* only validate buffers that were read without errors */
> - if (read && bp->b_ops && !bp->b_error && (bp->b_flags & XBF_DONE))
> - bp->b_ops->verify_read(bp);
> + if (!bp->b_error) {
> + bp->b_flags |= XBF_DONE;
> +
> + /* only validate buffers that were read without errors */
> + if (read && bp->b_ops)
> + bp->b_ops->verify_read(bp);
> + }
Probably not a cause of errors, but this code is now executed twice for
I/O with b_iodone callbacks. Once for the initial call from bio_end_io,
again from the callback via the b_iodone handler. The flags bits are
probably fine, but we don't want to be running the verifiers multiple
times unnecessarily.
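The double execution I mean has this shape (a userspace model with invented stand-in types, not the kernel structures; only the call pattern mirrors the patch):

```c
#include <stddef.h>

/*
 * Userspace model of the duplicate-execution concern: a b_iodone
 * callback that itself finishes by calling xfs_buf_ioend() passes
 * through the verifier logic twice.
 */
struct model_buf;
typedef void (*model_iodone_fn)(struct model_buf *);

struct model_buf {
	int error;			/* b_error */
	int read;			/* XBF_READ */
	int verify_calls;		/* times verify_read would run */
	model_iodone_fn b_iodone;	/* completion callback or NULL */
};

static void model_buf_ioend(struct model_buf *bp)
{
	if (bp->read && !bp->error)
		bp->verify_calls++;	/* bp->b_ops->verify_read(bp) */
	if (bp->b_iodone)
		bp->b_iodone(bp);	/* (*(bp->b_iodone))(bp) */
}

/* models a callback like xlog_recover_iodone(): clear and re-enter */
static void model_log_recover_iodone(struct model_buf *bp)
{
	bp->b_iodone = NULL;		/* callback detaches itself ... */
	model_buf_ioend(bp);		/* ... then calls ioend again */
}
```

With a callback attached, one bio completion drives the verifier twice; without one it runs once, which is what we want in both cases.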
>
> if (bp->b_iodone)
> (*(bp->b_iodone))(bp);
> else if (bp->b_flags & XBF_ASYNC)
> xfs_buf_relse(bp);
> else {
> - ASSERT(read && bp->b_ops);
> complete(&bp->b_iowait);
> xfs_buf_rele(bp);
> }
> }
>
> -void
> -xfs_buf_ioend(
> - struct xfs_buf *bp,
> - int schedule)
> +static void
> +xfs_buf_ioend_work(
> + struct work_struct *work)
> {
> - bool read = !!(bp->b_flags & XBF_READ);
> -
> - trace_xfs_buf_iodone(bp, _RET_IP_);
> + struct xfs_buf *bp =
> + container_of(work, xfs_buf_t, b_iodone_work);
>
> - if (bp->b_error == 0)
> - bp->b_flags |= XBF_DONE;
> + xfs_buf_ioend(bp);
> +}
>
> - if (bp->b_iodone || (read && bp->b_ops) || (bp->b_flags & XBF_ASYNC)) {
> - if (schedule) {
> - INIT_WORK(&bp->b_iodone_work, xfs_buf_iodone_work);
> - queue_work(xfslogd_workqueue, &bp->b_iodone_work);
> - } else {
> - xfs_buf_iodone_work(&bp->b_iodone_work);
> - }
> - } else {
> - bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
> - complete(&bp->b_iowait);
> - xfs_buf_rele(bp);
> - }
> +void
> +xfs_buf_ioend_async(
> + struct xfs_buf *bp)
> +{
> + INIT_WORK(&bp->b_iodone_work, xfs_buf_ioend_work);
> + queue_work(xfslogd_workqueue, &bp->b_iodone_work);
> }
>
> void
> @@ -1094,7 +1089,7 @@ xfs_bioerror(
> XFS_BUF_UNDONE(bp);
> xfs_buf_stale(bp);
>
> - xfs_buf_ioend(bp, 0);
> + xfs_buf_ioend(bp);
>
> return -EIO;
> }
> @@ -1181,15 +1176,6 @@ xfs_bwrite(
> }
>
> STATIC void
> -_xfs_buf_ioend(
> - xfs_buf_t *bp,
> - int schedule)
> -{
> - if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
> - xfs_buf_ioend(bp, schedule);
> -}
> -
> -STATIC void
> xfs_buf_bio_end_io(
> struct bio *bio,
> int error)
> @@ -1206,7 +1192,8 @@ xfs_buf_bio_end_io(
> if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
> invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
>
> - _xfs_buf_ioend(bp, 1);
> + if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
> + xfs_buf_ioend_async(bp);
> bio_put(bio);
> }
>
> @@ -1425,10 +1412,12 @@ xfs_buf_iorequest(
> * waiting, and in the synchronous IO case it avoids unnecessary context
> * switches and latency for high-performance devices.
> */
> - if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> - _xfs_buf_ioend(bp, 0);
> - else
> - _xfs_buf_ioend(bp, 1);
> + if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> + xfs_buf_ioend(bp);
> + else
> + xfs_buf_ioend_async(bp);
> + }
This looks cleaner, but the comment is out of whack at this point.
Brian
>
> xfs_buf_rele(bp);
> }
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index c753183..4585c15 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -286,7 +286,7 @@ extern void xfs_buf_unlock(xfs_buf_t *);
>
> /* Buffer Read and Write Routines */
> extern int xfs_bwrite(struct xfs_buf *bp);
> -extern void xfs_buf_ioend(xfs_buf_t *, int);
> +extern void xfs_buf_ioend(struct xfs_buf *bp);
> extern void xfs_buf_ioerror(xfs_buf_t *, int);
> extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
> extern void xfs_buf_iorequest(xfs_buf_t *);
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 76007de..4fd41b5 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -491,7 +491,7 @@ xfs_buf_item_unpin(
> xfs_buf_ioerror(bp, -EIO);
> XFS_BUF_UNDONE(bp);
> xfs_buf_stale(bp);
> - xfs_buf_ioend(bp, 0);
> + xfs_buf_ioend(bp);
> }
> }
>
> @@ -1115,7 +1115,7 @@ do_callbacks:
> xfs_buf_do_callbacks(bp);
> bp->b_fspriv = NULL;
> bp->b_iodone = NULL;
> - xfs_buf_ioend(bp, 0);
> + xfs_buf_ioend(bp);
> }
>
> /*
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index fea3c92..00d210b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3056,7 +3056,7 @@ cluster_corrupt_out:
> XFS_BUF_UNDONE(bp);
> xfs_buf_stale(bp);
> xfs_buf_ioerror(bp, -EIO);
> - xfs_buf_ioend(bp, 0);
> + xfs_buf_ioend(bp);
> } else {
> xfs_buf_stale(bp);
> xfs_buf_relse(bp);
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 8eaa8f5..e4665db 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1689,7 +1689,7 @@ xlog_bdstrat(
> if (iclog->ic_state & XLOG_STATE_IOERROR) {
> xfs_buf_ioerror(bp, -EIO);
> xfs_buf_stale(bp);
> - xfs_buf_ioend(bp, 0);
> + xfs_buf_ioend(bp);
> /*
> * It would seem logical to return EIO here, but we rely on
> * the log state machine to propagate I/O errors instead of
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 1fd5787..4ba19bf 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -383,7 +383,7 @@ xlog_recover_iodone(
> SHUTDOWN_META_IO_ERROR);
> }
> bp->b_iodone = NULL;
> - xfs_buf_ioend(bp, 0);
> + xfs_buf_ioend(bp);
> }
>
> /*
> --
> 2.0.0
>
* Re: [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling
2014-08-15 6:39 ` [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling Dave Chinner
@ 2014-08-15 13:18 ` Brian Foster
2014-08-15 23:25 ` Dave Chinner
2014-08-29 0:23 ` Christoph Hellwig
1 sibling, 1 reply; 43+ messages in thread
From: Brian Foster @ 2014-08-15 13:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:01PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Currently the report of a bio error from completion
> immediately marks the buffer with an error. The issue is that this
> is racy w.r.t. synchronous IO - the submitter can see b_error being
> set before the IO is complete, and hence we cannot differentiate
> between submission failures and completion failures.
>
> Add an internal b_io_error field protected by the b_lock to catch IO
> completion errors, and only propagate that to the buffer during
> final IO completion handling. Hence we can tell in xfs_buf_iorequest
> if we've had a submission failure by checking bp->b_error before
> dropping our b_io_remaining reference - that reference will prevent
> b_io_error values from being propagated to b_error in the event that
> completion races with submission.
>
> In doing so, allow xfs_buf_iorequest to return an error. That way,
> the caller can check for submission errors safely if required, and
> easily distinguish them from completion errors that come from
> xfs_buf_iowait().
>
I don't see any changes to xfs_buf_iorequest() at all in this one. That
aside, it looks fine. It's not how I was thinking about it, but a clear
separation of submission error regardless.
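For reference, the separation the patch achieves reduces to this shape (a single-threaded userspace model with invented names; a plain int stands in for the spinlock-protected kernel field):

```c
/*
 * Userspace model of the b_io_error pattern: each bio completion
 * records only the first error into a private field, and final IO
 * completion propagates it to the caller-visible b_error.
 */
struct model_buf {
	int b_error;	/* caller-visible, stable until final completion */
	int b_io_error;	/* internal, written by bio completions */
};

/* bio completion: keep only the first error (xfs_buf_bio_end_io) */
static void model_bio_end_io(struct model_buf *bp, int error)
{
	if (error && !bp->b_io_error)
		bp->b_io_error = error;
}

/* final completion: propagate the internal error (xfs_buf_ioend) */
static void model_buf_ioend(struct model_buf *bp)
{
	if (!bp->b_error && bp->b_io_error)
		bp->b_error = bp->b_io_error;
}
```

So a submitter that checks bp->b_error before dropping its b_io_remaining reference can only ever see submission errors; completion errors become visible strictly after final completion.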
Brian
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_buf.c | 18 ++++++++++++++++--
> fs/xfs/xfs_buf.h | 1 +
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 1b7f0bc..58ae34c 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1008,6 +1008,13 @@ xfs_buf_ioend(
>
> bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
>
> + /*
> + * Pull in IO completion errors now. We are guaranteed to be running
> + * single threaded, so we don't need the lock to read b_io_error.
> + */
> + if (!bp->b_error && bp->b_io_error)
> + xfs_buf_ioerror(bp, bp->b_io_error);
> +
> if (!bp->b_error) {
> bp->b_flags |= XBF_DONE;
>
> @@ -1186,8 +1193,12 @@ xfs_buf_bio_end_io(
> * don't overwrite existing errors - otherwise we can lose errors on
> * buffers that require multiple bios to complete.
> */
> - if (!bp->b_error)
> - xfs_buf_ioerror(bp, error);
> + if (error) {
> + spin_lock(&bp->b_lock);
> + if (!bp->b_io_error)
> + bp->b_io_error = error;
> + spin_unlock(&bp->b_lock);
> + }
>
> if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
> invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
> @@ -1373,6 +1384,9 @@ xfs_buf_iorequest(
> if (bp->b_flags & XBF_WRITE)
> xfs_buf_wait_unpin(bp);
>
> + /* clear the internal error state to avoid spurious errors */
> + bp->b_io_error = 0;
> +
> /*
> * Take references to the buffer. For XBF_ASYNC buffers, holding a
> * reference for as long as submission takes is all that is necessary
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 4585c15..44db8cd 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -158,6 +158,7 @@ typedef struct xfs_buf {
> struct list_head b_lru; /* lru list */
> spinlock_t b_lock; /* internal state lock */
> unsigned int b_state; /* internal state flags */
> + int b_io_error; /* internal IO error state */
> wait_queue_head_t b_waiters; /* unpin waiters */
> struct list_head b_list;
> struct xfs_perag *b_pag; /* contains rbtree root */
> --
> 2.0.0
>
* Re: [PATCH 5/9] xfs: xfs_bioerror can die.
2014-08-15 6:39 ` [PATCH 5/9] xfs: xfs_bioerror can die Dave Chinner
@ 2014-08-15 14:35 ` Brian Foster
2014-08-15 23:27 ` Dave Chinner
2014-08-29 0:28 ` Christoph Hellwig
1 sibling, 1 reply; 43+ messages in thread
From: Brian Foster @ 2014-08-15 14:35 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:03PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Internal buffer write error handling is a mess due to the unnatural
> split between xfs_bioerror and xfs_bioerror_relse(). The buffer
> reference counting is also wrong for the xfs_bioerror path for
> xfs_bwrite - it uses sync IO and so xfs_buf_ioend will release a
> hold count that was never taken.
>
> Further, xfs_bwrite only does sync IO and determines the handler to
> call based on b_iodone, so for this caller the only difference
> between xfs_bioerror() and xfs_bioerror_release() is the XBF_DONE
> flag. We don't care what the XBF_DONE flag state is because we stale
> the buffer in both paths - the next buffer lookup will clear
> XBF_DONE anyway because XBF_STALE is set. Hence we can use common
> error handling for xfs_bwrite().
>
> __xfs_buf_delwri_submit() is similar - it's only ever called
> on writes - all sync or async - and again there's no reason to
> handle them any differently at all.
>
> Clean up the nasty error handling and remove xfs_bioerror().
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_buf.c | 67 +++++++++++++++++---------------------------------------
> 1 file changed, 20 insertions(+), 47 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 98fcf5a..f444285 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1072,36 +1072,6 @@ xfs_buf_ioerror_alert(
> }
>
> /*
> - * Called when we want to stop a buffer from getting written or read.
> - * We attach the EIO error, muck with its flags, and call xfs_buf_ioend
> - * so that the proper iodone callbacks get called.
> - */
> -STATIC int
> -xfs_bioerror(
> - xfs_buf_t *bp)
> -{
> -#ifdef XFSERRORDEBUG
> - ASSERT(XFS_BUF_ISREAD(bp) || bp->b_iodone);
> -#endif
> -
> - /*
> - * No need to wait until the buffer is unpinned, we aren't flushing it.
> - */
> - xfs_buf_ioerror(bp, -EIO);
> -
> - /*
> - * We're calling xfs_buf_ioend, so delete XBF_DONE flag.
> - */
> - XFS_BUF_UNREAD(bp);
> - XFS_BUF_UNDONE(bp);
> - xfs_buf_stale(bp);
> -
> - xfs_buf_ioend(bp);
> -
> - return -EIO;
> -}
> -
> -/*
> * Same as xfs_bioerror, except that we are releasing the buffer
> * here ourselves, and avoiding the xfs_buf_ioend call.
> * This is meant for userdata errors; metadata bufs come with
> @@ -1149,19 +1119,19 @@ xfs_bwrite(
> ASSERT(xfs_buf_islocked(bp));
>
> bp->b_flags |= XBF_WRITE;
> - bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q | XBF_WRITE_FAIL);
> + bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
> + XBF_WRITE_FAIL | XBF_DONE);
>
> if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> trace_xfs_bdstrat_shut(bp, _RET_IP_);
>
> - /*
> - * Metadata write that didn't get logged but written anyway.
> - * These aren't associated with a transaction, and can be
> - * ignored.
> - */
> - if (!bp->b_iodone)
> - return xfs_bioerror_relse(bp);
> - return xfs_bioerror(bp);
> + xfs_buf_ioerror(bp, -EIO);
> + xfs_buf_stale(bp);
> +
> + /* sync IO, xfs_buf_ioend is going to remove a ref here */
> + xfs_buf_hold(bp);
Looks correct, but that's kind of ugly. The reference serves no purpose
except to appease the error sequence. It gives the impression that the
reference management should be handled at a higher level (and with truly
synchronous I/O primitives/mechanisms, it is ;).
At the very least, can we reorganize the ioend side of things to handle
this? We already have the duplicate execution issue previously mentioned
that has to get resolved. Some duplicate code is fine if it improves
clarity, imo.
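Counting references makes the ugliness visible (a userspace model with invented names; only the reference arithmetic mirrors the patch):

```c
/*
 * Userspace model of the reference-count fixup in the xfs_bwrite()
 * shutdown path: sync-IO completion drops a reference the error path
 * never took, so the error path must take one first.
 */
struct model_buf {
	int refcount;
};

static void model_hold(struct model_buf *bp)
{
	bp->refcount++;		/* xfs_buf_hold() */
}

/* completion processing always releases one reference for sync IO */
static void model_ioend(struct model_buf *bp)
{
	bp->refcount--;		/* xfs_buf_rele() inside xfs_buf_ioend() */
}

/* the shutdown error path: balance the ref that ioend will drop */
static int model_bwrite_shutdown(struct model_buf *bp)
{
	model_hold(bp);		/* appease the completion's rele */
	model_ioend(bp);
	return -5;		/* -EIO */
}
```

The hold/ioend pair cancels out and the caller's reference survives, but nothing about the error path itself needs a reference; it exists purely because ioend assumes one.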
Brian
> + xfs_buf_ioend(bp);
> + return -EIO;
> }
>
> xfs_buf_iorequest(bp);
> @@ -1848,16 +1818,19 @@ __xfs_buf_delwri_submit(
> if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> trace_xfs_bdstrat_shut(bp, _RET_IP_);
>
> + xfs_buf_ioerror(bp, -EIO);
> + xfs_buf_stale(bp);
> +
> /*
> - * Metadata write that didn't get logged but written anyway.
> - * These aren't associated with a transaction, and can be
> - * ignored.
> + * if sync IO, xfs_buf_ioend is going to remove a
> + * ref here. We need to add that so we can collect
> + * all the buffer errors in the wait loop.
> */
> - if (!bp->b_iodone)
> - return xfs_bioerror_relse(bp);
> - return xfs_bioerror(bp);
> - }
> - xfs_buf_iorequest(bp);
> + if (wait)
> + xfs_buf_hold(bp);
> + xfs_buf_ioend(bp);
> + } else
> + xfs_buf_iorequest(bp);
> }
> blk_finish_plug(&plug);
>
> --
> 2.0.0
>
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 6:39 ` [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait] Dave Chinner
2014-08-15 13:10 ` Christoph Hellwig
@ 2014-08-15 14:35 ` Brian Foster
2014-08-15 23:39 ` Dave Chinner
2014-08-15 16:13 ` Brian Foster
2 siblings, 1 reply; 43+ messages in thread
From: Brian Foster @ 2014-08-15 14:35 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:06PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> There is a lot of cookie-cutter code that looks like:
>
> if (shutdown)
> handle buffer error
> xfs_buf_iorequest(bp)
> error = xfs_buf_iowait(bp)
> if (error)
> handle buffer error
>
> spread through XFS. There's significant complexity now in
> xfs_buf_iorequest() to specifically handle this sort of synchronous
> IO pattern, but there's all sorts of nasty surprises in different
> error handling code dependent on who owns the buffer references and
> the locks.
>
> Pull this pattern into a single helper, where we can hide all the
> synchronous IO warts and hence make the error handling for all the
> callers much saner. This removes the need for a special extra
> reference to protect IO completion processing, as we can now hold a
> single reference across dispatch and waiting, simplifying the sync
> IO semantics and error handling.
>
> In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
> make it explicitly handle only asynchronous IO. This forces all users
> to be switched specifically to one interface or the other and
> removes any ambiguity between how the interfaces are to be used. It
> also means that xfs_buf_iowait() goes away.
>
> For the special case of delwri buffer submission and waiting, we
> don't need to issue IO synchronously at all. The second pass to call
> xfs_buf_iowait() can now just block on xfs_buf_lock() - the buffer
> will be unlocked when the async IO is complete. This formalises a
> sane method of waiting for async IO - take an extra reference,
> submit the IO, call xfs_buf_lock() when you want to wait for IO
> completion. i.e.:
>
> bp = xfs_buf_find();
> xfs_buf_hold(bp);
> bp->b_flags |= XBF_ASYNC;
> xfs_buf_submit(bp);
> xfs_buf_lock(bp)
> error = bp->b_error;
> ....
> xfs_buf_relse(bp);
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
On a quick look at submit_wait this looks pretty good. It actually
implements the general model I've been looking for for sync I/O. E.g.,
send the I/O, wait on synchronization, then check for errors. In other
words, a pure synchronous mechanism. The refactoring and new helpers and
whatnot are an additional bonus and abstract it nicely.
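That pure mechanism has this shape (a single-threaded userspace model with invented names; an int flag replaces b_iowait and the IO "runs" inside dispatch, so only the ordering is meaningful):

```c
/*
 * Userspace model of the pure synchronous shape of
 * xfs_buf_submit_wait(): hold, dispatch, wait, gather the error,
 * release.  Everything is an invented stand-in for the kernel code.
 */
struct model_buf {
	int refcount;
	int b_error;
	int io_done;	/* stands in for complete(&bp->b_iowait) */
};

/* the "IO": may record an error, then signals completion */
static void model_dispatch(struct model_buf *bp, int io_error)
{
	bp->b_error = io_error;
	bp->io_done = 1;
}

static int model_submit_wait(struct model_buf *bp, int io_error)
{
	int error;

	bp->refcount++;			/* xfs_buf_hold() for the IO */
	model_dispatch(bp, io_error);
	while (!bp->io_done)		/* wait_for_completion(&bp->b_iowait) */
		;
	error = bp->b_error;		/* checked only after the wait */
	bp->refcount--;			/* xfs_buf_rele() */
	return error;
}
```

The key property is that b_error is read strictly after the synchronization point, and the IO's reference is held across the whole window, so there's no state for the caller to reason about mid-flight.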
I still have to take a closer look to review the actual code, but since
we go and remove the additional sync I/O reference counting business,
why do we even add that stuff early on? Can't we get from where the
current code is to here in a more direct manner?
Brian
> fs/xfs/xfs_bmap_util.c | 14 +---
> fs/xfs/xfs_buf.c | 186 ++++++++++++++++++++++++++---------------------
> fs/xfs/xfs_buf.h | 4 +-
> fs/xfs/xfs_buf_item.c | 4 +-
> fs/xfs/xfs_log.c | 2 +-
> fs/xfs/xfs_log_recover.c | 22 ++----
> fs/xfs/xfs_trans_buf.c | 17 ++---
> 7 files changed, 122 insertions(+), 127 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2f1e30d..c53cc03 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1157,12 +1157,7 @@ xfs_zero_remaining_bytes(
> XFS_BUF_READ(bp);
> XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
>
> - if (XFS_FORCED_SHUTDOWN(mp)) {
> - error = -EIO;
> - break;
> - }
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> xfs_buf_ioerror_alert(bp,
> "xfs_zero_remaining_bytes(read)");
> @@ -1175,12 +1170,7 @@ xfs_zero_remaining_bytes(
> XFS_BUF_UNREAD(bp);
> XFS_BUF_WRITE(bp);
>
> - if (XFS_FORCED_SHUTDOWN(mp)) {
> - error = -EIO;
> - break;
> - }
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> xfs_buf_ioerror_alert(bp,
> "xfs_zero_remaining_bytes(write)");
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 352e9219..a2599f9 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -623,10 +623,11 @@ _xfs_buf_read(
> bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD);
> bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
>
> - xfs_buf_iorequest(bp);
> - if (flags & XBF_ASYNC)
> + if (flags & XBF_ASYNC) {
> + xfs_buf_submit(bp);
> return 0;
> - return xfs_buf_iowait(bp);
> + }
> + return xfs_buf_submit_wait(bp);
> }
>
> xfs_buf_t *
> @@ -708,12 +709,7 @@ xfs_buf_read_uncached(
> bp->b_flags |= XBF_READ;
> bp->b_ops = ops;
>
> - if (XFS_FORCED_SHUTDOWN(target->bt_mount)) {
> - xfs_buf_relse(bp);
> - return NULL;
> - }
> - xfs_buf_iorequest(bp);
> - xfs_buf_iowait(bp);
> + xfs_buf_submit_wait(bp);
> return bp;
> }
>
> @@ -1027,10 +1023,8 @@ xfs_buf_ioend(
> (*(bp->b_iodone))(bp);
> else if (bp->b_flags & XBF_ASYNC)
> xfs_buf_relse(bp);
> - else {
> + else
> complete(&bp->b_iowait);
> - xfs_buf_rele(bp);
> - }
> }
>
> static void
> @@ -1083,21 +1077,7 @@ xfs_bwrite(
> bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
> XBF_WRITE_FAIL | XBF_DONE);
>
> - if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> - trace_xfs_bdstrat_shut(bp, _RET_IP_);
> -
> - xfs_buf_ioerror(bp, -EIO);
> - xfs_buf_stale(bp);
> -
> - /* sync IO, xfs_buf_ioend is going to remove a ref here */
> - xfs_buf_hold(bp);
> - xfs_buf_ioend(bp);
> - return -EIO;
> - }
> -
> - xfs_buf_iorequest(bp);
> -
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> xfs_force_shutdown(bp->b_target->bt_mount,
> SHUTDOWN_META_IO_ERROR);
> @@ -1206,7 +1186,7 @@ next_chunk:
> } else {
> /*
> * This is guaranteed not to be the last io reference count
> - * because the caller (xfs_buf_iorequest) holds a count itself.
> + * because the caller (xfs_buf_submit) holds a count itself.
> */
> atomic_dec(&bp->b_io_remaining);
> xfs_buf_ioerror(bp, -EIO);
> @@ -1296,13 +1276,29 @@ _xfs_buf_ioapply(
> blk_finish_plug(&plug);
> }
>
> +/*
> + * Asynchronous IO submission path. This transfers the buffer lock ownership and
> + * the current reference to the IO. It is not safe to reference the buffer after
> + * a call to this function unless the caller holds an additional reference
> + * itself.
> + */
> void
> -xfs_buf_iorequest(
> - xfs_buf_t *bp)
> +xfs_buf_submit(
> + struct xfs_buf *bp)
> {
> trace_xfs_buf_iorequest(bp, _RET_IP_);
>
> ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
> + ASSERT(bp->b_flags & XBF_ASYNC);
> +
> + /* on shutdown we stale and complete the buffer immediately */
> + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> + xfs_buf_ioerror(bp, -EIO);
> + bp->b_flags &= ~XBF_DONE;
> + xfs_buf_stale(bp);
> + xfs_buf_ioend(bp);
> + return;
> + }
>
> if (bp->b_flags & XBF_WRITE)
> xfs_buf_wait_unpin(bp);
> @@ -1311,25 +1307,10 @@ xfs_buf_iorequest(
> bp->b_io_error = 0;
>
> /*
> - * Take references to the buffer. For XBF_ASYNC buffers, holding a
> - * reference for as long as submission takes is all that is necessary
> - * here. The IO inherits the lock and hold count from the submitter,
> - * and these are release during IO completion processing. Taking a hold
> - * over submission ensures that the buffer is not freed until we have
> - * completed all processing, regardless of when IO errors occur or are
> - * reported.
> - *
> - * However, for synchronous IO, the IO does not inherit the submitters
> - * reference count, nor the buffer lock. Hence we need to take an extra
> - * reference to the buffer for the for the IO context so that we can
> - * guarantee the buffer is not freed until all IO completion processing
> - * is done. Otherwise the caller can drop their reference while the IO
> - * is still in progress and hence trigger a use-after-free situation.
> + * Take an extra reference so that we don't have to guess when it's no
> + * longer safe to reference the buffer that was passed to us.
> */
> xfs_buf_hold(bp);
> - if (!(bp->b_flags & XBF_ASYNC))
> - xfs_buf_hold(bp);
> -
>
> /*
> * Set the count to 1 initially, this will stop an I/O completion
> @@ -1340,14 +1321,13 @@ xfs_buf_iorequest(
> _xfs_buf_ioapply(bp);
>
> /*
> - * If _xfs_buf_ioapply failed or we are doing synchronous IO that
> - * completes extremely quickly, we can get back here with only the IO
> + * If _xfs_buf_ioapply failed,
> + * we can get back here with only the IO
> * reference we took above. _xfs_buf_ioend will drop it to zero, so
> * we'd better run completion processing synchronously so that we
> - * don't return to the caller with completion still pending. In the
> - * error case, this allows the caller to check b_error safely without
> - * waiting, and in the synchronous IO case it avoids unnecessary context
> - * switches an latency for high-peformance devices.
> + * don't return to the caller with completion still pending. This
> + * allows the caller to check b_error safely without
> + * waiting.
> */
> if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> @@ -1357,25 +1337,70 @@ xfs_buf_iorequest(
> }
>
> xfs_buf_rele(bp);
> + /* Note: it is not safe to reference bp now we've dropped our ref */
> }
>
> /*
> - * Waits for I/O to complete on the buffer supplied. It returns immediately if
> - * no I/O is pending or there is already a pending error on the buffer, in which
> - * case nothing will ever complete. It returns the I/O error code, if any, or
> - * 0 if there was no error.
> + * Synchronous buffer IO submission path, read or write.
> */
> int
> -xfs_buf_iowait(
> - xfs_buf_t *bp)
> +xfs_buf_submit_wait(
> + struct xfs_buf *bp)
> {
> - trace_xfs_buf_iowait(bp, _RET_IP_);
> + int error;
> +
> + trace_xfs_buf_iorequest(bp, _RET_IP_);
>
> - if (!bp->b_error)
> - wait_for_completion(&bp->b_iowait);
> + ASSERT(!(bp->b_flags & (_XBF_DELWRI_Q | XBF_ASYNC)));
> +
> + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> + xfs_buf_ioerror(bp, -EIO);
> + xfs_buf_stale(bp);
> + bp->b_flags &= ~XBF_DONE;
> + return -EIO;
> + }
>
> + if (bp->b_flags & XBF_WRITE)
> + xfs_buf_wait_unpin(bp);
> +
> + /* clear the internal error state to avoid spurious errors */
> + bp->b_io_error = 0;
> +
> + /*
> + * For synchronous IO, the IO does not inherit the submitters reference
> + * count, nor the buffer lock. Hence we cannot release the reference we
> + * are about to take until we've waited for all IO completion to occur,
> + * including any xfs_buf_ioend_async() work that may be pending.
> + */
> + xfs_buf_hold(bp);
> +
> + /*
> + * Set the count to 1 initially, this will stop an I/O completion
> + * callout which happens before we have started all the I/O from calling
> + * xfs_buf_ioend too early.
> + */
> + atomic_set(&bp->b_io_remaining, 1);
> + _xfs_buf_ioapply(bp);
> +
> + /*
> + * make sure we run completion synchronously if it raced with us and is
> + * already complete.
> + */
> + if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
> + xfs_buf_ioend(bp);
> +
> + /* wait for completion before gathering the error from the buffer */
> + trace_xfs_buf_iowait(bp, _RET_IP_);
> + wait_for_completion(&bp->b_iowait);
> trace_xfs_buf_iowait_done(bp, _RET_IP_);
> - return bp->b_error;
> + error = bp->b_error;
> +
> + /*
> + * all done now, we can release the hold that keeps the buffer
> + * referenced for the entire IO.
> + */
> + xfs_buf_rele(bp);
> + return error;
> }
>
> xfs_caddr_t
> @@ -1769,29 +1794,19 @@ __xfs_buf_delwri_submit(
> blk_start_plug(&plug);
> list_for_each_entry_safe(bp, n, io_list, b_list) {
> bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
> - bp->b_flags |= XBF_WRITE;
> + bp->b_flags |= XBF_WRITE | XBF_ASYNC;
>
> - if (!wait) {
> - bp->b_flags |= XBF_ASYNC;
> + /*
> + * We do all IO submission async. This means if we need to wait
> + * for IO completion we need to take an extra reference so the
> + * buffer is still valid on the other side.
> + */
> + if (!wait)
> list_del_init(&bp->b_list);
> - }
> -
> - if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> - trace_xfs_bdstrat_shut(bp, _RET_IP_);
> -
> - xfs_buf_ioerror(bp, -EIO);
> - xfs_buf_stale(bp);
> + else
> + xfs_buf_hold(bp);
>
> - /*
> - * if sync IO, xfs_buf_ioend is going to remove a
> - * ref here. We need to add that so we can collect
> - * all the buffer errors in the wait loop.
> - */
> - if (wait)
> - xfs_buf_hold(bp);
> - xfs_buf_ioend(bp);
> - } else
> - xfs_buf_iorequest(bp);
> + xfs_buf_submit(bp);
> }
> blk_finish_plug(&plug);
>
> @@ -1838,7 +1853,10 @@ xfs_buf_delwri_submit(
> bp = list_first_entry(&io_list, struct xfs_buf, b_list);
>
> list_del_init(&bp->b_list);
> - error2 = xfs_buf_iowait(bp);
> +
> + /* locking the buffer will wait for async IO completion. */
> + xfs_buf_lock(bp);
> + error2 = bp->b_error;
> xfs_buf_relse(bp);
> if (!error)
> error = error2;
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index d8f57f6..0acfc30 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -290,8 +290,8 @@ extern int xfs_bwrite(struct xfs_buf *bp);
> extern void xfs_buf_ioend(struct xfs_buf *bp);
> extern void xfs_buf_ioerror(xfs_buf_t *, int);
> extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
> -extern void xfs_buf_iorequest(xfs_buf_t *);
> -extern int xfs_buf_iowait(xfs_buf_t *);
> +extern void xfs_buf_submit(struct xfs_buf *bp);
> +extern int xfs_buf_submit_wait(struct xfs_buf *bp);
> extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
> xfs_buf_rw_t);
> #define xfs_buf_zero(bp, off, len) \
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 4fd41b5..cbea909 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1081,7 +1081,7 @@ xfs_buf_iodone_callbacks(
> * a way to shut the filesystem down if the writes keep failing.
> *
> * In practice we'll shut the filesystem down soon as non-transient
> - * erorrs tend to affect the whole device and a failing log write
> + * errors tend to affect the whole device and a failing log write
> * will make us give up. But we really ought to do better here.
> */
> if (XFS_BUF_ISASYNC(bp)) {
> @@ -1094,7 +1094,7 @@ xfs_buf_iodone_callbacks(
> if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> XBF_DONE | XBF_WRITE_FAIL;
> - xfs_buf_iorequest(bp);
> + xfs_buf_submit(bp);
> } else {
> xfs_buf_relse(bp);
> }
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index e4665db..c4d7f79 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1699,7 +1699,7 @@ xlog_bdstrat(
> return 0;
> }
>
> - xfs_buf_iorequest(bp);
> + xfs_buf_submit(bp);
> return 0;
> }
>
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 4ba19bf..1e14452 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -193,12 +193,8 @@ xlog_bread_noalign(
> bp->b_io_length = nbblks;
> bp->b_error = 0;
>
> - if (XFS_FORCED_SHUTDOWN(log->l_mp))
> - return -EIO;
> -
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> - if (error)
> + error = xfs_buf_submit_wait(bp);
> + if (error && !XFS_FORCED_SHUTDOWN(log->l_mp))
> xfs_buf_ioerror_alert(bp, __func__);
> return error;
> }
> @@ -4427,16 +4423,12 @@ xlog_do_recover(
> XFS_BUF_UNASYNC(bp);
> bp->b_ops = &xfs_sb_buf_ops;
>
> - if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
> - xfs_buf_relse(bp);
> - return -EIO;
> - }
> -
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> - xfs_buf_ioerror_alert(bp, __func__);
> - ASSERT(0);
> + if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
> + xfs_buf_ioerror_alert(bp, __func__);
> + ASSERT(0);
> + }
> xfs_buf_relse(bp);
> return error;
> }
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 9c3e610..4bdf02c 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -286,18 +286,13 @@ xfs_trans_read_buf_map(
> bp->b_flags |= XBF_READ;
> bp->b_ops = ops;
>
> - if (XFS_FORCED_SHUTDOWN(mp)) {
> - trace_xfs_bdstrat_shut(bp, _RET_IP_);
> - error = -EIO;
> - goto out_stale;
> - }
> -
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> - xfs_buf_ioerror_alert(bp, __func__);
> - goto out_do_shutdown;
> -
> + if (!XFS_FORCED_SHUTDOWN(mp)) {
> + xfs_buf_ioerror_alert(bp, __func__);
> + goto out_do_shutdown;
> + }
> + goto out_stale;
> }
> }
>
> --
> 2.0.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 6:39 ` [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait] Dave Chinner
2014-08-15 13:10 ` Christoph Hellwig
2014-08-15 14:35 ` Brian Foster
@ 2014-08-15 16:13 ` Brian Foster
2014-08-15 23:58 ` Dave Chinner
2 siblings, 1 reply; 43+ messages in thread
From: Brian Foster @ 2014-08-15 16:13 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:06PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> There is a lot of cookie-cutter code that looks like:
>
> if (shutdown)
> handle buffer error
> xfs_buf_iorequest(bp)
> error = xfs_buf_iowait(bp)
> if (error)
> handle buffer error
>
> spread through XFS. There's significant complexity now in
> xfs_buf_iorequest() to specifically handle this sort of synchronous
> IO pattern, but there's all sorts of nasty surprises in different
> error handling code dependent on who owns the buffer references and
> the locks.
>
> Pull this pattern into a single helper, where we can hide all the
> synchronous IO warts and hence make the error handling for all the
> callers much saner. This removes the need for a special extra
> reference to protect IO completion processing, as we can now hold a
> single reference across dispatch and waiting, simplifying the sync
> IO semantics and error handling.
>
> In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
> make it explicitly handle only asynchronous IO. This forces all users
> to be switched specifically to one interface or the other and
> removes any ambiguity between how the interfaces are to be used. It
> also means that xfs_buf_iowait() goes away.
>
> For the special case of delwri buffer submission and waiting, we
> don't need to issue IO synchronously at all. The second pass to call
> xfs_buf_iowait() can now just block on xfs_buf_lock() - the buffer
> will be unlocked when the async IO is complete. This formalises a
> sane method of waiting for async IO - take an extra reference,
> submit the IO, call xfs_buf_lock() when you want to wait for IO
> completion. i.e.:
>
> bp = xfs_buf_find();
> xfs_buf_hold(bp);
> bp->b_flags |= XBF_ASYNC;
> xfs_buf_submit(bp);
> xfs_buf_lock(bp)
> error = bp->b_error;
> ....
> xfs_buf_relse(bp);
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_bmap_util.c | 14 +---
> fs/xfs/xfs_buf.c | 186 ++++++++++++++++++++++++++---------------------
> fs/xfs/xfs_buf.h | 4 +-
> fs/xfs/xfs_buf_item.c | 4 +-
> fs/xfs/xfs_log.c | 2 +-
> fs/xfs/xfs_log_recover.c | 22 ++----
> fs/xfs/xfs_trans_buf.c | 17 ++---
> 7 files changed, 122 insertions(+), 127 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2f1e30d..c53cc03 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1157,12 +1157,7 @@ xfs_zero_remaining_bytes(
> XFS_BUF_READ(bp);
> XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
>
> - if (XFS_FORCED_SHUTDOWN(mp)) {
> - error = -EIO;
> - break;
> - }
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> xfs_buf_ioerror_alert(bp,
> "xfs_zero_remaining_bytes(read)");
> @@ -1175,12 +1170,7 @@ xfs_zero_remaining_bytes(
> XFS_BUF_UNREAD(bp);
> XFS_BUF_WRITE(bp);
>
> - if (XFS_FORCED_SHUTDOWN(mp)) {
> - error = -EIO;
> - break;
> - }
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> xfs_buf_ioerror_alert(bp,
> "xfs_zero_remaining_bytes(write)");
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 352e9219..a2599f9 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -623,10 +623,11 @@ _xfs_buf_read(
> bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD);
> bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
>
> - xfs_buf_iorequest(bp);
> - if (flags & XBF_ASYNC)
> + if (flags & XBF_ASYNC) {
> + xfs_buf_submit(bp);
> return 0;
> - return xfs_buf_iowait(bp);
> + }
> + return xfs_buf_submit_wait(bp);
> }
>
> xfs_buf_t *
> @@ -708,12 +709,7 @@ xfs_buf_read_uncached(
> bp->b_flags |= XBF_READ;
> bp->b_ops = ops;
>
> - if (XFS_FORCED_SHUTDOWN(target->bt_mount)) {
> - xfs_buf_relse(bp);
> - return NULL;
> - }
> - xfs_buf_iorequest(bp);
> - xfs_buf_iowait(bp);
> + xfs_buf_submit_wait(bp);
> return bp;
> }
>
> @@ -1027,10 +1023,8 @@ xfs_buf_ioend(
> (*(bp->b_iodone))(bp);
> else if (bp->b_flags & XBF_ASYNC)
> xfs_buf_relse(bp);
> - else {
> + else
> complete(&bp->b_iowait);
> - xfs_buf_rele(bp);
> - }
> }
>
> static void
> @@ -1083,21 +1077,7 @@ xfs_bwrite(
> bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
> XBF_WRITE_FAIL | XBF_DONE);
>
> - if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> - trace_xfs_bdstrat_shut(bp, _RET_IP_);
> -
> - xfs_buf_ioerror(bp, -EIO);
> - xfs_buf_stale(bp);
> -
> - /* sync IO, xfs_buf_ioend is going to remove a ref here */
> - xfs_buf_hold(bp);
> - xfs_buf_ioend(bp);
> - return -EIO;
> - }
> -
> - xfs_buf_iorequest(bp);
> -
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> xfs_force_shutdown(bp->b_target->bt_mount,
> SHUTDOWN_META_IO_ERROR);
> @@ -1206,7 +1186,7 @@ next_chunk:
> } else {
> /*
> * This is guaranteed not to be the last io reference count
> - * because the caller (xfs_buf_iorequest) holds a count itself.
> + * because the caller (xfs_buf_submit) holds a count itself.
> */
> atomic_dec(&bp->b_io_remaining);
> xfs_buf_ioerror(bp, -EIO);
> @@ -1296,13 +1276,29 @@ _xfs_buf_ioapply(
> blk_finish_plug(&plug);
> }
>
> +/*
> + * Asynchronous IO submission path. This transfers the buffer lock ownership and
> + * the current reference to the IO. It is not safe to reference the buffer after
> + * a call to this function unless the caller holds an additional reference
> + * itself.
> + */
> void
> -xfs_buf_iorequest(
> - xfs_buf_t *bp)
> +xfs_buf_submit(
> + struct xfs_buf *bp)
> {
> trace_xfs_buf_iorequest(bp, _RET_IP_);
>
> ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
> + ASSERT(bp->b_flags & XBF_ASYNC);
> +
> + /* on shutdown we stale and complete the buffer immediately */
> + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> + xfs_buf_ioerror(bp, -EIO);
> + bp->b_flags &= ~XBF_DONE;
> + xfs_buf_stale(bp);
> + xfs_buf_ioend(bp);
> + return;
> + }
>
> if (bp->b_flags & XBF_WRITE)
> xfs_buf_wait_unpin(bp);
> @@ -1311,25 +1307,10 @@ xfs_buf_iorequest(
> bp->b_io_error = 0;
>
I know this is from the previous patch, but I wonder if it's cleaner to
reset b_io_error when the error is transferred to b_error. That seems
slightly more future proof at least.
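The internal-error scheme being discussed can be modelled in a few lines of userspace Python (illustrative only; the `b_error`/`b_io_error` field names come from the patch, everything else is an invented stand-in for bio completion). It shows the variant suggested above, where `b_io_error` is reset at the point the error is transferred to `b_error`:

```python
# Toy model of the b_io_error scheme: bio completions record failures
# in an internal field, and only the final I/O completion propagates
# that value to the caller-visible b_error. Not kernel code.

class Buf:
    def __init__(self):
        self.b_error = 0      # caller-visible error
        self.b_io_error = 0   # internal, written by bio completions

def bio_end_io(bp, error):
    """A bio completed: keep only the first error internally."""
    if error and not bp.b_io_error:
        bp.b_io_error = error

def xfs_buf_ioend_model(bp):
    """Final completion: transfer the internal error to b_error and
    reset it at the point of transfer, as suggested."""
    if bp.b_io_error:
        bp.b_error = bp.b_io_error
        bp.b_io_error = 0

bp = Buf()
bio_end_io(bp, -5)            # first failing bio wins
bio_end_io(bp, -28)           # later failures don't overwrite it
xfs_buf_ioend_model(bp)
assert (bp.b_error, bp.b_io_error) == (-5, 0)
```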
> /*
> - * Take references to the buffer. For XBF_ASYNC buffers, holding a
> - * reference for as long as submission takes is all that is necessary
> - * here. The IO inherits the lock and hold count from the submitter,
> - * and these are release during IO completion processing. Taking a hold
> - * over submission ensures that the buffer is not freed until we have
> - * completed all processing, regardless of when IO errors occur or are
> - * reported.
> - *
> - * However, for synchronous IO, the IO does not inherit the submitters
> - * reference count, nor the buffer lock. Hence we need to take an extra
> - * reference to the buffer for the for the IO context so that we can
> - * guarantee the buffer is not freed until all IO completion processing
> - * is done. Otherwise the caller can drop their reference while the IO
> - * is still in progress and hence trigger a use-after-free situation.
> + * Take an extra reference so that we don't have to guess when it's no
> + * longer safe to reference the buffer that was passed to us.
> */
Assuming my understanding is correct:
/*
* The caller's reference is released by buffer I/O completion. Technically
* this should not occur until we release the last b_io_remaining reference.
* Take a direct reference across the I/O submission anyways to be sure it's
* safe to access the buffer until we release it.
*/
> xfs_buf_hold(bp);
> - if (!(bp->b_flags & XBF_ASYNC))
> - xfs_buf_hold(bp);
> -
>
> /*
> * Set the count to 1 initially, this will stop an I/O completion
> @@ -1340,14 +1321,13 @@ xfs_buf_iorequest(
> _xfs_buf_ioapply(bp);
>
> /*
> - * If _xfs_buf_ioapply failed or we are doing synchronous IO that
> - * completes extremely quickly, we can get back here with only the IO
> + * If _xfs_buf_ioapply failed,
> + * we can get back here with only the IO
> * reference we took above. _xfs_buf_ioend will drop it to zero, so
> * we'd better run completion processing synchronously so that the we
> - * don't return to the caller with completion still pending. In the
> - * error case, this allows the caller to check b_error safely without
> - * waiting, and in the synchronous IO case it avoids unnecessary context
> - * switches an latency for high-peformance devices.
> + * don't return to the caller with completion still pending.
> + * this allows the caller to check b_error safely without
> + * waiting
> */
> if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> @@ -1357,25 +1337,70 @@ xfs_buf_iorequest(
> }
>
> xfs_buf_rele(bp);
> + /* Note: it is not safe to reference bp now we've dropped our ref */
> }
>
> /*
> - * Waits for I/O to complete on the buffer supplied. It returns immediately if
> - * no I/O is pending or there is already a pending error on the buffer, in which
> - * case nothing will ever complete. It returns the I/O error code, if any, or
> - * 0 if there was no error.
> + * Synchronous buffer IO submission path, read or write.
> */
> int
> -xfs_buf_iowait(
> - xfs_buf_t *bp)
> +xfs_buf_submit_wait(
> + struct xfs_buf *bp)
> {
> - trace_xfs_buf_iowait(bp, _RET_IP_);
> + int error;
> +
> + trace_xfs_buf_iorequest(bp, _RET_IP_);
>
> - if (!bp->b_error)
> - wait_for_completion(&bp->b_iowait);
> + ASSERT(!(bp->b_flags & (_XBF_DELWRI_Q | XBF_ASYNC)));
> +
> + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> + xfs_buf_ioerror(bp, -EIO);
> + xfs_buf_stale(bp);
> + bp->b_flags &= ~XBF_DONE;
> + return -EIO;
> + }
>
> + if (bp->b_flags & XBF_WRITE)
> + xfs_buf_wait_unpin(bp);
> +
> + /* clear the internal error state to avoid spurious errors */
> + bp->b_io_error = 0;
> +
> + /*
> + * For synchronous IO, the IO does not inherit the submitters reference
> + * count, nor the buffer lock. Hence we cannot release the reference we
> + * are about to take until we've waited for all IO completion to occur,
> + * including any xfs_buf_ioend_async() work that may be pending.
> + */
> + xfs_buf_hold(bp);
> +
Harmless, but why is this necessary? The caller should have the
reference, the I/O completion won't release it and we wait on b_iowait
before we return. Isn't the caller's reference sufficient?
> + /*
> + * Set the count to 1 initially, this will stop an I/O completion
> + * callout which happens before we have started all the I/O from calling
> + * xfs_buf_ioend too early.
> + */
> + atomic_set(&bp->b_io_remaining, 1);
> + _xfs_buf_ioapply(bp);
> +
> + /*
> + * make sure we run completion synchronously if it raced with us and is
> + * already complete.
> + */
> + if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
> + xfs_buf_ioend(bp);
> +
> + /* wait for completion before gathering the error from the buffer */
> + trace_xfs_buf_iowait(bp, _RET_IP_);
> + wait_for_completion(&bp->b_iowait);
> trace_xfs_buf_iowait_done(bp, _RET_IP_);
> - return bp->b_error;
> + error = bp->b_error;
> +
> + /*
> + * all done now, we can release the hold that keeps the buffer
> + * referenced for the entire IO.
> + */
> + xfs_buf_rele(bp);
> + return error;
> }
>
> xfs_caddr_t
> @@ -1769,29 +1794,19 @@ __xfs_buf_delwri_submit(
> blk_start_plug(&plug);
> list_for_each_entry_safe(bp, n, io_list, b_list) {
> bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
> - bp->b_flags |= XBF_WRITE;
> + bp->b_flags |= XBF_WRITE | XBF_ASYNC;
>
> - if (!wait) {
> - bp->b_flags |= XBF_ASYNC;
> + /*
> + * we do all Io submission async. This means if we need to wait
> + * for IO completion we need to take an extra reference so the
> + * buffer is still valid on the other side.
> + */
> + if (!wait)
> list_del_init(&bp->b_list);
> - }
> -
> - if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> - trace_xfs_bdstrat_shut(bp, _RET_IP_);
> -
> - xfs_buf_ioerror(bp, -EIO);
> - xfs_buf_stale(bp);
> + else
> + xfs_buf_hold(bp);
>
> - /*
> - * if sync IO, xfs_buf_ioend is going to remove a
> - * ref here. We need to add that so we can collect
> - * all the buffer errors in the wait loop.
> - */
> - if (wait)
> - xfs_buf_hold(bp);
> - xfs_buf_ioend(bp);
> - } else
> - xfs_buf_iorequest(bp);
> + xfs_buf_submit(bp);
> }
> blk_finish_plug(&plug);
>
> @@ -1838,7 +1853,10 @@ xfs_buf_delwri_submit(
> bp = list_first_entry(&io_list, struct xfs_buf, b_list);
>
> list_del_init(&bp->b_list);
> - error2 = xfs_buf_iowait(bp);
> +
> + /* locking the buffer will wait for async IO completion. */
> + xfs_buf_lock(bp);
> + error2 = bp->b_error;
Interesting delwri cleanup overall. The lock here works for
synchronization (blocking), but it doesn't look safe for error
processing. Once the buffer lock is dropped, who says b_error is from
our write (and not a subsequent one, for example) and thus this caller's
responsibility to handle the error? I suspect this is a reason the lock
is typically held across a sync I/O, so the error is valid after the
I/O.
Also, it looks like blocking on async I/O as such opens up the
possibility of blocking indefinitely on failing writes if the right
combination of delwri and b_iodone handler is used (see
xfs_buf_iodone_callbacks()). I'm not sure if that's a real problem
today, though.
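The wait-via-lock pattern in question can be sketched as a userspace Python model (purely illustrative; the lock handoff from submitter to I/O to completion mirrors the patch, the threading details are invented stand-ins for bio completion):

```python
import threading

# Toy model of the delwri wait pattern: async submission hands the
# buffer lock to the I/O, "completion" (a thread here) records the
# error and unlocks, and the waiter waits by taking the lock and then
# sampling b_error. Not kernel code.

class Buf:
    def __init__(self):
        self.b_lock = threading.Lock()
        self.b_error = 0

def submit_async_model(bp, io_error):
    """Async submit: the I/O inherits the buffer lock."""
    bp.b_lock.acquire()
    def completion():
        bp.b_error = io_error
        bp.b_lock.release()    # unlock signals completion
    threading.Thread(target=completion).start()

def delwri_wait_model(bp):
    """Lock the buffer to block until async completion unlocks it,
    then sample b_error - the sample is only trustworthy while the
    lock is held, which is the race being pointed out."""
    with bp.b_lock:
        return bp.b_error

bp = Buf()
submit_async_model(bp, -5)     # model an -EIO completion
assert delwri_wait_model(bp) == -5
```

Note that `threading.Lock` (unlike `RLock`) may be released by a thread other than the one that acquired it, which is what makes the lock-handoff model work here.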
> xfs_buf_relse(bp);
> if (!error)
> error = error2;
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index d8f57f6..0acfc30 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -290,8 +290,8 @@ extern int xfs_bwrite(struct xfs_buf *bp);
> extern void xfs_buf_ioend(struct xfs_buf *bp);
> extern void xfs_buf_ioerror(xfs_buf_t *, int);
> extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
> -extern void xfs_buf_iorequest(xfs_buf_t *);
> -extern int xfs_buf_iowait(xfs_buf_t *);
> +extern void xfs_buf_submit(struct xfs_buf *bp);
> +extern int xfs_buf_submit_wait(struct xfs_buf *bp);
> extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
> xfs_buf_rw_t);
> #define xfs_buf_zero(bp, off, len) \
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 4fd41b5..cbea909 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1081,7 +1081,7 @@ xfs_buf_iodone_callbacks(
> * a way to shut the filesystem down if the writes keep failing.
> *
> * In practice we'll shut the filesystem down soon as non-transient
> - * erorrs tend to affect the whole device and a failing log write
> + * errors tend to affect the whole device and a failing log write
> * will make us give up. But we really ought to do better here.
> */
> if (XFS_BUF_ISASYNC(bp)) {
> @@ -1094,7 +1094,7 @@ xfs_buf_iodone_callbacks(
> if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> XBF_DONE | XBF_WRITE_FAIL;
> - xfs_buf_iorequest(bp);
> + xfs_buf_submit(bp);
> } else {
> xfs_buf_relse(bp);
> }
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index e4665db..c4d7f79 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1699,7 +1699,7 @@ xlog_bdstrat(
> return 0;
> }
>
> - xfs_buf_iorequest(bp);
> + xfs_buf_submit(bp);
> return 0;
> }
>
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 4ba19bf..1e14452 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -193,12 +193,8 @@ xlog_bread_noalign(
> bp->b_io_length = nbblks;
> bp->b_error = 0;
>
> - if (XFS_FORCED_SHUTDOWN(log->l_mp))
> - return -EIO;
> -
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> - if (error)
> + error = xfs_buf_submit_wait(bp);
> + if (error && !XFS_FORCED_SHUTDOWN(log->l_mp))
> xfs_buf_ioerror_alert(bp, __func__);
> return error;
> }
> @@ -4427,16 +4423,12 @@ xlog_do_recover(
> XFS_BUF_UNASYNC(bp);
> bp->b_ops = &xfs_sb_buf_ops;
>
> - if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
> - xfs_buf_relse(bp);
> - return -EIO;
> - }
> -
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> - xfs_buf_ioerror_alert(bp, __func__);
> - ASSERT(0);
> + if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
Should this be !XFS_FORCED_SHUTDOWN()?
Brian
> + xfs_buf_ioerror_alert(bp, __func__);
> + ASSERT(0);
> + }
> xfs_buf_relse(bp);
> return error;
> }
> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 9c3e610..4bdf02c 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -286,18 +286,13 @@ xfs_trans_read_buf_map(
> bp->b_flags |= XBF_READ;
> bp->b_ops = ops;
>
> - if (XFS_FORCED_SHUTDOWN(mp)) {
> - trace_xfs_bdstrat_shut(bp, _RET_IP_);
> - error = -EIO;
> - goto out_stale;
> - }
> -
> - xfs_buf_iorequest(bp);
> - error = xfs_buf_iowait(bp);
> + error = xfs_buf_submit_wait(bp);
> if (error) {
> - xfs_buf_ioerror_alert(bp, __func__);
> - goto out_do_shutdown;
> -
> + if (!XFS_FORCED_SHUTDOWN(mp)) {
> + xfs_buf_ioerror_alert(bp, __func__);
> + goto out_do_shutdown;
> + }
> + goto out_stale;
> }
> }
>
> --
> 2.0.0
>
* Re: [PATCH 1/9] xfs: synchronous buffer IO needs a reference
2014-08-15 13:18 ` Brian Foster
@ 2014-08-15 23:17 ` Dave Chinner
2014-08-18 14:15 ` Brian Foster
0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:17 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Aug 15, 2014 at 09:18:04AM -0400, Brian Foster wrote:
> On Fri, Aug 15, 2014 at 04:38:59PM +1000, Dave Chinner wrote:
....
> > if (bp->b_flags & XBF_WRITE)
> > xfs_buf_wait_unpin(bp);
> > +
> > + /*
> > + * Take references to the buffer. For XBF_ASYNC buffers, holding a
> > + * reference for as long as submission takes is all that is necessary
> > + * here. The IO inherits the lock and hold count from the submitter,
> > + * and these are release during IO completion processing. Taking a hold
> > + * over submission ensures that the buffer is not freed until we have
> > + * completed all processing, regardless of when IO errors occur or are
> > + * reported.
> > + *
> > + * However, for synchronous IO, the IO does not inherit the submitters
> > + * reference count, nor the buffer lock. Hence we need to take an extra
> > + * reference to the buffer for the for the IO context so that we can
> > + * guarantee the buffer is not freed until all IO completion processing
> > + * is done. Otherwise the caller can drop their reference while the IO
> > + * is still in progress and hence trigger a use-after-free situation.
> > + */
> > xfs_buf_hold(bp);
> > + if (!(bp->b_flags & XBF_ASYNC))
> > + xfs_buf_hold(bp);
> > +
> >
> > /*
> > - * Set the count to 1 initially, this will stop an I/O
> > - * completion callout which happens before we have started
> > - * all the I/O from calling xfs_buf_ioend too early.
> > + * Set the count to 1 initially, this will stop an I/O completion
> > + * callout which happens before we have started all the I/O from calling
> > + * xfs_buf_ioend too early.
> > */
> > atomic_set(&bp->b_io_remaining, 1);
> > _xfs_buf_ioapply(bp);
> > +
> > /*
> > - * If _xfs_buf_ioapply failed, we'll get back here with
> > - * only the reference we took above. _xfs_buf_ioend will
> > - * drop it to zero, so we'd better not queue it for later,
> > - * or we'll free it before it's done.
> > + * If _xfs_buf_ioapply failed or we are doing synchronous IO that
> > + * completes extremely quickly, we can get back here with only the IO
> > + * reference we took above. _xfs_buf_ioend will drop it to zero, so
> > + * we'd better run completion processing synchronously so that the we
> > + * don't return to the caller with completion still pending. In the
> > + * error case, this allows the caller to check b_error safely without
> > + * waiting, and in the synchronous IO case it avoids unnecessary context
> > + * switches an latency for high-peformance devices.
> > */
>
> AFAICT there is no real wait if the buf has completed at this point. The
> wait just decrements the completion counter.
If the IO has completed, then we run the completion code.
> So what's the benefit of
> "not waiting?" Where is the potential context switch?
Async work for completion processing on synchronous IO means we queue
the work, then sleep in xfs_buf_iowait(). Two context switches, plus
a work queue execution.
> Are you referring
> to the case where error is set but I/O is not complete? Are you saying
> the advantage to the caller is it doesn't have to care about the state
> of further I/O once it has been determined at least one error has
> occurred? (If so, who cares about latency given that some operation that
> depends on this I/O is already doomed to fail?).
No, you're reading *way* too much into this. For sync IO, it's
always best to process completion inline. For async, it doesn't
matter, but if there's a submission error it is *more efficient* to
process it in the current context.
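The cost being described can be demonstrated with a minimal userspace sketch (illustrative only; thread identity stands in for "context", and the worker thread stands in for the workqueue):

```python
import threading

# Sketch of the point above: for sync I/O, running completion inline
# in the submitter's context avoids the workqueue handoff; deferring
# completion costs at least one switch to a worker thread plus the
# sleep/wake in the waiter. Not kernel code.

def complete_inline():
    """Sync path: completion runs in the submitting context."""
    return threading.get_ident()

def complete_deferred():
    """Async path stand-in: completion is queued to a worker thread
    and the submitter then sleeps waiting for it."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(threading.get_ident()))
    worker.start()
    worker.join()              # the xfs_buf_iowait() analogue
    return result[0]

submitter = threading.get_ident()
assert complete_inline() == submitter      # no handoff
assert complete_deferred() != submitter    # at least one handoff
```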
> The code looks fine, but I'm trying to understand the reasoning better
> (and I suspect we can clarify the comment).
>
> > - _xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
> > + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> > + _xfs_buf_ioend(bp, 0);
> > + else
> > + _xfs_buf_ioend(bp, 1);
>
> Not related to this patch, but it seems like the problem this code tries
> to address is still possible.
The race condition is still possible - it just won't result in a
use-after-free. The race condition is not fixed until patch 8,
but as a backportable fix, this patch is much, much simpler.
> Perhaps this papers over a particular
> instance. Consider the case where an I/O fails immediately after this
> call completes, but not before. We have an extra reference now for
> completion, but we can still return to the caller with completion
> pending. I suppose it's fine if we consider the "problem" to be that the
> reference goes away underneath the completion, as opposed to the caller
> caring about the status of completion.
Precisely.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality
2014-08-15 13:18 ` Brian Foster
@ 2014-08-15 23:21 ` Dave Chinner
2014-08-18 14:15 ` Brian Foster
0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:21 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Aug 15, 2014 at 09:18:21AM -0400, Brian Foster wrote:
> On Fri, Aug 15, 2014 at 04:39:00PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > We do some work in xfs_buf_ioend, and some work in
> > xfs_buf_iodone_work, but much of that functionality is the same.
> > This work can all be done in a single function, leaving
> > xfs_buf_iodone just a wrapper to determine if we should execute it
> > by workqueue or directly. hence rename xfs_buf_iodone_work to
> > xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
> > need async processing.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_buf.c | 79 +++++++++++++++++++++---------------------------
> > fs/xfs/xfs_buf.h | 2 +-
> > fs/xfs/xfs_buf_item.c | 4 +--
> > fs/xfs/xfs_inode.c | 2 +-
> > fs/xfs/xfs_log.c | 2 +-
> > fs/xfs/xfs_log_recover.c | 2 +-
> > 6 files changed, 40 insertions(+), 51 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 5d86bbd..1b7f0bc 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -999,54 +999,49 @@ xfs_buf_wait_unpin(
> > */
> >
> > STATIC void
> > -xfs_buf_iodone_work(
> > - struct work_struct *work)
> > +xfs_buf_ioend(
> > + struct xfs_buf *bp)
>
> Compile failure here due to STATIC.
>
> > {
> > - struct xfs_buf *bp =
> > - container_of(work, xfs_buf_t, b_iodone_work);
> > - bool read = !!(bp->b_flags & XBF_READ);
> > + bool read = !!(bp->b_flags & XBF_READ);
> > +
> > + trace_xfs_buf_iodone(bp, _RET_IP_);
> >
> > bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
> >
> > - /* only validate buffers that were read without errors */
> > - if (read && bp->b_ops && !bp->b_error && (bp->b_flags & XBF_DONE))
> > - bp->b_ops->verify_read(bp);
> > + if (!bp->b_error) {
> > + bp->b_flags |= XBF_DONE;
> > +
> > + /* only validate buffers that were read without errors */
> > + if (read && bp->b_ops)
> > + bp->b_ops->verify_read(bp);
> > + }
>
> Probably not a cause of errors, but this code is now executed twice for
> I/O with b_iodone callbacks.
reads don't have b_iodone callbacks.
> Once for the initial call from bio_end_io,
> again from the callback via the b_iodone handler. The flags bits are
> probably fine, but we don't want to be running the verifiers multiple
> times unnecessarily.
Which we don't ;)
> > @@ -1425,10 +1412,12 @@ xfs_buf_iorequest(
> > * waiting, and in the synchronous IO case it avoids unnecessary context
> > * switches an latency for high-peformance devices.
> > */
> > - if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> > - _xfs_buf_ioend(bp, 0);
> > - else
> > - _xfs_buf_ioend(bp, 1);
> > + if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> > + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> > + xfs_buf_ioend(bp);
> > + else
> > + xfs_buf_ioend_async(bp);
> > + }
>
> This looks cleaner, but the comment is out of whack at this point.
The code is functionally identical, so the comment didn't get
changed. As it is, the behaviour that exists in this patch goes away
in later patches, so it's mostly irrelevant that a comment is
absolutely correct at an intermediate point within the patch set.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling
2014-08-15 13:18 ` Brian Foster
@ 2014-08-15 23:25 ` Dave Chinner
0 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:25 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Aug 15, 2014 at 09:18:37AM -0400, Brian Foster wrote:
> On Fri, Aug 15, 2014 at 04:39:01PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Currently the report of a bio error from completion
> > immediately marks the buffer with an error. The issue is that this
> > is racy w.r.t. synchronous IO - the submitter can see b_error being
> > set before the IO is complete, and hence we cannot differentiate
> > between submission failures and completion failures.
> >
> > Add an internal b_io_error field protected by the b_lock to catch IO
> > completion errors, and only propagate that to the buffer during
> > final IO completion handling. Hence we can tell in xfs_buf_iorequest
> > if we've had a submission failure by checking bp->b_error before
> > dropping our b_io_remaining reference - that reference will prevent
> > b_io_error values from being propagated to b_error in the event that
> > completion races with submission.
> >
> > In doing so, allow xfs_buf_iorequest to return an error. That way,
> > the caller can check for submission errors safely if required, and
> > easily distinguish them from completion errors that come from
> > xfs_buf_iowait().
> >
>
> I don't see any changes to xfs_buf_iorequest() at all in this one. That
> aside, it looks fine. It's not how I was thinking about it, but a clear
> separation of submission error regardless.
Apart from adding a zeroing of b_io_error, no, it doesn't change.
What I meant to describe was the new capability this adds. I'll
rewrite the paragraph to say:
In doing so, xfs_buf_iorequest now has the capability to
distinguish between submission and completion errors, and hence if
we need to we can return submission errors directly and allow
callers to gather completion errors from xfs_buf_iowait().
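The capability the rewritten paragraph describes can be sketched as a toy userspace model (Python, illustrative only; the control flow is an invented stand-in for the real xfs_buf_iorequest()/xfs_buf_iowait()):

```python
# Sketch of the submission/completion error split: submission failures
# surface in b_error immediately, before any wait, while completion
# errors arrive later via the internal b_io_error and are gathered at
# the wait. Not kernel code.

class Buf:
    def __init__(self):
        self.b_error = 0
        self.b_io_error = 0

def iorequest_model(bp, submit_fails=False):
    """Submission: a failure here is visible to the caller directly."""
    if submit_fails:
        bp.b_error = -5        # e.g. -EIO at submission time
    return bp.b_error

def iowait_model(bp):
    """Wait: propagate any error a bio completion reported."""
    if bp.b_io_error and not bp.b_error:
        bp.b_error = bp.b_io_error
    return bp.b_error

# Submission error: caller can see it without waiting.
bp = Buf()
assert iorequest_model(bp, submit_fails=True) == -5

# Completion error: clean submission, error gathered at the wait.
bp = Buf()
assert iorequest_model(bp) == 0
bp.b_io_error = -5             # a bio reports failure later
assert iowait_model(bp) == -5
```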
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 5/9] xfs: xfs_bioerror can die.
2014-08-15 14:35 ` Brian Foster
@ 2014-08-15 23:27 ` Dave Chinner
0 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:27 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Aug 15, 2014 at 10:35:41AM -0400, Brian Foster wrote:
> On Fri, Aug 15, 2014 at 04:39:03PM +1000, Dave Chinner wrote:
> > @@ -1149,19 +1119,19 @@ xfs_bwrite(
> > ASSERT(xfs_buf_islocked(bp));
> >
> > bp->b_flags |= XBF_WRITE;
> > - bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q | XBF_WRITE_FAIL);
> > + bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q |
> > + XBF_WRITE_FAIL | XBF_DONE);
> >
> > if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> > trace_xfs_bdstrat_shut(bp, _RET_IP_);
> >
> > - /*
> > - * Metadata write that didn't get logged but written anyway.
> > - * These aren't associated with a transaction, and can be
> > - * ignored.
> > - */
> > - if (!bp->b_iodone)
> > - return xfs_bioerror_relse(bp);
> > - return xfs_bioerror(bp);
> > + xfs_buf_ioerror(bp, -EIO);
> > + xfs_buf_stale(bp);
> > +
> > + /* sync IO, xfs_buf_ioend is going to remove a ref here */
> > + xfs_buf_hold(bp);
>
> Looks correct, but that's kind of ugly. The reference serves no purpose
> except to appease the error sequence. It gives the impression that the
> reference management should be handled at a higher level (and with truly
> synchronous I/O primitives/mechanisms, it is ;).
Oh, yeah, it's nasty ugly, but that goes away with patch 8. This is
just a temporary state that then gets factored neatly when the new
interfaces are introduced.
> At the very least, can we reorganize the ioend side of things to handle
> this? We already have the duplicate execution issue previously mentioned
> that has to get resolved. Some duplicate code is fine if it improves
> clarity, imo.
patch 8 does that - this is just preparing the code for being easy
to factor.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 13:10 ` Christoph Hellwig
@ 2014-08-15 23:37 ` Dave Chinner
2014-08-16 4:55 ` Christoph Hellwig
0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:37 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Fri, Aug 15, 2014 at 06:10:20AM -0700, Christoph Hellwig wrote:
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 2f1e30d..c53cc03 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -1157,12 +1157,7 @@ xfs_zero_remaining_bytes(
>
> xfs_zero_remaining_bytes really should be switched to
> xfs_buf_read_uncached + xfs_bwrite instead of all this mess just to
> micro-optimize a few memory allocation away that don't even happen for
> the common case of blocksize == PAGE_SIZE. I'm not even sure we
> should be using the buffer cache at all for something inside a regular
> file, but that's a discussion for another time.
xfs_zero_remaining_bytes uses an uncached buffer, so we're not
using the buffer cache at all for the blocks being zeroed. That is
why it does the flag twiddling dance it does. However, consolidating
all the different block zeroing functions we have is an exercise for
a different day....
>
> > void
> > +xfs_buf_submit(
> > + struct xfs_buf *bp)
> > {
> > trace_xfs_buf_iorequest(bp, _RET_IP_);
>
> I suspect these two should have proper names and separate trace
> points now?
Yes. Just haven't got to it.
> It also seems like a lot of the guts of the two functions are
> still the same, so factoring that into a __xfs_buf_submit helper
> would be useful.
Possibly, if we can ensure that the helper is never called directly
by any other code. Otherwise we end up back in the mess we are
currently in.
> > + * If _xfs_buf_ioapply failed,
> > + * we can get back here with only the IO
> > * reference we took above. _xfs_buf_ioend will drop it to zero, so
> > * we'd better run completion processing synchronously so that the we
> > + * don't return to the caller with completion still pending.
> > + * this allows the caller to check b_error safely without
> > + * waiting
> > */
> > if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> > if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
>
> I don't think the !ASYNC case can happen here, can it?
Right, I forgot to clean that part up when I was splitting up the
functions.
>
> > + if (!wait)
> > list_del_init(&bp->b_list);
> > + else
> > + xfs_buf_hold(bp);
>
> Maybe switch this around to avoid the negated condition in the if else?
>
> Also might this be worth a change of it's own?
Yeah, that's probably a good idea - the algorithm change is not
directly related to the interface change.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 14:35 ` Brian Foster
@ 2014-08-15 23:39 ` Dave Chinner
2014-08-18 14:16 ` Brian Foster
0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:39 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Aug 15, 2014 at 10:35:58AM -0400, Brian Foster wrote:
> On Fri, Aug 15, 2014 at 04:39:06PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > There is a lot of cookie-cutter code that looks like:
> >
> > if (shutdown)
> > handle buffer error
> > xfs_buf_iorequest(bp)
> > error = xfs_buf_iowait(bp)
> > if (error)
> > handle buffer error
> >
> > spread through XFS. There's significant complexity now in
> > xfs_buf_iorequest() to specifically handle this sort of synchronous
> > IO pattern, but there's all sorts of nasty surprises in different
> > error handling code dependent on who owns the buffer references and
> > the locks.
> >
> > Pull this pattern into a single helper, where we can hide all the
> > synchronous IO warts and hence make the error handling for all the
> > callers much saner. This removes the need for a special extra
> > reference to protect IO completion processing, as we can now hold a
> > single reference across dispatch and waiting, simplifying the sync
> > IO semantics and error handling.
> >
> > In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
> > make it explicitly handle only asynchronous IO. This forces all users
> > to be switched specifically to one interface or the other and
> > removes any ambiguity between how the interfaces are to be used. It
> > also means that xfs_buf_iowait() goes away.
> >
> > For the special case of delwri buffer submission and waiting, we
> > don't need to issue IO synchronously at all. The second pass to call
> > xfs_buf_iowait() can now just block on xfs_buf_lock() - the buffer
> > will be unlocked when the async IO is complete. This formalises a
> > sane method of waiting for async IO - take an extra reference,
> > submit the IO, call xfs_buf_lock() when you want to wait for IO
> > completion. i.e.:
> >
> > bp = xfs_buf_find();
> > xfs_buf_hold(bp);
> > bp->b_flags |= XBF_ASYNC;
> > xfs_buf_iosubmit(bp);
> > xfs_buf_lock(bp)
> > error = bp->b_error;
> > ....
> > xfs_buf_relse(bp);
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
>
> On a quick look at submit_wait this looks pretty good. It actually
> implements the general model I've been looking for for sync I/O. E.g.,
> send the I/O, wait on synchronization, then check for errors. In other
> words, a pure synchronous mechanism. The refactoring and new helpers and
> whatnot are additional bonus and abstract it nicely.
>
> I still have to take a closer look to review the actual code, but since
> we go and remove the additional sync I/O reference counting business,
> why do we even add that stuff early on? Can't we get from where the
> current code is to here in a more direct manner?
Simply because we need a fix that we can backport, and that fix is a
simple addition that does not significantly affect the rest of the
patchset...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 16:13 ` Brian Foster
@ 2014-08-15 23:58 ` Dave Chinner
2014-08-18 14:26 ` Brian Foster
0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:58 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Aug 15, 2014 at 12:13:20PM -0400, Brian Foster wrote:
> On Fri, Aug 15, 2014 at 04:39:06PM +1000, Dave Chinner wrote:
> > + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> > + xfs_buf_ioerror(bp, -EIO);
> > + bp->b_flags &= ~XBF_DONE;
> > + xfs_buf_stale(bp);
> > + xfs_buf_ioend(bp);
> > + return;
> > + }
> >
> > if (bp->b_flags & XBF_WRITE)
> > xfs_buf_wait_unpin(bp);
> > @@ -1311,25 +1307,10 @@ xfs_buf_iorequest(
> > bp->b_io_error = 0;
> >
>
> I know this is from the previous patch, but I wonder if it's cleaner to
> reset b_io_error when the error is transferred to b_error. That seems
> slightly more future proof at least.
I much prefer zeroing just before the variable needs to be zero,
simply to indicate the context in which we care that the value is
correct. Outside of actively submitted IO, the value of b_io_error
simply doesn't matter. The other advantage of leaving it untouched
is for debug purposes - the caller might clear b_error, but we still
know the state of the last IO that completed on the buffer...
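In toy form (a hypothetical struct, not the kernel code), the point about leaving b_io_error untouched between IOs looks like this:

```c
#include <assert.h>

struct buf {
    int b_error;    /* error reported to the caller */
    int b_io_error; /* error from the last completed IO */
};

/* b_io_error is zeroed only at submission time, so between IOs it
 * still records what happened to the last IO on the buffer. */
static void buf_submit(struct buf *bp, int io_err)
{
    bp->b_io_error = 0;           /* cleared just before it matters */
    if (io_err)
        bp->b_io_error = io_err;  /* set by (simulated) completion */
    bp->b_error = bp->b_io_error;
}
```

Even after a caller clears b_error, the stale-but-preserved b_io_error still tells you how the last IO on the buffer ended.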
>
> > /*
> > - * Take references to the buffer. For XBF_ASYNC buffers, holding a
> > - * reference for as long as submission takes is all that is necessary
> > - * here. The IO inherits the lock and hold count from the submitter,
> > - * and these are release during IO completion processing. Taking a hold
> > - * over submission ensures that the buffer is not freed until we have
> > - * completed all processing, regardless of when IO errors occur or are
> > - * reported.
> > - *
> > - * However, for synchronous IO, the IO does not inherit the submitters
> > - * reference count, nor the buffer lock. Hence we need to take an extra
> > - * reference to the buffer for the for the IO context so that we can
> > - * guarantee the buffer is not freed until all IO completion processing
> > - * is done. Otherwise the caller can drop their reference while the IO
> > - * is still in progress and hence trigger a use-after-free situation.
> > + * Take an extra reference so that we don't have to guess when it's no
> > + * longer safe to reference the buffer that was passed to us.
> > */
>
> Assuming my understanding is correct:
>
> /*
> * The caller's reference is released by buffer I/O completion. Technically
> * this should not occur until we release the last b_io_remaining reference.
That's not quite right. The caller's reference is released some time
*after* b_io_remaining goes to zero. That's the reason we need to
hold a reference - after we drop our b_io_remaining count, we have
to have some other method of ensuring the buffer doesn't go away
until we have finished with it.
I'll rewrite the comment.
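A minimal sketch of that lifetime rule (toy hold counting, not the real xfs_buf code; "freed" here just means the hold count reached zero):

```c
#include <assert.h>

struct buf {
    int holds;
    int io_remaining;
    int freed;
};

static void buf_hold(struct buf *bp) { bp->holds++; }

static void buf_rele(struct buf *bp)
{
    if (--bp->holds == 0)
        bp->freed = 1;  /* buffer would be reclaimed here */
}

/* Completion work runs some time after io_remaining drops to zero,
 * so the IO context must keep its own hold until it is finished. */
static void buf_io_complete(struct buf *bp)
{
    bp->io_remaining--;
    /* ... deferred completion processing still touches bp ... */
    buf_rele(bp);       /* IO context's hold is dropped last */
}
```

If the caller drops its reference while completion is still pending, the IO context's own hold is the only thing standing between the completion code and a use-after-free.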
> > +xfs_buf_submit_wait(
> > + struct xfs_buf *bp)
> > {
> > - trace_xfs_buf_iowait(bp, _RET_IP_);
> > + int error;
> > +
> > + trace_xfs_buf_iorequest(bp, _RET_IP_);
> >
> > - if (!bp->b_error)
> > - wait_for_completion(&bp->b_iowait);
> > + ASSERT(!(bp->b_flags & (_XBF_DELWRI_Q | XBF_ASYNC)));
> > +
> > + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> > + xfs_buf_ioerror(bp, -EIO);
> > + xfs_buf_stale(bp);
> > + bp->b_flags &= ~XBF_DONE;
> > + return -EIO;
> > + }
> >
> > + if (bp->b_flags & XBF_WRITE)
> > + xfs_buf_wait_unpin(bp);
> > +
> > + /* clear the internal error state to avoid spurious errors */
> > + bp->b_io_error = 0;
> > +
> > + /*
> > + * For synchronous IO, the IO does not inherit the submitters reference
> > + * count, nor the buffer lock. Hence we cannot release the reference we
> > + * are about to take until we've waited for all IO completion to occur,
> > + * including any xfs_buf_ioend_async() work that may be pending.
> > + */
> > + xfs_buf_hold(bp);
> > +
>
> Harmless, but why is this necessary? The caller should have the
> reference, the I/O completion won't release it and we wait on b_iowait
> before we return. Isn't the caller's reference sufficient?
Consistency - I'd prefer that all IO has the same reference counting
behaviour. i.e. that the IO holds its own reference to ensure,
regardless of anything else that happens, that the buffer has a
valid reference count the entire time the IO subsystem is processing
the buffer.
> > @@ -1838,7 +1853,10 @@ xfs_buf_delwri_submit(
> > bp = list_first_entry(&io_list, struct xfs_buf, b_list);
> >
> > list_del_init(&bp->b_list);
> > - error2 = xfs_buf_iowait(bp);
> > +
> > + /* locking the buffer will wait for async IO completion. */
> > + xfs_buf_lock(bp);
> > + error2 = bp->b_error;
>
> Interesting delwri cleanup overall. The lock here works for
> synchronization (blocking), but it doesn't look safe for error
> processing. Once the buffer lock is dropped, who says b_error is from
> our write (and not a subsequent, for example) and thus this caller's
> responsibility to handle the error? I suspect this is a reason the lock
> is typically held across a sync I/O, so the error is valid after the
> I/O.
It's fine because there is only a limited set of blocking callers,
all of which are special cases. The only callers that use this
blocking xfs_buf_delwri_submit() interface are:
1. log recovery: running single threaded, so isn't going
to be racing with anything else that can modify the error
2. quotacheck: also running single threaded
3. quota shrinker, which has nowhere to report an error to,
so the error status simply doesn't matter.
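The wait-by-lock scheme itself can be sketched as follows (a sequential toy model; in the real code the buffer lock is a semaphore and taking it actually blocks):

```c
#include <assert.h>

/* The buffer is submitted locked and async completion unlocks it, so
 * anyone who subsequently takes the lock is guaranteed the IO has
 * finished and b_error is stable for that IO. */
struct buf {
    int locked;
    int io_done;
    int b_error;
};

static void buf_submit_async(struct buf *bp)
{
    assert(bp->locked);   /* IO inherits the submitter's lock */
    bp->io_done = 0;
}

static void buf_io_complete(struct buf *bp, int err)
{
    bp->b_error = err;
    bp->io_done = 1;
    bp->locked = 0;       /* unlock doubles as the completion signal */
}

static int buf_lock_and_check(struct buf *bp)
{
    assert(!bp->locked && bp->io_done);  /* real code would block here */
    bp->locked = 1;
    return bp->b_error;
}
```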
> Also, it looks like blocking on async I/O as such opens up the
> possibility of blocking indefinitely on failing writes if the right
> combination of delwri and b_iodone handler is used (see
> xfs_buf_iodone_callbacks()). I'm not sure if that's a real problem
> today, though.
I don't think it is - we don't permanently retry failed metadata
writes from IO completion anymore. We retry once, but then require
higher layers to restart the IO again (e.g. the xfsaild). Hence we
can't get stuck forever on async write errors.
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 4ba19bf..1e14452 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -193,12 +193,8 @@ xlog_bread_noalign(
> > bp->b_io_length = nbblks;
> > bp->b_error = 0;
> >
> > - if (XFS_FORCED_SHUTDOWN(log->l_mp))
> > - return -EIO;
> > -
> > - xfs_buf_iorequest(bp);
> > - error = xfs_buf_iowait(bp);
> > - if (error)
> > + error = xfs_buf_submit_wait(bp);
> > + if (error && !XFS_FORCED_SHUTDOWN(log->l_mp))
> > xfs_buf_ioerror_alert(bp, __func__);
> > return error;
> > }
> > @@ -4427,16 +4423,12 @@ xlog_do_recover(
> > XFS_BUF_UNASYNC(bp);
> > bp->b_ops = &xfs_sb_buf_ops;
> >
> > - if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
> > - xfs_buf_relse(bp);
> > - return -EIO;
> > - }
> > -
> > - xfs_buf_iorequest(bp);
> > - error = xfs_buf_iowait(bp);
> > + error = xfs_buf_submit_wait(bp);
> > if (error) {
> > - xfs_buf_ioerror_alert(bp, __func__);
> > - ASSERT(0);
> > + if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
>
> Should this be !XFS_FORCED_SHUTDOWN()?
Right, good catch, especially after reading all that code ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly
2014-08-15 12:56 ` Christoph Hellwig
@ 2014-08-15 23:58 ` Dave Chinner
2014-08-29 0:37 ` Christoph Hellwig
0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-15 23:58 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Fri, Aug 15, 2014 at 05:56:08AM -0700, Christoph Hellwig wrote:
> On Fri, Aug 15, 2014 at 04:39:07PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_buf_read_uncached() has two failure modes. If can either return
> > NULL or bp->b_error != 0 depending on the type of failure, and not
> > all callers check for both. Fix it up.
>
> I'd rather get rid of these annoying calling conventions. Make it
> return an errno, and the bp in a pointer argument, with the bp
> always NULL in case of error.
Ok. I considered that, then just did the simple thing.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 23:37 ` Dave Chinner
@ 2014-08-16 4:55 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-16 4:55 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Sat, Aug 16, 2014 at 09:37:43AM +1000, Dave Chinner wrote:
> xfs_zero_remaining_bytes is uses an uncached buffer, so we're not
> using the buffer cache at all for the blocks being zeroed. That is
> why it does the flag twiddling dance it does. However, consolidation
> all the different block zeroing functions we have is an exercise for
> a different day....
Well, we're using buffers. Anyway, below is what I think it should
look like when using buffers. Although I wonder how either the old
or new variant passes the verifier check in _xfs_buf_ioapply for v5
filesystems given that we don't pass any ops in.
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2f1e30d..c495dce 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1122,14 +1122,6 @@ xfs_zero_remaining_bytes(
if (endoff > XFS_ISIZE(ip))
endoff = XFS_ISIZE(ip);
- bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
- mp->m_rtdev_targp : mp->m_ddev_targp,
- BTOBB(mp->m_sb.sb_blocksize), 0);
- if (!bp)
- return -ENOMEM;
-
- xfs_buf_unlock(bp);
-
for (offset = startoff; offset <= endoff; offset = lastoffset + 1) {
uint lock_mode;
@@ -1152,42 +1144,26 @@ xfs_zero_remaining_bytes(
ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
if (imap.br_state == XFS_EXT_UNWRITTEN)
continue;
- XFS_BUF_UNDONE(bp);
- XFS_BUF_UNWRITE(bp);
- XFS_BUF_READ(bp);
- XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock));
- if (XFS_FORCED_SHUTDOWN(mp)) {
- error = -EIO;
- break;
- }
- xfs_buf_iorequest(bp);
- error = xfs_buf_iowait(bp);
- if (error) {
- xfs_buf_ioerror_alert(bp,
- "xfs_zero_remaining_bytes(read)");
- break;
+ bp = xfs_buf_read_uncached(XFS_IS_REALTIME_INODE(ip) ?
+ mp->m_rtdev_targp : mp->m_ddev_targp,
+ xfs_fsb_to_db(ip, imap.br_startblock),
+ BTOBB(mp->m_sb.sb_blocksize),
+ 0, NULL);
+ if (!bp)
+ return -ENOMEM;
+ if (bp->b_error) {
+ error = bp->b_error;
+ xfs_buf_relse(bp);
+ return error;
}
+
memset(bp->b_addr +
(offset - XFS_FSB_TO_B(mp, imap.br_startoff)),
0, lastoffset - offset + 1);
- XFS_BUF_UNDONE(bp);
- XFS_BUF_UNREAD(bp);
- XFS_BUF_WRITE(bp);
- if (XFS_FORCED_SHUTDOWN(mp)) {
- error = -EIO;
- break;
- }
- xfs_buf_iorequest(bp);
- error = xfs_buf_iowait(bp);
- if (error) {
- xfs_buf_ioerror_alert(bp,
- "xfs_zero_remaining_bytes(write)");
- break;
- }
+ xfs_bwrite(bp);
}
- xfs_buf_free(bp);
return error;
}
* Re: [PATCH 1/9] xfs: synchronous buffer IO needs a reference
2014-08-15 23:17 ` Dave Chinner
@ 2014-08-18 14:15 ` Brian Foster
0 siblings, 0 replies; 43+ messages in thread
From: Brian Foster @ 2014-08-18 14:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Sat, Aug 16, 2014 at 09:17:36AM +1000, Dave Chinner wrote:
> On Fri, Aug 15, 2014 at 09:18:04AM -0400, Brian Foster wrote:
> > On Fri, Aug 15, 2014 at 04:38:59PM +1000, Dave Chinner wrote:
> ....
> > > if (bp->b_flags & XBF_WRITE)
> > > xfs_buf_wait_unpin(bp);
> > > +
> > > + /*
> > > + * Take references to the buffer. For XBF_ASYNC buffers, holding a
> > > + * reference for as long as submission takes is all that is necessary
> > > + * here. The IO inherits the lock and hold count from the submitter,
> > > + * and these are release during IO completion processing. Taking a hold
> > > + * over submission ensures that the buffer is not freed until we have
> > > + * completed all processing, regardless of when IO errors occur or are
> > > + * reported.
> > > + *
> > > + * However, for synchronous IO, the IO does not inherit the submitters
> > > + * reference count, nor the buffer lock. Hence we need to take an extra
> > > + * reference to the buffer for the for the IO context so that we can
> > > + * guarantee the buffer is not freed until all IO completion processing
> > > + * is done. Otherwise the caller can drop their reference while the IO
> > > + * is still in progress and hence trigger a use-after-free situation.
> > > + */
> > > xfs_buf_hold(bp);
> > > + if (!(bp->b_flags & XBF_ASYNC))
> > > + xfs_buf_hold(bp);
> > > +
> > >
> > > /*
> > > - * Set the count to 1 initially, this will stop an I/O
> > > - * completion callout which happens before we have started
> > > - * all the I/O from calling xfs_buf_ioend too early.
> > > + * Set the count to 1 initially, this will stop an I/O completion
> > > + * callout which happens before we have started all the I/O from calling
> > > + * xfs_buf_ioend too early.
> > > */
> > > atomic_set(&bp->b_io_remaining, 1);
> > > _xfs_buf_ioapply(bp);
> > > +
> > > /*
> > > - * If _xfs_buf_ioapply failed, we'll get back here with
> > > - * only the reference we took above. _xfs_buf_ioend will
> > > - * drop it to zero, so we'd better not queue it for later,
> > > - * or we'll free it before it's done.
> > > + * If _xfs_buf_ioapply failed or we are doing synchronous IO that
> > > + * completes extremely quickly, we can get back here with only the IO
> > > + * reference we took above. _xfs_buf_ioend will drop it to zero, so
> > > + * we'd better run completion processing synchronously so that the we
> > > + * don't return to the caller with completion still pending. In the
> > > + * error case, this allows the caller to check b_error safely without
> > > + * waiting, and in the synchronous IO case it avoids unnecessary context
> > > + * switches and latency for high-performance devices.
> > > */
> >
> > AFAICT there is no real wait if the buf has completed at this point. The
> > wait just decrements the completion counter.
>
> If the IO has completed, then we run the completion code.
>
> > So what's the benefit of
> > "not waiting?" Where is the potential context switch?
>
> async work for completion processing on synchronous IO means we queue
> the work, then sleep in xfs_buf_iowait(). Two context switches, plus
> a work queue execution.
>
Right...
> > Are you referring
> > to the case where error is set but I/O is not complete? Are you saying
> > the advantage to the caller is it doesn't have to care about the state
> > of further I/O once it has been determined at least one error has
> > occurred? (If so, who cares about latency given that some operation that
> > depends on this I/O is already doomed to fail?).
>
> No, you're reading *way* too much into this. For sync IO, it's
> always best to process completion inline. For async, it doesn't
> matter, but if there's a submission error it is *more efficient* to
> process it in the current context.
>
Heh. Sure, that makes sense. Perhaps it's just the way I read it,
implying that how we process I/O completion affects what the calling
code should look like. Simple case of the comment being a bit more
confusing than the code. ;) FWIW, the following is more clear to me:
/*
* If _xfs_buf_ioapply failed or we are doing synchronous IO that
* completes extremely quickly, we can get back here with only the IO
* reference we took above. _xfs_buf_ioend will drop it to zero. Run
* completion processing synchronously so that we don't return to the
* caller with completion still pending. This avoids unnecessary context
* switches associated with the end_io workqueue.
*/
Thanks for the explanation.
Brian
> > The code looks fine, but I'm trying to understand the reasoning better
> > (and I suspect we can clarify the comment).
> >
> > > - _xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
> > > + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> > > + _xfs_buf_ioend(bp, 0);
> > > + else
> > > + _xfs_buf_ioend(bp, 1);
> >
> > Not related to this patch, but it seems like the problem this code tries
> > to address is still possible.
>
> The race condition is still possible - it just won't result in a
> use-after-free. The race condition is not fixed until patch 8,
> but as a backportable fix, this patch is much, much simpler.
>
> > Perhaps this papers over a particular
> > instance. Consider the case where an I/O fails immediately after this
> > call completes, but not before. We have an extra reference now for
> > completion, but we can still return to the caller with completion
> > pending. I suppose its fine if we consider the "problem" to be that the
> > reference goes away underneath the completion, as opposed to the caller
> > caring about the status of completion.
>
> Precisely.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality
2014-08-15 23:21 ` Dave Chinner
@ 2014-08-18 14:15 ` Brian Foster
0 siblings, 0 replies; 43+ messages in thread
From: Brian Foster @ 2014-08-18 14:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Sat, Aug 16, 2014 at 09:21:35AM +1000, Dave Chinner wrote:
> On Fri, Aug 15, 2014 at 09:18:21AM -0400, Brian Foster wrote:
> > On Fri, Aug 15, 2014 at 04:39:00PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > We do some work in xfs_buf_ioend, and some work in
> > > xfs_buf_iodone_work, but much of that functionality is the same.
> > > This work can all be done in a single function, leaving
> > > xfs_buf_iodone just a wrapper to determine if we should execute it
> > > by workqueue or directly. hence rename xfs_buf_iodone_work to
> > > xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
> > > need async processing.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > > fs/xfs/xfs_buf.c | 79 +++++++++++++++++++++---------------------------
> > > fs/xfs/xfs_buf.h | 2 +-
> > > fs/xfs/xfs_buf_item.c | 4 +--
> > > fs/xfs/xfs_inode.c | 2 +-
> > > fs/xfs/xfs_log.c | 2 +-
> > > fs/xfs/xfs_log_recover.c | 2 +-
> > > 6 files changed, 40 insertions(+), 51 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > index 5d86bbd..1b7f0bc 100644
> > > --- a/fs/xfs/xfs_buf.c
> > > +++ b/fs/xfs/xfs_buf.c
> > > @@ -999,54 +999,49 @@ xfs_buf_wait_unpin(
> > > */
> > >
> > > STATIC void
> > > -xfs_buf_iodone_work(
> > > - struct work_struct *work)
> > > +xfs_buf_ioend(
> > > + struct xfs_buf *bp)
> >
> > Compile failure here due to STATIC.
> >
> > > {
> > > - struct xfs_buf *bp =
> > > - container_of(work, xfs_buf_t, b_iodone_work);
> > > - bool read = !!(bp->b_flags & XBF_READ);
> > > + bool read = !!(bp->b_flags & XBF_READ);
> > > +
> > > + trace_xfs_buf_iodone(bp, _RET_IP_);
> > >
> > > bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD);
> > >
> > > - /* only validate buffers that were read without errors */
> > > - if (read && bp->b_ops && !bp->b_error && (bp->b_flags & XBF_DONE))
> > > - bp->b_ops->verify_read(bp);
> > > + if (!bp->b_error) {
> > > + bp->b_flags |= XBF_DONE;
> > > +
> > > + /* only validate buffers that were read without errors */
> > > + if (read && bp->b_ops)
> > > + bp->b_ops->verify_read(bp);
> > > + }
> >
> > Probably not a cause of errors, but this code is now executed twice for
> > I/O with b_iodone callbacks.
>
> reads don't have b_iodone callbacks.
>
Ah, Ok.
> > Once for the initial call from bio_end_io,
> > again from the callback via the b_iodone handler. The flags bits are
> > probably fine, but we don't want to be running the verifiers multiple
> > times unnecessarily.
>
> Which we don't ;)
>
Good point, but that's still a landmine IMO. It looks like the previous
code would avoid it for sync I/O, but not for async. You could probably
avoid it generally via a new flag or just by going off of XBF_DONE. The
latter seems logical to me. A comment wouldn't hurt either.
> > > @@ -1425,10 +1412,12 @@ xfs_buf_iorequest(
> > > * waiting, and in the synchronous IO case it avoids unnecessary context
> > > * switches and latency for high-performance devices.
> > > */
> > > - if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> > > - _xfs_buf_ioend(bp, 0);
> > > - else
> > > - _xfs_buf_ioend(bp, 1);
> > > + if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> > > + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> > > + xfs_buf_ioend(bp);
> > > + else
> > > + xfs_buf_ioend_async(bp);
> > > + }
> >
> > This looks cleaner, but the comment is out of whack at this point.
>
> The code is functionally identical, so the comment didn't get
> changed. As it is, the behaviour that exists in this patch goes away
> in later patches, so it's mostly irrelevant that a comment is
> absolutely correct at an intermediate point within the patch set.
>
This was just a minor point that the comment refers to _xfs_buf_ioend().
That obviously no longer exists but the comment is still around at the
end of the series.
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 23:39 ` Dave Chinner
@ 2014-08-18 14:16 ` Brian Foster
0 siblings, 0 replies; 43+ messages in thread
From: Brian Foster @ 2014-08-18 14:16 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Sat, Aug 16, 2014 at 09:39:35AM +1000, Dave Chinner wrote:
> On Fri, Aug 15, 2014 at 10:35:58AM -0400, Brian Foster wrote:
> > On Fri, Aug 15, 2014 at 04:39:06PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > There is a lot of cookie-cutter code that looks like:
> > >
> > > if (shutdown)
> > > handle buffer error
> > > xfs_buf_iorequest(bp)
> > > error = xfs_buf_iowait(bp)
> > > if (error)
> > > handle buffer error
> > >
> > > spread through XFS. There's significant complexity now in
> > > xfs_buf_iorequest() to specifically handle this sort of synchronous
> > > IO pattern, but there's all sorts of nasty surprises in different
> > > error handling code dependent on who owns the buffer references and
> > > the locks.
> > >
> > > Pull this pattern into a single helper, where we can hide all the
> > > synchronous IO warts and hence make the error handling for all the
> > > callers much saner. This removes the need for a special extra
> > > reference to protect IO completion processing, as we can now hold a
> > > single reference across dispatch and waiting, simplifying the sync
> > > IO semantics and error handling.
> > >
> > > In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
> > > make it explicitly handle only asynchronous IO. This forces all users
> > > to be switched specifically to one interface or the other and
> > > removes any ambiguity between how the interfaces are to be used. It
> > > also means that xfs_buf_iowait() goes away.
> > >
> > > For the special case of delwri buffer submission and waiting, we
> > > don't need to issue IO synchronously at all. The second pass to call
> > > xfs_buf_iowait() can now just block on xfs_buf_lock() - the buffer
> > > will be unlocked when the async IO is complete. This formalises a
> > > sane method of waiting for async IO - take an extra reference,
> > > submit the IO, call xfs_buf_lock() when you want to wait for IO
> > > completion. i.e.:
> > >
> > > bp = xfs_buf_find();
> > > xfs_buf_hold(bp);
> > > bp->b_flags |= XBF_ASYNC;
> > > xfs_buf_iosubmit(bp);
> > > xfs_buf_lock(bp)
> > > error = bp->b_error;
> > > ....
> > > xfs_buf_relse(bp);
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> >
> > On a quick look at submit_wait this looks pretty good. It actually
> > implements the general model I've been looking for for sync I/O. E.g.,
> > send the I/O, wait on synchronization, then check for errors. In other
> > words, a pure synchronous mechanism. The refactoring and new helpers and
> > whatnot are additional bonus and abstract it nicely.
> >
> > I still have to take a closer look to review the actual code, but since
> > we go and remove the additional sync I/O reference counting business,
> > why do we even add that stuff early on? Can't we get from where the
> > current code is to here in a more direct manner?
>
> Simply because we need a fix that we can backport, and that fix is a
> simple addition that does not significantly affect the rest of the
> patchset...
>
Ok, it wasn't clear that multiple fixes were the intent. I guess it's
unfortunate this won't see much testing, but we can find other ways
to make sure it's tested. ;)
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait]
2014-08-15 23:58 ` Dave Chinner
@ 2014-08-18 14:26 ` Brian Foster
0 siblings, 0 replies; 43+ messages in thread
From: Brian Foster @ 2014-08-18 14:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Sat, Aug 16, 2014 at 09:58:04AM +1000, Dave Chinner wrote:
> On Fri, Aug 15, 2014 at 12:13:20PM -0400, Brian Foster wrote:
> > On Fri, Aug 15, 2014 at 04:39:06PM +1000, Dave Chinner wrote:
> > > + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> > > + xfs_buf_ioerror(bp, -EIO);
> > > + bp->b_flags &= ~XBF_DONE;
> > > + xfs_buf_stale(bp);
> > > + xfs_buf_ioend(bp);
> > > + return;
> > > + }
> > >
> > > if (bp->b_flags & XBF_WRITE)
> > > xfs_buf_wait_unpin(bp);
> > > @@ -1311,25 +1307,10 @@ xfs_buf_iorequest(
> > > bp->b_io_error = 0;
> > >
> >
> > I know this is from the previous patch, but I wonder if it's cleaner to
> > reset b_io_error when the error is transferred to b_error. That seems
> > slightly more future proof at least.
>
> I much prefer zeroing just before the variable is needed to be zero,
> simply to indicate the context in which we care that the value is
> correct. Outside of actively submitted IO, the value of b_io_error
> is irrelevant, so its value really doesn't matter. The other
> advantage of leaving it untouched is for debug purposes - the caller
> might clear b_error, but we still know what the state of the last IO
> that was completed in the buffer was...
>
Sounds good, I don't really have a strong preference.
> >
> > > /*
> > > - * Take references to the buffer. For XBF_ASYNC buffers, holding a
> > > - * reference for as long as submission takes is all that is necessary
> > > - * here. The IO inherits the lock and hold count from the submitter,
> > > - * and these are release during IO completion processing. Taking a hold
> > > - * over submission ensures that the buffer is not freed until we have
> > > - * completed all processing, regardless of when IO errors occur or are
> > > - * reported.
> > > - *
> > > - * However, for synchronous IO, the IO does not inherit the submitters
> > > - * reference count, nor the buffer lock. Hence we need to take an extra
> > > - * reference to the buffer for the for the IO context so that we can
> > > - * guarantee the buffer is not freed until all IO completion processing
> > > - * is done. Otherwise the caller can drop their reference while the IO
> > > - * is still in progress and hence trigger a use-after-free situation.
> > > + * Take an extra reference so that we don't have to guess when it's no
> > > + * longer safe to reference the buffer that was passed to us.
> > > */
> >
> > Assuming my understanding is correct:
> >
> > /*
> > * The caller's reference is released by buffer I/O completion. Technically
> > * this should not occur until we release the last b_io_remaining reference.
>
> That's not quite right. The caller's reference is released some time
> *after* b_io_remaining goes to zero. That's the reason we need to
> hold a reference - after we drop our b_io_remaining count, we have
> to have some other method of ensuring the buffer doesn't go away
> until we have finished with the buffer.
>
Right, that's what I meant by the fact that we have to release the last
b_io_remaining ref one way or another first...
> I'll rewrite the comment.
>
Ok, the rest of the comment I wrote isn't really clear when I re-read it
back either.
> > > +xfs_buf_submit_wait(
> > > + struct xfs_buf *bp)
> > > {
> > > - trace_xfs_buf_iowait(bp, _RET_IP_);
> > > + int error;
> > > +
> > > + trace_xfs_buf_iorequest(bp, _RET_IP_);
> > >
> > > - if (!bp->b_error)
> > > - wait_for_completion(&bp->b_iowait);
> > > + ASSERT(!(bp->b_flags & (_XBF_DELWRI_Q | XBF_ASYNC)));
> > > +
> > > + if (XFS_FORCED_SHUTDOWN(bp->b_target->bt_mount)) {
> > > + xfs_buf_ioerror(bp, -EIO);
> > > + xfs_buf_stale(bp);
> > > + bp->b_flags &= ~XBF_DONE;
> > > + return -EIO;
> > > + }
> > >
> > > + if (bp->b_flags & XBF_WRITE)
> > > + xfs_buf_wait_unpin(bp);
> > > +
> > > + /* clear the internal error state to avoid spurious errors */
> > > + bp->b_io_error = 0;
> > > +
> > > + /*
> > > + * For synchronous IO, the IO does not inherit the submitters reference
> > > + * count, nor the buffer lock. Hence we cannot release the reference we
> > > + * are about to take until we've waited for all IO completion to occur,
> > > + * including any xfs_buf_ioend_async() work that may be pending.
> > > + */
> > > + xfs_buf_hold(bp);
> > > +
> >
> > Harmless, but why is this necessary? The caller should have the
> > reference, the I/O completion won't release it and we wait on b_iowait
> > before we return. Isn't the caller's reference sufficient?
>
> Consistency - I'd prefer that all IO has the same reference counting
> behaviour. i.e. that the IO holds its own reference to ensure,
> regardless of anything else that happens, that the buffer has a
> valid reference count the entire time the IO subsystem is
> processing the buffer.
>
Sounds good.
> > > @@ -1838,7 +1853,10 @@ xfs_buf_delwri_submit(
> > > bp = list_first_entry(&io_list, struct xfs_buf, b_list);
> > >
> > > list_del_init(&bp->b_list);
> > > - error2 = xfs_buf_iowait(bp);
> > > +
> > > + /* locking the buffer will wait for async IO completion. */
> > > + xfs_buf_lock(bp);
> > > + error2 = bp->b_error;
> >
> > Interesting delwri cleanup overall. The lock here works for
> > synchronization (blocking), but it doesn't look safe for error
> > processing. Once the buffer lock is dropped, who says b_error is from
> > our write (and not a subsequent, for example) and thus this caller's
> > responsibility to handle the error? I suspect this is a reason the lock
> > is typically held across a sync I/O, so the error is valid after the
> > I/O.
>
> It's fine because there is only a limited set of blocking callers,
> all of which are special cases. The only callers that use this
> blocking xfs_buf_delwri_submit() interface are:
>
> 1. log recovery: running single threaded, so isn't going
> to be racing with anything else that can modify the error
>
> 2. quotacheck: also running single threaded
>
> 3. quota shrinker, which has nowhere to report an error to,
> so the error status simply doesn't matter.
>
Sure, to be honest I wasn't really expecting there to be a scenario
where this is currently possible. I figured the log recovery context was
obviously not a concern because that occurs on mount. I hadn't gone to
look at the other contexts where we use delwri. quotacheck makes sense
for the same reason. We also use delwri for AIL pushing, but not the
synchronous version.
My comment was more from the perspective of this code will be around for
a long time and this little characteristic of the mechanism is not at
all explicit or obvious. So it won't ever be a problem until somebody
uses sync delwri in a context where racing is possible. Then it won't
ever reproduce until some user drives said mechanism and hits an error
in just the right manner to lead to some undefined error recovery
behavior. Maybe that doesn't happen for 20 years, but whenever it does,
it's guaranteed to not be fun to debug. ;) So my point is not to suggest
there's a race somewhere. It's just that the mechanism is racy and I'd
prefer we eliminate the possibility of having to debug the flaw that
this leaves behind. :) Another way to look at it is that we can't ever
use this delwri mechanism from anywhere but such special, isolated
contexts going forward. If we ever find that we want to, that punts the
problem to a prereq for whatever work that happens to be.
It's not clear to me if there's a way to deal with that without
fundamentally changing this mechanism. Anything explicit probably just
adds ugliness that's dedicated specifically to the fact that this is
racy (e.g., delwri specific lock/counter), so I don't think we want to
go there.
It seems like the more general problem is that we have two I/O models
(sync and async) that are fundamentally incompatible, but here we try to
build a batch sync I/O mechanism on top of the async submission
mechanism. So the definition of the async model is no longer sufficient
for our use, since we clearly have a use case to wait on an async I/O.
Perhaps we should think more about making the async/sync mechanisms more
generic/compatible..? Thinking out loud, making the lock context
dynamically transferrable to the I/O might be an interesting experiment
to start to decouple things (much like we do for joining items to
transactions, for example).
FWIW, I'm Ok with deferring that or adding it to my own todo list based
on the fact that there are currently no racing contexts, so long as the
"async as a subset of sync I/O" model is generally acceptable.
> > Also, it looks like blocking on async I/O as such opens up the
> > possibility of blocking indefinitely on failing writes if the right
> > combination of delwri and b_iodone handler is used (see
> > xfs_buf_iodone_callbacks()). I'm not sure if that's a real problem
> > today, though.
>
> I don't think it is - we don't permanently rewrite metadata writes
> from IO completion anymore. We retry once, but then require higher
> layers to restart the IO again (e.g. the xfsaild). Hence we can't
> get stuck forever on async write errors.
>
Ok, we have the XBF_WRITE_FAIL thing now. Forgot about that.
Brian
> > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > index 4ba19bf..1e14452 100644
> > > --- a/fs/xfs/xfs_log_recover.c
> > > +++ b/fs/xfs/xfs_log_recover.c
> > > @@ -193,12 +193,8 @@ xlog_bread_noalign(
> > > bp->b_io_length = nbblks;
> > > bp->b_error = 0;
> > >
> > > - if (XFS_FORCED_SHUTDOWN(log->l_mp))
> > > - return -EIO;
> > > -
> > > - xfs_buf_iorequest(bp);
> > > - error = xfs_buf_iowait(bp);
> > > - if (error)
> > > + error = xfs_buf_submit_wait(bp);
> > > + if (error && !XFS_FORCED_SHUTDOWN(log->l_mp))
> > > xfs_buf_ioerror_alert(bp, __func__);
> > > return error;
> > > }
> > > @@ -4427,16 +4423,12 @@ xlog_do_recover(
> > > XFS_BUF_UNASYNC(bp);
> > > bp->b_ops = &xfs_sb_buf_ops;
> > >
> > > - if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
> > > - xfs_buf_relse(bp);
> > > - return -EIO;
> > > - }
> > > -
> > > - xfs_buf_iorequest(bp);
> > > - error = xfs_buf_iowait(bp);
> > > + error = xfs_buf_submit_wait(bp);
> > > if (error) {
> > > - xfs_buf_ioerror_alert(bp, __func__);
> > > - ASSERT(0);
> > > + if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
> >
> > Should this be !XFS_FORCED_SHUTDOWN()?
>
> Right, good catch, especially after reading all that code ;)
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: [PATCH 1/9] xfs: synchronous buffer IO needs a reference
2014-08-15 6:38 ` [PATCH 1/9] xfs: synchronous buffer IO needs a reference Dave Chinner
2014-08-15 13:18 ` Brian Foster
@ 2014-08-29 0:18 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 0:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality
2014-08-15 6:39 ` [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality Dave Chinner
2014-08-15 13:18 ` Brian Foster
@ 2014-08-29 0:22 ` Christoph Hellwig
2014-08-29 0:55 ` Dave Chinner
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 0:22 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> STATIC void
needs the static removed as pointed out by Brian.
> - bool read = !!(bp->b_flags & XBF_READ);
> + bool read = !!(bp->b_flags & XBF_READ);
We don't really need the double negation here.
Otherwise this looks reasonable to me.
* Re: [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling
2014-08-15 6:39 ` [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling Dave Chinner
2014-08-15 13:18 ` Brian Foster
@ 2014-08-29 0:23 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 0:23 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:01PM +1000, Dave Chinner wrote:
> In doing so, allow xfs_buf_iorequest to return an error. That way,
> the caller can check for submission errors safely if required, and
> easily distinguish them from completion errors that come from
> xfs_buf_iowait().
While this looks correct to me it also looks confusing. Why can't
we simply propagate the submission errors through error returns
and use ->b_error for completion errors instead?
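A minimal sketch of what that split might look like, with all names invented for illustration: the return value carries only submission failures, and ->b_error carries only what IO completion reported.

```c
#include <errno.h>

/* Illustrative only, not the real xfs_buf interface. */
struct toy_buf {
	int b_error;	/* completion error channel */
};

/* Submission failures come back in the return value and never touch
 * b_error; completion status (simulated inline in this toy) lands in
 * b_error for the waiter to collect afterwards. */
static int toy_submit(struct toy_buf *bp, int submit_fails, int io_fails)
{
	bp->b_error = 0;
	if (submit_fails)
		return -EIO;	/* submission error: return value only */

	/* the "completion" runs here in this toy */
	bp->b_error = io_fails ? -EIO : 0;
	return 0;		/* submission itself succeeded */
}
```

With the channels separated like this, a caller can always tell whether the IO never started or started and then failed.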
* Re: [PATCH 4/9] xfs: kill xfs_bdstrat_cb
2014-08-15 6:39 ` [PATCH 4/9] xfs: kill xfs_bdstrat_cb Dave Chinner
@ 2014-08-29 0:24 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 0:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:02PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Only has two callers, and is just a shutdown check and error handler
> around xfs_buf_iorequest. However, the error handling is a mess of
> read and write semantics, and both internal callers only call it for
> writes. Hence kill the wrapper, and follow up with a patch to
> sanitise the error handling.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 5/9] xfs: xfs_bioerror can die.
2014-08-15 6:39 ` [PATCH 5/9] xfs: xfs_bioerror can die Dave Chinner
2014-08-15 14:35 ` Brian Foster
@ 2014-08-29 0:28 ` Christoph Hellwig
2014-08-29 1:05 ` Dave Chinner
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 0:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 15, 2014 at 04:39:03PM +1000, Dave Chinner wrote:
> Internal buffer write error handling is a mess due to the unnatural
> split between xfs_bioerror and xfs_bioerror_relse(). The buffer
> reference counting is also wrong for the xfs_bioerror path for
> xfs_bwrite - it uses sync IO and so xfs_buf_ioend will release a
> hold count that was never taken.
> Which seems to be caused by patch 1 in the series, and thus the
additional hold for the error case should be introduced there.
Otherwise this looks good to me.
* Re: [PATCH 6/9] xfs: kill xfs_bioerror_relse
2014-08-15 6:39 ` [PATCH 6/9] xfs: kill xfs_bioerror_relse Dave Chinner
@ 2014-08-29 0:32 ` Christoph Hellwig
2014-08-29 1:12 ` Dave Chinner
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 0:32 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> index 96c898e..758c07d 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -324,11 +324,13 @@ xfs_trans_read_buf_map(
> */
> if (XFS_FORCED_SHUTDOWN(mp)) {
> trace_xfs_bdstrat_shut(bp, _RET_IP_);
> - xfs_bioerror_relse(bp);
> - } else {
> - xfs_buf_iorequest(bp);
> + bp->b_flags &= ~(XBF_READ | XBF_DONE);
> + xfs_buf_ioerror(bp, -EIO);
> + xfs_buf_stale(bp);
> + return -EIO;
> }
This is a large change of behavior as it doesn't hit the error
path after the xfs_buf_iowait anymore. While I don't think that
that path was entirely correct, this version seems to be even less so
by not releasing the buffer reference or forcing the shutdown.
* Re: [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly
2014-08-15 23:58 ` Dave Chinner
@ 2014-08-29 0:37 ` Christoph Hellwig
0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 0:37 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Sat, Aug 16, 2014 at 09:58:43AM +1000, Dave Chinner wrote:
> > I'd rather get rid of these annoying calling conventions. Make it
> > return and errno, and the bp in a pointer argument, with the bp
> > never non-NULL in case of error.
>
> Ok. I considered that, then just did the simple thing.
There's only about a dozen callers of all xfs_buf_read* variants, so
I think switching them over to return an errno shouldn't be much of a
problem.
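A sketch of the calling convention being suggested, with a made-up function name and failure knob: return 0 and set *bpp on success, or return a negative errno and leave *bpp NULL on failure, so callers can never see a half-valid buffer pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in buffer type for illustration only. */
struct toy_buf {
	char data[64];
};

/* Hypothetical shape of the proposed convention: the errno is the
 * return value, the buffer comes back via an out pointer, and *bpp is
 * guaranteed NULL on any error path. The simulate_eio knob stands in
 * for a real IO failure. */
static int toy_read_uncached(int simulate_eio, struct toy_buf **bpp)
{
	*bpp = NULL;		/* never hand back a stale pointer */
	if (simulate_eio)
		return -EIO;

	*bpp = calloc(1, sizeof(**bpp));
	if (!*bpp)
		return -ENOMEM;
	return 0;
}
```

Callers then reduce to the usual `error = toy_read_uncached(..., &bp); if (error) return error;` shape, with no NULL-vs-b_error ambiguity to check.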
* Re: [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality
2014-08-29 0:22 ` Christoph Hellwig
@ 2014-08-29 0:55 ` Dave Chinner
0 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-29 0:55 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Aug 28, 2014 at 05:22:09PM -0700, Christoph Hellwig wrote:
> > STATIC void
>
> needs the static removed as pointed out by Brian.
> > - bool read = !!(bp->b_flags & XBF_READ);
> > + bool read = !!(bp->b_flags & XBF_READ);
>
> We don't really need the double negation here.
Good point. I'll fix it.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 5/9] xfs: xfs_bioerror can die.
2014-08-29 0:28 ` Christoph Hellwig
@ 2014-08-29 1:05 ` Dave Chinner
0 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-29 1:05 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Aug 28, 2014 at 05:28:25PM -0700, Christoph Hellwig wrote:
> On Fri, Aug 15, 2014 at 04:39:03PM +1000, Dave Chinner wrote:
> > Internal buffer write error handling is a mess due to the unnatural
> > split between xfs_bioerror and xfs_bioerror_relse(). The buffer
> > reference counting is also wrong for the xfs_bioerror path for
> > xfs_bwrite - it uses sync IO and so xfs_buf_ioend will release a
> > hold count that was never taken.
>
> Which seems to be cause by patch 1 in the series, and thus the
> additional hold for the error case should be introduced there.
will fix.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 6/9] xfs: kill xfs_bioerror_relse
2014-08-29 0:32 ` Christoph Hellwig
@ 2014-08-29 1:12 ` Dave Chinner
2014-08-29 18:26 ` Christoph Hellwig
0 siblings, 1 reply; 43+ messages in thread
From: Dave Chinner @ 2014-08-29 1:12 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Aug 28, 2014 at 05:32:57PM -0700, Christoph Hellwig wrote:
> > index 96c898e..758c07d 100644
> > --- a/fs/xfs/xfs_trans_buf.c
> > +++ b/fs/xfs/xfs_trans_buf.c
> > @@ -324,11 +324,13 @@ xfs_trans_read_buf_map(
> > */
> > if (XFS_FORCED_SHUTDOWN(mp)) {
> > trace_xfs_bdstrat_shut(bp, _RET_IP_);
> > - xfs_bioerror_relse(bp);
> > - } else {
> > - xfs_buf_iorequest(bp);
> > + bp->b_flags &= ~(XBF_READ | XBF_DONE);
> > + xfs_buf_ioerror(bp, -EIO);
> > + xfs_buf_stale(bp);
> > + return -EIO;
> > }
>
> This is a large change of behavior as it doesn't hit the error
> path after the xfs_buf_iowait anymore. While I don't think that
> that path was entirely correct this version seems to be even less so
> by not releasing the buffer reference or forcing the shutdown.
The IO is synchronous, so the previous behaviour did not release
the buffer here. But, yes, it needs to because we're not running the
io wait on it anymore. And this happens only during a shutdown, so I
don't see any need to trigger a shutdown ;)
As it is, I think this gets properly fixed by the next patch....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 6/9] xfs: kill xfs_bioerror_relse
2014-08-29 1:12 ` Dave Chinner
@ 2014-08-29 18:26 ` Christoph Hellwig
2014-08-30 0:05 ` Dave Chinner
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2014-08-29 18:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Fri, Aug 29, 2014 at 11:12:36AM +1000, Dave Chinner wrote:
> > This is a large change of behavior as it doesn't hit the error
> > path after the xfs_buf_iowait anymore. While I don't think that
> > that path was entirely correct this version seems to be even less so
> > by not releasing the buffer reference or forcing the shutdown.
>
> The IO is synchronous, so the previous behaviour did not release
> the buffer here. But, yes, it needs to because we're not running the
> io wait on it anymore. And this happens only during a shutdown, so i
> don't see any need to trigger a shutdown ;)
>
> As it is, I think this gets properly fixed by the next patch....
Do you have any scenario that might actually exercise this path? I
enabled xfs_trans_read_buf_io trace events and run xfstests as well
as a few other workloads and couldn't hit it at all. And when you
think of it: when would we do a trans_get_buf, then not actually
update it with data and then recurse into a trans_read_buf in
the same transaction?
Maybe it's just time to do a bit more of an audit and put this whole
code path to rest.
* Re: [PATCH 6/9] xfs: kill xfs_bioerror_relse
2014-08-29 18:26 ` Christoph Hellwig
@ 2014-08-30 0:05 ` Dave Chinner
0 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2014-08-30 0:05 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Fri, Aug 29, 2014 at 11:26:10AM -0700, Christoph Hellwig wrote:
> On Fri, Aug 29, 2014 at 11:12:36AM +1000, Dave Chinner wrote:
> > > This is a large change of behavior as it doesn't hit the error
> > > path after the xfs_buf_iowait anymore. While I don't think that
> > > that path was entirely correct this version seems to be even less so
> > > by not releasing the buffer reference or forcing the shutdown.
> >
> > The IO is synchronous, so the previous behaviour did not release
> > the buffer here. But, yes, it needs to because we're not running the
> > io wait on it anymore. And this happens only during a shutdown, so i
> > don't see any need to trigger a shutdown ;)
> >
> > As it is, I think this gets properly fixed by the next patch....
>
> Do you have any scenario that might actually exercise this path? I
> enabled xfs_trans_read_buf_io trace events and run xfstests as well
> as a few other workloads and couldn't hit it at all. And when you
> think of it: when would be do a trans_get_buf, then not actually
> updating it with data and then recurse into a trans_read_buf in
> the same transaction?
*nod*. This path has been there since the Irix days, and I suspect that
it was there to handle some weird corner case to do with async
buffer reads which, IIRC, could once be done through this code...
> Maybe it's just time do a bit more of an audit and put this whole
> code path to rest.
Perhaps so.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
End of thread. Thread overview: 43+ messages (newest: 2014-08-30 00:05 UTC)
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
2014-08-15 6:38 ` [PATCH 1/9] xfs: synchronous buffer IO needs a reference Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-15 23:17 ` Dave Chinner
2014-08-18 14:15 ` Brian Foster
2014-08-29 0:18 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-15 23:21 ` Dave Chinner
2014-08-18 14:15 ` Brian Foster
2014-08-29 0:22 ` Christoph Hellwig
2014-08-29 0:55 ` Dave Chinner
2014-08-15 6:39 ` [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-15 23:25 ` Dave Chinner
2014-08-29 0:23 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 4/9] xfs: kill xfs_bdstrat_cb Dave Chinner
2014-08-29 0:24 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 5/9] xfs: xfs_bioerror can die Dave Chinner
2014-08-15 14:35 ` Brian Foster
2014-08-15 23:27 ` Dave Chinner
2014-08-29 0:28 ` Christoph Hellwig
2014-08-29 1:05 ` Dave Chinner
2014-08-15 6:39 ` [PATCH 6/9] xfs: kill xfs_bioerror_relse Dave Chinner
2014-08-29 0:32 ` Christoph Hellwig
2014-08-29 1:12 ` Dave Chinner
2014-08-29 18:26 ` Christoph Hellwig
2014-08-30 0:05 ` Dave Chinner
2014-08-15 6:39 ` [PATCH 7/9] xfs: clean up xfs_trans_buf_read_map Dave Chinner
2014-08-15 6:39 ` [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait] Dave Chinner
2014-08-15 13:10 ` Christoph Hellwig
2014-08-15 23:37 ` Dave Chinner
2014-08-16 4:55 ` Christoph Hellwig
2014-08-15 14:35 ` Brian Foster
2014-08-15 23:39 ` Dave Chinner
2014-08-18 14:16 ` Brian Foster
2014-08-15 16:13 ` Brian Foster
2014-08-15 23:58 ` Dave Chinner
2014-08-18 14:26 ` Brian Foster
2014-08-15 6:39 ` [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly Dave Chinner
2014-08-15 12:56 ` Christoph Hellwig
2014-08-15 23:58 ` Dave Chinner
2014-08-29 0:37 ` Christoph Hellwig