From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: sandeen@sandeen.net, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/8] libxfs: enable tools to check that metadata updates have been committed
Date: Thu, 20 Feb 2020 10:26:42 -0800 [thread overview]
Message-ID: <20200220182642.GZ9506@magnolia> (raw)
In-Reply-To: <20200220175850.GH48977@bfoster>
On Thu, Feb 20, 2020 at 12:58:50PM -0500, Brian Foster wrote:
> On Thu, Feb 20, 2020 at 08:46:38AM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 20, 2020 at 09:06:12AM -0500, Brian Foster wrote:
> > > On Wed, Feb 19, 2020 at 05:42:06PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > >
> > > > Add a new function that will ensure that everything we changed has
> > > > landed on stable media, and report the results. Subsequent commits will
> > > > teach the individual programs to report when things go wrong.
> > > >
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > > include/xfs_mount.h | 3 +++
> > > > libxfs/init.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> > > > libxfs/libxfs_io.h | 2 ++
> > > > libxfs/rdwr.c | 27 +++++++++++++++++++++++++--
> > > > 4 files changed, 73 insertions(+), 2 deletions(-)
> > > >
> > > >
> > > > diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> > > > index 29b3cc1b..c80aaf69 100644
> > > > --- a/include/xfs_mount.h
> > > > +++ b/include/xfs_mount.h
> > > > @@ -187,4 +187,7 @@ extern xfs_mount_t *libxfs_mount (xfs_mount_t *, xfs_sb_t *,
> > > > extern void libxfs_umount (xfs_mount_t *);
> > > > extern void libxfs_rtmount_destroy (xfs_mount_t *);
> > > >
> > > > +void libxfs_flush_devices(struct xfs_mount *mp, int *datadev, int *logdev,
> > > > + int *rtdev);
> > > > +
> > > > #endif /* __XFS_MOUNT_H__ */
> > > > diff --git a/libxfs/init.c b/libxfs/init.c
> > > > index a0d4b7f4..d1d3f4df 100644
> > > > --- a/libxfs/init.c
> > > > +++ b/libxfs/init.c
> > > > @@ -569,6 +569,8 @@ libxfs_buftarg_alloc(
> > > > }
> > > > btp->bt_mount = mp;
> > > > btp->dev = dev;
> > > > + btp->lost_writes = false;
> > > > +
> > > > return btp;
> > > > }
> > > >
> > > > @@ -791,6 +793,47 @@ libxfs_rtmount_destroy(xfs_mount_t *mp)
> > > > mp->m_rsumip = mp->m_rbmip = NULL;
> > > > }
> > > >
> > > > +static inline int
> > > > +libxfs_flush_buftarg(
> > > > + struct xfs_buftarg *btp)
> > > > +{
> > > > + if (btp->lost_writes)
> > > > + return -ENOTRECOVERABLE;
> > >
> > > I'm curious why we'd want to skip the flush just because some writes
> > > happened to fail..? I suppose the fs might be borked, but it seems a
> > > little strange to at least not try the flush, particularly since we
> > > might still flush any of the other two possible devices.
> >
> > My thinking here was that if the write verifiers (or the pwrite() calls
> > themselves) failed then there's no point in telling the disk to flush
> > its write cache since we already know it's not in sync with the buffer
> > cache.
> >
>
> I suppose, but it seems there is some value in flushing what we did
> write.. That's effectively historical behavior (since we ignored
> errors), right?
It's the historical behavior, yes. I don't think it makes much sense,
but OTOH I'm not opposed to restoring that.
> > > > +
> > > > + return libxfs_blkdev_issue_flush(btp);
> > > > +}
> > > > +
> > > > +/*
> > > > + * Purge the buffer cache to write all dirty buffers to disk and free all
> > > > + * incore buffers. Buffers that cannot be written will cause the lost_writes
> > > > + * flag to be set in the buftarg. If there were no lost writes, flush the
> > > > + * device to make sure the writes made it to stable storage.
> > > > + *
> > > > + * For each device, the return code will be set to -ENOTRECOVERABLE if we
> > > > + * couldn't write something to disk; or the results of the block device flush
> > > > + * operation.
> > >
> > > Why not -EIO?
> >
> > Originally I thought it might be useful to be able to distinguish
> > between "dirty buffers never even made it out of the buffer cache" vs.
> > "dirty buffers were sent to the disk but the disk sent back media
> > errors", though in the end the userspace tools don't make any
> > distinction.
> >
> > That said, looking at this again, maybe I should track write verifier
> > failure separately so that we can return EFSCORRUPTED for that?
> >
>
> It's not clear to me that anything application level would care much
> about verifier failure vs. I/O failure, but I've no objection to doing
> something like that either.
Yeah. The single usecase I can think of is where repair trips over a
write verifier and we should make it really obvious to the sysadmin that
repair is buggy and needs either (a) an upgrade or (b) a complaint filed
on linux-xfs.
--D
> Brian
>
> > --D
> >
> > >
> > > Brian
> > >
> > > > + */
> > > > +void
> > > > +libxfs_flush_devices(
> > > > + struct xfs_mount *mp,
> > > > + int *datadev,
> > > > + int *logdev,
> > > > + int *rtdev)
> > > > +{
> > > > + *datadev = *logdev = *rtdev = 0;
> > > > +
> > > > + libxfs_bcache_purge();
> > > > +
> > > > + if (mp->m_ddev_targp)
> > > > + *datadev = libxfs_flush_buftarg(mp->m_ddev_targp);
> > > > +
> > > > + if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
> > > > + *logdev = libxfs_flush_buftarg(mp->m_logdev_targp);
> > > > +
> > > > + if (mp->m_rtdev_targp)
> > > > + *rtdev = libxfs_flush_buftarg(mp->m_rtdev_targp);
> > > > +}
> > > > +
> > > > /*
> > > > * Release any resource obtained during a mount.
> > > > */
> > > > diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
> > > > index 579df52b..fc0fd060 100644
> > > > --- a/libxfs/libxfs_io.h
> > > > +++ b/libxfs/libxfs_io.h
> > > > @@ -23,10 +23,12 @@ struct xfs_perag;
> > > > struct xfs_buftarg {
> > > > struct xfs_mount *bt_mount;
> > > > dev_t dev;
> > > > + bool lost_writes;
> > > > };
> > > >
> > > > extern void libxfs_buftarg_init(struct xfs_mount *mp, dev_t ddev,
> > > > dev_t logdev, dev_t rtdev);
> > > > +int libxfs_blkdev_issue_flush(struct xfs_buftarg *btp);
> > > >
> > > > #define LIBXFS_BBTOOFF64(bbs) (((xfs_off_t)(bbs)) << BBSHIFT)
> > > >
> > > > diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> > > > index 8b47d438..92e497f9 100644
> > > > --- a/libxfs/rdwr.c
> > > > +++ b/libxfs/rdwr.c
> > > > @@ -17,6 +17,7 @@
> > > > #include "xfs_inode_fork.h"
> > > > #include "xfs_inode.h"
> > > > #include "xfs_trans.h"
> > > > +#include "libfrog/platform.h"
> > > >
> > > > #include "libxfs.h" /* for LIBXFS_EXIT_ON_FAILURE */
> > > >
> > > > @@ -1227,9 +1228,11 @@ libxfs_brelse(
> > > >
> > > > if (!bp)
> > > > return;
> > > > - if (bp->b_flags & LIBXFS_B_DIRTY)
> > > > + if (bp->b_flags & LIBXFS_B_DIRTY) {
> > > > fprintf(stderr,
> > > > "releasing dirty buffer to free list!\n");
> > > > + bp->b_target->lost_writes = true;
> > > > + }
> > > >
> > > > pthread_mutex_lock(&xfs_buf_freelist.cm_mutex);
> > > > list_add(&bp->b_node.cn_mru, &xfs_buf_freelist.cm_list);
> > > > @@ -1248,9 +1251,11 @@ libxfs_bulkrelse(
> > > > return 0 ;
> > > >
> > > > list_for_each_entry(bp, list, b_node.cn_mru) {
> > > > - if (bp->b_flags & LIBXFS_B_DIRTY)
> > > > + if (bp->b_flags & LIBXFS_B_DIRTY) {
> > > > fprintf(stderr,
> > > > "releasing dirty buffer (bulk) to free list!\n");
> > > > + bp->b_target->lost_writes = true;
> > > > + }
> > > > count++;
> > > > }
> > > >
> > > > @@ -1479,6 +1484,24 @@ libxfs_irele(
> > > > kmem_cache_free(xfs_inode_zone, ip);
> > > > }
> > > >
> > > > +/*
> > > > + * Flush everything dirty in the kernel and disk write caches to stable media.
> > > > + * Returns 0 for success or a negative error code.
> > > > + */
> > > > +int
> > > > +libxfs_blkdev_issue_flush(
> > > > + struct xfs_buftarg *btp)
> > > > +{
> > > > + int fd, ret;
> > > > +
> > > > + if (btp->dev == 0)
> > > > + return 0;
> > > > +
> > > > + fd = libxfs_device_to_fd(btp->dev);
> > > > + ret = platform_flush_device(fd, btp->dev);
> > > > + return ret ? -errno : 0;
> > > > +}
> > > > +
> > > > /*
> > > > * Write out a buffer list synchronously.
> > > > *
> > > >
> > >
> >
>
next prev parent reply other threads:[~2020-02-20 18:26 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-20 1:41 [PATCH v2 0/8] xfsprogs: actually check that writes succeeded Darrick J. Wong
2020-02-20 1:41 ` [PATCH 1/8] libxfs: libxfs_buf_delwri_submit should write buffers immediately Darrick J. Wong
2020-02-20 1:41 ` [PATCH 2/8] libxfs: complain when write IOs fail Darrick J. Wong
2020-02-20 1:42 ` [PATCH 3/8] libxfs: return flush failures Darrick J. Wong
2020-02-20 1:42 ` [PATCH 4/8] libxfs: enable tools to check that metadata updates have been committed Darrick J. Wong
2020-02-20 14:06 ` Brian Foster
2020-02-20 16:46 ` Darrick J. Wong
2020-02-20 17:58 ` Brian Foster
2020-02-20 18:26 ` Darrick J. Wong [this message]
2020-02-20 18:50 ` Brian Foster
2020-02-20 23:40 ` Dave Chinner
2020-02-21 0:33 ` Darrick J. Wong
2020-02-20 1:42 ` [PATCH 5/8] xfs_db: " Darrick J. Wong
2020-02-20 14:06 ` Brian Foster
2020-02-20 16:58 ` Darrick J. Wong
2020-02-20 17:58 ` Brian Foster
2020-02-20 18:34 ` Darrick J. Wong
2020-02-21 0:01 ` Dave Chinner
2020-02-21 0:39 ` Darrick J. Wong
2020-02-21 1:17 ` Dave Chinner
2020-02-20 1:42 ` [PATCH 6/8] mkfs: " Darrick J. Wong
2020-02-20 1:42 ` [PATCH 7/8] xfs_repair: " Darrick J. Wong
2020-02-20 1:42 ` [PATCH 8/8] libfrog: always fsync when flushing a device Darrick J. Wong
2020-02-20 14:06 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200220182642.GZ9506@magnolia \
--to=darrick.wong@oracle.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=sandeen@sandeen.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox